Web: Stupid HTML trick to get past content filters

by firestorm_v1 on May.02, 2010, under How-To's, Miscellaneous, Networking, Software

I know it’s been a while since I posted, and I do apologize. Life has definitely not been kind to me in terms of time, but I have not forgotten anything. I have two major posts coming up, hopefully within the next week. In the meantime, here’s a quick article about a trick I discovered while working on a project with a friend. The project was to see whether the content filter in their chat application could be broken, and with a little bit of HTML know-how and some PHP code, I was able to crank out a generator to do just that. Read on for the details.

The Challenge:

The trick was to figure out how to get certain “four-letter words” past the chat app’s filter and into the main chat window without the word being munged by the system. Most chat applications filter out obscene words with a string-matching system and replace them with something much less offensive, usually a series of asterisks. I could only use plain ASCII characters, and I couldn’t use any “img src” HTML tags to do the dirty work (literally).

The Analysis:

All HTML that is rendered is associated with something called a character set (or code page, from the old MS-DOS days). A character set maps each character to a number (for plain English letters this is often called its ASCII value). Although some characters are the same in every character set (like “a” = 97), some control characters and the characters above 127 (decimal) differ significantly from one set to another. To convey such characters properly over the web, URL encoding (also called percent-encoding) was defined in the URL specifications and is used by HTML forms. It means any byte can be written as a percent sign (%) followed by its value as two hexadecimal digits, so “a” becomes %61. The general idea was that if you typed in a Russian name using letters not found in the Latin alphabet, those letters could still be properly represented on the server side.
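To see the percent-encoding described above in action, here is a small sketch using Python's standard `urllib.parse.quote` (Python is just my choice for illustration; it is not the PHP generator mentioned earlier). Plain ASCII letters are left alone by default, while each non-Latin character turns into several %XX escapes, one per UTF-8 byte.

```python
from urllib.parse import quote

# quote() percent-encodes each byte of the UTF-8 form of the text.
russian = quote("Иван")
print(russian)  # %D0%98%D0%B2%D0%B0%D0%BD

# Unreserved ASCII letters are not escaped unless you force it.
print(quote("taco"))  # taco
```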

With that in mind, I examined the UTF-8 character set (for plain ASCII letters, UTF-8 uses the same values as ASCII). In this example, I’ll use the word “taco” to stand in for the offending word.

How it’s done:

The process for this is as follows:

Find the ASCII value for each character in the word
Convert each ASCII value to hexadecimal
Put a “%” in front of each hex value
Insert a non-printing control character (a “null”) somewhere in the middle.
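The four steps above can be sketched as a small generator. This is a minimal Python illustration, not the PHP code I actually used; the function name `encode_word` and its `insert_at` and `filler` parameters are hypothetical, and it assumes a plain ASCII word.

```python
def encode_word(word: str, insert_at: int = 2, filler: int = 0x0B) -> str:
    """Percent-encode every character of `word` and slip in a control byte.

    Hypothetical helper: `insert_at` is the index the filler is placed before
    (2 puts it between "a" and "c" in "taco"); `filler` defaults to 0x0B.
    """
    if not word.isascii():
        raise ValueError("this sketch only handles ASCII words")
    # Steps 1-3: ASCII value -> two-digit hex -> prefix with "%"
    parts = [f"%{ord(ch):02X}" for ch in word]
    # Step 4: insert the non-printing control character
    parts.insert(insert_at, f"%{filler:02X}")
    return "".join(parts)


print(encode_word("taco"))  # %74%61%0B%63%6F
```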

For reference, you can use any ASCII chart, which lists both the decimal and the hexadecimal value for each character.

From the chart, we see the following information:

t = 116 (decimal) or 74 (hex)

a = 97 (decimal) or 61 (hex)

c = 99 (decimal) or 63 (hex)

o = 111 (decimal) or 6F (hex)
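If you would rather not squint at a chart, Python's built-in `ord()` gives the same numbers. A quick sketch (the `values` name is just for illustration):

```python
# Map each letter to (decimal value, hex value) using ord() and format().
values = {ch: (ord(ch), format(ord(ch), "x")) for ch in "taco"}

for ch, (dec, hx) in values.items():
    print(f"{ch} = {dec} (decimal) or {hx} (hex)")
```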

Using this information, we can then create our string, inserting the % where needed: %74 %61 %63 %6F

Only one item remains. To fool some of the more intelligent content filters, you need to put a “null” character in there somewhere. A simple filter that scans the raw text never sees “taco” in the encoded string at all, but a filter that decodes the text first presumably still sees “ta”, then a control character, then “co”, which does not match its word list. For this I used character 0B (the vertical tab; strictly speaking, the null character is 00), which has no Latin equivalent and is a control code that does not render in HTML. I used 0B because 08 (backspace) rendered as a tab in testing.

Knowing this, I inserted the null character between the URL-encoded “a” and the URL-encoded “c”: %74 %61 %0B %63 %6F
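To see what a filter that decodes the message first would actually be comparing against, here is a sketch using Python's standard `urllib.parse.unquote`, assuming the receiving side does ordinary percent-decoding:

```python
from urllib.parse import unquote

encoded = "%74%61%0B%63%6F"
decoded = unquote(encoded)

print(repr(decoded))          # 'ta\x0bco'
print("taco" in decoded)      # False: a plain substring match misses it
print(decoded.isprintable())  # False: \x0b is a non-printing control code
```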

Testing it out:

All you need to do to test it is copy and paste the above string into a chat application that decodes URL-encoded text and hit send. You will need to remove the spaces between the encoded characters; otherwise your application will treat them as renderable characters as well. If it works, you’ll see the word “taco” in your window. Now you know one way to get past some content filters. If you are in the business of building content filters, you now have a new case to handle: decode the message and strip control characters before checking it against your word list.
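On the filter-building side, one plausible countermeasure is to normalize messages before matching: undo the percent-encoding (repeatedly, in case someone double-encodes) and drop control characters. This is a hypothetical Python sketch, not taken from any particular chat application; `BLOCKED`, `normalize` and `is_blocked` are illustrative names, and “taco” again stands in for the real word.

```python
import unicodedata
from urllib.parse import unquote

# Hypothetical word list; "taco" stands in for the real offending word.
BLOCKED = {"taco"}


def normalize(message: str, max_rounds: int = 5) -> str:
    """Undo percent-encoding and strip control characters before matching."""
    text = message
    # Decode repeatedly so nested encodings like %2574 -> %74 -> t are undone;
    # the round limit keeps the loop bounded.
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    # Unicode category "Cc" covers control codes such as \x0b (vertical tab).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    return text.lower()


def is_blocked(message: str) -> bool:
    """Return True if any blocked word appears in the normalized message."""
    cleaned = normalize(message)
    return any(word in cleaned for word in BLOCKED)


print(is_blocked("%74%61%0B%63%6F"))  # True
print(is_blocked("I like burritos"))  # False
```

This still misses tricks like spacing the letters out or swapping in look-alike Unicode characters, so treat it as one layer rather than a complete filter. Stripping control characters also removes newlines and tabs, which can join words across lines.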

Don’t be a prick!

I posted this information in the hope that people may find it useful, not so that script kiddies can run around and make asses of themselves. Be smart about how you use this information and, last but not least, DON’T BE A PRICK!

Tags: Linux, Webservers

Your Warranty Is Void.com