AntiSamm "Filters your JavaScript from HTML content"

Jason Harwig has posted about AntiSamy, the Java 1.5 compatible library that sanitizes away: About AntiSamy: The OWASP AntiSamy project is a few things. Technically, it is an API for ensuring user-supplied HTML/CSS is in compliance within an application's rules. Another way of saying that could be: It's an API that helps you make sure […]

Jason Harwig has posted about AntiSamy, the Java 1.5 compatible library that sanitizes away:

About AntiSamy: The OWASP AntiSamy project is a few things. Technically, it is an API for ensuring user-supplied HTML/CSS is in compliance within an application's rules. Another way of saying that could be: It's an API that helps you make sure that clients don't supply malicious cargo code in the HTML they supply for their profile, comments, etc. that gets persisted on the server. The term malicious code in terms of web applications is usually regarded only as JavaScript. Cascading Stylesheets are only considered malicious when they invoke the JavaScript engine. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner.

Philosophically, AntiSamy is a departure from all contemporary security mechanisms. Generally, the security mechanism and user have a communication that is virtually one way, for good reason. Letting the potential attacker know details about the validation is considered unwise as it allows the attacker to "learn" and "recon" the mechanism for weaknesses. These types of information leaks can also hurt in ways you don't expect. A login mechanism that tells the user, "Username invalid" leaks the fact that a user by that name does not exist. A user could use a dictionary or phone book or both to remotely come up with a list of valid usernames. Using this information, an attacker could launch a brute force attack or massive account lock denial-of-service. So, we get that.[Full wiki]

Jason Harwig on AntiSamm: I gave a JavaScript security talk last month, and one of the topics was HTML filtering. I gave examples of how MySpace tried to filter executable code, while still allowing HTML tags for formatting. MySpace, of course, failed to foresee every attack vector, and the Samy worm was born.

HTML filtering was never recommended because it was so difficult to get right, and with no proven libraries, trying to build a solution would almost certainly contain security holes. Thanks to Arshan Dabirsiaghi we finally have something to use. He has created the OWASP AntiSamy project to easily sanitize HTML input. AntiSamy is currently implemented as a Java 1.5 compatible library, but there are plans to support other platforms.

Here's a sample usage:

AntiSamy sanitizer = new AntiSamy(); CleanResults results = sanitizer.scan(request.getParameter("html")); String html = results.getCleanHTML(); if (!results.getErrorMessages().isEmpty()) { log.warn("Input contains errors"); } AntiSamm, AntiSamy, Security, JavaScript, Filter, HTML