Design, Usability and Security Dilemmas With User Generated Content
Sep 18, 2009 In Design By Karsten Januszewski
Allowing users to add their content—feedback, reviews, expertise, etc.—to a web page is ubiquitous these days. Whether we’re talking about comments on a blog post or wiki articles, user generated content is everywhere.
The mechanisms for dealing with this type of content, however, are hardly standardized. There are usually three approaches. Users can either:
- Enter text, but not format
- Add HTML directly to the comments
- Use an alternative mark-up syntax
Each of these approaches has pros & cons. Here are just a few:
Approach #1—Text Only
Pro: Nobody can pollute the comments with awful images, formatting, or links.
Con: Nobody can enhance the comments with great images, formatting, or links. You can get around the hyperlink problem fairly easily (by converting http:// references into hyperlinks as the data exits the system), but this doesn’t fix the formatting or images issue.
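Here is a minimal sketch of that hyperlink trick in Python. The function name linkify, the URL pattern, and the rel="nofollow" attribute are my own assumptions for illustration, not something any particular blog engine does. The idea is to escape everything first and only then wrap the URLs, so the text-only guarantee still holds.

```python
import html
import re

# Assumed pattern: http:// or https:// followed by anything that is not
# whitespace, an angle bracket, or a quote character.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def linkify(comment: str) -> str:
    """Escape a plain-text comment, then wrap each URL in an anchor tag."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(comment):
        # Escape the text between URLs so any markup in it is shown literally.
        parts.append(html.escape(comment[last:match.start()]))
        # Escape the URL too, since it ends up inside an attribute value.
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    # Escape whatever text follows the final URL.
    parts.append(html.escape(comment[last:]))
    return "".join(parts)
```

One known rough edge: a URL followed directly by a period or comma will swallow that punctuation, so a real implementation would want a smarter pattern.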
Approach #2—HTML
Pro: Users get a lot of power. They can customize pages, profiles, the whole bit. MySpace is an example of this. Some would argue that MySpace owes its success to allowing this behavior.
Con: First, there is a usability risk: you have to assume that users know HTML, or teach it to them on the fly. And then there’s the design problem: allowing HTML means that users can do all kinds of crazy things, such as embedding images, adding Flash or Silverlight objects, inserting styles, running the banner tag. MySpace is an example of user-added HTML gone wild and, some would argue, it is the “problem” with MySpace.
Another option is to allow a narrow subset of HTML. Just the <a> and the <strong> tag? Or more?
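To make the "narrow subset" idea concrete, here is a rough sketch using Python's standard html.parser module. The class name, the allowed tag set, and the decision to permit only http and https links are all assumptions on my part. It keeps <a> and <strong>, throws away every other tag and attribute, and escapes all text. Treat it as an illustration of the allowlist approach, not a vetted sanitizer.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlparse

ALLOWED_TAGS = {"a", "strong"}        # assumption: the narrow subset
ALLOWED_SCHEMES = {"http", "https"}   # assumption: no javascript:, data:, etc.


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside it is still escaped and kept
        if tag == "a":
            href = dict(attrs).get("href") or ""
            try:
                scheme = urlparse(href).scheme.lower()
            except ValueError:
                return  # unparseable URL: drop the link
            if scheme not in ALLOWED_SCHEMES:
                return
            # Only href survives; onclick, style, and friends are discarded.
            safe_href = html.escape(href, quote=True)
            self.out.append(f'<a href="{safe_href}" rel="nofollow">')
        else:
            self.out.append(f"<{tag}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return
        # Close anything opened after this tag so nesting stays well-formed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(html.escape(data))

    def result(self) -> str:
        # Close whatever the user left open.
        closers = [f"</{tag}>" for tag in reversed(self.open_tags)]
        return "".join(self.out + closers)


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return parser.result()
```

The important design choice is that the list says what is allowed rather than what is forbidden. Anything the list doesn't mention is dropped by default, which is a lot easier to reason about than chasing every dangerous tag.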
Allowing HTML in user generated comments also opens up big security issues – read on for an in-depth discussion of this.
Approach #3—Alternative Mark-up Syntax (Aka the wiki way)
Pro: Wikis, which use their own syntax for formatting, are a perfect example. And, there are other syntaxes out there. The nice part of using one of these syntaxes is that you avoid some of the problems with HTML, as far as security and license to do ill.
Con: Users are forced to learn a new language. And there are lots of languages out there: Textile, Markdown, Markdown with SmartyPants, MultiMarkdown, etc. Heck, Mix Online supports comments written with the Textile syntax, implemented using a library from Codeplex called Textile.NET, though we never tell you in the comment form. (Maybe that’s coming in version 2 – ask Nishant.) In fact, try adding a comment to Mix Online and use the Textile format – you’ll see it works.
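To give a feel for what one of these syntaxes does under the hood, here is a toy renderer for three Textile constructs: *strong*, _emphasis_, and "link text":url. This is a hand-rolled sketch of a tiny subset, with names and patterns I made up. It is not how Textile.NET works and it covers nowhere near the full syntax.

```python
import html
import re

# One combined pattern, so each piece of input is matched at most once and
# generated links are never re-scanned for * or _ markers.
TOKEN = re.compile(
    r'"(?P<text>[^"\n]+)":(?P<url>https?://[^\s"]+)'   # "text":http://...
    r'|\*(?P<strong>[^*\n]+)\*'                        # *strong*
    r'|(?<!\w)_(?P<em>[^_\n]+)_(?!\w)'                 # _emphasis_
)


def render(source: str) -> str:
    """Render a tiny Textile-like subset to HTML."""
    # Escape <, > and & first; double quotes are left alone because the
    # link syntax needs them, and the URL pattern excludes them.
    escaped = html.escape(source, quote=False)

    def replace(match):
        if match.group("url"):
            return f'<a href="{match.group("url")}">{match.group("text")}</a>'
        if match.group("strong"):
            return f'<strong>{match.group("strong")}</strong>'
        return f'<em>{match.group("em")}</em>'

    return TOKEN.sub(replace, escaped)
```

Because the input is escaped before any markup is generated, the only tags that can appear in the output are the three this function writes itself. That is the security appeal of this approach in a nutshell.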
It’s All a Security Problem
No matter which approach you take, there is one big Universal Con to opening your doors to user generated content: security. User generated content makes all kinds of attacks possible—from SQL Injection to cross site scripting to who knows what.
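For the SQL injection half of that sentence, the usual defense is to never build queries by gluing strings together. Here is a small sketch using Python's built-in sqlite3 module. The comments table and the function names are made up for the example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical comments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")
    return conn


def save_comment(conn: sqlite3.Connection, author: str, body: str) -> None:
    # The ? placeholders send the values separately from the SQL text,
    # so quotes in a comment can't change the statement.
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)",
        (author, body),
    )
    conn.commit()


def comments_by(conn: sqlite3.Connection, author: str) -> list:
    rows = conn.execute(
        "SELECT body FROM comments WHERE author = ?",
        (author,),
    )
    return [row[0] for row in rows]
```

With this in place, a comment containing something like '); DROP TABLE comments; -- is stored as ordinary text rather than executed.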
Some of the worry goes away with ASP.NET, because it has an attribute (validateRequest) that can prevent someone from inserting malicious content. But if you want to allow HTML, you’ll have to turn validateRequest off (it is on by default). That means you have to write your own validation as data enters the system.
With validateRequest or your own home-rolled solution, we are talking about checking data as it enters the system. What if something does slip through?
A more thorough procedure for the paranoid among us is to sanitize the data as it leaves the system as well. You can do this manually by encoding all output (HtmlEncode(), UrlEncode(), etc.). Or, in ASP.NET, you can pass all data through the Anti-Cross Site Scripting library (originally from the Microsoft Patterns and Practices group). Implementing this library is easy and I highly recommend it. You’ll notice the recent version of Oxite does just this.
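The manual route looks roughly like this. I'm using Python's html.escape and urllib.parse.quote as stand-ins for HtmlEncode() and UrlEncode(). The render_comment function and the /users/ URL layout are hypothetical. The point is that each value gets the encoding that matches where it lands in the page.

```python
import html
from urllib.parse import quote


def render_comment(author: str, body: str) -> str:
    """Build comment markup, encoding each value for its output context."""
    # URL context: percent-encode the author name as a single path segment.
    profile_url = "/users/" + quote(author, safe="")
    # HTML attribute context: escape the finished URL, including quotes.
    href = html.escape(profile_url, quote=True)
    # HTML text context: escape the author name and the comment body.
    return (
        '<div class="comment">'
        f'<a href="{href}">{html.escape(author)}</a>: '
        f'{html.escape(body)}</div>'
    )
```

Even if a script tag made it past input validation and into storage, it comes out the other side as harmless text.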
What Do You Think?
I leave you on an inconclusive note. All three approaches have pros/cons, and none is necessarily right. So I’m curious: which approaches do you take as web developers? Which do you prefer as users? Let us know in the comments – formatted with Textile if you’d like — or on Twitter.