
An HTML sanitizer for C#

March 4, 2010

After three months of gestation and some bug fixes, no successful attacks against HtmlSanitizer have been reported.

Does that mean it rocks? I don’t think so, but it is probably strong enough to sail in stormy waters.

See my previous post to learn how it works, but above all test it online in the Patapage playground.

To be honest, I have received some complaints about the blacklist approach to CSS styles, but no one has hacked the current version (yet :-) ). In any case, the code is open to changes, and I’m happy to receive your feedback.

We are now proud to announce a C# port by Beyers Cronje (thank you). You can find the C# sources here (and the Java ones here). Note: the source code has already been patched as suggested by Isaiah.

Other ports are welcome!

12 Comments
  1. Sal
    August 12, 2010 10:28

    Hi Roberto, congratulations on the code. I wanted to report one thing: I tried pasting in some text copied from Word (another long-standing problem) and noticed that the code left behind a closing tag like o:p.

    Regards

  2. Isaiah
    September 23, 2010 18:22

    I found a couple of bugs, present in both the Java and C# versions:

    1. Self-closed tags were being converted to a pair of tags;
    test case: <param/><param/> becomes <param><param></param></param>

    2. Incorrect index in the replaceAllNoRegex function;
    buffer.Append(source.Substring(oldPos, pos));
    should be
    buffer.Append(source.Substring(oldPos, search.Length));

    Here is a patch for the C# version:

    --- HtmlSanitizer.orig.cs	Thu Sep 23 12:17:54 2010
    +++ HtmlSanitizer.new.cs	Thu Sep 23 12:18:03 2010
    @@ -302,7 +302,10 @@
     
                                 cleanToken = cleanToken + " " + attr + "=\"" + val + "\"";
                             }
    -                        cleanToken = cleanToken + ">";
    +                        if (selfClosed.Match(token).Success)
    +                            cleanToken = cleanToken + "/>";
    +                        else
    +                            cleanToken = cleanToken + ">";
     
                             isAcceptedToken = true;
     
    @@ -316,7 +319,7 @@
                             token = cleanToken;
     
                             // push the tag if require closure and it is accepted (otherwise is encoded) 
    -                        if (isAcceptedToken && !(standAloneTags.Match(tag).Success || selfClosed.Match(tag).Success))
    +                        if (isAcceptedToken && !(standAloneTags.Match(tag).Success || selfClosed.Match(token).Success))
                                 openTags.Push(tag);
     
                             // --------------------------------------------------------------------------------  UNKNOWN TAG 
    @@ -601,7 +604,7 @@
                     int oldPos, pos;
                     for (oldPos = 0, pos = source.IndexOf(search, oldPos); pos != -1; oldPos = pos + search.Length, pos = source.IndexOf(search, oldPos))
                     {
    -                    buffer.Append(source.Substring(oldPos, pos));
    +                    buffer.Append(source.Substring(oldPos, search.Length));
                         buffer.Append(replace);
                     }
                     if (oldPos < source.Length)
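
The first two hunks of the patch can be illustrated with a minimal Java sketch (bug #1 was reported for the Java version as well). Everything here is an assumption made for illustration: the `SELF_CLOSED` pattern, the class, and the method names are hypothetical and are not taken from the HtmlSanitizer sources. The point is only that the self-closed test has to run against the raw token (such as `<param/>`), not against the bare tag name (such as `param`), where it can never match.

```java
import java.util.Deque;
import java.util.regex.Pattern;

final class SelfClosedSketch {
    // Hypothetical pattern: a tag token that ends with "/" (and optional spaces) before ">".
    private static final Pattern SELF_CLOSED = Pattern.compile("<[^>]*/\\s*>");

    /** True when the raw token was written in self-closed form, e.g. "<param/>". */
    static boolean isSelfClosed(String token) {
        return SELF_CLOSED.matcher(token).find();
    }

    /** Rebuilds an attribute-less tag, keeping the self-closed form of the original token. */
    static String rebuild(String tag, String token) {
        return "<" + tag + (isSelfClosed(token) ? "/>" : ">");
    }

    /** Pushes the tag name only if a matching closing tag is still expected. */
    static void pushIfOpen(Deque<String> openTags, String tag, String token) {
        // Test the token, not the tag name: a bare name such as "param" never matches.
        if (!isSelfClosed(token)) {
            openTags.push(tag);
        }
    }
}
```

With this sketch, `rebuild("param", "<param/>")` yields `<param/>` and nothing is pushed onto the open-tag stack for it, so no spurious `</param>` would be emitted later. The stand-alone tag check from the real code is left out to keep the example short.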
    • Isaiah
      September 23, 2010 19:08

      Correction: Bug #2 is ONLY in the C# code. The Substring method in C# takes (start position, length), whereas substring in Java takes (start position, end position).

    • Isaiah
      September 23, 2010 19:13

      Another correction (sorry):
      buffer.Append(source.Substring(oldPos, search.Length)); is wrong, it should read

      buffer.Append(source.Substring(oldPos, pos - oldPos));

      • Eric Lebetsamer
        December 5, 2010 14:04

        Thanks for the fixes Isaiah.
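
The substring mix-up discussed in this thread can be made concrete with a short Java sketch. It is a reconstruction under stated assumptions, not the actual HtmlSanitizer code: the loop follows the shape visible in the patch, while `substringByLength` is a hypothetical helper that mimics the C# `Substring(start, length)` convention on top of Java's `substring(begin, end)`, and the empty-search guard is an addition of this sketch.

```java
final class ReplaceAllSketch {
    /** Hypothetical helper: C#-style Substring(start, length) expressed with Java's substring(begin, end). */
    static String substringByLength(String s, int start, int length) {
        return s.substring(start, start + length);
    }

    /** Replaces every occurrence of search in source without using regular expressions. */
    static String replaceAllNoRegex(String source, String search, String replace) {
        if (search.isEmpty()) {
            return source; // assumption of this sketch: an empty search string changes nothing
        }
        StringBuilder buffer = new StringBuilder();
        int oldPos = 0;
        int pos;
        while ((pos = source.indexOf(search, oldPos)) != -1) {
            // Copy the text between the previous match and this one.
            // In (start, length) terms the length is pos - oldPos, not pos and not search.length().
            buffer.append(substringByLength(source, oldPos, pos - oldPos));
            buffer.append(replace);
            oldPos = pos + search.length();
        }
        // Copy whatever follows the last match.
        buffer.append(source.substring(oldPos));
        return buffer.toString();
    }
}
```

In plain Java the copy step is simply `source.substring(oldPos, pos)`, which is why the bug only showed up in the C# port: passing `pos` as a length copies too much (or throws) as soon as `oldPos` is greater than zero.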

  3. kibria
    November 10, 2011 10:27

    Thanks a lot for sharing your code.
    Please update the C# source file as fixed by Isaiah.

  4. Thomas
    April 3, 2012 11:20

    It is really nice, but it kinda kills relative URLs in img tags: the src gets “killed”.

  5. June 9, 2012 08:45

    A fantastic blog post. I just passed this on to a university student who was doing a little research on this, and he in fact bought me lunch because I found it for him (smile). So let me reword that: thank you for the treat! But yeah, thanks for taking the time to talk about this. I feel strongly about it and enjoy reading more on this topic. If possible, as you gain expertise, would you mind updating your blog with more details? It is extremely helpful for me. Big thumbs up for this share!

  6. Mark Anthony
    October 10, 2012 14:54

    not working well

Trackbacks

  1. Is there a good solution for a C# html sanitizer? - How-To Video
