An html sanitizer for C#

After three months of gestation and some bug fixes, HtmlSanitizer is reporting no successful hacks.

Does it mean that it rocks? I don’t think so, but it is probably strong enough to sail in stormy waters.

See my previous post to learn how it works, but above all test it online with the Patapage playground.

To be honest, I received some complaints about the blacklist approach to CSS styles, but no one has hacked the current version (yet 🙂). In any case the code is open to changes, and I’m happy to receive your feedback.

Now we are proud to announce that there is a C# port by Beyers Cronje (thank you). You can find the C# sources here (and the Java ones here). Warning: the source code is already patched as suggested by Isaiah.

Other ports are welcome!

12 thoughts on “An html sanitizer for C#”

  1. Hi Roberto, congratulations on the code. I wanted to point something out … I tried copying text that came from Word (another long-standing problem) and noticed that the code left behind a closing tag like o:p.


  2. I found a couple of bugs, present in both the Java and C# versions:

    1. Self-closed tags were being converted to a pair of tags;
    test case: <param/><param/> becomes <param><param></param></param>

    2. Incorrect index in the replaceAllNoRegex function;
    buffer.Append(source.Substring(oldPos, pos));
    should be
    buffer.Append(source.Substring(oldPos, search.Length));

    Here is a patch for the C#:

    --- HtmlSanitizer.orig.cs	Thu Sep 23 12:17:54 2010
    +++	Thu Sep 23 12:18:03 2010
    @@ -302,7 +302,10 @@
                                 cleanToken = cleanToken + " " + attr + "=\"" + val + "\"";
    -                        cleanToken = cleanToken + ">";
    +                        if (selfClosed.Match(token).Success)
    +                            cleanToken = cleanToken + "/>";
    +                        else
    +                            cleanToken = cleanToken + ">";
                             isAcceptedToken = true;
    @@ -316,7 +319,7 @@
                             token = cleanToken;
                             // push the tag if require closure and it is accepted (otherwise is encoded) 
    -                        if (isAcceptedToken && !(standAloneTags.Match(tag).Success || selfClosed.Match(tag).Success))
    +                        if (isAcceptedToken && !(standAloneTags.Match(tag).Success || selfClosed.Match(token).Success))
                             // --------------------------------------------------------------------------------  UNKNOWN TAG 
    @@ -601,7 +604,7 @@
                     int oldPos, pos;
                     for (oldPos = 0, pos = source.IndexOf(search, oldPos); pos != -1; oldPos = pos + search.Length, pos = source.IndexOf(search, oldPos))
    -                    buffer.Append(source.Substring(oldPos, pos));
    +                    buffer.Append(source.Substring(oldPos, search.Length));
                     if (oldPos < source.Length)
    1. Correction: Bug #2 is ONLY in the C# code. The Substring function in C# takes a start position and a length, whereas the one in Java takes a start position and an end position.

    2. Another correction (sorry):
      buffer.Append(source.Substring(oldPos, search.Length)); is wrong, it should read

      buffer.Append(source.Substring(oldPos, pos - oldPos));

  3. A fantastic blog post. I just passed this on to a university student who was doing a little research on this, and he in fact bought me lunch because I found it for him. So let me reword that: thank you for the treat! But yeah, thanks for taking the time to talk about this; I feel strongly about it and enjoy reading more on this topic. If possible, as you gain expertise, would you mind updating your blog with more details? It is extremely helpful for me. Big thumbs up for this share!
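
Editor's sketch of the Substring correction discussed in comment 2: in C#, `Substring(start, length)` takes a length, so the text between two matches is `pos - oldPos` characters long. The class name `ReplaceSketch` and the method signature below are assumptions made for illustration; this is not the actual HtmlSanitizer source, only a minimal stand-in for its replaceAllNoRegex helper.

```csharp
using System;
using System.Text;

public static class ReplaceSketch
{
    // Hypothetical stand-in for the sanitizer's replaceAllNoRegex helper:
    // replaces every occurrence of `search` in `source` without using regex.
    public static string ReplaceAllNoRegex(string source, string search, string replace)
    {
        // An empty search string would never advance the loop, so return early.
        if (string.IsNullOrEmpty(search))
            return source;

        var buffer = new StringBuilder();
        int oldPos = 0;
        int pos;
        while ((pos = source.IndexOf(search, oldPos, StringComparison.Ordinal)) != -1)
        {
            // C# Substring takes (start, length): copy the text between matches.
            // In Java the equivalent call would be substring(oldPos, pos).
            buffer.Append(source.Substring(oldPos, pos - oldPos));
            buffer.Append(replace);
            oldPos = pos + search.Length;
        }

        // Copy whatever follows the last match.
        if (oldPos < source.Length)
            buffer.Append(source.Substring(oldPos));

        return buffer.ToString();
    }

    public static void Main()
    {
        // prints "a&lt;b&lt;c"
        Console.WriteLine(ReplaceAllNoRegex("a<b<c", "<", "&lt;"));
    }
}
```

With `Substring(oldPos, pos)` the second argument is read as a length, so the call copies too much text or throws once `oldPos + pos` runs past the end of the string. With `search.Length` it copies only a fixed-size slice instead of the whole gap between matches.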
