The content of this blog is my personal opinion only. Although I am an employee - currently of Imagination Technologies's MIPS group, in the past of other companies such as Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, August 30, 2012

The HTML Mediawiki passes through


I've ranted before about quotification.

The link describes the HTML that Mediawiki allows in wiki pages - i.e. the stuff it passes through.

Now, Mediawiki has to have a whitelist, because it cannot allow arbitrary HTML constructs that a maluser might use, e.g. to inject malware onto wiki pages.  E.g. all, or almost all, user-provided JavaScript must be disallowed. Even basic formatting may have to be disallowed - e.g. an attacker may be able to use CSS styles creatively to render much of the text on a page invisible, thereby creating a phishing page from what remains. Plus the usual issues with tracking links, etc.

More basically, Mediawiki must not allow arbitrary user text. In particular, it must not allow text that would interfere with the HTML that Mediawiki is itself producing.

Now, that last point is what my "modest proposal for quotification" specifically attacks: basically, add a tag bit to every character that the user inserts, so that it can be distinguished from the HTML that Mediawiki itself adds.

But my "modest proposal" would disallow any user added HTML.  Except for the user added HTML that Mediawiki specifically whitelists, finds, filters, and explicitly removes the tag bit from. And, of course, any bugs in that procedure could lead to security holes...
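A minimal sketch of the tag-bit idea, in Python. The names `Span` and `render` are made up for illustration, and I keep one tag bit per run of text rather than per character, which amounts to the same thing. Nothing here is Mediawiki's actual code.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Span:
    text: str
    user: bool  # the "tag bit": True if this text came from the user


def render(spans):
    """Emit trusted spans verbatim; escape anything still carrying the tag bit."""
    return "".join(escape(s.text) if s.user else s.text for s in spans)


page = [
    Span("<p>", user=False),                          # added by the wiki engine
    Span("Hi <script>alert(1)</script>", user=True),  # typed by the user
    Span("</p>", user=False),
]
print(render(page))
```

Whitelisting then means deliberately clearing the tag bit on the few user spans that pass the filter, which is exactly where the bugs would live.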

At least HTML, XML, etc. is easy enough to parse that Mediawiki can filter out anything that isn't on the whitelist.  It does not have to worry TOO MUCH that new syntax will be added that it does not know about.  That's the joy about HTML/XML:  the syntax is (relatively) stable. Extensions can be added by adding new elements and attributes, but existing parsers can recognize such additions, and decide to pass them through or filter them out.
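To make the "parse, then filter against a whitelist" point concrete, here is a rough sketch using Python's standard `html.parser`. The `ALLOWED_TAGS` set, the `WhitelistFilter` class, and `filter_html` are my own hypothetical names, and the policy (four tags, no attributes at all) is far stricter than what Mediawiki really allows.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: a few formatting tags, and no attributes whatsoever.
ALLOWED_TAGS = {"b", "i", "em", "strong"}


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Unknown elements are recognized by the parser and simply dropped;
        # attributes (onclick, style, ...) are dropped even on allowed tags.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped, so it cannot turn back into markup.
        self.out.append(escape(data))


def filter_html(text):
    f = WhitelistFilter()
    f.feed(text)
    f.close()
    return "".join(f.out)
```

With this policy, `filter_html('<b onclick="x()">hi</b><script>alert(1)</script>')` keeps the bold text, loses the `onclick`, and loses the `script` element; the script's body survives only as inert text.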

But it would be nice if Mediawiki (or my own tools) could pass through much, much more HTML.  If, instead of having to whitelist specific constructs, they could say "Evaluate all HTML that does not constitute a security risk".  And especially JavaScript.


  • do not evaluate any HTML or JavaScript that opens client files (without special permission)
  • do not evaluate any HTML or JavaScript that renders existing page elements invisible (except for specially marked stuff).  I.e. no switching to white on white or zero size.
The above is a blacklist.  Or a whitelist:
  • only evaluate HTML or JavaScript or CSS that changes colors in a limited way, or fonts in a limited way, or text size in a limited way...
and so on.

I.e. it would be nice to be able to EVALUATE arbitrary HTML and JavaScript, in a sandbox whose capabilities are explicitly circumscribed.
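Roughly what I have in mind, as a toy in Python. This is NOT a real sandbox: `POLICY` and `perform` are hypothetical names, the limits are arbitrary, and it only shows the shape of the idea, namely that untrusted code asks for effects by name and the host checks each request against an explicit capability list.

```python
# Hypothetical policy: each capability maps to a check on its arguments.
# Anything not listed here is denied outright.
POLICY = {
    "set_font_size": lambda px: 8 <= px <= 48,  # text size "in a limited way"
    "set_color": lambda fg, bg: fg != bg,       # no white on white
}


def perform(action, **args):
    """Grant an effect only if it is whitelisted and within its limits."""
    check = POLICY.get(action)
    if check is None:
        raise PermissionError(f"no such capability: {action}")
    if not check(**args):
        raise PermissionError(f"capability limit exceeded: {action} {args}")
    return (action, args)  # a real host would apply the effect here


perform("set_font_size", px=12)                    # allowed
# perform("set_font_size", px=0)                   # raises PermissionError
# perform("open_file", path="notes.txt")           # raises PermissionError
```

The point is that the check looks at the requested effect, not at the text of whatever script asked for it.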

Whitelist the capabilities. 

Not necessarily the text.

Whitelist what the code does.  Not the input.