Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, December 08, 2011

A Modest 1 bit Proposal about Quotification - making the Default Easy

Listening to an old "Security Now" podcast while doing my morning stretches.

Leo Laporte's TWIT website was hacked, and Steve Gibson, the Security Guy, says "Any time you are soliciting user input, there is a risk of malicious input somehow tricking the backend and executing that input, when it is meant to be, you know, benign [input data, like] user name and password.".

This is typical of the classic SQL injection hack, and, indeed, of any hack where the attacker is able to inject scripting code and fool the backend into executing it.  Typically by embedding quotes or the like in the input string.

(For that matter, Steve's description also applies to binary injection via buffer overflow.  But we won't go there; this page will talk only about non-buffer-overflow attacks, sijnce we have elsewhere described our agenda for preventing buffer overflow attacks.)

Say that you are talking user input like NAME, and are somehow using it to create an SQL or other language command, like "SELECT FIELDLIST FROM TABLE WHERE NAME = '$NAME'  ".   But now the attacker, instead of providing a nicely formed string like "John Doe", provides instead something like "anything' OR 'x' = 'x  ". (I added spaces between the single and double quotes for readability.) I.e. the user provides a string that contains quotes in the target language - not the language where the query string is composed, but a language further along.  So the query string becomes "SELECT FIELDLIST FROM TABLE WHERE NAME = 'anything' OR 'x' = 'x'  ". And now the query matches any row in the table.  (http://www.unixwiz.net/techtips/sql-injection.html provides examples, as does wikip[edia.).

The general solution to this is "quotification": take the user input, and either delete or quote anything that looks like a quote in the target language:. E.g. transform the attacker's string "anything' OR 'x' = 'x  " into either "anything OR x = x  " or "anything\' OR \'x\' = \'x  ".

The problem with deleting stuff from the user string is that sometimes the user is supposed to have quotelike things.  Consider names like "O'Toole".  Or consider prioviding, e.g. via cut and paste, Chinese unicode names in an application whose original programmer was English, but where the system is otherwise capable of displaying Chinese.  It is a pity if the barrier to internationalizaion is the "security" code scattered throughout your application that santizes user input. Worse, that is the sort of code that might get fixed by somebody who fixing internationalization problems who doesn't understand the security issues

The problem with quotifiying stuff is that it is hard.  It is not just a case, for you Perl afficionadoes, of doing s/'/\/g - what about input strings that already have \\' inside them?  And so on.

But the real problem, applicable to both deleting and quotification strategies, is that the code doing the user input sanitization does not necessarily know the syntax of all of the languages downstream.  It may know that there is SQL along the way - but it may not know that somebody has just added a special filter that looks for French quotes, << and >>.  Etc.  Not just special symbols: I have defined sublanguages where QuOtE and EnDqUoTe were the quotes.

The security code may know the syntax at the time the sanitization code was written.  But the downstream processing may have changed.  The syntax of the language may have been extended, in a new revision of the SQL or Perl or ... .  (I found a bug like that last year.)

The problem is that the user input santization code is trying to transform user input from strings that may be unsafe, to strings that are guaranteed to be safe forever and ever, no matter what revisions are made to the language, etc.   The problem is that the default for character strings is that ANY CHARCATER MAY BE PART OF A COMMAND unless specially quoted.

We need to change this default.  Here is my moldest proposal:

Let us define a character set whereby there is a special bit free in all characters.  And whereby, if that special bit is set, it is guaranteed by ANY REASONABLE LANGUAGE that no character with that special bit set will be part of any command or language syntax like a quote symbol.

We should strongly suggest, that the visual display for the characters with and without the special bit set is the same.  Or at least, the same in most situations - in others you may want to distinguish them, e.g., by shading.
.
If you are using something like BNF to describe your language, then it might be:

ORDINARY_CHARACTER ::== 'A' | 'B' |  ...

TAINTED_CHARACTER ::== 1'A' | 1'B' |  ...
POSSIBLY_TAINTED_CHARACTER ::= ORDINARY_CHARACTER | TAINTED_CHARACTER


where I am using the syntax 1'A' to describe a single character literal. with the special bit set.

STRING_LITERAL := QUOTED_STRING | TAINTED_STRING
TAINTED_STRING ::= TAINTED_CHARACTER+


QUOTED_STRING ::= " CHARACTER* "



(Actually, I am not sure whether a quoted string should be the abnove, or
    QUOTED_STRING ::= " POSSIBLY_TAINTED_CHARACTER* "
)


And we require that the only place where the possibly tainted characters with the tainted bit set are ONLY permitted in strings.  Nowhere else in the language.  Not in keywords, symbols, operators....



Then we just have to ensure that all of our input routines set the special bit. If you really need to form operators, the programmer can untaint the data expliocitly.  Btter to have to untaint explicitly in a few p[laces, than to have to quotify correctly in all places.



Perhaps better to make taintimg the default.  To flip the polarity of the special bit.  And to require that language syntax, keywords, etcv., be set only if the special bit is set.




This is just the well known taint or poison propagation strategy.  Exposed to programming language syntax definitions.

I have elsewhere espoused taking advantage of extensible XML syntax for programming languages.  This is similar, although orthogonal.