Krazy Glew's Blog: Quoting is the source of much evil (or the lack thereof)

Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, August 02, 2012

Quoting is the source of much evil (or the lack thereof)

Quotification, or the lack thereof, is the source of much evil. SQL injection type errors, etc.

I've written already about a modest proposal, to apply a simple taint bit to bytes saying "This is data, not language syntax."

I hesitate to expose my own bugs, but what the heck:

Today I just had a bug that looked like

bash: syntax error near unexpected token `('

terminating .bashrc too early, before all path setup was done, as I tried to make my UNIX and Cygwin environments converge.

Bug was insufficient quoting in

eval $pathvar=$elem:"$pathval"

in some code manipulating the PATH. Had problems with filenames containing special characters like space and ()s. Such filenames are relatively uncommon on *IX, but common on Windows, and Cygwin is both.

Fixed by adding quotes

eval $pathvar="'$elem'":"'$pathval'"
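To make the difference concrete, here is a small reproduction sketch. The directory name and the variable name MYPATH are made-up examples, and each eval runs in a subshell so that a syntax error cannot abort the calling script.

```bash
#!/usr/bin/env bash
pathvar=MYPATH                                 # hypothetical variable name
elem='/cygdrive/c/Program Files (x86)/Foo'     # hypothetical Windows-style directory
pathval='/usr/bin'

# Under-quoted form: eval re-parses the space and the ()s as shell syntax.
if ( eval $pathvar=$elem:"$pathval" ) 2>/dev/null; then
    unquoted=ok
else
    unquoted=failed
fi

# Quoted form: eval sees one assignment with single-quoted data.
if ( eval $pathvar="'$elem'":"'$pathval'" && [ "$MYPATH" = "$elem:$pathval" ] ); then
    quoted=ok
else
    quoted=failed
fi

echo "unquoted: $unquoted, quoted: $quoted"
```

With this sample directory the first form fails and the second succeeds.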

At least I hope it's fixed. What if pathvar has a special character? Nah...
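One way to sidestep that worry is to avoid eval altogether. The following is a minimal sketch, not the code from the .bashrc above: the helper name prepend_path is hypothetical, and it assumes a bash recent enough to have printf -v and ${!name} indirect expansion. It also covers a case the single-quoted eval does not, namely an element that itself contains a single quote.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend element $2 to the path-like variable
# whose NAME is passed in $1.
prepend_path() {
    local pathvar=$1 elem=$2
    # Accept only a plain shell identifier, so the name itself can
    # never smuggle in syntax.
    if [[ ! $pathvar =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        echo "prepend_path: bad variable name: $pathvar" >&2
        return 1
    fi
    # Indirect expansion: the current value of the named variable.
    local pathval=${!pathvar}
    # printf -v assigns without re-parsing the value, so spaces, ()s
    # and quotes stay plain data. The ${pathval:+...} form avoids a
    # trailing colon when the variable was empty.
    printf -v "$pathvar" '%s' "$elem${pathval:+:$pathval}"
}

MYPATH='/usr/bin'
prepend_path MYPATH "/cygdrive/c/Program Files (x86)/Bob's Tools"
echo "$MYPATH"
```

The trade-off is that the variable name is validated rather than quoted: anything that is not an identifier is rejected up front.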

Part of the trouble with quotification is that you have to get it just right. Too many quotes are bad, just like too few.

I might like to do

eval SAFE(SAFE(SAFE($pathvar)))=SAFE(SAFE(SAFE($elem))):SAFE(SAFE(SAFE($pathval)))

which might really be considered

eval $pathvar=$elem:$pathval

where I have used green text background (on the three variable expansions) to indicate stuff that should have no syntax when eval'ed, and red (on everything else) to indicate syntax.

This works okay if the eval is immediate. But if the results of eval are themselves evalled an unknown number of times, a simple bit is not sufficient. Nesting...
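The nesting problem can be seen with bash's own printf %q, which adds one layer of shell quoting per application. A rough sketch, with quote_n as a hypothetical helper name: each eval strips exactly one layer, so the data has to be quoted once per eval it will pass through.

```bash
#!/usr/bin/env bash
# Hypothetical helper: quote $1 so that it survives $2 rounds of eval.
quote_n() {
    local s=$1 n=$2 i
    for ((i = 0; i < n; i++)); do
        printf -v s '%q' "$s"    # each pass adds one layer of quoting
    done
    printf '%s' "$s"
}

elem='C:/Program Files (x86)/Foo'
twice=$(quote_n "$elem" 2)

# First eval strips one layer; what is left is still quoted text.
eval "once=$twice"
# Second eval strips the last layer and yields the original data.
eval "result=$once"

echo "$result"
```

This only works because the number of evals (two) is known in advance. With an unknown number of evals there is no right value for n, which is where a tag that survives evaluation would help.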

---

khb commented:

So to be clear, you suggest that we have (for want of a better term) a bit to indicate that something should be treated as data? Was that the sort of thing the near legendary Intel 432 did?

I'm not saying it's a bad idea, I'm just trying to map it into historical perspective ;>

Responding inline, because one of Blogger's weaknesses is no formatting in comments. (Probably deliberate, to prevent complex conversations.)

I do want metadata associated with data

This might be a tag bit per memory location (but it doesn't have to be)

Actually, I can't remember if the iAPX 432 had such tag bits. It may not have needed them, because it treated everything as object oriented.
One version of the i960, and the IBM AS/400, had such tag bits. However, they used the tag bit for privilege - what they did would not have helped directly for SQL injection type attacks.

But it doesn't need to be a tag bit. Much discussion on

It doesn't even need to be hardware. Although I may sometimes discuss HW/ucode implementations, this particular issue (SQL injection and other eval-style attacks on non-machine code) can be addressed completely in software.

E.g. have the language parser just operate on wide bytes, 16 bit or wider. Wider than you use for normal language input.
Reserve the upper bits for these "tags".
Have the compiler refuse to accept a semicolon-with-non-syntactic bit attached as language syntax. Similarly for keywords.
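A toy version of this can be sketched in bash. Everything here is an assumption made for illustration: instead of upper bits in wide characters, the tag is kept in a parallel MASK string of the same length as TEXT ('S' for "may be syntax", 'D' for "data only"), and the "parser" is just a statement splitter. The SQL text is only sample input and is never sent to a database.

```bash
#!/usr/bin/env bash
# Append $1 to the tagged buffer, tagging every character with $2.
tag_append() {
    local piece=$1 tag=$2 i
    TEXT+=$piece
    for ((i = 0; i < ${#piece}; i++)); do MASK+=$tag; done
}

# Split TEXT into STATEMENTS, honoring ';' only where MASK says 'S'.
split_statements() {
    local i ch cur=
    STATEMENTS=()
    for ((i = 0; i < ${#TEXT}; i++)); do
        ch=${TEXT:i:1}
        if [[ $ch == ';' && ${MASK:i:1} == S ]]; then
            STATEMENTS+=("$cur"); cur=
        else
            cur+=$ch    # a data-tagged ';' is just another character
        fi
    done
    [[ -n $cur ]] && STATEMENTS+=("$cur")
    return 0
}

TEXT= MASK=
tag_append "SELECT * FROM users WHERE name = '" S
tag_append "x'; DROP TABLE users; --" D    # hostile input, tagged as data
tag_append "';" S
split_statements
echo "statements: ${#STATEMENTS[@]}"
```

Because the semicolons inside the hostile input carry the data tag, the splitter sees one statement rather than three. A fuller parser would treat data-tagged quote characters and keywords the same way.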

Yes, it's tags. But not necessarily HW tags like some old AI machines have. SW tags are good enough. And, in this case, only in strings that you are going to give to a language parser, compiler, evaluator. Basically, use wchar, wide char, 16 bit or wider characters, where some of the upper bits are reserved for tags.

Heck: I edit in GNU Emacs, where every character in the buffer can have fairly arbitrary attributes - like 'keyword, 'variable-name, etc. Why not make use of such attributes in the character set that the language parser accepts as input? Especially if it can reduce security bugs.

Note: we might not yet be ready to jump to semantically enforced tags - where keywords like IF THEN ELSE would be *required* to have the 'keyword tag on them. We probably still need to be able to accept untagged ASCII characters as input. But, we might be ready for semantic hints - tagging something 'definitely-not-a-keyword. If the parser tries to take a character tagged 'definitely-not-a-keyword as part of a keyword, that could be an error. (As opposed to making it part of an identifier.)
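The hint semantics can be sketched as a word classifier. This is an illustration under assumed conventions, not a real parser: classify_word is a hypothetical name, the keyword set is just IF, THEN and ELSE, and each word comes with a same-length mask where 'U' means untagged input and 'N' means 'definitely-not-a-keyword.

```bash
#!/usr/bin/env bash
# Classify one word given its text ($1) and per-character mask ($2).
classify_word() {
    local word=$1 mask=$2
    case $word in
        IF|THEN|ELSE)
            if [[ $mask == *N* ]]; then
                echo error        # keyword spelling built from tagged data
            else
                echo keyword      # untagged ASCII is still accepted
            fi ;;
        *)
            echo identifier ;;
    esac
}

classify_word IF   UU      # prints: keyword
classify_word IF   NN      # prints: error
classify_word IFFY NNNN    # prints: identifier
```

Untagged input keeps working as before; only the combination "spelled like a keyword" plus "tagged as not a keyword" is rejected.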

2 comments:

khb said... (8/02/2012 11:24 a.m.): So to be clear, you suggest that we have (for want of a better term) a bit to indicate that something should be treated as data? Was that the sort of thing the near legendary Intel 432 did?

I'm not saying it's a bad idea, I'm just trying to map it into historical perspective ;>

Andy "Krazy" Glew said... (8/02/2012 12:53 p.m.): I am going to respond in the blog entry, because there is insufficient formatting in comments.
