Quotification, or the lack thereof, is the source of much evil: SQL-injection-type errors, etc.
I've written already about a modest proposal, to apply a simple taint bit to bytes saying "This is data, not language syntax."
I hesitate to expose my own bugs, but what the heck:
Today I just had a bug that looked like
bash: syntax error near unexpected token `('
terminating .bashrc too early, before all path setup was done, as I tried to make my UNIX and Cygwin environments converge.
Bug was insufficient quoting in
eval $pathvar=$elem:"$pathval"
in some code manipulating the PATH. I had problems with filenames containing special characters like spaces and parentheses. Such filenames are relatively uncommon on *IX, but common on Windows, and Cygwin is both.
Fixed by adding quotes
eval $pathvar="'$elem'":"'$pathval'"
At least I hope it's fixed. What if pathvar has a special character? Nah...
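One way to sidestep the quote counting, sketched under assumptions: hand eval the variable references rather than their values, and check the variable name before using it. The function name prepend_path and the name check are made up for illustration; this is not the code from the .bashrc above. Note that the single-quoted version would still break on a filename that itself contains a single quote, which this form avoids.

```shell
# Hypothetical helper: prepend ELEM to the path-like variable named NAME.
# Usage: prepend_path NAME ELEM
prepend_path() {
    pathvar=$1
    elem=$2
    # Reject anything that is not a plain shell identifier, so the name
    # itself cannot smuggle syntax into eval.
    case $pathvar in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
            echo "prepend_path: bad variable name: $pathvar" >&2
            return 1 ;;
    esac
    # The escaped dollars mean eval sees the text  NAME=$elem:$NAME  and
    # expands the values itself, so their contents are never re-parsed.
    eval "$pathvar=\$elem:\$$pathvar"
}

MYPATH=/usr/bin
prepend_path MYPATH "/cygdrive/c/Program Files (x86)/tools"
printf '%s\n' "$MYPATH"
```

The only text eval ever parses here is the checked variable name plus fixed syntax, so spaces, parentheses and quotes in the values stay data.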
Part of the trouble with quotification is that you have to get it just right. Too many quotes are bad, just like too few.
I might like to do
eval SAFE(SAFE(SAFE($pathvar)))=SAFE(SAFE(SAFE($elem))):SAFE(SAFE(SAFE($pathval)))
which might really be considered
eval $pathvar=$elem:$pathval
where I have used green text background to indicate stuff that should have no syntax when eval'ed, and red to indicate syntax.
This works okay if the eval is immediate. But if the results of eval are themselves eval'ed an unknown number of times, a simple bit is not sufficient. Nesting...
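The nesting problem shows up even with plain quoting: each eval strips one layer of quotes, so the data has to be wrapped once per eval it will pass through, and you have to know that count in advance. A minimal sketch, where quote_once is a hypothetical helper and the sample value is made up:

```shell
# Hypothetical helper: wrap a string in single quotes so that exactly ONE
# round of shell parsing (one eval) gives the original string back.
# Embedded single quotes are rewritten as '\''.
quote_once() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

data='C:/Program Files (x86); echo oops'
once=$(quote_once "$data")     # enough protection for one eval
twice=$(quote_once "$once")    # enough protection for two evals

eval "r1=$once"                # one layer of quotes, one eval
eval "eval r2=$twice"          # two layers of quotes, two evals

[ "$r1" = "$data" ] && [ "$r2" = "$data" ] && echo "both round-trip"
```

Passing the once-quoted value through two evals would expose the parentheses and the semicolon to the second parse. A single "this is data" bit has the same weakness unless it survives every round of evaluation.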
---
khb commented:
So to be clear, you suggest that we have (for want of a better term) a bit to indicate that something should be treated as data? Was that the sort of thing the near legendary Intel 432 did?
I'm not saying it's a bad idea, I'm just trying to map it into historical perspective ;>
Responding inline, because one of blogger's weaknesses is no formatting in comments. (Probably deliberate, to prevent complex conversations.)
- I do want metadata associated with data
- This might be a tag bit per memory location (but it doesn't have to be)
- actually, I can't remember if the iAPX 432 had such tag bits. It may not have needed them, because it treated everything as object oriented
- one version of the i960, and the IBM AS/400, had such tag bits. However, they used the tag bit for privilege - what they did would not have helped directly for SQL injection type attacks.
- But it doesn't need to be a tag bit. Much discussion on
- It doesn't even need to be hardware. Although I may sometimes discuss HW/ucode implementations, this particular issue (SQL injection and other non-machine-code eval attacks) can be addressed completely in software.
- E.g. have the language parser just operate on wide bytes, 16 bit or wider. Wider than you use for normal language input.
- Reserve the upper bits for these "tags".
- Have the compiler refuse to accept a semicolon-with-non-syntactic bit attached as language syntax. Similarly for keywords.
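As a rough software-only sketch of the bullets above: instead of wide characters with tag bits in the upper bits, this toy keeps a second string with one tag letter per character (s = may be syntax, d = data only), which is simply easier to show in shell. The names mark and split_statements, the tag letters, and the sample strings are all assumptions for illustration. The splitter treats a ';' as a statement separator only when its tag allows it.

```shell
# mark TAG STRING: print TAG once per character of STRING (ASCII assumed).
mark() {
    printf '%s' "$2" | sed "s/./$1/g"
}

# split_statements TEXT TAGS: print one statement per line, splitting only on
# a ';' whose tag is s. A ';' tagged d stays inside the statement as data.
split_statements() {
    text=$1 tags=$2 stmt=
    [ "${#text}" -eq "${#tags}" ] || { echo "length mismatch" >&2; return 1; }
    while [ -n "$text" ]; do
        c=${text%"${text#?}"}      # first character
        t=${tags%"${tags#?}"}      # its tag
        text=${text#?} tags=${tags#?}
        if [ "$c" = ';' ] && [ "$t" = s ]; then
            printf '%s\n' "$stmt"
            stmt=
        else
            stmt=$stmt$c
        fi
    done
    [ -z "$stmt" ] || printf '%s\n' "$stmt"
}

code_a='SELECT '
user='a;b'              # untrusted input containing a semicolon
code_b=';COMMIT;'
split_statements "$code_a$user$code_b" \
    "$(mark s "$code_a")$(mark d "$user")$(mark s "$code_b")"
# prints two lines: "SELECT a;b" and "COMMIT"
```

The semicolon that came in with the user data never ends a statement, with no quoting or escaping involved.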
Yes, it's tags. But not necessarily HW tags like some old AI machines have. SW tags are good enough. And, in this case, only in strings that you are going to give to a language parser, compiler, evaluator. Basically, use wchar, wide char, 16 bit or wider characters, where some of the
Heck: I edit in GNU Emacs, where every character in the buffer can have fairly arbitrary attributes - like 'keyword, 'variable-name, etc. Why not make use of such attributes in the character set that the language parser accepts as input? Especially if it can reduce security bugs.
Note: we might not yet be ready to jump to semantically enforced tags - where keywords like IF THEN ELSE would be *required* to have the 'keyword tag on them. We probably still need to be able to accept untagged ASCII characters as input. But, we might be ready for semantic hints - tagging something 'definitely-not-a-keyword. If the parser tries to take a character tagged 'definitely-not-a-keyword as part of a keyword, that could be an error. (As opposed to making it part of an identifier.)
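The hint version could look roughly like this. It is a toy with made-up names: tags holds one letter per character of the token, u for untagged and n for 'definitely-not-a-keyword, and the keyword list is just IF THEN ELSE from above.

```shell
# classify_token TOKEN TAGS: print keyword, identifier, or error.
# Hypothetical tag letters: u = untagged, n = 'definitely-not-a-keyword.
classify_token() {
    tok=$1 tags=$2
    case $tok in
        IF|THEN|ELSE)
            case $tags in
                *n*) echo error ;;      # keyword spelled with hinted characters
                *)   echo keyword ;;    # untagged input is still accepted
            esac ;;
        *) echo identifier ;;
    esac
}

classify_token IF uu       # prints: keyword
classify_token IF nn       # prints: error
classify_token total nnnnn # prints: identifier
```

Untagged ASCII keeps working as it does today; only input that was explicitly hinted gets rejected when it lands in a keyword position.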