Monday, July 14, 2014

UNIX tools and special characters in filenames

UNIX tools are great, with their composability - find | grep | xargs | etc.

But UNIX tools have problems handling entities or objects, such as filenames, that have special characters such as blank spaces or newlines within them.

UNIX tools typically operate on lines (grep), or on words separated by whitespace (e.g. backtick expansion, or xargs, which by default splits its input on blanks and newlines before invoking other tools).

Some UNIX tools provide the option of using null separated strings, such as find -print0 or xargs -0.
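To make the difference concrete, here is a minimal Python sketch of consuming null-terminated names, such as the output of find -print0. The function name split_nul and the sample filenames are made up for illustration. It also shows how the same names get mangled once they are treated as lines.

```python
from typing import List


def split_nul(data: bytes) -> List[bytes]:
    """Split a byte stream of NUL-terminated records into a list of records."""
    if not data:
        return []
    parts = data.split(b"\0")
    # Records are terminated, not merely separated, by NUL, so a well-formed
    # stream leaves one empty piece after the final terminator. Drop it.
    if parts[-1] == b"":
        parts.pop()
    return parts


# Hypothetical filenames, including one with a space and one with a newline.
names = [b"plain.txt", b"with space.txt", b"with\nnewline.txt"]
stream = b"".join(name + b"\0" for name in names)

# NUL-terminated parsing recovers every name intact.
recovered = split_nul(stream)

# Line-based parsing of the same names splits the third one in two.
as_lines = stream.replace(b"\0", b"\n").splitlines()
```

Here recovered equals the original list, while as_lines has four entries instead of three, because the embedded newline is indistinguishable from a record separator.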

But as the Stack Overflow page shows, people want such flexibility in other tools, like grep. Of course, GNU grep has provided it (--null, and --null-data for NUL-terminated input), but there are probably other such tools. cat? But of course tr '\n' '\0' covers that. Still, the list continues. Mercurial? Git?

Moreover, null separated is by no means the last word. What if nulls are allowed in the strings that you are manipulating? Then you need either a quotation system, such as XML (and then we get into the issue of quotes upon quotes), or a strings-with-length system.

I have elsewhere talked about making all UNIX tools work with XML.  This is a generalization.

Strings-with-length is the most general, but possibly fragile. Possibly XML clauses wrapped around simple "obvious" quoting.
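As a rough illustration of strings-with-length, here is a small Python sketch using a netstring-style framing: decimal length, a colon, the raw bytes, then a comma. This is just one possible encoding, chosen for brevity, and the function names encode and decode are my own. The payload can contain nulls, newlines, or even text that looks like framing, since the decoder never scans the payload for delimiters.

```python
from typing import List


def encode(items: List[bytes]) -> bytes:
    """Frame each item as b'<length>:<payload>,' and concatenate the frames."""
    return b"".join(
        str(len(item)).encode("ascii") + b":" + item + b","
        for item in items
    )


def decode(data: bytes) -> List[bytes]:
    """Decode a stream produced by encode(); raise ValueError if malformed."""
    items = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)  # raises ValueError if no colon is left
        length = int(data[pos:colon].decode("ascii"))
        start = colon + 1
        end = start + length
        # The trailing comma is a sanity check that the length was right.
        if data[end:end + 1] != b",":
            raise ValueError("length prefix does not match payload")
        items.append(data[start:end])
        pos = end + 1
    return items


# Payloads with a NUL, a newline, nothing at all, and fake framing.
tricky = [b"a\0b", b"line1\nline2", b"", b"3:x,"]
round_tripped = decode(encode(tricky))
```

The fragility is visible too: a single wrong length, as in decode(b"5:abc,"), makes the decoder lose its place, and here it raises ValueError rather than resynchronizing.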

