Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Imagination Technologies's MIPS group, in the past of other companies such as Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.


Monday, July 14, 2014

UNIX tools and special characters in filenames

See, for example: bash - Is there a grep equivalent for find's -print0 and xargs's -0 switches? - Stack Overflow








UNIX tools are great, with their composability - find | grep | xargs | etc.



But UNIX tools have problems handling entities or objects, such as filenames, that have special characters such as blank spaces or newlines within them.



UNIX tools typically operate on lines (grep, xargs' input), or on words separated by whitespace (e.g. backtick expansion, xargs' invocation of other tools).
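To make the failure concrete, here is a minimal Python sketch of what line-oriented and word-oriented splitting do to a list of names. The filenames are hypothetical, picked only because they contain a blank and a newline; the splitting calls stand in for what the tools do and are not their actual implementation.

```python
# Hypothetical filenames, chosen to include a blank and a newline.
names = ["plain.txt", "with space.txt", "with\nnewline.txt"]

# What a line-oriented tool receives: one name per line.
as_lines = "\n".join(names) + "\n"
line_records = as_lines.splitlines()

# What whitespace word-splitting (as in backtick expansion) produces.
word_records = as_lines.split()

# Three names went in; neither view gives three names back.
print(len(names), len(line_records), len(word_records))
```

The name containing a newline is cut in two by the line view, and the word view additionally cuts the name containing a blank.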



Some UNIX tools provide the option of using null-separated strings, such as find -print0 or xargs -0.
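As a rough illustration of why the null separator helps, the sketch below writes and reads a null-terminated stream of names in Python. The helper names encode_nul and decode_nul are made up for this example; the stream layout (each name followed by one null byte) is the one find -print0 emits and xargs -0 consumes.

```python
import os

def encode_nul(names):
    """Join names into a null-terminated byte stream."""
    return b"".join(os.fsencode(n) + b"\0" for n in names)

def decode_nul(stream):
    """Split a null-terminated byte stream back into names."""
    parts = stream.split(b"\0")
    # Every name ends with a terminator, so the last piece is empty.
    if parts and parts[-1] == b"":
        parts.pop()
    return [os.fsdecode(p) for p in parts]

# Hypothetical filenames containing a blank and a newline.
names = ["plain.txt", "with space.txt", "with\nnewline.txt"]
stream = encode_nul(names)
print(decode_nul(stream) == names)
```

Blanks and newlines survive the round trip because a null byte cannot occur inside a UNIX filename, so it can never be confused with part of a name.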



But as the Stack Overflow page shows, people want such flexibility in other tools, like grep. Of course, GNU grep already provides it (--null, or -Z, to end output file names with a null; --null-data, or -z, to read null-separated input), but there are probably other tools that do not. cat? But of course there is tr '\n' '\0'. Still, the list continues. Mercurial? Git?



Moreover, null-separated is by no means the last word. What if nulls are allowed in the strings that you are manipulating? Then you need either a quotation system, such as XML (and then we get into the issue of quotes upon quotes), or a strings-with-length system.



I have elsewhere talked about making all UNIX tools work with XML.  This is a generalization.



Strings-with-length is most general.  Possibly fragile.  Possibly XML clauses wrapped around simple "obvious" quoting.
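One way to picture strings-with-length is a netstring-style framing: a decimal length, a colon, the raw bytes, then a comma. The Python sketch below is an assumption about what such a system could look like, not a format that any of the tools above accept; encode_record and decode_records are hypothetical names.

```python
def encode_record(payload: bytes) -> bytes:
    """Frame one payload as <decimal length>:<bytes>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def decode_records(stream: bytes) -> list:
    """Parse a concatenation of framed records back into payloads."""
    records = []
    pos = 0
    while pos < len(stream):
        colon = stream.index(b":", pos)   # raises ValueError if missing
        length = int(stream[pos:colon])
        start = colon + 1
        end = start + length
        # The trailing comma is a cheap check that the length was right.
        if stream[end:end + 1] != b",":
            raise ValueError("malformed record")
        records.append(stream[start:end])
        pos = end + 1
    return records

# Payloads may contain nulls, newlines, or even the framing characters.
payloads = [b"a\0b", b"new\nline", b"", b"1:x,"]
stream = b"".join(encode_record(p) for p in payloads)
print(decode_records(stream) == payloads)
```

No byte needs quoting, which is the generality. The fragility shows up as soon as a length is wrong: a stream such as b"5:abc," is rejected outright instead of being partially recovered.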





