Saturday, January 14, 2012


I'd like a text parser, like the Perl CPAN module Text::ParseWords,
that *only* breaks the text into words,
but does not transform the words, handle escape characters, etc.

For example,
   Text::ParseWords::
      shellwords("a b 'c d' e")
returns
   a
   b
   c d
   e
i.e. it breaks the text up into words,
but it also transforms them: the quotes around 'c d' are removed.
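For comparison, Python's standard `shlex.split` behaves the same way as `shellwords` on this input. This is a swapped-in Python illustration, not part of Text::ParseWords:

```python
import shlex

# shlex.split, like Perl's shellwords, both splits and transforms:
# the quotes around 'c d' are stripped from the resulting word.
words = shlex.split("a b 'c d' e")
print(words)  # ['a', 'b', 'c d', 'e']
```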

I would like to separate the splitting from the transformation, so that splitting alone returns:
   a
   b
   'c d'
   e
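A minimal sketch of such a split-only parser in Python might look like this. The name `split_words` is hypothetical, and the sketch assumes that words are separated by whitespace, that '...' and "..." sections may contain whitespace, and that backslashes get no special treatment (so a quote character cannot be escaped):

```python
def split_words(text: str) -> list[str]:
    """Split text into words, keeping quotes and backslashes exactly as written.

    Hypothetical helper. Assumptions: whitespace separates words, a '...' or
    "..." section may contain whitespace, and backslashes are NOT escapes.
    """
    words = []
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1  # skip whitespace between words
            continue
        start = i
        # A word runs until the next whitespace outside of a quoted section.
        while i < n and not text[i].isspace():
            if text[i] in "'\"":
                # Jump past the matching closing quote of the same kind.
                close = text.find(text[i], i + 1)
                if close == -1:
                    raise ValueError(f"unterminated quote at position {i}")
                i = close + 1
            else:
                i += 1
        words.append(text[start:i])  # the raw slice, untransformed
    return words
```

With this sketch, `split_words("a b 'c d' e")` returns `['a', 'b', "'c d'", 'e']`: the word boundaries are the same as with `shellwords`, but each word is the original text verbatim, so any transformation can be applied as a separate step.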

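Note that if you ever encounter such a list whose words can themselves be broken up further,
then you know that it has been processed by some tool after your original parser:
a split-only parser keeps 'c d' as a single word, whereas the transformed c d splits again into c and d.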
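That observation can be turned into a rough check. The sketch below is a heuristic with a hypothetical name, `was_transformed`: it re-splits each word with Python's `shlex.split` and reports a transformation if any word no longer comes back as exactly one word. Because `shlex` applies POSIX shell rules, including backslash escapes, it can misfire on words that contain backslashes:

```python
import shlex


def was_transformed(words: list[str]) -> bool:
    """Heuristic: True if some word does not re-split into exactly one word."""
    for word in words:
        try:
            if len(shlex.split(word)) != 1:
                return True  # e.g. "c d" splits again into "c" and "d"
        except ValueError:
            # An unbalanced quote means the quotes were already stripped
            # or altered, which a split-only parser would not do.
            return True
    return False
```

For example, `was_transformed(['a', 'b', 'c d', 'e'])` is `True`, while `was_transformed(['a', 'b', "'c d'", 'e'])` is `False`.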

[[Category:Programming]] [[Category:Text]]