The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Monday, February 06, 2012


<indented-block  left-margin="|">
     |      is

< tags with words breaking XML rules >

<tag some attributes>

<test expected-time="10min" start="">

</test elapsed-time=5min>

<tags that occupy a line by themselves are fairly clear/>

and, if so restricted, allowing < and > inside text is simple.

<tag>on same line</tag> and <more/>
not so bad.  But the more that is allowed, the more possibilities for confusion.
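
Here is a minimal sketch of the "tags on a line by themselves" restriction, in Python. The helper name split_line_tags and the exact rule (a line is a tag only if it starts with <, ends with >, and has no other < or > in between) are my assumptions for illustration, not a defined format. Anything else, including a line with tags mixed into text, is treated as plain text, so < and > need no escaping there.

```python
import re

# Assumption: a tag line starts with "<", ends with ">", and contains no
# other "<" or ">". Words after the tag name are kept loose on purpose.
TAG_LINE = re.compile(r"^<(/?)\s*([^<>\s/]+)([^<>]*?)(/?)>$")

def split_line_tags(text):
    """Classify each non-blank line as (kind, name) or ('text', line)."""
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = TAG_LINE.match(line)
        if m is None:
            items.append(("text", line))        # "<" and ">" are just characters here
        elif m.group(1):
            items.append(("close", m.group(2)))
        elif m.group(4):
            items.append(("empty", m.group(2)))
        else:
            items.append(("open", m.group(2)))
    return items

sample = """<test expected-time="10min">
if a < b and c > d then stop
</test elapsed-time=5min>
<more/>"""
for item in split_line_tags(sample):
    print(item)
```

Note that this sketch deliberately treats <tag>on same line</tag> as text; allowing that form is exactly where the possibilities for confusion start.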

Serialization Formats

There are so many: http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats And yet, I want some more, with my pseudo-XML (or pseudo-JSON, or whatever)
    ASN.1 (in its XML encoding, XER): an XML subset that defines how values are represented (as text, not attributes).
    Netstring: length:strings,
    JSON: hashes and arrays; keys are strings, just like values. 
    OGDL: trees, indentation based, with commas and parentheses for when you don't want new lines. #comments and references. Guaranteed round tripping, except for comments.
    Property lists; NeXT to Apple. 
    YAML: outline indentation, name:, little quoting, {hash: value}, [arrays, ...]; &id anchors and *id references, name: value, name: !!type value, !!binary 
    XML: <tag>barewords</tag>, <tag/>, attribute metadata. Basically simple, and then a whole lot of crap descended from SGML. Schemas and DTDs :-( Namespaces, references, XQuery, XSLT. An occasionally uncomfortable distinction between metadata (attributes) and data (text).
Why can't we just get along? Why can't they all interconvert?
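
For the simple cases, interconversion really is only a few lines. Here is a rough sketch in Python that turns JSON hashes and arrays into XML elements, putting every value in text rather than attributes. The function name dict_to_element and the element names "root" and "item" are made up for this example, and it assumes every key is a legal XML element name.

```python
import json
import xml.etree.ElementTree as ET

def dict_to_element(name, value):
    """Build an XML element from a JSON-style value (dict, list, or scalar)."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_element(key, child))
    elif isinstance(value, list):
        # JSON arrays have no names, so a hypothetical "item" tag stands in.
        for child in value:
            elem.append(dict_to_element("item", child))
    else:
        elem.text = str(value)
    return elem

data = json.loads('{"n1": {"nA": "vA", "nB": "vB"}}')
root = dict_to_element("root", data)
print(ET.tostring(root, encoding="unicode"))
# <root><n1><nA>vA</nA><nB>vB</nB></n1></root>
```

The hard part is everything this skips: attributes versus text, types, comments, and references do not map one-to-one between the formats.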

Why I dislike XML

I want to be able to write <matching-bracket>
text, maybe some math, like a <b and c> d </matching-bracket>
without having to write ugly \&\l\t\; Heck, this WYSIWYG blog editor is an example - I can't figure out how to type the escape code for & lt ;

Why I like XML

In XML, or pseudo XML, you can see the matching brackets

<matching bracket> stuff
</matching >

whereas in simply parenthesized notations like JSON

{n1: {n2: n3: "do you have enough matching brackets?" }}}


You can usefully define semantics for mismatching constructs. XML also allows unquoted text, because of the very verbosity of its punctuation.
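
As one example of such semantics (my assumption here, not anything standard): a named closing tag can implicitly close whatever is still open inside it, and report that it did so. A small Python sketch, with check_nesting as a hypothetical helper working on (kind, name) events:

```python
def check_nesting(events):
    """events: list of ('open' | 'close', name). Returns a list of diagnostics."""
    stack, problems = [], []
    for kind, name in events:
        if kind == "open":
            stack.append(name)
        elif name in stack:
            # Named close: pop back to the matching open, reporting what was skipped.
            while stack[-1] != name:
                problems.append(f"<{stack.pop()}> implicitly closed by </{name}>")
            stack.pop()
        else:
            problems.append(f"</{name}> has no matching open tag")
    # Anything still open at the end was never closed.
    problems.extend(f"<{name}> never closed" for name in reversed(stack))
    return problems

events = [("open", "a"), ("open", "b"), ("close", "a"), ("close", "c"), ("open", "d")]
for problem in check_nesting(events):
    print(problem)
```

With anonymous } brackets there is nothing to recover from: a stray } tells you that something is wrong, but not which construct it was meant to close.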

Using = and : binding operators to delineate structure

I have messed around with "pseudo-XML" a lot. Oh, heck, who cares about the XML part, except that it is a standard?  Structured, simple, data.  Trying to make stuff human readable, but also parseable.

Here's a new-to-me trick:

I often use key=value pairs.

Sometimes I use name: value.

I.e. both = and : are nice ways of binding name and value.

How about:
n1: nA=vA, nB=vB, nC=vC
=>  <n1> nA=vA, nB=vB, nC=vC </n1>
or whatever way you want to express name/value in XML.

I.e. : as a binding operator that can bind to a list of name=value pairs

Or non-XML, if that's what tickles your fancy.  JSON?:
{ "n1": {"nA": "vA", "nB": "vB", "nC": "vC" } }
(BTW, if I ever declare that I am doing pseudo-JSON, it will be to have fewer quotes.)

{ n1: {nA: vA, nB: vB, nC: vC } }
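
Falling back from this fewer-quotes form to real JSON can be sketched in a few lines of Python. This is only an illustration under my own assumptions: quote_barewords is a made-up name, a bareword may not contain whitespace or any of { } [ ] : , " and the words true, false, null and JSON numbers are left unquoted.

```python
import json
import re

# Either an existing double-quoted string (kept as is) or a bareword.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[^\s{}\[\]:,"]+')
NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
JSON_LITERALS = {"true", "false", "null"}

def quote_barewords(text):
    """Turn pseudo-JSON with barewords into strict JSON text."""
    def fix(match):
        word = match.group(0)
        if word.startswith('"') or word in JSON_LITERALS or NUMBER.fullmatch(word):
            return word
        return json.dumps(word)   # add the quotes the human left out
    return TOKEN.sub(fix, text)

strict = quote_barewords("{ n1: {nA: vA, nB: vB, nC: vC } }")
print(strict)
print(json.loads(strict))
```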

The biggest difference between shell script languages and regular languages is quoting. "Barewords" are a big characteristic of scripting languages.

A colon can bind until the next colon - i.e., all colons have the same strength
n1: nA=vA, nB=vB, n2: v2, n3: nX=vX, nY=vY
{ n1: { nA:vA, nB:vB}, n2:v2, n3: {nX:vX, nY: vY} }
To get more levels deep, you would need brackets, or indentation, or more operators: ::==
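
A minimal Python sketch of the equal-strength colon rule, just to check that the notation is parseable. The name parse_bindings and the error behaviour are my assumptions; values are plain barewords with no commas, colons, or equals signs inside them, and only the two levels described above are supported.

```python
def parse_bindings(text):
    """Parse 'n1: a=1, b=2, n2: v2' into {'n1': {'a': '1', 'b': '2'}, 'n2': 'v2'}."""
    result = {}
    target = result              # dict that "=" pairs are currently added to
    for item in text.split(","):
        item = item.strip()
        if ":" in item:
            # Every colon has the same strength: it ends the previous
            # binding and starts a new top-level one.
            name, item = (s.strip() for s in item.split(":", 1))
            target = result[name] = {}
            if "=" not in item:
                result[name] = item   # plain "name: value"
                target = None         # nothing further may attach to it
                continue
        if "=" not in item or target is None:
            raise ValueError(f"cannot attach {item!r} to anything")
        key, value = (s.strip() for s in item.split("=", 1))
        target[key] = value
    return result

parsed = parse_bindings("n1: nA=vA, nB=vB, n2: v2, n3: nX=vX, nY=vY")
print(parsed)
```

The result is the same structure as the bracketed form above: n1 and n3 bind to lists of name=value pairs, and n2 binds to a single value.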

Heck, I'm not trying to define a full language here.  There are at least 3 perfectly good data languages - XML, JSON, S-expressions.

I just want to define some notation that is a bit more user friendly, but can fall back to the other notations if necessary.