Krazy Glew's Blog: Monday, February 06, 2012

Indent

<indented-block left-margin="|">
|this
| is
|indented
</indented-block>
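
A minimal sketch of how such a block might be read, assuming the left-margin attribute names a marker string that is stripped from the start of each body line. The function name read_indented_blocks is hypothetical, and lines that lack the marker are simply kept unchanged, which is my assumption rather than anything the notation specifies.

```python
import re

# Assumed form of the opening line; the attribute name comes from the example above.
OPEN = re.compile(r'\s*<indented-block left-margin="([^"]*)">\s*')

def read_indented_blocks(text):
    """Return the body of each indented-block with the margin marker stripped."""
    blocks = []
    margin = None          # marker of the block being read, or None when outside a block
    current = []
    for line in text.splitlines():
        if margin is None:
            m = OPEN.fullmatch(line)
            if m:
                margin = m.group(1)
                current = []
        elif line.strip() == "</indented-block>":
            blocks.append("\n".join(current))
            margin = None
        elif line.startswith(margin):
            current.append(line[len(margin):])   # drop the marker, keep the indentation
        else:
            current.append(line)                 # assumption: keep unmarked lines as-is
    return blocks

sample = '''<indented-block left-margin="|">
|this
| is
|indented
</indented-block>'''
print(read_indented_blocks(sample)[0])
```

Because every body line starts with the marker, a body line can even contain the text of the closing tag without ending the block.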

< tags with words breaking XML rules >

<tag some attributes>
...
...
</tag>
e.g.

<test expected-time="10min" start="">

</test elapsed-time=5min>

--
<tags that occupy a line by themselves are fairly clear/>

and, if so restricted, allowing < and > inside text is simple.

<tag>on same line</tag> and <more/>
not so bad. But the more that is allowed, the more possibilities for confusion.
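
Here is a rough sketch of the restricted form, where a tag counts only if it occupies a line by itself and every other line is plain text. The regular expression, the dict-based tree, and the name parse_lines are all my own assumptions for illustration. Attributes are kept as an unparsed string, and attributes on a closing tag are accepted but ignored.

```python
import re

# A tag line is a whole line of the form <name ...>, </name ...> or <name .../>.
TAG_LINE = re.compile(r'^\s*<(/?)([A-Za-z][\w-]*)([^<>]*?)(/?)>\s*$')

def parse_lines(text):
    """Build a tree from tags that sit alone on a line; other lines are text."""
    root = {"tag": None, "attrs": "", "children": []}
    stack = [root]
    for line in text.splitlines():
        m = TAG_LINE.match(line)
        if not m:
            # Plain text: < and > need no escaping here.
            stack[-1]["children"].append(line)
            continue
        closing, name, attrs, selfclose = m.groups()
        if closing:
            if len(stack) == 1 or stack[-1]["tag"] != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
        else:
            node = {"tag": name, "attrs": attrs.strip(), "children": []}
            stack[-1]["children"].append(node)
            if not selfclose:
                stack.append(node)
    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1]['tag']}>")
    return root

doc = "<note>\nif a < b and c > d then swap\n</note>\n<done/>"
tree = parse_lines(doc)
print(tree["children"][0]["children"])
```

A text line that happens to look exactly like a tag, such as a line holding only <b and c>, would still be read as a tag, which is the confusion mentioned above.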

Serialization Formats

There are so many: http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats. And yet, I want some more, with my pseudo-XML (or pseudo-JSON, or whatever).

ASN.1: an abstract syntax notation; its XML Encoding Rules (XER) define how values are represented in XML (as text, not attributes).

Netstring: length:strings,
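
For concreteness, a small sketch of the netstring idea: a decimal byte count, a colon, the payload, and a trailing comma. The function names are hypothetical, and the decoder does only minimal validation.

```python
def netstring_encode(data: bytes) -> bytes:
    """Wrap data as <length>:<data>,"""
    return str(len(data)).encode("ascii") + b":" + data + b","

def netstring_decode(buf: bytes):
    """Decode one netstring from the front of buf; return (payload, rest)."""
    length_text, sep, rest = buf.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    n = int(length_text)
    if len(rest) < n + 1 or rest[n:n + 1] != b",":
        raise ValueError("payload truncated or missing trailing comma")
    return rest[:n], rest[n + 1:]

packed = netstring_encode(b"hello") + netstring_encode(b"a,b:c")
first, remainder = netstring_decode(packed)
second, remainder = netstring_decode(remainder)
print(first, second)
```

Since the length is explicit, the payload can contain colons and commas without any quoting.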

JSON: hashes and arrays; keys are strings, just like values.

OGDL: trees, indentation based, with commas and parentheses for when you don't want new lines. #comments and references. Guaranteed round tripping, except for comments.

Property lists; NeXT to Apple.

YAML: outline indentation, name:, little quoting, {hash: value}, [arrays, ...]; &id anchors and *id references, name: value, name: !!type value, !!binary

XML: <tag>barewords</tag>, <tag/>, attribute metadata. Basically simple, and then a whole lot of crap descended from SGML. Schemas and DTDs :-( Namespaces, references, XQuery, XSLT. An occasionally uncomfortable distinction between metadata (attributes) and data (text). Why can't we just get along? Why can't they all interconvert?

Why I dislike XML

I want to be able to write <matching-bracket>
text, maybe some math, like a <b and c> d </matching-bracket>
without having to write ugly \&\l\t\; Heck, this WYSIWYG blog editor is an example - I can't figure out how to type the escape code for & lt ;

Why I like XML

In XML, or pseudo XML, you can see the matching brackets

<matching bracket> stuff
</matching >

whereas in simply parenthesized notations like JSON

{n1: {n2: n3: "do you have enough matching brackets?" }}}

?

You can usefully define semantics for mismatching constructs. XML also allows unquoted text, because of the very verbosity of its punctuation.
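
One possible semantics for mismatched constructs, sketched under my own assumptions: a named closing tag implicitly closes any inner tags still open, and the checker reports which ones. With anonymous braces all a checker can report is the imbalance. The function names and the token pattern are hypothetical, and the brace counter ignores quoting.

```python
import re

TOKEN = re.compile(r'<(/?)([\w-]+)[^<>]*>')

def check_named(text):
    """Return notes; a named closer implicitly closes unclosed inner tags."""
    stack, notes = [], []
    for m in TOKEN.finditer(text):
        closing, name = m.groups()
        if m.group(0).endswith("/>"):
            continue                      # self-closing tag, nothing to match
        if not closing:
            stack.append(name)
        elif name in stack:
            while stack[-1] != name:
                notes.append(f"<{stack.pop()}> implicitly closed by </{name}>")
            stack.pop()
        else:
            notes.append(f"</{name}> has no matching open tag")
    notes.extend(f"<{name}> never closed" for name in reversed(stack))
    return notes

def check_anonymous(text):
    """With bare braces, only the count imbalance can be reported."""
    depth = 0
    for ch in text:
        depth += (ch == "{") - (ch == "}")
        if depth < 0:
            return "extra closing brace"
    return "balanced" if depth == 0 else f"{depth} brace(s) never closed"

print(check_named("<a> <b> text </a>"))
print(check_anonymous('{n1: {n2: n3: "do you have enough matching brackets?" }}}'))
```

The named version can say which construct was left open, while the brace version can only say that something, somewhere, does not balance.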

Using = and : binding operators to delineate structure

I have messed around with "pseudo-XML" a lot. Oh, heck, who cares about the XML part, except that it is a standard? Structured, simple, data. Trying to make stuff human readable, but also parseable.

Here's a new-to-me trick:

I often use key=value pairs.

Sometimes I use name: value.

I.e. both = and : are nice ways of binding name and value.

How about:

n1: nA=vA, nB=vB, nC=vC
=> nA=vA, nB=vB, nC=vC
=>

or whatever way you want to express name/value in XML.

I.e. : as a binding operator that can bind to a list of name=value pairs

Or non-XML, if that's what tickles your fancy. JSON?:

{ "n1": {"nA": "vA", "nB": "vB", "nC": "vC" } }

(BTW, if I ever declare that I am doing pseudo-JSON, it will be to have fewer quotes.)

{ n1: {nA: vA, nB: vB, nC: vC } }
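
A quick sketch of how the fewer-quotes form could fall back to real JSON: quote every bareword, then hand the result to the standard json module. The name relaxed_loads and the token pattern are my assumptions. Every bareword becomes a string, so numbers and true/false/null come back as strings too, and barewords cannot contain spaces.

```python
import json
import re

# Either an already-quoted JSON string, or a run of characters that are not
# JSON punctuation, whitespace, or quotes (a bareword).
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[^\s{}\[\]:,"]+')

def relaxed_loads(text):
    """Parse JSON in which keys and values may be unquoted barewords."""
    def quote(m):
        tok = m.group(0)
        return tok if tok.startswith('"') else json.dumps(tok)
    return json.loads(TOKEN.sub(quote, text))

print(relaxed_loads('{ n1: {nA: vA, nB: vB, nC: vC } }'))
```

Fully quoted JSON passes through unchanged, so the relaxed reader accepts both spellings of the example.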

The biggest difference between shell script languages and regular languages is quoting. "Barewords" are a big characteristic of scripting languages.

:s can bind until the next colon - i.e., all colons have the same strength

n1: nA=vA, nB=vB, n2: v2, n3: nX=vX, nY=vY

{ n1: { nA:vA, nB:vB}, n2:v2, n3: {nX:vX, nY: vY} }
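
The equal-strength colon rule can be sketched as a tiny parser: split on commas, let each name: start a new binding, and attach the following name=value pairs to it until the next colon. This is only my reading of the example above, the name parse_bindings is hypothetical, and values cannot contain commas.

```python
def parse_bindings(text):
    """Parse 'n1: a=1, b=2, n2: v2' into {'n1': {'a': '1', 'b': '2'}, 'n2': 'v2'}."""
    result = {}
    current = None                        # name most recently bound with ':'
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        head, colon, tail = item.partition(":")
        if colon and "=" not in head:     # 'name: ...' starts a new binding
            current = head.strip()
            item = tail.strip()
            if "=" not in item:
                result[current] = item    # plain value, as in n2: v2
                continue
            result[current] = {}
        if current is None or not isinstance(result[current], dict):
            raise ValueError(f"nothing to attach {item!r} to")
        key, eq, value = item.partition("=")
        if not eq:
            raise ValueError(f"expected name=value, got {item!r}")
        result[current][key.strip()] = value.strip()
    return result

print(parse_bindings("n1: nA=vA, nB=vB, n2: v2, n3: nX=vX, nY=vY"))
```

A colon that appears after an = sign, as in url=http://example, is treated as part of the value rather than as a new binding.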

To get more levels deep, you would need brackets, or indentation, or more operators: ::==

Heck, I'm not trying to define a full language here. There are at least 3 perfectly good data languages - XML, JSON, S-expressions.

I just want to define some notation that is a bit more user friendly, but can fall back to the other notations if necessary.