Thursday, December 20, 2012

Finite Loop

I was going to post more details about JSON-U yesterday, but right about then the entire concept was in pieces around my feet after discovering a few things about how the colon and semicolon are handled in reality. Don't ask.

But that's OK, because I reworked the grammar and managed to remove both special characters, plus two more, and added new capabilities besides. To the point of making URLs nearly Turing-complete. Ignore that for the moment.

There are a few last things I'm trying to figure out. I've got my encoding to reliably survive most kinds of common mangling, and it really takes some determined effort to make the parser fail, but there's always the lure of handling one more case now so you never have to worry about it again.

Oddly, by being so free to rip things out of the parser, I'm discovering ways of putting them back using only the things that remain. For example, I had an array constructor syntax and a function call syntax. Then I removed arrays as an explicit data type, which is unheard of, but I kept the named function space (now called "localizers") and defined the "unnamed function" to "return an array consisting of the parameters passed to the function".

Boom, explicit arrays are gone, replaced with a "pseudo-call" to a function that creates them. So functions are more primal than arrays, in a data encoding. And since functions use the round brackets, we save two characters from ever being used.
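The idea above can be sketched in plain JavaScript (this is just an illustration of the concept, not the actual JSON-U grammar or its real function names):

```javascript
// A sketch of arrays as a pseudo-call: the "unnamed function"
// simply returns its arguments, gathered into an array.
const unnamed = (...args) => args;

// What was once array syntax [1, 2, 3] becomes a call:
const xs = unnamed(1, 2, 3);              // [1, 2, 3]

// Nesting works the same way, so round brackets alone suffice:
const nested = unnamed(1, unnamed(2, 3)); // [1, [2, 3]]
```

Since the call syntax already existed for named functions, arrays come along for free without reserving any new characters.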

I've gone round and round, pulling things out and replacing them again. (Quoted strings are back, but only to read in user-hacked values, never as output.) It's like an infinite Rubik's cube, where a couple more twists leave a scrambled mess, but then a few more moves and everything is harmonious and better matched than ever.

I'm down to the point where I have a test page that generates hundreds of random data structures, encodes them all, checks they go backwards again properly, and offers up a list for clicking. I can type random crap into the parse-as-you-type text boxes, and JSON-U tends to cope with this better (spends more time in a 'parsed' state that correctly represents the data) than JSON does.
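A round-trip fuzzer along those lines is simple to sketch. Since the JSON-U encoder isn't shown in the post, standard JSON stands in for it here; the random-value generator is entirely my invention:

```javascript
// Hypothetical round-trip fuzzer; JSON.stringify/JSON.parse stand in
// for the (unpublished) JSON-U encoder and parser.
function randomValue(depth = 0) {
  const r = Math.random();
  if (depth > 2 || r < 0.4) return Math.floor(Math.random() * 100); // number
  if (r < 0.7) return Math.random().toString(36).slice(2, 7);       // string
  return Array.from({ length: 3 }, () => randomValue(depth + 1));   // array
}

for (let i = 0; i < 100; i++) {
  const value = randomValue();
  const encoded = JSON.stringify(value); // stand-in for the encoder
  const decoded = JSON.parse(encoded);   // stand-in for the parser
  if (JSON.stringify(decoded) !== encoded) {
    throw new Error("round-trip failed: " + encoded);
  }
}
```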

Along the road I've made some funny choices. But they hopefully combine their powers in a good way. For example, here's one consequence:

If you repeatedly unescape() canonical JSON-U, you get the same string.
If you encodeURI() canonical JSON-U, you get the same string.

That's actually a major property, and it can't be said of any encoding that contains a backslash, percent sign, or space. Ever.
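Both properties can be checked mechanically with the standard JS globals (the test strings are my guesses at what a JSON-U-ish alphabet looks like, not real JSON-U):

```javascript
// A string is stable under this kind of mangling when both
// transforms leave it byte-for-byte untouched.
function isStable(s) {
  return unescape(s) === s && encodeURI(s) === s;
}

isStable("point(3,4)"); // true:  no %, backslash, or space anywhere
isStable("a%20b");      // false: unescape() rewrites the %20
isStable("a b");        // false: encodeURI() escapes the space
```

This works because unescape() only ever touches percent sequences, and encodeURI() only ever introduces them, so an encoding that never emits %, \, or space is a fixed point of both.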

("Canonical" just means "the one true blessed way, not the way we do it when we're in a rush.")

The single remaining annoying issue turns up when someone uses encodeURIComponent() on something that ends up in the hash fragment. From what I can tell of the standard, the fragment requires only the equivalent of encodeURI() and all possible component characters, including the hash, are allowed from that point on.

Therefore, doing a blanket encodeURIComponent() or escape() on anything destined for the hash fragment is de facto wrong. But that won't stop people from doing it, because who really knows the standards that well? How far do I go to accept the incorrect behavior? I think the answer is actually "not at all". But then, it might be easy to make it work with just a few more twists of the cube.
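The over-escaping is easy to demonstrate with fragment-legal text (the sample string is mine, chosen to use characters RFC 3986 permits in a fragment unescaped):

```javascript
// Characters legal in a URI fragment that encodeURIComponent()
// nonetheless percent-encodes.
const s = "(a,b=c)";
const viaURI = encodeURI(s);                // "(a,b=c)"     — untouched
const viaComponent = encodeURIComponent(s); // "(a%2Cb%3Dc)" — comma and
                                            //   equals over-escaped
```

So a decoder that wants to tolerate this behavior has to accept %2C where it expected a bare comma, which is exactly the kind of extra cube-twist in question.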

At the moment my encoding survives most mangling. Can I make it survive all of it? Perhaps.

Why do I care so much? Because shortly I'll be handing out URLs in the new format that I'll be expecting my server to honor for months. Years, even. All while the code goes through development cycles, and I'm sure I'll want to change the parameters every few damn months. I need something like JSON-U up and running before I even hand out my first (versioned, metadata-enhanced, expiry-timed) links.

I have to be able to accurately predict all possible futures for the data I'll want to transmit and consume. As they say, prediction is very hard, especially about the future.

Fortunately, I am a computer scientist. And predictions of those kinds are one of my superpowers.
