On JSON

JSON (JavaScript Object Notation) is a data exchange format that’s becoming increasingly popular among the AJAX crowd. Several AJAX frameworks (e.g. Ajax.NET and Microsoft’s ‘Atlas’ project) are embracing JSON on the wire because it’s very simple to produce and consume from a JavaScript-based browser environment.

JSON and objects

Linguistically, JSON is nothing more than JavaScript’s standard object initialization syntax. JScript has long had the ability to initialize objects using a lightweight syntax similar to the following:

     var person = { name: "Steve", age: 27, jobTitle: "Program Manager" };

When the JScript interpreter evaluates this statement, the result is that the person variable holds an object with data slots labeled name, age, and jobTitle. That is, the expression person.name evaluates to “Steve” and person.jobTitle evaluates to “Program Manager”. When you realize that JScript has the ability to dynamically evaluate arbitrary strings via the eval() function, it doesn’t take a great intellectual leap to see how the object initialization syntax can be applied to data exchange. All I have to do is send you a string containing valid initialization syntax by some mechanism (e.g. XMLHttpRequest). You then call eval() on that string and you’ve got a nice object structure that you can party on. JSON is attractive precisely because the client-side deserialization routines are already embedded in the core JScript platform (you unfortunately have to write the client-side serialization piece yourself, but that’s almost trivially easy to do using JScript prototypes and a few lines of code). That is, you can send JSON to a script-enabled browser and be reasonably sure it’ll be able to evaluate it, something not necessarily true with XML today.
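The round trip described above can be sketched in a few lines. This is only an illustration under some assumptions: the names fromJson, toJson, and quote are hypothetical, the serializer is written as plain functions rather than prototype extensions to keep it short, and it handles only the basic JSON value kinds. Because eval() runs whatever code the string contains, this approach is only reasonable for text that comes from a source you trust.

```javascript
// Hypothetical helper names; a sketch of the eval()-based round trip.

// Deserialize: the parentheses force the braces to be parsed as an
// object initializer rather than as a statement block.
function fromJson(text) {
  return eval("(" + text + ")");
}

// Quote a string, escaping backslashes, double quotes, and the most
// common control characters (other control characters are not handled).
function quote(s) {
  return '"' + s.replace(/[\\"]/g, "\\$&")
                .replace(/\n/g, "\\n")
                .replace(/\r/g, "\\r")
                .replace(/\t/g, "\\t") + '"';
}

// Serialize strings, finite numbers, booleans, null, arrays, and plain
// objects. Other kinds of values are outside the scope of this sketch.
function toJson(value) {
  var parts = [], i, key;
  if (value === null) return "null";
  if (typeof value === "string") return quote(value);
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  if (value instanceof Array) {
    for (i = 0; i < value.length; i++) parts.push(toJson(value[i]));
    return "[" + parts.join(",") + "]";
  }
  for (key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key)) {
      parts.push(quote(key) + ":" + toJson(value[key]));
    }
  }
  return "{" + parts.join(",") + "}";
}

var person = fromJson('{ "name": "Steve", "age": 27, "jobTitle": "Program Manager" }');
var wire = toJson(person); // a string that can be sent back over the wire
```

Later editions of ECMAScript added built-in JSON.parse and JSON.stringify functions, which cover the same ground without eval().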

From a data model perspective, the JSON data model consists of atoms, pairs, sequences, and arrays. LISP-heads are shaking their heads going ‘all you need are atoms and pairs’, but the JSON data model maps very well into both Indigo’s [DataContract] system as well as most object-oriented type systems (the nullability of value types may cause issues in some places, however). JSON’s atoms are strings, numbers, booleans, and the ‘null’ token. Objects are expressed as sequences of name : value pairs, with arrays being a special type of sequence containing an ordered list of unnamed values. Beyond that, there’s not much else that you need to know to build a JSON parser (all the details can be found at JSON.org).
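As a small illustration of that data model, the sketch below names the kind of each decoded value. The helper name kindOf and the sample object are hypothetical and exist only for this example.

```javascript
// Hypothetical helper: name the JSON data-model kind of a decoded value.
function kindOf(value) {
  if (value === null) return "null";               // atom: the null token
  if (value instanceof Array) return "array";      // ordered list of unnamed values
  if (typeof value === "object") return "object";  // sequence of name : value pairs
  return typeof value;                             // for JSON values: "string", "number", or "boolean"
}

// One value of each kind, nested inside an object.
var sample = { name: "Steve", age: 27, employed: true, manager: null, tags: ["pm", "xml"] };
```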

JSON and nominal typing

One thing that I really like about JSON is that strict JSON has absolutely no notion of nominal type. JSON documents consist of anonymous complex types that remain forever unnamed. For example, consider the following JSON document:

{
  "name": "Steve",
  "address": {
       "street": "2323 Foobar Lane",
       "city": "Seattle",
       "state": "WA",
       "zip": 98911
  }
}

Notice how the value of the address field is a complex structure but there’s no idea of a “type name” anywhere in the initialization string. When this document is evaluated, the address slot will be bound to an instance of some anonymous complex type. You can drill into this object and ask it for its data: it knows about street, city, state, and zip, and you can ask it for other fields, but it will return undefined. (Asking for missing members is intentionally not an exception; see below on open content models.) Compare that to the following code, which is legal JScript but not technically legal JSON:

{
  name: "Steve",
  address: new StreetAddress("2323 Foobar Lane", "Seattle", "WA", 98911)
}

This code requires a shared concept of a nominal ‘StreetAddress’ type and requires that both the serializing party and the deserializing party know about it. As long as the client side has a preexisting notion of the StreetAddress type, the eval() trick still works, but this rubs me the wrong way. I think that you want to keep both sides as independent as possible, and avoid coupling both sides to a shared nominal type system if at all possible. It’s my opinion that data should arrive on the wire gooey and typeless; it’s up to the party consuming the data to inflict whatever local notion of type it needs to. Coupling happens at the structural level (“I need a string called name and an int called zip”) not at the nominal type level (“I need an instance of a Quux”). Shared nominal typing just makes everything more difficult, and when you’re dealing with a dynamically typed language like JScript that doesn’t really require nominal typing to be successful, I think it makes sense to avoid it.
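One way to picture structural coupling is a consumer that checks only for the fields it needs and then imposes its own local type. The sketch below is an assumption-laden illustration: the StreetAddress constructor, the toStreetAddress function, and the extra country field are hypothetical and are not part of any wire format.

```javascript
// Hypothetical local type; the data on the wire knows nothing about it.
function StreetAddress(street, city, state, zip) {
  this.street = street;
  this.city = city;
  this.state = state;
  this.zip = zip;
}

// Check the structure only ("a string called street, a number called zip"),
// then impose the consumer's local notion of type.
function toStreetAddress(data) {
  if (typeof data.street !== "string" || typeof data.city !== "string" ||
      typeof data.state !== "string" || typeof data.zip !== "number") {
    throw new Error("address does not have the expected shape");
  }
  return new StreetAddress(data.street, data.city, data.state, data.zip);
}

var wireAddress = { street: "2323 Foobar Lane", city: "Seattle", state: "WA", zip: 98911, country: "USA" };
var address = toStreetAddress(wireAddress); // the extra "country" field is simply ignored
var missing = wireAddress.planet;           // no exception is thrown; the value is undefined
```

The sender never has to know that a StreetAddress constructor exists; it only has to send fields with the right names and kinds.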

JSON and XML

One of the nice things about JSON is that there’s a trivial isomorphism between a JSON document and an equivalent XML Infoset. For example, the structure above can be represented in XML as follows:

<object>
   <name>Steve</name>
   <address>
         <street>2323 Foobar Lane</street>
         <city>Seattle</city>
         <state>WA</state>
         <zip>98911</zip>
   </address>
</object>

One side observation is that the JSON representation is much nicer, as the data-to-markup ratio is much higher. Anyway, there are a few things related to array serialization that you need to shake out but in general the process of creating a JSON-equivalent Infoset from an arbitrary JSON document is straightforward and mechanistic.
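A minimal sketch of the mechanical conversion, in the JSON-to-XML direction only, might look like the following. The function names are hypothetical, and two choices here are assumptions rather than anything settled: array members become repeated item elements, and null becomes an empty element. The sketch also assumes every property name is a valid XML element name.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Convert a decoded JSON value into element-normal XML (no attributes,
// no namespaces). Assumptions: array members become repeated <item>
// children, and null becomes an empty element.
function toXml(name, value) {
  var body = "", i, key;
  if (value === null) {
    body = "";
  } else if (value instanceof Array) {
    for (i = 0; i < value.length; i++) body += toXml("item", value[i]);
  } else if (typeof value === "object") {
    for (key in value) {
      if (Object.prototype.hasOwnProperty.call(value, key)) body += toXml(key, value[key]);
    }
  } else {
    body = escapeXml(value); // string, number, or boolean atom
  }
  return "<" + name + ">" + body + "</" + name + ">";
}

var xml = toXml("object", {
  name: "Steve",
  address: { street: "2323 Foobar Lane", city: "Seattle", state: "WA", zip: 98911 }
});
```

The result is the same tree as the XML shown earlier, without the indentation.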

Because this isomorphism exists, I look at JSON as an alternative serialization of element-normal XML sans namespaces. This is a very useful comparison to make in my mind, because there’s been a lot of work already done to figure out exactly how far you can push this type of XML data. The lack of a JSON equivalent for XML attributes is less troubling to me than the lack of namespace support – but all the arguments both for and against XML namespaces can also be applied to JSON. This comparison is a helpful framework for judging the technical capabilities of JSON within an overall architecture.

JSON-equivalent Infosets and XSD schemas

I currently know of no technologies for describing the expected structure of JSON instance documents in a machine-readable way. As such, the “metadata problem” is something that is definitely on the front burner when it comes to JSON. However, because JSON instance documents are really just a special class of XML instance documents in disguise, it’s probably easier to focus on describing the JSON-equivalent Infoset using existing XML technologies and describe the actual JSON itself indirectly. That is, I could write a description of the JSON-equivalent Infoset using something like an XSD schema or a RelaxNG grammar and let you figure out what the corresponding curly-brace representation would look like by applying the mechanistic conversion from XML to JSON.

Of the two description options (XSD and RelaxNG), RelaxNG is unquestionably the better option when it comes to describing JSON documents. This is because interior forest nodes in a JSON document are implicitly unbounded (they support open content) and unordered: elements at the same level of the hierarchy are not constrained to appear in any particular order. That is, a JSON consumer doesn’t care if you send city, state, zip or zip, city, otherStuff, state so long as the expected elements are all present eventually. You could potentially describe JSON Infosets using XSD’s xs:all construct, but that would require a closed content model as xs:all does not compose well with xs:any due to the inherent UPA violations. RelaxNG does not have these sorts of restrictions and is quite capable of validating JSON-type grammars. As such, it’s technically a much better option for describing JSON-equivalent Infosets.
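To make the unordered, open content model concrete, here is a small JavaScript sketch of a check with those two properties. It is not RelaxNG and not an existing description format: the personShape descriptor and the conforms function are hypothetical, invented only to show that field order is irrelevant and undescribed fields are tolerated.

```javascript
// Hypothetical description format: each required field name maps to the
// expected typeof result, or to a nested description for a nested structure.
var personShape = {
  name: "string",
  address: { street: "string", city: "string", state: "string", zip: "number" }
};

// True when every described field is present with the expected kind.
// Field order is irrelevant and undescribed fields are allowed (open content).
function conforms(value, shape) {
  var key;
  if (value === null || typeof value !== "object") return false;
  for (key in shape) {
    if (!Object.prototype.hasOwnProperty.call(shape, key)) continue;
    if (typeof shape[key] === "string") {
      if (typeof value[key] !== shape[key]) return false;
    } else if (!conforms(value[key], shape[key])) {
      return false;
    }
  }
  return true;
}

// Fields arrive in a different order, with an extra member mixed in.
var reordered = {
  address: { zip: 98911, city: "Seattle", otherStuff: 1, state: "WA", street: "2323 Foobar Lane" },
  name: "Steve"
};
var ok = conforms(reordered, personShape);
var incomplete = conforms({ name: "Steve" }, personShape);
```

Here ok is true despite the reordering and the extra otherStuff member, while incomplete is false because the address structure is missing.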

Conclusion

I can definitely see why JSON appeals to web developers. It’s a simple, lightweight data format that’s extremely easy to produce and consume from the environments in which web developers work. The protocol overhead of JSON is significantly smaller than the corresponding XML, and any JScript environment that supports eval() can consume JSON data and render it in the local programming model as objects. JSON is essentially equivalent to element-normal XML minus namespaces, and that isomorphism is both a blessing and a curse. It’s helpful because existing XML technologies can be brought to bear on JSON without a large degree of pain, but it also means that JSON will be unequipped to solve the large-scale integration problems that the XML community addresses via XML namespaces. I can see JSON being successful for point-to-point integration and data exchange between a browser client and a server backend, but I don’t see it rivaling XML as a data exchange format for large-scale data integration anytime soon. AJAX and JSON are at the point where XML, SOAP, and XML-RPC were about 5 years ago; they just haven’t realized that yet.

 

#1 estee on 9.21.2005 at 8:12 AM

regarding «ability to initialize objects using a lightweight syntax».. recently i was curious enough to parse both Google Maps and MS Virtual Earth client sources, what has astonished me is that they use `var o=new Object()` instead of `var o={}` all the time despite an evident effort to compress the code.. (perhaps should ask Eric Lippert whether there are any advantages to the former syntax ;)

speaking about «client-side serialization piece», JScript doesn't have any, but Mozilla's SpiderMonkey has its `uneval()`, which made me jealous enough to construct one for myself ([http://not.from.kiev.ua/biz/000157.html]).. my special interest is in a more general-purpose `uneval()` implementation not limited to AJAX-related types.. In this respect the JScript flavour could be even more interesting because of its exotic creatures like ActiveXObject..

with XML representation in JSON, some good ideas could be borrowed from Perl and its HTML::Element .. For example, when it comes to generating a lot of XML/HTML with JScript, the following data representation of tag/attr/#text may come in handy:

     ['tagName',{attrName:'attrValue'},['innerTagName'],'innerText']

This implies the simplest scheme, in which an XML element is mapped to an array with its first element being the tag name, followed by any number of: (1) associative arrays (Object), representing attributes, (2) arrays (Array) standing for nested elements and (3) simple scalar values being #text nodes.. According to the ECMA specs, Object properties are not guaranteed to retain their order, so this is exactly what we need from XML attributes. In turn, a JScript array (Array object) respects the order of its elements, and this is what we expect for XML elements..

Finally, i'd like to thank you for mentioning "RelaxNG".. This is something new to me, will have to investigate it further..

#2 estee on 9.24.2005 at 10:22 AM

2EvalThis: yeah, exactly, by the `uneval()` function i mean the same functionality which is also exposed as a `.toSource()` method.. Though, in the MS JScript flavour there was never either `uneval()` or `toSource()`. I don't know what ECMA-262 says about it..
