Why XSD is not a type system

DouglasP is finally making good on his promise to tell us more about Whidbey’s approach to schema versioning using the XmlFormatter. Well, almost; he’s just tantalizing us with some rhetorical questions. It’s so weird the way timing works in the blogosphere; I just finished writing up a whole bunch of docs for work on a lot of the basic ideas he’s asking people to think about. Along those lines:

One of the hardest things to grok about web services is the notion of “type”. We know that services share schemas, not types, but what does that mean, really? What is a “type”? Does that meaning change when we think about XSD types in comparison to .NET types? What do people mean when they say that XSD is not a type system?

The .NET type system (the Common Type System, or CTS) is a nominal type system. Nominal types are characteristically closed representations of structure that are both definitive and authoritative. Looking at these three characteristics in more detail:

  • Closed. A nominal type is closed because once it’s published, it can’t change – there’s no room to cram “other stuff we didn’t think of when we defined the type” into a nominal type. If you want to add new members to an existing type, you must create a new nominal type (either through versioning, namespace differentiation, or subtyping). The closed property of nominal types should be very familiar to anybody who’s had to change a COM interface after they’ve published it.
  • Definitive. A nominal type definition is an exact representation of all of its possible instances. Given a nominal type, you can project the structure of any possible instance of that type, because the type is closed and therefore all of its instances have identical structure. There are no “optional” members of a nominal type.
  • Authoritative. An instance has exactly one nominal type associated with it. Furthermore, that type cannot be considered nominally equivalent with any nominal type other than itself. There’s no room for interpretation when it comes to nominal type.
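
A small C# sketch of the closed and authoritative properties, written in current C# syntax. PointA, PointB and Point3D are hypothetical types made up for illustration. PointA and PointB have exactly the same members, yet the runtime treats them as unrelated. Adding a member means declaring a new type (here a subtype), and instances of that new type report the new type rather than the original.

```csharp
using System;

// Hypothetical types: PointA and PointB have exactly the same members.
class PointA { public int X; public int Y; }
class PointB { public int X; public int Y; }

// "Adding" a member to PointA requires a brand-new nominal type.
class Point3D : PointA { public int Z; }

static class NominalDemo
{
    // Strict nominal check: compares the runtime type itself, not its shape.
    public static bool HasNominalType(object o, Type t) => o.GetType() == t;

    static void Main()
    {
        object p = new PointA { X = 1, Y = 2 };
        Console.WriteLine(HasNominalType(p, typeof(PointA))); // True
        Console.WriteLine(HasNominalType(p, typeof(PointB))); // False: same shape, different type
        Console.WriteLine(p is PointB);                       // False: no conversion exists

        object q = new Point3D { X = 1, Y = 2, Z = 3 };
        Console.WriteLine(q.GetType().Name);                  // Point3D
        Console.WriteLine(HasNominalType(q, typeof(PointA))); // False
    }
}
```

Structural identity buys PointB nothing: the name is the identity.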

This last point raises the question of how you tell when two types are nominally equivalent. In .NET, the rule for nominal type equivalency is pretty simple. Two types are considered to have strict nominal equivalence if they have the same AssemblyQualifiedName. Thus, in order to be considered nominally equivalent, not only do the two types have to have the same name and be defined in the same namespace, they have to be defined in the same version of the same assembly. This is why I say that nominal types don’t “really” version – they just pretend they do.
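
To see what that name actually contains, here is a rough sketch that prints one and applies the rule above literally. StrictlyEquivalent is a hypothetical helper; it compares names only and does not model the assembly load-context caveat raised in the first comment below. System.Uri is just a convenient framework type to inspect.

```csharp
using System;

static class EquivalenceDemo
{
    // The rule from the text: same AssemblyQualifiedName means strict nominal equivalence.
    // (Names only; assembly load contexts are not modelled here.)
    public static bool StrictlyEquivalent(Type a, Type b) =>
        a.AssemblyQualifiedName == b.AssemblyQualifiedName;

    static void Main()
    {
        Type t = typeof(Uri);
        // For a non-generic type this is "<namespace>.<name>, <assembly display name>",
        // and the assembly display name carries Version, Culture and PublicKeyToken.
        Console.WriteLine(t.AssemblyQualifiedName);
        Console.WriteLine(t.Assembly.GetName().Version);
        Console.WriteLine(StrictlyEquivalent(t, typeof(Uri)));    // True
        Console.WriteLine(StrictlyEquivalent(t, typeof(string))); // False
    }
}
```

Because the version number is baked into the name, recompiling the same class into version 2.0 of its assembly produces a different AssemblyQualifiedName.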

But wait – doesn’t subtyping break the whole authoritative characteristic? After all, doesn’t the C# is operator respect subtypes? The answer is that the is operator has a slightly laxer definition of “is” than the type system does. The operator doesn’t really check for nominal type equality; it checks for castability, which is a whole different ball game. That is, saying

            a is Bar

is quite different than saying

            a.GetType() == typeof( Bar )

because the former will return true if a is an instance of Bar or of any type derived from Bar (castability), while the latter is a check for strict type equality and will not consider subtypes.
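
The two checks side by side, as a compilable sketch. Baz is a hypothetical subclass of Bar introduced only for illustration.

```csharp
using System;

class Bar { }
class Baz : Bar { }   // hypothetical subtype of Bar

static class IsVersusGetType
{
    // Castability: true when a's runtime type is Bar or derives from Bar (false for null).
    public static bool IsCastableToBar(object a) => a is Bar;

    // Strict type equality: true only when a's runtime type is exactly Bar.
    public static bool IsExactlyBar(object a) => a != null && a.GetType() == typeof(Bar);

    static void Main()
    {
        object[] samples = { new Bar(), new Baz(), "not a Bar", null };
        foreach (object a in samples)
        {
            string label = a?.GetType().Name ?? "null";
            Console.WriteLine($"{label,-8} is Bar: {IsCastableToBar(a),-5}  GetType() == typeof(Bar): {IsExactlyBar(a)}");
        }
    }
}
```

The Baz row is the interesting one: the is check says yes, while the strict check says no.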

In .NET, each object carries around type information that is readable by the runtime. At any time during your code’s execution, you can walk up to an object instance and ask it “what type are you?” by calling Object.GetType(). Your object will happily spit back a bunch of metadata (in the form of a System.Type instance) that tells you exactly, without question, what the type of that object is. You can’t argue with the result – the nominal type of that object is not open to interpretation.
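
A minimal sketch of “walking up to an object” from code. Describe is a hypothetical helper; every value is stored in an object-typed slot, so nothing about the variable’s static type helps, yet each instance still reports its own type.

```csharp
using System;
using System.Collections.Generic;

static class WhatTypeAreYou
{
    // Ask an instance for its runtime type and describe it as "FullName : BaseType".
    public static string Describe(object o)
    {
        Type t = o.GetType();
        return $"{t.FullName} : {t.BaseType?.FullName}";
    }

    static void Main()
    {
        // Every element is statically typed as object; the instance carries its real type.
        object[] things = { 42, "hello", new Uri("http://example.org/"), new List<int>() };
        foreach (object o in things)
            Console.WriteLine(Describe(o));
    }
}
```

For the first two entries this prints “System.Int32 : System.ValueType” and “System.String : System.Object”.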

Conversely, consider a blob of XML on the wire. It’s just a happy little set of angle brackets, floating around network space basking in the glow of its well-formedness. On the wire, an XML message has no type – it’s just XML, with no schema information. The most information that you can get out of an XML document is namespace qualification, but that does not imply type. All a namespace qualifier does is differentiate elements that might coincidentally have the same simple element name. By using namespaces, I can tell apart two elements that share a local name but live in different namespaces, but I can’t tell anything else. It’s impossible to tell from a namespace alone what such an element should look like. To do that, I need a schema.
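
Here is a sketch of what a receiver can learn from the raw message alone, using LINQ to XML. The message, the element name Item, and the urn:example namespaces are all hypothetical. The two Item elements are distinguishable only by their namespace-qualified names, and until someone validates the document there is no schema information attached to either.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.Schema;

static class WireXml
{
    // Hypothetical message: two elements share the local name "Item" in different namespaces.
    public const string Sample =
        "<Message xmlns:a='urn:example:a' xmlns:b='urn:example:b'>" +
        "<a:Item>42</a:Item>" +
        "<b:Item><Sku>X-1</Sku></b:Item>" +
        "</Message>";

    public static XName[] ChildNames(string xml) =>
        XDocument.Parse(xml).Root.Elements().Select(e => e.Name).ToArray();

    static void Main()
    {
        XDocument doc = XDocument.Parse(Sample);
        foreach (XElement e in doc.Root.Elements())
        {
            // The qualified name, printed as {namespace}local, is all the element says about itself.
            Console.WriteLine(e.Name);
            // Nothing has validated this document, so no schema information is attached.
            Console.WriteLine(e.GetSchemaInfo() == null ? "  no schema info" : "  has schema info");
        }
    }
}
```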

A schema is a set of machine-verifiable expectations that a data consumer maintains and enforces about the data it consumes. This already differentiates it from nominal types, because a nominal type is what an object hands you when it says “here’s what I am!”. A document makes no assertion as to the schema(s) against which it will validate. Rather, schema is how you, as the data consumer, determine if a blob of data looks like you think it should.
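
To make “schema belongs to the consumer” concrete, here is a sketch in which the receiving side loads its own schema and checks an incoming message against it, using XmlSchemaSet and the XDocument.Validate extension from System.Xml.Schema. The Order vocabulary and the urn:example:orders namespace are hypothetical. Note that the message never names the schema; the consumer picks it.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class ConsumerValidation
{
    // A hypothetical schema owned by the consumer; the sender never refers to it.
    public const string OrderXsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
           targetNamespace='urn:example:orders' xmlns='urn:example:orders'
           elementFormDefault='qualified'>
  <xs:element name='Order'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Sku' type='xs:string'/>
        <xs:element name='Quantity' type='xs:positiveInteger'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static XmlSchemaSet Load(string xsd)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsd))); // null: use the schema's targetNamespace
        set.Compile();
        return set;
    }

    // Does the document meet this consumer's expectations? Only errors count, not warnings.
    public static bool Accepts(XmlSchemaSet schemas, string xml)
    {
        bool ok = true;
        XDocument.Parse(xml).Validate(schemas, (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) ok = false;
        });
        return ok;
    }

    static void Main()
    {
        XmlSchemaSet mine = Load(OrderXsd);
        string good = "<Order xmlns='urn:example:orders'><Sku>X-1</Sku><Quantity>2</Quantity></Order>";
        string bad = "<Order xmlns='urn:example:orders'><Sku>X-1</Sku><Quantity>0</Quantity></Order>";
        Console.WriteLine(Accepts(mine, good)); // True
        Console.WriteLine(Accepts(mine, bad));  // False: 0 is not a positiveInteger
    }
}
```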

XSD schemas have this idea of “types”, which are really just syntactic sugar that makes dealing with complex structures easier. When you refer to an XSD type in a schema, you’re really just using the type name as a shortcut to imply some potentially complex element hierarchy that you don’t feel like typing out every time. Unlike nominal types, XSD types can support an open content mode, which makes them neither definitive nor authoritative.
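
One way to see the “shortcut” point is to write the same content model twice, once as a named complex type and once inline with no name at all, and check that the two schemas accept and reject exactly the same documents. This is a sketch; OrderType, the Order vocabulary and the namespace are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class NamedVersusInline
{
    // Hypothetical content model shared by both schemas below.
    const string Seq =
        "<xs:sequence><xs:element name='Sku' type='xs:string'/>" +
        "<xs:element name='Quantity' type='xs:positiveInteger'/></xs:sequence>";

    const string Head =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
        "targetNamespace='urn:example:orders' xmlns='urn:example:orders' " +
        "elementFormDefault='qualified'>";

    // The structure gets a name (OrderType) and the element refers to it by that name.
    public const string NamedXsd =
        Head + "<xs:complexType name='OrderType'>" + Seq + "</xs:complexType>" +
        "<xs:element name='Order' type='OrderType'/></xs:schema>";

    // The same structure spelled out inline, with no type name at all.
    public const string InlineXsd =
        Head + "<xs:element name='Order'><xs:complexType>" + Seq +
        "</xs:complexType></xs:element></xs:schema>";

    public static XmlSchemaSet Load(string xsd)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsd)));
        set.Compile();
        return set;
    }

    public static bool Accepts(XmlSchemaSet schemas, string xml)
    {
        bool ok = true;
        XDocument.Parse(xml).Validate(schemas, (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) ok = false;
        });
        return ok;
    }

    static void Main()
    {
        string good = "<Order xmlns='urn:example:orders'><Sku>X-1</Sku><Quantity>2</Quantity></Order>";
        string bad = "<Order xmlns='urn:example:orders'><Quantity>2</Quantity></Order>";
        foreach (string xsd in new[] { NamedXsd, InlineXsd })
        {
            XmlSchemaSet s = Load(xsd);
            // Identical answers from both: the type name added nothing structural.
            Console.WriteLine($"{Accepts(s, good)} {Accepts(s, bad)}"); // True False
        }
    }
}
```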

XSD supports open content via wildcard extensibility elements. This is the space for “everything else we didn’t explicitly mention”. It’s also the single biggest differentiator between XSD and nominal type systems. Once a type supports open content, it’s impossible for it to be a definitive representation of all possible instance documents. Given an open schema, you can’t enumerate the structures of all possible instance documents (at least, not in finite time). Put another way, the presence of an extensibility element means that there is a theoretically infinite number of possible instance documents that can be validated by that schema. Thus, an XSD schema type is not a definitive representation of an instance document.
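
A sketch of an open content model in practice. It uses the standard XSD wildcard xs:any with namespace='##other' and processContents='lax'; the Order vocabulary, the Ext elements, and the urn:example namespaces are hypothetical. InstanceWithExtensions can mint as many structurally distinct documents as you like, and the schema accepts every one of them.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class OpenContent
{
    public const string OpenXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
        "targetNamespace='urn:example:orders' xmlns='urn:example:orders' elementFormDefault='qualified'>" +
        "<xs:element name='Order'><xs:complexType><xs:sequence>" +
        "<xs:element name='Sku' type='xs:string'/>" +
        // The wildcard: any number of elements from any namespace other than the target one.
        "<xs:any namespace='##other' processContents='lax' minOccurs='0' maxOccurs='unbounded'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Build an instance carrying n extension elements with made-up names and namespaces.
    public static string InstanceWithExtensions(int n)
    {
        var sb = new StringBuilder("<Order xmlns='urn:example:orders'><Sku>X-1</Sku>");
        for (int i = 0; i < n; i++)
            sb.Append($"<x:Ext{i} xmlns:x='urn:example:ext:{i}'>{i}</x:Ext{i}>");
        sb.Append("</Order>");
        return sb.ToString();
    }

    public static XmlSchemaSet Load(string xsd)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsd)));
        set.Compile();
        return set;
    }

    public static bool Accepts(XmlSchemaSet schemas, string xml)
    {
        bool ok = true;
        XDocument.Parse(xml).Validate(schemas, (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) ok = false;
        });
        return ok;
    }

    static void Main()
    {
        XmlSchemaSet open = Load(OpenXsd);
        // Each of these is a different document, and each one validates.
        foreach (int n in new[] { 0, 1, 10, 100 })
            Console.WriteLine($"{n} extensions: {Accepts(open, InstanceWithExtensions(n))}");
    }
}
```

Lax processing means elements from namespaces the consumer has no schema for are let through rather than rejected, which is what keeps the set of valid documents unbounded.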

Neither is an XSD type an authoritative structural representation of a blob of XML. For any given instance document, there is a potentially large number of schemas out there in the world that will validate that document (I’m not sure if this number is finite or not; I’d have to think about that for a while). Thus, you can’t use can-be-validated-by as the criterion for authoritatively determining if document D “is” XSD type T. If you use that criterion, the answer to that question might be “yes”, with the caveat that, by the same criterion, D “is” also every one of the many other XSD types that validate D. Thus, it’s impossible to authoritatively say that a given document has any intrinsic type – only that it is potentially many types, defined by the set of schemas that validate the document.
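
A sketch of one document D checked against several independent candidate schemas. All four candidates are hypothetical and, to underline the point, all four call their complex type OrderType even though the structures differ. D validates against three of them, so by the can-be-validated-by criterion it “is” all three.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class ManySchemas
{
    const string Head =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
        "targetNamespace='urn:example:orders' xmlns='urn:example:orders' elementFormDefault='qualified'>";

    // Build a schema whose OrderType has the given sequence content.
    static string Schema(string sequence) =>
        Head + "<xs:complexType name='OrderType'><xs:sequence>" + sequence +
        "</xs:sequence></xs:complexType><xs:element name='Order' type='OrderType'/></xs:schema>";

    // Hypothetical candidates: same type name, different structure.
    public static readonly (string Name, string Xsd)[] Candidates =
    {
        ("strict", Schema("<xs:element name='Sku' type='xs:string'/><xs:element name='Quantity' type='xs:positiveInteger'/>")),
        ("loose", Schema("<xs:element name='Sku' type='xs:token'/><xs:element name='Quantity' type='xs:int' minOccurs='0'/>")),
        ("open", Schema("<xs:element name='Sku' type='xs:string'/><xs:any namespace='##any' processContents='skip' minOccurs='0' maxOccurs='unbounded'/>")),
        ("refunds", Schema("<xs:element name='Sku' type='xs:string'/><xs:element name='Quantity' type='xs:negativeInteger'/>")),
    };

    static XmlSchemaSet Load(string xsd)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsd)));
        set.Compile();
        return set;
    }

    static bool Accepts(XmlSchemaSet schemas, string xml)
    {
        bool ok = true;
        XDocument.Parse(xml).Validate(schemas, (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) ok = false;
        });
        return ok;
    }

    // Names of every candidate schema that validates the document.
    public static string[] SchemasThatValidate(string xml) =>
        Candidates.Where(c => Accepts(Load(c.Xsd), xml)).Select(c => c.Name).ToArray();

    static void Main()
    {
        string d = "<Order xmlns='urn:example:orders'><Sku>X-1</Sku><Quantity>2</Quantity></Order>";
        Console.WriteLine(string.Join(", ", SchemasThatValidate(d))); // strict, loose, open
    }
}
```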

Food for thought: is it possible to use XSD as a nominal type system? If so, when does it make sense to do so?

To sum up, a nominal type is a closed representation of structure that is both definitive and authoritative. An XSD type is open, which logically makes it neither definitive nor authoritative.

And that’s why I feel it’s OK to say that “XSD is not a type system.”

[Update: If you haven't been bored to death already, there's more on this here.]

#1 Patrick Cauldwell on 1.23.2004 at 9:17 AM

Thanks for such a well-thought-out summary. The one comment I would add is that .NET considers two types to be equal if they have the same AssemblyQualifiedName AND they were both loaded from the same assembly load context. For example, if you load exactly the same assembly, once with Assembly.Load() and once with Assembly.LoadFrom(), the two types will fail an equality check.

#2 Michael Rys on 1.28.2004 at 12:33 AM

I think Don Box's article got it right. XQuery/XPath and their data model use XSD as the basis for their type system. Yes, XML documents can be typed based on many different XSDs. But once you decide on one, the generation of the PSVI will provide you with instance type information, and the schema import functionality in XQuery also gives information about the static types. Having wildcard sections and "lax" and "skip" validation that provide untyped and "partially" typed instances is, in my opinion, one of the contributions of the document/XML/XSD/semistructured data world to future type systems. The ability to move from typed to untyped data and back is one of the big features of most XML-based technologies. Continued at sqljunkies.com/.../871.aspx

#3 Steve Maine on 1.28.2004 at 12:42 AM

Very true. But once you've decided on an XSD type, you're inside the service boundary and free to think of things in any way you want. I'm still convinced, though, that what you're describing is solely within the service boundary, which is where nominal types (defined in any type system, even XSD) can play freely. Check out Part II of this post: http://hyperthink.net/blog/PermaLink,guid,dfaf9b81-fa67-4abb-a83a-4eb0f57118ac.aspx