Zen of the Web Programming Model (part 2)

Part 1

I find that I often get a chuckle out of presenting seemingly obvious information in non-obvious ways. That basic juxtaposition and fundamental element of irony goes a long way towards explaining the inner workings of my somewhat obtuse sense of humor (well, that and fart jokes). Anyway, I think that's what I had rolling around in my head when I created the following slide about the relative popularity of the various HTTP methods:

Figure: Relative importance of HTTP methods

The point being, of course, that GET and POST are by far the most common methods on the Web today (Side note -- I wonder how different this graph would be had early versions of HTML forms done PUT and DELETE bindings instead of just GET and POST). Which brings us to the second element of the Web programming model -- how we think about verbs and actions on the Web.

"View It" + "Do It"

when it comes to verbs,
everyone knows what GET means
all else is obscure

On the Web, 80% of the world is GET. The rest is relative chaos.

GET is special because, by and large, everybody knows what it means. It has a distinct notion of "retrieval" and it allows clients to make certain assumptions about the safety and idempotency of the request. Safety means that if I issue a GET request to you, it won't inadvertently cause some random side effect to happen as a result (or if it does, the server will never blame the client for causing those side effects). Idempotency means that the side effects of executing the request multiple times will be the same as the side effects induced by executing the request once -- for the case of GET, because it's safe, there will be zero side effects in either case. Practically, this means I can usually pound on the "Reload" button in my browser without fear. Why am I not fearful? Because GET has bounded semantics.

It's funny to observe how the Web punishes those who violate the bounded semantics of GET. I remember a couple of years back the havoc caused by the Google Web Accelerator. This was a seemingly cool idea -- a little local web proxy/crawler that would prefetch sites you often visit so they load faster. It accomplished this by looking at the sites in your history and walking the link graph on every page. The funny part came about two days after the thing launched -- lots of people started waking up to find that someone had come along during the night and made all their wiki content go away. In reality, the GWA had simply come along and issued an HTTP GET request to all the DELETE PAGE URIs in the wiki. D'oh. This wasn't the GWA's fault -- rather, it was the fault of web sites that were stepping outside the bounded semantics of GET. Things break when you don't play by the rules.

When you play by the rules, all sorts of interesting things are possible. One of the major reasons the web is scalable is distributed hierarchical caches. The reason these things work (why it doesn't matter if your GET request makes it all the way back to the server) is that the bounded semantics of GET imply that the server wouldn't do anything interesting with the request once it gets it anyway. Intermediate nodes know they can safely return a cached response precisely because everybody has a common understanding of what GET means and what GET implies. Common understanding == power.
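
To make the caching argument concrete, here is a minimal sketch of an intermediary that answers repeated GETs from its own cache and forwards everything else. The class name, the delegate-based "origin server", and the URIs are all made up for illustration, and the sketch deliberately ignores the freshness and validation headers that real HTTP caches depend on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical intermediary, for illustration only: real caches also honor
// expiry and validation headers, which this sketch leaves out.
public class CachingIntermediary
{
    // Stand-in for the origin server: (method, uri) -> response body.
    private readonly Func<string, string, string> origin;
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();

    public CachingIntermediary(Func<string, string, string> origin)
    {
        this.origin = origin;
    }

    public string Send(string method, string uri)
    {
        if (method == "GET")
        {
            // GET is safe, so a stored response can be replayed without
            // the origin server ever seeing the request.
            string cached;
            if (cache.TryGetValue(uri, out cached)) return cached;

            string fresh = origin("GET", uri);
            cache[uri] = fresh;
            return fresh;
        }

        // Any other method might change state: drop the stored copy and
        // always pass the request through to the origin.
        cache.Remove(uri);
        return origin(method, uri);
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        int originHits = 0;
        var proxy = new CachingIntermediary((method, uri) =>
        {
            originHits++;
            return method + " " + uri + " handled";
        });

        proxy.Send("GET", "/customers/1");
        proxy.Send("GET", "/customers/1");   // served from the cache
        Console.WriteLine(originHits);       // prints 1

        proxy.Send("POST", "/customers/1");  // forwarded, cache entry dropped
        proxy.Send("GET", "/customers/1");   // has to go back to the origin
        Console.WriteLine(originHits);       // prints 3
    }
}
```

The only thing the intermediary inspects is the method name. That is the whole trick: it needs no knowledge of the application behind the URI, only the shared understanding of what GET means.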

Compare the bounded semantics of GET to the relatively unbounded semantics of everything else. PUT and DELETE are bounded in the sense that they are unsafe but still idempotent -- although that notion of idempotency gets really fuzzy in the face of multiple concurrent writers. As for POST (which makes up 90% of the remaining 20%), well -- let's just say that there's lots of things POST should do if you read the spec but what a given POST actually does is really anybody's guess.
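
A toy model can make the safe/idempotent distinctions easier to see. The sketch below is an in-memory stand-in for a server; the `TinyStore` class, its methods, and the resource names are invented for illustration, and the POST behavior shown (append a new resource on every call) is just one of the many things a real POST might do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory "server": resource URIs map to string bodies.
public class TinyStore
{
    private readonly Dictionary<string, string> resources = new Dictionary<string, string>();
    private int nextId = 1;

    public int Count { get { return resources.Count; } }

    // Safe (and therefore idempotent): reads state, never changes it.
    public string Get(string uri)
    {
        string body;
        return resources.TryGetValue(uri, out body) ? body : null;
    }

    // Unsafe but idempotent: replaying the call leaves the same state.
    public void Put(string uri, string body)
    {
        resources[uri] = body;
    }

    // Unsafe but idempotent: deleting twice ends up where deleting once did.
    public void Delete(string uri)
    {
        resources.Remove(uri);
    }

    // Neither safe nor idempotent in this sketch: each call adds a resource.
    public string Post(string collectionUri, string body)
    {
        string uri = collectionUri + "/" + nextId++;
        resources[uri] = body;
        return uri;
    }
}

public static class VerbDemo
{
    public static void Main()
    {
        var store = new TinyStore();

        store.Put("customers/1", "Alice");
        store.Put("customers/1", "Alice");   // replayed PUT
        Console.WriteLine(store.Count);      // prints 1

        store.Get("customers/1");
        store.Get("customers/1");            // "Reload" as often as you like
        Console.WriteLine(store.Count);      // prints 1

        store.Post("orders", "widget");
        store.Post("orders", "widget");      // replayed POST creates a duplicate
        Console.WriteLine(store.Count);      // prints 3

        store.Delete("customers/1");
        store.Delete("customers/1");         // replayed DELETE
        Console.WriteLine(store.Count);      // prints 2
    }
}
```

The single-threaded demo also hides the fuzziness mentioned above: once a second writer can slip a PUT in between two replays of the first, "same state as doing it once" stops being an easy promise to keep.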

 


Once you step out of the world of "view it" and into the "do its", it gets really murky really fast. If you want to talk about the fundamental semantics that drive today's real-world web, there really are just two -- GET and INVOKE, "view it" and "do it". INVOKE is an abstract semantic that can be refined in many different ways for many different purposes.

[WebGet] and [WebInvoke]

You've been able to express both "view it" and "do it" semantics with WCF since the beginning. All you had to do was write a service contract that looks like this:

    using System.ServiceModel;

    [ServiceContract]
    interface ICustomer
    {
        // "View It"
        [OperationContract]
        Customer GetCustomer();

        // "Do It"
        [OperationContract]
        Customer UpdateCustomerName(string id, string newName);
    }

Now if you actually went and exposed this contract over the BasicHttpBinding (which uses SOAP 1.1) and looked at a Fiddler trace of the underlying HTTP messages, you'd see that both of these things get sent out using HTTP POST. Why? At this point, it's mainly a SOAP thing. SOAP takes a shortcut and uses the unbounded semantics of POST to subsume the bounded semantics of GET. SOAP 1.2 does have the web method feature (thanks to folks like Mark) but it hasn't really taken off for various reasons. No, SOAP just happily uses POST for everything -- and in doing so, misses out on things like caching that could have potentially been done transparently had SOAP just used the same verb other people were using to mean the same thing.

It's not just a SOAP issue, though. If you take that contract and expose it over an endpoint using the WebHttpBinding + WebHttpBehavior (which WebServiceHost gets you for free, incidentally) you've gotten one step closer because WebHttpBinding doesn't use SOAP. It washes the SOAP off the message using MessageVersion.None. Try the above experiment in this configuration and your protocol traces will still show that GetCustomer is bound to the HTTP POST method. Why? Because the runtime can't read your method names. You have to provide some additional metadata to clue us in that you really want the GetCustomer method to be bound to the HTTP GET method. That additional metadata comes in the form of the [WebGet] attribute, which is new in the BizTalk Services SDK.

[WebGet] and [WebInvoke] let you control how individual operations get bound to chunks of the endpoint's URI space and the HTTP methods associated with those URIs. For example, adding [WebGet] and [WebInvoke] like so:

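    using System.ServiceModel;
    using System.ServiceModel.Web; // home of [WebGet] and [WebInvoke] as of .NET 3.5

    [ServiceContract]
    interface ICustomer
    {
        // "View It"
        [OperationContract]
        [WebGet]
        Customer GetCustomer();

        // "Do It"
        [OperationContract]
        [WebInvoke]
        Customer UpdateCustomerName(string id, string newName);
    }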

Lets me do things like:

  • GET /GetCustomer
  • POST /UpdateCustomerName
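
If you want to see a request like `GET /GetCustomer` on the wire, something along these lines should be enough. Treat it as a sketch under assumptions: it uses the `System.ServiceModel.Web` types as they later shipped in .NET 3.5 (the SDK preview may differ), the `ICustomerService` and `CustomerService` names and the port are made up, the return type is simplified to `string`, and only the "view it" operation is included to keep things short. Listening on an HTTP port on Windows may also require elevated rights or a URL reservation.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Web;

// Hypothetical contract, trimmed down to a single "view it" operation.
[ServiceContract]
public interface ICustomerService
{
    [OperationContract]
    [WebGet] // no UriTemplate, so the URI defaults to the operation name
    string GetCustomer();
}

public class CustomerService : ICustomerService
{
    public string GetCustomer()
    {
        return "Customer #1";
    }
}

public static class HostDemo
{
    public static void Main()
    {
        // Assumed base address; pick any free port.
        var baseAddress = new Uri("http://localhost:8000/");

        // With no endpoints configured, WebServiceHost adds one at the base
        // address using WebHttpBinding plus WebHttpBehavior.
        using (var host = new WebServiceHost(typeof(CustomerService), baseAddress))
        {
            host.Open();
            Console.WriteLine("Try: GET " + baseAddress + "GetCustomer");
            Console.WriteLine("Press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```

Pointing a browser or Fiddler at the printed URI should show the operation answering a plain HTTP GET rather than a POST.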

[WebInvoke] defaults to POST but you can use it for other "do it" verbs too:

    using System.ServiceModel;
    using System.ServiceModel.Web; // home of [WebGet] and [WebInvoke] as of .NET 3.5

    [ServiceContract]
    interface ICustomer
    {
        // "View It" -> HTTP GET
        [OperationContract]
        [WebGet(UriTemplate = "customers/{id}")]
        Customer GetCustomer(string id);

        // "Do It" -> HTTP PUT
        [OperationContract]
        [WebInvoke(UriTemplate = "customers/{id}", Method = "PUT")]
        Customer UpdateCustomer(string id, Customer newCustomer);
    }
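
The `{id}` placeholder in those templates is what ties a URI segment to the `id` parameter. A rough way to poke at that mapping in isolation is the `UriTemplate` class, used here as it shipped in .NET 3.5 (an assumption relative to the SDK preview). The helper name and the base address are invented for the example.

```csharp
using System;

public static class UriTemplateDemo
{
    // Returns the value bound to {id}, or null when the request URI does not
    // fit the "customers/{id}" template.
    public static string ExtractId(Uri baseAddress, Uri request)
    {
        var template = new UriTemplate("customers/{id}");
        UriTemplateMatch match = template.Match(baseAddress, request);
        return match == null ? null : match.BoundVariables["id"];
    }

    public static void Main()
    {
        // Assumed base address, for illustration only.
        var baseAddress = new Uri("http://localhost:8000/");

        // A URI that fits the template yields the bound segment.
        Console.WriteLine(ExtractId(baseAddress, new Uri("http://localhost:8000/customers/42")));

        // A URI that does not fit yields no match at all.
        Console.WriteLine(ExtractId(baseAddress, new Uri("http://localhost:8000/orders/42")) == null);

        // Templates also work in the other direction, building a URI from values.
        var template = new UriTemplate("customers/{id}");
        Console.WriteLine(template.BindByPosition(baseAddress, "42"));
    }
}
```

The dispatcher presumably does something similar for each incoming request: find the operation whose template and HTTP method match, then hand the bound variables to the method as arguments.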

The Advanced Web Programming sample shows this off in more detail. 

That about wraps it up for verbs. I've got one more "meta" post queued up to cover formats and entity bodies, and then it's on to code :)

#1 Erik on 5.10.2007 at 12:44 PM

> let's just say that there's lots of things POST should do if you read the spec but what a given POST actually does is really anybody's guess

Are you talking about POST in RFC2616 (HTTP 1.1)?

I'm not disagreeing with you -- I just wanted to know which spec you mean.

Or maybe you mean that the spec says you post something "subordinate" relative to the URI and everyone just ignores that concept.

#2 Steve Maine on 5.10.2007 at 1:06 PM

I'm referring to 2616 here. That spec lists 4 example POST scenarios (quoting from section 9.5 for the spec lawyers out there):

- Annotation of existing resources;
- Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
- Providing a block of data, such as the result of submitting a form, to a data-handling process;
- Extending a database through an append operation.

It's the third example that leaves the door wide open. Take that in combination with "The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI" and "The action performed by the POST method might not result in a resource that can be identified by a URI" and it becomes really hard to argue that POST means anything beyond "please do something".

I think a lot of the more formal REST proponents would prefer that the spec were tighter than it is, and that the curious third usage scenario wasn't actually present in the spec. However, that's all a holdover from HTTP 1.0, which went so far (in Section 8.3) as to explicitly say "A successful POST does not require that the entity be created as a resource on the origin server or made accessible for future reference". The semantics of POST were tightened up in HTTP 1.1 as far as they could be without making vast chunks of the working web suddenly noncompliant.

The "POST == Append subordinate resource" reading is a view that arose in HTTP 1.1 as part of the attempt by Fielding et al. to retrofit a more formal model onto the much less formal world of HTTP 1.0. Empirically, that was easier to do in the spec than in the implementations.

#3 Erik on 5.10.2007 at 8:23 PM

> the more formal REST proponents would prefer that the spec were tighter than it is

The spec also says the Request-URI for POST is a resource meant to "handle processing of" the content whereas the Request-URI for PUT is the content's identity.

So, POST was explicitly granted the role of catch-all and we are all safe to abuse it at will and without being smited by the Lords of the Internet.

At least that's my thinking!

#4 Steve Maine on 5.10.2007 at 8:28 PM

Erik -- I wholeheartedly agree.