 Tuesday, August 24, 2004
I meant to continue the discussion on Software Factories with a response to Darrell’s post, but I didn’t get to it before I left on vacation.
Darrell’s argument is that Software Factories seem like they would work only for narrow vertical market applications, thereby limiting their appeal to the mass market. I both agree and disagree with him; yes, the idea is most compelling when applied to narrow domains but no, I don’t think that’s a bad thing. In fact, I think constraining Software Factories to a specific domain is a critical factor in the success of such an endeavour.
The problem solved by Software Factories (and programming in general) is essentially one of modeling. A programming language is effectively a modeling language, and I think that attempts to differentiate between the two are ultimately unsuccessful. One thing that we’ve learned about modeling languages is that there is a tradeoff triangle between efficiency of representation, generality, and precision. That is, it seems possible to build a modeling language that can efficiently describe a large number of disparate concepts imprecisely. It’s also possible to construct a language that can efficiently model a small set of concepts with a high degree of fidelity. Constructing an uber-language that efficiently achieves both generality and precision seems to be beyond our reach at this time.
A model is a system in one formal domain that describes a roughly equivalent system in another formal domain. I say “roughly equivalent” because depending on the systems in question there may be significant differences between the two. However, these differences are usually unknowable (e.g. in physics, we really have no idea what’s “really” going on at the subatomic level, but we’re satisfied enough with the mathematical description of quantum behavior to equate mathematics with reality) or irrelevant to the task at hand (e.g. relativity does not fully describe the world because it does not include quantum effects that we know to exist, but it seems to be “good enough” so long as we only think about it when we happen to be going very, very fast). In software, we build models with the specific intent of disregarding certain facets of the underlying system as being irrelevant – we use models to reduce complexity.
A good software model is both efficient and precise. An efficient model expresses equivalent concepts from the underlying system, but does so using a compressed representation. Efficient models provide productivity gains because they allow you to say the same thing, but with fewer words. The precision of a software model is defined by the ability to convert between the model and its equivalent system via a set of automated deterministic transformations without losing semantic content (this is usually a directed process – going from “most efficient” to “least efficient” is easy, but going the other direction is impossible to do with full fidelity). The software world is already full of models that are both efficient and precise. Consider a C++ program, for example. A program written in C++ is an efficient and precise model for an equivalent system implemented in C. Similarly, a C program is an efficient and precise model of an equivalent system in assembly, which is itself a model of a system in machine code (and, as Ian Griffiths once pointed out to me, it’s possible to continue this line of reasoning all the way down to quarks and beyond). You could also extend this in the other direction and consider a C# program as a model for an equivalent system in C++ (the theoretical compiler that implemented this transformation would be required to emit the entire CLR during its code generation phase – the fact that we can precompile the CLR is simply an optimization and does not alter things from a modeling perspective).
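Here is a toy illustration of what “precise” means here, sketched in Java only because it is compact and runnable. The names `Expr`, `Instr`, `compile`, `eval` and `run` are invented for this example and stand in for no real tool. A small expression tree plays the role of the model, and a flat instruction list for a stack machine plays the role of the underlying system. `compile` is the automated, deterministic transformation between them. The property that matters is that interpreting the model and executing the compiled system give the same answer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ModelCompiler {
    // The "model": a tree representation of arithmetic (hypothetical toy language).
    interface Expr {}
    record Num(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    // The "system": a flat instruction list for a toy stack machine.
    enum Op { PUSH, ADD, MUL }
    record Instr(Op op, int arg) {}

    // Meaning of the model, computed directly on the tree.
    static int eval(Expr e) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        if (e instanceof Mul m) return eval(m.left()) * eval(m.right());
        throw new IllegalArgumentException("unknown node: " + e);
    }

    // Deterministic model -> system transformation (post-order traversal).
    static void compile(Expr e, List<Instr> out) {
        if (e instanceof Num n) {
            out.add(new Instr(Op.PUSH, n.value()));
        } else if (e instanceof Add a) {
            compile(a.left(), out);
            compile(a.right(), out);
            out.add(new Instr(Op.ADD, 0));
        } else if (e instanceof Mul m) {
            compile(m.left(), out);
            compile(m.right(), out);
            out.add(new Instr(Op.MUL, 0));
        } else {
            throw new IllegalArgumentException("unknown node: " + e);
        }
    }

    // Meaning of the system, computed by executing the instructions.
    static int run(List<Instr> program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Instr ins : program) {
            switch (ins.op()) {
                case PUSH -> stack.push(ins.arg());
                case ADD -> stack.push(stack.pop() + stack.pop());
                case MUL -> stack.push(stack.pop() * stack.pop());
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        Expr model = new Mul(new Add(new Num(1), new Num(2)), new Num(3));
        List<Instr> program = new ArrayList<>();
        compile(model, program);
        System.out.println(eval(model) + " == " + run(program)); // prints "9 == 9"
    }
}
```

The sketch only goes in one direction, from tree to instructions. That matches the directed nature of the transformation described above.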
The big question is, from a modeling perspective, what does the layer of abstraction on top of C# look like? I think it’s this question that the Software Factories guys are trying to solve. I don’t think anybody knows exactly what this type of model will look like, but I think one major feature is efficiency and high precision with respect to an equivalent system implemented in C# (or any other language targeting the CLR). That is, a model in our theoretical modeling language should be compilable down to an equivalent representation in our CLR language of choice. And herein lies the challenge, because the inherent tension between efficiency, generality, and precision comes into play.
There are a few approaches that probably lead to failures:
- Favor efficiency, accept generality and compromise precision. This is the approach that standard UML takes. The UML metamodel is flexible enough to allow it to describe virtually any system out there. However, from a formal semantic perspective, the resultant model is gooey and formless which makes it very difficult to compile into anything useful. At best, we can get some approximation of the underlying system via codegen, but even the best UML tools only generate a fraction of the code required to fully realize the model. The lack of precision within the model itself requires operating in both the model domain and the system domain, and implies that some facility exist to synchronize the two. Thus, the imprecision of UML forces us to solve the round-tripping/decompilation problem with 100% fidelity, which is generally difficult to do.
- Favor generality, accept efficiency, and compromise precision. I put Executable UML here. The general applicability of the UML metamodel is still here, along with its characteristic formal ambiguity. The action language proposed adds a layer of expressiveness at the expense of efficiency of representation, but the result is still disconnected from the underlying system due to the imprecision inherent in the model. There’s still the roundtripping problem to solve, which casts doubt on the relevancy of the solutions in this space. There’s also some question as to the overall computability of xUML (is xUML Turing-equivalent?) – it’s possible that even while favoring generality, they have already sacrificed more than they should have, as UML seems to be generally meaningless.
- Favor precision, accept generality, compromise efficiency / Favor precision, accept efficiency, compromise generality. I think you can make the argument that any modeling language that favors precision is effectively no modeling language at all, as it is basically just the original system again. I think solutions in this space are degenerate, and I discount them.
And a couple that might take us down the road to success:
- Favor generality, accept precision and compromise efficiency. Developing a modeling language that can express the same set of concepts as C# is a challenge, because C# is Turing-complete. Thus, any modeling language that wishes to preserve the generality of C# must also be Turing-complete. This implies a certain degree of notational richness which directly reduces the efficiency of the language. An example of a language that makes this tradeoff is Cw – it models certain concepts (synchronization and hierarchical/relational data) extremely precisely and more efficiently than C#, but on the whole it is not as compact as you would expect a modeling language to be. The goal here is to provide strategic efficiency improvements in certain targeted domains, not to provide an order of magnitude improvement across the board. Languages in this space preserve the generality of the underlying system domain and guarantee that any concept expressible in the underlying system is also expressible in the modeling language.
- Favor efficiency, accept precision, and compromise generality. This, I think, is the sweet spot for Microsoft’s vision of Software Factories. Here’s why: the classic problem faced by modeling languages is Turing equivalency. How do you model a language that is Turing-complete in one that’s not without sacrificing something? The answer is: you don’t. You can either make the modeling language itself Turing-complete (which sacrifices efficiency) or you can limit the scope of the problem by confining yourself to modeling only a specific subset of the things that can be expressed in the underlying system domain. Within that subset, it might be possible to model things extremely precisely, but that precision can only be gained by first throwing out the idea that you’re going to be able to efficiently and precisely model everything.
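To make the last option concrete, here is a deliberately tiny, hypothetical modeling language in Java. `EntityModel`, `FieldType` and `generate` are invented for illustration and are not part of any Software Factories toolkit. The language can only describe one shape of thing: a named entity with typed fields. That restriction is what lets a two-line model expand deterministically into a full class with fields, a constructor, and getters. Loops, conditionals, or anything else outside the subset simply cannot be expressed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EntityModel {
    // The model can only express these field types; that restriction is the point.
    enum FieldType {
        STRING("String"), INT("int"), BOOLEAN("boolean");
        final String javaType;
        FieldType(String javaType) { this.javaType = javaType; }
    }

    final String name;
    final Map<String, FieldType> fields = new LinkedHashMap<>(); // keeps declaration order

    EntityModel(String name) { this.name = name; }

    EntityModel field(String fieldName, FieldType type) {
        fields.put(fieldName, type);
        return this;
    }

    // Deterministic model -> code transformation: the same model always yields the same source.
    String generate() {
        StringBuilder sb = new StringBuilder();
        sb.append("public final class ").append(name).append(" {\n");
        fields.forEach((f, t) ->
            sb.append("    private final ").append(t.javaType).append(' ').append(f).append(";\n"));

        // Constructor taking every field, in declaration order.
        StringJoiner params = new StringJoiner(", ");
        fields.forEach((f, t) -> params.add(t.javaType + " " + f));
        sb.append("\n    public ").append(name).append('(').append(params).append(") {\n");
        fields.keySet().forEach(f ->
            sb.append("        this.").append(f).append(" = ").append(f).append(";\n"));
        sb.append("    }\n");

        // One getter per field.
        fields.forEach((f, t) -> {
            String cap = Character.toUpperCase(f.charAt(0)) + f.substring(1);
            sb.append("\n    public ").append(t.javaType).append(" get").append(cap)
              .append("() { return ").append(f).append("; }\n");
        });
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = new EntityModel("Customer")
            .field("name", FieldType.STRING)
            .field("age", FieldType.INT)
            .generate();
        System.out.print(src);
    }
}
```

The model is efficient because it says far less than the code it produces. It is precise because `generate` is a fixed, automated transformation. Generality is what gets traded away.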
Taking the idea of Software Factories out of the theoretical world and putting it into play in the enterprise application space is a nontrivial problem. Given that we have to choose a subset of all possible problems to model, what should this subset look like? Certainly, dividing applications by vertical market is one way of partitioning this space – you could conceivably build a modeling language for financial services apps, for example. However, I think it would be more valuable to partition according to structure, not necessarily function. By identifying structural similarities within enterprise apps we can begin to distill patterns, and from these patterns we can create a language for modeling them efficiently without compromising precision unless absolutely necessary. When such a compromise is unavoidable, we must look for ways to factor it out of the model completely, so that developers operate in either the system domain or the model domain but never both at the same time. If we do that, we’ll have circumvented a lot of the problems encountered by previous attempts to raise the level of abstraction in software development.
 Monday, August 23, 2004
Back in Seattle from what has become my semiannual trip to Vegas. I took the day off to ease the transition out of the hyperarticulated Vegas reality, and it’s proving to be a wise decision. Context switches of this magnitude need to be taken slowly.
I went with the same group of old college friends from my February trip. We’re pretty geographically distributed now, so answering the casual “So, where are you guys from” question is a little bit complex. The full answer is “Seattle, San Francisco, New York, Chicago, and Madison, Wisconsin” but that usually gets condensed down into a simple “all over”. Given the routes that we’ve all taken since school, we don’t really get to see each other very much – Vegas provides the perfect excuse to get back together again. Seeing all of them again is a good reminder that although we live in different places now, the important stuff doesn’t really change.
Thanks to my friend Meghan, who has what can only be described as “the hook up”, we stayed at Paris this time instead of Bally’s. It’s a great hotel – the rooms are nice, the atmosphere is great, and the staff is wonderful – but the best thing about Paris is the location. It’s right in the middle of the strip, so it’s a great “home base” for various excursions throughout the weekend.
Friday night we had tickets to the new show at Paris, We Will Rock You. This show is what happens when you take Queen’s greatest hits, wrap a loosely constructed quasi-futuristic plotline around it and turn it into a Broadway show – very much Queen’s answer to Abba’s Mamma Mia. I was a little skeptical when Mike suggested this one, but it turned out to be really good! You can’t argue with music, and the show had a great blend of humorous dialogue and fantastic production value that made it a really solid show. If you like Queen’s music, you can’t go wrong with this show.
Of course, we spent some time at the tables too – overall, I did OK. For me, when it comes to Vegas gambling, it’s not so much about actual wins or losses as it is about tracking to plan. I played all weekend and managed to come in under budget, so that’s a win in my book. One minor highlight: I introduced a couple of my friends to craps at the Mirage, and we all managed to walk away from that table up about 200 bucks apiece. Not bad for their first roll! Blackjack, though, was less kind to me – Rhonda, the pagan goddess of Dealing Me 13 On Every Freaking Hand, was looking in my direction most of the time and the results were not so good. But it was all in good fun – no harm, no foul.
In short, good times were had by all.
 Thursday, August 19, 2004
Aaron has a great post on the realities of the standardization process with respect to XSD and RelaxNG. I think Tim Ewald’s motivations are pure and the conclusion he comes to is correct on its technical merits, but Aaron’s point about the value of industry consensus is well taken. Like it or not, we’re stuck with XSD in all of its bloated, needlessly complicated glory. It may not be perfect, but it is common.
I think the problems with XSD stem more from the standardization process itself than from anything else. XSD is a beast born of compromise, the result of many competing motivations making sacrifices to produce an end result that works but doesn’t make anyone really happy. It’s a good example of what happens when you try to standardize first and implement later.
I think the industry has learned some lessons, though. The standardization approach being taken by the WS-* specs is much different than the approach taken by XSD. With WS-*, there are small groups of influential companies working together to bake these standards via real-world implementations. This allows them to iterate on the spec much faster and incorporate lessons learned in the real world without taking on the organizational overhead of a large standards body during the early phases of development. It’s a better approach, I think, and it seems to be working.
After blog mint:
With that, I’m going on vacation for a long weekend. See y’all next week.
 Tuesday, August 17, 2004
I think it is time that I get something off my chest in a public forum: I hate sushi. To be fair, I’ve never actually tried real sushi. I’m basing these feelings on a general dislike for some fish I had once, with the full knowledge and realization that I very possibly may be depriving myself of something wonderful. However, I highly doubt that I would like sushi were I to try it. I’m secure in my sushi hatred, based on nothing more than vague assumptions and a general conceptual dislike of raw fish.
Objectively, I recognize that my sushi-hatred is both irrational and unjustified. I will, however, put forth that it is necessary.
I think that bigotry is an unfortunate side effect of the human condition. It’s within our nature to hate things, and although we try, we cannot transcend these baser instincts. There’s a certain amount of irrational hatred stored up in each and every one of us, hardwired into our genetic code. While I don’t think we have the ability to entirely eliminate our natural hate, I think we do have some choice as to its target. Thus, I choose to direct mine towards sushi, because hating sushi is much better than hating other people.
If I have to hate something, I might as well hate the most innocuous thing possible. Thus far, it seems to be working.
Harry points to Jack Greenfield’s article in the MS Architect’s Journal that offers some more detail on the “Software Factories” concept.
Reading this article got me thinking about one of my favorite companies, Timbuk2. Timbuk2 has built a pretty successful business selling customized messenger bags over the internet (they’re great bags, BTW – I have 2). Buying a bag from Timbuk2 involves selecting the general size and type of bag from a list of preset options, and then selecting from a range of options (color, fabric, padding, etc) to come up with a bag that meets your own unique needs. And they can ship them out damn fast, too – customizations and all. In fact, getting a custom bag from them doesn’t really take longer than getting one of their ready-made bags – from my experience, the time cost of the operation is dominated more by billing and shipping than by manufacturing.
How can Timbuk2 turn around custom bags so fast? They leverage micro-customization and economies of scope. All their bags are made of the same constituent parts and are assembled according to the same general pattern. Options that don’t fit the pattern or are built from parts not readily available are ruled out entirely. For example, you can get any color and fabric combination available, but you can’t get a messenger bag with two shoulder straps (that’s a backpack, and they don’t sell those). By constraining the domain of possible customizations, they’ve removed a bunch of variables from their manufacturing process. Thus, they can focus on streamlining the various combinations of options they do support and find ways to optimize their manufacturing process accordingly. As a result, building a red/silver/red bag with a pencil pouch and interior divider isn’t much different than building a black/orange/black bag with a shoulder pad. Since the general template is the same for all bags and the parts themselves interact with each other in deterministic ways, the overall cost of customization is very small.
Timbuk2 has obviously learned a lot about building messenger bags. I think they teach us a lot about building software, too.
 Monday, August 16, 2004
 Friday, August 13, 2004
I confess, I don’t do a ton of VB.NET coding. Check that – I don’t do any VB.NET coding. I’m strictly a curly-braces-and-semicolons kind of guy. However, I’m finding myself responsible for a code generator that needs to be bilingual between C# and VB.NET. As a result, I’m sort of forced to code VB by proxy…
Anyway, I wanted to toss out a question to the more VB-minded readers of this blog: what are the best practices around setting the RootNamespace property in the project options? On large projects, do you set this to something and omit namespace declarations from the class files, or do you leave it blank and declare namespace membership explicitly, a la C#?
Personally, the whole RootNamespace feels like pure evil to my C# mind. The thought that my class might actually not be in the namespace it declares seems nefarious at best. I can see how this would be a valuable feature to a VB6 developer who might be namespace-phobic, but I’m curious how this feature gets used on enterprise-level projects.
 Wednesday, August 11, 2004
BoingBoing points to a rather amusing case study on why you shouldn’t rely on query string variables to control the presentation of your web page. With just a little bit of hacking on the request URL, it’s possible to turn this (the official page) into this or even this.
Warning: those links are what would be described in polite conversation as “questionably tasteful”.
Could somebody be having similar fun with your website?
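Here is one defensive pattern, sketched in Java with hypothetical names. `PageRenderer`, the theme parameter, and its allowed values are all invented here. The idea is to treat a query string value as a key into a fixed, server-side table instead of as content to place in the page. Anything not in the table falls back to a default. A hand-edited URL can then select among presentations you already approved, but it cannot inject new ones.

```java
import java.util.Map;

public class PageRenderer {
    // Server-owned presentation options; the query string can only select among these.
    private static final Map<String, String> THEMES = Map.of(
        "light", "<body class=\"light\">",
        "dark", "<body class=\"dark\">");
    private static final String DEFAULT_THEME = "light";

    // Returns approved markup for the requested theme; the raw parameter never reaches the page.
    // The null check comes first because Map.of(...) maps reject null keys in containsKey.
    static String bodyTagFor(String themeParam) {
        String key = (themeParam != null && THEMES.containsKey(themeParam)) ? themeParam : DEFAULT_THEME;
        return THEMES.get(key);
    }

    public static void main(String[] args) {
        System.out.println(bodyTagFor("dark"));
        System.out.println(bodyTagFor("\"><h1>gotcha</h1>")); // falls back to the default
        System.out.println(bodyTagFor(null));                 // falls back to the default
    }
}
```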
 Tuesday, August 10, 2004
John Evdemon points out that WS-Addressing has been submitted to the W3C. This is good; it’s the first step on the long road to industry-wide acceptance as a standard. I don’t expect this road to necessarily be easy, given the competing submission of WS-MessageDelivery from Oracle et al. David Orchard has a good comparison between the two specs here.
Any idea why Sun is a co-author on both of these specs? You’d think they’d just pick one and get on with life…
 Monday, August 09, 2004
Resharper has been making its way through my team like a virus. I’ve been using it for a couple of months now, and I think I can say I’m sold. Ironically, it was Whidbey that caused me to really latch on to Resharper. I sort of got addicted to the IDE enhancements in Whidbey and felt very naked when switching back to VS2003. Resharper is the one thing that still allows me to use VS2003 and not feel completely disappointed.
There are many cool features that make Resharper worth it: advanced syntax highlighting, refactoring, IntelliSense add-ons, the usual stuff that’s talked about when IDE add-ons are discussed. However, there are a couple of things that I find particularly compelling. First, the syntax highlighter crawls your XML documentation looking for invalid “see cref=” and “param name=” tags. Normally, these won’t be caught until compile time, and even then it can be hard to locate the source of the error. Resharper highlights mistakes in bright red, so you can easily see which XML doc elements are in error. This has proven to be a real timesaver for me.
The other feature I really like is Code Reformatting. Everyone has their own style when it comes to formatting code. For instance, I’m inclined to write void Foo( int bar ), while others on my team write void Foo(int bar). I don’t know about you, but these little stylistic things are practically hardwired into me – I don’t consciously think about them, they just happen. Since everyone tends to have stylistic instincts that are *just slightly different* than everyone else’s, you can end up with a code base that is formatted inconsistently. Not a huge deal if you’re only shipping binaries, but in our case the source code is a primary deliverable and we want it to look nice and consistent. Rather than forcing everyone to change their style to conform to a standard, we just configure a default set of Resharper formatting rules and periodically run them on the whole solution. It’s proven to be a big win because it removes distractions, keeps our code looking nice, and doesn’t require anyone to change their own hardwired formatting rules.
Resharper’s good stuff. Go download it and check it out if you haven’t already.
 Wednesday, August 04, 2004
Diving deep into the implementation of streams in Cw has given me a new appreciation for the power of iterators.
Wes has posted some really cool stuff about how iterators can be applied to problems other than simple iteration. I haven’t quite parsed all of it yet, but his ideas are pretty interesting. Raised my eyebrows a bit, certainly.
Update: fixed the link
 Tuesday, August 03, 2004
Darrell and Sam both posted a couple of good comments in response to my rather rhetorically charged post on Everything’s a Platform.
Darrell’s point, if I may summarize, was that unbounded extensibility is expensive. I agree, and I’d even go further – unbounded extensibility is not only expensive, it’s a really bad way to build software. Too many extensibility points lead to too many moving parts, and eventually you get to a point where you can’t change anything without breaking something. Instead of unbounded extensibility, you want clearly defined abstraction barriers. You want to make a clear delineation between “internal” and “external”, so that the two can vary independently. And you want to keep these extensibility points to a strategic minimum, so that you expose only the interesting stuff while hiding the mundane details. I think that if you adopt services as a metaphor from the beginning, your architecture will evolve in such a way as to make identifying these kinds of strategic extension points more natural.
Once you’ve identified the boundaries within your application, you have a whole range of options on how you want to cross them. I don’t think that identifying a service boundary within an app necessarily means that you need to immediately go through a Big Design Up Front phase to figure out how to cross it. Depending on the goals of your project, you might choose to acknowledge that the boundary exists and let the interface evolve somewhat organically. Simply knowing that a boundary is being crossed will influence the design positively to some degree, and you can always come back and spend later iterations polishing up the wire contract should such a need arise. My argument is that you’ll see benefits from drawing those boundaries earlier rather than later, not that you have to scope out every detail of those interfaces Right Now.
To Sam’s point about YAGNI: the YAGNI principle is about being strategic about your development, so that you don’t waste cycles solving hard problems you don’t really need to solve. If those problems are easy, calling YAGNI is less important. It’s in this area where I think service-oriented tools like Indigo can really add value. Right now, implementing a strong service boundary is more work than it needs to be. There’s still too much focus on plumbing details if you need to think about security and reliability. Robustly crossing a service boundary via web services today involves a non-trivial amount of work – even if your app architecture is already built with services in mind. As such, YAGNI is a perfectly reasonable response to service construction today. However, as the tools progress, the amount of plumbing work required to implement secure, reliable, transacted services will diminish greatly. When your tools make “flipping the switch” between spanning a service boundary in memory and spanning that boundary across a network almost trivially easy, calling YAGNI is much less compelling. In the time it takes you to explain why YAGNI applies, it’s already implemented :) If the tools are solid, the controlling question around exposing a service to the outside world is not ‘why?’ but ‘why not?’.
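Here is a minimal Java sketch of the “flipping the switch” idea. Every name is hypothetical: `QuoteService`, `InProcessQuoteService`, `WireQuoteService`, the pipe-delimited message format, and the prices are invented here and are not Indigo APIs. Callers depend only on the boundary interface. One implementation answers in memory. The other forces every call through a serialized request and response, which stands in for a network hop. Swapping one for the other does not touch the caller.

```java
import java.util.Map;
import java.util.function.Function;

public class ServiceBoundaryDemo {
    // The explicit boundary: callers see only this contract.
    interface QuoteService {
        double priceFor(String symbol, int quantity);
    }

    // In-process implementation: a plain method call (made-up prices).
    static class InProcessQuoteService implements QuoteService {
        private final Map<String, Double> prices = Map.of("ABC", 27.5, "XYZ", 4.0);

        public double priceFor(String symbol, int quantity) {
            Double unit = prices.get(symbol);
            if (unit == null) throw new IllegalArgumentException("unknown symbol: " + symbol);
            return unit * quantity;
        }
    }

    // "Remote" implementation: every call is serialized to text and back.
    // The transport is just a Function<String, String>, so the sketch needs no network.
    static class WireQuoteService implements QuoteService {
        private final Function<String, String> transport;

        WireQuoteService(Function<String, String> transport) { this.transport = transport; }

        public double priceFor(String symbol, int quantity) {
            String response = transport.apply("priceFor|" + symbol + "|" + quantity);
            return Double.parseDouble(response);
        }
    }

    // Server-side dispatcher: decodes a request message and calls the real implementation.
    static Function<String, String> serverFor(QuoteService impl) {
        return request -> {
            String[] parts = request.split("\\|");
            if (parts.length != 3 || !parts[0].equals("priceFor"))
                throw new IllegalArgumentException("bad request: " + request);
            return Double.toString(impl.priceFor(parts[1], Integer.parseInt(parts[2])));
        };
    }

    // Caller code is identical no matter which side of the "switch" it is on.
    static double orderTotal(QuoteService quotes) {
        return quotes.priceFor("ABC", 10) + quotes.priceFor("XYZ", 5);
    }

    public static void main(String[] args) {
        QuoteService local = new InProcessQuoteService();
        QuoteService remote = new WireQuoteService(serverFor(local));
        System.out.println(orderTotal(local));  // 295.0
        System.out.println(orderTotal(remote)); // 295.0
    }
}
```

The design choice that matters here is that `orderTotal` never learns which implementation it received. Real plumbing (security, reliability, transactions) would live behind the transport function rather than in the caller.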
Of course, tools will only help you if you’ve already made the architectural commitment to use services as the organizational metaphor for your application. Again, my argument is that there is value to be had in adopting this architectural metaphor early in the lifecycle, even if you don’t fully realize the underlying service plumbing until you have a concrete motivation to do so in the future.
 Monday, August 02, 2004
Alan Perlis once said “every program has (at least) two purposes: the one for which it was written, and another for which it wasn’t”. I take that to mean that if you spend some time writing a non-trivial application, the odds are good that there are any number of other uses for that functionality beyond the primary use case you had in mind when you originally wrote the software. The earlier you recognize this fact in the design process, the more valuable your software will be in the long term.
The line between ‘application’ and ‘development platform’ is blurring. Several years ago, the distinction was very clear – Windows was ‘the platform’, and you built ‘applications’ on top of it. This is quite different from the current state of the world – what is Word, exactly? Is it an app or a platform? How about Visual Studio? On one hand, those packages are applications that deliver strong usability out of the box. On the other, they provide a rich design surface on which to build your own apps. You can take bits and pieces of them and mix them together with your own code to create solutions to problems that weren’t even thought of when the original software was written. There’s clearly power in being able to integrate easily.
In 1982, Perlis also wrote: “every program is part of some other program, and rarely fits”. It’s remarkable how accurate that statement still is today, even though there have been so many different attempts to solve the reuse problem. Service orientation is another iteration in the grand quest to make all the pieces fit together a little bit better.
SO != Indigo. SO != WS-*. SO != <insert name of technology X>. SO is a perspective on application design. It’s about drawing explicit boundaries around autonomous functionality. It’s about specifying your data in a way that’s not tied to the bitwise interior of any one runtime. It’s about the implicit acknowledgement that every non-trivial application is a development platform.
If you’re going to take the time to implement an application, chances are good that sometime in the future someone else will appreciate not having to take the time to reinvent something you’ve already written. If your app does anything interesting at all, there will come a day when someone else wants to integrate with it. The decisions you make now can make this person’s life very easy or very hard – it’s your choice. By thinking about service orientation now, you provide the future opportunity for someone else to use your bits in new and powerful ways.
To quote Doug and Don, there is only one application and it’s still being written. On a long enough timeline, every application will interact (directly or indirectly) with every other program. It’s Six Degrees of Kevin Bacon, but for software. Integration is inevitable – are you going to make that easy or hard?
 Saturday, July 31, 2004
COmega (or Cw, as most people seem to be referring to it) is proving to be the source of all sorts of interesting things. If you’re interested in compilers or programming languages, Cw has layers upon layers of new and nifty stuff.
Frankly, I’m not sure what’s more interesting to me: the code that Cw lets you write, or the code that Cw writes for you. I started looking at Cw output in an effort to figure out what was causing the bug that David Findley noticed. The bug David found dealt with lifted member access over a stream of Int32’s. I’ve slightly modified his example to work over a stream of strings, for reasons that will become clear later. However, before we get to that, there are some more fundamental ideas that are worth exploring.
Basic Streams in Cw
Let’s say that you have a function that returns a string* (a stream of zero or more strings), like so:
public string* FromTo( char a, char b ) {
    for( int i=(int)a; i <= (int)b; i++ )
        yield return ((char)i).ToString();
}
Using this function, I can say something like string* letters = FromTo( 'A', 'E' ); and end up with a variable whose logical contents are {“A”, “B”, “C”, “D”, “E”}. I say “logical contents” because the contents of the stream are produced at consumption time, not creation time. This is sort of counterintuitive at first, as the body of the FromTo() method doesn’t execute when you call it. In order to get the yield return statement to run, I need to iterate over the stream with some sort of foreach statement. For example, I could say:
foreach( string s in letters ) Console.Out.Write( s );
and get ABCDE printed out on the console window.
If all of this seems sort of familiar, it’s because iterators in C# 2.0 also use the yield return contextual keyword to do the same sort of thing. I believe they use the same general implementation, so most of what I’m about to say should hold true for C# 2.0 iterators as well.
What’s really happening inside of FromTo()?
Running the compiled output of the FromTo() function through Reflector reveals the following code (I’ve cleaned up the variable names a little bit and removed some decompiler spam for clarity):
public IEnumerable FromTo(char a, char b) {
    fromToClosure closure = new fromToClosure();
    closure.this value = this;
    closure.a = a;
    closure.b = b;
    return closure;
}
What we have here is the creation of a “lexical closure” – an object that effectively takes a snapshot of some stuff that currently lives on the stack and sticks that state on the heap so it can be referred to later. In this specific case, the code is just capturing the current values of the two method parameters and storing them in fields of the same name on the closure object. Once this is done, the function returns the closure back to its caller. Thus, when you call FromTo( 'A', 'E' ), the net result is a concrete implementation of IEnumerable that has two character fields – one whose value is ‘A’ and one whose value is ‘E’.
Implementing streams as enumerable closures
The fromToClosure type provides implementations of both IEnumerable and IEnumerator, allowing the contents of the closure to be traversed using foreach. The IEnumerable.GetEnumerator implementation is simple:
IEnumerator IEnumerable.GetEnumerator() {
    return ((fromToClosure) base.MemberwiseClone());
}
That is, it just returns a clone of the “world as it was” at the time the closure was created.
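Why return a clone instead of the closure itself? The following Java sketch compares two iterables. One hands every caller the same cursor; the other hands out a fresh cursor each time. All names are hypothetical. Java's `Iterable`/`Iterator` stand in for `IEnumerable`/`IEnumerator`, and constructing a fresh cursor stands in for `MemberwiseClone()`. A nested loop over the same stream shows the difference: with a shared cursor, the inner loop consumes the outer loop's elements.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CursorSharing {
    // A cursor over the closed range [lo, hi].
    static class Cursor implements Iterator<Integer> {
        private int next;
        private final int hi;

        Cursor(int lo, int hi) { this.next = lo; this.hi = hi; }

        public boolean hasNext() { return next <= hi; }

        public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            return next++;
        }
    }

    // Hands every caller the same cursor object (the behavior that cloning avoids).
    static Iterable<Integer> shared(int lo, int hi) {
        Cursor only = new Cursor(lo, hi);
        return () -> only;
    }

    // Hands every caller its own cursor, the effect of returning a clone.
    static Iterable<Integer> fresh(int lo, int hi) {
        return () -> new Cursor(lo, hi);
    }

    // Counts the pairs produced by a nested traversal of the same stream.
    static int countPairs(Iterable<Integer> xs) {
        int pairs = 0;
        for (int x : xs)
            for (int y : xs)
                pairs++;
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(countPairs(fresh(1, 3)));  // 9
        System.out.println(countPairs(shared(1, 3))); // 2
    }
}
```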
According to the IEnumerator pattern, the job of IEnumerator.MoveNext() is to advance to the “next value” in the stream (which the caller then reads through Current) and to return false once there are no more values. Thus, we’d expect the guts of the original for loop and its yield return statement to end up in fromToClosure.MoveNext(). Looking at the decompiled output of this method reveals that this is exactly where that code went (again, I cleaned up the decompiler output and numbered the lines for clarity).
public override bool MoveNext()
{
    char currentCharacter;
    switch (this.currentEntryPoint)
    {
        case 0:
            /* (1) */ this.i = ((int) this.a);
            break;
        case 1:
            /* (2) */ this.i++;
            break;
    }
    /* (3) */ if (this.i <= ((int) this.b))
    {
        /* (4) */ currentCharacter = ((char) this.i);
        /* (5) */ this.currentValue = currentCharacter.ToString();
        /* (6) */ this.currentEntryPoint = 1;
        /* (7) */ return true;
    }
    /* (8) */ return false;
}
If you stand back and squint a bit, it’s easy to see the original for loop in there. The code that started out as
for( int i=(int)a; i <= (int)b; i++ ) yield return ((char)i).ToString();
has been cracked open by the Cw compiler and turned into this implementation of MoveNext(). The initializer statement can be found at (1), although it’s now initializing a private field on the closure object instead of a local variable. The original conditional statement is replicated almost verbatim in (3), and the increment statement now lives at (2). The body of the loop lives at (4) through (6), and the return statements at (7) and (8) ensure that MoveNext() properly returns false when the enumerator has reached the end of the stream.
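To make the mapping concrete, here’s a hand-written C# class that follows the shape of the cleaned-up decompiler output: arguments captured into fields by a factory, GetEnumerator() returning a memberwise clone, and MoveNext() as a two-state machine. The names (FromToClosure, entryPoint, currentValue) are my own approximations rather than what the Cw compiler actually emits, and I’ve implemented the generic interfaces so the sketch compiles on its own.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written analogue of the compiler-generated fromToClosure.
public sealed class FromToClosure : IEnumerable<string>, IEnumerator<string>
{
    private readonly char a, b;        // captured method parameters
    private int i;                     // the loop variable, lifted into a field
    private int entryPoint;            // 0 = not started, 1 = resume after a yield
    private string currentValue = "";

    private FromToClosure(char a, char b) { this.a = a; this.b = b; }

    // Plays the role of the rewritten FromTo(): capture the arguments, do no work.
    public static IEnumerable<string> FromTo(char a, char b)
    {
        return new FromToClosure(a, b);
    }

    // Each enumeration gets a copy of the state as it was when the closure was created.
    public IEnumerator<string> GetEnumerator()
    {
        return (FromToClosure)MemberwiseClone();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public bool MoveNext()
    {
        switch (entryPoint)
        {
            case 0:
                i = (int)a;                          // loop initializer
                break;
            case 1:
                i++;                                 // loop increment
                break;
        }
        if (i <= (int)b)                             // loop condition
        {
            currentValue = ((char)i).ToString();     // loop body (the yield return)
            entryPoint = 1;                          // resume at the increment next time
            return true;
        }
        return false;                                // end of the stream
    }

    public string Current { get { return currentValue; } }
    object IEnumerator.Current { get { return currentValue; } }
    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Because nothing ever advances the original object, every foreach starts from a pristine clone, which is why the same stream can be enumerated more than once.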
Stepping through this code a couple of times with a mental debugger should illustrate a significant behavioral difference between streams and a standard foreach over an ordinary collection: iterators have “lazy list” semantics, because the contents of the iteration are not produced until MoveNext() gets called, which might be long after the original call that created the iterator.
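One way to watch the lazy semantics without a debugger is to give the iterator body a side effect. In this C# sketch a hypothetical Log list records each time the loop body runs: creating the stream, and even calling GetEnumerator(), records nothing; the first element is only produced by the first MoveNext().

```csharp
using System.Collections.Generic;

public static class LazyDemo
{
    // Records what the iterator body does, so we can see when it actually runs.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<string> FromTo(char a, char b)
    {
        for (int i = (int)a; i <= (int)b; i++)
        {
            Log.Add("producing " + (char)i);
            yield return ((char)i).ToString();
        }
    }
}
```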
Member lifting and implicit iteration
Cw has a really interesting language feature called “member lifting”, which allows for an implicit iteration over each element in a stream. Remember that the outcome of FromTo( 'A', 'E' ) is the stream {“A”, “B”, “C”, “D”, “E”}. Using member lifting, we can transform that stream into the modified stream {“a”, “b”, “c”, “d”, “e”} by saying FromTo( 'A', 'E' ).ToLower(). Given this statement, the Cw compiler will generate code to lift the ToLower() operation out and apply it individually to all elements in the stream returned by the call to FromTo().
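C# has no member lifting, so the nearest equivalent is to write the implicit iteration out by hand. This sketch uses a hypothetical Lift helper, itself an iterator so it stays lazy, to apply ToLower() to each element; in later versions of C#, LINQ’s Select does the same job.

```csharp
using System;
using System.Collections.Generic;

public static class LiftDemo
{
    public static IEnumerable<string> FromTo(char a, char b)
    {
        for (int i = (int)a; i <= (int)b; i++)
            yield return ((char)i).ToString();
    }

    // Hypothetical helper: applies a member call to every element, lazily,
    // roughly what Cw's member lifting does implicitly.
    public static IEnumerable<TResult> Lift<T, TResult>(IEnumerable<T> source, Func<T, TResult> member)
    {
        foreach (T item in source)
            yield return member(item);
    }

    // Explicit C# spelling of the Cw expression FromTo( 'A', 'E' ).ToLower()
    public static IEnumerable<string> SimpleLift()
    {
        return Lift(FromTo('A', 'E'), s => s.ToLower());
    }
}
```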
So, how does this work? We can get some clues into the implementation from the type system. The static type of FromTo( 'A', 'E' ).ToLower() is string*. This makes sense because an individual call to ToLower() returns a string and we’re aggregating a whole bunch of those calls. Since we’ve already seen how streams are implemented as closure types, it’s expected that this statement would compile out to an instantiation of a closure type. And since there are two method calls in the original statement, we would expect that there would be two calls to methods implemented on this closure type. Looking at the decompiled output, we see that this is indeed the case.
First, here’s the input function as written in Cw:
public string* SimpleLift()
{
    // The ToLower() call lifts correctly
    return FromTo( 'A', 'E' ).ToLower();
}
And here’s what the compiled version of this function looks like in Reflector:
public IEnumerable SimpleLift()
{
    toLowerClosure closure = new toLowerClosure();
    closure.thisValue = this;
    return closure.streamToLower(closure.thisValue.FromTo('A', 'E'));
}
The invocation of FromTo() is fairly obvious – it’s just capturing the current value of the this pointer in the closure object, and then using that to invoke FromTo(). What’s more surprising is the implementation of the streamToLower() function on the generated closure class. That actually looks like this:
IEnumerable streamToLower(IEnumerable Collection)
{
    toLowerClosure.foreachClosure closure = new toLowerClosure.foreachClosure();
    closure.Collection = Collection;
    return closure;
}
Hey, wait a minute! Instead of doing any work, the dang thing just returned another closure! When does work actually get done?? This behavior makes sense – member lifting is accomplished lazily using iterators, and the common implementation of iterators and streams has already been explored. Calling SimpleLift() does nothing except instantiate a bunch of closures and return an IEnumerable implementation. As with all streams, the work won’t really get done until someone iterates over the stream and causes MoveNext() to get called.
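The laziness carries through a chain of closures. In this C# sketch (all names hypothetical; StreamToLower only loosely mirrors the generated streamToLower), both stages append to a log. Building the pipeline logs nothing, and iterating it interleaves the two stages one element at a time instead of running FromTo() to completion first.

```csharp
using System.Collections.Generic;

public static class PipelineDemo
{
    public static readonly List<string> Log = new List<string>();

    static IEnumerable<string> FromTo(char a, char b)
    {
        for (int i = (int)a; i <= (int)b; i++)
        {
            Log.Add("yield " + (char)i);
            yield return ((char)i).ToString();
        }
    }

    // Hand-lifted ToLower(): pulls exactly one element from the inner stream per step.
    static IEnumerable<string> StreamToLower(IEnumerable<string> collection)
    {
        foreach (string s in collection)
        {
            string lowered = s.ToLower();
            Log.Add("lower " + lowered);
            yield return lowered;
        }
    }

    // Like SimpleLift(): only wires the closures together, does no work yet.
    public static IEnumerable<string> SimpleLift()
    {
        return StreamToLower(FromTo('A', 'C'));
    }
}
```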
The implementation of MoveNext() that we're interested in lives on the generated toLowerClosure.foreachClosure type. In Reflector, that implementation looks like this:
public override bool MoveNext()
{
    IEnumerable fromToResults;
    switch (this.currentEntryPoint)
    {
        case 0:
        {
            /* (1) */ fromToResults = this.Collection;
            if (fromToResults == null) { return false; }
            /* (2) */ this.foreachEnumerator = fromToResults.GetEnumerator();
            if (this.foreachEnumerator == null) { return false; }
            goto case 1;
        }
        case 1:
        {
            /* (3) */ if (!this.foreachEnumerator.MoveNext()) { return false; }
            /* (4) */ string text2 = (string) this.foreachEnumerator.Current;
            /* (5) */ this.currentValue = text2.ToLower();
            this.currentEntryPoint = 1;
            return true;
        }
    }
    return false;
}
This MoveNext() is a little more complex because it does two things: due to Cw’s lazy evaluation rules, both the FromTo and ToLower productions need to happen on every iteration of the loop.
Line (1) obtains the results of FromTo() that were captured when the stream was created. Although this variable is typed as IEnumerable, its concrete type is fromToClosure (which is totally reasonable, since that’s what FromTo() actually returns). Line (2) obtains an enumerator, causing the closure to return a clone of itself and its captured state.
Line (3) triggers the production of an element of the FromTo stream. The details of this operation have already been explored in detail, so there’s no need to rehash them here. The interesting thing to note is that we’re only now getting around to executing the yield statement inside of FromTo(), even though it seems like we called that function an eternity ago.
Assuming that the FromTo stream wasn’t at the end and actually yielded an element, line (4) retrieves the produced value and line (5) finally gets around to calling ToLower() on it. This value gets returned all the way back to whoever is iterating with foreach over the stream returned from SimpleLift(), and the whole process repeats until there are no more elements left in the stream to process.
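Here’s a hand-written approximation of toLowerClosure.foreachClosure that follows the numbered steps in the decompiled MoveNext(). The class and field names are my guesses at the shape of the generated code, not its actual definition, and the captured stream can be any IEnumerable<string>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical analogue of the generated toLowerClosure.foreachClosure type.
public sealed class ForeachClosure : IEnumerable<string>, IEnumerator<string>
{
    private readonly IEnumerable<string> collection;  // captured FromTo() results
    private IEnumerator<string> foreachEnumerator;     // created on the first MoveNext()
    private int entryPoint;                            // 0 = not started, 1 = running
    private string currentValue = "";

    public ForeachClosure(IEnumerable<string> collection)
    {
        this.collection = collection;
    }

    public IEnumerator<string> GetEnumerator()
    {
        return (ForeachClosure)MemberwiseClone();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public bool MoveNext()
    {
        if (entryPoint == 0)
        {
            // (1)-(2): fetch the captured stream and ask it for an enumerator
            if (collection == null) return false;
            foreachEnumerator = collection.GetEnumerator();
            entryPoint = 1;
        }
        // (3): produce one element of the inner stream, on demand
        if (!foreachEnumerator.MoveNext()) return false;
        // (4)-(5): take that element and apply the lifted member call
        currentValue = foreachEnumerator.Current.ToLower();
        return true;
    }

    public string Current { get { return currentValue; } }
    object IEnumerator.Current { get { return currentValue; } }
    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { if (foreachEnumerator != null) foreachEnumerator.Dispose(); }
}
```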
Why bother with lazy lists?
I’m sure there are a number of Haskell programmers out there who could elaborate on the virtues of lazy lists far better than I. However, one benefit of the lazy list pattern that particularly strikes me is its memory cost compared to traditional approaches. Non-lazy (motivated? Protestant?) lists have an O(n) memory cost associated with processing them, because all nodes in the list are memory resident. With the lazy list pattern, items are produced on demand, so there’s only one element in memory at a given time. This O(1) characteristic can be very useful when dealing with large numbers of large things.
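A stream that could never be materialized at all makes the memory argument concrete. This C# sketch squares the natural numbers and keeps the first five, using LINQ’s Select and Take (both lazy). Naturals() is a hypothetical helper; because elements are pulled through the pipeline one at a time, it only produces as many values as Take() pulls.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class InfiniteDemo
{
    // An endless stream: materializing it as a list would never finish.
    public static IEnumerable<long> Naturals()
    {
        for (long n = 0; ; n++)
            yield return n;
    }

    // Squares lazily and stops after five elements.
    public static List<long> FirstFiveSquares()
    {
        return Naturals().Select(n => n * n).Take(5).ToList();
    }
}
```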
Ok, so what’s causing David’s bug?
Popping several stack frames back to the issue that originally triggered this post: if member lifting seems to work in the general case, what’s causing the behavior that David noticed? If I can lift ToLower() over a stream, why can’t I lift ToString()? Sadly, after all of this, I don’t have a root cause analysis. But I do have an observation:
public void BrokenLift()
{
    // The ToString() call lifts incorrectly
    FromTo( 'A', 'E' ).ToString();
}
This does not compile into the closure pattern that the ToLower() version does. Instead, we get this:
public void BrokenLift() { Unboxer.ToObject(this.FromTo(65, 69)).ToString(); }
Rather than generating a closure to wrap the implicit iteration, the Cw compiler is doing something decidedly different. Looking at this output, I’m pretty sure the incorrect behavior is caused by a bug in the Cw compiler rather than a confusion as to the expected results. If the Cw developers were for some reason interested in prohibiting member lifting for methods inherited from Object, I’m guessing they would have simply emitted “this.FromTo(65, 69).ToString()” and skipped the redundant call to the unboxer.
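For contrast, in plain C# the same expression simply binds to the ToString() that the stream object inherits from Object. This sketch (ToStringDemo is a hypothetical name) shows that unlifted reading next to an explicitly lifted one. It illustrates the two possible meanings of the call, not what the Cw compiler does internally.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToStringDemo
{
    public static IEnumerable<string> FromTo(char a, char b)
    {
        for (int i = (int)a; i <= (int)b; i++)
            yield return ((char)i).ToString();
    }

    // Binds to Object.ToString() on the stream object itself: the result names
    // the (compiler-generated) enumerable type, not its elements.
    public static string UnliftedToString()
    {
        return FromTo('A', 'E').ToString();
    }

    // Explicitly lifted: ToString() is applied to each element in turn.
    public static IEnumerable<string> LiftedToString()
    {
        return FromTo('A', 'E').Select(s => s.ToString());
    }
}
```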
After blog mint: Check out this interesting presentation on the Common Compiler Infrastructure that Cw makes use of: http://research.microsoft.com/Collaboration/University/Europe/Events/dotnetcc/Version2/Crash%20Course.ppt
 Wednesday, July 28, 2004
For those of you who use the Application Blocks, the MS Patterns and Practices team announced something today that should have you salivating: Enterprise Library will be coming in early 2005. From the announcement on the PAG site:
Enterprise Library 1.0 will bring together new releases of the most widely reusable blocks into a single integrated download.
The major themes of Enterprise Library are:
• Consistency – all Enterprise Library Application Blocks will feature consistent design patterns and implementation approaches, configuration mechanisms and tools, documentation, samples, deployment and operational processes.
• Extensibility – all blocks include defined extensibility points which allow developers to customize the behavior of the blocks by ‘plugging in’ their own code. Enterprise Library will also ship with guidance to assist developers with building their own blocks that integrate with the Library.
• Ease of Use – Enterprise Library will offer numerous usability improvements, including a graphical configuration tool, simpler installation, and clearer and more complete documentation and samples.
• Integration – the Enterprise Library Application Blocks are designed and tested to work well together. It will also be possible to use the blocks individually, thus catering for a range of different usage scenarios.
The initial release of Enterprise Library will include Application Blocks that support the following scenarios: data access, exception handling, caching, configuration, logging, security and cryptography. In the future the library will be expanded to include additional Application Blocks that support a wider range of scenarios.
I’m pretty excited about this. I’ve been tangentially involved in this project for a while now, and I think the final product will really drive a lot of value to enterprise customers on the .NET platform. Plus, it’s always nice to see my company get some props:
The first version of Enterprise Library is being developed by Microsoft in partnership with Avanade. In recognition of Avanade’s role in helping build the foundation of this deliverable, this version will be available to Avanade’s enterprise customers in late 2004 prior to the general release. It will then be released to the general public in early 2005.
There’s not much else that can be said in this forum at this time. However, you can get a sneak peek over on the GotDotNet workspace right now.
© Copyright 2008 Steve Maine