COmega (or Cw, as most people seem to be referring to it) is proving to be the source of all sorts of interesting things. If you’re interested in compilers or programming languages, Cw has layers upon layers of new and nifty stuff.
Frankly, I’m not sure what’s more interesting to me: the code that Cw lets you write, or the code that Cw writes for you. I started looking at Cw output in an effort to figure out what was causing the bug that David Findley noticed. The bug David found dealt with lifted member access over a stream of Int32’s. I’ve slightly modified his example to work over a stream of strings, for reasons that will become clear later. However, before we get to that, there are some more fundamental ideas that are worth exploring.
Basic Streams in Cw
Let’s say that you have a function that returns a string* (a stream of zero or many strings), like so:
public string* FromTo( char a, char b )
{
for( int i=(int)a; i <= (int)b; i++ )
yield return ((char)i).ToString();
}
Using this function, I can say something like string* letters = FromTo( 'A', 'E' ); and end up with a variable whose logical contents are {“A”, “B”, “C”, “D”, “E” }. I say “logical contents” because the contents of the stream are produced at consumption time, not creation time. This is sort of counterintuitive at first, as the body of the FromTo() method generally doesn’t execute when you call it . In order to get the yield return statement to run, I need to iterate over the stream with some sort of foreach statement. For example, I could say:
foreach( string s in letters )
Console.Out.Write( s );
and get
ABCDE
printed out on the console window.
If all of this seems sort of familiar, it’s because iterators in C# 2.0 also use the yield return contextual keyword to do the same sort of thing. I think they also use the same general implementation so I believe that most of what I’m about to say will also hold generally true for C# 2.0 iterators as well.
What’s really happening inside of FromTo()?
Running the compiled output of the FromTo() function through Reflector reveals the following code (I’ve cleaned up the variable names a little bit and removed some decompiler spam for clarity ):
public IEnumerable
{
closure.this value = this;
closure.a = a;
closure.b = b;
return closure;
}
What we have here is the creation of a “lexical closure” – an object that effectively takes a snapshot of some stuff that currently lives on the stack and sticks that state on the heap so it can be referred to later. In this specific case, the code is just capturing the current values of the two method parameters and storing them in fields of the same name on the closure object. Once this is done, the function returns the closure back to its caller. Thus, when you call FromTo( 'A', 'E' ), the net result is an concrete implementation of IEnumerable that has two character fields – one whose value is ‘A’ and one whose value is ‘E’.
Implementing streams as enumerable closures
IEnumerator{
return ((fromToClosure) base.MemberwiseClone());
}
That is, it just returns a clone of the “world as it was” at the time the closure was created.
public override bool MoveNext()
{
char currentCharacter;
switch (this.current Entry Point)
{
case 0:
{
(1) this.i = ((int) this.a);
break;
}
case 1:
{
(2) this.i++
(3) if ( this.i <= ((int) this.b))
{
(4) currentCharacter = ((char) this.i);
(5) this.current Value = currentCharacter.ToString();
(6) this.current Entry Point: = 1;
(7) return true;
}
}
}
(8) return false;
}
for( int i=(int)a; i <= (int)b; i++ )
yield return ((char)i).ToString();
Member lifting and implicit iteration
First, here’s the input function as written in Cw:
public string* SimpleLift()
{
//The ToLower() call lifts correctly
return FromTo( 'A', 'E' ).ToLower();
}
And here’s what the compiled version of this function looks like in Reflector:
public IEnumerable
{
toLowerClosure closure = new toLowerClosure();
closure.thisValue = this;
closure.streamToLower( closure.thisValue.FromTo('A', 'E') );
return closure;
}
IEnumerable
toLowerClosure.foreachClosure closure
= new toLowerClosure.foreachClosure();
closure.Collection = Collection;
return closure;
}
The implementation of MoveNext() that we're interested in lives on the generated toLowerClosure.foreachClosure type. In Reflector, that implementation looks like this:
public override bool MoveNext()
{
IEnumerable
{
case 0:
{
(1) fromToResults = this.Collection;
if ( fromToResults == null)
{
return false;
}
(2) this.foreachEnumerator = fromToResults.GetEnumerator();
if ( this.foreachEnumerator == null)
{
return false;
}
}
case 1:
{
(3) if (!this.fromToResults.MoveNext())
{
return false;
}
(4) string text2 = this.foreachEnumerator.Current;
(5) this.currentValue = text2.ToLower();
this.current Entry Point = 1;
return true;
}
}
}
This MoveNext() is a little more complex because it does two things – due Cw’s lazy evaluation rules, both the FromNext and ToLower productions need to happen on every iteration of the loop.
Line (1) obtains the results of FromTo() that were captured when the stream was created. Although this variable is typed as IEnumerable
Line (3) triggers the production of an element of the FromTo stream. The details of this operation have already been explored in detail, so there’s no need to rehash them hear. The interesting thing to note is that we’re just now getting around to executing yield statement inside of FromTo(), even though it seems like we called that function an eternity ago.
Assuming that the FromTo stream wasn’t at the end and actually yielded an element, line (4) retrieves the produced value and line (5) finally gets around to calling ToLower() on it. This value gets returned all the way back to whomever is foreaching across the stream returned from SimpleLift(), and the whole process repeats until there are no more elements left in the stream to process.
Why bother with lazy lists?
I’m sure there are a number of Haskell programmers out there who could elaborate on the virtues of lazy lists far better than I. However, one benefit of the lazy list pattern that particularly strikes me is the memory cost of this method compared to traditional approaches. Non-lazy (motivated? Protestant?) lists have an O(n) memory cost associated with processing them, because all nodes in the list are memory resident. With the lazy list pattern, because items are produced on demand there’s only one element in memory at a given time. This O(c) characteristic can be very useful when dealing with large numbers of large things.
Ok, so what’s causing David’s bug?
Popping several stack frames back to the issue that originally triggered this post: if member lifting seems to work in the general case, what’s causing the behavior that David noticed? If I can lift ToLower() over a stream, why can’t I lift ToString()? Sadly, after all of this, I don’t have a root cause analysis. But I do have an observation:
public void BrokenLift()
{
//The ToString() call lifts incorrectly
FromTo( 'A', 'E' ).ToString();
}
Does not compile into the standard closure pattern that ToLower() does. Instead, we get this:
public void BrokenLift()
{
Unboxer
}
Rather than generating a closure to wrap the implicit iteration, the Cw compiler is doing something decidedly different. Looking at this output, I’m pretty sure the incorrect behavior is caused by a bug in the Cw compiler rather than a confusion as to the expected results. If the Cw developers were for some reason interested in prohibiting member lifting for methods inherited from Object, I’m guessing they would have simply emitted “this.FromTo(65, 69)).ToString()” and skipped the redundant call to the unboxer.
After blog mint:
Check out this interesting presentation on the Common Compiler Infrastructure that Cw makes use of: http://research.microsoft.com/Collaboration/University/Europe/Events/dotnetcc/Version2/Crash%20Course.ppt
