• Feeds

    Subscribe in a reader

  • Ads

Inside streams and member lifting in Cw

COmega (or Cw, as most people seem to be referring to it) is proving to be the source of all sorts of interesting things. If you’re interested in compilers or programming languages, Cw has layers upon layers of new and nifty stuff.

Frankly, I’m not sure what’s more interesting to me: the code that Cw lets you write, or the code that Cw writes for you. I started looking at Cw output in an effort to figure out what was causing the bug that David Findley noticed. The bug David found dealt with lifted member access over a stream of Int32’s. I’ve slightly modified his example to work over a stream of strings, for reasons that will become clear later. However, before we get to that, there are some more fundamental ideas that are worth exploring.

Basic Streams in Cw

Let’s say that you have a function that returns a string* (a stream of zero or many strings), like so:

public string* FromTo( char a, char b )
{

   for( int i=(int)a; i <= (int)b; i++ )
       yield return ((char)i).ToString();
}

Using this function, I can say something like string* letters = FromTo( 'A', 'E' ); and end up with a variable whose logical contents are {“A”, “B”, “C”, “D”, “E” }. I say “logical contents” because the contents of the stream are produced at consumption time, not creation time. This is sort of counterintuitive at first, as the body of the FromTo() method generally doesn’t execute when you call it . In order to get the yield return statement to run, I need to iterate over the stream with some sort of foreach statement. For example, I could say:

foreach( string s in letters )
  Console.Out.Write( s );

and get 
    ABCDE
printed out on the console window.

If all of this seems sort of familiar, it’s because iterators in C# 2.0 also use the yield return contextual keyword to do the same sort of thing. I think they also use the same general implementation so I believe that most of what I’m about to say will also hold generally true for C# 2.0 iterators as well.

What’s really happening inside of FromTo()?

Running the compiled output of the FromTo() function through Reflector reveals the following code (I’ve cleaned up the variable names a little bit and removed some decompiler spam for clarity ):

public IEnumerable FromTo(char a, char b)
{
  fromToClosure closure = new fromToClosure();
  closure.this value = this; 
  closure.a = a;
  closure.b = b; 
 
 
return closure;
}

What we have here is the creation of a “lexical closure” – an object that effectively takes a snapshot of some stuff that currently lives on the stack and sticks that state on the heap so it can be referred to later. In this specific case, the code is just capturing the current values of the two method parameters and storing them in fields of the same name on the closure object. Once this is done, the function returns the closure back to its caller. Thus, when you call FromTo( 'A', 'E' ), the net result is an concrete implementation of IEnumerable that has two character fields – one whose value is ‘A’ and one whose value is ‘E’.

Implementing streams as enumerable closures

The fromToClosure type provides implementations of both IEnumerable and IEnumerator, allowing the contents of the closure to be traversed using foreach. The IEnumerable.GetEnumerator implementation is simple:

IEnumerator IEnumerable.GetEnumerator()
{
  return ((fromToClosure) base.MemberwiseClone());
}


That is, it just returns a clone of the “world as it was” at the time the closure was created.

According to the IEnumerator pattern, the function of IEnumerator.MoveNext() is to return the “next value” in the stream. Thus, we’d expect the guts of the yield return statement from the original FromTo() source to end up in fromToClosure.MoveNext(). Looking at the decompiled output of this method reveals that this is exactly where that code went (again, I cleaned up the decompiler output and numbered the lines a bit for clarity).

public override bool MoveNext()
{
   char currentCharacter;
   switch (this.current Entry Point)
   {
    case 0: 
    {
(1)   this.i = ((int) this.a);
      break;
    }
    case 1:
    {
(2)   this.i++
(3)   if ( this.i <= ((int) this.b))
      { 
(4)      currentCharacter = ((char) this.i);
(5)      this.current Value = currentCharacter.ToString();
(6)      this.current Entry Point: = 1;
(7)      return true; 
      }
     }
    }
(8) return false;
}

If you stand back and squint a bit, it’s easy to see the original for loop in there. The code that started out as

       for( int i=(int)a; i <= (int)b; i++ )
              yield return ((char)i).ToString();

has been cracked open by the Cw compiler and turned into this  implementation of MoveNext(). The initializer statement can be found at (1) – although it’s now initializing a private field on the closure object instead of a local variable. The original conditional statement is replicated almost verbatim in (3), and the increment statement now lives at (2). The body of the loop live at (4) – (6), and the return statements at (7) and (8) ensure that MoveNext() properly returns false when the enumerator has reached the end of the stream.

Stepping through this code a couple of times with a mental debugger should illustrate a significant behavioral difference between streams and a standard foreach: iterators have “lazy list” semantics because the contents of the iteration are not produced until MoveNext() gets called – which might be long after the original call that created the iterator.

Member lifting and implicit iteration

Cw has a really interesting language feature called “member lifting”, which allows for an implicit iteration over each element in a stream. Remember that the outcome of FromTo( 'A', 'E' ) is the stream {“A”, “B”, “C”, “D”, “E”}. Using member lifting, we can transform that stream into the modified stream {“a”, “b”, “c”, “d”, “e”} by saying FromTo( 'A', 'E' ).ToLower(). Given this statement, the Cw compiler will generate code to lift the ToLower() operation out and apply it individually to all elements in the stream returned by the call to FromTo().

So, how does this work? We can get some clues into the implementation from the type system. The static type of FromTo( 'A', 'E' ).ToLower() is string*. This makes sense because an individual call to ToLower() returns a string and we’re aggregating a whole bunch of those calls. Since we’ve already seen how streams are implemented as closure types, it’s expected that this statement would compile out to an instantiation of a closure type. And since there are two method calls in the original statement, we would expect that there would be two calls two methods implemented on this closure type. Looking at the decompiled output, we see that this is indeed the case.

First, here’s the input function as written in Cw:

public string* SimpleLift()
{   

   //The ToLower() call lifts correctly

   return FromTo( 'A', 'E' ).ToLower();

}

And here’s what the compiled version of this function looks like in Reflector:

public IEnumerable SimpleLift()
{
  toLowerClosure closure = new toLowerClosure();
  closure.thisValue = this;
  closure.streamToLower( closure.thisValue.FromTo('A', 'E') );
  return closure;
}

The invocation of FromTo() is fairly obvious – it’s just capturing the current value of the this pointer in the closure object, and then using that to invoke FromTo(). What’s more surprising is the implementation of the streamToLower() function on the generated closure class. That actually looks like this:

IEnumerable streamToLower(IEnumerable Collection)
{
   toLowerClosure.foreachClosure closure
           = new toLowerClosure.foreachClosure();
   closure.Collection = Collection;
   return closure;
}

Hey, wait a minute! Instead of doing any work, the dang thing just returned another closure! When does work actually get done?? This behavior makes sense – member lifting is accomplished lazily using iterators, and the common implementation of iterators and streams has already been explored. Calling SimpleLift() does nothing except instantiate a bunch of closures and return an IEnumerable implementation. As with all streams, the work won’t really get done until someone iterates over the stream and causes MoveNext() to get called.

The implementation of MoveNext() that we're interested in lives on the generated toLowerClosure.foreachClosure type. In Reflector, that implementation looks like this:

public override bool MoveNext()
{
   IEnumerable fromToResults;
  
switch (this.current Entry Point)
   {
    
case 0:
    
{
(1)     fromToResults = this.Collection;
        if ( fromToResults == null)
       
{
         return false;
        }
(2)     this.foreachEnumerator = fromToResults.GetEnumerator();
        if ( this.foreachEnumerator == null)
        {
          return false;
        }
      }
      case 1:
     {
(3)  if (!this.fromToResults.MoveNext())
     {
       return false;
     }
(4) string text2 = this.foreachEnumerator.Current;
(5) this.currentValue = text2.ToLower();
    this.current Entry Point = 1;
    return true;
    }
   }
}

This MoveNext() is a little more complex because it does two things – due Cw’s lazy evaluation rules, both the FromNext and ToLower productions need to happen on every iteration of the loop.

Line (1) obtains the results of FromTo() that were captured when the stream was created. Although this variable is typed as IEnumerable, it has a concrete type identity of fromToClosure(which is totally reasonable, since that’s what FromTo() actually returns). Line (2) obtains an enumerator, causing the closure to return a clone of itself and its captured state.

Line (3) triggers the production of an element of the FromTo stream. The details of this operation have already been explored in detail, so there’s no need to rehash them hear. The interesting thing to note is that we’re just now getting around to executing yield statement inside of FromTo(), even though it seems like we called that function an eternity ago.

Assuming that the FromTo stream wasn’t at the end and actually yielded an element, line (4) retrieves the produced value and line (5) finally gets around to calling ToLower() on it. This value gets returned all the way back to whomever is foreaching across the stream returned from SimpleLift(), and the whole process repeats until there are no more elements left in the stream to process.

Why bother with lazy lists?

I’m sure there are a number of Haskell programmers out there who could elaborate on the virtues of lazy lists far better than I. However, one benefit of the lazy list pattern that particularly strikes me is the memory cost of this method compared to traditional approaches. Non-lazy (motivated? Protestant?) lists have an O(n) memory cost associated with processing them, because all nodes in the list are memory resident. With the lazy list pattern, because items are produced on demand there’s only one element in memory at a given time. This O(c) characteristic can be very useful when dealing with large numbers of large things.

Ok, so what’s causing David’s bug?

Popping several stack frames back to the issue that originally triggered this post: if member lifting seems to work in the general case, what’s causing the behavior that David noticed? If I can lift ToLower() over a stream, why can’t I lift ToString()? Sadly, after all of this, I don’t have a root cause analysis. But I do have an observation:

public void BrokenLift()
{  
  
//The ToString() call lifts incorrectly

   FromTo( 'A', 'E' ).ToString();

}

Does not compile into the standard closure pattern that ToLower() does. Instead, we get this:

public void BrokenLift()
{
  Unboxer.ToObject(this.FromTo(65, 69)).ToString();
}

Rather than generating a closure to wrap the implicit iteration, the Cw compiler is doing something decidedly different. Looking at this output, I’m pretty sure the incorrect behavior is caused by a bug in the Cw compiler rather than a confusion as to the expected results. If the Cw developers were for some reason interested in prohibiting member lifting for methods inherited from Object, I’m guessing they would have simply emitted “this.FromTo(65, 69)).ToString()” and skipped the redundant call to the unboxer.

After blog mint:
Check out this interesting presentation on the Common Compiler Infrastructure that Cw makes use of:
http://research.microsoft.com/Collaboration/University/Europe/Events/dotnetcc/Version2/Crash%20Course.ppt