Wow. Last week went really fast. The last several days have been a blur of coding and Star Wars DVDs. I’ve been meaning to write this post for a while, but apparently a whole week has gone by. Whoops.
Anyway, a couple of recent posts by Steve Eichert and Udi Dahan got me thinking about some implementation issues that arise out of the techniques that Eric Evans lays out in his book on Domain Driven Design. The specific issue involves figuring out how to share types across aggregate boundaries.
First, some background. In the language of Evans’ book, an Aggregate is a set of related types that are created and managed as a unit. Even though an Aggregate may span several object types, the logic for creating and storing all types within the Aggregate is centralized in a single Repository. The example that is used in the book involves Cars – the Car aggregate relates several individual types like Tire and Door. However, since a Tire doesn’t need to be tracked outside of the context of the Car instance it’s associated with, the lifecycles of Tire instances are managed by the same code that manages Cars (the CarRepository). Evans presents a pretty cogent argument as to how partitioning sets of types into Aggregates according to the natural relationships within the domain makes for a better and more manageable system, so I won’t rehash that here.
The issue that got Steve and Udi talking is something that’s not really addressed by the book, and that’s the question of what to do about Aggregates that hold references to other Aggregates. To borrow Udi’s example, consider a Customer type that contains an Address. In Udi’s domain, Addresses make sense outside of the context of Customers and vice versa, so both are considered to be “aggregate roots”. As such, they have their own respective Repository implementations. How, then, does a Customer load itself given that the CustomerRepository doesn’t know how to directly load an Address?
Udi presents a possible solution:
For those interested in how save can take place in the example "repository.Save(customer);" when the customer has an address, the answer is to use events. (This is beginning to reflect my views on the subject). The question was how do we keep the customer repository from knowing about how to save addresses, and, more so, from knowing about the address repository. As a part of saving the customer, the customer repository calls the Save method on the address property. This is OK because the customer repository knows about the structure of the customer and knows that it has an address that requires saving. But, that's as far as its knowledge goes. The beauty of this solution is that the address object's Save method does nothing but raise an event. The address repository subscribed to this event when it reconstituted the address object, and can now handle the saving without any ties to the customer object, its repository, or anything else.
I agree with his intent (ensuring that the Customer doesn’t need to know how to persist an Address), but disagree with his event-based implementation. The red flag for me is the implication that the Address entity object needs to implement a Save() method. One of the principles I took away from Domain Driven Design is that entity objects don’t expose their own lifecycle management. Instead, that responsibility is handled by their associated Repository class. From that, I take away that a good entity object should not implement a Save() method. Instead, something needs to call into the AddressRepository to persist the address. Unfortunately, figuring out exactly what that ‘something’ is can be a bit challenging.
The Simplest Thing That Could Possibly Work
The simplest possible solution is to not load any information about related aggregates during object construction. In real terms, this means that the CustomerRepository only loads the primary key value of the related Address, not the whole Address itself. Since the addressKey is very much a property of a Customer entity (it’s right there in the Customer table, after all), this approach is very easy to implement. At runtime, the Customer entity just passes this key value to the AddressRepository, which returns the fully populated Address.
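public class Customer
{
    private int addressKey;

    public Address Address
    {
        // Every read goes back to the AddressRepository.
        get { return AddressRepository.Load( this.addressKey ); }
        set
        {
            this.addressKey = value.Key;
            // Every write immediately persists the whole Customer.
            CustomerRepository.Update( this );
        }
    }
}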
This implementation makes it very easy to load the associated address, but it has two serious flaws. First, it’s extremely chatty with the AddressRepository: every read of the property triggers another load. While this might be a good thing if Address entities are volatile and change often, most likely there’s an opportunity for a performance improvement here by locally caching the returned Address instance. Second, setting the Address property to a new value should not trigger an update of the whole Customer in the database. It would be nice to batch up several updates to a Customer instance and commit them en masse.
Refactoring Our Way to Glory Part 1: Lazy load
We can solve both of these problems by using lazy load. To accomplish this, we need a new private field on Customer to store its fully-hydrated Address. We also need to ensmarten :-) the Address getter to implement lazy load semantics:
public class Customer
{
    private int addressKey;
    private Address address;

    public Address Address
    {
        get
        {
            // Load the Address on first access, then reuse the cached instance.
            if( null == this.address )
                this.address = AddressRepository.Instance.Load( this.addressKey );
            Debug.Assert( this.addressKey == this.address.Key );
            return this.address;
        }
        set
        {
            this.address = value;
            this.addressKey = value.Key;
        }
    }
}
This is a significant improvement over the first version, but we’ve now run into the issue that Udi and Steve were talking about originally – namely, that the Customer entity directly calls the AddressRepository. It would be very nice if the Customer entity needed only to talk to the CustomerRepository and didn’t have a direct dependency on another Aggregate’s Repository.
Refactoring Our Way to Glory Part 2: Delegating to the CustomerRepository
One logical step would be to move the responsibility for calling AddressRepository.Load() into the CustomerRepository. This doesn’t do anything towards eliminating the direct dependency, but it does purify the Customer entity by removing the call to another aggregate’s Repository.
public class Customer
{
    private int addressKey;
    private Address address;

    public Address Address
    {
        get
        {
            // The entity now asks its own repository for the Address.
            if( null == this.address )
                this.address = CustomerRepository.Instance.LoadAddress( this.addressKey );
            Debug.Assert( this.addressKey == this.address.Key );
            return this.address;
        }
        set
        {
            this.address = value;
            this.addressKey = value.Key;
        }
    }
}
public class CustomerRepository
{
    public static CustomerRepository Instance = new CustomerRepository();

    public Customer Load( int customerKey )
    {
        DataRow dr = FetchCustomerFromTable( customerKey );
        return CustomerFactory.Create( dr );
    }

    // Forwards to the AddressRepository on the Customer's behalf.
    public Address LoadAddress( int addressKey )
    {
        return AddressRepository.Instance.Load( addressKey );
    }
}
After this refactoring, the Customer entity no longer calls the AddressRepository. However, we’ve now introduced two different ways to get an address – one via AddressRepository.Load() and one via CustomerRepository.LoadAddress(). This is definitely sub-optimal, as it raises questions as to which way is the “correct” way of obtaining an address. We can make this intent more clear if we change things around a bit.
Refactoring Our Way to Glory Part 3: Clarifying the intent of CustomerRepository.LoadAddress()
We need to make clear that CustomerRepository.LoadAddress() is only intended to be used in certain cases (namely, as a helper function for lazily loading a customer’s address). This signature is a step in the right direction:
public Address LoadAddressForCustomer( Customer c )
One look at this signature is enough to tell you that it’s probably not the right method to call if you’re interested in loading Addresses in the general case. However, there’s a technical problem now. As currently implemented, the Customer entity stores its addressKey in a private field. LoadAddressForCustomer() needs access to that field in order to load the correct Address, but it’s defined on another class and has no access to Customer’s private state. We have a couple of options – make addressKey a public property or move LoadAddressForCustomer() to someplace that has access to private state.
The first option is bad. Given that the Customer entity exposes its address as a fully-populated object, exposing the AddressKey publicly means that we’re now exposing the AddressKey in two places (once on the Customer, and once on the Address instance). This smells bad, as it violates the “once and only once” principle. Clearly, we must move the LoadAddressForCustomer() method someplace else – but where? Putting it back on the Customer is an option, but that violates the rule that entities don’t do their own lifecycle management. Where else can we put it so that it has access to Customer’s private state?
The solution is to pick up the whole CustomerRepository class and make it a nested class inside the Customer entity. Nested classes have access to the private state of their containing type, so the member accessibility problem is solved. Plus, the model now more closely mirrors the domain, because the nesting concretely expresses the implicit relationship between the Customer and the CustomerRepository. The repository class has no meaning on its own – its sole purpose in life is to manage the lifecycle of Customer. Thus, it seems reasonable to express the close relationship between the CustomerRepository concept and the Customer concept via nested classes.
public class Customer
{
    private int addressKey;
    // other members elided for clarity

    public class CustomerRepository
    {
        public static CustomerRepository Instance = new Customer.CustomerRepository();

        public Customer Load( int customerKey )
        {
            DataRow dr = FetchCustomerFromTable( customerKey );
            return CustomerFactory.Create( dr );
        }

        public Address LoadAddressForCustomer( Customer c )
        {
            // A nested class may read the private addressKey of its containing type.
            return Address.AddressRepository.Instance.Load( c.addressKey );
        }
    }
}
Like the previous refactorings, this is an improvement over the previous state. It’s now clearer that LoadAddressForCustomer() only deals with Customer instances and is not a supported mechanism for loading Addresses outside of a Customer context. Things are looking pretty good. However, all those singleton repositories are bothersome. They’ll do nothing but cause headaches for unit testing, since we’ll most likely want to mock out our repositories at some point.
Refactoring Our Way to Glory Part 4: Eliminating Singletons
In a real-world system, singletons are a real headache. Both Scott Densmore and Brian Button have raised some very good points around this topic. Right now, the Repository pattern as implemented here is singleton-based: every entity has a singleton repository, which is far too many singletons. It would be great if we could eliminate some of those. In an external email conversation, Steve Eichert suggested to me that introducing a RepositoryFactory would help alleviate the singleton problem. He’s totally right, so that’s the next refactoring.
In order to make the RepositoryFactory work, we first need to introduce some sort of interface common to all Repositories. Eventually, we’ll use this to implement a common CRUD contract that all the repositories support. Since we’re only worried about loading objects right now, the interface definition can be quite simple:
public interface IRepository<T>
{
    T Load( int key );
}
This is a great opportunity to take advantage of generics. Without them, we’d have to make Load() return an object and consequently force everyone to downcast the returned value. Generics give us the opportunity to eliminate those casts in most cases. The next step is to implement the interface on the CustomerRepository type:
public class Customer
{
    public class CustomerRepository : IRepository< Customer >
    {
        // This satisfies IRepository<T>
        public Customer Load( int key )
        {
            // body elided for clarity
        }
    }
}
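To make the downcast point concrete, here is a minimal side-by-side sketch. The names IUntypedRepository, Widget, UntypedWidgetRepository, WidgetRepository and DowncastDemo are all hypothetical stand-ins invented for this illustration; only IRepository<T> comes from the design above.

```csharp
using System;

// Hypothetical non-generic contract: Load() can only promise an object.
public interface IUntypedRepository
{
    object Load( int key );
}

// The generic contract described above.
public interface IRepository<T>
{
    T Load( int key );
}

// Stand-in entity, for illustration only.
public class Widget
{
    public readonly int Key;
    public Widget( int key ) { Key = key; }
}

public class UntypedWidgetRepository : IUntypedRepository
{
    public object Load( int key ) { return new Widget( key ); }
}

public class WidgetRepository : IRepository<Widget>
{
    public Widget Load( int key ) { return new Widget( key ); }
}

public static class DowncastDemo
{
    public static int LoadBothWays( int key )
    {
        // Without generics the caller must downcast, and a wrong cast
        // is only discovered at run time.
        Widget viaCast = (Widget) new UntypedWidgetRepository().Load( key );

        // With generics the compiler already knows the result type.
        Widget typed = new WidgetRepository().Load( key );

        return viaCast.Key + typed.Key;
    }
}
```

Both calls return the same kind of object; the difference is that only the generic version lets the compiler check the type for us.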
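Finally, we need to implement the generic RepositoryFactory. It has one generic method, GetRepository<T>, that returns an instance of IRepository<T>:
public class RepositoryFactory
{
    public IRepository<T> GetRepository<T>()
    {
        IRepository<T> result = null;
        // Each repository receives the factory so that it can reach other
        // repositories; the AddressRepository constructor is assumed to
        // mirror the CustomerRepository constructor in the final listing.
        if( typeof( T ) == typeof( Customer ) )
            result = (IRepository<T>) new Customer.CustomerRepository( this );
        else if( typeof( T ) == typeof( Address ) )
            result = (IRepository<T>) new Address.AddressRepository( this );
        return result;
    }
}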
With that in place, the current implementation of the customer entity looks like this:
public class Customer
{
    private int addressKey;
    private Address address;
    private RepositoryFactory repositoryFactory;

    public Customer( RepositoryFactory factory )
    {
        this.repositoryFactory = factory;
    }

    public Address Address
    {
        get
        {
            if( null == this.address )
            {
                CustomerRepository cr = (CustomerRepository)
                    repositoryFactory.GetRepository<Customer>();
                this.address = cr.LoadAddressForCustomer( this );
            }
            return this.address;
        }
        set
        {
            this.address = value;
            this.addressKey = value.Key;
        }
    }

    // The nested CustomerRepository class
    public class CustomerRepository : IRepository<Customer>
    {
        private RepositoryFactory repositoryFactory;

        public CustomerRepository( RepositoryFactory factory )
        {
            this.repositoryFactory = factory;
        }

        public Customer Load( int customerKey )
        {
            DataRow dr = FetchCustomerFromTable( customerKey );
            return CustomerFactory.Create( dr );
        }

        public Address LoadAddressForCustomer( Customer c )
        {
            return repositoryFactory.GetRepository<Address>().Load( c.addressKey );
        }

        private DataRow FetchCustomerFromTable( int customerKey )
        {
            //elided for clarity
        }
    }

    public static class CustomerFactory
    {
        public static Customer Create( DataRow r )
        {
            //elided for clarity
        }
    }
}
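One thing worth noting about GetRepository<T>() is that its if/else chain needs a new branch for every aggregate. A possible variation, sketched here under my own assumptions and not part of the stub solution, is to register repositories up front and look them up by type. RegistryRepositoryFactory, Register<T>() and EchoRepository are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;

public interface IRepository<T>
{
    T Load( int key );
}

// Hypothetical variation on RepositoryFactory: repositories are registered
// up front instead of being selected by an if/else chain.
public class RegistryRepositoryFactory
{
    private readonly Dictionary<Type, object> repositories =
        new Dictionary<Type, object>();

    public void Register<T>( IRepository<T> repository )
    {
        repositories[ typeof( T ) ] = repository;
    }

    public IRepository<T> GetRepository<T>()
    {
        object found;
        if( repositories.TryGetValue( typeof( T ), out found ) )
            return (IRepository<T>) found;
        return null; // unknown types yield null, as in the if/else version
    }
}

// Trivial repository that exists only to exercise the factory.
public class EchoRepository : IRepository<string>
{
    public string Load( int key ) { return "item-" + key; }
}
```

After calling Register<string>( new EchoRepository() ), a call to GetRepository<string>().Load( 3 ) returns "item-3". The trade-off is that something has to perform the registrations at startup, so this moves the knowledge of which repositories exist rather than removing it.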
Final Thoughts
After four separate refactorings, we’ve arrived at a pretty solid and flexible implementation. More importantly, we’ve adhered to our rules:
- Entities do not perform their own lifecycle management. The lifecycle for an entity is managed via its Repository.
- Each entity talks only to its own Repository.
- Repositories talk to other repositories through the RepositoryFactory.
- Singletons have been largely removed from the design.
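To illustrate why getting away from singletons matters for unit testing, here is a small sketch of a test double that becomes possible once code depends on IRepository<T> rather than on a static Instance. The Address class below is a simplified stand-in, and InMemoryAddressRepository and LazyAddressHolder are hypothetical names invented for this example; the holder simply copies the lazy-load shape of Customer.Address.

```csharp
using System;
using System.Collections.Generic;

public interface IRepository<T>
{
    T Load( int key );
}

// Simplified stand-in for the Address entity.
public class Address
{
    public readonly int Key;
    public readonly string City;
    public Address( int key, string city ) { Key = key; City = city; }
}

// Hypothetical test double: serves canned addresses from memory.
public class InMemoryAddressRepository : IRepository<Address>
{
    private readonly Dictionary<int, Address> rows = new Dictionary<int, Address>();
    public int LoadCount; // lets a test verify how chatty the caller is

    public void Add( Address a ) { rows[ a.Key ] = a; }

    public Address Load( int key )
    {
        LoadCount++;
        Address found;
        return rows.TryGetValue( key, out found ) ? found : null;
    }
}

// Hypothetical consumer with the same lazy-load shape as Customer.Address.
public class LazyAddressHolder
{
    private readonly IRepository<Address> repository;
    private readonly int addressKey;
    private Address address;

    public LazyAddressHolder( IRepository<Address> repository, int addressKey )
    {
        this.repository = repository;
        this.addressKey = addressKey;
    }

    public Address Address
    {
        get
        {
            // Load on first access only, then reuse the cached instance.
            if( null == this.address )
                this.address = repository.Load( this.addressKey );
            return this.address;
        }
    }
}
```

A test can now hand the holder an InMemoryAddressRepository, read the Address property twice, and check that LoadCount is still 1, all without touching a database.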
There are still a couple of outstanding issues – not the least of which is the fact that our repository only knows how to Load objects. We haven’t even begun to touch on the other side of the coin, the Save() method. That has its own set of issues and will likely merit its own article at some later date. Also, we still have the CustomerFactory class hanging around as a static. That may or may not be bad, although I’m inclined to leave it until there’s a good reason to change it. Any other comments or feedback are certainly welcome.
By way of conclusion, here’s a link to a stub solution that implements this pattern. Since it uses generics, you’ll need Whidbey Beta 1 to load the solution file. I also went a step further and used partial classes to implement the nested types in their own files for the sake of keeping class files small.
Happy hacking.
Update #1: Fixed typos in code and unmunged formatting a bit.
