Freitag, 25. Januar 2013

Time traveling: When ORM reaches its limits

Relational databases are a proven technology. In many organizations they are the only platform that is even considered. They are reliable, they are well understood, there are enough people able to maintain them. Reporting tools love tabular data, developers love their concurrency guarantees. But there are use cases where the relational model and ORM just hit the wall.

One such use case is auditing. When something went wrong in a mission critical application, you need to know why. Problems are often caused by wrong configuration, so you need a trail of changes for every critical entity. E.g. in a financial batch process a misconfigured bank identifier can lead to millions of dollars being sent to the wrong place. In many cases you don't just want to see a long trail of changes, but you want to be able to travel back in time and see how an entity looked as a whole. 

This requirement leads to the term "entity versioning" or "temporal database design". In the simplest case (a table without any relations), you can add a "validFrom" and "validTo" timestamp to the table. Getting the current version is then done with a max-query. This can be made more convenient using a "versions" table and a "current version" view. A stored procedure converts updates to the view to inserts into the version table. Deletes are just a special case of update, too. 
But in realistic scenarios, your tables will have relations. Saving and querying a consistent view of such an aggregate leads to SQL that makes normal Hibernate look like "Hello World". Have a look at the implementation of Hibernate Envers if you want to know exactly what I mean. Also, performing migrations on such monstrous schemas is anything but fun.

So, when a relational database is not suited for this task, what kind of persistence mechanism is? I think that Event Sourcing really shines here. The basic idea is the following: We already have a "BankIdentifierChanged" event anyway, so why not persist it? And instead of using ORM, we save the whole state of our Account entity as a series of events. Updating is done by adding events to the event store. Reading is done by selecting all events with a given entity ID up to the desired date and instantiating our Account from the resulting event list. Instead of setters and getters our Account entity now has methods that take events and change the internal state accordingly.

I guess the reason that this pattern is not more widespread is the general fear to let go off the relational model. There are good reasons for this fear, I described them at the start of this article. But stretching the relational model too far can be a greater risk than trying another, more fitting technology. You dont' have to go all-out on event sourcing either. If you have proper abstraction layers in place, you can use a mix of ORM, event sourcing and other technologies depending on the use case.