Rethinking Agility in Databases - Part II: Builds & Deployment

2/25/2008

by Max Guernsey, III - Managing Member, Hexagon Software LLC

Introduction

If you haven’t read the introduction for this channel of articles, I highly recommend you do so now. For those who just need a brief reminder, the point is this:

In each article, we pick a practice used in traditional, program-centric Agility. We deconstruct that practice until we have reduced it to its essence: principles and values. Then, we put Humpty-Dumpty together again, with data in mind. The result should be a new practice which fills the same need as the one with which we started but is applicable to persistent data stores.

You will also need to understand the concepts from Part I: Evolution. If you have not already done so, you probably should read that now. If you just need a refresher, here is the gist:

Source code does and should evolve. Evolution in source code permits emergent design, without which Agility could not function. However, the process of evolution operates on groups of things and works by destroying one generation and creating another. This is not possible with databases, because the information in an individual database is too valuable to permit its destruction and replacement.

In this installment, we’re going to talk about deployment in the database world. In the software world, we have highly reliable deployment mechanisms and techniques. These are important because they let us distribute the software we write; if you cannot distribute a program to its users, it is very difficult to realize any of its potential value. While the techniques in the software world are highly refined and usually pretty reliable, the processes we use for databases are largely manual, unreliable, or both. We are going to try to figure out why deployment of programs and components is so reliable and deployment of database schemas is, by comparison, unreliable.

Reliable Deployment Tools

In the software world (C#, C++, Java, etc.) we have reliable methods of deployment for our programs. There is a buffet of options for us – zip files, MSIs, JAR packages, web deployment solutions, and so on. Not only that, but the set of available options seems to be expanding while existing technologies are continually refined.

These highly repeatable deployment mechanisms can easily be tested. Ideally, they are driven from tests. Even when they are not, though, the fact that they could be means that we can gain a reasonably high degree of confidence from manual or after-the-fact tests.

The reason these things can be validated so easily is that deployment usually boils down to copying files. File copies start out as fairly reliable operations even without tests. They are essentially large assignments and, like assignments, they are better when accompanied by tests but still pretty safe without them.
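To make that concrete, here is a minimal sketch in Java of an after-the-fact deployment check. The file paths, class name, and the idea of comparing hashes are purely illustrative, not taken from any particular tool; the point is only that when deployment is a file copy, verifying it amounts to confirming the deployed copy is byte-for-byte identical to what the build produced:

    import java.nio.file.*;
    import java.security.MessageDigest;

    // Hypothetical after-the-fact deployment check: if deployment is just a
    // file copy, verifying it only requires confirming that the deployed file
    // is identical to the artifact the build produced.
    public class DeploymentCheck {

        static String sha256(Path file) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                                         .digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            Path built = Paths.get(args[0]);     // artifact produced by the build
            Path deployed = Paths.get(args[1]);  // copy on the target machine
            if (sha256(built).equals(sha256(deployed))) {
                System.out.println("deployment verified");
            } else {
                System.out.println("deployed file differs from build output");
                System.exit(1);
            }
        }
    }

A check like this can run automatically after every copy, which is exactly the kind of repeatability we take for granted on the program side.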

Programs Have Reliable Build Tools

The reason that we can rely on file-copying to distribute our software is that we have trustworthy build tools. These days – especially in the case of managed code such as .Net or Java – running the same source through the same compiler will produce the same binary every time, regardless of machine, environment settings, or what-have-you.

On the surface, this looks like repeatability – and it is – but it is also dependability. If we can depend on our compiler to produce output that is strongly linked to the intentions we have expressed in our code, then we can have confidence in that output.

If our compilers were not reliable, we wouldn’t bother propagating their output at all. Instead, we would build software on the machine that was going to use it and make sure it worked in that environment. In fact, that is how they used to do things back when dinosaurs roamed the Earth, if I’m not mistaken.

Confidence

The force of confidence has a strong influence over agility. Consider the common housecat, one of the most agile creatures known to man: it moves with stunning speed, starts, stops, and turns on a dime, and regularly does things you would probably only expect to see in movies or cartoons. Yet when exposed to something that challenges its confidence, something like snow that makes it unable to predict the outcomes of its actions, its agility vanishes and it becomes comically trepid.

Low confidence leads to a number of problems. Developers become hesitant to make changes for fear that they will break something, and that fear is justified: in a low-confidence environment, changes really do introduce a lot of defects. All of this, in turn, makes a great deal of rework likely.

The same force drives practices such as test-driven or test-first development. Without good tests, it’s very unlikely that you will gain a high degree of confidence in how your program or database functions. Even with good tests, though, an unreliable build process will keep you from having any such confidence. Confidence is important because it lets you go forward without looking back.

Database Deployment

Most of the time, database deployment is done by a human interfacing with a database and manually executing scripts. When all of the scripts have been applied, some kind of test is usually run. Sometimes it is an automated test, but more often it is a manual check, such as making sure that a table is present. Either way, the testing at this point is rarely thorough or repeatable.
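Even that trivial “is the table there?” check can be automated instead of performed by eye. As a rough sketch (the connection details and the CUSTOMERS table are placeholders of my own, not a prescription), a small JDBC program can ask the database’s metadata whether the table exists and fail loudly if it does not:

    import java.sql.*;

    // A minimal sketch of automating the usual manual post-deployment check.
    // The connection details and table name are placeholders.
    public class PostDeployCheck {
        public static void main(String[] args) throws Exception {
            String url = args[0];  // a JDBC connection string
            try (Connection conn = DriverManager.getConnection(url, args[1], args[2])) {
                DatabaseMetaData meta = conn.getMetaData();
                // getTables returns one row per matching table; the name may be
                // case-sensitive depending on the database vendor.
                try (ResultSet tables =
                         meta.getTables(null, null, "CUSTOMERS", new String[] { "TABLE" })) {
                    if (tables.next()) {
                        System.out.println("CUSTOMERS table is present");
                    } else {
                        System.out.println("CUSTOMERS table is missing");
                        System.exit(1);
                    }
                }
            }
        }
    }

Checks like this are cheap to write, but on their own they only confirm that one script did one thing; they do nothing to make the overall process repeatable.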

Even if the process is not manual, it is still probably not very reliable. The scripts are usually highly assumptive: they might work when they are first written, but a few months later they may no longer be meaningful. They usually correlate only weakly with what’s in production; for instance, they may account for the structures in production but not the data. They are often poorly ordered. If asked to rebuild a database from scratch after several iterations, most people would be hard-pressed to do it in exactly the same way it was done before.

The bottom line is that our database deployment mechanisms are very weak and largely done with an “operations” mentality: a human must babysit the process to make sure it “worked.”

Where’s the Build?

It seems like we go straight from source to deployment. So where is the build process? Maybe the database wasn’t deployed. Maybe it was built in place.

There are two steps to distributing a program or software component: first we build, then we deploy. In the database world, however, there is no such distinction; it appears that we build and deploy at the same time. This would lead one to believe that building a database is deployment and that the converse is also true.

Your first instinct is probably to ask the question “can we split the two concerns apart?” Unfortunately, we cannot, at least not in any apparent way. There is little value in separating builds from deployment in the data world. The reason is illustrated in Part I of this series: each database is a unique instance; its identity and, most importantly, its information are too valuable to abandon just for some new features.

Which Do We Fix?

Do we focus on increasing the dependability of our database builds or our deployment mechanisms? Lean says to find the root cause of a problem and then fix that…

In the world of traditional software development, it would be clear which was the root and which the consequence: if our builds are a crap shoot, it really doesn’t matter how reliable our deployment tools are. Because building is the same thing as deployment in the world of data storage, it may not be as clear which we should address.

I propose that we need to stop thinking of creating a database as deployment and start thinking of it as a build process that happens at deploy time. The difference is subtle, but it seems to have a valuable psychological impact. Deployment is an operational concern and, without disparaging administrative types, operations are necessarily manual to some extent.

It is reasonable for administration and operations to require some manual interaction. When a complex system has many users, it just makes sense for a person to be responsible for making sure it rolls out correctly… no matter how well we’ve tested our roll-out methods.

Nevertheless, we still want to automate and test as much of the process as possible; we want to limit the administrator/operator’s duties to configuration and validation, or come as close to that as we can. This is why I think the “build” paradigm will serve us better. It carries with it a different set of questions, questions that drive toward repeatable dependability instead of one-off correctness. It allows us to build a species of databases rather than a bunch of individuals, each living in a vacuum, that hopefully share some traits.
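To illustrate what a “build that happens at deploy time” might look like, here is a rough sketch; the scripts folder, the schema_version table, and the one-statement-per-script assumption are all mine and do not describe any existing tool. It applies change scripts in a fixed order and records each one, so every database it touches is constructed the same way:

    import java.nio.file.*;
    import java.sql.*;
    import java.util.*;

    // Sketch of building a database in place: apply change scripts in a
    // deterministic order and remember which ones have been applied, so any
    // member of the "species" can be brought to the same state repeatably.
    public class DatabaseBuild {

        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
                 Statement stmt = conn.createStatement()) {

                // Table that records the build history (syntax varies by vendor).
                stmt.execute("CREATE TABLE IF NOT EXISTS schema_version "
                           + "(script VARCHAR(255) PRIMARY KEY)");

                // Collect the change scripts and sort them so the order is fixed.
                List<Path> scripts = new ArrayList<>();
                try (DirectoryStream<Path> dir =
                         Files.newDirectoryStream(Paths.get("scripts"), "*.sql")) {
                    for (Path p : dir) scripts.add(p);
                }
                Collections.sort(scripts);

                for (Path script : scripts) {
                    String name = script.getFileName().toString();
                    if (alreadyApplied(conn, name)) continue;  // skip work already done
                    // Assumes each file holds a single statement the driver can run.
                    stmt.execute(new String(Files.readAllBytes(script)));
                    recordApplied(conn, name);
                    System.out.println("applied " + name);
                }
            }
        }

        static boolean alreadyApplied(Connection conn, String name) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT 1 FROM schema_version WHERE script = ?")) {
                ps.setString(1, name);
                try (ResultSet rs = ps.executeQuery()) { return rs.next(); }
            }
        }

        static void recordApplied(Connection conn, String name) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO schema_version (script) VALUES (?)")) {
                ps.setString(1, name);
                ps.executeUpdate();
            }
        }
    }

Because the same tool can be pointed at an empty test database, at a developer’s sandbox, or at production, the build itself can be tested repeatedly long before anyone has to babysit a roll-out.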

Conclusion

Without reliable builds and deployments there is no confidence. Without confidence there is no agility. We need to find a way to make our database creation more reliable.

Switching from a “deployment” paradigm to a “build in place” paradigm seems to change the questions we ask in a useful way while still addressing all of the deployment concerns. We can start creating testable, reusable, reliable build tools that produce a database as a side effect of the build. This, in turn, makes the deployment itself far more reliable than any manual or quasi-manual process could ever be.

All of this lets us build a testable, dependable species of databases rather than just individual specimens.