by Max Guernsey, III - Managing Member, Hexagon Software LLC
If you haven’t read the introduction for this channel of articles, I highly recommend you do so now. For those who just need a brief reminder, the point is this:
In each article, we pick a practice used in traditional, program-centric Agility. We deconstruct that practice until we have
reduced it to its essence: principles and values. Then, we put Humpty-Dumpty together again, with data in mind. The result
should be a new practice which fills the same need as the one with which we started but is applicable to persistent data.
You will also need to understand the concepts from Part I: Evolution. If you have not already done so, you probably should read that now. If you just need a refresher, here is the gist:
Source code does and should evolve. Evolution in source code permits emergent design without which Agility could
not function. However, the process of evolution operates on groups of things and functions by the destruction of
one generation and creation of another. This is not possible with databases, because the information in an individual
database has too much value in it to permit its destruction and replacement.
In this installment, we’re going to talk about deployment in the database world. In the software world, we have
highly reliable deployment mechanisms and techniques. These are important because they let us distribute the software
we write. If you cannot distribute a program to its users, it is very difficult to realize any of its potential value.
While the techniques in the software world are highly refined and usually pretty reliable, the processes we use for
databases are largely manual and/or unreliable. We are going to try and figure out why deployment of programs and
components is so reliable and deployment of database schemas is, by comparison, unreliable.
Reliable Deployment Tools
In the software world (C#, C++, Java, etc.) we have reliable methods of deployment for our programs. There is a
buffet of options for us – zip files, MSIs, JAR packages, web deployment solutions, and so on. Not only that, but
the set of available options seems to be expanding while existing technologies are continually refined.
These highly repeatable deployment mechanisms can easily be tested. Ideally, they are driven from tests. Even
when they are not, though, the fact that they could be means that we can gain a reasonably high degree of confidence
from manual or after-the-fact tests.
The reason these things can so easily be validated is that deployment, usually, boils down to copying files.
File-copies start out as fairly reliable things even without tests. They are essentially large assignments and,
like assignments, they are better when accompanied by tests but still pretty safe without.
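The point about file copies being easy to validate can be made concrete. The sketch below is illustrative, not from the article: it copies a set of build artifacts and verifies each copy with a checksum, which is exactly the kind of trivially testable step the text describes. The paths and function names are assumptions.

```python
# A minimal sketch of why file-copy deployment is easy to verify:
# copy the artifacts, then confirm byte-for-byte integrity with a checksum.
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def deploy(artifacts: list[Path], target_dir: Path) -> None:
    """Copy each artifact into target_dir and verify the copy."""
    target_dir.mkdir(parents=True, exist_ok=True)
    for src in artifacts:
        dst = target_dir / src.name
        shutil.copy2(src, dst)
        # The "test" is trivial: the copy either matches the source or it doesn't.
        assert sha256(src) == sha256(dst), f"copy of {src.name} corrupted"
```

Because the operation is a large assignment, the verification step is equally simple, which is precisely why confidence in this kind of deployment comes cheap.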
Programs have Reliable Build Tools
The reason that we can rely on file-copying to distribute our software is that we have trustworthy build tools.
These days – especially in the case of managed code such as .Net or Java – running the same source through the same
compiler will produce the same binary every time, regardless of machine, environment settings, or what-have-you.
On the surface, this looks like repeatability – and it is – but it is also dependability. If we can depend on our
compiler to produce output that is strongly linked to the intentions we have expressed in our code, then we can have
confidence in that output.
If our compilers were not reliable we wouldn’t be building things that propagate their output. Instead, we would be
building software on the machine that was going to use it and making sure it worked in that environment. In fact,
that is how they used to do things back when dinosaurs roamed the Earth, if I’m not mistaken.
The force of confidence has a strong influence over agility. Consider the common housecat – one of the most
agile creatures known to man – they move with stunning speed, start, stop, and turn on a dime, and regularly
do things you probably would only expect to see in the movies or cartoons. Yet, when exposed to something
that challenges their confidence – something that makes them unable to predict the outcomes of their actions
like snow – their agility vanishes and they become comically trepid.
Low confidence leads to a number of problems. Developers become hesitant to make changes for fear that they
will break something. This fear is justified: poor confidence leads to a lot of defects being introduced.
Because of all these things, it is likely that there will be a lot of rework.
The same force drives practices such as test-driven or test-first development. Without good tests, it’s very unlikely that you will gain a high degree of confidence in how your program or database functions. Moreover, without a reliable build process, you cannot have any such confidence at all. Confidence is important because it lets you go forward without looking back.
Most of the time, database deployment is done by a human interfacing with a database, manually executing scripts. When all of the scripts have been applied, some kind of test is usually run. Sometimes this is an automated test, but usually it is a manual one, such as ensuring that a table is present. Regardless, the testing at this point is rarely thorough or reliable.
Even if the process is not manual, it is still probably not very reliable. Usually the scripts are riddled with assumptions; they might work when first written but, a few months later, they might no longer be meaningful. The scripts usually
have a weak correlation with what’s in production. For instance, they may take into account the structures in production
but not the data. The scripts are poorly ordered. If required to redeploy a new database from scratch after several
iterations, most people would be hard-pressed to do it in exactly the same way it was done before.
The bottom line is that our database deployment mechanisms are very weak and largely done with an “operations” mentality:
a human must babysit the process to make sure it “worked.”
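The manual "is the table present?" check described above is exactly the kind of step that can be automated. The sketch below is a hedged illustration, not the author's tooling: it uses sqlite3 as a stand-in for any SQL database, and the table names are invented for the example.

```python
# A sketch of automating the post-deployment check the text describes:
# instead of a person eyeballing the schema, a script asserts that the
# expected tables exist. sqlite3 stands in for any SQL database here.
import sqlite3

EXPECTED_TABLES = {"customers", "orders"}  # illustrative names

def validate_schema(conn: sqlite3.Connection) -> None:
    """Fail loudly if any expected table is missing after deployment."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name for (name,) in rows}
    missing = EXPECTED_TABLES - present
    assert not missing, f"deployment incomplete, missing tables: {missing}"
```

A check like this turns the operator's babysitting into a yes/no answer that can run after every deployment, which is a small first step away from the "operations" mentality.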
Where’s the Build?
It seems like we go straight from source to deployment. So where is the build process? Maybe the database wasn’t deployed.
Maybe it was built in place.
Distributing a program or software component involves two steps: we build and then we deploy. In the database world, however, there is no such distinction. It appears that we build and deploy at the same time. This would lead one to believe that building a database is deployment and that the converse is also true.
Your first instinct is probably to ask the question “can we split the two concerns apart from each other?” Unfortunately,
we cannot – at least not in a way that is apparent. There is little value to separating builds from deployment in the
data world. The reason for this is illustrated in Part I of this series: each database is a unique instance; its identity
and, most importantly, its information is too valuable to abandon just for some new features.
Which do we Fix?
Do we focus on increasing the dependability of our database builds or our deployment mechanisms? Lean says to find the
root cause of a problem and then fix that…
In the world of traditional software development it would be clear which was the root and which the consequence: if
our builds are a crap shoot, it really doesn’t matter how reliable our deployment tools are. Because building is the
same thing as deployment in the world of data storage it may not be as clear which we should address.
I propose that we need to stop thinking of creating a database as deployment and start thinking of it as a build process that happens at deploy time. The difference is subtle but it seems to have a valuable psychological impact. Deployment is an operational concern and, without disparaging administrative types, operations are necessarily manual to some extent.
It is justified that administration and operations require some manual interaction. When a complex system has many users
it just makes sense for a person to be responsible for making sure it rolls out correctly… no matter how well we’ve tested
our roll-out methods.
Nevertheless, we still want to automate and test as much of the process as possible; we want to limit the
administrator/operator’s duties to configuration and validation, or as close as we can get. This is why I think the
“build” paradigm will serve us better. It carries with it a different set of questions which drive toward repeatable
dependability instead of one-off correctness. It allows us to build a species of databases rather than a bunch of individuals, living in a vacuum, that only hopefully share some traits.
Without reliable builds and deployments there is no confidence. Without confidence there is no agility. We need to find
a way to make our database creation more reliable.
Switching from a “deployment” paradigm to a “build in place” paradigm seems to change the questions asked in a useful way
while still addressing all of the deployment concerns. We can start creating testable, reusable, reliable build tools that
have the side-effect of producing a database as part of the build. This, in turn, causes the deployment side-effect itself
to be far more reliable than any manual or quasi-manual process could ever be.
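One way the "build in place" idea can be sketched is an ordered list of migration scripts plus a version table kept in the database itself, so the same tool deterministically builds a fresh database or brings an existing one up to date without destroying its data. This is an illustration under assumptions, not the author's design: sqlite3 stands in for any SQL database, and the scripts and table names are invented.

```python
# A sketch of "build in place": an ordered list of migration scripts and a
# version table recorded in the database itself, so one tool builds a new
# database or upgrades an existing one the same way every time.
import sqlite3

MIGRATIONS = [  # strictly ordered; never edited once released
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
]

def build_in_place(conn: sqlite3.Connection) -> int:
    """Apply every migration the database has not yet seen; return the version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version, script in enumerate(MIGRATIONS, start=1):
        if version > current:
            conn.execute(script)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()
    return len(MIGRATIONS)
```

Because the tool reads the version table before acting, running it twice is a no-op, and every database it touches follows the same lineage of changes — the "species" rather than the isolated individual.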
All of this lets us build a testable, dependable species of databases rather than just individual specimens.
Introduction to this Series
Part I: Evolution