Cloud Configuration Management: Where the Rubber Hits the Clouds

Your data center, sometime in the mid-1990s: the server you ordered finally arrives. Could be Windows, Linux, some flavor of Unix, doesn’t matter. You unpack it. Boot it up. Patch the OS. Configure the OS. Install software from CDs. Patch the software. Configure the software. Move data to the box. Test. Tweak. Test again. Finally, the box goes live.

Cut to 2012. You’re working in the Cloud now. You provision a virtual machine (VM) instance in the Cloud. Or three. Or maybe a few dozen. Only you’re not just provisioning VMs. You also provision some dynamic storage. Maybe some Cloud-based queues. You also want some SaaS-based services.

And your software release cycles? Weekly. No, daily. How about hourly?

Now what?

Clearly, it’s impractical to set up your Cloud instances manually, the way we used to set up servers in the good old days. So you go through the process once and create an image file that represents your Platonic ideal of what a fully configured VM instance should look like. Now, every time you need to provision a new VM instance, simply reconstitute the image. Right?

Not so fast! There are numerous gotchas to this scenario. Every time you need to patch anything, you have to create a new image. If different VM instances are meant to differ in any way – say, contain different application data – you have to configure those differences manually. But most significantly, there is far more to your Cloud environment than single VM instances. What about the storage? Databases? Network configuration? What about the architecture?

The Basics of Automated Provisioning

Remember, “automated” means “not manual,” in the sense that hands are not allowed. You want the ability to deploy, update, and repair your entire application infrastructure using nothing but pre-defined, automated procedures. Ideally, you want to provision your entire environment automatically, from bare metal (hardware with no operating system, or anything else, installed on it) all the way up to running business services, working entirely from a pre-defined specification that includes the network configuration. Furthermore, there should be no direct management of individual boxes. You want to manage the entire Cloud deployment as a single unit.

Deploying sophisticated provisioning tools, of course, is a large part of the secret. And the more sophisticated the tools, the less skilled your staff has to be. Ideally, anyone who has the appropriate permissions and knows a few basic commands should be able to deploy any release to any integrated development, test, or production environment. They require only minimal domain-specific knowledge. You don’t need a senior sysadmin. You don’t even need a systems developer. Any junior techie should be able to handle the task.

If something goes wrong, you should be able to revert to a “previously known good” state at any time. In a mature Cloud environment, it’s always easier to reprovision than it is to repair. Reprovisioning could mean an automated cycle of validating and regenerating application and system configurations, or even rerunning the full provisioning cycle from the base OS up to running business applications.

In many cases, of course, the previously known good state isn’t good enough, typically because there is live data in the real-time state that would be lost with this kind of rollback. As a result, such rollbacks must be handled carefully, as they really aren’t rollbacks in the sense of a two-phase commit. Instead, with fully automated provisioning, the provisioning system should be able to “roll forward to a previous version,” where the provisioning tools automatically return your applications to a functionally acceptable state, with all your data intact.
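To make the “roll forward to a previous version” idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular provisioning tool works: the Environment record, the release labels, and the provision and roll_forward_to functions are all hypothetical names invented for this example. The point it shows is that returning to a known good release is just another provisioning run, so the generation counter keeps moving forward and the live data is carried along rather than restored from an old snapshot.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in for a provisioned environment."""
    release: str      # identifier of the spec that was provisioned
    generation: int   # increases with every provisioning run
    data: tuple       # stand-in for live application data


def provision(env: Environment, release: str) -> Environment:
    """Provision `release` as a new generation, keeping live data as-is."""
    return replace(env, release=release, generation=env.generation + 1)


def roll_forward_to(env: Environment, known_good: str) -> Environment:
    """Return to a known good release by provisioning it again.

    Nothing is rewound: this is a fresh provisioning run of an older
    spec, so data written since that release originally ran survives.
    """
    return provision(env, known_good)


if __name__ == "__main__":
    env = Environment(release="r41", generation=1, data=("order-1",))
    env = provision(env, "r42")                       # new release goes live
    env = replace(env, data=env.data + ("order-2",))  # live data arrives
    env = roll_forward_to(env, "r41")                 # r42 turned out to be bad
    print(env)
```

A real system would also have to reconcile data written under the newer release with what the older release expects, which this sketch deliberately leaves out.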

Automated provisioning depends upon the environment specification. This spec is essentially a declarative representation of how you want your entire deployment to look. Your provisioning tools then execute the spec, starting with bare metal and possibly stock virtual machine images, and automatically deploy, configure, and start up the entire system or the application stack (or both), with no runtime decisions or tweaking by an operator. The spec should also contain sufficient detail to direct the appropriate tools to test whether the automation is implemented correctly, and if it isn’t, to take the appropriate action.
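As a rough illustration of what “executing the spec” can mean, the Python sketch below compares a declarative spec with the observed state of an environment, derives the actions needed to close the gap, and then checks the result. Everything in it is an assumption made for the example: the Resource record, the resource names, and the plan, apply_actions, and verify functions are hypothetical, and a real tool would call a Cloud provider’s API where this sketch only updates a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One entry in a hypothetical environment spec."""
    name: str
    kind: str    # e.g. "vm", "storage", "queue"
    count: int   # how many instances the spec asks for


def plan(spec: dict, actual: dict) -> list:
    """Return the actions that would make `actual` match `spec`.

    `spec` maps names to Resource entries; `actual` maps names to the
    number of instances currently running.
    """
    actions = []
    for name, res in sorted(spec.items()):
        have = actual.get(name, 0)
        if have < res.count:
            actions.append(("create", name, res.count - have))
        elif have > res.count:
            actions.append(("destroy", name, have - res.count))
    # Anything running that the spec does not mention is removed.
    for name in sorted(set(actual) - set(spec)):
        actions.append(("destroy", name, actual[name]))
    return actions


def apply_actions(actions: list, actual: dict) -> dict:
    """Simulate carrying out the actions; no operator decisions involved."""
    result = dict(actual)
    for verb, name, n in actions:
        delta = n if verb == "create" else -n
        result[name] = result.get(name, 0) + delta
        if result[name] == 0:
            del result[name]
    return result


def verify(spec: dict, actual: dict) -> bool:
    """The environment is correct when nothing is left to do."""
    return plan(spec, actual) == []


if __name__ == "__main__":
    spec = {
        "web": Resource("web", "vm", 3),
        "jobs": Resource("jobs", "queue", 1),
    }
    running = {"web": 1, "legacy": 2}
    steps = plan(spec, running)
    running = apply_actions(steps, running)
    print(steps, running, verify(spec, running))
```

The operator never says how to get from one state to the other. The spec states the desired result, and the same comparison that produces the plan doubles as the check that the automation did its job.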

This specification can be as sophisticated as your tools and your architecture allow it to be. It may vary from release to release, and you should be able to break it down for specific tools that handle different parts of the configuration. The spec may also have conditional logic, and can also specify deployment or configuration changes over time, for example, the instruction to provision additional instances when traffic numbers cross a threshold.
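The threshold example above might look something like the following Python sketch. It is a hedged illustration only: the ScalingRule record, its field names, the metric name, and the desired_instances function are hypothetical and do not correspond to any specific tool’s format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical conditional rule carried inside the spec."""
    metric: str          # name of the traffic metric to watch
    threshold: float     # scale out when the metric exceeds this value
    step: int            # how many instances to add each time
    max_instances: int   # upper bound, so scaling cannot run away


def desired_instances(current: int, metrics: dict, rule: ScalingRule) -> int:
    """Return the instance count the spec calls for right now."""
    if metrics.get(rule.metric, 0.0) > rule.threshold:
        return min(current + rule.step, rule.max_instances)
    return current


if __name__ == "__main__":
    rule = ScalingRule("requests_per_second", 1000.0, 2, 10)
    print(desired_instances(4, {"requests_per_second": 1500.0}, rule))
    print(desired_instances(4, {"requests_per_second": 500.0}, rule))
```

Because the rule lives in the spec rather than in someone’s head, the decision to add capacity is made the same way every time, and the provisioning tools can act on it without an operator in the loop.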

You may also want to handle the automatic configuration of the application stack separately from the configuration of the system stack, as your applications may change more frequently than the systems. The goal is to make the spec sufficiently sophisticated so that the automation itself doesn’t vary from release to release. It will only require updates when your requirements call for a significant architectural change.

The View from Above and Below the Clouds

There are fundamentally two sides to this story: the view from the perspective of the Cloud service provider (including the internal providers of private Clouds), vs. the view from the consumer of Cloud-based resources. Clearly, Amazon, Microsoft, and the other public Cloud providers have figured out how to automate the configuration of their public Cloud environments. For organizations building their own private Clouds, the challenge is to take a page out of the public service providers’ playbooks on how to run a Cloud environment. Bottom line: if you don’t get automated configuration management down pat, you’re not running a private Cloud at all. You simply have a traditional data center with some Cloud-like features – and furthermore, you have a data center that is more expensive to run than necessary.

If you’re in a position to consume Cloud resources, regardless of the Cloud deployment model, then automated provisioning is every bit as important as it is for Cloud service providers, only now it impacts your existing IT processes and policies. As organizations adopt the Cloud, they increasingly transform the role of operations. No longer does your ops team actually take care of servers, networks, and applications. Instead, you’re automating that work, shifting the required expertise to the development team, which must now create and manage the automation scripts that form the specification. Or perhaps the ops team moves their cubicles to the dev area, working hand-in-hand with developers to handle those scripts. Either way, Cloud changes everything in the IT department.

The Realization of the DevOps Vision

Reworking the relationship between dev and ops, or DevOps, is nothing new, of course. According to Wikipedia, “DevOps is an emerging set of principles, methods and practices for communication, collaboration and integration between software development (application/software engineering) and IT operations (systems administration/infrastructure) professionals. It has developed in response to the emerging understanding of the interdependence and importance of both the development and operations disciplines in meeting an organization’s goal of rapidly producing software products and services.” While ZapThink hasn’t discussed DevOps by name up to this point, we have been calling for iterative, full-lifecycle governance for several years now – an essential enabler of success with SOA in particular and agile architectures in general.

With the rise of Cloud Computing, DevOps is entering what might be its “golden age.” As Cloud provisioning specifications become more sophisticated, creating them becomes more of a development task than an operational one. Ops doesn’t go away, of course, but it moves to the other side of the Cloud: supporting Cloud data centers. In other words, if you have a private Cloud, your ops team is responsible for managing the private Cloud infrastructure. And yes, if you use a public Cloud, well, you have the luxury of outsourcing operations to your Cloud provider. Good sysadmins need not worry, of course. If anything, demand for your skills is only increasing with the move to the Cloud.

The ZapThink Take

First there was software development. Write a bunch of code and run it on a computer – “the computer is the computer.”

Then there was systems development. Write a bunch of code and put it on a bunch of computers, and have them serve up bits of it to many more computers – “the network is the computer.”

Now we’re at the dawn of Cloud development: create sophisticated Cloud provisioning/deployment/management specifications, and run those in the Cloud. Yes, the Cloud itself becomes the computer. We’re not talking IaaS, PaaS, or SaaS here. Even those oh-so-2011 Cloud service models are only elements of the spec, for automated provisioning tools to provision and configure dynamically.

We’re not there yet, of course – but there are a number of increasingly sophisticated automated provisioning tools on the market today, and an increasing number of organizations are leveraging them to take full advantage of the Cloud. Want to learn more? ZapThink covers automated provisioning, including a broad discussion of available tools, in our Enterprise Cloud Computing and Architecting for the Cloud courses. We’re running them in Singapore February 23-24, Sydney February 27-28, Melbourne Australia February 29-March 1, Delhi March 5, Mumbai March 6, Hyderabad March 7, Bengaluru March 8, and London March 15-16. Be prepared for a comprehensive, vendor-independent, architecture-focused fire hose of everything Cloud. There is no way to get this material from anyone but ZapThink. Classes are filling up so register now. We hope to see you at one of the courses soon!

Image source: Horia Varlan