The Secret to a RESTful Cloud

If you’ve been following ZapThink for the last few years, you know we’re talking less about SOA and more about REST and Cloud. Not that there’s anything wrong with SOA; we’re simply focusing on the current challenges organizations face when building agile architectures. So it should come as no surprise that we’re finally writing a ZapFlash on RESTful Clouds.

You might think the story we have to tell about RESTful Clouds has to do with RESTful APIs to the Cloud. Sure, we want to be able to access Cloud resources as well as Cloud management capabilities via RESTful interfaces. It makes so much sense that the Apache Software Foundation has already been working on it in its Deltacloud initiative, to name one of many such efforts.

Deltacloud and its brethren are promising, to be sure, but that’s not what we’re talking about here. RESTful APIs to the Cloud miss the point of REST, and they don’t address the core challenge of architecting for the Cloud. REST and Cloud architecture are two trends ZapThink has been exploring over the last few months, and in this ZapFlash, it’s finally time to tie them together into a neat holiday bow:

Trend #1: We explored how the elasticity property of Clouds impacts how we architect applications for the Cloud. We even called this aspect of Cloud architecture a “blind spot” in the ZapFlash Architecting Beyond Cloud Computing’s Horseless Carriage.

Trend #2: In the ZapFlash The Right End of REST, we discussed how REST really isn’t about interfaces and APIs; it’s about distributed hypermedia applications.

The story of RESTful Clouds, therefore, isn’t about RESTful APIs to the Cloud at all, useful though they may be. Instead, the fact that the REST architectural style focuses on hypermedia applications addresses one of the knottiest challenges of architecting for the Cloud: how we deal with application state.

The Challenge of Application State in the Cloud

The traditional approach to maintaining application state for any distributed application is to use some kind of stateful object on the server. A stateful object contains data that maintain the context of that application for a particular user across conversations consisting of multiple calls to a specific instance of that object. In essence, stateful objects keep track of what individual users are doing when they use an application.

A familiar example of a stateful object is the traditional shopping cart. Every shopping cart instance belongs to an individual customer, so if you had ten thousand customers shopping at the same time, you would have ten thousand shopping cart instances. Not only do all these carts present an obvious scalability challenge, but if you need to update a given customer’s shopping cart, you must first locate the correct shopping cart instance for that customer. No other shopping cart will do.
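To make the affinity problem concrete, here is a minimal sketch in Python. StatefulServer and ShoppingCart are hypothetical names for illustration, not any particular framework’s API: each server instance keeps its carts in its own memory, so a call that lands on the wrong instance simply fails.

```python
class ShoppingCart:
    """Stateful object: holds one customer's items across many calls."""
    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.items = {}

    def add(self, sku, qty=1):
        self.items[sku] = self.items.get(sku, 0) + qty


class StatefulServer:
    """Hypothetical server instance; its carts live only in this process."""
    def __init__(self, name):
        self.name = name
        self.carts = {}  # customer_id -> ShoppingCart

    def start_session(self, customer_id):
        self.carts[customer_id] = ShoppingCart(customer_id)

    def add_item(self, customer_id, sku, qty=1):
        # Only the instance that created the cart can serve this call.
        cart = self.carts.get(customer_id)
        if cart is None:
            raise LookupError(f"{self.name} has no cart for {customer_id}")
        cart.add(sku, qty)
        return dict(cart.items)


server_a, server_b = StatefulServer("a"), StatefulServer("b")
server_a.start_session("alice")
print(server_a.add_item("alice", "book-123"))  # {'book-123': 1}
try:
    server_b.add_item("alice", "book-123")      # routed to the wrong instance
except LookupError as err:
    print(err)                                  # b has no cart for alice
```

A load balancer in front of such servers has to pin each customer to one instance, which is exactly the constraint that fights elasticity.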

Furthermore, remember that in the Cloud, you must plan for and expect failure. If you are dependent on a single instance of a shopping cart, and the resources that support that cart crash, then you are faced with another challenge. Hopefully you’ve been saving the cart’s state somewhere (and of course, the state for every other cart for every other customer), so that you can reconstitute the failed cart elsewhere. Perhaps the worst that would happen would be that the customer would have to start their shopping over, but in other situations, the failure of a stateful instance can leave the customer in an inconsistent state. That’s a surefire recipe for losing customers.
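The “save the cart’s state somewhere” workaround might look like the following sketch. The checkpoint_store dict is a hypothetical stand-in for whatever durable storage you would actually use. Note the cost: every change to every cart triggers a write, just so a failed cart can be rebuilt on another instance.

```python
import json

# Hypothetical durable store; here just a dict of JSON strings.
checkpoint_store = {}


class ShoppingCart:
    def __init__(self, customer_id, items=None):
        self.customer_id = customer_id
        self.items = dict(items or {})

    def add(self, sku, qty=1):
        self.items[sku] = self.items.get(sku, 0) + qty
        self.checkpoint()  # a write on every change, for every cart

    def checkpoint(self):
        checkpoint_store[self.customer_id] = json.dumps(self.items)

    @classmethod
    def reconstitute(cls, customer_id):
        # Rebuild the cart elsewhere after the original instance fails.
        saved = checkpoint_store.get(customer_id)
        return cls(customer_id, json.loads(saved) if saved else {})


cart = ShoppingCart("alice")
cart.add("book-123", 2)
del cart  # simulate losing the instance that held the cart
recovered = ShoppingCart.reconstitute("alice")
print(recovered.items)  # {'book-123': 2}
```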

Stateless objects, on the other hand, don’t retain any information or context between calls to that object. In other words, each call stands alone and doesn’t rely upon prior calls as part of an ongoing conversation. You could call one instance of a stateless object, then make a call on a different instance of the same object, and you wouldn’t be able to tell the difference.

If an object is stateless, therefore, the Cloud is free to use any instance of that object to get its work done. You no longer have to worry about contention for a single instance of an object, a situation that could lead to a variety of distributed computing challenges including race conditions, deadlocks, and starvation. Instead, you can simply rely upon the elasticity of the Cloud to add more instances as needed.
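Here is a minimal sketch of the stateless alternative, assuming the caller passes the current cart contents in with each call (StatelessInstance and the naive round-robin pool are illustrative, not a real load balancer). Because no instance remembers anything, any instance can serve any call, and scaling out is just creating more instances.

```python
import itertools


def add_item(cart_items, sku, qty=1):
    """Stateless operation: everything it needs arrives in the call."""
    updated = dict(cart_items)  # never mutate the caller's data
    updated[sku] = updated.get(sku, 0) + qty
    return updated


class StatelessInstance:
    def __init__(self, name):
        self.name = name  # no per-customer fields at all

    def handle(self, cart_items, sku, qty=1):
        return add_item(cart_items, sku, qty)


# Naive round-robin over three interchangeable instances.
pool = itertools.cycle([StatelessInstance(n) for n in ("a", "b", "c")])

cart = {}
for sku in ("book-123", "book-123", "mug-9"):
    cart = next(pool).handle(cart, sku)  # a different instance each time
print(cart)  # {'book-123': 2, 'mug-9': 1}
```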

It is an important best practice, therefore, that all application logic in the Cloud should be stateless. No stateful object instances, no stateful session beans, no server-side session state keyed to cookies. If the load on your application spikes, the Cloud should respond elastically by provisioning adequate resources to meet the need. The more stateless your application is, the better able the Cloud will be to achieve this seamless elasticity.

HATEOAS to the Rescue

Elastic, stateless shopping carts are all well and good, but what about my shopping cart? I just put my holiday shopping in there. Of course it has to be stateful!

Here’s where our discussion of state gets a bit murky. There are actually two different types of state at play here: application state and resource state. In the case of the shopping cart, the application state consists of what happens to be in individual carts as customers work through the purchasing process, and where in the process they are at any point in time. The resource state includes information the application must access or update, including the customer mailing address, credit card information, and the actual purchase transaction. In other words, the application state is dynamic and the resource state is persistent.
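One way to picture the two kinds of state is as two separate data structures. The field names and step names below are hypothetical; the point is that resource state is persistent and lives in the persistence tier, while application state is a dynamic, throwaway value that only becomes resource state when the purchase completes.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceState:
    """Persistent: lives in the persistence tier."""
    customer_id: str
    mailing_address: str
    card_token: str  # hypothetical stand-in for credit card details
    purchases: list = field(default_factory=list)


@dataclass(frozen=True)
class ApplicationState:
    """Dynamic: what is in the cart and where the customer is in the process."""
    items: tuple  # e.g. (("book-123", 2),)
    step: str     # e.g. "cart", "shipping", "payment"


def complete_purchase(resource, app):
    # Only at the end of the process does application state touch resource state.
    if app.step != "payment":
        raise ValueError("purchase can only complete from the payment step")
    resource.purchases.append(dict(app.items))
    return resource


alice = ResourceState("alice", "123 Main St", "tok-abc")
in_progress = ApplicationState(items=(("book-123", 2),), step="payment")
complete_purchase(alice, in_progress)
print(alice.purchases)  # [{'book-123': 2}]
```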

When you deal with resource state in the Cloud, therefore, you’re working at the persistence tier, where traditional approaches to scalability like database sharding and traditional approaches to reliability like replication and caching work reasonably well. There are limitations on Cloud persistence, though: don’t expect two-phase-commit levels of consistency, because the Cloud’s inherent emphasis on availability and partition tolerance prevents it from guaranteeing immediate data consistency (the trade-off the CAP theorem describes). Data within a particular Cloud instance, however, are still internally consistent.
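For readers who haven’t seen sharding up close, here is a minimal hash-based sketch. The shard names are hypothetical. Each customer’s resource state maps deterministically to one database, so load spreads across shards without any shared lookup table.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical shard names


def shard_for(customer_id: str) -> str:
    """Map a customer's resource state to exactly one shard, deterministically."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


print(shard_for("alice"))  # always the same shard for "alice"
```

Simple modulo hashing remaps most keys when the number of shards changes; consistent hashing is the usual refinement when shards come and go.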

Application state is a different matter. Treating application state as though it were resource state—writing your application state to your database every time a customer does anything with their shopping cart—limits your scalability, elasticity, and reliability. Don’t go there if you can avoid it. Instead, you want hypermedia to be the engine of application state. In other words, your stateless application instance must give the client everything it needs to know in order to work its way through the purchasing process, and the client maintains the application state for the entire process. You don’t need to spawn a stateful shopping cart instance on the server every time a customer hits your application, since after all, the more users you have, the more clients you have. Why not let the client do the work?
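Here is a minimal sketch of hypermedia as the engine of application state, under these assumptions: the /checkout URI, the step names, and the sku:qty encoding are all hypothetical. The handler reads the application state (cart contents and current step) from the request URI, keeps nothing, and returns a representation whose links carry that state forward.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical purchasing process: which step follows which.
NEXT_STEP = {"cart": "shipping", "shipping": "payment", "payment": "confirm"}


def encode_items(items):
    return ",".join(f"{sku}:{qty}" for sku, qty in sorted(items.items()))


def decode_items(text):
    if not text:
        return {}
    return {sku: int(qty) for sku, qty in (p.split(":") for p in text.split(","))}


def handle(url):
    """Stateless handler: reads application state from the URL, keeps none."""
    query = parse_qs(urlsplit(url).query)
    step = query.get("step", ["cart"])[0]
    items = decode_items(query.get("items", [""])[0])
    links = {}
    if step in NEXT_STEP:
        nxt = NEXT_STEP[step]
        # The link itself carries the application state to the next step.
        links["next"] = "/checkout?" + urlencode({"step": nxt, "items": encode_items(items)})
    return {"step": step, "items": items, "links": links}


rep = handle("/checkout?step=cart&items=book-123:2")
print(rep["links"]["next"])  # /checkout?step=shipping&items=book-123%3A2
print(handle(rep["links"]["next"])["items"])  # {'book-123': 2}
```

Packing state into query parameters is only one option; the same idea works with state in a request body or in links the client stores. What matters is that the client, not the server, holds it between requests.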

A RESTful Shopping Cart

To explain how RESTful shopping carts might work in the Cloud, we must set up two separate RESTful interactions. The first is between the application tier and the persistence tier. The application tier serves as the client in this case, requesting a representation of the customer’s shopping cart instance from the resource on the persistence tier. This application tier client is stateless; the representation has all the necessary information about the customer as well as the process logic for the purchasing processes that you want the customer to follow.
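The first interaction might be sketched as follows. The in-memory RESOURCES dict stands in for the persistence tier, and the URI and JSON field names are hypothetical. The application-tier function fetches a fresh representation on every request and follows the process logic declared in it, rather than hard-coding the process or caching anything between calls.

```python
import json

# Stand-in for the persistence tier: URI -> JSON representation.
RESOURCES = {
    "/customers/alice/cart": json.dumps({
        "customer": {"id": "alice", "ship_to": "123 Main St"},
        "items": {"book-123": 1},
        # Process logic shipped declaratively with the representation.
        "process": {"cart": "shipping", "shipping": "payment", "payment": "confirm"},
    })
}


def get(uri):
    """Hypothetical GET against the persistence tier."""
    return json.loads(RESOURCES[uri])


def next_step(uri, current_step):
    # Stateless application-tier code: fetch what is needed on each request,
    # then follow the declared process instead of remembering anything.
    rep = get(uri)
    return rep["process"].get(current_step)


print(next_step("/customers/alice/cart", "cart"))  # shipping
```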

The second interaction is between the customer’s client and the application tier, which now serves as the server for this interaction. The customer follows links in the representations that the resources on the server return, and thus the client executes the purchasing process as per the customer’s requirements. But the code on the application tier is still completely stateless; it is in charge of following the instructions provided by the persistence tier in a declarative manner.
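The second interaction, seen from the customer’s side, can be sketched like this. The step names and URIs are hypothetical. The server is a pure function from request to representation; the only place the customer’s current position lives is the client, which knows link relations (here just "next") rather than URI structure.

```python
PROCESS = ["cart", "shipping", "payment", "confirm"]  # hypothetical steps


def server(uri):
    """Stateless application tier: the URI says which step is requested."""
    step = uri.rsplit("/", 1)[-1]
    i = PROCESS.index(step)
    links = {"next": f"/checkout/{PROCESS[i + 1]}"} if i + 1 < len(PROCESS) else {}
    return {"step": step, "links": links}


class Client:
    def __init__(self, entry_uri):
        self.current = server(entry_uri)  # application state lives here

    def follow(self, rel):
        # Follow a link from the last representation; no URI building.
        self.current = server(self.current["links"][rel])
        return self.current["step"]


client = Client("/checkout/cart")
visited = [client.current["step"]]
while "next" in client.current["links"]:
    visited.append(client.follow("next"))
print(visited)  # ['cart', 'shipping', 'payment', 'confirm']
```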

If the application instance on the application tier crashes, the Cloud environment automatically spawns a replacement. When the customer clicks a link, that replacement knows to repopulate the customer’s shopping cart representation based upon the information in the GET, POST or other operation on that link’s URI, and furthermore, knows where the customer was in their purchasing process based on the information in the operation on the URI as well. In other words, the client runs the shopping cart application by enabling the customer to follow links, and those requests tell the Cloud environment everything it needs to know to meet the customer’s needs, without maintaining any application state of its own. That’s HATEOAS in action.
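The failover story reduces to this: if the handler holds no state, a replacement instance given the same request produces the same result. A minimal sketch, assuming (hypothetically) that the link URI encodes the current step and cart contents as query parameters:

```python
from urllib.parse import parse_qs, urlsplit


class AppInstance:
    """Stateless: no attribute describes any customer or process position."""

    def handle(self, uri):
        query = parse_qs(urlsplit(uri).query)
        pairs = (p.split(":") for p in query["items"][0].split(","))
        return {"step": query["step"][0], "items": {k: int(v) for k, v in pairs}}


uri = "/checkout?step=payment&items=book-123:2,mug-9:1"
original = AppInstance()
before = original.handle(uri)
del original                 # simulate the instance crashing
replacement = AppInstance()  # the Cloud spawns a fresh one
after = replacement.handle(uri)
print(before == after)       # True
```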

A RESTful Cloud in Action

A great example of RESTful Cloud principles in action is the Amazon.com shopping cart. Try this experiment: log into your Amazon account simultaneously from two different browsers. Add an item to your cart from one browser. Reload the Amazon home page on the other: note that the number of items in your cart went up by one. Why? Because you haven’t actually begun the purchasing process yet. The shopping cart count on the Amazon home page is part of your resource state.

Next, proceed on one browser as though you were purchasing the item. In the middle of the process, change the quantity of the item you’re trying to purchase from one to two. Again, reload the Amazon home page from the other browser: this time the cart count doesn’t change. You have two items in your cart according to one browser and one item according to the other, even though you only have one shopping cart.

What’s going on here? Amazon’s persistence tier handed your purchase process off to a Cloud instance, and your first browser is maintaining application state for that instance. The Cloud instance can therefore be entirely stateless, which enables Amazon to maximize the elasticity of their environment. If Amazon does it that way, then so should you.

The ZapThink Take

Architecting your Cloud-based app so that all Cloud-based code is stateless is essential for implementing rapid elasticity, whether that code runs in virtual machine instances, in application packages on PaaS environments, or in SaaS applications. HATEOAS, in turn, is essential for handling application state in a way that enables server resources to remain stateless. Therefore, REST is essential for rapid elasticity in the Cloud.

I fear, however, that many RESTafarians won’t fully understand this important conclusion, because so many of them focus on RESTful interfaces and can’t see the proverbial forest for the trees. They take a developer’s perspective rather than an architect’s. From the architect’s perspective, REST is no more or less than an architectural style for building distributed hypermedia applications. And you’re not doing REST unless you follow HATEOAS.