Should Services be Stateful?
A while back I saw the movie The Terminal, in which a character played by Tom Hanks is forced to live in an airport because he is a “stateless citizen”: his citizenship goes unrecognized after his home country collapses while he is en route to the US. Of course, I thought that this was just a comic fantasy created by script writers who needed some new plot device to capture audience attention. But in fact, the movie is based in part on the real-life story of Mehran Karimi Nasseri, a man who spent almost twenty years in Charles de Gaulle Airport in Paris because of immigration problems with France and his unwillingness to return to his homeland of Iran. Now, this doesn’t have anything to do with Service-orientation per se, but it reminded me of a question that someone asked at a recent conference on SOA: should Services be stateless?
Of all the core principles of Service-Oriented Architecture (SOA), perhaps the most fundamental is the loose coupling between Service providers and consumers. The core agility and reuse benefits of SOA depend upon the ability of Service providers and consumers to operate independently of each other within the context of the Service contract that specifies such interactions. Furthermore, it’s essential to proper SOA that we don’t impose unnecessary requirements or restrictions on the behavior of Service providers and consumers, or on their ability to change over time.
Essential to this requirement of loose coupling is the principle of statelessness. A Service, after all, exposes an interface to software that communicates by sending and receiving messages. There is no notion of a Service instance analogous to object instances; instead, the Service simply stays put, exchanging messages as per its contract. More specifically, since SOA involves collaborations of independent entities through Service composition, we are only dealing with passing messages and data from one Service to the next. These loosely-coupled, heterogeneous, and composite applications require that all Services should be stateless such that they don’t impose their knowledge of a system or process unnecessarily on other parties, thus tightening their level of coupling, reducing their ability for reuse, and limiting broad applicability.
However, the primary reason for composing Services is to implement business processes, and processes must be stateful, since after all, one instance of a process may be in a different state than another. The challenge of how best to maintain process state in an environment of stateless Services becomes a critical issue for architects planning and implementing SOA. Implement process state improperly, and the loose coupling benefits of stateless Services, and hence the SOA itself, are at risk.
Maintaining State in SOA
Wikipedia defines state this way: a state stores some information about the past, such as a customer record or purchase transaction, and in general reflects the total sum of all the changes a system has been through from inception to present. On top of this idea of state, many of the other concepts of computing follow: transitions from one state to another and actions that describe the conditions for those transitions to occur. State is especially important to processes because without it, we in effect have a system with no history or record. Indeed, most of the systems that we have developed are either responsible for storing state in some form or another or acting upon that state. Furthermore, companies depend on state in order to back out of transactions and situations that result in some error. In essence, without state, we can’t accomplish any sort of reliable transaction because there would be no way to undo the things we have done.
So, given that state is so core to the way that IT systems operate, wouldn’t it make sense that the representation or maintenance of state would have some role within an SOA? The key to answering that question is understanding whether state is something in a loosely coupled environment that should be shared or isolated within a Service implementation.
Does Loose Coupling Require Statelessness?
What makes a Service-oriented Architecture effective is its loosely coupled nature. That is, an SOA facilitates agility and enables reuse inasmuch as each Service operates independently of the others, so that one Service’s assumptions don’t impose unnecessary requirements or restrictions on the behavior of Service consumers or on their ability to change. In that light, it would seem to make sense that any notion of state for a particular Service activity should be maintained internally within the Service and not exposed to Service consumers, lest changes need to be made to the state, thus requiring changes to all Service consumers. More specifically, since SOA involves collaborations of independent entities through Service composition, we are only dealing with passing messages and data from one Service to the next, and as a result, the side effect of this sort of interaction is a system in which all Services maintain their own state.
Yet even in the context of SOA, we must maintain state outside of each independent Service implementation in order to support long-running processes that span organizational or company boundaries. In addition, there must be some representation of state for most security, governance, and quality processes, which maintain some context across multiple Services.
Maintaining Stateful Systems with Stateless Services
The stateless Web presented the very same problem: Web sites had to maintain some notion of a session across individual Web queries, each of which maintained no state on its own. There were basically two approaches to maintaining state on the Web: store a cookie on the browser that maintained state across multiple interactions, or track state somehow on the server side, using ordinary web protocol exchanges for maintenance of a session.
Cookies only worked to maintain state on the Web because they were a feature of HTTP, the underlying transport protocol of the Web, and every browser by definition supported HTTP. In the case of Services, however, we’re allowing for arbitrary system-to-system communication, with no expectation that consumers will all be browsers, or in general, support any particular protocol. That leaves the message itself as the only place we can maintain state in the context of SOA.
In essence, it’s possible to maintain state across multiple Service invocations by applying some sort of persistent token to each Service message. This token would represent a persistent state across multiple Service interactions, and well-behaved Services will simply pass on that token to other Services in a composition without needlessly modifying or removing it, as per the contracts that govern those Services. In this manner, individual Services remain stateless, but the messages can maintain the state that particular Service compositions require.
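The token-passing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Message type, the process_token field, and the two Services (credit_check_service and shipping_service) are hypothetical names, not part of any SOA standard or product. Each Service is a pure function of its incoming message and copies the token through unchanged.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class Message:
    # Hypothetical persistent token that identifies the process instance.
    process_token: str
    body: Dict[str, Any] = field(default_factory=dict)


def credit_check_service(msg: Message) -> Message:
    # Stateless: the result depends only on the incoming message.
    approved = msg.body["amount"] <= 1000
    # replace() copies the message, so the token passes through untouched.
    return replace(msg, body={**msg.body, "credit_ok": approved})


def shipping_service(msg: Message) -> Message:
    # Also stateless: reads what the previous Service put in the message.
    status = "scheduled" if msg.body.get("credit_ok") else "held"
    return replace(msg, body={**msg.body, "shipping": status})


# Compose the two Services; the message carries the state between them.
request = Message(process_token="order-42", body={"amount": 250})
result = shipping_service(credit_check_service(request))
```

Neither function keeps anything between calls, so either one could be replaced or relocated without the other noticing, as long as the message contract holds.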
This message-based approach to maintaining state raises the question of how to manage the processes that the Service compositions represent. Traditional Business Process Management (BPM) tools utilize a runtime composition engine that maintains state on behalf of all running processes. The advantage to this approach is that it provides visibility into running processes and maintains state across relevant Services.
However, this approach has some critical issues. First, a central process execution environment can only maintain state for Services and compositions that the server has visibility into. Once a Service request passes outside the boundaries of the system, the process tool can no longer control the process. Second, the robustness of the processes depends upon the robustness of the process tool: if the tool crashes and loses state information, then there is no way to recover process instances in progress. But perhaps most significantly, a centralized process execution environment reduces loose coupling, because all Service providers and consumers must defer control over the processes to the centralized tool.
Furthermore, what we’ve done here is separate traditional BPM from Service-oriented BPM based on the fundamental issue of state management. One way of understanding the difference is to try the “unplug the server” thought exercise. What happens to running process instances when a centralized BPM tool goes down? Since the centralized tool controlled all process logic, including state logic, bringing down the tool typically hoses all running processes, often in an unrecoverable way.
The answer to these problems is to maintain process state essentially in a Service-oriented manner. In other words, offer state management via contracted Services whose purpose is to maintain state for process instances. In essence, this approach uses messages as events that the state maintenance Services can audit, log, and later analyze to determine some given state. This approach basically considers state to be an implicit side effect of a running system rather than something that a runtime process environment must persist. This event-driven, Service-oriented approach tracks all relevant events, and a separate set of Services analyze the event stream and perform specific actions based on process requirements, policies, and Service contracts.
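As a rough sketch of state as a side effect of the event stream, consider the following Python fragment. The StateManagementService class, its method names, and the event names are assumptions made for illustration only. The Service simply logs events keyed by process token and derives the current state from the log on demand, which is a deliberately simple rule (the most recent event wins).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# An event is (process_token, event_name); both are hypothetical conventions.
Event = Tuple[str, str]


class StateManagementService:
    """Records events and derives process state from the log on demand."""

    def __init__(self) -> None:
        self._log: Dict[str, List[str]] = defaultdict(list)

    def record(self, event: Event) -> None:
        # Audit/log step: append the event under its process token.
        token, name = event
        self._log[token].append(name)

    def history(self, token: str) -> List[str]:
        # Return a copy so callers cannot alter the log.
        return list(self._log.get(token, []))

    def current_state(self, token: str) -> str:
        # Analysis step: here the state is simply the latest event seen.
        events = self._log.get(token, [])
        return events[-1] if events else "not_started"


tracker = StateManagementService()
for event in [
    ("order-42", "order_received"),
    ("order-43", "order_received"),
    ("order-42", "credit_checked"),
]:
    tracker.record(event)
```

A real analysis Service would likely apply policies and contracts rather than take the last event, but the shape is the same: the participating Services emit messages, and state is computed from them rather than stored inside those Services.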
Now, what happens when a process management Service goes down? Since the messages that the Service providers and consumers are exchanging contain the persistent token that represents the process state, no process information is lost. Instead, messages to the process management Service should simply queue up, waiting for it to come back online. Once it does recover, it can continue executing the process logic where it left off, since the queued messages will tell it everything it needs to know about current process state.
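The recovery scenario can be illustrated with a toy example, assuming an in-memory queue stands in for whatever durable messaging the real system would use. The ProcessManager class and the token and step fields are hypothetical. The point is only that a freshly restarted manager, holding no memory of its own, can rebuild process state from the queued messages.

```python
from collections import deque
from typing import Deque, Dict

# While the process management Service is down, messages accumulate here.
# (A real deployment would need a durable queue; this deque is a stand-in.)
pending: Deque[Dict[str, str]] = deque()
pending.append({"token": "order-42", "step": "credit_checked"})
pending.append({"token": "order-43", "step": "credit_checked"})
pending.append({"token": "order-42", "step": "shipped"})


class ProcessManager:
    """Hypothetical process management Service with no state at startup."""

    def __init__(self) -> None:
        self.states: Dict[str, str] = {}

    def handle(self, msg: Dict[str, str]) -> None:
        # The token in the message says which process instance this
        # belongs to; the step says where that instance now stands.
        self.states[msg["token"]] = msg["step"]


# The Service comes back online and drains the queue in arrival order.
manager = ProcessManager()
while pending:
    manager.handle(pending.popleft())
```

After the queue is drained, the manager knows where each process instance stands even though it started with nothing.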
The ZapThink Take
From the architects’ perspective, it’s essential to think about state in a Service-oriented way. Services don’t maintain state unless they are specifically contracted to do so in the case when they are state management Services, and in that case, they are managing state for running processes that are external to the state management Service itself. In no instance do we have a Service manage its own state, because a Service consumer would have to know the internal state of the Service in order to determine whether to send a message to that Service. That situation would break the loose coupling and encapsulation principles of SOA.
Effective enterprise architects realize that, as with all issues relating to SOA, good design hinges on compromise. That is, while statelessness is a desired virtue of independent Services to achieve agility through loose coupling, statefulness is a desired virtue of business processes to achieve the business’s desired outcomes. The trick, therefore, is balancing and meeting both of these needs in a Service-oriented manner.