Semantic Integration: Loosely Coupling the Meaning of Data

ZapThink recently had a number of interesting conversations with vendors and end users representing a wide array of interests, implementations, and solution categories. Despite the differences in their approaches, they all expressed the same profound belief that “data” (the information we intend to understand and use), rather than “applications” (the systems that process that information), are the lifeblood of any organization. In many ways, they are right – information is the foundation of, and the reason for, IT. However, these two concepts – data and applications – are inextricably linked. Data by themselves are often inaccessible and unintelligible without the applications that process them, and applications serve no useful purpose without data. As we continue to delve into the transformative role that Web Services and Service-Oriented Architectures (SOAs) are playing in the enterprise, the question of how the nature of data will evolve continues to surface. Specifically, how will the fact that Web Services-based SOAs abstract the source of information change the way we deal with that information?

The unifying notion in all of ZapThink’s research is the concept that standards-based, loosely coupled, distributed computing will bring a sea change to the way that companies architect, build, deploy, and manage their IT infrastructures. Our fundamental assumption is that standards lower the barriers to interoperability by smoothing over the differences among data representations. Furthermore, loose coupling is a discipline that requires application providers to isolate their specific implementations from the consumers of the information their applications provide. Embedded in the concept of standards-based, loosely coupled computing is the idea that users won’t be required – or even able – to know whether the data they are consuming originated in a database, an enterprise application, a file system, another company, or anywhere else for that matter. In fact, in a Web Services-based SOA, the data users consume are entirely decoupled from the source of the data.

The promise of ubiquitously accessible data freed from their sources is intoxicating for companies as they struggle with integration challenges, and is a significant driver for XML, Web Services, and SOAs. However, since Web Services abstract the source of data, there is a pending collision among companies traditionally focused on data integration tasks (the Extract-Transform-Load [ETL] and Enterprise Information Integration [EII] markets), those focused on data transport and messaging tasks (the message bus, Enterprise Service Bus [ESB], and B2B integration markets), and those focused on integrating with application interfaces (the Enterprise Application Integration [EAI] market). Service-Oriented Integration (SOI) hides the differences among these integration approaches. As SOI removes the barriers to data and application integration, we are left with one significant challenge: semantic integration.

The Re-emergence of the Role of Data
Semantic integration is a challenge for two reasons: first, the representation of information and the information itself are often bound tightly together; and second, that information frequently lacks context. Developers often think not of the data themselves but rather of the structure of those data: schemas, data types, relational database constructs, file formats, and so forth – structures that pertain not to the information at hand, but rather to our assumptions about what the data should look like. In tightly coupled architectures, data structures are absolutely necessary, since they give systems a way of coping with the information they are fed.

However, in a standards-based, loosely coupled architecture, when the barriers to application integration are removed, these various data structure representations stop being helpful constructs and actually get in the way. How information is stored and represented interferes with the meaning of that information. To be more precise, the meaning of information and the structure of that information aren’t one and the same. For example, “August 7, 2003” is a date for sure, but whether it is stored as a string, date type, or integer shouldn’t matter. Yet developers often needlessly bind structure and meaning together. Furthermore, there isn’t enough context in the structure to understand whether the date is a birth date, the date this ZapFlash was written, or any other date.
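As a rough illustration of the “August 7, 2003” example, the sketch below recovers one calendar date from three storage choices and then attaches the context that the bare structure lacks. Everything named here is an assumption made for illustration: the `to_date` helper, the `DatedFact` type, the `role` labels, the YYYYMMDD integer encoding, and the English month-name string format are not taken from any standard or product.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DatedFact:
    """A date plus the context that the bare structure lacks (hypothetical type)."""
    value: date
    role: str  # hypothetical context label, e.g. "birth_date"


def to_date(raw) -> date:
    """Recover the same calendar date from three storage choices."""
    if isinstance(raw, date):
        return raw
    if isinstance(raw, int):
        # Assumption: integers encode the date as YYYYMMDD.
        return date(raw // 10000, raw // 100 % 100, raw % 100)
    if isinstance(raw, str):
        # Assumption: strings use English month names, as in "August 7, 2003".
        return datetime.strptime(raw, "%B %d, %Y").date()
    raise TypeError(f"unsupported representation: {type(raw).__name__}")


# Three structures, one meaning.
as_string = to_date("August 7, 2003")
as_integer = to_date(20030807)
as_date_type = to_date(date(2003, 8, 7))

# The same date value means different things once context is attached.
published = DatedFact(as_string, role="publication_date")
born = DatedFact(as_integer, role="birth_date")
```

The three parsed values compare equal, while `published` and `born` do not: the structure carries the value, and only the added context distinguishes a publication date from a birth date.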

Thus, when one developer’s assumption of a particular structure for some datum conflicts with another’s representation, you get an impedance mismatch – in other words, a data integration problem. In order for data to flow unimpeded in a Service-Oriented Architecture, Service providers must isolate requesters from the underlying data structure assumptions. The issue here is therefore one of loose coupling. While we might loosely couple application interfaces through the use of SOAs, if we deal with data the same way we always have – by imposing the data structures of Service providers on Service requesters – the result is every bit as tightly coupled as previous architectural approaches. To deliver on the promise of seamless data integration, we must go beyond loosely coupling the application interface and also provide loose coupling at the semantic level.

The Data View and Zipping up a Notch on the Integration Zipper
The practice of SOA includes different ways of looking at the problems that architecture solves, known as “views.” In the information view, an information architect focuses on the meaning of the information that moves through the company, who is responsible for it, and what people do with it. The work in this view includes identifying how information is created, transported, secured, stored, and destroyed. The data architect takes the data view, in which he or she focuses on the taxonomies that the company will use. The bulk of the work in this view consists of normalizing the various vocabularies across the enterprise, and understanding and delineating just what data the company wants to use. The end products of the activities in this view are the schemas and namespaces that the business processes and the business services they contain will reference.

So, here’s where data and Services fit together. In an SOA, the Service Model serves as the referee between the business requirements on the one hand, and the technical implementations on the other. Therefore, the Services represented in the Service Model must provide the layer of abstraction between the data representations on the one hand and how business-level Services consume them on the other. However, in today’s early SOA implementations, companies often implement static service definitions, which means that the Web Services’ interface contracts are set at design time. While UDDI and Service-Oriented Management provide the means for dynamic discovery of such Services, those Services are still essentially static.

In order to achieve the sort of semantic data integration we are seeking, we must implement dynamic service definitions. In essence, the definition of the Service interface must change based on the context of the Service requester. As a result, a Service can change its contract between invocations. For example, a requester should not need to know that a Service provider requires first names to be no longer than 40 characters. The contracted Service interface is supposed to provide that isolation. Service interfaces must therefore become much smarter. Instead of having to know ahead of time what specific data requirements a Service imposes, the Service requester should be able to dynamically discover a Service interface that can not only provide the needed functionality, but also understand the information payload.

To follow this “Just-in-time” integration style, and for Service requesters to be able to consume data in an SOA, the data must be decoupled from any specific technical assumption (such as a specific data schema or format) so that they can be accessed via discoverable, loosely coupled, dynamically bindable Services. This requirement doesn’t mean that the data shouldn’t have any structure at all; it just means that the Service interface hides the details of that structure from the user, and that the Service interface itself is dynamically created based on the context of the Service requester. Sound complicated? Well, it is. Fortunately, we have a bit of a head start: XML provides the technical means to isolate the specifics of a data format from the consumer of the data. Web Services in turn provide the means to discover and understand how to consume the data.
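One way to picture a Service interface that is created from the requester’s context while hiding the provider’s storage structure is the toy sketch below. It is only a sketch under stated assumptions: the `build_customer_service` factory, the `fields` and `date_style` context keys, and the sample record are all hypothetical, and a real SOA would express the contract in XML and Web Services descriptions rather than in an in-process Python function.

```python
from datetime import date

# Provider-side storage: a structure the requester never sees.
# Hypothetical record; the birth date is held as a YYYYMMDD integer.
_CUSTOMER_ROWS = {42: ("Pat", "Example", 19700102)}


def _render_date(yyyymmdd: int, style: str) -> str:
    """Render the stored integer in the representation the requester asked for."""
    d = date(yyyymmdd // 10000, yyyymmdd // 100 % 100, yyyymmdd % 100)
    if style == "iso":
        return d.isoformat()
    if style == "us":
        return f"{d.month}/{d.day}/{d.year}"
    raise ValueError(f"unknown date style: {style}")


def build_customer_service(context: dict):
    """Return a lookup function whose contract is shaped by the requester's context."""
    fields = context.get("fields", ["first_name", "last_name", "birth_date"])
    date_style = context.get("date_style", "iso")

    def get_customer(customer_id: int) -> dict:
        first, last, born = _CUSTOMER_ROWS[customer_id]
        full = {
            "first_name": first,
            "last_name": last,
            "birth_date": _render_date(born, date_style),
        }
        # Expose only the fields this requester's context calls for.
        return {name: full[name] for name in fields}

    return get_customer


# Two requesters with different contexts receive different interfaces
# over the same underlying data.
billing = build_customer_service({"fields": ["last_name", "birth_date"], "date_style": "us"})
reporting = build_customer_service({})

billing_view = billing(42)
reporting_view = reporting(42)
```

Neither requester learns that the provider stores the birth date as an integer; each sees only the fields and date representation that its own context asked for.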

However, while an SOA simplifies the aspect of integrating with data sources (our first challenge mentioned above), it does nothing to smooth out the semantic issues that occur when users with different contexts try to understand the data that flow through these Services. To do that, we must first take ourselves up a few notches on the Integration Zipper, as shown in the figure below:

ZapThink’s Integration Zipper is a visual metaphor that asserts that as we solve lower-level integration issues, we are faced with more challenging, higher-level integration issues that can only be tackled when the lower-level issues are resolved. Fortunately, solving lower-level integration issues makes solving the higher-level ones both easier and more cost-effective. In this case, once we solve the problems with exchanging data between applications – application integration – we must then solve the integration challenge of understanding those data – in other words, semantic integration.

Since we now know how to address the challenge of integrating with application interfaces by using Web Services and SOAs, we can begin to solve our semantic integration challenges by implementing Services that provide the basics of semantic integration such as data transformation, classification of data into categories, and encoding of unstructured data with metadata. We can therefore envision an IT universe where data are accessible through Services that in turn provide key pieces of functionality that resolve the semantic differences between systems.
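The three basic semantic functions named above (data transformation, classification of data into categories, and encoding of unstructured data with metadata) can be sketched as small, independent functions. This is a deliberately naive illustration, not a description of any product: the function names, the field mapping, and the keyword lists are all hypothetical.

```python
import re


def transform(record: dict, field_map: dict) -> dict:
    """Data transformation: rename one system's fields to a shared vocabulary."""
    return {field_map.get(key, key): value for key, value in record.items()}


# Hypothetical keyword lists standing in for a real taxonomy.
CATEGORIES = {
    "billing": ("invoice", "payment"),
    "support": ("error", "outage"),
}


def classify(text: str) -> str:
    """Classification: assign text to the first category whose keyword appears."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for category, keywords in CATEGORIES.items():
        if words & set(keywords):
            return category
    return "uncategorized"


def annotate(text: str) -> dict:
    """Metadata encoding: wrap unstructured text with descriptive metadata."""
    return {
        "body": text,
        "metadata": {"category": classify(text), "word_count": len(text.split())},
    }


# A record from one system, expressed in another system's vocabulary.
crm_record = {"fname": "Pat", "dob": "1970-01-02"}
shared = transform(crm_record, {"fname": "first_name", "dob": "birth_date"})

# Unstructured text gains a category and basic metadata.
note = annotate("Customer reports an outage since Monday")
```

Exposed as Services, functions like these would sit between providers and requesters, so that neither side has to adopt the other’s field names or categories.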

Solving the Data Integration Challenge
Applying SOA to solve integration of application functionality doesn’t make all the problems of integration go away – it simply brings the issue of semantic integration to the surface. Companies should be thinking of semantic integration in the context of SOA – using the techniques of encapsulation and loose coupling to make data flow seamlessly. But, while this sounds good in theory, the truth is that these problems don’t have simple solutions today. There are still significant barriers to be overcome for semantic integration. Fortunately, while Web Services-based SOAs don’t provide the answers to the problems of semantic integration, they at least enable us to ask the right questions.