Whatever Happened to XML Schemas?

Early in the growth of XML as a data format, even before the widespread adoption of Web Services, one of the most popular and heated debates was over how best to represent the structure and syntax of data in an XML document. A wide range of proposals for these definitions, commonly known as XML schemas, emerged to indicate which elements an XML document requires, as well as the nature, repetition, and hierarchy of those elements. The goal of these formats was simple: provide an easy way to define the requirements of an XML document and then validate documents against those requirements, so that two unrelated parties could reliably exchange and process XML documents.

Most of the XML schema proposals hinged on the general assessment that the traditional way of defining a schema, the Document Type Definition (DTD), was too arcane, limiting, and cumbersome to use. After much debate, the W3C XML Schema (informally called WXS) appeared to win out. But since WXS was declared the winning standard, the conversation about XML schemas has died down, raising the question of whether they still matter, or are even necessary, in system-to-system interchange. Basically, why aren’t people talking about XML schemas as much anymore? Is it because schemas are no longer important, or simply because we’ve moved on to more complex issues?

Validating the XML Payload: Is it Necessary?
There are really two parts to answering the question about the relevance of XML schemas: their importance to the data that companies exchange and care about (the message payload), and their relevance to the interfaces used to access that data (specifically, Web Services). Remarkably, these two applications of XML schemas are producing different patterns of adoption and delivering different value to enterprise users.

When applying XML schemas to message payloads, companies use them to verify that incoming XML documents are valid enough for their automated systems to process. Such validation clearly matters, because systems can’t cope with data that are not properly structured or that they can’t easily parse and understand. Schema languages such as WXS are themselves written in XML, so machines can consume, parse, and transform schema documents with the same standard XML technologies they use for the messages.
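A minimal sketch of the kind of check this paragraph describes, written in Python against a hypothetical purchase-order format (the `order`, `customer`, `item`, and `quantity` element names are assumptions, as is the function name `validate_order`). It is not WXS validation: Python's standard library has no XML Schema validator, so hand-written rules stand in for what a schema would declare, namely a required element, a repeatable element, and an integer-typed field.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical occurrence rules standing in for a schema's minOccurs/maxOccurs:
# element name -> (minimum count, maximum count or None for unbounded).
OCCURRENCE_RULES = {
    "customer": (1, 1),
    "item": (1, None),
}


def validate_order(xml_text: str) -> List[str]:
    """Return a list of validity errors; an empty list means the order passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not even well-formed XML: nothing further can be checked.
        return [f"not well-formed: {exc}"]
    if root.tag != "order":
        return [f"unexpected root element <{root.tag}>"]

    errors = []
    # Check how often each rule's element appears directly under <order>.
    for name, (low, high) in OCCURRENCE_RULES.items():
        count = len(root.findall(name))
        if count < low or (high is not None and count > high):
            errors.append(f"<{name}> occurs {count} time(s)")

    # Check that every <item> carries a non-negative integer <quantity>.
    for item in root.findall("item"):
        qty = item.findtext("quantity")
        if qty is None or not re.fullmatch(r"[0-9]+", qty.strip()):
            errors.append(f"<item> has missing or non-integer <quantity>: {qty!r}")
    return errors


if __name__ == "__main__":
    good = "<order><customer>ACME</customer><item><quantity>3</quantity></item></order>"
    bad = "<order><item><quantity>three</quantity></item></order>"
    print(validate_order(good))  # []
    print(validate_order(bad))   # reports the missing <customer> and the non-integer quantity
```

Returning a list of errors rather than raising on the first one lets a receiving system log everything wrong with a partner's document in a single pass.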

Yet the XML basis of these schema languages is also their downfall, because XML is verbose: it consumes considerable processing power, bandwidth, and storage space. In addition, writing a schema in an XML-based language is an incredibly complex proposition. While other formats, such as the REgular LAnguage description for XML Next Generation (RELAX NG) Compact Syntax, emerged to ease the development difficulties of WXS, creating and debugging XML schema documents still requires significant experience and training.

However, as we detailed in an earlier ZapFlash, evidence shows that most companies are not even taking advantage of the basic validation benefits of XML schemas. Instead, the companies that use schemas at all use them only during the testing and debugging phases of their projects and turn off runtime validation in production. Why? Validating XML documents every time they are parsed eats up too much processing power; developing XML schemas is too complex for iterative, cross-organization implementations; and many validity issues (such as whether a requested item exists in a given database) can’t be resolved within a single document anyway. Given these serious implementation challenges, it’s not entirely clear that XML schemas are even necessary for handling basic document validity. That role might more properly fall to the Service providing the XML document itself. If so, how can XML schemas be used not just to validate the payload, but also to provide added value at the Service interface?
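To make the trade-off concrete, here is a hedged Python sketch of the pattern the paragraph describes: schema-style structural validation behind a flag that a team might switch off in production, and a business rule (does the requested item exist?) that no schema applied to a single document can express. All names are hypothetical, and an in-memory set stands in for the database lookup.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical inventory; in a real system this would be a database query.
KNOWN_SKUS = {"A1", "B2"}


def structurally_valid(root: ET.Element) -> bool:
    """Schema-style rule: every <item> must contain exactly one <sku>."""
    return all(len(item.findall("sku")) == 1 for item in root.iter("item"))


def process_order(xml_text: str, validate: bool = True) -> List[str]:
    """Return the SKUs in the order that are not in inventory."""
    # The parser still rejects malformed XML even when validation is off.
    root = ET.fromstring(xml_text)
    # The per-document structural check that teams often disable in production.
    if validate and not structurally_valid(root):
        raise ValueError("order failed structural validation")
    # Business-level validity needs data outside the document, so a schema can't check it.
    return [sku.text for sku in root.iter("sku") if sku.text not in KNOWN_SKUS]
```

For example, `process_order("<order><item><sku>A1</sku></item><item><sku>Z9</sku></item></order>")` returns `['Z9']`: the document is structurally fine, yet still unusable for one item. An `<item/>` with no `<sku>` raises `ValueError` when validation is on, but passes silently when `validate=False`.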

XML Schemas: Necessary for the Service Contract
If most companies are not using XML schemas to make sure that data are provided in a format their systems can handle, what mechanism will companies use to guarantee validity, interoperability, and semantic integration? How about the Service interface itself, specified in a Web Services Description Language (WSDL) document, together with all the other pieces of Web Services technology that define the Service contract?

The creators of WSDL wisely realized that they needed a way to represent what Services can expect as inputs and outputs (what you might call the contract semantics) in a form that systems could process automatically. However, it is difficult to make strict data typing assumptions in the Service interface without jeopardizing the loose coupling of that interface. To put it simply, if a Service provider demands that a given input be a string of a certain size, and every Service consumer builds that demand into its own implementation, then when the Service interface changes, all the consumers break. Such behavior is clearly not loosely coupled. The developers of WSDL therefore saw XML schemas as a way to share Service semantics without hard-wiring those requirements into the Service implementations.
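One way to read this argument: a consumer that hard-codes a provider's constraint breaks when the constraint changes, while a consumer that derives the constraint from the published schema adapts. The Python sketch below assumes a hypothetical `CustomerCode` type restricted with the WXS `maxLength` facet and reads that facet at runtime with the standard library's ElementTree. The type name, contract versions, and helper functions are illustrative inventions, not a general schema processor.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical published contract: a customer code limited to 10 characters.
CONTRACT_V1 = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="CustomerCode">
    <xs:restriction base="xs:string"><xs:maxLength value="10"/></xs:restriction>
  </xs:simpleType>
</xs:schema>"""

# The provider later relaxes the limit to 20 characters.
CONTRACT_V2 = CONTRACT_V1.replace('value="10"', 'value="20"')

HARD_CODED_LIMIT = 10  # tightly coupled: copied from V1 into the consumer's code


def max_length(schema_text: str, type_name: str) -> Optional[int]:
    """Read the maxLength facet of a named simpleType, or None if it has none."""
    root = ET.fromstring(schema_text)
    for simple_type in root.iter(XS + "simpleType"):
        if simple_type.get("name") == type_name:
            restriction = simple_type.find(XS + "restriction")
            if restriction is None:
                return None
            facet = restriction.find(XS + "maxLength")
            return int(facet.get("value")) if facet is not None else None
    return None


def coupled_consumer_accepts(code: str) -> bool:
    # Still enforces the old limit after the contract changes.
    return len(code) <= HARD_CODED_LIMIT


def contract_driven_consumer_accepts(code: str, schema_text: str) -> bool:
    # Re-reads the limit from whatever contract the provider currently publishes.
    limit = max_length(schema_text, "CustomerCode")
    return limit is None or len(code) <= limit
```

After the provider publishes `CONTRACT_V2`, the 14-character code `"ACME-CORP-0042"` is accepted by the contract-driven consumer but still rejected by the coupled one.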

As a result, WSDL does not introduce a new type definition language, but instead supports WXS as its canonical type system. The WSDL creators also aimed to dodge the bullet of mandating a single XML schema format, allowing other type definition languages to be used through the extensibility of XML. Through XML schemas, WSDL lets Services use a wide range of primitive types (such as strings and integers) as well as richer constructs such as enumerations, regular-expression patterns, lists, unions, and extension and restriction of complex types. Because the interface is validated only upon Service discovery and binding, the actual transactions carry little validation overhead. Validation thus happens during Service negotiation, and as long as the Service requester complies with the requirements of the Service interface, there is no need to validate every single message that flows through the interface.
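The idea of validating once at binding time and then skipping per-message checks can be sketched in Python. This is an assumption-laden illustration, not how any particular WSDL toolkit works: a hypothetical `getQuote` contract lists the elements a request must contain, the consumer's declared fields are compared against it once when the `Binding` (a hypothetical class) is created, and later requests are built without re-validation.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical contract fragment: a getQuote request with two string fields.
CONTRACT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="getQuote">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="symbol" type="xs:string"/>
        <xs:element name="currency" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_request(schema_text: str) -> Tuple[str, List[str]]:
    """Return (root element name, ordered child element names) from the contract."""
    top = ET.fromstring(schema_text).find(XS + "element")
    children = [e.get("name") for e in top.iter(XS + "element") if e is not top]
    return top.get("name"), children


class Binding:
    def __init__(self, schema_text: str, consumer_fields: List[str]):
        # Validation happens once, during negotiation/binding.
        root_name, fields = declared_request(schema_text)
        if list(consumer_fields) != fields:
            raise ValueError("consumer does not comply with the Service contract")
        self.root_name = root_name
        self.fields = fields

    def build_request(self, *values: str) -> str:
        # No per-message schema check: compliance was settled at bind time.
        # Only a cheap arity check remains, so zip() cannot silently drop values.
        if len(values) != len(self.fields):
            raise TypeError(f"expected {len(self.fields)} values, got {len(values)}")
        msg = ET.Element(self.root_name)
        for name, value in zip(self.fields, values):
            ET.SubElement(msg, name).text = value
        return ET.tostring(msg, encoding="unicode")
```

Binding with `["symbol", "currency"]` succeeds, and `build_request("IBM", "USD")` then yields `<getQuote><symbol>IBM</symbol><currency>USD</currency></getQuote>`; binding with only `["symbol"]` fails up front rather than on every message.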

However, there is a more important reason to consider using XML schemas at the Service interface. Too many developers are thinking about building Services from the bottom up. As ZapThink has discussed numerous times, to gain all the advantages of a Service-Oriented Architecture, namely loose coupling, coarse granularity, and asynchrony, you must conceive of Services separately from the technologies that implement them. Basically, you can’t be thinking of Java’s data types when you create a Service interface. Instead, there must be some data typing system that exists independently of the implementations that underlie the Services.

XML schemas are the answer to this loose coupling need. By thinking of Service interfaces in terms of schema data types, you can avoid the pitfall of including only those data types that a given implementation can support. Many features of XML Schema, such as lists, unions, disjunctions, and restricted complex types, are not present in many programming languages but might be a requirement of a given Service. If IT shouldn’t be an impediment to meeting business requirements, why should the data types of some arcane language be one? Starting from the WSDL Service definition, Service producers and consumers can share and enforce the same vision of message content, using the robust typing of XML schemas.
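To illustrate the gap between schema types and programming-language types, here is a Python approximation of two WXS constructs. A union of `xs:nonNegativeInteger` with an enumeration containing only `unbounded` (the same shape WXS uses for its own `maxOccurs` attribute) has no direct Python type, so it becomes a hand-written parser returning `Union[int, str]`; a list type whose items are `xs:int` becomes a split-and-check function. The function names are hypothetical, and the lexical rules are simplified (for instance, whitespace handling is reduced to `strip()` and `split()`).

```python
import re
from typing import List, Union

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # value space of xs:int


# Hypothetical stand-in for an XSD union: xs:nonNegativeInteger | "unbounded".
def parse_limit(lexical: str) -> Union[int, str]:
    token = lexical.strip()
    if token == "unbounded":
        return token
    if re.fullmatch(r"\+?[0-9]+", token):
        return int(token)
    raise ValueError(f"{lexical!r} is neither a non-negative integer nor 'unbounded'")


# Hypothetical stand-in for an XSD list type whose item type is xs:int.
def parse_int_list(lexical: str) -> List[int]:
    values = []
    for token in lexical.split():  # list items are whitespace-separated
        # Regex instead of bare int(), which would also accept forms like "1_000".
        if not re.fullmatch(r"[+-]?[0-9]+", token):
            raise ValueError(f"{token!r} is not an xs:int literal")
        value = int(token)
        if not INT_MIN <= value <= INT_MAX:
            raise ValueError(f"{token} is outside the xs:int range")
        values.append(value)
    return values
```

In a schema each of these is a short declaration; in the programming language it is code that every consumer would otherwise have to reimplement consistently, which is the article's case for defining the interface in schema terms first.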

Validation at the Service Interface
Increasingly, ZapThink is seeing the Service interface perform a greater number of functions. At first, this interface simply provided the details that Service requesters use to identify and bind to Service functionality. Now, however, the Service contract takes on a wider variety of functions. Validation of the business payload is one of those functions; likewise, security, management, process, and quality-of-service decisions and actions can also happen at the Service interface. But to handle all these needs, users need a means to specify the requirements of the Service interface in an application-neutral language. And so, while XML schemas may no longer be the center of our daily discussion, they still remain at the center of how we can achieve the basic goals of loosely coupled, asynchronous, and coarse-grained integration.

AGiLiENCE-122002-ZTZN-1130-1