Will Binary XML Solve XML Performance Woes?

XML’s blatant inefficiency is one oft-cited downside to anything XML-based, including Web Services. Text-based, metadata-laden XML is intended both for machine processing and human readability, resulting in message sizes that can easily be 10 to 50 times larger than equivalent messages sent via binary encodings. Worse, conducting a simple point-to-point exchange between XML-conversant endpoints might require each of the following operations: decryption, validation, parsing, marshalling, serialization, canonicalization, document signing, and encryption. Each of these steps must be executed on a per-message basis, and as such can impose a significant load on processing machines. Compounding the problem, XML traffic is content-oriented rather than protocol-oriented. As a result, devices responsible for performing any operation on XML traffic must make decisions based upon the content of the messages, rather than the protocols that underlie those messages. Combined, these costs threaten to grind XML processing to a halt.
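To make the size gap concrete, the sketch below serializes one small, hypothetical sensor reading (the field names and values are invented for illustration) both as text XML and as a fixed binary layout assumed here to be a 4-byte unsigned integer followed by two 8-byte doubles. It is not any standard binary XML format, just a plain `struct` packing.

```python
import struct

# Hypothetical sensor reading, invented for illustration.
reading = {"sensorId": 1042, "temperature": 21.5, "humidity": 48.25}

def to_xml(r: dict) -> bytes:
    """Serialize the reading as text XML: one start tag and one end tag per field."""
    body = "".join(f"<{name}>{value}</{name}>" for name, value in r.items())
    xml = f'<?xml version="1.0" encoding="UTF-8"?><reading>{body}</reading>'
    return xml.encode("utf-8")

def to_binary(r: dict) -> bytes:
    """Pack the same fields into an assumed fixed layout:
    a 4-byte unsigned int followed by two 8-byte IEEE 754 doubles, big-endian."""
    return struct.pack(">Idd", r["sensorId"], r["temperature"], r["humidity"])

xml_bytes = to_xml(reading)
bin_bytes = to_binary(reading)
print(len(xml_bytes), len(bin_bytes))  # 139 20
```

Even in this tiny case the text form is roughly seven times larger, almost entirely because of tag names. The binary layout gets its compactness by carrying no field names at all, so both sides must agree on the layout in advance, a point that returns when considering binary XML’s downsides.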

While XML’s verbosity and inefficiency may be acceptable for situations with moderate transaction volumes, XML’s processing overhead, storage requirements, and bandwidth consumption become quite problematic when transaction volumes are high. As a result, companies increasingly require techniques for improving the performance of such critical tasks as content-based security, policy enforcement, malformed message protection, authorization and authentication, encryption and decryption, and schema validation of XML messages.

One emerging approach for improving the performance of XML processing treats XML as a binary format — rather than the text-based format so often maligned as the root of XML’s inefficiency. Such an iconoclastic approach to XML flies in the face of the conventional wisdom about the benefits of text-based XML. Nevertheless, binary XML is gaining some traction in the marketplace, and may help solve many of the performance problems that promise to swamp tomorrow’s XML-laden networks.

Why Aren’t Compression or Encoding Good Enough?
The most straightforward approach to reducing the size of XML messages is to apply general-purpose compression or encoding technologies, such as zip or base64. Because XML is highly redundant text, common compression formats like zip can often squeeze over 90% of the volume out of XML files. The problem with compression, however, is that it adds processing: the sender must compress each XML document before transmitting it, and the receiver must decompress it again before parsing.
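The trade-off can be sketched with Python’s standard `gzip` module on a hypothetical, deliberately repetitive purchase order. The document shape is an assumption chosen for illustration; the actual ratio depends heavily on the content. What the sketch does show is the two extra passes: one to compress on the way out and one to decompress before any parser can see the XML.

```python
import gzip

# Hypothetical, highly repetitive XML purchase order with 500 line items.
items = "".join(
    f"<item><sku>SKU-{i:05d}</sku><quantity>{i % 7 + 1}</quantity></item>"
    for i in range(500)
)
doc = f"<order>{items}</order>".encode("utf-8")

# Extra work on the sending side: compress the whole document.
packed = gzip.compress(doc)
saving = 1 - len(packed) / len(doc)
print(f"{len(doc)} -> {len(packed)} bytes ({saving:.0%} smaller)")

# Extra work on the receiving side: decompress before any XML parser can run.
restored = gzip.decompress(packed)
assert restored == doc
```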

So, compression may solve the bandwidth issue, but it worsens the processing problem. In addition, general-purpose compressors such as zip and GZIP are not type-aware, so they do not compress large sets of floating-point numbers well. Furthermore, compression and encoding formats like zip and base64 offer an “all or nothing” approach: once a message is encoded, the recipient must decode the entire message in order to work with any part of it. Encoding schemes such as base64 share compression’s processing penalty, requiring an encoding pass before transmission and a decoding pass once the file reaches the endpoint. Worse, they may not deliver any network gains at all, since encoded documents are often larger than the originals: base64 output, for example, is roughly a third larger than its input.
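The base64 size penalty follows directly from how the encoding works: every 3 input bytes become 4 output characters, with padding on the final group. A minimal check using the standard `base64` module follows; `base64_length` is a helper written for this sketch, not a library function.

```python
import base64
import math

def base64_length(n: int) -> int:
    """Length of padded base64 output: every 3 input bytes become 4 output characters."""
    return 4 * math.ceil(n / 3)

doc = b"<reading><sensorId>1042</sensorId></reading>" * 100   # 4,400 bytes
encoded = base64.b64encode(doc)

assert len(encoded) == base64_length(len(doc))   # 5,868 bytes, about 33% larger
assert base64.b64decode(encoded) == doc          # a full decoding pass before parsing
```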

Moving and Processing XML in Binary
To resolve the limitations of all-or-nothing compression and its processing overhead, the W3C has begun developing an alternative, binary encoding of XML that promises to significantly reduce the processing, bandwidth, and storage penalties that currently plague XML. This encoding uses binary, rather than text-based, means for serializing and transmitting XML information, and it is far more sophisticated than simply compressing the XML text into a binary form. The binary XML approach takes advantage of XML language grammar to simultaneously compress, validate, and optimize the processing of XML documents.
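One idea behind grammar-aware encodings can be shown in miniature: if sender and receiver share a vocabulary derived from a schema, each element name shrinks to a one-byte code. The sketch below is a toy under stated assumptions, not the W3C format: `VOCAB`, `CODE`, `END`, `encode`, and `decode` are hypothetical names, attributes and tail text are ignored, and text content is limited to 255 bytes.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary that sender and receiver both derive from a shared schema.
VOCAB = ["order", "item", "sku", "quantity"]
CODE = {name: i for i, name in enumerate(VOCAB)}
END = 0xFF  # marks the end of an element's children

def encode(elem: ET.Element, out: bytearray) -> None:
    """Replace each start/end tag pair with a 1-byte name code and a 1-byte END marker."""
    out.append(CODE[elem.tag])  # KeyError: names outside the vocabulary cannot be encoded
    text = (elem.text or "").strip().encode("utf-8")  # toy: whitespace-only text is dropped
    out.append(len(text))       # toy limit: text must be shorter than 256 bytes
    out += text
    for child in elem:
        encode(child, out)
    out.append(END)

def decode(data: bytes, pos: int = 0) -> tuple[ET.Element, int]:
    """Rebuild the element tree; returns the element and the position just after it."""
    elem = ET.Element(VOCAB[data[pos]])
    n = data[pos + 1]
    elem.text = data[pos + 2 : pos + 2 + n].decode("utf-8") or None
    pos += 2 + n
    while data[pos] != END:
        child, pos = decode(data, pos)
        elem.append(child)
    return elem, pos + 1

text_xml = b"<order><item><sku>A-1</sku><quantity>3</quantity></item></order>"
binary = bytearray()
encode(ET.fromstring(text_xml), binary)
print(len(text_xml), len(binary))  # 64 16
```

Note that an element outside the vocabulary simply cannot be encoded, a crude echo of the idea that validation can fall out of the encoding step itself.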

In this natively binary format, it’s possible to transmit whole XML documents or stream them incrementally without sacrificing speed. At the same time, the encoding software validates the documents as a side effect of the binary encoding mechanism. In effect, the binary XML format is a “pre-parsed” version of an XML document that an endpoint can readily consume without any additional decompression or validation. The endpoint can also consume just the part of the message it needs, without having to parse the entire message. As a result, endpoints can process binary-encoded documents many times faster than the equivalent text-encoded XML files, and considerably faster than general-purpose XML compression schemes.
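Consuming only part of a message is possible when the encoding records how long each piece is. The sketch below assumes a hypothetical record layout (a 1-byte field code, a 4-byte big-endian length, then the payload) and shows a reader that jumps over a large body to reach a trailing field without decoding anything in between. `write_field` and `find_field` are illustrative names, not part of any binary XML specification.

```python
import struct
from typing import Optional

# Hypothetical record layout, assumed for illustration only:
#   1-byte field code | 4-byte big-endian payload length | payload bytes
HEADER = struct.Struct(">BI")

def write_field(out: bytearray, code: int, payload: bytes) -> None:
    """Append one field: its header, then its payload."""
    out += HEADER.pack(code, len(payload))
    out += payload

def find_field(data: bytes, wanted: int) -> Optional[bytes]:
    """Return the payload for `wanted`, jumping over all other payloads without reading them."""
    pos = 0
    while pos < len(data):
        code, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        if code == wanted:
            return bytes(data[pos : pos + length])
        pos += length  # skip the payload: no decoding or parsing
    return None

message = bytearray()
write_field(message, 1, b"route=billing")    # small routing field
write_field(message, 2, b"x" * 1_000_000)    # large body the router does not need
write_field(message, 3, b"signature-bytes")  # trailer
print(find_field(message, 3))  # b'signature-bytes'
```

A text XML reader looking for the trailer would normally have to scan past every character of the body to find it.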

Another advantage of binary XML is the ability to handle data types in their native format. For example, the binary encoding represents floating-point numbers directly, so the endpoint need not spend processing time converting between strings and numeric values. As a result of this pre-parsing and native binary representation, binary XML promises performance improvements of several orders of magnitude over its text-based counterpart.
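The difference between the two paths for numeric data can be sketched with the standard `struct` module. The sample values and the `<v>` wrapper element are invented for this example. The text path must locate and parse each decimal string; the binary path reinterprets fixed 8-byte IEEE 754 patterns.

```python
import struct

values = [0.1, 2.5e-8, 3.141592653589793, -1234.5678]

# Text path: numbers travel as decimal strings that the receiver must parse back.
text = "".join(f"<v>{v!r}</v>" for v in values)
from_text = [float(chunk.split("</v>")[0]) for chunk in text.split("<v>")[1:]]

# Binary path: each number travels as its 8-byte IEEE 754 bit pattern (big-endian),
# so decoding is a fixed-width copy rather than character-by-character parsing.
layout = struct.Struct(f">{len(values)}d")
packed = layout.pack(*values)
from_binary = list(layout.unpack(packed))

print(len(text), len(packed))
```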

The Downside to Binary XML
Binary XML, however, is not without its downsides. The greatest challenge of any binary encoding is that every point on the communication path must be able not only to tolerate the format but also to process it. While proponents often talk about how endpoints can easily be configured to handle binary XML, they tend to neglect the fact that intermediaries between the communicating parties must frequently inspect and make decisions on that traffic. As a result, binary XML’s global acceptance hinges upon all security, process, management, and transformation systems or devices being able to understand and process the binary XML format. Furthermore, binary XML raises the specter of compatibility and vendor lock-in concerns. For example, the format chosen to represent numerical data, such as integers, floating-point numbers, or arrays, must be platform independent, so that different consuming platforms can all take advantage of the performance boost that such native formatting offers. That is a tall order in today’s complex, heterogeneous IT environment.
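Byte order is one concrete example of the platform-independence problem. The same double has the same IEEE 754 bits everywhere, but machines differ in which byte they store first, so a binary format has to pin the order down explicitly. The `struct` format prefixes below are standard: `>` is big-endian, `<` is little-endian, and `=` is the machine’s native order.

```python
import struct

value = 21.5
big = struct.pack(">d", value)       # byte order fixed by the format: big-endian
little = struct.pack("<d", value)    # the same number, little-endian
native = struct.pack("=d", value)    # whatever order this machine uses

assert big == little[::-1]           # same IEEE 754 bits, opposite byte order
assert native in (big, little)       # which one depends on the platform

# A consumer that guesses the wrong order silently gets a different number.
misread = struct.unpack("<d", big)[0]
print(misread)
```

Text XML sidesteps this entirely because "21.5" reads the same on every platform, which is part of why the text format interoperates so easily.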

It is also not clear whether solving the parsing and transmission problems of XML will truly result in a significant overall performance increase. In many situations, parsing and transmission represent only a small part of the overall processing load for a given XML message. Binary XML does not address the processing costs that result from security look-ups, semantic mapping, transformation, and other complex processing tasks. If the processing bottleneck lies elsewhere, binary XML might not be worth the trouble.
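This argument is an instance of Amdahl’s law: if a fraction p of the per-message cost is sped up by a factor s, the overall speedup is S = 1 / ((1 − p) + p / s), which can never exceed 1 / (1 − p). The 20% share below is a hypothetical figure chosen for illustration, not a measurement.

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: total speedup when only `fraction` of the per-message cost
    (for example, parsing and transmission) is made `factor` times faster."""
    if not 0.0 <= fraction <= 1.0 or factor <= 0:
        raise ValueError("fraction must be in [0, 1] and factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Hypothetical figures: parsing and transmission are 20% of the per-message cost.
for factor in (2, 10, 100, 1_000_000):
    print(factor, round(overall_speedup(0.2, factor), 3))
```

Under that assumption, even an effectively infinite parsing speedup caps the end-to-end gain at 1.25x, because the remaining 80% of the work is untouched.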

Alternatives to Binary XML
There are also ways to upgrade the text-based XML format itself to give it some of the benefits of binary XML. Providing indexes in XML documents can speed up parsing, and it is possible to include a binary index for search as well as to provide for more lenient DOM processing that may reduce the processing required for very large documents. In addition, there are a number of optimized hardware and software approaches for improving the performance of XML parsing and processing, as ZapThink covers in its most recent report, High Performance and Appliance approaches for XML.
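A document index can be as simple as a table of byte offsets kept alongside ordinary text XML, letting a reader parse one record instead of the whole file. The sketch below is one possible scheme, assumed for illustration: the article does not specify an index format, and `write_indexed` and `read_record` are hypothetical helpers built on the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

def write_indexed(records: dict[str, str]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Serialize <record> elements and keep a side index of each record's byte span."""
    parts = [b"<records>"]
    index: dict[str, tuple[int, int]] = {}
    pos = len(parts[0])
    for key, value in records.items():
        elem = ET.Element("record", id=key)
        elem.text = value
        chunk = ET.tostring(elem)            # bytes for this one record
        index[key] = (pos, pos + len(chunk))
        parts.append(chunk)
        pos += len(chunk)
    parts.append(b"</records>")
    return b"".join(parts), index

def read_record(doc: bytes, index: dict[str, tuple[int, int]], key: str) -> str:
    """Parse only the indexed slice instead of the whole document."""
    start, end = index[key]
    return ET.fromstring(doc[start:end]).text

doc, index = write_indexed({f"r{i}": f"value-{i}" for i in range(10_000)})
print(read_record(doc, index, "r9999"))  # value-9999
```

The document itself remains plain, well-formed XML that any parser can still read end to end; the index is an optional accelerator, which keeps the interoperability that a fully binary format gives up.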

The ZapThink Take
Most important, however, is the philosophical concern that XML is meant to be a format for interoperability, not necessarily one of efficiency. At what point does a binary “XML-like” format leave the standards behind and become a proprietary data format? In this move toward efficiency, will the loosely coupled, implementation-agnostic XML format become a tightly coupled, proprietary implementation, putting at risk the advantages of using XML for system-to-system communication that led to its popularity in the first place?

On the other hand, XML by itself is a great technology, but you need more than just XML to do anything important. Security, reliability, process, management, and loose coupling require more than just a document format language, and the layers added to supply them lead to bloat, complexity, and vendor influence on the XML format. After all, business users simply want products that deliver business agility and IT asset reuse in the face of IT heterogeneity.

Binary XML addresses the bloat of XML, but represents a movement away from the simpler roots of the language. So, who’s right? And more importantly, will binary XML gain adoption as a solution to XML’s performance challenges? At the end of the day, it is the technology consumers, not the technology producers, who determine the viability of a technology. Binary XML offers significant benefits in particular situations, including high transaction volume environments, the exchange of large documents, and interactions constrained by limited bandwidth and limited processing capability, such as on mobile phones and PDAs. However, limited support among Service intermediaries and somewhat vendor-dependent implementations will restrict binary XML’s applicability in more fixed environments where interoperability trumps performance.

For more on this topic, see ZapThink’s latest report, High Performance and Appliance approaches for XML.