Monday, February 19, 2007

Types of XML Content Interoperability: Pros and Cons

In my last post, I talked about why we need XML interoperability. Now, let's talk about different strategies for implementing interoperability. We'll also discuss the pros and cons for each approach.

There is a common thread with each approach: XSLT. What makes XML remarkably flexible and resilient (and widely adopted) is its ability to transformed into so many different formats for both human and computer consumption. It's also why XML Interoperability can even be discussed.

Types of XML Interoperability

There are three basic strategies for acheiving interoperability between XML Document Standards:

  • Content Model Interoperability
  • Processing Interoperability
  • Roundtrip Interoperabilty

Each of these approaches has valid use case scenarios, and should not be dismissed out of hand. Yet, each of these approaches makes certain assumptions about the business processes, and environments that will work in some circumstances, but are less than optimal in others.


Content Model Interoperability

Content Model Interoperability is centered around enabling all or part of one standard's content model to be included as part of another standard. For example, DITA's specialization capabilities could be employed to create custom topic types for DocBook sections or refentries (in a DITA-like way). Conversely, DocBook's DTDs are designed to create customizations on top of the core content model.

In addition to customizing the DTDs (or Schemas), there is an additional step to support the new content in the standard: You need to account for these custom elements in the XSLT stylesheets - for each intended output format.

While on the surface this approach appears to be the most logical way to ensure that your content can interoperate with another standard, this is not an approach to be undertaken for the faint of heart. Working with DTD's and schemas is doable, but will require a thorough understanding of both standards before you begin. There are other limitations:

  1. This approach allows you to accept content from one standard, but doesn't allow you to share or leverage this content with other collaration partners. In effect, this approach is "shoehorning" content from one standard into yours. However, if you are dealing with receiving content from only one partner (and you aren't sharing content elsewhere), this could be a viable approach. But keep in mind...
  2. You and your partner are now both bound to a fixed version of the standards that will be sharing content. If either you or your partner decide to move to a later version of the respective standards, you may have to rework your customizations to support the new content models. You also run the risk that your legacy content won't validate against the new DTDs or schemas.
  3. Be aware that while content in different namespaces may provide "short-term" relief, it can also cause "long-term" headaches (much in the same way that Microsoft's COM architecture introduced us all to "DLL Hell"). It also means that your content must also be in a namespace (even if it is the default one).
Processing Interoperability

In this approach, content from one standard is either transformed or pre-processed into the other using XSLT. This approach is less risky in some ways compared to Content Model Interoperability: You don't have to maintain a set of DTDs to enable content interoperability, and it's whole lot easier to share the transformed content with partners once it's transformed into a single DTD.

There is a slightly different angle you you can take: You could say that you won't preprocess the content into your DTD, but instead use your XSLT stylesheets to incorporate the "foreign" content into the final output. For some cases, where you may be simply "rebranding" content, this might be a viable approach, yet keep in mind that this might mean some additional investment in incorporating other tools in your tool chain. For example, DITA and DocBook content employ very different processing models (i.e., the DITA Open Toolkit vs. the DocBook XSLT stylesheets). This may require a hefty development effort to integrate these tools properly in your environment. However, if you intend to leverage the content elsewhere in your own content, this angle can become a lot harder to implement.

For organizations sharing content back and forth, or for groups that are receiving content from one partner and are sharing it with other partners in the pipeline, this could be a reasonable approach. Still, there are potential risks here:

  1. This "uni-directional" approach is more flexible than than Content Model Interoperability, but, you still potentially have the same DTD/Schema version problem. And it only works realistically for one pair of standards, for example DocBook and DITA.
  2. If your partner begins creating content in a newer version of their DTD, you may have to upgrade your transforms to enable the content to be used by you.
  3. You still need to be well-versed in both standards to ensure each plays nicely in your environment

  4. Be prepared for dealing with validation issues. While each standard does include markup for standard content components like lists, tables, images, etc., there are structures that do not map cleanly. In these cases, you will need to make some pretty hard decisions about how they will (or will not) be marked up.


Roundtrip Interoperability

This is perhaps the most ambitious approach to creating interoperable content and encompasses being able to transform one standard into another and round trip that content back into the original standard. Like Processing Interoperability, you still have some very tricky issues to contend with:

  1. How do you handle round tripping between different versions of the standards? The net result is that you will need multiple stylesheets to support each version permutation.

  2. It's bi-directional, meaning that the round trip only works between the two standards (and with specific versions of those standards).
The following figures (taken from Scott Hudson and my presentation at DITA 2007 West) illustrate the problem:




In this example, we're only dealing with two standards, DocBook and DITA. But as you can see, there are numerous permutations that are potential round trip candidates. Now let's add another standard, like ODF






You can see that this quickly becomes a very unmanageable endeavor.



Conclusion

I've gone over three different strategies for approaching XML interoperability, situations where they work well, and some of the problems you may encounter when choosing one of these strategies. In my next post, I'll look at another approach for handling XML interoperability.

No comments: