Thursday, February 8, 2007

A "Sociology" of XML Languages

In a perfect world, XML content could easily be shared between different users and organizations because everyone would be sharing the same markup and semantics. Information interchange could be seamless; content could be repurposed and reused with minimal effort between different functional teams; XML processing tools could be optimized.

Yet, there are numerous reasons why we see so many different XML grammars used by different organizations. I'll focus on two of these briefly:
  1. Organizational Dynamics
  2. Multiple XML Standards

Organizational Dynamics

I never thought that my background in Sociology would ever be useful, but it certainly is applicable here (I'm very rusty, since I haven't consciously thought about this subject in over 10 years). Looking back at the works of Emile Durkheim, Max Weber, and Frederick Winslow Taylor, we see that organizations are structured around distinct divisions of labor, which enable individuals to specialize their skills and work on discrete aspects of the "production process" (keep in mind that most of these theories were developed during the peak of the Industrial Revolution).

What's more interesting here is how groups are organized. In part, this is shaped by many different factors, including the industry vertical, the size of the organization, and its relationships with other organizations. There is plenty of literature on these subjects, so I won't delve into them here.

The key takeaway is that all these factors have a direct effect on the organization's processes, meaning that for information development groups (Tech Pubs, Training, etc.), this affects how information (content) is created, managed, and distributed.

An organization's processes also have an interesting side effect on language. Organizations create their own vocabulary to express, and even rationalize, their processes (there are other implications at work here too, like "group identification"). For example, there is an often-quoted line from the movie Office Space: "Did you get the memo about the TPS Reports?" Even within the same industry, where there are common terms (like "GUI" or "Menu" in software), distinct "dialects" evolve over time, much in the same way that there are different Spanish dialects: a Spanish speaker from Spain could probably converse with a Spanish speaker from Argentina, but there might be words or phrases that aren't understood.

And this manifests itself in the XML syntax that these organizations adopt to create content. A logical strategy for them is to adopt the known XML standard, like DocBook or DITA, that fits their process most closely, and then modify it to incorporate words or phrases of their own.

Multiple XML Standards

One of the incredibly powerful aspects of XML is its ability to evolve over time to support different syntaxes. The unintended consequence, however, is that we now have several well-known XML document standards, like DocBook and DITA. While they are different architecturally and, to some extent, semantically, they're both targeted at virtually the same audience (information developers), produce the same kinds of output formats (PDF, HTML Help, HTML, JavaHelp), and, probably more importantly, contain the same kinds of structural components (paragraphs, lists, tables, images, formatting markup), albeit using different element names ("A rose by any other name would smell as sweet" - Romeo and Juliet).
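
To make the "different element names" point concrete, here is a minimal sketch in Python. It assumes a DocBook fragment without namespaces (DocBook 4.x style) and uses a small, hypothetical lookup table of my own, DOCBOOK_TO_DITA, that covers only a handful of elements. It renames tags and nothing more; it does not address the architectural or semantic differences mentioned above, so treat it as an illustration rather than a real conversion tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete lookup table: a few DocBook element
# names and the DITA element names that play roughly the same structural role.
DOCBOOK_TO_DITA = {
    "para": "p",
    "itemizedlist": "ul",
    "orderedlist": "ol",
    "listitem": "li",
    "emphasis": "i",
}


def rename_elements(xml_text, mapping):
    """Rename every element found in the mapping; leave all others untouched."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = mapping.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


# A small DocBook-style list (no namespace, which is an assumption here).
docbook_fragment = (
    "<itemizedlist>"
    "<listitem><para>Open the <emphasis>File</emphasis> menu.</para></listitem>"
    "<listitem><para>Choose Save.</para></listitem>"
    "</itemizedlist>"
)

dita_fragment = rename_elements(docbook_fragment, DOCBOOK_TO_DITA)
print(dita_fragment)
```

The structure of the list is identical before and after; only the names change. Elements that have no obvious counterpart, or that an organization has added to its own customized DTD, would pass through unchanged here, which is exactly where the harder reconciliation work begins.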

Yet having multiple standards can create an "informational impasse," where the DTDs get in the way of sharing content across organizations. And for many organizations, this is a real problem. Across all industry verticals, we're seeing new forms of collaboration and partnering across companies (and organizations), along with consolidation (mergers and acquisitions). And from an XML content perspective, the question is, "My content is in 'X' and my partners' content is in 'Y' and 'Z'. How do I reconcile these disparate document types?"

And therein lies the need for a Doc Standards Interoperability Framework, which I will describe in future posts.
