Saturday, March 28, 2009

DocBook Going Modular

Scott Hudson, Dick Hamilton, Larry Rowland and I (a.k.a. "The Colorado DocBook Coalition") recently drafted a proposal to support "modular" DocBook and presented it to the DocBook TC yesterday.  In general, this proposal is a response to huge demand for DITA-like capabilities in DocBook.

Many core business factors are driving DocBook in this direction:
  • more distributed authoring: authors are responsible for specific content areas rather than whole manuals. Content could be written by many different authors, even some in different organizations altogether.
  • content reuse: this has long been a "holy grail" of information architects: write content once, reuse it in many different contexts.
  • change management: isolate the content that has changed. This is a key driver for companies with localization needs. By modularizing their content, they can drive down costs by targeting only the changed content for translation.

There are also downstream opportunities for modularized content:

  • dynamic content assembly:  create "publications" on the fly using an external assembly file that identifies the sequence and hierarchy of modular components rather than creating a single canonical instance.

The following excerpts from the proposal detail the preliminary features (Important: these are not yet set in stone and are subject to change).  The final version will be delivered with the 5.1 release. 

Assemblies

The principal metaphor for Modular DocBook is the "assembly".  An assembly defines the resources, hierarchy, and relationships for a collection of DocBook components.  The <assembly> element can be the structural equivalent of any DocBook component, such as a book, a chapter, or an article.  Here's the proposed content model in RELAX NG compact syntax:

db.assembly =
  element assembly {
    db.info?, db.toc*, db.resources+, db.relationships*
  }
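
To make this concrete, here is a rough sketch of what an assembly instance might look like under this proposal (the titles and file names are invented for illustration, and the markup itself is subject to change along with the rest of the proposal):

<!-- hypothetical assembly instance; not final markup -->
<assembly>
  <info>
    <title>Administration Guide</title>
  </info>
  <toc>
    <tocentry linkend="intro"/>
    <tocentry linkend="install"/>
  </toc>
  <resources>
    <resource id="intro" fileref="intro.xml"/>
    <resource id="install" fileref="install.xml"/>
  </resources>
</assembly>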

Resources

The <resources> element is a high-level container for one or more resource objects that are managed by the <assembly>.  An <assembly> can contain one or more <resources> containers, allowing users to organize content into logical groups based on profiling attributes.

Each <resources> element must contain one or more <resource> elements.

db.resources =
  element resources {
    db.common.attributes, db.resource+
  }

Specifying Resources

The <resource> element identifies a "managed object" within the assembly. Typically, a <resource> will point to a content file that can be identified by a valid URI.  However, a <resource> can also be a 'static' text value that behaves much like a text entity.

Every <resource> MUST have a unique ID value within the context of the entire <assembly>.

db.resource =
  element resource {
    db.common.attributes,
    attribute fileref { text }?,
    attribute resid { text }?,
    text?
  }

Content-based resources can also be content fragments within a content file, similar to a URI fragment:  file.xml/#ID.

Additionally, a resource can point to another resource.  This allows users to create a "master" resource that other resources in the assembly can reference, resolving indirectly to the underlying content that the master resource identifies.

For example:

<resource
    id="master.resource" 
    fileref="errormessages.xml"/>
<resource
   id="class.not.found"
   resid="{master.resource}/#classnotfound"/>
<resource
   id="null.pointer"
   resid="{master.resource}/#nullpointer"/>

The added benefit of indirect references is that users can easily point the master resource to a different content file, provided that the new file uses the same underlying fragment IDs internally.  It can also be used to create locale-specific resources that reference the same resource ID.
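
For example, repointing just the master resource (say, to a hypothetical French version of the error message file) leaves the indirect references above untouched:

<!-- only the master resource changes; class.not.found and null.pointer
     still resolve through it -->
<resource
    id="master.resource"
    fileref="errormessages.fr.xml"/>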

Text-based resources behave much like XML text entities.  A content-based resource can reference a text resource, provided that both the text resource and the content resource are managed by the same assembly.

assembly.xml:

...
<resource id="company.name">Acme Tech, Inc.</resource>
<resource id="company.ticker">ACMT</resource>
...

file1.xml:

<para><phrase resid="company.name"/> (<phrase resid="company.ticker"/>) is a
publicly traded company...</para>

Organizing Resources into a Logical Hierarchy

The <toc> element defines the sequence and hierarchy of content-based resources that will be rendered in the final output.  It behaves much like a DITA map and its topicrefs.  However, instead of pointing to a URI, each <tocentry> points to a resource in the <resources> section of the assembly:

<toc>
    <tocentry linkend="file.1"/>
    <tocentry linkend="file.2">
        <tocentry linkend="file.3"/>
    </tocentry>
</toc>

<resources>
    <resource id="data.table" fileref="data.xml"/>
    <resource id="file.1" fileref="file1.en.xml"/>
    <resource id="file.2" fileref="file2.en.xml"/>
    <resource id="file.3" fileref="{data.table}/#table1"/>
</resources>

Creating Relationships Between Resources

One of the more clever aspects of DITA’s architecture is the capability to specify relationships between topics within the context of the map (and independent of the topics themselves).  The DocBook TC is currently considering several proposals that will enable resources to be related to each other within the assembly.

The Benefits of a Modular DocBook

There is a current mindset (whether it's right or wrong is irrelevant) that DocBook markup is primarily targeted at "monolithic" manuscripts.  With this proposal, I think there are many more possibilities for information architects to create new types of content: websites, true help systems, mashups, dynamically assembled content based on personalized facets (Web 2.0/3.0 capabilities), and a simplified localization strategy like the one advocated for DITA.

What's more, the design places no constraints on the type of content resources referenced in an assembly.  They can be of any type: sections, chapters, images, even separate books (or assemblies) to mimic DocBook's set element.

The design takes into account DocBook content that currently exists as "monolithic" instances, but it is flexible enough to support other applications, like IMS manifests for SCORM-compliant content, making it easy to create e-learning content.

Since this is the first draft of the proposal, I expect there will be changes between now and the final spec.  Yet the core of the proposal should remain relatively intact.  If you would like to get involved or have other ideas, let me know.  Stay tuned.


Monday, February 9, 2009

Implementing XML in a Recession

In these economic hard times, a lot of proposed projects that would let companies leverage the real advantages of XML are being shelved until conditions improve.  Obviously, in my position, I would love to see more companies pushing to use XML throughout the enterprise. We've all heard the advantages of XML: reuse, repurposing, distributed authoring, personalized content, and so on. These are the underlying returns on investment for implementing an XML solution.  The old business axiom goes, "you have to spend money to make money."  A corollary might suggest that getting the advantages of XML means spending lots of money.

However, here’s the reality: implementing an Enterprise-wide XML strategy doesn’t have to break the bank. In fact, with numerous XML standards that are ready to use out of the box, like DITA and DocBook for publishing and XBRL for business, the cost of entry is reduced dramatically compared to a customized grammar. 

And while no standard is ever a 100 percent match for an organization's business needs, at least one is likely to cover at least 80 percent.  We often advise our clients to use a standard directly out of the box (or with very little customization) until they have a good "feel" for how well it works in their environment before digging into the real customization work.  Given that funding for XML projects is likely to be reduced, this is the perfect opportunity to begin integrating one of these standards into your environment: try it on for size while the economy is slow, and when the economy improves, consider how to customize your XML content to fit your environment.

Any XML architecture, even one on a budget, must encompass the ability to create content and to deliver it.  Here again, most XML authoring tools on the market have built-in support for many of these standards; with little to no effort, you can use these authoring environments out of the box and get up to speed.

On the delivery side, these same standards (and in many cases the authoring tools) have prebuilt rendering implementations that can be tweaked to deliver high-quality content, with all of the benefits that XML offers.  Here you might want to spend a little more to hire an XSLT expert, but it doesn't have to break the bank to make it look good.

The bottom line: a recessionary economy is a golden opportunity to introduce XML into the enterprise. In the short term, keep it simple, leverage other people's work and industry best practices, and leave your options open for when you can afford to do more.  When funding returns, you can consider adding the "bells and whistles" that will align your XML strategy more closely with your business processes.

Wednesday, April 30, 2008

Interoperability Framework Mentioned in DC Lab Article

Terry Mulvihill wrote an article for Data Conversion Labs called "DocBook versus DITA: Will the Real Standard Please Stand Up?" In it, she mentions the Interoperability Framework, along with a quote from Eric Severson.

Saturday, April 26, 2008

Metadata Interoperability

We recently started working on our Interoperability Framework again (yay!). In the course of looking through the design, we realized that we were missing a key facet: metadata. So we started digging through the DITA and DocBook standards to determine how we could map their metadata content models to a common metadata markup. But that raised the question of which metadata model we would use in our interop framework.

My belief is that we should use and leverage existing standards. The core interop framework is designed around this principle, so the metadata should be too. Based on that, the logical choice is to use Dublin Core. The task now is to map the metadata content models that are used by DITA and DocBook to the Dublin Core standard.
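
As a rough illustration (my own sketch, not markup from either standard's specification), typical DocBook 5 <info> metadata lines up with Dublin Core terms fairly naturally:

<!-- illustrative mapping only -->
<info>
  <title>Installation Guide</title>                               <!-- dc:title -->
  <author><personname>Jane Smith</personname></author>            <!-- dc:creator -->
  <publisher><publishername>Acme Tech</publishername></publisher> <!-- dc:publisher -->
  <pubdate>2008-04-26</pubdate>                                    <!-- dc:date -->
  <subjectset>
    <subject><subjectterm>XML</subjectterm></subject>              <!-- dc:subject -->
  </subjectset>
</info>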

This is when I realized that each standard has, in large part, reinvented the wheel around metadata. Both standards define metadata semantics that are also defined in Dublin Core. Both also include unique metadata markup, presumably designed around the unique needs of each standard, which is probably why neither has adopted Dublin Core.

DCMI has been in existence since 1995 and is actively developing and evolving the standard. It encapsulates core metadata like author and publisher in the base element set. For other metadata, Dublin Core provides an extremely extensible model that allows any metadata to be assigned. In fact, it's relatively trivial to integrate other metadata models, like DRM.

So going back to my argument about leaky abstractions, both standards have a problem here. Out of the box, both DocBook and DITA assume taxonomies that are relevant and applicable to their models. Other metadata can be incorporated through customization or specialization. This is all well and good, except that interoperability is greatly diminished when additional "non-standard" metadata markup is included within the content model.

Perhaps it's time that both standards consider integrating Dublin Core directly as the default metadata model. Right now, both standards can integrate DCMI alongside their existing metadata, but there is a certain level of redundancy. A standard metadata model becomes increasingly valuable as more and more content is managed in a CMS or XML database, and as more content is designed for reuse.

Monday, April 14, 2008

Cool Stuff - Read Dick Hamilton's Article on The Content Wrangler

Dick Hamilton has written a very insightful and balanced article about some considerations for when to choose DITA or DocBook on the Content Wrangler (Scott Abel's site). Check it out:

http://www.thecontentwrangler.com/article/choosing_an_xml_schema_docbook_or_dita/

Sunday, April 13, 2008

Do We Need Structured Document Formats?

Eric Armstrong has posed a very interesting question about structured document markup languages. And there is a great deal of merit to his question. I want to take a look at some of his points and provide my own thoughts.

Is Markup Too Complicated?

Eric writes:


Those observations explain why structured document formats are so difficult to use: They force you to memorize the tagging structure. They require training, as a result, because it's virtually impossible for the average user to be productive without it.

The editing situation is much better with DITA (120 topic tags, plus 80 for maps) than it is with DocBook (800 tags), or even Solbook (400 tags), but it is still way more difficult than simple HTML--80 tags, many of which can nest, but few of which have to.

But even with a relatively simple format like HTML, we have manual-editing horror stories. In one instance, a title heading was created with 18 non-breaking spaces and a 21-point font. (Try putting that through your automated processor.)

If I had a nickel for every time I've heard someone tell me, "I don't care about what tag I use, I just want to write my document," I could retire right now and live off the interest. There's no doubt that transitioning from traditional unstructured desktop authoring tools to structured authoring tools often causes turmoil and cognitive dissonance. Which brings up an interesting question in my mind: are all semantic markup languages inherently problematic?

And this is where I think Eric and I have a slight difference of opinion. Eric suggests that wikis offer an alternative to the "tag jambalaya" (my term) of markup languages. Wikis are incredibly good at enabling users to quickly create content without being encumbered by a whole lot of structure or learning curve. For content like Wikipedia, where users of all skill levels contribute their knowledge, this makes sense.

However, if I'm writing a manual (collaboratively or not; we'll touch on this later), a reasonable amount of structure is desirable. I agree that a typical user will likely never use the majority of the tags built into DITA, DocBook, or even HTML. This is the price of being an open standard: content models tend to become "bloated" with markup deemed necessary by a wide range of interests. In the past, I wrote manuals for a Unix operating system using DocBook. Of the 400 or so elements in the grammar, I used only 70 or 80; the rest didn't apply to the subject matter. I also can't recall the last time I used the samp tag in HTML. It's there, but I don't have to use it.

Even for many of our clients, we end up creating new DITA DTD shells specifically to strip out unnecessary domains and simplify the content model. I will say that it's often easier to remove what you don't need than to integrate something that isn't there. The new DocBook 5 schemas (developed with RELAX NG) make it very easy both to remove unwanted elements and to add new ones. The DocBook Publishers Subcommittee schema (currently under development) removes many existing DocBook elements that aren't needed while adding a few elements that are relevant for publishers.
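
For instance, a subtractive DocBook 5 customization can be just a few lines of RELAX NG compact syntax. This is a minimal sketch, assuming the stock docbook.rnc compact schema is available locally and that the element you want to drop follows DocBook's db.* pattern-naming convention:

# custom-docbook.rnc -- minimal sketch of a subtractive customization layer
include "docbook.rnc" {
  # remove an element we never use by overriding its named pattern
  db.msgset = notAllowed
}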

This also leads me to another question: which wiki markup? There are literally dozens of wiki markup languages out there, each a little different than the others. Where is the interoperability?

Standard structured markup languages like DocBook and DITA (and even XHTML) are essentially contracts: if you follow the rules in the schema, the document can be rendered into any supported format, and the markup can be shared with others using the same schema. I can even leverage the content into other markup formats.

But where structured, semantic markup shines is when business rules dictate that each DITA task topic must contain a context element (it doesn't today, but you could enforce such a rule in the DTD), or that every table must contain a title. Unstructured markup like a wiki will have a hard time enforcing that, as will HTML. But structured markup with a DTD or schema makes this very easy.
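
As a toy illustration (this is not the actual DocBook or DITA grammar), a schema makes such a rule trivial to enforce:

# toy RELAX NG compact grammar: a table is invalid without a title
start = element article { (para | table)* }
para  = element para { text }
table = element table {
          element title { text },
          element row { element entry { text }+ }+
        }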

A not-so-ancillary point about structured semantic markup is the ability to identify content by its intended meaning: an admonition tagged as a caution or warning is much easier to find (and reuse) than a text block (or generic HTML div or table) that starts with the word "Caution" or "Warning," even though the two might be rendered the same way. And if the admonition contains more than one paragraph of text, having markup that indicates the start and end of the structure is very useful. Not to mention that from a localization perspective, tagged semantic markup is the way to go.

Eric rightfully points out that tools like OpenOffice allow users to create content without knowing that the native format is a markup language. The same is true for many WYSIWYG HTML editors these days (and there are pretty cool web-based gadgets out there too!). Most users never have to see what the underlying HTML looks like. This is where we need to focus our attention. It isn't that markup languages themselves are difficult; rather, it's that the tools we use to create the underlying markup are perhaps too difficult for authors to use.

And the excuse we use is that going from unstructured to structured authoring means authors have to sacrifice some flexibility. There's no question that this response is wearing thin, and that most authors (both professional and casual) believe there has to be a better way.

Conditional Metadata

Eric's point about conditional metadata filtering has had some serious discussion recently on the Yahoo DITA Users Forum. And arguably, there is merit in some of the ideas presented there. Eric's point here deserves mention:


But the fact that such a thing can be done does not mean that it is necessarily desirable to do so. Experience suggests that reuse gets tricky when your environment changes--because your metadata reflects your environment. If your environment doesn't change, then your metadata is fixed. You define it, assign it and that's the end of it. The metadata tagging adds some complexity to your information, but you can live with it, and it buys you a lot.

Metadata is only meaningful when it has context. Context in this case means that there is a relationship between the content and some known "variable": a particular audience group, an operating platform, or another target that scopes the content's applicability. Where I see churn is in the area of "filtering" content, i.e., suppressing or rendering content based on metadata values. To me, this is an implementation problem rather than a design problem.

In the classic case of conditionality, overloading markup with multiple filtering aspects purely to render or suppress content can lead to serious problems, and it deserves special treatment in another discussion. However, if we look at metadata as a means of creating a relationship between the tagged content and specific targets, the potential for more targeted search and for focused, dynamic content assembly expands greatly.
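
DocBook's profiling attributes are a simple example of that kind of relationship. In the sketch below, the os attribute relates each paragraph to a platform; whether that relationship is used to filter output (for example, via the DocBook XSL profiling stylesheets) or to drive search and dynamic assembly is an implementation choice:

<!-- the same step, scoped to two different platforms -->
<para os="linux">Install the package with your distribution's package
manager.</para>
<para os="windows">Run the installer from the downloaded package.</para>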

Transclusion and Reuse


So maybe a really minimal transclusion-capability is all we really need for reuse. Maybe we need to transclude boilerplate sections, and that's about all.

There's no question that transclusion can be abused to the point that a document is cobbled together like Frankenstein's monster. However, there are cases when transcluding content does make sense, and not just for boilerplate content. We're only beginning to see the possibilities of providing users with the right amount of information, when they want it, targeted to their level of detail based on metadata (see the Flatirons Solutions whitepaper, Dynamic Content Delivery Using DITA). Essentially, content can be assembled from a wide range of content objects (topics, sections, chapters, boilerplate, etc.). I would be reluctant to suggest that "boilerplate" or standardized content is the only form of reuse we need.

Still, Eric's question is valid: what is optimal reuse? The answer is that it depends. For some applications, standard boilerplate is right; for others, the ability to transclude "approved" admonitions is necessary. And for some, transclusion of whole topics, sections, or chapters is appropriate. The point is that the information design, based on a thorough analysis of the business, its goals, and the content itself, will dictate the right amount of reuse.

From a collaborative and distributed authoring perspective, enabling writers to focus on their own content and then assemble everything into a cohesive whole makes a great deal of sense. Wikis work well if you're collaborating on the same content, but they don't really solve the problem of contributing content to a larger deliverable.

Formatting and Containment

Eric's argument is that HTML pretty much got it right because it limits required nesting and containment to lists and tables. Now, if I were working with ATA or S1000D all the time, I would agree wholeheartedly. Even DocBook has some odd containment structures (mediaobject comes to mind, though there are benefits to that container that I also understand). From the standpoint of pure simplicity and pure formatting intent, he's right. But the wheels get a little wobbly if we always assume that we're working with a serial content stream solely for formatting.

One area where containment makes a great deal of sense is in the area of Localization. By encapsulating new and/or changed content into logical units of information, you can realize real time savings and reduced translation costs.

Containment also makes transclusion more useful and less cumbersome. Assuming that we aren't creating Frankenstein's monster, the ability to point to only the block of content I want, without cutting and pasting, is a distinct advantage.

Conclusion

At the heart of Eric's article, I believe, is the KISS principle. Inevitably, when you boil structured document formats down to their essence, you get headings, paragraphs, lists, tables, images, and inline markup (see the Interoperability Framework whitepaper that Scott Hudson and I wrote for an illustration). So why use structured markup at all when my desktop word processor can do that right now? In my view, there are numerous reasons, some of which I've discussed here; others, like the potential for interoperability, make structured document markup languages extremely flexible and highly leverageable.

There is no doubt that today's structured markup tools don't always make it easy for users to create content without the markup peeking through the cracks. That doesn't mean structured markup is the problem. For example, one of my web browsers won't display Scalable Vector Graphics (SVG) at all. That doesn't mean the SVG standard is a problem; it means I need to use a web browser that supports the standard.

Eric's article is thought-provoking and well done. It raises the level of discussion that we need to have around why we use structured content (and not because it's the coolest fad), and how we create that content. Let's keep this discussion going.

Monday, February 19, 2007

Types of XML Content Interoperability: Pros and Cons

In my last post, I talked about why we need XML interoperability. Now, let's talk about different strategies for implementing interoperability. We'll also discuss the pros and cons for each approach.

There is a common thread in each approach: XSLT. What makes XML remarkably flexible and resilient (and widely adopted) is its ability to be transformed into so many different formats for both human and computer consumption. It's also why XML interoperability can even be discussed.

Types of XML Interoperability

There are three basic strategies for achieving interoperability between XML document standards:

  • Content Model Interoperability
  • Processing Interoperability
  • Roundtrip Interoperability

Each of these approaches has valid use case scenarios and should not be dismissed out of hand. Yet each makes certain assumptions about business processes and environments; those assumptions will hold in some circumstances but be less than optimal in others.


Content Model Interoperability

Content Model Interoperability centers on enabling all or part of one standard's content model to be included as part of another standard. For example, DITA's specialization capabilities could be employed to create custom topic types for DocBook sections or refentries (in a DITA-like way). Conversely, DocBook's DTDs are designed to support customizations on top of the core content model.

In addition to customizing the DTDs (or schemas), there is a further step to support the new content in the standard: you need to account for these custom elements in the XSLT stylesheets, for each intended output format.

While on the surface this approach appears to be the most logical way to ensure that your content can interoperate with another standard, it is not for the faint of heart. Working with DTDs and schemas is doable, but it requires a thorough understanding of both standards before you begin. There are other limitations:

  1. This approach allows you to accept content from one standard, but it doesn't allow you to share or leverage this content with other collaboration partners. In effect, this approach is "shoehorning" content from one standard into yours. However, if you are receiving content from only one partner (and you aren't sharing content elsewhere), this could be a viable approach. But keep in mind...
  2. You and your partner are now both bound to fixed versions of the standards that share content. If either of you decides to move to a later version of the respective standard, you may have to rework your customizations to support the new content models. You also run the risk that your legacy content won't validate against the new DTDs or schemas.
  3. Be aware that while content in different namespaces may provide "short-term" relief, it can also cause "long-term" headaches (much as Microsoft's COM architecture introduced us all to "DLL Hell"). It also means that your content must be in a namespace (even if it is the default one).

Processing Interoperability

In this approach, content from one standard is either transformed or pre-processed into the other using XSLT. This approach is less risky in some ways than Content Model Interoperability: you don't have to maintain a set of DTDs to enable content interoperability, and it's a whole lot easier to share the content with partners once it has been transformed into a single DTD.
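
As a minimal sketch of what such a pre-process might look like (the element mappings are deliberately simplified, assume DITA topics as input and DocBook as the target, and are nowhere near a complete converter):

<!-- hypothetical, simplified pre-processing transform: DITA to DocBook -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- a DITA topic becomes a DocBook section -->
  <xsl:template match="topic">
    <section><xsl:apply-templates/></section>
  </xsl:template>

  <!-- titles map across directly -->
  <xsl:template match="title">
    <title><xsl:apply-templates/></title>
  </xsl:template>

  <!-- a DITA p becomes a DocBook para -->
  <xsl:template match="p">
    <para><xsl:apply-templates/></para>
  </xsl:template>

</xsl:stylesheet>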

There is a slightly different angle you can take: rather than preprocessing the content into your DTD, you could use your XSLT stylesheets to incorporate the "foreign" content directly into the final output. In some cases, where you may simply be "rebranding" content, this might be a viable approach. Keep in mind, though, that it may mean additional investment in incorporating other tools into your tool chain. For example, DITA and DocBook content employ very different processing models (the DITA Open Toolkit vs. the DocBook XSLT stylesheets), and integrating these tools properly in your environment may require a hefty development effort. And if you intend to leverage the content elsewhere in your own content, this angle becomes a lot harder to implement.

For organizations sharing content back and forth, or for groups that receive content from one partner and share it with other partners in the pipeline, this could be a reasonable approach. Still, there are potential risks:

  1. This "uni-directional" approach is more flexible than Content Model Interoperability, but you still potentially have the same DTD/schema version problem. And it realistically only works for one pair of standards, for example DocBook and DITA.
  2. If your partner begins creating content in a newer version of their DTD, you may have to upgrade your transforms so that you can still use the content.
  3. You still need to be well-versed in both standards to ensure each plays nicely in your environment.
  4. Be prepared to deal with validation issues. While each standard includes markup for standard content components like lists, tables, and images, there are structures that do not map cleanly. In these cases, you will need to make some pretty hard decisions about how they will (or will not) be marked up.


Roundtrip Interoperability

This is perhaps the most ambitious approach to creating interoperable content: transforming one standard into another and then round-tripping that content back into the original standard. As with Processing Interoperability, you have some very tricky issues to contend with:

  1. How do you handle round-tripping between different versions of the standards? The net result is that you will need multiple stylesheets to support each version permutation.
  2. It's bi-directional, meaning that the round trip only works between two standards (and with specific versions of those standards).

The following figures (taken from the presentation Scott Hudson and I gave at DITA 2007 West) illustrate the problem:

[Figure: round-trip transformation permutations between two standards, DocBook and DITA]

In this example, we're only dealing with two standards, DocBook and DITA. But as you can see, there are numerous permutations that are potential round-trip candidates. Now let's add another standard, like ODF:

[Figure: round-trip permutations once a third standard, ODF, is added]

You can see that this quickly becomes a very unmanageable endeavor.



Conclusion

I've gone over three different strategies for approaching XML interoperability, situations where they work well, and some of the problems you may encounter when choosing one of these strategies. In my next post, I'll look at another approach for handling XML interoperability.

Friday, February 2, 2007

XSLT 2.0 Is Fantastic, but There Are Some Hurdles

When XSLT 1.0 became a W3C Recommendation back in 1999, I thought it was the coolest thing out there. Oh, the things I could do with XML + XSLT 1.0 + Xalan or Saxon! Later on, when I wanted to do things like grouping and outputting to more than one result file, I realized these weren't built in. Even now, I can't fully wrap my head around the Muenchian method for grouping, and for outputting multiple result files I had to rely on XSLT extensions. That meant my stylesheets were now bound to a particular XSLT processor, which completely sent shivers up my spine: the whole idea behind XSLT, in my (perhaps idealistic, naive) view, was that you should be able to take an XML file and any compliant XSLT engine and create an output result (set). Still, despite the warts and shortcomings, XSLT 1.0 proved to be a faithful companion to my XML content.

Enter XSLT 2.0. In so many ways it is so much better than its predecessor! Built-in grouping functionality and multiple output result documents are now part of the specification. Huzzah!

But wait, there's more: in-memory DOMs (very nice!), functions (very handy), XQuery, XPath 2.0, unparsed-text processing (very handy for things like embedding CSS stylesheets and processing CSV files), and better string manipulation, including regex support. This is just a taste of what's in the latest version.
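
To give a flavor of the first two of those features, here is a small sketch that groups elements and writes one result file per group; the catalog/item/@category structure is made up purely for illustration:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/catalog">
    <!-- built-in grouping: one group per distinct @category value -->
    <xsl:for-each-group select="item" group-by="@category">
      <!-- built-in multiple outputs: one result document per group -->
      <xsl:result-document href="{current-grouping-key()}.xml">
        <group category="{current-grouping-key()}">
          <xsl:copy-of select="current-group()"/>
        </group>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>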

XSLT 2.0 became a W3C Recommendation (along with XQuery and XPath 2.0) just last month. Finally!

Still, this latest version has major obstacles to overcome before it can enjoy widespread adoption. There is only one notable XSLT 2.0-compliant engine: Saxon 8, by Dr. Michael Kay. It is developed in Java, but there is a .NET port (via the IKVM libraries).

Not that I have anything against Saxon; it is outstanding. Yet where is Xalan? MSXSL? Why haven't they come to the party? Scouring the blogs and mailing lists, there doesn't appear to be any activity on Xalan toward an XSLT 2.0 implementation. Microsoft's current priority is XLinq, and it has decided to support XQuery but not XSLT 2.0 or XPath 2.0.

Microsoft's decision not to implement XSLT 2.0 and XPath 2.0 could have an unfortunate effect on adoption of these standards. While XQuery is extremely powerful (and wicked fast) and can do all the things that XSLT can do, I wouldn't necessarily recommend trying to create XQuery scripts to transform a DocBook XML instance (the XSLT is already complex enough).

I would rather write template rules that match the appropriate elements than attempt a long, complex set of switch cases to handle the complex content model. That said, it could be done, but it wouldn't be a trivial task.

XSLT 2.0 is amazingly powerful, with many of the features that were lacking in the 1.0 Recommendation. In fact, the DocStandards Interop Framework intends to use XSLT 2.0 to take advantage of many of these new features, for things like generating topic maps or bookmaps from the interchange format. It looks like Saxon will be the de facto engine of choice, though that's not a hard choice to make.

Go to DITA West, Young Man

My colleague Scott Hudson and I are presenting a paper at the DITA 2007 West conference in San Jose, February 5-7. I am very excited about this.

The thesis of the paper is a proposed DocStandards Interoperability Framework that enables various document markup languages, like (but not limited to) DocBook, DITA, and ODF, to share and leverage content by using an interchange format that each standard can write to and read from.

There are several advantages to this approach:
  • It doesn't impede future development of any standard, since the interchange format is "neutral." This also means that new versions of a document markup standard can leverage content from earlier versions.
  • Since it is neutral, it can potentially be used by virtually any document markup standard.

This work stems from Scott's and my involvement in the DocStandards Interoperability List, an OASIS forum. We're hoping to spark interest in the XML community to push this along and create a new OASIS Technical Committee for DocStandards Interoperability.

We're still in the process of editing the whitepaper, which will be posted on Flatirons Solutions' website in the near future.