Saturday, April 26, 2008

Metadata Interoperability

We recently started working on our Interoperability Framework again (yeah!). While looking through the design, we realized that we were missing a key facet: metadata. So we started digging through the DITA and DocBook standards to determine how we could map their metadata content models to a common metadata markup. That raised the question of which metadata model we should use in our interop framework.

My belief is that we should use and leverage existing standards. The core interop framework is designed around this principle, so the metadata should be too. Based on that, the logical choice is to use Dublin Core. The task now is to map the metadata content models that are used by DITA and DocBook to the Dublin Core standard.
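To make the mapping task concrete, here is a minimal sketch of a crosswalk table in Python. The element pairings are my own illustrative assumptions rather than an official mapping from any of the three standards, and `CROSSWALK` and `to_dublin_core` are hypothetical names.

```python
# Hypothetical crosswalk: (source standard, element name) -> Dublin Core element.
# The pairings below are illustrative assumptions, not an official mapping.
CROSSWALK = {
    ("dita", "author"): "creator",
    ("dita", "publisher"): "publisher",
    ("dita", "keyword"): "subject",
    ("dita", "source"): "source",
    ("docbook", "author"): "creator",
    ("docbook", "publishername"): "publisher",
    ("docbook", "keyword"): "subject",
    ("docbook", "abstract"): "description",
}


def to_dublin_core(standard, metadata):
    """Translate {element: value} pairs into {dc_element: [values]}.

    Elements without a crosswalk entry are skipped.
    """
    record = {}
    for element, value in metadata.items():
        dc_element = CROSSWALK.get((standard, element))
        if dc_element is not None:
            record.setdefault(dc_element, []).append(value)
    return record


# The same facts expressed in either source standard land on the same record.
dita = to_dublin_core("dita", {"author": "J. Doe", "keyword": "interop"})
docbook = to_dublin_core("docbook", {"author": "J. Doe", "keyword": "interop"})
```

Once both sides are reduced to the same Dublin Core record, comparing or merging metadata no longer depends on which standard the content came from.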

This is when I realized that each standard has in large part reinvented the wheel around metadata. Both standards have metadata semantics that are also defined in Dublin Core. Both also include unique metadata markup, presumably designed around the particular needs of the standard, which is probably why neither has adopted Dublin Core.

DCMI (the Dublin Core Metadata Initiative) has been in existence since 1995 and is actively developing and evolving the standard. It encapsulates core metadata like author and publisher in the base element set. For other metadata, Dublin Core provides an extensible model that allows any metadata to be assigned. In fact, it's relatively trivial to integrate other metadata models like DRM.

So going back to my argument about leaky abstractions, both standards have a problem here. Out of the box, both DocBook and DITA assume taxonomies that are relevant and applicable to their models. Other metadata can be incorporated through customization or specialization. This is all well and good, except that interoperability is greatly diminished when additional "non-standard" metadata markup is included within the content model.
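A rough way to see the interoperability cost is to split a metadata block into elements that have a common equivalent and elements that do not. The sketch below assumes a small set of "portable" element names and uses a made-up specialized element, `reviewcycle`, purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed for illustration: element names that have a Dublin Core equivalent.
PORTABLE = {"author", "publisher", "keyword", "source"}


def split_metadata(xml_text):
    """Return (portable, non_portable) tag names of the root's direct children."""
    root = ET.fromstring(xml_text)
    portable, non_portable = [], []
    for child in root:
        if child.tag in PORTABLE:
            portable.append(child.tag)
        else:
            non_portable.append(child.tag)
    return portable, non_portable


# "reviewcycle" is a hypothetical custom element, not part of either standard.
sample = "<prolog><author>J. Doe</author><reviewcycle>3</reviewcycle></prolog>"
portable, non_portable = split_metadata(sample)
```

Everything that ends up in `non_portable` is metadata that another system would have to be taught about separately before it could use the content.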

Perhaps it's time that both standards consider integrating Dublin Core directly as the default metadata model. Right now, both standards can integrate DCMI alongside their existing metadata, but there is a certain level of redundancy. A standard metadata model becomes more valuable as more and more content is managed via a CMS or XML database, and as more content is designed for reuse.
