Sunday, May 10, 2009

DITA’s New Keys and the 80/20 Rule

Have you ever used the lorem() function in Microsoft Word? How about the rand() function? Do you know all the function keys? Most of us have used Microsoft Word for years and still don’t know about all of the “hidden” functionality that it offers. Chances are, you’ll know a few of these, but you won’t know all of them simply because you’ve never needed them. Many of these functions are extremely powerful utilities that make Word far more versatile than a standard formatted text editor, and they’re available if you ever have the need.

The same is true of some of the new functionality coming in the forthcoming DITA 1.2 release. Of particular interest is the introduction of keys. Keys provide a way for authors to address resources through a named identifier rather than a specific URI. In other words, I can create an easy-to-remember key, like “ms-word-functions”, that actually resolves to the URL “http://support.microsoft.com/kb/211982”, and link to this URL using the key name in my DITA topic.

Here’s an example of how it works. In my map, I define a topicref and set the keys attribute with an identifier. I also set my href to the physical location of the resource I want to reference.

<map>
    <topicref keys="ms-word-functions" 
        href="http://support.microsoft.com/kb/211982"
        scope="external"/>
</map>

In my topic file, I can reference the key that's defined in my map:

<topic id="my.topic">
    <title>Sample Topic</title>
    <body>
        <p>
            Lorem ipsum dolor sit amet,
            consectetuer adipiscing elit. Maecenas
            porttitor congue massa. Fusce posuere, magna
            sed pulvinar ultricies, purus
            <xref keyref="ms-word-functions">lectus</xref>
            malesuada libero, sit amet commodo magna eros
            quis urna.
        </p>
    </body>
</topic>

Now, when the topic is rendered, it will resolve itself to the Microsoft URL defined in my map. Pretty cool stuff. And powerful too. This has many potential uses: localizers can create translated versions of a resource using the same key reference and resolve the link to a locale-specific version of the reference. Consumers can be directed to different resources based on their profile or context within a website.
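To make the localization case a little more concrete, here is a rough sketch of two locale-specific maps that define the same key. The file names and the German URL are placeholders I made up for illustration; only the key name and the Microsoft URL come from the example above.

```xml
<!-- File 1 (hypothetical name: guide.en.ditamap) -->
<map xml:lang="en-US">
    <topicref keys="ms-word-functions"
        href="http://support.microsoft.com/kb/211982"
        scope="external"/>
</map>

<!-- File 2 (hypothetical name: guide.de.ditamap).
     Same key, different target; the URL is a placeholder. -->
<map xml:lang="de-DE">
    <topicref keys="ms-word-functions"
        href="http://example.com/de/word-funktionen"
        scope="external"/>
</map>
```

The topic itself keeps the same keyref in both cases; which target it links to depends only on which map it is rendered with.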

From an authoring perspective, there's another neat user story: I can reference a "yet-to-be-determined" resource via a key, and when that resource has been created, the key's definition in the map file will resolve the key reference.
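As a sketch of that "yet-to-be-determined" scenario, assuming a made-up key name, link text, and file name: the map can define the key without an href at first, and gain the href once the resource exists. How a given processor renders a reference to a key with no href is something I would verify against the tool in question.

```xml
<!-- Before: the key exists, but no resource is bound to it yet.
     "install-guide" and the link text are hypothetical. -->
<map>
    <topicref keys="install-guide">
        <topicmeta>
            <linktext>Installation Guide (coming soon)</linktext>
        </topicmeta>
    </topicref>
</map>

<!-- After: the resource has been written, so only the map changes.
     "install-guide.dita" is a hypothetical file name. -->
<map>
    <topicref keys="install-guide" href="install-guide.dita"/>
</map>
```

In both states, a topic would carry the same reference, for example an xref with keyref="install-guide", and would not need to be touched when the resource finally shows up.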

Technically, a key definition doesn't need to reside directly in the map that references the topic. It can live in an "ancestor" map that pulls in the topic indirectly by way of the map referencing that topic. In fact, key values can be overridden. Let's assume that I define a key called "company-website" in Map A that points to "www.company-a.com", and in Map B I define the same key as "www.company-b.com". Map B also references Topic-1.dita, which contains a keyref to "company-website". Map A references Map B. When Topic-1.dita is rendered with Map B as the primary map, the keyref will resolve to "www.company-b.com"; when Map A is the primary map, the same topic will reference "www.company-a.com".

  • Map A
    Key: company-website = "www.company-a.com"
    • Map B
      Key: company-website = “www.company-b.com”
      • Topic-1.dita
        keyref: company-website
        resolves to: www.company-a.com
  • Map B
    Key: company-website = “www.company-b.com”
    • Topic-1.dita
      keyref: company-website
      resolves to: www.company-b.com
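The hierarchy above might be marked up roughly as follows. This is a sketch, not tested output: the map file names are my own placeholders, and I have added an "http://" scheme to the URLs so they are usable as external hrefs.

```xml
<!-- map-a.ditamap (hypothetical file name) -->
<map>
    <title>Map A</title>
    <topicref keys="company-website"
        href="http://www.company-a.com"
        scope="external"/>
    <!-- Map A pulls in Map B, and with it Topic-1.dita -->
    <topicref href="map-b.ditamap" format="ditamap"/>
</map>

<!-- map-b.ditamap (hypothetical file name) -->
<map>
    <title>Map B</title>
    <topicref keys="company-website"
        href="http://www.company-b.com"
        scope="external"/>
    <topicref href="Topic-1.dita"/>
</map>

<!-- Inside Topic-1.dita: the same reference in both contexts -->
<p>Visit <xref keyref="company-website">our website</xref>.</p>
```

With Map A as the primary map, its definition of "company-website" takes effect and Map B's is ignored; with Map B as the primary map, Map A is never consulted.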

With all great power comes even greater responsibility. Any time a topic makes use of a key reference, that topic is explicitly binding itself to a map (or many maps), meaning that the topic is no longer a unit of information that is completely independent of the context into which it is assembled. You could make the argument that any reference defined in a topic to an external resource (e.g., an image or a cross-reference to another topic) by definition creates a dependency for that topic. And arguably, the referenced resource (the endpoint) is unaware of the object that is referencing it, regardless of whether it's a topic reference or a cross-reference. But there is an additional dependency in the case of keys: any map that references a topic with a key reference must define the key. So in a sense, not only does the map (or an ancestor map) need to know about the topic, it needs to discover what the topic contains, specifically any key references it makes. Consequently, somewhere along the line, at least one map must define the keys used by a topic. Did you get all that? Imagine what your XML authoring tools, content management systems, and rendering platforms will need to do to manage this.

This is pretty sophisticated and powerful functionality. But the question is, do you need to use keys and keyrefs in order to use DITA? More importantly, will your tools need to support keys to take advantage of DITA's other capabilities? The short answer is no. In fact, I would expect that support for keys and keyref is still a way off for most DITA-enabled tools. Nevertheless, you can still use DITA with the current tools and get most, if not all, of what you need. Just like Microsoft Word with features like MailMerge, keys and keyref will be there if you need them, but chances are you can get by without them for most content without ever knowing you missed them.

Finally, the possibility of defining indirect links opens the door to many different possibilities for dynamically driven, profile- and locale-specific content. This is very cool stuff - the kind of thing XML folks like me get excited about. But from a practical standpoint, there are potential downsides too. Keys and key references add another layer of complexity to planning the authoring, deployment and management of DITA content. In reality, most tools aren't ready for this complexity just yet. So while the standard is ahead of the game, the rest of the industry will be playing catch-up. Still, Ride the Wave.

Wednesday, April 29, 2009

Content Management Strategies/DITA North America Conference Review

I wasn’t able to attend many of the sessions since I was manning the Flatirons Solutions booth. Yet from talking with the attendees who visited with us, here are some of the key takeaways:

  • DITA is here to stay. This is not news, but the key point here is that organizations are adopting the standard in earnest, as evidenced by the 150-200 attendees who came despite a bad economy, and discretionary budgets being whittled to next to nothing.  This means that organizations are thinking about DITA as an integral part of their long term strategy.
  • DITA’s scope is not only Technical Publications.  Again, not earth-shattering news.  With specializations like Machine Industry, Learning Objects, and gobs of others, DITA is extending its reach to whole industries that haven’t been able to take advantage of XML before now.  At the conference, I spoke to attendees in a wide range of industries including bio-tech, and manufacturing.
  • Shifting focus from Content Authoring to Content Management and Content Delivery Services.  This is a fundamental shift.  Eric Severson emphasized this point when he demonstrated that Microsoft Word could be used to create DITA for a specific class of users that aren’t the primary audience for more conventional XML authoring solutions.  Obviously this raised a few eyebrows in the audience, but the point is that DITA’s architecture is such that even casual contributors, given a few minor constraints in Word, can certainly provide content that can be easily turned into DITA.
  • DITA will live in Middleware. This is a key point. While the focus of the conference was centered around DITA and content management, there’s more here than meets the eye. I had the opportunity to sit in on the open forum that discussed upcoming v1.2 features. Many of these features are centered around link handling (things like keyref, conref push, and conref keys [conkeyref]). There will be greater emphasis on managing all kinds of linking, including indirect links that could have significant implications for vendors’ existing architectures. While it will still be possible to manage small projects with simple file management strategies (including things like Subversion), larger projects and enterprise-wide implementations, particularly those that want to take advantage of these new features, will need more sophisticated applications (read: a content management system) to manage the myriad link strategies being made available.

    Even rendering tools will need to be more sophisticated to support these new features.  The DITA Open Toolkit is currently working on a new version (1.5) to support these.  Other rendering applications will need to start thinking about how they plan to support these features.

    I’ll have more thoughts on this particular topic later. Suffice it to say that there are some key assumptions that current DITA adopters take for granted and that may impact how they design and create content in the future.
  • XML Authoring tools will get more complex. To support all the new features coming in DITA 1.2, DITA-aware XML authoring tools will need to be tightly integrated into middleware systems, particularly the CMS. There will also be a strong emphasis on authoring tools handling a wide variety of link and referencing strategies. I anticipate that these applications will be more process-intensive, with larger footprints on a user’s PC. I also anticipate that the level of sophistication required to “operate” these tools will be much higher. So XML authoring tool vendors will have to focus on both features and usability.

This conference was illuminating on many different facets.  Even the vendors I spoke to seemed to realize that DITA is a truly disruptive technology that has changed the way the entire industry thinks about XML. In the current economic reality, this is the perfect time to be thinking about what this all means and how organizations can take advantage of these innovations in their environment.  Ride the wave.

Saturday, April 25, 2009

XProc and DITA: Random Thoughts

I’ve been following James Sulak's Blog.  He has some pretty impressive detailed discussions about using XProc.  XProc is an XML pipeline processing language, specifically designed to provide instructions for processing XML content. The Recommendation specifies many different kinds of “steps” that can be assembled in virtually any order to control the sequencing and output from one step to another.

Right now, DITA’s reference implementation, the DITA Open Toolkit (DITA OT), uses Apache Ant and custom tasks to process DITA XML content. One of the principal limitations of the DITA OT is its reliance on XSLT 1.0 and extensions (particularly the Idiom FO Plugin) to handle the rendering.

With XProc-enabled tools like Calabash, it seems like DITA could easily be processed using XProc, along with an upgrade of the stylesheets to XSLT 2.0.
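Just to sketch what I have in mind (this is not how the DITA OT works today, and I haven't run this against real DITA content), a two-stage XProc 1.0 pipeline might look something like the following. The step name and both stylesheet file names are hypothetical placeholders.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    name="dita-to-html" version="1.0">
    <!-- The DITA topic (or map) comes in on the primary input port -->
    <p:input port="source"/>
    <p:output port="result"/>

    <!-- Stage 1: preprocessing, e.g. resolving references.
         "preprocess.xsl" is a hypothetical stylesheet. -->
    <p:xslt>
        <p:input port="stylesheet">
            <p:document href="preprocess.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
        <p:with-option name="version" select="'2.0'"/>
    </p:xslt>

    <!-- Stage 2: rendering. The output of stage 1 flows in implicitly.
         "topic-to-html.xsl" is a hypothetical stylesheet. -->
    <p:xslt>
        <p:input port="stylesheet">
            <p:document href="topic-to-html.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
        <p:with-option name="version" select="'2.0'"/>
    </p:xslt>
</p:declare-step>
```

The appeal is that each stage is just another step in the pipeline, so stages can be added, removed, or reordered without touching Ant build files.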

Content Management Strategies/DITA North America Conference

I’ll be attending the conference in St. Petersburg, FL.  Come visit the Flatirons Solutions booth while you’re there.  It should be a very interesting conference. 

Eric Severson, CTO of Flatirons Solutions, will be giving a potentially “game-changing” presentation that speaks to lowering the “barrier to entry” into XML authoring. I recommend seeing this one.

Saturday, March 28, 2009

DocBook Going Modular

Scott Hudson, Dick Hamilton, Larry Rowland and I (AKA, “The Colorado DocBook Coalition”) recently drafted a proposal to support “modular” DocBook and presented it to the DocBook TC yesterday.  In general, this proposal is in response to huge demand for DITA-like capabilities for DocBook. 

Many core business factors are driving DocBook in this direction:
  • more distributed authoring: authors are responsible for specific content areas rather than whole manuals.  Content could be authored by many  different authors, even some in different organizations altogether.
  • content reuse: This has long been a "holy grail" of information architects:  write content once, reuse in many different contexts
  • change management:  isolate the content that has changed.  This is a key driver for companies that have localization needs.  By modularizing their content, they can drive down costs by targeting only the changed content  for translation.

There are also downstream opportunities for modularized content:

  • dynamic content assembly:  create "publications" on the fly using an external assembly file that identifies the sequence and hierarchy of modular components rather than creating a single canonical instance.

The following excerpts from the proposal detail the preliminary features (Important: these are not yet set in stone and are subject to change).  The final version will be delivered with the 5.1 release. 

Assemblies

The principal metaphor for Modular DocBook is the “assembly”. An assembly defines the resources, hierarchy and relationships for a collection of DocBook components. The <assembly> element can be the structural equivalent of any DocBook component, such as a book, a chapter, or an article. Here’s the proposed content model in RELAX NG compact syntax:

db.assembly =
  element assembly {
    db.info?, db.toc*, db.resources+, db.relationships*
  }
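For orientation, a minimal instance that fits this content model might look like the sketch below. The title, ids, and file name are made up, the child elements follow the examples later in this post, and since the proposal is still a draft the details may well change.

```xml
<assembly>
    <!-- Optional metadata (db.info?) -->
    <info>
        <title>Acme User Guide</title>
    </info>
    <!-- Zero or more tables of contents (db.toc*) -->
    <toc>
        <tocentry linkend="intro"/>
    </toc>
    <!-- At least one resources container (db.resources+) -->
    <resources>
        <resource id="intro" fileref="intro.xml"/>
    </resources>
</assembly>
```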

Resources

The <resources> element is a high-level container that holds one or more resource objects managed by the <assembly>. An <assembly> can contain 1 or more <resources> containers to allow users to organize content into logical groups based on profiling attributes.

Each <resources> element must contain 1 or more <resource> elements.

db.resources =
  element resources {
    db.common.attributes, db.resource+
  }

Specifying Resources

The <resource> element identifies a "managed object" within the assembly. Typically, a <resource> will point to a content file that can be identified by a valid URI.  However a <resource> can also be a 'static' text value that behaves similarly to a text entity.

Every <resource> MUST have a unique ID value within the context of the entire <assembly>.

db.resource =
  element resource {
    db.common.attributes,
    attribute fileref { text }?,
    attribute resid {text}?,
    text?
  }

Content-based resources can also be content fragments within a content file, similar to a URI fragment:  file.xml/#ID.

Additionally, a resource can point to another resource.  This allows users to create a "master" resource that can be referenced in the current assembly, and indirectly point to the underlying resource that the referenced resource identifies.

For example:

<resource
    id="master.resource"
    fileref="errormessages.xml"/>
<resource
    id="class.not.found"
    resid="{master.resource}/#classnotfound"/>
<resource
    id="null.pointer"
    resid="{master.resource}/#nullpointer"/>

The added benefit of indirect references is that users can easily point the resource to a different content file, provided that it uses the same underlying fragment ids internally.  It could also be used for creating locale-specific resources that reference the same resource id.

Text-based resources behave similarly to XML text entities.  A content-based resource can reference a text resource, provided that both the text resource and the content resource are managed by the same assembly.
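As a sketch of that swap, assuming a hypothetical German file named errormessages.de.xml that uses the same fragment ids, only the master resource would need to change:

```xml
<!-- Only the master resource is repointed;
     "errormessages.de.xml" is a hypothetical file name -->
<resource
    id="master.resource"
    fileref="errormessages.de.xml"/>
<!-- The dependent resources are untouched and follow the master -->
<resource
    id="class.not.found"
    resid="{master.resource}/#classnotfound"/>
<resource
    id="null.pointer"
    resid="{master.resource}/#nullpointer"/>
```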

assembly.xml:

...
<resource id="company.name">Acme Tech, Inc.</resource>
<resource id="company.ticker">ACMT</resource>
...

file1.xml:

<para><phrase resid="company.name"/> (<phrase resid="company.ticker"/>) is a
publicly traded company...</para>

Organizing Resources into a Logical Hierarchy

The <toc> element defines the sequence and hierarchy of content-based resources that will be rendered in the final output.  It behaves in a similar fashion to a DITA map and topicrefs.  However, instead of each <tocentry> pointing to a URI, it points to a resource in the <resources> section of the assembly:

<toc>
    <tocentry linkend="file.1"/>
    <tocentry linkend="file.2">
        <tocentry linkend="file.3"/>
    </tocentry>
</toc>

<resources>
    <resource id="data.table" fileref="data.xml"/>
    <resource id="file.1" fileref="file1.en.xml"/>
    <resource id="file.2" fileref="file2.en.xml"/>
    <resource id="file.3" resid="{data.table}/#table1"/>
</resources>

Creating Relationships Between Resources

One of the more clever aspects of DITA’s architecture is the capability to specify relationships between topics within the context of the map (and independent of the topics themselves).  The DocBook TC is currently considering several proposals that will enable resources to be related to each other within the assembly.

The Benefits of a Modular DocBook

There is a current mindset (whether it’s right or wrong is irrelevant) that DocBook markup is primarily targeted at “monolithic” manuscripts.  With this proposal, I think there are many more possibilities for information architects to create new types of content: websites, true help systems, mashups, dynamically assembled content based on personalized facets (Web 2.0/3.0 capabilities), and a simplified localization strategy like the one that has been advocated in DITA.

What’s more, the design places no constraints on the type of content resources referenced in an assembly.  In fact, they can be any type: sections, chapters, images, even separate books (or assemblies) to mimic DocBook’s set element.

The design takes into account existing DocBook content that currently exists as “monolithic” instances, but is flexible enough to support other applications like IMS manifests for SCORM-compliant content, making it easy to create e-Learning content.

As the first draft of the proposal, I would expect that there will be changes between now and the final spec.  Yet, the core of the proposal should remain relatively intact.  If you would like to get involved or have other ideas, let me know.  Stay tuned.


Friday, February 13, 2009

XMetaL Reviewer Webinar

I attended a webinar yesterday hosted by JustSystems for their XMReviewer product.  The problem space is that conventional reviewing processes are cumbersome and inefficient, particularly when multiple reviewers need to review a document concurrently.  In general, most review processes rely on multiple draft copies being sent out, one to each reviewer, and then it’s up to the author to “merge” the comment feedback into the source.

With XMReviewer, the entire review process is centralized on the XMReviewer server:  Reviewers simply access the document online, provide their comments and submit. What’s really cool is that reviewers are notified in almost real time when another reviewer has submitted comments and can integrate their fellow reviewer’s comments into their own.

The real advantage is that authors have all reviewer comments integrated and merged into a single XML instance, and in context. Very Nice. 

There’s also a web service API that allows you to integrate XMReviewer with other systems including a CMS that can automatically deploy your content to the XMReviewer server.

There are some nice general reporting/auditing features built in as well.  However, I didn’t see anything that would allow me to customize the reports or to manipulate the data, but I wouldn’t consider that a show stopper.

For folks used to “offline” reviews, e.g., providing comments at home or on a plane, this won’t work for you as it is a server application.  Nonetheless, having full control and context for review comments far outweighs the minor inconvenience of being online and getting access to the server (most companies these days have VPN, so it’s not a showstopper).  Though, I can envision the possibility of the server downloading and installing a small-footprint application that would allow users to review the document “offline” and then “submit” the comments back to the server when the reviewer is back online.

The only other limitation right now is that XMReviewer doesn’t support DITA map-level reviews in which you can provide comments on multiple topics within a map.  This is currently in development for a future release – stay tuned.

Overall, XMReviewer looks great and can simplify your content review process.  Check it out.

Wednesday, February 11, 2009

Microsoft Live Writer Convert

After reading a few blogs here and there, I’ve seen a few posts about Microsoft’s Live Writer for creating blog posts.  Always on the lookout for new toys and tools, I decided to download it and try it out. I gotta admit, I’m sold.  This is a pretty nice application that allows me to work offline to write and edit my posts and when I am ready and able to connect, I simply push the “Publish” button and away it goes.  Sweet.

It’s simple to install, and simple to configure to point to virtually any blog host out there.  In short: It just works. 

This is what software should be like.  It should solve a particular set of problems, and only those problems, well, without requiring massively complex installation and configuration steps.  The interface should be intuitive (Live Writer is wickedly intuitive) and should help rather than hinder my productivity.  This tool does that.  Well done, Microsoft!