Jim's Thoughtspot: DITA's Leaky Abstractions

If you haven't read Joel Spolsky's Law of Leaky Abstractions before, here's the basic premise: constructs that are designed to "simplify" our lives can sometimes fail and result in even bigger problems than the ones the abstraction was intended to solve.

In DITA, there are two potential leaky abstractions:

Specialization

References

Before you think that I'm disparaging DITA, read on.

DITA is perhaps one of the most transformative ideas to come out of XML. It has enabled users to create content for a wide range of purposes and industries - from traditional Tech Pubs to Finance, Industrial, and Aerospace. And this is just scratching the surface. The door is just beginning to open to the possibilities for adopting DITA, and the list of vendors who've jumped on the DITA bandwagon continues to grow.

There are so many reasons for adopting DITA as an XML platform. The architecture is designed with reuse in mind. Instead of thinking of content as large monolithic documents, DITA changes the paradigm by thinking of content as smaller, single units of information that can be assembled into many different documents in many different ways. And with conref, you can reuse even smaller pieces of content, like product names or common terminology.
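To make the assembly idea concrete, here's a minimal sketch of two maps that pull the same topic into different deliverables. The file names and titles are made up for illustration; only the map, title, and topicref elements come from DITA itself.

user-guide.ditamap

```xml
<map>
    <title>User Guide</title>
    <!-- installing.dita is a hypothetical topic file shared by both maps -->
    <topicref href="installing.dita"/>
    <topicref href="configuring.dita"/>
</map>
```

quick-start.ditamap

```xml
<map>
    <title>Quick Start</title>
    <!-- Same topic, reused without copying it -->
    <topicref href="installing.dita"/>
</map>
```

The installing topic is written once and shows up in both documents.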

If reuse isn't a big selling point for you, consider the ability to create your own content types and semantics (specialization) that fit your processes. No need for a one-size-fits-all content model. With specialization, you can derive new topic types or new semantic elements from existing DITA elements, provided that the content model for these topics or semantic elements (inline elements, AKA "domain specializations" in DITA parlance) complies with the content model pattern of the "parent". This is really cool. You can create wholly new content markup that you understand, or you can refine existing content models to be tighter based on what you need.

Where's the Leak?

By reading this far, you're probably confused. I've said that DITA has leaky abstractions, particularly with Specialization and References, and I also said that DITA's really cool because you can assemble documents from many different topics, you can conref content from other resources, and you can create specializations. So let me go back to Spolsky's Law of Leaky Abstractions. In his blog, Spolsky says:

"Abstractions fail. Sometimes a little, sometimes a lot. There's leakage. Things go wrong. It happens all over the place when you have abstractions"

The point here is that abstractions like specialization and conref aren't always problematic - in general they work well - but they can break, and when they do, they cause all kinds of problems. So now I'll explain where the leaks are in these constructs.

Leaky Abstraction #1: Specializations

Specialization allows you to create your own markup semantics that are meaningful to you. For example, you can create a specialized topic type for a memo that contains the following constructs:

To (who should read this memo)

From (who sent the memo)

Subject (what's the memo about)

Body (the contents of the memo)

And let's say that a memo's body can contain only paragraphs and lists.

No problem. Using the DITA Language Specification, I see that DITA's standard topic element has pretty much everything I need (and more), so I just need to weed out the elements I don't want and add a few that I need that aren't yet defined. I open Eliot Kimber's fantastic specialization tutorial to guide me through the details, and within an hour I have my new memo topic DTD. Specialization works.
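Here's roughly what an instance of such a memo topic could look like, with the class attributes written out to show how each new element traces back to a base DITA element. Treat it as a sketch under assumptions: the element names (memo, subject, memohead, from, to, memobody) and their mapping to topic, title, prolog, author, data, and body are my illustration of one possible design, not necessarily what the DTD described above uses. In practice the DTD would supply the class values as defaults, so authors never type them.

```xml
<!-- Hypothetical memo instance; element names and base mappings are assumptions. -->
<memo id="memo.example" class="- topic/topic memo/memo ">
    <!-- Subject maps to the base title, so it has to come first -->
    <subject class="- topic/title memo/subject ">...</subject>
    <memohead class="- topic/prolog memo/memohead ">
        <from class="- topic/author memo/from ">...</from>
        <to class="- topic/data memo/to ">...</to>
    </memohead>
    <!-- The memo body is restricted to paragraphs and lists -->
    <memobody class="- topic/body memo/memobody ">
        <p class="- topic/p ">...</p>
        <ul class="- topic/ul ">
            <li class="- topic/li ">...</li>
        </ul>
    </memobody>
</memo>
```

Each class value lists the ancestry from the base element to the specialized one, which is how a processor that has never heard of memo can still treat it as a plain topic.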

Now let's look at where specialization is leaky. I need to create a parts list for a plane assembly that contains an optional title and some special metadata elements that identify the planes' tail numbers that this list is effective for. The list can also nest for sub-parts, using the same metadata elements to further refine the effectivity to a subset of the tail numbers declared in the parent list. Oh, and it can appear in a wide variety of content blocks. Oops. <ul> only allows <li> elements. <dl>? Well... maybe. I might be able to specialize <dlhead>. But it's a stretch. And there's a lot of overhead to achieve what I want. We have a leak. A small one, but a leak nonetheless.
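To show what I'm after, here's a sketch of the markup I'd like to be able to write. Every element name in it (partslist, effectivity, tailnumber, part) is hypothetical, and the tail numbers are placeholders.

```xml
<!-- Hypothetical target markup; none of these list elements exist in base DITA. -->
<partslist>
    <title>...</title>
    <!-- Tail numbers this list is effective for (placeholder values) -->
    <effectivity>
        <tailnumber>N001</tailnumber>
        <tailnumber>N002</tailnumber>
        <tailnumber>N003</tailnumber>
    </effectivity>
    <part>
        ...
        <!-- Nested list narrows effectivity to a subset of the parent's tail numbers -->
        <partslist>
            <effectivity>
                <tailnumber>N002</tailnumber>
            </effectivity>
            <part>...</part>
        </partslist>
    </part>
</partslist>
```

The trouble is finding a base element whose content model already has this shape. With <ul> as the ancestor, the title and effectivity children have nothing to map to, because the parent pattern only permits <li>.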

Leaky Abstraction #2: References

Conref is a transclusion mechanism that can reference content from another source and include it in another context, provided that the conref'ed content is allowed within the current context. Cool. I can create standard warning notices and simply conref them into the right location:

warning-notices.dita


<topic id="warnings">
    <body>
        <note id="empty.fuel.tank.warning" type="warning">
           <p>
              Make sure that the aft fuel tank is completely empty before
              starting this procedure.
           </p>
        </note>
        <note id="warning2" type="warning">
            ...
        </note>
    </body>
</topic>

proctopic1.dita


<topic id="my.topic">
    <body>
        <p>...</p>
        <note conref="warning-notices.dita#warnings/empty.fuel.tank.warning"/>
    </body>
</topic>

That's OK. Straightforward and what conref was intended to do. Here's the rub: it works like a charm if you're managing the links on a local file system.

Things start getting really hairy if, for example, you have a shared resource, like the common warnings example above, on, say, a Windows file server that I've mapped to my Z: drive. Now my conref must point to the physical location of that file. Here's the first potential leak: if Joe Writer maps the file server to his Y: drive and Jane Author maps the same server to her W: drive, and we all start sharing topics that each of us has written, we could all have broken conrefs. Guess what? The same holds true for topicrefs and potentially any other topic-based link. The referencing logic is heavily dependent on the physical location of the file.
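A sketch of how that plays out, assuming the shared file sits in a hypothetical common folder on the server and the conref was saved with a drive-qualified path:

```xml
<!-- Saved on my machine, where the share is mapped to Z: (folder name is hypothetical) -->
<note conref="file:///Z:/common/warning-notices.dita#warnings/empty.fuel.tank.warning"/>

<!-- What Joe's machine would need for the same target, since his mapping is Y: -->
<note conref="file:///Y:/common/warning-notices.dita#warnings/empty.fuel.tank.warning"/>
```

Same file, same note, but neither reference resolves on the other person's machine.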

Introduce a CMS, many of which manage topics as individual objects with references handled by some form of relationship mechanism (e.g., a relationship table in a database with object IDs rather than physical file addresses), and the leaky abstraction can become a gaping hole.

Plugging the Leaks

While these examples fit the definition of leaky abstraction, much of what DITA offers is solid - so there's no need to abandon DITA at all. In fact, DITA works like it should most of the time. But like any abstraction, there are potential gotchas. Considering how new DITA is, the level of sophistication and stability is pretty darn good. And these aren't excruciatingly difficult problems to solve. But solving them will require careful thought along with smart dialog with vendors and implementors who believe DITA has the capability to transform the paradigms of how content is created.

Saturday, April 12, 2008
