Jim's Thoughtspot: May 2008

There's a common notion that if you like eating sausage, you don't want to know how it's made. In fact, there's a butcher (reminiscent of days gone by) down the street from where I live who makes all kinds of sausage from "old family recipes". What I read into this is that the butcher is saying, "Don't ask, I'm not telling." To which my reply would be, "I won't, because it tastes good."

In much the same way that I don't want to know what goes into the sausage, many writers I've worked with or consulted for don't want to know or care about angle brackets, tags, attributes, containment, content models, namespaces, DTDs, schemas or processing instructions. In other words, "Let me write, and get out of my way!"

Part of this sturm und drang over markup languages is an unintended consequence of sophisticated desktop publishing (DTP) tools that enable virtually anyone to create publication-ready content. The problem is that structure and semantics take a back seat to formatting and presentation. And for the last 15-plus years, DTP applications have been the main tool in the writing tool chest. It didn't matter that the underlying format was proprietary; it just worked.

But the price of that "freedom" was additional overhead to ensure that "publication-ready" content conformed to internal company standards for style and format.

So here comes XML, with its promise of unlocking content from formatting and from proprietary binary formats that only proprietary applications can read. Structure and semantics now matter. In so many ways, this is a great thing. Consistency can be enforced by the underlying schema: no rogue "Heading 1" styles followed by a "Heading 3", and no funky new styles that deviate from the corporate template. Better yet, no more tedious hours reformatting content from other sources like OEM content, or being limited to "one-off" copies from originals.

The rub is that the tools needed to write XML put more onus on the writer to understand the content model. Even if these tools hide or abstract the content model, there is still a significant change management issue: users must be taught that structured content authoring is significantly different from what they might be used to. In exchange for the loss of "creative freedom," back-end publishing systems now have reliable content to work with. But the problem is that some of the folks creating the content feel like they've gotten the short end of the stick. For those who feel they've been given short shrift, it's as if they're standing right in the middle of the sausage factory.

The real challenge is to develop tools that make XML content creation simple. Users don't want to know how a hyperlink or image is marked up; they simply want to identify the URL and link text. Similarly, some users don't care whether the content model uses the CALS, OASIS Exchange or HTML table model; they just want to insert rows and cells. In many respects, this suggests that the complexity of current markup is getting in the way.

Many of the tools do have WYSIWYG-ish views into the content. Yet, as Eric Armstrong pointed out, even these tools fall short and tend to confuse users when they want to insert a particular element or structure that isn't allowed at the current cursor location. In many regards, this is too much overhead for the average user. Users want a tool that allows them to insert paragraphs, lists, tables, images and section headings. One suggestion might be a higher-level abstraction layer that all tools can easily support, with the more complex XML markup built under the covers.
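To make the cursor problem concrete, here's a minimal sketch of the kind of check a structured editor performs before it lets you insert something. The content model below is a hypothetical, heavily simplified table loosely based on DocBook element names, and the function names are made up for illustration; a real editor derives these rules from the DTD or schema.

```python
# Hypothetical, heavily simplified content model loosely based on DocBook:
# each element maps to the child elements it may directly contain.
CONTENT_MODEL = {
    "section": {"title", "para", "itemizedlist", "section"},
    "itemizedlist": {"listitem"},
    "listitem": {"para", "itemizedlist"},
    "para": {"emphasis", "ulink"},
}


def insertable_at(path):
    """Elements an editor would offer at a cursor inside path[-1].

    `path` is the list of element names from the root down to the
    element that contains the cursor.
    """
    return sorted(CONTENT_MODEL.get(path[-1], set()))


def try_insert(path, element):
    """Return the new path if `element` is legal here, else raise."""
    if element not in CONTENT_MODEL.get(path[-1], set()):
        raise ValueError(f"<{element}> is not allowed inside <{path[-1]}>")
    return path + [element]
```

With the cursor sitting between two list items (inside `itemizedlist`), the only legal choice is a `listitem`, so asking for a plain paragraph fails even though on screen it looks like a perfectly reasonable place to type.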

One possible approach might be to use HTML-like WYSIWYG editors that support basic content structures. Under the covers, the tool can convert the HTML into the more complex structures required by grammars like DocBook and DITA. In fact, the Interoperability Framework that Scott Hudson and I have worked on uses this same approach.
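As a rough illustration of the "under the covers" part, here's a minimal sketch that converts a small subset of HTML into DocBook. It assumes the editor saves well-formed XHTML and targets DocBook 4.x element names; the tag mapping, the function names and the tiny supported subset are assumptions for illustration, not how the Interoperability Framework actually does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a small XHTML subset to DocBook 4.x element names.
TAG_MAP = {
    "div": "section",
    "h2": "title",
    "p": "para",
    "em": "emphasis",
    "ul": "itemizedlist",
    "ol": "orderedlist",
    "li": "listitem",
}


def convert(el):
    """Recursively map one XHTML element (and its children) to DocBook."""
    if el.tag == "a":
        # DocBook 4.x marks up external hyperlinks as <ulink url="...">.
        out = ET.Element("ulink", url=el.get("href", ""))
    elif el.tag in TAG_MAP:
        out = ET.Element(TAG_MAP[el.tag])
    else:
        raise ValueError(f"unsupported element: <{el.tag}>")
    # A DocBook listitem holds block content, so wrap the item text in a para.
    target = ET.SubElement(out, "para") if el.tag == "li" else out
    target.text = el.text
    for child in el:
        target.append(convert(child))
    out.tail = el.tail
    return out


def html_to_docbook(xhtml):
    """Convert a well-formed XHTML fragment string to a DocBook string."""
    return ET.tostring(convert(ET.fromstring(xhtml)), encoding="unicode")
```

A real converter would have to handle far more (tables, images, nested sections, DITA topic types), but the principle is the same: the writer only ever sees the simple structure.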

While this type of approach takes away some control over very specific semantics, it reduces the complexity and clutter that more structured models introduce. That doesn't preclude editing tools from being smarter about how they enable more specific semantics. In much the same way that many HTML editors display a popup dialog for users to specify the URL and text for a hyperlink, tools can provide other dialogs for specifying (more like classifying) an abstract span element to indicate more detailed meaning.
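Here's a minimal sketch of how such a classification dialog might feed the conversion, assuming the editor records the writer's choice as a `class` attribute on an otherwise generic `<span>`. The class vocabulary, the function names and the DocBook 4.x targets are assumptions for illustration; an unclassified span simply falls back to DocBook's generic `<phrase>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary offered by a "classify this span" dialog,
# mapped to DocBook 4.x inline elements.
SPAN_CLASSES = {
    "command": "command",
    "filename": "filename",
    "gui-label": "guilabel",
    "product": "productname",
}


def classify_span(span, role):
    """Record the writer's dialog choice on an abstract <span>."""
    if role not in SPAN_CLASSES:
        raise ValueError(f"unknown span class: {role}")
    span.set("class", role)


def span_to_docbook(span):
    """Turn a (possibly classified) <span> into a DocBook inline element."""
    # Unclassified or unknown spans become the generic <phrase>.
    out = ET.Element(SPAN_CLASSES.get(span.get("class", ""), "phrase"))
    out.text = span.text
    out.tail = span.tail
    return out


def paragraph_to_docbook(p):
    """Convert an XHTML <p> whose only children are spans into a <para>."""
    para = ET.Element("para")
    para.text = p.text
    for child in p:
        if child.tag != "span":
            raise ValueError(f"unsupported element: <{child.tag}>")
        para.append(span_to_docbook(child))
    return ET.tostring(para, encoding="unicode")
```

For example, if the writer selects "make" in `<p>Run <span>make</span> in <span class="filename">/tmp</span>.</p>` and picks "command" in the dialog, the converter emits `<para>Run <command>make</command> in <filename>/tmp</filename>.</para>`. The writer never types a tag; they just answer "what is this?"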

This is not to suggest that "non-narrative" content, like the data structures or configurations used by many modern applications, will want to use this approach. However, publication-level markup can take advantage of this type of approach to simplify how users create content and honor their request to remain unaware of how the sausage is made. Those users will be happier creating content in a model they do understand.


Saturday, May 3, 2008

I Don't Care How the Sausage Is Made
