Sunday, November 29, 2009

Fun With XProc

I've been so busy with clients over the last six months that I haven't had much time to tinker with XProc. I took the Thanksgiving holiday week off with the hope of having a little time to dabble with the language. Up until yesterday, I didn't open my computer once (which has to be a new record) since we were busy with other things. A side note: if you're in Denver before February 7th, I highly recommend you see the Genghis Khan exhibit at the Museum of Nature and Science.

As I often do, I had done some preliminary reading beforehand. James Sulak's blog is a must-read. Another very useful and informative resource is EMC's "XProc: Step By Step", originally authored by Vojtech Toman. Even the W3C specification is generally helpful.

The biggest hurdle for me was to stop thinking of XProc as working the same way Ant does. While Ant does process XML content, that isn't the tool's principal focus - Ant was principally designed as a Java implementation of make, and for that purpose it has become the de facto standard. Before XProc was conceived, many of us used Ant to control the sequencing of complex XML publishing pipelines. I worked on XIDI, a DocBook-based publishing system at HP that was principally based on Ant scripts; the DITA Open Toolkit is an Ant-based build environment. For the most part Ant works admirably, but there are limitations. The biggest is the xslt task's static parameter declarations, and the indirect way parameter values are passed to an XSL transformation through property values. It works, but it gets kludgy pretty fast for complex stylesheets. More importantly, Ant is primarily a developer's tool: a Swiss Army knife with a blade for just about every purpose. Those blades work well for common, general-purpose tasks, but they aren't intended for specialized ones. For those, you need to create custom Ant tasks or reach for other specialized tools. XProc is one of those specialized tools, designed specifically for XML processing.
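
To make that concrete, here's a rough sketch of the Ant idiom I'm describing (the target, file, property, and parameter names are just placeholders, not from any real build): every stylesheet parameter has to be declared statically inside the xslt task and fed indirectly from a property.

<!-- hypothetical build fragment; names are placeholders -->
<target name="render">
    <xslt in="book.xml" out="book.html" style="render.xsl">
        <!-- each stylesheet parameter must be declared statically here
             and fed from an Ant property -->
        <param name="debug" expression="${debug}"/>
        <param name="output.dir" expression="${output.dir}"/>
    </xslt>
</target>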

So the biggest conceptual difference to grok in XProc (I like this…) is how steps are connected together to form a complete pipeline. Rather than using target dependencies and explicit target calls as you do in Ant, XProc uses the concept of pipes to connect the output of one step to the input of the next. It's very much like a Unix shell or DOS command-line pipeline. For example:

ps -ax | tee processes.txt | more

Since many steps (including the p:xslt step) can have more than one input and more than one output (think of xsl:result-document in XSLT 2.0), we need to explicitly bind uniquely named output ports to the input ports of subsequent steps. It's very much like plumbing, and it's another way that XProc differs from Ant: Ant tasks are heavily dependent on the file system to process inputs and outputs, while XProc pipelines are in-memory input and output streams until you explicitly serialize to the file system.
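
As a minimal sketch of what an explicit binding looks like (the step and file names are made up, and the fragment assumes it sits inside a larger pipeline), the p:xslt step here reads its source from the named output port of the preceding step rather than from a file:

<!-- fragment of a larger pipeline; step and file names are illustrative -->
<p:identity name="normalize"/>

<p:xslt name="render">
    <!-- explicitly bind this step's source to the "result" port of "normalize" -->
    <p:input port="source">
        <p:pipe step="normalize" port="result"/>
    </p:input>
    <p:input port="stylesheet">
        <p:document href="render.xsl"/>
    </p:input>
</p:xslt>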

With this I was able to create my first "real" XProc pipeline to generate Doxsl output. Here it is:

<p:declare-step name="doxsl" type="dxp:generate-doxsl-docs"
psvi-required="false"
xmlns:p="
http://www.w3.org/ns/xproc"

    xmlns:dxp="urn:doxsl:xproc-pipeline:1.0">  
<p:input port="source" kind="document" primary="true"
sequence="false" />
<p:input port="parameters" kind="parameter" primary="false"
sequence="true"/>
<p:output port="result" primary="true" sequence="false" >
<p:pipe step="transform" port="result"/>
</p:output>
<p:output port="secondary" primary="false" sequence="true" >
<p:pipe step="transform" port="secondary" />
</p:output>
<p:option name="format" select="'dita'"/>
<p:choose name="select-stylesheet">
<p:when test="$format='dita'">
<p:output port="result" primary="true"
sequence="false" >
<p:pipe step="load-dita-stylesheet"
port="result"/>
</p:output>
<p:load name="load-dita-stylesheet">
<p:with-option name="href"
select="'../../dita/doxsl.xsl'" >
<p:empty/>
</p:with-option>
</p:load>
</p:when>
<p:when test="$format='html'">
<p:output port="result" primary="true"
sequence="false">
<p:pipe port="result"
step="load-html-stylesheet"/>
</p:output>
<p:load name="load-html-stylesheet">
<p:with-option name="href"
select="'../../html/doxsl.xsl'"/>
</p:load>
</p:when>
</p:choose>
<p:xslt name="transform">
<p:input port="source" >
<p:pipe step="doxsl" port="source"/>
</p:input>
<p:input port="stylesheet" >
<p:pipe step="select-stylesheet" port="result"/>
</p:input>
<p:input port="parameters">
<p:pipe port="parameters" step="doxsl"/>
</p:input>
<p:with-param name="debug" select="'true'"/>
</p:xslt>
</p:declare-step>



Here's a diagram, built with EMC's XProc Designer.  This tool is a great way to visualize and start your XProc scripts:




Essentially, I used p:declare-step so that the pipeline is exposed as a custom step (dxp:generate-doxsl-docs), which allows it to be integrated into other pipelines. It has one option, format, which specifies the output format to generate (for Doxsl, 'html' and 'dita' are supported). The first step ("select-stylesheet") evaluates the format option and loads the appropriate stylesheet onto its "result" output port, which is bound to the second step's ("transform") stylesheet port. The transform's source (the XSLT stylesheet to be documented) is bound to the root step's source port, as is the parameters port. I also set the stylesheet's 'debug' parameter to true to inject output into the "transform" step's result port.

All of this is done in memory and not serialized to the file system. This is intentional so that other pipelines can integrate this custom step.
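
For example, a calling pipeline could look roughly like this (the file names, the html serialization, and the wrapper step itself are assumptions for illustration, not part of Doxsl): it imports the step declaration, invokes the custom step, and only serializes at the very end.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:dxp="urn:doxsl:xproc-pipeline:1.0"
    version="1.0" name="main">

    <p:input port="source" primary="true"/>
    <p:input port="parameters" kind="parameter"/>

    <!-- hypothetical file name for the step declaration shown above -->
    <p:import href="generate-doxsl-docs.xpl"/>

    <dxp:generate-doxsl-docs name="docs">
        <p:with-option name="format" select="'html'"/>
    </dxp:generate-doxsl-docs>

    <!-- serialize only at the end of the calling pipeline -->
    <p:store href="doxsl-docs.html" method="html"/>

</p:declare-step>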

I've tested this with Calabash. I still need to evaluate with Calumet.

Right now, these are baby steps. I think that XProc has a lot of potential. I think the next big task is to consider an XProc implementation for DITA XML processing.

Sunday, May 10, 2009

DITA’s New Keys and the 80/20 Rule

Have you ever used the lorem() function in Microsoft Word? How about the rand() function? Do you know all the function keys? Most of us have used Microsoft Word for countless years and don’t know about all of the “hidden” functionality that it offers. Chances are, you’ll know a few of these, but you won’t know all of them simply because you’ve never needed them. Many of these functions are extremely powerful utilities that make Word a versatile application beyond a standard formatted text editor. But they’re available if you ever have the need.

The same is true of some of the new functionality in the forthcoming DITA 1.2 release. Of particular interest is the introduction of keys. Keys provide a way for authors to address resources through a named identifier rather than through a specific URI pointer. In other words, I can create an easy-to-remember key, like “ms-word-functions”, that actually resolves to the URL “http://support.microsoft.com/kb/211982”, and link to that URL using the key name in my DITA topic.

Here’s an example of how it works. In my map, I define a topicref and set the keys attribute with an identifier. I also set my href to the physical location of the resource I want to reference.

<map>
    <topicref keys="ms-word-functions" 
        href="http://support.microsoft.com/kb/211982"
        scope="external"/>
</map>

In my topic file, I can reference the key that's defined in my map:

<topic id="my.topic">
    <title>SampleTopic</title>
    <body>
        <p>
            Lorem ipsum dolor sitamet,  
            consectetuer adipiscing elit. Maecenas
            porttitor conguemassa. Fusce posuere, agna 
            sed pulvinar ultricies, purus
<xref keyref="ms-word-function">lectus</xref>
            malesuada libero, sit amet commodo magna eros
            quisurna.
        </p>
   </body>
</topic>

Now, when the topic is rendered, the reference will resolve to the Microsoft URL defined in my map. Pretty cool stuff. And powerful too. This has many potential uses: localizers can create translated versions of a resource using the same key reference and resolve the link to a locale-specific version of the reference. Consumers can be directed to different resources based on their profile or context within a website.
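
For instance, a locale-specific map might bind the same key to a translated target (the German URL below is hypothetical, purely for illustration), leaving the topics that use the keyref untouched:

<!-- hypothetical German-locale map; same key, different target -->
<map xml:lang="de">
    <topicref keys="ms-word-functions"
        href="http://support.microsoft.com/kb/211982/de"
        scope="external"/>
</map>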

From an authoring perspective, there's another neat user story: I can reference a "yet-to-be-determined" resource via a key, and when that resource has been created, the key's definition in the map file will resolve the key reference.

Technically, a key definition doesn't need to reside directly in the map that references the topic. It can live in an "ancestor" map that pulls in the topic indirectly by way of the map referencing that topic. In fact, key values can be overridden. Let's assume that I define a key called "company-website" in Map A that points to "www.company-a.com", and in Map B I define the same key as "www.company-b.com". Map B also references Topic-1.dita, which contains a keyref to "company-website". Map A references Map B. When Topic-1.dita is rendered in the context of Map B as the primary map, the keyref will resolve to "www.company-b.com"; when Map A is the primary map, the same topic will reference "www.company-a.com". A markup sketch follows the outline below.

  • Map A
    Key: company-website = "www.company-a.com"
    • Map B
      Key: company-website = “www.company-b.com”
      • Topic-1.dita
        keyref: company-website
        resolves to: www.company-a.com
  • Map B
    Key: company-website = “www.company-b.com”
    • Topic-1.dita
      keyref: company-website
      resolves to: www.company-b.com
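
Here's a rough markup sketch of the two maps in that outline (the file names and the use of mapref are my assumptions; a topicref with format="ditamap" would work as well):

<!-- mapA.ditamap: when this is the primary map, its key definition wins -->
<map>
    <topicref keys="company-website" href="http://www.company-a.com"
        scope="external"/>
    <mapref href="mapB.ditamap"/>
</map>

<!-- mapB.ditamap: when processed on its own, its own definition is used -->
<map>
    <topicref keys="company-website" href="http://www.company-b.com"
        scope="external"/>
    <topicref href="Topic-1.dita"/>
</map>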

With all great power comes even greater responsibility. Any time a topic makes use of a key reference, that topic is explicitly binding itself to a map (or many maps), meaning the topic is no longer a unit of information that is completely independent of the context into which it is assembled. You could argue that any reference defined in a topic to an external resource (e.g., an image or a cross-reference to another topic) by definition creates a dependency on that topic. And arguably, the referenced resource (the endpoint) is unaware of the object that is referencing it, regardless of whether it's a topic reference or a cross-reference. But there is an additional dependency in the case of keys: any map that references a topic containing a key reference must define that key. So in a sense, not only does the map (or an ancestor map) need to know about the topic, it needs to discover what the topic is about, specifically with respect to any key references it points to. Consequently, somewhere along the line, at least one map must define the keys used by a topic.  Did you get all that?  Imagine what your XML authoring tools, CMS systems, and rendering platforms will need to do to manage this.

This is pretty sophisticated and powerful functionality.  But the question is, do you need to use keys and keyrefs in order to use DITA?  More importantly, will your tools need to support keys to take advantage of DITA's other capabilities?  The short answer is no.  In fact, I expect that keyref-enabled DITA support is still a way off for most DITA-enabled tools.  Nevertheless, you can still use DITA with the current tools and get most, if not all, of what you need.  Just as with Microsoft Word features like Mail Merge, keys and keyref will be there if you need them, but chances are you can get by without them for most content without ever knowing you missed them.

Finally, the ability to define indirect links opens the door to many possibilities for dynamically driven, profile- and locale-specific content.  This is very cool stuff - the kind of thing XML folks like me get excited about.  But from a practical standpoint, there are potential downsides too.  Keys and key references add another layer of complexity to planning the authoring, deployment, and management of DITA content.  In reality, most tools aren't ready for this complexity just yet.  So while the standard is ahead of the game, the rest of the industry will be playing catch-up.  Still, Ride the Wave.

Wednesday, April 29, 2009

Content Management Strategies/DITA North America Conference Review

I wasn’t able to attend many of the sessions since I was manning the Flatirons Solutions booth.  Yet from talking with the attendees who visited with us, here are some of the key takeaways:

  • DITA is here to stay. This is not news, but the key point here is that organizations are adopting the standard in earnest, as evidenced by the 150-200 attendees who came despite a bad economy, and discretionary budgets being whittled to next to nothing.  This means that organizations are thinking about DITA as an integral part of their long term strategy.
  • DITA’s scope is not only Technical Publications.  Again, not earth-shattering news.  With specializations like Machine Industry, Learning Objects, and gobs of others, DITA is extending its reach to whole industries that haven’t been able to take advantage of XML before now.  At the conference, I spoke to attendees in a wide range of industries, including bio-tech and manufacturing.
  • Shifting focus from Content Authoring to Content Management and Content Delivery Services.  This is a fundamental shift.  Eric Severson emphasized this point when he demonstrated that Microsoft Word could be used to create DITA for a specific class of users that aren’t the primary audience for more conventional XML authoring solutions.  Obviously this raised a few eyebrows in the audience, but the point is that DITA’s architecture is such that even casual contributors, given a few minor constraints in Word, can certainly provide content that can be easily turned into DITA.
  • DITA will live in Middleware. This is a key point. While the focus of the conference was centered on DITA and content management, there’s more here than meets the eye.  I had the opportunity to sit in on the open forum that discussed upcoming v1.2 features.  Many of these features center around link handling (things like keyref, conref push, and conref keys [conkeyref]).  There will be greater emphasis on managing all kinds of linking, including indirect links that could have significant implications for vendors’ existing architectures.  While it will still be possible to manage small projects with simple file management strategies (including tools like Subversion), larger projects and enterprise-wide implementations, particularly those that want to take advantage of these new features, will need more sophisticated applications (read: a content management system) to manage the myriad of link strategies being made available. 

    Even rendering tools will need to be more sophisticated to support these new features.  The DITA Open Toolkit is currently working on a new version (1.5) to support these.  Other rendering applications will need to start thinking about how they plan to support these features.

    I’ll have more thoughts on this particular topic later.   Suffice it to say that there are some key assumptions that current DITA adopters take for granted that may impact how they design and create content in the future.
  • XML Authoring tools will get more complex.  To support all the new features coming in DITA 1.2, DITA-aware XML authoring tools will need to be tightly integrated into middleware systems, particularly the CMS.  There will also be a strong emphasis on authoring tools handling a wide variety of link and referencing strategies.  I anticipate that these applications will be more process-intensive, with larger footprints on a user’s PC.  I also anticipate that the level of sophistication required to “operate” these tools will be much higher.  So XML authoring tool vendors will have to focus on both features and usability. 

This conference was illuminating on many different fronts.  Even the vendors I spoke to seemed to realize that DITA is a truly disruptive technology that has changed the way the entire industry thinks about XML. In the current economic reality, this is the perfect time to be thinking about what this all means and how organizations can take advantage of these innovations in their own environments.  Ride the wave.

Saturday, April 25, 2009

XProc and DITA: Random Thoughts

I’ve been following James Sulak’s blog.  He has some impressively detailed discussions about using XProc.  XProc is an XML pipeline processing language, specifically designed to provide instructions for processing XML content. The Recommendation specifies many different kinds of “steps” that can be assembled in virtually any order to control the sequencing and output from one step to another.

Right now, DITA’s reference implementation, the DITA Open Toolkit (DITA OT), uses Apache Ant and custom tasks to process DITA XML content.  One of the principal limitations of the DITA OT is its reliance on XSLT 1.0 and extensions (particularly the Idiom FO Plugin) to handle the rendering.  

With XProc-enabled tools like Calabash, it seems like DITA could easily be processed using XProc, along with an upgrade of the stylesheets to XSLT 2.0. 
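
Purely as a thought experiment (the step and stylesheet names below are made up and are not the Open Toolkit's), a DITA-to-HTML pipeline might chain the preprocessing and rendering transforms as XProc steps:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    version="1.0" name="dita2html">

    <p:input port="source" primary="true"/>
    <p:input port="parameters" kind="parameter"/>

    <!-- hypothetical stylesheet names, not the DITA Open Toolkit's -->
    <p:xslt name="resolve-conrefs">
        <p:input port="stylesheet">
            <p:document href="preprocess/conref.xsl"/>
        </p:input>
    </p:xslt>

    <p:xslt name="render-html">
        <p:input port="stylesheet">
            <p:document href="html/dita2html.xsl"/>
        </p:input>
    </p:xslt>

    <p:store href="out/topic.html" method="html"/>

</p:declare-step>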

Content Management Strategies/DITA North America Conference

I’ll be attending the conference in St. Petersburg, FL.  Come visit the Flatirons Solutions booth while you’re there.  It should be a very interesting conference. 

Eric Severson, CTO of Flatirons Solutions, will be giving a potentially “game-changing” presentation that speaks to lowering the “barrier to entry” into XML authoring. I recommend seeing this one.

Saturday, March 28, 2009

DocBook Going Modular

Scott Hudson, Dick Hamilton, Larry Rowland and I (AKA, “The Colorado DocBook Coalition”) recently drafted a proposal to support “modular” DocBook and presented it to the DocBook TC yesterday.  In general, this proposal is in response to huge demand for DITA-like capabilities for DocBook. 

Many core business factors are driving DocBook in this direction:
  • more distributed authoring: authors are responsible for specific content areas rather than whole manuals.  Content could be authored by many different authors, even some in different organizations altogether.
  • content reuse: this has long been a "holy grail" of information architects: write content once, reuse it in many different contexts.
  • change management: isolate the content that has changed.  This is a key driver for companies with localization needs.  By modularizing their content, they can drive down costs by targeting only the changed content for translation.

Additionally, there are downstream opportunities for modularized content:

  • dynamic content assembly:  create "publications" on the fly using an external assembly file that identifies the sequence and hierarchy of modular components rather than creating a single canonical instance.

The following excerpts from the proposal detail the preliminary features (Important: these are not yet set in stone and are subject to change).  The final version will be delivered with the 5.1 release. 

Assemblies

The principal metaphor for Modular DocBook is the “assembly”.  An assembly defines the resources, hierarchy, and relationships for a collection of DocBook components.  The <assembly> element can be the structural equivalent of any DocBook component, such as a book, a chapter, or an article.  Here’s the proposed content model in RelaxNG compact syntax:

db.assembly =
  element assembly {
    db.info?, db.toc*, db.resources+, db.relationships*
  }

Resources

The <resources> element is a high-level container for one or more resource objects that are managed by the <assembly>.  An <assembly> can contain one or more <resources> containers, allowing users to organize content into logical groups based on profiling attributes.

Each <resources> element must contain one or more <resource> elements.

db.resources =
  element resources {
    db.common.attributes, db.resource+
  }

Specifying Resources

The <resource> element identifies a "managed object" within the assembly. Typically, a <resource> will point to a content file that can be identified by a valid URI.  However, a <resource> can also be a 'static' text value that behaves similarly to a text entity.

Every <resource> MUST have a unique ID value within the context of the entire <assembly>.

db.resource =
  element resource {
    db.common.attributes,
    attribute fileref { text }?,
    attribute resid {text}?,
    text?
  }

Content-based resources can also point to content fragments within a content file, using a syntax similar to a URI fragment:  file.xml/#ID.
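
For example (the file name and fragment ID here are hypothetical, just for illustration), a resource could target a single section inside a larger file:

<resource id="install.overview" fileref="install.xml/#overview"/>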

Additionally, a resource can point to another resource.  This allows users to create a "master" resource that can be referenced within the current assembly, with the reference indirectly pointing to the underlying resource that the master identifies.

For example:

<resource
    id="master.resource" 
    fileref="errormessages.xml"/>
<resource
   id="class.not.found"
   resid="{master.resource}/#classnotfound"/>
<resource
   id="null.pointer"
   resid="{master.resource}/#nullpointer"/>

The added benefit of indirect references is that users can easily point the master resource to a different content file, provided that it uses the same underlying fragment IDs internally.  This could also be used to create locale-specific resources that reference the same resource id.
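
For example, repointing the master resource from the earlier sketch to a hypothetical French-language file would leave the indirect resources untouched:

<!-- hypothetical localized master; class.not.found and null.pointer
     resolve through it unchanged -->
<resource
    id="master.resource"
    fileref="errormessages.fr.xml"/>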

Text-based resources behave similarly to XML text entities.  A content-based resource can reference a text resource, provided that both the text resource and the content resource are managed by the same assembly.

assembly.xml:

...
<resource id="company.name">Acme Tech, Inc.</resource>
<resource id="company.ticker">ACMT</resource>
...

file1.xml:

<para><phrase resid="company.name"/> (<phrase resid="company.ticker"/>) is a
publicly traded company...</para>

Organizing Resources into a Logical Hierarchy

The <toc> element defines the sequence and hierarchy of content-based resources that will be rendered in the final output.  It behaves in a similar fashion to a DITA map and topicrefs.  However, instead of each <tocentry> pointing to a URI, it points to a resource in the <resources> section of the assembly:

<toc>
    <tocentry linkend="file.1"/>
    <tocentry linkend="file.2">
        <tocentry linkend="file.3"/>
    </tocentry>
</toc>

<resources>
    <resource id="data.table" fileref="data.xml"/>
    <resource id="file.1" fileref="file1.en.xml"/>
    <resource id="file.2" fileref="file2.en.xml"/>
    <resource id="file.3" fileref="{data.table}/#table1"/>
</resources>

Creating Relationships Between Resources

One of the more clever aspects of DITA’s architecture is the capability to specify relationships between topics within the context of the map (and independent of the topics themselves).  The DocBook TC is currently considering several proposals that will enable resources to be related to each other within the assembly.

The Benefits of a Modular DocBook

There is a current mindset (whether it’s right or wrong is irrelevant) that DocBook markup is primarily targeted at “monolithic” manuscripts.  With this proposal, I think there are many more possibilities for information architects to create new types of content: websites, true help systems, mashups, dynamically assembled content based on personalized facets (Web 2.0/3.0 capabilities), and a simplified localization strategy like the one advocated in DITA.

What’s more, the design places no constraints on the type of content resources referenced in an assembly.  In fact, they can be of any type: sections, chapters, images, even separate books (or assemblies) to mimic DocBook’s set element.

The design takes into account existing DocBook content written as “monolithic” instances, but it is flexible enough to support other applications, like IMS manifests for SCORM-compliant content, making it easy to create e-Learning content.

Since this is the first draft of the proposal, I expect there will be changes between now and the final spec.  Yet the core of the proposal should remain relatively intact.  If you would like to get involved or have other ideas, let me know.  Stay tuned.


Friday, February 13, 2009

XMetaL Reviewer Webinar

I attended a webinar yesterday hosted by Just Systems for their XMReviewer product.  The problem space is that conventional review processes are cumbersome and inefficient, particularly when multiple reviewers need to review a document concurrently.  In general, most review processes rely on multiple draft copies being sent out, one to each reviewer, and then it’s up to the author to “merge” the comment feedback into the source.

With XMReviewer, the entire review process is centralized on the XM Reviewer server:  Reviewers simply access the document online, provide their comments and submit. What’s really cool is that reviewers are notified in almost real time when another reviewer has submitted their comments and can integrate their fellow reviewer’s comments into their own.

The real advantage is that authors have all reviewer comments integrated and merged into a single XML instance, and in context. Very Nice. 

There’s also a web service API that allows you to integrate XMReviewer with other systems including a CMS that can automatically deploy your content to the XMReviewer server.

There are some nice general reporting/auditing features built in as well.  However, I didn’t see anything that would allow me to customize the reports or to manipulate the data, but I wouldn’t consider that a show stopper.

For folks used to “offline” reviews, e.g., providing comments at home or on a plane, this won’t work for you, since it is a server application.  Nonetheless, having full control and context for review comments far outweighs the minor inconvenience of being online and getting access to the server (most companies these days have VPN, so it’s not a showstopper).  Still, I can envision the server downloading and installing a small-footprint application that would allow users to review the document “offline” and submit their comments back to the server when they’re back online. 

The only other limitation right now is that XMReviewer doesn’t support DITA map-level reviews in which you can provide comments on multiple topics within a map.  This is currently in development for a future release – stay tuned.

Overall, XMReviewer looks great and can simplify your content review process.  Check it out.

Wednesday, February 11, 2009

Microsoft Live Writer Convert

After reading a few blogs here and there, I’ve seen a few posts about Microsoft’s Live Writer for creating blog posts.  Always on the lookout for new toys and tools, I decided to download it and try it out. I gotta admit, I’m sold.  This is a pretty nice application that allows me to work offline to write and edit my posts and when I am ready and able to connect, I simply push the “Publish” button and away it goes.  Sweet.

It’s simple to install, and simple to configure to point to virtually any blog host out there.  In short: It just works. 

This is what software should be like.  It should solve a particular set of problems, and only those problems, and solve them well without requiring massively complex installation and configuration steps.  The interface should be intuitive (Live Writer is wickedly intuitive) and should help rather than hinder my productivity.  This tool does that.  Well done, Microsoft!

Monday, February 9, 2009

Implementing XML in a Recession

With the economic hard times, a lot of proposed projects that would allow companies to leverage the real advantages of XML are being shelved until economic conditions improve.  Obviously, in my position, I would love to see more companies pushing to using XML throughout the enterprise. We’ve all heard of the advantages of XML: reuse, repurposing, distributed authoring, personalized content, and so on. These are underlying returns on investment for implementing an XML solution.  The old business axiom goes, “you have to spend money to make money.”  A corollary to that might suggest that getting the advantages of XML must mean spending lots of money.

However, here’s the reality: implementing an Enterprise-wide XML strategy doesn’t have to break the bank. In fact, with numerous XML standards that are ready to use out of the box, like DITA and DocBook for publishing and XBRL for business, the cost of entry is reduced dramatically compared to a customized grammar. 

And while no standard is ever a 100 percent perfect match for an organization’s business needs, at least one is likely to support at least 80 percent.  We often advise our clients to use a standard directly out of the box (or with very little customization) until they have a good “feel” for how well it works in their environment before digging into the real customization work.  Given that funding for XML projects is likely to be reduced, this is the perfect opportunity to begin integrating one of these standards into your environment, try it on for size while the economy is slow, and then, when the economy improves, consider how to customize your XML content to fit your environment.

Any XML architecture must encompass the ability to create content and to deliver it, even one on a budget.  Here again, most XML authoring tools available on the market have built-in support for many of these standards; with little to no effort, you can use these authoring environments out of the box and get up to speed. 

On the delivery side, these same standards, and in many cases the authoring tools, have prebuilt rendering implementations that can be tweaked to deliver high-quality content, with all of the benefits that XML offers.  In this case, you might want to spend a little more to hire an expert in XSLT.  But it doesn’t have to break the bank to make it look good.

The bottom line: A recessionary economy is a golden opportunity to introduce XML into the enterprise. In the short term, keep it simple, leverage other people’s work and industry best practices and leave your options open for when you can afford to do more.  Over time when funding returns, then you can consider adding more “bells and whistles” that will allow you to more closely align your XML strategy with your business process.

Friday, February 6, 2009

DOXSL: Reflexive Code Documentation and Testing, and other random XSLT thoughts

One of the cool things about Doxsl is that I can test it on itself.  Since Doxsl is an XSLT application (v2.0), I can create documentation using itself.  I'll be posting these on the Sourceforge project website soon - when I finish documenting my own code.  Hmmm... walking the talk and eating your own dogfood at the same time - who woulda thunk it?

There's something about reflexive tools that is just pretty cool.  I built another application to document the DocBook RelaxNG schemas into DocBook.  

The Doxsl DocBook stylesheets are coming along.  If I can manage to get some free time at night, I might be able to finish them in about a week.  The one thing I really need to do is check out xspec to see if I can write test cases against the code.  I tried XMLUnit about a year ago, but the critical difference is that it tests the artifact of the transform rather than the code itself.  Implicit testing is better than no testing at all, but that doesn't mean it's optimal.  I love JUnit and NUnit for testing my Java and .NET code, and they're great for the large enterprise-wide projects I work on.  While Doxsl is just a teeny, tiny little application (tool is more like it), there is enough code right now that even simple changes can cause big problems.  I'll let you know what I think about xspec when I've had a chance to tinker with it.

Another XSL application I've been working on over the last year or so is an alternative to the DITA Open Toolkit.  The OT is OK as a reference implementation, but it can be a bear to work with, even for minor customizations.  Part of the problem, in my opinion, is that the OT's stylesheets are dependent on the Ant scripts that drive them.  In fact, it takes some fancy footwork to get the stylesheets to run outside of the Ant environment.  And here again, Ant is the tool for creating a consistent and reliable sequence of build steps in a development environment.  Where it falls short is dealing with sophisticated XSLT applications that have lots of parameters (optional or otherwise).  The parameters have to be "hardcoded" into the XSLT task.  Not my idea of extensible.

Add to that: the stylesheets are still using XSLT 1.0 - ehhh.  I'll use 1.0 if I have to (thanks, Microsoft).  There's just so much more that 2.0 provides that makes stylesheet development much, much easier.  At any rate, I've been working on my own implementation of DITA processing using XSLT 2.0 and without relying on Ant.  HTML and CHM are working; FO is the hard part.  What I find interesting is that I can process a map containing over 160 topics into HTML in about 20 seconds with my stylesheets.  It takes over 2 minutes with the OT! The results are anecdotal, and I haven't really tested the stylesheets on anything really big, but I like what I see so far (in fact, the DOXSL website uses DITA and my stylesheets to render it).


Wednesday, February 4, 2009

DOXSL Shout out

I recently found a post by a former client of mine, James Sulak, who had some very positive feedback on my open source project, DOXSL. Check out his post. Thanks for the shout out, James!

On that front, I have been working on a new release. There are a few bugs I discovered when I started processing some DITA stylesheets, particularly when trying to look for overrides using DITA's class-matching patterns (e.g., *[contains(@class,'- topic-type/element-name ')]) - things got borked with specializations when the class tokens contain the "parent" token.
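
For context, this is the general shape of the override idiom Doxsl has to recognize (a generic sketch; the element and output markup are illustrative, not Doxsl's or the Open Toolkit's): templates match on a class token rather than an element name, so specializations are matched too.

<!-- matches topic/p and anything specialized from it, e.g.
     class="- topic/p mydomain/custom-p " (names are illustrative) -->
<xsl:template match="*[contains(@class, ' topic/p ')]">
    <p>
        <xsl:apply-templates/>
    </p>
</xsl:template>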

There are still other things on my roadmap to work on:
  • DocBook output (I have some designs in mind, specifically around modularity)
  • The Comment Stub Generator: this is a high priority for me
  • Comment Collector: This is similar to what .NET does when it compiles code documentation into a single XML file. The intent is to make the XSLT less "noisy"
  • There are additional validation ideas that Ken Holman suggested that I'd like to build in.

Progress is slow, given that my day job still takes precedence. Stay tuned.

It's Been Awhile

It's been a long time since I've posted to this blog. Life and work have been a bit insane. Work-wise, I've been working on a very complex project that marries DITA's modular architecture with a more conventional monolithic document approach. It's been an interesting exercise - one that I will discuss in more detail later.

Personally, my father passed away on December 17th. He had been in the ICU since the beginning of December, and had been ill for quite some time before that. Needless to say, blogging was the last thing on my mind.

I did have a presentation prepared for the XML 2008 conference, called "Optimizing Schemas for Enterprise Content", which I had intended to deliver, but I couldn't make it due to my father's illness. Thankfully, Eric Severson graciously and adeptly stepped in for me. By all accounts, it was well received. The premise behind the paper is that schemas shouldn't be the end-all/be-all for constraining content models (e.g., validation). Instead, I offer other strategies for controlling content.

In the meantime, I'm getting back in the swing with the DITA TC, hopefully having more time to devote to moving the v1.2 spec along. I'm also working on a proposal for creating "modular DocBook." There are some interesting concepts I'm playing around with in the proposal. Hopefully, I'll have enough to present to the DocBook TC later this month.

So much to do, so little time.