Friday, February 6, 2009

DOXSL: Reflexive Code Documentation and Testing, and other random XSLT thoughts

One of the cool things about Doxsl is that I can test it on itself. Since Doxsl is an XSLT 2.0 application, I can use it to generate its own documentation. I'll be posting the results on the SourceForge project website soon - once I finish documenting my own code. Hmmm... walking the talk and eating your own dogfood at the same time - who woulda thunk it?

There's something about reflexive tools that is just pretty cool. I also built another application that documents the DocBook RELAX NG schemas in DocBook.

The Doxsl DocBook stylesheets are coming along. If I can manage to get some free time at night, I might be able to finish them in about a week. The one thing I really need to do is check out XSpec to see if I can write test cases against the code. I tried XMLUnit about a year ago, but the critical difference is that it tests the output of the transform rather than the stylesheet code itself. Implicit testing is better than no testing at all, but it isn't optimal. I love JUnit and NUnit for testing my Java and .NET code, and they're great for the large enterprise-wide projects I work on. While Doxsl is just a teeny, tiny little application (tool is more like it), there is enough code right now that even simple changes can cause big problems. I'll let you know what I think about XSpec once I've had a chance to tinker with it.

Another XSLT application I've been working on over the last year or so is an alternative to the DITA Open Toolkit. The OT is OK as a reference implementation, but it can be a bear to work with, even for minor customizations. Part of the problem, in my opinion, is that the OT's stylesheets depend on the Ant scripts that drive them. In fact, it takes some fancy footwork to get the stylesheets to run outside of the Ant environment. To be fair, Ant is the right tool for creating a consistent and reliable sequence of build steps in a development environment. Where it falls short is in dealing with sophisticated XSLT applications that have lots of parameters (optional or otherwise). Those parameters have to be "hardcoded" into the XSLT task. Not my idea of extensible.

Add to that: the stylesheets are still using XSLT 1.0 - ehhh. I'll use 1.0 if I have to (thanks, Microsoft). There's just so much more in 2.0 that makes stylesheet development much, much easier. At any rate, I've been working on my own implementation of DITA using XSLT 2.0, without relying on Ant. HTML and CHM are working; FO is the hard part. What I find interesting is that my stylesheets can process a map containing over 160 topics into HTML in about 20 seconds. It takes over 2 minutes with the OT! The results are anecdotal, and I haven't tested the stylesheets on anything really big, but I like what I see so far (in fact, the DOXSL website is authored in DITA and rendered with my stylesheets).


4 comments:

Anonymous said...

Hello, I use Dita. Coming from the world of Docbook, I was skeptical and disoriented at first. Now I have more respect. I'm curious to hear more detail on your concerns...

"bear to work with even to handle minor customizations"
like what?

"takes some fancy footwork to get the stylesheets to run outside of the ant environment"
Why do that? Dita is an Ant project that uses XSLT and Java tasks. Do you think it should all be in XSLT?

'The parameters have to be "hardcoded" into the XSLT task'
In Dita, the parameters that one would want to change, are not hardcoded... they're passed as properties in your master build file.

Are you familiar with Dita's plugin framework?... It's not documented very well on their website, but once I got a hold of that, I was happier. In dita, you plug a customization *into* your dita install (w/o modifying it), whereas in Docbook you *wrap* your docbook install with a customization.

Jim Earley said...

I do use DITA frequently. I respectfully disagree that DITA is an Ant application. It's an XML application whose default implementation is the DITA Open Toolkit, which is an Ant application. As such, I have no objection to using Ant, or the DITA OT for that matter, for very light "out-of-the-box" implementations. Where things break down is when you want to integrate DITA XML processing into other tool chains that aren't Java (invoking a new "shell" process is the simplest approach, but it's not a robust solution - exception handling in particular). Nor does it give me the ability to handle content transformation "in memory" for more sophisticated business logic.

I'm very aware of DITA's "plugin" architecture also - we've integrated many DITA implementations with the IDIOM FO plugin.

My overall "concern" is that the XSLT stylesheets are wholly dependent on the build sequence within the Ant scripts to preprocess the content into a particular structure, so that when the final rendering stylesheets take over, they can render it correctly. And... it's XSLT 1.0, which means that you aren't taking advantage of the latest and greatest functionality that XSLT provides.

My additional concern is that Ant's xslt task is static with respect to parameters. Let's assume that I need to create a stylesheet override to process tables for HTML that allows different options such as alternating row shading, solid shading, and no shading. I create my stylesheet and add an XSL parameter called table-shading. That should be the end of it. But... it's not, because I also have to add that parameter within the Ant task so that it knows to pass it along, and I need to create a property that can be set and used as the XSLT parameter value. Even if I pass the property in via the command line (e.g., -Dproperty=value), the Ant XSLT task still has to have that parameter declared in order to pass it to the underlying transformation engine.
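For comparison, here is a minimal sketch of the non-Ant path in Java, assuming the JDK's built-in JAXP processor (which is XSLT 1.0; Saxon implements the same Transformer.setParameter call for 2.0). The stylesheet and the class name TableShadingDemo are hypothetical; only the table-shading parameter name comes from the example above. Once the stylesheet declares the xsl:param, the calling code can set it directly, with no build-script declaration in between.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TableShadingDemo {
    // Hypothetical override stylesheet: it declares the table-shading parameter
    // (default 'none') and writes its value into a class attribute.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='table-shading' select=\"'none'\"/>"
      + "<xsl:template match='/'><table class='{$table-shading}'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String xml, String shading) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        if (shading != null) {
            // Sets the xsl:param directly; no build script has to declare it first.
            t.setParameter("table-shading", shading);
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(render("<topic/>", "alternating")); // output carries class="alternating"
        System.out.println(render("<topic/>", null));          // falls back to class="none"
    }
}
```

Adding a new option here means touching only the stylesheet and the one setParameter call, which is the kind of extensibility the Ant task makes harder.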

Finally, having lived in both the DocBook and the DITA worlds, and having implemented both for clients and employers, I've found that no two implementations are the same. The rendering requirements often mean stylesheet customizations, typically overrides of existing templates. It's just as easy to "wrap" an existing XSLT application with your overrides without the Ant layer in the middle.

By all means, the OT is excellent for certain implementations, but when it comes to "heavy duty" customization, the Ant scripts add to that complexity and overhead.

One other aspect to consider, from my point of view: let's assume that the "base stylesheets" change, a la DocBook. I can simply point my stylesheet customizations to the new version of the stylesheets (via my include/import statement) and I'm off and running. Now let's assume that the DITA OT changes: I have to reconfigure it to work with my customizations all over again, and I'd better hope that I haven't made any changes to the Ant scripts.

To your point, I do think the OT has done some very clever things for extensibility with its plugin architecture, and it makes exceptional sense for complex build processes with many steps (a la help rendering).

Thanks for your comment,

Cheers,

Jim

Anonymous said...

Hey Jim:
thanks for taking the time to respond. you're right, Dita and Dita-OT are two different things. I kinda took a shortcut by calling Dita-OT by "Dita"

a few responses to your responses.

>Even if I pass the property in via the command line (e.g., -Dproperty=value),
>the Ant XSLT task still has have that parameter identified
> to pass it to the underlying transformation engine.
The plugin framework has extension points for adding new xslt and ant parameters (for most of the important xslt calls). No need to alter any files in the installation directly or pass in parameters to Ant using the shell. You can create a plugin that houses your customization.

>It's just as easy to "wrap" an existing XSLT application with your overrides without the Ant layer in the middle.
Although not quite the same, Dita-OT does support the "wrap" approach you're talking about using the args.xsl parameter
see here: http://dita-ot.sourceforge.net/doc/ot-userguide/xhtml/processing/antparms.html
But better in my opinion to build a plugin.

> Now let's assume that the DITA OT changes, I have reconfigure it to work with my customizations all over again
if your customization lives in a plugin, you might have to make a change, but there's a good chance you won't. No different than Docbook.

Again, the unfortunate part is that the power of the plugin architecture is way undersold. The user guide barely talks about it.
This is the best explanation I've seen: http://www.ditausers.org/tutorials/open_toolkit/anderson/

Jim Earley said...

Anonymous,

First off, thanks for the lively discussion. I welcome diverse opinion and I am enjoying this conversation.

Though, I would enjoy it better, if we could speak on a first name basis :)

You make great points, and yes the DITA OT does have a well thought out architecture, despite its lack of clear, comprehensive documentation.

I'll try to sum up where I think we agree and where we differ. And I think you'll find that we have more in common.

For some background, I was involved in a project at a previous employer that did something very similar to the DITA OT, several years before the OT was released: it was called XIDI (XML Information Delivery Initiative), and it combined DocBook with Ant to control the build sequence for multiple formats and dozens of languages. Alas, this was in the days before XSLT 2.0 and XProc. Saxon was pretty new and didn't have an Ant task. It worked very well for us.

I deal with a lot of different platforms and technologies that work with DITA. Some of these aren't Java-based, and my customers' requirements vary greatly. With that said, most use the OT very effectively, including the plugin architecture.

To that point, designing a new "plugin" isn't necessarily the problem; it comes down to differences in design approach. With the OT, the XML passes through different processing stages, and at each stage the markup is massaged so that the next stage can handle it. In one instance, the IDIOM FO Plugin actually combines the topics into a single XML file for processing in the FO rendering phase. The main point is that there is always a dependency: the markup handed from stage N to stage N+1 has to remain consistent, or the hand-off fails at the latter stage. Not that this is a serious problem (or ever has been), but I prefer an approach that separates my build logic (processing sequence) from my content rendering, so that I can integrate my rendering process exactly where I want it, when I want it, on many different platforms (.NET, Python, PHP, Java, Perl, etc.) and tools.

With the OT, there are built-in assumptions about how the build process should work. Not that these assumptions are wrong; they just don't encompass all possible scenarios. Even so, designing a plugin to work within, or to modify, the underlying processing assumptions that the OT lays out requires intimate knowledge of the target sequencing and of the precise spot to inject your build logic. I would submit that this is not something to be attempted by the faint of heart or by someone who is not familiar with Ant (not that Ant is hard to learn, but I can tell you from first-hand experience that there are folks - the uninitiated - who cringe at the thought of trying to learn it).
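To make the stage N / stage N+1 point concrete, here is a minimal Java sketch that chains two transforms in memory with JAXP, so the host program rather than an Ant script decides the sequence. Both stylesheets and the class name InMemoryPipeline are toy stand-ins, not OT code, and the JDK's built-in processor is XSLT 1.0 (Saxon plugs into the same JAXP calls for 2.0). The second stage only works if the first really produces the markup it expects, which is exactly the hand-off contract described above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InMemoryPipeline {
    private static final String NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    // Stage N (hypothetical): collapse every topic title into one <merged> document.
    static final String PREPROCESS =
        "<xsl:stylesheet version='1.0' " + NS + ">"
      + "<xsl:template match='/'><merged>"
      + "<xsl:for-each select='//topic'><item><xsl:value-of select='title'/></item></xsl:for-each>"
      + "</merged></xsl:template></xsl:stylesheet>";

    // Stage N+1 (hypothetical): render the intermediate markup. It depends on
    // stage N having produced <merged>/<item>; that is the hand-off contract.
    static final String RENDER =
        "<xsl:stylesheet version='1.0' " + NS + ">"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/merged'><ul>"
      + "<xsl:for-each select='item'><li><xsl:value-of select='.'/></li></xsl:for-each>"
      + "</ul></xsl:template></xsl:stylesheet>";

    static String run(String xml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compiled Templates objects are thread-safe and could be cached across runs.
        Templates stage1 = factory.newTemplates(new StreamSource(new StringReader(PREPROCESS)));
        Templates stage2 = factory.newTemplates(new StreamSource(new StringReader(RENDER)));

        // The intermediate result stays in memory as a DOM tree; nothing is written to disk.
        DOMResult intermediate = new DOMResult();
        stage1.newTransformer().transform(new StreamSource(new StringReader(xml)), intermediate);

        StringWriter out = new StringWriter();
        stage2.newTransformer().transform(new DOMSource(intermediate.getNode()), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String map = "<map><topic><title>A</title></topic><topic><title>B</title></topic></map>";
        System.out.println(run(map));
    }
}
```

Run as-is, it prints `<ul><li>A</li><li>B</li></ul>`. Because the sequencing lives in ordinary code, the same two stages could be called from a web app or a service without forking a build.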

Again, Ant is a fantastic platform; I use it daily in my work on multi-tier web-based applications dealing with content management systems and integrating DITA applications. I also use DITA with C# applications. Binding to Ant to enable a transformation means that I have to fork a new process to execute the build and hope there are no fatal errors. Exception handling is wholly dependent on the return value that Ant sends back when the process terminates (successfully or otherwise). Sometimes I've seen Ant return "false positives" (e.g., '0') when in fact it failed. That's extra overhead that I need to build into my downstream systems to handle any possible exceptions. In short - blech!
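Here is a minimal sketch of the defensive wrapper this forces on a calling system, in Java: fork the build with ProcessBuilder, capture its combined output, and treat the run as failed if either the exit code is non-zero or the log contains Ant's "BUILD FAILED" banner. The class name AntRunner, the build file and the target name are placeholders, and scanning the log is an assumed workaround for false positives, not something Ant itself provides.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class AntRunner {
    // Success requires BOTH a zero exit code AND no "BUILD FAILED" banner in the log,
    // as a guard against runs that report 0 even though the build failed.
    static boolean isSuccess(int exitCode, String log) {
        return exitCode == 0 && !log.contains("BUILD FAILED");
    }

    // Forks the given command, captures stdout and stderr together, and checks the result.
    static boolean runBuild(List<String> command, File workDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir);
        pb.redirectErrorStream(true); // merge stderr into stdout so one stream holds the whole log
        Process process = pb.start();
        String log;
        // Drain the output before waiting, so a full pipe buffer cannot stall the child process.
        try (InputStream in = process.getInputStream()) {
            log = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        return isSuccess(exitCode, log);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command: the build file and target name are hypothetical.
        List<String> cmd = List.of("ant", "-f", "build.xml", "my-target");
        System.out.println(runBuild(cmd, new File(".")) ? "build ok" : "build failed");
    }
}
```

On Windows, ant is a batch file, so the command may need to go through `cmd /c`. Either way, every caller ends up owning this kind of log-parsing logic, which is the overhead described above.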

Ant logic and design principles aside, the OT still uses XSLT 1.0 for all of its processing. Again, having worked with XSLT 2.0, I find it ~substantially~ better, extremely powerful, and more functionally complete. Things like sequences, built-in and custom functions, and built-in support for serializing more than one result document just make this version so much easier to work with. Saxon's implementation is dead on (and it has Java and .NET implementations!), and it makes my code so much more extensible and flexible. My SourceForge project, DOXSL, uses DITA and my prototype stylesheets to render the website and documentation. It's fast and it works great!

Again, thank you for the lively discussion. I think we can agree that DITA has definitely shown itself to be an excellent XML standard (I'm on the TC, and there are some incredibly brilliant people working on it, doing great things) that will be here for a long time. I think we can also agree that the DITA OT has shown itself to be an impressive and generally flexible reference implementation of DITA. With that said, I think other DITA implementations might well be useful for approaches not well suited to the OT, and might help extend DITA's adoption into areas not yet touched.

Cheers,

Jim