Saturday, January 9, 2016

It's been a while

It's been a while since I've posted on this blog.  A lot has happened in the intervening months (years). Mostly, I've moved forward and backward in the tech world.  I've hummed and hawed over the direction of my career.  I've also been somewhat distracted by local events that required my attention. You can label it a higher calling: a change in priorities from a completely geeky world that I have embraced as my own to one that encompasses the future of my geeks-in-training.

Needless to say, I haven't abandoned the world of XML, XSLT, XPath, and XQuery entirely.  I've evolved.  I've had a gap year (or two) and seen the world outside the comfortable confines of angle brackets and FLWOR expressions, and it has changed me - a bit.

For those who've read some of my posts, I drank the kool-aid in the 90's, and wanted everyone else to share from the same cup.  What the last few years have shown is that kool-aid is for kids.  It's time to grow up.  The technology and content worlds have changed, and I need to change with them.

Primarily, what has changed is my thinking about the role of XML technologies in the landscape.  XML has a place, and honestly, it's a very important player in the wide-ranging landscape - just not in the way I perceived it 5, 10, or even 15 years ago.  In fact, I'm not sure that Sir Tim Berners-Lee would have envisioned the path that markup languages have taken.  Nonetheless, it's time to embrace these changes for what they are.

  1. XML is here to stay.  It's mature, it's lived up to its promise of extensibility, and it won't go away.
  2. XML technologies are stable.  There is little in the way of implementation variability among different providers now.  Whether you are using Java, a classic 'P' (Python, PHP, Perl) language, or any one of the newer languages, they all must honor XML in order to be complete.
  3. Incremental changes in XML technologies are principally to support scale.  DOMs are nice, elegant, and easy-to-use structures, but quickly turn into boat-anchors when we attempt to embrace internet-scale data within them.  Streaming XML is the new sexy.
  4. Virtually any data model can be represented in XML for a myriad of business purposes with self-describing semantics and the capability to flex its node hierarchy based on the data.  For this reason alone, XML has been, and will continue to be, a workhorse. Think about Spring, one of the web's most successful Java frameworks.  XML is the underlying data for nearly every part of it.
  5. As a data persistence layer, XML plays well with tabular, relational, and hierarchical structures. With its rich semantics and vendor-agnostic format, XML technologies are powerful, flexible, and scalable. Yes, it's also a great storage model for pernicious content models - like DITA, DocBook, and, gulp, OOXML (I'll shower later for that).
  6. From XML, I can deliver and/or display content/data in virtually any format imaginable, even to the point of backward compatibility to legacy formats (ask me about HTML 3.2/EDGAR transformations sometime).

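To make the DOM-versus-streaming point in item 3 concrete, here is a minimal Python sketch using only the standard library. The `<orders>` document and the function names are made up for illustration; a real feed would be a large file or network stream rather than an in-memory sample.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, small enough to read at a glance.
SAMPLE = b"""<orders>
  <order id="1"><total>10.50</total></order>
  <order id="2"><total>4.25</total></order>
  <order id="3"><total>7.00</total></order>
</orders>"""


def sum_totals_dom(source):
    """Build the whole tree in memory first, then walk it."""
    root = ET.parse(source).getroot()
    return sum(float(order.findtext("total")) for order in root.iter("order"))


def sum_totals_streaming(source):
    """Handle each <order> as soon as its end tag is seen."""
    total = 0.0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "order":
            total += float(elem.findtext("total"))
            # Release this order's children, text, and attributes.
            elem.clear()
    return total


dom_total = sum_totals_dom(io.BytesIO(SAMPLE))
stream_total = sum_totals_streaming(io.BytesIO(SAMPLE))
print(dom_total, stream_total)
```

Both functions give the same answer; the difference is that the streaming version never needs the full content of every order in memory at once. Note that the emptied `<order>` elements still hang off the root here, so a truly internet-scale job would also need to drop those references as it goes.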
With all that XML has going for it, what can go wrong? Well, depending on who you ask, the answer will vary. Some criticize XML for not living up to the hype of Web 2.0. XML's initial purpose was to be the "SGML for the web." To some degree, it is, but it is far from ubiquitous.  That isn't to say that we didn't try. From XML Data Islands to XMLHttpRequest objects in JavaScript, XML was given first-class status on the web.  The problem was (and is) that, once XML was parsed into a DOM, extracting data often required a lot of additional code to recurse through the content.  For some, the browser's tools felt like a blunt instrument when finer-grained precision was needed.  Eventually, JSON became the lingua franca for web data, and rightfully so.
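A rough illustration of that difference, sketched in Python rather than browser JavaScript (so treat it as an analogy, not a reproduction of any browser API). The two sample documents and the `find_text` helper are hypothetical; they carry the same data in both formats.

```python
import json
from xml.dom import minidom

# Hypothetical sample documents carrying the same data.
XML_DOC = "<book><title>XSLT Basics</title><author><name>Ann</name></author></book>"
JSON_DOC = '{"book": {"title": "XSLT Basics", "author": {"name": "Ann"}}}'


def find_text(node, tag):
    """Recurse through DOM nodes until an element named `tag` is found."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            if child.tagName == tag:
                # An element's text lives in separate child text nodes.
                return "".join(
                    c.data for c in child.childNodes if c.nodeType == c.TEXT_NODE
                )
            found = find_text(child, tag)
            if found is not None:
                return found
    return None


# DOM route: parse, then walk the tree by hand.
xml_author = find_text(minidom.parseString(XML_DOC), "name")

# JSON route: parse, then index straight into native structures.
json_author = json.loads(JSON_DOC)["book"]["author"]["name"]

print(xml_author, json_author)
```

DOM APIs do offer shortcuts such as `getElementsByTagName`, so the hand-written walk is the worst case rather than the only option. Still, the JSON version maps directly onto the language's own dictionaries and lists, which is a large part of why it felt lighter for web data.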

Perhaps its biggest limitation or failure is the countless attempts to make XML usable for the masses. I'll admit that I was one of the biggest evangelists.  I honestly believed that we could build authoring tools that were intuitive and easy-to-use, backed by powerful semantic markup.  We would be able to enrich the web (and by proxy, the world) with content that had meaning - it could be searched intelligently, reused, repurposed, translated, and delivered anywhere.  As one of my friends and mentors, Eric Severson, said, XML has the capability of making content personal, designed for a wide audience and personalized for an "audience of one."

Intrinsically, I still have some faith in the idea, but the implementation never lived up to the hype.  For over twenty years, we've tried to build tools that could manage XML authoring workflows from creation to delivery.  Back in the late 90's and early 2000's, I remember evangelizing for XML authoring solutions to a group of technical writers for a big technology firm.  I was surprised by the resistance and pushback I got.  Despite the benefits of XML authoring, the tools were still too primitive, and instead of making the writers more productive, they slowed them down.  Nevertheless, I kept evangelizing like Linus in the pumpkin patch.

Eventually, the tools did improve.  They did make authoring easier... for some. What we often glossed over was the level of effort required to make the tools easier to use.  Instead of being tools that could be used by virtually anybody who didn't want to see angle brackets (tools for the masses), we made built-for-purpose applications.  For folks like me who understood the magical incantations and sorcery behind these applications, they were fantastic.  They were powerful.  They also came with a hefty price tag.  And, because they were often heavily customized, users were locked in to the tools, the content model, and the processes designed to support it.

Even if we attempted to standardize on the grammar to enable greater interchange, it still required high priests and wizards to make it work. The bottom line is that the cost of entry is just too high for many.  The net result is that XML authoring is a niche, specialized craft left to highly trained technical writers and the geekiest of authors.

Years ago, I read Thomas Kuhn's The Structure of Scientific Revolutions. The main thesis is that we continue to practice our crafts under the assumptions of well-accepted theory.  Over time, through the course of repeated testing, anomalies emerge. Initially, we discard these anomalies, but as they continue to accumulate, we realize that we can't ignore them anymore.  New theories emerge.  However, we reject these new ideas and vigorously argue that the old theories are still valid, until enough evidence disproves them entirely.  At that moment, a new paradigm emerges.

We are at that moment of paradigmatic shift.  No longer can XML be thought of as a universal theory of information and interchange.  Instead, we need to reshape our thinking to accept that XML solves many difficult problems, and has a place in our toolbox of technology, but other technologies and ideas are emerging that are easier, cheaper, faster methods for content authoring.  For many, the answer to "intelligent content" isn't about embedding semantics within the content, but rather about extending it with rich metadata that lives as a wrapper around the content - metadata that can be dynamic, contextual, and mutable.
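One way to picture that wrapper idea is the small Python sketch below. Everything in it (the `ContentItem` class, the `audiences` key, the sample sentences) is a hypothetical illustration and not taken from any particular product or standard.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    # The content itself stays plain, with no semantic markup embedded in it.
    body: str
    # Metadata wraps the content and can change without touching the body.
    metadata: dict = field(default_factory=dict)


def for_audience(items, audience):
    """Select bodies whose wrapper metadata lists the given audience."""
    return [
        item.body
        for item in items
        if audience in item.metadata.get("audiences", [])
    ]


items = [
    ContentItem("Install the package.", {"audiences": ["admin"]}),
    ContentItem("Open the dashboard.", {"audiences": ["admin", "end-user"]}),
]

before = for_audience(items, "end-user")

# Retarget the first item by editing only its wrapper.
items[0].metadata["audiences"].append("end-user")
after = for_audience(items, "end-user")

print(before, after)
```

The point of the sketch is the last step: who sees a piece of content changed, and the content itself was never opened or re-tagged.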

Before I'm labeled a heretic, let me be clear.  XML isn't going away, nor is it inherently a failed technology.  Quite the opposite.  Its genius is in its relative simplicity and flexibility to be widely used in a vast number of technologies in an effective manner.  The difference is that we've learned that we could never get enough momentum behind the idea of XML as a universal data model for content authoring, and it was too cumbersome for web browsers to manipulate.  We have other tools for that.