Richard MacManus gets microformats wrong
Richard MacManus is talking about how to design with microcontent. There are a few quotable bits that I have to disagree with:
XML has largely lived up to its promise of being the data format of choice for the Web 2.0 era.
Eh, don’t think so. I wasn’t a web professional when XML first hit the scene, but the OGs have told me all about this potential. I can’t help but contrast Richard’s point with Simon St. Laurent’s:
XML has occasionally found its way to the Web, but it’s hard to remember now that once upon a time, XML was supposed to be directly on the Web, the files people loaded and manipulated…
The particular XML Web described by Bosak and Bray [leaders of developing XML -ed.] never happened. (It still could, but hasn’t.)
XML has failed to live up to its potential on the web. Sure, it works great behind firewalls and in specific applications, but, remember, XML was supposed to replace HTML. Even XHTML doesn’t precisely work.
XML has come nowhere near matching HTML in terms of distribution and interoperability. Certainly, there are some incompatibilities between various user agents, but those are being improved as I write (and most of the problems concern rendering, not HTML itself).
On the web, HTML still outweighs XML in many ways:
- There’s more data in HTML than XML.
- There are more HTML resources than XML resources.
- There are more people who can competently author HTML than XML.
Today, the format of the web, 2.0 or not, is HTML. It may change in the future, but it hasn’t yet.
Anyway, on to the point where I want to disagree with MacManus. He says:
Microformats is the generic name given to any format that builds on XML to provide additional metadata about web objects.
Actually Richard, no.
You must not have been paying attention, because microformats are built on HTML, not XML. Sure, you can use them with XHTML, but that is by no means a requirement.
Also, ‘microformats’ refers to a specific way of extending the web, via modularization and iteration on top of existing formats with existing schemas (where possible). This is much different than Structured Blogging, which ignores the most common format on the web (HTML) and manages to replicate and hide the interesting data.
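To make the "HTML, not XML" point concrete, here is a minimal sketch in Python that pulls a few hCard-style properties out of ordinary HTML using only the standard library's forgiving HTML parser. The sample markup, the `HCardSketch` class, and the `parse_hcard` helper are all made up for illustration, and the sketch assumes well-nested markup and handles only a tiny subset of property names (`fn`, `org`, `url`). It is not a conforming microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: plain HTML with class names carrying the data.
SAMPLE = """
<div class="vcard">
  <a class="url fn" href="http://example.com/">Jane Doe</a>
  works at <span class="org">Example Co</span>.
</div>
"""

TEXT_PROPS = {"fn", "org"}  # small subset of hCard property names (assumption)
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag


class HCardSketch(HTMLParser):
    """Collects text for elements whose class attribute names a known property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # one set of property names per currently open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so keep them off the stack
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        self._open.append(classes & TEXT_PROPS)
        # For "url", the value lives in the href attribute rather than the text.
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every open element that declared a text property.
        for props in self._open:
            for prop in props:
                self.card[prop] = self.card.get(prop, "") + data


def parse_hcard(html):
    parser = HCardSketch()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.card.items()}


print(parse_hcard(SAMPLE))
```

Nothing here needs an XML parser, a namespace, or well-formed XHTML: the data rides along in `class` and `href` attributes of markup that a browser would render as usual.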
The interesting data is in the content. Putting data in arbitrary XML is not useful and lacks the sharing potential of the WWW.

March 22nd, 2006 at 7:30 pm
I would disagree with #3; I would say that an increasing majority of pages on the web are generated by some kind of content management system. The user-produced web has exploded, but not because everyone started learning HTML overnight. (How much of what Technorati consumes is hand-coded?)
March 22nd, 2006 at 8:25 pm
Obligatory note: Structured Blogging certainly does not ignore HTML: the plugins use microformats (mainly hReview and relLicense at this point, but also stuff like COinS, an ISBN lookup microformat from the academic community) to mark up quite a few things in the HTML output.
There are still bits and pieces that aren’t marked up as fully as they could be, but that’ll improve when we get around to doing another big chunk of work on the plugins, or if anyone in the user community wants to download our code and fix up the templates to support whatever’s current in the microformat world. Whichever comes first :-)
The plugins *also* publish the source XML, which is used internally to edit and format the post, but it’s intended as a fallback for the case that no microformatted output is available, rather than as the only way to get at the data.
March 22nd, 2006 at 8:55 pm
You are correct to say it should read (X)HTML rather than XML in that last quote. Thanks for alerting me to that oversight, I’ve corrected it now in my post.
On your point about XML, I would counter that RSS and Atom are XML dialects and so in those terms XML has most certainly become the lingua franca on the Web. Around 70% of my site traffic is RSS, for example. I’m not saying HTML isn’t still really important, perhaps still more so than XML, but I do think the future of data formats on the Web is XML and its dialects (esp. RSS/Atom).
March 22nd, 2006 at 11:17 pm
Matt-
Granted that most web resources are produced with a CMS, there are still more people who know HTML than XML.
Phillip-
You say:
I’m sorry, I’ve conflated Structured Blogging, the formats (as they were originally positioned) and Structured Blogging, the tools. I was talking about the former, but obviously not being clear.
Richard-
You say: “Around 70% of my site traffic is RSS,” but I think that misses the point.
First of all, I was talking about the number of resources, not the number of requests. The number of resources is the measure that matters for authoring.
Secondly, the higher proportion of RSS traffic is a result of how RSS is used. If HTML were polled regularly, the proportions would be different.
You also say: “XML and its dialects.”
See, one of the problems with XML-on-the-web is the dialects. See Tantek’s Tower of Babel Problem for more explanation.
I know I might be sounding a bit idealistic to suggest that people can actually use a shared vocabulary, but that’s how the web works.
Maybe it would have been more fun if I’d posted about this on supr.c.ilio.us.
March 23rd, 2006 at 3:14 am
Fair point about rss polling, but really (to borrow Dave Winer’s slipped note to John Markoff recently) “it doesn’t have to be adversarial.” XML has wormed its way into HTML, via XHTML, so the two (xml and html) complement each other well. Sure you can add semantics to html to make the world, er I mean Web, a better place. But xml, rss etc are going to be increasingly important in the future Web (he says, neatly avoiding using the ‘2.0’ term). Babel schmabel…
And yes, this would’ve been far more entertaining on supr.c.ilio.us – although when I first read this post I thought the Snark Factor was present (“Use the Snark, Ryan… uuuuse the snark.”) :-)
March 24th, 2006 at 12:17 am
[…] ix ‘06 keynote, Ray Ozzie pimping them at ETECH, LinkedIn coming to the party, folks misrepresenting core ideas already… I mean sweet! […]