RDFa vs microformats from Evan Prodromou

I'm fascinated by the idea of including semantic markup in Plain Old XHTML pages, and I'm excited by recent developments in this area. But I'm also concerned about the growing gap between the W3C's initiative, RDFa, and the more established but less official microformats effort. I think that having competing standards efforts in this area is going to hurt the advancement of so-called small-s semantic Web technologies, which is going to be bad for everyone.

Using XHTML as its own metadata substrate makes for some interesting applications, some of which are starting to spread across the Web. Rubhub is an interesting distributed social network analyzer; Flocktails is a fascinating example of extracting semantic data; Ray Ozzie's Live Clipboard shows how to use embedded HTML data objects to make rich browser-based applications. The field is young and has a lot of room to grow, but it shows a lot of promise.

On the surface, the two systems are remarkably similar. Both aim to encode semantic information into (X)HTML documents so that the same content can serve as both human-readable data and machine-readable metadata. In both systems, this is achieved with XHTML attributes, typically hidden from users, that indicate the location and meaning of the metadata.

But the two projects have different histories, and those histories are influencing their style and their future. Microformats.org has grown out of the work of blog aggregator Technorati's developer community, influenced by the mysterious and cabalistic Global Multimedia Protocols Group.

RDFa, on the other hand, has been a long-smoldering concept of the W3C's Semantic Web group and its associated thinkers. It's defined mostly by its problem domain and requirements rather than by any implementations, but a recent push by Ben Adida, representing Creative Commons, seems to have kicked the process into gear.

Examples

An example is probably in order. The following is my basic contact information expressed in the hCard microformat:

    <div class="vcard">
      <img src="http://evan.prodromou.name/images/Evan48.jpg" alt="photo" class="photo"/>
      <a class="url fn" href="http://evan.prodromou.name/">Evan Prodromou</a>
      <div class="adr">
        <div class="street-address">1481 rue Rachel Est</div>
        <span class="locality">Montreal</span>, 
        <span class="region">QC</span>
        <span class="postal-code">H2J2K3</span>
      </div>
      <div class="tel">+1-514-554-EVAN</div>
      <a class="email" href="mailto:evan@prodromou.name">evan@prodromou.name</a>
    </div>

People familiar with the Web will note that the essential data is kept visible to the user, either as image data or as text. The "class" attribute marks the semantic role of each piece of data, and "a" elements link to other resources such as the home page and the email address.
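As a rough illustration of how a consumer might read this class-based markup, here is a minimal Python sketch using the standard library's html.parser. It assumes a small hand-picked subset of hCard property names, takes "photo" from the img's src and "url"/"email" from the link's href, and uses element text for everything else. It ignores many details of the real hCard rules (nested vcards, the abbr title pattern, and so on), so treat it as a toy rather than a conforming parser; the names HCardParser, parse_hcard, and HCARD_PROPS are made up for this example.

```python
from html.parser import HTMLParser

# Assumed subset of hCard property names; the hCard spec defines many more.
HCARD_PROPS = {"fn", "url", "photo", "street-address", "locality",
               "region", "postal-code", "tel", "email"}
# Elements with no end tag, so they never open a text-collecting scope.
VOID = {"img", "br", "hr", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collect hCard properties marked with class attributes (simplified)."""

    def __init__(self):
        super().__init__()
        self.props = {}   # property name -> collected value
        self.stack = []   # per open element: props whose text is being collected

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        text_props = []
        for name in (a.get("class") or "").split():
            if name not in HCARD_PROPS:
                continue
            if name == "photo" and tag == "img":
                self.props[name] = a.get("src", "")    # value lives in src
            elif name in ("url", "email") and tag == "a":
                self.props[name] = a.get("href", "")   # value lives in href
            else:
                self.props[name] = ""                  # value is the element text
                text_props.append(name)
        if tag not in VOID:
            self.stack.append(text_props)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every text-valued property whose element is still open.
        for text_props in self.stack:
            for name in text_props:
                self.props[name] += data


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.props.items()}


SAMPLE = """
<div class="vcard">
  <img src="http://evan.prodromou.name/images/Evan48.jpg" alt="photo" class="photo"/>
  <a class="url fn" href="http://evan.prodromou.name/">Evan Prodromou</a>
  <div class="adr">
    <div class="street-address">1481 rue Rachel Est</div>
    <span class="locality">Montreal</span>,
    <span class="region">QC</span>
    <span class="postal-code">H2J2K3</span>
  </div>
  <div class="tel">+1-514-554-EVAN</div>
  <a class="email" href="mailto:evan@prodromou.name">evan@prodromou.name</a>
</div>
"""

if __name__ == "__main__":
    for name, value in parse_hcard(SAMPLE).items():
        print(f"{name}: {value}")
```

Run on the hCard above, this yields fn "Evan Prodromou", locality "Montreal", email "mailto:evan@prodromou.name", and so on. Note that nothing here needs new attributes: the semantics ride entirely on class names the parser already agrees on.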

Now, here's the same data encoded with RDFa:

    <div class="vcard" xmlns:v="http://www.w3.org/2001/vcard-rdf/3.0#"
         about="http://evan.prodromou.name/">
      <img src="http://evan.prodromou.name/images/Evan48.jpg" alt="photo" property="v:PHOTO"/>
      <a property="v:FN" href="http://evan.prodromou.name/">Evan Prodromou</a>
      <div role="v:ADR">
        <div property="v:Street">1481 rue Rachel Est</div>
        <span property="v:Locality">Montreal</span>, 
        <span property="v:Region">QC</span>
        <span property="v:Pcode">H2J2K3</span>
      </div>
      <div role="v:TEL">
        <span property="v:Type">Voice</span>:
        <span property="v:Value">+1-514-554-EVAN</span>
      </div>
      <a rel="v:EMAIL" href="mailto:evan@prodromou.name">evan@prodromou.name</a>
    </div>

First, XML namespaces are used to provide a namespace for the metadata vocabulary. In layperson's terms, that just means that if you're using the word "whip" as a label in the US House of Representatives, you can specify that it's different from the word "whip" used as a label in an S&M dungeon. (Usually.)
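Mechanically, the namespace trick is just string concatenation: a prefix is bound to a URI, and a prefixed name such as "v:FN" expands to that URI followed by the local name. A minimal Python sketch, where only the vCard URI comes from the example above and the "house" and "dungeon" namespace URIs are hypothetical example.org placeholders:

```python
# Only the "v" URI is from the RDFa example; the other two are hypothetical.
NAMESPACES = {
    "v": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "house": "http://example.org/us-house#",
    "dungeon": "http://example.org/dungeon#",
}


def expand(curie, namespaces):
    """Turn a prefixed name such as 'v:FN' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return namespaces[prefix] + local


if __name__ == "__main__":
    print(expand("v:FN", NAMESPACES))
    # -> http://www.w3.org/2001/vcard-rdf/3.0#FN
    # Same local name, different vocabularies, so different URIs.
    print(expand("house:whip", NAMESPACES) == expand("dungeon:whip", NAMESPACES))
    # -> False
```

Because the full URIs differ, a general-purpose processor can tell the two "whip" labels apart without either vocabulary's authors having to coordinate.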

Second, the attributes that represent the metadata are new ones: "property", "role", "about". These were introduced as part of the Metainformation Attributes module of XHTML 2. (Worried that you're falling behind by not using XHTML 2 yet? Don't. Nobody else is, yet, either.)
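To make the roles of "about", "property", and "rel" concrete, here is a deliberately naive Python sketch that turns markup like the RDFa example above into (subject, predicate, object) triples. Assumptions: prefixes are collected globally from xmlns:* attributes; "about" sets the subject for everything inside it; "property" yields the element's text as a literal; a prefixed "rel" with an "href" yields a link. "role" is ignored, so the address and phone fields are flattened onto the page's subject, which a real RDFa processor would not do. This is not a conforming RDFa processor, and TinyRDFaParser and extract_triples are invented names.

```python
from html.parser import HTMLParser

VOID = {"img", "br", "hr", "input", "link", "meta"}


class TinyRDFaParser(HTMLParser):
    """Very simplified reading of about/property/rel; not a real RDFa processor."""

    def __init__(self):
        super().__init__()
        self.ns = {}        # prefix -> namespace URI (global, unscoped: a simplification)
        self.scopes = []    # per open element: (subject, predicate or None, text parts)
        self.triples = []

    def _expand(self, curie):
        prefix, sep, local = curie.partition(":")
        base = self.ns.get(prefix) if sep else None
        return base + local if base else curie

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for name, value in a.items():
            if name.startswith("xmlns:"):
                self.ns[name[len("xmlns:"):]] = value
        # Subject: this element's about, else inherited from the enclosing element.
        subject = a.get("about") or (self.scopes[-1][0] if self.scopes else None)
        # rel + href: the linked URI is the object.
        if a.get("rel") and a.get("href"):
            for token in a["rel"].split():
                if ":" in token:
                    self.triples.append((subject, self._expand(token), a["href"]))
        predicate = self._expand(a["property"]) if a.get("property") else None
        # property on void elements (e.g. <img>) is ignored in this sketch.
        if tag not in VOID:
            self.scopes.append((subject, predicate, []))

    def handle_endtag(self, tag):
        if tag in VOID or not self.scopes:
            return
        subject, predicate, parts = self.scopes.pop()
        if predicate:
            # property: the element's text content is a literal object.
            self.triples.append((subject, predicate, "".join(parts).strip()))

    def handle_data(self, data):
        for _, predicate, parts in self.scopes:
            if predicate:
                parts.append(data)


def extract_triples(html):
    parser = TinyRDFaParser()
    parser.feed(html)
    parser.close()
    return parser.triples


SAMPLE = """
<div xmlns:v="http://www.w3.org/2001/vcard-rdf/3.0#" about="http://evan.prodromou.name/">
  <a property="v:FN" href="http://evan.prodromou.name/">Evan Prodromou</a>
  <div role="v:ADR">
    <div property="v:Street">1481 rue Rachel Est</div>
    <span property="v:Locality">Montreal</span>,
  </div>
  <div role="v:TEL">
    <span property="v:Value">+1-514-554-EVAN</span>
  </div>
  <a rel="v:EMAIL" href="mailto:evan@prodromou.name">evan@prodromou.name</a>
</div>
"""

if __name__ == "__main__":
    for triple in extract_triples(SAMPLE):
        print(triple)
```

The point of the exercise: once "about" fixes the subject and the namespace fixes each predicate's full URI, every property becomes a self-describing statement that any RDF tool can consume.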

Differences

Here are some tabulated differences I've seen between microformats and RDFa.

microformats                                  RDFa
--------------------------------------------  ----------------------------------------------
flat namespace                                XML namespaces provide namespacing
works with HTML 4 and XHTML 1.x               only works with XHTML 2
uses latent, long-standing HTML attributes    introduces new metadata attributes
defined by one organization                   interoperable definitions across organizations
new formats require new data models           re-uses data models created for RDF
burgeoning support and Web cred               no implementations
unofficial and ad hoc                         part of XHTML's next version
some marked-up content on the Web             no marked-up content on the Web
uses shortcuts and abbreviations              limited shortcuts

Here is my gut feeling about the two projects: I like being able to do semantic markup today, with existing microformats. Looking around my site, you can see XFN, rel-license, and rel-tag implementations.

However, I'm concerned that µf's (as the Unicode hipsters call them) aren't going to be able to scale with the growth of small-s semantic content. I think we're going to need namespacing to make general-purpose semantic Web processors work correctly, and we're going to need to re-use the data formats that have already been built in RDF. I also think the RDF model is extremely powerful for capturing knowledge, and it's a good idea to leverage that power rather than discard it.

Collision ahead

Microformats and RDFa show all the classic signs of an upcoming de jure vs. de facto standards dispute. On the microformats side is a loose aggregation of bloggers and Web 2.0 hobbyists and a small sprinkling of commercial interests, forging ahead on real-world problems and rolling out frequent solutions. On the other side is a standards organization with the moral authority to lead the Web in new directions. One group is getting things done; the other is thinking carefully about how to do things.

There have been many instances of this kind of dynamic in the Web's past. The most notorious case is the wide variation of HTML extensions implemented by browser vendors in the mid-90s. Another is the RSS fork from the beginning of the millennium. In neither case did anyone gain much from having competing and conflicting efforts, and much momentum and goodwill was lost.

In the case of HTML extensions, literally millions of person-hours have gone into bringing the Web back to a standards-based platform, something almost universally recognized as a Good Thing. In the case of RSS, the fork has inhibited uptake of syndication technologies and set back their adoption by about five years.

Request to involved parties

I think that if microformats.org and the RDFa effort continue moving forward without coordinating their efforts, the future looks kind of bleak. I think the smart money is on microformats, but having RDFa become part of XHTML 2 makes microformats.org's future seem kind of like a cul-de-sac.

I'm not anyone special; just another person interested in Web technologies. I'm the main technical person for Wikitravel, and a developer on MediaWiki, and I think there's a lot of opportunity in both of those projects to create this kind of XHTML+semantic data combination. I'm excited by these developments, but I'm concerned about the way forward.

My confusion and concerns are a microcosm of what other implementors and content creators will be thinking over the next few months, unless coordination starts soon. There is a window that's open right now, but it's closing fast as microformats gain more implementation ground and RDFa advances on its standards track. I think there are several possible outcome scenarios for this situation. In descending order of probability:

I don't think any of these outcomes is desirable or valuable to the Web community. So I'd like to propose another way forward: that microformats become an early adopter's entryway into a future semantic-data-in-XHTML world defined by RDFa. Getting there, I think, requires the following:

  1. RDFa gets acknowledged and embraced by microformats.org as the future of semantic-data-in-XHTML
  2. The RDFa group makes an effort to encompass existing microformats with a minimum of changes
  3. microformats.org leaders join in on the RDFa authorship process
  4. microformats.org becomes a focus for developing real-world RDFa vocabularies

As you can see in the above examples, I don't think the gap between RDFa and existing microformats is very large. I think the namespacing issue could be dealt with by allowing some kind of default profile specified in the <head> element, or perhaps with <link> elements, so that the above hCard could become valid RDFa by adding:

    <link rel="default RDFa namespace" href="http://microformats.org/wiki/hCard" />

There are probably better solutions than that; I just think it's not impossible to make these two formats backwards and forwards compatible.
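To show what the <link> proposal would mean for a processor, here is a minimal Python sketch: it reads the href of the proposed link as the default vocabulary, then maps plain hCard class names onto predicate URIs under it. Everything here is hypothetical, as the proposal is: the rel value is not part of any standard, and joining the base URI and class name with "#" is an assumption. One wrinkle worth noting: HTML treats rel as a space-separated list of tokens, so a real processor would see "default", "RDFa", and "namespace" separately; this sketch simply compares the whole string. The function and class names are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical rel value from the proposal above; not a standard link type.
DEFAULT_NS_REL = "default RDFa namespace"


class DefaultNamespaceFinder(HTMLParser):
    """Find the href of the first <link> carrying the proposed rel value."""

    def __init__(self):
        super().__init__()
        self.base = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Whole-string comparison; real HTML would split rel into tokens.
        if tag == "link" and a.get("rel") == DEFAULT_NS_REL and self.base is None:
            self.base = a.get("href")


def find_default_namespace(html):
    finder = DefaultNamespaceFinder()
    finder.feed(html)
    finder.close()
    return finder.base


def classes_to_triples(subject, props, base):
    """Map unprefixed hCard class names onto predicates under the default namespace.

    Joining with '#' is an assumption; the proposal doesn't say how names combine.
    """
    if not base:
        raise ValueError("no default namespace declared")
    return [(subject, f"{base}#{name}", value) for name, value in props.items()]


if __name__ == "__main__":
    head = ('<head><link rel="default RDFa namespace" '
            'href="http://microformats.org/wiki/hCard" /></head>')
    base = find_default_namespace(head)
    props = {"fn": "Evan Prodromou", "locality": "Montreal"}
    for triple in classes_to_triples("http://evan.prodromou.name/", props, base):
        print(triple)
```

With a mapping like this, an unmodified hCard body plus one line in the <head> carries the same information as the namespaced RDFa version, which is the heart of the compatibility argument.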
