You are reading a MIX Online Opinion. In which we speak our minds.

The HTML5 Semantics Debate

Aug 31, 2009 In Web Culture By Joshua Allen

If you’ve been following the development of the new HTML5 specification, you’re probably familiar with the demo-friendly new features like Canvas, Drag-Drop, and the Video tag.  But unless you’re a microformats or semantic web geek like we are, you may have missed the new "semantic" tags in HTML5. 

Last I looked, the HTML5 draft proposed 24 new HTML tags, ranging from <nav> and <footer> to <article> and <time>.  As you can guess, these new tags are used to mark up things like navigation links, footers, articles, and dates and times.  In HTML5, you would use one of these new tags, instead of the traditional technique of class="nav", and apply your CSS styles to the new elements.  Other than your markup looking a little different (with fewer class attributes), everything works the same as before.
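To make the difference concrete, here is a minimal sketch of the same page fragment written both ways.  The link targets, class names, and style rules are made up for illustration; only the element names <nav>, <footer>, and <time> come from the HTML5 draft.

```html
<!-- Traditional approach: generic elements, meaning carried by class names -->
<style>
  .nav a  { margin-right: 1em; }
  .footer { font-size: small; }
</style>
<div class="nav">
  <a href="/">Home</a>
  <a href="/articles">Articles</a>
</div>
<div class="footer">
  Posted on <span class="date">Aug 31, 2009</span>
</div>

<!-- HTML5 approach: the element name itself carries the meaning -->
<style>
  nav a  { margin-right: 1em; }
  footer { font-size: small; }
</style>
<nav>
  <a href="/">Home</a>
  <a href="/articles">Articles</a>
</nav>
<footer>
  Posted on <time datetime="2009-08-31">Aug 31, 2009</time>
</footer>
```

The only change on the CSS side is that the selectors target element names (nav, footer) rather than class names (.nav, .footer).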

Boring, right?  Still, these new tags have prompted a healthy debate.  I’ll summarize the common positions in the debate, and then offer my own opinion.  The common concerns fall broadly into three categories:

Tags are Too Final

Some suggest that the new tags were poorly researched.  No formal field research, ethnographic interviews, or anything else you might expect.  In essence, someone queried a few Google indexes for common web developer practices, and decided to invent new tags for what people were already doing.  Yes, that’s pretty much how it happened.

An HTML tag is the most fundamental unit of the language.  A tag is about as solid and final as it gets.  Should these new semantics be codified as top-level HTML tags?  Or would it be smarter to use a mechanism which could adapt to the way people use the web in the future?

John Allsopp articulates this position and offers some alternative approaches in his article on HTML5 semantics.

Extensibility is Ideal

Most people are willing to compromise on the existence of new top-level tags.  I mean, HTML already contains a bunch of tags we routinely ignore, so what’s a few more?  However, many still argue that extensibility is ideal, or even a must-have.

What do we do when common web development practice changes, and doesn’t exactly match the patterns that the HTML5 drafting committee supposed?  Do we need to spin up a completely new standards process, or will there be an easy extensibility mechanism that doesn’t involve hacks and “convention”?

Jeffrey Zeldman approximates this “middle of the road” position, especially in the comments thread here.

Do Tags Right

Finally, there are those who think the overall approach is just fine, and simply want to make sure that the new tags are defined properly.  They may disagree about which tags should be added, and vary in their assessments of the spec ambiguities, but they think that inventing new tags is a good idea in general.  And they are reasonably confident that the spec can be versioned in the future, when practices change.  They argue that the best strategy is to get involved in the process, to specify tags in a clear and useful way.

Jeremy Keith approximates this position in his post from today.

My Opinion

Out of the three broad categories we’ve discussed, which one makes the most sense to me?  As fascinating as this discussion has been, I’m afraid that I have to cop out by saying “none of them” and “all of them”.

The purist in me sympathizes with the “tags are too final” assessment.  But the web is nothing but a huge, glorious collection of hacks anyway.  This place is harsh to perfectionists.

On the other hand, the web’s chaotic nature is no excuse for sloppy, haphazard design.  It would certainly be nice to see more effort here.

And ultimately, what happens in the spec may matter a lot less than we currently think.  As John’s article points out, there are three “extensibility” mechanisms already supported by existing web browsers today: authors can use the class attribute (as microformats do), invent new attributes, or even invent new top-level tags.  We already ignore the <address> tag in favor of a microformats class convention — and HTML5 can’t fix this anyway.  My prediction is that web developers will continue to use a mix of techniques to express semantics.
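As a rough illustration of those three mechanisms, here is one piece of contact information marked up each way, alongside the built-in <address> tag.  The class names in the first variant follow the hCard microformat; the contact-type attribute and the <contact> tag are invented purely for this sketch and are not part of any spec.

```html
<!-- Built-in tag that authors tend to skip -->
<address>Joshua Allen</address>

<!-- 1. Class attribute, as microformats do (hCard class names) -->
<div class="vcard">
  <span class="fn">Joshua Allen</span>
</div>

<!-- 2. Invented attribute (hypothetical name, for illustration only) -->
<span contact-type="author">Joshua Allen</span>

<!-- 3. Invented top-level tag (hypothetical name, for illustration only) -->
<contact>Joshua Allen</contact>
```

All four render as plain text by default; any meaning beyond that depends on a user agent or search tool that knows to look for the particular convention.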

More importantly, semantics are only important when they’re understood and used by specialized user agents or search tools.  Do we imagine that calendar import of embedded <time> tags will suddenly become a killer feature of user agents, when microformats have been available for years?  Will browsers resurrect the old and abandoned in-chrome site-tree features, and tie them only to <nav> tags, encouraging web developers to abandon their existing techniques en masse?  We’ll see.  Once again, I predict that the techniques for expressing semantics will become more diverse, not simpler.

So what do you think?  Join the discussion below, or send us a comment on Twitter.

Follow the Conversation

Sands Fish said on Sep 1, 2009

Yes, microformats seem to be one way toward extensibility. Good point on how we're probably just in for another round of specification redefinition.

And what of RDFa ???

It seems to me that with all of the leaps in web technology over the past decade, we're going to be in for some real big changes in web markup. There's so much to accommodate.

Joshua Allen said on Sep 1, 2009

Yes, RDFa is a big one I forgot to mention. RDFa is making some strides in awareness and adoption recently, and if Google continues to support it then that will be a big incentive for authors to use it.

Which raises the obvious point that search indexes are not likely to support only one semantics format -- it's in the engines' interests to support as much semantics as possible, and they get a competitive edge by extracting semantics that are difficult or capital-intensive for competitors to match. So that, also, does not bode well for convergence.

Lars Gunther said on Sep 2, 2009

You do not mention the accessibility aspect. Since elements can be made visible to AT and are short and easy to learn, they will see more usage than ARIA roles.

Classes, OTOH, are not exposed to AT at all.