How to Add People Microdata to Your Website Using HTML5

  chco's Photo
Posted Aug 23 2010 08:52 PM

The following excerpt from HTML5: Up and Running shows you how to add microdata about people to your website with HTML5.
The easiest way to integrate microdata into a personal website is on your “about” page. You do have an “about” page, don’t you? If not, you can follow along as I extend this sample “about” page with additional semantics. The final result is here: http://diveintohtml5...-microdata.html.

Let’s look at the raw markup first, before any microdata properties have been added:

<section>
  <img width="204" height="250"
       src="http://diveintohtml5...000_05_mark.jpg"
       alt="[Mark Pilgrim, circa 2000]">

  <h1>Contact Information</h1>
  <dl>
    <dt>Name</dt>
    <dd>Mark Pilgrim</dd>

    <dt>Position</dt>
    <dd>Developer advocate for Google, Inc.</dd>

    <dt>Mailing address</dt>
    <dd>
      100 Main Street<br>
      Anytown, PA 19999<br>
      USA
    </dd>
  </dl>
  <h1>My Digital Footprints</h1>
  <ul>
    <li><a href="http://diveintomark.org/">weblog</a></li>
    <li><a href="http://www.google.com/profiles/pilgrim">Google profile</a></li>
    <li><a href="http://www.reddit.com/user/MarkPilgrim">Reddit.com profile</a></li>
    <li><a href="http://www.twitter.com/diveintomark">Twitter</a></li>
  </ul>
</section>

The first thing you always need to do is declare the vocabulary you’re using and the scope of the properties you want to add. You do this by adding the itemtype and itemscope attributes on the outermost element that contains the other elements that contain the actual data. In this case, that’s a <section> element:

<section itemscope itemtype="http://data-vocabulary.org/Person">

Note: You can follow along online with the changes made throughout this section. Before: http://diveintohtml5...les/person.html; after: http://diveintohtml5...-microdata.html.

Now you can start defining microdata properties from the http://data-vocabulary.org/Person vocabulary. But what are those properties? As it happens, you can see the list of properties by navigating to http://data-vocabulary.org/Person. The microdata specification does not require this, but I’d say it’s certainly a “best practice.” After all, if you want developers to actually use your microdata vocabulary, you need to document it. And where better to put your documentation than the vocabulary URL itself? Table 10-2 lists the properties of the Person vocabulary.

Table 10-2. Person vocabulary

Property       Description
name           Name
nickname       Nickname
photo          An image link
title          The person’s title (for example, “Financial Manager”)
role           The person’s role (for example, “Accountant”)
url            Link to a web page, such as the person’s home page
affiliation    The name of an organization with which the person is associated (for example, an employer)
friend         Identifies a social relationship between the person described and another person
contact        Identifies a social relationship between the person described and another person
acquaintance   Identifies a social relationship between the person described and another person
address        The location of the person (can have the subproperties street-address, locality, region, postal-code, and country-name)

The first thing in this sample “about” page is a picture of me. Naturally, it’s marked up with an <img> element. To declare that this <img> element is my profile picture, all we need to do is add itemprop="photo" to it:

<img itemprop="photo" width="204" height="250"
    src="http://diveintohtml5...000_05_mark.jpg"
    alt="[Mark Pilgrim, circa 2000]">

Where’s the microdata property value? It’s already there, in the src attribute. If you recall from Table 10-1, the “value” of an <img> element is its src attribute. Every <img> element has a src attribute—otherwise it would just be a broken image—and the src is always a URL. See? If you’re using HTML correctly, microdata is easy.

Furthermore, this <img> element isn’t alone on the page. It’s a child element of the <section> element, the one we just declared with the itemscope attribute. Microdata reuses the parent-child relationships of elements on the page to define the scoping of microdata properties. In plain English, we’re saying: “This <section> element represents a person. Any microdata properties you might find on the children of the <section> element are properties of that person.” If it helps, you can think of the <section> element as the subject of a sentence. The itemprop attribute represents the verb of the sentence—something like “is pictured at”—and the microdata property value represents the object of the sentence:

This person [explicit, from <section itemscope itemtype="...">]

is pictured at [explicit, from <img itemprop="photo">]

http://diveintohtml5...000_05_mark.jpg [implicit, from <img src> attribute]

The subject only needs to be defined once, by putting itemscope and itemtype attributes on the outermost <section> element. The verb is defined by putting the itemprop="photo" attribute on the <img> element. The object of the sentence doesn’t need any special markup at all, because Table 10-1 says that the property value of an <img> element is its src attribute.
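To make the sentence analogy concrete, here is a minimal sketch of how a processor might assemble that subject, verb, and object. It assumes Python's standard html.parser module. The PhotoFinder class and the mark.jpg file name are made up for illustration, and a real microdata processor does considerably more than this.

```python
from html.parser import HTMLParser


class PhotoFinder(HTMLParser):
    """Collects (item type, property, value) triples for <img itemprop> elements.

    Hypothetical helper for illustration; it only understands <img>.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None  # the "subject", set by itemscope + itemtype
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        if tag == "img" and "itemprop" in attrs:
            # The "verb" is the itemprop name; the "object" is implicit,
            # because an <img> contributes its src attribute as the value.
            self.triples.append(
                (self.item_type, attrs["itemprop"], attrs.get("src"))
            )


finder = PhotoFinder()
finder.feed("""
<section itemscope itemtype="http://data-vocabulary.org/Person">
  <img itemprop="photo" src="mark.jpg" alt="[Mark Pilgrim, circa 2000]">
</section>
""")
print(finder.triples)
```

The single triple it prints pairs the Person vocabulary URL with the property name photo and the value mark.jpg, which is the same sentence spelled out above.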

Moving on to the next bit of markup, we see an <h1> header and the beginnings of a <dl> list. Neither the <h1> nor the <dl> needs to be marked up with microdata. Not every piece of HTML needs to be a microdata property. Microdata is about the properties themselves, not the markup or headers surrounding the properties. This <h1> isn’t a property; it’s just a header. Similarly, the <dt> that says “Name” is just a label, not a property:

<h1>Contact Information</h1>
<dl>
  <dt>Name</dt>
  <dd>Mark Pilgrim</dd>

So where is the real information? It’s in the <dd> element, so that’s where we need to put the itemprop attribute. Which property is it? It’s the name property. Where is the property value? It’s the text within the <dd> element. Does that need to be marked up? Table 10-1 says no, <dd> elements have no special processing, so the property value is just the text within the element:

<dd itemprop="name">Mark Pilgrim</dd>

What did we just say, in English? “This person’s name is Mark Pilgrim.” Well OK then. Onward.

The next two properties are a little tricky. This is the markup, pre-microdata:

<dd>Developer advocate for Google, Inc.</dd>

If you look at the definition of the Person vocabulary, the text “Developer advocate for Google, Inc.” actually encompasses two properties: title (“Developer advocate”) and affiliation (“Google, Inc.”). How can you express that in microdata? The short answer is, you can’t. Microdata doesn’t have a way to break up runs of text into separate properties. You can’t say “the first 18 characters of this text are one microdata property, and the last 12 characters of this text are another microdata property.”

But all is not lost. Imagine that you wanted to style the text “Developer advocate” in a different font from the text “Google, Inc.” CSS can’t do that either. So what would you do? You would first need to wrap the different bits of text in dummy elements, like <span>, then apply different CSS rules to each <span> element.

This technique is also useful for microdata. There are two distinct pieces of information here: a title and an affiliation. If you wrap each piece in a dummy <span> element, you can declare that each <span> is a separate microdata property:

<dd><span itemprop="title">Developer advocate</span> for
    <span itemprop="affiliation">Google, Inc.</span></dd>

Ta-da! In English, this says: “This person’s title is ‘Developer advocate.’ This person is employed by Google, Inc.” Two sentences, two microdata properties. A little more markup, but a worthwhile tradeoff.
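If you are curious what a processor gets out of those two <span> elements, the following rough sketch reads the text content of each element that carries an itemprop attribute. It uses Python's standard html.parser module; the TextProps class is a hypothetical name, and the sketch assumes property elements contain plain text with no nested tags.

```python
from html.parser import HTMLParser


class TextProps(HTMLParser):
    """Collects the text content of elements that carry an itemprop attribute.

    Hypothetical helper; assumes property elements hold plain text only.
    """

    def __init__(self):
        super().__init__()
        self.current = None  # property name of the element being read, if any
        self.props = {}

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name:
            self.current = name
            self.props[name] = ""

    def handle_data(self, data):
        # Text outside a property element (such as " for ") is skipped.
        if self.current:
            self.props[self.current] += data

    def handle_endtag(self, tag):
        self.current = None


reader = TextProps()
reader.feed(
    '<dd><span itemprop="title">Developer advocate</span> for '
    '<span itemprop="affiliation">Google, Inc.</span></dd>'
)
print(reader.props["title"])
print(reader.props["affiliation"])
```

The word "for" sits between the two <span> elements, so it belongs to neither property and is dropped.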

The same technique is useful for marking up street addresses. The Person vocabulary defines an address property, which itself is a microdata item. That means the address has its own vocabulary (http://data-vocabulary.org/Address) and defines its own properties: street-address, locality, region, postal-code, and country-name.

If you’re a programmer, you are probably familiar with dot notation to define objects and their properties. Think of the relationship like this:

  • Person

  • Person.address

  • Person.address.street-address

  • Person.address.locality

  • Person.address.region

  • Person.address.postal-code

  • Person.address.country-name

In this example, the entire street address is contained in a single <dd> element. (Once again, the <dt> element is just a label, so it plays no role in adding semantics with microdata.) Notating the address property is easy. Just add an itemprop attribute on the <dd> element:

<dt>Mailing address</dt>
<dd itemprop="address">

But remember, the address property is itself a microdata item. That means we need to add the itemscope and itemtype attributes too:

<dt>Mailing address</dt>
<dd itemprop="address" itemscope
    itemtype="http://data-vocabulary.org/Address">

We’ve seen all of this before, but only for top-level items. A <section> element defines itemtype and itemscope, and all the elements within the <section> element that define microdata properties are “scoped” within that specific vocabulary. But this is the first time we’ve seen nested scopes: defining a new itemtype and itemscope (on the <dd> element) within an existing one (on the <section> element). This nested scope works exactly like the HTML DOM. The <dd> element has a certain number of child elements, all of which are scoped to the vocabulary defined on the <dd> element. Once the <dd> element is closed with a corresponding </dd> tag, the scope reverts to the vocabulary defined by the parent element (<section>, in this case).

The properties of the Address vocabulary suffer the same problem we encountered with the title and affiliation properties. There’s just one long run of text, but we want to break it up into several separate microdata properties. The solution is the same. We wrap each distinct piece of information in a dummy <span> element, then declare microdata properties on each <span> element:

  <dd itemprop="address" itemscope
      itemtype="http://data-vocabulary.org/Address">
    <span itemprop="street-address">100 Main Street</span><br>
    <span itemprop="locality">Anytown</span>,
    <span itemprop="region">PA</span>
    <span itemprop="postal-code">19999</span>
    <span itemprop="country-name">USA</span>
  </dd>

In English: “This person has a mailing address. The street address part of the mailing address is ‘100 Main Street.’ The locality part is ‘Anytown.’ The region is ‘PA.’ The postal code is ‘19999.’ The country name is ‘USA.’” Easy peasy.
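The nested scoping described above can be sketched with a stack: opening an element with itemscope pushes a new item, and its closing tag pops it so the scope reverts to the parent. This is only an illustration built on Python's standard html.parser module. The ItemParser class is a hypothetical name, it handles text-valued properties only, it keeps one value per property name, and it assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

VOID = {"img", "br", "meta", "link"}  # elements with no closing tag (partial list)


class ItemParser(HTMLParser):
    """Builds nested dicts from itemscope/itemprop markup (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.items = []  # stack of open items; the last one is the current scope
        self.open = []   # one (tag, kind, payload) entry per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if name and self.items:
                # A nested item is itself the value of a property on its parent.
                self.items[-1]["properties"][name] = item
            elif self.root is None:
                self.root = item
            self.items.append(item)
            self.open.append((tag, "item", None))
        elif name and self.items:
            self.open.append((tag, "prop", [name, ""]))
        else:
            self.open.append((tag, "plain", None))

    def handle_data(self, data):
        if self.open and self.open[-1][1] == "prop":
            self.open[-1][2][1] += data

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        _, kind, payload = self.open.pop()
        if kind == "item":
            self.items.pop()  # scope reverts to the parent item
        elif kind == "prop":
            name, text = payload
            self.items[-1]["properties"][name] = text.strip()


parser = ItemParser()
parser.feed("""
<section itemscope itemtype="http://data-vocabulary.org/Person">
  <dl>
    <dd itemprop="name">Mark Pilgrim</dd>
    <dd itemprop="address" itemscope
        itemtype="http://data-vocabulary.org/Address">
      <span itemprop="street-address">100 Main Street</span><br>
      <span itemprop="locality">Anytown</span>,
      <span itemprop="region">PA</span>
      <span itemprop="postal-code">19999</span>
      <span itemprop="country-name">USA</span>
    </dd>
  </dl>
</section>
""")
person = parser.root["properties"]
print(person["name"])
print(person["address"]["properties"]["locality"])
```

The address ends up as a dictionary stored under the person's address property, which mirrors the Person.address.locality dot notation.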

Ask Professor Markup

Q: Is this mailing address format U.S.-specific?

A: No. The properties of the Address vocabulary are generic enough that they can describe most mailing addresses in the world. Not all addresses will have values for every property, but that’s OK. Some addresses might require fitting more than one “line” into a single property, but that’s OK too. For example, if your mailing address has a street address and a suite number, they would both go into the street-address subproperty:

<p itemprop="address" itemscope
   itemtype="http://data-vocabulary.org/Address">
  <span itemprop="street-address">
    100 Main Street
    Suite 415
  </span>
</p>

There’s one more thing on this sample “about” page: a list of URLs. The Person vocabulary has a property for this, called url. A url property can be anything, really. (Well, it has to be a URL, but you probably guessed that.) What I mean is that the definition of the url property is very loose. The property can be any sort of URL that you want to associate with a Person: a blog, a photo gallery, or a profile on another site like Facebook or Twitter.

The other important thing to note here is that a single Person can have multiple url properties. Technically, any property can appear more than once, but until now, we haven’t taken advantage of that. For example, you could have two photo properties, each pointing to a different image URL. Here, I want to list four different URLs: my weblog, my Google profile page, my user profile on Reddit, and my Twitter account. In HTML, that’s a list of links: four <a> elements, each in its own <li> element. In microdata, each <a> element gets an itemprop="url" attribute:

<h1>My Digital Footprints</h1>
<ul>
  <li><a href="http://diveintomark.org/"
         itemprop="url">weblog</a></li>
  <li><a href="http://www.google.com/profiles/pilgrim"
         itemprop="url">Google profile</a></li>
  <li><a href="http://www.reddit.com/user/MarkPilgrim"
         itemprop="url">Reddit.com profile</a></li>
  <li><a href="http://www.twitter.com/diveintomark"
         itemprop="url">Twitter</a></li>
</ul>

According to Table 10-1, <a> elements have special processing. The microdata property value is the href attribute, not the child text content. The text of each link is actually ignored by a microdata processor. Thus, in English, this says: “This person has a URL at http://diveintomark.org/. This person has another URL at http://www.google.com/profiles/pilgrim. This person has another URL at http://www.reddit.com/user/MarkPilgrim. This person has another URL at http://www.twitter.com/diveintomark.”
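As a quick illustration of a repeated property, this sketch gathers every url value into a list and ignores the link text. It relies on Python's standard html.parser module; the LinkProps class is a hypothetical name, and the sketch looks only at <a> elements.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkProps(HTMLParser):
    """Collects href values of <a itemprop> elements (hypothetical sketch).

    A property name may appear more than once, so values go into a list.
    """

    def __init__(self):
        super().__init__()
        self.props = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "itemprop" in attrs:
            # The value is the href attribute, not the text inside the link.
            self.props[attrs["itemprop"]].append(attrs.get("href"))


links = LinkProps()
links.feed("""
<ul>
  <li><a href="http://diveintomark.org/" itemprop="url">weblog</a></li>
  <li><a href="http://www.google.com/profiles/pilgrim" itemprop="url">Google profile</a></li>
  <li><a href="http://www.reddit.com/user/MarkPilgrim" itemprop="url">Reddit.com profile</a></li>
  <li><a href="http://www.twitter.com/diveintomark" itemprop="url">Twitter</a></li>
</ul>
""")
for url in links.props["url"]:
    print(url)
```

Four links produce four url values, in document order.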

Table 10-1: Where do microdata property values come from?

Microdata’s data model is name/value pairs. A microdata property name is always declared on an HTML element. The corresponding property value is then taken from the element’s DOM. For most HTML elements, the property value is simply the text content of the element. However, there are a handful of exceptions, as Table 10-1 illustrates.

Element                                                 Value
<meta>                                                  content attribute
<audio>, <embed>, <iframe>, <img>, <source>, <video>    src attribute
<a>, <area>, <link>                                     href attribute
<object>                                                data attribute
<time>                                                  datetime attribute
All other elements                                      Text content
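The lookup that Table 10-1 describes can be sketched as a small function. This is a simplification under stated assumptions: property_value and VALUE_ATTRIBUTE are hypothetical names, the caller is assumed to supply the element's attributes as a dictionary plus its text content, and details such as resolving relative URLs are left out.

```python
# Attribute that supplies the property value for each special-cased element.
VALUE_ATTRIBUTE = {
    "meta": "content",
    "audio": "src", "embed": "src", "iframe": "src",
    "img": "src", "source": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
    "time": "datetime",
}


def property_value(tag, attrs, text_content):
    """Return the microdata value for one element (hypothetical helper).

    tag          -- element name, such as "img" or "dd"
    attrs        -- dict of the element's attributes
    text_content -- the text inside the element
    """
    attr = VALUE_ATTRIBUTE.get(tag.lower())
    if attr is None:
        # All other elements: the value is simply the text content.
        return text_content
    return attrs.get(attr, "")


print(property_value("img", {"src": "mark.jpg", "alt": "[Mark Pilgrim]"}, ""))
print(property_value("a", {"href": "http://diveintomark.org/"}, "weblog"))
print(property_value("dd", {}, "Mark Pilgrim"))
```

The three calls cover the cases used on the sample "about" page: an image, a link, and an ordinary element whose value is its text.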

HTML5: Up and Running

Learn more about this topic from HTML5: Up and Running.

If you don't know about the new features available in HTML5, now's the time to find out. This book provides practical information about how and why the latest version of this markup language will significantly change the way you develop for the Web. HTML5: Up & Running carefully guides you through the important changes in this version with lots of hands-on examples, including markup, graphics, and screenshots. You'll learn how to use HTML5 markup to add video, offline capabilities, and more, and you'll be able to put that functionality to work right away.

1 Reply

  Julian Beaujardin's Photo
Posted Oct 18 2011 08:47 AM

Optimum7 (an Internet marketing services company) is implementing a plugin that lets people add microdata to their WordPress blogs. There is an article describing what they are doing in this area: Microdata for Wordpress