How to Make the Most of Metadata

Posted by gavinbell, Oct 29 2009 09:45 AM

Metadata creates connections between content on your site; it is in your interest to support it. For photography, there is EXIF (Exchangeable Image File Format) data, which is embedded in each digital photo and describes the settings the camera used when the photo was taken. Similarly, video has clip length and camera type. Books have publication information such as page count, edition, publisher, and ISBN.

If you treat metadata as a means of enabling organization for your site, it can give additional structure for a little extra work. Every item of metadata you add to the site is a possible link to something else on your site and additional discoverable information for people searching the Web. Place, time, and intrinsic file format information are the most obvious types of information to gather. In addition, there is a wealth of sensor data to come from new devices connecting to the Internet, from Arduino and ZigBee electronics to embedded GPS and even e-book readers. In time there will be a lot more data to associate with individuals.
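As a rough illustration of how each metadata value can act as a link between items, here is a minimal Python sketch. The item ids, field names, and values are all invented for the example; it simply indexes items by their (field, value) pairs and looks up whatever else shares a value.

```python
from collections import defaultdict

# Hypothetical items; the field names and values are made up for illustration.
ITEMS = {
    "photo-1": {"place": "London", "camera": "Canon EOS 40D", "year": 2009},
    "photo-2": {"place": "London", "camera": "Nikon D90", "year": 2008},
    "video-1": {"place": "Bristol", "camera": "Canon EOS 40D", "year": 2009},
}


def build_index(items):
    """Map each (field, value) pair to the set of item ids that carry it."""
    index = defaultdict(set)
    for item_id, metadata in items.items():
        for field, value in metadata.items():
            index[(field, value)].add(item_id)
    return index


def related(item_id, items, index):
    """Return ids of other items sharing at least one metadata value."""
    found = set()
    for field, value in items[item_id].items():
        found |= index[(field, value)]
    found.discard(item_id)
    return sorted(found)


INDEX = build_index(ITEMS)
print(related("photo-1", ITEMS, INDEX))
```

Every extra field you capture adds more keys to the index, and so more ways for one item to lead to another.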

However, one of the best types of metadata comes from your users, who will add tags to their content and, in some cases, tag other people’s content too. These tags provide a good means of organizing your site. Machine tags are a special type of tag that uses a key/value pairing such as name:gavin to make data values explicit. They are useful for getting better value from simple tagging (for more information, see http://www.flickr.co...57594497877875/). Because a machine tag holds a key/value pair in a single tag, you can store lat:51.0234 as one tag and not have to guess which of the numeric values in the four tags “lat, long, 51.0234, 0.003” is the latitude and which is the longitude.
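To make the latitude example concrete, here is a small Python sketch that separates plain tags from key:value tags. It assumes the simple key:value form used in this post; the function names are invented for the example, and Flickr's own machine tags use a longer namespace:predicate=value form that this sketch does not handle.

```python
def parse_tag(tag):
    """Split a tag of the form key:value; plain tags get a key of None."""
    key, sep, value = tag.partition(":")
    if sep and key and value:
        return key.strip().lower(), value.strip()
    return None, tag.strip()


def collect(tags):
    """Separate plain tags from key/value data in a list of tags."""
    plain, data = [], {}
    for tag in tags:
        key, value = parse_tag(tag)
        if key is None:
            plain.append(value)
        else:
            data[key] = value
    return plain, data


plain, data = collect(["sunset", "lat:51.0234", "long:0.003", "name:gavin"])
latitude = float(data["lat"])    # no guessing which number is which
longitude = float(data["long"])
```

With the key stored inside the tag, the latitude and longitude can be read directly instead of being inferred from the order of four separate tags.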

Some of the data you capture will be really messy. Place-of-work and address information is notoriously noisy, because people write addresses in many different ways. Asking 10 people in the same university department for their place of work and street address can yield 10 variations on the address, depending on how complete or abbreviation-laden each one is.
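A sketch of how some of that variance might be reduced, assuming a small hand-made abbreviation table (hypothetical, and far from complete). It only handles case, punctuation, and a few abbreviations; it will not reconcile missing or reordered parts of an address.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {
    "dept": "department",
    "univ": "university",
    "st": "street",
    "rd": "road",
    "ave": "avenue",
}


def normalize(address):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


variants = [
    "Dept. of Physics, 12 Park St.",
    "Department of Physics, 12 Park Street",
    "dept of physics 12 park st",
]
# All three variants collapse to the same normalized string.
canonical = {normalize(v) for v in variants}
```

Even this toy table shows the limits of the approach: "st" can mean "street" or "saint", so blind expansion will sometimes be wrong.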

Normalization is hard, because you will usually not have a fixed list of places to work with. Auto-complete and auto-suggestion can be effective here. Depending on your country, you might be able to get a postcode or ZIP code database; if so, determine location from the postcode first. If that is not possible, a type-and-suggest interface can work well, where the user's input is automatically used as a search query while she types.
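The two approaches can be sketched in Python as follows. The postcode table and place list are sample data standing in for a real database, and the function names are invented for the example; a production version would query a search index rather than scan a list.

```python
# Hypothetical sample data; real postcode databases are country-specific.
POSTCODES = {"BS8 1TH": "Bristol", "SW1A 1AA": "London"}
PLACES = ["Bristol", "Brighton", "Birmingham", "London", "Londonderry"]


def place_from_postcode(postcode):
    """Look up a place by postcode, ignoring case and spacing differences."""
    wanted = postcode.replace(" ", "").upper()
    for code, place in POSTCODES.items():
        if code.replace(" ", "") == wanted:
            return place
    return None


def suggest(typed, places=PLACES, limit=5):
    """Return known places that start with what the user has typed so far."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [p for p in places if p.lower().startswith(prefix)]
    return sorted(matches)[:limit]
```

Calling `suggest` on each keystroke steers users toward a place name you already know, which keeps new variations out of the data in the first place.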


