How to Fetch Public Data about People Using Email Addresses

Posted by chco, Apr 07 2011 02:34 PM

The following excerpt shows how to find public information about your users using an email address as the search term. Since it’s common to have email addresses for your own users, it’s often possible to fetch additional data on them from their other public profiles. To learn more, see the recent O'Reilly publication Data Source Handbook.
If you retrieve a location, real name, portrait, or description from an external service, you can use it to prepopulate your own “create a profile” page. You can find open source code examples demonstrating how to use most of these APIs at http://github.com/pe...den/findbyemail, and there’s a live web demo at http://web.mailana.c...bs/findbyemail/.

WebFinger

WebFinger is a unified API that you can use to discover additional information about a person based on his or her email address. It’s very much focused on the discovery protocol, and it doesn’t specify much about the format of the data returned. It’s supported by Google, Yahoo and AOL. You can also see PHP source code demonstrating how client code can call the protocol. It’s a REST interface, it returns its results in XML format, and it doesn’t require any authentication or keys to access.
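
As a rough illustration of the discovery step, the sketch below assumes the XRD-based flow: fetch the host's `/.well-known/host-meta` document, find the `lrdd` link template, and substitute the account URI. The function names and the sample host-meta document are hypothetical, and individual providers may differ in detail.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XRD namespace used by host-meta documents.
XRD_NS = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"


def host_meta_url(email):
    """Return the assumed host-meta URL for the email's domain."""
    domain = email.rsplit("@", 1)[1]
    return "http://%s/.well-known/host-meta" % domain


def lrdd_url(host_meta_xml, email):
    """Fill in the first lrdd link template found in a host-meta document."""
    root = ET.fromstring(host_meta_xml)
    for link in root.iter(XRD_NS + "Link"):
        template = link.get("template")
        if link.get("rel") == "lrdd" and template:
            account = urllib.parse.quote("acct:" + email, safe="")
            return template.replace("{uri}", account)
    return None


# Hypothetical host-meta document, for illustration only.
SAMPLE_HOST_META = """<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="lrdd" template="http://example.com/describe?uri={uri}"/>
</XRD>"""

print(host_meta_url("bob@example.com"))
print(lrdd_url(SAMPLE_HOST_META, "bob@example.com"))
```

Fetching the URL returned by `lrdd_url` would then give the XML description of the account, whose contents the protocol leaves largely up to the provider.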

Flickr

As a widely used service, the Flickr REST/XML API is a great source of information on email addresses. You’ll see a location, real name, and portrait for people with public profiles, and you’ll be able to suggest linking their Flickr accounts with your own site. You’ll need to register as a developer before you can access the interface:

curl "http://api.flickr.com/services/rest/?\
method=flickr.people.findByEmail&api_key=<key>&find_email=tim%40oreilly.com"

<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<user id="36521959321@N01" nsid="36521959321@N01">
        <username>timoreilly</username>
</user>
</rsp>

curl "http://api.flickr.com/services/rest/?\
method=flickr.people.getInfo&api_key=<key>&user_id=36521959321@N01"

<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<person id="36521959321@N01" nsid="36521959321@N01"
 ispro="1" iconserver="1362" iconfarm="2" path_alias="timoreilly">
        <username>timoreilly</username>
        <realname>Tim O'Reilly</realname>
        <location>Sebastopol, CA, USA</location>
        <photosurl>http://www.flickr.com/photos/timoreilly/</photosurl>
        <profileurl>http://www.flickr.com/people/timoreilly/</profileurl>
        <mobileurl>http://m.flickr.com/photostream.gne?id=10317</mobileurl>
        <photos>
                <firstdatetaken>2002-08-03 13:40:04</firstdatetaken>
                <firstdate>1093117877</firstdate>
                <count>1379</count>
        </photos>
</person>
</rsp>
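
The two calls above can be wrapped in a few lines of client code. This is a minimal Python sketch, not the FindByEmail implementation: the helper names are made up for illustration, the API key is a placeholder, and the sample response is abbreviated from the one shown above.

```python
import urllib.parse
import xml.etree.ElementTree as ET

FLICKR_REST = "http://api.flickr.com/services/rest/"


def flickr_url(method, api_key, **params):
    """Build a REST URL for a Flickr API method."""
    query = {"method": method, "api_key": api_key}
    query.update(params)
    return FLICKR_REST + "?" + urllib.parse.urlencode(query)


def parse_person(xml_text):
    """Extract profile fields from a flickr.people.getInfo response."""
    rsp = ET.fromstring(xml_text)
    person = rsp.find("person")
    if rsp.get("stat") != "ok" or person is None:
        return None
    return {
        "nsid": person.get("nsid"),
        "username": person.findtext("username"),
        "realname": person.findtext("realname"),
        "location": person.findtext("location"),
        "profileurl": person.findtext("profileurl"),
    }


# Abbreviated copy of the getInfo response shown above.
SAMPLE_GETINFO = """<rsp stat="ok">
<person id="36521959321@N01" nsid="36521959321@N01" ispro="1">
  <username>timoreilly</username>
  <realname>Tim O'Reilly</realname>
  <location>Sebastopol, CA, USA</location>
  <profileurl>http://www.flickr.com/people/timoreilly/</profileurl>
</person>
</rsp>"""

print(flickr_url("flickr.people.findByEmail", "<key>", find_email="tim@oreilly.com"))
print(parse_person(SAMPLE_GETINFO))
```

`parse_person` returns `None` when the lookup fails, so the caller can fall back to an empty profile form.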


Gravatar

This service lets you pass in an MD5 hash of an email address, and for registered users, it will return a portrait image. Thanks to its integration with WordPress, quite a few people have signed up, so it can be a good way of providing at least default avatars for your own users. You could also save yourself some coding by directing new users to Gravatar’s portrait creation interface. There’s also a profile lookup API available, but I haven’t had any experience with how well-populated this is:

md5 -s pete@mailana.com
MD5 ("pete@mailana.com") = 03e801b74b01f23957a3afdd9aaaed00

<img src="http://www.gravatar.com/avatar/03e801b74b01f23957a3afdd9aaaed00" />


Figure 1-3. Gravatar portrait image
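
The same hash can be computed in application code. The sketch below is a minimal Python version with a hypothetical function name. It trims and lowercases the address before hashing, which is what Gravatar expects, and the optional `s` query parameter requests a particular image size in pixels.

```python
import hashlib


def gravatar_url(email, size=None):
    """Return the Gravatar image URL for an email address."""
    # Gravatar hashes the trimmed, lowercased address.
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    url = "http://www.gravatar.com/avatar/" + digest
    if size is not None:
        url += "?s=%d" % size
    return url


print(gravatar_url("pete@mailana.com"))
print(gravatar_url("pete@mailana.com", size=80))
```

Normalizing first means that "Pete@Mailana.com" and "pete@mailana.com" resolve to the same portrait.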


Amazon

Like Yahoo!, Amazon doesn’t expose very much information about each user when you look up an email address, but you can often get at least a location. The sheer size of Amazon’s user base means that you’ll find information on a large percentage of emails. There’s also the chance to discover public wishlists, which could be helpful for creating default interests for your new users’ profiles.

The API is REST/XML-based, but it does require a somewhat complex URL signing scheme for authentication.
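
To give a feel for what the signing involves, here is a minimal sketch of an HMAC-SHA256 signed query. It assumes the general shape of Amazon's signed-request scheme: sort the parameters, build a canonical string from the HTTP verb, host, path, and query, sign it with your secret key, and append the Base64 signature. The host, path, and parameter values in the usage lines are placeholders, and the exact required parameters should be taken from Amazon's own documentation.

```python
import base64
import hashlib
import hmac
import urllib.parse

UNRESERVED = "-_.~"  # characters left unescaped, per RFC 3986


def sign_request(host, path, params, secret_key):
    """Return a signed GET URL (a sketch of the assumed signing scheme)."""
    # Canonical query string: parameters sorted by name, percent-encoded.
    canonical = "&".join(
        "%s=%s" % (urllib.parse.quote(name, safe=UNRESERVED),
                   urllib.parse.quote(str(value), safe=UNRESERVED))
        for name, value in sorted(params.items())
    )
    string_to_sign = "\n".join(["GET", host.lower(), path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "http://%s%s?%s&Signature=%s" % (
        host, path, canonical, urllib.parse.quote(signature, safe=UNRESERVED))


# Placeholder values, for illustration only.
example_params = {
    "AWSAccessKeyId": "<access key>",
    "Timestamp": "2011-04-07T14:34:00Z",
}
print(sign_request("webservices.amazon.com", "/onca/xml",
                   example_params, "<secret key>"))
```

Because the signature covers the sorted query string, any change to a parameter after signing invalidates the request.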

AIM

You can look up an AOL Instant Messenger account from an email address, and you get a portrait image and username back. The exact information returned depends on whether the user is online, and you’ll only get a default image if he or she is away. The service uses a REST/JSON API, and you need to sign up for a key to access it:

curl "http://api.oscar.aol.com/presence/get?f=json&k=<key>&\
t=petewarden%40aol.com&emailLookup=1&notFound=1"

{"response":{"statusCode":200, "statusText":"Ok", "data":{"users":[{
  "emailId":"petewarden@aol.com",
  "aimId":"petewarden",
  "displayId":"petewarden",
  "state":"offline",
  "userType":"aim",
  "presenceIcon":"http://o.aolcdn.com/aim/img/offline.gif"
}]}}}
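
A client for this lookup only needs to build the query string and pick the user records out of the JSON. The following Python sketch assumes the response layout shown above; the function names are illustrative and the key is a placeholder.

```python
import json
import urllib.parse


def aim_presence_url(email, key):
    """Build the presence lookup URL for an email address."""
    query = urllib.parse.urlencode({
        "f": "json", "k": key, "t": email,
        "emailLookup": 1, "notFound": 1,
    })
    return "http://api.oscar.aol.com/presence/get?" + query


def parse_presence(json_text):
    """Return the id, state, and icon of each user in a presence response."""
    response = json.loads(json_text)["response"]
    if response.get("statusCode") != 200:
        return []
    users = response.get("data", {}).get("users", [])
    return [{"aimId": u.get("aimId"),
             "state": u.get("state"),
             "icon": u.get("presenceIcon")} for u in users]


# Copy of the response shown above.
SAMPLE_PRESENCE = """{"response":{"statusCode":200, "statusText":"Ok",
 "data":{"users":[{
  "emailId":"petewarden@aol.com",
  "aimId":"petewarden",
  "displayId":"petewarden",
  "state":"offline",
  "userType":"aim",
  "presenceIcon":"http://o.aolcdn.com/aim/img/offline.gif"
}]}}}"""

print(aim_presence_url("petewarden@aol.com", "<key>"))
print(parse_presence(SAMPLE_PRESENCE))
```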


FriendFeed

FriendFeed never had a lot of users, but many influential early adopters signed up and created profiles including their other accounts. This makes it a great source of Twitter and Facebook account information on tech-savvy users, since you can look up their FriendFeed accounts by email address, and then pull down the other networks they mention in their profiles. It’s a REST/JSON interface, and it doesn’t require any authentication or developer signup to access:

curl "http://friendfeed.com/api/feed/user?emails=tim%40oreilly.com"

{
...
"user":{"profileUrl":"http://friendfeed.com/timoreilly",
  "matchedEmail":"tim@oreilly.com",
  "nickname":"timoreilly",
  "id":"d85e8470-25c5-11dd-9ea1-003048343a40",
  "name":"Tim O'Reilly"}
}]}

curl "http://friendfeed.com/api/user/timoreilly/profile"

{"status":"public","name":"Tim O'Reilly",
...
"services":[
{"url":"http://en.wikipedia.org/wiki/Blog","iconUrl":"...",
 "id":"blog","profileUrl":"http://radar.oreilly.com","name":"Blog"},
{"username":"timoreilly","name":"Disqus","url":"http://www.disqus.com/",
 "profileUrl":"http://www.disqus.com/people/timoreilly/","iconUrl":"...","id":"disqus"},
{"username":"timoreilly","name":"Flickr","url":"http://www.flickr.com/",
 "profileUrl":"http://www.flickr.com/photos/36521959321%40N01/",
 "iconUrl":"...","id":"flickr"},
{"username":"timoreilly","name":"SlideShare","url":"http://www.slideshare.net/",
 "profileUrl":"http://www.slideshare.net/timoreilly",
 "iconUrl":"...","id":"slideshare"},
{"username":"timoreilly","name":"Twitter","url":"http://twitter.com/",
 "profileUrl":"http://twitter.com/timoreilly",
 "iconUrl":"...","id":"twitter"},
{"username":"tadghin","name":"YouTube","url":"http://www.youtube.com/",
 "profileUrl":"http://www.youtube.com/profile?user=tadghin",
 "iconUrl":"...","id":"youtube"},
{"url":"http://www.facebook.com/","iconUrl":"...","id":"facebook",
 "profileUrl":"http://www.facebook.com/profile.php?id=544591116",
 "name":"Facebook"}],
"nickname":"timoreilly","id":"d85e8470-25c5-11dd-9ea1-003048343a40"}

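Once you have the profile JSON, pulling out the linked accounts is a short loop over the `services` array. This Python sketch assumes the response layout shown above; the function name is made up, and the sample is a trimmed copy of that response.

```python
import json


def linked_profiles(profile_json):
    """Map each service id in a FriendFeed profile to its profile URL."""
    profile = json.loads(profile_json)
    return {
        service["id"]: service["profileUrl"]
        for service in profile.get("services", [])
        if "id" in service and "profileUrl" in service
    }


# Trimmed copy of the profile response shown above.
SAMPLE_PROFILE = """{"status":"public","name":"Tim O'Reilly",
"services":[
{"username":"timoreilly","name":"Twitter","url":"http://twitter.com/",
 "profileUrl":"http://twitter.com/timoreilly","id":"twitter"},
{"url":"http://www.facebook.com/","id":"facebook",
 "profileUrl":"http://www.facebook.com/profile.php?id=544591116",
 "name":"Facebook"}],
"nickname":"timoreilly"}"""

for service_id, url in sorted(linked_profiles(SAMPLE_PROFILE).items()):
    print(service_id, url)
```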

Google Social Graph

Though it’s an early experiment that’s largely been superseded by WebFinger, this Google API can still be useful for the rich connection information it exposes for signed-up users. Unfortunately, it’s not as well-populated as you might expect. It doesn’t require any developer keys to access:

curl "http://socialgraph.apis.google.com/lookup?\
q=mailto%3asearchbrowser%40gmail.com&fme=1&edi=1&edo=1&pretty=1&sgn=1&callback="

{ "canonical_mapping": {
  "mailto:searchbrowser@gmail.com": "sgn://mailto/?pk\u003dsearchbrowser@gmail.com"
 },
 "nodes": {
  "sgn://mailto/?pk\u003dsearchbrowser@gmail.com": {
   "attributes": {
   },
   "claimed_nodes": [
   ],
   "unverified_claiming_nodes": [
    "sgn://typepad.com/?ident\u003dpetewarden"
   ],
   "nodes_referenced": {
   },
   "nodes_referenced_by": {
    "sgn://typepad.com/?ident\u003dpetewarden": {
     "types": [
      "me"
     ]
    }
   }
  }
 }
}
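
The interesting part of this response is the set of other identities that point at the email address. As a minimal sketch, assuming the response layout shown above, the following Python collects them from the `claimed_nodes`, `unverified_claiming_nodes`, and `nodes_referenced_by` fields. The function names and the reconstructed sample are for illustration only.

```python
import json
import urllib.parse


def social_graph_url(email):
    """Build a lookup URL using the same flags as the curl example."""
    query = urllib.parse.urlencode({
        "q": "mailto:" + email,
        "fme": 1, "edi": 1, "edo": 1, "pretty": 1, "sgn": 1,
    })
    return "http://socialgraph.apis.google.com/lookup?" + query


def claiming_nodes(json_text):
    """Return the sorted identities that claim or reference each node."""
    found = set()
    for node in json.loads(json_text).get("nodes", {}).values():
        found.update(node.get("claimed_nodes", []))
        found.update(node.get("unverified_claiming_nodes", []))
        found.update(node.get("nodes_referenced_by", {}))
    return sorted(found)


# Reconstruction of the response shown above, for illustration.
SAMPLE_LOOKUP = json.dumps({
    "nodes": {
        "sgn://mailto/?pk=searchbrowser@gmail.com": {
            "claimed_nodes": [],
            "unverified_claiming_nodes": ["sgn://typepad.com/?ident=petewarden"],
            "nodes_referenced_by": {
                "sgn://typepad.com/?ident=petewarden": {"types": ["me"]}
            },
        }
    }
})

print(social_graph_url("searchbrowser@gmail.com"))
print(claiming_nodes(SAMPLE_LOOKUP))
```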


MySpace

The early social network still holds information on a lot of people, and it exposes a surprisingly large amount, including things like age and gender. This could come in handy if you need to do a demographic analysis of your user base, though with the lack of activity on the site, the information will become less useful as time goes by. You can use the API without any authentication:

curl "http://api.myspace.com/opensearch/people?searchBy=email&\
searchTerms=bill%40example.com"

{"startIndex":"1","itemsPerPage":"10","totalResults":"2",
 "resultCount":"2","searchId":"34848869-de3b-415a-81ab-5df0b1ed82eb","entry":[{
  "id":"myspace.com.person.3430419",
  "displayName":"bill",
  "profileUrl":"http:\/\/www.myspace.com\/3430419",
  "thumbnailUrl":"http:\/\/x.myspacecdn.com\/images\/no_pic.gif",
  "msUserType":"RegularUser",
  "gender":"Female",
  "age":"31",
  "location":"",
  "updated":"12\/12\/2010 6:49:11 PM",
  "isOfficial":"0"},{
  "id":"myspace.com.person.146209268",
  "displayName":"Andy",
  "profileUrl":"http:\/\/www.myspace.com\/146209268",
  "thumbnailUrl":"http:\/\/x.myspacecdn.com\/images\/no_pic.gif",
  "msUserType":"RegularUser",
  "gender":"Male",
  "age":"34",
  "location":"",
  "updated":"3\/26\/2010 1:14:00 PM",
  "isOfficial":"0"}]}

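For the demographic analysis mentioned above, note that the API returns ages as strings, so they need converting before you can aggregate them. The sketch below is a minimal Python example with a hypothetical function name, run against a trimmed copy of the response shown above.

```python
import json
from collections import Counter


def demographics(json_text):
    """Summarize gender counts and mean age from a people search response."""
    entries = json.loads(json_text).get("entry", [])
    genders = Counter(e["gender"] for e in entries if e.get("gender"))
    # Ages arrive as strings; skip any that are missing or non-numeric.
    ages = [int(e["age"]) for e in entries if str(e.get("age", "")).isdigit()]
    mean_age = sum(ages) / len(ages) if ages else None
    return {"genders": dict(genders), "mean_age": mean_age}


# Trimmed copy of the search response shown above.
SAMPLE_SEARCH = """{"totalResults":"2","entry":[
 {"id":"myspace.com.person.3430419","displayName":"bill",
  "gender":"Female","age":"31","location":""},
 {"id":"myspace.com.person.146209268","displayName":"Andy",
  "gender":"Male","age":"34","location":""}]}"""

print(demographics(SAMPLE_SEARCH))
```

As the sample shows, one email address can match more than one profile, so it is worth deciding up front whether to average the matches or discard ambiguous lookups.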

GitHub

If you’re targeting people who are likely to be developers, there’s a good chance they’ll have GitHub accounts, and if they’ve opted in to being found by email address, you’ll be able to pull up their public details. The API doesn’t require authorization, or even registration, and it gives you information on users’ companies, real names, locations, and any linked sites, like blogs:

curl "http://github.com/api/v2/xml/user/email/pete%40petewarden.com"

<?xml version="1.0" encoding="UTF-8"?>
<user>
  <gravatar-id>9cbf603d5f93133178367214f1e091b9</gravatar-id>
  <company>Mailana Inc</company>
  <name>Pete Warden</name>
  <created-at type="datetime">2009-12-03T08:29:50-08:00</created-at>
  <location>Boulder, CO</location>
  <public-repo-count type="integer">26</public-repo-count>
  <public-gist-count type="integer">0</public-gist-count>
  <blog>http://petewarden.typepad.com/</blog>
  <following-count type="integer">0</following-count>
  <id type="integer">161459</id>
  <type>User</type>
  <permission nil="true"></permission>
  <followers-count type="integer">58</followers-count>
  <login>petewarden</login>
  <email>pete@petewarden.com</email>
</user>
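
Turning that XML into profile defaults takes only a few lines. This Python sketch assumes the endpoint and response layout shown above; the function names are illustrative, and the sample is a trimmed copy of the response.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def github_email_url(email):
    """Build the email lookup URL used in the curl example."""
    return ("http://github.com/api/v2/xml/user/email/"
            + urllib.parse.quote(email, safe=""))


def parse_user(xml_text):
    """Extract the fields useful for prepopulating a profile."""
    user = ET.fromstring(xml_text)
    fields = ("login", "name", "company", "location", "blog")
    return {field: user.findtext(field) for field in fields}


# Trimmed copy of the response shown above.
SAMPLE_USER = """<user>
  <company>Mailana Inc</company>
  <name>Pete Warden</name>
  <location>Boulder, CO</location>
  <blog>http://petewarden.typepad.com/</blog>
  <login>petewarden</login>
</user>"""

print(github_email_url("pete@petewarden.com"))
print(parse_user(SAMPLE_USER))
```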


Rapleaf

Originally, Rapleaf’s API returned information about a person’s social networking accounts if you supplied an email, but it has recently switched to offering demographic data on age, gender, income, and address instead. The FindByEmail code still uses the old V2 API. Since the service gathers data without any user involvement (though it does operate an opt-out system), it’s been controversial.

Jigsaw

Another service that collects and aggregates information on people with no involvement from the users, Jigsaw lets you look up people by email address. It returns information on a person’s real name, location, phone number, company, and job title, if he or she is in the database.

Data Source Handbook

Learn more about this topic from Data Source Handbook.

If you're a developer looking to supplement your own data tools and services, this concise ebook covers the most useful sources of public data available today. You’ll find useful information on APIs that offer broad coverage, tie their data to the outside world, and are either accessible online or feature downloadable bulk data. You'll also find code and helpful links.