A Data-Driven Game Using Facebook Data

Posted by chco, Mar 14 2011 04:55 AM

Sometimes you need to visualize the data that you collect from Facebook. The following excerpt from the recent O'Reilly publication Mining the Social Web offers some insight into doing so with this handy game.
There are a number of interesting variables that you could correlate for analysis, but everyone loves a good game every once in a while. This section lays the foundation for a simple game you can play to see how well you know your friends, by grouping them such that their hometowns and current locations are juxtaposed. As always, the full source is available online at http://github.com/pt...nd_hometowns.py.

The FQL query we’ll run to get the names, current locations, and hometowns is simple and should look familiar from previous FQL queries:

q = """select name, current_location, hometown_location from user where uid in
       (select target_id from connection where source_id = me())"""
results = fql(query=q)
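
For orientation, each row in the results can be pictured as a dictionary holding a name and two location structures, either of which may be missing. The sketch below uses made-up rows: the names and values are invented, and the shape is an assumption inferred from how Example 9-18 reads the results, not captured API output. The helper `location_label` is hypothetical and only shows how a "City, State" label is derived.

```python
# Hypothetical rows shaped like the FQL results used in this section.
# The values are invented for illustration; a location that a friend has
# not shared is assumed to come back as None.
sample_results = [
    {"name": "Joe Bloggs",
     "current_location": {"city": "Nashville", "state": "Tennessee"},
     "hometown_location": {"city": "Prattville", "state": "Alabama"}},
    {"name": "Gina Fox",
     "current_location": {"city": "Edmonton", "state": "Alberta"},
     "hometown_location": None},
]


def location_label(location):
    """Return 'City, State' for a location struct, or 'Unknown' if absent."""
    if location is None:
        return "Unknown"
    return "%s, %s" % (location["city"], location["state"])


for row in sample_results:
    print("%s: %s -> %s" % (row["name"],
                            location_label(row["hometown_location"]),
                            location_label(row["current_location"])))
```

Falling back to 'Unknown' keeps friends with hidden locations in the tally rather than dropping them.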


Example 9-17 shows the final format that feeds the tree widget once you’ve invested the sweat equity in massaging it into the proper format, and Example 9-18 shows the Python code to generate it.

Example 9-17. Target JSON that needs to be produced for consumption by the Dojo tree widget


{
    "items": [
        {
            "name": " Alabama (1)", 
            "children": [
                {
                    "state": " Alabama", 
                    "children": [
                        {
                            "state": " Tennessee", 
                            "name": "Nashville, Tennessee (1)", 
                            "children": [
                                {
                                    "name": "Joe B."
                                }
                            ]
                        }
                    ], 
                    "name": "Prattville, Alabama (1)", 
                    "num_from_hometown": 1
                }
            ]
        }, 
        {
            "name": " Alberta (1)", 
            "children": [
                {
                    "state": " Alberta", 
                    "children": [
                        {
                            "state": " Alberta", 
                            "name": "Edmonton, Alberta (1)", 
                            "children": [
                                {
                                    "name": "Gina F."
                                }
                            ]
                        }
                    ], 
                    "name": "Edmonton, Alberta (1)", 
                    "num_from_hometown": 1
                }
            ]
        }, 
        ...
    ], 
    "label": "name"
}
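
To make the nesting concrete, here is a minimal sketch that walks a trimmed copy of this structure the way a tree view might, printing one indented line per node. It assumes the top-level "label" entry names the attribute to display; the `walk` function is a hypothetical helper, not part of Dojo, and the sketch is written for Python 3.

```python
import json

# A trimmed copy of the structure from Example 9-17 (one top-level item only).
target = json.loads("""
{
    "items": [
        {
            "name": " Alberta (1)",
            "children": [
                {
                    "state": " Alberta",
                    "name": "Edmonton, Alberta (1)",
                    "num_from_hometown": 1,
                    "children": [
                        {
                            "state": " Alberta",
                            "name": "Edmonton, Alberta (1)",
                            "children": [{"name": "Gina F."}]
                        }
                    ]
                }
            ]
        }
    ],
    "label": "name"
}
""")


def walk(node, label_key, depth=0):
    """Yield (depth, label) pairs in the order a tree view would show them."""
    yield depth, node[label_key].strip()
    for child in node.get("children", []):
        for pair in walk(child, label_key, depth + 1):
            yield pair


rows = [pair for item in target["items"]
        for pair in walk(item, target["label"])]
for depth, label in rows:
    print("    " * depth + label)
```

Reading the levels from the outside in: state, hometown, current location, friend.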


The final widget ends up looking like Figure 9-8, a hierarchical display that groups your friends first by hometown (rolled up by state) and then by where they are currently located. In Figure 9-8, Jess C. is currently living in Tuscaloosa, AL but grew up in Princeton, WV. Although we’re correlating two harmless variables here, this exercise helps you quickly determine where most of your friends are located and gain insight into who has migrated from their hometown and who has stayed put. It’s not hard to imagine variations that are more interesting or faceted displays that introduce additional variables, such as college attended, professional affiliation, or marital status.

Figure 9-8. How well do you know your friends? Find out by playing the smash hit, “Where are they now?”, a game that’s just as informative as it is fun!


A simple FQL query is all it took to fetch the essential data, but there’s a little work involved in rolling up data items to populate the hierarchical tree widget. A fun improvement to the user experience might be integrating Google Maps with the widget so that you can quickly bring up unfamiliar locations on a map. Adding age and gender information to this display could also be interesting if you want to dig deeper or take another approach to clustering. Emitting KML and visualizing it in Google Earth is another possibility worth considering, depending on your objective.
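
As a rough idea of what the KML route could look like, the sketch below writes one Placemark per location using only the standard library. It is written for Python 3 and rests on assumptions: the `places` list and its approximate coordinates are typed in by hand for illustration, since the FQL query in this section is not shown returning latitude or longitude, and `to_kml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical input: (label, latitude, longitude). The coordinates are
# approximate and entered by hand; they do not come from the FQL results.
places = [
    ("Nashville, Tennessee (1)", 36.16, -86.78),
    ("Edmonton, Alberta (1)", 53.55, -113.49),
]


def to_kml(places):
    """Build a minimal KML document with one Placemark per location."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for label, lat, lon in places:
        placemark = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = label
        point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
        # KML expects longitude first, then latitude.
        ET.SubElement(point, "{%s}coordinates" % KML_NS).text = \
            "%s,%s" % (lon, lat)
    return ET.tostring(kml, encoding="unicode")


kml_text = to_kml(places)
print(kml_text)
```

Saving `kml_text` to a file with a .kml extension should be enough for Google Earth to open it, provided real coordinates are supplied.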

Example 9-18. Harvesting data and computing the target JSON as displayed in Example 9-17 (facebook__get_friends_current_locations_and_hometowns.py)


import sys
import json
import facebook
from facebook__fql_query import FQL
from facebook__login import login

try:
    ACCESS_TOKEN = open("facebook.access_token").read()
except IOError, e:
    try:
        # If you pass in the access token from the Facebook app as a command-line
        # parameter, be sure to wrap it in single quotes so that the shell
        # doesn't interpret any characters in it. You may also need to escape
        # the # character.
        ACCESS_TOKEN = sys.argv[1]
    except IndexError, e:
        print >> sys.stderr, "Could not find an access token in " + \
            "'facebook.access_token' or parse args. Logging in..."
        ACCESS_TOKEN = login()


# Process the results of the following FQL query to create JSON output suitable for
# consumption by a simple hierarchical tree widget:

fql = FQL(ACCESS_TOKEN)

q = \
    """select name, current_location, hometown_location from user where uid in
       (select target_id from connection where source_id = me() and target_type = 
           'user')"""
results = fql.query(q)

# First, read over the raw FQL results and build a nested map that groups
# people by hometown and, within each hometown, by where they live now. We'll
# simply tabulate frequencies, but you could easily grab additional data in
# the FQL query and use it in many creative ways.

current_by_hometown = {}
for r in results:
    if r['current_location'] is not None:
        current_location = r['current_location']['city'] + ', ' \
            + r['current_location']['state']
    else:
        current_location = 'Unknown'

    if r['hometown_location'] is not None:
        hometown_location = r['hometown_location']['city'] + ', ' \
            + r['hometown_location']['state']
    else:
        hometown_location = 'Unknown'

    # Create the nested dict and list on first use, then record the friend.
    by_current = current_by_hometown.setdefault(hometown_location, {})
    by_current.setdefault(current_location, []).append(r['name'])

# There are a lot of different ways you could slice and dice the data now that 
# it's in a reasonable data structure. Let's create a hierarchical
# structure that lends itself to being displayed as a tree.

items = []
for hometown in current_by_hometown:
    num_from_hometown = sum([len(current_by_hometown[hometown][current])
                            for current in current_by_hometown[hometown]])
    name = '%s (%s)' % (hometown, num_from_hometown)
    try:
        hometown_state = hometown.split(',')[1]
    except IndexError:
        hometown_state = hometown
    item = {'name': name, 'state': hometown_state,
            'num_from_hometown': num_from_hometown}

    item['children'] = []
    for current in current_by_hometown[hometown]:
        try:
            current_state = current.split(',')[1]
        except IndexError:
            current_state = current

        item['children'].append({'name': '%s (%s)' % (current,
                                len(current_by_hometown[hometown][current])),
                                'state': current_state, 'children'
                                : [{'name': f[:f.find(' ') + 2] + '.'}
                                for f in
                                current_by_hometown[hometown][current]]})

    # Sort each hometown's children alphabetically by state. Further roll-up
    # by state could be done here if desired.

    item['children'] = sorted(item['children'], key=lambda i: i['state'])
    items.append(item)

# Optionally, roll up outer-level items by state to create a better user experience 
# in the display. Alternatively, you could just pass the current value of items in
# the final statement that creates the JSON output for smaller data sets.

items = sorted(items, key=lambda i: i['state'])
all_items_by_state = []
grouped_items = []
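current_state = items[0]['state']
# Start at zero: the loop below visits every item, including the first, so
# seeding the count with items[0] would count the first state's friends twice.
num_from_state = 0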
for item in items:
    if item['state'] == current_state:
        num_from_state += item['num_from_hometown']
        grouped_items.append(item)
    else:
        all_items_by_state.append({'name': '%s (%s)' % (current_state,
                num_from_state), 'children': grouped_items})
        current_state = item['state']
        num_from_state = item['num_from_hometown']
        grouped_items = [item]
all_items_by_state.append({'name': '%s (%s)' % (current_state,
                          num_from_state), 'children': grouped_items})

# Finally, emit output suitable for consumption by a hierarchical tree widget

print json.dumps({'items': all_items_by_state, 'label': 'name'},
                 indent=4)
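
If you want to try the roll-up logic without a Facebook access token, the following sketch runs the same three steps on a few made-up friends. It is written for Python 3, swaps the manual state-grouping loop for `itertools.groupby`, and strips the leading space from state names, so its labels differ slightly from the listing above. The sample data and the helpers `state_of` and `short_name` are assumptions introduced for illustration.

```python
import json
from itertools import groupby

# Hypothetical (name, hometown, current location) triples standing in for
# the processed FQL rows.
friends = [
    ("Joe Bloggs", "Prattville, Alabama", "Nashville, Tennessee"),
    ("Ann Lee", "Mobile, Alabama", "Mobile, Alabama"),
    ("Gina Fox", "Edmonton, Alberta", "Edmonton, Alberta"),
]


def state_of(place):
    """Return the part after the comma, or the whole label if there is none."""
    parts = place.split(",")
    return parts[1].strip() if len(parts) > 1 else place


def short_name(full_name):
    """Abbreviate 'Joe Bloggs' to 'Joe B.'."""
    return full_name[:full_name.find(" ") + 2] + "."


# Step 1: hometown -> current location -> list of names.
current_by_hometown = {}
for name, hometown, current in friends:
    by_current = current_by_hometown.setdefault(hometown, {})
    by_current.setdefault(current, []).append(name)

# Step 2: one item per hometown, with current locations as children.
items = []
for hometown, by_current in current_by_hometown.items():
    count = sum(len(names) for names in by_current.values())
    children = [{"name": "%s (%s)" % (current, len(names)),
                 "state": state_of(current),
                 "children": [{"name": short_name(n)} for n in names]}
                for current, names in by_current.items()]
    items.append({"name": "%s (%s)" % (hometown, count),
                  "state": state_of(hometown),
                  "num_from_hometown": count,
                  "children": sorted(children, key=lambda c: c["state"])})

# Step 3: roll hometowns up by state. groupby only merges adjacent items,
# so the list must be sorted by the same key first.
items.sort(key=lambda i: i["state"])
all_items_by_state = []
for state, group in groupby(items, key=lambda i: i["state"]):
    group = list(group)
    total = sum(i["num_from_hometown"] for i in group)
    all_items_by_state.append({"name": "%s (%s)" % (state, total),
                               "children": group})

print(json.dumps({"items": all_items_by_state, "label": "name"}, indent=4))
```

Summing each group's counts directly avoids carrying a running total across loop iterations, which is where off-by-one-group mistakes tend to creep in.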


Mining the Social Web

Learn more about this topic from Mining the Social Web.

Popular social networks such as Facebook and Twitter generate a tremendous amount of valuable data on topics and use patterns. Who’s talking to whom? What are they talking about? How often are they talking? This concise and practical book shows you how to answer these questions and more by harvesting and analyzing data using social web APIs, Python tools, GitHub, HTML5, and JavaScript.
