How to Capture Tweets in Real-time with Twitter's Streaming API

Posted by chco on Apr 06 2011 01:20 AM

Suppose you want to capture a stream of public tweets in real-time, optionally filtering by select screen names or keywords in the text of the tweet. The solution is to use Twitter's streaming API. The following excerpt from the O'Reilly publication 21 Recipes for Mining Twitter will show you how.

While handy and quite beautiful, the twitter package doesn’t support streaming API resources at this time. However, tweepy (http://github.com/joshthecoder/tweepy) is a very nice package that provides simplified access to the streaming API. The PyPI release of tweepy has been noted to lag behind the latest commit in its public GitHub repository, so it is recommended that you install directly from GitHub using a handy build tool called pip (http://pip.openplans.org/). You can install pip with easy_install pip, after which a pip executable should appear in your path. From there, you can install the latest revision of tweepy with the following command: pip install git+git://github.com/joshthecoder/tweepy.git.

With tweepy installed, Example 1-17 shows you how to create a streaming API instance and filter for any public tweets containing keywords of interest. Try TeaParty or JustinBieber if you want some interesting results from two high-velocity communities.

Example 1-17. Filtering tweets using the streaming API (see http://github.com/pt...treaming_api.py)
# -*- coding: utf-8 -*-

import sys
import tweepy
import webbrowser

# Query terms

Q = sys.argv[1:] 

# Get these values from your application settings.

CONSUMER_KEY = ''
CONSUMER_SECRET = ''

# Get these values from the "My Access Token" link located in the
# margin of your application details, or perform the full OAuth
# dance.

ACCESS_TOKEN = ''
ACCESS_TOKEN_SECRET = ''

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# Note: Had you wanted to perform the full OAuth dance instead of using
# an access key and access secret, you could have used the following
# four lines of code instead of the previous line that manually set the
# access token via auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET).
# auth_url = auth.get_authorization_url(signin_with_twitter=True)
# webbrowser.open(auth_url)
# verifier = raw_input('PIN: ').strip()
# auth.get_access_token(verifier)

class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # We'll simply print some values in a tab-delimited format
        # suitable for capturing to a flat file, but you could opt to
        # store them elsewhere, retweet select statuses, etc.
        # Fields after status.text are assumed; the listing was truncated.

        try:
            print "%s\t%s\t%s\t%s" % (status.text,
                                      status.author.screen_name,
                                      status.created_at,
                                      status.source,)
        except Exception, e:
            print >> sys.stderr, 'Encountered Exception:', e

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

# Create a streaming API and set a timeout value of 60 seconds.

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=60)

# Optionally filter the statuses you want to track by providing a list
# of users to "follow".

print >> sys.stderr, 'Filtering the public timeline for "%s"' % (' '.join(sys.argv[1:]),)

streaming_api.filter(follow=None, track=Q)
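
One practical wrinkle with tab-delimited output is that tweet text can itself contain tabs and line breaks, which would split a record across columns or lines in a flat file. The following is a minimal sketch of one way to guard against that, written for Python 3. `format_row` is a hypothetical helper, not part of tweepy, and collapsing each run of whitespace to a single space is an assumption about what you want to preserve.

```python
import re

# Matches any run of whitespace: tabs, newlines, carriage returns, spaces.
_WHITESPACE = re.compile(r"\s+")


def format_row(*fields):
    """Join fields into a single tab-delimited line (hypothetical helper).

    Each field is converted to text and every run of whitespace inside it
    is collapsed to one space, so embedded tabs or newlines cannot break
    the one-record-per-line layout.
    """
    cleaned = [_WHITESPACE.sub(" ", str(field)).strip() for field in fields]
    return "\t".join(cleaned)


if __name__ == "__main__":
    # A tweet whose text contains a tab and a newline still yields one row.
    print(format_row("first line\nsecond\tline", "some_user"))
```

Inside on_status, you could pass status.text and the other values you want to keep to a helper like this instead of interpolating the raw values directly.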

If you really like twitter, there’s no reason you couldn’t use twitter and tweepy together. For example, suppose you wanted to implement a bot to retweet any tweet by Tim O’Reilly about Open Government or Web 2.0. In this scenario, you might use tweepy to capture a stream of tweets, filtering on @timoreilly and certain keywords or hashtags, but use twitter to retweet or perform other actions.
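
The decision at the heart of such a bot can be sketched without touching either library. A stream filtered on both a user and keywords may deliver tweets that match only one of the two conditions, so it seems prudent to re-check both before acting. The function below is a hypothetical illustration written for Python 3: the name `should_retweet`, its parameters, and the default phrases are assumptions, and the actual retweet call is deliberately left out.

```python
def should_retweet(screen_name, text, author="timoreilly",
                   phrases=("open government", "web 2.0")):
    """Return True if the tweet is by `author` and mentions any phrase.

    Hypothetical helper: comparisons are case-insensitive, and a phrase
    matches if it appears anywhere in the tweet text.
    """
    if screen_name.lower() != author.lower():
        return False
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


if __name__ == "__main__":
    print(should_retweet("timoreilly", "Thoughts on Open Government today"))
    print(should_retweet("someone_else", "Thoughts on Open Government today"))
```

A listener's on_status method could call a check like this and hand matching statuses to whatever code performs the retweet.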

Finally, although it is a slightly less elegant option, you can poll one or more of the /users timeline resources for updates of interest instead of using the streaming API. If you choose to take this approach, be sure to take advantage of the since_id keyword parameter to request only tweets posted since you last checked.
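
The bookkeeping for the polling approach might look like the sketch below, written for Python 3. Here `fetch_timeline` is a hypothetical callable standing in for whichever timeline request you use; it is assumed to accept a since_id keyword and to return a list of dicts that each carry an integer 'id'. Neither the name nor that return shape comes from a real library.

```python
def poll_once(fetch_timeline, since_id=None):
    """Fetch tweets newer than since_id; return (tweets, new_since_id).

    fetch_timeline is a hypothetical stand-in for a real timeline request.
    If nothing new arrives, since_id is returned unchanged so the next
    poll asks for the same range again.
    """
    tweets = fetch_timeline(since_id=since_id)
    if not tweets:
        return [], since_id
    # Remember the highest id seen so the next poll skips these tweets.
    newest = max(tweet["id"] for tweet in tweets)
    return tweets, newest


if __name__ == "__main__":
    timeline = [{"id": 1}, {"id": 2}, {"id": 5}]

    def fake_fetch(since_id=None):
        # Simulates a service that honors since_id.
        return [t for t in timeline if since_id is None or t["id"] > since_id]

    tweets, since_id = poll_once(fake_fetch)
    print(len(tweets), since_id)
    tweets, since_id = poll_once(fake_fetch, since_id=since_id)
    print(len(tweets), since_id)
```

Persisting the returned since_id between runs would let a scheduled job pick up where the previous one left off.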

21 Recipes for Mining Twitter

Learn more about this topic from 21 Recipes for Mining Twitter.

Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This short and concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools. Each recipe offers a discussion of how and why the solution works, so you can quickly adapt it to fit your particular needs.
