Connecting to the Twitter API
There are different ways to connect to the Twitter API (application programming interface), but one of the easiest is to apply for a Twitter API key and install Twython, a Python module that makes working with the Twitter API much less painful. You can also try Tweepy, which is more comprehensive than Twython, but then you also need to apply for an OAuth access token, which Twython suggests but does not require.
This book aims to minimize the programming work for social scientists, so I will use Twython in this chapter. But you are encouraged to try different tools as you become more sophisticated; you can even try writing your own module.
First install Twython (for example, with pip install twython), and then import it together with the other modules you need.
import time, random
from twython import Twython
As mentioned, to connect to Twitter and collect data, you will have to apply for a key and secret pair from Twitter's developer site. Using the assigned key and secret, you can initialize a connector for requesting tweet data.
Initialize a Connector
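# Replace the placeholders with the key and secret assigned to your app
Consumer_Key = 'your own key'
Consumer_Secret = 'your own secret'
# Create the connector that all later requests go through
twitter = Twython(Consumer_Key, Consumer_Secret)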
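If you would rather not paste your key and secret into the script itself, you can keep them in environment variables. The sketch below does that and then follows the application-only (OAuth 2) flow described in Twython's documentation: obtain a bearer token once with obtain_access_token(), then build the client with access_token=. The environment variable names and the helper names load_credentials and make_app_only_client are hypothetical; only the Twython calls come from the library.

```python
import os


def load_credentials(key_var="TWITTER_CONSUMER_KEY",
                     secret_var="TWITTER_CONSUMER_SECRET"):
    """Read the consumer key and secret from environment variables.

    The default variable names are hypothetical; use any names you like.
    Raises KeyError if either variable is missing.
    """
    return os.environ[key_var], os.environ[secret_var]


def make_app_only_client(consumer_key, consumer_secret):
    """Build a Twython client that uses application-only (OAuth 2) auth."""
    # Imported here so load_credentials works even without twython installed
    from twython import Twython

    # Step 1: exchange the key and secret for a bearer token
    auth_client = Twython(consumer_key, consumer_secret, oauth_version=2)
    access_token = auth_client.obtain_access_token()
    # Step 2: create the client that actually makes the requests
    return Twython(consumer_key, access_token=access_token)
```

With the two environment variables set, key, secret = load_credentials() followed by twitter = make_app_only_client(key, secret) gives you a connector you can use in place of the one above. App-only auth suits read-only calls such as search; it cannot post tweets on anyone's behalf.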
After you initialize the connector, you are ready to request data: just imagine connecting a nozzle to a gas pump and "downloading" gas into your own tank. Simple, right? But be aware of the rate limits: each API endpoint allows only a fixed number of requests per 15-minute window, so try not to hit them.
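One way to cope when a limit is hit anyway is to wait and retry, doubling the pause each time. The helper below is a generic sketch, not part of Twython: call_with_backoff and its parameters are hypothetical names. With Twython you would pass the library's TwythonRateLimitError as retry_on; the sleep parameter exists only so the waiting can be swapped out (for example, in tests).

```python
import random
import time


def call_with_backoff(func, *args, retry_on=(Exception,), max_retries=5,
                      base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    retry_on lists the exception types that trigger a retry. After
    max_retries failed retries, the last exception is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == max_retries:
                raise
            # Wait base_delay * 2**attempt seconds plus up to 1 s of random jitter
            sleep(base_delay * 2 ** attempt + random.random())
```

For example, after from twython import TwythonRateLimitError, the call call_with_backoff(twitter.search, q=keyword, result_type='popular', retry_on=(TwythonRateLimitError,), base_delay=60) retries the search after waiting roughly 1, 2, then 4 minutes. Because the limits reset per 15-minute window, a base delay of tens of seconds is more realistic than the 1-second default.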
Search Tweets by Keywords
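def getTweets(keyword):
    # Ask the search endpoint for popular tweets that match the keyword
    t = twitter.search(q=keyword, result_type='popular')
    # Pause for a random 0-1 second so repeated calls are spaced out
    time.sleep(random.random())
    # Keep the user name, tweet text, creation time, and follower count of each tweet
    d = [[i['user']['name'], i['text'], i['created_at'], i['user']['followers_count']]
         for i in t['statuses']]
    return d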
keyword = '#hillaryclinton'
t = getTweets(keyword)
In the code above, we define a very simple function that searches for tweets relevant to our keyword. Check the Twython documentation and you will find more advanced data collection functions, such as getting a real-time stream of tweets or searching tweets by location or user. These functions also have parameters that let you set the duration or quantity of data collection.
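To collect more than one page of results, the v1.1 search endpoint accepts a count parameter (up to 100 tweets per request) and a max_id parameter that returns only tweets with an ID at or below the given value. The sketch below, with the hypothetical name search_pages, walks backwards through the results by setting max_id to one less than the oldest ID seen so far. It takes the search function as an argument, so you can pass twitter.search, and any extra keyword arguments go straight through to it.

```python
def search_pages(search, keyword, pages=3, count=100, **params):
    """Collect several pages of search results by walking back with max_id.

    `search` is any callable that accepts the search parameters as keyword
    arguments and returns a dict with a "statuses" list, e.g. twitter.search.
    """
    statuses = []
    max_id = None
    for _ in range(pages):
        query = dict(q=keyword, count=count, **params)
        if max_id is not None:
            # Only ask for tweets older than the oldest one already collected
            query["max_id"] = max_id
        page = search(**query)["statuses"]
        if not page:
            break  # no older tweets left
        statuses.extend(page)
        max_id = min(s["id"] for s in page) - 1
    return statuses
```

For example, search_pages(twitter.search, '#hillaryclinton', pages=5, result_type='recent') makes at most five requests of up to 100 tweets each; adding geocode='latitude,longitude,radius' restricts the search to an area. Note that it returns the raw tweet dictionaries, not the four-field rows that getTweets builds.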
The above figure shows a slice of the data we got from Twitter. Besides the tweet text, we also get the user names, the publication times of the tweets, and the number of followers of those users. You may want to save the downloaded data for future analysis:
Save the downloaded Tweets
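# Write one tweet per line with tab-separated fields (UTF-8 text file)
with open('.../tweets.txt', 'w', encoding='utf-8') as f:
    for i in t:
        # str() converts followers_count (an int); tabs and line breaks
        # inside a tweet are replaced so each record stays on one line
        fields = [str(x).replace('\t', ' ').replace('\r', ' ').replace('\n', ' ') for x in i]
        f.write('\t'.join(fields) + '\n')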
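A tab-separated file breaks as soon as a tweet itself contains a tab or a line break, so for later analysis a CSV file written with Python's csv module may be sturdier, since it quotes such fields. The sketch below (save_tweets_csv and load_tweets_csv are hypothetical names) writes the rows returned by getTweets with a header line and reads them back as dictionaries, turning created_at into a datetime and the follower count back into an integer. It assumes created_at uses Twitter's v1.1 format, such as 'Wed Oct 10 20:19:24 +0000 2018'.

```python
import csv
from datetime import datetime

FIELDS = ["user", "text", "created_at", "followers"]
# Twitter v1.1 timestamp format; %a and %b assume an English/C locale
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def save_tweets_csv(rows, path):
    """Write [name, text, created_at, followers] rows to a UTF-8 CSV file.

    The csv module quotes fields containing commas, quotes, or line
    breaks, so the tweet text is stored unchanged.
    """
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)  # header row
        writer.writerows(rows)


def load_tweets_csv(path):
    """Read the file back as dicts, parsing the timestamp and follower count."""
    with open(path, encoding="utf-8", newline="") as f:
        records = list(csv.DictReader(f))
    for r in records:
        r["created_at"] = datetime.strptime(r["created_at"], TWITTER_TIME_FORMAT)
        r["followers"] = int(r["followers"])
    return records
```

For example, save_tweets_csv(t, path) followed by records = load_tweets_csv(path), where path is any file location you choose, gives you records that can be sorted by time or filtered by follower count.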