Created by The Curiosity Bits Blog (curiositybits.com). Download the Python code used in the tutorial. Code provided by Dr. Gregory D. Saxton.

Curiosity Bits Tutorial: Mining Twitter User Profile on Python V2

This tutorial teaches you how to use Python code to collect profile information for a list of Twitter users.

Prerequisites

• Setting up API keys
• Installing necessary Python libraries
• Creating a list of Twitter screen-names
• Setting up a SQLite database to store Twitter data

We assume you are a Python newbie, so let’s start with the very basics.

• Choosing the right Python platform: Python is a programming language, but you can use different software packages to write, edit and run Python code. We chose Anaconda, which is free to download, with Python version 2.7.
• Once you install Anaconda, you can experiment with Python code in Spyder, the code editor that comes with Anaconda.

Setting up API keys

• We need keys to get Twitter data through the Twitter API (https://dev.twitter.com/). You need four of them: API Key, API Secret, Access token, and Access token secret.
• First, go to https://dev.twitter.com/ and sign in to your Twitter account. Then go to the My applications page to create an application.

Enter any name that makes sense to you

Enter any text that makes sense to you

You can enter any legitimate URL; here, I put in the URL of my institution.

Same as above: you can enter any legitimate URL. Here, I put in the URL of my institution.

• After creating the app, go to the API Keys page, scroll down to the bottom, and click Create my access token. Wait a few minutes and refresh the page, and you will see all your keys!

You need the API Key, API Secret, Access token, and Access token secret.

Installing necessary Python libraries

Think of Python libraries as the apps running on your operating system. To use our code, you need the following libraries:

• Simplejson (https://pypi.python.org/pypi/simplejson)

• Sqlite3 (http://sqlite.org/): the sqlite3 module is part of Python's standard library, so it needs no separate installation.

• Sqlalchemy (http://www.sqlalchemy.org/)

• Twython (https://twython.readthedocs.org/en/latest/index.html)

To install the libraries on Windows, open the Start menu, type CMD, and run the CMD program as administrator. At the command prompt, type pip install followed by the name of the Python library. For example, to install Twython, type pip install twython and press Enter. Use this procedure to install each of the necessary libraries.
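To double-check that the installs worked, you can run a short script like the one below in Spyder. It is a sketch, not part of the downloaded code: it tries to import each library by its import name (simplejson, sqlalchemy, twython) and lists any that fail. sqlite3 is left out because it ships with Python.

```python
import importlib

# Import names of the pip-installed libraries; sqlite3 comes with Python itself.
REQUIRED = ["simplejson", "sqlalchemy", "twython"]


def missing_libraries(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_libraries(REQUIRED)
    if gaps:
        print("Still need to run: pip install " + " ".join(gaps))
    else:
        print("All libraries are installed.")
```

If anything is reported missing, the printed pip command can be pasted straight into CMD, since pip accepts several library names at once.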

Creating a list of Twitter screen-names

• Our Python code gathers profile information for multiple Twitter users, so first let’s create a list of users. The list should be a .csv file with three columns (matching the configuration in our Python code). Specifically, it looks like this:

• The first column lists sequential numbers.

• The second column lists the Twitter screen-names you are interested in.

• For the third column, I entered 1 throughout, but you can leave it blank.
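If your screen-names are already in Python, you could write the .csv file with the built-in csv module instead of a spreadsheet. Here is a minimal sketch (Python 3), assuming the numbering starts at 1 and the third column holds 1; write_user_list and the screen-names are placeholders. It writes no header row, because the columns get renamed after the import into SQLite.

```python
import csv


def write_user_list(path, screen_names, user_type="1"):
    """Write a header-less 3-column CSV: sequential number, screen-name, user type."""
    # newline="" is what the csv module expects on Python 3 (avoids blank lines on Windows)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for number, name in enumerate(screen_names, start=1):
            writer.writerow([number, name, user_type])
```

For example, write_user_list("users.csv", ["some_user", "another_user"]) produces the two-row layout described above.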

Setting up a SQLite Database to store Twitter data

You need storage for the incoming data from the Twitter API, and that is what databases are for. We use SQLite, a lightweight relational database engine that keeps the whole database in a single file and is queried with SQL (Structured Query Language). Python talks to SQLite through its built-in sqlite3 module. On top of that, you can download a database browser to view and edit the database much like an Excel file.

Go to http://sqlitebrowser.sourceforge.net/ and download SQLite Database Browser. It allows you to view and edit SQLite databases.

Once you have the files downloaded, run the following file.

Now we need to import the list of Twitter users into a SQLite database. To do that, create a new database. Remember the database file name, because we need to write it into the Python code. We use the .sqlite file extension; to prevent future complications, add the .sqlite extension when you save the file in SQLite Database Browser.

Stay on the database file you just created. Go to File-Import-Table From CSV File and import the .csv file you saved. Name the imported table accounts; this table name matches the one used in the Python code. After you click Create, the .csv list is loaded into the database, and you can browse it in Browse Data. Lastly, remember to save the database.

Now we need to modify the imported table.

Go to Edit-Modify Tables, then use Edit field to change the column names. To match our Python code, name the first column rowid with Field type Integer, the second column screen_name with Field type String, and the third column user_type with Field type String. The finished table definition should match the screenshot.
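The point-and-click steps above can also be reproduced with Python's built-in sqlite3 module. This is an alternative sketch (Python 3), not part of the tutorial's code: import_accounts is a hypothetical helper that creates the accounts table with the three columns named above (SQLite's TEXT type standing in for the browser's String) and loads a header-less three-column .csv file. The commit call does the job of saving the database.

```python
import csv
import sqlite3


def import_accounts(db_path, csv_path):
    """Create the accounts table if needed and load rows from a 3-column CSV."""
    conn = sqlite3.connect(db_path)  # creates the .sqlite file if it does not exist
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts "
            "(rowid INTEGER, screen_name TEXT, user_type TEXT)"
        )
        with open(csv_path, newline="") as f:
            rows = [
                # A blank third column may be missing entirely; store "" then.
                (int(r[0]), r[1], r[2] if len(r) > 2 else "")
                for r in csv.reader(f)
                if r  # skip empty lines
            ]
        conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
        conn.commit()  # the scripted equivalent of saving the database
        return len(rows)
    finally:
        conn.close()
```

Declaring a column named rowid is allowed in SQLite; the name then refers to that declared column rather than SQLite's hidden row id.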

Now, moving on to the actual Python code…

Download the Python code and open it in Spyder, which comes with Anaconda.

There are only a few places you need to change, but let’s walk through the code first…

The first block of code imports the necessary Python libraries.

Make sure you have installed all of these libraries.

The second block is where you enter the keys we obtained at the beginning. Just copy and paste each key inside the quotation marks:

• API Key
• API secret
• Access token
• Access token secret
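Pasting keys directly into the script works, but it makes the file risky to share. A common alternative, shown here as a sketch and not as the tutorial's code, is to read the four keys from environment variables. The variable names (TWITTER_API_KEY and so on) are hypothetical; the four values would then be passed, in this order, to Twython's constructor, Twython(app_key, app_secret, oauth_token, oauth_token_secret).

```python
import os

# Hypothetical environment-variable names for the four keys from the app page.
KEY_NAMES = (
    "TWITTER_API_KEY",              # API Key
    "TWITTER_API_SECRET",           # API secret
    "TWITTER_ACCESS_TOKEN",         # Access token
    "TWITTER_ACCESS_TOKEN_SECRET",  # Access token secret
)


def load_keys(environ=os.environ):
    """Return the four keys as a tuple, failing loudly if any is missing or empty."""
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in KEY_NAMES)


# Usage (requires twython and network access, so not run here):
#   from twython import Twython
#   twitter = Twython(*load_keys())
```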

The third block is where we define the columns in the SQLite database. For now, we do not need to edit anything here.

The fourth block is where the code fetches Twitter user profile information for the list of users already saved in the SQLite database. Here, you will see that the table name and column names correspond to the ones we previously set up in SQLite.
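As an illustration of what the fourth block has to do, here is a minimal sketch that reads the screen-names back out of the accounts table with sqlite3. It assumes the table and column names set up earlier; load_screen_names is a hypothetical name, and the downloaded code may read the table differently.

```python
import sqlite3


def load_screen_names(conn):
    """Return screen-names from the accounts table, ordered by the rowid column."""
    cursor = conn.execute("SELECT screen_name FROM accounts ORDER BY rowid")
    return [row[0] for row in cursor]
```

Pass it a connection such as sqlite3.connect("users.sqlite"), where users.sqlite stands for your own database file.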

The fifth block is where we make the specific request to the Twitter API to get data:

Here, we ask for the most recent status (tweet) from each listed user. Every status returned by the API includes the author's profile information, which is how this request gives us the user's profile. We will discuss what profile information is available later on.
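A sketch of that request, assuming a Twython client: Twython's get_user_timeline method wraps the statuses/user_timeline endpoint, and count=1 asks for a single tweet. Because each tweet object embeds a user object, the profile comes along with it. The helper name fetch_profile is hypothetical, and the client is passed in so the logic can be tried without network access; the downloaded code may structure this differently.

```python
def fetch_profile(client, screen_name):
    """Return (latest_tweet, user_profile) for one screen-name, or (None, None).

    `client` is expected to behave like a twython.Twython instance; only its
    get_user_timeline(screen_name=..., count=...) method is used.
    """
    timeline = client.get_user_timeline(screen_name=screen_name, count=1)
    if not timeline:
        # An account with no tweets returns an empty list, so no profile is embedded.
        return None, None
    tweet = timeline[0]
    return tweet, tweet["user"]
```

With a real client this would be called as fetch_profile(twitter, "some_user"), where twitter is a Twython instance built from your four keys.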

The raw output from the Twitter API is in JSON format, a standardized text format for storing structured information. Now we need to map the information in the JSON to the columns of the database table. Notice that each column in the database represents a Twitter output variable.

For example, a Twitter user’s profile description is stored as description under user in the JSON. This line of code maps the profile description in the JSON to the database column named from_user_description.
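The mapping can be pictured as a dictionary lookup. This sketch is not the tutorial's code: it parses a trimmed, made-up tweet with Python's json module and copies nested user fields into flat column names that follow the from_user_ convention described here. The list of fields is an assumption covering the profile variables discussed later; the keys inside the user object (screen_name, followers_count, and so on) are standard Twitter API v1.1 field names.

```python
import json

# Database column -> key inside the tweet's "user" object (Twitter API v1.1 names).
USER_FIELDS = {
    "from_user_screen_name": "screen_name",
    "from_user_followers_count": "followers_count",
    "from_user_friends_count": "friends_count",
    "from_user_listed_count": "listed_count",
    "from_user_favourites_count": "favourites_count",
    "from_user_statuses_count": "statuses_count",
    "from_user_description": "description",
    "from_user_location": "location",
    "from_user_created_at": "created_at",
}


def tweet_to_row(tweet):
    """Flatten one tweet dict into {column: value}; missing keys become None."""
    user = tweet.get("user", {})
    return {column: user.get(key) for column, key in USER_FIELDS.items()}


# A trimmed, made-up response used only to show the shape of the JSON.
raw = '{"text": "hello", "user": {"screen_name": "alice", "description": "Bio text", "followers_count": 42}}'
row = tweet_to_row(json.loads(raw))
print(row["from_user_description"])  # Bio text
```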

You need to change the file path and file name here (RECOMMENDED).

If the Python file and your SQLite database are in the same folder, just paste your database name here.
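One thing worth knowing here: a bare file name such as users.sqlite is resolved against the current working directory, which is not always the folder that holds the script, and sqlite3.connect() silently creates a new, empty database when the path does not exist. The sketch below, with the hypothetical helper open_database and a placeholder file name, builds an absolute path from a base folder and refuses to connect if the file is missing.

```python
import os
import sqlite3


def open_database(filename, base_dir):
    """Connect to filename inside base_dir without creating a new empty file.

    sqlite3.connect() would otherwise create a blank database for a wrong path,
    which later shows up as an error like "no such table: accounts".
    """
    path = os.path.join(os.path.abspath(base_dir), filename)
    if not os.path.exists(path):
        raise IOError("Database not found: " + path)
    return sqlite3.connect(path)


# In a script, the script's own folder can serve as base_dir:
#   conn = open_database("users.sqlite", os.path.dirname(os.path.abspath(__file__)))
```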

Now you are ready to run the code. Go to Run and choose Execute in a new dedicated Python interpreter. The first option, Execute in current Python or IPython interpreter, did not work on my end, but it may work on your computer.

Now look at the right-side panel in Anaconda. Oops, it looks like I am getting error messages!

ERRORS!!

Don’t panic! It is likely you will hit roadblocks when you run Python code, so it is important to learn to debug.

In this case, the error is likely because I saved the Python file in a folder that is not a default Python folder.

But what is a default Python folder?

A simple way to find your default Python folders on a Windows machine: in the Start menu, right-click Computer and choose Properties.

Folders listed here are your default Python folders.
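Python can also report these folders itself. This sketch prints the folders on sys.path (where Python searches for modules) and the site-packages folder reported by the standard sysconfig module; on the author's Anaconda install that was C:\Anaconda\Lib\site-packages. The helper name python_folders is hypothetical.

```python
import sys
import sysconfig


def python_folders():
    """Return (module search path, site-packages folder) for this interpreter."""
    # An empty entry in sys.path stands for the current directory; drop it for display.
    search_path = [p for p in sys.path if p]
    site_packages = sysconfig.get_paths()["purelib"]
    return search_path, site_packages


if __name__ == "__main__":
    folders, site_packages = python_folders()
    for folder in folders:
        print(folder)
    print("site-packages: " + site_packages)
```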

In my case, C:\Anaconda\Lib\site-packages is my default Python folder, so I moved the Python code there, edited the file path in the code, and ran it. There you go: the code is running and getting what we want! If you check the database file, you will see that a new table named typhoon has been created (you can change the table name in the Python code), and it includes the listed users’ recent tweets and profile information.

Oops! Error again! The Twitter API has a rate limit.

With the version of the Twitter API used in our Python code, you can get roughly 300 users per 15 minutes. Once you hit the limit, you will see the error message shown in the screenshot.

There are two ways to deal with the restriction:
1. Wait 15 minutes and start another run.
2. Create multiple Twitter apps and get multiple sets of keys. Once you use up the quota in one run, paste in a new set of keys to start a new run!
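Option 1 can be automated. The sketch below is a generic retry loop, not the tutorial's code: it calls a function and, when a rate-limit exception is raised, sleeps for a fixed window before trying again. With Twython you would pass twython.TwythonRateLimitError as the exception class; the 15-minute default mirrors the window described above, and call_with_rate_limit_wait is a hypothetical name.

```python
import time

FIFTEEN_MINUTES = 15 * 60  # seconds


def call_with_rate_limit_wait(func, rate_limit_error, wait_seconds=FIFTEEN_MINUTES,
                              max_attempts=3, sleep=time.sleep):
    """Call func(); on rate_limit_error, sleep and retry, up to max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except rate_limit_error:
            if attempt == max_attempts:
                raise  # give up and surface the original error
            sleep(wait_seconds)


# Usage with Twython (needs network access, so not run here):
#   from twython import TwythonRateLimitError
#   tweets = call_with_rate_limit_wait(
#       lambda: twitter.get_user_timeline(screen_name="some_user", count=1),
#       TwythonRateLimitError)
```

The sleep parameter exists only so the waiting can be replaced in a quick test; in normal use the default time.sleep applies.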

If you put 0 here, the code starts with the user listed in the first row.

Because we will hit the rate limit, you will need to run the code multiple times to crawl all users on the list. Make sure to change the starting row number each time!

For example, if the first run gets user (0) through user (150) and then hits the rate limit, put 151 in the second run so that it starts with the next user, user (151), on the 152nd row.
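The bookkeeping is simple arithmetic on zero-based positions: if the last user retrieved was at position 150, the next run starts at 151. A minimal sketch of that idea, with hypothetical helper names and a made-up list of 400 users; the downloaded code uses its own starting-row variable.

```python
def next_start(last_done_index):
    """Zero-based position to use as the starting row of the next run."""
    return last_done_index + 1


def users_for_run(screen_names, start):
    """Users still to crawl, beginning at the zero-based position `start`."""
    return screen_names[start:]


names = ["user%d" % i for i in range(400)]
start = next_start(150)  # the first run covered positions 0 to 150
remaining = users_for_run(names, start)
print(start, remaining[0], len(remaining))  # 151 user151 249
```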

A list of Twitter output variables

Go to SQLite Database Browser and select the table typhoon (again, this is the name we gave it in the Python code). You will see the output variables across the columns.

Some key variables related to the user profile:

• from_user_screen_name: the user’s Twitter screen-name
• from_user_followers_count: how many people follow the user
• from_user_friends_count: how many people the user follows
• from_user_listed_count: how many times the user is listed in other users’ public lists
• from_user_favourites_count: how many tweets the user has favorited (liked)
• from_user_statuses_count: how many tweets (including retweets) the user has sent
• from_user_description: the user’s profile bio
• from_user_location: the location given in the user’s profile
• from_user_created_at: when the account was created
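To glance at a few of these variables without opening the browser, a query like the one below works on the output table. It is a sketch: the table name typhoon and the column names come from this tutorial, top_followed is a hypothetical name, and sorting by follower count is just an example.

```python
import sqlite3


def top_followed(conn, limit=5):
    """Return (screen_name, followers, friends) rows for the most-followed users."""
    # CAST guards against counts stored as text, which would sort alphabetically
    # (so "9" would rank above "10").
    query = (
        "SELECT from_user_screen_name, from_user_followers_count, "
        "from_user_friends_count FROM typhoon "
        "ORDER BY CAST(from_user_followers_count AS INTEGER) DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()
```

Call it with a connection to your own database file, for example top_followed(sqlite3.connect("users.sqlite"), limit=10).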

Use File-Export-Table as CSV to export the data to .csv format. Make sure to add the .csv file extension to the file name.
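The same export can be scripted with the csv and sqlite3 modules, which helps if you rerun the crawl often. This sketch (Python 3) writes a header row from the table's column names followed by every row, encoded as UTF-8 because profile bios often contain non-ASCII characters. export_table is a hypothetical name, and the output file name is a placeholder.

```python
import csv
import sqlite3


def export_table(conn, table, csv_path):
    """Write every row of `table` to csv_path with a header row; return the row count."""
    # Table names cannot be bound as ? parameters, so only pass trusted names here.
    cursor = conn.execute('SELECT * FROM "%s"' % table)
    header = [col[0] for col in cursor.description]
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count
```

For example, export_table(sqlite3.connect("users.sqlite"), "typhoon", "typhoon.csv") exports the output table used in this tutorial.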

Please send your questions and comments to

weiaixu [at] buffalo dot edu