Predicting the Oscars with Data Science http://bit.ly/tf-predict-oscars

Predict oscars (4:17)

Page 1: Predict oscars (4:17)

Predicting the Oscars with Data Science

http://bit.ly/tf-predict-oscars

Page 2: Predict oscars (4:17)

About me

• Jasjit Singh

• Self-taught developer

• Worked in finance & tech

• Co-Founder Hotspot

• Thinkful General Manager

Page 3: Predict oscars (4:17)

About us

Thinkful prepares students for web development & data science jobs with 1-on-1 mentorship programs

Page 4: Predict oscars (4:17)

What’s your background?

• I have a software background

• I have a math or stats background

• None of the above

Page 5: Predict oscars (4:17)

Data Science Process

• Frame the question.

• Collect the raw data.

• Process the data.

• Explore the data.

• Communicate results.

Page 6: Predict oscars (4:17)

Frame the question

• Who will win the Oscar for Best Picture?

Page 7: Predict oscars (4:17)

Collect the Data

• What kind of data do we need?

• Financial data (Budget, box office…)

• Reviews, ratings and scores.

• Awards and nominations.

Page 8: Predict oscars (4:17)

Process the data

• How’s the data “dirty” and how can we fix it?

• User input, redundancies, missing data…

• Formatting: adapt the data to meet certain specifications.

• Cleaning: detecting and correcting corrupt or inaccurate records.

Page 9: Predict oscars (4:17)

Explore the data

• What are the meaningful patterns in the data?

• How meaningful is each data point for our predictions?

Page 10: Predict oscars (4:17)

Goals

• Introduction to a data scientist's tools and methods:

• Jupyter notebooks, numpy, pandas, sklearn…

• Overview of basic machine learning concepts:

• Data formatting and cleaning, Decision trees, Overfitting, Random Forests…

Page 11: Predict oscars (4:17)

Jupyter Notebooks

• One of a data scientist’s everyday tools.

• Find the links in our classroom tool.

• Contains cells with code.

Page 12: Predict oscars (4:17)

NumPy

• The fundamental package for scientific computing with Python.

• Provides powerful multi-dimensional array objects.

• Many methods for fast operations on arrays.
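
A minimal sketch of what array operations look like in practice. The budget figures are made-up illustration values, not data from the talk.

```python
import numpy as np

# Hypothetical film budgets in millions of dollars (illustration only)
budgets = np.array([[10.0, 20.0, 30.0],
                    [40.0, 50.0, 60.0]])

print(budgets.shape)   # dimensions of the 2-D array
print(budgets * 2)     # element-wise arithmetic without a Python loop

# Aggregate along an axis: mean of each column
column_means = budgets.mean(axis=0)
print(column_means)
```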

Page 13: Predict oscars (4:17)

Pandas

• Fundamental high-level building block for doing practical, real world data analysis in Python.

• Built on top of NumPy.

• Offers data structures and operations for manipulating numerical tables and time series.

Page 14: Predict oscars (4:17)

Scikit-learn

• Python module for machine learning.

• Built on NumPy and SciPy, it provides tools for classification, regression, clustering, preprocessing, and model evaluation.

Page 15: Predict oscars (4:17)

Initial imports and loading data with Pandas
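
The code from this slide did not survive extraction. A minimal sketch of this step might look like the following, assuming a small in-memory CSV in place of the workshop's data file. The column names (film, year, rate, nominations, winner) are hypothetical.

```python
import io

import numpy as np                 # array support, used in later steps
import pandas as pd                # tabular data handling
from sklearn import tree           # decision tree models, used in later steps

# Hypothetical CSV content standing in for the real data file
csv_text = """film,year,rate,nominations,winner
Film A,2001,PG,5,0
Film B,2001,R,9,1
Film C,2002,PG-13,7,1
Film D,2002,R,3,0
"""

# read_csv accepts a file path or any file-like object
movies = pd.read_csv(io.StringIO(csv_text))
print(movies.dtypes)
```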

Page 16: Predict oscars (4:17)

Understanding your data

• .head(n) method: Returns the first n rows (5 by default).

• .value_counts() method: Returns the count of each unique value in a column (Series).
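
A short sketch of both methods on a hypothetical table (the films and the "rate" column are made up for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
movies = pd.DataFrame({
    "film": ["Film A", "Film B", "Film C", "Film D"],
    "rate": ["PG", "R", "R", "PG-13"],
    "winner": [0, 1, 0, 0],
})

# First two rows of the table
first_two = movies.head(2)
print(first_two)

# How often each rate occurs in the "rate" column
rate_counts = movies["rate"].value_counts()
print(rate_counts)
```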

Page 17: Predict oscars (4:17)

Formatting your Data

Page 18: Predict oscars (4:17)

Formatting your Data

• The rate values are stored in a non-numeric format, so we need to assign each rate a unique integer that the model can handle.

• With the .ix indexer you select a subset of rows and assign a value to a certain variable of that subset of observations. Note that .ix was removed in pandas 1.0; use .loc in current versions.
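
A sketch of this encoding step, assuming a hypothetical "rate" column with three categories. It uses .loc, since .ix is no longer available in current pandas, and it writes to a new "rate_code" column (also an assumption) so the original text values are kept.

```python
import pandas as pd

# Hypothetical data for illustration
movies = pd.DataFrame({
    "film": ["Film A", "Film B", "Film C"],
    "rate": ["PG", "R", "PG-13"],
})

# Start with a default code, then overwrite it for each subset of rows
movies["rate_code"] = 0
movies.loc[movies["rate"] == "PG", "rate_code"] = 0
movies.loc[movies["rate"] == "PG-13", "rate_code"] = 1
movies.loc[movies["rate"] == "R", "rate_code"] = 2

print(movies)
```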

Page 19: Predict oscars (4:17)

Cleaning your Data
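
The code from this slide did not survive extraction. One common way to handle redundancies and missing data is sketched below: duplicate rows are dropped and missing budgets are filled with the median. The data and the choice of the median are assumptions for illustration, not necessarily what the workshop did.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one duplicate row and one missing budget
movies = pd.DataFrame({
    "film": ["Film A", "Film B", "Film C", "Film C"],
    "budget": [10.0, np.nan, 30.0, 30.0],
})

# Remove exact duplicate rows
movies = movies.drop_duplicates()

# Fill missing budgets with the median of the known budgets
median_budget = movies["budget"].median()
movies["budget"] = movies["budget"].fillna(median_budget)

print(movies)
```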

Page 20: Predict oscars (4:17)

Decision Trees

• It breaks down a dataset into smaller and smaller subsets.

• The final result is a model with a tree structure that has:

• Decision nodes: ask a question and have two or more branches.

• Leaf nodes: represent a classification or decision.
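
To make the structure concrete, here is a small sketch that fits a tree on made-up data and prints it as text. The single feature (number of nominations) and the labels are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: one feature (number of nominations) per film
nominations = np.array([[2], [3], [4], [9], [11], [12]])
won = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(random_state=0).fit(nominations, won)

# Lines with a comparison are decision nodes; lines with "class:" are leaf nodes
tree_text = export_text(tree, feature_names=["nominations"])
print(tree_text)
```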

Page 22: Predict oscars (4:17)

Classification vs Regression

• Classification: predict categories.

• Identifying group membership.

• Regression: predict values.

• Involves estimating or predicting a response.
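
The difference shows up in the kind of target each model is trained on. A minimal sketch with made-up numbers, using scikit-learn's two decision tree variants:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical single feature for four films
X = np.array([[1.0], [2.0], [3.0], [4.0]])

won = np.array([0, 0, 1, 1])                      # categories: classification
box_office = np.array([10.0, 20.0, 30.0, 40.0])   # numeric values: regression

classifier = DecisionTreeClassifier(random_state=0).fit(X, won)
regressor = DecisionTreeRegressor(random_state=0).fit(X, box_office)

predicted_class = classifier.predict([[4.0]])   # a category label
predicted_value = regressor.predict([[3.0]])    # a number
print(predicted_class, predicted_value)
```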

Page 23: Predict oscars (4:17)

Classification

Page 24: Predict oscars (4:17)

Classification

?

Page 25: Predict oscars (4:17)

Creating your first Decision Tree

You will use the scikit-learn and numpy libraries to build your first decision tree. To build it, we need the following:

• target: A one-dimensional numpy array containing the target from the train data.

• features: A multidimensional numpy array containing the features/predictors from the train data.
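
A sketch of how the two arrays can be built and passed to a decision tree. The training table, its column names, and the variable name my_tree are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data for illustration
train = pd.DataFrame({
    "nominations": [2, 11, 4, 9, 3, 12],
    "rating": [6.9, 8.4, 7.1, 8.0, 7.0, 8.6],
    "winner": [0, 1, 0, 1, 0, 1],
})

# target: one-dimensional array; features: two-dimensional array
target = train["winner"].values
features = train[["nominations", "rating"]].values

my_tree = DecisionTreeClassifier(random_state=1)
my_tree = my_tree.fit(features, target)
```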

Page 26: Predict oscars (4:17)

Creating your first Decision Tree

Page 27: Predict oscars (4:17)

Importances and Score

• .feature_importances_ attribute: gives the relative importance of each feature in the fitted model.

• .score() method: returns the mean accuracy of the model's predictions on the given features and target.
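
A sketch of both calls on a small made-up training set (column names are hypothetical). Scoring on the training data itself, as done here, tends to give an optimistic number.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data for illustration
train = pd.DataFrame({
    "nominations": [2, 11, 4, 9, 3, 12],
    "rating": [6.9, 8.4, 7.1, 8.0, 7.0, 8.6],
    "winner": [0, 1, 0, 1, 0, 1],
})
feature_names = ["nominations", "rating"]
features = train[feature_names].values
target = train["winner"].values

my_tree = DecisionTreeClassifier(random_state=1).fit(features, target)

# One importance value per feature, in the same order as feature_names
importances = my_tree.feature_importances_
for name, value in zip(feature_names, importances):
    print(name, value)

# Mean accuracy on the data passed in (here: the training data)
train_accuracy = my_tree.score(features, target)
print(train_accuracy)
```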

Page 28: Predict oscars (4:17)

Importances and Score

Page 29: Predict oscars (4:17)

Predicting
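
The code from this slide did not survive extraction. A sketch of the prediction step, assuming a fitted tree and two hypothetical new films described by the same features as the training data:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data for illustration
train = pd.DataFrame({
    "nominations": [2, 11, 4, 9, 3, 12],
    "rating": [6.9, 8.4, 7.1, 8.0, 7.0, 8.6],
    "winner": [0, 1, 0, 1, 0, 1],
})
features = train[["nominations", "rating"]].values
target = train["winner"].values
my_tree = DecisionTreeClassifier(random_state=1).fit(features, target)

# Hypothetical new films: [nominations, rating]
new_films = np.array([[14, 8.9],
                      [1, 6.5]])

# One predicted label per row (1 = winner, 0 = not a winner)
predictions = my_tree.predict(new_films)
print(predictions)
```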

Page 30: Predict oscars (4:17)

Pretty bad results :(

Let’s improve it!

Page 31: Predict oscars (4:17)

Let’s improve it!

Page 32: Predict oscars (4:17)

Modify the feature list

Page 33: Predict oscars (4:17)

Run the prediction again

Page 34: Predict oscars (4:17)

Overfitting

• The resulting model is too closely tied to the training set.

• It doesn’t generalize to new data, which is the point of prediction.
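
One way to spot overfitting is to compare accuracy on the training data with accuracy on held-out data. The sketch below uses synthetic data from scikit-learn (not the Oscars data) and an arbitrary max_depth of 3 for the constrained tree. An unconstrained tree typically scores perfectly on the training set and noticeably lower on the test set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data (illustration only)
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# An unconstrained tree can memorize the training set
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Limiting the depth forces a simpler model
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

deep_train = deep_tree.score(X_train, y_train)
deep_test = deep_tree.score(X_test, y_test)
shallow_train = shallow_tree.score(X_train, y_train)
shallow_test = shallow_tree.score(X_test, y_test)

print("deep tree:    train", deep_train, "test", deep_test)
print("shallow tree: train", shallow_train, "test", shallow_test)
```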

Page 35: Predict oscars (4:17)

Random Forest Classifier

• Random Forest Classifiers use many Decision Trees to build a classifier.

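• We introduce a bit of randomness: each tree is trained on a random sample of the rows and considers a random subset of the features at each split.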

• Each Tree can give a different answer (a vote). The final classification is the most common amongst the Trees.
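
A sketch of fitting a forest with scikit-learn. The training data, the number of trees, and the depth limit are assumptions for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data for illustration
train = pd.DataFrame({
    "nominations": [2, 11, 4, 9, 3, 12, 5, 10],
    "rating": [6.9, 8.4, 7.1, 8.0, 7.0, 8.6, 7.3, 8.2],
    "winner": [0, 1, 0, 1, 0, 1, 0, 1],
})
features = train[["nominations", "rating"]].values
target = train["winner"].values

# n_estimators is the number of trees; max_depth limits each tree's size
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
forest = forest.fit(features, target)

print(len(forest.estimators_))          # the individual fitted trees
print(forest.score(features, target))   # mean accuracy on the training data
```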

Page 36: Predict oscars (4:17)

Random Forest Classifier

Page 37: Predict oscars (4:17)

Creating your first Random Forest

Page 38: Predict oscars (4:17)

Importances and Score

Page 39: Predict oscars (4:17)

Predicting with Random Forest Classifiers
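
The code from this slide did not survive extraction. Since only one nominee can win in a given year, one plausible approach (an assumption, not necessarily the workshop's) is to rank the nominees by the forest's estimated win probability and pick the highest. All films and numbers below are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data for illustration
train = pd.DataFrame({
    "nominations": [2, 11, 4, 9, 3, 12, 5, 10],
    "rating": [6.9, 8.4, 7.1, 8.0, 7.0, 8.6, 7.3, 8.2],
    "winner": [0, 1, 0, 1, 0, 1, 0, 1],
})
feature_names = ["nominations", "rating"]
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(train[feature_names].values, train["winner"].values)

# Hypothetical nominees for one year
nominees = pd.DataFrame({
    "film": ["Film X", "Film Y", "Film Z"],
    "nominations": [14, 6, 8],
    "rating": [8.5, 7.4, 7.9],
})

# Column 1 of predict_proba is the probability of class 1 (winner)
win_probability = forest.predict_proba(nominees[feature_names].values)[:, 1]
nominees["win_probability"] = win_probability

# The nominee with the highest estimated probability
predicted_winner = nominees.loc[nominees["win_probability"].idxmax(), "film"]
print(nominees)
print(predicted_winner)
```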

Page 40: Predict oscars (4:17)

Results

Page 41: Predict oscars (4:17)

1976

Rocky

Page 42: Predict oscars (4:17)

1984

Amadeus

Page 43: Predict oscars (4:17)

1996

The English Patient

Page 44: Predict oscars (4:17)

2009

The Hurt Locker

Page 45: Predict oscars (4:17)

And the Oscar goes to…

Page 46: Predict oscars (4:17)

La La Land!!

Page 49: Predict oscars (4:17)

The End

Nothing happened after that.

Right?? RIGHT??

Page 50: Predict oscars (4:17)

We can predict the Oscars

Except for 2017 ¯\_(ツ)_/¯

Page 52: Predict oscars (4:17)

More about Thinkful

• Anyone who’s committed can learn to code

• 1-on-1 mentorship is the best way to learn

• Flexibility! Learn anywhere, anytime, & at your own pace

Page 53: Predict oscars (4:17)

Our Program

You’ll learn concepts, practice with drills, and build capstone projects — all guided by a personal mentor

Page 54: Predict oscars (4:17)

Our Mentors

Mentors have, on average, 10+ years of experience

Page 55: Predict oscars (4:17)

Data Science Syllabus

• Managing data with SQL and Python

• Modeling with both supervised and unsupervised models

• Data visualization and communicating with data

• Technical interviews + Career prep

Page 56: Predict oscars (4:17)

Our Results

[Charts: Job Titles after Graduation; Months until Employed]

Page 57: Predict oscars (4:17)

Special Introductory Offer

• Prep course for 50% off — $250 instead of $500

• Covers math, stats, Python, and data science toolkit

• Option to continue into full program

• Talk to me (or email me) if you’re interested

Page 58: Predict oscars (4:17)

October 2015

Questions? [email protected]

schedule a call through thinkful.com