Predictive analytics and big data tutorial

DESCRIPTION

This presentation covers data science buzzwords, an introduction to big data, predictive analytics, and model-building methods, including structured vs unstructured data and supervised vs unsupervised learning.

Ben Taylor @bentaylordata

Predictive Analytics / Data Science

Presentation Objectives

• Enable you to be smarter than your prospect (data history / lingo)

• Motivate you to be unstoppable and hyper-confident

• Motivate you to begin looking for data driven opportunities

• Motivate you to become a data scientist

"What the hell is cloud computing?"-Larry Ellison, CEO Oracle

What is cloud computing?

?

What is big data?

Big data includes datasets or problems which exceed the capacity of a single computer and require a distributed data access system.

The concept of "big" is relative to the conventional systems and technology and is subject to change in the future with advances in memory and storage solutions.

http://www.pcmag.com/article2/0,2817,2453838,00.asp
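The "exceeds a single computer" idea can be sketched in a few lines. This is a toy illustration, not a real distributed system: the shards stand in for separate machines, threads stand in for workers, and the function names (split_into_shards, partial_stats, distributed_mean) are hypothetical. Each worker summarizes only its own shard, and the small partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(records, shard_count):
    """Deal records round-robin into shard_count lists (stand-ins for machines)."""
    return [records[i::shard_count] for i in range(shard_count)]


def partial_stats(shard):
    """Map step: a worker summarizes only the shard it holds."""
    return sum(shard), len(shard)


def distributed_mean(records, shard_count=4):
    """Reduce step: combine the per-shard (sum, count) pairs into one mean."""
    shards = split_into_shards(records, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partials = list(pool.map(partial_stats, shards))
    total = sum(shard_sum for shard_sum, _ in partials)
    count = sum(shard_len for _, shard_len in partials)
    return total / count


values = list(range(1, 101))  # made-up data: the integers 1..100
mean_value = distributed_mean(values)
print(mean_value)
```

The point of the design is that no single worker ever needs the full dataset, only its shard and a tiny summary.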

Big data trends

What is a data scientist?

Engineering Finance Economics Mathematics Computer Science Physics

Data Science: 6-10 yrs

Python Bootcamp $8,000 (3 months)

$16,000-$4,000 (3 months)

$115K avg

What is a data scientist?

Master Builder

Reality distortion: Hyper-confidence

Data Scientist = Peacock

Humans Algorithms

VS

Smartest pirate

Humans Algorithms

VS

NA

Humans Algorithms

VS

German (1795), French (1806)

Humans Algorithms

VS

1997, IBM Deep Blue

Kasparov

Humans Algorithms

VS

2011, IBM Watson

Ken Jennings & Brad Rutter

Humans Algorithms

VS

2014, HireVue Iris

Hiring Panel

Prediction process

Raw data -> Data munging -> Training -> Model

Data munging

Data cleaning (raw data to clean data) and feature selection, performed before training the model.

Numeric Excel example

LSR, SVM, RANDOM FOREST, NAÏVE BAYESIAN, NEURAL NET

Missing values + categorical
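Most of the training methods listed above expect a complete numeric table, so missing values and categorical columns are usually handled during munging. A minimal sketch, assuming mean imputation and one-hot encoding (two common choices, not necessarily the ones used in the talk); the rows, column names, and helper functions are made up for illustration.

```python
def impute_mean(rows, column):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    mean = sum(observed) / len(observed)
    return [
        {**row, column: mean if row[column] is None else row[column]}
        for row in rows
    ]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical raw data with one missing value and one categorical column.
raw_rows = [
    {"years_experience": 2, "department": "Retail"},
    {"years_experience": None, "department": "Engineering"},
    {"years_experience": 6, "department": "Retail"},
]

clean_rows = one_hot(impute_mean(raw_rows, "years_experience"), "department")
for row in clean_rows:
    print(row)
```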

Retail > 15, Engineering > 95

> 5.67

Resume model

Retail > 15, Engineering > 95

GPA, Colleges, Hobbies

> 5.67

Text deeper dive

Sentiment example

Sentiment
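One simple way to score sentiment in raw text is to count words from a positive list and a negative list. The sketch below assumes that lexicon approach, which may differ from the example shown in the talk; the word lists and function names are hypothetical and far too small for real use.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE_WORDS = {"great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "hate", "terrible", "slow"}


def sentiment_score(text):
    """Return (positive word count) minus (negative word count)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    return positives - negatives


def sentiment_label(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("I love this product, the support is excellent"))
print(sentiment_label("Terrible app and slow checkout"))
print(sentiment_label("The package arrived on Tuesday"))
```

A trained classifier would normally replace the hand-written word lists, but the input and output have the same shape: text in, label out.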

Given data, find cat? dog?

Talk like a data nerd

Confidence & Over-fitting
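Over-fitting shows up as a gap between accuracy on the training data and accuracy on data the model has never seen. A toy sketch, assuming made-up one-dimensional data with one noisy label (the point at x = 3): a model that memorizes the training set (nearest neighbor) is compared with a simple threshold rule. The helper names are hypothetical.

```python
def accuracy(predict, examples):
    """Fraction of (x, label) examples that predict() gets right."""
    correct = sum(predict(x) == label for x, label in examples)
    return correct / len(examples)


def make_memorizer(training):
    """Over-fit model: answer with the label of the closest training point."""
    def predict(x):
        _, nearest_label = min(training, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return predict


def make_threshold_rule(training):
    """Simple model: split halfway between the two class means."""
    zeros = [x for x, label in training if label == 0]
    ones = [x for x, label in training if label == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > threshold else 0


# Made-up data: small x is class 0, large x is class 1; (3, 1) is noise.
training_set = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
holdout_set = [(2.6, 0), (3.4, 0), (5.5, 1), (7.5, 1)]

memorizer = make_memorizer(training_set)
threshold_rule = make_threshold_rule(training_set)

print("memorizer:", accuracy(memorizer, training_set), accuracy(memorizer, holdout_set))
print("threshold:", accuracy(threshold_rule, training_set), accuracy(threshold_rule, holdout_set))
```

In this example the memorizer scores perfectly on the training set but only 50% on the holdout set, while the threshold rule misses the noisy training point and gets every holdout point right. Confidence in a model should come from the holdout number.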

Data Lingo: Supervised vs unsupervised learning

Supervised: A labeled training set is provided (inputs paired with known outcomes).

Unsupervised: No labeled training set; the algorithm groups records by similar attributes (for example, clustering).
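The difference can be sketched on made-up one-dimensional data. The supervised half learns one centroid per provided label (a nearest-centroid classifier); the unsupervised half gets no labels and groups the values with a small k-means loop. Both are illustrative choices rather than methods named in the talk, and all function names are hypothetical.

```python
def train_nearest_centroid(labeled_points):
    """Supervised: learn one centroid per label from (value, label) pairs."""
    groups = {}
    for value, label in labeled_points:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def classify(centroids, value):
    """Predict the label whose centroid is closest to value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group unlabeled values into k clusters."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - value))
            clusters[nearest].append(value)
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Supervised: labels are provided with the training data.
centroids = train_nearest_centroid([(1.0, "low"), (2.0, "low"), (9.0, "high"), (10.0, "high")])
print(classify(centroids, 3.0))

# Unsupervised: same kind of data, but no labels at all.
centers, clusters = kmeans_1d([1.0, 2.0, 1.5, 9.0, 10.0, 9.5])
print(centers, clusters)
```

Note that the unsupervised result is only a set of groups; deciding what each group means is left to a person.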

Data Lingo: Analytic Layers

Descriptive Analytics: Telling a data story through plotting or visualization.

Predictive Analytics: Predicting future outcomes, usually with a model trained on historical data.

Prescriptive Analytics: Using the insight from your predictive model to proactively change something.

Interview/Interaction Analytics: Any analytics surrounding the interview or interaction.

Data Lingo: Prediction methods

Regression: Predicting a continuous output (e.g., a stock price).

Classification: Predicting discrete category outputs (e.g., Yes/Maybe/No).
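A short sketch of both output types. The regression half fits a straight line by ordinary least squares (the "LSR" method from the training slides) and returns a continuous number; the classification half maps a score to one of the three categories. The data points, cutoff values, and function names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify_score(score, yes_cutoff=0.7, no_cutoff=0.3):
    """Turn a score into a discrete category (cutoffs are arbitrary here)."""
    if score >= yes_cutoff:
        return "Yes"
    if score <= no_cutoff:
        return "No"
    return "Maybe"


# Regression: continuous output.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_value = slope * 5 + intercept
print(predicted_value)

# Classification: discrete output.
print(classify_score(0.85), classify_score(0.5), classify_score(0.1))
```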

Data Lingo: Data Types

Structured: Does it play well in Excel?

Unstructured: Raw text (Twitter), audio, video, photos, resumes, etc.
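Unstructured text usually has to be turned into a structured row before a model can use it. A minimal sketch, assuming a simple word-count approach: each text becomes a row with one column per vocabulary word, the kind of table that would "play well in Excel". The vocabulary, sample text, and function name are hypothetical.

```python
import re
from collections import Counter


def text_to_row(text, vocabulary):
    """Convert raw text into a fixed set of numeric columns."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    row = {"token_count": len(tokens)}
    for word in vocabulary:
        row[word] = counts[word]  # Counter returns 0 for unseen words
    return row


vocabulary = ["python", "sql", "java"]
resume_text = "Python developer. Loves Python and SQL!"
resume_row = text_to_row(resume_text, vocabulary)
print(resume_row)
```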