Mary Pat Campbell, FSA, MAAA, PRM
Predictive Analytics: Resources for Beginners
16 May 2016

Resources for Getting Started in Predictive Analytics

Predictive Analytics: Resources for Complete Beginners

Definitions and Terminology

• Predictive Analytics or Statistical Learning or Machine Learning or… [buzzterm here]
  • Supervised
  • Unsupervised

• Model Types (examples)
  • Regression – Linear, Logistic, Polynomial, GLMs
  • Clusters
  • Trees
  • Support Vector Machines
  • Principal Components Analysis

Predictive Modeling

“An area of statistical analysis and data mining, that deals with extracting information from data and using it to predict future behavior patterns or other results. A predictive model is made up of a number of predictors, variables that are likely to influence future behavior.” – Alan Mills, 2009

Statistical Learning

“Statistical learning refers to a vast set of tools for understanding data. These tools can be classified as supervised or unsupervised. Broadly speaking, supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Problems of this nature occur in fields as diverse as business, medicine, astrophysics, and public policy. With unsupervised statistical learning, there are inputs but no supervising output; nevertheless we can learn relationships and structure from such data.”

Source: An Introduction to Statistical Learning, 4th printing
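
To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python. The data values and the function names `fit_line` and `two_means_1d` are made up for illustration and come from neither the book nor the slides. The first function is supervised: it uses known outputs to fit a line. The second is unsupervised: it sees only inputs and looks for two groups.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised: least-squares fit of y = a + b*x, using known outputs ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def two_means_1d(values, iterations=20):
    """Unsupervised: 1-D k-means with k=2; there are no outputs to learn from."""
    centers = [min(values), max(values)]  # simple starting guess
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to the closer of the two centers.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Illustrative (made-up) data.
intercept, slope = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
centers, groups = two_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
print("fitted line:", intercept, slope)
print("cluster centers:", centers)
```

In the supervised case the fit can be checked against the known outputs. In the unsupervised case there is nothing to check against, so judging whether the clusters are meaningful takes domain knowledge.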

Example of Unsupervised Statistical Learning

Asset Allocation in 2012, Representative Clusters (% of investable assets)

Data source: ©A.M. Best Company—used by permission, Conning analysis

Books

Statistics (the Easier Way) with R

• Level: Absolute beginner
• Language: R
• Main Focus: Introductory Statistics
  • Charts and plots in R
  • Confidence intervals
  • Statistical tests
  • Linear regression
• Author page: http://nicoleradziwill.com/
• Free preview
• Amazon link

Data Science from Scratch: First Principles with Python

• Level: Beginner
• Language: Python and SQL
• Main Focus: Machine learning/Python
  • K-nearest neighbors
  • Regression (linear, logistic)
  • Decision trees
  • Neural networks
  • Clustering
  • Network analysis
  • Recommender systems
  • MapReduce
• Author page: http://joelgrus.com/
• Github repo for book
• Free preview
• Amazon link

An Introduction to Statistical Learning

• Level: Intermediate
• Language: R
• Main Focus: Statistical Learning
  • Regression (Linear, Logistic)
  • Resampling methods
  • Model Selection
  • Nonlinear techniques
  • Decision Trees
  • Support Vector Machines
  • Unsupervised Learning (Clustering, PCA)
• Free eBook version
• Amazon link
• Videos and slides
• Online course - archived

The Truthful Art

• Level: Beginner
• Language: N/A
• Main Focus: Data visualization
  • Qualities of great visualizations
  • Dubious models
  • Visualizing distributions
  • Decomposing time series
  • Seeing relationships
  • Choropleth maps and other data maps
• “Any visualization is a model.”
• Author’s site
• Amazon link

Online Courses

Datacamp

• Level: Absolute Beginner to Intermediate
• Language: R and Python
• Topics: R, Data Science, Data visualization
• Timing: On-demand, short
• Paid features – subscription by month or year: Access to all courses, statement of completion
• Credentials: Statement of completion
• Example course: Intro to Python for Data Science

Udacity

• Level: Beginner to Advanced
• Language: R, Python, SQL, Hadoop
• Topics: Data analysis, App development, programming
• Timing: On-demand, length varies
• Paid features: Monthly charge for access to coaches, projects with ongoing feedback, credentials
• Credentials: Verified certificates, MS in Computer Science, nanodegrees
• Example course: Intro to Data Analysis

Coursera

• Level: Beginner to Advanced
• Language/Topics: Soooo much! I can’t choose!
• Timing: Some on schedule, some on-demand; length varies, usually weeks-long
• Paid features: Certifications
• Credentials: Signature Track credential, Specialization certificates from sponsoring universities
• Example course: Introduction to Data Science

edX

• Level: Beginner to Advanced
• Language/Topics: Many many many
• Timing: Mostly on specific schedules
• Paid features: Certifications, some courses are paid only (prices can vary wildly – from $50 up to $1000)
• Credentials: Verified certificates, college credit, XSeries certificates
• Example course: The Analytics Edge

What it looks like…

Links

Data sites

• Kaggle – predictive modeling competitions, datasets

• Government data
  • http://www.data.gov/
  • http://factfinder.census.gov/

• Our World in Data

• FiveThirtyEight’s GitHub repo

Keep your Modeling Ego in Check

Shape of Data

Source: By DenisBoigelot, original uploader was Imagecreator - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=15165296
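
A small sketch of why the shape of the data matters, assuming the figure is the familiar grid of scatter plots labeled with their correlation coefficients. The data and the helper `pearson` are made up for illustration. A straight line and a parabola are both perfectly predictable from x, yet only the line shows up in the Pearson correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Illustrative inputs: evenly spaced x values, symmetric around zero.
xs = [x / 10 for x in range(-50, 51)]
line = [2 * x + 1 for x in xs]   # exact linear relationship
parabola = [x * x for x in xs]   # exact, but nonlinear, relationship

print("line:    ", pearson(xs, line))      # close to 1
print("parabola:", pearson(xs, parabola))  # close to 0
```

A correlation near zero says only that there is no linear relationship. Plot the data before trusting a single summary number.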

Know How the Models Fail

Source: Classifier Comparison, scikit-learn.org, http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
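
The scikit-learn comparison plots several classifiers on differently shaped datasets, including concentric circles. The sketch below is a rough, standard-library-only imitation of that idea and is not the scikit-learn code. The toy data generator `make_circles` and the nearest-centroid rule are assumptions chosen for illustration. The nearest-centroid rule draws a single straight boundary and stands in for a linear model; 1-nearest-neighbor stands in for a flexible one.

```python
import math
import random


def make_circles(n_per_class, rng):
    """Hypothetical toy data: class 0 on a ring of radius 1, class 1 on radius 3."""
    points = []
    for label, radius in ((0, 1.0), (1, 3.0)):
        for _ in range(n_per_class):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius + rng.gauss(0.0, 0.1)  # small radial noise
            points.append(((r * math.cos(angle), r * math.sin(angle)), label))
    return points


def nearest_centroid_predict(train, point):
    """Predict the class whose mean point is closest (a straight-line boundary)."""
    centroids = {}
    for label in (0, 1):
        members = [p for p, y in train if y == label]
        centroids[label] = (
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        )
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


def one_nn_predict(train, point):
    """Predict the label of the single closest training point."""
    return min(train, key=lambda item: math.dist(point, item[0]))[1]


def accuracy(predict, train, test):
    """Share of test points whose predicted label matches the true label."""
    return sum(predict(train, p) == y for p, y in test) / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_circles(100, rng)
test = make_circles(100, rng)

print("nearest centroid:", accuracy(nearest_centroid_predict, train, test))
print("1-nearest-neighbor:", accuracy(one_nn_predict, train, test))
```

Both rings are centered on the origin, so the two class means nearly coincide and the straight-line rule can do little better than guessing, while the neighbor-based rule separates the rings easily. The simple model is not bad in general; it fails on this particular shape, and that is the failure mode worth knowing in advance.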