Mary Pat Campbell, FSA, MAAA, PRM
Predictive Analytics: Resources for Beginners
16 May 2016
Predictive Analytics: Resources for Complete Beginners
Definitions and Terminology
• Predictive Analytics or Statistical Learning or Machine Learning or… [buzzterm here]
  • Supervised
  • Unsupervised
• Model Types (examples)
  • Regression – Linear, Logistic, Polynomial, GLMs
  • Clusters
  • Trees
  • Support Vector Machines
  • Principal Components Analysis
Predictive Modeling
“An area of statistical analysis and data mining, that deals with extracting information from data and using it to predict future behavior patterns or other results. A predictive model is made up of a number of predictors, variables that are likely to influence future behavior.” – Alan Mills, 2009
Statistical Learning
“Statistical learning refers to a vast set of tools for understanding data. These tools can be classified as supervised or unsupervised. Broadly speaking, supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Problems of this nature occur in fields as diverse as business, medicine, astrophysics, and public policy. With unsupervised statistical learning, there are inputs but no supervising output; nevertheless we can learn relationships and structure from such data.”
Source: An Introduction to Statistical Learning (James, Witten, Hastie, and Tibshirani, 2013)
Example of Supervised Statistical Learning
Source: An Introduction to Statistical Learning (James, Witten, Hastie, and Tibshirani, 2013)
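For readers who want to see the supervised idea in code, here is a minimal sketch in plain Python. It is not taken from the book or the slide figure: the data (years of education versus income, in thousands) and the function names `fit_line` and `predict` are made up for illustration. It fits a straight line by ordinary least squares, using known outputs to "supervise" the fit, and then predicts an output for a new input.

```python
# Minimal sketch of supervised learning: fit y = intercept + slope * x
# by ordinary least squares. All data below are hypothetical.

def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new input."""
    intercept, slope = model
    return intercept + slope * x

# Input (predictor) and known output (response) for each observation.
years = [10, 12, 14, 16, 18, 20]
income = [26, 34, 41, 52, 58, 67]  # hypothetical, in thousands

model = fit_line(years, income)
print(predict(model, 15))  # predicted output for an input not in the data
```

The same pattern of fitting on known input/output pairs and then predicting for new inputs carries over to the other supervised model types listed earlier, such as logistic regression and trees.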
Example of Unsupervised Statistical Learning
Asset Allocation in 2012, Representative Clusters (% of investable assets)
Data source: ©A.M. Best Company—used by permission, Conning analysis
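To make the clustering idea concrete, here is a rough sketch of k-means in plain Python. It does not reproduce the analysis behind the chart: the six allocation vectors (bonds %, stocks %, other %) are invented, and the starting centers are picked by hand so the run is repeatable. There is no output variable here; the algorithm groups companies only by how similar their inputs are.

```python
# Minimal sketch of unsupervised learning: k-means clustering.
# The allocation data are hypothetical, not the A.M. Best data.

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise average of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))

def kmeans(points, centers, iterations=20):
    """Alternate between assigning points and updating centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_point(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical (bonds %, stocks %, other %) for six companies.
allocations = [
    (85, 5, 10), (80, 10, 10), (82, 8, 10),
    (50, 40, 10), (55, 35, 10), (45, 45, 10),
]
centers, labels = kmeans(allocations, centers=[allocations[0], allocations[3]])
print(labels)   # cluster index for each company
print(centers)  # one "representative" allocation per cluster
```

Each final center can be read as a representative allocation for its cluster, which is the kind of summary the chart title describes.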
Books
Statistics (the Easier Way) with R
• Level: Absolute beginner
• Language: R
• Main Focus: Introductory Statistics
  • Charts and plots in R
  • Confidence intervals
  • Statistical tests
  • Linear regression
• Author page: http://nicoleradziwill.com/
• Free preview
• Amazon link
Data Science from Scratch: First Principles with Python
• Level: Beginner
• Language: Python and SQL
• Main Focus: Machine learning/Python
  • K-nearest neighbors
  • Regression (linear, logistic)
  • Decision trees
  • Neural networks
  • Clustering
  • Network analysis
  • Recommender systems
  • MapReduce
• Author page: http://joelgrus.com/
• GitHub repo for book
• Free preview
• Amazon link
An Introduction to Statistical Learning
• Level: Intermediate
• Language: R
• Main Focus: Statistical Learning
  • Regression (Linear, Logistic)
  • Resampling methods
  • Model Selection
  • Nonlinear techniques
  • Decision Trees
  • Support Vector Machines
  • Unsupervised Learning (Clustering, PCA)
• Free eBook version
• Amazon link
• Videos and slides
• Online course - archived
The Truthful Art
• Level: Beginner
• Language: N/A
• Main Focus: Data visualization
  • Qualities of great visualizations
  • Dubious models
  • Visualizing distributions
  • Decomposing time series
  • Seeing relationships
  • Choropleth maps and other data maps
• “Any visualization is a model.”
• Author’s site
• Amazon link
Online Courses
DataCamp
• Level: Absolute Beginner to Intermediate
• Language: R and Python
• Topics: R, Data Science, Data visualization
• Timing: On-demand, short
• Paid features – subscription by month or year: Access to all courses, statement of completion
• Credentials: Statement of completion
• Example course: Intro to Python for Data Science
Udacity
• Level: Beginner to Advanced
• Language: R, Python, SQL, Hadoop
• Topics: Data analysis, App development, programming
• Timing: On-demand, length varies
• Paid features: Monthly charge for access to coaches, projects with ongoing feedback, credentials
• Credentials: Verified certificates, MS in Computer Science, nanodegrees
• Example course: Intro to Data Analysis
Coursera
• Level: Beginner to Advanced
• Language/Topics: Soooo much! I can’t choose!
• Timing: Some on schedule, some on-demand; length varies, usually weeks-long
• Paid features: Certifications
• Credentials: Signature Track credential, Specialization certificates from sponsoring universities
• Example course: Introduction to Data Science
edX
• Level: Beginner to Advanced
• Language/Topics: Many many many
• Timing: Mostly on specific schedules
• Paid features: Certifications, some courses are paid only (prices can vary wildly – from $50 up to $1000)
• Credentials: Verified certificates, college credit, XSeries certificates
• Example course: The Analytics Edge
What it looks like…
Links
SOA Links
• SOA Predictive Analytics: https://www.soa.org/News-and-Publications/Newsroom/Emerging-Topics/Predictive-Analytics/default.aspx
• 2015 Essays on Predictive Analytics: https://www.soa.org/Library/Essays/research-2015-predictive-analytics.pdf
• Predictive Analytics and Futurism section: https://www.soa.org/predictive-analytics-and-futurism/
Data sites
• Kaggle – predictive modeling competitions, datasets
• Government data
  • http://www.data.gov/
  • http://factfinder.census.gov/
• Our World in Data
• FiveThirtyEight’s GitHub repo
Keep your Modeling Ego in Check
Shape of Data
Source: By DenisBoigelot, original uploader was Imagecreator - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=15165296
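One way to read "shape of data", assuming the slide's point is that a single summary number can hide what the data look like, is the following small sketch. It computes the Pearson correlation by hand for two made-up datasets: a straight line and a parabola. Both have a perfectly deterministic relationship between x and y, but only the linear one shows up in the correlation coefficient. The helper name `pearson` and the data are illustrative, not from the slide.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# x values symmetric around zero, from -2.0 to 2.0.
xs = [i / 10 for i in range(-20, 21)]
line = [2 * x + 1 for x in xs]   # linear relationship
parabola = [x * x for x in xs]   # strong but nonlinear relationship

r_line = pearson(xs, line)
r_parabola = pearson(xs, parabola)
print(round(r_line, 3), round(r_parabola, 3))
```

The line gives a correlation of essentially 1, while the parabola gives essentially 0 even though y is completely determined by x. Plotting the data before modeling is the cheap way to catch this.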
Know How the Models Fail
Source: Classifier Comparison, scikit-learn.org, http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
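A tiny, hand-built illustration of a model failing, in the spirit of the classifier comparison but not taken from it: on an XOR-style layout, a nearest-centroid classifier (which can only draw a straight-line boundary) does no better than a coin flip, while a 1-nearest-neighbor classifier gets every test point right. The data and function names are assumptions made for this sketch, and it uses plain Python rather than scikit-learn.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_centroid_predict(train, point):
    """Linear-boundary model: pick the class whose mean point is closest."""
    centroids = {}
    for label in sorted({lab for _, lab in train}):
        members = [p for p, lab in train if lab == label]
        centroids[label] = tuple(sum(c) / len(members) for c in zip(*members))
    # On a tie, min() returns the first label in sorted order.
    return min(centroids, key=lambda lab: dist2(point, centroids[lab]))

def one_nn_predict(train, point):
    """Flexible model: copy the label of the single nearest training point."""
    _, label = min(train, key=lambda item: dist2(point, item[0]))
    return label

def accuracy(predict, train, test):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(predict(train, p) == lab for p, lab in test)
    return hits / len(test)

# XOR layout: opposite corners share a class.
train = [((0, 0), 0), ((1, 1), 0), ((0, 1), 1), ((1, 0), 1)]
test = [((0.25, 0.25), 0), ((0.75, 0.75), 0),
        ((0.25, 0.75), 1), ((0.75, 0.25), 1)]

acc_centroid = accuracy(nearest_centroid_predict, train, test)
acc_1nn = accuracy(one_nn_predict, train, test)
print(acc_centroid, acc_1nn)  # prints: 0.5 1.0
```

Both class means land on the same point (0.5, 0.5), so the centroid model has nothing to separate the classes with. The flexible model wins here, but on noisy data it can overfit, which is why it helps to know each model's failure pattern before trusting its output.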