Big Data and Predictive Analytics. Unravel the BIG mystery. “In God we trust, all others must bring data”. Antarip Biswas Sept 26th 2013. Agenda / Table of Contents. Introduction to Big Data. Drivers of Big Data Analytics. Data Sciences. - PowerPoint PPT Presentation
Big Data and Predictive Analytics
Unravel the BIG mystery
Antarip Biswas, Sept 26th 2013
“In God we trust, all others must bring data”
Agenda / Table of Contents
Introduction to Big Data
Drivers of Big Data Analytics
Data Sciences
Use Cases and Success Stories – Class 3
Social Media Analytics
Technical Deep Dive, Real Life Projects
Real Life Projects – Class 3
Use Cases and Success Stories
Success Stories - FareCast
Air fare prediction
For an online airfare, predicts whether the fare will go up, go down, or stay the same in the future
Acquired for $100M by Microsoft
Employed machine learning technologies over big data
Tesco Loyalty Program
Done by Dunnhumby
Data for Loyalty Program
Basic demographic information such as address, age, gender, the number of members in a household and their ages, and dietary habits.
Purchase history appended
Summary attributes
Cluster analysis
Crucible: a massive database of not only applicant information and purchase history, but also information purchased and collected elsewhere about participating consumers. Credit reports, loan applications, magazine subscription lists, the Office for National Statistics, and the Land Registry are all sources of additional information stored in Crucible.
Tesco Loyalty Program - Benefits
1. Loyalty
2. Cross-sells
3. Inventory, distribution and store network planning
4. Optimal targeting and use of manufacturer promotions
5. Consumer insight generation and marketing those insights
Tesco has achieved a 3.6-fold increase in coupon redemption rates by using big-data predictive analytics to predict which consumers are more likely to redeem which coupons!
Big Data – Success Story
Netflix Recommendations
Existing recommendation system – Cinematch
KorBell team was the winner
107 algorithms explored
Machine learning and data mining
Employed SVD and RBM
Achieved 8.43% improvement in recommendations over existing system
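In recommender work, "SVD" usually refers to a latent-factor matrix factorization trained by gradient descent. The sketch below illustrates that idea only; it is not the KorBell team's code, and the toy ratings, the factor count `k`, and the learning-rate and regularization values are all assumptions made for the example.

```python
import math
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit rating ~ global_mean + dot(P[user], Q[item]) by stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    mean = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(P[u][f] * Q[i][f] for f in range(k)))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mean, P, Q


def predict(model, u, i):
    mean, P, Q = model
    return mean + sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(model, ratings):
    total = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


# Hypothetical (user, item, rating) triples: users 0-1 and users 2-3 have opposite tastes.
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5), (1, 3, 1),
           (2, 2, 5), (2, 3, 4), (2, 0, 1), (3, 2, 4), (3, 3, 5), (3, 1, 2)]

baseline = train_mf(ratings, 4, 4, epochs=0)  # untrained: predicts roughly the mean
model = train_mf(ratings, 4, 4)
print(f"RMSE before training: {rmse(baseline, ratings):.3f}")
print(f"RMSE after training:  {rmse(model, ratings):.3f}")
```

The Netflix Prize measured improvement as a relative reduction in RMSE on held-out ratings; this toy example only reports training RMSE.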
Google Flu Spread Prediction
Prediction of the spread of flu in real time during H1N1 2009
Google tested a mammoth 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases; 45 important parameters were found
The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system
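One simple way to compare search terms against actual flu cases is to rank each term by how strongly its search volume correlates with the case counts. The sketch below shows that idea with Pearson correlation; it is a simplification, not Google's method, and the term names and weekly numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def top_terms(search_volume, flu_cases, k):
    """Return the k search terms whose volume tracks the flu cases most closely."""
    ranked = sorted(search_volume,
                    key=lambda term: pearson(search_volume[term], flu_cases),
                    reverse=True)
    return ranked[:k]


# Hypothetical weekly data: reported flu cases and search volume per term.
flu_cases = [10, 20, 40, 80, 60, 30]
search_volume = {
    "flu symptoms": [12, 25, 38, 85, 55, 33],
    "cough medicine": [5, 9, 22, 41, 30, 14],
    "football scores": [50, 48, 52, 47, 51, 49],
}

print(top_terms(search_volume, flu_cases, k=2))
```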
Prediction – High Frequency Trading
Objective: predict the impact of an earnings announcement on stock prices
• Use historical financial data to get a time series of quarterly expected and actual earnings announcements
• Use historical financial data of stock price movements after the announcement
Approach
• Categorize stocks by market capitalization so that similar-sized companies are grouped together
• Split the historical data into an in-sample (training) set and an out-of-sample (validation) set
• Fit a linear regression model on the in-sample data, where the independent variable (feature) is the difference between the actual and estimated earnings and the dependent variable is the impact on stock price
Achieved a return of 1%, or 100 “basis points”
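The approach above can be sketched as a one-variable least-squares fit with an in-sample/out-of-sample split. This is a minimal illustration for a single market-capitalization bucket; the data points and the 6/2 split are assumptions, not figures from the project.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (earnings surprise, post-announcement return in %) pairs.
# Earnings surprise = actual earnings minus estimated earnings.
history = [(-0.10, -1.8), (-0.05, -0.9), (0.00, 0.1), (0.02, 0.5),
           (0.05, 1.1), (0.08, 1.5), (0.10, 2.2), (-0.03, -0.4)]

# In-sample (training) and out-of-sample (validation) sets.
in_sample, out_sample = history[:6], history[6:]
slope, intercept = fit_line(*zip(*in_sample))

# Validate on data the model has not seen.
errors = [abs(y - (slope * x + intercept)) for x, y in out_sample]
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"out-of-sample MAE={sum(errors) / len(errors):.2f}")
```

In practice the split would usually be chronological, so that the validation set contains only announcements made after those used for fitting.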
Predictive Analytics for Couponing
Test Group: list of households from the analytic engine
Control Group: list of households getting the same offer
Run the same campaign on both lists
Measure results: redemption (primary), clips (secondary)
Evaluate impact: Control Group vs. Test Group
Verify efficacy of the household recommendation by demonstrating significant variance from the Control Group
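One common way to check whether the Test Group's redemption rate differs significantly from the Control Group's is a two-proportion z-test. The slides do not say which test was used, so treat this as an assumed choice; the household counts below are hypothetical.

```python
import math


def two_proportion_z(redeemed_test, n_test, redeemed_ctrl, n_ctrl):
    """Return (lift, z statistic, two-sided p-value) for test vs. control redemption."""
    rate_test = redeemed_test / n_test
    rate_ctrl = redeemed_ctrl / n_ctrl
    pooled = (redeemed_test + redeemed_ctrl) / (n_test + n_ctrl)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    z = (rate_test - rate_ctrl) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return rate_test / rate_ctrl, z, p_value


# Hypothetical campaign: 1,000 households per list, same offer on both.
lift, z, p_value = two_proportion_z(redeemed_test=90, n_test=1000,
                                    redeemed_ctrl=25, n_ctrl=1000)
print(f"lift={lift:.2f}x, z={z:.2f}, p={p_value:.2g}")
```

The same function can be applied to clips, the secondary measure, by passing clip counts instead of redemptions.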
Improve Recommendations/Allocations
Customer deviation in buying behavior refined by customer profile changes
• Taxonomy-based approach to identify business semantics
  - Major events that determine a change in buying pattern: location change, change in marital status, change in income group, birth of a child, …
  - Source for this information: social channels, purchase deviation, …
  - Identify specific product categories relevant for the major event
  - Association of product categories with various customer classifications
  - For instance, customers with kids buy candies, and customers with pets buy pet food
Diagram labels (recommendation flow):
Customer Transactions
Customer 360
Association & Clustering
Exploratory techniques
Time Series
Customer groups based on classifications
Cluster assignments
Product classification and customer segment association
Time-specific product and associated products
Probabilistic product affinities based on segment’s behavior
Matching / Filtering
Products eligible for recommendation
Target Recommendation
Products list for target customer’s cluster
Personalized Recommendation List
Campaign results
Refine classifiers
Improve Recommendations/Allocations
Products bought by similar customers, but not by current customer
• Identification of similar customers more accurately with availability of extensive profile information
  - Classification of customers by predetermined attributes
  - Usage of exploratory techniques to identify clusters of similar customers
• Identify product propensity for specific segments
  - Determined by clustering and classification techniques
Diagram labels (recommendation flow):
Customer Transactions
Customer 360 - NoSQL
Association & Clustering
Exploratory techniques
Segment-specific product lists
Customer groups based on classifications
Cluster assignments
Probabilistic product affinities based on segment’s behavior
Matching / Filtering
Products eligible for recommendation
Target Recommendation
Products list for target customer’s cluster
Personalized Recommendation List
Campaign results
Refine classifiers
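The idea of recommending products bought by similar customers but not by the current customer can be sketched with a nearest-neighbor lookup. The slides name clustering and classification; the Jaccard similarity used here is a simpler stand-in chosen for brevity, and the customer names and baskets are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two product sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, baskets, top_n_neighbors=2):
    """Products bought by the most similar customers but not by the target customer."""
    mine = baskets[target]
    neighbors = sorted((c for c in baskets if c != target),
                       key=lambda c: jaccard(mine, baskets[c]),
                       reverse=True)[:top_n_neighbors]
    scores = {}
    for customer in neighbors:
        similarity = jaccard(mine, baskets[customer])
        for product in baskets[customer] - mine:
            # Weight each candidate product by how similar its buyer is.
            scores[product] = scores.get(product, 0.0) + similarity
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical purchase histories (a stand-in for the Customer 360 store).
baskets = {
    "alice": {"bread", "butter", "milk"},
    "bob": {"bread", "butter", "milk", "jam"},
    "carol": {"bread", "milk", "eggs"},
    "dave": {"wine", "cheese"},
}

print(recommend("alice", baskets))
```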
Improve Recommendations/Allocations
Determine correlated items not bought by current customer
• Link association to determine products that are bought together – bread and butter, wine and cheese, …
• Identify products bought by customer, but not the correlated item
• Recommendation based on absence of product
Diagram labels (recommendation flow):
Customer Transactions
Customer 360 - NoSQL
Association & Clustering
Exploratory techniques
Association rules
Customer groups based on classifications
Cluster assignments
Segment-specific product and associated products
Probabilistic product affinities based on segment’s behavior
Matching / Filtering
Products eligible for recommendation
Target Recommendation
Products list for target customer’s cluster
Personalized Recommendation List
Campaign results
Refine classifiers
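Link association and recommendation by absence can be sketched with pairwise association rules scored by support and confidence. This is a minimal illustration limited to item pairs; the transactions and the `min_support` and `min_confidence` thresholds are assumptions.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return {(a, b): (support, confidence)} for rules 'customers who buy a also buy b'."""
    n = len(transactions)
    item_count, pair_count = {}, {}
    for basket in transactions:
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in permutations(sorted(basket), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n                # share of baskets containing both items
        confidence = count / item_count[a]  # share of a-baskets that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


def missing_items(basket, rules):
    """Correlated items the customer has not bought, given what they did buy."""
    return sorted({b for (a, b) in rules if a in basket and b not in basket})


# Hypothetical transactions, each a set of distinct products.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"wine", "cheese"},
    {"wine", "cheese", "bread"},
    {"bread", "butter", "wine"},
]

rules = pair_rules(transactions)
print(missing_items({"bread", "milk"}, rules))
```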
Identify what customers want – and when
Transaction details for filtered customer list: buyers of Cat food / Cat food Generic 4 oz
• Salary
• Zipcode
• No. of kids
• House owner
• Gender
Cross-tabulated data
• Brand1, Brand2, … Brandn
• Weight, Size, Volume
• Brand
• Category1, Category2, …
• Offer clipped category1, …
Affinity models
Associated variables: single or multiple variables by different segments
Scattergrams, correlation, regression using a multi-model approach
Prediction models
Customer list by probability
Transaction details merged with customer data to provide contextual information as required for inference
Models generated using historical data by the analytic engine to identify affinity of specific variables
Application of variable affinity to the customer list to identify the probability that non-purchasers will purchase cat food / cat food Generic 4 oz
Sample technique
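Scoring non-purchasers by purchase probability can be sketched with a naive Bayes model over categorical attributes. The slides mention a multi-model approach without naming the models, so naive Bayes is an assumed stand-in; the attribute names (`home_owner`, `bought_litter`) and the historical rows are hypothetical.

```python
def fit(rows, label):
    """Count class frequencies and per-attribute value frequencies from history."""
    class_count, value_count, values = {}, {}, {}
    for row in rows:
        c = row[label]
        class_count[c] = class_count.get(c, 0) + 1
        for attr, value in row.items():
            if attr != label:
                key = (attr, value, c)
                value_count[key] = value_count.get(key, 0) + 1
                values.setdefault(attr, set()).add(value)
    return class_count, value_count, values


def purchase_probability(model, customer):
    """P(buys | attributes) under naive Bayes with Laplace (add-one) smoothing."""
    class_count, value_count, values = model
    total = sum(class_count.values())
    scores = {}
    for c, n_c in class_count.items():
        score = n_c / total
        for attr, value in customer.items():
            seen = value_count.get((attr, value, c), 0)
            score *= (seen + 1) / (n_c + len(values.get(attr, ())))
        scores[c] = score
    return scores[True] / sum(scores.values())


# Hypothetical history: transaction details merged with customer data.
history = [
    {"home_owner": "yes", "bought_litter": "yes", "buys_cat_food": True},
    {"home_owner": "no", "bought_litter": "yes", "buys_cat_food": True},
    {"home_owner": "yes", "bought_litter": "yes", "buys_cat_food": True},
    {"home_owner": "yes", "bought_litter": "no", "buys_cat_food": False},
    {"home_owner": "no", "bought_litter": "no", "buys_cat_food": False},
    {"home_owner": "no", "bought_litter": "yes", "buys_cat_food": False},
]
model = fit(history, label="buys_cat_food")

# Hypothetical non-purchasers, ranked into a customer list by probability.
prospects = {
    "household_a": {"home_owner": "yes", "bought_litter": "yes"},
    "household_b": {"home_owner": "yes", "bought_litter": "no"},
}
ranked = sorted(prospects, key=lambda h: purchase_probability(model, prospects[h]),
                reverse=True)
print(ranked)
```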
Contextualize information, correlate facts, predict and improve
Contextualize
• Offers clipped
• Customer transactions
• Product taxonomy
Correlate
• Associated products
• Associated variables
• Category association
Predict and improve
• Probability of customer purchase, based on rule sets, adjusted
• Machine learning to improve recommendation
• Predict customer patterns based on empirical data
Information from multiple operational and data warehousing systems that contain customer data, purchase details, …
Information from social channels that provide supporting information to create detailed customer profile
Rule sets from knowledgebase accumulated over the years
Advanced Analytics - Product association
Pet foods
Cat food
Brand 1: Variety1, Variety2
Brand 1: Variety1, Variety2
Pet owners
Cat owners
Affinity: Carpet cleaners, Cat grooming tools
Need: Litter box, Litter
Filter
Buyer of Cat Food / Generic Cat food 4 ounce
Transaction details for this customer list
Filtered high vol. categories
Associated products by affinity + confidence
Inferred rules
Customer list, probability
Obama for America Campaign 2012
Canvassing from youth
Canvassing from older generation
Obama for America Campaign 2012
The Obama for America data science team used social media as a tool to efficiently recruit the human resources it needed leading into the election’s home stretch
Primary objective: determine who the best messengers were, whom they might be able to persuade, and what actions they might be willing to take
Reasons to harness social media:
• The majority of youth were unreachable by phone calls or neighborhood canvassing, but always connected to some form of social media
• Optimize resources by transforming voter intelligence into actionable intelligence
Traffic Congestion Control
• Big Data Analytics used for traffic congestion control
• Enables travellers to plan their routes to their destinations
• Enables traffic controllers to effectively route cars in order to avoid as much congestion as possible
• Implemented in LA by a joint initiative of Xerox and the LA transport department
DNA Sequencing and Cancer Therapies
• Previously, only small portions of a person’s genes were sequenced
• Big Data technology enables the entire DNA to be sequenced, which is especially helpful for cancer patients
• Enables selecting therapies based on genetic markers and person-specific genetic makeup
• If one treatment becomes ineffective due to cancer mutation, different therapies can be used based on other gene markers
• Steve Jobs was one of the first people in the world to have his entire DNA sequenced
Thank You