Machine Learning at Comcast
November 10th, 2015
Andrew Leamon – Director
Chushi Ren – Software Engineer / Data Scientist
Engineering Analysis
About Comcast
Comcast brings together the best in media and technology. We drive innovation to create the world’s best entertainment and online experiences.
High Speed Internet
Video
IP Telephony
Home Security / Automation
Universal Parks
Media Properties
Importance of Live TV
• Average US household watches 3-5 hours of TV per day (Nielsen)
• 3x more than Netflix (BTIG Research 4/2015)
• 4x videos on smartphones, tablets, computers
• 50% of leisure time is spent watching TV!
[Chart labels: Netflix, Live TV, Online Video]
[Diagram labels: Content Information, Content, Images, Logos, Subscriber Information, Catalogs, Entitlements, Channel Lineups, Discovery, Search, Browse, Recommend, Personalize, Voice Control, Menu, Millions of Devices, Metadata Providers, Content Providers, Billing Systems, Customer Usage, Purchases, Deep Metadata, Spark]
X1 Personalization
Trending on X1 – Predict Popularity 24 Hours in Advance
• Ensemble of Gradient Boosted Decision Trees
• Input: statistics of program ratings, program metadata, channel info, …
[Chart: axis "Number of Signals"; marked value 0.77; legend "= New Signal"]
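To make the "ensemble of gradient boosted decision trees" idea concrete, here is a minimal sketch of boosting for squared loss. It is not the production model: it uses depth-1 trees (stumps) in plain Python, and the two features (mean rating, prime-time flag), the popularity scores, and all function names are hypothetical stand-ins for the inputs listed on the slide.

```python
from typing import List, Optional, Tuple

# A stump is (feature index, threshold, left value, right value).
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Optional[Stump]:
    """Return the single split that minimizes squared error on the residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must leave rows on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.3):
    """Boosting for squared loss: each new stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature can be split any further
            break
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict_gbdt(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical features per program: [mean rating, airs in prime time (0/1)]
X = [[0.2, 0], [0.4, 0], [0.6, 1], [0.9, 1], [0.3, 1], [0.8, 0]]
# Hypothetical target: popularity score 24 hours later
y = [10, 20, 60, 95, 35, 70]
model = fit_gbdt(X, y)
```

Each round shrinks the training error a little, which is why many weak trees combined with a small learning rate can fit a signal that no single shallow tree captures.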
Real-time Recommendations
Program recommendations are updated every 20 seconds (Spark Streaming). For more details and code samples, see our talk at Spark Summit East, March 2015: https://spark-summit.org/east-2015/
• Live tune activity from Kafka
• User history from HDFS
• Batch: user clustering with KMeans
• Real-time: top-K trending programs per cluster
• Real-time program recommendations per user
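The real-time step (top-K trending programs per cluster, then per-user recommendations) can be sketched in plain Python. This is only an illustration of the logic for one micro-batch, not the Spark Streaming job from the talk: the user-to-cluster mapping stands in for the output of the batch KMeans step, and the event tuples, user IDs, and function names are all hypothetical.

```python
from collections import Counter, defaultdict


def top_k_per_cluster(tune_events, user_cluster, k=2):
    """Count tune events per (cluster, program) and keep the k most tuned programs."""
    counts = defaultdict(Counter)
    for user, program in tune_events:
        cluster = user_cluster.get(user)
        if cluster is not None:  # skip users that the batch step has not clustered
            counts[cluster][program] += 1
    return {c: [p for p, _ in ctr.most_common(k)] for c, ctr in counts.items()}


def recommend(user, user_cluster, trending):
    """Recommend whatever is trending in the user's cluster (empty if unknown)."""
    return trending.get(user_cluster.get(user), [])


# Hypothetical output of the batch clustering step: user -> cluster id
user_cluster = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}

# Hypothetical micro-batch of live tune events: (user, program)
tune_events = [
    ("u1", "news"), ("u2", "news"), ("u1", "cooking"),
    ("u3", "football"), ("u4", "football"), ("u3", "movie"),
    ("u9", "news"),
]

trending = top_k_per_cluster(tune_events, user_cluster, k=2)
```

In a streaming setting the same counting would run once per batch interval, so the recommendations for a user change as soon as the trending list for their cluster changes.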
Problem: Avoidable Truck Rolls (ATR)
A customer calls to report an issue with their service.
The customer service agent goes through the ITG to debug the problem with the customer by phone.
When the agent cannot resolve the problem by phone, a truck roll is scheduled.
• Examples of avoidable truck rolls:
  • Reset modem
  • Change remote battery
  • Entitlement issue
• Goal
  • Build a predictive model to prevent ATRs
ATR Machine Learning Pipeline
Feature extraction
Feature selection
Model training
Model validation
Data source
Training data
Test data
Classifier
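The pipeline stages above (feature extraction, feature selection, model training, model validation on held-out test data) can be sketched end to end in a few functions. Everything specific here is an assumption for illustration: the raw field names, the class-mean-difference selection score, and the nearest-centroid classifier are simple stand-ins, not the features or the model used for ATR.

```python
def extract_features(record):
    """Turn a raw record into a numeric vector (hypothetical field names)."""
    return [float(record["modem_resets"]),
            float(record["calls_last_30d"]),
            float(record["account_age_years"])]


def select_features(X, y, n_keep):
    """Keep the n_keep columns whose class means differ the most (a crude score)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:n_keep])


def train_centroids(X, y, cols):
    """Model training: one centroid per class over the selected columns."""
    centroids = {}
    for label in set(y):
        rows = [[row[j] for j in cols] for row, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, row, cols):
    point = [row[j] for j in cols]
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(point, centroids[label])))


def rec(resets, calls, age):
    return {"modem_resets": resets, "calls_last_30d": calls, "account_age_years": age}


# Hypothetical labeled data: 1 = avoidable truck roll, 0 = not avoidable
train = [(rec(5, 1, 3), 1), (rec(4, 2, 7), 1), (rec(6, 1, 2), 1),
         (rec(0, 2, 4), 0), (rec(1, 1, 6), 0), (rec(0, 3, 2), 0)]
test = [(rec(5, 2, 1), 1), (rec(1, 2, 9), 0)]

X_train = [extract_features(r) for r, _ in train]
y_train = [label for _, label in train]
cols = select_features(X_train, y_train, n_keep=2)
model = train_centroids(X_train, y_train, cols)

# Model validation: accuracy on data the model has not seen
hits = sum(predict(model, extract_features(r), cols) == label for r, label in test)
accuracy = hits / len(test)
```

Keeping each stage as its own function makes it easy to swap in a stronger selector or classifier without touching the rest of the pipeline.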
ATR Challenges
• Skewed data: only a very small portion of truck rolls are avoidable
  • Use the balance classes option in H2O to upsample the minority class
  • Subsemble
• Information leakage: we use some feature statistics as features, which causes information leakage
  • Hold the current row out
  • Add random noise
• Operationalize the model
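Two of the mitigations above can be illustrated in plain Python. The first function upsamples the minority class by resampling with replacement; it mimics the idea of class balancing and is not H2O's implementation. The second assumes the "feature statistic" is a per-category rate of the label (an interpretation, since the slide does not name the statistic) and computes it with the current row held out, plus optional Gaussian noise. All names and data are hypothetical.

```python
import random
from collections import defaultdict


def upsample_minority(rows, labels, seed=0):
    """Resample minority-class rows with replacement until both classes match in size.

    Assumes binary labels (0/1) with at least one row of each class.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def leave_one_out_rate(categories, labels, noise_sd=0.0, seed=0):
    """Per-category label rate computed without the current row, plus optional noise."""
    rng = random.Random(seed)
    total, count = defaultdict(float), defaultdict(int)
    for cat, label in zip(categories, labels):
        total[cat] += label
        count[cat] += 1
    prior = sum(labels) / len(labels)  # fallback when a category has a single row
    encoded = []
    for cat, label in zip(categories, labels):
        others = count[cat] - 1
        value = (total[cat] - label) / others if others > 0 else prior
        if noise_sd > 0:
            value += rng.gauss(0.0, noise_sd)
        encoded.append(value)
    return encoded


# Hypothetical data: issue category per call, and whether the truck roll was avoidable
categories = ["modem", "modem", "modem", "remote"]
labels = [1, 0, 1, 1]
encoded = leave_one_out_rate(categories, labels)

# Hypothetical skewed data set: four majority rows, one minority row
balanced_rows, balanced_labels = upsample_minority(["a", "b", "c", "d", "e"],
                                                   [0, 0, 0, 0, 1])
```

Holding the current row out means a row's own label never contributes to its own feature value, which is the leak the slide warns about; the noise further blurs the statistic for small categories.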
Machine Learning to Improve Customer Experience
Problem: Customer Experience Metric (CXE Metric)
In a CMTS (Cable Modem Termination System), ports are logically bonded to form a “Service Group” (SG).
SG Utilization = Customer experience?
Why Do We Need CXE Metric?
CXE Metric
Understand Customers’ Needs
Prioritize Hardware Deployment
Customer Experience Metric
• Select features correlated with customer experience across different datasets
• Join them and perform cleaning and aggregation
• Cluster to form customer experience groups
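The join-then-cluster steps can be sketched as follows. This is a minimal plain-Python k-means under stated assumptions: the two per-service-group datasets (utilization and error rate), their values, and the choice of two clusters are hypothetical, and the centroids are initialized from the first k points so that the run is deterministic.

```python
def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic initialization from the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Two hypothetical datasets keyed by service group ID, already scaled to [0, 1]
utilization = {"sg1": 0.10, "sg2": 0.90, "sg3": 0.20,
               "sg4": 0.85, "sg5": 0.15, "sg6": 0.95}
error_rate = {"sg1": 0.10, "sg2": 0.80, "sg3": 0.15,
              "sg4": 0.90, "sg5": 0.05, "sg6": 0.85}

# Join on the shared key and keep only groups present in both datasets
keys = sorted(set(utilization) & set(error_rate))
points = [[utilization[key], error_rate[key]] for key in keys]

centroids, assignment = kmeans(points, k=2)
groups = dict(zip(keys, assignment))
```

Features on very different scales should be normalized before clustering, since k-means uses Euclidean distance and the largest-valued feature would otherwise dominate the grouping.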
Machine Learning for More Resilient & Reliable Products
The Evolution of Resiliency – Scale It!
System Errors • User Experiences an Issue
Customer Contact • Effort Required
Agent Manually Fixes • Effort Required
System Errors • User Experiences an Issue
Machine Learning • Intelligent Scoring for Solution
Automated / Suggested Fix • Issue Resolved with Lower Effort
• We can reduce effort for customers and for Customer Care by building intelligent systems.
Machine Learning: The Promise / Challenge of Operationalization
Real-time Data + Operationalized Models -> Better Products
“However valuable these PhDs are, the organizations that have been lucky enough to secure these resources are realizing the limitations in human-powered data science: it’s simply not a scalable solution.”
“The commonality across all of these new technologies is that they offer something additional humans cannot provide: the power of scale. Organizations that do not have a strategic initiative to regularly and organically engage with its customers will be at a serious disadvantage. Soon, AI-driven engagement models that interpret data and intuitively interact with clients will be the norm.”
Harvard Business Review: “Data Scientists Don’t Scale”: https://hbr.org/2015/05/data-scientists-dont-scale
Challenges in Operation: Getting Data in Real-time
• Various sources of data in different formats
• Enable real-time queries on customer event data
Challenges in Operation: Computation in Real-time
• Challenges
  • Handle the heavy computation involved in transforming raw data
  • Respond quickly to a large volume of prediction requests
  • Update the model with the latest data
• Potential solution
  • Spark + Sparkling Water
Tools & Infrastructure to integrate with Actual Products
Data
• Real-time Production
• Schema Management
• Governance
Models
• Versioning
• Operationalization
• Publishing / Deployment
Integration
• Execution at Runtime
• System APIs
• Validation