Near line systems to improve Netflix recommendations
Gopal Krishnan
Feb 2015
About me
Gopal Krishnan
Director, Consumer Science Engineering
Netflix, Inc.
Driving innovation through AB testing the member experience.
Twitter: @sgkrishnan
LinkedIn: https://www.linkedin.com/pub/gopal-krishnan/0/7a7/905
Netflix: global streaming video service for TV and movies
Netflix is available on 1000+ devices
More than 57M members globally
• In more than 50 countries
• Planning to launch in all (200+) countries in 2 years.
Netflix consumes 34% of peak downstream bandwidth in North America
Netflix consumes 6% of peak upstream bandwidth in North America
What does my team do?
• Help improve the rate of innovation through AB testing to improve the member experience
• Infrastructure for algorithmic support
– Feature value store to help model training
– Services to store and serve explicit data sources
– Services to collect, process, validate, and serve implicit data sources
– Caching services
• Data improves our understanding of end to end user behavior
Every part of Netflix is personalized
NETFLIX RECOMMENDATIONS WITH ONLINE MICRO SERVICES
Life Cycle of Netflix Recommendation Data
Devices
Data Collection
Offline Big Data Analysis
Netflix recommendation:
online services
Netflix API Netflix beacon telemetry
Data Collection: explicit inputs
Plays
Star ratings
Virtual plays from new user on-boarding
Outputs from offline analysis
Devices
Data Collection
Offline Big Data Analysis
Netflix recommendation:
online services
Netflix API Netflix beacon telemetry
“Implicit” Data Services
Popularity Targeting
User clustering
Recommendations combine both online and aggregated offline data
Devices
Data Collection
Offline Big Data Analysis
Netflix recommendation:
online services
Netflix API Netflix beacon telemetry
“Explicit” Data Services
My List On Ramp
Taste pref
“Implicit” Data Services
Popularity Targeting
User clustering
WHY BOTHER WITH NEAR LINE SYSTEMS THEN?
Our algorithms became too complex to compute online, which led to higher latency.
Near line systems improve our availability story.
Near line systems allow us to innovate at a greater velocity.
Near line systems improve agility and availability
Devices
Data Collection
Big Data Analysis (Hadoop, Teradata)
Netflix recommendation:
online services
Pre-computed recommendations
“Explicit” Data Services
“Implicit” Data Services
Post-process at run time
Manhattan pre-compute engine
Manhattan: Netflix pre-compute engine
Video Ranker
Row selection
Similars
Top picks
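The split between pre-computed recommendations and run-time post-processing can be illustrated with a minimal sketch. Everything here is assumed for illustration and is not taken from Manhattan itself: the function name `serve_row`, the in-memory dictionaries standing in for the pre-compute store and the play history, the choice of "filter out recently played videos" as the post-processing step, and the generic fallback list used when no pre-computed result exists for a member.

```python
def serve_row(member_id, precomputed, recently_played, fallback, size=3):
    """Serve one row of video ids for a member.

    precomputed: dict of member_id -> ranked list of video ids (hypothetical
        stand-in for a pre-compute store).
    recently_played: dict of member_id -> set of video ids (hypothetical).
    fallback: non-personalized ranked list used when nothing was pre-computed.
    """
    # Cheap lookup instead of running the ranking algorithm online.
    ranked = precomputed.get(member_id, fallback)
    # Run-time post-processing: drop videos the member has just played.
    seen = recently_played.get(member_id, set())
    return [video for video in ranked if video not in seen][:size]


precomputed = {"m1": ["a", "b", "c", "d"]}
recently_played = {"m1": {"b"}}
fallback = ["x", "y", "z"]

row_known = serve_row("m1", precomputed, recently_played, fallback)
row_unknown = serve_row("m2", precomputed, recently_played, fallback)
```

In this sketch the expensive ranking happens ahead of time, so the online path is a lookup plus a light filter, and a missing entry degrades to the fallback list rather than failing.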
What data would improve recommendations even further?
All UI Events from all key platforms
• Moving beyond explicit inputs from users, we would like to track all member activity to derive deeper insights.
• Challenges include:
– 1000s of device platforms
– Non-standardized UIs across different platforms
– Lack of earlier focus on tracking the browse experience
Patterns arise in aggregate
Challenges with collecting UI Events
• Consistent data semantics across lots of device and UI platforms.
• Scaling to handle billions of events.
• Near real-time semantic data quality and validation
• Dealing with data loss (low power devices, loss at the network, etc.)
Canaries for data quality
Near real time feedback and validation on data quality.
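One way to picture a data-quality canary is a check that runs on a small batch of recent events and reports the fraction that fail validation. The sketch below is only an illustration under stated assumptions: the required field names, the `max_invalid_fraction` threshold, and the rule that an empty batch counts as a failure (possible data loss) are all hypothetical, not Netflix's actual schema or policy.

```python
# Hypothetical minimal schema for a UI event.
REQUIRED_FIELDS = {"event_type", "device_type", "timestamp"}


def validate_event(event):
    """Return a list of problems found in one event dict (empty if valid)."""
    problems = ["missing:" + f for f in sorted(REQUIRED_FIELDS - set(event))]
    ts = event.get("timestamp")
    if ts is not None and (not isinstance(ts, (int, float)) or ts <= 0):
        problems.append("bad:timestamp")
    return problems


def canary_check(events, max_invalid_fraction=0.01):
    """Return (ok, invalid_fraction) for a small batch of recent events."""
    if not events:
        # Assumption: receiving no events at all is treated as a failure.
        return False, 1.0
    invalid = sum(1 for e in events if validate_event(e))
    fraction = invalid / len(events)
    return fraction <= max_invalid_fraction, fraction


good = {"event_type": "impression", "device_type": "tv", "timestamp": 1.0}
bad = {"event_type": "play"}
ok, fraction = canary_check([good, good, good, bad])
```

Running such a check continuously on a sample of each platform's traffic would give near real-time feedback when a client release changes event semantics.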
“Trending” on Netflix
Now being AB tested
Near line systems for Netflix recommendations
Devices
Data Collection
Big Data Analysis (Hadoop, Teradata)
Netflix recommendation:
online services
Pre-computed recommendations
“Explicit” Data Services
“Implicit” Data Services
Post-process at run time
Near line data processing and serving systems
“Trending on Netflix” near line system
Take rates (play/impression) (kafka stream)
Cassandra
dashboards
Stream Processing (ETA: low # of minutes)
Play start (kafka stream)
1000’s / sec
Impressions (kafka stream)
millions / sec
“Trending on Netflix” near line system
Play start (kafka stream)
1000’s / sec
Impressions (kafka stream)
millions / sec
Stream Processing
• Windowed operations.
• Small batches.
• Merging streams.
• Flexibility.
Take rates
Impressions rollup
Personalized Ranked videos
Merged to generate “Trending on Netflix”
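The take-rate computation described on these slides (plays divided by impressions over a time window, then merged with a personalized ranking) can be sketched in plain Python. This is not the production Spark Streaming job. The event shape `(timestamp_seconds, video_id)`, the function names, and the blend by simple multiplication are assumptions made for illustration only.

```python
from collections import defaultdict


def take_rates(play_events, impression_events, window_start, window_end):
    """Compute plays / impressions per video for events inside one window.

    Each event is a (timestamp_seconds, video_id) tuple (assumed shape).
    """
    plays = defaultdict(int)
    impressions = defaultdict(int)
    for ts, video in play_events:
        if window_start <= ts < window_end:
            plays[video] += 1
    for ts, video in impression_events:
        if window_start <= ts < window_end:
            impressions[video] += 1
    # Only videos with at least one impression have a defined take rate.
    return {v: plays[v] / n for v, n in impressions.items() if n > 0}


def trending_row(rates, personalized_scores, top_n=3):
    """Merge take rates with per-member scores (hypothetical: their product)."""
    blended = {v: rates[v] * personalized_scores.get(v, 0.0) for v in rates}
    # Sort by blended score descending, breaking ties by video id.
    return sorted(blended, key=lambda v: (-blended[v], v))[:top_n]


play_events = [(1, "a"), (2, "a"), (3, "b"), (100, "a")]
impression_events = [(1, "a"), (1, "a"), (2, "a"), (2, "a"),
                     (3, "b"), (4, "b"), (5, "c"), (100, "b")]
rates = take_rates(play_events, impression_events, 0, 60)
row = trending_row(rates, {"a": 0.2, "b": 0.9}, top_n=2)
```

Events with timestamp 100 fall outside the 0 to 60 second window and are ignored, which mirrors the windowed, small-batch style of processing named on the slide.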
Spark Streaming at Netflix
• Collaborating with Databricks to make sure Spark (batch and streaming) works well in a cloud environment
– Resiliency and scalability testing
• Actively studying the scaling requirements of our algorithms for both Spark batch and Spark streaming.
Spark at Netflix
• Several different use cases where we are interested in Spark – both batch and streaming.
• Largest Spark batch production cluster is 150 m3.2xl instances for personalization.
• Netflix has both Spark batch and Spark streaming in production.
Spark at Netflix
• Integrating with Spark using Scala (mostly), Python, and some SQL.
• Python typically via IPython notebook integration.
• Running in standalone mode or on Mesos.
Spark: areas to watch for.
• We have not really tested the multi-tenancy boundaries yet. For now we mostly spin up custom-purpose clusters.
• Tuning the jobs and optimizing performance of jobs remains a challenge as we make steady inroads.
• Incrementally getting better with stability and scale as we tackle larger use cases this year.
Netflix Tech Blog
• Tech blog about the “Trending on Netflix” row published today.
• Watch for upcoming tech blog from Netflix on near line systems and another one about Spark in the coming weeks.
Now Hiring leaders and engineers!
Talk to me in person or at
Twitter: @sgkrishnan
LinkedIn: https://www.linkedin.com/pub/gopal-krishnan/0/7a7/905