Watching Pigs Fly with the Netflix Hadoop Toolkit
Hadoop Summit 2013, San Jose, CA
Our Motivation
Data should be accessible, easy to discover, and easy to process for everyone.
Our Users
Analysts, Engineers
Hadoop Platform as a Service
S3
Data Platform as a Service
Franklin (Metadata API)
Sting (Ad Hoc Visualization)
Forklift (Data Movement)
Looper (Backloading)
Ignite (A/B Test Analytics)
Spock (Data Auditing)
Genie (Hadoop PaaS)
Lipstick (Pig Workflow Visualization)
Event Service (Orchestration)
Hadoop
S3
Other Processing
Let’s solve a problem using the data!
Build a recommender.
But what makes good recommendations?
Similarity
Personalization
COLORS!
Box art is colorful…
We’re Sorry
Where can I find the data?
Hadoop Platform as a Service
S3, Cassandra, Teradata, Redshift, RDS
Data Platform as a Service
Franklin (Metadata API)
Create a dataset for box art and color.
Whether your dataset is large or small, being able to visualize it makes it easier to explain.
Data Platform as a Service
Franklin (Metadata API)
Sting (Ad Hoc Visualization)
Sting
• Allows users to cache the results of a Genie job in memory
• Sub-second response to OLAP-style operations (slicing, dicing, aggregations)
• Ad hoc or recurring schedule
• Easy to use!
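To make the OLAP-style operations above concrete, here is a minimal Python sketch of slicing and aggregating a small result set held in memory. It is not Sting's implementation or API: the function names, the row layout, and the view counts are all hypothetical, and only the show titles come from the slides.

```python
from collections import defaultdict

# Hypothetical cached result set, standing in for the output of a query job.
# The titles appear in the slides; the hours and view counts are made up.
CACHED_ROWS = [
    {"title": "House of Cards", "hour": 20, "views": 120},
    {"title": "House of Cards", "hour": 21, "views": 180},
    {"title": "Hemlock Grove", "hour": 20, "views": 60},
    {"title": "Hemlock Grove", "hour": 21, "views": 40},
    {"title": "Arrested Development", "hour": 20, "views": 20},
    {"title": "Arrested Development", "hour": 21, "views": 80},
]


def slice_rows(rows, **filters):
    """Slice: keep only rows whose columns match every filter value."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def aggregate(rows, group_by, measure):
    """Aggregate: sum `measure` for each distinct value of `group_by`."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_by]] += r[measure]
    return dict(totals)


def percent_share(totals):
    """Convert per-group totals into percentages of the overall total."""
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {k: 100.0 * v / grand for k, v in totals.items()}


if __name__ == "__main__":
    # Share of all views per title, then the same question for one hour only.
    print(percent_share(aggregate(CACHED_ROWS, "title", "views")))
    hour_20 = slice_rows(CACHED_ROWS, hour=20)
    print(percent_share(aggregate(hour_20, "title", "views")))
```

Because the rows are already in memory, each slice or aggregation is a single pass over the cached data rather than a new Hadoop job, which is the property the bullets above attribute to caching.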
Hive Query
Schema
% Content Consumed / Hour
Hemlock Grove
House of Cards
Arrested Development
Similarity
House of Cards, Macbeth
Toddlers & Tiaras
Star Trek: Voyager
Personalization
# of subscribers X # of titles = ???,000,…,000 (big data)
Big Data
Netflix Apache Pig
Lipstick
Data Platform as a Service
Franklin (Metadata API)
Sting (Ad Hoc Visualization)
Lipstick
• Allows users to visualize their data flow
• Allows users to see common errors
• Allows users to easily monitor their jobs
• Empowers users to support themselves
• Facilitates communication between infrastructure team and users
Lipstick
Overall Job Progress
Logical Plan
Logical Operator (reduce side)
Logical Operator (map side)
Map/Reduce Job
Intermediate Row Count
Records Loaded
Hadoop Counters
Common Problem #1
My job has stalled.
Unoptimized/Optimized Logical Plan Toggle
Dangling Operator
Common Problem #2
I didn’t get the data I was expecting.
Common Problem #3
I don’t understand why my job failed.
Failed Job (light red background)
Successful Job (light blue background)
Wrapping up
• Demos at the Netflix booth in the exhibit hall (see more Lipstick, Sting, and Genie).
• Lipstick is part of Netflix OSS.
• Clone it on GitHub at http://github.com/Netflix/Lipstick
• We welcome feedback and contributions!
Charles Smith: [email protected]
Jeff Magnusson: [email protected]
Thank you!
Jobs: http://jobs.netflix.com
Netflix OSS: http://netflix.github.io
Tech Blog: http://techblog.netflix.com/