Heterogeneous Workflows with Spark at Netflix
Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure
Help members find content to watch and enjoy to maximize member satisfaction and retention
Everything is a Recommendation
Recommendations are driven by Machine Learning
Ranking
Rows
Machine Learning Pipeline
• User Selection
• Feature Generation
• Model Training
• Model Validation
• Publish Model
Machine Learning Pipeline Challenges
• Innovation
• Heterogeneous Environments
• Spark
• Native Support
• Separate Orchestration and Execution
• Multi Tenancy
• Machine Learning Constructs
• Parameter Sweep – 30k Dockers
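A parameter sweep of the kind listed above can be sketched as the Cartesian product of a hyperparameter grid, where each combination would become one independent task (for example, one Docker container). This is a minimal sketch, not Meson's implementation: the function name `parameter_sweep` and the grid values are assumptions made for illustration.

```python
from itertools import product


def parameter_sweep(grid):
    """Expand a grid of candidate values into one parameter dict per task.

    `grid` maps a parameter name to the list of values to try.
    Names are sorted so the task order is deterministic.
    """
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical grid: 2 * 3 * 2 = 12 combinations, one task each.
grid = {
    "learning_rate": [0.01, 0.1],
    "num_trees": [50, 100, 200],
    "max_depth": [4, 8],
}
tasks = parameter_sweep(grid)
```

Because the number of tasks is the product of the list lengths, a few parameters with a handful of values each quickly reaches thousands of tasks, which is why the sweep is a scheduling problem as much as a modeling one.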
Meson Workflow System
• General Purpose Workflow Orchestration and Scheduling framework
• Delegates execution to resource managers like Mesos
• Optimized for Machine Learning Pipelines and Visualization
• Check out the Blog: http://bit.ly/mesonws or techblog.netflix.com
• Plan to Open Source soon
Meson Architecture
Standard and Custom Step Types
Parameter Passing
[Figure: workflow diagram with labels HiveQuery, GetUsers, UserDataSet, WrangleData, RegionalDataSet, GlobalDataSet, RegionalModel, GlobalModel]
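Parameter passing between workflow steps can be sketched as steps that read named parameters and return the named parameters they produce. This is a minimal sketch and not Meson's API. The wiring is an assumption read off the diagram labels: GetUsers (standing in for a Hive query) produces UserDataSet, WrangleData turns it into RegionalDataSet and GlobalDataSet, and the two model steps consume one each. The data and the "models" (simple counts) are placeholders.

```python
from graphlib import TopologicalSorter


def get_users(params):
    # Stand-in for a Hive query; returns a hard-coded (user, region) list.
    return {"UserDataSet": [("u1", "US"), ("u2", "BR"), ("u3", "US")]}


def wrangle_data(params):
    # Split the user data set into a per-region view and a global view.
    regional = {}
    for user, region in params["UserDataSet"]:
        regional.setdefault(region, []).append(user)
    global_users = [user for user, _ in params["UserDataSet"]]
    return {"RegionalDataSet": regional, "GlobalDataSet": global_users}


def regional_model(params):
    # Placeholder "model": number of users per region.
    counts = {r: len(users) for r, users in params["RegionalDataSet"].items()}
    return {"RegionalModel": counts}


def global_model(params):
    # Placeholder "model": total number of users.
    return {"GlobalModel": len(params["GlobalDataSet"])}


# step name -> (function, names of the steps it depends on)
STEPS = {
    "GetUsers": (get_users, []),
    "WrangleData": (wrangle_data, ["GetUsers"]),
    "RegionalModel": (regional_model, ["WrangleData"]),
    "GlobalModel": (global_model, ["WrangleData"]),
}


def run_workflow(steps):
    """Run steps in dependency order, merging each step's outputs into params."""
    graph = {name: deps for name, (_, deps) in steps.items()}
    params = {}
    for name in TopologicalSorter(graph).static_order():
        func, _ = steps[name]
        params.update(func(params))
    return params


result = run_workflow(STEPS)
```

Here every output lands in one shared dictionary for brevity; a real orchestrator would more likely scope parameters per step and persist them between runs.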
Structured Constructs
Top Down or Bottom Up
Two Way Communication
Spark Step
Artifacts
• Step outputs tracked as Artifacts
• Visualization
• Memoization
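Memoization over tracked artifacts can be sketched as a store keyed by a hash of the step name and its inputs, so a step with unchanged inputs is not executed again. This is a minimal in-memory sketch under assumptions: the class `ArtifactStore`, its methods, and the example step are hypothetical and do not reflect how Meson stores artifacts.

```python
import hashlib
import json


class ArtifactStore:
    """Caches step outputs (artifacts) by a hash of step name and inputs."""

    def __init__(self):
        self._artifacts = {}
        self.executions = 0  # how many times a step function actually ran

    @staticmethod
    def key(step_name, inputs):
        # Inputs must be JSON-serializable; sort_keys makes the hash stable.
        payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step_name, func, inputs):
        key = self.key(step_name, inputs)
        if key not in self._artifacts:
            self._artifacts[key] = func(**inputs)
            self.executions += 1
        return self._artifacts[key]


def train(learning_rate, num_trees):
    # Placeholder for an expensive training step.
    return {"score": round(learning_rate * num_trees, 3)}


store = ArtifactStore()
first = store.run("train", train, {"learning_rate": 0.1, "num_trees": 100})
second = store.run("train", train, {"num_trees": 100, "learning_rate": 0.1})
third = store.run("train", train, {"learning_rate": 0.01, "num_trees": 100})
```

The second call reuses the stored artifact even though the input keys arrive in a different order, while the third call has different inputs and runs the step again.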
Multi Tenancy
• Resource Attributes
• spark.cores.max
• spark.executor.memory
• spark.mesos.constraints
• Dynamic Resource Allocation
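The resource attributes above are ordinary Spark configuration properties, so one way to apply per-tenant limits is to pass them to `spark-submit` as `--conf key=value` pairs. The sketch below only builds the argument list. The helper name, the application file, and all values are assumptions for illustration, including the Mesos attribute `team:personalization`. Dynamic resource allocation is expressed here with Spark's `spark.dynamicAllocation.enabled` property.

```python
def spark_submit_args(tenant_conf, app):
    """Build a spark-submit argument list from per-tenant Spark properties."""
    args = ["spark-submit"]
    for key in sorted(tenant_conf):
        args += ["--conf", f"{key}={tenant_conf[key]}"]
    args.append(app)
    return args


# Hypothetical limits for one tenant; values are illustrative only.
tenant_conf = {
    "spark.cores.max": "200",                          # cap on total cores for the app
    "spark.executor.memory": "8g",                     # memory per executor
    "spark.mesos.constraints": "team:personalization", # restrict to matching Mesos agents
    "spark.dynamicAllocation.enabled": "true",         # scale executors with load
}
args = spark_submit_args(tenant_conf, "train_model.py")
```

Keeping the limits in a per-tenant mapping, rather than in each job's code, lets the orchestrator enforce them uniformly across jobs.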
Cluster Management
• Red-Black software updates
• Scale up / Scale down
Meson/Spark Cluster
• 100s of Concurrent Jobs
• 700 Nodes
• 5000 Cores
• 25 TB Memory
• Apps: Meson Workflow System, Spark and Dockers
• Few smaller clusters
Antony Arokiasamy Kedar Sadekar
@aasamy
/aasamy
@kedar_sadekar
/kedar-sadekar