Heterogeneous Workflows with Spark at Netflix
Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure


Page 1: Heterogeneous Workflows With Spark At Netflix

Heterogeneous Workflows with Spark at Netflix

Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure

Page 2: Heterogeneous Workflows With Spark At Netflix

Help members find content to watch and enjoy to maximize member satisfaction and retention

Page 3: Heterogeneous Workflows With Spark At Netflix

Everything is a Recommendation

Recommendations are driven by Machine Learning

Ranking

Rows

Page 4: Heterogeneous Workflows With Spark At Netflix

Machine Learning Pipeline

User Selection → Feature Generation → Model Training → Model Validation → Publish Model
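The pipeline stages named on this slide can be sketched as a chain of functions that each extend a shared context. This is only an illustration: the stage order is inferred from the slide labels, and the stage bodies, the `Context` type, and `run_pipeline` are hypothetical placeholders rather than anything from the talk.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], Context]]


def user_selection(ctx: Context) -> Context:
    # Placeholder: choose the users whose data feeds the pipeline.
    return {**ctx, "users": [1, 2, 3]}


def feature_generation(ctx: Context) -> Context:
    # Placeholder: one toy feature per selected user.
    return {**ctx, "features": {u: [float(u)] for u in ctx["users"]}}


def model_training(ctx: Context) -> Context:
    # Toy "model": the mean of the first feature across users.
    values = [f[0] for f in ctx["features"].values()]
    return {**ctx, "model": sum(values) / len(values)}


def model_validation(ctx: Context) -> Context:
    # Placeholder check standing in for real validation metrics.
    return {**ctx, "valid": ctx["model"] > 0}


def publish_model(ctx: Context) -> Context:
    # Only a validated model is marked as published.
    return {**ctx, "published": ctx["valid"]}


PIPELINE: List[Stage] = [
    ("user_selection", user_selection),
    ("feature_generation", feature_generation),
    ("model_training", model_training),
    ("model_validation", model_validation),
    ("publish_model", publish_model),
]


def run_pipeline(stages: List[Stage]) -> Context:
    """Run the stages in order, threading the context through each one."""
    ctx: Context = {}
    completed: List[str] = []
    for name, stage in stages:
        ctx = stage(ctx)
        completed.append(name)
    ctx["completed"] = completed
    return ctx
```

Calling `run_pipeline(PIPELINE)` returns the final context, including the toy model and the list of completed stage names.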

Page 5: Heterogeneous Workflows With Spark At Netflix

Machine Learning Pipeline Challenges

•  Innovation
    •  Heterogeneous Environments

•  Spark
    •  Native Support

•  Separate Orchestration and Execution

•  Multi Tenancy

•  Machine Learning Constructs
    •  Parameter Sweep – 30k Dockers
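A parameter sweep, as mentioned on this slide, amounts to expanding a grid of candidate values into one configuration per combination, with each configuration presumably handed to its own container. The sketch below shows only the expansion step; `expand_sweep` and the parameter names in the usage note are hypothetical.

```python
from itertools import product
from typing import Any, Dict, List, Sequence


def expand_sweep(grid: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of the grid values."""
    names = sorted(grid)  # fixed ordering keeps the expansion deterministic
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]
```

For example, `expand_sweep({"learning_rate": [0.01, 0.1], "regularization": [0.0, 1.0, 10.0]})` yields six configurations. The number of configurations is the product of the list lengths, which is how a handful of parameters can grow into sweeps on the scale the slide mentions.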

Page 6: Heterogeneous Workflows With Spark At Netflix

Meson Workflow System

•  General Purpose Workflow Orchestration and Scheduling framework
    •  Delegates execution to resource managers like Mesos

•  Optimized for Machine Learning Pipelines and Visualization

•  Check out the Blog
    •  http://bit.ly/mesonws or techblog.netflix.com

•  Plan to open source soon
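The split between orchestration and execution described on this slide can be illustrated with a small sketch in which the orchestrator only decides the order of steps and hands each one to an executor. This is not Meson's API: `LocalExecutor` is a hypothetical in-process stand-in for a resource manager such as Mesos, and `orchestrate` is an assumed name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Set


class LocalExecutor:
    """Hypothetical stand-in for a resource manager; runs steps in-process."""

    def __init__(self) -> None:
        self.launched: List[str] = []

    def launch(self, step_name: str, command: Callable[[], Any]) -> Any:
        self.launched.append(step_name)
        return command()


def orchestrate(
    dag: Dict[str, Set[str]],
    commands: Dict[str, Callable[[], Any]],
    executor: LocalExecutor,
) -> Dict[str, Any]:
    """Run every step after its dependencies, delegating execution.

    `dag` maps each step name to the set of steps it depends on.
    """
    results: Dict[str, Any] = {}
    for step in TopologicalSorter(dag).static_order():
        results[step] = executor.launch(step, commands[step])
    return results
```

Because the orchestrator depends only on the `launch` method, a different executor (for instance one that submits work to a cluster) could be swapped in without changing the scheduling logic.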

Page 7: Heterogeneous Workflows With Spark At Netflix

Meson Architecture

Page 8: Heterogeneous Workflows With Spark At Netflix

Standard and Custom Step Types

Page 9: Heterogeneous Workflows With Spark At Netflix

Parameter Passing

[Diagram labels: Hive Query, Get Users, User Data Set, Wrangle Data, Regional Data Set, Global Data Set, Regional Model, Global Model]
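One way to picture parameter passing between steps is a shared dictionary that each step reads from and adds its named outputs to. The sketch below reuses the labels from this slide, but the data flow between them is an assumption, and every function body and sample record is a hypothetical placeholder (for instance, `get_users` returns fixed rows instead of running a Hive query).

```python
from typing import Any, Callable, Dict, List

Params = Dict[str, Any]


def get_users(params: Params) -> Params:
    # Stand-in for the Hive query step: (user, region) rows.
    return {"user_data_set": [("u1", "US"), ("u2", "BR"), ("u3", "US")]}


def wrangle_data(params: Params) -> Params:
    # Split the user data set into per-region and global views.
    regional: Dict[str, List[str]] = {}
    for user, region in params["user_data_set"]:
        regional.setdefault(region, []).append(user)
    return {
        "regional_data_set": regional,
        "global_data_set": [user for user, _ in params["user_data_set"]],
    }


def regional_model(params: Params) -> Params:
    # Toy "model": user count per region.
    counts = {r: len(users) for r, users in params["regional_data_set"].items()}
    return {"regional_model": counts}


def global_model(params: Params) -> Params:
    # Toy "model": total user count.
    return {"global_model": len(params["global_data_set"])}


def run_workflow(steps: List[Callable[[Params], Params]]) -> Params:
    """Run steps in order; each step's outputs become inputs for later steps."""
    params: Params = {}
    for step in steps:
        params.update(step(params))
    return params
```

Running `run_workflow([get_users, wrangle_data, regional_model, global_model])` leaves all intermediate data sets and both toy models in the returned dictionary.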

Page 10: Heterogeneous Workflows With Spark At Netflix

Structured Constructs

Page 11: Heterogeneous Workflows With Spark At Netflix

Top Down or Bottom Up

Page 12: Heterogeneous Workflows With Spark At Netflix

Two Way Communication

Page 13: Heterogeneous Workflows With Spark At Netflix

Spark Step

Page 14: Heterogeneous Workflows With Spark At Netflix

Artifacts

•  Step outputs tracked as Artifacts

•  Visualization

•  Memoization
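Memoization of step outputs can be sketched as a store that keys each artifact by the step name and its inputs, so a repeated step returns the saved artifact instead of running again. The slides do not say how Meson computes its keys; the SHA-256 hash over JSON used here, and the `ArtifactStore` class itself, are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ArtifactStore:
    """Hypothetical in-memory store that memoizes step outputs."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def key(step_name: str, inputs: Dict[str, Any]) -> str:
        # Inputs must be JSON-serializable; sort_keys makes the key stable.
        payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step_name: str, func: Callable[..., Any], inputs: Dict[str, Any]) -> Any:
        """Return the stored artifact if present, otherwise run and record it."""
        k = self.key(step_name, inputs)
        if k in self._artifacts:
            self.hits += 1
            return self._artifacts[k]
        result = func(**inputs)
        self._artifacts[k] = result
        return result
```

A second call to `store.run("train", train, {"rate": 0.5})` with the same inputs returns the saved artifact without calling `train` again, while changing any input produces a new key and a fresh run.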

Page 15: Heterogeneous Workflows With Spark At Netflix

Multi Tenancy

•  Resource Attributes
    •  spark.cores.max
    •  spark.executor.memory
    •  spark.mesos.constraints
    •  Dynamic Resource Allocation
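The resource attributes listed on this slide are ordinary Spark configuration properties, so per-tenant limits could be turned into `spark-submit` `--conf` arguments along the following lines. The property names `spark.cores.max`, `spark.executor.memory`, and `spark.mesos.constraints` come from the slide, and `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled` are standard Spark properties. The tenant dictionary keys, the sample values, and the function name are hypothetical.

```python
from typing import Any, Dict, List


def spark_submit_conf(tenant: Dict[str, Any]) -> List[str]:
    """Build spark-submit --conf arguments from hypothetical tenant settings."""
    conf = {
        "spark.cores.max": str(tenant["max_cores"]),
        "spark.executor.memory": tenant["executor_memory"],
        "spark.mesos.constraints": tenant["mesos_constraints"],
    }
    if tenant.get("dynamic_allocation"):
        conf["spark.dynamicAllocation.enabled"] = "true"
        # Dynamic allocation is commonly paired with the external shuffle service.
        conf["spark.shuffle.service.enabled"] = "true"
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

For example, a tenant described as `{"max_cores": 64, "executor_memory": "8g", "mesos_constraints": "tenant:personalization", "dynamic_allocation": True}` produces five `--conf` pairs that can be appended to a `spark-submit` command line. The constraint value shown is an invented attribute; real constraints must match attributes configured on the Mesos agents.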

Page 16: Heterogeneous Workflows With Spark At Netflix

Cluster Management

•  Red-Black software updates

•  Scale up / Scale down

Page 17: Heterogeneous Workflows With Spark At Netflix

Meson/Spark Cluster

•  100s of Concurrent Jobs

•  700 Nodes

•  5000 Cores

•  25 TB Memory

•  Apps: Meson Workflow System, Spark and Dockers

•  Few smaller clusters
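The headline figures on this slide imply rough per-node averages, which the short calculation below works out. It assumes decimal units (1 TB = 1000 GB) and says nothing about the actual node types, which may well be mixed.

```python
# Figures taken from the slide; decimal units (1 TB = 1000 GB) are an assumption.
NODES = 700
CORES = 5000
MEMORY_TB = 25


def per_node_averages(nodes: int, cores: int, memory_tb: float) -> tuple:
    """Return (average cores per node, average memory per node in GB)."""
    return cores / nodes, memory_tb * 1000 / nodes


avg_cores, avg_memory_gb = per_node_averages(NODES, CORES, MEMORY_TB)
print(f"{avg_cores:.1f} cores and {avg_memory_gb:.1f} GB per node on average")
```

That works out to roughly 7 cores and 36 GB of memory per node on average.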

Page 18: Heterogeneous Workflows With Spark At Netflix

Antony Arokiasamy
@aasamy
/aasamy

Kedar Sadekar
@kedar_sadekar
/kedar-sadekar