31
U B E R | Data Hadoop Infrastructure @Uber Past , Present and Future Mayank Bansal

Hadoop Infrastructure @Uber Past , Present and Future

Embed Size (px)

Citation preview

Page 1: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HadoopInfrastructure@UberPast,PresentandFutureMayankBansal

Page 2: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

“Transporta=onasreliableasrunningwater,everywhere,foreveryone”

Uber’sMission

75+Countries 500+Ci=es

Andgrowing…

Page 3: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HowUberworks

Page 4: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HowUberworks

Page 5: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HowUberworks

Page 6: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

DataDrivenDecisions

Page 7: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

DataInfraOnceUpona8me..(2014)

Kafka Logs

Key-Val DB

RDBMS DBs

S3

Applica=ons

ETL

BusinessOps

A/BExperiments

Adhoc Analytics

CityOps

Vertica DataWarehouse

Data

Science

EMR

Page 8: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

DataInfrastructureToday

Kafka8 Logs

Schemaless DB

SOA DBs

Service Accounts

ETL MachineLearning

Experimenta=on

Data Science

Adhoc Analytics Ops/DataScience

HDFS

CityOpsDataScience

Spark|PrestoHive

Page 9: Hadoop Infrastructure @Uber Past , Present and Future

FewTakeaways…

●  StrictSchemaManagement○  BecauseourlargestdataaudienceareSQL

Savvy!(1000sofUberOps!)○  SQL=StrictSchema

●  BigDataProcessingToolsUnlocked-Hive,PrestoandSpark○  MigrateSQLsavvyusersfromVer=catoHive

&Presto(1000sofOps&100sofdatascien=sts&analysts)

○  Sparkformoreadvancedusers-100sofdatascien=sts

Page 10: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HadoopEvolu8on@ebay

2014

1XNodes1XPB

2015 10X Nodes 4X PB Data 3000+ node 30,000+ cores 50+ PB

2016 90X Nodes

40X PB Data

HadoopEvolu8on@Uber

Page 11: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HadoopClusterU=liza=on

•  Overprovisioningforthepeakloads.

•  Overcapacityforan=cipa=onoffuturegrowth

Page 12: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HadoopEvolu8on@ebay

20140Nodes

2015 X Nodes

2016

300XNodes

MesosEvolu8on@Uber

Page 13: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

MesosClusterU=liza=on

•  Overprovisioningforthepeakloads

•  Overcapacityforan=cipa=onoffuturegrowth

Page 14: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

EndGoal

Online

Presto

Page 15: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

Whatweneed?

GLOBALVIEWOFRESOURCES

Page 16: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

AvailableResourceManagers

Page 17: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

MesosvsYARN

YARN MESOSSingleLevelScheduler TwoLevelSchedulerUseCgroupsforisola=on UseCgroupsforIsola=onCPU,Memoryasaresource CPU,MemoryandDiskasa

resource

WorkswellwithHadoopworkloads Workswellwithlongerrunningservices

YARNsupport=mebasedreserva=ons

Mesosdoesnothavesupportofreserva=ons

Dominantresourcescheduling Schedulingisdonebyframeworksanddependsoncasetocasebasis

ScalesBegerSimilarIsola=on

Diskisbeger

ThisisImportant

ImpforbatchSLA’sBegerforbatch

Page 18: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

Let’s8edthemtogether

YARNisgoodforHadoopMesosisgoodforLongerRunningServices

InaNutshell

Page 19: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

Page 20: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

•  MyriadisMesosFrameworkforApacheYARN

•  MesosmanagesDataCenterresources•  YARNmanagesHadoopworkloads

•  Myriad•  GetsresourcesfromMesos•  LaunchesNodeManagers

Page 21: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

•  YARNwillhandleresourceshanded

overtoit.•  Mesoswillworkonrestoftheresources

Myriad’sLimita8onsSta=cResourcePar==oning

Page 22: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

•  YARNwillneverbeabletodooversubscrip=on.•  NodeManagerwillgoaway•  Fragmenta=onofresources

•  Mesosoversubscrip=oncankillYARNtoo

Myriad’sLimita8onsResourceOverSubscrip=on

Page 23: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

•  NoGlobalQuotaEnforcement

•  NoGlobalPriori=es

Myriad’sLimita8ons

Page 24: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

•  Elas=cResourceManagement

•  BinPacking•  Stability

•  LongList…

Myriad’sLimita8ons

Page 25: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

UnifiedScheduler

Page 26: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

HighLevelCharacteris8cs

•  GlobalQuotaManagement

•  CentralSchedulingpolicies

•  Oversubscrip=onforbothOnlineandBatch

•  Isola=onandbinpacking

•  SLAguaranteesatGlobalLevel

Page 27: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

UnifiedScheduler

Page 28: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

FewTakeaways…

•  Weneedoneschedulinglayeracrossallworkloads

•  Par==oningresourcesarenotgood •  Atleastcansave30%resources

•  StabilityandsimplicitywinsinProduc=on•  Mul=LevelofresourceManagementandschedulingwillnotbescalable

Page 29: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

Page 30: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

Ques=ons?

[email protected]@apache.org

Page 31: Hadoop Infrastructure @Uber Past , Present and Future

U B E R | Data

ThankYou!!!