Hadoop @ eBay Marketplaces, Ming Ma, June 27th, 2013


Page 1: Hadoop  @ eBay Marketplaces

Hadoop @ eBay Marketplaces

Ming Ma

June 27th, 2013

Page 2: Hadoop  @ eBay Marketplaces

Overview
• Hadoop growth @ eBay Marketplaces
• Availability study
• Opportunities ahead

Page 3: Hadoop  @ eBay Marketplaces


Big Data @ eBay Marketplaces

120+ Million Active users

300+ Million search queries every single day

350+ Million items available

Page 4: Hadoop  @ eBay Marketplaces


Data Sets

• Inventory Data
  – Product Listings, Catalogue, Quantity etc.

• Transactional Data
  – Buying, Returning etc.

• User Behavioral Data
  – Click stream, comments, suggestions, user activities etc.

• Customer profiles
  – Buyer, Seller, Partner information etc.

• Machine data
  – Logs, application data etc.

Page 5: Hadoop  @ eBay Marketplaces


Hadoop Evolution @ eBay Marketplaces

• 2007: Single-digit nodes
• 2009: Search
  – 10s of nodes
• 2010: Shared cluster
  – 100s of nodes
  – 1000s+ cores
  – PB
  – CDH2
• 2011: Shared clusters
  – 1000s of nodes
  – 10,000+ cores
  – 10s of PB
  – Wilma (0.20)
• 2012: Shared clusters
  – 1000s of nodes
  – 10,000+ cores
  – 10s of PB
• 2013: Shared clusters
  – 4k+ nodes
  – 40,000+ cores
  – 50s of PB
  – HDP

Page 6: Hadoop  @ eBay Marketplaces


Shared vs. Dedicated Clusters

• Shared clusters
  – 10s of PB and 10s of thousands of slots per cluster
  – Run HDP 1.2
  – Used primarily for analytics of user behavior and inventory
  – Mix of production and ad-hoc jobs
  – Mix of MR, Hive, Pig, Cascading etc.
  – Hadoop and HBase security enabled

• Dedicated clusters
  – Very specific use cases like Index Building
  – Tight SLAs for jobs (on the order of minutes)
  – Immediate revenue impact
  – Usually smaller than our shared clusters, but still big (100s of nodes…)

Page 7: Hadoop  @ eBay Marketplaces


Job Distribution by Type

Page 8: Hadoop  @ eBay Marketplaces


Use Case Examples

• Cassini, full re-write of eBay’s search engine:
  – Use MR to build full and incremental near-real-time indexes
  – Data for indexing is stored in HBase for efficient updates and random reads
  – Strong SLAs
  – Run on dedicated clusters

• Related and similar Items recommendations:
  – Use transactional data, click stream data, search index, etc.
  – Production MR jobs on a shared cluster

• Analytics dashboard:
  – Run Mobius MR jobs to join click stream data and transactional data
  – Store summary data in HBase
  – Web application to query HBase
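The analytics dashboard use case joins click stream data with transactional data and stores a summary. A minimal sketch of a reduce-side join in plain Python is shown here; it is not the Mobius API, and the join key `item_id`, the `amount` field, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict


def map_clicks(click):
    # Emit (join key, tagged record). "item_id" is an assumed join key.
    return click["item_id"], ("click", click)


def map_transactions(txn):
    # Tag transaction records so the reducer can tell the two sources apart.
    return txn["item_id"], ("txn", txn)


def reduce_join(key, tagged_records):
    """Summarize one key: click count, purchase count, and revenue."""
    clicks = [rec for tag, rec in tagged_records if tag == "click"]
    txns = [rec for tag, rec in tagged_records if tag == "txn"]
    return {
        "item_id": key,
        "clicks": len(clicks),
        "purchases": len(txns),
        "revenue": sum(t["amount"] for t in txns),  # "amount" is hypothetical
    }


def run_join(clicks, transactions):
    # Stand-in for the shuffle phase: group tagged records by join key.
    groups = defaultdict(list)
    for click in clicks:
        key, value = map_clicks(click)
        groups[key].append(value)
    for txn in transactions:
        key, value = map_transactions(txn)
        groups[key].append(value)
    # One summary row per key, in sorted key order.
    return [reduce_join(key, groups[key]) for key in sorted(groups)]
```

In a real deployment each summary row would be written to a store such as HBase, as the slide describes; here the rows are simply returned as a list.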

Page 9: Hadoop  @ eBay Marketplaces


eBay Hadoop Data Platform

• Data Ingest: Extract, Load, Validate, Transform
• Clients: Java, Scala, Pig, Hive, Cascading, Mobius
• Hadoop: Behavioral, Transactional, Inventory
• Metadata: Metastore, Type System, Service API
• Data Access: Java POJO, Pig UDF, Hive UDF
• Tools: ETL Monitor, Metadata Mgmt, Data Catalog, User Mgmt

Page 10: Hadoop  @ eBay Marketplaces


Platform Innovation

• Many reliability improvements

• New Security features
  – Multi-realm support
  – Encryption
  – HTTPS in Hadoop 1

• Hadoop 2.0
  – MR 1 and YARN binary compatibility

• Automation for operations
  – Machine decommission and re-commission process

• Data and user management
  – Metadata management
  – User account provisioning

Page 11: Hadoop  @ eBay Marketplaces

Overview
• Hadoop growth @ eBay
• Availability study
• Next steps

Page 12: Hadoop  @ eBay Marketplaces


Case study – defective applications

• HBase: A test app created heavy write load
  – Test app used all region server RPC threads
  – All RPCs are blocked by region flush
  – RPC requests from production HBase MR job timed out

• HDFS: An app created lots of small files inside map tasks
  – NN RPC queue length spiked
  – DN heartbeat RPC can’t be processed
  – HDFS replication storm

Page 13: Hadoop  @ eBay Marketplaces


Case study – platform bugs

• Hadoop:
  – DFSClient.LeaseChecker thread leak in job tracker -> bi-weekly JT restart
  – dfs.datanode.balance.bandwidthPerSec set to 200MB -> big performance impact

• JVM:
  – Leap second bug -> all clusters were down at the same time
  – GC setting -> NN full GC happened regularly

• OS:
  – “Divide by zero” in CentOS and RH 6.1 -> machine reboot

Page 14: Hadoop  @ eBay Marketplaces


Case study – cluster maintenance

• Code rollout:
  – NN SPOF
  – RPC compatibility between old and new versions

• Hadoop configuration change:
  – Likely required Hadoop JVM restart
  – Rolling restart has impact on job latency
  – Datanode rolling restart caused HBase region servers to exit

• Machines re-commission:
  – Hadoop version drift
  – OS configuration bug reappeared
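Version drift after re-commission can be caught with a simple consistency check before a machine rejoins the cluster. The sketch below is one possible approach, not the tooling used at eBay: it assumes a hypothetical mapping from host name to reported Hadoop version and treats the most common version as the expected one.

```python
from collections import Counter


def find_version_drift(node_versions):
    """Return (expected_version, sorted hosts whose version differs).

    node_versions is a hypothetical dict of host name -> version string.
    The most common version is assumed to be the intended one.
    """
    if not node_versions:
        return None, []
    counts = Counter(node_versions.values())
    expected, _ = counts.most_common(1)[0]
    drifted = sorted(
        host for host, version in node_versions.items() if version != expected
    )
    return expected, drifted
```

A re-commission workflow could refuse to add any host that appears in the drifted list until it has been reimaged or upgraded.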

Page 15: Hadoop  @ eBay Marketplaces


Metrics

• Definition:
  – Availability = MTBF / (MTBF + MDT), where MTBF is the mean time between failures and MDT is the mean down time
  – Down time includes planned maintenance

• Measurement:
  – Synthetic transaction approach
  – Run a regular canary word count MR job
  – Canary job times out in X minutes
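As a worked example of the definition and the canary measurement, the sketch below computes availability from MTBF and MDT, and separately estimates it from canary results. The estimate assumes canary runs are evenly spaced, so the fraction of runs that finish within the timeout approximates the fraction of time the cluster is up; the function names and the `(succeeded, minutes)` record shape are hypothetical.

```python
def availability(mtbf_hours, mdt_hours):
    """Availability = MTBF / (MTBF + MDT); both inputs in the same unit."""
    total = mtbf_hours + mdt_hours
    if total <= 0:
        raise ValueError("MTBF + MDT must be positive")
    return mtbf_hours / total


def canary_availability(runs, timeout_minutes):
    """Fraction of canary runs that succeeded within the timeout.

    runs is a list of (succeeded, minutes) tuples, one per canary job.
    A run that fails or exceeds the timeout counts as down time.
    """
    if not runs:
        raise ValueError("no canary runs to evaluate")
    ok = sum(
        1 for succeeded, minutes in runs
        if succeeded and minutes <= timeout_minutes
    )
    return ok / len(runs)
```

For instance, 99 hours between failures with 1 hour of mean down time gives 99 / (99 + 1), that is 99% availability; planned maintenance would be included in the down time figure.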

Page 16: Hadoop  @ eBay Marketplaces


More about metrics

• Availability != MTTR (mean time to recover)
  – MTTR is more important for applications like Cassini index build

• What is considered “available”?
  – Performance degradation
  – % of live slave nodes
  – Other entry points such as Web UI
  – Core data set availability
  – Multi-tenancy scenario

Page 17: Hadoop  @ eBay Marketplaces


Ways to improve availability

• Automation
  – Use Puppet and daemontools
  – Monitor system health

• Redundancy
  – Namenode HA
  – Hot standby region server

• Isolation
  – HDFS federation
  – Region server grouping

• Congestion control
  – RPC congestion control, HADOOP-9640
  – Apply to both HDFS and HBase

• Features to enable “no downtime maintenance”
  – Dynamic configuration update
  – RPC compatibility
  – Better ways to do rolling restart
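The congestion control idea can be illustrated with a small scheduling sketch: callers that account for a large share of recent RPCs are placed in lower-priority queues, and queues are drained by weighted round-robin so that a heavy caller cannot starve everyone else. This is a simplified illustration in the spirit of HADOOP-9640, not its implementation; the class name, thresholds, and weights are all assumptions.

```python
from collections import Counter, deque


class FairCallQueueSketch:
    """Toy multi-level RPC queue with weighted round-robin draining."""

    def __init__(self, thresholds=(0.25, 0.5), weights=(4, 2, 1)):
        # One queue per priority level; level 0 is the highest priority.
        if len(weights) != len(thresholds) + 1:
            raise ValueError("need exactly one more weight than thresholds")
        self.thresholds = thresholds
        self.queues = [deque() for _ in weights]
        self.calls_seen = Counter()
        # Expand weights into a repeating schedule, e.g. [0, 0, 0, 0, 1, 1, 2].
        self._schedule = [lvl for lvl, w in enumerate(weights) for _ in range(w)]
        self._pos = 0

    def _level_for(self, caller):
        # A caller's level grows with its share of all calls seen so far.
        # Note: a lone caller has a 100% share and lands in the lowest level.
        share = self.calls_seen[caller] / sum(self.calls_seen.values())
        return sum(1 for t in self.thresholds if share >= t)

    def put(self, caller, call):
        self.calls_seen[caller] += 1
        self.queues[self._level_for(caller)].append((caller, call))

    def take(self):
        # Walk the schedule once, skipping empty queues.
        for _ in range(len(self._schedule)):
            level = self._schedule[self._pos]
            self._pos = (self._pos + 1) % len(self._schedule)
            if self.queues[level]:
                return self.queues[level].popleft()
        return None  # every queue is empty
```

With this sketch, a test app that floods the queue ends up in the lowest-priority level, so a production job's call that arrives later is still served first. A production design would also need to age out old call counts, which this sketch omits.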

Page 18: Hadoop  @ eBay Marketplaces

Overview
• Hadoop growth @ eBay
• Availability study
• Next steps

Page 19: Hadoop  @ eBay Marketplaces


Opportunities ahead

• More automation

• Availability and scalability
  – Hadoop 2.0
  – HBase fast recovery time

• Multi-tenancy
  – Run production jobs with strong SLAs in big shared clusters
  – QoS in HDFS and HBase

• New scenarios
  – Interactive Analysis with SQL language
  – Direct Hadoop Access from dev machines