ApacheCon 2013: Hadoop



[Chart data omitted: Network Usage; X axis in minutes, Y axis in MB/sec.]

[Chart data omitted: Memory Usage; X axis in minutes.]

Taking the guesswork out of your Hadoop Infrastructure

Steve Watt @wattsteve

Agenda: Clearing up common misconceptions

The enterprise perspective is also different.

Web-scale Hadoop origins:
- Single/dual socket, 1+ GHz
- 4-8 GB RAM
- 2-4 cores
- 1 x 1GbE NIC
- (2-4) x 1 TB SATA drives

Commodity in 2013:
- Dual socket, 2+ GHz
- 24-48 GB RAM
- 4-6 cores
- (2-4) x 1GbE NICs
- (4-14) x 2 TB SATA drives

The average enterprise customer runs 1 or 2 racks in production. In that scenario, you have redundant switches in each rack, and you RAID-mirror the OS and Hadoop runtime drives while running the data drives as JBOD, because losing racks and servers is costly. Very different from the web-scale perspective.

You can quantify what is right for you

Balancing Performance and Storage Capacity with Price

[Diagram: balancing Price, Performance, and Storage Capacity.]

Hadoop slave server configurations are about balancing the following:
- Performance (keeping the CPU as busy as possible)
- Storage capacity (important for HDFS)
- Price (managing the above at a price you can afford)

Commodity servers do not mean cheap servers. At scale, you want to fully optimize performance and storage capacity for your workloads to keep costs down.

We've profiled our Hadoop applications, so we know what type of infrastructure we need.

Said no one. Ever. So, as any good infrastructure designer will tell you, you begin by profiling your Hadoop applications.

Profiling your Hadoop Cluster

It's pretty simple:
1. Instrument the cluster
2. Run your workloads
3. Analyze the numbers

High-Level Design Goals

Don't do paper exercises. Hadoop has a way of blowing all your hypotheses out of the water.

Let's walk through a 10 TB TeraSort on a full 42u rack:
- 2 x HP DL360p (JobTracker and NameNode)
- 18 x HP DL380p Hadoop slaves (18 maps, 12 reducers)
- Each slave: 64 GB RAM, dual 6-core Intel 2.9 GHz, 2 x HP P420 Smart Array controllers, 16 x 1 TB SFF disks, 4 x 1GbE bonded NICs

Instrumenting the Cluster

Analysis using the Linux SAR Tool

An outer script starts the SAR gathering scripts on each node, then starts the Hadoop job.

The SAR scripts on each node gather I/O, CPU, memory, and network metrics for that node for the duration of the job.
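A minimal sketch of such an outer script, assuming passwordless SSH to each worker; the node list, sample count, and job invocation are illustrative rather than the presenter's actual script:

#!/bin/bash
# Start SAR collection on each worker, run the Hadoop job, then stop.
NODES="wkr01 wkr02"                  # extend to all workers

for node in $NODES; do
  # One sample every 30 seconds, binary output for later sadf conversion.
  ssh "$node" "nohup sar -o sar_test.dat 30 2880 >/dev/null 2>&1 &"
done

# Run the workload while SAR is sampling (a 10 TB TeraSort here).
hadoop jar hadoop-examples.jar terasort /terasort/input /terasort/output

for node in $NODES; do
  ssh "$node" "pkill -x sar"         # stop collection once the job ends
done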

Upon completion, the SAR data is converted to CSV and loaded into MySQL so we can do ad-hoc analysis of the data.

Aggregations/summations are done via SQL.
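As an illustration, a sketch of that load-and-query step; the database, table, and column names here are assumptions, not the actual schema:

# sadf -d emits semicolon-delimited CSV with one header line.
mysql sar -e "LOAD DATA LOCAL INFILE 'wkr01_net_dev.csv' INTO TABLE net_dev
              FIELDS TERMINATED BY ';' IGNORE 1 LINES;"
# Example aggregation: average total network throughput per node.
mysql sar -e "SELECT hostname, AVG(rx_mb_s + tx_mb_s) AS avg_mb_s
              FROM net_dev GROUP BY hostname;"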

Excel is used to generate charts.

The key is to capture the data. Use whatever framework you're comfortable with.

Examples

ssh wkr01 sadf -d sar_test.dat -- -u > wkr01_cpu_util.csv
ssh wkr01 sadf -d sar_test.dat -- -b > wkr01_io_rate.csv
ssh wkr01 sadf -d sar_test.dat -- -n DEV > wkr01_net_dev.csv

ssh wkr02 sadf -d sar_test.dat -- -u > wkr02_cpu_util.csv
ssh wkr02 sadf -d sar_test.dat -- -b > wkr02_io_rate.csv
ssh wkr02 sadf -d sar_test.dat -- -n DEV > wkr02_net_dev.csv

Performed for each node; the resulting CSVs are copied to the repository node.

They land on the repository node (repos) under /home/sar_repository; each file name is prefixed with the node name (i.e., wkrNN).
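The per-node commands generalize to a loop; a minimal sketch using the repos host and paths above:

# Extract CSVs for every worker, then stage them on the repository node.
for node in wkr01 wkr02; do
  ssh "$node" sadf -d sar_test.dat -- -u > "${node}_cpu_util.csv"
  ssh "$node" sadf -d sar_test.dat -- -b > "${node}_io_rate.csv"
  ssh "$node" sadf -d sar_test.dat -- -n DEV > "${node}_net_dev.csv"
done
scp ./wkr*_*.csv repos:/home/sar_repository/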

The I/O subsystem is at only around 10% of total throughput utilization

I/O Subsystem Test Chart

Run the dd tests first to understand the read and write throughput capabilities of your I/O subsystem.
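As an illustration, one possible dd pair per data disk; the mount point and sizes are assumptions, and the write test clobbers the target file, so point it at scratch space only:

# Sequential write throughput, 10 GB, bypassing the page cache.
dd if=/dev/zero of=/data01/ddtest bs=1M count=10240 oflag=direct
# Sequential read throughput of the same file.
dd if=/data01/ddtest of=/dev/null bs=1M iflag=direct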

X axis is time

Y axis is MB per second

TeraSort on this server design is not I/O bound: 1.6 GB/s is the upper bound, and utilization is less than 10% of that.

Network throughput utilization per server is less than a third of capacity

Network Subsystem Test Chart

A 1GbE NIC can drive up to 1 Gb/s for reads and 1 Gb/s for writes, so the four bonded NICs together provide roughly 4 Gb/s, or about 400 MB/s, total.

Y axis is throughput

X axis is elapsed time

Rx = received MB/sec
Tx = transmitted MB/sec
Tot = total MB/sec

CPU utilization is high and I/O wait is low: where you want to be

CPU Subsystem Test Chart

Each data point is taken every 30 seconds.
Y axis is percent utilization of CPUs.
X axis is the timeline of the run.

CPU utilization is captured by analyzing how busy each core is and averaging across cores.

I/O wait (one of roughly ten SAR CPU metrics) measures the percentage of time the CPU is waiting on I/O.
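For reference, both numbers come from sysstat's CPU view; a quick way to watch them live (interval and count are arbitrary):

# Average CPU utilization across cores, sampled every 30 seconds.
# Output columns include %user, %system, %iowait and %idle.
sar -u 30 4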

But generally speaking, you can design a pretty decent cluster infrastructure baseline that you can later optimize, provided you get these things right.

A high ratio of cached to total memory shows the CPU is not waiting on memory

Memory Subsystem Test Chart

X axis is elapsed time in minutes.

Y axis is memory utilization.

Percent cached comes from the SAR memory metric kbcached, which reports how much memory the kernel is using to cache data in the Linux page cache.

It means we're doing a good job of caching data, so the JVM is not having to do I/Os and incur I/O wait time (reflected in the previous slide).
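A quick way to eyeball that metric outside of a full run; interval and count are arbitrary:

# Memory view: kbcached is the amount of memory (in KB) the kernel
# holds in the page cache.
sar -r 30 4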

Tuning from your Server Configuration baseline

We've now established that the server configuration we used works pretty well. But what if you want to tune it, i.e., sacrifice some performance for cost, or remove an I/O, network, or memory limitation?

Compute:
- CPU type (Socket R 130 W vs. Socket B 95 W)
- Number of cores
- Number of memory channels
- 4 GB vs. 8 GB DIMMs

I/O subsystem:
- Types of disks / controllers
- Number of disks / controllers

Network:
- Type of network (1GbE or 10GbE)
- Number of NICs per server
- Switch port availability
- Deep buffering

Data center:
- Floor space (rack density)
- Power and cooling constraints
- Annual power and cooling costs of a cluster are 1/3 the cost of acquiring the cluster

Introducing: Hadoop Ops whack-a-mole


What you really want is a shared service...

The Server Building Blocks

NameNode and JobTracker Configurations

1u, 64 GB of RAM, 2 x 2.9 GHz 6-core Intel Socket R processors, 4 small form factor disks in a RAID configuration, 1 disk controller

Hadoop Slave Configurations

2u, 48 GB of RAM, 2 x 2.4 GHz 6-core Intel Socket B processors, 1 high-performing disk controller with twelve 2 TB large form factor data disks in a JBOD configuration

24 TB of storage capacity; low-power-consumption CPUs

In reality, you don't want Hadoop clusters popping up all over the place in your company; that becomes untenable to manage. You really want a single large cluster managed as a service.

If that's the case, then you don't want to optimize your cluster for just one workload; you want to optimize it for multiple concurrent workloads (dependent on the scheduler) and have a balanced cluster.

Single rack configuration

This rack is the initial building block; it configures all the key management services for a production cluster that can scale out to 4000 nodes.

Single Rack Config

- 42u rack enclosure
- 1u 1GbE switches: 2 x 1GbE TOR switches
- 1u, 2 sockets, 4 disks: 1 x management node, 1 x JobTracker node, 1 x NameNode
- Open for KVM switch
- Open 1u
- 2u, 2 sockets, 12 disks: (3-18) x worker nodes

Scale out configuration

The extension building block: add one or more racks of this block to the single-rack configuration to build out a cluster that scales to 4000 nodes.

Multi-Rack Config

- 1 or more 42u racks
- 1u 1GbE switches: 2 x 1GbE TOR switches
- Open for KVM switch
- Open 2u
- 2u, 2 sockets, 12 disks: worker nodes

Single rack RA

- 2 x 10GbE aggregation switches
- 1u switches
- 19 x worker nodes

System on a Chip and Hadoop

Amazing density advances: 270+ servers in a rack (compared to 21!).

Trevor Robinson from Calxeda is a member of our community working on this. He'd love some help:

[email protected]

Intel Atom and ARM processor server cartridges:
- 4-8 GB RAM
- 4 cores, 1.8 GHz
- 2-4 TB
- 1GbE NICs

Hadoop has come full circle.

Thanks for attending

Thanks to our partner HP for letting me present their findings. If you're interested in learning more about these findings, please check out hp.com/go/hadoop and see detailed reference architectures and varying server configurations for Hortonworks, MapR, and Cloudera.

Steve Watt [email protected] @wattsteve

YCSB workloads A and C were unable to drive the CPU

HBase testing results are still TBD

Two workloads tested:
- YCSB workload C: 100% read-only requests, random access to the DB using a unique primary key value
- YCSB workload A: 50% reads and 50% updates, random access to the DB using a unique primary key value
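For context, a hypothetical YCSB invocation for these two workloads; the binding name, column family, and properties vary by YCSB version and are assumptions here:

# Load the shared data set once, then run each workload against HBase.
bin/ycsb load hbase -P workloads/workloadc -p columnfamily=f1
bin/ycsb run hbase -P workloads/workloadc -p columnfamily=f1
bin/ycsb run hbase -P workloads/workloada -p columnfamily=f1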

When the data is cached in the HBase cache, performance is really good and we can occupy the CPU. Operation throughput (and hence CPU utilization) dwindles when the data exceeds the HBase cache but is still available in the Linux cache. Resolution TBD; related JIRA: HDFS-2246.


[Chart data omitted: CPU Utilization; X axis in minutes.]

[Chart data omitted: Disk Usage; X axis in minutes, Y axis in MB/sec.]