ApacheCon North America 2013 Presentation
[Chart: Network Usage; X axis: Minutes, Y axis: MB/sec]
[Chart: Memory Usage; X axis: Minutes, Y axis label left as "Axis Title" in source]
Taking the guesswork out of your Hadoop Infrastructure
Steve Watt @wattsteve
Agenda: Clearing up common misconceptions
The enterprise perspective is also different.
Web Scale Hadoop Origins: Single/Dual Socket 1+ GHz, 4-8 GB RAM, 2-4 Cores, 1 x 1GbE NIC, (2-4) x 1 TB SATA Drives
Commodity in 2013: Dual Socket 2+ GHz, 24-48 GB RAM, 4-6 Cores, (2-4) x 1GbE NICs, (4-14) x 2 TB SATA Drives
The average enterprise customer is running 1 or 2 racks in production. In that scenario, you have redundant switches in each rack, you RAID-mirror the OS and Hadoop runtime, and you JBOD the data drives, because losing racks and servers is costly. VERY different to the Web Scale perspective.
You can quantify what is right for you
Balancing Performance and Storage Capacity with Price
PRICE
Performance
StorageCapacity
$
Hadoop Slave Server Configurations are about balancing the following: Performance (Keeping the CPU as busy as possible)
Storage Capacity (Important for HDFS)
Price (Managing the above at a price you can afford)
Commodity Servers do not mean cheap. At scale you want to fully optimize performance and storage capacity for your workloads to keep costs down.
We've profiled our Hadoop applications, so we know what type of infrastructure we need
Said no one. Ever. So, as any good infrastructure designer will tell you, you begin by profiling your Hadoop applications.
Profiling your Hadoop Cluster
It's pretty simple:
Instrument the cluster
Run your workloads
Analyze the Numbers
High Level Design goals
Don't do paper exercises. Hadoop has a way of blowing all your hypotheses out of the water.
Let's walk through a 10 TB TeraSort on a full 42u rack: 2 x HP DL360p (JobTracker and NameNode)
18 x HP DL380p Hadoop Slaves (18 Maps, 12 Reducers)
64 GB RAM, Dual 6 core Intel 2.9 GHz, 2 x HP p420 Smart Array Controllers, 16 x 1TB SFF Disks, 4 x 1GbE Bonded NICs
Instrumenting the Cluster
Analysis using the Linux SAR Tool
Outer Script starts SAR gathering scripts on each node. Then starts Hadoop Job.
SAR Scripts on each node gather I/O, CPU, Memory and Network Metrics for that node for the duration of the job.
Upon completion the SAR data is converted to CSV and loaded into MySQL so we can do ad-hoc analysis of the data.
Aggregations/Summations done via SQL.
Excel is used to generate charts.
The key is to capture the data. Use whatever framework you're comfortable with.
Examples
ssh wkr01 sadf -d sar_test.dat -- -u > wkr01_cpu_util.csv
ssh wkr01 sadf -d sar_test.dat -- -b > wkr01_io_rate.csv
ssh wkr01 sadf -d sar_test.dat -- -n DEV > wkr01_net_dev.csv
ssh wkr02 sadf -d sar_test.dat -- -u > wkr02_cpu_util.csv
ssh wkr02 sadf -d sar_test.dat -- -b > wkr02_io_rate.csv
ssh wkr02 sadf -d sar_test.dat -- -n DEV > wkr02_net_dev.csv
Performed for each node; the results are copied to the repository node (/home/sar_repository). Each file name is prefixed with the node name (i.e., wkrnn).
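The per-node export commands above follow a fixed pattern, so rather than maintaining them by hand, a small loop can generate them. A sketch only: the wkrNN naming, the sar_test.dat file name, and the CSV suffixes are taken from the examples above; adjust them for your own cluster.

```shell
#!/bin/sh
# Generate the per-node sadf export commands for N worker nodes.
# Naming conventions (wkrNN, sar_test.dat, CSV suffixes) follow the
# examples above and are assumptions; change them to match your cluster.
N=${1:-2}
i=1
while [ "$i" -le "$N" ]; do
  node=$(printf 'wkr%02d' "$i")
  for pair in "u cpu_util" "b io_rate" "n_DEV net_dev"; do
    flags=${pair%% *}                      # sadf metric flag(s)
    suffix=${pair##* }                     # CSV file suffix
    flags=$(echo "$flags" | tr '_' ' ')    # "n_DEV" -> "n DEV"
    echo "ssh $node sadf -d sar_test.dat -- -$flags > ${node}_${suffix}.csv"
  done
  i=$((i + 1))
done
```

Run from the repository node, the emitted redirections land each CSV locally, which matches the collect-then-load-into-MySQL workflow described above.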
The I/O subsystem is at only around 10% of total throughput utilization
I/O Subsystem Test Chart
Run the dd tests first to understand the read and write throughput capabilities of your I/O subsystem
X axis is time
Y axis is MB per second
TeraSort for this server design is not I/O bound. 1.6 GB/s is the upper bound; this is less than 10% utilized.
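A minimal sketch of such a dd test, assuming GNU dd on Linux. TESTDIR and the 64 MB file size are illustrative placeholders: for real numbers, point TESTDIR at each JBOD data mount and use a file several times larger than RAM so the page cache cannot mask the disk.

```shell
#!/bin/sh
# Hedged dd throughput sketch; sizes and paths are placeholders.
TESTDIR=${TESTDIR:-/tmp}
F="$TESTDIR/ddtest.$$"

# Sequential write; conv=fdatasync forces data to disk before dd
# reports, so the MB/s figure reflects the device, not the cache.
dd if=/dev/zero of="$F" bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1

# Sequential read back. For an honest read test, first drop the page
# cache (needs root): echo 3 > /proc/sys/vm/drop_caches
dd if="$F" of=/dev/null bs=1M 2>&1 | tail -n 1

rm -f "$F"
```

Each dd summary line prints the elapsed time and throughput; sum the per-disk results to estimate the I/O subsystem's total upper bound, as in the 1.6 GB/s figure above.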
Network throughput utilization per server is only a fraction of capacity
Network Subsystem Test Chart
A 1GbE NIC can drive up to 1 Gb/s for reads and 1 Gb/s for writes, which is roughly 4 Gb/s, or 400 MB/s, total across all bonded NICs
Y axis is throughput
X axis is elapsed time
Rx = Received MB/sec, Tx = Transmitted MB/sec, Tot = Total MB/sec
CPU utilization is high, I/O wait is low: where you want to be
CPU Subsystem Test Chart
Each data point is taken every 30 seconds. Y axis is percent utilization of CPUs. X axis is the timeline of the run.
CPU Utilization is captured by analyzing how busy each core is and creating an average
I/O wait (1 of 10 SAR CPU metrics) measures the percentage of time the CPU is waiting on I/O
But generally speaking, you design a pretty decent cluster infrastructure baseline that you can later optimize, provided you get these things right.
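The I/O wait numbers can be pulled back out of the exported CSVs with a one-liner. A sketch, assuming the default sadf -d column order for -u (hostname;interval;timestamp;CPU;%user;%nice;%system;%iowait;%steal;%idle, with CPU = -1 for the all-CPU average); the sample rows are made up for illustration.

```shell
#!/bin/sh
# Average %iowait across a sadf -d CPU export (wkrNN_cpu_util.csv).
# Field 8 = %iowait and $4 == "-1" = all-CPU rows are assumptions
# based on sadf's default -u layout; verify against your sysstat version.
avg_iowait() {
  awk -F';' '$4 == "-1" { sum += $8; n++ } END { if (n) printf "%.2f\n", sum / n }' "$1"
}

# Tiny made-up sample standing in for a real export:
cat > /tmp/sample_cpu.csv <<'EOF'
wkr01;30;2013-02-26 10:00:00;-1;40.1;0.0;5.2;3.0;0.0;51.7
wkr01;30;2013-02-26 10:00:30;-1;55.0;0.0;6.1;5.0;0.0;33.9
EOF
avg_iowait /tmp/sample_cpu.csv   # prints 4.00
```

With the real per-node CSVs, the same aggregation can of course be done in MySQL once the data is loaded, as described earlier.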
A high ratio of cached memory to total memory shows the CPU is not waiting on memory
Memory Subsystem Test Chart
Y axis is memory utilization
X axis is elapsed time
Percent cached is a SAR memory metric (kb_cached). It tells you how many blocks of memory are cached in the Linux page cache, i.e., the amount of memory the kernel uses to cache data.
This means we're doing a good job of caching data, so the JVM is not having to do I/Os and incur I/O wait time (reflected in the previous slide)
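The cached-memory ratio SAR reports as kb_cached can be spot-checked straight from /proc/meminfo on any Linux node; a minimal sketch:

```shell
#!/bin/sh
# Page-cache size as a percentage of total RAM, from /proc/meminfo
# (Linux-only). Mirrors what SAR's kb_cached metric reports.
awk '/^MemTotal:/ { total = $2 }    # total RAM in kB
     /^Cached:/   { cached = $2 }   # page-cache size in kB
     END { printf "%.1f%% of RAM is page cache\n", 100 * cached / total }' \
    /proc/meminfo
```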
Tuning from your Server Configuration baseline
We've established that the server configuration we used works pretty well. But what if you want to tune it, i.e., sacrifice some performance for cost, or remove an I/O, network, or memory limitation?
Compute
CPU type (Socket R 130 W vs. Socket B 95 W)
Number of cores
Number of memory channels
4GB vs. 8GB DIMMs
I/O Subsystem
Types of disks / controllers
Number of disks / controllers
Network
Type of network (1 or 10GbE)
Number of server NICs
Switch port availability
Deep buffering
Data Center
Floor space (rack density)
Power and cooling constraints
Annual power and cooling costs of a cluster are 1/3 the cost of acquiring the cluster
Introducing: Hadoop Ops whack-a-mole
What you really want is a shared service...
The Server Building Blocks
NameNode and JobTracker Configurations
1u, 64GB of RAM, 2 x 2.9 GHz 6 Core Intel Socket R Processors, 4 Small Form Factor Disks, RAID Configuration, 1 Disk Controller
Hadoop Slave Configurations
2u 48 GB of RAM, 2 x 2.4 GHz 6 Core Intel Socket B Processors, 1 High Performing Disk Controller with twelve 2TB Large Form Factor Data Disks in JBOD Configuration
24 TB of Storage Capacity, Low Power Consumption CPU
In reality, you don't want Hadoop clusters popping up all over the place in your company; that becomes untenable to manage. You really want a single large cluster managed as a service.
If that's the case, then you don't want to optimize your cluster for just one workload; you want to optimize it for multiple concurrent workloads (dependent on the scheduler) and have a balanced cluster.
Single rack configuration
This rack is the initial building block that provides all the key management services for a production cluster that scales out to 4000 nodes.
Single Rack Config
42u rack enclosure
1u 1GbE switches
1u 2 sockets, 4 disks
2u 2 sockets, 12 disks
2 x 1GbE TOR switches
1 x Management node
1 x JobTracker node
1 x Name node
Open for KVM switch
(3-18) x Worker nodes
Open 1u
2u 2 sockets, 12 disks
Scale out configuration
The extension building block. Add 1 or more racks of this block to the single rack configuration to build out a cluster that scales to 4000 nodes.
Multi-Rack Config
1 or more 42u racks
1u 1GbE switches
2u 2 sockets, 12 disks
2 x 1GbE TOR switches
Open for KVM switch
Open 2u
2u 2 sockets, 12 disks
Single rack RA
2 x 10 GbE Aggregation switch
1u Switches
19 x Worker nodes
System on a Chip and Hadoop
Amazing density advances: 270+ servers in a rack (compared to 21!).
Trevor Robinson from Calxeda is a member of our community working on this. He'd love some help:
trevor.robinson@calxeda.com
Intel ATOM and ARM Processor Server Cartridges: 4-8 GB RAM, 4 Cores 1.8 GHz, 2-4 TB, 1 GbE NICs
Hadoop has come full circle
Thanks for attending
Thanks to our partner HP for letting me present their findings. If you're interested in learning more about these findings, please check out hp.com/go/hadoop and see detailed reference architectures and varying server configurations for Hortonworks, MapR and Cloudera.
Steve Watt swatt@redhat.com @wattsteve
YCSB Workloads A and C unable to drive the CPU
HBase Testing Results still TBD
2 Workloads Tested: YCSB workload C : 100% read-only requests, random access to DB, using a unique primary key value
YCSB workload A : 50% read and 50% updates, random access to DB, using a unique primary key value
When data is cached in the HBase cache, performance is really good and we can occupy the CPU. Ops throughput (and hence CPU utilization) dwindles when the data exceeds the HBase cache but is available in the Linux cache. Resolution TBD; related JIRA: HDFS-2246.
[Chart: CPU Utilization; X axis: Minutes, Y axis label left as "Axis Title" in source]
[Chart: Disk Usage; X axis: Minutes, Y axis: MB/sec]