HADOOP 101: Cluster Computing Made Easy

Hadoop 101: North East Wisconsin Code Camp

Show of Hands

Big Data

Volume

Variety

Velocity

Common Types of Analysis

Text mining

Index building

Graph creation and analysis

Pattern recognition

Collaborative filtering

Prediction models

Sentiment analysis

Risk assessment

Hadoop

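Hadoop is a framework for storing data on, and running computations across, a cluster of machines: HDFS handles the storage and MapReduce handles the computation.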

Changing of the Guard

“Scale out guarantees that hardware and software will fail”

“I don’t want to see any more 2001 papers about [how] awesome my IT team was because they could reshard my database on demand.”

Storage

[Diagram: data blocks A and B, each stored as multiple copies]
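A minimal sketch of the storage idea, assuming a toy cluster of four named nodes and simple round-robin placement. HDFS defaults to a replication factor of 3 and 128 MB blocks (Hadoop 2.x), but its real placement policy is rack-aware; the class and method names below are hypothetical and not part of the HDFS API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a file into fixed-size blocks and place each block on several distinct nodes. */
class BlockPlacementSketch {

    /** Number of blocks needed to hold fileSize bytes when each block holds at most blockSize bytes. */
    static int blockCount(long fileSize, long blockSize) {
        return (int) ((fileSize + blockSize - 1) / blockSize); // ceiling division
    }

    /**
     * Assigns `replication` distinct nodes to each block, rotating the starting node
     * so copies spread across the cluster. Simplified round-robin, not HDFS's rack-aware policy.
     */
    static List<List<String>> placeReplicas(int blocks, List<String> nodes, int replication) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("replication factor exceeds node count");
        }
        List<List<String>> placement = new ArrayList<>();
        for (int b = 0; b < blocks; b++) {
            List<String> holders = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                holders.add(nodes.get((b + r) % nodes.size()));
            }
            placement.add(holders);
        }
        return placement;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        int blocks = blockCount(200 * mb, 128 * mb); // a 200 MB file in 128 MB blocks -> 2 blocks (A and B)
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        List<List<String>> placement = placeReplicas(blocks, nodes, 3);
        for (int b = 0; b < placement.size(); b++) {
            System.out.println("block " + (char) ('A' + b) + " -> " + placement.get(b));
        }
    }
}
```

Running `main` puts block A on node1, node2, node3 and block B on node2, node3, node4. Because every block has copies on three different nodes, losing any one machine still leaves two copies of each block, which is how a scale-out store copes with hardware that is guaranteed to fail.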

Tunneling Through the Cost Barrier

Solutions

“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.”

Rear Admiral Grace Hopper

Cluster Computing Complexities

Process management

Communication

Data movement

Task coordination

Partial failures

Scheduling

Tracking

Robustness, Resilience, Performance, Simplicity

Where Do You Fit?

[Diagram: MapReduce data flow]

Input Split 1, Input Split 2, ... Input Split n: each feeds its own Record Reader -> Mapper -> Partitioner

Shuffle and Sort

Reducer -> Output Format -> Output File (two reducers shown, each writing its own output file)
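The Partitioner decides which Reducer receives each key a Mapper emits. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the sketch below applies the same formula to plain Java `String` keys. `Text.hashCode()` differs from `String.hashCode()`, so real reducer assignments will not match this sketch, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of hash partitioning: decides which reducer receives each map output key. */
class HashPartitionSketch {

    /** Same formula as Hadoop's default HashPartitioner, applied here to String.hashCode(). */
    static int partitionFor(String key, int numReducers) {
        // Masking off the sign bit keeps the result non-negative even when hashCode() is negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Groups keys by the reducer they would be sent to, in emit order within each reducer. */
    static Map<Integer, List<String>> assign(List<String> keys, int numReducers) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partitionFor(key, numReducers), r -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("the", "cat", "sat", "on", "the", "mat");
        System.out.println(assign(keys, 2));
    }
}
```

Whatever the hash values turn out to be, both occurrences of "the" land in the same list. That is the property that matters: every value for a given key reaches exactly one reducer.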

Where Do You Fit?

[Diagram: MapReduce data flow]

Input Split A, Input Split B: each feeds its own Record Reader -> Mapper -> Partitioner

Shuffle and Sort

Reducer -> Output Format -> Output File (two reducers shown, each writing its own output file)
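With the default `TextInputFormat`, Hadoop's record reader hands the Mapper one line at a time, keyed by the line's byte offset in the file. A simplified sketch, assuming UTF-8 text, `\n` line endings, and a split that starts and ends on line boundaries (the real `LineRecordReader` also handles `\r\n` and lines that straddle two splits); `LineRecordSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of a line-oriented record reader: turns raw split text into (byte offset, line) records. */
class LineRecordSketch {

    static List<Map.Entry<Long, String>> readRecords(String splitText) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < splitText.length()) {
            int newline = splitText.indexOf('\n', start);
            int end = (newline == -1) ? splitText.length() : newline;
            String line = splitText.substring(start, end);
            records.add(new SimpleEntry<>(offset, line));
            // Advance by the line's size in bytes plus the newline, so keys are byte offsets, not character offsets.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (newline == -1 ? 0 : 1);
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> r : readRecords("the cat sat on the mat\nthe dog ran\n")) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

The first record is (0, "the cat sat on the mat") and the second is (23, "the dog ran"), which is where the `Key: 0` in the word count example comes from.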

Mapper Purpose

Sanitize Data

Select Subsets

Convert

[Diagram: Input Split A -> Record Reader -> Mapper -> Partitioner]
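A sketch of the three Mapper jobs on this slide, assuming a hypothetical input of comma-separated weather records shaped like `station,state,temperatureF`. The record format, the station codes, and `CleaningMapperSketch` are all made up for illustration; the sketch returns an `Optional` instead of writing to a Hadoop `Context`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Sketch of sanitize / select subsets / convert, applied to hypothetical
 * comma-separated weather records of the form "station,state,temperatureF".
 */
class CleaningMapperSketch {

    /** Returns the (state, temperature) pair for a usable record, or empty if the record is dropped. */
    static Optional<Map.Entry<String, Integer>> map(String rawLine) {
        String[] fields = rawLine.split(",");
        if (fields.length != 3) {
            return Optional.empty();                       // select subsets: skip malformed records
        }
        String state = fields[1].trim().toUpperCase();     // sanitize: normalize whitespace and case
        String reading = fields[2].trim();
        if (state.isEmpty() || reading.isEmpty()) {
            return Optional.empty();                       // select subsets: skip records with missing data
        }
        try {
            int temperature = Integer.parseInt(reading);   // convert: text to a number
            return Optional.of(new SimpleEntry<>(state, temperature));
        } catch (NumberFormatException e) {
            return Optional.empty();                       // select subsets: skip unparseable readings
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("GRB, wi , 41", "MSN,WI,not-a-number", "bad record", "ORD,il,48");
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            map(line).ifPresent(out::add);
        }
        System.out.println(out); // [WI=41, IL=48]
    }
}
```

In a Hadoop Mapper, dropping a record simply means not calling `context.write` for it.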

Mapper

Input: a key, a value, and a Context

Output: zero or more key/value pairs per input record, written through the Context

[Diagram: Input Split A -> Record Reader -> Mapper -> Partitioner]

Word Count Mapper

Input: (Long, Text)

Key: 0 (the byte offset of the line within the file)

Value: “the cat sat on the mat”

Output: (Text, Long)

Key  Value
the  1
cat  1
sat  1
on   1
the  1
mat  1
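The word count map step, written against a tiny hypothetical stand-in for Hadoop's `Context` so it runs without the Hadoop jars. In a real job the class would extend `org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, LongWritable>` and write `Text`/`LongWritable` objects; the sketch assumes words are separated by whitespace.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's Context: just collects what the mapper writes. */
class CollectingContext<K, V> {
    final List<Map.Entry<K, V>> written = new ArrayList<>();

    void write(K key, V value) {
        written.add(new SimpleEntry<>(key, value));
    }
}

/** Word count map step: emits (word, 1) for every whitespace-separated token in the line. */
class WordCountMapSketch {

    static void map(long offset, String line, CollectingContext<String, Long> context) {
        // The offset key is not needed for word counting; the mapper only looks at the line text.
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1L);
            }
        }
    }

    public static void main(String[] args) {
        CollectingContext<String, Long> context = new CollectingContext<>();
        map(0L, "the cat sat on the mat", context);
        System.out.println(context.written); // [the=1, cat=1, sat=1, on=1, the=1, mat=1]
    }
}
```

Feeding the slide's line in produces the same six pairs as the table, including "the" twice; combining the duplicates is left to shuffle and sort and the Reducer.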

Reducer

Input: a key, its values (an Iterable over every value emitted for that key), and a Context

Output: key/value pairs, written through the Context
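Between the Mappers and the Reducers, shuffle and sort collects every value emitted for a key and delivers the keys in sorted order; that grouped collection is the Iterable a Reducer receives. A single-process sketch using a `TreeMap`, assuming all map output fits in memory (Hadoop does this per partition, across the network, spilling to disk). `ShuffleSortSketch` is a hypothetical name, and Hadoop does not promise any particular order for the values within a key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of shuffle and sort: gathers every value emitted for a key and orders the keys. */
class ShuffleSortSketch {

    static <K extends Comparable<K>, V> TreeMap<K, List<V>> groupByKey(List<Map.Entry<K, V>> mapOutput) {
        TreeMap<K, List<V>> grouped = new TreeMap<>(); // TreeMap keeps keys in sorted order
        for (Map.Entry<K, V> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> mapOutput = new ArrayList<>();
        for (String word : "the cat sat on the mat".split(" ")) {
            mapOutput.add(new SimpleEntry<>(word, 1L));
        }
        System.out.println(groupByKey(mapOutput)); // {cat=[1], mat=[1], on=[1], sat=[1], the=[1, 1]}
    }
}
```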

Reducer

Reducer input (sorted by key):

Key  Values
cat  1
mat  1
on   1
sat  1
the  1, 1

reduce() sums the values for each key and writes the result to part-r-00001:

Key  Value
cat  1
mat  1
on   1
sat  1
the  2
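Hadoop's `Reducer.reduce(KEYIN key, Iterable<VALUEIN> values, Context context)` is called once per key. The word count Reducer only needs to add up the values; a minimal sketch that returns the sum instead of writing it through a `Context` (the class name is hypothetical):

```java
import java.util.List;

/** Word count reduce step: adds up every count emitted for one word. */
class WordCountReduceSketch {

    /** Called once per key with all of that key's values; word is unused here, kept to mirror the per-key call. */
    static long reduce(String word, Iterable<Long> counts) {
        long sum = 0;
        for (long count : counts) { // unboxes each Long
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("the " + reduce("the", List.of(1L, 1L))); // the 2
        System.out.println("cat " + reduce("cat", List.of(1L)));     // cat 1
    }
}
```

For the slide's input, "the" with values 1, 1 becomes "the 2" and every other word stays at 1.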

Demo

MRUnit

Mapper

Reducer

Run the whole cycle
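A rough, single-JVM stand-in for running the whole cycle: map, shuffle and sort, and reduce chained together over in-memory lists. This is not MRUnit and does not touch Hadoop; it is a hypothetical sketch for checking the logic of the word count job end to end, assuming whitespace-separated words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the full word count cycle: map, shuffle and sort, reduce. */
class WordCountCycleSketch {

    static TreeMap<String, Long> run(List<String> lines) {
        // Map: one call per record, emitting (word, 1) for each token.
        List<Map.Entry<String, Long>> mapOutput = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    mapOutput.add(Map.entry(word, 1L));
                }
            }
        }

        // Shuffle and sort: group values by key, keys in sorted order.
        TreeMap<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce: one call per key, summing its values.
        TreeMap<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            long sum = 0;
            for (long count : entry.getValue()) {
                sum += count;
            }
            result.put(entry.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat on the mat", "the dog sat")));
        // {cat=1, dog=1, mat=1, on=1, sat=2, the=3}
    }
}
```

The sorted `TreeMap` result mirrors the sorted `part-r-*` output of a real run, but everything here happens in one process, so none of the cluster complexities (partial failures, scheduling, data movement) come into play.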

Platform

Bibliography

Rear Admiral Hopper: http://www.youtube.com/watch?v=1-vcErOPofQ

Mike Olson talk: http://web.archive.org/web/20130729201323id_/http://itc.conversationsnetwork.org/shows/detail4868.html

Large-Scale C++ Software Design by John Lakos: http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620

Jim Argeropoulos

@exploremqt

https://github.com/exploremqt