
Map Reduce basics



Abhishek Mukherjee, Utkarsh Srivastava

13th September

Not everything that can be counted counts, and not everything that counts can be counted.

WELCOME TO BIG DATA TRAINING


What are we going to cover today?

Uses of Big Data

What is Hadoop?

Short intro to the HDFS architecture

What is Map Reduce?

The components of the Map Reduce algorithm

The "Hello World" of Map Reduce, i.e. the Word Count algorithm

Tips and tricks of Map Reduce

Distribution of Twitter data to test Map Reduce JARs

What is Big Data?

Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.

Lots of data (terabytes, petabytes, or even zettabytes)

Systems and enterprises generate huge amounts of data, from terabytes to petabytes of information.

An airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time.


HDFS ARCHITECTURE


HDFS ARCHITECTURE CONTD.

Map Reduce Algorithm

Key points

Map Phase

Combiner Phase (optional)

Partition Phase (optional custom partitioner; by default, keys are assigned to reducers by hashing)

Sort Phase

Shuffle Phase

Reducer Phase
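The optional combiner can be pictured as a local reduce that runs on one mapper's output before anything is sent to the reducers. The sketch below is a single-process Python simulation, not Hadoop's API; `map_line` and `combine` are hypothetical names, and it assumes the combiner applies the same summing logic as the word-count reducer (valid here because addition is associative and commutative).

```python
from collections import Counter
from typing import List, Tuple


def map_line(line: str) -> List[Tuple[str, int]]:
    """Hypothetical word-count mapper: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def combine(pairs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical combiner: sum the values per key within ONE mapper's output.

    Because addition is associative and commutative, pre-summing locally does
    not change the final totals; it only shrinks the data sent to reducers.
    Keys keep their first-seen order (Counter is a dict subclass).
    """
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


if __name__ == "__main__":
    mapped = map_line("Hello my name is abhishek Hello my name is utsav")
    combined = combine(mapped)
    print(len(mapped), "pairs before the combiner")
    print(len(combined), "pairs after the combiner:", combined)
```

Applied to the first line of the word-count example, the combiner turns 10 (word, 1) pairs into 6 partial counts such as ("Hello", 2).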


Map Phase

Imagine this as the input file:

Hello my name is abhishek Hello my name is utsav

Hello my passion is cricket

This file has 2 lines. Each line is handed to the mapper as a key-value pair: the key is the byte offset at which the line starts in the file, and the value is the text of the line. The mapper splits the value into words and emits the pair (word, 1) for every word it finds.
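The map step for this file can be simulated in a few lines of Python. This is a single-process sketch, not Hadoop's Java API: `read_records` is a hypothetical stand-in for a line-oriented input reader that keys each line by its starting byte offset, and `word_count_map` is a hypothetical mapper that ignores that key and emits (word, 1) pairs.

```python
from typing import Iterator, List, Tuple

# Example input file from the slide: two lines of text.
INPUT_TEXT = (
    "Hello my name is abhishek Hello my name is utsav\n"
    "Hello my passion is cricket\n"
)


def read_records(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, line) pairs, like a line-oriented record reader."""
    offset = 0
    for line in text.encode("utf-8").splitlines(keepends=True):
        yield offset, line.decode("utf-8").rstrip("\r\n")
        offset += len(line)  # advance by the line's length in bytes, newline included


def word_count_map(key: int, value: str) -> List[Tuple[str, int]]:
    """Map function: ignore the offset key, emit (word, 1) for every word."""
    return [(word, 1) for word in value.split()]


if __name__ == "__main__":
    for key, value in read_records(INPUT_TEXT):
        print(key, word_count_map(key, value))
```

With these two lines, the second record's key is 49: the first line is 48 characters long, plus one newline byte.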


Operation on output of map phase

Mapper output, one (word, 1) pair per word:

Hello 1, my 1, name 1, is 1, abhishek 1, Hello 1, my 1, name 1, is 1, utsav 1

Hello 1, my 1, passion 1, is 1, cricket 1

Grouped by key, in the form Key(tuple of values):

Hello(1,1,1)

my(1,1,1)

name(1,1)

is(1,1,1)

abhishek(1)

utsav(1)

passion(1)

cricket(1)
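Collecting the mapper output into Key(tuple of values) form amounts to grouping the pairs by key. The Python sketch below is an illustration under stated assumptions, not framework code: `MAP_OUTPUT` and `group_by_key` are hypothetical names, and the dict keeps keys in first-seen order to mirror the grouped list, whereas a real framework sorts keys before grouping.

```python
from typing import Dict, List, Tuple

# Mapper output for the example file: one (word, 1) pair per word, in emission order.
MAP_OUTPUT: List[Tuple[str, int]] = [
    (word, 1)
    for word in (
        "Hello my name is abhishek Hello my name is utsav "
        "Hello my passion is cricket"
    ).split()
]


def group_by_key(pairs: List[Tuple[str, int]]) -> Dict[str, Tuple[int, ...]]:
    """Collect every value emitted for the same key into one tuple.

    A plain dict preserves first-seen key order; no sorting happens here.
    """
    grouped: Dict[str, List[int]] = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return {key: tuple(values) for key, values in grouped.items()}


if __name__ == "__main__":
    for key, values in group_by_key(MAP_OUTPUT).items():
        # Print in the slide's Key(tuple of values) notation, e.g. Hello(1,1,1).
        print(f"{key}({','.join(str(v) for v in values)})")
```

Note that "name" occurs only twice in the two input lines, so its tuple has two entries.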

Explanation of sort and shuffle phase

The key points are as follows:

Sort the key-value pairs by key.

Shuffle the sorted map output so that all values sharing the same key reach the same reducer, where they are collected into a tuple of values for that key.

This output is fed to the reducer, which reduces the tuple by returning a single value for the list of values it contains.
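Sort, partition and shuffle can be simulated together: sort the pairs by key, group adjacent equal keys, and route each group to a reducer chosen by hashing the key. This is a hypothetical single-process sketch (`partition` and `sort_and_shuffle` are illustrative names). It uses `zlib.crc32` because Python's built-in `hash()` of a string is randomized per process; Hadoop's default partitioner uses the key's own `hashCode()` instead.

```python
import zlib
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Pair = Tuple[str, int]
Group = Tuple[str, Tuple[int, ...]]


def partition(key: str, num_reducers: int) -> int:
    """Hypothetical hash partitioner: the same key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sort_and_shuffle(pairs: List[Pair], num_reducers: int) -> Dict[int, List[Group]]:
    """Sort pairs by key, then send each key's tuple of values to one reducer."""
    buckets: Dict[int, List[Group]] = {r: [] for r in range(num_reducers)}
    # Sort phase: plain string order, so "Hello" sorts before lowercase words.
    ordered = sorted(pairs, key=itemgetter(0))
    # groupby merges only adjacent equal keys, which is why sorting comes first.
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = tuple(value for _, value in group)
        buckets[partition(key, num_reducers)].append((key, values))
    return buckets


if __name__ == "__main__":
    words = "Hello my name is abhishek Hello my name is utsav Hello my passion is cricket"
    pairs = [(w, 1) for w in words.split()]
    for reducer, groups in sort_and_shuffle(pairs, num_reducers=2).items():
        print("reducer", reducer, groups)
```

Each reducer receives its keys in sorted order, and no key is ever split across two reducers. Plain string comparison puts "Hello" first because uppercase letters have lower code points than lowercase ones.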

Reducer phase

Reducer input, in the form Key(tuple of values):

Hello(1,1,1)

my(1,1,1)

name(1,1)

is(1,1,1)

abhishek(1)

utsav(1)

passion(1)

cricket(1)

Reducer output, in the form Key(single value):

abhishek(1)

cricket(1)

Hello(3)

is(3)

my(3)

name(2)

passion(1)

utsav(1)
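For word count, the reducer simply adds up the tuple of 1s for each key. A minimal Python sketch, assuming the grouped input of the example file (`word_count_reduce` and `GROUPED` are hypothetical names, not Hadoop API). The words are printed in case-insensitive alphabetical order to match the listing of the reducer output.

```python
from typing import Dict, Iterable, Tuple


def word_count_reduce(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce function: collapse a key's tuple of counts into one total."""
    return key, sum(values)


# Reducer input for the example file; "name" occurs twice in the two input lines.
GROUPED: Dict[str, Tuple[int, ...]] = {
    "Hello": (1, 1, 1),
    "my": (1, 1, 1),
    "name": (1, 1),
    "is": (1, 1, 1),
    "abhishek": (1,),
    "utsav": (1,),
    "passion": (1,),
    "cricket": (1,),
}

if __name__ == "__main__":
    # Case-insensitive order: abhishek, cricket, Hello, is, my, name, passion, utsav.
    for key in sorted(GROUPED, key=str.lower):
        word, total = word_count_reduce(key, GROUPED[key])
        print(f"{word}({total})")
```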


ANY QUERIES?