Introduction to Map Reduce

Bhupesh Chawda, bhupesh@apache.org

DataTorrent

Why Hadoop?

Data growth is mind-boggling. Forecast for 2020: 40 trillion GB (40 zettabytes)

Cost effective

Scalable

Open source

Source: https://rapidminer.com/rapidminer-acquires-radoop/

Image: http://seikun.kambashi.com/images/blog/interning_at_placeiq/2.jpg

What is MapReduce?

It is a powerful paradigm for parallel computation

Hadoop uses MapReduce to execute jobs on files in HDFS

Hadoop automatically distributes the computation across the nodes of the cluster

It takes the computation to the data, rather than moving the data to the computation
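"Take computation to data" can be illustrated with a toy scheduler. This is a minimal sketch, not Hadoop's scheduler: it assumes each node runs at most one task and greedily places each block's map task on a node that already stores a replica of that block, falling back to any free node (which would then have to fetch the data over the network). The real Hadoop scheduler also distinguishes rack-local from off-rack placement, which this sketch omits. The block ids and node names are hypothetical.

```python
def assign_map_tasks(block_locations: dict[str, list[str]],
                     nodes: list[str]) -> dict[str, str]:
    """Assign one map task per block, preferring a node that holds a replica.

    block_locations maps a (hypothetical) block id to the nodes storing its replicas.
    Toy model: every node can run at most one task.
    """
    if len(block_locations) > len(nodes):
        raise ValueError("toy model needs at least one free node per block")
    free = list(nodes)
    assignment: dict[str, str] = {}
    for block, replicas in block_locations.items():
        local = [n for n in replicas if n in free]
        # Data-local placement if possible; otherwise the data must be shipped.
        node = local[0] if local else free[0]
        assignment[block] = node
        free.remove(node)
    return assignment


blocks = {"blk_1": ["node1", "node2"], "blk_2": ["node2", "node3"], "blk_3": ["node1"]}
print(assign_map_tasks(blocks, ["node1", "node2", "node3"]))
# {'blk_1': 'node1', 'blk_2': 'node2', 'blk_3': 'node3'}
```

Here `blk_3` ends up on `node3`, which holds no replica, so that one task would read its input remotely; the other two run where their data already lives.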

Analogy: Counting Fans

Given a cricket stadium, count the number of fans for each player /

Traditional way

Smart way

Smarter way?

Origin: Functional Programming

Map - Returns a list constructed by applying a function (the first argument) to all items in a list passed as the second argument

map f [a, b, c] = [f(a), f(b), f(c)]

map sq [1, 2, 3] = [sq(1), sq(2), sq(3)] = [1, 4, 9]

Reduce - Combines the items of a list (the second argument) into a single value by repeatedly applying a function (the first argument). In MapReduce, the reduce step can also be the identity (do nothing).

reduce f [a, b, c] = f(a, f(b, f(c, z))), where z is the starting value (0 for sum)

reduce sum [1, 4, 9] = sum(1, sum(4, sum(9, 0))) = 14

Sum of squares example
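A minimal Python sketch of the sum-of-squares example, using the built-in `map` and `functools.reduce` as stand-ins for the two functional building blocks. The function names `sq` and `sum_of_squares` are illustrative only.

```python
import operator
from functools import reduce


def sq(x: int) -> int:
    """Square one number: the function handed to map."""
    return x * x


def sum_of_squares(numbers: list[int]) -> int:
    squares = map(sq, numbers)               # map sq [1, 2, 3] -> 1, 4, 9
    return reduce(operator.add, squares, 0)  # fold with +, starting from 0


print(sum_of_squares([1, 2, 3]))  # 14
```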

Sum of squares of even and odd numbers
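Splitting the sum by parity is where keys come in: the mapper tags each square with a key, and the reducer sums the values that share a key. A sketch, assuming the hypothetical key names `"even"` and `"odd"` and an in-memory dictionary in place of Hadoop's shuffle:

```python
from collections import defaultdict


def mapper(n: int) -> tuple[str, int]:
    """Emit (parity, n squared) for one input number."""
    key = "even" if n % 2 == 0 else "odd"
    return key, n * n


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Sum all squares that share the same parity key."""
    return key, sum(values)


def sum_squares_by_parity(numbers: list[int]) -> dict[str, int]:
    groups: dict[str, list[int]] = defaultdict(list)
    for n in numbers:                  # map phase
        key, value = mapper(n)
        groups[key].append(value)      # group values by key (the "shuffle")
    return dict(reducer(k, vs) for k, vs in groups.items())  # reduce phase


print(sum_squares_by_parity([1, 2, 3, 4, 5]))  # {'odd': 35, 'even': 20}
```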

Programming Model - Key-Value Pairs

Format of input and output:

(key, value)

Map: (k1, v1) → list(k2, v2)

Reduce: (k2, list(v2)) → list(k3, v3)
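The two signatures can be written down as types. The sketch below is a hypothetical in-memory `run_job` helper (not a Hadoop API) whose parameters follow the shapes above: `map_fn` turns one `(k1, v1)` pair into a list of `(k2, v2)` pairs, and `reduce_fn` turns one `k2` plus all of its `v2` values into a list of `(k3, v3)` pairs. Grouping by key stands in for Hadoop's shuffle, and keys are visited in sorted order because Hadoop also presents keys to each reducer sorted.

```python
from collections import defaultdict
from typing import Callable, Iterable, TypeVar

K1, V1 = TypeVar("K1"), TypeVar("V1")
K2, V2 = TypeVar("K2"), TypeVar("V2")
K3, V3 = TypeVar("K3"), TypeVar("V3")


def run_job(
    records: Iterable[tuple[K1, V1]],
    map_fn: Callable[[K1, V1], Iterable[tuple[K2, V2]]],           # (k1, v1) -> list(k2, v2)
    reduce_fn: Callable[[K2, list[V2]], Iterable[tuple[K3, V3]]],  # (k2, list(v2)) -> list(k3, v3)
) -> list[tuple[K3, V3]]:
    # Map: apply map_fn to every input pair and group emitted values by k2.
    groups: dict = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Reduce: one reduce_fn call per distinct k2, in sorted key order.
    output: list[tuple[K3, V3]] = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


lines = [(0, "map reduce"), (1, "map shuffle reduce")]
result = run_job(
    lines,
    lambda _, line: [(w, 1) for w in line.split()],
    lambda word, ones: [(word, sum(ones))],
)
print(result)  # [('map', 2), ('reduce', 2), ('shuffle', 1)]
```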

Sum of squares of odd, even and prime numbers
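Adding primes shows that a mapper may emit more than one pair per input: 3 is both odd and prime, and 2 is both even and prime. A sketch with hypothetical key names, a simple trial-division primality test, and the reduce (sum per key) done incrementally in a dictionary:

```python
from collections import defaultdict


def is_prime(n: int) -> bool:
    """Trial-division primality test (fine for small illustrative inputs)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def mapper(n: int) -> list[tuple[str, int]]:
    """Emit one (category, n*n) pair for every category n belongs to."""
    pairs = [("even" if n % 2 == 0 else "odd", n * n)]
    if is_prime(n):
        pairs.append(("prime", n * n))
    return pairs


def sum_squares_by_category(numbers: list[int]) -> dict[str, int]:
    totals: dict[str, int] = defaultdict(int)
    for n in numbers:
        for key, square in mapper(n):  # possibly several pairs per input
            totals[key] += square      # reduce: running sum per key
    return dict(totals)


print(sum_squares_by_category([1, 2, 3, 4, 5]))
# {'odd': 35, 'even': 20, 'prime': 38}
```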

MapReduce overview

MapReduce with combiner
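A combiner is a "mini reducer" that runs on each mapper's output before the shuffle, so fewer pairs cross the network. The sketch below uses two hypothetical input splits and word counting to compare the number of pairs shuffled with and without a combiner. Because Hadoop may run the combiner zero, one, or several times, it must not change the final result; summing counts is safe because addition is associative and commutative.

```python
from collections import Counter


def map_split(split: list[str]) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Combiner: sum counts locally on the map side, before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_all(pairs: list[tuple[str, int]]) -> dict[str, int]:
    """Reducer: sum the (possibly pre-combined) counts for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


splits = [["a b a", "a"], ["b b a"]]
raw = [p for s in splits for p in map_split(s)]
combined = [p for s in splits for p in combine(map_split(s))]
print(len(raw), len(combined))                  # 7 4
print(reduce_all(raw) == reduce_all(combined))  # True
```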

The Big Picture

Image Source: http://blog.csdn.net/bingduanlbd/article/details/51933914

The Bigger Picture

Image Source: http://blog.csdn.net/bingduanlbd/article/details/51933914

MapReduce Code Example - Word Count

Image Source: http://arnon.me/2014/06/mapreduce/

MapReduce - The Mapper

Source: https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
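In the Hadoop tutorial the mapper is a Java `Mapper` subclass (`TokenizerMapper`) that splits each input line with `StringTokenizer` and emits `(word, 1)`. The Python function below is only a stand-in for that role: `tokenizer_mapper` is a hypothetical name, `offset` plays the part of the input key (the line's byte offset with text input), and `str.split()` approximates `StringTokenizer`'s default whitespace splitting.

```python
from typing import Iterator


def tokenizer_mapper(offset: int, line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for each whitespace-separated token in one input line.

    The input key (offset) is ignored, as in the word-count mapper.
    """
    for word in line.split():
        yield word, 1


print(list(tokenizer_mapper(0, "Hello World Bye World")))
# [('Hello', 1), ('World', 1), ('Bye', 1), ('World', 1)]
```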

MapReduce - The Reducer

Source: https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
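The tutorial's reducer (`IntSumReducer`) receives one word together with all the counts emitted for it and writes out their sum. A Python stand-in for that role (the name `int_sum_reducer` is hypothetical):

```python
from typing import Iterable


def int_sum_reducer(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Sum every count emitted for one word."""
    total = 0
    for count in counts:
        total += count
    return word, total


print(int_sum_reducer("World", [1, 1]))  # ('World', 2)
```

Since summing is associative and commutative, the same logic can also serve as the combiner, which is how the tutorial's word count is configured.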

MapReduce - The Driver

Image Source: https://memegenerator.net/instance/56997204
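In the Hadoop tutorial the driver is a Java `main` method that configures a `Job` (mapper, combiner and reducer classes, output key/value types, input and output paths) and submits it to the cluster. The sketch below is a hypothetical in-memory analogue, not Hadoop code: `run_word_count` wires a mapper, a combiner and a reducer together, runs one "map task" per input split, and routes each key to a reducer partition. Hadoop's default partitioner uses the key's hash code modulo the number of reducers; `zlib.crc32` is swapped in here because Python's built-in string `hash()` changes between processes.

```python
import zlib
from collections import defaultdict
from typing import Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word, 1


def sum_reducer(word: str, counts: list[int]) -> Iterator[tuple[str, int]]:
    """Sum the counts for one word; also used as the combiner."""
    yield word, sum(counts)


def partition(key: str, num_reducers: int) -> int:
    """Stable stand-in for hash partitioning."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_word_count(splits: list[list[str]], num_reducers: int = 2) -> dict[str, int]:
    """Hypothetical driver: wires mapper, combiner and reducer, then runs them."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:                       # one map task per input split
        local = defaultdict(list)
        for line in split:
            for word, one in mapper(line):
                local[word].append(one)
        for word, ones in local.items():       # combiner pass on this map's output
            for key, subtotal in sum_reducer(word, ones):
                partitions[partition(key, num_reducers)][key].append(subtotal)
    result: dict[str, int] = {}
    for part in partitions:                    # one reduce task per partition
        for key in sorted(part):               # keys reach each reducer sorted
            for word, total in sum_reducer(key, part[key]):
                result[word] = total
    return result


counts = run_word_count([["Hello World Bye World"], ["Hello Hadoop Goodbye Hadoop"]])
print(sorted(counts.items()))
# [('Bye', 1), ('Goodbye', 1), ('Hadoop', 2), ('Hello', 2), ('World', 2)]
```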

Hadoop Distributions

Who is using Hadoop?

References

https://hadoop.apache.org/

www.slideshare.net/SandeepDeshmukh5/hadoopintroduction-46841859

Tom White, Hadoop: The Definitive Guide, 4th Edition

Images shamelessly stolen from the internet (but credited!)

Acknowledgements

Sandeep Deshmukh, DataTorrent - For some of the slides

Thank You!!

Please send your questions to: bhupesh@apache.org / bhupesh@datatorrent.com

Extra Slides

Anatomy of a MapReduce Run

In the classic MapReduce (MapReduce 1) context:

The client, which submits the job

The jobtracker, which coordinates the run

The tasktrackers, which run the map and reduce tasks

In the YARN context (covered later):

The client, which submits the job

The YARN resource manager

The YARN node managers

The MapReduce application master

MapReduce in YARN (covered later)

The Map Side - Details

Each map task writes its output to an in-memory circular buffer

Once the buffer's contents reach a threshold (by default 80% of a 100 MB buffer), a background thread starts to spill the contents to local disk

Before writing to disk, the data is divided into partitions corresponding to the reducers it will ultimately be sent to

Within each partition, the data is sorted by key, and the combiner (if one is configured) is run on the sorted output

Multiple spill files may be created by the time the map task finishes. These spill files are merged into a single partitioned, sorted output file

The partitions of this output file are made available to the reducers over HTTP
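The steps above can be modeled in a few lines. This is a simplified sketch under explicit assumptions: `MapOutputBuffer` is a hypothetical class whose capacity is counted in records rather than bytes, spills happen synchronously instead of in a background thread, "disk" is an in-memory list, and `zlib.crc32` stands in for Hadoop's hash partitioner. Each spill partitions the buffered pairs, sorts each partition by key, and runs a summing combiner; `flush` then merges all spills into one sorted run per partition with `heapq.merge`.

```python
import heapq
import zlib
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    """Stable stand-in for hash partitioning."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def combine_sorted(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Combiner over key-sorted (key, count) pairs: sum runs of equal keys."""
    out: list[tuple[str, int]] = []
    for key, value in pairs:
        if out and out[-1][0] == key:
            out[-1] = (key, out[-1][1] + value)
        else:
            out.append((key, value))
    return out


class MapOutputBuffer:
    """Toy model of the map-side buffer (capacity in records, not bytes)."""

    def __init__(self, num_reducers: int, capacity: int = 10, spill_percent: float = 0.8):
        self.num_reducers = num_reducers
        self.threshold = max(1, int(capacity * spill_percent))
        self.buffer: list[tuple[str, int]] = []
        self.spills: list[list[list[tuple[str, int]]]] = []  # spill -> partition -> pairs

    def collect(self, key: str, value: int) -> None:
        self.buffer.append((key, value))
        if len(self.buffer) >= self.threshold:  # threshold reached: spill
            self._spill()

    def _spill(self) -> None:
        parts = defaultdict(list)
        for key, value in self.buffer:          # 1. partition by destination reducer
            parts[partition_for(key, self.num_reducers)].append((key, value))
        # 2. sort each partition by key, 3. run the combiner on the sorted run
        self.spills.append(
            [combine_sorted(sorted(parts[p])) for p in range(self.num_reducers)]
        )
        self.buffer = []

    def flush(self) -> list[list[tuple[str, int]]]:
        """Spill any remainder, then merge all spills into one sorted run per partition."""
        if self.buffer:
            self._spill()
        return [
            list(heapq.merge(*(spill[p] for spill in self.spills)))
            for p in range(self.num_reducers)
        ]


buf = MapOutputBuffer(num_reducers=2, capacity=5)  # spills every 4 records
for word in "a b a c b a d a".split():
    buf.collect(word, 1)
merged = buf.flush()
print(len(buf.spills))  # 2
```

The merged output can still contain the same key more than once (one entry per spill); the reducer sums those entries anyway.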

The Reduce Side - Details

The map outputs sit on the local disks of the nodes that ran the map tasks. Reduce tasks need this data in order to proceed

Each reduce task needs the map output for its particular partition from several map tasks across the cluster

The reduce task starts copying the map outputs as soon as each map task completes. This is the copy phase. The map outputs are fetched in parallel by multiple threads.

Map outputs are copied into the reduce task JVM's memory if they are small enough; otherwise they are copied to disk. As copies accumulate, they are merged into larger sorted files. When all map outputs have been copied, they are merged while maintaining their sort order

The reduce function is invoked for each key in the sorted output, and its output is written directly to HDFS
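The merge-then-reduce steps can be sketched with Python's standard library. Assumptions: the copy phase is simulated by passing in already-fetched, key-sorted lists (one per map task), `run_reduce_task` is a hypothetical name, and the result is returned instead of being written to HDFS. `heapq.merge` combines the sorted runs without re-sorting, and `itertools.groupby` yields one group per distinct key, so the reduce function is called once per key.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def run_reduce_task(map_outputs, reduce_fn):
    """Merge key-sorted map outputs for one partition and reduce each key group.

    map_outputs: one key-sorted list of (key, value) pairs per map task.
    """
    merged = heapq.merge(*map_outputs, key=itemgetter(0))  # merge, keeping sort order
    results = []
    for key, group in groupby(merged, key=itemgetter(0)):  # one call per distinct key
        results.append(reduce_fn(key, [value for _, value in group]))
    return results  # Hadoop would write this output to HDFS


from_map_1 = [("blue", 2), ("green", 1)]
from_map_2 = [("blue", 1), ("red", 4)]
print(run_reduce_task([from_map_1, from_map_2], lambda k, vs: (k, sum(vs))))
# [('blue', 3), ('green', 1), ('red', 4)]
```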

MapReduce as Unix Commands

Problem:

Input: a 1 TB file containing color names - Red, Blue, Green, Yellow, Purple, Maroon

Output: the number of occurrences of the colors Blue and Green
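One way to read the Unix analogy, assuming one color name per line, is a pipeline like `grep -E '^(Blue|Green)$' input.txt | sort | uniq -c`: `grep` plays the map (filter and emit), `sort` plays the shuffle (brings equal keys together), and `uniq -c` plays the reduce (counts each run). The Python sketch below mirrors those three stages on a small hypothetical sample; on the real file, `lines` could be an open file object, which iterates line by line.

```python
from itertools import groupby

WANTED = {"Blue", "Green"}


def map_colors(lines):
    """'grep' step: keep only the colors of interest, emitting (color, 1)."""
    for line in lines:
        color = line.strip()
        if color in WANTED:
            yield color, 1


def count_colors(lines) -> dict[str, int]:
    pairs = sorted(map_colors(lines))  # 'sort' step: equal keys become adjacent
    # 'uniq -c' step: count each run of equal keys
    return {color: sum(v for _, v in group)
            for color, group in groupby(pairs, key=lambda p: p[0])}


sample = ["Red", "Blue", "Green", "Blue", "Maroon", "Yellow", "Green", "Blue", "Purple"]
print(count_colors(sample))  # {'Blue': 3, 'Green': 2}
```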
