Hadoop 101 v2

Hadoop 101: A really quick overview of the concepts…

A few Terabytes of Data...

Text processing--a few hours?

But what if you have more data?

Network Storage--Petabytes!

What if you need compute power for complex algorithms?

8 cores? 16 cores? 64 cores? 512 GB RAM?

A network of commodity computers

Run jobs on PART of the data on each computer, then AGGREGATE the intermediate results from each computer.
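A minimal sketch of that idea, assuming a word count as the job. The "computers" here are just list slices, and the names `split_into_parts`, `count_words`, and `aggregate` are made up for illustration; they are not Hadoop APIs.

```python
from collections import Counter

def split_into_parts(lines, num_parts):
    """Deal the lines out round-robin, one part per (pretend) computer."""
    return [lines[i::num_parts] for i in range(num_parts)]

def count_words(lines):
    """Work done on ONE computer: count the words in its part of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def aggregate(partials):
    """Merge the intermediate results from every computer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

data = ["the cat sat", "the dog ran", "a cat ran", "the end"]
parts = split_into_parts(data, 3)
partials = [count_words(part) for part in parts]
totals = aggregate(partials)
```

The aggregated counts match what a single computer would get from the whole data set; only the work is divided.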

Let’s add a computer to manage the process of job delegation, merging the results...

and keeping track of the results...

We also need something to keep track of what files are where, so we know where the data is that needs to be computed...
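One way to picture that bookkeeping is a simple lookup table. This is only a sketch: the `catalog` structure, the file names, and the node names are hypothetical.

```python
# Hypothetical catalog: file name -> list of (block id, computer holding it).
catalog = {}

def record_block(filename, block_id, computer):
    """Remember that one block of a file lives on a given computer."""
    catalog.setdefault(filename, []).append((block_id, computer))

def computers_for(filename):
    """Return the computers that hold some part of the file."""
    return sorted({computer for _, computer in catalog.get(filename, [])})

record_block("logs.txt", 0, "node-1")
record_block("logs.txt", 1, "node-3")
record_block("users.txt", 0, "node-2")
```

Calling `computers_for("logs.txt")` tells the job manager which computers should receive the work for that file.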

When you have a lot of computers, and even more hard drives,

one thing I can guarantee...

Computers will eventually fail.

Hard drives will eventually fail.

Even whole racks will fail.

If a computer fails and you only have one copy of your data...

You will be very, very unhappy.

So let’s store multiple copies of the data. Hard drives are CHEAP!

If one hard drive fails... we are still OK

If one computer fails... we are still OK

Even if a whole rack fails... we are still OK

Once we find a failure, let’s have the system re-create the missing copies.
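A rough sketch of the recopy step, assuming three copies per block and a made-up five-node, three-rack cluster. `NODES`, `REPLICAS`, and `rereplicate` are illustrative names, and the real placement policy in Hadoop is more involved than this.

```python
REPLICAS = 3  # assumed number of copies per block

# Hypothetical cluster: node name -> rack name.
NODES = {"n1": "rack-a", "n2": "rack-a", "n3": "rack-b", "n4": "rack-b", "n5": "rack-c"}

def rereplicate(placement, live_nodes, replicas=REPLICAS):
    """Drop copies on failed nodes, then recopy until each block is back to `replicas`."""
    repaired = {}
    for block, holders in placement.items():
        alive = [node for node in holders if node in live_nodes]
        if not alive:
            raise RuntimeError(f"all copies of {block} are lost")
        while len(alive) < replicas:
            used_racks = {NODES[node] for node in alive}
            spare = sorted(node for node in live_nodes if node not in alive)
            if not spare:
                break  # not enough machines left to reach the target
            # False sorts before True, so nodes on a rack with no copy yet come first.
            spare.sort(key=lambda node: NODES[node] in used_racks)
            alive.append(spare[0])
        repaired[block] = alive
    return repaired

placement = {"block-1": ["n1", "n3", "n5"], "block-2": ["n2", "n4", "n5"]}
live = set(NODES) - {"n5"}  # the machine n5 has failed
repaired = rereplicate(placement, live)
```

After the repair, every block is back to three copies and none of them sit on the failed machine.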

Send the compute job to all nodes.

And let it run on its part of the data…

One is stuck…

We have three copies—we can redistribute the compute

And take the one that finishes fastest
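A small sketch of "take the one that finishes fastest", using threads and sleeps to stand in for nodes. It starts the task on all three copies at once, which is a simplification; the node names and delays are made up.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_task(node, delay, numbers):
    """Pretend task: sleep to simulate a slow or healthy node, then sum the data."""
    time.sleep(delay)
    return node, sum(numbers)

def run_speculatively(copies, numbers):
    """Start the same task on every node holding a copy; keep the first answer."""
    with ThreadPoolExecutor(max_workers=len(copies)) as pool:
        futures = [pool.submit(run_task, node, delay, numbers) for node, delay in copies]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

# node-1 is the "stuck" one; the other two hold the same copy of the data.
copies = [("node-1", 0.5), ("node-2", 0.01), ("node-3", 0.2)]
winner, total = run_speculatively(copies, [1, 2, 3, 4])
```

Because every copy holds the same data, the answer is the same no matter which node wins; only the waiting time changes.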

Merge sorted sets based on some key…

A-E F-J K-O P-T U-Z

…and write partial results

PART-01 PART-02 PART-03 PART-04 PART-05
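A sketch of the key-range split shown above, assuming keys are words and the range is chosen by first letter. The results stay in a dictionary instead of being written to disk, and `partition_for` and `write_parts` are illustrative names.

```python
# Key ranges from the slide: A-E, F-J, K-O, P-T, U-Z.
RANGES = ["AE", "FJ", "KO", "PT", "UZ"]

def partition_for(key):
    """Return the index of the range that the key's first letter falls into."""
    first = key[0].upper()
    for index, (low, high) in enumerate(RANGES):
        if low <= first <= high:
            return index
    raise ValueError(f"key {key!r} does not start with a letter A-Z")

def write_parts(pairs):
    """Group (key, value) pairs by range, then sort each group by key."""
    part_files = {f"PART-{i + 1:02d}": [] for i in range(len(RANGES))}
    for key, value in pairs:
        part_files[f"PART-{partition_for(key) + 1:02d}"].append((key, value))
    return {name: sorted(rows) for name, rows in part_files.items()}

pairs = [("zebra", 1), ("apple", 3), ("kiwi", 2), ("fig", 5), ("pear", 4), ("banana", 1)]
part_files = write_parts(pairs)
```

Each partial result is sorted on its own, and because the ranges do not overlap, reading PART-01 through PART-05 in order gives the keys in fully sorted order.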

Guess what? We’ve just invented Hadoop!

So let’s talk about the pieces of Hadoop.

Data nodes store and manage the data on a single “slave” computer

Task trackers manage the compute

Job tracker manages task trackers, ships code to compute nodes

Name node manages distribution and replication on the data nodes

MapReduce

HDFS (Hadoop Distributed File System)

Visual Example

Shuffle

Reduce
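The three phases can be sketched in plain Python on a single machine, assuming word count as the job. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` are chosen for illustration and are not the Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one line of input into (word, 1) pairs."""
    return [(word, 1) for word in line.lower().split()]

def shuffle_phase(mapped):
    """Shuffle: bring all values for the same key together, sorted by key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

lines = ["big data big cluster", "data node name node"]
mapped = [pair for line in lines for pair in map_phase(line)]
shuffled = shuffle_phase(mapped)
result = dict(reduce_phase(key, values) for key, values in shuffled.items())
```

On a real cluster the map calls run next to the data, the shuffle moves pairs across the network, and each reducer handles one slice of the keys.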

Putting It All Together
