
Sky Agile Horizons

Hadoop at Sky

Overview

• What is Hadoop?
  - Reliable, Scalable, Distributed

• Where did it come from?
  - Community + Yahoo!

• Where is it now?
  - Apache Software Foundation

• Why is it called “Hadoop”?

Who is using it?

To name just a few…

Is it “production” ready?

[Screengrab: one of the Hadoop clusters at Facebook (May 2010)]

So, what does it give you?

Just two things...

• Distributed Filesystem (HDFS)
  - Name Node
  - Data Node(s)

• Distributed Processing Infrastructure
  - Job Tracker
  - Task Tracker(s)

HDFS - Overview

• Blocks
  - 64MB chunks (configurable)

• WORM (Write once, read many)
  - NO EDITS
  - NO APPENDS

• Replication
  - 3 copies
  - direct
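The block and replication rules above reduce to simple arithmetic. The sketch below is a hypothetical helper, not part of the Hadoop API. It assumes the slide's 64MB block size and 3 copies, and counts the blocks a file is split into and the raw cluster storage the file consumes. HDFS does not pad the final block, so a partial last block uses only as much space as its data.

```java
// Hypothetical helper illustrating HDFS block arithmetic (not part of the Hadoop API).
final class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks a file is split into: ceiling(fileSize / blockSize).
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize <= 0) {
            return 0;
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Size of the final block; it is not padded out to the full block size.
    static long lastBlockBytes(long fileSize, long blockSize) {
        if (fileSize <= 0) {
            return 0;
        }
        long rem = fileSize % blockSize;
        return rem == 0 ? blockSize : rem;
    }

    // Raw bytes stored across the cluster: every block is kept `replication` times.
    static long rawBytes(long fileSize, int replication) {
        return fileSize * replication;
    }

    public static void main(String[] args) {
        long file = 200 * MB;  // a 200MB file
        long block = 64 * MB;  // 64MB blocks
        System.out.println(blockCount(file, block));          // 4
        System.out.println(lastBlockBytes(file, block) / MB); // 8
        System.out.println(rawBytes(file, 3) / MB);           // 600
    }
}
```

With 3 copies, a 200MB file costs 600MB of raw disk, spread over three full 64MB blocks and one 8MB block.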

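HDFS - Read

[Diagram: the Client (1) gets metadata from the Name Node, then (2) fetches blocks from the Data Nodes, where numbered blocks are shown replicated across nodes; the Name Node handles control / monitoring of the Data Nodes.]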
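The two-step read in the diagram can be sketched as an in-memory model. All class and method names here are hypothetical and are not the real HDFS client API. The Name Node holds only metadata (which blocks make up a file and where their replicas live). The client fetches block bytes from the Data Nodes and falls back to another replica when a node is missing. For simplicity, replicas are tried in listed order; the real client prefers the closest one.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the HDFS read path (not the real HDFS client API).
final class HdfsReadSketch {
    // One block of a file: its id plus the Data Nodes holding a replica.
    static final class BlockLocation {
        final long blockId;
        final List<String> hosts;

        BlockLocation(long blockId, List<String> hosts) {
            this.blockId = blockId;
            this.hosts = hosts;
        }
    }

    // The Name Node keeps only metadata: path -> ordered list of block locations.
    static final class NameNode {
        final Map<String, List<BlockLocation>> files = new HashMap<>();

        List<BlockLocation> getBlockLocations(String path) {
            List<BlockLocation> blocks = files.get(path);
            if (blocks == null) {
                throw new IllegalArgumentException("No such file: " + path);
            }
            return blocks;
        }
    }

    // Client read: 1. get metadata from the Name Node, 2. fetch blocks from Data Nodes.
    // dataNodes maps host -> (block id -> bytes); a missing host models a dead node.
    static byte[] read(String path, NameNode nameNode, Map<String, Map<Long, byte[]>> dataNodes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (BlockLocation loc : nameNode.getBlockLocations(path)) {
            byte[] data = null;
            for (String host : loc.hosts) { // try each replica in turn
                Map<Long, byte[]> node = dataNodes.get(host);
                if (node != null && node.containsKey(loc.blockId)) {
                    data = node.get(loc.blockId);
                    break;
                }
            }
            if (data == null) {
                throw new IllegalStateException("No live replica for block " + loc.blockId);
            }
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        nn.files.put("/logs/a.txt", List.of(
                new BlockLocation(1, List.of("dn1", "dn2")),
                new BlockLocation(2, List.of("dn2", "dn3"))));

        Map<String, Map<Long, byte[]>> dataNodes = new HashMap<>();
        dataNodes.put("dn2", Map.of(1L, "Hello, ".getBytes(StandardCharsets.UTF_8),
                                    2L, "HDFS!".getBytes(StandardCharsets.UTF_8)));
        dataNodes.put("dn3", Map.of(2L, "HDFS!".getBytes(StandardCharsets.UTF_8)));
        // dn1 is absent, so block 1 is served from its replica on dn2.

        System.out.println(new String(read("/logs/a.txt", nn, dataNodes), StandardCharsets.UTF_8));
        // prints "Hello, HDFS!"
    }
}
```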

HDFS - Write

[Diagram: the Client (1) creates metadata on the Name Node, then (2) puts blocks onto the Data Nodes.]
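The write side can be modelled the same way. Again, every name here is hypothetical rather than the HDFS API. Creating a file registers metadata on the Name Node and refuses an existing path, which mirrors write-once. Each block then gets 3 distinct Data Nodes, chosen round-robin. That placement is a simplification: real HDFS uses rack-aware placement and streams each block through a pipeline of Data Nodes rather than having the client put every replica itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the HDFS write path (not the real HDFS client API).
final class HdfsWriteSketch {
    static final class NameNode {
        final List<String> dataNodes;
        final int replication;
        final Map<String, List<Long>> files = new HashMap<>();     // path -> block ids
        final Map<Long, List<String>> locations = new TreeMap<>(); // block id -> replica hosts
        private long nextBlockId = 1;
        private int nextNode = 0;

        NameNode(List<String> dataNodes, int replication) {
            if (replication > dataNodes.size()) {
                throw new IllegalArgumentException("Not enough Data Nodes for " + replication + " replicas");
            }
            this.dataNodes = dataNodes;
            this.replication = replication;
        }

        // Step 1: create the file's metadata. Write once: an existing path is rejected.
        void create(String path) {
            if (files.containsKey(path)) {
                throw new IllegalStateException("File already exists: " + path);
            }
            files.put(path, new ArrayList<>());
        }

        // Allocate a block and pick `replication` distinct Data Nodes round-robin
        // (a simplification; real HDFS placement is rack-aware).
        long addBlock(String path) {
            long id = nextBlockId++;
            List<String> hosts = new ArrayList<>();
            for (int i = 0; i < replication; i++) {
                hosts.add(dataNodes.get((nextNode + i) % dataNodes.size()));
            }
            nextNode = (nextNode + 1) % dataNodes.size();
            files.get(path).add(id);
            locations.put(id, hosts);
            return id;
        }
    }

    // Client write: 1. create metadata, 2. put each block on its chosen Data Nodes.
    // storage maps host -> (block id -> bytes).
    static void write(String path, byte[] data, int blockSize, NameNode nn,
                      Map<String, Map<Long, byte[]>> storage) {
        nn.create(path);
        for (int off = 0; off < data.length; off += blockSize) {
            byte[] block = Arrays.copyOfRange(data, off, Math.min(off + blockSize, data.length));
            long id = nn.addBlock(path);
            for (String host : nn.locations.get(id)) {
                storage.computeIfAbsent(host, h -> new HashMap<>()).put(id, block);
            }
        }
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode(List.of("dn1", "dn2", "dn3", "dn4"), 3);
        Map<String, Map<Long, byte[]>> storage = new HashMap<>();
        write("/logs/b.txt", "abcdefghij".getBytes(StandardCharsets.UTF_8), 4, nn, storage);
        System.out.println(nn.locations);
        // {1=[dn1, dn2, dn3], 2=[dn2, dn3, dn4], 3=[dn3, dn4, dn1]}
    }
}
```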

Distributed Processing

• Slots
  - X mapper slots, Y reducer slots (per node)

• Jobs
  - Queued
  - Prioritised

• Tasks
  - Data-aware

Distributed Processing

[Diagram: the Client (1) sets up a job with the Job Tracker, which hands its tasks out to the Task Trackers.]
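The slots, queued and prioritised jobs, and data-aware tasks described above can be illustrated with a small scheduler. This is a hypothetical sketch, not the Job Tracker's actual code. Jobs are ordered by priority, then by submission order. A map task goes to a Task Tracker that holds a replica of its input block when one has a free slot; otherwise it goes to any tracker with a free slot, and it waits if every slot is busy. The slot counts in the example are made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of slot-based, data-aware scheduling (not the JobTracker's real code).
final class SlotScheduler {
    // A queued job: higher priority runs first; equal priorities run in submission order.
    static final class Job {
        final String name;
        final int priority;
        final long submitted;

        Job(String name, int priority, long submitted) {
            this.name = name;
            this.priority = priority;
            this.submitted = submitted;
        }
    }

    static PriorityQueue<Job> newJobQueue() {
        return new PriorityQueue<>(
                Comparator.comparingInt((Job j) -> j.priority).reversed()
                        .thenComparingLong(j -> j.submitted));
    }

    // A Task Tracker with a fixed number of free map slots ("X mapper slots" per node).
    static final class TaskTracker {
        final String host;
        int freeMapSlots;

        TaskTracker(String host, int mapSlots) {
            this.host = host;
            this.freeMapSlots = mapSlots;
        }
    }

    // Data-aware assignment of one map task whose input block lives on replicaHosts:
    // prefer a tracker on a replica's host, else any tracker with a free slot.
    // Returns null when every slot is busy, i.e. the task has to wait.
    static TaskTracker assign(List<String> replicaHosts, List<TaskTracker> trackers) {
        TaskTracker fallback = null;
        for (TaskTracker t : trackers) {
            if (t.freeMapSlots == 0) {
                continue;
            }
            if (replicaHosts.contains(t.host)) {
                t.freeMapSlots--;
                return t;
            }
            if (fallback == null) {
                fallback = t;
            }
        }
        if (fallback != null) {
            fallback.freeMapSlots--;
        }
        return fallback;
    }

    public static void main(String[] args) {
        PriorityQueue<Job> queue = newJobQueue();
        queue.add(new Job("nightly-report", 0, 1));
        queue.add(new Job("urgent-fix", 5, 2));
        System.out.println(queue.poll().name); // urgent-fix

        List<TaskTracker> trackers = List.of(new TaskTracker("dn1", 1), new TaskTracker("dn2", 1));
        System.out.println(assign(List.of("dn2"), trackers).host); // dn2 (data-local)
        System.out.println(assign(List.of("dn2"), trackers).host); // dn1 (dn2 is full)
    }
}
```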

Implementation

• Two modes of operation

[Diagram: Standalone: a single node running the Name Node, Data Node, Job Tracker and Task Tracker. Master / Slaves: a Master running the Name Node and Job Tracker, plus Slaves (six shown), each running a Data Node and a Task Tracker.]
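The two layouts can be written down as a host-to-daemons map. This is a hypothetical helper, and the host names and slot count are made up. Hadoop's own setup documentation calls the single-machine, all-daemons arrangement "pseudo-distributed" mode and reserves "standalone" for a single JVM with no daemons; the sketch follows the slide's wording. It also shows why map capacity grows with the number of slaves: only slaves run Task Trackers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two layouts on the slide: which daemons run on which host.
final class ClusterLayout {
    static final List<String> MASTER = List.of("NameNode", "JobTracker");
    static final List<String> SLAVE = List.of("DataNode", "TaskTracker");

    // Standalone: one host runs all four daemons.
    // Master/slaves: the first host is the master, every other host is a slave.
    static Map<String, List<String>> layout(List<String> hosts, boolean standalone) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (standalone) {
            if (hosts.size() != 1) {
                throw new IllegalArgumentException("Standalone mode uses exactly one host");
            }
            List<String> all = new ArrayList<>(MASTER);
            all.addAll(SLAVE);
            result.put(hosts.get(0), all);
            return result;
        }
        result.put(hosts.get(0), MASTER);
        for (String slave : hosts.subList(1, hosts.size())) {
            result.put(slave, SLAVE);
        }
        return result;
    }

    // Cluster-wide map capacity: each host running a Task Tracker adds its map slots.
    static int totalMapSlots(Map<String, List<String>> layout, int mapSlotsPerNode) {
        int total = 0;
        for (List<String> daemons : layout.values()) {
            if (daemons.contains("TaskTracker")) {
                total += mapSlotsPerNode;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(layout(List.of("box"), true));
        // {box=[NameNode, JobTracker, DataNode, TaskTracker]}
        Map<String, List<String>> cluster =
                layout(List.of("master", "s1", "s2", "s3", "s4", "s5", "s6"), false);
        System.out.println(cluster.get("master") + " " + cluster.get("s1"));
        // [NameNode, JobTracker] [DataNode, TaskTracker]
        System.out.println(totalMapSlots(cluster, 4)); // 24
    }
}
```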

Building upon the basics

Sub-projects

• Map/Reduce – divide & conquer

• Pig – SQL-like “Pig Latin”

• HBase – column-oriented database

• Hive – data-warehousing (SQL-like queries)

• Mahout – scalable machine-learning algorithms

Map/Reduce

• Java-based
  - (key, value) input, (key, value) output(s)

• Intended for low-level / bespoke work
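The key/value model is easiest to see in the classic word count. The sketch below simulates the three phases in plain Java and deliberately avoids Hadoop's Mapper and Reducer classes so it runs on its own. Map turns each (line number, line) pair into (word, 1) pairs. Shuffle/sort groups the values by key. Reduce sums each group. On a cluster, the map calls and the per-key reduce calls run in parallel on different nodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java simulation of the Map/Reduce model (not Hadoop's Mapper/Reducer classes).
final class WordCountSketch {
    // map: (key = line number, value = line) -> list of (word, 1); the key is unused here.
    static List<Map.Entry<String, Integer>> map(long lineNo, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // reduce: (word, [1, 1, ...]) -> total count for that word
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static SortedMap<String, Integer> run(List<String> lines) {
        // Shuffle/sort: group every map output value by key, in key order.
        SortedMap<String, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            for (Map.Entry<String, Integer> kv : map(i, lines.get(i))) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce each group independently.
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "The lazy dog")));
        // {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```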

• SQL-like syntax, Map/Reduce under the hood

• Client-only software
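"Map/Reduce under the hood" means a SQL-like statement is compiled into map and reduce phases. One common strategy for a JOIN is a reduce-side join, sketched below in plain Java. This is an illustration of the technique, not the plan any particular tool generates; the users/views tables and their {key, value} row layout are hypothetical. Mappers tag each row with its source table and emit it under the join key. Each reducer then pairs up the rows from both sides that share a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java sketch of a reduce-side join (no Hadoop, Pig or Hive classes are used).
// Rows are hypothetical {key, value} pairs.
final class ReduceSideJoin {
    // Map phase: emit (join key, [source tag, value]) so matching rows meet at one reducer.
    static void map(String tag, String[] row, SortedMap<String, List<String[]>> shuffle) {
        shuffle.computeIfAbsent(row[0], k -> new ArrayList<>()).add(new String[] {tag, row[1]});
    }

    // Reduce phase: for one key, pair every "users" value with every "views" value
    // (an inner join: keys present on only one side produce nothing).
    static List<String> reduce(String key, List<String[]> tagged) {
        List<String> names = new ArrayList<>();
        List<String> pages = new ArrayList<>();
        for (String[] t : tagged) {
            if (t[0].equals("users")) {
                names.add(t[1]);
            } else {
                pages.add(t[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (String name : names) {
            for (String page : pages) {
                out.add(key + "," + name + "," + page);
            }
        }
        return out;
    }

    // Roughly: SELECT u.id, u.name, v.page FROM users u JOIN views v ON u.id = v.user_id
    static List<String> join(List<String[]> users, List<String[]> views) {
        SortedMap<String, List<String[]>> shuffle = new TreeMap<>();
        for (String[] u : users) {
            map("users", u, shuffle);
        }
        for (String[] v : views) {
            map("views", v, shuffle);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : shuffle.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> users = List.of(new String[] {"u1", "alice"}, new String[] {"u2", "bob"});
        List<String[]> views = List.of(new String[] {"u1", "/home"}, new String[] {"u1", "/news"},
                                       new String[] {"u3", "/sport"});
        System.out.println(join(users, views)); // [u1,alice,/home, u1,alice,/news]
    }
}
```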

Results

[Diagram: M R M R M R, a chain of Map (M) and Reduce (R) stages]


Live Demo

Lastly, a word of warning...

• It’s not a magic bullet…

• If the tools you need don’t exist…

• Approach is everything…

• Hadoop is *just* the framework

Thank you!

Questions?

http://cotdp.com/hadoop.html
  - Soft-copy of this presentation
  - VM image available to download
  - Example code is on GitHub
