Engineering Big Data with Hadoop

DESCRIPTION

This presentation introduces Big Data and Hadoop.

Page 1: Engineering Big Data with Hadoop

ENGINEERING BIG DATA WITH HADOOP

BY International School of Engineering {We Are Applied Engineering}

Disclaimer: Some of the Images and content have been taken from multiple online sources and this presentation is intended only for knowledge sharing but not for any commercial business intention

Page 2: Engineering Big Data with Hadoop

OVERVIEW

• WHAT IS BIG DATA?

• EXPLOSION OF DATA

• DATA CONTRIBUTIONS

• DATA EXPLOSION

• WHO ARE THE PLAYERS?

• BIG DATA–BIG PICTURE– LANDSCAPE

• BIG DATA– ENTERPRISE ROLES

• WHAT IS HADOOP?

• EVOLUTION OF HADOOP

• HADOOP ECOSYSTEM

• HADOOP ECOSYSTEM MAP

• HADOOP: 30,000 FEET VIEW

• BIG DATA & ANALYTICS Case studies

• VIDEO OF HADOOP ECOSYSTEM

Page 3: Engineering Big Data with Hadoop

WHAT IS BIG DATA?

• High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

-Gartner

HIGH VOLUME

HIGH VELOCITY

HIGH VARIETY

Page 4: Engineering Big Data with Hadoop

EXPLOSION OF DATA

Page 5: Engineering Big Data with Hadoop

Source: http://www.emc.com/leadership/digital-universe/iview/index.htm

Page 6: Engineering Big Data with Hadoop

DATA CONTRIBUTIONS

Page 7: Engineering Big Data with Hadoop

DATA EXPLOSION

Bing ingests > 7 petabytes a month

The Twitter community generates over 1 terabyte of tweets every day

Cisco predicts that by 2013 annual internet traffic will reach 667 exabytes
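To compare these figures, it helps to convert them to a common daily rate. The sketch below is only a rough back-of-the-envelope calculation: it assumes decimal (SI) units, a 30-day month and a 365-day year, none of which the slide specifies, and the helper name `per_day` is made up for illustration.

```python
# Decimal (SI) byte units; the slide does not say whether binary units were meant.
TB = 10**12
PB = 10**15
EB = 10**18


def per_day(total_bytes, days):
    """Average number of bytes per day over a period of `days` days."""
    return total_bytes / days


# Bing: 7 PB a month, assuming a 30-day month, expressed in TB per day.
bing_tb_per_day = per_day(7 * PB, 30) / TB

# Twitter: the slide already gives a daily figure of 1 TB.
twitter_tb_per_day = 1.0

# Cisco forecast: 667 EB a year, assuming 365 days, expressed in PB per day.
internet_pb_per_day = per_day(667 * EB, 365) / PB

if __name__ == "__main__":
    print(f"Bing:     ~{bing_tb_per_day:.1f} TB/day")
    print(f"Twitter:  ~{twitter_tb_per_day:.1f} TB/day")
    print(f"Internet: ~{internet_pb_per_day:.1f} PB/day")
```

Under these assumptions Bing's monthly intake works out to a few hundred terabytes a day, while the forecast for total internet traffic is measured in petabytes a day.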

Page 8: Engineering Big Data with Hadoop

Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf

Page 9: Engineering Big Data with Hadoop

Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf

Page 10: Engineering Big Data with Hadoop

WHO ARE THE PLAYERS?

Page 11: Engineering Big Data with Hadoop
Page 12: Engineering Big Data with Hadoop

BIG DATA–BIG PICTURE– LANDSCAPE

Page 13: Engineering Big Data with Hadoop

BIG DATA– ENTERPRISE ROLES

Page 14: Engineering Big Data with Hadoop

INTRODUCTION TO

Page 15: Engineering Big Data with Hadoop

WHAT IS HADOOP?

• Flexible

Structured/Unstructured

Text/Binary

Schema/Schema less

• 100% Open Source

• Scalable

– Petabytes of Data

– Thousands of Nodes

Source: http://cloudtimes.org/2013/06/25/hadoop-as-a-service-market-growing/

Page 16: Engineering Big Data with Hadoop

How does an Elephant Sneak up on you?

EVOLUTION OF HADOOP

Page 17: Engineering Big Data with Hadoop

HADOOP ECOSYSTEM

Chukwa Sqoop Zookeeper Pig

HBase Avro Mahout Flume

Whirr Hama Hive

Map Reduce Engine

Hadoop Distributed File System

Hadoop Common

Page 18: Engineering Big Data with Hadoop

Source: http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/

HADOOP ECOSYSTEM MAP

Page 19: Engineering Big Data with Hadoop

Hadoop Evolution – Map Explained!

• How did it all start? Huge data on the web!

• Nutch was built to crawl this web data

• Huge data had to be saved: HDFS was born!

• How to use this data? The MapReduce framework was built for coding and running analytics, in Java or in any other language via Hadoop Streaming

• How to get unstructured data in (web logs, click streams, Apache logs, server logs): FUSE, WebDAV, Chukwa, Flume, Scribe

• HIHO and Sqoop for loading data into HDFS: RDBMSs can join the Hadoop bandwagon!
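The MapReduce idea mentioned above can be sketched in a few lines of Python. This is a minimal single-process simulation of a word count, not an actual Hadoop job: in a real Hadoop Streaming job the mapper and reducer would be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, with Hadoop doing the sort between them. The function names `mapper`, `reducer` and `run_local` are illustrative choices, not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce step: sum the counts per word.

    The pairs must arrive sorted by key, which is what the
    shuffle/sort phase between map and reduce guarantees.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce inside one process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["big data with hadoop", "hadoop stores big data"]
    for word, count in run_local(sample).items():
        print(f"{word}\t{count}")
```

The point of the split is that the map step can run independently on each chunk of input, so only the much smaller (word, count) pairs need to move between machines.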

Page 20: Engineering Big Data with Hadoop

Continued

• High-level interfaces required over low-level MapReduce programming: Pig, Hive, Jaql

• BI tools with advanced UI reporting (drill-down etc.): Intellicus

• Workflow tools over MapReduce processes and high-level languages: Oozie

• Monitor and manage Hadoop, run jobs/Hive, view HDFS at a high level: Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia

• Support frameworks: Avro (serialization), ZooKeeper (coordination)

• More high-level interfaces/uses: Mahout, Elastic MapReduce

• OLTP is also possible: HBase

Page 21: Engineering Big Data with Hadoop

HADOOP: 30,000 FEET VIEW

• Distribute data initially

– Let processors / nodes work on local data

– Minimize data transfer over network

– Replicate data multiple times for increased availability

• Write applications at a high level

– Programmers should not have to worry about network programming, temporal dependencies, low-level infrastructure, etc.

• Minimize talking between nodes (shared-nothing)
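The principles on this slide (replicate blocks, then run each task on a node that already holds the data) can be illustrated with a toy model. This is a sketch under simplifying assumptions and not how HDFS or the Hadoop scheduler are actually implemented: replicas are placed round-robin here, whereas real HDFS placement is rack-aware, and the names `place_blocks` and `schedule` are hypothetical. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy placement policy for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def schedule(placement, unavailable=()):
    """For each block, pick a node that holds a replica and is available.

    Returns None for a block whose replicas are all unavailable, in
    which case the data would have to be read over the network.
    """
    plan = {}
    for block, replicas in placement.items():
        candidates = [node for node in replicas if node not in unavailable]
        plan[block] = candidates[0] if candidates else None
    return plan


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_blocks(4, nodes)
    # Even with node1 down, every block can still be processed locally.
    for block, node in schedule(placement, unavailable={"node1"}).items():
        print(f"block {block} -> {node}")
```

Because every block lives on several nodes, losing one node neither loses data nor forces tasks to pull their input across the network.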

Page 22: Engineering Big Data with Hadoop

BIG DATA & ANALYTICS

Case Studies

Page 23: Engineering Big Data with Hadoop

YAHOO - PERSONALIZATION

Page 24: Engineering Big Data with Hadoop

YAHOO SEARCH ASSIST

Page 25: Engineering Big Data with Hadoop

For a detailed description of the HADOOP ECOSYSTEM components, check out our video on

Page 26: Engineering Big Data with Hadoop

Plot no 63/A, 1st Floor, Road No 13, Film Nagar, Jubilee Hills, Hyderabad-500033

For Individuals: (+91) 9502334561/62

For Corporates: (+91) 9618 483 483

Facebook: www.facebook.com/insofe

SlideShare: www.slideshare.net/INSOFE

International School of Engineering