An Introduction to Big Data Concept & Ecosystem

What is Big Data?

Volume • 100s of TBs • PB scale • Too big for traditional transaction processing

Velocity • Distributed, parallel processing

Variety • Structured & unstructured content

Veracity • Trustworthiness, reliability

Drivers for Big Data Adoption

• Commodity hardware support
• Open source ecosystem
• Web economy
• Reduced storage costs

Sources of Big Data

• Archives
• Documents
• Media
• Business Apps
• Data Storage
• Public Web
• Social Media
• Machine/Sensor Data

Where is it used?

• Trends
• Patterns
• Predictions

Usage Scenarios

What we do: Activities • Conversations • Social Media • Photographs • Videos • Transactions

What big data does: Text Analysis • Speech Analysis • Sentiment Analysis • Spending Analysis • Geographical Analysis

Working with Big Data

• Data Source/Ingestion
• Data Storage
• Data Processing/Transformation
• Data Analysis & Output

Hadoop

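• Combines the MapReduce processing engine with HDFS (the Hadoop Distributed File System)
• Shifts responsibility for availability and data distribution from the hardware to the software framework
• Brings processing closer to the data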

Hadoop Eco-System

• Apache Hadoop: MapReduce, HDFS
• Database: HBase, Cassandra
• Structured Queries: Hive, Pig
• RDBMS Connectivity: Sqoop
• Machine Learning/Data Mining: Mahout

MapReduce

• Input: the data set, split into records
• Map: emit a key (with a value) for each record
• Reduce: aggregate the values for each key

MapReduce: Word Count Example
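
The word count flow can be sketched in a few lines of plain Python. This is a minimal single-process illustration of the map, shuffle, and reduce steps, not Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]  # hypothetical input
    print(word_count(sample))
```

In a real cluster the map and reduce functions run in parallel on the nodes that hold the data, and the grouping step happens between them across the network; here all three steps run in one process for clarity.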

Hive

• Started as a sub-project of Hadoop
• Now a top-level Apache project
• Provides a SQL-like abstraction layer over MapReduce
• Has its own HDFS table file format (and it is fully schema-bound)
• Can also work over HBase
• Acts as a bridge to many BI products, which expect tabular data
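
To give a feel for what a SQL-like layer buys you, the sketch below expresses a word count as a single declarative `GROUP BY` query. It uses Python's built-in `sqlite3` module purely as a stand-in: it is not Hive and runs no MapReduce jobs. The table name `words` and the function `word_count_sql` are hypothetical, chosen only for this illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple


def word_count_sql(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count words with one GROUP BY query instead of hand-written map/reduce code."""
    conn = sqlite3.connect(":memory:")  # in-memory database, a stand-in for a Hive table
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany(
        "INSERT INTO words (word) VALUES (?)",
        [(word,) for line in lines for word in line.lower().split()],
    )
    rows = conn.execute(
        "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY word"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(word_count_sql(["big data is big", "data is everywhere"]))
```

The query states what result is wanted and leaves the execution plan to the engine, which is the same division of labour a SQL-like layer over MapReduce offers.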

Big Data + NoSQL

CAP Theorem

• Consistency
• Availability
• Partition Tolerance

NoSQL

• Key-Value Stores: Redis, Amazon DynamoDB
• Document Stores: MongoDB
• Graph Databases: Neo4j
• Wide Column/Column Family: HBase
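
The four NoSQL categories differ mainly in how a record is shaped and looked up. The sketch below models the same hypothetical user in each style using plain Python structures. It does not use any real database client; every name and value in it is an assumption made for illustration.

```python
import json

# Key-value store: an opaque value fetched by a single key.
key_value = {"user:42": json.dumps({"name": "Asha", "city": "Pune"})}

# Document store: a nested, self-describing document that can be queried by field.
document = {"_id": 42, "name": "Asha", "address": {"city": "Pune"}}

# Graph database: nodes plus typed edges between them.
graph_nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
graph_edges = [(42, "FOLLOWS", 7)]

# Wide column / column family: row key -> column family -> column -> value.
wide_column = {"42": {"profile": {"name": "Asha", "city": "Pune"}}}


def followed_by(edges, source):
    """Graph-style lookup: return the ids of nodes that `source` follows."""
    return [dst for src, label, dst in edges if src == source and label == "FOLLOWS"]


if __name__ == "__main__":
    print(json.loads(key_value["user:42"])["city"])   # value must be decoded by the caller
    print(document["address"]["city"])                # field access inside the document
    print(followed_by(graph_edges, 42))               # traverse relationships
    print(wide_column["42"]["profile"]["city"])       # row key, then family, then column
```

All three city lookups return the same value; what changes between models is how much structure the store itself understands.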

Hadoop Distros

• Cloudera
• Hortonworks
• MapR
• IBM InfoSphere BigInsights

Hadoop In Clouds

• Amazon EMR
• Microsoft HDInsight
• Google Cloud Platform
• Apache Whirr

Additional Information

To learn more about big data and the ecosystem, get in touch with us.

[email protected] www.forwardsprint.com

Thank you!
