De-Mystifying Big Data Prasad Mavuduri American Institute of Big Data Professionals

Page 1: De-Mystifying Big Data

De-Mystifying Big Data Prasad Mavuduri

American Institute of Big Data Professionals

Page 2: De-Mystifying Big Data

RIGHT FOCUS AND ON TARGET

Agenda

Analyze & Define
•Progression of Analytics
•The new phenomenon - Big Data
•Big Data Defined

Technology
•Big Data Technology – Hadoop
•Big Data – Big Savings – Hadoop

Use Cases
•What can we solve with Big Data – example

Discussion
•What is next? Where are the opportunities?

Page 3: De-Mystifying Big Data

Progression of Analytics

Structured – Known Data

Traditional – ETL, Data Marts, DW, RDBMS

Growth – Normal Incremental – Archive

Less Cross Functional Integration

More Tactical than Strategic

Sizes GBs to TBs

Data Architects vs. Functional

So Far…..

Page 4: De-Mystifying Big Data

A serious Matter

Page 5: De-Mystifying Big Data

The new phenomenon - Big Data

Growing Pains ??!!!

Big Data ?!!!

Is it just data ?

Page 6: De-Mystifying Big Data

The new phenomenon - Big Data

1. No to “fit-for-all” but Yes to “fit-for-purpose”
2. Proliferation of data sources – variety of data
3. Proliferation of volume of data
4. The demand for the speed (velocity) of data
5. Demand for high value & accuracy (veracity) of info
6. Massively Parallel Processing
7. Commodity servers vs. Specialized servers

DATA DRIVEN BUSINESS is THE SMART BUSINESS

Page 7: De-Mystifying Big Data

Big Data Definition

• High volume of data, growing by more than 50% every year
• High-speed streaming, machine-generated data, etc.
• Different data sources: in-the-enterprise data and external data around the enterprise
• Collected data occupying huge storage (typically 100 TB or more), where an RDBMS is inefficient
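To make the growth bullet concrete, here is a rough sketch of how quickly a data set compounding at 50% a year crosses the 100 TB mark mentioned above. The 10 TB starting size and the helper name `years_to_reach` are assumptions for illustration only; they do not come from the slides.

```python
def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> int:
    """Count whole years of compound growth until start_tb reaches target_tb."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    size = start_tb
    while size < target_tb:
        size *= 1.0 + annual_growth  # one more year of growth
        years += 1
    return years


# Hypothetical starting point: a 10 TB warehouse growing 50% per year.
print(years_to_reach(10, 100, 0.50))
```

Under these assumed numbers the data set passes 100 TB in its sixth year (10 TB grows to roughly 76 TB after five years and roughly 114 TB after six).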

[Diagram labels: Value, Variety, Volume, Velocity, VERACITY, Meaningful]

Page 8: De-Mystifying Big Data

Big Data Definition

VERACITY

Big Data is the new art and science of collecting, storing, processing, distributing, and analyzing data, using Massively Parallel Processing (MPP) technology, where the data has any of the attributes high volume, high velocity, or high variety, in order to extract high value and greater accuracy (veracity).

IBM says BIG DATA means:
1. Volume (Terabytes -> Zettabytes)
2. Variety (Structured -> Semi-structured -> Unstructured)
3. Velocity (Batch -> Streaming Data)

Page 9: De-Mystifying Big Data

Big Data Technologies – Typical Stack

Big Data Infrastructure

Data Manipulation & Management

Data Analysis & Mining

Predictive & Prescriptive Analysis

Process Automation & Decision Support Systems

Big Data Stack

Page 10: De-Mystifying Big Data

Big Data Technologies – SMAQ

Query: User-friendly Analytics
1. PIG (simple Query Language)
2. HIVE (Similar to SQL)
3. Cascading (Workflow)
4. Mahout (Machine Learning)
5. Zookeeper (Coordination Service)

Map Reduce: Data Distribution & Management across nodes in Batch Mode
1. Hadoop MapReduce
2. Alternative – BashReduce, Disco Project, Spark, GraphLab (C&M), Storm, HPCC (LexisNexis)

Storage: Distributed Non-Relational
1. HBase (columnar DB)
2. HDFS – Hadoop Distributed File System

SMAQ Stack
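The Map Reduce layer of the stack can be illustrated with a tiny word count. This is a single-process sketch of the map, shuffle, and reduce steps in plain Python, not the Hadoop API; the function names and the sample input lines are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: two short lines standing in for a large log file.
logs = ["big data big savings", "big data defined"]
counts = word_count(logs)
print(counts)
```

In a real cluster the map and reduce calls would run on different nodes over blocks held in the Storage layer; here everything runs in one process so only the data flow is visible.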

Page 11: De-Mystifying Big Data

Big Data – Big Savings – Economics

ROI on Big Data Approach (with Hadoop)

TCO of 1 TB: $37,000 - Traditional RDBMS; $2,000 only !!!! - Hadoop

Source: American Institute for Analytics
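A quick worked comparison of the two per-terabyte figures above. The sketch assumes that TCO scales linearly with data size, which the slide does not state, and the 100 TB example size is borrowed from the Big Data Definition slide; the function and key names are illustrative.

```python
# Per-terabyte TCO figures as quoted on the slide.
RDBMS_TCO_PER_TB = 37_000
HADOOP_TCO_PER_TB = 2_000


def tco_comparison(terabytes: int) -> dict:
    """Compare TCO for a given size, ASSUMING cost scales linearly with TB."""
    rdbms = RDBMS_TCO_PER_TB * terabytes
    hadoop = HADOOP_TCO_PER_TB * terabytes
    savings = rdbms - hadoop
    return {
        "rdbms": rdbms,
        "hadoop": hadoop,
        "savings": savings,
        "ratio": rdbms / hadoop,                       # how many times cheaper
        "savings_pct": round(100 * savings / rdbms, 1),
    }


result = tco_comparison(100)  # 100 TB, the "typical" size from the definition slide
print(result)
```

On the quoted figures the RDBMS option costs 18.5 times as much per terabyte, a saving of about 94.6%; for 100 TB under the linear assumption that is $3,700,000 versus $200,000.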

Page 12: De-Mystifying Big Data

Where is the market on Big Data

Infrastructure / Framework / Analytics software

Horizontal Solutions like EDW etc

Vertical markets: Health Care; Retail Industry; Government / Public sector; Education & Human Capital; Health Sciences / Genomics; Telecommunications / Services; Energy & Utilities; E-Commerce / Marketing; Media & Entertainment

[Chart: Big Data Market In $B, 2010 to 2015, y-axis 0 to 16, with a "Current State" marker. Source: IDC 2011]

Page 13: De-Mystifying Big Data

Integrated Big data Implementation - Architecture

Coexistence of Big Data with existing EDW

Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data

Connectors / Adapters

Platforms: Big Data / Hadoop etc., Existing EDW

Analytics: Prescriptive, Predictive, Reporting, OLAP, Modeling

Page 14: De-Mystifying Big Data

Pure Big data Implementation - Architecture

Pure Big Data

Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data

Connectors / Adapters

Platform: Big Data / Hadoop etc.

Analytics: Prescriptive, Predictive, Reporting, OLAP, Modeling

Barriers
•Disruption to existing Analytics ?!
•Roadmap / Methodology
•Certainty of costs

HADOOP / Big Table can replace traditional EDWs !!

Page 15: De-Mystifying Big Data

Big Data Landscape

Page 16: De-Mystifying Big Data

Big Data Landscape

Page 17: De-Mystifying Big Data

Applied BIG Data

Page 18: De-Mystifying Big Data

BIG Data Opportunities

Some Gaps & opportunities

•Real-time Analysis (maybe use SAP HANA etc.!!)

•User interface (UI) frameworks

•App development Big Data on Cloud (multi-Tenancy)

•Security & Data Governance

•Cross Application Integration

•Industry Standards

Page 19: De-Mystifying Big Data

AIBDP – Contribution to Big Data

Page 20: De-Mystifying Big Data

Our Big Data Strategy at a glance

Business Focus
•Identify data needs
•Identify Business Issues
•Lay out data dependencies between functions
•Resolve Competing priorities
•Clearly lay out the levels of data, cross-functional requirements

Stakeholder Focus
•Identify the stakeholders
•Align best practices with the project
•Plan out the objectives, scope, and timelines
•Identify the KPIs, Reports, Dashboards, Predictive & Prescriptive Analysis to be delivered

Technology Focus
•Synergies in current technology
•Take stock of existing “technology assets” towards Big Data
•Assess your current capabilities and architecture
•Identify the resources and minimize “specialties” to exploit synergies with existing resource pool
•Lay out a development methodology to streamline delivery

Process Focus
•Establish clear data flows
•Identify Data Governance execution process – People, Processes, Mechanisms
•Design the process to be more Business focused than IT
•Clearly establish measures to achieve – Accuracy, Repeatability, Agility, and Accountability (reconcilability)

Page 21: De-Mystifying Big Data

Our Execution Approach – AGILE methodology

Agile Approach to reduce risks

• Close coordination between the customer and the developer

• Small incremental steps make testing easier and more manageable, and avoid surprises

• Early recovery from expectation mismatch

• Clarity of design understanding through regular communication with the user

• Early warning about risks through regular status reports

• Full Knowledge Transfer

Page 22: De-Mystifying Big Data

Thank You !!

Please contact us for any enquiries at:

Prasad [email protected] 828 9909

Q & A