De-Mystifying Big Data
Prasad Mavuduri
American Institute of Big Data Professionals
RIGHT FOCUS AND ON TARGET
Agenda
Analyze & Define
• Progression of Analytics
• The new phenomenon - Big Data
• Big Data Defined
Technology
• Big Data Technology – Hadoop
• Big Data – Big Savings – Hadoop
Use Cases
• What can we solve with Big Data – example
Discussion
• What is next? Where are the opportunities?
Progression of Analytics
Structured – Known Data
Traditional – ETL, Data Marts, DW, RDBMS
Growth – Normal Incremental – Archive
Less Cross Functional Integration
More Tactical than Strategic
Sizes GBs to TBs
Data Architects vs. Functional
So Far…..
The new phenomenon - Big Data
Growing Pains ??!!!
Big Data ?!!!
Is it just data ?
The new phenomenon - Big Data
1. No to “fit-for-all” but yes to “fit-for-purpose”
2. Proliferation of data sources – variety of data
3. Proliferation of volume of data
4. The demand for the speed (velocity) of data
5. Demand for high value & accuracy (veracity) of info
6. Massive Parallel Processing
7. Commodity servers vs. specialized servers
DATA DRIVEN BUSINESS is THE SMART BUSINESS
Big Data Definition
• High volume of data, growing by more than 50% every year
• High-speed streaming, machine-generated data, etc.
• Different data sources: in-the-enterprise data and external data around the enterprise
• Collected data occupying huge storage (typically 100 TB or more), where an RDBMS is inefficient
Diagram labels: Volume, Velocity, Variety, Value, VERACITY (Meaningful)
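The growth figure above compounds quickly. As a rough sketch, assuming a constant 50% annual growth rate and a hypothetical starting size of 10 TB (the starting size and the function names are illustrative, not from the slides), the time to cross the 100 TB mark can be estimated as:

```python
import math


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float = 0.50) -> int:
    """Whole years of compound growth needed for start_tb to reach target_tb."""
    if start_tb <= 0 or target_tb <= 0 or annual_growth <= 0:
        raise ValueError("sizes and growth rate must be positive")
    if start_tb >= target_tb:
        return 0
    # Solve start * (1 + g)^n >= target for the smallest whole n.
    return math.ceil(math.log(target_tb / start_tb) / math.log(1 + annual_growth))


def projected_size(start_tb: float, years: int, annual_growth: float = 0.50) -> float:
    """Data size in TB after the given number of years of compound growth."""
    return start_tb * (1 + annual_growth) ** years


# Hypothetical example: a 10 TB store growing 50% per year.
print(years_to_reach(10, 100))   # years until the 100 TB threshold is crossed
print(projected_size(10, 6))     # size in TB after six years
```

Under these assumptions a 10 TB store passes 100 TB in about six years, which is why incremental "archive and grow" planning stops working.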
Big Data Definition
Big Data is the new art and science, using Massive Parallel Processing (MPP) technology, of collection, storage, processing, distribution, and analysis of data with any of the attributes – high volume, high velocity, high variety to extract high value and greater accuracy (veracity).
IBM says BIG DATA means:
1. Volume (Terabytes -> Zettabytes)
2. Variety (Structured -> Semi-structured -> Unstructured)
3. Velocity (Batch -> Streaming Data)
Big Data Technologies – Typical Stack
Big Data Infrastructure
Data Manipulation & Management
Data Analysis & Mining
Predictive & Prescriptive Analysis
Process Automation & Decision Support Systems
Big Data Stack
Big Data Technologies – SMAQ
SMAQ Stack (Storage, MapReduce, Query)

Query: User-friendly Analytics
1. Pig (simple query language)
2. Hive (similar to SQL)
3. Cascading (workflow)
4. Mahout (machine learning)
5. ZooKeeper (coordination service)

MapReduce: Data Distribution & Management across nodes in Batch Mode
1. Hadoop MapReduce
2. Alternatives – BashReduce, Disco Project, Spark, GraphLab (C&M), Storm, HPCC (LexisNexis)

Storage: Distributed Non-Relational
1. HBase (columnar DB)
2. HDFS – Hadoop Distributed File System
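To make the MapReduce layer of the stack concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using a word count. It only illustrates the programming model; it does not use Hadoop, and the function names are hypothetical. In a real cluster the map and reduce calls would run in parallel on different nodes over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big savings"]))
```

Query tools such as Pig and Hive sit on top of this model: they let users express the same kind of aggregation declaratively instead of writing the map and reduce functions by hand.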
Big Data – Big Savings – Economics
ROI on Big Data Approach (with Hadoop)
TCO of 1 TB of data: $37,000 with a traditional RDBMS vs. only $2,000 with Hadoop !!!!
Source: American Institute for Analytics
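As a back-of-the-envelope illustration of these figures, the sketch below scales the two per-terabyte TCO numbers from the slide to a larger data set. It assumes cost grows linearly with volume, which is a simplification made here and not a claim from the source; the function name is hypothetical.

```python
# Per-terabyte TCO figures quoted on the slide (USD).
RDBMS_TCO_PER_TB = 37_000
HADOOP_TCO_PER_TB = 2_000


def tco_comparison(terabytes: float) -> dict:
    """Compare total cost for a data volume, assuming linear scaling per TB."""
    if terabytes <= 0:
        raise ValueError("terabytes must be positive")
    rdbms = terabytes * RDBMS_TCO_PER_TB
    hadoop = terabytes * HADOOP_TCO_PER_TB
    return {
        "rdbms": rdbms,
        "hadoop": hadoop,
        "savings": rdbms - hadoop,
        "ratio": rdbms / hadoop,
    }


# Example: the 100 TB scale mentioned in the Big Data definition.
print(tco_comparison(100))
```

On the slide's numbers the RDBMS figure is 18.5 times the Hadoop figure, regardless of volume under the linear assumption.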
Where is the market on Big Data
Infrastructure / Framework / Analytics software
Horizontal Solutions like EDW etc
Industry verticals:
• Health Care
• Retail Industry
• Government / Public sector
• Education & Human Capital
• Health Sciences / Genomics
• Telecommunications / Services
• Energy & Utilities
• E-Commerce / Marketing
• Media & Entertainment
Chart: Big Data Market in $B by year, 2010 to 2015 (axis 0 to 16, with "Current State" marked). Source: IDC 2011
Integrated Big data Implementation - Architecture
Coexistence of Big Data with existing EDW
Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data
Connectors / Adapters
Big Data / Hadoop etc.
Existing EDW
Analytics: Prescriptive, Predictive, Reporting, OLAP, Modeling
Pure Big data Implementation - Architecture
Pure Big Data
Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data
Connectors / Adapters
Big Data / Hadoop etc.
Analytics: Prescriptive, Predictive, Reporting, OLAP, Modeling
Barriers
• Disruption to existing Analytics ?!
• Roadmap / Methodology
• Certainty of costs
HADOOP / Big Table can replace traditional EDWs !!
Big Data Landscape
Applied BIG Data
BIG Data Opportunities
Some Gaps & opportunities
•Real-time Analysis (maybe use SAP HANA etc. !!)
•User interface (UI) frameworks
•App development for Big Data on the Cloud (multi-tenancy)
•Security & Data Governance
•Cross Application Integration
•Industry Standards
AIBDP – Contribution to Big Data
Our Big Data Strategy at a glance

Business Focus
• Identify data needs
• Identify business issues
• Lay out data dependencies between functions
• Resolve competing priorities
• Clearly lay out the levels of data, cross-functional requirements

Stakeholder Focus
• Identify the stakeholders
• Align best practices with the project
• Plan out the objectives, scope, and timelines
• Identify the KPIs, Reports, Dashboards, Predictive & Prescriptive Analysis to be delivered

Technology Focus
• Synergies in current technology
• Take stock of existing “technology assets” towards Big Data
• Assess your current capabilities and architecture
• Identify the resources and minimize “specialties” to exploit synergies with existing resource pool
• Lay out a development methodology to streamline delivery

Process Focus
• Establish clear data flows
• Identify Data Governance execution process – People, Processes, Mechanisms
• Design the process to be more Business focused than IT
• Clearly establish measures to achieve – Accuracy, Repeatability, Agility, and Accountability (reconcilability)
Our Execution Approach – AGILE methodology
Agile Approach to reduce risks
• Close coordination between the customer and the developer
• Small incremental steps make testing easier and more manageable, and avoid surprises
• Early recovery from expectation mismatch
• Clarity on design understanding and regular communication with the user
• Early warning about risks through regular status reports
• Full Knowledge Transfer
Thank You !!
Please contact us for any enquiries at:
Prasad [email protected] 828 9909
Q & A