21 Big Data and Analytics A Technical Perspective Abhishek Bhattacharya, Aditya Gandhi and Pankaj Jain November 2012

Analyticsand bigdata


Page 1: Analyticsand bigdata

21 Big Data and Analytics A Technical Perspective

Abhishek Bhattacharya, Aditya Gandhi and Pankaj Jain

November 2012

Page 2: Analyticsand bigdata

2 © COPYRIGHT 2012 SAPIENT CORPORATION | CONFIDENTIAL

Between the dawn of civilization and 2003, the human race created 5 exabytes of data. Now we generate that much every 2 days. The total amount of global data is expected to grow to 2,700 exabytes during 2012, up 48% from 2011.

1 Exabyte = 1,000,000 TB
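The figures on this slide imply a couple of quantities that are easy to check. A minimal sketch of that arithmetic, using only the numbers quoted above (the variable names are illustrative, not from the deck):

```python
# Figures quoted on the slide.
EB_2012 = 2700          # expected global data during 2012, in exabytes
GROWTH = 0.48           # stated growth over 2011
TB_PER_EB = 1_000_000   # 1 exabyte = 1,000,000 terabytes

# Implied 2011 volume, since 2012 = 2011 * (1 + growth).
eb_2011 = EB_2012 / (1 + GROWTH)

# "5 exabytes every 2 days" expressed as terabytes per day.
tb_per_day = 5 * TB_PER_EB / 2

print(f"Implied 2011 volume: {eb_2011:.0f} EB")
print(f"Rate at 5 EB per 2 days: {tb_per_day:,.0f} TB/day")
```

Under these figures the implied 2011 volume is roughly 1,824 exabytes.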

Page 3: Analyticsand bigdata


Big Data Defined

Techniques and technologies that make handling data at extreme scale affordable.

Source: Forrester Research, ctoforum.org

VARIETY

Structured -> Semi-structured -> Unstructured

VOLUME

Terabytes -> Exabytes

VELOCITY

Batch -> Streaming Data

Page 4: Analyticsand bigdata


Evolution of Analytics

Timeline: 1990s -> 2000s -> Late 2000s -> 2010s

Descriptive
What happened? (Standard Reporting)
Why did it happen? (Query / Drill down)

Predictive
What could happen? (Simulation)

Prescriptive
What should I be doing? (Optimization)

Page 5: Analyticsand bigdata


How is Big Data Analytics Different?

              TRADITIONAL BI            BIG DATA ANALYTICS
Volumes       GBs to 10s of TBs         10s of TBs to 100s of PBs
Sources       Operational               External + Operational
Variety       Structured                Mostly Semi-Structured
Workload      Repetitive                Experimental, Ad Hoc
Mathematics   Addition (Aggregation)    Complex Algorithms / Linear Programming

Page 6: Analyticsand bigdata


The Big Data Lifecycle

Manage

Enrich

Insight

Source: hadoop.apache.org; Microsoft.com; ibm.com

Page 7: Analyticsand bigdata


Manage Data

ANY DATA, ANYWHERE, ANY SIZE

Non-Relational Relational Streaming


Data Movement

Source: hadoop.apache.org; Microsoft.com; ibm.com

Page 8: Analyticsand bigdata


ENRICH by Combining and Refining!

Discover

Combine

Refine

Source: Microsoft.com, oracle.com, ibm.com

Page 9: Analyticsand bigdata


Insight | Anywhere, Any Device, Any User

ANY DATA, ANYWHERE (DEVICES), ALL USERS

Source: Microsoft.com, oracle.com, ibm.com

Page 10: Analyticsand bigdata


BIG DATA REQUIRES AN END-TO-END APPROACH

INSIGHT Self-Service Collaboration Corporate Apps Devices

ENRICH Discover Combine Refine

F(x)

MANAGE Relational Non-relational Streaming Analytical

Source: Microsoft.com, ibm.com

Page 11: Analyticsand bigdata


We are spoilt for choice in the marketplace

Product Proliferation

Page 12: Analyticsand bigdata

Source: Product Logos of Big Data Companies

Page 13: Analyticsand bigdata


Enterprise Data Warehouse

Hadoop

Aggregate Oriented DB

In-Memory Stores

Source: Product Logos of Big Data Companies

Page 14: Analyticsand bigdata


ENTERPRISE DATA WAREHOUSES

• Requires referential integrity and structured data, which limits flexibility and agility
• Analytics and aggregation using OLAP
• “Shared-nothing” MPP architecture enables massive scale-out
• Best suited for analytics on structured data
• Key considerations include data quality/governance, structuring data, and segmenting analytics workloads

[Chart axes: Ingestion Velocity, Variety, Volume, Processing Velocity, Analytics Complexity]
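To make the warehouse points concrete, here is a small sketch of referential integrity plus an OLAP-style roll-up. It uses SQLite from the Python standard library purely as a stand-in for a warehouse engine; the `store` and `sales` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT NOT NULL);
    CREATE TABLE sales (
        sale_id  INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL REFERENCES store(store_id),
        amount   REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO store VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany(
    "INSERT INTO sales (store_id, amount) VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 70.0)],
)

# Roll-up: aggregate the fact table along the region dimension.
rows = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales JOIN store USING (store_id)
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)

# Referential integrity: a sale for an unknown store is rejected.
try:
    conn.execute("INSERT INTO sales (store_id, amount) VALUES (99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan row rejected:", rejected)
```

The same schema that guarantees clean aggregates is what makes the model rigid: a row that does not fit the structure cannot be loaded at all.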

Page 15: Analyticsand bigdata


HADOOP

• Java-based open-source framework
• Hadoop Core – MapReduce and HDFS
• Structuring is delayed until analytics are performed
• Flexibility as the business grows/evolves
• Flexibility to build complex algorithms/models for analytics purposes
• Only option for the petabyte range
• Best suited for batch-oriented analytics
• Works best when analytics algorithms can be designed as “scatter-gather”
• Key considerations: HDFS file size, MapReduce algorithm, sequential file processing, data distribution

[Chart axes: Ingestion Velocity, Variety, Volume, Processing Velocity, Analytics Complexity]
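The “scatter-gather” shape can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps in plain Python, not Hadoop's Java API; the function names and the input chunks are made up for the example.

```python
from collections import defaultdict

def map_phase(chunk):
    """Scatter step: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Gather step: combine all values seen for one key."""
    return key, sum(values)

# Hypothetical input, already split into chunks (stand-ins for file blocks).
chunks = ["big data big insight", "small insight", "big data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Each chunk is mapped independently, which is why the approach suits batch work over sequentially read files: no map call needs to see another chunk's data.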

Page 16: Analyticsand bigdata


IN-MEMORY STORES

• Maintains data in memory and on SSD
• Leverages shared-nothing architecture to provide scalability
• In-memory databases (IMDB) – row- or column-oriented schema
• In-memory data grids (IMDG) – key-value and de-normalized

IMDB: Best suited for real-time analytics on structured data. Used for specialized data marts as well as for OLTP needs.
Key considerations: data organization, parallel query

IMDG: Suited for fast key-based data access patterns or processing.
Key considerations: data distribution, key-definition, data-process co-location

[Chart axes: Ingestion Velocity, Variety, Volume, Processing Velocity, Analytics Complexity]
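The IMDG considerations (data distribution, key definition, data-process co-location) can be illustrated with a toy partitioned map. This is a sketch under simple assumptions, not any particular product's API: four partitions, CRC32 as the routing hash, and hypothetical customer keys.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical grid size

def partition_for(key: str) -> int:
    """Route a key to a partition with a stable hash so routing is repeatable."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

# Each partition holds its own slice of the data: key -> list of values.
partitions = [defaultdict(list) for _ in range(NUM_PARTITIONS)]

def put(customer_id: str, amount: float) -> None:
    partitions[partition_for(customer_id)][customer_id].append(amount)

def total_spend(customer_id: str) -> float:
    """Touches a single partition: all data for one key is co-located there."""
    local = partitions[partition_for(customer_id)]
    return sum(local.get(customer_id, []))

for cid, amount in [("c-1", 20.0), ("c-2", 5.0), ("c-1", 12.5)]:
    put(cid, amount)

print(total_spend("c-1"))
```

Because the key decides the partition, choosing the key is effectively choosing which computations can run without moving data between nodes.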

Page 17: Analyticsand bigdata


AGGREGATE ORIENTED DB

• Highly scalable and available distributed data-stores
• De-normalized data structures, with data organised as aggregates. Data saved as key-value, documents or columns
• Enable faster reads/writes on aggregates
• Best suited for analytics on semi-structured data where access patterns can be bound to a single key
• Key considerations: data distribution, aggregate structure, key-definition, data-process co-location

[Chart axes: Ingestion Velocity, Variety, Volume, Processing Velocity, Analytics Complexity]
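An aggregate is easiest to see as one de-normalized document stored under one key. In the sketch below a plain dict stands in for a key-value or document store; the order structure, the `order:` key prefix and the helper names are all hypothetical.

```python
import json

# A dict stands in for the store; each value is one serialized aggregate.
store = {}

def save_order(order: dict) -> None:
    """Write the whole aggregate (order, customer, lines) under a single key."""
    store[f"order:{order['order_id']}"] = json.dumps(order)

def load_order(order_id: str) -> dict:
    """One key lookup returns everything needed to work with the order."""
    return json.loads(store[f"order:{order_id}"])

save_order({
    "order_id": "1001",
    "customer": {"id": "c-1", "segment": "loyal"},
    "lines": [
        {"sku": "cola-330", "qty": 2, "price": 1.5},
        {"sku": "chips-150", "qty": 1, "price": 2.0},
    ],
})

order = load_order("1001")
order_total = sum(line["qty"] * line["price"] for line in order["lines"])
print(order_total)
```

The trade-off is that questions cutting across aggregates, such as total sales per SKU, need a scan or a second structure, which is why the access pattern should be settled before the key is.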

Page 18: Analyticsand bigdata


Product Category Comparison

[Chart: Enterprise Data Warehouse, Hadoop, In-Memory Stores and Aggregate-Oriented DB compared on Volume, Variety, Ingestion Velocity, Processing Velocity and Analytics Complexity]

Specific product selection will depend on an assessment of data and analytics requirements

Page 19: Analyticsand bigdata

19 © COPYRIGHT 2011 SAPIENT CORPORATION | CONFIDENTIAL

ADVANCED PHYSICAL PORTFOLIO OPTIMIZATION

in·te·gra·tion • cov·er·age • pre·vis·i·bil·i·ty

Aditya Gandhi

Page 20: Analyticsand bigdata


CHALLENGE

• Making the next buck is harder
• Constantly changing environment
• Decisions are narrow or historical

Page 21: Analyticsand bigdata


CHALLENGE

• Vast but un-captured information
• Increasing volume / complexity
• Coarse-grained operations

Page 22: Analyticsand bigdata


CONCEPT

• Toolset like a chess simulator
• Takes in current state of the board
• Provides best actions to take

Page 23: Analyticsand bigdata


Framework

Markets: Price forecasts, Forward Curves, Volatilities, Costs and Tariffs

Asset Characteristics: Commodity In, Commodity Out, Transport, Storage, Processing, Plants

Beginning positions: Storage Inv, In transit Inventory, Exch Imbalance

Optimization User Actions

TARGET TRANSACTIONS:
Mkt Optimization formulates the optimal shape of transactions based on target portfolio and beg positions

EXECUTED TRANSACTIONS:
Exogenous and endogenous constraints and factors cause deviation from plan
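The "current state in, best actions out" idea can be shown on a toy case. The sketch below simply enumerates every feasible plan, which only works at this tiny scale; it is not the presenters' method, and a real system would more likely use linear programming or a dedicated solver. The two markets, their prices, transport costs, capacities and the starting inventory are all invented for illustration.

```python
from itertools import product

inventory = 10  # beginning position: units in storage (hypothetical)

# Hypothetical markets: sale price, transport cost per unit, transport capacity.
markets = {
    "A": {"price": 50, "transport_cost": 5, "capacity": 6},
    "B": {"price": 47, "transport_cost": 1, "capacity": 8},
}

def profit(plan):
    """Net margin of a plan: quantity times (price minus transport cost), per market."""
    return sum(
        qty * (markets[m]["price"] - markets[m]["transport_cost"])
        for m, qty in plan.items()
    )

names = list(markets)
ranges = [range(markets[m]["capacity"] + 1) for m in names]

best_plan, best_profit = None, float("-inf")
for quantities in product(*ranges):
    if sum(quantities) > inventory:  # cannot ship more than is in storage
        continue
    plan = dict(zip(names, quantities))
    if profit(plan) > best_profit:
        best_plan, best_profit = plan, profit(plan)

print(best_plan, best_profit)
```

Note that the market with the higher headline price is not filled first: once transport cost is netted off, market B has the better margin, which is the kind of result a narrow, price-only decision would miss.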

Page 24: Analyticsand bigdata


Retail Analytics

Pankaj Jain

Page 25: Analyticsand bigdata


Aspects of Retail Analytics

Market Basket Analytics: Category Segmentation, Product Affinity, Brand Knowledge

Credit and Loyalty Card Analytics: Customer Segmentation, Loyalty, Lifestyle and Life Stage Segmentation

Shopper Insight: Brand Awareness, Impulse Shopping

Store Location Data: Store Location, Store Size, Store Format, Competitive Analysis

Geo Demographics: Sociology, Income/Education, Infrastructure
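Product affinity in market basket analytics is commonly measured with support and lift. A minimal sketch, assuming four made-up baskets; lift above 1 suggests two items are bought together more often than chance would predict.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"chips", "cola"},
    {"chips", "cola", "bread"},
    {"bread", "milk"},
    {"milk"},
]
n = len(baskets)

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

# lift(a, b) = support(a and b) / (support(a) * support(b))
lift = {
    (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
    for (a, b), count in pair_counts.items()
}

for pair, value in sorted(lift.items()):
    print(pair, round(value, 2))
```

In this toy data chips and cola have a lift of 2.0, while bread and milk sit at 1.0, meaning no affinity beyond chance.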

Page 26: Analyticsand bigdata


Retail Analytics Business Problems

How much money will the customer spend during the next visit?

When will the customer visit the store next?

How many customers are price sensitive?

How do I balance my product range across store formats?

How can I find gaps in the product range?

What should be delisted to introduce a new product?

Do my shoppers buy across the range?

Page 27: Analyticsand bigdata


Analytics Lifecycle

POS & Other Data
• Poor Structure
• Volume
• Inconsistent

Organized Data
• Volume
• Segmented
• Continuously Improved

Summarized Data
• Template Reports
• Rapid Analysis

Processed Data
• Segmentation
• Complex Algorithm

[Diagram labels: Attributes, Insight, Enrich]

Page 28: Analyticsand bigdata


Business Outcome

• Effective Promotions and Communication
o Over 8% increase in Steadfast customers and 5% more sales
o Over 80% acceptance of offers
o Over $1 million growth in the category
• Over 60% growth in the range, with higher repeat sales and new customers, due to range analysis.
• Addition of three new aerated drinks increased sales in that category by 12%.
• Overall, higher and more consistent business growth.

Big Data Small Insights

Page 29: Analyticsand bigdata


Conclusion

• Big data has more dimensions than just "Big"
• Lifecycle is critical
• Choose your product and platform wisely
• Big data analytics is a lot more insightful than just analytics
o Big Data Small Insights
o Ask the right question
• Ramp up your college statistics and mathematics!

Page 30: Analyticsand bigdata


Thank You!