Customer summit - big data (final)

BIG DATA Defined:

Data Stack 3.0

Persistent Systems

June 2012

1 24 July 2012

The Data Revolution is Happening Now

The growing need for large-volume, multi-structured “Big Data” analytics, as well as … “Fast Data”, have positioned the industry at the cusp of the most radical revolution in database architectures in 20 years.

We believe that the economics of data will increasingly drive competitive advantage.

Source: Credit Suisse Research, Sept 2011

Enterprise Value is Shifting to Data

[Figure: timeline of where enterprise value resides. Labels: Mainframe, Operating Systems, Database. Years: 1975, 1985, 1995, 2006, 2013.]

Organizational leaders want analytics to exploit their growing data and computational power to get smart, and get innovative, in ways they never could before.

Source: MIT Sloan Management Review, The New Intelligent Enterprise: Big Data, Analytics and the Path From Insights to Value, by Steve LaValle, Eric Lesser, Rebecca Shockley, Michael S. Hopkins and Nina Kruschwitz, December 21, 2010

What Data Can Do For You

Source: New York Times, September 2, 2009. Tesco, British Grocer, Uses Weather to Predict Sales By Julia Werdigier

http://www.nytimes.com/2009/09/02/business/global/02weather.html

Britain often conjures images of unpredictable weather, with downpours sometimes followed

by sunshine within the same hour — several times a day.

Such randomness has prompted Tesco, the country’s largest grocery chain, to create…its own

software that calculates how shopping patterns change “for every degree of temperature and

every hour of sunshine.”

Determining Shopping Patterns

British Grocer, Tesco Uses Big Data

by Applying Weather Results to Predict

Demand and Increase Sales

GlaxoSmithKline is aiming to build direct relationships with 1 million consumers in a year using

social media as a base for research and multichannel marketing. Targeted offers and

promotions will drive people to particular brand websites where external data is integrated

with information already held by the marketing teams.

Source: Big data: Embracing the elephant in the room By Steve Hemsley

http://www.marketingweek.co.uk/big-data-embracing-the-elephant-in-the-room/3030939.article

Tracking Customers in Social Media

GlaxoSmithKline Uses Big Data

to Efficiently Target Customers

What does India Think?

Persistent enables Aamir Khan Productions and Star Plus to use Big Data to know how people react to some of the most excruciating social issues.

http://www.satyamevjayate.in/

Satyamev Jayate, Aamir Khan’s pioneering, interactive socio-cultural TV show, has caught the interest of the entire nation. In 4 weeks it has already generated ~7.5M responses from viewers across the world over SMS, Facebook, Twitter, phone calls and discussion forums. This data is analyzed and delivered in real time so that the producers can understand the pulse of the viewers, gauge appreciation for the show and, most importantly, spread the message. Harnessing the truth from all this data is a key component of the show’s success.

WE ALREADY HAVE DATABASES.

WHY DO WE NEED TO DO ANYTHING

DIFFERENT?

● Transaction processing capabilities ideally suited for transaction-oriented operational stores.

● Data types – numbers, text, etc.

● SQL as the Query language

● De-facto standard as the operational store for ERP and mission critical systems.

● Interface through application programs and query tools

Relational Database Systems for

Operational Store

● Operational data stores store on-line transactions – Many writes, some reads.

● Large fact table, multiple dimension tables

● Schema has a specific pattern – star schema

● Joins are also very standard and create cubes

● Queries focus on aggregates.

● Users access data through tools such as Cognos, Business Objects, Hyperion etc.

Enterprise Data Warehouse for Decision

Support
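
The star-schema pattern on this slide (one large fact table, several dimension tables, standard joins, aggregate queries) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names, columns and rows are invented for the example and do not come from the presentation.

```python
import sqlite3

def build_star_schema(conn):
    """Create one fact table and two dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (
            store_id INTEGER REFERENCES dim_store(store_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
    """)
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                     [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(10, "Grocery"), (11, "Apparel")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 11, 50.0), (2, 10, 70.0), (2, 10, 30.0)])

def sales_by_region_and_category(conn):
    """Typical warehouse query: join the fact to its dimensions, then aggregate."""
    return conn.execute("""
        SELECT s.region, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_store s ON f.store_id = s.store_id
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY s.region, p.category
        ORDER BY s.region, p.category
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_region_and_category(conn)
```

Each result row is one cell of a region-by-category cube, which is the kind of aggregate a reporting tool would display.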

Data Stack 2.0: Enterprise Data Warehouse Systems

Standard Enterprise Data Architecture

Data Warehouse Engine

Optimized Loader Extraction Cleansing

Analyze Query

Metadata Repository

Relational Databases

Legacy Data

Purchased Data

ERP Systems

Relational Databases

Application Logic

Presentation Layer

Data Stack 1.0:

Operational Data Systems

One in two business executives believe that they do not have sufficient information across their organization to do their job

Source: IBM Institute for Business Value

Despite the two data stacks ..

Data has Variety

Less than 40% of

the Enterprise

Data is stored in

Data Stack 1.0 or

Data Stack 2.0.

Beyond the Operational Systems, data

required for decision making is scattered

within and beyond the enterprise

ERP Systems

CRM Systems

Enterprise

Data Warehouse

Structured

Data Sources

Email Systems Collaboration

/Wiki Sites

Document Repositories

Project artifacts

Employee Surveys

Customer Call

Center Records

Unstructured

Data Sources

Organizational

Workflow

Sensor

Data Sources

CRM Systems

Expense

Management System Vendor

Collaboration Systems

Supply Chain

Systems

Location and

Presence Data

Public

Data Sources

Weather forecasts

Demographic

Economic Data

Social

Networking Data

Twitter

5 Exabytes of information was created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing

Eric Schmidt at the Techonomy Conference, August 4, 2010 (1 exabyte = 10^18 bytes)

Data Volumes are Growing

The Continued Explosion of Data in the

Enterprise and Beyond

80% of new information growth is

unstructured content –

90% of that is currently unmanaged

[Chart x-axis: 1990, 2000, 2010, 2020]

Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010

800,000 petabytes

35 zettabytes

44x as much

Data and Content

Over Coming Decade
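
The "44x" figure can be checked from the two volumes quoted on the chart. The sketch below assumes decimal (SI) units, that is 1 petabyte = 10^15 bytes and 1 zettabyte = 10^21 bytes; the variable names are illustrative.

```python
# Decimal (SI) unit sizes in bytes; this unit convention is an assumption.
PETABYTE = 10**15
ZETTABYTE = 10**21

start_bytes = 800_000 * PETABYTE   # "800,000 petabytes" from the chart
end_bytes = 35 * ZETTABYTE         # "35 zettabytes" from the chart

# Ratio of the later volume to the earlier one.
growth_factor = end_bytes / start_bytes
print(f"growth factor: {growth_factor:.2f}x")
```

The ratio works out to 43.75, which rounds to the 44x shown on the slide.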

What comes first -- Structure or data?

Schema/

Structure Data

Structure First is Constraining

Time to create a new data stack for unstructured data. Data Stack 3.0.
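
To make the "structure first" versus "data first" contrast concrete, here is a minimal sketch in plain Python. The records, field names and function names are all made up for the example. The structure-first loader only keeps records that match columns fixed in advance, while the data-first approach stores every record as-is and applies structure when a question is asked.

```python
import json

# Raw records with varying shapes (made-up examples), stored as-is.
RAW_RECORDS = [
    '{"source": "sms", "text": "Great episode"}',
    '{"source": "twitter", "text": "Must watch", "retweets": 12}',
    '{"source": "call_center", "duration_sec": 240}',
]

FIXED_SCHEMA = ("source", "text")  # structure-first: columns decided up front

def load_structure_first(lines):
    """Keep only records that exactly match the predefined columns."""
    accepted = []
    for line in lines:
        record = json.loads(line)
        if set(record) == set(FIXED_SCHEMA):
            accepted.append(tuple(record[c] for c in FIXED_SCHEMA))
    return accepted

def query_data_first(lines, field, default=None):
    """Data-first: keep everything, impose structure only when reading."""
    return [json.loads(line).get(field, default) for line in lines]

structured = load_structure_first(RAW_RECORDS)
retweets = query_data_first(RAW_RECORDS, "retweets", default=0)
```

In this toy data set the fixed schema accepts one record out of three, while the data-first query can still answer a question about a field nobody planned for.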

The Path to Data Stack 3.0:

Must support Variety, Volume and Velocity

Data Stack 3.0

Dynamic Data Platform

Uncovering Key Insights

Schema less Approach

PBs of Data

End User Direct Access

Structured + Semi Structured

Data Stack 2.0

Enterprise Data Warehouse

Support for Decision Making

Un-normalized Dimensional Model

TBs of Data

End User Access Through Reports

Structured

Data Stack 1.0

Relational Database Systems

Recording Business Events

Highly Normalized Data

GBs of Data

End User Access through Ent Apps

Structured

Can Data Stack 3.0 Address Real Problems?

Large Data

Volume at Low

Diverse Data

beyond

Structured Data

Queries that

Are Difficult to

Answer

Answer Queries

that No One

Dare Ask

Time-out!

Internet companies

have already

addressed the same

problems.

● Twitter has 140 million active users and more than 400 million tweets per day.

● Facebook has over 900 million active users, who generate an average of 3.2 billion Likes and Comments per day.

● 3.1 billion email accounts in 2011, expected to rise to over 4 billion by 2015.

● There were 2.3 billion internet users (2,279,709,629) worldwide in the first quarter of 2012, according to Internet World Stats data updated 31st March 2012.

Internet Companies have to deal with large

volumes of unstructured real-time data.

● Hosted service

● Large cluster (1000s of nodes) of low-cost

commodity servers.

● Very large amounts of data -- Indexing

billions of documents, video, images etc..

● Batch updates.

● Fault tolerance.

● Hundreds of millions of users.

● Billions of queries every day.

Their data loads and pricing requirements

do not fit traditional relational systems

● It is the platform that distinguishes them from everyone else.

● They required:

– high reliability across data centers

– scalability to thousands of network nodes

– huge read/write bandwidth

– support for large blocks of data, gigabytes in size

– efficient distribution of operations across nodes to reduce bottlenecks

Relational databases were not suitable and would have been cost prohibitive.

They built their own systems

Companies have

created business

models to support

and enhance this

software.

Internet Companies have open-sourced the

source code they created for their own use.

Open Source Rules !

Hadoop

Infrastructure

What about support !

Allows for analysis of massive volumes of information

• Structured and Unstructured

• External and Internal

Thousands of users, millions of files, terabytes of data need to be handled

Commoditized hardware can be used to reduce costs

Big Data can and should integrate with existing enterprise information architecture

Only Big Data makes it possible!

Enterprises Always had Data.

Now there is a way to handle it!

PERSISTENT SYSTEMS AND BIG DATA

Persistent Systems has an experienced team of Big Data experts that has created the technology building blocks to help you implement a Big Data solution that offers a direct path to unlock the value in your data.

Big Data Expertise at Persistent

● 10+ projects executed with leading ISVs and enterprise customers

● Dedicated group for MapReduce, Hadoop and the Big Data ecosystem (formed 3 years ago)

● Engaged with the Big Data ecosystem, including leading ISVs and experts

● Preferred Big Data Services Partner of IBM and Microsoft

Big Data Leadership and Contributions

● Code Contributions to Big Data Open Source Projects, including:

– Hadoop, Hive, and SciDB

● Dedicated Hadoop cluster in Persistent

● Created PeBAL – Persistent Big Data Analytics Library

● Created Visual Programming Environment for Hadoop

● Created Data Connectors for Moving Data

● Pre-built Solutions to Accelerate Big Data Projects

Persistent’s Big Data Offerings

1. Setting up and Maintaining Big Data Platform

2. Data Analytics on Big Data Platform

3. Building Applications on Big Data

Foundational Infrastructure and Platform (Built Upon Selected 3rd Party Big Data Platforms and Technologies;

Cluster of Commodity Hardware)

Persistent Platform Enhancement IP

(PeBAL Analytics Library, Data Connectors)

Persistent Pre-built Horizontal Solutions

(Email, Text, IT Analytics, … )

Persistent Pre-built

Industry Solution: Retail

Technology Assets

Industry Solution: Banking

Industry Solution: Telco

Big Data Custom

Services

Extension of

Your Team

Discovery Workshop

Training for Your Team

Team Formation Process

Cluster Sizing/Config

People Assets

Methodology

[Architecture diagram. Legend: Commercial/Open Source Product, Persistent IP, External Data Source. Labels: Connector Framework (Email Server, IBM Tivoli, Web Proxy, Social Media Connector for Twitter and Facebook, Data Warehouse); PIG/JAQL; Text Analytics/GATE/SystemT; Persistent Analytics Library (PeBAL) with Graph Fn, Set Fn, …, Text Analytics Fn; Solutions; MapReduce and HDFS; Cluster Monitoring; Admin App; Workflow Integration; BI Tools; Reports & Alerts]

Persistent Next Generation Data Architecture

Persistent Big Data Analytics Library

WHY PEBAL

• Lots of common problems – not all of them are solved in MapReduce

• PigLatin, Hive and JAQL are languages, not libraries – something is needed to run on top that is not tied to SQL-like interfaces

BENEFITS OF A READY MADE SOLUTION

• Proven – well written and tested

• Reuse across multiple applications

• Quicker implementation of MapReduce applications

• High performance

FEATURES

• Organized as JAQL functions, PeBAL implements several graph, set, text extraction, indexing and correlation algorithms.

• PeBAL functions are schema agnostic.

• All PeBAL functions are tried and tested against well defined use cases.
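
The slides do not show PeBAL's functions or their signatures, so the following is not PeBAL or JAQL code. It is a generic, single-process sketch in plain Python of one algorithm family the slide names (indexing), written in the map, shuffle and reduce style that a reusable library would package. All function names and the sample documents are hypothetical.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (term, doc_id) pairs for every distinct word in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(term, doc_ids):
    """Produce the sorted posting list for one term."""
    return term, sorted(doc_ids)

def build_inverted_index(documents):
    """Run map, shuffle and reduce in-process over a dict of doc_id -> text."""
    pairs = (pair for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())

# Made-up documents, for illustration only.
docs = {1: "big data platform", 2: "data warehouse", 3: "big cluster"}
index = build_inverted_index(docs)
```

On a real cluster the map and reduce functions would run on different nodes; the point here is only that such logic can be written once and reused across applications.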

Analytics

Inverted

Analytics

Statistics

Visual Programming Environment

ADOPTION BARRIERS

• Steep Learning Curve

• Difficult to Code

• Ad-hoc reporting can’t always be done by writing programs

• Limited tooling available

VISUAL PROGRAMMING ENVIRONMENT

• Use Standard ETL tool as the UI environment for generating PIG scripts

BENEFITS

• ETL Tools are widely used in Enterprises

• Can leverage large pool of skilled people who are experts in ETL and BI

• UI helps in iterative and rapid data analysis

• More people will start using it

Visual Programming Environment for

Hadoop

HDFS/ Hive HDFS

Persistent IP

Data Flow UI

PIG Convertor

PIG UDF Library

Big Data Platform

ETL Tool

Metadata

Data Data

Data Sources

PIG code
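
The flow in this diagram, from a data-flow UI through a converter to PIG code, can be illustrated with a small sketch. The presentation does not describe how Persistent's converter works, so the step format, the function name `flow_to_pig` and the sample paths below are assumptions. Only the generated statements (LOAD, FILTER, GROUP, FOREACH ... GENERATE, STORE) are standard Pig Latin.

```python
def flow_to_pig(steps):
    """Translate a list of data-flow steps (hypothetical format) into Pig Latin."""
    lines = []
    previous = None       # alias produced by the preceding step
    grouped_from = None   # alias that was grouped; names the bag inside the group
    for i, step in enumerate(steps):
        alias = f"r{i}"
        kind = step["op"]
        if kind == "load":
            fields = ", ".join(f"{name}:{ftype}" for name, ftype in step["fields"])
            lines.append(f"{alias} = LOAD '{step['path']}' "
                         f"USING PigStorage(',') AS ({fields});")
        elif kind == "filter":
            lines.append(f"{alias} = FILTER {previous} BY {step['condition']};")
        elif kind == "group":
            lines.append(f"{alias} = GROUP {previous} BY {step['key']};")
            grouped_from = previous
        elif kind == "count":
            lines.append(f"{alias} = FOREACH {previous} "
                         f"GENERATE group, COUNT({grouped_from});")
        elif kind == "store":
            lines.append(f"STORE {previous} INTO '{step['path']}';")
            continue  # STORE does not define a new alias
        else:
            raise ValueError(f"unsupported step: {kind}")
        previous = alias
    return "\n".join(lines)

# A made-up flow such as a user might draw in the UI.
flow = [
    {"op": "load", "path": "/data/sales.csv",
     "fields": [("region", "chararray"), ("amount", "double")]},
    {"op": "filter", "condition": "amount > 0"},
    {"op": "group", "key": "region"},
    {"op": "count"},
    {"op": "store", "path": "/out/sales_by_region"},
]
script = flow_to_pig(flow)
```

The user works with boxes and arrows; the generated script is what actually runs on the cluster.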

Persistent Connector Framework

OUT OF THE BOX

• Database, Data Warehouse

• Microsoft Exchange

• Web proxy

• IBM Tivoli

• BBCA

• Generic Push connector for *any* content

FEATURES

• Bi-directional connector (as applicable)

• Supports Push/Pull mechanism

• Stores data on HDFS in an optimized format

• Supports masking of data

WHY CONNECTOR FRAMEWORK

• Pluggable Architecture
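
A pluggable connector design with data masking might look roughly like the sketch below. The class names, the registry and the masking rule are all hypothetical; the real framework's interfaces are not shown in the slides. The sketch returns records in memory instead of writing to HDFS and covers only the pull direction.

```python
from abc import ABC, abstractmethod
import hashlib

class Connector(ABC):
    """Base class every pluggable connector implements (hypothetical interface)."""

    @abstractmethod
    def pull(self):
        """Return an iterable of record dicts fetched from the source."""

def mask(value):
    """Replace a sensitive value with a short, stable SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

class InMemoryEmailConnector(Connector):
    """Stand-in for a real mail-server connector; serves canned records."""

    def __init__(self, messages):
        self._messages = messages

    def pull(self):
        return iter(self._messages)

REGISTRY = {}  # connector name -> connector instance

def register(name, connector):
    """Plug a new connector into the framework under a name."""
    REGISTRY[name] = connector

def ingest(name, masked_fields=()):
    """Pull from a named connector and mask the listed fields."""
    out = []
    for record in REGISTRY[name].pull():
        clean = dict(record)  # copy so the source record is left untouched
        for field in masked_fields:
            if field in clean:
                clean[field] = mask(clean[field])
        out.append(clean)
    return out

register("email", InMemoryEmailConnector(
    [{"sender": "a@example.com", "subject": "hi"}]))
records = ingest("email", masked_fields=("sender",))
```

Adding a new source means writing one more `Connector` subclass and registering it; the ingestion and masking code stays unchanged.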

20+ Years

Persistent Data Connectors

Persistent’s Breadth of Big Data Capabilities

Horizontal and Vertical Pre-built Solutions

Big Data Platform (PeBAL) analytics libraries and Connectors

IT Management

Big Data Application Programming

Distributed File Systems

Cluster Layer

Tooling

• RDBMS/DWH to import/export data

• Text Analytics libraries

• Data Visualization using Web2.0 and reporting tools - Cognos, Microstrategy

• Ecosystem tools like - Nutch, Katta, Lucene

• Job configuration, management and monitoring with BigInsights’ job scheduler (MetaTracker)

• Job failure and recovery management

• Deep JAQL expertise - JAQL Programming, Extending JAQL using UDFs,

Integration of third party tools/libraries, Performance tuning, ETL using JAQL

• Expertise in MR programming - PIG, Hive, Java MR

• Deep expertise in analytics - Text Analytics - IBM’s text extraction solution (AQL + SystemT)

• Statistical Analytics - R, SPSS, BigInsights Integration with R

• HDFS

• IBM GPFS

• Platform setup on multi-node clusters, monitoring, VM based

• Product Deployment

Persistent IP for Big Data Solutions

Big Data Platform Components

Persistent Roadmap to Big Data

1. Learn

2. Initiate

3. Scale 4. Measure

5. Manage

Discover and

Define Use Cases

Improve Knowledge Base

and Shared Big Data Platform

Upgrade to Production

if Successful

Validate with

Measure Effectiveness

and Business Value

Build a social

graph of all

customers

Overlay sales

data on the

Identify

influential

customers

using network

analysis

Target these

customers for

promotions.

Customer Analytics
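
The slide does not say which network-analysis measure is used. As a minimal sketch, the example below assumes the simplest one, degree centrality, which counts how many distinct customers each customer is connected to. The edge list and customer names are invented.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Count distinct neighbours per customer in an undirected social graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(peers) for node, peers in neighbours.items()}

def top_influencers(edges, k):
    """Return the k best-connected customers."""
    scores = degree_centrality(edges)
    # Sort by score descending, then by name for a deterministic order.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]

# Made-up social connections between customers.
edges = [("ana", "ben"), ("ana", "cy"), ("ana", "dee"),
         ("ben", "cy"), ("dee", "eli")]
scores = degree_centrality(edges)
targets = top_influencers(edges, 2)
```

The customers returned by `top_influencers` would be the ones targeted for promotions; a production system would run a comparable computation as distributed jobs over the full customer graph.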

Identifying your most influential customers?

Targeting influential customers is the best way to improve campaign ROI!

70 million customers

> 1 billion transactions

over twenty years

Few thousand

Influential customers

Overview of Email Analytics

● Key Business Needs

– Ensure compliance with respect to a variety of business and IT communications and information sharing guidelines.

– Provide an ongoing analysis of customer sentiment through email communications.

● Use Cases

– Quickly identify if there has been an information breach or if information is being shared in ways that are not in compliance with organizational guidelines.

– Identify if a particular customer is not being appropriately managed.

● Benefits

– Ability to proactively manage email analytics and communications across the organization in a cost-effective way.

– Reduce the response time to manage a breach and proactively address issues that emerge through ongoing analysis of email.

Using Email to Analyze Customer

Sentiment

Sense the mood of your customers through their emails

Carry out detailed analysis on customer team interactions and response times

Analyzing Prescription Data

1.5 million patients are

harmed by medication

errors every year

Identifying erroneous prescriptions can save lives!

Source: Center for Medication Safety & Clinical Improvement

Overview of IT Analytics

● Key Business Needs

– Troubleshooting issues in the world of advanced and cloud based systems is highly complex, requiring analysis of data from various systems.

– Information may be in different formats, locations, granularity, data stores.

– System outages have a negative impact on short-term revenue, as well as long-term credibility and reliability.

– The ability to quickly identify if a particular system is unstable and take corrective action is imperative.

● Use Cases

– Identify security threats and isolate the corresponding external factors quickly.

– Identify if an email server is unstable, determine the priority and take preventative action before a complete failure occurs.

● Benefits

– Reduced maintenance cost

– Higher reliability and SLA compliance

Consumer Insight from Social Media

Find out what customers are saying about your organization or product on social media

1. Structured Analysis

Responses to Pledge, multiple choice questions

2. Unstructured Analysis

Responses to the following questions:

• Share your story

• Ask a question to Aamir

• Send a message of hope

• Share your solution

Content Filtering Rating Tagging System (CFRTS): L0, L1, L2 phased analytics

3. Impact Analysis

Crawling the general internet to measure the before and after scenario on a particular topic

Sources: Web/TV viewer responses to Pledge and multiple choice questions; Web, emails, IVR/calls; individual blogs; social widgets; videos; …

Insights for Satyamev Jayate – Variety of

sources

Rigorous Weekly Operation Cycle producing instant analytics

Killer combo of Human+Software to analyze the data efficiently

Topic opens on Sunday

Live Analytics

report is sent

during the show

Data capture

from SMS, phone

calls, social

media, website,

System runs L0

Analysis, L1, L2

Analysts continue

JSONs are

created for the

external and

internal

dashboards
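
The last step of the cycle, producing JSON for the dashboards, can be sketched as follows. The actual dashboard format is not shown in the slides; the field names `total` and `by_channel`, the function name and the sample responses are assumptions made for illustration.

```python
import json
from collections import Counter

def build_dashboard_json(responses):
    """Summarize viewer responses per channel as a JSON string (hypothetical format)."""
    counts = Counter(r["channel"] for r in responses)
    payload = {
        "total": len(responses),
        "by_channel": dict(sorted(counts.items())),
    }
    return json.dumps(payload, sort_keys=True)

# Made-up responses standing in for captured SMS, call and social media data.
sample = [
    {"channel": "sms", "text": "Thank you for this episode"},
    {"channel": "twitter", "text": "Eye opening"},
    {"channel": "sms", "text": "I pledge my support"},
]
dashboard = build_dashboard_json(sample)
```

A dashboard front end can fetch such a document and render it directly, which keeps the analytics pipeline and the presentation layer loosely coupled.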
