© 2011 IBM Corporation IBM Confidential Big Data Simon Jeggo 24 May 2012

Ibm swg day 2012 jhb big data (white)

Big Data Presentation from IBM Software Day

Page 1

Big Data

Simon Jeggo, 24 May 2012

Page 2

Agenda

• What is Big Data
• Some Big Data Use Cases
• IBM’s Big Data Platform

Page 3

What is Big Data?

Page 4

The Big Data Challenge: a term defined

“Big Data is a term applied to data sets that are large, complex and dynamic (or a combination thereof) and for which there is a requirement to capture, manage and process the data set in its entirety, such that it is not possible to process the data using traditional software tools and analytic techniques within tolerable time frames.”

New technologies bring cost-effective approaches to explore, understand and predict better business outcomes:
• MPP databases
• Streams
• In-database analytics
• Apache Hadoop
• Cloud computing platforms
• Archival storage systems

Why something different?
• Data x Computation > typical warehouse
• Schema flexibility
• Programming flexibility

We are engaged with over 50 clients, working with them to apply big data techniques to a class of problems, e.g., text analytics, log analysis, customer insights, fraud detection, etc.

We have a set of unique value-adds (JAQL, GPFS, SystemT, and others coming), and we can make Big Data fit into our clients’ complex IT environments:

Integrate, Secure, Automate

Page 5

In 2005 there were 1.3 billion RFID tags in circulation…

…by the end of 2011, this was about 30 billion and growing even faster.

Page 6

An increasingly sensor-enabled and instrumented business environment generates HUGE volumes of data with MACHINE SPEED characteristics…

1 BILLION lines of code

EACH engine generating 10 TB every 30 minutes!
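The per-engine figure above can be turned into a sustained ingest rate with simple arithmetic. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes), which the slide does not state; `sustained_rate` is a hypothetical helper name.

```python
# Sustained ingest rate implied by "10 TB every 30 minutes" for ONE engine.
# Assumption: decimal units, 1 TB = 10**12 bytes (the slide does not say).
TB = 10**12
GB = 10**9

def sustained_rate(volume_bytes: float, seconds: float) -> float:
    """Average bytes per second needed to absorb volume_bytes in `seconds`."""
    return volume_bytes / seconds

per_engine = sustained_rate(10 * TB, 30 * 60)   # bytes per second
per_hour_tb = per_engine * 3600 / TB            # terabytes per hour
per_day_tb = per_engine * 24 * 3600 / TB        # terabytes per day

print(f"{per_engine / GB:.2f} GB/s, {per_hour_tb:.0f} TB/hour, {per_day_tb:.0f} TB/day")
```

Under these assumptions each engine needs roughly 5.56 GB/s of sustained capacity, i.e. about 20 TB per hour or 480 TB per day.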

Page 7

350B transactions/year

Meter reads every 15 min.

3.65B meter reads/day

120M meter reads/month
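Reading a meter every 15 minutes multiplies the number of reads per meter. The sketch below works through that arithmetic; it assumes every read comes from a meter on the same fixed 15-minute schedule, which the slide does not state, so the implied meter count is illustrative only. All function names are hypothetical.

```python
# Meter-read arithmetic for a fixed read interval.
MINUTES_PER_DAY = 24 * 60

def reads_per_meter_per_day(interval_minutes: int = 15) -> int:
    """One read every `interval_minutes` minutes, all day."""
    return MINUTES_PER_DAY // interval_minutes

def reads_per_day(meters: int, interval_minutes: int = 15) -> int:
    """Total reads per day for a fleet of meters on the same interval."""
    return meters * reads_per_meter_per_day(interval_minutes)

def implied_meters(total_reads_per_day: float, interval_minutes: int = 15) -> float:
    """Assumption: every read comes from a meter on the same fixed interval."""
    return total_reads_per_day / reads_per_meter_per_day(interval_minutes)

print(reads_per_meter_per_day())        # 96
print(round(implied_meters(3.65e9)))    # 38020833
```

At a 15-minute interval each meter produces 96 reads a day, so under this assumption 3.65B reads/day would correspond to roughly 38 million meters.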

Page 8

In August of 2010, Adam Savage, of “MythBusters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account including the phrase “Off to work.”

Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location where the photo was taken.

By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work.
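Location metadata like this is typically stored in a photo’s EXIF block as GPS latitude and longitude, each a degrees/minutes/seconds triple of rational numbers plus a hemisphere reference (N/S, E/W). The sketch below shows only the final conversion to the decimal degrees a map would use. It does not parse a real image file, and the coordinates are made up, not taken from the photo described above.

```python
from fractions import Fraction

def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds plus a hemisphere
    reference ("N", "S", "E" or "W") into signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # Southern latitudes and western longitudes are negative.
    return -value if ref.upper() in ("S", "W") else value

# Hypothetical values, NOT taken from the photo described above.
lat = dms_to_decimal(Fraction(37), Fraction(46), Fraction(30), "N")
lon = dms_to_decimal(Fraction(122), Fraction(25), Fraction(96, 10), "W")
print(round(lat, 6), round(lon, 6))  # 37.775 -122.419333
```

Two numbers like these are enough to place a pin on a map, which is why a single geotagged photo can reveal a home address.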

Page 9

The Social Layer in an Instrumented, Interconnected World

• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by the end of 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014

Page 10

Twitter Tweets per Second Record Breakers of 2011

Social-media analytics can be applied to everything from healthcare to predicting votes

Challenges:
– Volume
– Velocity
– Variety
– Language processing: Twitter sentences are not well formed and often use urban slang
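Because tweets are not well formed, a common first step before any language processing is light normalization. The sketch below shows a few typical rules (placeholder tokens for links and @handles, shortening elongated words, lower-casing). The rules and the `<url>`/`<user>` tokens are hypothetical choices, not IBM’s pipeline.

```python
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
ELONGATED = re.compile(r"(\w)\1{2,}")   # a character repeated 3+ times

def normalize_tweet(text: str) -> str:
    """Light normalization of noisy tweet text before language processing."""
    text = URL.sub("<url>", text)            # links carry little linguistic signal
    text = MENTION.sub("<user>", text)       # anonymize @handles
    text = ELONGATED.sub(r"\1\1", text)      # "sooooo" -> "soo"
    return re.sub(r"\s+", " ", text).strip().lower()

print(normalize_tweet("Sooooo happy!!! @bob check http://t.co/x  now"))
# soo happy!!! <user> check <url> now
```

Keeping two repeated letters rather than one preserves a hint of emphasis that a sentiment model may still use.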

Page 11

Extract Intent, Life Events, Micro Segmentation Attributes

[Figure: example posts from Jo Jobs, Tina Mu, Tom Sit and Chloe, labeled: Name, Birthday, Family; Not Relevant - Noise; Monetizable Intent; Relocation; Location; Wishful Thinking; SPAMbots]

Page 12

Watson’s advanced analytic capabilities can sort through the equivalent of 200 MILLION pages of data to uncover an answer in 3 SECONDS.

Page 13

4 trillion 8 GB iPods

1.8 ZB

1 ZB = 1 trillion GB

Page 14

Big Data Use Cases

Cisco turns to IBM big data for intelligent infrastructure management

• Optimize building energy consumption with centralized monitoring

• Automate preventive and corrective maintenance

Capabilities Utilized:
• Streaming Analytics
• Hadoop System
• Business Intelligence

Applications:
• Log Analytics
• Energy Bill Forecasting
• Energy consumption optimization
• Detection of anomalous usage
• Presence-aware energy mgt.
• Policy enforcement
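“Detection of anomalous usage” can be illustrated with the simplest statistical approach: learn a baseline from historical readings, then flag new readings that fall far outside it. This is a minimal z-score sketch with hypothetical readings and a hypothetical 3-standard-deviation threshold. The slide does not say which method Cisco or IBM actually used.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Learn the normal level (mean) and spread (sample std dev) from past readings."""
    return mean(history), stdev(history)

def flag(readings, baseline, threshold=3.0):
    """Return (index, value) for readings more than `threshold` std devs from the mean."""
    mu, sigma = baseline
    return [(i, r) for i, r in enumerate(readings) if abs(r - mu) > threshold * sigma]

history = [100, 102, 98, 101, 99, 100, 103, 97]   # hypothetical kWh per interval
baseline = fit_baseline(history)
print(flag([101, 99, 150, 100, 92], baseline))     # [(2, 150), (4, 92)]
```

Fitting the baseline on history rather than on the new batch keeps a large spike from inflating the standard deviation and hiding itself.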

Page 15

Applications for Big Data Analytics

• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn, NBO

Page 16

Retail Industry

Issues for the Retail Industry

• Deliver value to empowered customers
  – Move from market analysis to understanding individuals
  – Take charge of the growing volume, velocity and variety of data
• Foster lasting connections
  – Focus on relationships, not just transactions
  – Invest in expanding the corporate brand
• Capture value, measure results
  – Develop a complete understanding of the point of sale
  – Build new skills and solutions

Page 17

Use Case: Social Media Analytics

Problem

As consumers continue to adopt social media technologies, businesses must be able to track customer sentiment and brand perception, finding new opportunities and avoiding business problems caused by negative perceptions.

Solution
• Social Media Analytics: what consumers and the industry are saying
• Optimizing Internal Operations
  – Better utilization of tools for web analytics
  – Decreased latency for analysis
• Predictive Analytics
  – Promotion targeting for offers
  – Prospect harvesting
  – POS analytics, predictive and discovery
• Competitive Intelligence: unlock information across the web

[Diagram labels: Structured/Unstructured data; What is our next best offer?]

Page 18

Warehouse Off-load Use Case: Transactional Analytics

Problem
• Retailers have massive amounts of transaction data that offer a wealth of information about customer purchasing behavior in stores
• This data isn’t being used effectively because of its volume, the cost to store it, and the barriers to analyzing massive data

Solution
• Store POS transactions in BigInsights, reducing the cost compared with traditional data warehousing
• BigInsights enables ad hoc queries for historical reporting, trend analysis, and analyst needs
• Data mining feeds for store and customer segmentation, market basket analysis, promotion targeting and other analytics-based solutions
• Historical POS data made available for analysis of new product introductions, new store openings, and other disruptive business events
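Market basket analysis usually starts by counting how often items are bought together. The sketch below computes the support of item pairs (the share of baskets containing both items), the first step of Apriori-style association mining. The baskets and the 0.5 threshold are hypothetical, and a real BigInsights job would distribute this counting rather than run it in one process.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets, min_support=0.5):
    """Fraction of baskets containing each item pair, keeping pairs at or
    above min_support (the first step of Apriori-style basket analysis)."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

baskets = [["bread", "milk"], ["bread", "butter", "milk"],
           ["beer", "bread"], ["milk", "butter"]]
print(pair_support(baskets))
# {('bread', 'milk'): 0.5, ('butter', 'milk'): 0.5}
```

Sorting each basket before pairing makes ("milk", "bread") and ("bread", "milk") count as the same pair.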

Page 19

FSS - Customer Correspondence Analytics

Problem
• Current approaches restrict insight and predictive analytics to structured data, losing the “state” of the customer
• Human-based review of correspondence is limited to small-scale sampling
• Results of sampling depend too heavily on the skills of the reviewer and cannot learn from information outside that reviewer’s knowledge
• Detecting and acting on rapidly changing customer sentiment, and understanding why a service touch is occurring from the customer’s point of view
• The need to take cost out of service touch points while improving effectiveness/intimacy

Solution
• Use of un-instrumented or under-instrumented information sources to identify and head off issues
  – Extends risk modeling to underutilized sources such as email, chat, social media, call center, and CSR interactions and notes
• Move from small-scale sampling to 100% coverage using BigInsights and cross-correlation of information sources
  – Natural language analytics combined with machine learning to identify opportunities and issues that are not apparent in small sample sizes or to human reviewers
• Use of sophisticated natural language analytics to develop a predictive understanding of customer actions based on customer state
  – Topic and sentiment extraction from email, chat, social media, call center, and CSR interactions and notes to predict call reasons and next best action

Page 20

FSS - Risk Platforms and Analytics

Problem
• Real-time analytics and the need to meet SLA windows are outstripping existing infrastructure capabilities
• Burst-oriented trading close volumes and the resulting position analytics are expanding faster than traditional technologies can cost-effectively handle
• Standard policies of flushing the data after hours or days are not meeting risk modeling needs
• Web, unstructured and machine-generated data do not fit existing relational analytics tools
• SQL is not the natural tool to manipulate the untapped information sources that could add new dimensions to risk modeling
• The changing nature of risk requires flexibility in sizing, speed and methods that is not easy to achieve with existing SQL-based platforms

Solution
• Predict, identify and triage risk anomalies in real time
  – Use of SystemT and SystemML analytics engines to identify problems based on historical data, and then push those models to Streams
• Use of BigInsights to ingest and analyze hundreds of TB an hour to meet SLA requirements for high-volume and complex trading operations
• Use of un-instrumented or under-instrumented information sources to identify and head off issues
  – Extends risk modeling to underutilized sources such as email, chat, social media, call center, and CSR interactions and notes

Page 21

FSS - Social Media Analytics

Problem
• Social media is an important source of information, but it requires new approaches to collecting, storing, understanding and utilizing the value to be found
• Fuzzy and messy data are the norm
• Little if any of the information is easily structured
• Reconciling external and internal sources
  – Identifying individuals among the fog of external data is not easily done, but is often necessary
  – Linking to known individuals requires entity analytics concepts and capabilities

Solution
• Ability to acquire, parse, analyze, link and persist external information sources to a variety of analytics platforms
  – Use of SystemT and SystemML analytics engines to digest and make sense of external sources
• Sophisticated text/language analytics to allow powerful and accurate understanding of the external sources
  – Entity resolution capabilities to match external sources to known customers and groups
  – Graphical interfaces to quickly explore data sets, test hypotheses, create production jobs and synthesize data from multiple disparate internal and external sources
  – Ability to push normalized data to Netezza for analytics with existing methods and tools
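Entity resolution, matching a name found in external data to a known customer, can be sketched with normalization plus fuzzy string similarity. The version below uses Python’s standard `difflib.SequenceMatcher` ratio and a hypothetical 0.85 threshold. Production entity analytics would also compare addresses, dates of birth and other attributes, so treat this as an illustration of the idea only.

```python
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lower-case and keep only letters and spaces."""
    return re.sub(r"[^a-z ]", "", name.lower()).strip()

def resolve(mention: str, customers: dict, threshold: float = 0.85):
    """Return the id of the best-matching known customer, or None if no
    candidate is similar enough."""
    best_id, best_score = None, 0.0
    target = normalize(mention)
    for customer_id, name in customers.items():
        score = SequenceMatcher(None, target, normalize(name)).ratio()
        if score > best_score:
            best_id, best_score = customer_id, score
    return best_id if best_score >= threshold else None

CUSTOMERS = {"C001": "Jonathan Smith", "C002": "Maria Garcia"}  # hypothetical
print(resolve("Jonathon Smith", CUSTOMERS))   # C001
print(resolve("Bob Jones", CUSTOMERS))        # None
```

Returning `None` below the threshold matters: a wrong link to a known customer is usually worse than no link at all.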

Page 22

Explosion of data in Telecom

From 500 PB per month in 2011

To 5,000 PB per month in 2016
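A tenfold increase over five years corresponds to a compound annual growth rate (CAGR) of (5000/500)^(1/5) - 1. The short calculation below makes that explicit. It uses only the two figures on the slide and treats 2011 to 2016 as five full years.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

rate = cagr(500, 5_000, 2016 - 2011)   # PB per month, 2011 -> 2016
print(f"{rate:.1%} per year")          # 58.5% per year
```

In other words, telecom data traffic would need to grow by roughly 58.5% every year to reach the 2016 figure.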

Page 23

Explosion of Data for Telecom

• 5 billion mobile phones worldwide; 550K Android phones activated every day
• AT&T global network carries 24 petabytes of data PER DAY
• > 2 billion Internet users in 2011
• Twitter processes 7 terabytes of data every day
• Facebook processes 10 terabytes of data every day
• YouTube: massive bits through networks; 48 hours of video uploaded per minute; 3 billion views per day
• Skype: 300 million minutes of video calls per month

[Chart: Traffic, Revenues and Network Cost plotted over Time (axes: Traffic Volume, $/bit), shifting from Voice Dominant to Data Dominant, with a Profitability Gap (value/GB)]

How to lower network costs ($/GB)?

How to improve data revenue ($/GB)?

Telecoms need to be smarter: smarter networks and smarter business models

All Telecom Enterprises have BIG DATA CHALLENGES

Page 24

Churn Prediction and Targeted Offers with Social Media Text Analytics

Problem
• Lost revenue and increased customer acquisition cost are directly related to churn
  – Customers are lost not only due to pricing, but also due to service level, new tech offerings, service offerings, and customer perception
• Significant challenge in increasing ARPU
  – Revenue per customer is much harder to increase as competition increases
• Current churn prediction systems are not up to the challenge
  – Too slow and not using social media data

Solution
• Improve churn prediction using social media
  – Analyze social media on its own or with current warehouse/BI analytics to predict churn more quickly (in real time) and more accurately
  – BigInsights Text Analytics is the key to finding new analytics, with Streams for real-time alerts
• Discover ARPU opportunities directly from social media
  – New source of customer intent and sentiment will drive new revenue opportunities
  – Real-time feedback to marketing systems or warehouse/BI to place offers quickly
  – Finding ready-to-buy customers

Page 25

Real Time CDR Analytics and Ingest

Problem
• Gathering CDRs, mediating them into relevant data, and moving them to analytical systems is slow and costly
  – By the time CDR data is mediated and ingested by data warehouses, the ability to respond to problems is significantly reduced
• Systems tend to be old and require extensive application and hardware maintenance
• Cannot achieve real-time billing, which requires handling billions of CDRs per day and de-duplication against 15 days’ worth of CDR data

Solution
• Big Data Streams Telecommunications Mediation and Analytics (TMA) offering
  – Real-time CDR processing
  – Real-time analytics and dashboard
  – Unparalleled price/performance benefits
  – Connectors to Warehouse and BigInsights
• Real-time dashboards include:
  – Dropped calls by high priority customers, location, providers, etc.
  – Terminated calls by location and customer type
  – Revenue monitoring by voice and SMS
• The solution will enable novel Business Intelligence applications
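De-duplicating CDRs against a 15-day window amounts to remembering which record keys were seen recently and forgetting them once they age out. The sketch below is a single-process illustration assuming records arrive roughly in timestamp order and that a tuple of fields identifies a CDR. The key fields and class name are hypothetical; the slide does not describe how TMA implements this.

```python
from collections import OrderedDict
from datetime import datetime, timedelta

class CdrDeduplicator:
    """Drop CDRs whose key was already seen within a sliding time window.

    Assumes records arrive roughly in timestamp order, so the oldest
    entries can be evicted from the front of the insertion-ordered map.
    """

    def __init__(self, window: timedelta = timedelta(days=15)):
        self.window = window
        self.seen: "OrderedDict[tuple, datetime]" = OrderedDict()

    def accept(self, key: tuple, ts: datetime) -> bool:
        # Evict keys that have fallen out of the window.
        while self.seen:
            _, oldest_ts = next(iter(self.seen.items()))
            if ts - oldest_ts <= self.window:
                break
            self.seen.popitem(last=False)
        if key in self.seen:
            return False          # duplicate inside the window
        self.seen[key] = ts
        return True               # first occurrence: pass it downstream

dedup = CdrDeduplicator()
t0 = datetime(2012, 5, 24, 9, 0)
key = ("caller-A", "callee-B", "2012-05-24T09:00:00")  # hypothetical CDR key
print(dedup.accept(key, t0))                           # True
print(dedup.accept(key, t0 + timedelta(minutes=5)))    # False
print(dedup.accept(key, t0 + timedelta(days=16)))      # True
```

At billions of CDRs per day the `seen` map would have to be partitioned across machines, but the window logic stays the same.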

Page 26

CDR Analytics with Extended Data

Problem
• Telecom is experiencing an explosion of data from 3G and LTE (4G) network traffic. CDRs are used almost exclusively for billing systems because storing and analyzing them was too expensive with EDW and BI alone.
• Competition is driving the need to focus on customer retention and customer profitability
• There is no connection between CDR, Web, and other data, making everything from fraud detection to targeted marketing to ad optimization difficult and expensive

Solution
• BigInsights for cost-effective storage of original data and large-scale text analytics
  – Stores unstructured, non-typed data ingested with no data model
  – Discovery and analytics tools are built into BigInsights
  – Machine learning extensions
  – Integration to Netezza and DB2; JDBC to other databases
• Big Data Streams Telecommunications Mediation and Analytics (TMA) offering
  – Real-time CDR processing can be extended to other data sources: fast and low cost
• Netezza integration opens Big Data solutions to warehouse and BI
  – Deep analytics and model development
  – Can act as a high performance operational data store

Page 27

Ad Effectiveness Analysis with Social Media

Problem
• Telecom and media companies spend large sums of money on advertising
  – Measuring the effectiveness of ads is difficult, and almost impossible online without costly services
  – Service providers are slow to respond and expensive
• Current ad analysis is mostly guesswork and intuition, which does not lend itself to timely decisions
• Enterprises are demanding better ROI from ad budgets and proof of the effectiveness of each ad campaign
  – To increase effectiveness, enterprises have to react in near-real-time

Solution
• BigInsights used for social media ingest and fast analysis
  – Answers questions like what the awareness was, who we reached, and what the reaction to an ad was, in a few hours vs. weeks
  – Enables ad departments to react: modify, localize, and focus
• Streams for real-time ad analysis, extending predictive models for fast reaction
• React very quickly to ad effectiveness:
  1. Adjust ad budgets
  2. Tailor ads to geography
  3. Alter messaging
  4. Adjust targeted and direct marketing initiatives

Page 28

Why IBM for Big Data

The Solution Side

Page 29

The IBM Big Data Platform

• InfoSphere BigInsights: Hadoop-based low latency analytics for variety and volume
• IBM Netezza High Capacity Appliance: queryable archive of structured data
• IBM Netezza 1000: BI + ad hoc analytics on structured data
• IBM Smart Analytics System: operational analytics on structured data
• IBM InfoSphere Warehouse: large volume structured data analytics
• InfoSphere Streams: low latency analytics for streaming data
• InfoSphere Information Server: high volume data integration and transformation

[Diagram groupings: MPP Data Warehouse; Stream Computing; Information Integration; Hadoop]

Page 30

A Big Data Platform

• Analytics Excellence (Embrace and Extend): Text Analytics Toolkit; Machine Learning Toolkit; Industry Accelerators; Development Tooling; Visualization Tooling; Deployment Tooling (“App Store”); $14B in 5 yrs. on Analytics +++
• At-Rest Operational Excellence: Harden Hadoop (GPFS); Surface Area Lock Down; Policy Driven Retention & Immutability; Role-Based Security; Adaptive MapReduce; Workload Manager; Fast Splittable CMX Compression; REST-exposed Administration +++
• In-Motion: analyze extreme amounts of data in milliseconds; uses the same analytics as BigInsights; data can be analyzed on the way into the enterprise for earlier pattern detection
• At-Rest: beyond traditional structured data; BigInsights uses the same analytics as Streams; not forked, not ported: Hadoop extended with operational excellence and security
• Netezza for in-database MapReduce

[Diagram labels: MPP Data Warehouses; Open Source Hadoop; IBM Big Data Platform; In-Motion Operational Excellence; Unrivalled….]
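Both Hadoop and the in-database MapReduce mentioned above follow the same map, shuffle and reduce pattern. The sketch below runs that pattern in a single Python process on the classic word-count example, purely to show the data flow. It is not Hadoop’s or Netezza’s API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_count(["big data", "Big insights"]))   # {'big': 2, 'data': 1, 'insights': 1}
```

In a real cluster, map tasks run in parallel on the nodes holding each data block, and the shuffle moves each key’s values to a single reducer.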

Page 31

Stream Computing: a new paradigm for ultra-low latency and high throughput in-motion analytics
• Continuous ingestion
• Continuous queries/analytics on data in motion
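A continuous query differs from a warehouse query in that it is evaluated incrementally as each event arrives, usually over a time window. The sketch below maintains a sliding-window average in constant time per event. It assumes non-decreasing timestamps and a positive window length, and it is a plain-Python illustration of the idea, not InfoSphere Streams or its Streams Processing Language.

```python
from collections import deque

class SlidingWindowAverage:
    """Continuous query: average of the values seen in the last `window` seconds."""

    def __init__(self, window: float):
        self.window = window
        self.events = deque()   # (timestamp, value), oldest on the left
        self.total = 0.0

    def push(self, ts: float, value: float) -> float:
        """Ingest one event and return the updated windowed average."""
        self.events.append((ts, value))
        self.total += value
        # Expire events that are at least `window` seconds older than this one.
        while self.events[0][0] <= ts - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)

avg = SlidingWindowAverage(window=10)
print(avg.push(0, 4.0))    # 4.0
print(avg.push(5, 8.0))    # 6.0
print(avg.push(12, 6.0))   # 7.0 (the event at t=0 has expired)
```

Keeping a running total instead of re-summing the window is what lets the result update with every event rather than on a batch schedule.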

Page 32

Data In Motion

Information used to be aggregated and analyzed every 30-60 minutes and discarded after 72 hours

1,000 pieces of unique medical diagnostic information analyzed per second and stored in a dynamic model

Perspective: 20% drop in mortality of control group in trials (extend approach to daily activities)

– 120 children monitored: 120K messages/sec, billions/day
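The “billions/day” figure follows from the per-second rate. The arithmetic below assumes the 1,000-per-second figure applies to each child, which is consistent with 120 children producing 120K messages/sec.

```python
CHILDREN = 120
MESSAGES_PER_CHILD_PER_SEC = 1_000   # assumption: the 1,000/sec figure is per child
SECONDS_PER_DAY = 24 * 60 * 60

per_second = CHILDREN * MESSAGES_PER_CHILD_PER_SEC   # 120,000
per_day = per_second * SECONDS_PER_DAY               # 10,368,000,000

print(f"{per_second:,} messages/sec, {per_day:,} messages/day")
```

That is roughly 10.4 billion messages per day for a single ward of 120 patients.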

Page 33

Data In Motion

Hear what’s going on miles away to optimize perimeter displacements

Perspective: Try to find the word “Zero” in a 1,000-song MP3 library in a fraction of a second

– Figure out the difference between the sound of a human whisper and the wind

Page 34

Data In Motion – Improving What They Already Have

The old Microsoft-based solution was not able to keep up with new 3G demands for real-time xDR analysis

Streams and Netezza solution proposed
– Time to merge and load data reduced 90%+
– Time to market for new products from 4 hours to minutes

Internal Use Only Reference

Page 35

How Text Analytics Works

Football World Cup 2010: one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win.

Extracted entities (team, role, player):
• Netherlands, Striker, Arjen Robben
• Spain, Keeper, Iker Casillas
• Spain, Winger, Andres Iniesta

World Cup 2010 Highlights
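The extraction above (turning free text into team, role and player records) can be approximated with dictionary lookups plus a proximity rule. This toy sketch uses hypothetical three-entry gazetteers and links each role mention to the next player mention and the closest team mention in the same sentence. It is an illustration of rule-based extraction in general, not how IBM’s Text Analytics Toolkit works internally.

```python
import re

# Hypothetical gazetteers; a real toolkit would use much larger dictionaries
# plus grammar rules, not just three short lists.
TEAMS = ["Netherlands", "Spain"]
ROLES = ["striker", "keeper", "winger"]
PLAYERS = ["Arjen Robben", "Iker Casillas", "Andres Iniesta"]

def find_mentions(sentence, terms):
    """Return sorted (offset, term) for every dictionary term in the sentence."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            hits.append((m.start(), term))
    return sorted(hits)

def extract(text):
    """Link each role to the next player mention and the closest team mention
    in the same sentence, returning (team, role, player) tuples."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        teams = find_mentions(sentence, TEAMS)
        players = find_mentions(sentence, PLAYERS)
        for pos, role in find_mentions(sentence, ROLES):
            following = [name for start, name in players if start > pos]
            if not teams or not following:
                continue
            _, team = min(teams, key=lambda hit: abs(hit[0] - pos))
            records.append((team, role.capitalize(), following[0]))
    return records

TEXT = ("Football World Cup 2010, one team distinguished themselves well, losing to "
        "the eventual champions 1-0 in the Final. Early in the second half, "
        "Netherlands' striker, Arjen Robben, had a breakaway, but the keeper for "
        "Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for "
        "Spain for the win.")

for record in extract(TEXT):
    print(record)
```

The proximity rule is what pairs “keeper” with Spain rather than the Netherlands, even though both teams occur in the same sentence.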

Page 36

IBM Text Analytics Toolkit Lets You…

Build out world-class text analysis applications 50% faster than the manual method

Run faster text analysis (10x or more vs. some marketplace alternatives)

Get more precise and correct answers (2x vs. some marketplace alternatives)

Page 37

Browser-based Big Data analytics tool for business users

Big Data Challenges…

Business users need a no-programming approach for analyzing Big Data

Extremely difficult to find actionable business insights in data from multiple sources with different formats

Translating untapped data into actionable business insights is a common requirement that requires visualization

What is BigSheets?

How can BigSheets help?

Spreadsheet-like discovery interface lets business users easily analyze Big Data with ZERO PROGRAMMING

BUILT-IN “readers” can work with data in several common formats

– JSON arrays, CSV, TSV, Web crawler output, . . .

Users can VISUALLY combine and explore various types of data to identify “hidden” insights

Page 38

Page 39

Big Data Made Easy for the Little Guy

USC’s Film Forecaster correctly predicted a clamor for “Hangover 2” that resulted in a $100 million opening over Memorial Day weekend
– Looked at 250K-500K tweets and broke down positive and negative messages using a lexicon of 1,700 words

“The Film Forecaster sounds like a big undertaking for USC, but it really came down to one communications master’s student who learned BigSheets in a day, then pulled in the tweets and analyzed them.” - Ryan Kim
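Splitting tweets into positive and negative messages with a word lexicon can be sketched in a few lines. The lexicon below is a tiny hypothetical stand-in for the 1,700-word list mentioned above, and the scoring rule (positive hits minus negative hits) is one common simple choice. It is not necessarily what the USC project or BigSheets used.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; the slide mentions a lexicon of 1,700 words.
POSITIVE = {"love", "great", "hilarious", "awesome", "funny"}
NEGATIVE = {"boring", "hate", "awful", "worst", "lame"}

def classify(tweet: str) -> str:
    """Label a tweet by counting positive minus negative lexicon hits."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def tally(tweets) -> dict:
    """Count how many tweets fall into each sentiment bucket."""
    return dict(Counter(classify(t) for t in tweets))

sample = ["Hangover 2 looks hilarious, can't wait",
          "Saw the trailer, so boring",
          "Opening this weekend"]
print(tally(sample))
```

A plain lexicon misses sarcasm and negation (“not funny”), which is one reason the larger text analytics toolkits add grammar rules on top.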

Page 40

Why IBM for Big Data?

Only IBM is showing data-in-motion and data-at-rest analytics: a bigger, more opportunistic view of Big Data

Development and research sit side by side

Virtualization tooling, development, file system, analytics

Not just same company: same org, same people, same leadership

BigInsights being used in IBM products today such as Cognos Consumer Insight

Page 41

Without a Big Data Platform You Code…
• Multithreading
• Custom SQL and Scripts
• Performance Optimization
• Debug
• Application Management
• Event Handling
• Connectors
• Check Pointing
• Security
• HA

IBM Big Data Platform

Streams provides development, deployment, runtime, and infrastructure services

“TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language…” - Alex Philip, CEO and President

Accelerators and Toolkits: over 100 sample applications and toolkits, with industry-focused toolkits with 300+ functions and operators!

Page 42

THINK

https://w3-connections.ibm.com/wikis/home?lang=en_US#/wiki/Info%20Mgmt%20Client%20Technical%20Professional%20Resources%20Wiki/page/Understanding%20Big%20Data