Raul F. Chong Senior Big Data and Cloud Program Manager Big Data University Community Leader [email protected] A holistic approach to Big Data © 2013 BigDataUniversity.com

A holistic approach to Big Data · 19/09/2013  · Big Data – A holistic approach Big Data is Not Only Hadoop! Examples where Hadoop is not entirely applicable: – Cyber security,

Page 1

Raul F. Chong
Senior Big Data and Cloud Program Manager
Big Data University Community Leader
[email protected]

A holistic approach to Big Data

© 2013 BigDataUniversity.com

Page 2

Agenda

The state of Big Data adoption

Big Data – A holistic approach

The 5 high value Big Data use cases

Technical details of key Big Data components

The future of Big Data and Cloud

Demos

Resources

Page 4

Big Data Adoption Phases

Page 5

What is your Big Data source?

“What type of data/records are you planning to analyze using big data technologies?”

Page 6

What is your Big Data source?

“What type of data/records are you planning to analyze using big data technologies?”

Multiple responses accepted

Page 7

What do you want to do with the Big Data collected?

“What kind of analytics do you want to perform on this big data?”

Page 8

What do you want to do with the Big Data collected?

“What kind of analytics do you want to perform on this big data?”

Multiple responses accepted

Page 9

Use of Big Data globally and in the financial sector

Multiple responses accepted

Page 11

KTH Swedish Royal Institute of Technology: Reducing Traffic Congestion

• Deployed a real-time Smarter Traffic system to predict and improve traffic flow.

• Analyzes streaming real-time data gathered from cameras at the entries and exits to the city, GPS data from taxis and trucks, and weather information.

• Predicts the best time and method to travel, such as when to leave to catch a flight at the airport.

Results

• Enables ability to analyze and predict traffic faster and more accurately than ever before

• Provides new insight into mechanisms that affect a complex traffic system

• Smarter, more efficient, and more environmentally friendly traffic

Page 12

University of Southern California Innovation Lab Monitors Political Debates

Benefits

• Real-time display of public sentiment as candidates respond to questions

• Debate winner prediction based on public opinion instead of solely political analysts

Page 13

Big Data – A holistic approach

Big Data is Not Only Hadoop! Examples where Hadoop is not entirely applicable:

– Cyber security, Stock market, Traffic control, Sensor information, monitoring trends in Social Media

– What if your company has many silos of information, difficult to move to HDFS?

– What about governance? Can we trust the source of this data?

Page 14

Big data holistic approach: A platform

[Diagram layers: Solutions; Big Data Platform; Analytics and Decision Management; Big Data Infrastructure]

Page 15

Big data holistic approach: A platform

The IBM Big Data Platform

Data Warehouse: Delivers deep insight with advanced in-database analytics & operational analytics

Page 16

Big data holistic approach: A platform

Stream Computing: Analyze streaming data and large data bursts for real-time insights

[Platform components shown so far: Stream Computing, Data Warehouse]

Page 17

Big data holistic approach: A platform

The IBM Big Data Platform

Hadoop System: Cost-effectively analyze petabytes of unstructured and structured data

[Platform components shown so far: Hadoop System, Stream Computing, Data Warehouse]

Page 18

Big data holistic approach: A platform

Information Integration & Governance: Govern data quality and manage the information lifecycle

[Platform components shown so far: Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse]

Page 19

Big data holistic approach: A platform

Accelerators: Speed time to value with analytic and application accelerators

[Platform components shown so far: Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse]

Page 20

Big data holistic approach: A platform

The IBM Big Data Platform

Visualization & Discovery: Discover, understand, search, and navigate federated sources of big data

[Platform components shown: Visualization & Discovery, Application Development, Systems Management, Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse]

Page 21

Big data holistic approach: A platform

The whole is greater than the sum of parts

Process any type of data

– Structured, unstructured, in-motion, at-rest, in-place

Built-for-purpose engines

– Designed to handle different requirements

Manage and govern data in the ecosystem

Enterprise data integration

Grow and evolve on current infrastructure

Integrated components

Out of the box, standards-based services

Start small (value is additive)

[Platform components shown: Visualization & Discovery, Application Development, Systems Management, Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse]

Page 22

An example of the big data platform in practice

Metadata and Governance Zone: ETL, MDM, Data Governance

Warehousing Zone: Enterprise Warehouse, Data Marts

Ingestion and Real-time Analytic Zone: Streams

Connectors

Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery

Landing and Analytics Sandbox Zone: Hive/HBase Col Stores, Documents in variety of formats, MapReduce, Hadoop

Page 24

The 5 High Value Big Data Use Cases

Big Data Exploration: Find, visualize, understand all big data to improve business knowledge

Enhanced 360º View of the Customer: Achieve a true unified view, incorporating internal and external sources

Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time

Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency

Operations Analysis: Analyze a variety of machine data for improved business results

Page 25

Big Data Exploration: Illustrated

Find, visualize and understand all big data to improve business knowledge

• Greater efficiencies in business processes

• New insights from combining and analyzing data types in new ways

• Develop new business models with resulting increased market presence and revenue

[Diagram labels: data sources (CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems); Connector Framework; Data Explorer; App Builder; UI / User; Hadoop; Streams; Warehouse; Integration & Governance]

Page 26

Big Data Exploration: Example in Practice

Airplane Manufacturer (blinded for confidentiality)

• Exploring 4 TB to drive point business solutions (supplier portal, call center, etc.)

• Single point of data fusion for all employees to use

• Reduced costs & improved operational performance for the business

Is Big Data Exploration Right for You?

How do you separate the “noise” from useful content?

How do you perform data exploration on large and complex data?

How do you find insights in new or unstructured data types (e.g. social media and email)?

How do you enable employees to navigate and explore enterprise and external content? Can you present this in a single user interface?

How do you identify areas of data risk before they become a problem?

What is the starting point for your big data initiatives?

Big Data Platform Component Starting Point: Data Explorer

Page 27

Enhanced 360º View of the Customer: Illustrated

SOURCE SYSTEMS

CRM
Name: J Robertson
Address: 35 West 15th
Address: Pittsburgh, PA 15213

ERP
Name: Janet Robertson
Address: 35 West 15th St.
Address: Pittsburgh, PA 15213

Legacy
Name: Jan Robertson
Address: 36 West 15th St.
Address: Pittsburgh, PA 15213

Master Data Management

360 View of Party Identity
First: Janet
Last: Robertson
Address: 35 West 15th St
City: Pittsburgh
State/Zip: PA / 15213
Gender: F
Age: 48
DOB: 1/4/64

Unified View of Party’s Information: Hadoop, Streams, Warehouse
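The matching step behind this unified view can be sketched in a few lines of Python. This is only an illustration built on the standard-library `difflib` module, not how a Master Data Management product resolves identities; the `similarity`, `same_party`, and `group_records` helpers and the 0.7 threshold are assumptions chosen for the three sample records on the slide.

```python
from difflib import SequenceMatcher

# The three source-system records shown on the slide.
records = [
    {"source": "CRM", "name": "J Robertson", "street": "35 West 15th", "city": "Pittsburgh, PA 15213"},
    {"source": "ERP", "name": "Janet Robertson", "street": "35 West 15th St.", "city": "Pittsburgh, PA 15213"},
    {"source": "Legacy", "name": "Jan Robertson", "street": "36 West 15th St.", "city": "Pittsburgh, PA 15213"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_party(r1: dict, r2: dict, threshold: float = 0.7) -> bool:
    """Hypothetical rule: same city line and similar name and street on average."""
    score = (similarity(r1["name"], r2["name"]) + similarity(r1["street"], r2["street"])) / 2
    return r1["city"] == r2["city"] and score >= threshold


def group_records(recs: list) -> list:
    """Put each record into the first group whose members all match it."""
    groups = []
    for rec in recs:
        for group in groups:
            if all(same_party(rec, member) for member in group):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


groups = group_records(records)
print([[r["source"] for r in g] for g in groups])
```

A production system would also have to decide which value survives in the merged record (for example, "35" versus "36" West 15th St.), which this sketch does not attempt.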

Page 28

Enhanced 360º View of the Customer: Insight from user’s photos

[Diagram labels: Consumer; Computer; Retailers, Marketers and Planners]

Photo Albums and Pinboards: Style, Kitchen, Gallery, Dream Home, Wedding

• Pins / Re-pins
• Likes / Dislikes
• Tweets
• Favorites

• Photo Semantic Analysis
• User Segmentation

• Preferred Styles
• Designs
• Products
• Interests

• Advertisements
• Promotions
• Campaigns
• Planning

Page 29

Enhanced 360º Customer View: Customer Example

Leading Medical Equipment Supplier (blinded for confidentiality)

• Increase revenue and decrease cost in the call center

• Increase customer & employee satisfaction

• Leverage new data types in customer analysis

Is the Enhanced 360º Customer View Right for You?

How do you identify and deliver all data as it relates to a customer, product, or competitor to those who need it?

How do you gather insights about your customers from social data, surveys, support emails, etc.?

How do you combine your structured and unstructured data to run analytics?

How are you driving consistency across your information assets when representing your customers, clients, partners, etc.?

How do you deliver a complete, enhanced view of the customer to your line-of-business users to ensure better business outcomes?

Big Data Platform Component Starting Point: Data Explorer, Hadoop

Page 30

Security/Intelligence Extension: Illustrated

Traditional Security Operations and Technology

• Logs
• Events
• Alerts
• Configuration information
• System audit trails
• External threat intelligence feeds
• Network flows and anomalies
• Identity context

Big Data Analytics

• Web page text
• Video/audio surveillance
• E-mail and social activity
• Business process data
• Customer transactions

New Considerations

Collection, Storage and Processing

• Collection and integration
• Size and speed
• Enrichment and correlation

Analytics and Workflow

• Visualization
• Unstructured analysis
• Learning and prediction
• Customization
• Sharing and export

Page 31

“Reconstructing Events” – Integrating Multimedia from Diverse Sources

• Correlate multimedia content across a wide diversity of sources and dynamic topology of cameras

• Exploit partial overlaps in field of view, re-identification of objects/people and contextual information

• Obtain real-time operational picture across diverse content

  • 100K security cameras (static cameras, slowly changing topology)

  • 10M mobile photos/day (limited knowledge about locations)

  • 50M social media photos/video (uncertain geo-temporal context)

  • Moving vehicles (patrol cars), overhead drones, broadcast, retail, 311, etc.

[Diagram labels: Overhead; Social Media; Mobile Cameras; Security Cameras]

Page 32

Security/Intelligence Extension: Customer Example

Captured and analyzed 42TB of daily traffic in real-time for tracking persons of interest to take suitable action and reduce risk.

Would the Security / Intelligence Extension benefit you?

What are your plans to enrich your security or intel system with unused or underleveraged data sources (video, audio, smart devices, network, Telco, social media)?

How will you address the need for sub-second detection, identification, and resolution of physical or cyber threats?

How do you intend to follow activities of criminals, terrorists, or persons on a blacklist?

How do you plan to enhance your surveillance system with real-time data from video, acoustic, thermal or other security sensors?

Do you want to correlate lots of technical or human intel data and sources looking for associations or patterns (big data forensics)?

How are you going to deal with unstructured data (email, social, etc.) in your Security Information & Event Management (SIEM) solution to improve cyber threat detection & remediation?

Big Data Platform Component Starting Point: Streams, Hadoop

Page 33

Operations Analysis: Illustrated

[Diagram labels: Raw Logs and Machine Data; Machine Data Accelerator; Indexing, Search; Statistical Modeling; Root Cause Analysis; Federated Navigation & Discovery; Real-time Analysis; Only store what is needed]

Page 34

Operations analysis is a Business Imperative

Cost of System Down Time

– 49 percent of Fortune 500 companies experience > 80 hours of system down time/year [1]

• Cost of down time varies between $90,000/hr and $6.48 million/hr

• 80 hours * $6.48M = approx $500M per year

– System downtime costs North American businesses $26.5 billion a year in lost revenue [2]

[1] http://www.information-management.com/infodirect/2009_133/downtime_cost-10015855-1.html

[2] http://www.itchannelplanet.com/business_news/article.php/3916786/IT-System-Downtime-Costs-265-Billion-A-Year-Study-Finds.htm
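The arithmetic on this slide can be checked with a short Python sketch. The constants are the figures quoted above; the function and variable names are illustrative only.

```python
# Figures quoted on the slide.
HOURS_DOWN_PER_YEAR = 80          # hours of system down time per year
COST_PER_HOUR_LOW = 90_000        # dollars per hour, low end
COST_PER_HOUR_HIGH = 6_480_000    # dollars per hour, high end


def annual_downtime_cost(hours: int, cost_per_hour: int) -> int:
    """Yearly cost of down time in dollars."""
    return hours * cost_per_hour


low = annual_downtime_cost(HOURS_DOWN_PER_YEAR, COST_PER_HOUR_LOW)
high = annual_downtime_cost(HOURS_DOWN_PER_YEAR, COST_PER_HOUR_HIGH)
print(f"${low:,} to ${high:,} per year")
```

At the high end the product is $518.4M, which the slide rounds to approximately $500M; at the low end the same 80 hours cost $7.2M.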

Page 35

Operations Analysis: Customer Example

• Intelligent Infrastructure Management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, presence-aware energy management

• Optimized building energy consumption with centralized monitoring; Automated preventive and corrective maintenance

• Utilized InfoSphere Streams, InfoSphere BigInsights, IBM Cognos

Do you deal with large volumes of machine data? How do you access and search that data? How do you perform root cause analysis?

How do you perform complex real-time analysis to correlate across different data sets?

How do you monitor and visualize streaming data in real time and generate alerts?

Would Operations Analysis benefit you?

Big Data Platform Component Starting Point: Hadoop, Streams

Page 36

Data Warehouse Augmentation: Needs

Integrate big data and data warehouse capabilities to increase operational efficiency

Need to leverage variety of data

• Structured, unstructured, and streaming data sources required for deep analysis

• Low latency requirements (hours, not weeks or months)

• Required query access to data

Extend warehouse infrastructure

• Optimized storage, maintenance and licensing costs by migrating rarely used data to Hadoop

• Reduced storage costs through smart processing of streaming data

• Improved warehouse performance by determining what data to feed into it

Page 37

Filter and summarize big data for the warehouse

Hadoop

Data Warehouse Augmentation: Illustrated
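As a rough illustration of "filter and summarize", the sketch below drops low-value events and aggregates the rest into one row per customer and day, which is the kind of compact result that might then be loaded into a warehouse. It is plain Python rather than a Hadoop job, and the event fields, the `summarize` function, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events, standing in for data landed in Hadoop.
raw_events = [
    {"customer": "c1", "day": "2013-09-18", "type": "purchase", "amount": 20.0},
    {"customer": "c1", "day": "2013-09-18", "type": "heartbeat", "amount": 0.0},
    {"customer": "c1", "day": "2013-09-18", "type": "purchase", "amount": 5.5},
    {"customer": "c2", "day": "2013-09-19", "type": "purchase", "amount": 12.0},
]


def summarize(events, keep_type="purchase"):
    """Filter to one event type, then total count and amount per (customer, day)."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for event in events:
        if event["type"] != keep_type:
            continue  # filter step: discard events the warehouse does not need
        key = (event["customer"], event["day"])
        totals[key]["count"] += 1
        totals[key]["amount"] += event["amount"]
    # summarize step: one row per key, sorted for a stable load order
    return [
        {"customer": customer, "day": day, **agg}
        for (customer, day), agg in sorted(totals.items())
    ]


warehouse_rows = summarize(raw_events)
for row in warehouse_rows:
    print(row)
```

Four raw events become two warehouse rows here; on real volumes the same filter-then-aggregate shape is what keeps the warehouse small.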

Page 38

Hadoop as a query-ready archive for a data warehouse

Hadoop

Data Warehouse Augmentation: Illustrated

Page 39

Data Warehouse Augmentation: Customer Example

Improved analysis performance by over 40 times, reduced wait time from hours to seconds, and increased campaign effectiveness by 20+%.

Could Data Warehouse Augmentation benefit you?

Are you drowning in very large data sets (TBs to PBs) that are difficult and costly to store?

Are you able to utilize and store new data types?

Are you facing rising maintenance/licensing costs?

Do you use your warehouse environment as a repository for all data?

Do you have a lot of cold, or low-touch, data driving up costs or slowing performance?

Do you want to perform analysis of data in-motion to determine what should be stored in the warehouse?

Do you want to perform data exploration on all data? Are you using your data for new types of analytics?

Big Data Platform Component Starting Point: Hadoop, Streams

Page 41

Sentiment Analysis using IBM Text Analytics (Basic example)

Page 42

Sentiments for movie Ra.One :-(

Page 43

Sentiments for movie Swades :-)
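For readers without the demo at hand, the basic idea of sentiment scoring can be sketched with a word list. This is a deliberately naive Python illustration, not IBM Text Analytics and not the rules used in the demo; the word lists and the sample reviews are made up.

```python
import re

# Tiny illustrative lexicons (assumptions, not taken from the demo).
POSITIVE = {"great", "good", "love", "loved", "inspiring", "beautiful"}
NEGATIVE = {"bad", "boring", "hate", "hated", "disappointing", "awful"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical review snippets.
reviews = [
    "Loved it, an inspiring story",
    "Boring and disappointing",
    "Watched it yesterday",
]
labels = [label(r) for r in reviews]
print(list(zip(reviews, labels)))
```

Real text analytics has to handle negation, sarcasm, and context, which is where a rule language and an optimizer earn their keep.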

Page 44

Architecture Diagram

AQL: Rule language with familiar SQL-like syntax. Specify annotator semantics declaratively.

Text Analytics Optimizer: Choose an efficient execution plan that implements the semantics.

Compiled Operator Graph (.aog)

Text Analytics Runtime: Highly scalable, embeddable Java runtime.

Input Document Stream

Annotated Document Stream
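The separation between declaring what to extract and deciding how to run it can be mimicked in a few lines of Python. This is a loose analogy only: it is not AQL, and `RULES`, `compile_rules`, and `annotate` are hypothetical names. The rules are declared as data, "compiled" once into a plan, and then applied to a stream of documents.

```python
import re
from typing import Dict, Iterable, Iterator, List, Pattern, Tuple

# Declarative part: annotation type mapped to a pattern (illustrative rules only).
RULES: Dict[str, str] = {
    "PhoneNumber": r"\b\d{3}-\d{4}\b",
    "Email": r"\b[\w.]+@[\w.]+\.\w+\b",
}


def compile_rules(rules: Dict[str, str]) -> List[Tuple[str, Pattern[str]]]:
    """Stand-in for the optimizer: turn the declared rules into an executable plan."""
    return [(name, re.compile(pattern)) for name, pattern in rules.items()]


def annotate(docs: Iterable[str], plan: List[Tuple[str, Pattern[str]]]) -> Iterator[dict]:
    """Stand-in for the runtime: stream documents in, annotated documents out."""
    for doc in docs:
        spans = [
            (name, match.group())
            for name, regex in plan
            for match in regex.finditer(doc)
        ]
        yield {"text": doc, "annotations": spans}


plan = compile_rules(RULES)
input_stream = ["Call 555-1234 or write to ana@example.com", "Nothing to extract here"]
annotated = list(annotate(input_stream, plan))
for doc in annotated:
    print(doc["annotations"])
```

Because the rules are plain data, they can be changed without touching the code that executes them, which is the point the architecture slide makes about declarative annotators.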

Page 45

How Streams Works

Continuous ingestion

Continuous analysis

Page 46

How Streams Works

Continuous ingestion, continuous analysis

[Diagram operators: Filter / Sample; Transform; Annotate; Correlate; Classify]

Achieve scale:

• By partitioning applications into software components

• By distributing across stream-connected hardware hosts

Infrastructure provides services for:

• Scheduling analytics across hardware hosts

• Establishing streaming connectivity

Where appropriate: elements can be fused together for lower communication latency

Page 47

Scalable Stream Processing

Streams programming model: construct a graph

– Mathematical concept
  • Not a line, bar, or pie chart!
  • Also called a network
  • Familiar: for example, a tree structure is a graph

– Consisting of operators and the streams that connect them
  • The vertices (or nodes) and edges of the mathematical graph
  • A directed graph: the edges have a direction (arrows)

Streams runtime model: distributed processes

– Single or multiple operators form a Processing Element (PE)

– Compiler and runtime services make it easy to deploy PEs
  • On one machine
  • Across multiple hosts in a cluster when scaled-up processing is required

– All links and data transport are handled by runtime services
  • Automatically
  • With manual placement directives where required

[Diagram: operators (OP) connected by streams]
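The "operators connected by streams" idea can be shown with Python generators, where each function is an operator and each generator is a stream flowing along one edge of the graph. This is a single-process sketch for intuition only; it is not the Streams programming language, and the operator names and meter readings are invented.

```python
from typing import Callable, Iterable, Iterator, List


def source(readings: Iterable[dict]) -> Iterator[dict]:
    """Source operator: emits tuples into the graph."""
    yield from readings


def filter_op(stream: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Filter operator: passes on only the tuples that satisfy the predicate."""
    return (t for t in stream if predicate(t))


def transform_op(stream: Iterable[dict], fn: Callable[[dict], dict]) -> Iterator[dict]:
    """Transform operator: rewrites each tuple."""
    return (fn(t) for t in stream)


def sink(stream: Iterable[dict]) -> List[dict]:
    """Sink operator: drains the stream (here, into a list)."""
    return list(stream)


# Hypothetical meter readings; the negative value stands for a bad reading.
readings = [
    {"meter": "m1", "kwh": 1.2},
    {"meter": "m2", "kwh": -1.0},
    {"meter": "m1", "kwh": 0.8},
]

# Directed graph: source -> filter -> transform -> sink.
valid = filter_op(source(readings), lambda t: t["kwh"] >= 0)
in_wh = transform_op(valid, lambda t: {**t, "wh": round(t["kwh"] * 1000)})
result = sink(in_wh)
print(result)
```

Nothing runs until the sink pulls tuples through, so each tuple passes through the whole chain one at a time, which loosely mirrors continuous processing. Distribution across hosts, which the runtime model above describes, is outside what this sketch covers.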

Page 48

From Essential Elements to Running Jobs

Streams application graph:

– A directed, possibly cyclic, graph

– A collection of operators

– Connected by streams

Each complete application is a potentially deployable job

Jobs are deployed to a Streams runtime environment, known as a Streams Instance (or simply, an instance)

An instance can include a single processing node (hardware) or multiple processing nodes

[Diagram: sources (Src), operators (OP), and sinks (Sink) connected by streams, deployed to a Streams instance on one or more hardware nodes]

Page 49

Streams Runtime Illustrated

Optimizing scheduler assigns jobs to hosts, and continually manages resource allocation

Commodity hardware – laptop, blades or high performance clusters

[Diagram: operators (Meters, Company Filter, Usage Model, Usage Contract, Text Extract, Season Adjust, Daily Adjust, Temp Action) placed across four x86 hosts]

Page 50

Streams Runtime Illustrated

Optimizing scheduler assigns PEs to hosts, and continually manages resource allocation

Commodity hardware – laptop, blades or high performance clusters

Dynamically add hosts and jobs

New jobs work with existing jobs

[Diagram: operators (Meters, Company Filter, Usage Model, Usage Contract, Temp Action, Text Extract, Degree History, Compare History, Store History, Season Adjust, Daily Adjust) placed across five x86 hosts]

Page 51

Streams Runtime Includes High Availability

A PE failing on one host can be moved automatically to another; communications are automatically rerouted

PEs on busy hosts can be moved manually by the Streams administrator

[Diagram: operators (Meters, Company Filter, Usage Model, Usage Contract, Text Extract, Degree History, Compare History, Store History, Temp Action, Season Adjust, Daily Adjust) placed across five x86 hosts]
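The placement and failover behaviour described on these runtime slides can be pictured with a toy scheduler. The Python below is a simplified sketch, not the Streams scheduler: round-robin placement and "move to the least-loaded survivor" are assumptions made for illustration, and `assign` and `fail_host` are hypothetical names.

```python
from typing import Dict, List


def assign(pes: List[str], hosts: List[str]) -> Dict[str, List[str]]:
    """Place PEs on hosts round-robin (an assumed, simplistic policy)."""
    placement: Dict[str, List[str]] = {host: [] for host in hosts}
    for index, pe in enumerate(pes):
        placement[hosts[index % len(hosts)]].append(pe)
    return placement


def fail_host(placement: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Return a new placement with the failed host's PEs moved to surviving hosts."""
    survivors = {host: list(pes) for host, pes in placement.items() if host != failed}
    if not survivors:
        raise RuntimeError("no surviving hosts to take over the PEs")
    for pe in placement.get(failed, []):
        # Pick the surviving host that currently runs the fewest PEs.
        target = min(survivors, key=lambda host: len(survivors[host]))
        survivors[target].append(pe)
    return survivors


pes = ["Meters", "Company Filter", "Usage Model", "Usage Contract", "Temp Action"]
hosts = ["host1", "host2", "host3"]

before = assign(pes, hosts)
after = fail_host(before, "host2")
print(before)
print(after)
```

The property that matters is that every PE is still running somewhere after the failure; rerouting the streams between moved PEs is the part a real runtime handles and this sketch leaves out.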

Page 52

Social Data Analytics Accelerator Architecture

Source: Social Media

Online flow: Data-in-motion analysis (Stream Computing and Analytics)

• Data Ingest and Prep
• Extract Buzz, Intent, Sentiment
• Entity Analytics: Profile Resolution
• Dashboard: Real time analytics. Pre-defined views and charts

Offline flow: Data-at-rest analysis (BigInsights System and Analytics)

• Social Media Data
• Extract Buzz, Intent, Sentiment and Consumer Profiles
• Entity Analytics and Integration
• Comprehensive Social Media Customer Profiles
• Pre-defined Workbooks and Dashboards

Optional: Indexed Search

• Index using Push API
• Data Explorer
• Ad hoc access

Page 53

Social Data Analytics Accelerator

Page 54

Machine Data Analytics Accelerator – Preventing outages

Business requirement

– Improve ability to understand, correct and anticipate outages

Solution Overview

– Provide faceted search across log records from multiple systems to find events

– Link and correlate events across systems

– Discover interesting patterns

Solution Detail

– BigInsights applications for
  • Import, Extract, Transform, Analyze, Visualize

[Diagram: Import Logs, Extract, Transform, Analyze, Visualize; roles: Data Scientist, End User, Data Administrator]
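The extract and faceted-search steps can be illustrated on a handful of log lines. The Python below is a small sketch, not the accelerator itself: the log format, the `LOG_LINE` pattern, and the `extract`, `facet`, and `search` helpers are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List

# Assumed log layout: timestamp, system name, level, free-text message.
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<system>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

# Hypothetical log lines imported from several systems.
raw_logs = [
    "2013-09-19T10:00:01 web01 ERROR timeout calling db01",
    "2013-09-19T10:00:02 db01 ERROR connection pool exhausted",
    "2013-09-19T10:00:03 web02 INFO request served",
    "not a log line",
]


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Extract step: turn raw lines into structured records, skipping unparseable ones."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            yield match.groupdict()


def facet(records: Iterable[dict], field: str) -> Counter:
    """Facet counts: how many records fall under each value of a field."""
    return Counter(record[field] for record in records)


def search(records: Iterable[dict], **filters: str) -> List[dict]:
    """Faceted search: keep records that match every given field value."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]


records = list(extract(raw_logs))
level_facets = facet(records, "level")
errors = search(records, level="ERROR")
print(level_facets)
print([r["system"] for r in errors])
```

Seeing the web01 and db01 errors one second apart is the sort of cross-system link the solution overview refers to; correlating such events automatically is the harder part that this sketch does not attempt.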

Page 56

The Future of Big Data and Cloud

SQL for Hadoop support improvements – towards full ANSI support

– Hive
– Impala (Cloudera)
– Big SQL (IBM)
– Stinger (Hortonworks)
– Drill (MapR)
– HAWQ (Pivotal)
– SQL-H (Teradata)

Improvements in Multimedia Analytics

Growth in usage and adoption of R programming language

Cloud Bare metal support helping with Hadoop workloads

– Private network
– Full support with APIs

Page 59

Big Data University (bigdatauniversity.com)

BigInsights on the Cloud - Making Learning Hadoop Easy and Fun

Flexible on-line delivery allows learning @your place and @your pace

Free courses, free study materials.

Cloud-based sandbox for exercises – zero setup with Robust Course Management System and Content Distribution infrastructure

108,000 registered students.

Free IBM Hadoop, BigInsights Publications

Page 60

Big Data University (bigdatauniversity.com)

BigInsights on the Cloud - Making Learning Hadoop Easy and Fun

Quick Start Editions available (Free, non-production, no time bomb):

– IBM InfoSphere BigInsights (IBM’s Hadoop Distribution): ibm.co/QuickStart

– IBM InfoSphere Streams: ibm.co/streamsqs

Page 61

My contact information

Email: [email protected]

Twitter: @raulchong

Facebook: facebook.com/raul.f.chong

LinkedIn: linkedin.com/pub/raul-f-chong/8/aa2/b63

Page 62

Thank You!