
IBM Big Data Platform

Ralph Behrens
Client Technical Professional Big Data
Certified Netezza Specialist
IBM Software Group Deutschland


© 2013 IBM Corporation

“Data is the New Oil”

“Data is the new oil. Data is just like crude. It’s valuable, but if unrefined it cannot really be used.”
– Clive Humby, dunnhumby


Understanding the data is crucial

Discover
• Easily navigate and visualize all internal and external data as an entry point into the big data world.

Analyze
• Compare and analyze the information content of all relevant structured or unstructured data.

Understand
• Uncover correlations and combinations in the information to make better decisions.


IBM Big Data & Analytics Reference Architecture

All Data Sources
• Streaming Data
• Text Data
• Applications Data
• Time Series
• Geo Spatial
• Relational
• Social Network
• Video & Image

Big Data Platform Capabilities
• Information platform
• Real-time Analytics
• Warehouse & Data Marts
• Analytic Appliances

Diagram components: Information Integration, Landing Zone, Data Exploration, Archive, Real-time Analytics, EDW, Data Marts

Advanced Analytics / New Insights
• Cognitive: Learn Dynamically?
• Prescriptive: Best Outcomes?
• Predictive: What Could Happen?
• Descriptive: What Has Happened?
• Exploration and Discovery: What Do You Have?

New/Enhanced Applications
• Automated Process
• Case Management
• Analytic Applications
• Watson
• Cloud Services
• ISV Solutions
• Alerts
• Fraud

Information Governance, Security and Business Continuity

Open Architecture / Multiple Product Entry Points


IBM Big Data PureData Systems

Trend #1: Appliances

PureData Systems
• Expert integrated systems to make deep and operational analytics faster & simpler

Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Data Warehouse); Big Data Infrastructure


IBM PureData Systems overview

Meeting Big Data Challenges – Fast and Easy!

System for Transactions (DB2 pureScale powered by System-P or System-X)
• For apps like E-commerce: Database cluster services optimized for transactional throughput and scalability

System for Analytics (Powered by Netezza technology)
• For apps like Customer Analysis: Data warehouse services optimized for high-speed, peta-scale analytics and simplicity

System for Operational Analytics (DB2 powered by System-X)
• For apps like Real-time Fraud Detection: Operational data warehouse services optimized to balance high performance analytics and real-time operational throughput


PureData for Analytics - Model N2001

• User Data Capacity: 192 TB*
• Data Scan Speed: 450 TB/hr*
• Load Speed (per system): 5+ TB/hr
• Power Requirements: 7.5 kW
• Cooling Requirements: 27,000 BTU/hr
• Footprint: 65 x 110 x 222 cm / 1282 kg

* Assuming 4X compression

2 IBM x3650-M3 Hosts
• 2x 6-Core Intel 3.46 GHz CPUs
• Active-Passive Mode

7 IBM HX5 S-Blades™
• 2x Intel 8 Core 2+ GHz CPUs
• New Netezza BPE4 Side Car
• 2x 8-Engine Xilinx Virtex-6 FPGAs
• 128 GB RAM + 8 GB slice buffer

12 IBM EXP3000 Disk Enclosures
• 288 x 600 GB SAS2 Drives (240 for User Data, 14 for S-Blades, 34 Spare)
• RAID 1 Mirroring

• All components are fully redundant and able to have their workload redistributed to a set of alternate components.
• Loss of a blade, any storage component, even the host system that serves as the primary interface will not prevent the system from functioning.
• Linux 64-bit Kernel
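The capacity and scan-speed figures on this slide are quoted "assuming 4X compression", so it can help to back out the uncompressed equivalents. The following is a minimal sketch of that arithmetic using only numbers from the slide; the variable and function names are illustrative and are not part of any IBM tool.

```python
# Figures copied from the slide; all names here are illustrative only.
COMPRESSION_FACTOR = 4          # "* Assuming 4X compression"
USER_CAPACITY_TB = 192          # quoted user data capacity
SCAN_SPEED_TB_PER_HR = 450      # quoted data scan speed

DRIVES_USER_DATA = 240
DRIVES_S_BLADES = 14
DRIVES_SPARE = 34
DRIVE_SIZE_GB = 600


def uncompressed(value, factor=COMPRESSION_FACTOR):
    """Divide a compression-adjusted figure by the assumed factor."""
    return value / factor


total_drives = DRIVES_USER_DATA + DRIVES_S_BLADES + DRIVES_SPARE
raw_user_drive_tb = DRIVES_USER_DATA * DRIVE_SIZE_GB / 1000  # decimal TB

print(total_drives)                          # 288
print(uncompressed(USER_CAPACITY_TB))        # 48.0
print(uncompressed(SCAN_SPEED_TB_PER_HR))    # 112.5
print(raw_user_drive_tb)                     # 144.0
```

The drive counts add up to the 288 drives listed. The 240 user-data drives hold 144 TB raw, of which 48 TB is usable before compression. RAID 1 mirroring accounts for part of that gap; the slide does not break down the remainder.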


Appliance = Increase Data Center Efficiency With Faster, More Efficient Systems

PureData for Analytics - Model N2001
• PureData uses less power than other systems [1]
• PureData has more capacity than other systems [2,3]
• PureData has “out of the box” faster scan rates than other systems

* Unofficial customer test, ** Exadata with/out SSD


IBM Platform for Big Data: BigInsights

Trend #2: Analytical Intelligence on cheap standard HW

InfoSphere BigInsights
• Enterprise-grade Hadoop system enhanced with advanced text analytics, data visualization, tools, & performance features for analyzing massive volumes of structured and unstructured data.

Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Hadoop System, Data Warehouse); Big Data Infrastructure


IBM Enriches Hadoop

• Scalable – New nodes can be added on the fly
• Affordable – Massively parallel computing on commodity servers
• Flexible – Hadoop is schema-less, and can absorb any type of data
• Fault Tolerant – Through MapReduce software framework

• Performance & reliability
  – Adaptive MapReduce, Compression, Indexing, Flexible Scheduler, …
• Enterprise Hardening of Hadoop
• Productivity Accelerators
  – Web-based UIs and tools
  – End-user visualization
  – Analytic Accelerators, …
• Enterprise Integration
  – To extend & enrich your information supply chain
• SQL Interface
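The slide attributes Hadoop's fault tolerance to the MapReduce framework. One way to see why: each map call depends only on its own input split, so a failed task can simply be re-run on another node. The following is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It uses no Hadoop or BigInsights API, and all function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit (word, 1) pairs for one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    # Each split is mapped independently; that independence is what
    # lets a real framework retry a single failed task in isolation.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data platform", "Big Data appliances", "data warehouse"]
counts = word_count(splits)
print(counts["data"])  # 3
print(counts["big"])   # 2
```

In a real cluster the splits live on different data nodes and the shuffle moves data over the network; this sketch only mirrors the programming model.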



Key Features and Specifications

Key Features

Hadoop Distribution
  − InfoSphere BigInsights V2.1
Built-in Analytics/Accelerators
  − IBM BigSheets
  − IBM Accelerator for Text Analytics
  − IBM Accelerator for Social Data
  − IBM Accelerator for Machine Data
  − IBM Big SQL
Development / Administration
  − Eclipse-based Development Environment
  − Exposed Node Management
Enterprise Readiness
  − Security
  − High Availability SW & HW
  − Hardware management & monitoring
Data Warehouse Integration
  − Enterprise data warehouse connectors
  − Archival capabilities

Specifications (Full Rack)

Management Nodes: 1 primary, 1 standby (x3550 M4)
Data Nodes: 18 (x3630 M4)
CPU Cores: 216
Memory: 96 GB per node, 1728 GB total
Raw Storage: 216 drives, 3 TB each, 648 TB total
User Space: 216 TB
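The full-rack totals can be cross-checked from the per-node figures. The following is a minimal sketch of that arithmetic. It assumes the 216 cores and 216 drives are spread evenly across the 18 data nodes, which the slide does not state explicitly, and the variable names are illustrative.

```python
# Figures copied from the specification list; names are illustrative.
DATA_NODES = 18
MEMORY_PER_NODE_GB = 96
DRIVES = 216
DRIVE_SIZE_TB = 3
CPU_CORES = 216
USER_SPACE_TB = 216

total_memory_gb = DATA_NODES * MEMORY_PER_NODE_GB    # matches "1728 GB total"
raw_storage_tb = DRIVES * DRIVE_SIZE_TB              # matches "648 TB total"

# Assumption: cores and drives are evenly distributed over the data nodes.
cores_per_node = CPU_CORES // DATA_NODES
drives_per_node = DRIVES // DATA_NODES

# Ratio of raw storage to quoted user space.
raw_to_user_ratio = raw_storage_tb / USER_SPACE_TB

print(total_memory_gb)     # 1728
print(raw_storage_tb)      # 648
print(cores_per_node)      # 12
print(drives_per_node)     # 12
print(raw_to_user_ratio)   # 3.0
```

The 3:1 ratio between raw storage and user space is consistent with HDFS's default replication factor of three. The slide does not say how the user-space figure was derived, so treat that reading as an inference.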



Benefits of IBM PureData System for Hadoop

Accelerate Big Data Time to Value
• Deploy 8x Faster than custom-built solutions [1]
• Built-in Visualization to accelerate insight
• Built-in Analytic Accelerators [2] unlike big data appliances on the market

Simplify Big Data Adoption & Consumption
• Single System Console for full system administration
• Rapid Maintenance Updates with automation
• No Assembly Required, data load ready in hours

Implement Enterprise-Class Big Data
• Only Integrated Hadoop System with Built-in Archiving Tools [2]
• Delivered with More Robust Security than open source software
• Architected for High Availability

[1] Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
[2] Based on current commercially available Big Data appliance product data sheets from large vendors. US ONLY CLAIM.


New Approaches for the Data Warehouse

PureData System for Analytics
PureData System for Hadoop

Use Case - Queryable Archive
• Immediate storage alternative for cold data
• Cost savings for cold data
• Compliance requirements

Use Case – do more!
• Using unstructured data
• Explore new data
• “Super ETL Landing Zone”
• Analyze the data synchronously (Reporting, Prediction, …)


IBM Platform for Big Data: Streams

Trend #3: Processing of (machine) data in real-time

InfoSphere Streams
• Software enabling continuous analysis of massive volumes of streaming data with sub-millisecond response times

Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Hadoop System, Stream Computing, Data Warehouse); Big Data Infrastructure


Stream Computing: A Paradigm Shift

Real-time Analytics

Traditional DWH Computing
• Search for historic facts
• Find and analyze stored information
• “Batch” paradigm, “Pull” model
• Query-driven: queries are placed on static data

Stream Computing
• Search for recent facts
• Analysis of the data while moving, before storage
• “Real-Time” paradigm, “Push” model
• Data-driven: data is brought to the analysis
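The pull/push contrast on this slide can be illustrated in a few lines. The following is a minimal sketch in plain Python, not the InfoSphere Streams API: a batch query runs over readings that are already stored, while a streaming monitor is handed each reading as it arrives and keeps only a small sliding window. The class name, threshold, and window size are assumptions chosen for the example.

```python
from collections import deque


def batch_query(stored_readings, threshold):
    """Pull model: the query runs over data that is already stored."""
    return [r for r in stored_readings if r > threshold]


class StreamingAlert:
    """Push model: each reading is analyzed on arrival, before storage."""

    def __init__(self, threshold, window_size=3):
        self.threshold = threshold
        self.window = deque(maxlen=window_size)  # only recent data is kept
        self.alerts = []

    def on_reading(self, reading):
        self.window.append(reading)
        average = sum(self.window) / len(self.window)
        if average > self.threshold:
            self.alerts.append((reading, average))


readings = [10, 12, 30, 40, 11, 9]

# Query-driven: ask a question of static data.
print(batch_query(readings, threshold=25))   # [30, 40]

# Data-driven: the loop stands in for a live feed pushing values.
monitor = StreamingAlert(threshold=25)
for value in readings:
    monitor.on_reading(value)
print([reading for reading, _ in monitor.alerts])   # [40, 11]
```

The batch query needs the full data set before it can answer. The monitor raises its first alert as soon as the windowed average crosses the threshold, and it never holds more than three readings.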


Streams Analyzes All Kinds of Data

• Mining in Microseconds (included with Streams)
• Image & Video (Open Source)
• Simple & Advanced Text (included with Streams), e.g. (listen, verb), (radio, noun)
• Acoustic (IBM Research) (Open Source)
• Geospatial (IBM Research)
• Predictive (IBM Research)
• Advanced Mathematical Models (IBM Research)
• Statistics (included with Streams)


IBM Platform for Big Data: DB2 10.5 BLU

Trend #4: In-Memory Databases

DB2 10.5 with In-Memory Acceleration
• The latest-generation DB2 release, which lets conventional database technology move seamlessly to in-memory analysis.

Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Hadoop System, Stream Computing, Data Warehouse, In-Memory Database; Systems Management, Application Development, Visualization & Discovery); Big Data Infrastructure


DB2 10.5 with In-Memory Acceleration: Typical Results

Customer                            Speedup over DB2 10.1
Large Financial Services Company    46.8x
Global ISV Mart Workload            37.4x
Analytics Reporting Vendor          13.0x
Global Retailer                     6.1x
Large European Bank                 5.6x

10x-25x improvement is common

“It was amazing to see the faster query times compared to the performance results with our row-organized tables. The performance of four of our queries improved by over 100-fold! The best outcome was a query that finished 137x faster by using BLU Acceleration.”
- Kent Collins, Database Solutions Architect, BNSF Railway
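To relate the five listed speedups to the "10x-25x is common" claim, they can be summarized with a median and a geometric mean, which is the usual average for ratios. The following is a minimal sketch using Python's standard `statistics` module (Python 3.8+) and only the figures from the slide; the dictionary name is illustrative.

```python
from statistics import geometric_mean, median

# Speedups over DB2 10.1 as listed on the slide.
speedups = {
    "Large Financial Services Company": 46.8,
    "Global ISV Mart Workload": 37.4,
    "Analytics Reporting Vendor": 13.0,
    "Global Retailer": 6.1,
    "Large European Bank": 5.6,
}

values = list(speedups.values())
mid = median(values)
gmean = geometric_mean(values)

print(mid)              # 13.0
print(round(gmean, 1))  # geometric mean of the five ratios

# Which of the listed results fall inside the quoted 10x-25x band?
in_band = [name for name, s in speedups.items() if 10 <= s <= 25]
print(in_band)          # ['Analytics Reporting Vendor']
```

Both summary values fall inside the 10x-25x band, although only one of the five individual results does. Two are above it and two are below, so the band describes a central tendency rather than a guarantee.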


IBM Platform for Big Data: Information Governance

Must Have: Integration and Security

Govern data quality and manage the information lifecycle
• InfoSphere Information Server – cleanses data, monitors quality and integrates big data with existing systems
• InfoSphere Optim – manages business information throughout its lifecycle
• InfoSphere Master Data Management – manages and maintains trusted views of master and reference data
• InfoSphere Guardium – real-time database security and monitoring

Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Hadoop System, Stream Computing, Data Warehouse, In-Memory Database; Information Integration & Governance; Systems Management, Application Development, Visualization & Discovery); Big Data Infrastructure


IBM Platform for Big Data: Accelerators

Speed time to value with analytic and application accelerators
• Analytic Accelerators – text analytics, geospatial, time-series, data mining
• Application Accelerators – financial services, machine data, social data, Telco event data
• Industry Models – comprehensive data models based on deep expertise and industry best practice

Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Accelerators; Hadoop System, Stream Computing, Data Warehouse, In-Memory Database; Information Integration & Governance); Big Data Infrastructure


Example Big Data Analytics Application: Social Media Analytics

Source Areas
• BLOGS
• DISCUSSION FORUMS
• TWITTER
• NEWSGROUPS
• FACEBOOK

Business Drivers
• Customer Care
• Corporate Reputation
• Campaign Effectiveness
• Competitive Analysis
• Product Insight

Capabilities: SENTIMENT, EVOLVING TOPICS, COMPREHENSIVE ANALYSIS, AFFINITY ANALYTICS, MULTILINGUAL, PREDICTIVE ANALYSIS
• Dimensional analysis and filtering
• Tunable sentiment rules
• Detect and predict emerging topics and viral posting patterns
• Discover associated themes
• Ad-Hoc keyword searches
• Automatic detection of changes in “consumer vocabulary”
• Relationship heat-maps to understand affinity
• Quantify strength of affinity
• Forward-looking detection of discussion topics
• Identify KPPs
• Predict impact of social interaction on business KPIs
• Predict ability to influence social interaction


IBM Platform for Big Data: Accelerators

Trend #5: Search and discover

Discover, understand, search, and navigate federated sources of big data
• InfoSphere Data Explorer – Discovery and navigation software that provides real-time access and fusion of big data with rich and varied data from enterprise applications for greater insight

Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Accelerators; Hadoop System, Stream Computing, Data Warehouse, In-Memory Database; Information Integration & Governance; Systems Management, Application Development, Visualization & Discovery); Big Data Infrastructure


Leverage the full power of IBM’s Big Data Platform

Data sources: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems

Diagram components: UI / User; IBM Data Explorer & App Builder; Connector Framework; BigInsights; Streams; Warehouse; Integration & Governance

Data Explorer

Data access & integration
• Index structured & unstructured data in place
• Support existing security
• Federate to external sources
• Leverage MDM, governance, and taxonomies

Discovery & navigation
• Clustering & categorization
• Contextual intelligence
• Easy-to-deploy applications
• All at the scale required for today’s big data challenges


Out-of-the-Box Functionality

• Tabbed Search (1) for source-based search.
• Alerts (2) to flag changes in content.
• Expertise Location (3) to quickly find the right experts.
• Enrich search results with Ratings (4), Tagging (5), or free text.
• Save (6) and bookmark search results.
• Fast, easy finding through Text Clustering (7).
• Structured Navigation (8), filtering, distribution of information, and collaboration.
• Graphical Navigation (9) across date ranges or frequencies.
• Query Expansion (10): integration of thesauri or search suggestions.


Data Explorer + Analytics = Complete Picture

World’s Total Data: 20% and 80% (chart labels: Unstructured, Structured)

Enterprise Unstructured Sources (does not have any structure)
• Unstructured Data
• Content Mgt Systems
• Web
• RSS Feed
• Social Media

Enterprise Systems & Content Stores (each system has its own but different structure)
• Databases
• Data Warehouses
• SCM
• SOA, ESB, Web Service

• Data Explorer handles the qualitative on unstructured info.
• Analytics handles the quantitative on structured info.
• Data Explorer surfaces insights from the unstructured in context with the analytics.
• Significant data cleansing occurs on data collected before being run through systems like Cognos.


IBM End-to-End Big Data & Analytics Portfolio

Data Sources: Structured, Operational, Unstructured, External, Social, Sensor, Geospatial, Time Series, Streaming

Architecture zones (diagram)
• Real-Time Analytics
• Landing, Exploration & Archive
• Analytic Appliances
• Enterprise Warehouse
• Data Marts
• Information Movement, Matching & Transformation
• Security, Governance and Business Continuity

Information & Insight
• BI & Performance Management
• Predictive Analytics & Modeling
• Exploration & Discovery

Products shown (diagram)
• PureData for Analytics
• InfoSphere Data Click, Information Server, MDM, G2
• Guardium, Optim
• InfoSphere BigInsights
• DB2 BLU, PureData for Analytics
• PureData for Operational Analytics
• InfoSphere Streams
• SPSS
• Cognos
• InfoSphere Data Explorer

+ Ensures ability to address broader requirements that may be needed now or in the future
+ Apply data security to Big Data (Guardium)
+ Enable a 360° view of all customer related Big Data (MDM)
+ Provide full information integration capabilities for Big Data (Information Server)
+ Integration enables use of existing tools and skills to start leveraging Big Data more quickly


Big Data Use Cases

• Big Data Exploration
• Enhanced 360° View of the Customer
• Security/Intelligence Extension
• Operations Analysis
• Data Warehouse Augmentation


Ralph Behrens IBM Deutschland GmbH

Client Technical Professional

IBM Big Data

Wilhelm-Fay-Straße 30-34

65936 Frankfurt

Phone +49 (0) 7034 / 6430680

Mobile +49 (0)172 / 6511333

[email protected]


Client Reference Base


Telecom

Other

Digital Media

Financial Services

Health & Life Sciences

Retail / Consumer Products
