How Hadoop Changes the Analytics Paradigm
Kunal Taneja, System Engineering Manager A/NZ
© Cloudera, Inc. All rights reserved.


Why is Big Data Happening Now?

Instrumentation: Everything that can be measured will be measured.

Consumerization: Employees and customers expect more personal interactions, but not at the cost of their privacy.

Experimentation: The most innovative companies embrace experimentation and agility.


Example: Instrumentation in Banking

[Timeline chart spanning 2012 to 2019; items shown:]

• ICB Ring-fencing
• ICB Loss Absorbency
• Leverage Ratio – Basel III
• NSFR – Basel III
• MiFID II
• T2S
• LCR – Basel III
• ICB / Competition
• Audit Policy
• Cross Border Debt Recovery
• Financial Transaction Tax
• Market Abuse Directive (MAD II)
• PRIP
• Accounting Directive Review
• AIFM Directive
• EU Transparency Directive
• EU Reg on Credit Rating Agencies
• CRDV
• Internal Governance Guidelines
• FATCA
• PD
• EMIR
• Swaps Push Out – Dodd Frank
• Securities Law Directive (SLD)
• Volcker Rule – Dodd Frank
• Short Selling
• Close Out Netting
• Crisis Management
• Recovery & Resolution

Effective dates yet to be confirmed


Traditional Architectures Under Pressure

[Diagram labels: Data Sources, Data Systems, Data Access; Existing Data, New Data, Databases, Operational Applications, Business Analytics, Custom Applications]

Limited Data
• Not efficient to keep existing data, let alone handle new data sources.
• Time consuming to transform data for analysis in existing systems.

Limited Insights
• Power users struggle with data.
• Many users have no data.

Compliance and Privacy
• More data, more users, and more tools create complexity.
• Need to balance business agility with security and governance.


Traditional Architectures Under Pressure

[Diagram: layered warehouse architecture, with these labels]

Source Systems

ETL

Enterprise Data Warehouse (Data Modelling)

Data Acquisition Layer
• Extraction & Staging
• Cleansing

ATOMIC Layer
• Normalisation & Storage

Performance & Access
• Transformation & Calculation
• Performance & Access

BI Abstraction & Reporting Layer

Dashboard & Reports, Ad-hoc Analysis, Mobile


Example - Limited Data: “iPhone users more likely to buy?”

[Diagram: the layered warehouse architecture again: Source Systems, ETL, Enterprise Data Warehouse (Data Acquisition Layer, Atomic Layer, Transformation & Access Layer), Data Modelling, BI Abstraction & Reporting Layer, Dashboard & Reports, Ad-hoc Analysis, Mobile]


Example - Limited Data


Example - Limited Data (slow transforms): “What is my VaR?”

[Diagram: the layered warehouse architecture again: Source Systems, ETL, Enterprise Data Warehouse (Data Acquisition Layer, Atomic Layer, Transformation & Access Layer), Data Modelling, BI Abstraction & Reporting Layer, Dashboard & Reports, Ad-hoc Analysis, Mobile]

Stage labels on the diagram: CDC, ETL, ETL

Durations shown on the diagram: 8 Hours, 12 Hours, 2 Hours


Example - Limited Insights


Example - Limited Insights


© 2014 Cloudera, Inc. All rights reserved.

Expanding Data Requires A New Approach

What we do: Copy Data to Applications

Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
• Multiple copies of data

What we should do: Bring Applications to Data

Information-centric businesses use all data: multi-structured, internal and external data of all types.

[Diagrams: App and Data boxes illustrating each approach]


The Old Way: Bringing Data to Applications

(1) Can’t Retain Valuable Data
• Leaving data behind
• Risk and compliance
• High cost of storage

(2) Can’t Meet ETL SLAs
• Up-front modeling
• Transforms slow
• Transforms lose data

(3) Can’t Ask New Questions
• Existing systems strained
• No agility
• “BI backlog”

(4) Can’t Get a 360 View
• Many special-purpose systems
• Moving data around
• No complete views

[Diagram labels]
SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE
ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES


The New Way: Bringing Applications to Data

(1) Active Archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage

(2) Scalable Transformations
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper

(3) Agile Exploration
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests

(4) Consolidated Architecture
• Bring applications to data
• Combine different workloads on common data (i.e. SQL + Search)
• True analytic agility

[Diagram labels]
SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE
ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES
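The “schema on read” idea named on this slide can be illustrated with a minimal sketch. It is not Cloudera code and uses no Hadoop API: the event records, the field names, and the read_with_schema helper are all hypothetical, chosen only to show raw data being stored once in full fidelity and given a structure at query time.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (full fidelity).
RAW_EVENTS = [
    '{"user": "a1", "device": "iPhone", "action": "buy", "amount": "19.99"}',
    '{"user": "b2", "device": "Android", "action": "view"}',
    '{"user": "c3", "device": "iPhone", "action": "view", "referrer": "search"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` (field name -> converter) at read time.

    Fields missing from a record come back as None, so no record is rejected
    or altered when it is stored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two different "views" over the same stored data, defined only when reading.
purchases_schema = {"user": str, "amount": float}
devices_schema = {"device": str, "action": str}

purchases = list(read_with_schema(RAW_EVENTS, purchases_schema))
devices = list(read_with_schema(RAW_EVENTS, devices_schema))
```

Because the raw lines are never transformed on the way in, a new question only needs a new schema dictionary rather than a new load job, which is the agility the slide is pointing at.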


One Platform, Many Workloads

Process
• Insights: Sqoop, Flume
• Transform: MapReduce, Hive, Pig, Spark

Discover
• Analytic Database: Impala
• Search: Solr

Model
• Machine Learning: SAS, R, Spark, Mahout

Serve
• NoSQL Database: HBase
• Streaming: Spark Streaming

Unlimited Storage: HDFS, HBase

Security and Administration: YARN, Cloudera Manager, Cloudera Navigator

A new kind of data platform
• One place for unlimited data
• Unified, multi-framework data access

Only with Cloudera:
• Leading performance
• Enterprise system and data management
• Fundamentally secure
• Open source, open standards


SAS on Cloudera Enterprise

• Tight Integration with SAS suite of access engines, big data, and in-memory analytics solutions

• Certified Impala connector to SAS VA delivers the fastest interactive SQL on Hadoop

• Comprehensive data security and governance enable SAS users to innovate with confidence


Joint Customer Successes

With SAS® Visual Analytics, business executives at Telecom Italia can compare the performance between all operators for a key indicator – such as accessibility or percentage of dropped calls – on a single screen for a quick overview of pertinent strengths and weaknesses.

Epsilon built a next-generation marketing application, leveraging Cloudera and taking advantage of SAS® capabilities by our data science/analytics team, that provides its clients with a 360-degree view of their customers.

AMERAN provides 360-degree views into energy usage patterns and similar household comparisons to help consumers save energy.

Optimize Discover Empower


We Are Hiring in NZ

https://jobs.jobvite.com/cloudera/