Data Integration Alternatives Paul Moxon, Senior Director, Product Management

Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB


Agenda

1. Three Key Trends Affecting IT
2. The Logical Data Warehouse
3. Data Integration Layer Alternatives
4. The Logical Data Warehouse Revisited

Three Key Trends Affecting IT

Three Key Trends

1. Reduce corporate data silos to gain efficiency and productivity
2. Towards a common data backbone for operational and informational use
3. Enterprises going with bimodal IT in their modernization efforts

Trend I - Consolidation

• Organizational structures create specialized data and application silos
• The proliferation of silos has inhibited access to and the sharing of data across the organization
• Consolidating and opening up these silos (while retaining ownership and control) will promote efficiency and productivity

Trend II – Common Data Backbone

• Access to data via a logical layer for a common and consistent view of data assets
• Example: Customer Data
• All analytics, reports, processes, applications (web, mobile, desktop) should see the same customer data
• Is this a Data Lake?
• In reality there will be more than one data lake (separate or refined)

Trend III – Bimodal IT

• Bimodal IT has two IT ‘flavors’
• Type 1 – focused on stability and efficiency (traditional IT)
• Type 2 – experimental and agile, focused on TTM (time to market) and rapid app evolution. Aligned with business.
• Some have compared this to the ‘SoR’ (Systems of Record) and ‘SoE’ (Systems of Engagement) differentiation
• The two need to live side-by-side and interact
• New apps still need data from ‘SoR’

What Does This Mean?

• A data access layer is needed to ‘open up’ data silos
    But retaining local ownership and control of the data
• The access layer must provide access to all data sources and support different modes of access
    Reporting/analytics, real-time application access (mobile/web and ‘traditional’), etc.
• New technologies will be an important part of the information infrastructure
    Hadoop ecosystem, NoSQL, streaming data, “Data Lakes”
• The traditional IT infrastructure is not going away soon
    ‘Systems of Record’ still needed
• The new and the old need to work together
    Newer systems still need to interact with ‘Systems of Record’

How does this affect the ‘Information Architecture’?

Logical Data Warehouse

Logical Data Warehouse

Definition:

“The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy.”

“The LDW is an evolution and augmentation of DW practices, not a replacement”

“A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”

“The LDW permits an IT organization to make a large number of datasets available … via query tools and applications”

Gartner Hype Cycle for Enterprise Information Management, 2012.

Architecture of the Logical Data Warehouse

[Figure: architecture diagram of the Logical Data Warehouse. Labels recoverable from the slide:
• Data sources: Traditional Enterprise Data (Enterprise Applications), Cloud (Cloud Applications), Big Data (Sensor Data, Machine Data (Logs), Social Data, Clickstream Data, Internet Data, Image and Video, Enterprise Content (Unstructured))
• Data movement: ETL, CDC, Sqoop, (Flume, Kafka, …); Batch; Real-Time Data Access (On-Demand / Streaming)
• Data stores: Data Warehouse (EDW, ODS, In-Memory (SAP Hana, …), Analytical Appliances, Cloud DW (Redshift, …)), NoSQL, Big Data
• Hadoop: HDFS, YARN / Workload Management, Tez, MapRed., Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk (grouped as SQL, DW, Streams, NoSQL, Search)
• Data Integration/Semantic Layer
• Consumers: Real-Time Decision Management, Alerts, Scorecards, Dashboards, Reporting, Data Discovery, Self-Service Search, Predictive Analytics, Statistical Analytics (R), Text Analytics, Data Mining
• Cross-cutting: Metadata Management, Data Governance, Data Security]

Autodesk Data Architecture

[Figure: Autodesk data architecture diagram. The only label recoverable from the slide is "Data Integration/Semantic Layer".]

Data Integration/Semantic Layer Alternatives

Three Integration/Semantic Layer Alternatives

• Application/BI Tool as Data Integration/Semantic Layer
• EDW as Data Integration/Semantic Layer
• Data Virtualization as Data Integration/Semantic Layer

[Figure: one diagram per alternative, each showing the named layer together with EDW and ODS data stores.]

Application/BI Tool as the Data Integration Layer

[Figure: Application/BI Tool as Data Integration/Semantic Layer, shown with EDW and ODS.]

• Integration is delegated to end user tools and applications
• e.g. BI Tools with ‘data blending’
• Results in duplication of effort – integration defined many times in different tools
• Impact of change in data schema?
• End user tools are not intended to be integration middleware
• Not their primary purpose or expertise
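
The duplication and schema-change points above can be made concrete with a small sketch. Everything here is a hypothetical stand-in: two in-memory SQLite databases play the EDW and ODS, and `dashboard_totals` and `report_rows` play two end user tools that each carry their own copy of the same join. It is an illustration under those assumptions, not a model of any particular BI product.

```python
import sqlite3


def make_sources(order_key="cust_id"):
    """Build two in-memory stand-ins for an EDW and an ODS (hypothetical schemas)."""
    edw = sqlite3.connect(":memory:")
    edw.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
    edw.execute("INSERT INTO customers VALUES (1, 'Acme')")

    ods = sqlite3.connect(":memory:")
    # The key column name is a parameter so a schema change can be simulated.
    ods.execute(f"CREATE TABLE orders ({order_key} INTEGER, amount REAL)")
    ods.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0)])
    return edw, ods


def dashboard_totals(edw, ods):
    """Tool 1: blends both sources itself to show a total per customer."""
    names = dict(edw.execute("SELECT cust_id, name FROM customers"))
    totals = {}
    for cust_id, amount in ods.execute("SELECT cust_id, amount FROM orders"):
        name = names[cust_id]
        totals[name] = totals.get(name, 0.0) + amount
    return totals


def report_rows(edw, ods):
    """Tool 2: repeats the same join logic for a detail report."""
    names = dict(edw.execute("SELECT cust_id, name FROM customers"))
    query = "SELECT cust_id, amount FROM orders ORDER BY amount"
    return [(names[cust_id], amount) for cust_id, amount in ods.execute(query)]


def broken_tools(edw, ods):
    """Return the names of the tools that fail against the given sources."""
    broken = []
    for tool in (dashboard_totals, report_rows):
        try:
            tool(edw, ods)
        except sqlite3.OperationalError:
            broken.append(tool.__name__)
    return broken
```

Calling `broken_tools(*make_sources(order_key="customer_id"))` simulates a renamed key column in the ODS: both tools fail, and each has to be fixed separately because each holds its own copy of the integration logic.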

EDW as the Data Integration Layer

[Figure: EDW as Data Integration/Semantic Layer, shown with an ODS.]

• Access to ‘other’ data (query federation) via EDW
• Teradata QueryGrid, IBM FluidQuery, SAP Smart Data Access, etc.
• Often coupled with traditional ETL replication of data into EDW
• EDW ‘center of data universe’
• Provides data integration and semantic layer
• Appears attractive to organizations heavily invested in EDW
• More than one EDW? EDW costs?
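
As a rough illustration of query federation through the warehouse, the sketch below uses SQLite's `ATTACH DATABASE` so that a single "EDW" connection can join its own table with a table held in a second database. This is only an analogy: it does not use or represent the APIs of Teradata QueryGrid, IBM FluidQuery, or SAP Smart Data Access, and the table names and data are hypothetical.

```python
import sqlite3


def build_edw_with_attached_ods():
    """Create an in-memory 'EDW' and attach a second database playing the ODS."""
    # isolation_level=None puts the connection in autocommit mode, so ATTACH
    # is never issued inside an open transaction.
    edw = sqlite3.connect(":memory:", isolation_level=None)
    edw.execute("ATTACH DATABASE ':memory:' AS ods")

    # Table owned by the warehouse itself.
    edw.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
    edw.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )

    # Table that lives in the attached database (the 'other' data).
    edw.execute("CREATE TABLE ods.open_orders (cust_id INTEGER, amount REAL)")
    edw.executemany(
        "INSERT INTO ods.open_orders VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    return edw


def federated_order_totals(edw):
    """Join warehouse data with the attached source in one query."""
    sql = """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN ods.open_orders AS o ON o.cust_id = c.cust_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return edw.execute(sql).fetchall()
```

Consumers only ever talk to the warehouse connection, which is what makes the EDW the ‘center of data universe’ in this pattern; every federated query also runs through, and is paid for on, the warehouse.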

Data Virtualization as the Data Integration Layer

[Figure: Data Virtualization as Data Integration/Semantic Layer, shown with EDW and ODS.]

• Move data integration and semantic layer to independent Data Virtualization platform
• Purpose built for supporting data access across multiple heterogeneous data sources
• Separate layer provides semantic models for underlying data
• Physical to logical mapping
• Enforces common and consistent security and governance policies
• Gartner’s recommended approach
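
The physical-to-logical mapping idea can be sketched in a few lines. The example below is a toy under stated assumptions, not how Denodo or any other data virtualization platform is implemented: `SourceMapping` and `VirtualView` are hypothetical names, and two in-memory SQLite databases with deliberately different schemas stand in for the EDW and ODS.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SourceMapping:
    """Maps one physical table onto the logical column names (hypothetical type)."""
    conn: sqlite3.Connection
    table: str
    columns: dict  # logical column name -> physical column name


class VirtualView:
    """A logical view that reads from several physical sources on demand."""

    def __init__(self, mappings):
        self.mappings = mappings

    def rows(self):
        # Data stays in the sources; each call queries them and renames
        # physical columns to the shared logical names.
        for m in self.mappings:
            select_list = ", ".join(
                f"{physical} AS {logical}" for logical, physical in m.columns.items()
            )
            cursor = m.conn.execute(f"SELECT {select_list} FROM {m.table}")
            names = [col[0] for col in cursor.description]
            for row in cursor:
                yield dict(zip(names, row))


def build_demo_view():
    """Two sources with different physical schemas behind one logical view."""
    edw = sqlite3.connect(":memory:")
    edw.execute("CREATE TABLE dim_customer (cust_key INTEGER, cust_nm TEXT)")
    edw.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")

    ods = sqlite3.connect(":memory:")
    ods.execute("CREATE TABLE crm_contact (id INTEGER, full_name TEXT)")
    ods.execute("INSERT INTO crm_contact VALUES (2, 'Globex')")

    return VirtualView([
        SourceMapping(edw, "dim_customer",
                      {"customer_id": "cust_key", "customer_name": "cust_nm"}),
        SourceMapping(ods, "crm_contact",
                      {"customer_id": "id", "customer_name": "full_name"}),
    ])
```

Every consumer that calls `build_demo_view().rows()` sees the same `customer_id` and `customer_name` fields. If a physical column is renamed, only the mapping in this one layer changes, and this is also the natural single place to apply security and governance rules.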

Logical Data Warehouse Revisited

Architecture of the Logical Data Warehouse

[Figure: the Logical Data Warehouse architecture diagram again, with Data Virtualization now shown as the data integration/semantic layer. Labels recoverable from the slide:
• Data sources: Traditional Enterprise Data (Enterprise Applications), Cloud (Cloud Applications), Big Data (Sensor Data, Machine Data (Logs), Social Data, Clickstream Data, Internet Data, Image and Video, Enterprise Content (Unstructured))
• Data movement: ETL, CDC, Sqoop, (Flume, Kafka, …); Batch
• Data stores: Data Warehouse (EDW, ODS, In-Memory (SAP Hana, …), Analytical Appliances, Cloud DW (Redshift, …)), NoSQL, Big Data
• Hadoop: HDFS, YARN / Workload Management, Tez, MapRed., Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk (grouped as SQL, DW, Streams, NoSQL, Search)
• Data Virtualization: Real-Time Data Access (On-Demand / Streaming), Data Caching, Data Services, Data Search & Discovery, Governance, Security, Optimization, Data Abstraction, Data Transformation, Data Federation
• Consumers: Real-Time Decision Management, Alerts, Scorecards, Dashboards, Reporting, Data Discovery, Self-Service Search, Predictive Analytics, Statistical Analytics (R), Text Analytics, Data Mining]

Autodesk Data Architecture

Summary

1. The 3 trends will change your ‘information architecture’
2. Logical Data Warehouse (LDW) is a key architectural pattern to address many of the challenges of the new information architecture
3. LDW requires a data integration/semantic layer
4. Data Virtualization is the recommended approach for this critical layer

Thanks!

www.denodo.com [email protected]

© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.