
1

MDM Architecture Deep Dive: Implementing Real-time Bi-Directional Synchronization

Ron Matusof, VP MDM Solutions, Informatica

Dmitri Korablev, VP Strategy and Planning, Informatica

2

Agenda

• Introduction

• Understanding Synchronization Requirements

• Real World Use Cases

• Best Practices

• Summary

3

Introduction

4

Who We Are?

Dmitri Korablev
VP, Strategy & Planning
Almost 20 years designing, architecting and building enterprise solutions
Favorite quote: “Simple things should be simple and complex things should be possible.” (Alan Kay)

Ron Matusof
VP, Solution Architecture
Almost 30 years architecting and integrating complex systems
Favorite quote: “Nothing is as simple as it seems.” (Corollary to Murphy’s Law)

5

Why is Synchronization an issue?

It may be complex, but it should be possible.

…and it is not as simple as it seems.

Theme for Today’s Presentation

“Make everything as simple as possible, but not simpler.”

Albert Einstein

6

Synchronization Design Tradeoffs

[Diagram: three competing dimensions: Consistency, Performance, Quality. Related terms shown on the diagram: Coherence, Correlation, Timeliness, Throughput, Latency, Accuracy, Fidelity.]

7

Complex MDM Scenario

[Diagram labels: Data Warehouse, Legacy Systems, Data Marts, Data Steward, Business Users, Call Center, Legacy CIF, Policy DB, B2B Transforms, Federated Query, Bi-Di Synchronization, Read/Update Hub, GSDN Integration. Numbered callouts: 1, 2, 3, 4, 5.]

8

Case 1: Hub as System of Record

[Diagram labels: Data Warehouse, Data Marts, Data Steward, Business Users, Call Center, Legacy CIF, Policy DB. Callout: 1.]

9

Case 1: Hub as System of Record

Questions you should ask
• Consistency
   • Do I Synch or do I Access in the Hub?
• Performance
   • Is Real-time Always Better?
• Quality
   • Do I want a Batch Window?

Solutions
• Look at Consumption Scenarios
   • Synthesize appropriate keys for downstream use
• Leverage Existing Interfaces and Transport
   • Don’t reinvent the infrastructure
• Understand the Full Data Supply Chain
   • Propagation, Syndication, or Synchronization?
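One way to read "synthesize appropriate keys for downstream use" is a cross-reference that maps each source system's own key to a single hub-issued key. The sketch below is only an illustration under that assumption: `KeyRegistry`, the `HUB-` key format, and the sample source keys are hypothetical and do not come from the slides or from any product API.

```python
import itertools


class KeyRegistry:
    """Maps (source system, source key) pairs to one hub-issued key."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._xref: dict[tuple[str, str], str] = {}

    def hub_key_for(self, system: str, source_key: str) -> str:
        """Return the hub key for a source record, issuing one if needed."""
        pair = (system, source_key)
        if pair not in self._xref:
            self._xref[pair] = f"HUB-{next(self._counter):06d}"
        return self._xref[pair]

    def link(self, system: str, source_key: str, hub_key: str) -> None:
        """Record that a source record refers to an existing master record."""
        self._xref[(system, source_key)] = hub_key


registry = KeyRegistry()
acme = registry.hub_key_for("Legacy CIF", "C-1001")   # hypothetical source key
registry.link("Policy DB", "P-77", acme)              # same party, other system
print(acme, registry.hub_key_for("Policy DB", "P-77"))
```

Downstream consumers such as a data warehouse can then join on the hub key without knowing each source system's key scheme.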

10

Synchronizing Multiple Systems

Classical approach focuses on data latency

No emphasis on data correlation

Limited metrics for “what is good enough”

The slide shows three identical copies of the following table:

Party Name                 Home Office
ACME Products              Taos
Spacely Space Sprockets    Orbit City
Dunder Mifflin             Scranton

11

Synchronizing Multiple Systems

Latency causes discrepancy in targets

At t = 1, System 1 is synchronized with Hub

At t = 3, System 2 is synchronized with Hub

Between t = 1 and t = 3, no downstream correlation

Home Office values in the three copies of the table shown on the slide (the copies are unlabeled in the extracted text):

Party Name                 Copy 1        Copy 2        Copy 3
ACME Products              Flagstaff     Flagstaff     Taos
Spacely Space Sprockets    Orbit City    Orbit City    Orbit City
Dunder Mifflin             Scranton      Scranton      Scranton
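The timeline on this slide can be made concrete with a small sketch. It assumes the hub value for ACME Products changes from Taos to Flagstaff before t = 1 (the slide does not say exactly when), and the names `sync_times`, `value_at`, and `correlated_at` are invented for illustration. The point is only that the two targets disagree between the first and the last synchronization.

```python
# Hypothetical illustration: names and the timing of the hub change are assumptions.
HUB_OLD, HUB_NEW = "Taos", "Flagstaff"

# Time at which each downstream system picks up the hub's new value.
sync_times = {"System 1": 1, "System 2": 3}


def value_at(system: str, t: int) -> str:
    """Value a downstream system holds at time t."""
    return HUB_NEW if t >= sync_times[system] else HUB_OLD


def correlated_at(t: int) -> bool:
    """True when all downstream systems hold the same value at time t."""
    return len({value_at(s, t) for s in sync_times}) == 1


def uncorrelated_window() -> tuple[int, int]:
    """Half-open interval [start, end) during which the targets disagree."""
    return min(sync_times.values()), max(sync_times.values())


for t in range(5):
    print(t, {s: value_at(s, t) for s in sync_times}, correlated_at(t))
```

Shrinking the latency of one target alone does not shrink this window; only the gap between the earliest and latest synchronization matters for downstream correlation.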

12

Case 2: R/T Synch with an application

[Diagram labels: Legacy Systems, Data Steward, Business Users, Legacy CIF, Policy DB, Federated Query, Bi-Di Synchronization. Callout: 2.]

13

Case 2: R/T Synch with an application

Questions you should ask
• Consistency
   • How do I correlate with other systems?
• Performance
   • What do I do about feedback loops?
• Quality
   • Can I catch DQ Issues at source?

Solutions
• Develop Appropriate Logical Data Models
   • Consider handling changes through abstraction & insulation
• Understand both direct and indirect feedback loops
   • Use Delta Detection and/or CDC for loop suppression
• Integrate with the application business workflow
   • Event Driven vs. Process Driven (or both)
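Delta detection for loop suppression can be sketched as follows: before propagating an incoming update, compare it with the last payload seen for that record and drop it if nothing changed, so an update echoed back by the other system does not travel around the loop again. This is a minimal sketch, not a product feature; `DeltaDetector` and its methods are hypothetical names, and hashing a canonical JSON form is just one possible way to compare payloads.

```python
import hashlib
import json


class DeltaDetector:
    """Remembers a fingerprint of the last payload seen per record key."""

    def __init__(self) -> None:
        self._last_seen: dict[str, str] = {}

    @staticmethod
    def _fingerprint(payload: dict) -> str:
        # Sort keys so that field order does not change the fingerprint.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_real_change(self, key: str, payload: dict) -> bool:
        """Return True only if the payload differs from the last one seen."""
        digest = self._fingerprint(payload)
        if self._last_seen.get(key) == digest:
            return False  # Echo of an update already processed: suppress it.
        self._last_seen[key] = digest
        return True


detector = DeltaDetector()
update = {"party_name": "ACME Products", "home_office": "Flagstaff"}
first = detector.is_real_change("ACME", update)       # new value: propagate
echo = detector.is_real_change("ACME", dict(update))  # same value echoed back
moved = detector.is_real_change("ACME", {**update, "home_office": "El Paso"})
print(first, echo, moved)
```

A content comparison like this also covers indirect loops, because it does not matter which system the echo came back through.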

14

Synchronizing Multiple Systems

Bi-Directional Synch causes race conditions

Need to characterize the latency, throughput and correlation impacts and tolerances.

Home Office values in the three copies of the table shown on the slide (the copies are unlabeled in the extracted text):

Party Name                 Copy 1        Copy 2        Copy 3
ACME Products              Flagstaff     El Paso       Taos
Spacely Space Sprockets    Orbit City    Orbit City    Orbit City
Dunder Mifflin             Scranton      Scranton      Scranton

[Diagram label, shown twice: informatica data services]
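The race on this slide (one copy says Flagstaff, another El Paso, a third still Taos) arises when two systems update the same record from the same starting point. The slides do not prescribe a fix; the sketch below shows one common technique, an optimistic version check, in which each update states which version it was based on and the hub rejects stale ones. `MasterRecord`, `apply_update`, and `StaleUpdate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MasterRecord:
    party_name: str
    home_office: str
    version: int = 0


class StaleUpdate(Exception):
    """Raised when an update was based on an out-of-date version."""


def apply_update(record: MasterRecord, new_office: str, based_on: int) -> None:
    """Apply the update only if it was based on the record's current version."""
    if based_on != record.version:
        raise StaleUpdate(f"based on v{based_on}, hub is at v{record.version}")
    record.home_office = new_office
    record.version += 1


hub = MasterRecord("ACME Products", "Taos")
apply_update(hub, "Flagstaff", based_on=0)      # first writer wins
try:
    apply_update(hub, "El Paso", based_on=0)    # second writer read the same v0
    conflict = False
except StaleUpdate:
    conflict = True                             # route to a steward or retry
print(hub, conflict)
```

Detecting the conflict is only half of the design; the tolerance for how long the losing system may stay out of step still has to be characterized, as the slide notes.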

15

Case 3: Synch On-Premise with Cloud

[Diagram labels: Data Steward, Business Users, Legacy CIF, Policy DB. Callout: 3.]

16

Case 4: Global Data Synchronization

Questions you should ask
• Consistency
   • How do I transform the data?
• Performance
   • How do I handle simultaneous updates?
• Quality
   • How do I handle End-To-End Lineage?

Solutions
• Use composite services to handle transformations
   • Use Hub Data Objects with the B2B transform as a facade.
• Put complexity in the transform & not the data model
   • Watch for the costs of serialization/deserialization and IO
• Integrate with the application business workflow
   • Event Driven vs. Process Driven (or both)
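The idea of keeping complexity in the transform rather than in the data model can be illustrated with a small facade function. The partner message layout, the `HubParty` fields, and `from_partner_message` are all hypothetical; the sketch only shows the hub-side object staying simple while the transform absorbs the quirks of the external format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubParty:
    """Deliberately simple hub-side object (hypothetical fields)."""
    party_name: str
    home_office: str


def from_partner_message(message: dict) -> HubParty:
    """Facade: absorb the partner format's quirks before data reaches the hub."""
    name = message["org"]["legalName"].strip()
    city = message["org"]["addresses"][0]["city"].strip().title()
    return HubParty(party_name=name, home_office=city)


# Hypothetical partner payload with nesting, padding, and upper-case city.
partner_message = {
    "org": {"legalName": " ACME Products ", "addresses": [{"city": "TAOS"}]}
}
party = from_partner_message(partner_message)
print(party)
```

Each such transform parses and rebuilds the message, which is where the serialization, deserialization, and IO costs mentioned on the slide accumulate.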

17

Case 5: Global Distribution of Hubs

[Diagram labels: Data Steward, Business Users, Legacy CIF, Policy DB. Callout: 5.]

18

Case 5: Global Distribution of Hubs

Questions you should ask
• Consistency
   • How do I handle one hub being off-line?
• Performance
   • How do I handle simultaneous updates?
• Quality
   • How do I govern data across the Hubs?

Solutions
• Develop connection agnostic processes
   • Implement Queued Updates
• Design appropriate Replication and Failover strategy
   • Use embedded or external CDC to generate change lists
• Develop integrated workflow across the Hubs
   • Event Driven vs. Process Driven (or both)
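Queued updates are one answer to the question of a hub being off-line: the sender always enqueues, and delivery happens whenever the remote hub is reachable. The sketch below is an in-memory illustration only; `HubLink` is a hypothetical name, the `delivered` list stands in for the remote hub, and a real deployment would need a durable queue.

```python
from collections import deque


class HubLink:
    """Queues updates for a remote hub and delivers them once it is reachable."""

    def __init__(self) -> None:
        self.online = True
        self.delivered: list[dict] = []       # Stands in for the remote hub.
        self._pending: deque[dict] = deque()

    def send(self, update: dict) -> None:
        """Always enqueue first, so the caller never depends on connectivity."""
        self._pending.append(update)
        self.flush()

    def flush(self) -> None:
        """Deliver queued updates in arrival order while the hub is online."""
        while self.online and self._pending:
            self.delivered.append(self._pending.popleft())

    @property
    def backlog(self) -> int:
        return len(self._pending)


link = HubLink()
link.online = False                            # remote hub goes off-line
link.send({"party": "ACME Products", "home_office": "Flagstaff"})
link.send({"party": "Dunder Mifflin", "home_office": "Scranton"})
backlog_while_offline = link.backlog
link.online = True                             # hub comes back
link.flush()
print(backlog_while_offline, link.backlog, link.delivered)
```

Because `send` behaves the same whether or not the hub is reachable, the calling process stays connection agnostic.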

19

Complex MDM Scenario

[Diagram labels: Data Warehouse, Legacy Systems, Data Marts, Data Steward, Business Users, Call Center, Legacy CIF, Policy DB, B2B Transforms, Federated Query, Bi-Di Synchronization, Read/Update Hub, GSDN Integration. Numbered callouts: 1, 2, 3, 4, 5.]

20

Best Practices

21

Application Integration

• Design for Application Data Consumption Pattern(s)
   • Analyze and understand the business use of the data
   • Socialize the application use of Master Data
• Model for the Business Usage of the Master Data
   • Consider using the application data model as a starting point
   • Reconcile with the data models for contributing sources
   • Optimize the model for application use cases
• Architect for application security/performance
   • Review regulatory constraints for distributing and sharing data
   • Develop methodology for creating test data and testing the app
   • Architect to achieve application performance requirements

22

Bi-Directional Synchronization

• Optimize the Synchronization Architecture
   • Determine Requirements Based on Consumption Patterns
   • Evaluate the Cost/Benefit of overachieving on the SLA
   • Architect solution to meet the optimal cost/benefit
• Analyze Round trips and Feedback Loops
   • Consider loops that pass through more than two systems
   • Use CDC/Delta detection to suppress loops
• Create Transactionality/Compensation Strategies
   • Architect solutions for handling cross platform transactions
   • Consider how to compensate for discrepancies
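A compensation strategy for cross platform updates can be sketched as a list of steps, each paired with an undo action: if a later step fails, the completed steps are undone in reverse order. This is a simplified illustration, not a full transaction manager; `run_with_compensation` and the `hub` and `crm` dictionaries are hypothetical stand-ins for real systems.

```python
from typing import Callable

Step = tuple[str, Callable[[], None], Callable[[], None]]  # (name, do, undo)


def run_with_compensation(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    done: list[Step] = []
    for step in steps:
        _, do, _ = step
        try:
            do()
        except Exception:
            for _, _, undo in reversed(done):
                undo()
            return False
        done.append(step)
    return True


# Hypothetical stand-ins for two platforms holding the same attribute.
hub = {"ACME Products": "Taos"}
crm = {"ACME Products": "Taos"}


def update_hub() -> None:
    hub["ACME Products"] = "Flagstaff"


def revert_hub() -> None:
    hub["ACME Products"] = "Taos"


def update_crm() -> None:
    raise ConnectionError("CRM unavailable")   # simulated failure


def revert_crm() -> None:
    crm["ACME Products"] = "Taos"


ok = run_with_compensation(
    [("hub", update_hub, revert_hub), ("crm", update_crm, revert_crm)]
)
print(ok, hub, crm)
```

The sketch assumes every undo action succeeds; in practice a failed compensation is itself a discrepancy that needs a reporting or stewardship path.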

23

Global Distribution

• Architect to a Common Core Data Model
   • Ensure that synchronized attributes are formatted identically
   • Account for local differences only in non-synchronized fields
• Account for Data Differences in 3rd party software
   • Date/time
   • Currency
   • Unique processing approaches
• Make adding additional countries/regions easy
   • Ensure the core data model considers future requirements
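Formatting synchronized attributes identically usually means normalizing regional representations before they enter the core model. The sketch below shows this for the two differences the slide names, date/time and currency. The function names and the choice of UTC ISO 8601 strings and exact decimals with an explicit currency code are assumptions made for illustration, not requirements stated in the slides.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def to_core_timestamp(local: datetime) -> str:
    """Normalize a timezone-aware datetime to a UTC ISO 8601 string."""
    if local.tzinfo is None:
        raise ValueError("timestamp must carry a timezone")
    return local.astimezone(timezone.utc).isoformat()


def to_core_amount(amount: str, currency: str) -> tuple[Decimal, str]:
    """Keep the amount exact and carry an explicit upper-case currency code."""
    return Decimal(amount), currency.strip().upper()


utc_plus_1 = timezone(timedelta(hours=1))       # hypothetical regional offset
ts = to_core_timestamp(datetime(2011, 6, 1, 9, 30, tzinfo=utc_plus_1))
amt = to_core_amount("1234.50", " eur ")
print(ts, amt)
```

Local display formats can then live in non-synchronized fields or in the presentation layer, so adding a region does not change the synchronized core.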

24

Summary

Do not over-engineer the solution.
• Architect the Synchronization Design to meet the SLA.

Design performance for the whole solution, not the individual parts.
• Synchronization has the potential for introducing additional bottlenecks.

Design synchronization solution to maintain appropriate correlation between target systems.
• Perceived synchronization issues are typically related to correlation between downstream systems.

25

Marketplace Overview: A Trusted, Open Ecosystem

• Virtual Marketplace for Data Integration Apps

• Solutions across all technology areas – DI, DQ, MDM, Cloud, etc.

• Open Ecosystem – Apps from Partners, ISVs, Consultants, and Developers

• Seal of Approval ensures App quality

• More than 600 Apps, over 200 Free!

• 15k visits per month, 2k downloads

http://marketplace.informatica.com


26

Questions?