Roman Gruhn Director, Information Strategy (EMEA) [email protected] A Modern Enterprise Architecture

Big Data Paris - A Modern Enterprise Architecture



The World Of Data Management Has Changed

Digital Platforms Have Changed

The platforms your end users and customers use to engage with your applications and services have fundamentally changed at an unprecedented speed over the past 5 years.

Business: UPFRONT → SUBSCRIBE
Applications: YEARS / MONTHS → WEEKS / DAYS
Customers: PC → MOBILE / BYOD
Engagement: ADS → SOCIAL
Infrastructure: SERVERS → CLOUD

Goals of Digital Transformation

1.  Unlocking operational intelligence

2.  Enhancing business agility

3.  Improving customer-centricity

Source https://451research.com/report-short?entityId=90066 http://www.slideshare.net/JakeHird/101-digital-transformation-statistics-2016

[Bar chart, axis 0% to 90%. Categories: Boosting bottom line in 5 years; Competing in new segment in 3 years; Disadvantaged by lack of transformation; Actively digitizing business]

Challenges of Digital Transformation

Existing Systems Overwhelmed

Growth in Siloed Data

Lack Real-Time Insight

Data Warehouse Challenges

“Of Gartner's "3Vs" of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.”*

Data Variety: Diverse, streaming or new data types

Data Volume: Greater than 100TB

Other Data: Less than 100TB

* From Big Data Executive Summary of 50+ execs from F100, gov orgs; 2014

The New Enterprise Stack

TRADITIONAL → MODERNIZED
APPS: On-Premise, Monoliths → SaaS, Microservices
DATABASE: Relational (Oracle) → Non-Relational (MongoDB)
EDW: Teradata, Oracle, etc. → Hadoop
COMPUTE: Scale-Up Server → Containers / Commodity Server / Cloud
STORAGE: SAN → Local Storage & Data Lakes
NETWORK: Routers and Switches → Software-Defined Networks

Data as a Cross-Enterprise Asset

1.  Re-use data to power multiple apps

2.  Enrich, analyze & monetize the data

3.  Enforce privacy and governance

Data Pipeline: Ingest & Store → Query & Transform → Aggregate & Share → Analyze

Architecture Patterns

3 Patterns to Turn Data into a Cross-Enterprise Asset

Single View

Data-as-a-Service

Operationalized Data Lake

Single View

•  Efficiently retrieve status of any business entity in real time

•  Foundation for analytics: e.g. cross-sell, upsell, churn risk

•  REQUIREMENTS:

– Flexible schema + data governance

– Rich query, aggregation, search & reporting

– Highly scalable & continuously available

Why Not Stick with Relational?

Solution: Aggregate with a Dynamic Schema

Sources: Mobile App, Web, Call Centre, CRM, Social Feed, …

COMMON FIELDS: CustomerID | ActivityID | Type …

DYNAMIC FIELDS: Can vary from record to record
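The split between common and dynamic fields can be sketched in plain Python. This is a minimal illustration rather than MongoDB code: the `to_activity_document` helper, the `dynamic` sub-document, and the sample records are hypothetical. Only the three common field names come from the slide.

```python
from typing import Any

# Common fields every activity document carries (named on the slide).
COMMON_FIELDS = ("CustomerID", "ActivityID", "Type")


def to_activity_document(record: dict[str, Any]) -> dict[str, Any]:
    """Split a source record into common fields plus free-form dynamic fields."""
    missing = [f for f in COMMON_FIELDS if f not in record]
    if missing:
        raise ValueError(f"record lacks common fields: {missing}")
    doc = {f: record[f] for f in COMMON_FIELDS}
    # Everything else is kept as-is; its shape may differ per source system.
    doc["dynamic"] = {k: v for k, v in record.items() if k not in COMMON_FIELDS}
    return doc


# Hypothetical records from two different source systems.
web_event = {"CustomerID": 42, "ActivityID": "a1", "Type": "web",
             "page": "/quotes", "browser": "Firefox"}
call_event = {"CustomerID": 42, "ActivityID": "a2", "Type": "call",
              "duration_sec": 310, "agent": "R. Doe"}

single_view = [to_activity_document(r) for r in (web_event, call_event)]
```

Both documents can be stored side by side and queried on the common fields, while each keeps whatever extra attributes its source system provides.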

Single View: High Level Data Flow

•  Sources: Web App, CRM App, Mainframe System
•  Loading: Batch or real-time; Update Queue; Validation
•  Storage: Documents/Objects
•  Consumers (Real-Time Access): Customer Service App, Churn Analytics, Risk Model
•  Operations: Group, Filter, Sort, Count, Average, Deviations, …
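The operations named in this data flow (group, filter, sort, count, average, deviations) can be illustrated with the Python standard library. This is a sketch under stated assumptions: the documents, the `amount` field, and the `summarize` helper are hypothetical, and plain Python stands in for whatever aggregation facility the database provides.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical activity documents in the single view.
activities = [
    {"CustomerID": 1, "Type": "claim", "amount": 100.0},
    {"CustomerID": 1, "Type": "claim", "amount": 300.0},
    {"CustomerID": 2, "Type": "claim", "amount": 50.0},
    {"CustomerID": 2, "Type": "call", "amount": 0.0},
]


def summarize(docs, activity_type):
    """Filter by type, group by customer, then count/average/deviation, sorted."""
    groups = defaultdict(list)
    for doc in docs:
        if doc["Type"] == activity_type:                     # filter
            groups[doc["CustomerID"]].append(doc["amount"])  # group
    rows = [
        {"CustomerID": cid, "count": len(vals),
         "average": mean(vals), "deviation": pstdev(vals)}
        for cid, vals in groups.items()
    ]
    # Sort by average amount, largest first.
    return sorted(rows, key=lambda r: r["average"], reverse=True)


report = summarize(activities, "claim")
```

A churn or risk model would typically consume rows like these rather than the raw activity documents.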

Single View of Customer
Insurance leader generates coveted single view of customers in 90 days – “The Wall”

Problem
•  No single view of customer, leading to poor customer experience and churn
•  145 years of policy data, 70+ systems, 24 800 numbers, 15+ front-end apps that are not integrated
•  Spent 2 years, $25M trying to build single view with Oracle – failed

Solution
•  Built “The Wall,” pulling in disparate data and serving single view to customer service reps in real time
•  Flexible data model to aggregate disparate data into single data store
•  Expressive query language and secondary indexes to serve any field in real time

Results
•  Prototyped in 2 weeks
•  Deployed to production in 90 days
•  Decreased churn and improved ability to upsell/cross-sell

Data-as-a-Service: Drivers

1  Development agility

2  Data re-use

3  Operational efficiency

4  Corporate governance

5  Cost accountability

DaaS Architecture

App1, App2, App3 → API Access Layer → Operational Data (Customers, Products, Accounts, Transactions) → Infrastructure

•  Shared, multi-tenant database accessible via a common API

•  Exposes CRUD, search, geospatial, graph, analytics

•  Each data domain isolated into its own collection

•  Access privileges and views defined for each collection

•  Self-service provisioning, scaling on-demand
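The idea of a common API in front of isolated, access-controlled collections can be sketched as follows. This is a toy in-memory model, not a real database API: the `DataService` class, its method names, and the privilege names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataService:
    """Toy shared data service: one collection per data domain."""
    collections: dict = field(default_factory=dict)
    # privileges[app][collection] -> set of allowed operations
    privileges: dict = field(default_factory=dict)

    def grant(self, app, collection, *ops):
        """Allow an app to run the given operations on one collection."""
        self.privileges.setdefault(app, {}).setdefault(collection, set()).update(ops)

    def _check(self, app, collection, op):
        if op not in self.privileges.get(app, {}).get(collection, set()):
            raise PermissionError(f"{app} may not {op} {collection}")

    def create(self, app, collection, doc):
        self._check(app, collection, "create")
        self.collections.setdefault(collection, []).append(doc)

    def read(self, app, collection, **criteria):
        """Return documents whose fields equal all given criteria."""
        self._check(app, collection, "read")
        return [d for d in self.collections.get(collection, [])
                if all(d.get(k) == v for k, v in criteria.items())]


service = DataService()
service.grant("App1", "customers", "create", "read")
service.grant("App2", "customers", "read")

service.create("App1", "customers", {"id": 1, "name": "Ada"})
found = service.read("App2", "customers", id=1)
```

Here App2 can read customer data written by App1 but cannot write to it, and neither app can touch a collection it has not been granted.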

Square Enix: DaaS

•  Multi-tenant OnLine Suite

•  DaaS to studios & developers, exposed as an API

•  On-Prem Private Cloud: Manages data shared by all titles
– Player profiles
– Credits
– Leaderboards
– Competitions
– Catalog
– Cross-platform messaging

•  In-App functionality provisioned to private clusters on AWS
– Game state
– Player metrics
– Game-specific content & features

•  Elastically scalable

API Access Layer → MongoDB Shared Data Service → On-Prem Infrastructure (Private Cloud)

Data Lake

•  Centralized repository for analytics against data collected from operational systems

•  Extension of EDW: often based on Hadoop

•  50% of organizations invested in data lakes*

* Gartner

http://www.infoworld.com/article/2980316/big-data/why-your-big-data-strategy-is-a-bust.html

“Thru 2018, 70 percent of Hadoop deployments will not meet cost savings and revenue generation objectives due to skills and integration challenges.”
Nick Heudecker, Research Director, Data Management & Integration

Design Pattern: Operationalized Data Lake

•  Data sources: Sensors, User Data, Clickstreams, Logs
•  Ingest: Message Queue
•  Data: Raw Data, Processed Events
•  Real-Time Access: Millisecond latency. Expressive querying & flexible indexing against subsets of data. Updates in place. In-database aggregations & transformations
•  Batch Processing, Batch Views (Distributed Processing Frameworks): Multi-minute latency with scans across TB/PB of data. No indexes. Data stored in 128MB blocks. Write-once-read-many & append-only storage model
•  Operational apps: Customer Data Mgmt, Mobile App, IoT App, Live Dashboards
•  Analytics: Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics
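The contrast between indexed access to a subset of data and a scan across everything can be shown with a toy example. This is only an illustration of the access pattern, not a model of how any database or distributed file system is implemented. The records and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical event records.
events = [
    {"id": 1, "customer": "c1", "kind": "click"},
    {"id": 2, "customer": "c2", "kind": "click"},
    {"id": 3, "customer": "c1", "kind": "purchase"},
]


def scan(records, customer):
    """Batch-style access: read every record to find the matches."""
    return [r for r in records if r["customer"] == customer]


def build_index(records, key):
    """Operational-style access: maintain a secondary index on one field."""
    index = defaultdict(list)
    for r in records:
        index[r[key]].append(r)
    return index


by_customer = build_index(events, "customer")

# Same answer, but the indexed lookup touches only the matching subset.
assert scan(events, "c1") == by_customer["c1"]

# Update in place: the index holds references to the same dict objects.
by_customer["c2"][0]["kind"] = "purchase"
```

The scan cost grows with the total data size, while the indexed lookup cost depends on the number of matching records, which is the difference the two latency notes describe.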


•  Configure where to land incoming data


•  Raw data processed to generate analytics models


•  MongoDB exposes analytics models to operational apps. Handles real-time updates


•  Compute new models against MongoDB & HDFS
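The flow this pattern describes (land incoming data, process raw data into models, expose the models to operational apps) can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: the `LANDING` table, the event types, and the `activity_score` field are invented for illustration, and a deque, a dict and a list stand in for the message queue, the operational database and the data lake.

```python
from collections import Counter, deque

# In-memory stand-ins for the real components (all hypothetical).
queue = deque()        # message queue
operational_db = {}    # operational database: customer id -> profile document
data_lake = []         # append-only raw event storage

# Configure where incoming data lands, per event type.
LANDING = {
    "page_view": ("lake",),
    "offer_accepted": ("lake", "operational"),
}


def route(event):
    """Land one event in the stores configured for its type."""
    targets = LANDING.get(event["type"], ("lake",))
    if "lake" in targets:
        data_lake.append(event)  # write once, append only
    if "operational" in targets:
        profile = operational_db.setdefault(event["customer"], {"events": []})
        profile["events"].append(event["type"])  # update in place


def batch_job():
    """Compute a simple model over the raw data and write it back."""
    counts = Counter(e["customer"] for e in data_lake)
    for customer, n in counts.items():
        profile = operational_db.setdefault(customer, {"events": []})
        profile["activity_score"] = n  # now readable by operational apps


queue.extend([
    {"customer": "c1", "type": "page_view"},
    {"customer": "c1", "type": "offer_accepted"},
    {"customer": "c2", "type": "page_view"},
])
while queue:
    route(queue.popleft())
batch_job()
```

After the batch job runs, an operational app can read a customer's profile, including the computed score, with a single key lookup instead of rescanning the raw events.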

Operational Database Requirements

1  “Smart” integration with the data lake

2  Powerful real-time analytics

3  Flexible, governed data model

4  Scale with the data lake

5  Sophisticated management & security

UK’s Leading Price Comparison Site
Out-pacing Internet search giants with continuous delivery pipeline powered by microservices & Docker running MongoDB, Kafka and Hadoop in the cloud

Problem
•  Existing EDW with nightly batch loads
•  No real-time analytics to personalize user experience
•  Application changes broke ETL pipeline
•  Unable to scale as services expanded

Solution
•  Microservices architecture running on AWS
•  All application events written to Kafka queue, routed to MongoDB and Hadoop
•  Events that personalize real-time experience (e.g. triggering email send, additional questions, offers) written to MongoDB
•  All event data aggregated with other data sources and analyzed in Hadoop, updated customer profiles written back to MongoDB

Results
•  2x faster delivery of new services after migrating to new architecture
•  Enabled continuous delivery: pushing new features every day
•  Personalized user experience, plus higher uptime and scalability

Patterns for Modern Data Architectures

Challenges: Existing Systems Overwhelmed; Growth in Siloed Data; Lack Real-Time Insight

Patterns: Single View; Data-as-a-Service; Operationalized Data Lake