Getting Started: Modeling the Structure and Operations of Big Data

Session BG2, February 11, 2019

Deepesh Chandra, Associate Partner & Senior Expert

Pierre-Arnaud Klaskala, Associate Partner, Director of Product & Technology


Conflict of Interest

Deepesh Chandra, Associate Partner & Senior Expert, and Pierre-Arnaud Klaskala, Associate Partner, Director of Product & Technology, have no real or apparent conflicts of interest to report.


Learning Objectives

Provide a technical overview of big data analytics

• Describe big data storage, frameworks, and other critical aspects of usable healthcare data structures

• Explore uses of healthcare structured/unstructured data and metadata

• Discuss transforming legacy data into trusted and actionable data structures

• Assess data analytics, data visualization, and business intelligence and their roles in big data


Contents

Introduction

Big data components

Building trusted and usable data structures

Analytics and visualization in big data

Key learnings


The opportunity represented by advanced analytics and digital in healthcare, and the urgency to act

The Challenge¹

▪ $3.0T: spent on healthcare in the US in 2015, more than 18% of GDP

▪ 1.9%: US healthcare spending grows 1.9 percentage points faster than GDP (OECD historical rate)

▪ 0.5%: annual growth in US healthcare labor productivity over the same period

The Current State²

Despite massive investment in IT, the industry still lags in the maturity of its advanced analytics (AA) and digital capabilities:

▪ 12th out of 13 industries in the McKinsey Advanced Analytics maturity index

▪ 8th out of 9 industries in the McKinsey Digitization maturity index

▪ 11th out of 13 industries in terms of readiness to adopt and employ AI

The Opportunity³

▪ 20%: in 2017, 20% of all local VC investment in San Francisco went into the AI, Big Data & Analytics sub-sector

SOURCE: ¹ OECD, Policy Implications of the New Economy, 2000-50 (2001); Global Insight WMM2000-37; Espicom, World Pharmaceutical Fact Book 2008; International Monetary Fund, World Economic Outlook Database, October 2009; McKinsey. ² McKinsey Global Institute, AI: The Next Digital Frontier; The Age of Analytics: Competing in a Data-Driven World. ³ Fuel by McKinsey


Effective healthcare advanced analytics and digital transformations require work across the entire analytics workflow

Data ecosystem → Modeling insights → Workflow integration → Adoption → Source of value

Analytics-to-insights | Insights-to-impact

Technology and infrastructure | Organization and governance

SOURCE: McKinsey Analytics; McKinsey Global Institute analysis


Big data, advanced analytics, and digital need to be combined to capture business opportunities

▪ Big data: ingest, manage, integrate, and analyze large and complex data

▪ Advanced analytics: enable more sophisticated predictive and prescriptive analytics, and work against large, incomplete, or unstructured data

▪ Digital: application of modern (digital) technologies to core business processes


Healthcare data spans the spectrum of data complexity

Spectrum: unstructured → semi-structured → structured

Examples: sensors and fitness trackers; social media; healthcare claims; audio recordings; email, PDF, PPTX, DOCX; EDI communications; medical images; scheduling data; clinical notes

80% of all data is unstructured¹, and it is growing at a CAGR of 36%²

¹ Source: International Data Corporation, EMC Corporation, Harmony Healthcare IT

² Source: International Data Corporation


New opportunities create requirements that traditional data stacks cannot meet

A master blueprint for a data architecture transformation …

▪ … enabling new business insights

▪ … improving business transparency

▪ … lowering the cost of IT and operations

▪ … increasing business agility

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies


Big data components


What is a data lake?

Data lake

▪ Persists all raw source data in a common place (including history)

▪ Provides data storage and processing at extremely low cost

▪ Easily connects with data discovery tools to explore data

▪ Allows users to search and integrate data without knowing the exact schema of the data

▪ Stores relational data as well as media, emails, PDFs, and more (unstructured)

A data lake is NOT a data warehouse

▪ No facility to generate reports

▪ No harmonization or integration of data

▪ Data may be wrong or inaccurate

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies
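The ability to search and integrate data without knowing its exact schema is often called schema-on-read: raw records are persisted as they arrive, and a schema is derived only when the data is queried. The minimal sketch below illustrates the idea on hypothetical newline-delimited JSON records held in memory; the record contents and the function names `read_raw`, `infer_schema`, and `search` are assumptions for illustration, and a real lake would sit on a distributed file system or object store with an engine such as Spark rather than plain Python.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Set

# Hypothetical raw records landed as-is (newline-delimited JSON); no schema enforced on write.
RAW_LINES = [
    '{"patient_id": 1, "event": "admission", "ward": "cardiology"}',
    '{"patient_id": 2, "event": "lab_result", "test": "HbA1c", "value": 6.1}',
    '{"patient_id": 1, "event": "discharge"}',
]

def read_raw(lines: List[str]) -> List[Dict[str, Any]]:
    """Parse each raw line into a dict; records may carry different fields."""
    return [json.loads(line) for line in lines]

def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Schema-on-read: map every field seen to the set of value types observed for it."""
    schema: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        for name, value in rec.items():
            schema[name].add(type(value).__name__)
    return dict(schema)

def search(records: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Filter on any fields; records lacking a requested field simply do not match."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

records = read_raw(RAW_LINES)
print(sorted(infer_schema(records)))       # ['event', 'patient_id', 'test', 'value', 'ward']
print(len(search(records, patient_id=1)))  # 2
```

The schema here is a by-product of reading rather than a precondition for writing, which is the contrast with a data warehouse drawn on the slide.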


The data lake is the first step of the analytics journey and the center of the big data stack

Data Lake → Analytics Garage → Rapid Prototyping → App Factory, with workflow automation

▪ Data Lake (transfer/clean up/expand): collection of a comprehensive and valid data set, covering data transfer, workplace, backups, and external data sources

▪ Analytics Garage (analyze/test/optimize): a variety of tools for analyzing the data

▪ Rapid Prototyping (visualize/test/improve): fast development of a prototype based on convincing ideas

▪ App Factory (develop/automate/operate): development of successful prototypes into solutions

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies


Landing zone, data lake, and analytics environment constitute the central elements of the data lake architecture

Data flow: Data sources → Landing zone → Data lake → Advanced analytics environment

Along this flow, data matures from plain data without tagging, to plain data with basic tagging, to fully tagged raw data, to prepared data, and finally to data for analysis.

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies
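The progression from untagged to fully tagged, prepared data can be pictured as an asset being promoted from zone to zone and gaining metadata at each step. The sketch below is a simplified in-memory model under stated assumptions: the zone labels loosely follow the slide, while the `DataAsset` class, the tag keys (`source`, `ingested_at`, `schema`, `owner`, `sensitivity`), and the preparation rule are hypothetical choices, not a prescribed tagging standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

@dataclass
class DataAsset:
    name: str
    payload: List[dict]
    zone: str = "landing"                                # arrives as plain, untagged data
    tags: Dict[str, str] = field(default_factory=dict)

def basic_tagging(asset: DataAsset, source: str) -> None:
    """Landing zone: attach basic technical tags only (hypothetical tag keys)."""
    asset.tags.update({"source": source,
                       "ingested_at": datetime.now(timezone.utc).isoformat()})

def full_tagging(asset: DataAsset, owner: str, sensitivity: str) -> None:
    """Raw zone of the data lake: describe the data fully, including its field list."""
    fields = sorted({key for row in asset.payload for key in row})
    asset.tags.update({"schema": ",".join(fields), "owner": owner,
                       "sensitivity": sensitivity})
    asset.zone = "data_lake_raw"

def prepare(asset: DataAsset) -> None:
    """Preparation: an illustrative rule that drops rows containing missing values."""
    asset.payload = [row for row in asset.payload
                     if all(v is not None for v in row.values())]
    asset.zone = "data_lake_prepared"

def publish_for_analysis(asset: DataAsset) -> None:
    """Only fully tagged, prepared assets move to the advanced analytics environment."""
    missing = {"source", "schema", "owner"} - set(asset.tags)
    if missing or asset.zone != "data_lake_prepared":
        raise ValueError(f"{asset.name} is not ready: missing tags {sorted(missing)}")
    asset.zone = "analytics"
```

Refusing to publish untagged or unprepared assets is one way to keep the analytics environment fed only from the governed path through the lake.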


The data lake is structured into different zones that distinguish raw and production data

Data lake: Landing zone → Raw zone → Preparation → Production zone → Advanced analytics environment, connected via APIs

▪ Raw zone: tagging describes data; file storage (I, II, III)

▪ Production zone: graph DB, file storage, relational DB

▪ Governance: data catalogue, taxonomy, lineage, access management, retention management

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies
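Two of the governance functions listed for the data lake, access management and retention management, can be expressed as simple policy checks against catalog metadata. The sketch below assumes hypothetical per-zone retention periods and role-based read rules; the numbers, role names, and the `CatalogEntry` structure are illustrative placeholders, not recommendations or a specific product's model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Set

# Hypothetical retention periods per zone (days); real values depend on regulation and policy.
RETENTION_DAYS: Dict[str, int] = {"landing": 7, "raw": 365 * 7, "production": 365 * 10}

# Hypothetical role-based access rules: which roles may read which zone.
ZONE_READERS: Dict[str, Set[str]] = {
    "landing": {"ingestion_service"},
    "raw": {"data_engineer"},
    "production": {"data_engineer", "data_scientist", "analyst"},
}

@dataclass
class CatalogEntry:
    dataset: str
    zone: str
    created: date

def expired(entries: List[CatalogEntry], today: date) -> List[str]:
    """Retention management: datasets older than their zone's retention period."""
    return [e.dataset for e in entries
            if today - e.created > timedelta(days=RETENTION_DAYS[e.zone])]

def can_read(role: str, entry: CatalogEntry) -> bool:
    """Access management: allow reads only for roles registered on the entry's zone."""
    return role in ZONE_READERS.get(entry.zone, set())
```

Driving both checks from zone metadata keeps the rules in one place, so moving a dataset between zones changes its retention and access behavior automatically.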


The production zone comprises further sub-zones for specialized production purposes

Data lake: Landing zone → Raw zone (tagging describes data; I, II, III) → Preparation → Production zone

Production zone sub-zones, connected via APIs to the advanced analytics environment:

▪ Use case analytics zone: analytics workbench

▪ Corporate production: data warehouse (DWH) and analytical apps

▪ Satellite zones I, II, III

Governance: data catalogue, taxonomy, lineage, access management, retention management

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies


Big data reference architecture

Cross-cutting services: hosting, security, monitoring and scheduling; metadata management, data governance, data lineage

▪ Data sources
  - Structured data: electronic medical records; billing and charge data; PO/supply chain; HR and operational data
  - Unstructured data: medical images; external sources; web logs; social media

▪ Data ingestion¹: batch ingestion; streaming ingestion

▪ Data lake: transient landing zone; enterprise data lake; curated zone; ODS layer (warm data)

▪ Data preparation layer: big data preparation tool; batch processing; near real-time/real-time processing; multi-domain MDM producing cleansed, validated customer data and golden records, with a delivery hub back to source systems

▪ Analytics layer: collaborative data science platform; stream processing layer; streaming analytics with real-time views

▪ Serving layer: data marts; data access layer

▪ Frontend layer: customer 360-degree platform; business intelligence dashboards; analytical apps; real-time analytical decisions

Data moves between layers through extract & load steps, and a hot path supports streaming use cases.

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies
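Combining batch processing with a hot path for streaming use cases, and merging both in a serving layer, resembles the pattern commonly called the Lambda architecture. The toy sketch below shows only the merge logic under that assumption: a batch view recomputed over all events up to a cutoff, a real-time view over events arriving after it, and a serving-layer query combining the two. The event format and the names `batch_view`, `RealTimeView`, and `serve` are hypothetical; production systems would use stream processors and serving stores rather than Python dictionaries.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, int]  # hypothetical (ward, timestamp) admission event

def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Batch processing: recompute counts from all events up to the last batch cutoff."""
    return Counter(ward for ward, ts in events if ts <= cutoff)

class RealTimeView:
    """Hot path: incrementally count events that arrived after the batch cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> None:
        ward, ts = event
        if ts > self.cutoff:  # older events are already covered by the batch view
            self.counts[ward] += 1

def serve(ward: str, batch: Counter, realtime: RealTimeView) -> int:
    """Serving layer: answer queries by merging the batch and real-time views."""
    return batch[ward] + realtime.counts[ward]

history = [("cardiology", 1), ("oncology", 2), ("cardiology", 3)]
batch = batch_view(history, cutoff=3)
rt = RealTimeView(cutoff=3)
for e in [("cardiology", 4), ("cardiology", 5), ("oncology", 2)]:
    rt.on_event(e)
print(serve("cardiology", batch, rt))  # 4
```

When the next batch run moves the cutoff forward, the real-time view can be reset, which is what keeps the hot path small.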


The big data and analytics tool vendor landscape is immensely diverse and highly dynamic

[Figure: the big data reference architecture layers (data sources, data ingestion, data lake, data preparation, analytics, serving, and frontend layers, plus the hot path for streaming use cases) mapped to example vendor tools]

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies


Building trusted and usable data structures


Key data governance processes and supporting tools

Dimensions and key things to have (tools):

Metadata mgmt

• Business glossary

• Metadata management software

• Data lineage

• Automated ETL code generation

Data quality

• Data quality tool deployed, covering data profiling, matching, cleansing, and monitoring

Master data mgmt

• MDM tool

• Integration with other systems and processes

Data governance

• Data owners defined

• Data governance body

• Defined data governance process

SOURCE: Digital McKinsey - Building best-in-class Data Management Architecture
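A core job of an MDM tool, producing the cleansed, validated customer data and golden records shown in the reference architecture, usually involves matching records that describe the same entity and applying survivorship rules to merge them. The sketch below uses a deliberately naive match key (normalized name plus date of birth) and a "newest non-empty value wins" survivorship rule; the `PatientRecord` fields and both rules are illustrative assumptions, far simpler than the matching that commercial MDM tools provide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, date]

@dataclass
class PatientRecord:
    source: str            # hypothetical source system, e.g. "EMR" or "billing"
    name: str
    birth_date: date
    phone: Optional[str]
    updated: date

def match_key(rec: PatientRecord) -> Key:
    """Naive matching: case- and whitespace-insensitive name plus date of birth."""
    return (" ".join(rec.name.lower().split()), rec.birth_date)

def golden_records(records: List[PatientRecord]) -> Dict[Key, PatientRecord]:
    groups: Dict[Key, List[PatientRecord]] = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    golden: Dict[Key, PatientRecord] = {}
    for key, recs in groups.items():
        recs.sort(key=lambda r: r.updated, reverse=True)  # newest record first
        newest = recs[0]
        # Survivorship: take the phone number from the newest record that has one.
        phone = next((r.phone for r in recs if r.phone), None)
        golden[key] = PatientRecord("golden", newest.name, newest.birth_date,
                                    phone, newest.updated)
    return golden
```

Real survivorship rules are typically configured per attribute and per source (for example, trusting billing for addresses and the EMR for clinical identifiers), which the single rule here only hints at.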


Data quality diagnostic criteria (rated "Good", "Satisfactory", or "Poor")

Normalization²

▪ Good: the table refers to clear directories, and there is a unique key

▪ Poor: data are stored in one big table with no directories available, and no key is available

Time completeness

▪ Good: the number of entries per month since the start of data acquisition deviates by less than 50% from the median¹

▪ Satisfactory: the number of entries deviates from the mean by more than 50% in at least one period

▪ Poor: the number of entries deviates from the mean by more than a factor of 2 in at least one period

Correctness

▪ Good: no outliers (values more than 500% of the median)¹

▪ Poor: more than 1% outliers, with a delta of more than 500% of the median

Quality

▪ Good: values are filled in fully and sufficiently (90% and above)

▪ Satisfactory: insignificant gaps (<30%) in at least one attribute

▪ Poor: more than 30% gaps in at least one attribute

¹ Except for pre-agreed cases

² Optional criterion for organizing data in Vertica or DB2

SOURCE: Digital McKinsey - Building best-in-class Data Management Architecture
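The time-completeness, correctness, and fill-rate criteria translate fairly directly into automated checks. The sketch below is one possible reading, not the source's implementation. Several interpretations are assumptions: "deviates from the mean more than 2 times" is read as a relative deviation above 200%; an outlier is a value whose distance from the median exceeds 5 times the median; cases the table leaves unrated (a few outliers, at most 1%) are labeled "Satisfactory"; counts and values are assumed positive. The function names are hypothetical.

```python
from statistics import mean, median
from typing import Mapping, Optional, Sequence

def rate_time_completeness(monthly_counts: Sequence[float],
                           good_dev: float = 0.5, poor_dev: float = 2.0) -> str:
    """Monthly entry counts since the start of data acquisition (assumed all > 0)."""
    med, avg = median(monthly_counts), mean(monthly_counts)
    if max(abs(c - med) / med for c in monthly_counts) < good_dev:
        return "Good"
    # Assumption: "more than 2 times" read as a relative deviation from the mean above 200%.
    if max(abs(c - avg) / avg for c in monthly_counts) > poor_dev:
        return "Poor"
    return "Satisfactory"

def rate_correctness(values: Sequence[float], outlier_factor: float = 5.0,
                     poor_share: float = 0.01) -> str:
    """Outliers: values whose delta from the median exceeds 500% of the median."""
    med = median(values)
    outliers = [v for v in values if abs(v - med) > outlier_factor * med]
    if not outliers:
        return "Good"
    if len(outliers) / len(values) > poor_share:
        return "Poor"
    return "Satisfactory"  # assumption: the table does not rate this middle band

def rate_fill(columns: Mapping[str, Sequence[Optional[object]]]) -> str:
    """Fill rate per attribute; None counts as a gap. The worst attribute decides."""
    worst_gap = max(sum(v is None for v in vals) / len(vals) for vals in columns.values())
    if worst_gap <= 0.10:   # filled in for 90% and above
        return "Good"
    if worst_gap > 0.30:    # more than 30% gaps in at least one attribute
        return "Poor"
    return "Satisfactory"
```

For example, `rate_time_completeness([100, 100, 100, 1000])` returns "Poor", because the last month deviates from the mean of 325 by roughly 208%.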


Example of end product – data quality diagnostics


Data catalog tools usually come with 8 core functionalities

1. Metadata repositories

2. Business glossary

3. Data lineage

4. Impact analysis

5. Rules management

6. Semantic frameworks

7. Metadata ingestion

8. Collaboration

Data catalog capabilities

SOURCE: Digital McKinsey - Data catalogs as metadata management solution
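Two of these functionalities, data lineage and impact analysis, are closely linked. Once lineage is recorded as a directed graph from upstream to downstream datasets, impact analysis amounts to finding everything reachable downstream of a changed asset. The sketch below assumes a hypothetical in-memory catalog (`MiniCatalog`) with made-up dataset names and a minimal business-glossary dictionary; commercial catalogs harvest and store this metadata automatically.

```python
from collections import deque
from typing import Dict, List, Set

class MiniCatalog:
    """Hypothetical metadata repository with glossary descriptions and lineage edges."""

    def __init__(self) -> None:
        self.glossary: Dict[str, str] = {}          # dataset -> business description
        self.downstream: Dict[str, List[str]] = {}  # lineage: source -> derived datasets

    def register(self, dataset: str, description: str) -> None:
        self.glossary[dataset] = description
        self.downstream.setdefault(dataset, [])

    def add_lineage(self, source: str, target: str) -> None:
        self.downstream.setdefault(source, []).append(target)
        self.downstream.setdefault(target, [])

    def impact_of(self, dataset: str) -> Set[str]:
        """Impact analysis: breadth-first search over downstream lineage edges."""
        seen: Set[str] = set()
        queue = deque(self.downstream.get(dataset, []))
        while queue:
            node = queue.popleft()
            if node not in seen:  # the seen-set also guards against cycles
                seen.add(node)
                queue.extend(self.downstream.get(node, []))
        return seen
```

Before changing a source feed, a team could call `impact_of` on it to list every mart and dashboard that would need retesting.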


Analytics and visualization in big data


Analytics and visualization are fed from the data lake

Data Lake → Analytics Garage → Rapid Prototyping → App Factory, with workflow automation

▪ Data Lake (transfer/clean up/expand): collection of a comprehensive and valid data set, covering data transfer, workplace, backups, and external data sources

▪ Analytics Garage (analyze/test/optimize): a variety of tools for analyzing the data

▪ Rapid Prototyping (visualize/test/improve): fast development of a prototype based on convincing ideas

▪ App Factory (develop/automate/operate): development of successful prototypes into solutions

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies


A typical big data stack has a range of coding and visualization tools

▪ Clients: graphical coding; exploration and visualization; plain coding

▪ Application server compute (analyst workbench)

▪ Plain compute (analyst backend)

▪ Database compute (data lake): options for compute engines, such as MapReduce and Sparkling Water, plus options for specific use cases (+ others)

▪ Supporting infrastructure services and external APIs

SOURCE: Digital McKinsey Big Data and Advanced Analytics Compendium: "From Garage to Factory" – Big Data architecture and technologies
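MapReduce, one of the compute-engine options on the slide, splits a job into a map phase that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group. The single-process sketch below counts diagnosis codes across hypothetical comma-separated records to illustrate the programming model only; it is not Hadoop's API, and a real engine distributes each phase across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit (code, 1) for each diagnosis code in a comma-separated record."""
    for code in record.split(","):
        code = code.strip()
        if code:
            yield (code, 1)

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values emitted for one key."""
    return (key, sum(values))

def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call sees one record and each reduce call sees one key, both phases parallelize naturally, which is what lets the model scale on data lake storage.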


Key learnings


We believe that effective healthcare advanced analytics and digital transformations require work across the entire analytics workflow

Data ecosystem → Modeling insights → Workflow integration → Adoption → Source of value

Analytics-to-insights | Insights-to-impact

Technology and infrastructure | Organization and governance

SOURCE: McKinsey Analytics; McKinsey Global Institute analysis


Five insights into building a great big data analytic platform

#1 - Ensure everything you do starts delivering impact within six months

#2 - Use existing data to build in bite-size chunks

#3 - Deploy analytics only to solve tangible business problems

#4 - Invest twice as much in your talent, culture, and processes as in tools

#5 - Democratize data across your business to catalyze innovation from within


Questions

Please complete the online session evaluation!

Deepesh Chandra, Associate Partner & Senior Expert

Pierre-Arnaud Klaskala, Associate Partner, Director of Product & Technology