Business Data Lake best practices OOP Munich, 2017-01-31

Business Data Lake Best Practices

Page 1: Business Data Lake Best Practices

Business Data Lake best practices OOP Munich, 2017-01-31

Page 2: Business Data Lake Best Practices

2 Copyright © Capgemini 2015. All Rights Reserved

OOP MUC 2017 - Business Data Lake best practices

The speaker – Arne Roßmann

•  Part of the Insights & Data team
   •  Global team delivering around BI, DWH, Information Strategy & Big Data
•  Working in Business Intelligence since 2008
•  Delivering as Big Data Architect & Project Manager at our clients
   •  Defining processes
   •  Creating architectures
   •  Leading projects
•  Worked in many industries
   •  Retail, Chemical, Financial, Logistics, Automotive, ...

Page 3: Business Data Lake Best Practices

Capgemini’s Insights & Data Global Practice

With 15,000 experts globally, we are a recognized leader in information-led transformation

Expertise in Big Data & Analytics

•  Over 15,000 consultants globally
•  Industrialized delivery framework Next Gen Business Insights Service Centre
•  CUBE lab on the cloud with various demonstrations for BI environments
•  Built-in tools for interactive agile BI and DevOps

Partner Ecosystem

800+ Big Data & 400+ Data Science Global Consultants

Capgemini Solutions

Customer Analytics
•  Segmentation & Behavior Profiling
•  Behavior Propensity scoring
•  Pricing Analytics

Marketing & Campaign Analytics
•  Campaign Recommendation
•  Cross Sell/Up Sell
•  Campaign Measurement
•  Campaign Execution Management

Operations Analytics
•  Sales/Demand Forecasting
•  Activity Based Costing
•  Call Center Analytics

Asset/Equipment Analytics
•  Warranty Analytics
•  Asset Performance Monitoring
•  Predictive Asset Maintenance
•  Insights from Connected Equipment

Fraud Analytics
•  Fraud Scoring
•  Collusion Fraud Identification
•  Fraud Framework for Public Sector (Trouve)

Content Analytics
•  Text Mining Accelerators
•  Key Opinion Leader
•  Content Analytics for Fraud Detection

Business Data Lake offering

Data Warehouse Optimization Solution

Strategic Alliances and partnerships with major vendors

Enabling Co-Innovation with the CUBE lab

Experience in designing and deploying big data analytics solutions in varied ecosystems

Page 4: Business Data Lake Best Practices

Table of Contents

•  Why the Business Data Lake works
•  Services your Business Data Lake should provide
•  Standardize, Industrialize and Innovate!

Page 5: Business Data Lake Best Practices

Why the Business Data Lake works

Page 6: Business Data Lake Best Practices

Big Data creates opportunities but poses challenges as well

Where do I start?

“We know that Big Data can be helpful, but how do we quantify the benefits and develop a business case?”

“How do we know which Big Data technology/platform(s) suit our architecture and business requirements?”

“How do I get all the unstructured data (mainly images) out of my operational processes, into an analytical environment that allows me to experiment with data?”

“Can we easily combine data from multiple source systems into our Big Data environment and vice versa?”

“Can I do it myself? What skills do I need for Big Data?”

“How do I measure the effectiveness or performance of my Big Data initiative? How do I measure ROI?”

Page 7: Business Data Lake Best Practices

Businesses are looking to close the gap towards ‘insight driven’

•  Scattered data lying in silos across the organization: 79% have not completely integrated their data sources across the organization
•  Absence of clear business case for funding and implementation: 67% do not have well-defined criteria to measure the success of their own Big Data initiatives
•  Dependence on legacy systems for data processing and management: 36% use cloud based Big Data and analytics platforms
•  Ineffective co-ordination of Big Data and analytics teams: 47% have either scattered pockets of resources or follow a decentralized model for analytics initiatives

Page 8: Business Data Lake Best Practices

The Business Data Lake delivers what we need for the new data landscape.

Govern where it matters
•  Focus on MDM
•  Enforce only when sharing
•  Treat Corporate as aggregation of Local

Encourage local requirements
•  Let the business decide what they need
•  Build from the bottom
•  Enable traceability to source disposable data views

Store securely
•  Store everything ‘as is’
•  Include structured and unstructured data
•  Store it cheaply where possible

Distill on demand
•  Select only what you want
•  Business friendly tooling
•  Re-usable information maps
•  Rapid change cycle

Page 9: Business Data Lake Best Practices

Business Challenges driving the need for BDL services

Business Enablement

•  Achieve real-time optimization of business processes through predictive insights and performance analytics
•  Enhance new services and stay competitive in the market
•  Be agile, get insights fast

Control Control

•  Ensure data security and compliance with EU data regulations
•  Enable up- and downscaling according to business needs

Control Control

•  Reduce costs associated with the governance and secure storage of data
•  Control the costs of running flexible data services
•  Reduce Capex

Page 10: Business Data Lake Best Practices

Services your Business Data Lake should provide

Page 11: Business Data Lake Best Practices

Capgemini can help accelerate clients’ journey to Insights.

A cloud powered, big data & insights service: bring all your data in one place, deliver insights at the point of action and generate differentiated business value.

•  ‘Software-Defined’, full stack cloud infrastructure
•  Flexible ‘Pay-as-you-go’ Commercial Model
•  Secure as a Vault
•  ‘Ready to Harvest’ Sector & Domain Insights
•  Modular Hybrid & Elastic powered by ‘Intelligent Automation’

Get started quickly: with our platform, tools and expertise we can support you at any level to manage your data and harvest insights.

Your ‘Lab in the Cloud’
•  Experiment
•  Hypothesize
•  Simulate

Page 12: Business Data Lake Best Practices

The BDL architecture we built for our clients

[Architecture diagram. Labels recovered from the figure are listed below in their original order.]

Layers: User Access Layer, Application Layer, Software & Services, Infra Layer

Platform as a Service
•  Insights Platform UX Portal: HTML 5, CSS, Angular JS
•  Big Data Lab, Dataset Library
•  Data Science Lab, Models Library
•  Insights Lab, Ready Insights
•  Common Services
•  Ingest
•  Algorithm Library
•  Sector Insight Labs
•  Smart Insights 360
•  Catalog & Provision
•  Meter & Bill
•  Resource Monitor
•  Provision
•  Service Catalog
•  IoT Framework
•  Access Mgmt
•  Knowledge Base
•  Helpdesk
•  RESTful Web Services

Infrastructure as a Service
•  Hybrid Cloud Extensibility - (Bosh, CF) CG-CSB, Virtustream
•  Storage and Parallelization - EMC Isilon
•  Compute & Memory - EMC VCE
•  Big Data Suite - Pivotal, Cloudera, Hortonworks
•  VMware, Cortex
•  Data Management - Informatica, Talend, HDF, Apache NiFi
•  Analytics tools - SAS, MADlib, RStudio, Spark
•  VMware
•  Security & Governance - RSA, AD, Knox, Ranger, Kerberos, Atlas, TDE, W2W, Metron, Falcon
•  ITSM - BMC Remedy

Further labels
•  MD&LM Environment
•  Hadoop Distribution - Hortonworks, Cloudera
•  RE&D, DevOps - Cloud Foundry, Jira, Jit
•  Visualisation - Qlik, Tableau, SAS VA, D3, HighCharts
•  Visualisation, Self Service Insights
•  Capgemini Private Cloud, On Premise Cloud

•  Common Web UI and UX architecture
•  Fully virtualized compute, storage & network
•  Intelligent automation of provisioning, process, service and support orchestration
•  Modular Component Architecture
•  Multiple points of presence
•  Seamless integration between on-premise, private & public cloud
•  Proven reference and component architecture for on-premise builds
•  Professional Services teams to build full stack
•  Demo of full stack
•  Accelerated Partner enablement

Page 13: Business Data Lake Best Practices

BDLaaS – illustrative example service Dashboard

Page 14: Business Data Lake Best Practices

Standardize, Industrialize and Innovate!

Page 15: Business Data Lake Best Practices

Big data processing is done in three different stages and we have to cater to each stage differently

Stages over time:

Stage 1: Load “as-is”
Actors: Data providers and IT provide and store data
Paradigms:
•  Store everything: internal and external, structured and unstructured
•  Store granular data
•  Minimal effort on IT

Stage 2: Distill on demand
Actors: Data scientists and engineers explore and analyze data
Paradigms:
•  Agile and explorative way of work
•  Self service
•  Fail fast

Stage 3: Operationalize
Actors: IT implements data integration process for production
Paradigms:
•  Continuously running analytics processes
•  Trust in data quality
•  Service levels secured
•  Managed by IT

•  Allow creativity
•  Encourage collaboration
•  Ensure Business Meta Data & Data Catalogue
•  Enable Data Masking

Industrialize!

Examples of technical metadata
•  Path (folder location)
•  Filename
•  File type
•  File size
•  Date of ingestion
•  Technical Owner / Group
•  For Hive:
   •  Number of records / lines
   •  Number of columns
   •  Column names if available
   •  Column data types
   •  Value distribution
   •  Min/Max
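Most of the technical metadata listed here can be captured automatically at ingestion. The following is a minimal sketch, assuming a plain delimited text file rather than a Hive table; the function names `file_metadata` and `profile_csv` are hypothetical, and owner/group lookup and column data types are left out because they depend on the platform.

```python
import csv
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path):
    """Collect file-level technical metadata for one ingested file."""
    p = Path(path)
    return {
        "path": str(p.parent),
        "filename": p.name,
        "file_type": p.suffix.lstrip(".").lower(),
        "file_size_bytes": p.stat().st_size,
        # Assumption: profiling runs at ingestion time, so "now" is the ingestion date.
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def profile_csv(path):
    """Profile a delimited file: record count, columns, min/max, value distribution."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        counts = {name: Counter() for name in columns}
        records = 0
        for row in reader:
            records += 1
            for name in columns:
                # Missing cells are counted as empty strings.
                counts[name][row.get(name) or ""] += 1
    return {
        "records": records,
        "column_count": len(columns),
        "column_names": columns,
        # Min/max compare raw strings; a real profiler would cast types first.
        "min": {n: min(c) for n, c in counts.items() if c},
        "max": {n: max(c) for n, c in counts.items() if c},
        "value_distribution": {n: dict(c) for n, c in counts.items()},
    }
```

Both dictionaries can be merged into one catalogue record per file, so that the data catalogue is filled without manual effort from IT.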

Examples of business metadata
•  Project (possibly automatic)
•  Data set name
•  Logical description of dataset
•  Data owner / data steward
•  Confidentiality classification
•  Line of business
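Business metadata cannot be derived from the file itself, so it helps to make it a required, validated record at registration time. A minimal sketch, assuming a Python-based catalogue; the class name `BusinessMetadata` and the classification levels are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass

# Assumption: illustrative classification levels; use your organization's own scheme.
CLASSIFICATIONS = ("public", "internal", "confidential", "restricted")


@dataclass(frozen=True)
class BusinessMetadata:
    """One catalogue entry holding the business metadata fields listed above."""

    project: str
    dataset_name: str
    description: str
    data_owner: str
    confidentiality: str
    line_of_business: str

    def __post_init__(self):
        # Reject classifications outside the agreed scheme.
        if self.confidentiality not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.confidentiality!r}")
        # Reject empty or whitespace-only fields so no dataset is registered half-described.
        missing = [name for name, value in vars(self).items() if not value.strip()]
        if missing:
            raise ValueError(f"missing business metadata: {', '.join(missing)}")
```

Rejecting incomplete entries at registration keeps the catalogue usable for self-service, and the confidentiality field gives data masking a value to key on.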

Page 16: Business Data Lake Best Practices

Start using ELT tools now!

•  Need for more platform updates
•  Need for more denormalization
•  Need for more specialized know-how

ELT tools offer:
•  Abstraction layer to Hadoop processing engines
•  Abstraction layer to NoSQL & SQL databases
•  Standardized control flows
•  Availability of developers
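To make the ELT pattern concrete: data is loaded untouched first and transformed afterwards inside the target engine. The sketch below uses Python's built-in `sqlite3` as a stand-in engine, which is an assumption for illustration only; an ELT tool would generate comparable statements for Hadoop or a SQL/NoSQL store. The table names and the function `run_elt` are hypothetical.

```python
import sqlite3


def run_elt(rows):
    """Load raw order rows as-is, then transform them inside the database engine."""
    conn = sqlite3.connect(":memory:")
    # Extract + Load: land the source rows untouched, every field as text.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Transform: the engine casts, filters, and aggregates the landed data.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        WHERE TRIM(amount) <> ''
        GROUP BY customer
        """
    )
    result = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return result
```

Because the raw table is kept, the transformation can be changed and re-run without going back to the source system.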

Page 17: Business Data Lake Best Practices

Insights as a Service – Analytics Cloud for Oil & Gas major

•  Well Health Dashboards
•  Equipment Performance
•  Disaster Management
•  Supply Chain Analytics
•  Predictive Maintenance

•  Device Data
   •  Driving behavior, GPS, diagnostics, etc.
•  Real Time Data
•  System Data
•  Environment Data
•  Project Data

•  10 data points per sec
•  40 GB per field
•  5-6 GB per day per well
•  80 TB well data per year

•  24x7x365 monitoring usage
•  Real time charts of streaming data
•  Real time alerts
•  Thermal visualizations
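A rough sanity check of these volumes can be done with simple arithmetic. The sketch below rests on assumptions that are not on the slide: the 10 data points per second is a constant rate from one source, 1 TB equals 1000 GB, and the 80 TB per year is the total across wells that each produce 5-6 GB per day. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(points_per_second):
    """Data points produced in one day at a constant rate."""
    return points_per_second * SECONDS_PER_DAY


def implied_wells(total_tb_per_year, gb_per_well_per_day, gb_per_tb=1000):
    """Number of wells implied by a yearly total and a per-well daily volume."""
    return total_tb_per_year * gb_per_tb / (gb_per_well_per_day * 365)


daily_points = points_per_day(10)
wells_low = implied_wells(80, 6)   # more data per well means fewer wells
wells_high = implied_wells(80, 5)  # less data per well means more wells
```

Under these assumptions the yearly total corresponds to a few dozen wells, which gives a first order of magnitude for sizing storage and streaming capacity.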

Page 18: Business Data Lake Best Practices

We helped customers get to real value within 12 weeks, from idea to production.

[Timeline figure: weeks 1 to 12, from “Business Problem Identified” to “Business Value Delivered”. Steps shown: Business Insights Need, Integrate DataSet, Model Build and Training, Iterate and Tune, Data Exploration, Test Data Science Model, Apply Data Science, Business Validation, Publish Insights.]

Page 19: Business Data Lake Best Practices

The information contained in this presentation is proprietary. Copyright © 2015 Capgemini. All rights reserved.

Rightshore® is a trademark belonging to Capgemini.

www.capgemini.com

About Capgemini

With more than 145,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2014 global revenues of EUR 10.573 billion.

Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.

Learn more about us at www.capgemini.com.