INTRODUCING CLOUDERA DATA PLATFORM
Gergely Devenyi | Director of Engineering
Balazs Gaspar | Solutions Engineer

© 2019 Cloudera, Inc. All rights reserved.

• 85 Countries

Customers

3,000+ Employees

2,000+ Partners

3,000+

8/10 TOP GLOBAL BANKING

10/10 TOP GLOBAL TELCO

9/10 TOP GLOBAL PHARMA

40+ GOVERNMENT CUSTOMERS (PUBLIC)

8/10 TOP GLOBAL TECHNOLOGY

10/10 TOP GLOBAL AUTOMOTIVE

THE NEW CLOUDERA

OUR CUSTOMERS ARE ASKING FOR

Hybrid Deployments
• Move data and applications without rewriting and retraining
• Separate data management strategy from infrastructure strategy
• Manage all environments from a single pane of glass

Multi-Function & Open
• Deploy one platform to address current and future workload needs
• Connect disparate workload types to develop Edge2AI applications on one platform
• Open source and open APIs

Secure & Governed
• Manage data security and governance centrally
• Automate application security at all layers
• Reduce time to value with enterprise-grade productivity tools

Customer Experience
• Easy to use with self-serve capabilities
• Elasticity and agility to meet changing demands of workloads and company
• Simple to manage and maintain environments and applications

HOW TIMES HAVE CHANGED

2008: SCALE 1 JOB TO 1000s OF SERVERS

2019: SCALE 1 PLATFORM TO 1000s OF USERS

DATA TEAMS ARE HIGHLY SPECIALIZED

• App Developers
• Data Engineers
• Compliance Officers
• Data Architects
• BI Analysts
• Data Scientists
• Infrastructure Managers

• App Developers
• Data Engineers
• Compliance Officers
• Data Architects
• BI Analysts
• Data Scientists
• Infrastructure Managers

SPECIALIZATION CREATES A DIVERSITY OF NEEDS

Continuous availability, custom tooling

Capacity guarantees to enable consistent SLAs

Capacity on demand to support bursty workloads

Latest tools and hardware, ad-hoc resources

Seamlessly integrated data landscape

Fine-grain access controls, privacy and verifiable audit

Reliability, cost, & scale, fault tolerance

A DATA PLATFORM DESIGNED FOR MULTI-TENANCY

Cloudera Data Platform

SDX

App Developers

Data Architects, Compliance Mgrs., Infrastructure Mgrs.

Centralized Data, Security, Governance and Management

Data Engineers BI Analysts Data Scientists

CUSTOM ENVIRONMENTS

Confidential — Restricted 9

OUR APPROACH

CLOUDERA DATA PLATFORM

CDP HOME

A single login to access the full platform, documentation, and support - all controlled through corporate SSO

DATAHUB

A familiar and highly customizable cluster service optimized for the separation of storage and compute

Data Engineers

App Developers

DATA WAREHOUSE

A data warehousing service optimized for concurrency, caching, and isolation

BI Analysts

MACHINE LEARNING

A machine learning workspace service to connect teams of data scientists to enterprise data

Data Scientists

WORKLOAD MANAGER

A centralized management tool for analyzing and optimizing workloads within and across environments

Data Engineers, BI Analysts

REPLICATION MANAGER

A centralized management tool for replicating and migrating data, metadata, and policies between environments

Data Architects

DATACATALOG

A centralized data stewardship tool for searching, organizing, securing, and governing data across environments

Compliance Officers

MANAGEMENT CONSOLE

A single pane of glass to manage 100s of clusters, all with different lifecycles, across multiple environments

Infrastructure Managers

INSIDE LOOK INTO CDP

CDP ENABLES INFRASTRUCTURE AGNOSTIC DEPLOYMENTS

CDP Data Center (monocluster, bare metal, no containers)
• Spark, Hive, Impala, HBase, ...
• SDX (backed by HDFS)

CDP Private Cloud (separate storage / compute, containers)
• SDX (backed by HDFS / Ozone)
• DataHub (on VMs)
• CDW (on K8s)
• CML (on K8s)

CDP Public Cloud (separate storage / compute, containers)
• SDX (backed by S3 / ADLS / GCS)
• DataHub (on VMs)
• CDW (on K8s)
• CML (on K8s)

CDP Management Console

Data Catalog, Workload Manager, Replication Manager

CDP HIGH LEVEL ARCHITECTURE

Management Console - A single pane of glass to manage one or more environments and the services that run within each environment

Environment - A logical encapsulation of a customer network and the services that run within that network (like an Azure virtual network)

Cluster - A distributed computing service that runs on VMs (Data Hub) or K8s (the experiences) and has access to the shared data lake

SDX - The data access control layer that sits on top of the backend object store and provides coherent data security and governance for all the applications running within the environment

[Diagram labels: Management Console; Environment; SDX; Data Hub clusters; DW / CDW clusters; ML / CML clusters]

Data Catalog, Workload Manager, Replication Manager
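
The four definitions on this slide (Management Console, Environment, Cluster, SDX) nest inside one another, and a small data model may make that containment easier to see. The sketch below is illustrative only: the class names, fields, and the `clusters_sharing` helper are assumptions made for this example and are not part of any Cloudera API or SDK.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Runtime(Enum):
    """Where a cluster runs, following the slide's Cluster definition."""
    VM = "VMs"          # Data Hub clusters
    K8S = "Kubernetes"  # the experiences (CDW, CML)


@dataclass
class Cluster:
    name: str
    service: str      # "Data Hub", "CDW" or "CML"
    runtime: Runtime


@dataclass
class SDX:
    backing_store: str  # e.g. "S3", "ADLS", "HDFS"


@dataclass
class Environment:
    """One customer network plus the services that run inside it."""
    name: str
    sdx: SDX
    clusters: List[Cluster] = field(default_factory=list)

    def add_cluster(self, name: str, service: str) -> Cluster:
        # Data Hub runs on VMs; the other services run on Kubernetes.
        runtime = Runtime.VM if service == "Data Hub" else Runtime.K8S
        cluster = Cluster(name, service, runtime)
        self.clusters.append(cluster)
        return cluster


@dataclass
class ManagementConsole:
    """Single entry point over one or more environments."""
    environments: List[Environment] = field(default_factory=list)

    def clusters_sharing(self, backing_store: str) -> List[str]:
        # Names of clusters whose environment's SDX sits on the given store.
        return [c.name for env in self.environments
                if env.sdx.backing_store == backing_store
                for c in env.clusters]


console = ManagementConsole()
prod = Environment("prod-aws", SDX("S3"))  # hypothetical environment name
prod.add_cluster("etl", "Data Hub")
prod.add_cluster("bi", "CDW")
console.environments.append(prod)
print(console.clusters_sharing("S3"))  # ['etl', 'bi']
```

Every cluster in an environment reaches data through that environment's single SDX object, which mirrors the slide's point that security and governance are defined once per environment rather than once per cluster.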

CLOUDERA DATA PLATFORM

DATA MANAGEMENT FOR BUILDING TRUSTED DATA LAKES

DATA LINEAGE
Maintaining lineage requires integrating changes across disparate systems, to determine data origin and track data throughout its lifecycle.

MASTER DATA MANAGEMENT
To effectively manage data, it is critical to define a common point of reference, a system-of-record to ensure data quality, consistency, and integrity across all applications.

SEARCH AND INDEX
To explore data classification, audit information, and metadata, navigation paths need to be in place for running multiple queries across multiple data types.

DATA RETENTION AND ARCHIVAL
Archiving least frequently used data to streamline systems allows the data lake to stay performant by reducing the volume of unused data.

DATA QUALITY AND PROFILING
There is a need to understand the data and ensure prescribed data quality rules are applied.

SECURITY AND ACCESS CONTROL
Security around the data supply chain process requires access control, agreed upon tokenization or encryption standards, and monitoring and alerting systems.

METADATA MANAGEMENT
Metadata, the attributes gathered from data, has to be integrated into a repository and maintained.

AUDITING
Auditing needs to be carried out to account for the data and ensure users are compliant across multiple environments.

BUSINESS DEFINITIONS
A business glossary provides a common vocabulary and standardization to data definitions which facilitates communication across teams.

GOVERNANCE
The resources who are involved in maintaining data need a framework to govern processes and workflows among various governance roles.

CDP DATA CENTER – POWERED BY CLOUDERA RUNTIME

New features for CDH 6 customers

Ranger
• Dynamic row filtering
• Dynamic column masking
• Attribute-based access control
• SparkSQL fine-grained access control

Atlas 2.0
• Advanced data discovery
• Improved performance and scalability

Hive 3
• Better fit for EDW Optimization use cases (large joins, analytical style workloads)

Knox
• Gateway-based SSO

Hive on Tez
• Better ETL performance
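
The Ranger items in the list above, dynamic row filtering and dynamic column masking, can be pictured with a toy example. This is plain Python written for illustration under stated assumptions: it is not Ranger's policy format or API, and the `POLICIES` table, role names, and `mask` rule are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]
Policy = Tuple[Callable[[Row], bool], Set[str]]

# Hypothetical policy table: role -> (row filter, columns to mask).
POLICIES: Dict[str, Policy] = {
    "analyst_eu": (lambda r: r["region"] == "EU", {"ssn"}),
    "auditor": (lambda r: True, set()),
}


def mask(value: object) -> str:
    """Keep the last four characters and hide the rest."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def apply_policy(rows: List[Row], role: str) -> List[Row]:
    """Return only the rows the role may see, with restricted columns masked."""
    row_filter, masked_cols = POLICIES[role]
    return [
        {k: (mask(v) if k in masked_cols else v) for k, v in row.items()}
        for row in rows
        if row_filter(row)  # row filtering: drop rows the role may not see
    ]


rows = [
    {"name": "Ana", "region": "EU", "ssn": "123456789"},
    {"name": "Bob", "region": "US", "ssn": "987654321"},
]
print(apply_policy(rows, "analyst_eu"))
# [{'name': 'Ana', 'region': 'EU', 'ssn': '*****6789'}]
```

The same query returns different results depending on who runs it, which is what "dynamic" refers to in both features: the filter and the mask are chosen per user at query time rather than baked into separate copies of the data.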

New features for HDP 3 customers

Cloudera Manager
• Virtual private clusters
• Automated wire encryption setup
• Fine-grained RBAC for administrators
• Streamlined maintenance workflows

Atlas 2.0
• Advanced data lineage
• Faceted search

Impala
• Better fit for Data Mart migration use cases (interactive, BI style queries)

Hue
• Built-in SQL editor

Kudu
• Better performance for fast changing / updateable data

Includes SDX and many other important capabilities

WHEN CAN I GET IT?

CDP PUBLIC CLOUD
• AWS (Q3) / EKS
• AZURE / AKS
• GCP / GKE

CDP DATA CENTER
• CONTINUITY
• “CDP BARE METAL”
• CLOUDERA RUNTIME

CDP PRIVATE CLOUD
• KUBERNETES-BASED
• “BATTERIES INCLUDED”
• 3rd PARTY K8 DISTROS

Q3 Q4 2020

HDP 2.x / 3.x

CDH 5.x / 6.x+

EDGE TO AI

CLOUDERA DATAFLOW DATA-IN-MOTION PLATFORM

CLOUDERA MACHINE LEARNING
Accelerate and simplify machine learning from research to production

ANALYZE DATA
• Explore data securely and share insights with the team

TRAIN MODELS
• Run, track, and compare reproducible experiments

DEPLOY APIs
• Deploy and monitor models as APIs to serve predictions

MANAGE SHARED RESOURCES
• Provide a secure, collaborative, self-service platform for your data science teams

WHAT INDUSTRIALIZED MACHINE LEARNING LOOKS LIKE

Predictive Services

BI Tools and SQL Editors

Data Products

DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT

MACHINE LEARNING

DATA ENGINEERING

DATA WAREHOUSE

OPERATIONAL DATABASE

Sensors/IoT Devices

DATA FLOW & STREAMING

DRIVING INNOVATION IN STREAMING

THANK YOU