Modernize Your Data Analytics Architecture with a Unified Approach to Data + AI

Anand Venugopal
Global Leader - Industry Solutions (Migrations), Databricks


Topics

Why migrate from Hadoop to Databricks?

Success stories, technical and business benefits

How can you migrate fast with low costs & low risk?

Legacy On-Prem Analytics Architectures Are Not Keeping Up

Hadoop costs rising when costs need to be cut

Innovation hinges on ML and predictive insights

Business agility requires real-time data

This is preventing teams from driving high-impact business outcomes

Why Migrate to Databricks?

Forrester study finds 417% ROI for companies switching to Databricks

● 47% cost savings from retiring legacy infrastructure

● 5% increase in revenue

● 25% increase in data team productivity

Hadoop is Costly, Complex and Ineffective

● DEVOPS INTENSIVE: The Hadoop ecosystem is complex, hard to manage and prone to failures → Low Productivity

● RIGID AND INELASTIC: 24/7 HDFS clusters need to be built for peak use and are costly to upgrade → Cost Prohibitive

● LACKS AI CAPABILITIES: No out-of-box Hadoop support for ML/AI, and separate environments for data and AI → Slow Innovation

Enterprises Need a Modern Data Analytics Architecture

CRITICAL REQUIREMENTS

Cost-effective scale and performance in the cloud

Easy to manage and highly reliable for diverse data

Predictive and real-time insights to drive innovation

Building a Modern Cloud Analytics Architecture with Databricks

Data Science Workspace

● EASY TO MANAGE: Managed cloud platform that can reliably handle all types of data → Enhanced Productivity

● MASSIVE SCALE: On-demand, elastic autoscale clusters with optimized Apache Spark → Lower Cost at Scale

● AI-ENABLED INNOVATION: Unified and collaborative notebooks with built-in ML capabilities → New Insights Faster

Databricks Unified Data Analytics

● DELTA ENGINE: High performance query engine

● One platform for every use case: Streaming Analytics, BI, Data Science, Machine Learning

● Data Lake for all your data: Structured, Semi-Structured and Unstructured Data

● Structured transactional layer

Powering Innovation with Modern Data Analytics

Customers that migrated from Hadoop

Business value: What did they do with us?

“The Un-carrier strategy is an approach that seeks to listen to the customer, address their pain points, bring innovation to the industry and improve the wireless experience for all.”

Situation

○ Every network interaction (call, website load, text, app) is logged in a 1,600-node HDP data lake (30 PB).

○ 4-5 “large scale” pipelines, with hundreds of downstream pipelines feeding the business

● PCMD (network measurement data), CDR (call records), EDR (DNS (website)), LSR (Location)

○ Process call data to get critical network insights: call-failure reasons and network outages.

○ PCMD – Per Call Measurement Data

● Provides insights on call failures at a granular level

● Best source to determine the outage cause and effect

● Provides rich information about Sprint customers roaming in the T-Mobile network

Solution: Holistic transformation instead of ‘lift & shift’

Overview

● Migration and transformation of streaming data analytics from Apache Storm and Hive on Hortonworks to Azure Databricks

● Data was streaming in at an average of 2M records per second, 375 GB per batch, 23 TB per day (uncompressed)

Results

Accelerating key insights, e.g. hourly dashboards protecting revenue and customer churn.

78.5x performance gain versus on-prem operation

● BEFORE (with Hive on Tez): 47 mins for 15k cores to do the job

● AFTER: 35 mins for 256 cores to do the same job

● KPI computations took 1/4th of the time, enabling new hourly dashboards (w/out optimizations, e.g. warm pool and others still in process)

40% reduction in use of the 1,600-node on-prem cluster
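
The slide does not say how the 78.5x figure was derived. One plausible reading is a ratio of compute consumed (cores multiplied by minutes) before and after. The sketch below works through that reading, assuming "15k" means exactly 15,000 cores; with these rounded inputs the ratio comes out near 78.7x, close to the quoted 78.5x. The function names are illustrative only.

```python
# Figures quoted on the slide; "15k" is taken as exactly 15,000 (an assumption).
BEFORE_MINUTES, BEFORE_CORES = 47, 15_000   # Hive on Tez, on-prem
AFTER_MINUTES, AFTER_CORES = 35, 256        # after the migration


def core_minutes(minutes: float, cores: int) -> float:
    """Total compute consumed by one run, in core-minutes."""
    return minutes * cores


def efficiency_gain(before: float, after: float) -> float:
    """How many times less compute the 'after' run consumed."""
    return before / after


before = core_minutes(BEFORE_MINUTES, BEFORE_CORES)
after = core_minutes(AFTER_MINUTES, AFTER_CORES)
gain = efficiency_gain(before, after)

print(f"before: {before:,.0f} core-minutes")
print(f"after:  {after:,.0f} core-minutes")
print(f"gain:   {gain:.1f}x")
```

Note that wall-clock time only dropped from 47 to 35 minutes; under this reading, almost all of the gain comes from using far fewer cores.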

Supply Chain decisions

● Apply ML to 5000+ stores data

● Impact: 70% reduction in operational costs; accelerated business growth

Demand Forecasting

● 500K stores, 2TB, 250 pipelines

● Impact: 10X more capacity; 2X faster data pipelines

Predict Bakery food spoilage

● 10+ large Hadoop clusters

● Impact: $100M in fresh food spoilage saved; $900K costs down; time: 7 hr → 40m

Optimize programming

● Could not process 90 days of data with large Hadoop cluster

● Impact: 26% team productivity increase; more data, lower costs, low devops

Databricks Drives New Business Value at 3 Levels

Databricks Value Framework (from The Data Platform up to Business Outcome)

● BUSINESS IMPACTING USE CASES ($$$, more value): Databricks accelerates and expands the realization of value from business-oriented use cases that use net-new capabilities vs. Hadoop

● PRODUCTIVITY ($$): Higher productivity among data scientists & data engineers eliminating manual tasks

● INFRASTRUCTURE ($, less value): Reduced infrastructure spend with the performance of the Databricks runtime

$12.8M in value delivered with Databricks

Databricks customer example: Large U.S. Telco, 156 node cluster

Cloudera costs vs. Databricks value & investment
Units: $, cumulative PV over 3 years

● Potential value with Databricks: $13.8 M

● Cloudera - Cost of inaction: -$18.7 M

● Investment - Databricks, migration & cloud: -$4.9 M

● Net impact (includes cost of both solutions during migration): $12.8 M

Cloudera costs

■ Data center, Hadoop administration, new hardware, licensing

Databricks investment

■ Databricks usage & support

■ Migration

■ Cloud compute

Value of Databricks

■ Removed Cloudera licensing

■ No need to add expensive new hardware for additional capacity

■ Avoided data center costs

■ Avoided Hadoop administration costs

Source: Databricks value model
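
The four dollar figures on this slide appear to be related by simple subtraction. The sketch below checks that reading, assuming the values pair with the labels in the order they appear on the slide. The 1.0 gap between potential value and net impact is derived here rather than quoted; the slide only notes that net impact "includes cost of both solutions during migration".

```python
# All figures in $ millions, cumulative present value over 3 years,
# as quoted on the slide. The label-to-value pairing is an assumption.
cost_of_inaction = 18.7       # Cloudera: data center, admin, hardware, licensing
databricks_investment = 4.9   # Databricks usage & support, migration, cloud compute
potential_value = 13.8
net_impact = 12.8

# Avoided Cloudera cost minus the Databricks investment.
derived_potential = round(cost_of_inaction - databricks_investment, 1)

# Gap between potential value and net impact (derived, not quoted on the slide).
overlap_cost = round(potential_value - net_impact, 1)

print(f"derived potential value: ${derived_potential} M")
print(f"implied overlap cost:    ${overlap_cost} M")
```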

Work with us for a Tailored Value Case for Your Migration

Tailored Financial Analysis

Tailored business case to be produced by answering 4 core questions:

1. How many nodes in your Hadoop environment?

2. How many people support your Hadoop environment?

3. When is your Cloudera renewal?

4. How do you expect your data needs to grow over time?

Customer example

Proven Migration Strategy: Reduce Risk, Costs

Databricks Expert Team

System Integrators

Tools, ISV Partners

AUTOMATION, TOOLS AND PROVEN METHODOLOGIES

Cloud Partners

COMPONENTS TO MIGRATE

● Data + Metadata

● Workloads / Jobs

● Security & governance

● Other tools, integrations

SUCCESSFUL MIGRATION

Strategy Options: Lift & shift (faster, automatable); Transformation (higher impact)

Automated conversion for most workload types

Migration areas: Data Migration, Metastore Migration, SQL Migration, Security, Scheduled Data pulls, Orchestration

● HDFS → Azure ADLS Gen 2, AWS S3

● Hive Databases / Tables / Views → Databricks Tables

● Impala Databases / Tables / Views → Databricks Tables

● HDFS → Spark SQL Databricks Notebooks

● Hive Queries → Spark SQL Databricks Notebooks

● Spark Queries → Databricks Notebooks

● Sentry permissions / Ranger policies → Databricks permissions

● HDFS access permissions → AWS IAM, ADLS ACLs

● Sqoop statements → Databricks compatible PySpark code

● Oozie Jobs → Airflow DAGs & Databricks Jobs
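
The source-to-target pairs on this slide can be read as a lookup table for a first-pass migration inventory. The sketch below is illustrative only: the pairing assumes sources and targets match up in slide order, the key spellings and the `plan` helper are hypothetical, and "HBase tables" in the usage line is an invented component that does not appear on the slide.

```python
# Hadoop-side component -> Databricks-side target, read off the slide in order.
# The pairing is an assumption: the slide text was flattened during extraction.
MIGRATION_TARGETS = {
    "HDFS data": "Azure ADLS Gen 2, AWS S3",
    "Hive databases/tables/views": "Databricks Tables",
    "Impala databases/tables/views": "Databricks Tables",
    "Hive queries": "Spark SQL Databricks Notebooks",
    "Spark queries": "Databricks Notebooks",
    "Sentry permissions / Ranger policies": "Databricks permissions",
    "HDFS access permissions": "AWS IAM, ADLS ACLs",
    "Sqoop statements": "Databricks compatible PySpark code",
    "Oozie jobs": "Airflow DAGs & Databricks Jobs",
}


def plan(inventory):
    """Split an inventory into components with a known target and the rest."""
    mapped = {c: MIGRATION_TARGETS[c] for c in inventory if c in MIGRATION_TARGETS}
    unmapped = [c for c in inventory if c not in MIGRATION_TARGETS]
    return mapped, unmapped


# "HBase tables" is a hypothetical component with no target on the slide.
mapped, unmapped = plan(["Oozie jobs", "Sqoop statements", "HBase tables"])
for component, target in mapped.items():
    print(f"{component} -> {target}")
print("needs manual review:", unmapped)
```

Anything that lands in the unmapped list is a candidate for the "Other tools, integrations" bucket and would need its own assessment.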

Typically, customers save 55-66% in costs and shorten timelines by 2-3x by using automation tools

Manual Migration: 17-20 weeks

● Assessment & Design

● Data Migration

● Workloads Migration, Validation

● Cutover Operations

Using Automation: 8 weeks

● Accelerated Assessment & Design

● Accelerated Data & Workloads Migration, Validation

● Cutover Operations

* Typical implementation scenario ~ 4 PB of Data and 3000 jobs with mixed workloads considered
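
The quoted timeline reduction can be checked against the week counts on this slide. This is a minimal sketch, assuming the 17-20 week figure belongs to the manual path and the 8 week figure to the automated one; it gives roughly 2.1x to 2.5x, at the lower end of the quoted 2-3x range.

```python
# Week counts as quoted on the slide; which path each belongs to is an assumption.
MANUAL_WEEKS = (17, 20)   # manual migration, low and high estimate
AUTOMATED_WEEKS = 8       # migration using automation


def speedup_range(manual, automated):
    """Return the (low, high) ratio of manual to automated duration."""
    low, high = manual
    return low / automated, high / automated


low, high = speedup_range(MANUAL_WEEKS, AUTOMATED_WEEKS)
weeks_saved = tuple(w - AUTOMATED_WEEKS for w in MANUAL_WEEKS)

print(f"timeline reduction: {low:.1f}x to {high:.1f}x")
print(f"weeks saved: {weeks_saved[0]} to {weeks_saved[1]}")
```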

Our Partner Ecosystem will Accelerate Migrations

● ISV Partners and Migration Tools

● Security

● Governance

● Consulting & SI Partners

● Cloud

● Databricks Migration SWAT team + CS Packaged Services for Migration

Customized Hadoop Migration Success Plan with a Free Expert-led Assessment

1. Pre-questionnaire + Discovery, education workshops led by experts

▪ Learn about how Databricks works and how your current workloads, tools and processes map and transform in the future state in cloud

2. Technical, Use-case and Business Value analysis

▪ High level current and future state architecture, discuss use-cases and prioritize them, understand how $$ value is driven with the migration

3. Proposal and Recommendations for path forward

▪ The expert team will summarize all the findings and walk through the proposed costs, business value summary and recommended migration plan

Databricks Experts Know Hadoop

▪ More than 100 years of combined experience in Hadoop

▪ Practitioners, Architects, Engineers, and Consultants, Open Source Contributors and Committers

▪ Expertise with all Hadoop ecosystem components and distributions

Hadoop migration to Databricks - recap

Why - Costs, Productivity, Innovation → Business Impact

Your competitors and market leaders are doing it NOW

Databricks experts and automation strategy can help you migrate faster, with much lower cost and risk

Thank you

Please visit databricks.com/migration