Time's Up! Getting Value from Big Data Now

Page 1: Time's Up! Getting Value from Big Data Now

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: Time's Up! Getting Value from Big Data Now

The Briefing Room

Time's Up! Getting Value from Big Data Now

Page 3: Time's Up! Getting Value from Big Data Now

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Page 4: Time's Up! Getting Value from Big Data Now

Mission

• Reveal the essential characteristics of enterprise software, good and bad

• Provide a forum for detailed analysis of today’s innovative technologies

• Give vendors a chance to explain their product to savvy analysts

• Allow audience members to pose serious questions... and get answers!

Page 5: Time's Up! Getting Value from Big Data Now

Big Integration

• Old infrastructure lacking

• New pipes are needed

• Well begun is half done!

Page 6: Time's Up! Getting Value from Big Data Now

Analyst

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor

Page 7: Time's Up! Getting Value from Big Data Now

CASK

• CASK offers a unified integration platform for big data applications and data lakes

• Its CDAP architecture provides data containers, program containers and application containers for data and applications on Hadoop

• CASK also offers Hydrator for building and managing data pipelines and data lakes, and Tracker for data lake governance

Page 8: Time's Up! Getting Value from Big Data Now

Guest

Jonathan Gray, Founder & CEO of Cask, is an entrepreneur and software engineer with a background in startups, open source and all things data. Prior to founding Cask, Jonathan was a software engineer at Facebook where he drove HBase engineering efforts, including Facebook Messages and several other large-scale projects from inception to production. An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase and is now a core contributor and active committer in the community. Jonathan holds a bachelor’s degree in Electrical and Computer Engineering and Business Administration from Carnegie Mellon University.

Page 9: Time's Up! Getting Value from Big Data Now

Big Data on Tap

cask.co November 1, 2016

The Briefing Room

Jonathan Gray, Founder & CEO

Page 10: Time's Up! Getting Value from Big Data Now

Hadoop Enables New Applications and Architectures

ENTERPRISE DATA LAKES

• Batch and Realtime Data Ingestion: Any type of data from any type of source in any volume
• Batch and Streaming ETL: Code-free self-service creation and management of pipelines
• SQL Exploration and Data Science: All data is automatically accessible via SQL and client SDKs
• Data as a Service: Easily expose generic or custom REST APIs on any data

BIG DATA ANALYTICS

• 360° Customer View: Integrate data from any source and expose through queries and APIs
• Realtime Dashboards: Perform realtime OLAP aggregations and serve them through REST APIs
• Time Series Analysis: Store, process and serve massive volumes of time-series data
• Realtime Log Analytics: Ingestion and processing of high-throughput streaming log events

PRODUCTION DATA APPS

• Recommendation Engines: Build models in batch using historical data and serve them in realtime
• Anomaly Detection Systems: Process streaming events and predictably compare them in realtime to historical data
• NRT Event Monitoring: Reliably monitor large streams of data and perform defined actions within a specified time
• Internet of Things: Ingestion, storage and processing of events that is highly-available, scalable and consistent

Data Applications Drive Meaningful Business Value

Page 11: Time's Up! Getting Value from Big Data Now

But Getting Value from Big Data is Hard

Too much focus on infrastructure and integration, rather than applications and analytics

Divergence of distributions and technologies

Integration silos created by narrow point solutions

Proliferation of projects, services and APIs

Complexity of technologies and new user learning curve

Page 12: Time's Up! Getting Value from Big Data Now

And There Are Many Faces of Hadoop

• Developer: Architecture & Programming. Focused on Apps & Solutions
• Ops: Configuring & Monitoring. Focused on Infrastructure & SLAs
• LOB / Product: Driving Revenue & Decision Making. Focused on Products & Insights
• Data Scientist: Scripting & Machine Learning. Focused on Data & Algorithms

Without a consistent set of tools, IT will not be an effective data enabler for the business

Page 13: Time's Up! Getting Value from Big Data Now

Enter Cask

• Founded in 2011: By early Hadoop engineers from Facebook and Yahoo!
• Raised $37+ Million: Andreessen Horowitz, Safeguard, Battery Ventures and Ignition Partners
• Strategic Investors: AT&T, Cloudera and Ericsson
• Key Customers & Partners: AT&T, Ericsson, Lotame, Salesforce, Cloudera, Hortonworks, MapR, Microsoft, IBM, Tableau…
• Latest Release: 3.5 (Cask Data Application Platform, Cask Hydrator and Cask Tracker)
• NEW CDAP 4 Preview: Featuring Cask Market, the “big data app store”
• Why “Cask”? A Container Architecture that puts Big Data on Tap

Page 14: Time's Up! Getting Value from Big Data Now

The Evolution of the Cask Platform

Convergence of Big Data Apps and Data Integration

CDAPv2: Big Data App Server (“WebLogic for Hadoop”)

• Abstractions & integrations
• Metrics & logs
• Debugging environment

CDAPv3: Big Data Apps + Data Integration (“WebLogic Meets Informatica”)

• Data ingest
• Data pipelines
• Workflows and metadata

CDAPv4: Unified Integration for Big Data (“Unified Big Data Integration”)

• Security & governance
• Self-service environment
• Enterprise integrations

Page 15: Time's Up! Getting Value from Big Data Now

Introducing Cask Data Application Platform (CDAP)

First Unified Integration Platform for Big Data

Platform for distributed apps, bringing together application management with data integration

• 100% open source and built for extensibility

• Supports all major Hadoop distributions and clouds

• Integrates the latest open source big data technologies

Data Lake, Fraud Detection, Recommendation Engine, Sensor Data Analytics, Customer 360

Modern Data Integration, Distributed Application Framework, Self-Service User Experience, Enterprise-grade Security & Governance

Page 16: Time's Up! Getting Value from Big Data Now

Modern Data Integration

• INGEST: any data from any source
• PROCESS: for ETL and machine learning
• EXPLORE: for analytics and data science
• SERVE: any data to any destination

Real-time and Batch • Reliable and Scalable • Simple and Self-Service

Page 17: Time's Up! Getting Value from Big Data Now

Distributed Application Framework

• DEVELOP: rapidly build applications
• TEST: powerful test and CI framework
• DEPLOY: run any apps in any environment
• SCALE: horizontally scale apps and data

Real-time and Batch • Memory, Local, Distributed • Analytics and Applications

Page 18: Time's Up! Getting Value from Big Data Now

Security and Governance

• CAPTURE: store all metadata about your data
• DISCOVER: easily locate any of your data
• TRACK: every audit plus lineage graphs
• ANALYZE: understand usage patterns of data

AUTHENTICATE • AUTHORIZE • ENCRYPT

Page 19: Time's Up! Getting Value from Big Data Now

Self-Service User Experience

Cask Hydrator: A code-free framework to build and run data pipelines

• Drag & drop graphical interface
• Create, debug, deploy and manage
• Separation of logic and execution environment
• Native to Hadoop & Spark; scales out

Cask Tracker: A data discovery tool to explore metadata and usage

• Rich app-level metadata
• Track lineage and audits
• Analyze usage of datasets
• MDM integration framework

Page 20: Time's Up! Getting Value from Big Data Now

The CDAP Architecture

Applications

• Programs: MapReduce, Spark, Tigon, Workflow, Service, Worker
• Datasets: Table, Avro, Parquet, Timeseries, OLAP Cube, Geospatial, ObjectStore
• Metadata

Key characteristics:

• Application Container Architecture
• Reusable Programming Abstractions
• Global User and Machine Metadata
• Highly Extensible Plugin Architecture

Page 21: Time's Up! Getting Value from Big Data Now

CDAP Enables the Full Big Data Application Lifecycle

Single framework for building and running data apps and data lakes on Hadoop and Spark

Rapid Development

• Standardization, deep integrations, tools and docs
• Separation of app logic from data logic and integration logic
• Conceptual integrity within applications and consistency across environments

Production Operations & Governance

• Simplified packaging, deployment and monitoring of apps on Hadoop
• Enhanced security and governance with centralized metrics and logs
• Tracking and exploration of metadata, data provenance, audit trails and usage analytics

Benefits

• Reduces time to develop and deploy big data apps by 80%
• Reduces time to insights and accelerates business value
• Removes barriers to innovation and future-proofs your apps

Page 22: Time's Up! Getting Value from Big Data Now

Customer Success Stories

Health Insurance Provider (offloading clinical / immunization reporting from Netezza)

• Customer Situation: Lack of existing Hadoop expertise and frustration with hand-coding and scripting tools
• Cask Solution: Cask Hydrator for rapid creation of data pipelines and Cask Tracker for data discovery
• POC in 2 days; Production in 2 months

Leading SaaS Platform (taking new real-time, massive scale products to market)

• Customer Situation: Small team and significant technical challenges limit pace of development and solution scale
• Cask Solution: CDAP for real-time ingestion and consistent processing with production operations support
• Development in 1 month; Production in 3 months

Large Telco Enterprise (building a centralized, secured, multi-tenant Data Lake)

• Customer Situation: Multiple teams and technologies with widely varied skillsets and incompatible design choices
• Cask Solution: CDAP for data lake management and orchestration, tightly integrated into existing systems
• Hundreds of Users; Thousands of Pipelines

Page 23: Time's Up! Getting Value from Big Data Now

Awards and Accolades

Cask was Named a Gartner Cool Vendor 2016

Cask was Certified a Great Place to Work 2016

“… for the rest of us who lack the technological chops or patience to make it all work, there’s good news: it will soon get easier, thanks to the work done by the big data pioneers, as well as vendors like Cask …” (Alex Woodie, Managing Editor, Datanami)

“… Cask has tilted the playing field, earning a massive unfair advantage over proprietary point products for data integration and ingest …” (Nik Rouda, Senior Analyst, Enterprise Strategy Group)

“… CDAP is a big win for us … the amount of code we needed to write was minimal with CDAP, and it was much easier and faster than we ever expected …” (Jia-Long Wu, Data Architect, Lotame)

Page 24: Time's Up! Getting Value from Big Data Now

NEW: CDAP 4 — Big Data Apps on Tap!

• Release of CDAP 4 Preview: Available for download now!
• Cask Market: “Big Data App Store”
• Cask Wrangler: Interactive Data Preparation
• Resource Center: Interactive Wizards for Common Tasks
• Reimagined CDAP UI: Rewrite based on React

Page 25: Time's Up! Getting Value from Big Data Now

Cask Market: The “App Store for Big Data”

• Goal: Time to value in minutes with no existing experience
• Application and Library Ecosystem with pre-built Hadoop solutions, reusable templates, and third-party plugins
• Available from anywhere inside the CDAP UI with a click
• Initially, everything in the Cask Market has been bootstrapped by Cask based on ongoing work across our customers, is 100% open source and available on GitHub
• Eventually, developers and ISVs will be able to showcase and market their own applications and libraries (e.g., Graylog)

Cask Market includes Interactive, Guided Wizards for Configuring Pre-Built Templates

NEW: CDAP 4 — Big Data Apps on Tap!

Page 26: Time's Up! Getting Value from Big Data Now

Cask Resources

• Building Data Pipelines on Hadoop with Cask Hydrator
• Data Lake Webinar
• Introduction to Cask Hydrator
• CDAP - Containers on Hadoop
• CDAP Extensions - Cask Hydrator and Cask Tracker
• ESG Solution Spotlight
• CDAP Technical Concepts (video)
• Cask / Cloudera Solution Brief

Page 27: Time's Up! Getting Value from Big Data Now

Summary

● CDAP provides the first unified integration platform for big data

● Cask Hydrator and Cask Tracker are visual extensions of CDAP for self-service access

● CDAP empowers enterprise IT to deliver faster time to value for Hadoop and Spark, from prototype to production

● Cask Market is a “big data app store” available in CDAP 4 with pre-built apps, pipelines, plugins

● CDAP is 100% open source, highly extensible, enterprise-ready, and commercially supported

Big Data on Tap

Page 28: Time's Up! Getting Value from Big Data Now

For more information, go to: cask.co

Thanks!

Page 29: Time's Up! Getting Value from Big Data Now

Perceptions & Questions

Analyst: Robin Bloor

Page 30: Time's Up! Getting Value from Big Data Now

Big Data Foundations?

Robin Bloor, PhD

Page 31: Time's Up! Getting Value from Big Data Now

Neither Hadoop Nor Spark Is a Solution

However, both are useful and increasingly versatile components for Big Data applications

Page 32: Time's Up! Getting Value from Big Data Now

The Evolution of the Little Elephant

• Hortonworks: Apache pure play. No apparent vision.

• Cloudera: Some proprietary components (Cloudera Manager, Impala, Cloudera Search). Vision is corporate data hub(?)

• MapR: Also some proprietary components (MapR-FS, MapR Streams, MapR-DB)

• And then there’s the cloud.

Page 33: Time's Up! Getting Value from Big Data Now

The Ship of Fools

Until Hadoop’s direction is controlled by a single “captain”, we may have to tolerate the ship of fools

Page 34: Time's Up! Getting Value from Big Data Now

The “Big Data Hype Cycle” Is Misleading

• Big Data is an ecosystem, not a technology – which distorts this graph

• Some analytics applications have experienced “absurd acceleration”

• Hadoop is, in many instances, a laggard, and Spark too

• Nevertheless, we seem to be exiting “the trough”

Page 35: Time's Up! Getting Value from Big Data Now

Data lake or Governance Hub?

Page 36: Time's Up! Getting Value from Big Data Now

The System Management Issue

[Diagram labels: Mobile Devices, Desktops, Servers; IoT; The Cloud; Archive; Data Stores; Data Assaying; Data Capture; Real-Time Streaming?; Data Mgt; Data Serving; The Prospecting Domain; Apps; Data Life Cycle Mgt; Staging Area (Hadoop?); System Management]

Page 37: Time's Up! Getting Value from Big Data Now

The Fundamental Issue

Big Data does not really have a foundation. Neither, imho, does the Data Lake.

Luckily, there are third parties…

Page 38: Time's Up! Getting Value from Big Data Now

• Regarding Hadoop, do you have any “preferred components”?

• How do you stay current with the various distros? Backward compatibility? Can a customer upgrade at will?

• How does your technology impact performance (if at all)?

• Do you provide a consultancy service?

Page 39: Time's Up! Getting Value from Big Data Now

• Which companies/services do you regard as competitive?

• Do you have any specific partners?

• What does an implementation look like?

Page 40: Time's Up! Getting Value from Big Data Now
Page 41: Time's Up! Getting Value from Big Data Now

THANK YOU for your ATTENTION!

Some images provided courtesy of Wikimedia Commons