
May 23, 2016 | Confidential

Tech Primer: Big Data In the Cloud

Hannah Smalltree, Cazena

Big Data & Cloud Expo

New York, June 2016


Slide #2 | Confidential

Agenda

• Why Manage and Analyze Big Data in the Cloud?

• Categories – Cloud and Emerging Data Categories

• Criteria – Picking the Best Solution for Your Needs

• Use Cases – How Technologies Are Being Used

Slide #3


Hannah Smalltree, Director, Cazena

Former Editorial Director/Reporter, TechTarget

[email protected]

Slide #5

Why (or Why Not) Cloud for Big Data?

On-Prem:

• Existing architecture

• Data sources (on-prem)

• Existing processes

• Security perceptions

• Cost

• Status quo

Hybrid: Best (or worst!) of both worlds

Cloud:

• Elasticity (volume, compute)

• Data sources (cloud)

• Automation

• Sharing (resources, data)

• Cost

• New capabilities

Slide #6

Shifting Data Gravity

Slide #7

How Companies Use the Cloud

• Offload compute- or storage-intensive workloads

• Create flexible sandboxes and self-serve analytics environments

• Improve data access and performance for employees and stakeholders

• Reduce costs for disaster recovery, testing/dev and other functions

• Collect, store and analyze data generated in the cloud

• Share and monetize data with partners/customers

Slide #8

Big Data Services Cross Categories

• Software as a Service: Apps (Salesforce, Workday, etc.)

• Data: BI, Analytics, Analytic Applications

• Platform as a Service (Middleware): 16 categories of xPaaS offerings (Application, Database, Integration, Communication, Data…)

• Infrastructure as a Service: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform

• Hosted private clouds

Big Data Services (spanning the categories above)

Slide #9

Cloud Databases

• Transactional: Power sites, apps, etc.

• Analytical: Data warehouses, data lakes, big data platforms, etc.

• SQL, Hadoop, NoSQL, in-memory, etc.

• Solutions often include storage, processing, integration, visualization

What is a Data Lake?

Slide #10

As a Service Trend…

• Big Data as a Service

• Data Warehouse as a Service

• Hadoop as a Service

• Data Lake as a Service

• Spark as a Service

• Managed Services

• Cloud Service

• Database Platform as a Service

• Data Management as a Service

• Cloud Application Services

Slide #11

Definitions…

• Gartner, Market Guide dbPaaS (June 2015):

A database platform as a service (dbPaaS) is a database management system (DBMS) or data store engineered as a scalable, elastic, multitenant service, with a degree of self-service and sold and supported by a cloud service provider (CSP), or a third-party software vendor on CSP infrastructure.

• Gartner, Cool Vendors in DBMS (April 2016):

“Enter the concept of ‘big data as a service,’ where vendors are combining components of analytic platforms in the cloud with multiple processing engines, hybrid on-premises integration, and secure data movement. The use of such services can speed up the adoption of analytics in the cloud, address skills shortages within the enterprise, and make it easier to transition from, and integrate with, existing on-premises investments.”

• Forrester, Big Data Tech Radar (January 2016):

Big-data-as-a-service technology provides capture management and operations capability delivered as-a-service in the public or hybrid cloud. Uses generally include SQL analytics (data warehouse or data mart), data lake, machine learning, and operational analytics application support.

Common Attributes

☑ Data processing

☑ Automated provisioning

☑ Faster implementation

☑ Support, service

☑ Subscription

☑ Maintenance

? Data movement

? Integration

? Security

? Ease of use

Slide #12

Best Advice: Focus on Requirements!

Best fit for workloads, provisioning

Security, encryption and governance

Integration with existing data flow

Data movement, connectors, etc.

Operations, support, maintenance, etc.

Contracts, pricing model

Futures, growth, lock-in

Slide #13

Sample Cloud Use Cases

Consolidate data

Collect cloud, SaaS, purchased data

Share and monetize data

Analytics and data science sandbox

Offload EDW jobs

Disaster recovery

Data pipeline

Log, sensor and IoT data

Slide #14

What to Consider During Evaluations…

Slide #15

Evaluation Considerations: Workloads

CRITERIA

• Data and Analytic Workload: Data type, volume, velocity, source, format, frequency…

• Analytic Requirements: Functions, tools, applications, API/dev requirements…

• Processing Engine(s): Price-performance, fit for purpose, maintenance, stability…

• Scalability and Growth: Likely growth in workload or analytic functions…

• Security and Governance: Compliance, encryption, tenancy, access, logs, management…

Slide #16

Evaluation Considerations: Integration

CRITERIA

• Data Collection, Movement & Pipeline: Ingest, structure, storage, frequency, movement…

• Data Quality, Prep, Integration: Format, integration, identifiers, MDM, quality…

• Existing Infrastructure: Systems, processes, standards, integration, firewall…

• Access and Delivery: User locations, tools, applications, APIs, futures…

Slide #17

Evaluation Considerations: Operational

CRITERIA

• Implementation (“Time to Analytics”): Provisioning, project timeline, risk points, infra vs. analytics

• Skills: Availability, training, learning curve, culture…

• Agility: Implementation, value, change, fast fail…

• Pricing, Budget: Models, sourcing, lock-in, contingencies…

• Service, Support: Level, method, boundaries, components…

• Vendor: Stability, heritage, culture, agility…

• Success Metrics: Hard, soft, business, incremental, agility…
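One common way to turn the workload, integration and operational criteria into a side-by-side comparison is a weighted scoring matrix. The sketch below is a generic weighted-sum example, not a method from this deck: the criterion keys, the weights, the 1 to 5 scores and the names "Option A" and "Option B" are all hypothetical placeholders to replace with your own requirements.

```python
from dataclasses import dataclass

# Hypothetical weights for a subset of the criteria above; they should sum to 1.0.
WEIGHTS = {
    "workload_fit": 0.30,
    "security_governance": 0.25,
    "integration": 0.20,
    "time_to_analytics": 0.15,
    "pricing": 0.10,
}


@dataclass
class Option:
    name: str
    scores: dict[str, int]  # criterion -> score from 1 (poor) to 5 (excellent)


def weighted_score(option: Option, weights: dict[str, float]) -> float:
    """Weighted sum of an option's scores; every weighted criterion must be scored."""
    missing = set(weights) - set(option.scores)
    if missing:
        raise ValueError(f"{option.name} is missing scores for: {sorted(missing)}")
    return sum(weights[c] * option.scores[c] for c in weights)


def rank(options: list[Option], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Rank options from highest to lowest weighted score."""
    results = [(o.name, round(weighted_score(o, weights), 2)) for o in options]
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    options = [
        Option("Option A", {"workload_fit": 4, "security_governance": 3, "integration": 5,
                            "time_to_analytics": 4, "pricing": 3}),
        Option("Option B", {"workload_fit": 5, "security_governance": 4, "integration": 3,
                            "time_to_analytics": 3, "pricing": 4}),
    ]
    for name, score in rank(options, WEIGHTS):
        print(f"{name}: {score}")
```

Raising an error for unscored criteria is deliberate: a silently missing score would otherwise count as zero and skew the ranking.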

Slide #18

Recommended Reading

• Forrester – Big Data Tech Radar, Q1 2016

– Big Data Options in the Cloud, Gualtieri & Staten, Dec 2014

• Gartner – Cool Vendors in DBMS and Big Data, April 2016

– Market Guide for Database Platform as a Service, June 2015

– Answering Big Data's 10 Biggest Planning & Implementation Questions, January 2015

– Toolkit: Big Data Business Opportunities From Over 100 Use Cases, July 2013

• Eckerson Group – Selecting a Big Data Platform: Building a Data Foundation for the Future, Dec 2015

– Big Data Analytics Benchmark Report, May 2015

• Others by request! ([email protected])

Slide #19

Q&A and Thank You!

Hannah Smalltree

[email protected]

Cazena

Big Data as a Service

Cazena makes it easy for enterprises to process big data in the cloud, offering data marts, data warehouses and data lakes as a service, securely connected into existing enterprise infrastructure.

Slide #20

Additional Cloud Big Data Use Cases (appendix for discussion and sharing)

Slide #21

Consolidate Data for Agility, Access

[Diagram: Data Sources, Cloud Data Sources, Data Mart, BI/Analytics Tools]

• Consolidate data from multiple cloud and on-premises systems in one place for analytics

• Ensure data is easily accessible

Slide #22

Data Warehouse Offload to the Cloud

[Diagram: Enterprise Data Warehouse, ETL, Data Mart or Data Lake, BI/Analytics Tools]

• Offload data- or compute-intensive workloads from the existing data warehouse to the cloud

• Free capacity in on-premises systems

Slide #23

Data Sharing and Monetization

[Diagram: Enterprise Data Warehouse, Data Marts, Customer, Partner, Colleague]

• Provide a separate, secure environment for external users/partners; enable new analytic capabilities

• Monetize data by selling it to customers or creating/enhancing data products

Slide #24

Data Lake or Mart for External Data

[Diagram: Cloud Data Sources, SaaS or Mobile Apps, Purchased Datasets, Data Mart or Data Lake, Enterprise Data Warehouse, BI/Analytics Tools]

• Leverage new data sources: web, mobile, social, etc.

• Store, manage and analyze cloud data in the cloud, reducing the cost of managing it on-premises

• Or use the cloud to collect and pre-process data before bringing it back to on-premises systems
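The last option, pre-processing data in the cloud before bringing it back on-premises, often amounts to reducing raw records to a much smaller summary. Here is a minimal sketch under stated assumptions: the record layout (timestamp, source channel, event type), the daily-count summary and the function name `preaggregate` are hypothetical, and timestamps are assumed to already be in UTC.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw record layout: (ISO-8601 timestamp in UTC, source channel, event type).
RawEvent = tuple[str, str, str]
SummaryKey = tuple[str, str, str]  # (date, source channel, event type)


def preaggregate(events: list[RawEvent]) -> dict[SummaryKey, int]:
    """Collapse raw events into daily counts per (date, source, event type).

    In this sketch only the compact summary would be transferred back to
    on-premises systems; the raw records stay in cloud storage.
    """
    counts: Counter = Counter()
    for ts, source, event_type in events:
        day = datetime.fromisoformat(ts).date().isoformat()
        counts[(day, source, event_type)] += 1
    return dict(counts)


if __name__ == "__main__":
    raw = [
        ("2016-05-23T10:15:00", "web", "page_view"),
        ("2016-05-23T11:02:00", "web", "page_view"),
        ("2016-05-23T11:05:00", "mobile", "signup"),
        ("2016-05-24T09:00:00", "web", "page_view"),
    ]
    summary = preaggregate(raw)
    print(f"{len(raw)} raw records -> {len(summary)} summary rows")
```

With realistic volumes the gap between raw rows and summary rows is what reduces transfer and on-premises storage costs; the right granularity depends on which analyses still need the raw detail.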

Slide #25

Data Science Sandbox

[Diagram: On-premises Datasets, Cloud Data Sources, New Datasets, Data Mart or Data Lake, Analytical Tools, Statistical Tools (R, RStudio, etc.)]

• Self-service environment for analysts and data scientists

• Track utilization and costs separately from production systems
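Tracking sandbox costs separately usually relies on tagging each resource with the environment it belongs to and totaling usage per tag. A minimal sketch, assuming hypothetical usage records (resource ID, environment tag, compute-hours, hourly rate); real cloud billing exports have their own formats, which this does not model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    resource_id: str
    environment: str     # hypothetical tag, e.g. "sandbox" or "production"
    compute_hours: float
    hourly_rate: float   # assumed price per compute-hour


def cost_by_environment(records: list[UsageRecord]) -> dict[str, float]:
    """Total cost (hours x rate) per environment tag; untagged usage is grouped as "untagged"."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        env = r.environment or "untagged"
        totals[env] += r.compute_hours * r.hourly_rate
    return {env: round(total, 2) for env, total in totals.items()}


if __name__ == "__main__":
    records = [
        UsageRecord("cluster-prod", "production", 720, 2.0),
        UsageRecord("cluster-sandbox", "sandbox", 40, 2.0),
        UsageRecord("notebook-1", "sandbox", 10, 0.5),
        UsageRecord("temp-vm", "", 3, 1.0),
    ]
    print(cost_by_environment(records))
```

Reporting untagged usage as its own bucket, rather than dropping it, makes gaps in tagging visible.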

Slide #26

Data Warehouse Disaster Recovery

[Diagram: Enterprise Data Warehouse, Data Mart or Data Lake, BI/Analytics Tools; a second Enterprise Data Warehouse labeled "Old way" and crossed out]

• Build a disaster recovery environment that scales as the DW grows

• No need to buy upfront capacity

• Replaces the expensive traditional method of duplicating the data warehouse environment
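The "no need to buy upfront capacity" point can be made concrete with a deliberately simplified cost model. It assumes the same hypothetical price per TB-month for both approaches and ignores compute, network transfer and licensing, so it isolates only the effect of sizing for the peak versus paying for the actual size each month. The growth figures and price below are illustrative, not data from this deck.

```python
def provisioned_cost(monthly_tb: list[float], price_per_tb_month: float) -> float:
    """Old way: capacity sized for the largest (peak) size is paid for in every month."""
    return max(monthly_tb) * price_per_tb_month * len(monthly_tb)


def elastic_cost(monthly_tb: list[float], price_per_tb_month: float) -> float:
    """Elastic DR copy: each month is billed only for the data actually held."""
    return sum(monthly_tb) * price_per_tb_month


if __name__ == "__main__":
    growth = [10, 12, 14, 16, 18, 20]  # hypothetical warehouse size (TB) over six months
    price = 100.0                      # hypothetical price per TB-month
    print(provisioned_cost(growth, price))  # 12000.0
    print(elastic_cost(growth, price))      # 9000.0
```

The gap widens the faster the warehouse grows relative to the capacity bought up front; with a flat size, the two models cost the same.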
