May 23, 2016 | Confidential
Tech Primer: Big Data In the Cloud
Hannah Smalltree, Cazena
Big Data & Cloud Expo
New York, June 2016
Slide #2 | Confidential
Agenda
• Why Manage and Analyze Big Data in the Cloud?
• Categories – Cloud and Emerging Data Categories
• Criteria – Picking the Best Solution for Your Needs
• Use Cases – How Techs Are Being Used
Slide #3 | Confidential
Hannah Smalltree, Director, Cazena
Former Editorial Director/Reporter, TechTarget
Slide #4 | Confidential
Level Set!
Why we’re keeping things higher-level for the next 30 minutes…
Data Platforms Map – June 2015. (C) 2015 by 451 Research LLC. All rights reserved.
Get the full map free: https://451research.com/blog/13-have-you-seen-our-data-platforms-map
Slide #5 | Confidential
Why (or Why Not) Cloud for Big Data?
On-Prem
• Existing architecture
• Data sources (on-prem)
• Existing processes
• Security perceptions
• Cost
• Status quo
Cloud
• Elasticity (volume, compute)
• Data sources (cloud)
• Automation
• Sharing (resources, data)
• Cost
• New capabilities
Hybrid
• Best (or worst!) of both worlds
Slide #6 | Confidential
Shifting Data Gravity
Slide #7 | Confidential
How Companies Use the Cloud
• Offload compute- or storage-intensive workloads
• Create flexible sandboxes and self-serve analytics environments
• Improve data access and performance for employees and stakeholders
• Reduce costs for disaster recovery, testing/dev and other functions
• Collect, store and analyze data generated in the cloud
• Share and monetize data with partners/customers
Slide #8 | Confidential
Big Data Services Cross Categories
• Software as a Service
– Apps: Salesforce, Workday, etc.
– Data: BI, Analytics, Analytic Applications
• Platform as a Service (Middleware)
– 16 categories of xPaaS offerings: Application, Database, Integration, Communication, Data…
• Infrastructure as a Service
– Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform
– Hosted private clouds
• Big Data Services
Slide #9 | Confidential
Cloud Databases
• Transactional: Power sites, apps, etc.
• Analytical: Data Warehouses, Data Lakes, Big Data Platforms, etc.
• SQL, Hadoop, NoSQL, in-memory, etc.
• Solutions often include storage, processing, integration, visualization
What is a Data Lake?
Slide #10 | Confidential
As a Service Trend…
• Big Data as a Service
• Data Warehouse as a Service
• Hadoop as a Service
• Data Lake as a Service
• Spark as a Service
• Managed Services
• Cloud Service
• Database Platform as a Service
• Data Management as a Service
• Cloud Application Services
Slide #11 | Confidential
Definitions…
• Gartner, Market Guide dbPaaS (June 2015):
A database platform as a service (dbPaaS) is a database management system (DBMS) or data store engineered as a scalable, elastic, multitenant service, with a degree of self-service and sold and supported by a cloud service provider (CSP), or a third-party software vendor on CSP infrastructure.
• Gartner, Cool Vendors in DBMS (April 2016):
“Enter the concept of ‘big data as a service,’ where vendors are combining components of analytic platforms in the cloud with multiple processing engines, hybrid on-premises integration, and secure data movement. The use of such services can speed up the adoption of analytics in the cloud, address skills shortages within the enterprise, and make it easier to transition from, and integrate with, existing on-premises investments.”
• Forrester, Big Data Tech Radar (January 2016):
Big-data-as-a-service technology provides capture management and operations capability delivered as-a-service in the public or hybrid cloud. Uses generally include SQL analytics (data warehouse or data mart), data lake, machine learning, and operational analytics application support.
Common Attributes
☑ Data processing
☑ Automated provisioning
☑ Faster implementation
☑ Support, service
☑ Subscription
☑ Maintenance
? Data movement
? Integration
? Security
? Ease of use
Slide #12 | Confidential
Best Advice: Focus on Requirements!
Best fit for workloads, provisioning
Security, encryption and governance
Integration with existing data flow
Data movement, connectors, etc.
Operations, support, maintenance, etc.
Contracts, pricing model
Futures, growth, lock-in
Slide #13 | Confidential
Sample Cloud Use Cases
• Consolidate data
• Collect cloud, SaaS, purchased data
• Share and monetize data
• Analytics data science sandbox
• Offload EDW jobs
• Disaster recovery
• Data pipeline
• Log, sensor and IoT data
Slide #14 | Confidential
What to Consider During Evaluations…
Slide #15 | Confidential
Evaluation Considerations: Workloads
CRITERIA
• Data and Analytic Workload: Data type, volume, velocity, source, format, frequency…
• Analytic Requirements: Functions, tools, applications, API/dev requirements…
• Processing Engine(s): Price-performance, fit for purpose, maintenance, stability…
• Scalability and Growth: Likely growth in workload or analytic functions…
• Security and Governance: Compliance, Encryption, Tenancy, Access, Logs, Mgmt…
Slide #16 | Confidential
Evaluation Considerations: Integration
CRITERIA
• Data Collection, Movement & Pipeline: Ingest, structure, storage, frequency, movement…
• Data Quality, Prep, Integration: Format, integration, identifiers, MDM, quality…
• Existing Infrastructure: Systems, processes, standards, integration, firewall…
• Access and Delivery: User locations, tools, applications, APIs, futures…
Slide #17 | Confidential
Evaluation Considerations: Operational
CRITERIA
• Implementation “Time to Analytics”: Provisioning, project timeline, risk points, infra vs. analytics
• Skills: Available?, training, learning curve, culture…
• Agility: Implementation, value, change, fast fail…
• Pricing, Budget: Models, sourcing, lock-in, contingencies…
• Service, Support: Level, method, boundaries, components…
• Vendor: Stability, heritage, culture, agility…
• Success Metrics: Hard, soft, business, incremental, agility…
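One possible way to put the workload, integration and operational criteria above to work is a simple weighted scoring matrix. The sketch below is illustrative only: the three category names come from the evaluation slides, but the weights, the 1 to 5 scoring scale, and the candidates "Option A" and "Option B" are hypothetical placeholders, not recommendations from the talk.

```python
from dataclasses import dataclass

# Hypothetical weights per evaluation category (assumed to sum to 1.0).
# The category names mirror the three "Evaluation Considerations" slides.
WEIGHTS = {
    "workloads": 0.4,
    "integration": 0.3,
    "operational": 0.3,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # category -> score from 1 (poor fit) to 5 (strong fit)


def weighted_score(candidate, weights=WEIGHTS):
    """Return the weighted sum of a candidate's category scores."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} has no score for: {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates, weights=WEIGHTS):
    """Return candidates ordered from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: weighted_score(c, weights), reverse=True)


# Made-up scores for two made-up options, purely to show the mechanics.
candidates = [
    Candidate("Option A", {"workloads": 4, "integration": 2, "operational": 3}),
    Candidate("Option B", {"workloads": 3, "integration": 4, "operational": 4}),
]

for c in rank(candidates):
    print(f"{c.name}: {weighted_score(c):.2f}")
```

In practice the weights would be set from your own requirements before any vendor is scored, which keeps the focus on requirements rather than on features.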
Slide #18 | Confidential
Recommended Reading
• Forrester – Big Data Tech Radar, Q1 2016
– Big Data Options in the Cloud, Gualtieri & Staten, Dec 2014
• Gartner – Cool Vendors in DBMS and Big Data, April 2016
– Market Guide for Database Platform as a Service, June 2015
– Answering Big Data's 10 Biggest Planning & Implementation Questions, January 2015
– Toolkit: Big Data Business Opportunities From Over 100 Use Cases, July 2013
• Eckerson Group – Selecting a Big Data Platform: Building a Data Foundation for the Future, Dec 2015
– Big Data Analytics Benchmark Report, May 2015
• Others by request! ([email protected])
Slide #19 | Confidential
Q&A and Thank You!
Hannah Smalltree
Cazena
Big Data as a Service
Cazena makes it easy for enterprises to process big data in the cloud, offering data marts, data warehouses and data lakes as a service, securely connected into existing enterprise infrastructure.
Slide #20 | Confidential
Additional Cloud Big Data Use Cases (appendix for discussion and sharing)
Slide #21 | Confidential
Consolidate Data for Agility, Access
[Diagram: Data Sources, Cloud Data Sources (three), Data Mart, BI/Analytics Tools]
• Consolidate data from multiple cloud and on-premises systems in one place for analytics
• Ensure data is easily accessible
Slide #22 | Confidential
Data Warehouse Offload to the Cloud
[Diagram: Enterprise Data Warehouse, ETL, Data Mart or Data Lake, BI/Analytics Tools]
• Offload data or compute-intensive workloads from existing data warehouse to cloud
• Free capacity in on-premises systems
Slide #23 | Confidential
Data Sharing and Monetization
• Provide separate, secure environment for external users/partners; enable new analytic capabilities
• Monetize data by selling to customers or creating/enhancing data products
[Diagram: Enterprise Data Warehouse, Data Marts, Customer, Partner, Colleague]
Slide #24 | Confidential
Data Lake or Mart for External Data
[Diagram: Cloud Data Sources, SaaS or Mobile Apps, Purchased Datasets, Data Mart or Data Lake, Enterprise Data Warehouse, BI/Analytics Tools]
• Leverage new data sources: web, mobile, social, etc.
• Store, manage and analyze cloud data in the cloud, reduce costs of managing on-premises
• Or use cloud to collect and pre-process data before bringing back to on-premises systems
Slide #25 | Confidential
Data Science Sandbox
[Diagram: On-premises Datasets, Cloud Data Sources, New Datasets, Data Mart or Data Lake, Analytical Tools, Statistical Tools (R, RStudio, etc.)]
• Self-service environment for analysts, data scientists
• Track utilization and costs separately from production systems
Slide #26 | Confidential
Data Warehouse Disaster Recovery
[Diagram: Enterprise Data Warehouse, BI/Analytics Tools, Data Mart or Data Lake; the old way, a second Enterprise Data Warehouse, is crossed out]
• Build a Disaster Recovery environment that scales as DW grows
• No need to buy upfront capacity
• Replaces expensive traditional method of duplicating data warehouse environment