The Cloud Data Platform for Insights-Driven Enterprises
Today’s Speakers
Craig Carl, Director of Solutions Architecture
Xing Quan, Senior Director of Product Management
Big Data Disrupts Markets
What do they have in Common?
Design products that fit customers according to their DNA
Program recommendations and commissioning new content
Accurate estimated time of arrival
Price suggestions for hosts
New stores in very close proximity
Search for similar images
Challenges Implementing Big Data
• Variety (40%) and Volume (14%) are the main drivers of the big data explosion
– Many disjointed sources
• Data silos only provide partial answers
• Deploying big data on-premises:
– Is complex to maintain and operate
– Is expensive
– Requires expertise
– Is unable to scale
Collect multiple data sources
Make them usable
Make it available to the business
Big Data
Why Spark?
Spark Streaming: real-time
Spark SQL: structured, ad hoc
MLlib: machine learning
GraphX: graph processing
Spark Core: Scala, Python
• Spark processes data in memory, which is faster than reading from and writing to traditional hard disk drives (HDDs)
• It has a fully featured ecosystem of products and use cases; in particular, it is tailored toward data scientists and toward algorithm and machine learning development
• It has a very simple API
• It’s open source and helps you avoid vendor and technology lock-in
Hadoop and Spark Model & Issues
• Hadoop/Spark puts compute and storage together within a compute node
• Forces compute and storage to scale together, which is not ideal
• The cluster must be persistently on or else the data is inaccessible
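As a rough illustration of why scaling compute and storage together is "not ideal", the sketch below sizes a cluster under both models. The per-node capacities and the workload are hypothetical figures chosen for the example, not numbers from this deck.

```python
import math

def nodes_coupled(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Nodes required when every node bundles compute and storage (C+S)."""
    by_compute = math.ceil(cores_needed / cores_per_node)
    by_storage = math.ceil(tb_needed / tb_per_node)
    # The larger requirement dictates cluster size; the other resource sits idle.
    return max(by_compute, by_storage)

def nodes_decoupled(cores_needed, cores_per_node):
    """Compute nodes required when the data lives in a separate object store."""
    return math.ceil(cores_needed / cores_per_node)

# Hypothetical storage-heavy, compute-light workload.
cores_needed, tb_needed = 72, 240
# Hypothetical node shape.
cores_per_node, tb_per_node = 36, 12

coupled = nodes_coupled(cores_needed, tb_needed, cores_per_node, tb_per_node)
decoupled = nodes_decoupled(cores_needed, cores_per_node)
idle_cores = coupled * cores_per_node - cores_needed

print(f"coupled nodes: {coupled}")
print(f"decoupled compute nodes: {decoupled}")
print(f"idle cores in the coupled cluster: {idle_cores}")
```

In this made-up case, storage demand forces the coupled cluster to run far more nodes than the compute load needs, and those nodes must stay on for the data to remain reachable.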
[Diagram: a cluster of twelve nodes, each combining compute and storage (C+S)]
A Modern Data Platform
• Leverage the cloud
– On-demand and elastic compute
– Scale-out object storage
• Expand and contract based on workloads
• Turnkey service, rather than managed software or hardware
– Shorten time to value
• High degree of automation, orchestration and self-service enablement
– Reduce costs and complexity
Big Data
Ephemeral
Automation
Self-service
Orchestration
Oracle Bare Metal Cloud Services
Craig Carl, Director of Solutions Architecture, Bare Metal Cloud
• Over 600 people in Seattle and Northern California
• Hundreds of experts at delivering high-scale production cloud products
– AWS, Azure, Google, Joyent, F5, Salesforce
• To a person, we’re passionate about solving large-scale distributed compute problems; passionate people build amazing products
• Combined with Oracle’s decades of success in the enterprise market
Deep cloud engineering experience
Oracle Bare Metal Cloud Services
Industry’s first Bare Metal Cloud Service (with Virtual Machines, of course!)
Fully Dedicated
Industry’s first fully dedicated instances: no hypervisor, agents, noisy neighbors or shared resources
Built for Enterprise Apps
Built to support demanding enterprise applications
Performance-First
Performance-first approach with significantly higher performance than existing cloud options
Pay-as-you-go Pricing
Pay by the hour for everything: compute, IP address and block storage. Burst up or down quickly.
Automated and API Driven
RESTful APIs, SDKs, orchestration, CLIs, complete and public documentation
Fast Provisioning
Spin up bare metal instances in less than 5 minutes, virtual instances in 90 seconds
Mix Bare Metal and virtual instances
Identical user experience between Bare Metal and Virtual instances
OBMCS Fundamentals: Availability Domains Regional Model
Sub-millisecond latency between ADs
10 Gb/sec between instances, both inter-AD and intra-AD
• Multiple instance types
– Standard – 256 GB RAM
– High I/O – 12.8 TB NVMe SSD, 512 GB RAM
– Dense I/O – 28.8 TB NVMe SSD, 512 GB RAM
– 1, 2, 4, 8, 16 core VMs (7GB mem/core)
• Bare Metal instance shapes
– 36 cores 2.3 GHz Intel® Xeon® processor E5-2600 v3
– 10Gb network
• Images
– Oracle Linux, CentOS, Ubuntu, Windows
– Support for custom images and custom OSes
Compute
• Single node Oracle database
– High and Dense instances
• 2 node Oracle RAC
• Exadata
– Quarter
– Half
– Full rack
DB Systems
Services: Oracle BMCS vs. AWS
High Performance Compute (DenseIO compared to AWS I2.8xlarge)
8-core Virtual Machine (compared to AWS M4.2xlarge)
Outbound Data Transfer
$ 86% lower
$ 38% lower
2.25x cores
$ 21% lower
2x RAM
11.5x IOPS
4.5x storage
Similar RAM
Same cores
1 pricing dimension vs. 4
Free inter-AD
10x free egress
Bare Metal compute
10Gb network
No oversubscription
Low latency network
NVMe SSDs
No noisy neighbors
Object store
Oracle RDBMS
Simple
• A complete data platform solution
• No need to manage infrastructure
• Self-service data access across the enterprise
Agile and Fast
• Spark and Hadoop clusters in minutes
• Builds on Oracle Bare Metal Cloud performance advantages
• Get business insights faster
Cost
• Stand up your Spark or Hadoop infrastructure at a fraction of the cost
• Reduce operation and management cost
Qubole is a Turnkey Big Data Service on Oracle Bare Metal Cloud
Built for Anyone who Uses Data
Analysts | Data Scientists | Data Engineers | Data Admins
Big Data Your Way.
Qubole automates, controls and orchestrates your big data workloads so that you can optimize performance, cost and scale.
A Single Platform for Any Use Case
ETL & Reporting | Ad Hoc Queries | Machine Learning | Streaming | Vertical Apps
Open Source Engines, Optimized for the Cloud
Native Integration with Oracle Bare Metal Cloud Service
Leverages the Oracle Cloud Platform’s speed and performance
Spin up real-time streaming data processing on-demand
115% faster than on-premises
QUBOLE DATA SERVICE (QDS) SPARK SQL ON ORACLE CLOUD PLATFORM INFRASTRUCTURE
• 115% faster on reporting queries and 50% faster on analytics queries than Cloudera Impala on-premises*
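The slide does not define "faster". As a worked example, assuming "X% faster" means throughput is (1 + X/100) times the baseline, the percentages translate into relative runtimes as sketched below. The function name is illustrative only.

```python
def relative_runtime(percent_faster):
    """Runtime as a fraction of the baseline, assuming 'X% faster'
    means a speedup factor of 1 + X/100 (an interpretation, not a
    definition given in the slides)."""
    speedup = 1 + percent_faster / 100
    return 1 / speedup

reporting = relative_runtime(115)  # reporting queries
analytics = relative_runtime(50)   # analytics queries

print(f"reporting queries: {reporting:.1%} of baseline runtime")
print(f"analytics queries: {analytics:.1%} of baseline runtime")
```

Under that reading, a reporting query would finish in a little under half the baseline time and an analytics query in about two thirds of it.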
What makes us different
User Productivity
• Self-service data access
• Simple Interfaces
• Increased Personas on Oracle BMC
Amplify the Cloud
• Object Store as data lake
• Leverage Network Performance
• Support for all shapes
Automation
• Automatic use of Oracle BMC APIs
• Cluster lifecycle management
• Auto-scaling
• Software Upgrades
Elasticity
• Scale 34x on average
• Reduce TCO by 33%
• Drives scale to Oracle BMC
The Most Scalable Platform
500 PB
Data Processed in the Cloud Monthly
500 Nodes
Largest Spark Cluster in the Cloud
2000
Clusters Started per month
6 PB 80 PB 150 PB 500 PB
Data Driven Companies Use Qubole
Maximize productivity and reduce complexity with automated lifecycle cluster management
Control costs – pay only for what you use with Auto-scaling
Control mixed workloads, multiple clusters and different engines with a single control panel or REST API
Data Engineers and Data Admins
Faster exploration & iteration with an agile infrastructure
Built to adopt existing, new & future technologies – no vendor lock-in
Improve productivity with a collaborative platform
Data Analysts and Data Scientists
Qubole auto-scaling advantage
[Chart: commands per hour and auto-scaled nodes per hour, plotted against fixed five-node and ten-node clusters; x-axis 7 to 17, y-axis 5.0 to 12.5]
Five Node Cluster (fixed): 10% cheaper, but 90% slower
Ten Node Cluster (fixed): 13% faster, but 32% more expensive
Workload fluctuation 60% of the time
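The trade-off between fixed and auto-scaled clusters can be sketched with a toy model. The hourly demand profile, the node limits, and the cost measure (node-hours) are assumptions made for illustration; the model does not reproduce the percentages on this slide or Qubole's actual scaling policy.

```python
def node_hours_fixed(demand, size):
    """Cost of a fixed cluster: `size` nodes run every hour regardless of load."""
    return size * len(demand)

def node_hours_autoscaled(demand, min_nodes, max_nodes):
    """Cost when the cluster tracks demand each hour within [min_nodes, max_nodes]."""
    return sum(max(min_nodes, min(d, max_nodes)) for d in demand)

def unmet_node_hours(demand, size):
    """Demand above a fixed cluster's capacity, i.e. work that gets delayed."""
    return sum(max(0, d - size) for d in demand)

# Hypothetical nodes needed in each of 11 consecutive hours.
demand = [5, 6, 8, 10, 12, 12, 10, 8, 6, 5, 5]

fixed_five = node_hours_fixed(demand, 5)
fixed_ten = node_hours_fixed(demand, 10)
auto = node_hours_autoscaled(demand, min_nodes=5, max_nodes=12)
backlog_five = unmet_node_hours(demand, 5)
backlog_ten = unmet_node_hours(demand, 10)

print(f"fixed 5 nodes:  {fixed_five} node-hours, {backlog_five} node-hours delayed")
print(f"fixed 10 nodes: {fixed_ten} node-hours, {backlog_ten} node-hours delayed")
print(f"auto-scaled:    {auto} node-hours, 0 node-hours delayed")
```

In this toy profile the small fixed cluster is the cheapest but falls behind for most of the day, the large fixed cluster costs the most and still falls behind at peak, and the auto-scaled cluster pays only for the nodes the workload needs.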
Dataflow Diagram
[Diagram components]
User Access: Qubole UI via browser, SDK, ODBC/JDBC
Qubole SaaS Tier: web servers and control logic; database (account and user settings); default Hive metastore
REST API
Customer’s Bare Metal Cloud Tenancy: Oracle Cloud VCN, compartment, Oracle user; ephemeral clusters on Oracle Bare Metal Compute; Oracle Cloud Platform Object Store; persistent storage; DB
Thank You
Get Free Trial | Get Book | Register for a Webinar | Register for Conference
http://bit.ly/DataOpsBook
https://www.dataplatforms.com/
https://www.qubole.com/event/