© 2016 Mesosphere, Inc. All Rights Reserved. 1
@joerg_schad @dcos #smack
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search
Spark Summit East, February 08, 2017
HYPERSCALE MEANS VOLUME AND VELOCITY
Processing modes: Batch, Micro-Batch, Event Processing
Latency scale: Days, Hours, Minutes, Seconds, Microseconds
Reports what has happened using descriptive analytics
Solves problems using predictive and prescriptive analytics
Billing, Chargeback
Product Recommendations
Real-time Pricing and Routing
Real-time Advertising
Predictive User Interface
SMACK stack
EVENTS: Ubiquitous data streams from connected devices (Sensors, Devices, Clients)
INGEST: Apache Kafka. Ingest millions of events per second
STORE: Apache Cassandra. Distributed & highly scalable database
ANALYZE: Apache Spark. Process data in real time and in batch
ACT: Akka. Visualize data and build data-driven applications
All running on DC/OS
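The ingest, store, analyze, act flow can be sketched in a few lines of plain Python. This is only an illustration of how the four stages hand data to each other: the in-memory queue and dictionary are hypothetical stand-ins, not Kafka, Cassandra, Spark, or Akka APIs, and the event fields (`device`, `speed_kmh`) and the 50 km/h limit are invented for the example.

```python
from collections import defaultdict, deque
from statistics import mean

# Hypothetical in-memory stand-ins; none of this is a real Kafka,
# Cassandra, Spark, or Akka API.
ingest_log = deque()       # INGEST: append-only event log (Kafka's role)
store = defaultdict(list)  # STORE: readings keyed by device (Cassandra's role)


def ingest(event):
    """INGEST: accept an event from a sensor, device, or client."""
    ingest_log.append(event)


def persist():
    """STORE: drain the ingest log into the keyed store."""
    while ingest_log:
        event = ingest_log.popleft()
        store[event["device"]].append(event["speed_kmh"])


def analyze():
    """ANALYZE: compute a per-device average (Spark's role)."""
    return {device: mean(values) for device, values in store.items()}


def act(averages, limit_kmh=50.0):
    """ACT: return devices whose average exceeds the limit (Akka's role)."""
    return sorted(d for d, avg in averages.items() if avg > limit_kmh)


# Invented sample events.
events = [
    {"device": "car-1", "speed_kmh": 42.0},
    {"device": "car-2", "speed_kmh": 61.0},
    {"device": "car-1", "speed_kmh": 48.0},
    {"device": "car-2", "speed_kmh": 59.0},
]
for e in events:
    ingest(e)
persist()
averages = analyze()
alerts = act(averages)
print(averages)
print(alerts)
```

In a real deployment each stage would be a separate distributed service; the point here is only the direction of data flow between the four roles.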
NAIVE APPROACH
Typical Datacenter: siloed, over-provisioned servers, low utilization
Industry Average: 12-15% utilization
MySQL
microservice
Cassandra
Spark/Hadoop
Kafka
MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS
Typical Datacenter: siloed, over-provisioned servers, low utilization
Mesos / DC/OS: automated schedulers, workload multiplexing onto the same machines
MySQL
microservice
Cassandra
Spark/Hadoop
Kafka
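To make the utilization argument concrete, here is a rough sketch comparing one-server-per-workload with packing the same workloads onto shared machines. The core counts and the 32-core server size are invented numbers, chosen so that the siloed case falls in the 12-15% range quoted above. The first-fit packing is a deliberately simple stand-in and is not how the Mesos or DC/OS schedulers actually place work.

```python
def first_fit(demands, capacity):
    """Place each demand on the first server with room; return per-server loads."""
    servers = []
    for demand in sorted(demands, reverse=True):
        for i, load in enumerate(servers):
            if load + demand <= capacity:
                servers[i] = load + demand
                break
        else:
            # No existing server has room, so start a new one.
            servers.append(demand)
    return servers


def utilization(loads, capacity):
    """Fraction of total provisioned capacity that is actually used."""
    return sum(loads) / (len(loads) * capacity)


CAPACITY = 32  # cores per server (assumed)
# Assumed average core demand per workload; illustrative only.
demands = {
    "MySQL": 4,
    "microservice": 2,
    "Cassandra": 6,
    "Spark/Hadoop": 8,
    "Kafka": 3,
}

siloed = list(demands.values())                      # one server per workload
multiplexed = first_fit(demands.values(), CAPACITY)  # shared servers

print(f"siloed: {len(siloed)} servers, {utilization(siloed, CAPACITY):.1%}")
print(f"multiplexed: {len(multiplexed)} servers, {utilization(multiplexed, CAPACITY):.1%}")
```

With these assumed numbers, all five workloads fit on a single server, so the same demand is served with one fifth of the hardware. Real schedulers also have to account for memory, disk, peak load, and failure domains, which this sketch ignores.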
DC/OS ENABLES MODERN DISTRIBUTED APPS
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
Big Data + Analytics Engines
Microservices (in containers)
Streaming
Batch
Machine Learning
Analytics
Functions & Logic
Search
Time Series
SQL / NoSQL
Databases
Modern App Components
Distributed systems kernel to abstract resources
Ecosystem of frameworks & apps
Consistent architecture to run on top of kernel
User Interface (GUI & CLI)
Core system services (e.g., distributed init, cron, service discovery, package mgt & installer, storage)
Any Infrastructure (Physical, Virtual, Cloud)
THANK YOU!
ANY QUESTIONS?
@dcos
/groups/8295652
/dcos/dcos/examples/dcos/demos
chat.dcos.io
SERVICE OPERATIONS
● Configuration Updates (ex: Scaling, re-configuration)
● Binary Upgrades
● Cluster Maintenance (ex: Backup, Restore, Restart)
● Monitor progress of operations
● Debug any runtime blockages
APACHE SPARK (STREAMING)
Typical Use: distributed, large-scale data processing; micro-batching
Why Spark Streaming?
● Micro-batching processes events in small, frequent batches, which keeps latency low (seconds rather than the hours or days of traditional batch jobs)
● Well-defined role means it fits in well with other pieces of the pipeline
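The micro-batching idea can be illustrated without Spark itself. The sketch below groups timestamped events into fixed-length windows and processes each window as a small batch. It is plain Python, not the Spark Streaming API; the function name `micro_batches`, the one-second interval, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


def micro_batches(events, interval_s):
    """Group (timestamp_s, value) pairs into consecutive fixed-length windows.

    Returns a dict mapping each window's start time to the values in it.
    """
    batches = defaultdict(list)
    for timestamp_s, value in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp_s // interval_s) * interval_s
        batches[window_start].append(value)
    return dict(sorted(batches.items()))


# Invented sample events: (arrival time in seconds, payload).
events = [(0.2, 3), (0.9, 5), (1.1, 2), (2.5, 7), (2.7, 1)]

batches = micro_batches(events, interval_s=1)
# Each window is processed as one small batch job; here, a simple sum.
totals = {start: sum(values) for start, values in batches.items()}
print(totals)
```

The batch interval is the main tuning knob: an event waits at most one interval before its batch is processed, so a shorter interval lowers latency at the cost of more, smaller jobs.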