mPlane – an Intelligent Measurement Plane for Future Network and Application Management
Grant Agreement n. 318627
WP3 overview
Marco Milanesio - EURECOM
mPlane Final Workshop, Heidelberg, Nov 30th, 2015
Outline
- Role of the Repository
- Main achievements
- Highlights
  - DBStream
  - HFSP
WP3: Role of the Repository
- Receive and store data from the probes
  - Large amounts of data
  - Data diversity
- Provide aggregated and pre-processed data to the reasoners
  - Specific for each use case
  - Generic data processing
WP3: Role of the Repository
- Store probe data in the repository
  - Interactions with WP2
  - Distributed file system and/or database layer
- Pre-process data
  - Interactions with WP4
  - Batch (MapReduce), near real-time (DBStream), and real-time (Blockmon)
- Serve data and pre-processing results to the Reasoner
  - Interactions with WP4
  - Database layer + efficient storage and indexing
WP3: Main Objectives
- Scalable algorithm design
  - Use-case driven (top-down approach)
  - Workload specification: data, I/O
- Scheduling analytic jobs
  - Resource allocation for concurrent analytic jobs
  - Target systems: batch processing engine(s), stream processing engine(s)
- Access to analytic and external data
  - Database layer tailored to probe data
  - Access control
WP3: Main Achievements
- Deliverables
  - D.3.1: Basic network data analysis
  - D.3.2: Database layer design
  - D.3.3: Algorithm and scheduler design and implementation
  - D.3.4: Final implementation of the data processing and storage layer
- Publications: 37 scientific articles
- Software: www.ict-mplane.eu/public/software
WP3: Software and tools
- Query Engines
  - Blockmon controller – stream processing
  - DBStream – flexible data stream warehouse
  - EZRepo – measurement data preprocessing for RCA
  - MATH – export of bulk data to DBStream
  - mPlane interfaces for Tstat – RI-based, to import RDDs
  - MongoDB proxy
  - repoSim – NS2-based simulator for fine-tuning
- Schedulers
  - HFSP – Hadoop Fair Sojourn Protocol
  - Schedule – cache-oblivious scheduling
WP3: Highlights
- DBStream – flexible data stream warehouse
- HFSP – Hadoop Fair Sojourn Protocol
DBStream in a Nutshell
- Stores and analyzes large amounts of network monitoring data
- A flexible and easy-to-use Data Stream Warehouse (DSW)
- Implemented as a middleware layer on top of PostgreSQL
- Receives, stores, and processes multiple data streams in parallel
DBStream
- Continuous analytics system
  - Processes and combines data from multiple sources as they are produced
  - Creates aggregations
  - Stores query results for further processing by external analysis modules
- Target: continuous network monitoring
  - Not limited to this context (smart grids, intelligent transportation systems, …)
DBStream: Architecture
(Architecture diagrams.) General database approach: data is inserted into the database, and analysis modules issue queries against it. DBStream approach: an import module feeds data into PostgreSQL, a view generation framework materializes views over it, and analysis modules query those views.
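To make the view-generation idea concrete, here is a minimal sketch, not DBStream's actual API: the flows table, its columns, and the traffic_1min output table are invented for illustration. An import module fills a base table, and a periodic job materializes a per-minute aggregate that analysis queries can later read.

```python
# Sketch of DBStream-style view generation on top of PostgreSQL.
# Assumes a hypothetical 'flows' table (ts, src_ip, bytes_up,
# bytes_down); this is NOT DBStream's real interface.
import time
import psycopg2

AGG_SQL = """
INSERT INTO traffic_1min (minute, total_up, total_down)
SELECT date_trunc('minute', ts), sum(bytes_up), sum(bytes_down)
FROM flows
WHERE ts >= to_timestamp(%s) AND ts < to_timestamp(%s)
GROUP BY 1;
"""

def generate_views(conn, start_epoch, step=60):
    """Materialize one aggregation window per iteration."""
    while True:
        with conn.cursor() as cur:
            cur.execute(AGG_SQL, (start_epoch, start_epoch + step))
        conn.commit()
        start_epoch += step
        time.sleep(step)  # wait for the next window to close
```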
Query Workload – analysis jobs
We consider 7 different analysis tasks (jobs) normally performed in traffic monitoring (the graph shows the job dependencies):
- J1: RTT stats per Orgname
- J2: Akamai stats
- J3: Top 10 Orgnames
- J4: Top 10 /24 subnets
- J5: Up/download volume per source IP
- J6: IPs active in the last hour, updated every minute (see the sketch below)
- J7: Avg. up/download in the last hour, updated every minute
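As a rough illustration of such a rolling job (the flows table and its columns are invented, not the actual mPlane schema), J6 can be phrased as plain SQL that is re-executed every minute over a one-hour sliding window:

```python
# Hypothetical SQL for J6: distinct IPs active in the last hour,
# re-run every minute. Table and column names are illustrative.
J6_SQL = """
SELECT DISTINCT src_ip
FROM flows
WHERE ts >= now() - interval '1 hour';
"""
```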
Performance comparison with Spark
Spark performance details
J6 is a “rolling query”, continuously updated
Conclusions
- A single DBStream node is up to 2.6× faster than 10 Spark nodes for specific analysis jobs
- Result projections: 446 minutes to process 4 vantage points (VPs), each covering 5 days of data, i.e. about 12 VPs in one day
  - Since one day of one VP's data takes 446 / (4 × 5) ≈ 22 minutes, DBStream can process the equivalent of about 60 VPs in real time, or 1 VP at 60 Gbit/s
- Hardware can be upgraded: more disks, SSDs? Running on top of parallel databases (e.g., Greenplum)?
HFSP in a Nutshell
- Objectives
  - A size-based scheduler that achieves both fairness and small response times
  - Scalable design: no scheduling bottlenecks
  - Small number of knobs to tune
- Challenges
  - Job size estimation while jobs make progress
  - Virtual time in a multi-processor setting
  - Lack of efficient preemption primitives
HFSP: size estimation
- Offline
  - Use history as a training set
  - Initial guess, to bootstrap the system
- Online
  - Task runtime measurements (a source of errors)
  - Reserve training slots from the resource pool
  - No updates after training
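In spirit, the online estimator extrapolates a job's total size from a handful of completed training tasks. A minimal sketch follows; the function name and the simple average are illustrative, not HFSP's exact estimator:

```python
def estimate_job_size(sample_runtimes, num_tasks):
    """Extrapolate the aggregate task runtime of a job from the
    measured runtimes of its completed training tasks."""
    avg_task_runtime = sum(sample_runtimes) / len(sample_runtimes)
    return avg_task_runtime * num_tasks

# e.g. 3 training tasks took 40s, 55s and 45s; the job has 200 tasks
size = estimate_job_size([40, 55, 45], 200)  # ~9333s of work
```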
HFSP: virtual time
- Job aging: a job progresses in virtual time even when it is not scheduled in real time
- Virtual time: max-min processor sharing (a GPS version as well); takes failures into account
- Job size: estimated aggregate sequential task runtime; map and reduce phases are treated separately
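A toy rendering of the aging step (single phase, uniform slots, equal max-min shares; deliberately far simpler than the real scheduler):

```python
def age_jobs(jobs, capacity, dt):
    """Advance virtual time by dt: the cluster capacity is shared
    max-min fairly among all pending jobs, so a job ages even when
    it holds no real slot."""
    pending = [j for j in jobs if j["virtual_remaining"] > 0]
    if not pending:
        return
    share = capacity * dt / len(pending)  # equal share per job
    for job in pending:
        job["virtual_remaining"] = max(0.0, job["virtual_remaining"] - share)
```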
HFSP: scheduling
- When a job is submitted
  - If tiny: assign a null size and fast-track it to execution
  - Else: compute an initial guess and schedule its training phase
- When a resource becomes available
  - If the training queue is not empty, schedule the job with the smallest initial guess
  - Else, assign a task from the job with the smallest virtual size
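Putting the two rules together, the decision logic amounts to the sketch below; the queue structures and the tiny-job threshold are invented for illustration:

```python
TINY_TASKS = 3  # illustrative threshold for "tiny" jobs

def on_job_submitted(job, training_queue, run_queue):
    if job["num_tasks"] <= TINY_TASKS:
        job["virtual_remaining"] = 0.0  # null size: fast track
        run_queue.append(job)
    else:
        job["virtual_remaining"] = job["initial_guess"]
        training_queue.append(job)

def on_resource_available(training_queue, run_queue):
    """Return the job whose next task should get the free slot."""
    if training_queue:
        return min(training_queue, key=lambda j: j["initial_guess"])
    if run_queue:
        return min(run_queue, key=lambda j: j["virtual_remaining"])
    return None
```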
HFSP: preemption
- HFSP is a preemptive scheduler: jobs with small sizes can “steal” resources
- Task preemption in Hadoop-like systems
  - KILLing tasks wastes work
  - WAITing for tasks yields sub-optimal allocation
- Our contribution: OS-assisted preemption (sketched below)
  - Introduces new primitives/state in the scheduler
  - Uses low-level UNIX signals (SIG*)
  - No thrashing!
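The core trick is to suspend and resume task processes instead of killing or waiting for them. A bare-bones sketch with standard UNIX signals (the real implementation manages Hadoop task processes and extra scheduler state, omitted here):

```python
import os
import signal

def suspend_task(pid):
    """Pause a task's process, keeping its work and memory state."""
    os.kill(pid, signal.SIGSTOP)

def resume_task(pid):
    """Let a previously suspended task's process continue."""
    os.kill(pid, signal.SIGCONT)
```

Suspended tasks keep their progress, so small jobs can jump ahead without the wasted work of a KILL.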
Experimental evaluation
- Setup
  - 20-node cluster (in our own private cloud)
  - 100-node AWS cluster (not shown here; results available in our papers)
- Workload: PigMix, 100 jobs
HFSP: results for “responsiveness”
- Overall benchmark: the mean sojourn time (MST) is ~30% smaller with HFSP than with FAIR
- Tiny jobs receive the same treatment under HFSP and FAIR
- Medium, large, and huge jobs consistently fare better under HFSP because they are scheduled in “sequence”
HFSP: results for “fairness”
(Plots for the DEV, TEST, and PROD workloads.)
- HFSP dominates FAIR sharing
- The “heavier” the workload, the better HFSP does compared to FAIR
- For the PROD workload, the median gap is one order of magnitude in favor of HFSP
Conclusion
- “Long live” size-based scheduling!
  - Previous “theoretical” results were too negative
  - Accurate size information is not required
- The devil is in the details
  - Under-specification can cause severe problems
  - A practical implementation is tricky on multi-processors
Thank you!
BACKUP
Size-based schedulers example
Impact of Errors
MST = mean sojourn time; MST(PS) = the MST achieved by processor sharing
Major problems with heavy-tailed job size distributions
What’s the problem?
- SRPT with errors: only one job is affected
- FSP with errors: all jobs are affected
Continuous Execution Language (CEL)
- A CEL job: multiple input time windows, an SQL query, a data schema, a single output
- Incremental queries: use past output as input; rolling set analysis
(Diagram: a job applies an SQL query to multiple inputs and produces a single output.)
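To picture what such a job looks like, here is a hypothetical rendering as a plain Python dict (CEL's concrete syntax differs; all table and field names are invented). It combines the ingredients above: input windows, an SQL body, a schema, and one output, with the previous output fed back in for the incremental case.

```python
# Hypothetical CEL-style job description; not CEL's real syntax.
cel_job = {
    "inputs": [
        {"table": "flows", "window": "1 minute"},         # fresh data
        {"table": "active_ips", "window": "last output"},  # incremental
    ],
    "schema": {"minute": "timestamp", "src_ip": "inet"},
    "query": """
        SELECT date_trunc('minute', ts) AS minute, src_ip
        FROM flows
        WHERE ts >= now() - interval '1 hour'
        GROUP BY 1, 2
    """,
    "output": "active_ips",
}
```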