COMPUTING ON JETSTREAM: STREAMING ANALYTICS IN THE WIDE-AREA
Matvey Arye
Joint work with: Ari Rabkin, Sid Sen, Mike Freedman and Vivek Pai
THE RISE OF GLOBAL DISTRIBUTED SYSTEMS
[Image: a globally distributed CDN]
TRADITIONAL ANALYTICS
[Image: CDN nodes sending data to a centralized database]
BANDWIDTH IS EXPENSIVE
[Chart: price/performance improvements, 2005-2008: CPU 16x, storage 10x, bandwidth only 2.7x]
[Above the Clouds, Armbrust et al.]
BANDWIDTH TRENDS
[TeleGeography's Global Bandwidth Research Service]
[Two charts, each annotated 20%]
BANDWIDTH COSTS
• Amazon EC2 bandwidth: $0.05 per GB
• Wireless broadband: $2 per GB
• Cell phone broadband (AT&T/Verizon): $6 per GB
  – (Other providers are similar)
• Satellite bandwidth: $200 - $460 per GB
  – May drop to ~$20
THIS APPROACH IS NOT SCALABLE
[Image: CDN nodes all feeding a single centralized database]
THE COMING FUTURE: DISPERSED DATA
[Image: dispersed databases at many sites]
WIDE-AREA COMPUTER SYSTEMS
• Web Services
  – CDNs
  – Ad Services
  – IaaS
  – Social Media
• Infrastructure
  – Energy Grid
• Military
  – Global Network
  – Drones
  – UAVs
  – Surveillance
NEED QUERIES ON A GLOBAL VIEW
• CDNs:
  – Popularity of websites globally
  – Tracking security threats
• Military
  – Threat “chatter” correlation
  – Big-picture view of battlefield
• Energy Grid
  – Wide-area view of energy production and expenditure
SOME QUERIES ARE EASY
Alert me when servers crash
[Image: a node reporting “Server Crashed”]
OTHERS ARE HARD
How popular are all of my domains? URLs?
[Image: many CDN nodes, each generating streams of requests]
BEFORE JETSTREAM
[Chart: bandwidth needed for backhaul over two days, against a 95% provisioning level]
Analyst’s remorse: not enough data, wasted bandwidth
Buyer’s remorse: system overload or overprovisioning
WHAT HAPPENS DURING OVERLOAD?
[Chart: over one day, bandwidth needed for backhaul exceeds what is available]
[Chart: latency over time; the queue size grows without bound!]
THE JETSTREAM VISION
[Chart: over two days, the bandwidth used by JetStream stays within what is available while still covering backhaul needs]
JetStream lets programs adapt to shortages and backfill later.
Need new abstractions for programmers.
SYSTEM ARCHITECTURE
[Diagram: compute resources at several sites. The control plane, a coordinator plus a planner library behind the JetStream API, turns a query graph into an optimized query and distributes it to per-site daemons. The data plane runs operators on worker nodes fed by stream sources.]
AN EXAMPLE QUERY
[Diagram: at Sites A and B, a file-read operator feeds a log-parsing operator, which writes to local storage. A query runs every 10 s at each site and sends results to central storage at Site C. Feedback control runs over the links to Site C.]
ADAPTIVE DEGRADATION
[Diagram: local data flows through dataflow operators, across the network, and through further dataflow operators, arriving as summarized or approximated data]
Feedback control to decide when to degrade
User-defined policies for how to degrade data
MONITORING AVAILABLE BANDWIDTH
[Diagram: a data stream with periodic markers]
• Sources insert time markers into the data stream every k seconds
• A network monitor records the time t it took to process the interval
• => k/t estimates available capacity
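The estimate above can be sketched in a few lines. This is a minimal illustration, not the JetStream implementation; the class and method names are invented, and a real monitor would smooth the estimate over several intervals.

```python
# Sketch of marker-based capacity estimation (names are illustrative,
# not the JetStream API). Sources emit a marker after every k seconds
# of data; the receiver measures the wall-clock time t between markers,
# and k/t estimates the spare capacity of the link.
import time

class CongestionMonitor:
    def __init__(self, k_seconds):
        self.k = k_seconds       # data-time between markers
        self.last_marker = None  # wall-clock arrival of the previous marker

    def on_marker(self, now=None):
        """Call when a marker arrives; returns k/t, or None for the first marker."""
        now = time.monotonic() if now is None else now
        ratio = None
        if self.last_marker is not None:
            t = now - self.last_marker  # wall time to deliver k seconds of data
            ratio = self.k / t          # > 1: spare capacity; < 1: congested
        self.last_marker = now
        return ratio

mon = CongestionMonitor(k_seconds=10)
mon.on_marker(now=0.0)
print(mon.on_marker(now=5.0))   # 10 s of data arrived in 5 s of wall time -> 2.0
```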
WAYS TO DEGRADE DATA
Can drop low-rank values
Can coarsen a dimension
AN INTERFACE FOR DEGRADATION (I)
• First attempt: policy specified by choosing an operator.
• Operators read the congestion sensor and respond.
[Diagram: a coarsening operator sits between the incoming data and the network, emitting sampled data; the network reports “Sending 4x too much”]
COARSENING REDUCES DATA VOLUMES
Before (per second):
01:01:01  foo.com/a   1
01:01:01  foo.com/b  10
01:01:01  foo.com/c   5
01:01:02  foo.com/a   2
01:01:02  foo.com/b  15
01:01:02  foo.com/c  20
After (per minute):
01:01:*   foo.com/a   3
01:01:*   foo.com/b  25
01:01:*   foo.com/c  25
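The coarsening step above amounts to truncating the time key and re-aggregating. A small sketch under assumed names (this is not JetStream code; `coarsen_time` is invented for illustration):

```python
# Illustrative coarsening: truncate timestamps to the minute and sum the
# counts of rows that now share a (minute, url) key.
from collections import defaultdict

def coarsen_time(rows):
    """rows: (timestamp 'HH:MM:SS', url, count) tuples -> minute-level rows."""
    merged = defaultdict(int)
    for ts, url, count in rows:
        minute = ts.rsplit(":", 1)[0] + ":*"   # "01:01:01" -> "01:01:*"
        merged[(minute, url)] += count
    return sorted((t, u, c) for (t, u), c in merged.items())

rows = [("01:01:01", "foo.com/a", 1), ("01:01:01", "foo.com/b", 10),
        ("01:01:01", "foo.com/c", 5), ("01:01:02", "foo.com/a", 2),
        ("01:01:02", "foo.com/b", 15), ("01:01:02", "foo.com/c", 20)]
print(coarsen_time(rows))
# -> [('01:01:*', 'foo.com/a', 3), ('01:01:*', 'foo.com/b', 25),
#     ('01:01:*', 'foo.com/c', 25)]
```

Six rows become three; the savings depend entirely on how many rows collide after truncation.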
BUT NOT ALWAYS
Before (per second):
01:01:01  foo.com/a   1
01:01:01  foo.com/b  10
01:01:01  foo.com/c   5
01:01:02  bar.com/a   2
01:01:02  bar.com/b  15
01:01:02  bar.com/c  20
After (per minute), still six rows:
01:01:*   foo.com/a   1
01:01:*   foo.com/b  10
01:01:*   foo.com/c   5
01:01:*   bar.com/a   2
01:01:*   bar.com/b  15
01:01:*   bar.com/c  20
DEPENDS ON LEVEL OF COARSENING
Data from CoralCDN logs
GETTING THE MOST DATA QUALITY FOR THE LEAST BANDWIDTH
Issue: Some degradation techniques give good quality but have unpredictable savings.
Solution: Use multiple techniques
– Start with the technique that gives the best quality
– Supplement with other techniques when bandwidth is scarce
=> Keeps latency bounded; minimizes analyst’s remorse
ALLOWING COMPOSITE POLICIES
• Chaos if two operators are simultaneously responding to the same sensor
• Operator placement constrained in ways that don’t match degradation policy.
[Diagram: a coarsening operator and a sampling operator both sit between the incoming data and the network, which reports “Sending 4x too much”; both would react to the same sensor]
INTRODUCING A CONTROLLER
• Introduce a controller for each network connection that determines which degradations to apply
• Degradation policies live in each controller
• Policy no longer constrained by operator topology
[Diagram: the controller reads “Sending 4x too much” from the network and tells the sampling operator “Drop 75% of data!”]
DEGRADATION
Type                  | Mergeability | Errors     | Predictable Size Savings
----------------------|--------------|------------|-------------------------
Dimension Coarsening  | Yes*         | Resolution | No
Consistent Sampling   | Yes          | Sampling   | Yes
Local Filtering       | No           | Sampling   | Yes
Multi-round Filtering | No           | None       | No
Aggregate Approx.     | Depends      | Depends    | Depends
MERGEABILITY IS NONTRIVIAL
• Can’t cleanly unify data at arbitrary degradation levels
• Degradation operators need to have fixed levels
Every 5:  01-05 | 06-10 | 11-15 | 16-20 | 21-25 | 26-30
Every 10: 01-10 | 11-20 | 21-30
Every 30: 01-30
Every 6:  01-06 | 07-12 | 13-18 | 19-24 | 25-30  ?? (no clean merge with the levels above)
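The alignment argument above can be checked mechanically. A small sketch (function names invented for illustration): a coarse window set is mergeable from a fine one only if every coarse boundary falls on a fine boundary.

```python
# Why degradation levels must nest: 5-second windows align on 10-second
# boundaries and can be merged pairwise, but 6-second windows straddle
# those boundaries, so there is no clean merge.
def windows(span, width):
    """Non-overlapping [start, end) windows of `width` covering `span`."""
    return [(s, min(s + width, span)) for s in range(0, span, width)]

def mergeable(fine, coarse):
    """True if every coarse window is a union of whole fine windows."""
    edges = {s for s, _ in fine} | {e for _, e in fine}
    return all(s in edges and e in edges for s, e in coarse)

print(mergeable(windows(30, 5), windows(30, 10)))   # True
print(mergeable(windows(30, 6), windows(30, 10)))   # False: 6 s edges miss 10 and 20
```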
INTERFACING WITH THE CONTROLLER
[Diagram: incoming data flows through a sampling operator and a coarsening operator to the network. An operator reports to the controller: “Shrinking data by 50%. Possible levels: [0%, 50%, 75%, 95%, …]”. The network reports “Sending 4x too much”, and the controller replies “Go to level 75%”.]
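The exchange above can be sketched as a level-selection rule. This is a toy illustration, not the real controller: `choose_level` is an invented name, and the real system negotiates with several operators at once.

```python
# Given the discrete degradation levels an operator advertises (fraction
# of data dropped) and the measured overload, pick the least-aggressive
# level whose surviving fraction fits the available capacity.
def choose_level(levels, overload_ratio):
    """levels: fractions dropped, e.g. [0.0, 0.5, 0.75, 0.95].
    overload_ratio: sending rate / available capacity (4.0 = 4x too much)."""
    target = 1.0 / overload_ratio          # fraction of data we can afford to send
    for lvl in sorted(levels):
        if 1.0 - lvl <= target:            # surviving fraction fits the link
            return lvl
    return max(levels)                     # worst case: degrade as hard as we can

print(choose_level([0.0, 0.5, 0.75, 0.95], 4.0))   # -> 0.75 ("Go to level 75%")
```

With a 4x overload, the 50% level still sends twice too much, so the controller jumps to 75%, matching the exchange in the diagram.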
A PLANNER FOR POLICY
Query planners: Query + Data Distribution => Execution Plan
Why not do this for degradation policy? What is the query?
For us the policy affects the data ingestion
=> Affects all subsequent queries
Planning: All Potential Queries + Data Distribution => Policy
EXPERIMENTAL SETUP
80 nodes on the VICCI testbed in the US and Germany
[Map: central site at Princeton]
Policy: drop data if insufficient bandwidth
WITHOUT ADAPTATION
[Chart: behavior under bandwidth shaping]
WITH ADAPTATION
[Chart: behavior under bandwidth shaping]
COMPOSITE POLICIES
OPERATING ON DISPERSED DATA
[Image: dispersed databases at many sites]
CUBE DIMENSIONS
[Diagram: a cube with a Time dimension (01:01:00, 01:01:01) and a URL dimension (foo.com/r, foo.com/q, bar.com/n, bar.com/m)]
CUBE AGGREGATES
[Diagram: each cell, e.g. (bar.com/m, 01:01:01), holds the aggregates Count Requests and Max Latency]
CUBE ROLLUP
[Diagram: rolling up the URL dimension groups foo.com/r and foo.com/q into foo.com/*, and bar.com/n and bar.com/m into bar.com/*]
FULL HIERARCHY
[Diagram: at 01:01:00, the cells for foo.com/r, foo.com/q, bar.com/n, bar.com/m hold the (count, max latency) pairs (5,90), (3,75), (8,199), (21,40). Rolling up URLs gives foo.com/* = (8,90) and bar.com/* = (29,199); rolling up to URL: *, Time: 01:01:01 gives (37,199).]
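The rollup arithmetic in the diagram follows directly from the two aggregate functions: counts add, latencies take the max. A sketch under assumed names (not JetStream code; `rollup_urls` is invented for illustration):

```python
# Cube rollup along the URL dimension: each cell holds
# (request count, max latency); counts are summed, latencies maxed.
from collections import defaultdict

def rollup_urls(cells, depth):
    """Coarsen URL keys to the first `depth` path components (depth=0 -> '*')."""
    out = defaultdict(lambda: (0, 0))
    for url, (count, max_lat) in cells.items():
        key = "/".join(url.split("/")[:depth]) + "/*" if depth else "*"
        c, m = out[key]
        out[key] = (c + count, max(m, max_lat))
    return dict(out)

cells = {"foo.com/r": (5, 90), "foo.com/q": (3, 75),
         "bar.com/n": (8, 199), "bar.com/m": (21, 40)}
print(rollup_urls(cells, 1))   # -> {'foo.com/*': (8, 90), 'bar.com/*': (29, 199)}
print(rollup_urls(cells, 0))   # -> {'*': (37, 199)}
```

Because both functions are associative, rolling up in stages (per-URL, then per-domain, then to *) gives the same answer as rolling up in one step, which is what makes dispersed aggregation work.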
RICH STRUCTURE
[Diagram: a cube over Time (01:01:00 … 01:01:59) and URL (foo.com/r, foo.com/q, bar.com/n, bar.com/m), with example cells A-E at various rollup levels holding (count, max latency) pairs such as (5,90), (3,75), (8,199), (21,40)]

Cell | URL       | Time
A    | bar.com/* | 01:01:01
B    | *         | 01:01:01
C    | foo.com/* | 01:01:01
D    | foo.com/r | 01:01:*
E    | foo.com/* | 01:01:*
TWO KINDS OF AGGREGATION
1. Rollups – across dimensions
2. Inserts – across sources
The data cube model constrains the system to use the same aggregate function for both.
Constraint: no queries on tuple arrival order
Makes reasoning easier!
AN EXAMPLE QUERY
[Diagram: at Sites A and B, a file-read operator feeds a log-parsing operator, which writes to local storage. A query runs every 10 s at each site and sends results to central storage at Site C.]
SUBSCRIBERS
• Extract data from cubes to send downstream
• Control the latency vs. completeness trade-off
[Diagram: at Site A, the file-read and log-parsing operators write to local storage; a subscriber queries the cube every 10 s and sends results downstream]
SUBSCRIBER API
• Notified of every tuple inserted into cube
• Can slice and rollup cubes
Possible policies:
• Wait for all upstream nodes to contribute
• Wait for a timer to go off
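The two policies above trade completeness against latency; combined, they become an emit rule. A toy sketch (the function name is invented; this is not the real subscriber API):

```python
# A subscriber emits a time window either when every upstream source has
# contributed (completeness) or when a timer fires (bounded latency).
def ready_to_emit(sources_seen, all_sources, elapsed, timeout):
    """Emit when the window is complete, or when we have waited long enough."""
    return sources_seen >= all_sources or elapsed >= timeout

print(ready_to_emit(2, 3, elapsed=1.0, timeout=10.0))   # False: incomplete, keep waiting
print(ready_to_emit(2, 3, elapsed=12.0, timeout=10.0))  # True: timer fired, emit partial data
```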
FUTURE WORK
• Reliability
• Individual queries– Statistical methods– Multi-round protocols
• Currently working on improving top-k
• Fairness that gives best data quality
Thanks for listening!