A Software-Defined Networking based Approach for Performance Management of Analytical Queries on Distributed Data Stores
Pengcheng Xiong (NEC Labs America), Hakan Hacigumus (NEC Labs America), Jeffrey F. Naughton (Univ. of Wisconsin)
Agenda
Why? Motivation and background
How? System architecture and implementation
So what? Real system and benchmark query evaluation
Conclusion
Motivation
Data analytics applications or data scientists query the data from distributed stores.
A huge amount of data traffic on the network: join
Many applications want to share a cluster: data backup, video streaming, etc.
Response time is critical: deadline-driven reports
Query service differentiation: batch queries, interactive queries
An example query (TPC-H Q14)
Data store site Sl: lineitem
Data store site Sp: part
We assume that tables are distributed at relational data stores.
Relational data stores are connected by a network.
Network change implies plan perf. change
[Figure: plan performance across Phase 1, Phase 2, and Phase 3]
(1) Huge gap
(2) The best plan can become the worst one
Network status changes
What if?
[Figure: plan performance across Phase 1, Phase 2, and Phase 3]
What if the query optimizer can dynamically monitor the network bandwidth and adaptively choose a plan?
An adaptive plan is chosen and query execution time is kept short.
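The idea of monitoring the bandwidth and adapting the plan can be sketched as below. This is a minimal illustration, not the system described in the slides: the plan names, link identifiers, data volumes, and bandwidth figures are all hypothetical, and the cost is assumed to be simply the data shipped divided by the bandwidth currently observed on the link the plan uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str             # hypothetical plan label
    link: str             # hypothetical identifier of the link the plan ships data over
    bytes_shipped: float  # hypothetical amount of data the plan transfers


def estimated_transfer_time(plan, bandwidth):
    """Seconds needed to ship the plan's data at the monitored bandwidth (bytes/s)."""
    return plan.bytes_shipped / bandwidth[plan.link]


def choose_plan(plans, bandwidth):
    """Pick the candidate plan with the smallest estimated transfer time."""
    return min(plans, key=lambda p: estimated_transfer_time(p, bandwidth))


plans = [
    Plan("ship part to Sl", link="Sp->Sl", bytes_shipped=2e9),
    Plan("ship lineitem to Sp", link="Sl->Sp", bytes_shipped=6e9),
]

# Assumed bandwidth readings for two phases (bytes/s).
phase1 = {"Sp->Sl": 100e6, "Sl->Sp": 100e6}  # both links idle
phase2 = {"Sp->Sl": 10e6, "Sl->Sp": 100e6}   # Sp->Sl is now congested

print(choose_plan(plans, phase1).name)
print(choose_plan(plans, phase2).name)
```

With these made-up numbers the plan that ships less data wins while both links are idle, and loses once its link becomes congested, which is the "best plan can become the worst one" effect.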
Network busy implies no good plan
User: Run query right now and right away. I need that ASAP to catch my deadline!
Distributed DBMS: Well… I am sorry. None of the candidate plans can meet your deadline due to current busy network status.
What if?
User: Run query right now and right away. I need that ASAP to catch my deadline!
Distributed DBMS: OK. Although current network is busy, I can control it to prioritize the bandwidth for the query.
What if the query optimizer can control the network?
Sounds like a mission impossible
The database always treats the underlying network as a black box: unable to monitor, let alone to control
With software-defined networking, the database can inquire about the current status of the network, or control the network with directives
Networking without SDN: unable to monitor, let alone to control
Networking with SDN: able to inquire and control
Data Path (Hardware)
Control Path (OpenFlow)
OpenFlow Controller, OpenFlow Protocol (SSL/TCP)
Dist. Query Optimizer
API (our contribution)
Cost estimation
Cost model for network operator
Amount of data transferred
Real-time transfer speed
SDN support
(Monitor) Take any bandwidth left
(Control) Assign the highest priority
(Control) Make a bandwidth reservation
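A rough sketch of how such a cost model might combine the two inputs with the three SDN options follows. The slides do not give the formula, so everything here is an assumption: the cost is taken to be the amount of data divided by the transfer speed, "monitor" is modelled as link capacity minus competing traffic, "priority" as having the whole link, and "reserve" as the reserved rate. The function and parameter names are hypothetical.

```python
def network_operator_cost(data_bytes, link_capacity, competing_traffic=0.0,
                          mode="monitor", reserved=None):
    """Estimated seconds to transfer data_bytes over one link.

    All rates are in bytes/s. The three modes are simplifying assumptions:
      monitor  - use whatever bandwidth the competing traffic leaves over
      priority - highest priority, assumed to get the full link capacity
      reserve  - use the explicitly reserved bandwidth
    """
    if mode == "monitor":
        speed = link_capacity - competing_traffic
    elif mode == "priority":
        speed = link_capacity
    elif mode == "reserve":
        if reserved is None:
            raise ValueError("reserve mode needs a reserved bandwidth")
        speed = min(reserved, link_capacity)
    else:
        raise ValueError(f"unknown mode: {mode}")
    if speed <= 0:
        raise ValueError("no bandwidth available for the transfer")
    return data_bytes / speed


# Hypothetical numbers: 1 GB to ship over a 100 MB/s link with 60 MB/s of neighbor traffic.
for mode, kwargs in [("monitor", {}), ("priority", {}), ("reserve", {"reserved": 50e6})]:
    cost = network_operator_cost(1e9, 100e6, competing_traffic=60e6, mode=mode, **kwargs)
    print(mode, cost)
```

An optimizer could evaluate this cost for the network operator of each candidate plan and under each option, then compare the results against the query's deadline.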
Evaluation Setup
TPC-H, scaling factor 100, Q14
Small tables (supplier, nation, region) are replicated. Other tables are placed at a single data store site
Neighbor traffic generator: iperf
Summary of case studies
Case 1: single user, single-thread, iperf
[Figure: Phase 1, Phase 2, and Phase 3, with the bottleneck marked in each phase]
Based on SDN, the query optimizer can dynamically monitor the network bandwidth and adaptively choose the best plan
Case 3: multiple users, multiple-thread, no contention traffic, priority queue
Based on SDN, premium queries run faster than regular ones.
Based on SDN, all queries run faster.
Case study 5: single user, multi-thread, iperf, weighted-fair queue
Based on SDN, more reservation makes queries run faster.
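One way to see why a larger reservation helps: under weighted-fair queueing, each backlogged flow receives a share of the link proportional to its weight. The sketch below assumes every flow always has data to send; the flow names, weights, and link capacity are hypothetical and are not taken from the experiments.

```python
def wfq_share(weights, capacity):
    """Bandwidth each backlogged flow gets under weighted-fair queueing.

    weights maps a flow name to its weight; capacity is the link rate in bytes/s.
    """
    total = sum(weights.values())
    return {flow: capacity * w / total for flow, w in weights.items()}


capacity = 100e6  # hypothetical 100 MB/s link
for query_weight in (1, 3):
    shares = wfq_share({"query": query_weight, "iperf": 1}, capacity)
    seconds = 1.5e9 / shares["query"]  # time to ship a hypothetical 1.5 GB
    print(query_weight, shares["query"], seconds)
```

Raising the query's weight from 1 to 3 against a single competing flow of weight 1 lifts its share from half of the link to three quarters, so the same transfer finishes sooner.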
Conclusion
SDN can be effectively exploited for performance management of analytical queries on distributed data stores
Directly monitor the network and adaptively pick the best plan.
Control the priority of network traffic or make network bandwidth reservations to differentiate the query service.
Lots of opportunities