Highly Scalable Trip Grouping for Large–Scale Collective Transportation Systems
Győző Gidófalvi, Geomatic ApS
Torbern Bach Pedersen, Aalborg University
Tore Risch and Erik Zeitler, Uppsala University



March 27th, 2008, Gidófalvi, Pedersen, Risch, Zeitler

Outline

Introduction
A streaming trip grouping algorithm
Preliminary experiments
• Serial implementation
• Parallel implementation using naïve partitioning
Parallel implementation using space partitioning
Conclusions
Related work
Ongoing work


Introduction

Transportation is a major problem in large cities
• Congestion, parking, pollution, etc.
• Cab-sharing solves some of these problems.
• The trip grouping algorithm [Gidófalvi, Pedersen 2007] groups “nearby” cab requests into shared cabs.
• However, the trip grouping algorithm cannot keep up with large-scale request streams.
Using SCSQ, we have implemented the following:
• A streamed trip grouping algorithm.
• A partitioning of the input stream, enabling scalability for high-volume request streams.
• Parallel execution of the streamed trip grouping algorithm.


A Streaming Trip Grouping Algorithm, tg()

Given a stream of requests R = <orig, dest>, find cab-shares such that
• each cab-share has at most K requests, and
• the total transportation cost of serving the requests is minimized (savings are maximized).
Complexity of tg() is O(n³).

[Figure: requests r = <rid, origin(x, y), dest(x, y)> arrive on the input stream and are held as pending requests; requests retained too long are forced to be scheduled onto the output stream.]

tg(stream requests, integer k, integer max_retention)
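The buffering behavior above can be sketched in Python. The greedy nearest-neighbor grouping and the distance-based proximity measure below are illustrative assumptions, not the exact algorithm of [Gidófalvi, Pedersen 2007]:

```python
import math
from collections import deque

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tg(requests, k, max_retention):
    """Greedy streaming trip grouping (illustrative sketch).

    requests: iterable of (rid, t, origin, dest), ordered by time t.
    k: maximum number of requests per cab-share.
    max_retention: longest time a request may stay pending.
    Yields cab-shares as lists of request ids.
    """
    pending = deque()
    for rid, t, origin, dest in requests:
        pending.append((rid, t, origin, dest))
        # A request that has waited max_retention is forced to be scheduled.
        while pending and t - pending[0][1] >= max_retention:
            seed = pending.popleft()
            # Greedily add the k-1 pending requests closest to the seed,
            # measured by combined origin + destination distance (an
            # assumed proximity measure, not the paper's cost model).
            mates = sorted(
                pending,
                key=lambda r: dist(seed[2], r[2]) + dist(seed[3], r[3]),
            )[:k - 1]
            for m in mates:
                pending.remove(m)
            yield [seed[0]] + [m[0] for m in mates]
    while pending:  # flush remaining requests at end of stream
        share = [pending.popleft() for _ in range(min(k, len(pending)))]
        yield [r[0] for r in share]

shares = list(tg([(1, 0, (0, 0), (3, 0)),
                  (2, 1, (0, 1), (3, 1)),
                  (3, 50, (9, 9), (0, 9))], k=4, max_retention=10))
# -> [[1, 2, 3]]: all three grouped when request 1 exceeds its retention
```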


Preliminary performance investigation and analysis

Implementation of the tg() algorithm
• Stored procedures in SCSQ, the Super Computer Stream Query processor
• Called using tg(stream(“requests.dat”), 4, 600);
Data set
• Full load = 251,000 requests of 3 km length
• Generated using ST-ACTS [Gidófalvi, Pedersen 2006]
• A realistic request load for 12 hours in a city of Copenhagen’s size
Hardware
• Intel® Pentium® 4, 2.8 GHz PC


Performance

Load   Exec time (s)   Savings
1/16        29          33%
1/8        120          38%
1/4        703          45%
1/2       6343          49%
1        69772          53%

69,772 s ≈ 19 h to schedule 12 h worth of requests! The serial tg() is not able to keep up.

Savings = decrease in total transportation cost
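As a worked example of this definition (the trip lengths below are made up for illustration, not taken from the experiments):

```python
# savings = 1 - shared_cost / individual_cost
individual_cost = 4 * 3.0   # four 3 km trips, each served by its own cab
shared_cost = 5.5           # assumed length of one shared route serving all four
savings = 1 - shared_cost / individual_cost   # ~0.54, i.e. 54% savings
```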


Divide-and-conquer

The tg() algorithm was unable to keep up with the high-volume stream

Partition the work between several nodes
• Less work per node gives a better chance to keep up with the request stream.
• Tradeoff: grouping opportunities might be missed.
• A “smart” partitioning misses fewer grouping chances.


Parallelize the stream processing

Partition the data stream between several nodes
• using the Partition-Compute-Combine parallelization scheme [Ivanova, Risch 2005]

The Partition node splits the input stream S1 into sub streams.

Each Compute node executes one instance of the tg() algorithm on one of the sub streams.

The Combine node merges the results from the Compute nodes and outputs the result stream S2.
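A single-process Python sketch of the scheme (the partitioning and compute functions here are placeholders; in SCSQ the Partition node and each Compute node run as separate stream processes on their own cluster nodes):

```python
def pcc(stream, n, partition_fn, compute_fn, combine_fn):
    """Partition-Compute-Combine, simulated in one process.

    The n compute instances simply run one after another over their
    buffered sub-streams, rather than in parallel on separate nodes.
    """
    substreams = [[] for _ in range(n)]
    for item in stream:                            # Partition
        substreams[partition_fn(item, n)].append(item)
    results = [compute_fn(s) for s in substreams]  # Compute (n instances)
    return combine_fn(results)                     # Combine

# Round-robin partitioning of a toy integer stream; each compute
# instance sums its sub-stream, and the combiner collects the sums.
out = pcc(range(10), 2,
          partition_fn=lambda item, n: item % n,
          compute_fn=sum,
          combine_fn=list)
# -> [20, 25]
```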


SCSQ Query Language

SCSQL: Stream Processes (SPs) as first-class objects
• Each SP is a handle to a process, which is executing a query.
• SPs on sub-queries enable parallelization in the query language.

Signatures:
streamof(bag) → stream
sp(stream) → sp
extract(sp) → bag
spv(vector of stream) → vector of sp
merge(vector of sp) → bag of vector

[Figure: a bag is turned into a stream by streamof(), run as a stream process by sp(), and read back as a bag by extract().]


Postfilter

For each output tuple from an SP, the postfilter is called once per subscriber.

The postfilter function is any function in SCSQL.

In the experiments, postfilter functions were used in the Partition node.
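The per-subscriber semantics can be sketched as follows (an illustrative model of the behavior described above, not SCSQ's actual implementation):

```python
def fanout(sp_output, subscribers):
    """Deliver each output tuple of an SP through per-subscriber postfilters.

    subscribers: dict mapping subscriber name -> postfilter function;
    a postfilter returns the (possibly transformed) tuple to deliver,
    or None to drop the tuple for that subscriber.
    """
    delivered = {name: [] for name in subscribers}
    for tup in sp_output:
        for name, postfilter in subscribers.items():
            result = postfilter(tup)   # called once per subscriber
            if result is not None:
                delivered[name].append(result)
    return delivered

# Using postfilters to partition request ids between two subscribers:
res = fanout([1, 2, 3, 4],
             {"even": lambda t: t if t % 2 == 0 else None,
              "odd":  lambda t: t if t % 2 == 1 else None})
# -> {'even': [2, 4], 'odd': [1, 3]}
```

This is what makes postfilters usable as partitioning functions: each Compute node subscribes to the Partition node with a postfilter that keeps only its own share of the stream.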


Implement PCC in SCSQ

select merge(compute)
from vector of sp compute, sp partition, integer n
where compute = spv(select streamof(tg(extract(partition)))
                    from integer i
                    where i = iota(1, n))
  and partition = sp(stream(’requests.dat’), n, ’partition_fn’)
  and n = 16;

[Figure: the partition SP feeds n parallel compute SPs, whose outputs are merged.]


Naïve parallelization: Round Robin

[Figure: log(execution time) [s] vs. load for serial execution and 2, 4, 8, and 16 processes.]

Requests are routed to the n SPs in round-robin order.
Data set: full load = 251,000 requests.
Hardware: cluster of Intel® Pentium® 4 2.80 GHz PCs (each SP executed on a separate node).

RR partitioning gives speed-up ~3500 at 16 processes!


Grouping quality, Round Robin
Savings = decrease in total transportation cost

[Figure: savings vs. load for serial execution and 2, 4, 8, and 16 processes.]

Round Robin partitioning degrades grouping quality substantially!


Spatial Stream Partitioning

Objective 1: balanced partitioning.
Objective 2: put close-by requests in the same partition.
Constraint: cover the entire space, with no overlap.
Two static strategies to subdivide the 4D request space (origin_x, origin_y, destination_x, destination_y):
• globally: Point Quad Partition (PQ)
• locally: KD Partition (KD), separating medians along the nth dimension (n > 1)
The static splits are based on analyzing the entire data set.
• Problem in reality: the stream cannot be known in advance.

[Figure: global division of space (Point Quad Partition) vs. local division of space (KD Partition).]
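The KD-style subdivision can be sketched in Python. This is a static sketch over a materialized point set; the real partitioner derives the split values once from analyzed data and then routes stream tuples against those boundaries:

```python
import statistics

def kd_partition(points, depth):
    """Split 4D request points at the median, cycling through the
    dimensions, yielding up to 2**depth partitions."""
    if depth == 0 or len(points) <= 1:
        return [points]
    dim = depth % 4  # cycle through (orig_x, orig_y, dest_x, dest_y)
    med = statistics.median(p[dim] for p in points)
    lo = [p for p in points if p[dim] <= med]
    hi = [p for p in points if p[dim] > med]
    return kd_partition(lo, depth - 1) + kd_partition(hi, depth - 1)

# 4D requests: (origin_x, origin_y, destination_x, destination_y)
reqs = [(0, 0, 1, 1), (0, 9, 1, 8), (9, 0, 8, 1), (9, 9, 8, 8)]
parts = kd_partition(reqs, 2)   # four partitions of one request each
```

Splitting at medians serves Objective 1 (balance), while splitting along the request coordinates serves Objective 2 (close-by requests land in the same partition).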


Adaptive stream partitioning
Adapt to changes in the request distribution by periodically adjusting partition boundaries.
Partition boundaries are adjusted based on samples of the requests.

[Figure: Adaptive Point Quad Partitioning (APQ) and Adaptive KD Partitioning (AKD); when old partition boundaries are replaced by new ones, already scheduled requests are not reassigned.]
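A one-dimensional sketch of sample-based boundary maintenance (the sample size, adjustment period, and reservoir-sampling choice are illustrative assumptions; APQ/AKD perform the analogous adjustment on the 4D partition boundaries):

```python
import random
import statistics

class AdaptiveSplitter:
    """Keep a reservoir sample of recent request values and periodically
    move the split boundary to the sample median.  Already routed
    requests are never reassigned when the boundary moves."""
    def __init__(self, boundary, sample_size=100, period=50):
        self.boundary = boundary
        self.sample_size = sample_size
        self.period = period
        self.sample = []
        self.seen = 0

    def route(self, value):
        # Reservoir-sample the incoming values.
        self.seen += 1
        if len(self.sample) < self.sample_size:
            self.sample.append(value)
        else:
            j = random.randrange(self.seen)
            if j < self.sample_size:
                self.sample[j] = value
        # Periodically re-center the boundary on the sample median.
        if self.seen % self.period == 0:
            self.boundary = statistics.median(self.sample)
        return 0 if value <= self.boundary else 1

s = AdaptiveSplitter(boundary=0.0)
for v in range(1, 101):   # drifting workload: all values positive
    s.route(v)
# the boundary has moved from 0.0 toward the median of the observed values
```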


Grouping quality, spatial partitioning

Savings at n = 16 (full load):

Method   Savings
serial    53%
RR        33%
SPQ       51%
SKD       51%
APQ       51%
AKD       51%

All spatial partitioning methods achieve savings close to serial.

[Figure: savings vs. n (2–16) at full load, for RR, SPQ, SKD, APQ, and AKD; a zoomed-in panel shows the spatial methods only.]


Execution time, spatial partitioning

Execution times at n = 16 (full load):

Method   Exec time
serial   19:23:00
RR       00:00:34
SPQ      00:27:00
SKD      00:09:00
APQ      00:07:00
AKD      00:04:00

Of the spatial partitioning methods, AKD, which re-balances the load dynamically, is clearly the fastest.

[Figure: log(execution time) [s] vs. n for RR, SPQ, SKD, APQ, and AKD; full load.]


Conclusions

Through data parallelization and spatial partitioning in SCSQ, cab sharing is made realistic for high-volume request streams.
SCSQL enables very easy specification of different parallelization functions
• both static and dynamic parallelization functions
Of the spatial partitioning methods, Adaptive KD partitioning
• achieves the highest streaming throughput
• gives the best grouping quality (highest savings)


Related Work

Tribeca [Sullivan, Heybey 1998]
• Pipes as QL objects
• No parallelization or distribution of the execution
Aurora [Carney et al 2002], WaveScope [Girod et al 2007]
• Arbitrary computations are specified as functions over streams
• No parallelization or distribution
Stream Processing Core [Jain et al 2006]
• Distributed stream processing
• No query language


Related Work

MapReduce [Dean, Ghemawat 2004]
• One distribution topology (map + reduce)
• Not stream oriented
Sawzall [Pike, Dorward, Griesemer, Quinlan 2005]
• MapReduce + high-level language
GSDM [Ivanova, Risch 2005]
• Distributed execution of expensive UDFs
• No distribution in the query language
Volcano [Graefe, Davison 1993]
• No user-defined distribution


Ongoing and Future Work

Automatic parallelization of continuous queries

Automatic postfilters
Adaptive adjustment of the degree of parallelism


References

J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Proc. 6th Symp. on Operating Systems Design and Implementation, USENIX Association, 2004, pp. 137–150.

G. Gidófalvi and T. B. Pedersen, ST–ACTS: A Spatio-Temporal Activity Simulator, Proc. ACM GIS 2006, Arlington, Virginia, USA, pp. 155–162.

G. Gidófalvi and T. B. Pedersen, Cab–Sharing: An Effective, Door–to–Door, On–Demand Transportation Service, Proc. ITS, 2007.

L. Girod, Y. Mei, R. Newton, S. Rost, A. Thiagarajan, H. Balakrishnan, S. Madden, The Case for a Signal-Oriented Data Stream Management System, Proc. CIDR 2007, Asilomar, CA, USA.

M. Ivanova and T. Risch, Customizable Parallel Execution of Scientific Stream Queries, Proc. VLDB 2005, Trondheim, Norway, pp. 157–168.

N. Jain et al., Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core, Proc. SIGMOD 2006, Chicago, IL, USA.

R. Pike, S. Dorward, R. Griesemer, S. Quinlan, Interpreting the Data: Parallel Analysis with Sawzall, Scientific Programming 13(4), 2005, pp. 277–298.

M. Sullivan and A. Heybey, Tribeca: A System for Managing Large Databases of Network Traffic, Proc. USENIX Annual Technical Conf., New Orleans, 1998.

G. Graefe and D. L. Davison, Encapsulation of Parallelism and Architecture-Independence in Extensible Database Query Execution, IEEE Trans. on Software Eng. 19(8), August 1993, p. 749.