A Grid-Based Middleware for Processing Distributed Data Streams

Liang Chen
Advisor: Gagan Agrawal
Computer Science & Engineering


Roadmap
• Introduction
  – Motivation
  – Our approach and challenges
• System Overview and Initial Evaluation
  – Introduce system architecture and design
  – Discuss the self-adaptation function
• Self-Adaptation Algorithm
  – Explain the algorithm
  – Evaluate the system using two data mining applications
• Resource Allocation Schemes
• Dynamic Migration
  – Motivation
  – Light-weight summary structure (LSS)
  – How applications utilize dynamic migration
  – Evaluation
• Adaptive Volume Rendering
• Related Work
• Conclusion and Future Work


Introduction-Motivation
• What is a data stream?
  – Data arrive continuously
  – The volume is enormous, so data must be processed online
  – Data need to be processed in real time
  – Data sources may be distributed
• Data stream applications
  – Online network intrusion detection
  – Sensor networks
  – Network Fault Management (NFM) systems for telecommunication network elements

Note (weili lin): The Grid was first developed to enable resource sharing within far-flung scientific collaborations, such as collaborative visualization of large scientific datasets and distributed computing for computationally demanding data analysis. Just as the WWW began as a technology for scientific cooperation and was later adopted by e-business, people expect the same trajectory for Grid technologies.
Note (weili lin): Resources come in different data formats, on different platforms, and in different kinds, such as storage resources, software, and data. Some sharing relationships are transient, for example because resources are upgraded.

Introduction-Motivation
Network Fault Management (NFM) system: analyzing distributed alarm streams
[Figure: a switch network, with one element marked X, and the NFM (Network Fault Management) system]


Introduction-Motivation
• Challenges
  – Data and/or computation intensive
  – The system can easily be overloaded
[Figure: switch network, with one element marked X]


Introduction-Motivation
• Possible solutions
  – Grid computing technologies
  – Automatically adjust the processing rate
[Figure: switch network]

Note (weili lin): Public-key-based single sign-on allows a user to authenticate once and thereby create a proxy credential that a program can use to authenticate with remote services on the user's behalf, without the intervention of other users.

Introduction-Motivation
• Needs for processing distributed data streams
  – A middleware running in the Grid
  – Allocate Grid resources
  – Provide a self-adaptation function


Introduction-Our Approach
• We implemented a middleware to meet these needs
• Five contributions of our work
  1. Utilizing existing grid standards
     Liang Chen, K. Reddy and G. Agrawal. “GATES: A Grid-Based Middleware for Processing Distributed Data Streams”. HPDC, 2004.
  2. Providing self-adaptation functionality
     Liang Chen and G. Agrawal. “Supporting Self-Adaptation in Streaming Data Mining Applications”. IPDPS, 2006.
  3. Supporting automatic resource allocation
     Liang Chen and G. Agrawal. “A Static Resource Allocation Framework for Grid-Based Streaming Applications”. Concurrency and Computation: Practice and Experience, Volume 18, Issue 6, Pages 653-666.
  4. Supporting efficient dynamic migration
     Liang Chen, Q. Zhu and G. Agrawal. “Supporting Dynamic Migration in Tightly Coupled Grid Applications”. SC 2006.
  5. Studying an adaptive rendering application


System Architecture and Design (Architecture)
• Uses Globus Toolkit 3.0, built on OGSA
• Allows users to specify their algorithms, implemented in Java
• Takes care of plugging user-defined algorithms into the system and running them in the Grid
• Applications need to be broken down into a number of pipelined stages


System Architecture and Design (Architecture)
[Figure: an application split into Stage A, Stage B, and Stage C, mapped onto GATES services A, B, and C. Legend: GATES services; stages of an application; queues between Grid services; buffers for applications.]


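System Architecture and Design (GATES API Functions)

    // Buffer, Data, and Result are placeholder type names
    public class SecondStage implements StreamProcessing {
        …
        public void work(Buffer in, Buffer out) {
            …
            while (true) {
                // Fetch the next data item from this stage's input buffer
                Data data = GATES.getFromInputBuffer(in);
                // Apply the user-defined processing
                Result interResults = processing(data);
                // Pass the intermediate results on to the next stage
                GATES.putToOutputBuffer(out, interResults);
            }
        }
    }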


Adaptation Parameter
• Definition
  – A parameter in an application
  – Changing the parameter’s value changes the processing rate of the application and also impacts the accuracy of the processing
• Two kinds of adaptation parameters
  – Performance parameter
  – Accuracy parameter
  – Example: sampling rate is an accuracy parameter
[Figure: accuracy and processing rate plotted against an accuracy parameter and against a performance parameter]


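Pseudo Code Again, with Self-adaptation API Functions

    // Buffer, Data, and Result are placeholder type names
    public class SecondStage implements StreamProcessing {
        …
        // Initialize the sampling rate to the midpoint of its range
        double samplingRate = (MAX + MIN) / 2;

        public void work(Buffer in, Buffer out) {
            // Register the sampling rate as an accuracy parameter with its range
            GATES.specifyAccuracyPara(samplingRate, MAX, MIN);
            while (true) {
                Data data = GATES.getFromInputBuffer(in);
                Result interResults = processing(data, samplingRate);
                GATES.putToOutputBuffer(out, interResults);
                // Adopt the value suggested by the middleware
                samplingRate = GATES.getSuggestedValue();
            }
        }
    }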
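The loop above can be tried without the middleware. The following is a minimal sketch, not GATES code: `SamplingStageSketch`, `sample`, and `run` are hypothetical names, the `DoubleSupplier` stands in for `GATES.getSuggestedValue()`, and even-stride sampling is only one assumed way of applying a sampling rate to a batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

/** Hypothetical stand-alone illustration of a stage driven by a sampling rate. */
final class SamplingStageSketch {
    /** Keeps roughly rate * n items, spread evenly over the batch. */
    static List<Integer> sample(List<Integer> batch, double rate) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            // Keep item i whenever the running quota crosses an integer boundary
            if (Math.floor((i + 1) * rate) > Math.floor(i * rate)) {
                kept.add(batch.get(i));
            }
        }
        return kept;
    }

    /** Processes each batch, then asks the stand-in middleware for a new rate. */
    static List<Integer> run(List<List<Integer>> batches, double min, double max,
                             DoubleSupplier suggested) {
        double rate = (max + min) / 2;            // same initial value as on the slide
        List<Integer> keptPerBatch = new ArrayList<>();
        for (List<Integer> batch : batches) {
            keptPerBatch.add(sample(batch, rate).size());
            // Clamp the suggestion to the declared range (an assumption of this sketch)
            rate = Math.max(min, Math.min(max, suggested.getAsDouble()));
        }
        return keptPerBatch;
    }
}
```

A lower suggested rate means fewer items are processed per batch, which raises the processing rate at the cost of accuracy.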


Self-Adaptation Algorithm
• View the system as a pipeline
• To ensure real-time processing, a balanced pipeline is needed
• When the average queue length is too small or too large, the queue is underloaded or overloaded, and the pipeline is not balanced
• Measure the average lengths of the queues in the pipeline
• When GATES.getSuggestedValue() is invoked, use a heuristic to determine a new value for the adaptation parameter according to the measured lengths
[Figure: pipeline of stages A, B, and C]
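The queue-length measurement and load classification can be sketched as follows. This is an illustration under stated assumptions, not the GATES implementation: `QueueMonitor`, the smoothing factor `alpha`, the exponentially weighted average, and the two thresholds are all introduced here; the slides only say that an average queue length that is too small or too large marks a queue as underloaded or overloaded.

```java
/** Hypothetical helper: tracks a smoothed queue length and classifies the load. */
final class QueueMonitor {
    enum LoadState { LIGHTLY_LOADED, PROPERLY_LOADED, OVERLOADED }

    private final double alpha;     // assumed smoothing factor in (0, 1]
    private final double lowMark;   // assumed threshold below which the queue is lightly loaded
    private final double highMark;  // assumed threshold above which the queue is overloaded
    private double average;         // smoothed queue length
    private boolean seeded;         // false until the first observation arrives

    QueueMonitor(double alpha, double lowMark, double highMark) {
        this.alpha = alpha;
        this.lowMark = lowMark;
        this.highMark = highMark;
    }

    /** Folds one observed queue length into the running average. */
    void observe(int queueLength) {
        if (!seeded) {
            average = queueLength;
            seeded = true;
        } else {
            average = (1 - alpha) * average + alpha * queueLength;
        }
    }

    double average() {
        return average;
    }

    /** Maps the smoothed length onto the three load states named in the slides. */
    LoadState state() {
        if (average < lowMark) return LoadState.LIGHTLY_LOADED;
        if (average > highMark) return LoadState.OVERLOADED;
        return LoadState.PROPERLY_LOADED;
    }
}
```

Smoothing keeps a single burst from flipping the load state, so the adaptation parameter is not changed on every transient spike.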


Self-adaptation Algorithm
• How we measure the average queue length
• The heuristic for adjusting an adaptation parameter
  – Should the adaptation parameter be modified, and if so, in which direction?
  – How to find a new value (update the value) of the adaptation parameter
[Equations: not recoverable from the extracted text]


Self-adaptation Algorithm
• Should the adaptation parameter be modified, and if so, in which direction?
  – The answer is related to the pipeline’s load state.


Self-adaptation Algorithm: Performance Parameter P_B
[Figure: load states of the pipeline A, B, C, grouped into convergent states and non-convergent states. Legend: overloaded; properly loaded; lightly loaded.]


Self-adaptation Algorithm

Summary of Load States


Self-adaptation Algorithm
• How to determine a new value for the adaptation parameter
  – Linear update: increase or decrease by a fixed value
    • Hard to find a proper fixed value
  – Binary search


Self-adaptation Algorithm
[Figure: binary search update. Labels: Left Border, Current Value, Right Border, New Value]
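One way to read the border labels is as a shrinking search interval. The sketch below is an assumption-laden illustration rather than the algorithm from the slides: `BinarySearchUpdater` and its methods are hypothetical, and the rule for moving the borders (the current value becomes the left border on an increase and the right border on a decrease) is assumed.

```java
/** Hypothetical binary-search style updater for a single adaptation parameter. */
final class BinarySearchUpdater {
    private double left;     // left border of the current search interval
    private double right;    // right border of the current search interval
    private double current;  // current value of the adaptation parameter

    BinarySearchUpdater(double min, double max) {
        this.left = min;
        this.right = max;
        this.current = (min + max) / 2;  // start at the midpoint of the range
    }

    double current() {
        return current;
    }

    /** Moves halfway toward the right border; the old value becomes the left border. */
    double increase() {
        left = current;
        current = (current + right) / 2;
        return current;
    }

    /** Moves halfway toward the left border; the old value becomes the right border. */
    double decrease() {
        right = current;
        current = (left + current) / 2;
        return current;
    }
}
```

Compared with a linear update, the step size halves on each call, so no fixed increment has to be chosen in advance.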


Self-adaptation Algorithm
• Two data mining applications
  – Clustream: clustering data points in streams


Data Mining Applications & System Evaluation
• Dist-Freq-Counting: finding frequent itemsets from distributed streams


Resource Allocation Schemes
• Problem definition
  – Grid resource allocation for pipelined applications that process distributed streaming data in real time is challenging
  – The scheme consists of two parts
    • Static part: allocate resources before an application runs
    • Dynamic part: re-allocate resources at run time
  – A framework to monitor resources and support dynamic resource allocation


Static Allocation Scheme
• Static allocation problem: determining a deployment configuration
• Objective: automatically generate a deployment configuration according to the information about available resources
• Given
  – The number of data sources and their locations
  – The destination
  – The number of stages that make up the pipeline
• To be determined
  – The number of instances of each stage
  – How the instances connect to each other
  – The node where each instance is placed
[Figure: example deployment. Data Source 1: 162.9.23.1; Data Source 2: 78.29.242.8; Data Source 3: 192.168.2.8; Data Source 4: 123.97.61.9; Destination: m1.cluster2.edu. Stages 2, 3, and 4 each have several placements (Placement 1 … Placement n1, n2, n3).]
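A deployment configuration of this kind can be written down as a small data structure. The sketch below is illustrative only: `Placement`, `DeploymentConfiguration`, and their fields are hypothetical names, and nothing here reflects how GATES actually stores a configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of one stage instance placed on a Grid node. */
final class Placement {
    final int stage;                                // pipeline stage number
    final String node;                              // host name or address of the chosen node
    final List<String> inputs = new ArrayList<>();  // upstream nodes or data sources

    Placement(int stage, String node) {
        this.stage = stage;
        this.node = node;
    }
}

/** Hypothetical deployment configuration: data sources, destination, placements. */
final class DeploymentConfiguration {
    final List<String> dataSources;
    final String destination;
    final List<Placement> placements = new ArrayList<>();

    DeploymentConfiguration(List<String> dataSources, String destination) {
        this.dataSources = dataSources;
        this.destination = destination;
    }

    /** Adds an instance of a stage on a node and records where its input comes from. */
    Placement place(int stage, String node, String... inputs) {
        Placement p = new Placement(stage, node);
        p.inputs.addAll(List.of(inputs));
        placements.add(p);
        return p;
    }

    /** Number of instances allocated to the given stage. */
    long instancesOf(int stage) {
        return placements.stream().filter(p -> p.stage == stage).count();
    }
}
```

The three unknowns of the static allocation problem map onto this structure directly: the count per stage is `instancesOf`, the connections are each placement's `inputs`, and the node is the placement's `node`.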


Dynamic Migration-Motivation
– Grid resources vary frequently
– Dynamically allocating new resources and migrating applications to the new resources improves performance
– Checkpointing is a classic method to support dynamic migration
  • Take a snapshot of the system’s running state
  • Transmit it to a remote site
  • Restore the execution context and restart the processes
– Disadvantages of checkpointing
  • Platform dependent
  • Inefficient
  • Requires considerable implementation effort
– Our approach is based on the Light-weight Summary Structure (LSS)


Dynamic Migration-LSS
• Processing structure:

    ...
    while (true) {
        read_data_from_streams();
        process_data();
        accumulate_intermediate_results();
        reset_auxiliary_structures();
    }
    ...

• The data structure that stores summary information is the light-weight summary structure (LSS)
• All other data structures are auxiliary structures


Dynamic Migration-LSS
• Two observations with respect to LSS
  – The size of the LSS is much smaller than the total memory used
  – Auxiliary structures are usually reset at the end of each loop iteration, so it is unnecessary to migrate them when migration occurs at the end of an iteration
• LSS can be used to support dynamic migration
  – GATES provides an API function to allocate a block of memory to be the LSS
  – An application stores summary information in the LSS
  – Only the LSS is transmitted to the new node at the end of the loop iteration, and it is restored there


Dynamic Migration: supported by GATES
Code running at the remote computing node:

    ...
    while (true) {
        ...
        // Check whether migration is needed
        if (GATES.ifMigrationNeeded()) {
            GATES.migrate(lss);
            break;
        }
    }
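The idea of shipping only the summary can be illustrated without the middleware. In this sketch the LSS is assumed to be a map of running counts, and standard Java object serialization stands in for whatever `GATES.migrate` does internally; `LssMigrationSketch`, `pack`, and `unpack` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

/** Hypothetical illustration: only the summary structure crosses the wire. */
final class LssMigrationSketch {
    /** Serializes the LSS, as would happen at the end of a loop iteration. */
    static byte[] pack(HashMap<String, Long> lss) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(lss);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores the LSS on the new node; auxiliary structures start out empty there. */
    @SuppressWarnings("unchecked")
    static HashMap<String, Long> unpack(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return (HashMap<String, Long>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore the LSS", e);
        }
    }
}
```

Because the auxiliary structures are reset at the end of every iteration, nothing besides the packed summary needs to be moved to resume processing on the new node.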


Dynamic Migration
• Advantages of using LSS
  – Efficient: only the LSS is migrated
  – Does not impact the accuracy of processing
  – Supports migration across heterogeneous platforms
  – Reduces the effort application developers spend on making applications capable of migration


Dynamic Migration
• Evaluation
  – Three applications
    • Counting sample
      – LSS stores the intermediate top M frequently occurring numbers
    • Clustream: clustering data points in streams
      – LSS stores micro-clusters computed at the second stage
    • Dist-Freq-Counting: finding frequent itemsets in distributed streams
      – LSS stores unprocessed itemsets


Dynamic Migration

• Memory usage of LSS


Dynamic Migration
• Migration using LSS is efficient


Dynamic Migration
• Benefits of migration in a dynamic environment


Dynamic Migration
• LSS migration does not impact processing accuracy
  – The counting sample application was used
  – The average accuracy of the processing results was 97.28% for the non-migration version and 97.51% for the migration version


Adaptive Volume Rendering
• Motivation: Grid computing is needed
  – Visualization involves large volumes of data
  – We focus on streaming volume data
  – Interactively visualizing volume data in real time is needed
    • Computationally intensive
    • Resource consuming
    • Real-time processing cannot be guaranteed
  – The places where data are generated are distributed
  – A typical client-server architecture is not scalable
    • Network bandwidths of wide-area networks are low
    • The computing capability of a normal desktop is not enough
  – Grid techniques would be a good solution
    • Divide the procedure into stages organized in a pipeline
    • Allocate nodes close to the data source to pre-process volume data
    • The size of the intermediate results is much smaller


Adaptive Volume Rendering
• Motivation: GATES is desirable
  – Automatic adaptation is desirable
    • Volume rendering algorithms running on a grid need to be highly adaptive
    • Adaptation is usually achieved by manually adjusting adaptation parameters
    • Such manual parameter adaptation is very challenging in a grid environment
  – Automatic resource allocation is desirable
    • The Grid environment is highly changeable
  – The GATES middleware could fulfill these needs
    • Grid-based
    • Provides the self-adaptation function to applications
    • Automatically allocates Grid resources


Adaptive Volume Rendering
• Overall design
  – Two pipelined steps; the first step:
    • Build octrees from volume data
      – An octree is a tree data structure in which each internal node has up to 8 children
      – Here, we use an octree to represent multiresolution information for a volume
      – The procedure to build an octree for a volume is as follows:
        » Divide the volume space into 8 subvolumes and create 8 child nodes
        » For each subvolume, calculate the standard deviation of all voxels in the subvolume, and store the deviation in the corresponding child node
        » If the deviation is larger than a pre-defined value, divide the subvolume and repeat the above procedure. Otherwise, stop
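The octree construction procedure can be sketched as below. This is a minimal illustration, assuming a cubic volume stored as `float[side][side][side]` with a power-of-two side; `OctreeNode`, `OctreeBuilder`, and the parameter name `threshold` (the pre-defined deviation value) are hypothetical and not taken from the rendering application.

```java
/** Hypothetical octree node: a cubic region of the volume and its voxel deviation. */
final class OctreeNode {
    final int x, y, z, size;   // origin and side length of the region
    final double deviation;    // standard deviation of the voxels in the region
    OctreeNode[] children;     // stays null while the node is a leaf

    OctreeNode(int x, int y, int z, int size, double deviation) {
        this.x = x; this.y = y; this.z = z; this.size = size;
        this.deviation = deviation;
    }
}

final class OctreeBuilder {
    /** Standard deviation of all voxels in the cube at (x, y, z) with the given side. */
    static double deviation(float[][][] v, int x, int y, int z, int size) {
        double sum = 0, sumSq = 0;
        for (int i = x; i < x + size; i++)
            for (int j = y; j < y + size; j++)
                for (int k = z; k < z + size; k++) {
                    double voxel = v[i][j][k];
                    sum += voxel;
                    sumSq += voxel * voxel;
                }
        double n = (double) size * size * size;
        double mean = sum / n;
        return Math.sqrt(Math.max(0, sumSq / n - mean * mean));
    }

    /** Splits a node into 8 subvolumes and recurses where the deviation is too large. */
    static void subdivide(float[][][] v, OctreeNode node, double threshold) {
        int half = node.size / 2;
        if (half == 0) return;  // a single voxel cannot be divided further
        node.children = new OctreeNode[8];
        for (int c = 0; c < 8; c++) {
            int cx = node.x + (c & 1) * half;
            int cy = node.y + ((c >> 1) & 1) * half;
            int cz = node.z + ((c >> 2) & 1) * half;
            OctreeNode child = new OctreeNode(cx, cy, cz, half, deviation(v, cx, cy, cz, half));
            node.children[c] = child;
            if (child.deviation > threshold) subdivide(v, child, threshold);
        }
    }

    /** Builds the octree for a whole cubic volume. */
    static OctreeNode build(float[][][] v, double threshold) {
        int side = v.length;
        OctreeNode root = new OctreeNode(0, 0, 0, side, deviation(v, 0, 0, 0, side));
        subdivide(v, root, threshold);
        return root;
    }
}
```

Uniform regions stop subdividing early, so the tree is deep only where the data actually vary.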


Adaptive Volume Rendering
• Overall design
  – Two pipelined steps; the second step:
    • Use an octree and its corresponding volume to render images
    • Given an error tolerance (or user-defined resolution), traverse the octree depth-first (DFS) and stop at the nodes where the deviation is less than the error tolerance
    • Project the corresponding 3D subvolumes onto an image
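The traversal step can be sketched on its own. The code below is an illustration under assumptions: `RenderNode` is a hypothetical, minimal node type carrying only a label and a deviation, and the projection onto the image is left out; the sketch only selects the nodes at which the depth-first traversal stops.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal octree node for the traversal example. */
final class RenderNode {
    final String label;      // identifies the subvolume in this example
    final double deviation;  // deviation stored when the octree was built
    final List<RenderNode> children = new ArrayList<>();

    RenderNode(String label, double deviation) {
        this.label = label;
        this.deviation = deviation;
    }
}

final class OctreeTraversal {
    /** Depth-first traversal that stops where the deviation is below the tolerance. */
    static void collect(RenderNode node, double tolerance, List<RenderNode> out) {
        if (node.deviation < tolerance || node.children.isEmpty()) {
            out.add(node);  // this subvolume is rendered at its current resolution
            return;
        }
        for (RenderNode child : node.children) {
            collect(child, tolerance, out);
        }
    }
}
```

A larger tolerance stops the traversal higher in the tree, so fewer and coarser subvolumes are projected; this is what makes the error tolerance usable as an adaptation parameter.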


Adaptive Volume Rendering
• Make the rendering self-adaptive
  – Two adaptation parameters are used in the third stage
    • Error tolerance: performance parameter
    • Image size: accuracy parameter
  – GATES can adjust only one adaptation parameter, so we fix one and adjust the other


Adaptive Volume Rendering

• Experiment 1


Adaptive Volume Rendering
[Figure panels: 100 kbps, 150 kbps, 200 kbps, 250 kbps]


Adaptive Volume Rendering

• Experiment 2


Adaptive Volume Rendering
• Experiment 3: compare the performance of two implementations
  – Java-imple
  – C-imple


Related Work
• Middleware for data stream processing
  – DataCutter, Stampede
  – Differences: run in a cluster, no self-adaptation, not specifically for real-time processing
• Continuous query systems
  – STREAM, dQUOB, TelegraphCQ, NiagaraCQ
  – Differences: centralized, no adaptation support
• Distributed continuous query systems
  – Aurora*, Medusa, Borealis
  – Differences: continuous queries, not in a Grid environment
• In-network aggregation in sensor networks
• Stream-based overlay networks


Related Work
• Grid resource allocation
  – Condor, Realtor, ACDS
  – Main difference: our work focuses on Grid resource allocation for workflow applications
• Adaptation through a middleware
  – Cheng et al.’s adaptation framework, SWiFT, Conductor, DART, ROAM
  – Main difference: our work focuses on general support for adaptation at run time
• Dynamic migration in a Grid environment
  – Condor, XCATS, Charm++
  – Main difference: our work uses LSS


Conclusion
• Grid computing could be an effective solution for distributed data stream processing
• GATES
  – Distributed processing
  – Exploits grid web services
  – Self-adaptation to meet real-time constraints
  – Grid resource allocation schemes and dynamic migration


Future Work
• CPU cycles and network bandwidth
  – Currently, only network bandwidth is considered as a constraint when scheduling Grid resources
  – Little related work proposes a metric that integrates both for pipelined applications
• Port GATES from GT3 to GT4
• Support fault tolerance and high availability
• Further relieve the programming burden on application developers
  – Specify meta-data
• Support distributed continuous queries
  – Specify a set of query operators


Acknowledgements
• My advisor, Prof. Agrawal, proposed the idea of implementing the middleware and gave much advice on the direction of my research
• Prof. Shen gave a great deal of help with implementing the rendering application and provided much of the write-up for Chapter 7


Questions?
• No more questions? Thanks!