19
A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer Science Indiana University Bloomington, IN, 47404 {jekanaya,spallick,gcf}@indiana.edu 06/20/22 1 Jaliya Ekanayake - cts2008 CTS-2008 Irvine California

A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Embed Size (px)

Citation preview

Page 1: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

A Collaborative Framework for Scientific Data Analysis and

Visualization

Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer Science

Indiana University Bloomington, IN, 47404 {jekanaya,spallick,gcf}@indiana.edu

04/20/23 1Jaliya Ekanayake - cts2008

CTS-2008 Irvine California

Page 2: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Talk Outline

• Collaborative Data Analysis• Typical Collaborative Techniques• Proposed Architecture • High Energy Physics Data Analysis• Conclusion

04/20/23 2Jaliya Ekanayake - cts2008

Page 3: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Collaborative Scientific Data Analysis

• The final step of data analyses involves human interpretation• The data, the processing power, and the experts in the field are all

distributed• Collaboration brings all these to a single session• Participants from different geographic locations• Different interests (active participation or simply observe results)

04/20/23 3Jaliya Ekanayake - cts2008

Page 4: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Collaborative Techniques

• Focused on sharing multimedia content– Audio, video streams– Desktop sharing– Collaborative whiteboards, online meetings– E.g. WebEx, Windows Meeting Place, Anabas,

EVO

• The Data Turbine and the Real Time Data Viewer (RDV)– Remote monitoring of events/streams from

scientific instruments– The content dissemination is closely

coupled with the architecture

04/20/23 4Jaliya Ekanayake - cts2008

Page 5: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

The Proposed Architecture

• Compute Server acts as the gateway for a particular domain of control

• Results shared among the participants• Set of agents manage the sessions, and

track entities in the system

04/20/23 5Jaliya Ekanayake - cts2008

Session ManagementEntity TrackingGossip

Page 6: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

How does it work?

04/20/23 6

Site 1

Data

C1

R11

R1m1

ComputeClient

1

ComputeClient

p

Site n

Data

Cn

Rn1

Rnmn

Agents

ComputeServers Register with an Agent

Agent Keeps Track of the ComputeServers

ComputeClient Retrieve Details of ComputeServers

ComputeClient Submit Compute jobs

Results Reach all the Interested Entities

1

2

3

4

5

Time Line

04/20/23 6Jaliya Ekanayake - cts2008

Page 7: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Collaborative Modes - Shared Events

• Support further processing of data by the receiving end – Active participation

• Push paradigm

• Clients can further process the events if necessary

• Higher quality data

• Compute server notifies either the results or the location of the results to the participating clients

• For small data products, the output can directly be sent to the clients

• For larger data products, the outputs can be stored in a file system and the clients can retrieve them via Compute server

04/20/23 7Jaliya Ekanayake - cts2008

Page 8: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Collaborating Modes – Shared Display

• One client captures its display and share it as an image

• Suitable for passive participation• Suitable for clients joining with minimum computation

capabilities– E.g. hand held devices

• Capability to publish data to the public• May limits further analysis• Less accurate than the shared events

04/20/23 8Jaliya Ekanayake - cts2008

Page 9: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Security and Fault Tolerance

• Compute server Security– Authentication via PKI– Authorization via grid-map file

• Content Dissemination Network provides secure, end to end delivery of messages

• Content Dissemination Network is fault tolerant• Multiple set of agents maintains the state of the

system• No single point of failure• Compute server failure results manual re-start

04/20/23 9Jaliya Ekanayake - cts2008

Page 10: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

High Energy Physics Data Analysis

• Large volumes of data

• Distributed data

• Identify a certain type of data products from a collection of millions of data products

• Analyses are fine tuned iteratively

• Same analysis on different data sets

• Collaborative interpretation

Site 1

Data

C1

R11

R1m1

ComputeClient

1

NaradaBrokering

Agents

ROOT

04/20/23 10Jaliya Ekanayake - cts2008

Page 11: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

User InterfaceAvailable Clarens Servers

Session Information

Results received & merged

Results received & currently merging

Results not yet received

04/20/23 11Jaliya Ekanayake - cts2008

Page 12: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Results: # Participants vs. Event Propagation Time

04/20/23 12Jaliya Ekanayake04/20/23 12Jaliya Ekanayake - cts2008

Page 13: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Results : Event Rate vs. Communication Latency

04/20/23 13Jaliya Ekanayake - cts2008

Page 14: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Conclusions & Future Work

• A Collaborative Framework for Scientific Data Analysis• Processing data across domains of control• Sharing results

– Shared Event– Shared Display– Synchronous / Asynchronous

• Complete the Agent Implementation• Map-reduce style programming model for the

Compute Server

04/20/23 14Jaliya Ekanayake - cts2008

Page 15: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Thank You!

04/20/23 15Jaliya Ekanayake - cts2008

Page 16: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Security• The framework spans into multiple domains of control

• Use PKI for security

• Each entity in the framework owns a X509 certificate

• Communication medium - > Content dissemination framework

• The messages carries a signature

• Messages from unauthorized entities are discarded

• Agent uses a proxy certificate to submit computation jobs on behalf of the ComputeClient

• The framework provides the necessary APIs to generate a proxy certificate

• ComputeServer maps user’s DN to the user account

• Computation jobs are executed as user processes

• The code which performs the above user account mapping is kept auditable

04/20/23 16Jaliya Ekanayake - cts2008

Page 17: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Handling Failures 1: ComputeServer

• Agent detects the failure of a ComputeServer

• Agent notifies the ControlConsole about the failure

• User restarts the failed ComputeServers

• ComputeServer keeps the status of the processing jobs in memory– This will simplify the ComputeServer’s functionality

• Once restarted, the agent will re-submit the incomplete jobs to the ComputeServer

• ComputeClient can retrieve the results of the completed computations (even the results of the computations, which were completed before the failure) aft the restart

04/20/23 17Jaliya Ekanayake - cts2008

Page 18: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Handling Failures 2: Agent

• Master Agent(MA) keeps the status of the entire framework

• A set of Buddy Agent(BA)s keeps track of the MA

• MA assigns a unique ID to each BA

• MA sends the status of the framework to BAs

• BAs detect a failure of MA

• First BA will assume duty of MA

• New MA contacts ComputeServers and build the status

BA1

MA

BA2

BA3

04/20/23 18Jaliya Ekanayake - cts2008

Page 19: A Collaborative Framework for Scientific Data Analysis and Visualization Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox Department of Computer

Computation Tasks and the Associated Cost

04/20/23 19Jaliya Ekanayake - cts2008