A Collaborative Framework for Scientific Data Analysis and
Visualization
Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox
Department of Computer Science, Indiana University
Bloomington, IN, 47404
{jekanaya,spallick,gcf}@indiana.edu
04/20/23 1Jaliya Ekanayake - cts2008
CTS-2008 Irvine California
Talk Outline
• Collaborative Data Analysis
• Typical Collaborative Techniques
• Proposed Architecture
• High Energy Physics Data Analysis
• Conclusion
Collaborative Scientific Data Analysis
• The final step of data analyses involves human interpretation
• The data, the processing power, and the experts in the field are all distributed
• Collaboration brings all these to a single session
• Participants from different geographic locations
• Different interests (active participation or simply observe results)
Collaborative Techniques
• Focused on sharing multimedia content
– Audio, video streams
– Desktop sharing
– Collaborative whiteboards, online meetings
– E.g. WebEx, Windows Meeting Place, Anabas, EVO
• The Data Turbine and the Real Time Data Viewer (RDV)
– Remote monitoring of events/streams from scientific instruments
– The content dissemination is closely coupled with the architecture
The Proposed Architecture
• Compute Server acts as the gateway for a particular domain of control
• Results are shared among the participants
• A set of agents manages the sessions and tracks entities in the system
[Figure labels: Session Management, Entity Tracking, Gossip]
How does it work?
[Figure: Site 1 … Site n, each with Data, C1 … Cn, and R11 … R1m1 / Rn1 … Rnmn; ComputeClient 1 … ComputeClient p; Agents]
Time line:
1. ComputeServers register with an Agent
2. Agent keeps track of the ComputeServers
3. ComputeClient retrieves details of ComputeServers
4. ComputeClient submits compute jobs
5. Results reach all the interested entities
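The five steps on the time line can be sketched in a few lines of Python. This is a minimal, in-process illustration under stated assumptions: the class names `Agent` and `ComputeServer`, their methods, and the callback-based delivery are hypothetical stand-ins, not the framework's API. In the real system, results travel over the content dissemination network rather than through direct function calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ComputeServer:
    """Hypothetical gateway to one site's data and compute resources."""
    name: str

    def run(self, job: Callable[[str], str]) -> str:
        # A real server would run the analysis close to the site's data.
        return job(self.name)


@dataclass
class Agent:
    """Hypothetical agent: tracks servers and the session's participants."""
    servers: Dict[str, ComputeServer] = field(default_factory=dict)
    subscribers: List[Callable[[str], None]] = field(default_factory=list)

    def register(self, server: ComputeServer) -> None:
        # Steps 1 and 2: a server registers and the agent tracks it.
        self.servers[server.name] = server

    def list_servers(self) -> List[str]:
        # Step 3: a client retrieves the details of known servers.
        return sorted(self.servers)

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def submit(self, server_name: str, job: Callable[[str], str]) -> str:
        # Step 4: a compute job is submitted to the chosen server.
        result = self.servers[server_name].run(job)
        # Step 5: the result reaches every interested entity.
        for notify in self.subscribers:
            notify(result)
        return result


agent = Agent()
for site in ("site1", "site2"):
    agent.register(ComputeServer(site))

inbox_a: List[str] = []
inbox_b: List[str] = []
agent.subscribe(inbox_a.append)
agent.subscribe(inbox_b.append)

for name in agent.list_servers():
    agent.submit(name, lambda site: f"histogram from {site}")
```

Both inboxes end up with one result per site, which is the "results reach all the interested entities" property in miniature.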
Collaborative Modes - Shared Events
• Support further processing of data by the receiving end – Active participation
• Push paradigm
• Clients can further process the events if necessary
• Higher quality data
• Compute server sends either the results or the location of the results to the participating clients
• For small data products, the output can be sent directly to the clients
• For larger data products, the outputs can be stored in a file system and the clients can retrieve them via the Compute server
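The small-versus-large rule above can be sketched as follows. This is an illustration only: the `ResultEvent` type, the function names, and the 1 KiB `INLINE_LIMIT` threshold are assumptions, since the talk does not state a size cut-off. The sketch also reads the stored file directly, whereas in the framework clients retrieve large outputs via the Compute server.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional

INLINE_LIMIT = 1024  # bytes; assumed threshold, not specified in the talk


@dataclass
class ResultEvent:
    """Hypothetical event pushed to the participating clients."""
    job_id: str
    payload: Optional[bytes] = None   # set for small data products
    location: Optional[str] = None    # set for large data products


def make_event(job_id: str, data: bytes, store_dir: str) -> ResultEvent:
    """Server side: send small results inline, store large ones."""
    if len(data) <= INLINE_LIMIT:
        return ResultEvent(job_id, payload=data)
    path = os.path.join(store_dir, f"{job_id}.out")
    with open(path, "wb") as fh:
        fh.write(data)
    return ResultEvent(job_id, location=path)


def fetch(event: ResultEvent) -> bytes:
    """Client side: use the inline payload or follow the location."""
    if event.payload is not None:
        return event.payload
    with open(event.location, "rb") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as store:
    small = make_event("job-1", b"x" * 10, store)
    large = make_event("job-2", b"y" * 5000, store)
    small_data, large_data = fetch(small), fetch(large)
```

Either way the client ends up holding the raw data product, so it can process the event further, which is what distinguishes this mode from a shared display.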
Collaborative Modes - Shared Display
• One client captures its display and shares it as an image
• Suitable for passive participation
• Suitable for clients joining with minimum computation capabilities
– E.g. hand held devices
• Capability to publish data to the public
• May limit further analysis
• Less accurate than the shared events
Security and Fault Tolerance
• Compute server security
– Authentication via PKI
– Authorization via grid-map file
• Content Dissemination Network provides secure, end-to-end delivery of messages
• Content Dissemination Network is fault tolerant
• A set of multiple agents maintains the state of the system
• No single point of failure
• A Compute server failure requires a manual restart
High Energy Physics Data Analysis
• Large volumes of data
• Distributed data
• Identify a certain type of data product in a collection of millions of data products
• Analyses are fine tuned iteratively
• Same analysis on different data sets
• Collaborative interpretation
[Figure: Site 1 (Data, C1, R11 … R1m1), ComputeClient 1, NaradaBrokering, Agents, ROOT]
User Interface
• Available Clarens Servers
• Session Information
• Results received & merged
• Results received & currently merging
• Results not yet received
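The three result states shown in the interface suggest a small per-server state machine. The sketch below is an assumption-laden illustration: the `ResultMerger` class, the server names, and the merge rule (adding histogram bin counts) are hypothetical, since the slides do not say how partial results are combined.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    NOT_RECEIVED = "not yet received"
    MERGING = "received & currently merging"
    MERGED = "received & merged"


class ResultMerger:
    """Hypothetical client-side tracker for per-server partial results."""

    def __init__(self, servers: List[str], n_bins: int) -> None:
        self.status: Dict[str, Status] = {s: Status.NOT_RECEIVED for s in servers}
        self.total: List[int] = [0] * n_bins

    def receive(self, server: str, histogram: List[int]) -> None:
        self.status[server] = Status.MERGING
        # Assumed merge rule: add the bin counts of the partial histogram.
        self.total = [a + b for a, b in zip(self.total, histogram)]
        self.status[server] = Status.MERGED


merger = ResultMerger(["clarens-a", "clarens-b", "clarens-c"], n_bins=3)
merger.receive("clarens-a", [1, 0, 2])
merger.receive("clarens-b", [0, 4, 1])
```

After the two calls, the third server is still marked as not yet received, matching the kind of partial view the interface displays while an analysis is in progress.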
Results: # Participants vs. Event Propagation Time
Results: Event Rate vs. Communication Latency
Conclusions & Future Work
• A Collaborative Framework for Scientific Data Analysis
• Processing data across domains of control
• Sharing results
– Shared Event
– Shared Display
– Synchronous / Asynchronous
• Complete the Agent Implementation
• Map-reduce style programming model for the Compute Server
Thank You!
Security
• The framework spans multiple domains of control
• Use PKI for security
• Each entity in the framework owns an X.509 certificate
• Communication medium -> Content dissemination framework
• Messages carry a signature
• Messages from unauthorized entities are discarded
• Agent uses a proxy certificate to submit computation jobs on behalf of the ComputeClient
• The framework provides the necessary APIs to generate a proxy certificate
• ComputeServer maps the user’s DN to the user account
• Computation jobs are executed as user processes
• The code that performs the above user account mapping is kept auditable
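The DN-to-account mapping can be illustrated with a small grid-map parser. This is a sketch, not the framework's code: the function names are hypothetical, the sample DNs and accounts are made up, and only the simplest entry form (a quoted DN followed by one local account) is handled.

```python
import shlex
from typing import Dict, Optional

# Sample entries are invented for illustration.
GRID_MAP = '''
# "certificate subject DN" local-account
"/C=US/O=Example Lab/CN=Alice Analyst" alice
"/C=US/O=Example Lab/CN=Bob Builder" bob
'''


def parse_grid_map(text: str) -> Dict[str, str]:
    """Parse grid-map text into a DN -> local account dictionary."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        dn, account = shlex.split(line)  # shlex honours the quoted DN
        mapping[dn] = account
    return mapping


def authorize(dn: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the local account for a DN, or None if it is not authorized."""
    return mapping.get(dn)


accounts = parse_grid_map(GRID_MAP)
known = authorize("/C=US/O=Example Lab/CN=Alice Analyst", accounts)
unknown = authorize("/C=US/O=Elsewhere/CN=Mallory", accounts)
```

Keeping the mapping this small is in the spirit of the last bullet: a short lookup is easy to audit. A job from a DN that maps to `None` would be rejected before any user process is started.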
Handling Failures 1: ComputeServer
• Agent detects the failure of a ComputeServer
• Agent notifies the ControlConsole about the failure
• User restarts the failed ComputeServers
• ComputeServer keeps the status of the processing jobs in memory
– This will simplify the ComputeServer’s functionality
• Once the ComputeServer is restarted, the agent re-submits the incomplete jobs to it
• ComputeClient can retrieve the results of the completed computations (even the results of computations that were completed before the failure) after the restart
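The recovery sequence above can be sketched as follows. All names here (`ComputeServer`, `Agent`, `on_restart`, the job IDs) are hypothetical, and failure detection and the ControlConsole notification are reduced to flipping a flag. The point of the sketch is only that the agent, not the server, holds the durable job record.

```python
from typing import Dict, List


class ComputeServer:
    """Hypothetical server whose job status lives only in memory."""

    def __init__(self) -> None:
        self.alive = True
        self.received: List[str] = []

    def submit(self, job_id: str) -> bool:
        if not self.alive:
            return False
        self.received.append(job_id)
        return True

    def restart(self) -> None:
        # Manual restart by the user; in-memory job status is lost.
        self.alive = True
        self.received = []


class Agent:
    """Hypothetical agent that remembers which jobs are still incomplete."""

    def __init__(self, server: ComputeServer) -> None:
        self.server = server
        self.jobs: Dict[str, str] = {}  # job_id -> "submitted" or "done"

    def submit(self, job_id: str) -> None:
        self.jobs[job_id] = "submitted"
        self.server.submit(job_id)

    def mark_done(self, job_id: str) -> None:
        self.jobs[job_id] = "done"

    def on_restart(self) -> List[str]:
        """Re-submit every job that had not completed before the failure."""
        incomplete = [j for j, state in self.jobs.items() if state != "done"]
        for job_id in incomplete:
            self.server.submit(job_id)
        return incomplete


server = ComputeServer()
agent = Agent(server)
for job in ("job-1", "job-2", "job-3"):
    agent.submit(job)
agent.mark_done("job-1")

server.alive = False   # failure detected by the agent
server.restart()       # user restarts the ComputeServer
resubmitted = agent.on_restart()
```

Only the two unfinished jobs are sent again; the completed job stays marked as done on the agent side.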
Handling Failures 2: Agent
• Master Agent (MA) keeps the status of the entire framework
• A set of Buddy Agents (BAs) keeps track of the MA
• MA assigns a unique ID to each BA
• MA sends the status of the framework to the BAs
• BAs detect a failure of the MA
• The first BA assumes the duties of the MA
• The new MA contacts the ComputeServers and builds the status
[Figure: MA with buddy agents BA1, BA2, BA3]
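A minimal sketch of the MA/BA hand-over follows. The class and function names are hypothetical, and the succession rule is an assumption: "first BA" is read here as the live BA with the lowest ID. The final step, in which the new MA contacts the ComputeServers to rebuild the status, is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BuddyAgent:
    buddy_id: int
    last_status: Dict[str, str] = field(default_factory=dict)
    alive: bool = True


@dataclass
class MasterAgent:
    status: Dict[str, str] = field(default_factory=dict)
    buddies: List[BuddyAgent] = field(default_factory=list)

    def add_buddy(self) -> BuddyAgent:
        # The MA assigns a unique ID to each BA.
        buddy = BuddyAgent(buddy_id=len(self.buddies) + 1)
        self.buddies.append(buddy)
        return buddy

    def broadcast_status(self) -> None:
        # The MA sends the status of the framework to the BAs.
        for buddy in self.buddies:
            buddy.last_status = dict(self.status)


def elect_new_master(buddies: List[BuddyAgent]) -> Optional[BuddyAgent]:
    """Assumed rule: the live BA with the lowest ID takes over."""
    live = [b for b in buddies if b.alive]
    return min(live, key=lambda b: b.buddy_id) if live else None


ma = MasterAgent(status={"server-1": "up"})
for _ in range(3):
    ma.add_buddy()
ma.broadcast_status()

# The BAs detect that the MA has failed and pick a successor.
successor = elect_new_master(ma.buddies)

# If BA1 has failed as well, the next BA in ID order takes over.
ma.buddies[0].alive = False
fallback = elect_new_master(ma.buddies)
```

Because every BA holds the last broadcast status, the successor has a starting point before it re-contacts the ComputeServers.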
Computation Tasks and the Associated Cost