Collaborative eScience: Evolving Approaches
Charles Severance
Rutgers CyberInfrastructure Meeting
April 4, 2006
www.dr-chuck.com
Outline
• A look back at the past 15 years
• Putting the “collab” in Collaborative eScience
• The current tools of Collaborative eScience
  – Collaboration
  – Portals
  – Repository
• Reflecting on 15 years of Experience
  – What is wrong with Middleware?
  – Authorization and Authentication: Are we there yet?
• A “future” eScience Case Study
The Founding Concepts
• Scientific Domain
• Groups of People
• Common User Interface
• Data Sharing
  – In the moment
  – Long-term
• Experimental Equipment
• Compute
• Visualization
Over 15 Years of Collaborative eScience
[Timeline figure: 1991-1999, then 2000 through 2007]
UARC/SPARC
WorkTools, CHEF, Sakai
OGCE Grid Portal
NEESGrid
Globus Toolkit
NEESIT
SciGate ?
What was SPARC?
Before UARC...
What was SPARC?
UARC/SPARC
SPARC
2/2001: 600 users, 800 data sources
SPARC Software
• Written from scratch
  – No Middleware
  – No Portal Technology
• Three rewrites over 10 years
  – NeXTSTEP
  – Java Applets with server support
  – Browser based - kind of like a portal
• At the end, in 2001, it was ready for another rewrite
Keys to SPARC Success
• Ten years of solid funding
  – Team consistency
  – Long enough to learn from “mistakes”
• Long-term relationship between IT folks and scientists that evolved over time; the relationship was “grey”
• Software was rewritten several times over the life of the project, based on evolving user needs and experience with each version of the program
• A portion of the effort was invested in usability evaluation, with feedback to developers
After SPARC: Now What?
• Getting people together is an important part of collaborative eScience
  – WorkTools - Based on Lotus Notes
  – CHEF - Collaborative framework - Based on Java and Jetspeed
  – Sakai - Collaboration and Learning Environment - Java
• Critical point: Collaborative software is only one component of eScience
• UM Focus: Building reusable user interface technologies for the people part of collaborative eScience
WorkTools
Over 9000 users (2000 active) at the end of 2003
WorkTools - The “organic” single-server approach - if you build it (and give away free accounts), they will come…
CompreHensive CollaborativE Framework (CHEF)
• Fall 2001: CHEF Development begins
  – Generalized extensible framework for building collaboratories
• Funded internally at UM
• All Java - Open Source
  – Jakarta Jetspeed Portal
  – Jakarta Tomcat Servlet Container
  – Jakarta Turbine Service Container
• Build community of developers through workshops and outreach
CHEF Applications
• CourseTools Next Generation
• WorkTools Next Generation
• NEESGrid
• NSF National Middleware Grid Portal
NEESGrid - The Equipment
Network for Earthquake Engineering Simulation
NSF funded: NCSA, ANL, USC/ISI, UM, USC, Berkeley, MSU
CHEF-Based NEESGrid Software
Overall Data Modeling Efforts
[Diagram: NEES; Sites A, B, and C; Equipment, People, Experiments, Trials, Data; Tsunami, Shake Table, Geotech, and Centrifuge specimens; Units, Sensors, Descriptions; Site Specifications Database; Project Description; Domain-specific models; Common Elements; Data / Observations]
Capturing Video and Data
[Diagram: Site A and Site B; DT Main System; DT Clients for PTZ/USB still capture, BT848 video frames, and DAQ data capture; Camera Control Gateway; Simulation Coordinator]
Data Monitoring Tools
Still Image / Camera Control
[Screenshot: Creare viewers, still image camera control, thumbnail]
What’s in a name?
Sakai is named after Hiroyuki Sakai of the Food Channel Television program “Iron Chef”. Hiroyuki is renowned for his fusion of French and Japanese cuisine.
Sakai General Collaborative Tools
• Announcements
• Assignments
• Chat Room
• Threaded Discussion
• Drop Box
• Email Archive
• Message Of The Day
• News/RSS
• Preferences
• Resources
• Schedule
• Web Content
• Worksite Setup
• WebDAV
Requirements Overlap
[Diagram: overlapping requirements of Physics Research Collaboration, Earthquake Research Collaboration, and Teaching and Learning. Items shown: Grid Computing, Visualization, Data Repository, Large Data Libraries, Quizzes, Grading Tools, Syllabus, SCORM, Chat, Discussion, Resources]
Sakai: Product Placement
Collaboration and eResearch
Teaching and Learning
Additional General Collaboration Tools Under Development
• Wiki based on Radeox
• Blog
• Shared Display
• Shared Whiteboard
• Multicast Audio
• Multicast Video
These are works-in-progress by members of the Sakai eResearch community. There are no dates for release.
NMI / OGCE www.ogce.org
NSF National Middleware Initiative: Indiana, UTexas, ANL, UM, NCSA
Chalk Talk: School of Portals (2004)
[Diagram: OGCE 1.1, OGCE 1.2 ?, OGCE 2, XCAT, NEES 1.1, NEES 3.0, GridPort, GridPort 2, GridPort 3, Sakai, uPortal, CHEF, Jetspeed, Alliance, GridSphere; progression labeled Competition, Collaboration, Convergence]
Chalk Talk: School of eScience Portals (2006)
[Diagram: OGCE 1.1, OGCE 2, XCAT, NEES 1.1, GridPort, GridPort 2, GridPort 3, Sakai, uPortal, CHEF, Jetspeed, Alliance, GridSphere, SciGate ?, SciDoc; progression labeled Competition, Collaboration, Convergence]
[Diagram: SciGate. Layers: Applications and Users; Gateway Technologies; Services and Components; Management Components; Resources; SciGate Production Integration and Administration. Elements: Atlas, ITER, CMS, Portal Gateway, Desktop Gateway, Sakai, Globus, SRB, Opal, Clarens, Identity, Security, MetaData, Control, Experiment, Simulation, Knowledge Store, Process, Petascale Compute, Petascale Data, BlueGene, ORNL, …]
Configure: Atlas Portal Experiment Process Control Knowledge Store Sakai SRB Opal Clarens Metadata
Configure: ITER Portal Experiment Process Control Knowledge Store Sakai SRB Opal Clarens Metadata
The Ecology of Collaborative eScience
[Diagram: Collaborative Tools, Shared Compute, Data Sources, Data Repository, Portal Technology, Knowledge Tools, Identity/ACL]
Scope of Collaborative E-Science
“..composing and orchestrating many technologies…”
“..interoperability is key…”
User Interface for Collaborative E-Science
Portals are an excellent technology for building a federated user interface across these disparate components using standards like JSR-168.
Portals may only be an intermediate step in the process...
[Diagram: the same components, with Desktop Applications added]
Focus of Sakai Activity in eScience
Sakai is focused primarily on integration with portals and working closely with data repositories.
Discuss First
Collaboration vs. Portal
Portal:
• Basic organization is about the thing it represents - TeraGrid, NVO
• Site customization is based on the resource owners
• Sometimes there is an individual customization aspect
• Many small rectangles to provide a great deal of information on a single screen
• Portals think of rectangles operating independently - like windows
• Think “Dashboard”
Collaboration:
• Basic organization is about the shape of the people and groups
• Customization based on the “group leaders”
• New groups form quickly and organically
• Doing one thing at a time - chat, upload - perhaps multiple active windows on a desktop
• Very interactive
• Think of navigation as picking a tool or switching from one class to another
• Think “Application”
Sakai Portlet Version 0.2
• Tree View
• Gallery View
• Proxy portlets
• Source in SVN
• Configurable via properties file
Announcements (sakai.announcements)
Assignments (sakai.assignment)
Chat Room (sakai.chat)
Discussion (sakai.discussion)
Gradebook (sakai.gradebook.tool)
Email Archive (sakai.mailbox)
Membership (sakai.membership)
Message Forums (sakai.messageforums)
Preferences Tool (sakai.preferences)
Presentation (sakai.presentation)
Profile (sakai.profile)
Resources (sakai.resources)
Wiki (sakai.rwiki)
Tests & Quizzes (sakai.samigo)
Roster (sakai.site.roster)
Schedule (sakai.schedule)
Site Info (sakai.siteinfo)
Syllabus (sakai.syllabus)
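The portlet is described as configurable via a properties file, but the file itself is not shown. As a minimal sketch, assuming two hypothetical keys (`sakai.portlet.mode` and `sakai.portlet.tools`, not taken from the actual portlet), the Python below reads simple key=value lines and keeps only tool IDs that appear in the list above:

```python
# Minimal sketch. The keys "sakai.portlet.mode" and "sakai.portlet.tools"
# are hypothetical; the real portlet's property names are not shown here.
KNOWN_TOOL_IDS = {
    "sakai.announcements", "sakai.assignment", "sakai.chat",
    "sakai.discussion", "sakai.gradebook.tool", "sakai.mailbox",
    "sakai.membership", "sakai.messageforums", "sakai.preferences",
    "sakai.presentation", "sakai.profile", "sakai.resources",
    "sakai.rwiki", "sakai.samigo", "sakai.site.roster",
    "sakai.schedule", "sakai.siteinfo", "sakai.syllabus",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and #/! comments.

    Only the basic form is handled: no line continuations, escapes,
    or ':' separators from the full Java properties syntax.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def configured_tools(props):
    """Return the recognized tool IDs from a comma-separated property."""
    requested = props.get("sakai.portlet.tools", "")
    names = [name.strip() for name in requested.split(",") if name.strip()]
    return [name for name in names if name in KNOWN_TOOL_IDS]


sample = """
# Display variation: gallery, tree, or proxy (hypothetical key)
sakai.portlet.mode=proxy
sakai.portlet.tools=sakai.preferences, sakai.chat, sakai.unknown
"""
props = parse_properties(sample)
print(props["sakai.portlet.mode"])  # proxy
print(configured_tools(props))      # ['sakai.preferences', 'sakai.chat']
```

Filtering against the known tool IDs means a typo in the file drops one tool instead of producing a broken placement.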
Sakai JSR-168 Portlet
• Web services are used to log in to Sakai, establish a session, and retrieve a list of Sakai sites, pages, and tools
• The portlet is 100% stock JSR-168
  – Works in Pluto, uPortal, and GridSphere
Three Variations
• Display the Sakai gallery - all of Sakai except for the login and branding.
• Retrieve the hierarchy of sites, pages, and tools, display it in a tree view within the portlet, and show selected tools/pages in an iframe within the portlet
• Proxy tool placement for a particular Sakai tool such as sakai.preferences
Sakai Gallery View
How Gallery Works
[Diagram: the Sakai Portlet running in uPortal, Pluto, or GridSphere; Login via Sakai Web Svcs; Charon Portal at /portal/gallery]
Sakai Tree View
How Tree View Works
[Diagram: the Sakai Portlet running in uPortal, Pluto, or GridSphere; Login and ToolList via Sakai Web Svcs; Charon Portal at /portal/page/FF96]
Sakai Proxy Tool
Proxy Tool Selection
How Proxy Portlet Works
[Diagram, steps 1 and 2: the Sakai Portlet running in uPortal, Pluto, or GridSphere; Login and SiteList via Sakai Web Svcs; Charon Portal at /portal/page/FF96]
SakaiSite.getToolsDom
<sites>
  <portal>http://localhost:8080/portal</portal>
  <server>http://localhost:8080</server>
  <gallery>http://localhost:8080/gallery</gallery>
  <site>
    <title>My Workspace</title>
    <id>~csev</id>
    <url>http://localhost:8080/portal/worksite/~csev</url>
    <pages>
      <page>
        <id>af54f077-42d8-4922-80e3-59c158af2a9a</id>
        <title>Home</title>
        <url>http://localhost:8080/portal/page/af54f077-42d8-4922-80e3-59c158af2a9a</url>
        <tools>
          <tool>
            <id>b7b19ad1-9053-4826-00f0-3a964cd20f77</id>
            <title>Message of the Day</title>
            <toolid>sakai.motd</toolid>
            <url>http://localhost:8080/portal/tool/b7b19ad1-9053-4826-00f0-3a964cd20f77</url>
          </tool>
          <tool>
            <id>85971b6b-e74e-40eb-80cb-93058368813c</id>
            <title>My Workspace Information</title>
            <toolid>sakai.iframe.myworkspace</toolid>
            <url>http://localhost:8080/portal/tool/85971b6b-e74e-40eb-80cb-93058368813c</url>
          </tool>
        </tools>
      </page>
    </pages>
  </site>
</sites>
The new web service method is upward compatible with getSitesDom
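To illustrate how a client might consume this response, here is a sketch using Python's standard `xml.etree.ElementTree`. It assumes only the element names shown in the getToolsDom response above (abbreviated here). The helpers `tree_lines` and `tool_urls` are hypothetical names, not part of Sakai. The first walks sites, pages, and tools the way the Tree View does. The second finds placement URLs by tool ID, which is the information a proxy placement such as sakai.preferences would need.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the SakaiSite.getToolsDom response shown above.
TOOLS_DOM = """<sites>
  <portal>http://localhost:8080/portal</portal>
  <site>
    <title>My Workspace</title>
    <id>~csev</id>
    <pages>
      <page>
        <title>Home</title>
        <url>http://localhost:8080/portal/page/af54f077-42d8-4922-80e3-59c158af2a9a</url>
        <tools>
          <tool>
            <title>Message of the Day</title>
            <toolid>sakai.motd</toolid>
            <url>http://localhost:8080/portal/tool/b7b19ad1-9053-4826-00f0-3a964cd20f77</url>
          </tool>
          <tool>
            <title>My Workspace Information</title>
            <toolid>sakai.iframe.myworkspace</toolid>
            <url>http://localhost:8080/portal/tool/85971b6b-e74e-40eb-80cb-93058368813c</url>
          </tool>
        </tools>
      </page>
    </pages>
  </site>
</sites>"""


def tree_lines(xml_text):
    """Flatten sites > pages > tools into indented lines (Tree View)."""
    root = ET.fromstring(xml_text)
    lines = []
    for site in root.findall("site"):
        lines.append(site.findtext("title"))
        for page in site.findall("pages/page"):
            lines.append("  " + page.findtext("title"))
            for tool in page.findall("tools/tool"):
                lines.append("    " + tool.findtext("title"))
    return lines


def tool_urls(xml_text, tool_id):
    """Return every placement URL for one tool ID (proxy placement)."""
    root = ET.fromstring(xml_text)
    return [tool.findtext("url")
            for tool in root.iter("tool")
            if tool.findtext("toolid") == tool_id]


for line in tree_lines(TOOLS_DOM):
    print(line)
print(tool_urls(TOOLS_DOM, "sakai.motd"))
```

Because the new method is upward compatible, a client written this way can ignore the extra `<pages>` and `<tools>` elements if all it needs is the site list.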
Sakai Repository Integration Approach
Focus of Sakai Activity in eScience
Sakai is focused primarily on integration with portals and working closely with data repositories.
Discuss Now
Collaboration vs. Repository
Collaboration:
• Many different systems may be active at the same time
• Systems evolve, improve, and are often replaced every few years
• Systems focused on the dynamic needs of users and applications
• Thousands of simultaneous online users
• Performance tuning
• Must be very easy to use; almost unnoticeable
• Used informally hundreds of times per day per user
• Think “E-Mail”
Repository:
• Generally one system for the area
• Long-term strategic choice for the institution
• System focused on accessing, indexing, curation, and storage
• Millions of high-quality objects, properly indexed
• Data and metadata quality
• Must enforce standards and workflow to ensure data quality
• Most use is very purposeful: search, publish, add value
• Think “Library”
Inbound Object Flow
[Diagram: Sakai, where objects are created and used in native form, and the DR. Flow: Ingest; Prepare for storage; Data Model; Store; Curate, convert, update and maintain over time; Index; Lens; Search; View; Reuse]
The DR (data repository) establishes a data model for “site” objects. The CLE (Sakai) hands sites to the DR. The DR may have to do “model” or content cleanup before completing the ingest process.
The lens or disseminator understands the data model and is capable of rendering the objects. The lens is part of the DR.
Preparation for storage may include cleanup, conversion, copyright clearance, and other workflow steps.
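The inbound flow can be modeled as a small pipeline. The Python below is a toy sketch, not the DR's actual design. The field names, the cleanup rule, and the `DataRepository` class are hypothetical, and the long-term curation step is not modeled. It only shows the order of the stages: prepare a native Sakai object for storage, store it, index it, and render it through a lens that understands the data model.

```python
# Toy model of the inbound flow. Stage names follow the slide; the field
# names, cleanup rule, and DataRepository class are hypothetical.
MODEL_FIELDS = ("id", "title", "pages")


def prepare_for_storage(native_site):
    """Cleanup/conversion: keep only the fields the DR's data model defines."""
    missing = [f for f in MODEL_FIELDS if f not in native_site]
    if missing:
        raise ValueError(f"cannot ingest, missing fields: {missing}")
    # Whitespace trimming stands in for real content cleanup.
    return {
        "id": native_site["id"],
        "title": native_site["title"].strip(),
        "pages": [page.strip() for page in native_site["pages"]],
    }


class DataRepository:
    def __init__(self):
        self.store = {}  # object id -> prepared object
        self.index = {}  # lowercase title word -> set of object ids

    def ingest(self, native_site):
        obj = prepare_for_storage(native_site)
        self.store[obj["id"]] = obj
        for word in obj["title"].lower().split():
            self.index.setdefault(word, set()).add(obj["id"])
        return obj["id"]

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))

    def lens(self, obj_id):
        """Render a stored object; the lens knows the data model."""
        obj = self.store[obj_id]
        return f"{obj['title']} - {len(obj['pages'])} page(s)"


dr = DataRepository()
site_id = dr.ingest({"id": "~csev", "title": " My Workspace ",
                     "pages": ["Home "], "skin": "default"})
print(dr.search("Workspace"))  # ['~csev']
print(dr.lens(site_id))        # My Workspace - 1 page(s)
```

The `skin` field is dropped during preparation, which is the kind of "model" cleanup the DR may need before ingest completes.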
Outbound Object Flow
[Diagram: the DR's Data Model, Index, Lens, Search, View, and Reuse, alongside Sakai's own Data Model, Lens, Search, View, and Reuse]
Sakai can find and re-use objects in the repository.
Sakai and Repositories Going Forward
• Instead of solving the problem by creating a single DR technology that is a superset - which might take years
• Focus on data portability between systems - reduce the impedance mismatch (the conversion needed between systems)
• RDF enables object portability across systems, languages, and technologies
Sakai Repository Approach
• Move Sakai and other Collaboration systems toward RDF
  – Experiment with using RDF as native storage format
  – High Performance RDF - Fedora testing - 180M tuples - complex queries - 70ms
• Move data repositories toward RDF
  – Move from schema-based stovepipe objects to OWL/RDF based objects with referential integrity
  – Explore dimensions of portability of disseminator / lenses - this is an important research area
• Get started immediately….
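The slides do not say which RDF serialization Sakai or Fedora would use. As a minimal sketch of why a plain triple format travels well between systems, the Python below writes a site object as N-Triples lines and reads them back with nothing but string handling. The vocabulary URI is a hypothetical placeholder. The parser handles only the `<subject> <predicate> "literal" .` form it emits, not full N-Triples.

```python
# Sketch: a site object as RDF triples in N-Triples syntax. The VOCAB
# namespace is a hypothetical placeholder, and the parser accepts only
# the simple <subject> <predicate> "literal" . lines written here.
VOCAB = "http://example.org/sakai-vocab#"
SUBJECT = "http://localhost:8080/portal/worksite/~csev"

_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r"}


def escape_literal(text):
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def unescape_literal(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(_ESCAPES.get(text[i + 1], text[i + 1]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def to_ntriples(subject, props):
    """One triple per property; values become plain string literals."""
    return [f'<{subject}> <{VOCAB}{key}> "{escape_literal(value)}" .'
            for key, value in props.items()]


def parse_ntriples(lines):
    """Read back (subject, predicate, literal) tuples from simple lines."""
    triples = []
    for line in (raw.strip() for raw in lines):
        if not line:
            continue
        s_end = line.index("> ")
        rest = line[s_end + 2:]
        p_end = rest.index("> ")
        literal = rest[p_end + 2:]
        if not (line.startswith("<") and rest.startswith("<")
                and literal.startswith('"') and literal.endswith('" .')):
            raise ValueError(f"unsupported line: {line}")
        triples.append((line[1:s_end], rest[1:p_end],
                        unescape_literal(literal[1:-3])))
    return triples


lines = to_ntriples(SUBJECT, {"title": 'My "Home" Workspace', "owner": "csev"})
print("\n".join(lines))
print(parse_ntriples(lines) == [(SUBJECT, VOCAB + "title", 'My "Home" Workspace'),
                                (SUBJECT, VOCAB + "owner", "csev")])  # True
```

Each line is a complete statement that any RDF-aware tool can load without knowing the schema of the system that produced it.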
Fedora Images
Some Reflections
Where is the Middleware?
“..composing and orchestrating many technologies…”
“..interoperability is key…”
[Diagram: the same components, with Middleware shown]
Is Middleware The Universal Connector?
The Universal Connectors
• tcp/ip
• http/https
• web services
Is Middleware “inside” each application?
Middleware is simply another component - used as needed
Identity and Access Control: A very important function of Middleware
Let's Talk About This
Chalk Talk: Identity and Access Control
[Diagram: CAS, Shibboleth, Kerberos, Globus, LDAP, PubCookie, K.X509, MyProxy, GridShib, Cosign, ??; progression labeled Competition, Collaboration, Convergence; Identity/ACL]
Identity and ACL: Goal State
• One server - one software distribution
• Virtual Organization Software
• Supports all protocols
  – Globus Certificate Authority
  – Shibboleth
  – LDAP
  – MyProxy
  – Kerberos
• Who will do this? Who will fund this? Who can get these competitors to cooperate?
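As a hypothetical sketch of what "supports all protocols" could mean for the data behind such a server: each protocol authenticates in its own way, but every protocol-specific name resolves to one VO record with one set of attributes. The class and method names and the sample identifiers are invented for illustration. Real Kerberos, X.509, or LDAP handling is out of scope, and credentials are assumed to be verified before `resolve` is called.

```python
# Hypothetical sketch: one VO identity record reachable through several
# protocols. Credential checking is assumed to have happened already in
# each protocol's own layer; the names and identifiers are invented.
from dataclasses import dataclass, field


@dataclass
class VOIdentity:
    vo_id: str
    attributes: dict = field(default_factory=dict)


class VirtualOrganization:
    def __init__(self):
        self.people = {}   # vo_id -> VOIdentity
        self.aliases = {}  # (protocol, external name) -> vo_id

    def enroll(self, vo_id, **attributes):
        self.people[vo_id] = VOIdentity(vo_id, dict(attributes))

    def link(self, protocol, external_name, vo_id):
        """Record that a protocol-specific name belongs to a VO member."""
        if vo_id not in self.people:
            raise KeyError(f"unknown VO member: {vo_id}")
        self.aliases[(protocol, external_name)] = vo_id

    def resolve(self, protocol, external_name):
        """Map an already-authenticated protocol identity to its VO record."""
        vo_id = self.aliases.get((protocol, external_name))
        return self.people.get(vo_id) if vo_id else None


vo = VirtualOrganization()
vo.enroll("csev", organization="UM", role="member")
vo.link("kerberos", "csev@EXAMPLE.EDU", "csev")
vo.link("x509", "/C=US/O=Example/CN=Charles Severance", "csev")
vo.link("ldap", "uid=csev,ou=people,dc=example,dc=edu", "csev")
print(vo.resolve("x509", "/C=US/O=Example/CN=Charles Severance").attributes)
# {'organization': 'UM', 'role': 'member'}
```

Keeping attributes on the single VO record, rather than per protocol, is what lets a site switch protocols with only a configuration change.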
AUTHN/AUTHZ Meetings
My eScience Fantasy
The pre-requisites
• My net worth is $5B (I give myself grants)
• I encounter some tech-savvy scientists in a field who are using technology to do world-class research…
• They have never been visited by any other computer scientist…
• They are working in groups of 1-30, geographically distributed around the world
• They all work on a beach with Internet2 connections, wide-open wireless, and favourable exchange rates
[Diagram: groups A through F; Compute, Data Models, Tutorials, Experiments, Remote Observation, eDocuments (Vol 1 to Vol 4)]
Step 1: Visit The Scientists
• Understand what they are doing and how they are doing it
• Ask them how they would like to improve it
• Show each application to other scientists. Ask the other scientists how they would improve it.
• Help each group improve their work - help them using whatever technology they are currently using
Step 2: Add some technology
• Install the super-multi-protocol Virtual Organization software and provide a NOC for the VO software - identity and simple attributes
• Install Sakai - point it at the VO software for identity, and add an icon at the top of Sakai
• Give each scientist an account in the VO
• Give each effort in the field a site within Sakai
[Screenshot mock-up: Heart Study Collaboratory with Login; site tabs My Workspace, A, B, C, D, E, Open Forum; Site B tools: Home, Chat, Resources, Tutorials, Mail List, Live Meetings]
Step 3: Use the VO
• For those who want to protect their information, help them add SSO to their sites, backed by the VO service
• Since the VO service is multi-protocol, the underlying science code will likely need no modification, only a server configuration change
[Diagram: groups A through F and the Heart Study Collaboratory, now with Identity/ACL provided by the VO]
Step 4: Unique Identifier Service
• Come up with a way for any member of the VO to “get” a unique identifier
• Demand some information (build a little data model)
  – Person’s name and organization (implicit from request)
  – What kind of thing this will represent (experiment, document, image series)
  – Simple description
  – Keyword/value extensions
• Build a simple way to request and retrieve these through a simple web service; capture implicit metadata from the request (when, IP address, etc.). Make sure it works from Perl!
• Encourage community to start marking “stuff” with these identifiers in their stovepipes
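Here is a minimal sketch of such an identifier service, assuming an in-memory store instead of a real web service. The class, method, and field names are hypothetical. It demands the little data model listed above, mints a UUID-based URN with Python's standard `uuid` module, and records implicit metadata (time of issue and client IP). A deployed version would sit behind a plain HTTP interface so that it could be called from Perl or any other language.

```python
# Minimal in-memory sketch of the identifier service. Class, method, and
# field names are hypothetical; a real service would be a web service.
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IdentifierRecord:
    identifier: str
    name: str            # implicit from the authenticated request
    organization: str    # implicit from the authenticated request
    kind: str            # e.g. "experiment", "document", "image series"
    description: str
    extensions: dict = field(default_factory=dict)  # keyword/value pairs
    issued_at: str = ""  # implicit metadata captured at request time
    client_ip: str = ""  # implicit metadata captured at request time


class IdentifierService:
    def __init__(self):
        self._records = {}

    def request(self, name, organization, kind, description,
                extensions=None, client_ip=""):
        """Validate the little data model and mint a new identifier."""
        for label, value in (("kind", kind), ("description", description)):
            if not value.strip():
                raise ValueError(f"{label} is required")
        identifier = f"urn:uuid:{uuid.uuid4()}"
        self._records[identifier] = IdentifierRecord(
            identifier, name, organization, kind, description,
            dict(extensions or {}),
            issued_at=datetime.now(timezone.utc).isoformat(),
            client_ip=client_ip,
        )
        return identifier

    def retrieve(self, identifier):
        return self._records[identifier]


svc = IdentifierService()
gid = svc.request("Charles Severance", "UM", "image series",
                  "Sample description", {"site": "Site B"},
                  client_ip="192.0.2.10")
rec = svc.retrieve(gid)
print(gid)
print(rec.kind, rec.client_ip)  # image series 192.0.2.10
```

Random UUIDs let a network of services mint identifiers without coordinating with one another, which matters once communities start marking data in their own stovepipes.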
Step 5: Data Models
• Begin to work with subsets of the field to try to find common data models across stovepipes
• Start simple - use very simple RDF - human readable
• Broaden / deepen model slowly - explore variations
• Define simple file-system pattern for storing metadata associated with a file and/or a directory
Step 6: A Backup-Style Repo
• Build a data repository which will function as a backup
• Basic idea: each time you get an identifier, it enables backup space; any data and/or metadata can be uploaded under that identifier and left in the repository
• Make the repo multi-protocol: FTP, DAV, Web Services with attachments, GridFTP, etc.
• Make it so there can be a network of cooperating repositories
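A backup-style repository keyed by identifiers could look roughly like the sketch below. It is an in-memory stand-in with several assumptions. The `BackupRepo` class is hypothetical. A set of issued identifiers stands in for the identifier service. The FTP/DAV/GridFTP front ends are left out, and cooperation between repositories is reduced to one-hop replication to peers.

```python
# In-memory sketch of a backup-style repository keyed by identifiers.
# BackupRepo is hypothetical; FTP, DAV, and GridFTP front ends are omitted.
import hashlib


class BackupRepo:
    def __init__(self, name, issued_ids):
        self.name = name
        self.issued_ids = issued_ids  # identifiers minted by the GUID service
        self.objects = {}             # (identifier, filename) -> bytes
        self.peers = []               # cooperating repositories

    def put(self, identifier, filename, data, replicate=True):
        """Store data under an issued identifier; copy it to peers once."""
        if identifier not in self.issued_ids:
            raise PermissionError(f"{identifier} was never issued")
        self.objects[(identifier, filename)] = bytes(data)
        if replicate:
            # One hop only, so two repos that list each other cannot loop.
            for peer in self.peers:
                peer.put(identifier, filename, data, replicate=False)
        return hashlib.sha256(data).hexdigest()

    def get(self, identifier, filename):
        return self.objects[(identifier, filename)]

    def filenames(self, identifier):
        return sorted(f for (i, f) in self.objects if i == identifier)


issued = {"urn:example:run1"}
local = BackupRepo("local", issued)
central = BackupRepo("central", issued)
local.peers.append(central)
digest = local.put("urn:example:run1", "sensors.csv", b"t,accel\n0,0.01\n")
print(central.filenames("urn:example:run1"))  # ['sensors.csv']
```

Returning a SHA-256 digest gives the uploader a simple way to check later that any copy in the network is intact.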
[Diagram: groups A through F, Identity/ACL, and the Heart Study Collaboratory, now with a GUID Service, a Central Repo, and Local Repos]
Year 4 and on…
• Once the basic stovepipes have been “brought in from the cold” and made part of a community with no harm, the next steps are to begin to work “cross-stovepipe”
  – Evolve data models to be far richer with many variants
  – Build value-added tools that are aware of the data models and are usable across stovepipes
• Teach the community to build and share tools - gently encourage development standards - Java / JSR-168 perhaps
• Most important: Always listen to the users
Science at the center of eScience
[Diagram: Scientists and Priority Science at the center, surrounded by Connect, Communicate, Enhance, Data Models, Data Storage, Repositories, New Tools, New Approaches, New Technologies]
… start at the center and work outwards…
… apply technology when the users will see it as a “win” …
Conclusion
• Many years ago, eScience had science as its main focus
• Custom approaches resulted in too many unique solutions
• Computer scientists began a search for the “magic bullet” - each group found a different magic bullet
• Each group now competes for mind share (and funding) to be the “one true” magic bullet
Conclusion (cont)
• One way to solve the “many competing technologies” problem is to form “super groups” which unify the technologies
• No single technology gets to claim “they are the one” (Middleware is not “in the middle”)
• Each technology needs to become a drop-in service/component which is available for use only when appropriate
• Once we can get past looking at the technologies as the main focus, we get back to science as the main focus
Let's remember why we started this whole field in the first place…
• Scientific Domain
• Groups of People
• Common User Interface
• Data Sharing
  – In the moment
  – Long-term
• Experimental Equipment
• Compute
• Visualization
To download: www.dr-chuck.com
“Chuck’s Talks”