National Center for Supercomputing Applications

Towards A Rich-Context Participatory Cyberenvironment

Yong Liu, Robert E. McGrath, James D. Myers, Joe Futrelle
{yongliu, mcgrath, jimmyers, futrelle}@ncsa.uiuc.edu

GCE 2007 Workshop, Nov. 11-12, 2007
Supercomputing Conference 2007

Outline
• Motivation
• Web 2.0 and Where 2.0
• Definition of Participatory Cyberenvironment
• Cyberenvironment Technology Stack
  – CyberCollaboratory Portal
  – Approach and Goals of a Rich-context Participatory Cyberenvironment
• The Role of Contexts
  – Social, Geospatial, Provenance, Conceptual Contexts
  – Science Drivers and Our Work So Far and Next Steps
  – Two Examples on How These Contexts Can Play Together
• Concluding Remarks
• Acknowledgements

Motivation

• Increasingly Collaborative Scientific Efforts
  – Across disciplines, laboratories, observatories, and organizations
• Heterogeneous Scientific Resources
  – Sensors, software components, data/databases, networks, computers
• Avoiding Data Silos
  – Most existing portals create data silos
  – Users want to access a context-relevant knowledge network
  – Users want to exchange information across application boundaries (desktop vs. web-based, portal A vs. portal B)
• Promoting User Participation
  – Allow individual users to innovate and contribute to the community cyberenvironment

Web 2.0 and Where 2.0

• Architecture of Participation
  – Software and Data (Mashup)
  – People (Social Networking, Collaboration)
• Open, Lightweight (de facto) Standards and Formats
  – RDF (Resource Description Framework)
  – Microformats
  – Variants of XML (such as KML, obsKML, etc.)
• “Where 2.0” highlights the importance of spatial context
  – It is estimated that over 80% of all information has a geospatial component

The mind-map constructed by Markus Angermeier on November 11, 2005

Participatory Cyberenvironment

• A Web 2.0 and Semantic Web approach for Cyberinfrastructure
• An architecture of participation for scientific activity
  – This refers to both human and software/data participation
    • Human-to-human collaboration and social networking (using blogs, message boards, etc.) and user-generated scientific artifacts (e.g., workflows)
    • Software participation means mashup
      – API-based and content/data-based
• An open service platform
  – Reusable, standards-compliant service components/interfaces must be built and presented for third-party application use/reuse
    • E.g., NCSA CyberIntegrator (a desktop Java-based workflow application) can use the CyberCollaboratory open service API (SOAP or JSON) to query user/group affiliation and publish workflow templates to the CyberCollaboratory’s document library
• An integration and presentation platform for a knowledge network
  – Knowledge network about sensors, data, models, workflows, people, publications, computing resources, etc.
    • Dynamically generated and proactively presented in the portal
  – Exchanging information across application boundaries

Cyberenvironment Technology Stack

[Figure: layered diagram with three tiers (Science Applications, Services Clouds, Infrastructure). Components shown: CyberIntegrator (workflow development and publication; event-triggered workflow execution); CyberCollaboratory (portal/group workspace); high-end visualization (high-res applications, visual orchestration, auto-stereo visualization); workspace mgmt.; GIS; visualization, graphing, reporting; single sign-on security; modeling; analysis/translation; context (social, geospatial, provenance, ontologies, …), metadata fabric; Tupelo semantic content management middleware; workflow/model registries and data storage; data/documents/content; external data services; external sensor networks and data stores; computational resources. Boxes highlighted in yellow in the original figure mark this talk’s focus.]

CyberCollaboratory Portal
• Since its inception in 2004, over 400 users have registered
  – Built on top of the open source portal framework Liferay, with additions/changes/integrations using NCSA technologies
• Group Spaces
  – Document/Image Library, discussion forums, announcements, wiki, blog, RSS reader, etc.
• Production/Pilot Deployment in multiple projects
  – NSF-funded WATERS (WATer and Environmental Research Systems) Network Project Office (in production mode since 2004)
  – Multiple NSF-funded WATERS Testbed projects
  – NCSA Infectious Disease Informatics project
  – NSF-funded Hydro Synthesis Project
  – EPA-funded Small Public Water Systems Project
  – NCASSR-funded Palantir collaborative computer security investigation portal
  – Office of Naval Research (ONR)-funded Education Project

(Source: http://www.linux.com/feature/118675, August 23, 2007)

Evolving Towards a Rich-context Participatory Cyberenvironment

• Hybrid Approach
  – Leverage Web 2.0 patterns/technologies
    • Architecture of participation
  – Leverage Semantic Web technology (RDF)
    • Through the use of NCSA Tupelo as the semantic content repository middleware
• Goals
  – Break down data silos created by different portals or non-web-based applications
  – Enable user participation and content-based mashup

The Role of Contexts

• Context:
  – “the parts of a discourse that surround a word or passage and can throw light on its meaning”
    • From Merriam-Webster Online Dictionary
• Semantic Contexts for Cyberenvironment
  – Social Context (Who?)
  – Geospatial Context (Where?)
  – Causal Context (Why? and How?)
  – Conceptual Context (What?)
• Role: these four areas build the foundation that lets heterogeneous tools/portals share a common view and interact

Social Context (Who?)
• What’s It About?
  – People, Group, Community, Virtual Organization
  – Who am I; who are my friends, collaborators, and team members
  – Social Networking (People-to-People)
• How Does It Work?
  – RDF-based: FOAF (Friend-of-a-Friend)
  – Microformats-based: XFN (XHTML Friends Network), hCard
• What Are the Scientific Use Case Drivers?
  – Environmental observatories involve many researchers/stakeholders from diverse disciplines, nationally and internationally
    • Collaboration on complementary expertise
    • Finding out who works on what and has what kind of expertise
    • Filtering information
  – Research in the social network area has shown that people are more likely to respond to collaboration requests from someone they know (directly or indirectly through the person-to-person network)
  – Complex coupled human-natural system science research calls for “Participatory Science”
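
The FOAF mechanism above, together with the observation that people respond more readily to requests from direct or indirect acquaintances, can be illustrated with a small sketch: `foaf:knows` statements held as plain (subject, predicate, object) tuples, plus a breadth-first search for the number of hops between two people. The person URIs are hypothetical, and the sketch treats `foaf:knows` as symmetric, which the FOAF vocabulary itself does not require.

```python
from collections import deque

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical FOAF statements as (subject, predicate, object) tuples.
TRIPLES = {
    ("urn:person:alice", FOAF_KNOWS, "urn:person:bob"),
    ("urn:person:bob", FOAF_KNOWS, "urn:person:carol"),
    ("urn:person:carol", FOAF_KNOWS, "urn:person:dave"),
    ("urn:person:erin", FOAF_KNOWS, "urn:person:alice"),
}


def social_distance(triples, start, target):
    """Number of foaf:knows hops from start to target, or None if unconnected."""
    # Build an adjacency map; edges are treated as undirected in this sketch.
    neighbors = {}
    for s, p, o in triples:
        if p == FOAF_KNOWS:
            neighbors.setdefault(s, set()).add(o)
            neighbors.setdefault(o, set()).add(s)
    # Breadth-first search reaches the target via the fewest hops first.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(social_distance(TRIPLES, "urn:person:erin", "urn:person:dave"))  # 4
```

A distance of 1 means a direct acquaintance; larger values correspond to the indirect, friend-of-a-friend connections mentioned above.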

Social Context (contd.)
• What Have We Done So Far?
  – The key is to promote user participation to help build the virtual community in the CyberCollaboratory
  – Production Implementations
    • My Page, My Menu, My Groups navigation
    • Streamlined group creation
      – Group template
    • Email invitation to both registered and non-registered users to join a group
    • Harvesting emails and associated attachments from mailing lists into message boards and the document library to allow full-text search
  – Pilot Implementations
    • Social Network Analysis/Visualization
    • Recommender System
      – People who read/use this paper/tool also read/use other papers/tools
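
The recommender idea (people who read/use this paper/tool also read/use other papers/tools) can be sketched as co-occurrence counting over a usage log. This is an illustrative technique, not necessarily the algorithm the CyberCollaboratory pilot used, and the user and item names are hypothetical.

```python
from collections import Counter

# Hypothetical usage log: user -> papers/tools that user has read or run.
USAGE = {
    "alice": {"paperA", "toolX", "paperB"},
    "bob": {"paperA", "toolX"},
    "carol": {"paperA", "paperC"},
    "dave": {"toolX", "paperB"},
}


def also_used(usage, item, top_n=3):
    """Items most often used by the same people who used `item`."""
    counts = Counter()
    for items in usage.values():
        if item in items:
            counts.update(items - {item})
    # Highest co-usage first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


print(also_used(USAGE, "paperA"))  # ['toolX', 'paperB', 'paperC']
```

Raw co-occurrence counts favor popular items; a production recommender would typically normalize them, but the counting step is the core of the "also used" pattern.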

Social Context (contd.)
• What Are Our Next Steps?
  – Expose group/personal page information as microformats (hCard)
    • So that services such as Yahoo! Local can find such group information
  – Learn lessons from and exchange ideas with similar efforts in other scientific collaborative portals
    • MyExperiment.org
    • OurSpaces.net
  – Build a dynamic social network graph
  – Help build up the momentum of the “social grid” (from Tony Hey, Microsoft Research)
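
Exposing a personal or group page as hCard amounts to marking up existing HTML with the microformat's class names (`vcard` on the root element, and `fn`, `org`, `email`, `url` on properties). A minimal sketch follows; the helper name `hcard` and the sample values are hypothetical, and only a small subset of hCard properties is shown.

```python
from html import escape


def hcard(name, org=None, email=None, url=None):
    """Render a minimal hCard snippet using microformat class names."""
    parts = [f'<span class="fn">{escape(name)}</span>']
    if org:
        parts.append(f'<span class="org">{escape(org)}</span>')
    if email:
        parts.append(f'<a class="email" href="mailto:{escape(email)}">{escape(email)}</a>')
    if url:
        parts.append(f'<a class="url" href="{escape(url)}">{escape(url)}</a>')
    # The root element carries class="vcard" so parsers can locate the card.
    return '<div class="vcard">' + " ".join(parts) + "</div>"


# Hypothetical sample values.
print(hcard("Jane Doe", org="Example Lab", email="jane@example.org"))
```

Because the markup is ordinary HTML, the page renders unchanged for human visitors while microformat-aware crawlers can extract the contact data.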

Geospatial Context (Where?)
• What’s It About?
  – Location, Location, Location
  – Point, line, polygon, …
    • Intersection, overlap, coverage, …
  – The advent of the GeoSpatial Web, or GeoWeb
• How Does It Work?
  – Lightweight formats and APIs/services facilitate geo-referenced information representation, exchange, and mashup
    • GeoRSS, GeoURL, KML, Geo Microformat, GeoJSON, W3C Geo
    • GeoIQ, Google Maps API, Microsoft Virtual Earth Visual SDK
  – Easy-to-use virtual globe software puts the earth metaphor right in front of users
    • 3D/2D geo-centric browsers allow non-GIS specialists to explore geo-referenced information
      – Microsoft Virtual Earth, Google Earth, NASA WorldWind
  – Standardization efforts promote geospatial service/data interoperability
    • OGC (Open Geospatial Consortium) geospatial standards
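
To make the lightweight formats above concrete, the sketch below describes one sensor location both as a GeoJSON Feature and as a KML Placemark. Both formats list longitude before latitude, a common source of mapping bugs. The sensor ID, coordinates, and the `variable` property are illustrative assumptions.

```python
import json
from xml.sax.saxutils import escape


def sensor_feature(sensor_id, lon, lat, **props):
    """GeoJSON Feature; GeoJSON orders coordinates [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"id": sensor_id, **props},
    }


def sensor_placemark(sensor_id, lon, lat):
    """Minimal KML Placemark; KML also writes coordinates as lon,lat."""
    return (
        "<Placemark>"
        f"<name>{escape(sensor_id)}</name>"
        f"<Point><coordinates>{lon},{lat}</coordinates></Point>"
        "</Placemark>"
    )


# Hypothetical sensor and illustrative coordinates only.
collection = {
    "type": "FeatureCollection",
    "features": [sensor_feature("sensor-01", -97.30, 27.75, variable="salinity")],
}
print(json.dumps(collection, indent=2))
print(sensor_placemark("sensor-01", -97.30, 27.75))
```

The same record can thus feed a web map (GeoJSON) and a virtual globe such as Google Earth (KML) without a shared database.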

GeoSpatial Context (contd.)
• What Are the Scientific Use Case Drivers?
  – Environmental observatory data needs to be interpreted within a geospatial context to enable holistic study of the system
    • Common location components are an important integration vehicle for linking diverse information across different domains
    • E.g., Digital Watershed data integration requires explicit geospatial context
  – Spatial analysis in computational modeling for complex watershed science studies also requires geo-referenced data

[Figure: “A Complex System With Many Interactions/Feedback”: an urban watershed linking hydrology; meteorology/hydrometeorology; social science/economics; geology/hydrogeology; biology/pathobiology; water chemistry; human psychology/events; sensor networks/engineered infrastructure; public policy/water as commodity]

GeoSpatial Context (contd.)

• What Have We Done So Far?
  – Pilot Implementation
    • Google Maps-based sensor network map portlet
      – Allows users to subscribe to both raw and derived data streams from the sensors
• What Are Our Next Steps?
  – Incorporate geo-location information into user profiles
  – Build a geo-social network
    • Group formation based on geographical boundaries
  – Virtual observatory and digital watershed geo-referenced data integration using OGC standards
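
Group formation based on geographical boundaries reduces to a point-in-polygon test on each user's profile location. The sketch below uses the standard ray-casting test on planar (lon, lat) coordinates; it ignores map projections, polygon holes, and points lying exactly on an edge. The boundary and user names are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def group_by_boundary(users, boundaries):
    """Assign users to every named boundary that contains their location."""
    groups = {name: [] for name in boundaries}
    for user, (lon, lat) in users.items():
        for name, polygon in boundaries.items():
            if point_in_polygon(lon, lat, polygon):
                groups[name].append(user)
    return groups


# Hypothetical boundary (a simple square) and user profile locations.
BOUNDARIES = {"basin-1": [(0, 0), (10, 0), (10, 10), (0, 10)]}
USERS = {"u1": (5, 5), "u2": (15, 5), "u3": (2, 8)}
print(group_by_boundary(USERS, BOUNDARIES))  # {'basin-1': ['u1', 'u3']}
```

An odd number of edge crossings to the right of the point means the point is inside; watershed-scale boundaries would normally come from an OGC-compliant service rather than hard-coded vertices.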

Causal Context (How? and Why?)

• What’s It About?
  – Also known as Provenance
  – Describes the causal relationships and history
    • among artifacts (e.g., data, people, instruments/sensors, publications, etc.) and
    • events (e.g., processing steps, accession, custody) in a complex work process
  – Useful for experiment validation and reuse of workflows, data products, etc.
• How Does It Work?
  – RDF Triples
  – Open Provenance Model (OPM)
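
OPM expresses provenance as a graph of artifacts, processes, and agents linked by causal edges such as `used`, `wasGeneratedBy`, and `wasControlledBy`. The sketch below stores a few such edges and walks them backward to recover everything a data product depends on, which is the kind of query experiment validation relies on. The node names are hypothetical, and this is a plain-Python illustration rather than an OPM toolkit or Tupelo API.

```python
# Hypothetical OPM-style causal edges written as (effect, relation, cause).
EDGES = [
    ("derived_salinity", "wasGeneratedBy", "qc_process"),
    ("qc_process", "used", "raw_salinity"),
    ("qc_process", "wasControlledBy", "analyst_1"),
    ("raw_salinity", "wasGeneratedBy", "ingest_process"),
    ("ingest_process", "used", "sensor_feed"),
]


def lineage(edges, node):
    """Return every artifact, process, or agent that `node` transitively depends on."""
    causes = {}
    for effect, _relation, cause in edges:
        causes.setdefault(effect, []).append(cause)
    found, seen, stack = [], {node}, [node]
    # Depth-first walk from effects back to their causes.
    while stack:
        current = stack.pop()
        for cause in causes.get(current, []):
            if cause not in seen:
                seen.add(cause)
                found.append(cause)
                stack.append(cause)
    return found


print(sorted(lineage(EDGES, "derived_salinity")))
```

Keeping the relation name on each edge (even though this walk ignores it) allows narrower queries later, for example following only `wasGeneratedBy` and `used` to exclude agents.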

Causal Context (contd.)

• What Are the Scientific Use Case Drivers?
  – Researchers are using more data from environmental observatories and other sources whose history they would not otherwise know
  – More pieces of the data processing pipeline/workflow will be changing and will need to be tracked
  – Interdisciplinary/systems-oriented projects such as the watershed-scale human-nature interaction study will have more moving parts
  – Dynamic generation of a knowledge network requires provenance data for events, workflows, etc. across application boundaries

Causal Context (contd.)
• What Have We Done So Far?
  – Production Implementations
    • User activities/events in the CyberCollaboratory have been harvested into an RDF triple store through the Tupelo middleware
      – Documents, images, blogs (access/upload/download)
      – Group mgmt. (creation, user add/remove/invite)
  – Pilot Implementations
    • Provenance tracking in CyberIntegrator (workflow)
    • Knowledge network creation based on provenance
• What Are Our Next Steps?
  – Ubiquitous provenance tracking across portal boundaries and non-web-based tools
  – Data QA/QC and workflow provenance are the major efforts at this moment
    • Work with the environmental observatory community on various use cases
  – Geo-referenced provenance map for visualization of the sensor data processing pipeline
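
Harvesting portal activity into a triple store can be sketched as turning each event (who did what to which resource, and when) into a handful of triples about a fresh event node, then querying those triples. The `urn:example:event#` predicates are a hypothetical vocabulary invented for this sketch, and a plain list stands in for the RDF store; the actual CyberCollaboratory/Tupelo vocabulary is not described here.

```python
import uuid
from datetime import datetime, timezone

EV = "urn:example:event#"  # hypothetical vocabulary for this sketch only


def event_to_triples(actor, action, target, when=None):
    """Describe one portal activity as triples about a fresh event node."""
    when = when or datetime.now(timezone.utc)
    event = f"urn:uuid:{uuid.uuid4()}"
    return [
        (event, EV + "actor", actor),
        (event, EV + "action", action),
        (event, EV + "target", target),
        (event, EV + "time", when.isoformat()),
    ]


def actors(store, action, target):
    """Who performed `action` on `target`, according to harvested events?"""
    events = {s for s, p, o in store if p == EV + "action" and o == action}
    events &= {s for s, p, o in store if p == EV + "target" and o == target}
    return sorted(o for s, p, o in store if s in events and p == EV + "actor")


store = []  # stand-in for an RDF triple store
t = datetime(2007, 11, 11, tzinfo=timezone.utc)
store += event_to_triples("urn:user:alice", "upload", "urn:doc:42", t)
store += event_to_triples("urn:user:bob", "download", "urn:doc:42", t)
print(actors(store, "upload", "urn:doc:42"))  # ['urn:user:alice']
```

Giving each event its own node, rather than linking the user directly to the document, keeps the time and action attached to that single occurrence, which is what provenance queries need.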

Conceptual Context (What?)
• What’s It About?
  – Mainly domain-specific semantic concept relationships, i.e., ontologies
• How Does It Work?
  – Community consensus
    • Controlled vocabulary
  – Folksonomy
    • User-generated metadata, tagging
  – Hybrid approach
    • Allow users to add new controlled vocabulary to an existing ontology
• What Are the Scientific Use Case Drivers?
  – Ontology-driven data search/integration has been recognized in many scientific domains (including the environmental observatory community)
    • E.g., ODM (Observations Data Model)
      – CUAHSI: Consortium of Universities for the Advancement of Hydrologic Science, Inc.
    • Semantic mediator to reconcile different ways of describing data
  – This is usually a community effort
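
The hybrid approach can be sketched as a controlled vocabulary that maps preferred terms to accepted synonyms: free-text user tags are normalized against it, and users may attach new synonyms or terms. The terms and helper names are hypothetical; a real deployment would draw terms from a community ontology such as CUAHSI ODM rather than a hard-coded dictionary.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
VOCAB = {
    "salinity": {"salinity", "salt", "psu"},
    "dissolved oxygen": {"dissolved oxygen", "do", "o2"},
}


def _key(tag):
    """Normalize case, surrounding spaces, and separators before lookup."""
    return tag.strip().lower().replace("-", " ").replace("_", " ")


def normalize_tag(tag, vocab):
    """Map a free-text user tag to its preferred term, or None if unknown."""
    key = _key(tag)
    for preferred, synonyms in vocab.items():
        if key in synonyms:
            return preferred
    return None


def add_user_term(tag, preferred, vocab):
    """Hybrid approach: a user attaches a new synonym (or a new term)."""
    vocab.setdefault(preferred, set()).add(_key(tag))


print(normalize_tag("Dissolved_Oxygen", VOCAB))  # dissolved oxygen
```

This is a small-scale version of the semantic mediator idea: different spellings and labels for the same concept resolve to one shared term that all tools can search on.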

Conceptual Context (contd.)

• What Have We Done So Far?
  – Production Implementation
    • CyberCollaboratory allows user tagging in many tools, such as blogs, the document library, wiki, etc.
  – Pilot Implementation
    • CyberIntegrator has started to build workflow ontology/tagging and allows such information to be exposed to portal users for filtering and searching workflow templates
• What Are Our Next Steps?
  – Leveraging environmental observatory ontology efforts (such as CUAHSI ODM) for data integration and dissemination
  – Establishing a controlled vocabulary for cyberenvironment development needs so that different tools can use a consistent representation

How Would These Contexts Actually Play Together?

• Independently produced context metadata in different portlets, portals, or desktop tools can be merged as RDF triples using Tupelo
• Allows non-invasive sharing of data/information across application boundaries without requiring the same database schema
  – Portal A vs. Portal B
  – Desktop application vs. web-based application
• Allows generation of a knowledge network
  – Web-scale data integration and presentation
• Two examples
  – A production implementation with event/provenance capture and content-based mashup
    • Mainly uses provenance context across portal boundaries and the desktop-web boundary
  – An end-to-end pilot implementation that uses all the contexts discussed so far
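
Because RDF statements from independent sources can be merged by taking the union of their triples, a portal and a desktop tool that use the same URIs for shared things can be joined without a common database schema. The sketch below illustrates this with hypothetical `urn:` identifiers and predicates; it omits blank nodes (which would need renaming during a real RDF merge) and is not Tupelo's API.

```python
# Hypothetical triples produced independently by two applications that share URIs.
PORTAL = {
    ("urn:user:alice", "urn:ex:memberOf", "urn:group:testbed"),
    ("urn:doc:report1", "urn:ex:uploadedBy", "urn:user:alice"),
}
DESKTOP_TOOL = {
    ("urn:workflow:wf7", "urn:ex:createdBy", "urn:user:alice"),
    ("urn:workflow:wf7", "urn:ex:publishedTo", "urn:group:testbed"),
}

# With no blank nodes, merging two RDF graphs is just the union of their triples.
MERGED = PORTAL | DESKTOP_TOOL


def linked(graph, uri):
    """Resources connected to `uri` in either direction."""
    out = set()
    for s, _p, o in graph:
        if s == uri:
            out.add(o)
        elif o == uri:
            out.add(s)
    return out


print(sorted(linked(MERGED, "urn:user:alice")))
```

Neither application had to adopt the other's schema: the shared URI `urn:user:alice` is enough for a query over the merged graph to connect the portal document with the desktop workflow, which is the seed of a knowledge network.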

Example 1: Event/Provenance Capture & Content-based Mashup

[Figure: Tupelo semantic content repository middleware fabric (RDF store: Sesame) connecting CyberCollaboratory Portal Instance 1 and Instance 2 (each with a MySQL relational database and a portal event listener for Add/Update/Delete/Read), users User1 and User2, and CyberIntegrator (provenance). Flows labeled: event/provenance RDF triples; harvesting; remixing & presenting.]

Example 2: A Pilot End-to-End Implementation Using Participatory Cyberenvironment

• Environmental Observatory Use Case
  – Sensor data anomaly detection in Corpus Christi Bay, Texas
  – A group was created for this testbed project (social context)
  – A Google Maps-based sensor map portlet allows users to subscribe to sensor data streams, both raw and derived (geospatial context, API-based mashup)
  – Users can monitor the sensor data and invoke another workflow in a different observatory from a proactively generated knowledge network that presents relevant sensors, workflows, publications, and people (provenance and ontology contexts, content-based mashup, knowledge network)
• Individual researchers use and contribute back to community infrastructures
  – Participatory Science needs/uses Participatory Cyberenvironment!

[Figure: individual user’s desktop dashboard alert; workflow remote execution with modification; new derived data stream]
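
The monitoring step in this use case can be sketched as a small publish/subscribe stream whose subscriber derives a new stream of flagged readings. The rolling z-score test, window size, threshold, and sample values are assumptions chosen for illustration; the slides do not say which anomaly detection method the testbed workflow used.

```python
from collections import deque
from statistics import mean, pstdev


class Stream:
    """Minimal publish/subscribe data stream; subscribers are callables."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, value):
        for callback in self._subscribers:
            callback(value)


def zscore_detector(window=5, threshold=3.0):
    """Assumed method: flag a reading far from the mean of the last `window` readings."""
    history = deque(maxlen=window)

    def is_anomaly(value):
        flagged = False
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            flagged = sigma > 0 and abs(value - mu) / sigma > threshold
        history.append(value)  # every reading, flagged or not, joins the window
        return flagged

    return is_anomaly


raw = Stream()
anomalies = Stream()  # derived stream carrying only flagged readings
detect = zscore_detector()


def forward_if_anomalous(value):
    if detect(value):
        anomalies.publish(value)


raw.subscribe(forward_if_anomalous)
received = []
anomalies.subscribe(received.append)
for reading in [30.1, 30.2, 29.9, 30.0, 30.1, 30.0, 45.0, 30.1]:  # hypothetical values
    raw.publish(reading)
print(received)  # [45.0]
```

Two design choices are worth noting: a window with zero spread never flags anything (avoiding division by zero), and flagged readings still enter the window, so a sustained shift soon stops being reported as anomalous.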

Concluding Remarks

• Paradigm shifts in science are driving a need for increased sharing of content across applications/systems
• Our research on four contexts (social, geospatial, causal, and conceptual) helps us apply the Web 2.0/Semantic Web approach to the CyberCollaboratory portal and other tools to enable such sharing
• The semantic middleware Tupelo can manage these contexts
  – Make a standard portal such as CyberCollaboratory more context-sensitive
  – Make content-based mashup across application boundaries possible
• Initial experiences with using these contexts have been positive

Concluding Remarks (contd.)

• Participatory cyberenvironments enable individual researchers to directly customize and then share their enhancements to community infrastructures
  – Participatory Science!
• Further research and development are under way at NCSA toward the full realization of the vision of a participatory cyberenvironment

Acknowledgements

• Teams:
  – NCSA ECID (Environmental CI Demo) team
  – Corpus Christi Bay WATERS Testbed team
  – WATERS Project Office
  – NCSA TRECC Year-8 Project Team
• Funding sources:
  – NSF grants BES-0414259, BES-0533513, and SCI-0525308
  – Office of Naval Research grant N00014-04-1-0437