Semantic Research Grid Open Grid Forum Web 2.0 Workshop OGF21, Seattle Washington October 15 2007 Geoffrey Fox, Aurel Cami, Ahmet Fatih Mustacoglu, Ahmet E. Topcu Community Grids Laboratory, Indiana University Bloomington IN 47404 [email protected], http://www.infomall.org

Page 1:

Semantic Research Grid

Open Grid Forum Web 2.0 Workshop OGF21, Seattle Washington

October 15 2007

Geoffrey Fox, Aurel Cami, Ahmet Fatih Mustacoglu, Ahmet E. Topcu

Community Grids Laboratory, Indiana University Bloomington IN 47404

[email protected], http://www.infomall.org

Page 2:

Semantic Scholars Grid

[Diagram labels: Existing User Interface; Google Scholar; Manuscript Central; Science.gov; Windows Live Academic Search; Citeseer; CMT Conference Management; etc.; Existing Document-based Tools; Web service Wrappers; New Document-enhanced Research Tools; Integration/Enhancement User Interface; Community Tools; Generic Document Tools; MyResearch Database; Bibliographic Database; Export: RSS, Bibtex, Endnote etc.; CiteULike; Connotea; Del.icio.us; Bibsonomy; Biolicious; PubChem; PubMed; Traditional Grid Cyberinfrastructure; MySpace; Web 2.0; MASHUP]

Page 3:

Delicious Semantic Web/Grid
http://del.icio.us purchased by Yahoo for ~$30M
http://www.CiteULike.org
http://www.connotea.org (Nature)
Associate metadata with Bookmarks specified by URLs, DOIs (Digital Object Identifiers)
Users add comments and keywords (called tags)
Users are linked together into groups (communities)
Information such as title and authors extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.)
Bibtex-like additional information in CiteULike
This is perhaps de facto Semantic Web – remarkable for its simplicity

Page 4:

Example Parallel Computing Collection selected on Cell Tag

So far no clear “winner” in tagging space

Maybe CiteULike with different metadata better

How do I preserve investment?

Page 5:

General Document Semantic Analysis

Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata

• Title, author and institution of documents

• Citations with their own metadata allowing one to match to other documents

These capabilities are sure to become more powerful and to be extended

• Give “Citation Index” in real time

• Tell you all authors of all papers that cite a paper that cites you etc. (Note it’s a small world so don’t go too far in link analysis)

• Tell you all citations of all papers in a workshop

• Help journal editors by suggesting referees based on document analysis, or by doing a “plagiarism” analysis that scores similarity against other Internet documents
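
The "authors of all papers that cite a paper that cites you" query is a two-hop walk over a citation graph. The following is a minimal sketch only: the in-memory mappings `cited_by` and `authors`, the function name, and the toy data are all assumptions for illustration and say nothing about how Citeseer or Google Scholar store or query their data.

```python
from typing import Dict, List, Set


def authors_within_hops(
    paper: str,
    cited_by: Dict[str, List[str]],
    authors: Dict[str, List[str]],
    hops: int = 2,
) -> Set[str]:
    """Collect authors of papers first reached after `hops` citation steps."""
    frontier = {paper}
    seen = {paper}
    for _ in range(hops):
        next_frontier = set()
        for p in frontier:
            for citing in cited_by.get(p, []):
                if citing not in seen:  # do not revisit papers
                    seen.add(citing)
                    next_frontier.add(citing)
        frontier = next_frontier
    found: Set[str] = set()
    for p in frontier:
        found.update(authors.get(p, []))
    return found


# Hypothetical toy data: B cites A, C cites B.
cited_by = {"A": ["B"], "B": ["C"]}
authors = {"A": ["you"], "B": ["Lee"], "C": ["Kim", "Roy"]}
two_hop_authors = authors_within_hops("A", cited_by, authors, hops=2)
```

The `hops` parameter makes the slide's "small world" caution concrete: each extra hop can grow the frontier quickly, so it is kept small by default.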

Page 6:

Possible challenges
Use of Web 2.0 tools in science (and business) is very promising but adoption is currently small
Which of many tools will be popular with your colleagues?
What happens if the tool you chose is not adopted or worse – just disappears in an industry “shake-up”?
How to best integrate web-tagged documents with Word and Latex citations?
Need to tag URIs – e.g. database entries, not just URLs (did for journal control system)
Is the current security model sufficient?
Can we link the virtual organization of the tagging system with that of other Cyberinfrastructure/Web 2.0 subsystems?

Page 7:

Roughly what we are doing
We are NOT building a new tagging or search system
We are building tools integrating and adding value to existing systems
We built a mashup linking to del.icio.us, CiteULike, Connotea allowing exchange of tags between sites and between local repositories
Repositories also link to local sources (PubsOnline) and Google Scholar (GS) and Windows Live Academic (WLA)
• GS has number of cited publications.
• WLA has Digital Object Identifier (DOI)
We implement a rather more powerful access control mechanism
We build heuristic tools to mine “web lists” for citations
We have an “event” based architecture (consistency model) allowing change actions to be preserved and selectively changed
• Supports integrating different inconsistent views of a given document and its updates on different tagging systems

Page 8:

[Screenshot labels: del.icio.us Tags; Download to Local System; del.icio.us Tags]

Page 9:

Semantic Research Grid (SRG) Architecture

Page 10:

Key Concepts of System Architecture
Digital Entity (DE): a digital collection of metadata for a citation
Event: a time-stamped action on a digital entity. Our event-based model consists of:
• Major Events:
    Insertion or deletion of a digital entity
• Minor Events:
    Modifications to an existing digital entity
• Dataset:
    Collection of major and minor events
Service-based Framework (SOAP over HTTP)
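
The concepts above can be sketched as a small data model. This is an illustrative sketch, not the SRG implementation: the class and function names, the flat string-to-string metadata, and the idea of rebuilding a DE by replaying its dataset in timestamp order are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class EventKind(Enum):
    INSERT = "major:insert"  # major event: a DE is created
    DELETE = "major:delete"  # major event: a DE is removed
    MODIFY = "minor:modify"  # minor event: fields of an existing DE change


@dataclass
class Event:
    timestamp: float
    kind: EventKind
    changes: Dict[str, str] = field(default_factory=dict)


def replay(dataset: List[Event]) -> Optional[Dict[str, str]]:
    """Rebuild a DE's metadata by applying its events in timestamp order."""
    entity: Optional[Dict[str, str]] = None
    for ev in sorted(dataset, key=lambda e: e.timestamp):
        if ev.kind is EventKind.INSERT:
            entity = dict(ev.changes)
        elif ev.kind is EventKind.DELETE:
            entity = None
        elif entity is not None:  # MODIFY only applies to an existing DE
            entity.update(ev.changes)
    return entity


def rollback(dataset: List[Event], until: float) -> Optional[Dict[str, str]]:
    """View the DE as it was at time `until` by ignoring later events."""
    return replay([ev for ev in dataset if ev.timestamp <= until])


# Hypothetical dataset for one DE.
dataset = [
    Event(1.0, EventKind.INSERT, {"title": "Semantic Research Grid", "tags": "grid"}),
    Event(2.0, EventKind.MODIFY, {"tags": "grid web2.0"}),
]
current = replay(dataset)
earlier = rollback(dataset, until=1.5)
```

Because events are kept rather than overwritten, history and rollback fall out of the same replay function.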

Page 11:

Example Subsystem
Transfer Download/Upload
Modify Digital Entity (DE)
Share DE with other users
Add/Get More info on a DE
History (as a set of events) of a DE and rollback

[Diagram labels: CiteULike; Delicious; Connotea; Research Database (three instances); Core Web Services]

Page 12:

SRG System Modules I
Digital Entity (DE) Management Service
• Manual DE entry into the system
• DE history
• DE versioning and flexible choices (rollback)
• Editing and more info tools for a DE (Update Model)
Session and Event Management Services
• Event and dataset management
• DE view options
• User credentials (username/password) - cookie-based
Annotation Tools Service
• Transfer Service
• Download service
• Upload Service
• Extract DE and tags from web lists

Page 13:

SRG System Modules II
Search Tools Services
• Google Scholar/Windows Live Academic
• Google Scholar Advanced
• Local Database Search:
    Via integrated PubsOnline Tool from Indiana University
    My Research Database
    My Research Database Advanced
Authentication and Authorization Services
• Login and Logout service
• DE Access rights management
• Database access rights management
• Administrative tools
Other Services
• User Registration
• Username and password recovery
• User’s Profile Management
• DE metadata view options

Page 14:

Technical Issues
Event-based model
• Manipulating data and metadata
• How to build event-based model?
    Major and Minor events
    Datasets (collection of minor events)
• How to apply event-based model?
• How to apply modifications to a record (Digital Entity)?
    Keep them in user’s session and let user apply them
    Or apply them automatically to a DE
• How to merge metadata fields of Event and Digital Entity?
    Identification of metadata fields as dynamic or static field
How to apply service-based framework as wrapper?
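
One possible reading of the static/dynamic distinction for merging can be sketched as follows. The slides do not define the two kinds of field, so everything here is an assumption: the rule that dynamic fields take the event's value while static fields are only filled in when empty, the particular field names in `STATIC_FIELDS`, and the function name.

```python
from typing import Dict, Set

# Assumed classification; the slides do not list which fields are static.
STATIC_FIELDS: Set[str] = {"title", "authors", "year"}


def merge_fields(
    entity: Dict[str, str],
    event_changes: Dict[str, str],
    static_fields: Set[str] = STATIC_FIELDS,
) -> Dict[str, str]:
    """Return a merged copy of `entity`.

    Dynamic fields take the event's value; static fields are only
    filled in when the entity has no value for them yet.
    """
    merged = dict(entity)
    for name, value in event_changes.items():
        if name in static_fields and merged.get(name):
            continue  # keep the existing static value
        merged[name] = value
    return merged


# Hypothetical DE and event payload.
entity = {"title": "Semantic Research Grid", "tags": "grid"}
changes = {"title": "SRG", "tags": "grid web2.0", "year": "2007"}
merged = merge_fields(entity, changes)
```

Returning a copy instead of mutating `entity` fits the first option on the slide: pending changes can be held in the user's session and only applied once the user accepts the merged result.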

Page 15:

Some recent Features of SRG
• Hybrid Consistency Framework Implementation
    – Data-centric strict consistency model
    – Implements primary-copy based consistency protocol
    – Pull-based:
        • Time-based consistency approach.
        • Communicates with Annotation Tools to collect updates periodically
    – Push-based:
        • Updates are distributed to Annotation Tools immediately once they occur on the primary copy
• Periodic Search Tools Implementation
    – Search, compare and apply the updates made to a Digital Entity (DE) in the system.
• Unique (128 bit) UUID assignment for each Digital Entity
• User Tags view in the system
    – Displays all tags belonging to a user
    – Allows easy update or more info request on a Digital Entity by tags
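
The push and pull halves of a primary-copy scheme, together with UUID assignment, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `PrimaryCopy` class, its callback-based subscribers, and the `fetch_remote` callable standing in for an annotation tool are hypothetical, and the pull step simply overwrites the primary copy where the real system would compare and selectively apply updates.

```python
import uuid
from typing import Callable, Dict, List

Metadata = Dict[str, str]


class PrimaryCopy:
    """Holds the authoritative metadata for each DE, keyed by a 128-bit UUID."""

    def __init__(self) -> None:
        self.entities: Dict[str, Metadata] = {}
        self.subscribers: List[Callable[[str, Metadata], None]] = []

    def insert(self, metadata: Metadata) -> str:
        de_id = str(uuid.uuid4())  # random 128-bit identifier
        self.entities[de_id] = dict(metadata)
        return de_id

    def update(self, de_id: str, changes: Metadata) -> None:
        """Push-based: change the primary copy, then notify replicas at once."""
        self.entities[de_id].update(changes)
        for notify in self.subscribers:
            notify(de_id, dict(self.entities[de_id]))

    def pull(self, de_id: str, fetch_remote: Callable[[str], Metadata]) -> bool:
        """Pull-based: meant to run on a timer; True if the remote copy differed."""
        remote = fetch_remote(de_id)
        if remote != self.entities[de_id]:
            self.entities[de_id] = dict(remote)
            return True
        return False


# Hypothetical wiring: one in-memory replica and one fake remote tool.
primary = PrimaryCopy()
replica: Dict[str, Metadata] = {}
primary.subscribers.append(lambda de_id, meta: replica.__setitem__(de_id, meta))
de_id = primary.insert({"title": "SRG", "tags": "grid"})
primary.update(de_id, {"tags": "grid web2.0"})
changed = primary.pull(de_id, lambda _id: {"title": "SRG", "tags": "semantic"})
```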

Page 16:

Hybrid Consistency Framework for Semantic Research Grid

Page 17:
Page 18:

Tool Updating Database from Web Page

Page 19:
Page 20:

Metadata Collection from CGL web pages

• The aim is to

– Eliminate duplicate data entry in different web platforms.

– Build richer metadata in SRG using base Digital Entities collected from web pages.

– Share new Digital Entities with other tools and users in SRG

– Push newly collected Digital Entities to other communities using Web 2.0 features

Page 21:

Methodology for Collection
• Collect:
    – Digital Entities in Community Grid Publication web pages.
• Analyze:
    – Using heuristic methodology to extract metadata fields of the Digital Entities for CGL publications
• Build:
    – RSS objects using collected Digital Entities.
    – New tags using collected Digital Entities.
• Compare:
    – Collected Digital Entities from CGL web pages with the existing Digital Entities in SRG.
    • If they are:
        – different: Store new Digital Entities in SRG storage.
        – same: Option to update tags and other fields.
• Share:
    – New Digital Entities with other Tools using SRG.
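
The Compare step above can be sketched as a lookup against the DEs already stored. The slides do not say how two Digital Entities are judged to be the same, so the matching rule used here (a case- and whitespace-insensitive title) is purely an assumption, as are the function names and the toy records.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, str]


def match_key(entity: Entity) -> str:
    """Assumed identity rule: case- and whitespace-insensitive title."""
    return " ".join(entity.get("title", "").lower().split())


def compare(
    collected: List[Entity], existing: List[Entity]
) -> Tuple[List[Entity], List[Tuple[Entity, Entity]]]:
    """Split collected DEs into new ones and (existing, collected) update candidates."""
    by_key = {match_key(e): e for e in existing}
    new: List[Entity] = []
    candidates: List[Tuple[Entity, Entity]] = []
    for entity in collected:
        stored = by_key.get(match_key(entity))
        if stored is None:
            new.append(entity)  # different: store as a new DE
        else:
            candidates.append((stored, entity))  # same: offer a tag/field update
    return new, candidates


# Hypothetical records for illustration only.
existing = [{"title": "Semantic Research Grid", "tags": "grid"}]
collected = [
    {"title": "semantic  research grid", "tags": "grid web2.0"},
    {"title": "Hybrid Consistency Framework", "tags": "consistency"},
]
new_entities, update_candidates = compare(collected, existing)
```

Returning update candidates as pairs, instead of applying them, mirrors the slide's wording that updating a matching DE is an option left to the user.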

Page 22:

Security Model
Security in Web 2.0 can be limited
We implement a simple but more powerful security model around local tools that wrap Web 2.0 systems
We used an access-control matrix model to provide security for our information system
• Supports multiple groups and multiple users for each object.
• Similar to UNIX file system
    The Unix RWX bits correspond to Read, Write, and Execute operations for each file and directory.
• In SRG, a DE (Digital Entity) corresponds to the file element and a folder corresponds to the directory element.
• For each DE and folder, there are three types of access rights defined in the system: Read, Write, and Delete.
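
An access-control matrix with Read, Write, and Delete rights can be sketched as a mapping from (subject, object) pairs to a set of rights. The sketch below is an assumption-laden illustration, not the SRG code: the class names, the rule that a user's effective rights are the union of their own and their groups' rights, and the identifiers such as `de-42` are all hypothetical.

```python
from enum import Flag, auto
from typing import Dict, Tuple


class Right(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    DELETE = auto()


class AccessMatrix:
    """Rows are subjects (users or groups); columns are objects (DEs or folders)."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], Right] = {}

    def grant(self, subject: str, obj: str, rights: Right) -> None:
        current = self._cells.get((subject, obj), Right.NONE)
        self._cells[(subject, obj)] = current | rights

    def allowed(self, user: str, groups: Tuple[str, ...], obj: str, right: Right) -> bool:
        """A user holds a right directly or through any group they belong to."""
        subjects = (user,) + tuple(groups)
        return any(right in self._cells.get((s, obj), Right.NONE) for s in subjects)


# Hypothetical subjects and object.
acl = AccessMatrix()
acl.grant("cgl-group", "de-42", Right.READ)
acl.grant("alice", "de-42", Right.WRITE | Right.DELETE)
alice_can_read = acl.allowed("alice", ("cgl-group",), "de-42", Right.READ)
bob_can_write = acl.allowed("bob", ("cgl-group",), "de-42", Right.WRITE)
```

Unlike the fixed owner/group/other triple of a Unix file mode, a matrix keyed by arbitrary subjects allows any number of users and groups per object, which is the extension the slide describes.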

Page 23:

Security Model II
We have a security model that supports
• Level of Authorization
    Roles are defined as Super Administrator (SA), Group Administrator (GA), and User (U)
    The system allows having more than one SA.
    An existing SA can add other SAs to the system.
    An SA can assign any U to become GA, and remove a GA from a group.
    Each group should have at least one GA.
    A GA can add/remove a U from the group
• User profile
    Share user profile between Web 2.0 sites.
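
The role rules above can be expressed as checks on who performs an action. This is a strict, hypothetical reading of the slide: the `Group` class, the `RuleViolation` exception, and the choice to let only GAs (and not SAs) manage ordinary users are assumptions for illustration.

```python
from typing import Set


class RuleViolation(Exception):
    """Raised when an action breaks one of the role rules."""


class Group:
    def __init__(self, super_admins: Set[str], first_admin: str) -> None:
        self.super_admins = super_admins       # shared set of SA names
        self.admins: Set[str] = {first_admin}  # every group starts with one GA
        self.users: Set[str] = set()

    def assign_admin(self, actor: str, user: str) -> None:
        self._require(actor in self.super_admins, "only an SA can assign a GA")
        self.admins.add(user)

    def remove_admin(self, actor: str, admin: str) -> None:
        self._require(actor in self.super_admins, "only an SA can remove a GA")
        self._require(self.admins != {admin}, "a group needs at least one GA")
        self.admins.discard(admin)

    def add_user(self, actor: str, user: str) -> None:
        self._require(actor in self.admins, "only a GA can add a U")
        self.users.add(user)

    def remove_user(self, actor: str, user: str) -> None:
        self._require(actor in self.admins, "only a GA can remove a U")
        self.users.discard(user)

    @staticmethod
    def _require(condition: bool, message: str) -> None:
        if not condition:
            raise RuleViolation(message)


# Hypothetical names.
group = Group({"root"}, first_admin="ga1")
group.add_user("ga1", "u1")
```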

Page 24:
Page 25:

Current Usage of Semantic Research Grid Project

We have used and tested the Semantic Research Grid (SRG) prototype on published scientific research publications in the Community Grids Lab (CGL) at Indiana University

In CGL, 20 students, post-docs and faculty members are testing it

They are using the prototype to collect publications, upload/download them and share them with other users

Page 26:

Summary
Integration
• We have successfully integrated the Google Scholar and Windows Live Academic search tools and the CiteULike, Delicious, and Connotea annotation tools, providing a system that allows dynamic publication.
Flexibility and Extensibility
• We provide flexibility allowing integration of different tools having common metadata.
• Easy to add and extend service mechanism
Management and Consistency Scheme of Digital Entities
• Allows the manipulation of a digital entity
• Applies Event-based model based on the concept of:
    Major events
    Minor events
    Datasets
• Provides a rollback feature to:
    Support a history tool for a DE
    Merge and change the content of a digital entity
• A service-based framework for using existing annotation tools through web services

Prototype project web site: http://gf6.ucs.indiana.edu:58080/SRGrid

Page 27:

Domain Specific Semantic Document Analysis

It is natural to develop core document Services such as those used in Citeseer/Google Scholar but applied to “your” documents of interest that may not have been processed yet

• As just submitted to a conference perhaps

These tools can help form useful lists such as authors of all cited or submitted papers to a journal

OSCAR3 (from Peter Murray-Rust’s group at Cambridge) augments the application independent “core” metadata (Title, authors, institutions, Citations) with a list of all chemical terms

• This tool is a Service that can be applied to “your” document or to a set of documents harvested in some fashion

• Luis Rocha has developed related ideas for Biology

• Other fields have natural application specific metadata and OSCAR like tools can be developed for them

This is another Semantic Scholar Grid Tool

Page 28:

OSCAR3 Chemistry Document analysis

It detects “magic” chemical strings in text and then
• Stores them as metadata associated with document
• Queries ChemInformatics repositories to tell you lots of information about identified compounds
• Tells you which other documents have this compound

Page 29:

Initial Results from OSCAR on PubMed
We have a small sample (100) of full text Chemistry papers selected at random from 15 years of PubMed with over 5 million abstracts
• OSCAR3 generates 4.17 compound names per abstract
• and 36.7 compound names per full text
• 555,007 PubMed abstracts of 2005 – 2006 (part) used for Abstracts (on Big Red)
Illustrates how much knowledge journal publishers are hiding from us

Page 30:

CICC Chemical Informatics Cyberinfrastructure Collaboratory

[Diagram labels: PubMed Database; OSCAR Text Analysis; POV-Ray Parallel Rendering; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; IU’s Varuna Database; NIH PubChem Database; MOAD Database]

Product databases are wrapped with Web service interfaces and are suitable for inclusion in Taverna workflows.

Integrating document (OSCAR) and conventional services on the IU Big Red Supercomputer