Internet Documentation and Integration of Metadata (IDIOM) Presented by Ahmet E. Topcu atopcu@cs.indiana.edu Advisor: Prof. Geoffrey C. Fox 1/14/2009

Internet Documentation and Integration of Metadata (IDIOM)

Presented by Ahmet E. Topcu

atopcu@cs.indiana.edu

Advisor: Prof. Geoffrey C. Fox

1/14/2009

Why IDIOM Project?

- Necessities for integration
  - Need for a common data format
  - No easy way to find all publications
  - The wealth of information contained in numerous fields remains largely outside the scope of tools
  - What happens if the tool you choose is not adopted or, worse, just disappears, as Windows Live Academic (WLA) did?
- Architecture support
  - Event-based systems

IDIOM Architecture

IDIOM Key Features

- We are NOT building a new tagging or search system
- We are building tools that integrate and add value to existing systems
- We built a mashup linking to del.icio.us, CiteULike, and Connotea, allowing exchange of tags between sites and between local repositories
- Repositories also link to Google Scholar (GS) and Windows Live Academic (WLA)
  - GS has the number of cited publications
  - WLA has the Digital Object Identifier (DOI)
- We implement a rather more powerful access control mechanism
- We build heuristic tools to mine “web lists” for citations
- We have an “event” based architecture (consistency model) allowing change actions to be preserved and selectively changed
  - Supports integrating different, inconsistent views of a given document and its updates on different tagging systems
  - Event-based system implemented by Fatih Mustacoglu

Original slide from Open Grid Forum Web 2.0 Workshop OGF21, presented by Geoffrey Fox
Note: WLA is no longer available

IDIOM System Modules

- Search Tools Services
  - Google Scholar/Windows Live Academic
  - Google Scholar Advanced
  - Web Page Metadata Collection
  - Local Database Search:
    - My Research Database
    - My Research Database Advanced
- Authentication and Authorization Services
  - Login and Logout service
  - DE access rights management
  - Database access rights management
  - Administrative tools
- Other Services
  - User Registration
  - Username and password recovery
  - User’s Profile Management
  - DE metadata view options

- Digital Entity (DE) Management Service
  - Manual DE entry into the system
  - DE history
  - DE versioning and flexible choices (rollback)
  - Editing and more info tools for a DE (Update Model)
- Session and Event Management Services
  - Event and dataset management
  - DE view options
  - User credentials (username/password), cookie-based
- Annotation Tools Service
  - Transfer Service
  - Download Service
  - Upload Service
  - Extract DE and tags from web lists

Tools Screen Shots

- Google Scholar
- Windows Live Academic (WLA) (note: no longer available)
- Connotea
- Del.icio.us

Screenshots: del.icio.us tags, and download of del.icio.us tags to the local system

Web Search Tools (GS/WLA) results

Web Search Tools after insertion operation

Local Repository Metadata Search

Metadata Collection from Web Pages

The aim:

- Eliminate duplicate data entry across different web platforms.
- Build richer metadata in IDIOM, using the Digital Entities collected from web pages as a base.
- Share new Digital Entities with other tools and users in IDIOM.
- Push newly collected Digital Entities to other communities using Web 2.0 features.

Methodology for Collection

Case: CGL Publications web page

- Collect: Digital Entities from the Community Grid publication web pages.
- Analyze: Use a heuristic methodology to extract the metadata fields of the Digital Entities for CGL publications.
- Build:
  - RSS objects using the collected Digital Entities.
  - New tags using the collected Digital Entities.
- Compare: The Digital Entities collected from CGL web pages with the existing Digital Entities in IDIOM. If they are:
  - different: Store the new Digital Entities in SRG storage.
  - same: Option to update tags and other fields.
- Share: New Digital Entities with other tools using IDIOM.
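
The Compare step above could be sketched roughly as follows. This is a minimal illustration, not the IDIOM implementation: the `DigitalEntity` fields, the title-based matching rule, and the dictionary standing in for SRG storage are all assumptions, and the sketch always merges tags where the slides describe this as an option.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DigitalEntity:
    """Hypothetical DE record; the real IDIOM schema is not given in the slides."""
    title: str
    authors: str = ""
    tags: set = field(default_factory=set)


def normalize(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def compare_and_store(collected, repository):
    """Store unseen DEs and merge tags into DEs that already exist.

    `repository` maps a normalized title to a DigitalEntity and is an
    assumed stand-in for the SRG storage.
    Returns (new_entities, updated_entities).
    """
    new, updated = [], []
    for de in collected:
        key = normalize(de.title)
        existing = repository.get(key)
        if existing is None:
            repository[key] = de            # "different": store as a new DE
            new.append(de)
        elif not de.tags <= existing.tags:
            existing.tags |= de.tags        # "same": update the tags
            updated.append(existing)
    return new, updated
```

Matching on a normalized title is only one possible heuristic; a DOI or URL, where available, would be a stricter key.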

Web Metadata Collection

Display list of the feeders for searched web pages

Displaying the metadata dynamically

- Displaying the metadata in RSS/XML format
- Displaying the metadata using RSS subscription
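
As a rough illustration of exposing collected metadata in RSS/XML format, the sketch below builds an RSS 2.0 document with Python's standard `xml.etree.ElementTree`. The dictionary keys (`title`, `url`, `tags`) and the function name `build_rss` are assumptions made for this example; the slides do not show the feed layout IDIOM actually produces.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, entities):
    """Return an RSS 2.0 string with one <item> per Digital Entity.

    Each entity is a dict with assumed keys: "title", "url", optional "tags".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Collected Digital Entities"
    for de in entities:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = de["title"]
        ET.SubElement(item, "link").text = de["url"]
        # Tags are carried as RSS <category> elements.
        for tag in de.get("tags", []):
            ET.SubElement(item, "category").text = tag
    return ET.tostring(rss, encoding="unicode")
```

Any feed reader that understands RSS 2.0 could then subscribe to the resulting document.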

Key Definitions

- Digital Entity (DE): A digital collection of metadata for a citation, stored in a system database; it forms the primary copy of a DAR.
- Event: A time-stamped action on a digital entity.
  - Major Events: Insertion or deletion of a digital entity
  - Minor Events: Modifications to an existing digital entity
- Distributed Annotation Record (DAR): A collection of metadata stored at an annotation tool.
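
To make these definitions concrete, here is a minimal sketch of a time-stamped event log from which DE states are rebuilt by replay. The `Event` fields, the event kind names, and the `replay` function are assumptions for illustration only and are not taken from the actual IDIOM event system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """A time-stamped action on a digital entity (field names are assumed)."""
    timestamp: datetime
    de_id: str
    kind: str      # "insert" or "delete" (major), "modify" (minor)
    fields: dict   # metadata values set by this event


MAJOR_KINDS = {"insert", "delete"}


def is_major(event):
    """Major events insert or delete a DE; everything else is minor."""
    return event.kind in MAJOR_KINDS


def replay(events, upto=None):
    """Rebuild DE states by applying events in timestamp order.

    If `upto` is given, events after that time are ignored, which yields
    the state as of that moment (a simple form of rollback).
    """
    state = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if upto is not None and ev.timestamp > upto:
            break
        if ev.kind == "insert":
            state[ev.de_id] = dict(ev.fields)
        elif ev.kind == "modify" and ev.de_id in state:
            state[ev.de_id].update(ev.fields)
        elif ev.kind == "delete":
            state.pop(ev.de_id, None)
    return state
```

Because the log is never rewritten, earlier versions of a DE stay recoverable by replaying only a prefix of the events.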

Security Model

- Security in Web 2.0 can be limited.
- We implemented a simple but more powerful security model around local tools that wrap Web 2.0 systems.
- We used an access-control matrix model to provide security for our information system.
  - Supports multiple groups and multiple users for each Digital Entity (DE).
- Similar to the UNIX file system:
  - The Unix RWX bits correspond to the Read, Write, and Execute operations for each file and directory.
  - In our system, a DE corresponds to the file element and a folder corresponds to the directory element.
  - For each DE and folder, three types of access rights are defined in the system: Read, Write, and Delete.
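
An access-control matrix with Read, Write, and Delete rights per user and per group might look like the sketch below. The class and method names are hypothetical, and the rule that a user's effective rights are the union of direct and group grants is an assumption; the slides do not specify how IDIOM combines them.

```python
from enum import Flag


class Right(Flag):
    """The three access rights named above, as combinable bits."""
    NONE = 0
    READ = 1
    WRITE = 2
    DELETE = 4


class AccessMatrix:
    """Rights per (subject, resource); a resource is a DE or a folder."""

    def __init__(self):
        self._user_rights = {}    # (user, resource) -> Right
        self._group_rights = {}   # (group, resource) -> Right
        self._members = {}        # group -> set of users

    def add_member(self, group, user):
        self._members.setdefault(group, set()).add(user)

    def grant_user(self, user, resource, rights):
        key = (user, resource)
        self._user_rights[key] = self._user_rights.get(key, Right.NONE) | rights

    def grant_group(self, group, resource, rights):
        key = (group, resource)
        self._group_rights[key] = self._group_rights.get(key, Right.NONE) | rights

    def rights_for(self, user, resource):
        """Union of the user's own rights and those of the user's groups."""
        rights = self._user_rights.get((user, resource), Right.NONE)
        for group, members in self._members.items():
            if user in members:
                rights |= self._group_rights.get((group, resource), Right.NONE)
        return rights

    def allowed(self, user, resource, right):
        return right in self.rights_for(user, resource)
```

Unlike the fixed owner/group/other triple of Unix permissions, this layout lets any number of users and groups hold rights on the same DE.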

Security Model II

We have a security model that supports:

- Level of Authorization
  - Roles are defined as Super Administrator (SA), Group Administrator (GA), and User (U).
  - The system allows more than one SA.
  - An existing SA can add other SAs to the system.
  - An SA can assign any User to become a GA, and can remove a GA from a group.
  - Each group should have at least one GA.
  - A GA can add Users to and remove Users from a group.
  - A User can allow other Users and groups to share their resources.
- User profile
  - Share the user profile between sites.
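
The role rules listed under Level of Authorization can be sketched as a small registry that checks the acting user's role before each change. All names here are hypothetical. Two points are assumptions beyond the slides: a newly assigned GA is also made a member of the group, and removing the last GA of a group is rejected to keep the "at least one GA" rule.

```python
class RoleRegistry:
    """Tracks Super Administrators, Group Administrators, and group members."""

    def __init__(self, first_sa):
        self.super_admins = {first_sa}
        self.group_admins = {}   # group -> set of GA user names
        self.members = {}        # group -> set of user names

    def _require_sa(self, actor):
        if actor not in self.super_admins:
            raise PermissionError(f"{actor} is not a Super Administrator")

    def _require_ga(self, actor, group):
        if actor not in self.group_admins.get(group, set()):
            raise PermissionError(f"{actor} is not a Group Administrator of {group}")

    def add_super_admin(self, actor, user):
        self._require_sa(actor)               # only an existing SA may add an SA
        self.super_admins.add(user)

    def assign_group_admin(self, actor, group, user):
        self._require_sa(actor)
        self.group_admins.setdefault(group, set()).add(user)
        self.members.setdefault(group, set()).add(user)   # assumption

    def remove_group_admin(self, actor, group, user):
        self._require_sa(actor)
        admins = self.group_admins.get(group, set())
        if user in admins and len(admins) == 1:
            raise ValueError("each group needs at least one GA")
        admins.discard(user)

    def add_member(self, actor, group, user):
        self._require_ga(actor, group)
        self.members.setdefault(group, set()).add(user)

    def remove_member(self, actor, group, user):
        self._require_ga(actor, group)
        self.members.get(group, set()).discard(user)
```

Sharing of resources between Users would be handled separately, for example by granting rights in an access-control matrix.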

Manage Digital Entities

Manage All Digital Entities for selected repository

Super Admin capabilities

Group Admin capabilities

Summary

- Build an integration architecture
- We do not reinvent existing tools
- Use the existing features of tools
- Provide tagging services
- Provide common metadata
- Allow the use of consistent data

End

Thanks!