
1

CS4624 Closing Slides (May 6, 2009)

“From Multimedia to Hypertext to Information Access to Digital Libraries”

by Edward A. Fox

• fox@vt.edu http://fox.cs.vt.edu

• Dept. of Computer Science, Virginia Tech

• Blacksburg, VA 24061 USA

Acknowledgements

• Mentors (Licklider, Kessler, Salton)
• Virginia Tech, CS, Digital Library Research Laboratory
• NSF and other sponsors
• Students, colleagues, co-investigators
• Marcos André Gonçalves, Doug Gorton, Rao Shen, ...
• Barbara Wildemuth, Jeffrey Pomerantz, Sanghee Oh, Seungwon Yang

3

CC2001 Information Management Areas

IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries

* Core components

4

DL Curriculum Framework

COURSE STRUCTURE

• Semester 1: DL collections: development/creation
• Semester 2: DL services and sustainability

CORE DL TOPICS

• Digitization, Storage, Interchange
• Digital objects, Composites, Packages
• Metadata, Cataloging, Author submission
• Naming, Repositories, Archives
• Spaces (conceptual, geographic, 2/3D, VR)
• Architectures (agents, buses, wrappers/mediators), Interoperability
• Services (searching, linking, browsing, etc.)
• Intellectual property rights mgmt., Privacy, Protection (watermarking)
• Archiving and preservation, Integrity

RELATED TOPICS

• Documents, E-publishing, Markup
• Info. needs, Relevance, Evaluation, Effectiveness
• Thesauri, Ontologies, Classification, Categorization
• Bibliographic information, Bibliometrics, Citations
• Routing, Filtering, Community filtering
• Search & search strategy, Info seeking behavior, User modeling, Feedback
• Info summarization, Visualization
• Multimedia streams/structures, Capture/representation, Compression/coding, Content-based analysis, Multimedia indexing, Multimedia presentation, rendering

DL Curric. Project – Acknowledgements, Info.

• NSF award to VT and UNC-CH

• CS and LIS

• http://curric.dlib.vt.edu/

• http://curric.dlib.vt.edu/wiki/index.php/Main_Page

• Advisory Board, reviewers, field testers


9

For More Information

• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
  – MIT Press: Arms, plus by Borgman, Licklider (1965)
  – Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
• Conferences
  – ECDL: www.ecdl2008.org
  – ICADL: www.icadl.org
  – JCDL: www.jcdl2008.org
• Associations
  – ASIS&T DL SIG
  – IEEE TCDL: www.ieee-tcdl.org (student awards, doctoral consortia)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/ (old)

10

Synchronous Scholarly Communication

Same time, Same or different place

11

Asynchronous, Digital Library Mediated Scholarly Communication

Different time and/or place

12

Libraries of the Future

J.C.R. Licklider, 1965, MIT Press

Locating Digital Libraries in Computing and Communications Technology Space

[Figure: axes are Computing (flops) / Digital content and Communications (bandwidth, connectivity), each running from less to more. Scale labels: World, Nation, State, City, Community.]

Digital Libraries technology trajectory: intellectual access to globally distributed information

Note: we should consider 4 dimensions: computing, communications, content, and community (people)

14

Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/

Information Life Cycle

15

Information Life Cycle

• Authoring, Modifying
• Organizing, Indexing
• Storing, Retrieving
• Distributing, Networking
• Retention / Mining
• Accessing, Filtering
• Using, Creating

16

Digital Libraries Shorten the Chain from

Editor

Publisher

A&I

Consolidator

Library

Reviewer

17

DLs Shorten the Chain to

Author

Reader

Digital Library

Editor

Reviewer

Teacher

Learner

Librarian

18

DL Definitions - 1

• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”

• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003

19

DL Definitions - 2

• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”

• Waters, D.J. CLIR Issues, July/August 1998

• www.clir.org/pubs/issues/issues04.html

20

DL Definitions - 3

• Issues and Spectra

– Collection vs. Institution

– Content vs. System

– Access vs. Preservation

– “Free” vs. Quality

– Managed vs. Comprehensive

– Centralized vs. Distributed

21

DL Definitions - 4

• NOT a “digitized library”
• NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library
• IS a new way to deal with knowledge
  – Authoring, Self-archiving, Collecting
  – Organizing, Preserving
  – Accessing, Propagating, Re-using

22

Digital Library Content

Content Types

• Text Documents: Articles, Reports, Books
• Video, Audio: Speech, Music
• Geographic Information: (Aerial) Photos
• Software, Programs: Models, Simulations
• Bio Information: Genome (Human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT

23

Informal 5S & DL Definitions

DLs are complex systems that

• help satisfy info needs of users (societies)

• provide info services (scenarios)

• organize info in usable ways (structures)

• present info in usable ways (spaces)

• communicate info with users (streams)

24

5S Layers

Societies

Scenarios

Spaces

Structures

Streams

25

5Ss

Ss | Examples | Objectives

Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data

Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content

Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components

Scenarios | Searching, browsing, recommending | Details the behavior of DL services

Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

26

5S and DL formal definitions and compositions (April 2004 TOIS)

[Figure: graph of the formal definitions and how they compose; definition numbers are in parentheses.]

• 5S: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24)
• relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)
• measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16) spaces
• event (d.10), state (d.18)
• services (d.22), transmission (d.23)
• structural metadata specification (d.25), descriptive metadata specification (d.26)
• structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33)
• indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37)
• digital library (minimal) (d.38)
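As a rough illustration of how a repository, a metadata catalog, an indexing service, and a searching service can fit together, the sketch below builds a toy digital library in Python. It is not the formal 5S definition: the class names (`DigitalObject`, `MinimalDL`), the field names, and the choice of a title-only inverted index with boolean AND matching are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A digital object: a handle plus named streams (hypothetical layout)."""
    handle: str
    streams: dict[str, bytes] = field(default_factory=dict)


@dataclass
class MinimalDL:
    """Repository + metadata catalog + index; an illustrative toy only."""
    repository: dict[str, DigitalObject] = field(default_factory=dict)
    catalog: dict[str, dict[str, str]] = field(default_factory=dict)
    index: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))

    def submit(self, obj: DigitalObject, metadata: dict[str, str]) -> None:
        """Store the object, record its metadata, and index its title terms."""
        self.repository[obj.handle] = obj
        self.catalog[obj.handle] = metadata
        for term in metadata.get("title", "").lower().split():
            self.index[term].add(obj.handle)

    def search(self, query: str) -> list[str]:
        """Return handles whose titles contain every query term (boolean AND)."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


# Made-up sample records, used only to exercise the sketch.
dl = MinimalDL()
dl.submit(DigitalObject("etd-1", {"text": b"..."}), {"title": "Digital Library Services"})
dl.submit(DigitalObject("etd-2"), {"title": "Video Library Systems"})
print(dl.search("library"))
print(dl.search("digital library"))
```

A browsing service could be layered on the same catalog, for example by grouping handles by a metadata field instead of matching query terms.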

27

[Figure: concept map relating the 5Ss.]

• Regions: Streams, Structures, Spaces, Scenarios, Societies
• Concepts: text, audio, image, video, digital object, Repository, Collection, Catalog, Index, metadata specifications, Service, Scenario, event, operation, Service Manager, Actor, Topological, Probabilistic, Metric, Measurable, Measure, Vector
• Relationships: describes, stores, contains, is_a, is_version_of/cites/links_to, extends, reuses, redefines, invokes, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs, produces, precedes, happens_before

28

[Figure: taxonomy of DL services.]

Infrastructure Services

• Repository-Building
  – Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
  – Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
• Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)

Information Satisfaction Services

• Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing

29

Ontology: Applications


31

ETANA-DL

• Archaeological DL
• Integrated DL
  – Heterogeneous data handling
• Applies and extends the OAI-PMH
  – Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
  – Componentized
  – Extensible
  – Portable

32

ETANA-DL Architecture

DigBase and DigKit

[Figure: ETANA-DL architecture.]

• Sites: Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, New Sites
• DATABASE WRAPPERS
• ETANA-DL UNION CATALOG
• USER INTERFACE: Search, Browse, Recommend, Note, Personalize, Review, Visualizations, Archaeology Specific
• Work in progress

33

Map courtesy: www.enchantedlearning.com

Initial ETANA-DL Member Locations

Virginia Tech

Mississippi State University

Vanderbilt University

Canadian University College

Walla Walla College

Andrews University

CWRU

Willamette University


36

Lahav Website

37

Megiddo Opening Screen

38

Locus Screen: Pictures

View all

39

Area Screen


41

ETANA-DL Website

42

Marking Items

Marking – writing notes for a specific user

43

ETANA-DL Multi-dimensional Browsing

3 new sites

2 new types of artifacts

44

Visual Browsing Nimrin: Topographical Drawings

Full site North west quadrant

Square:N40/W20

45

Visual Browsing Nimrin : Square information

Square:N40/W20

Locus: 86

Loci layout

46

Visual Browsing Bab edh-Dhra'

Cemetery

Pottery # 25

47

Visual Browsing Bab edh-Dhra'

Cemetery

Pottery # 25

48

ETANA Societies

1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (consisting of photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)

49

ETANA Societies

• Social issues

1. Who owns the finds?

2. Where should they be preserved?

3. What nationality and ethnicity do they represent?

4. Who has publication rights?

5. What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?

50

ETANA Scenarios

1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
   2. Data about each artifact is recorded together with information about its exact find spot.
   3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public

51

ETANA Spaces

1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations, and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
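To make the role of metric and vector spaces concrete, here is a minimal Python sketch of the two operations named above: computing distance or similarity between feature vectors, and constraining a search to a spatial region. The function names, the feature vectors, and the bounding-box test are assumptions for illustration; the slides do not say which measures ETANA-DL actually uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two points in a metric space."""
    return math.dist(a, b)


def within_box(point: tuple[float, float],
               lower: tuple[float, float],
               upper: tuple[float, float]) -> bool:
    """True if a find spot lies inside an axis-aligned search rectangle."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, lower, upper))


# Hypothetical find spots given as (north, west) offsets in metres.
finds = {"pottery-25": (40.0, 20.0), "seed-3": (10.0, 55.0)}
in_square = [name for name, spot in finds.items()
             if within_box(spot, (35.0, 15.0), (45.0, 25.0))]
print(in_square)
```

Cosine similarity ignores vector length, which suits term-weight vectors; Euclidean distance is the more natural choice for physical coordinates.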

52

ETANA Structures

1. Site Organization
   1. Region, site, partition, sub-partition, locus
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent
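Stratigraphic relationships form a structure that can be treated as a directed graph. The Python sketch below derives a temporal ordering from "lies directly above" relations, assuming undisturbed deposits so that a locus beneath another was laid down earlier. The function name, the locus labels, and the omission of "coexistent" loci are simplifications introduced for this example.

```python
from graphlib import TopologicalSorter


def deposition_order(beneath: dict[str, set[str]]) -> list[str]:
    """Order loci from earliest to latest deposit.

    `beneath[x]` is the set of loci lying directly beneath locus x, so each
    of them is assumed to have been deposited before x.
    """
    # TopologicalSorter treats the mapped values as predecessors, so
    # static_order() yields lower (earlier) loci before the ones above them.
    return list(TopologicalSorter(beneath).static_order())


# Hypothetical loci: L3 sits on L2, which sits on L1.
order = deposition_order({"L3": {"L2"}, "L2": {"L1"}})
print(order)
```

If the recorded relations contain a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful consistency check on field records.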

53

ETANA Streams

1. successive photos and drawings of excavation sites, loci, unearthed artifacts

2. audio and video recordings of excavation activities and discussions

3. textual reports

4. 3D models used to reconstruct and visualize archaeological ruins.

54

Integrated CCLINC Translingual Information System

DARPA

[Figure: CCLINC SERVER with Extraction, Info Detection, Summarization, Translation, and 2-way Speech Translation.]

Sample queries:

• What is the north korean movement in the front line?
• What is the status of nk missile launch against japan?

Sample romanized Korean text: BugHanI IlBonE Ddo MiSaIlEul BalSaHan Deus HaDa

Sample English text: It seems that North Korea launch a missile again. After North Korea launched a Daipodong missile last month, NK is perceived to proceed to an additional test launch. Korea, US and Japan enter into an alert state, and prepare for a joint response policy. Korea estimates that the additional launch will be on 09/05. Japan estimates that NK’s missile range is short. US information says that there is no sign of launch yet.

55

Structured Video Browser (making video into hypermedia)

www.learn.umd.edu

• IBrowse
• Expository multimedia
• Narrative Structures

56

MPEG-7 Video Library Systems Tech.

ICU: Information and Communication University

Architecture

[Figure: Video Data; Description Generator; Description Schemes Design Tool; Description Scheme; Meta Database; Video Database; Retrieval Server Module; Player Presentation Module.]

58

Textual information retrieval

Query on Google using Sunset and Rio de Janeiro

Query result

59

Content Based Information Retrieval

60

Degree of Structure

Chaotic Organized Structured

Web DLs DBs

61

Digital Objects (DOs)

• Born digital

• Digitized version of “real” object
  – Is the DO version the same, better, or worse?
  – Decision for ETDs: structured + rendered
• Surrogate for “real” object
  – Not covered explicitly in metamodel for a minimal DL
  – Crucial in metamodel for archaeology DL

62

Complex to Simple

MARC ($50) Dublin Core (DC)

+thesis

63

OAI – Repository Perspective

Required: Protocol

[Figure: a repository holding digital objects (DO) and metadata objects (MDO).]

64

OAI – Black Box Perspective

[Figure: OA 1 through OA 7.]

65

The World According to OAI

[Figure: Service Providers (Discovery, Current Awareness, Preservation) connected to Data Providers by metadata harvesting.]
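The harvesting step between service providers and data providers can be sketched with the OAI-PMH `ListRecords` verb. The Python below builds a request URL and parses a response into (identifier, title) pairs. The base URL, the sample XML, and the function names are invented for this example; only the verb, the `metadataPrefix` and `resumptionToken` arguments, and the two XML namespaces come from the protocol itself, and a real harvester would also need HTTP fetching and error handling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a data provider."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Made-up response from a hypothetical data provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(parse_list_records(SAMPLE))
```

A service provider would repeat the request with each returned resumption token until none is returned, then build discovery or current-awareness services over the harvested metadata.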

66

Institutional Repositories - 1

• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”

• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA

• www.arl.org/sparc/IR/IR_Guide_v1.pdf

67

Institutional Repositories - 2

• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”

• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html


70

What is Fedora™?

• Slides courtesy Vinod Chachra of VTLS

Flexible Extensible Digital Object Repository Architecture

71

Fedora™ Digital Object Architecture

• Persistent ID (PID): Globally unique persistent id
• Disseminators: Public view: access methods for obtaining “disseminations” of digital object content
• System Metadata: Internal view: metadata necessary to manage the object
• Datastreams: Protected view: content that makes up the “basis” of the object
• Metadata formats: EAD, TEI, DC, MARC, VRA Core, MIX, etc.
• Content: Images, E-books, E-journals, Music, Video, etc.

The Mellon Fedora Project

Adapted from Slide by V. Chachra, VTLS
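The four parts of the object model above can be mirrored in a small data structure. The Python sketch below is not the Fedora API: the class `RepositoryObject`, its fields, and the `disseminate` method are hypothetical names chosen only to show how a persistent id, system metadata, datastreams, and disseminators might relate.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RepositoryObject:
    """Toy model of a digital object; names are illustrative, not Fedora's."""
    pid: str  # globally unique persistent id
    system_metadata: dict[str, str] = field(default_factory=dict)  # internal view
    datastreams: dict[str, bytes] = field(default_factory=dict)    # protected view
    disseminators: dict[str, Callable[["RepositoryObject"], bytes]] = field(
        default_factory=dict)                                      # public view

    def disseminate(self, method: str) -> bytes:
        """Run a named access method and return the content it exposes."""
        return self.disseminators[method](self)


# Hypothetical object with one Dublin Core datastream and one access method.
obj = RepositoryObject(
    pid="demo:1",
    system_metadata={"state": "active"},
    datastreams={"DC": b"<dc><title>Sample</title></dc>"},
)
obj.disseminators["get_dc"] = lambda o: o.datastreams["DC"]
print(obj.disseminate("get_dc"))
```

Keeping access methods separate from stored datastreams means the public view can change, for example by adding a thumbnail rendering, without touching the stored content.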

72

Fedora™ Repository

[Figure: repository architecture behind a Web Service Exposure Layer.]

• Clients: Client Application, Batch Program, Server Application, Web Browser
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider
• Subsystems: Management Subsystem, Access Subsystem, Security Subsystem, Storage Subsystem
• Functions: Component Mgmt, Object Mgmt, Object Validation, PID Generation, Object Dissemination, Object Reflection, Policy Enforcement, Policy Mgmt, Session Management, User Authentication
• Storage: Digital Objects, Datastreams, Policies, Users/Groups, XML Files, Relational DB
• External: External Content Source, External Content Retriever, Content, Remote Service, Local Service
• Protocols: HTTP, FTP, SOAP

Adapted from Slide by V. Chachra, VTLS

73

VITAL / Fedora Relationship

74

Digital library architecture for local and interoperable CITIDEL services

[Figure: three-layer architecture.]

• PORTALS: EDUCATORS, ADMINISTRATORS, LEARNERS
• SERVICES: Multilingual Searching, Revising, Annotating, Filtering, Browsing, Administering
• REPOSITORIES: Annotations, Filtering Profiles, User Profiles, Union Metadata
• OAI Data Harvester and OAI Data Provider
• Remote and Peer Digital Libraries (e.g., NSDL-CIS)

75

Cluster NDLTD-Computing

76

Example of Union Service: CitiViz


79

NSDL Information Architecture

Essentially as developed by the Technical Infrastructure Workgroup

[Figure: components attached to the Core NSDL “Bus”.]

• User Interfaces: Portals & Clients
• Usage Enhancement: CI Services (annotation, discussion, personalization, authentication, browsing); Other NSDL Services
• Core Services: information retrieval; metadata gathering
• Collection Building: Core Collection-Building Services (harvesting, protocols)
• NSDL Collections; Special Databases; referenced items & collections

A Digital Library Case Study

• Domain: graduate education, research

• Genre: ETDs = electronic theses & dissertations

• Submission: http://etd.vt.edu

• Collection: http://www.theses.org

Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org

Student Gets Committee Signatures and Submits ETD

Signed

Grad School

Library Catalogs ETD, Access is Opened to the New Research

WWW

NDLTD

83


http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/

84

85

Architecture of a Union DL

[Figure: two digital libraries combined into a union DL.]

• Component DLs: DL1 (Repository1, Catalog1), DL2 (Repository2, Catalog2)
• Component societies: archaeologists; General Public
• Component services: Searching Service; Browsing Service
• Union DL: Union Repository, Union Catalog
• Union Society: Archaeologists, General Public
• Union Service: Harvesting, Mapping, Searching, Browsing, Clustering, Visualization

86

Union Catalog Integration

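[Figure: two site catalogs are mapped into one union catalog.]

• Virtual Nimrin (VN): VN Catalog in VN Metadata Format, with a Mapping Tool and Wrapper
• Halif DigMaster (HD): HD Catalog in HD Metadata Format, with a Mapping Tool and Wrapper
• Union ArchDL: Union Catalog in Global Metadata Format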
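The mapping-and-wrapper idea can be sketched as a field-renaming step applied per site before records enter the union catalog. In the Python below, every field name and sample record is hypothetical, since the VN, HD, and global schemas are not given in these slides. A real mapping tool would also handle value conversions and unmapped fields.

```python
def make_wrapper(field_map):
    """Return a function that renames a site's local fields to global ones."""
    def wrap(local_record):
        return {
            global_name: local_record[local_name]
            for local_name, global_name in field_map.items()
            if local_name in local_record
        }
    return wrap


# Hypothetical field names; the real VN and HD schemas are not shown here.
VN_TO_GLOBAL = {"locus_no": "locus", "descr": "description"}
HD_TO_GLOBAL = {"LocusID": "locus", "Notes": "description"}


def build_union_catalog(sources):
    """Merge wrapped records from every site into one list of global records."""
    union = []
    for site, field_map, records in sources:
        wrap = make_wrapper(field_map)
        union.extend({"site": site, **wrap(record)} for record in records)
    return union


# Made-up records used only to exercise the sketch.
union_catalog = build_union_catalog([
    ("VN", VN_TO_GLOBAL, [{"locus_no": "86", "descr": "mudbrick wall"}]),
    ("HD", HD_TO_GLOBAL, [{"LocusID": "12", "Notes": "ash pit"}]),
])
print(union_catalog)
```

Adding a new site then means supplying one more field map rather than changing the union services.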

87

Mapping confirmation

Mapping history


93

Conclusions

• Digital libraries integrate multimedia, hypertext, and information access into a unified framework.
• The 5S theory helps with analysis, specification, system development, implementation, assessment, and refinement.
• We provide services atop repositories that include digital objects and a catalog of metadata objects. Examples include archaeology and education.
• Integration extends to distributed sites, including heterogeneous systems where schema mapping as well as union services are needed.
• There is worldwide benefit in all areas.
