Beyond Seamless Access: Meta-data in the Age of Content Integration Spring 2000 Program Information Technology Interest Group of Association of College & Research Libraries, New England Chapter Univ. of Connecticut May 26, 2000 Amanda Xu Information Architect EBSCO, 10 Estes Street, Ipswich, MA 01938 [email protected]


Page 1: Beyond Seamless Access: Meta-data in the Age of Content Integration

Beyond Seamless Access: Meta-data in the Age of Content Integration

Spring 2000 Program, Information Technology Interest Group of Association of College & Research Libraries, New England Chapter

Univ. of Connecticut, May 26, 2000

Amanda Xu

Information Architect, EBSCO, 10 Estes Street, Ipswich, MA 01938

[email protected]

Page 2: Beyond Seamless Access: Meta-data in the Age of Content Integration

OVERVIEW

• Definitions: meta-data, schemas, and XML linking structures

• Why content integration and analysis? Assumptions about information search and retrieval

• Meta-data applications for content integration and analysis

• How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web?

• Role of librarians and information mediators in the wave of content integration

Page 3: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (1) Meta-data, What is it? [1/6]

Definitions:

1) “Data about data” or “information which describes a data set”

2) Data elements and attributes that facilitate the search and retrieval of a set of associated attributes

Example 1:
• An address label contains: name, address, city, state, zip
• Address might feature a home or office, address access permissions, last updated, internal references

3) A set of semantics that describe the data, classify it, categorize it, and provide instructions on how and where to exploit it

Example 2:
• Standard bibliographic information, summaries, indexing terms, and abstracts

Page 4: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (1)

Meta-data, What is it? [2/6]

Example 3: Simple XML Record

<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <isbn>01400-67477</isbn>
  <publisher>Dutton</publisher>
  <subject label="personal">Winnie the Pooh</subject>
  <subject>Taoism in literature</subject>
  <classification scheme="LCC">PR6025.I65Z68 1983</classification>
</record>
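
A minimal sketch of how a program might read the record in Example 3, using Python's standard xml.etree.ElementTree parser. The function name describe and the shape of the returned dictionary are assumptions made for illustration; they are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The record from Example 3, written with straight quotes so it parses.
RECORD = """<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <isbn>01400-67477</isbn>
  <publisher>Dutton</publisher>
  <subject label="personal">Winnie the Pooh</subject>
  <subject>Taoism in literature</subject>
  <classification scheme="LCC">PR6025.I65Z68 1983</classification>
</record>"""


def describe(record_xml):
    """Return a dict mapping each element name to a list of (text, attributes) pairs."""
    root = ET.fromstring(record_xml)
    fields = {}
    for child in root:
        entry = (child.text.strip(), dict(child.attrib))
        fields.setdefault(child.tag, []).append(entry)
    return fields


fields = describe(RECORD)
print(fields["title"][0][0])
print([text for text, _ in fields["subject"]])
print(fields["classification"][0][1]["scheme"])
```

Repeated elements such as <subject> are kept as a list, and the attributes (label, scheme) stay attached to the value they qualify.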

Page 5: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (1) Meta-data, What is it? [3/6]

4) Supports understanding of a document, its structure, relationship, locations, and usage

5) Helps you find things or make things disappear

Where is meta-data?

1) Internally:

• Embedded with markup, and with content

• Attached as resource header (HTML META Tag), or package

2) Externally:

• Stored separately from its resource

• Generated on demand, e.g. MS SQL Server or Oracle

• Static, e.g. bibliographic record

• Dynamically linked using XLink/XPointer/XPath and ISO HyTime

Page 6: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (1)

Meta-data, What is it? [4/6]

Naming Issues:

Can your meta-data be interchanged, and shared with others via computer programs or parsers?

• URI = URN + URL + URC (IETF)

• Namespaces (W3C): qualify elements uniquely, and avoid name collision

• URIs specify the namespaces in use

• XML Namespaces provide a way to make a name unique, but they do not resolve vocabulary ambiguity

Page 7: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (1) Meta-data, What is it? [5/6]

Example 4:

<date> used in three different contexts:

From George’s document: <date>9-Sept-1999</date>
From Martha’s document: <date>The lovely Deni</date>
From Hadley’s document: <date>Large Plump Medjool</date>

Use namespaces:

<george:date>9-Sept-1999</george:date>
<martha:date>The lovely Deni</martha:date>
<hadley:date>Large Plump Medjool</hadley:date>

Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston
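
The three <date> elements in Example 4 can be told apart by a parser once each prefix is bound to a URI. The sketch below assumes three made-up namespace URIs under example.org and a hypothetical <dates> wrapper element; only the prefixes and values come from the slide.

```python
import xml.etree.ElementTree as ET

# The namespace URIs and the <dates> wrapper are assumptions for this illustration.
DOC = """<dates xmlns:george="http://example.org/george"
       xmlns:martha="http://example.org/martha"
       xmlns:hadley="http://example.org/hadley">
  <george:date>9-Sept-1999</george:date>
  <martha:date>The lovely Deni</martha:date>
  <hadley:date>Large Plump Medjool</hadley:date>
</dates>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its URI, so every tag becomes "{uri}date".
tags = [child.tag for child in root]
print(tags)

# Query only George's dates by mapping a local prefix to his namespace URI.
ns = {"g": "http://example.org/george"}
george_dates = [el.text for el in root.findall("g:date", ns)]
print(george_dates)
```

The names no longer collide, but nothing here tells a program that George's date is a calendar date while Hadley's is a fruit. That is the vocabulary ambiguity namespaces leave unresolved.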

Page 8: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (1) Meta-data, What is it? [6/6]

Example 5: Simple Dublin Core Record with DC namespace, and qualifiers

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<record xmlns:dc="http://purl.org/dc/elements/1.0/"
        xmlns:dcq="http://purl.org/dc/elements/qualifiers/1.0/">
  <dc:title>The Tao of Pooh</dc:title>
  <dc:creator>Benjamin Hoff</dc:creator>
  <dcq:creatorType>Illustrator</dcq:creatorType>
  <dc:date>1982</dc:date>
  <dc:isbn>01400-67477</dc:isbn>
  <dc:publisher>Dutton</dc:publisher>
  <dc:subject>Winnie the Pooh</dc:subject>
  <dc:subject>Taoism in literature</dc:subject>
</record>
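
One way to get from the local record of Example 3 to a Dublin Core record like Example 5 is a crosswalk table. The sketch below is an assumption-laden illustration: the CROSSWALK mapping and the function to_dublin_core are hypothetical, and only the namespace URI is taken from the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.0/"  # namespace URI shown on the slide

# Hypothetical crosswalk from the local tags of Example 3 to Dublin Core element names.
CROSSWALK = {
    "title": "title",
    "author": "creator",
    "date": "date",
    "publisher": "publisher",
    "subject": "subject",
}

LOCAL = """<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <publisher>Dutton</publisher>
  <subject>Taoism in literature</subject>
</record>"""


def to_dublin_core(local_xml):
    """Build a Dublin Core record from a local record, skipping unmapped tags."""
    ET.register_namespace("dc", DC_NS)  # serialize with the familiar dc: prefix
    source = ET.fromstring(local_xml)
    target = ET.Element("record")
    for child in source:
        dc_name = CROSSWALK.get(child.tag)
        if dc_name is None:
            continue
        ET.SubElement(target, f"{{{DC_NS}}}{dc_name}").text = child.text
    return target


dc_record = to_dublin_core(LOCAL)
print(ET.tostring(dc_record, encoding="unicode"))
```

Local attributes such as label are dropped in this sketch; carrying them over would need qualifiers like the dcq elements in Example 5.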

Page 9: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (2) Schemas, What is it? [1/3]

How do you know which meta-data vocabularies you are interchanging?

– Schemas (DTDs):
• understand document elements and structures
• validation/parsing
• schemas support data types (e.g. integer, time, time period), open content model, inheritance, constraints, and namespaces

– Example:

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

<xsd:element name="state" type="xsd:string"/>

<xsd:element name="zip" type="xsd:decimal"/>

<xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>

Note: Example from Brian Travis’s tutorial, “XML and Data-Driven Web Architectures”, Seybold Seminars, Boston, Feb. 11, 2000.
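
To show what the typed declarations above buy you, here is a hand-written sketch of the checks they imply. It is not an XML Schema validator: the <address> wrapper, the validate function, and the check tables are all assumptions, and the decimal test only approximates the lexical form of xsd:decimal.

```python
import re
import xml.etree.ElementTree as ET

# Approximate lexical form of xsd:decimal: optional sign, digits, optional fraction.
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

# Hand-written stand-ins for the slide's declarations (hypothetical helper tables).
ELEMENT_CHECKS = {
    "state": lambda text: True,  # xsd:string accepts any text
    "zip": lambda text: DECIMAL.fullmatch(text.strip()) is not None,  # xsd:decimal
}
FIXED_ATTRIBUTES = {"country": "US"}  # use="fixed" value="US"


def validate(address_xml):
    """Return a list of error messages; an empty list means every check passed."""
    root = ET.fromstring(address_xml)
    errors = []
    for name, check in ELEMENT_CHECKS.items():
        element = root.find(name)
        if element is not None and not check(element.text or ""):
            errors.append(f"<{name}> has invalid value {element.text!r}")
    for attr, fixed in FIXED_ATTRIBUTES.items():
        value = root.get(attr, fixed)  # an omitted fixed attribute takes the fixed value
        if value != fixed:
            errors.append(f"@{attr} must be {fixed!r}, got {value!r}")
    return errors


good = '<address country="US"><state>MA</state><zip>01938</zip></address>'
bad = '<address country="CA"><state>MA</state><zip>0193B</zip></address>'
print(validate(good))
print(validate(bad))
```

A DTD could say only that <zip> contains character data; the data type is what lets a receiving program reject the second record.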

Page 10: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (2) Schemas, What is it? [2/3]

How many types of XML vocabularies are there?

Examples:

1) XML Schema

<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema" targetNamespace="http://purl.org/metadata/dublin_core" version="M.n">
...
</xs:schema>

2) RDF

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
         xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#"
         xmlns:dc=" ">

Page 11: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (2) Schemas, What is it? [3/3]

3) Schema repositories: industry-specific

– SOAP, BizCodes, XMLRPC, ICE, CDF, WebDAV, XML/ASN.1, XML/EDI, XER, and Z39.50

– BizTalk.org: routing information

<bizTalk>
  <Route>
    <From locationID="206.247.76.187" locationType="IP" handle="72" process="POConf" Path=""/>
    <To locationID="83-627-54204" locationType="DUNS" handle="14" process="PO_Process" Path=""/>
  </Route>
  <body>
    <purchaseOrder xmlns="urn:schemas-toycat-com:PurchaseOrder.biz" PONumber="10-01-2118"></purchaseOrder>
  </body>
</bizTalk>

Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston
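
A short sketch of how a receiving system might separate the routing envelope from the payload in the BizTalk example. The function read_route is a hypothetical helper written for this illustration and is not part of any BizTalk toolkit.

```python
import xml.etree.ElementTree as ET

# The message from the slide, with straight quotes so it parses.
MESSAGE = """<bizTalk>
  <Route>
    <From locationID="206.247.76.187" locationType="IP" handle="72" process="POConf" Path=""/>
    <To locationID="83-627-54204" locationType="DUNS" handle="14" process="PO_Process" Path=""/>
  </Route>
  <body>
    <purchaseOrder xmlns="urn:schemas-toycat-com:PurchaseOrder.biz" PONumber="10-01-2118"></purchaseOrder>
  </body>
</bizTalk>"""


def read_route(message_xml):
    """Return (sender attributes, receiver attributes, payload element)."""
    root = ET.fromstring(message_xml)
    route = root.find("Route")
    sender = dict(route.find("From").attrib)
    receiver = dict(route.find("To").attrib)
    payload = root.find("body")[0]  # first child of <body> is the business document
    return sender, receiver, payload


sender, receiver, payload = read_route(MESSAGE)
print(sender["locationType"], "->", receiver["locationType"])
print(payload.tag, payload.get("PONumber"))
```

The envelope elements carry no namespace, while the purchase order sits in its own namespace, so the router can forward the payload without understanding its vocabulary.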

Page 12: Beyond Seamless Access: Meta-data in the Age of Content Integration

Simple Meta-data Interchange Model

[Diagram: an ILL request in XML/EDIFACT is exchanged among System A, System B, and System C, each with a DB and an XML/ASN.1 server. Interchange is handled at three levels: protocol, syntax, and encoding.
• Protocol conversions: Direct Transfer to SMTP; SMTP to Direct Transfer
• Syntax and encoding conversions: XML/EDIFACT to ASN.1/BER; ASN.1/BER to XML/BER; XML/BER to XML/EDIFACT
• Template mapping between Sys A and Sys B, then Sys B and Sys C; schema and map from Sys C to Sys A]

Page 13: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (3) Linking Structures [1/6]

[Diagram: “My Element” with an attribute list (“Attlist my thing”) holding an XLink root URI and an XPointer address, which point to the root URI and address of a remote schema (DTD)]

Leveraging XML syntax:

Link structures link an XML name tag to an external standard reference item, and allow context and non-context queries at the element and attribute level.

Notes:

XLink specification <http://www.w3.org/TR/xlink>
XPointer specification <http://www.w3.org/TR/xptr>

Page 14: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (3) Linking Structures [2/6]

[Diagram: Application --> Request linkInfo --> the API to retrieve link information from the linkbase --> Linkbase]

Leveraging application:

The link structures in which linkInfo participates are returned to the application, which can re-assemble them on the fly for different purposes.

Page 15: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (3) Linking Structures [3/6]

Leveraging resources and merging links

Link structures in which links are merged into the original doc to form a composite document.

[Diagram: Original Doc + linkbase --> API merges the links --> Composite Doc]
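
The merge step can be sketched as follows, assuming a linkbase that maps element ids to link targets and is stored apart from the document. The sample article, the example.org URLs, the <link> element, and the function merge_links are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical original document; it contains no links of its own.
ORIGINAL = """<article>
  <cite id="c1">Schatz 1998</cite>
  <cite id="c2">Evans 1999</cite>
</article>"""

# Hypothetical linkbase: element id -> link targets, kept outside the document.
LINKBASE = {
    "c1": ["http://example.org/fulltext/schatz1998"],
    "c2": [
        "http://example.org/fulltext/evans1999",
        "http://example.org/holdings/evans1999",
    ],
}


def merge_links(doc_xml, linkbase):
    """Return a composite document; the original text is left untouched."""
    composite = ET.fromstring(doc_xml)
    # Snapshot the elements first so newly added <link> children are not revisited.
    for element in list(composite.iter()):
        for href in linkbase.get(element.get("id"), []):
            ET.SubElement(element, "link", href=href)
    return composite


composite = merge_links(ORIGINAL, LINKBASE)
print(ET.tostring(composite, encoding="unicode"))
```

Because the links live in the linkbase, swapping in a different linkbase yields a different composite document from the same original.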

Page 16: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (3) Linking Structures [4/6]

Topic Map:

“To qualify the content and/or data contained in information objects as topics to enable navigational tools such as indexes, cross-references, citation systems, or glossaries.

To link topics together in such a way as to enable navigation between them

To filter an information set to create views adapted to specific users or purposes. For example, such filtering can aid in the management of multilingual documents, management of access modes depending on security criteria, delivery of partial views depending on user profiles and/or knowledge domains, etc.

To structure unstructured information objects, or to facilitate the creation of topic-oriented user interfaces that provide the effect of merging unstructured information bases with structured ones.”

Note: Quote from Topic Map web site: <http://www.ornl.gov/sgml/sc34/document/0058.htm>

Page 17: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (3) Linking Structures [5/6]

Leverage Topic Maps

[Diagram labels: structured docs and unstructured docs (DB); link, cluster, adaptive categories; attach categories; topic map; query; match query; category map; filter; profiles (knowledge domains, languages, access rights, delivery views/devices); result set with category map; search/navigate. Two steps are numbered 1 and 2.]

Page 18: Beyond Seamless Access: Meta-data in the Age of Content Integration

Definitions (3) Linking Structures [6/6]

Topic association - Example

<topic id="n001" types="city">
  <topicname><basename>New York City</basename></topicname>
  <mention>adr1 adr2 adr3</mention>
</topic>

<topic id="c98991" types="monument">
  <topicname><basename>Brooklyn Bridge</basename></topicname>
  <mention>adr34 adr3462 adr9832</mention>
</topic>

<assoc type="sightseeing" scope="civil-engineering">
  <when-in>n001</when-in>
  <visit>c98991</visit>
</assoc>

<topic id="city" types="topictypes">
<topic id="monument" types="topictypes">
<topic id="civil-engineering">
<topic id="topictypes">

Note: Example from Steve R. Newcomb’s tutorial, “Metadata, Schemas, and Linking Structures,” XML World conference, Ottawa, Sept. 13, 1999, updated 5/30/2000.
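
A sketch of how an application might resolve the association in this example back to readable topic names. The <topicmap> wrapper and the function read_associations are assumptions added so the fragment parses as a single document; the element names follow the slide.

```python
import xml.etree.ElementTree as ET

# The <topicmap> wrapper is an assumption; the topics and association are from the slide.
TOPIC_MAP = """<topicmap>
  <topic id="n001" types="city">
    <topicname><basename>New York City</basename></topicname>
    <mention>adr1 adr2 adr3</mention>
  </topic>
  <topic id="c98991" types="monument">
    <topicname><basename>Brooklyn Bridge</basename></topicname>
    <mention>adr34 adr3462 adr9832</mention>
  </topic>
  <assoc type="sightseeing" scope="civil-engineering">
    <when-in>n001</when-in>
    <visit>c98991</visit>
  </assoc>
</topicmap>"""


def read_associations(topic_map_xml):
    """Return (type, scope, {role: topic name}) for each association."""
    root = ET.fromstring(topic_map_xml)
    names = {
        topic.get("id"): topic.findtext("topicname/basename")
        for topic in root.findall("topic")
    }
    result = []
    for assoc in root.findall("assoc"):
        # Each child of <assoc> is a role whose text is a topic id.
        roles = {role.tag: names.get(role.text, role.text) for role in assoc}
        result.append((assoc.get("type"), assoc.get("scope"), roles))
    return result


associations = read_associations(TOPIC_MAP)
print(associations)
```

The <mention> addresses are left alone here; in a fuller application they would locate the occurrences of each topic in the underlying documents.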

Page 19: Beyond Seamless Access: Meta-data in the Age of Content Integration

Why content integration and analysis?

Assumptions about information search and retrieval

Information retrieval is only the 1st step for information management.

The next step is information analysis and decision support. Information analysis cross-correlates information from multiple, diverse data sources on the net to solve specific problems. Decision support detects, analyzes, and alerts on topics, trends, and events based on the correlated information.

Notes:

Schatz, Bruce R. 1998. “Information Analysis in the Net: The Interspace of the Twenty-First Century.” Visualizing Subject Access for 21st Century Information Resources, edited by Pauline Cochrane and Eric E. Johnson. Univ. of Illinois at Urbana-Champaign.

Evans, David A. 1999. “Beyond Information Retrieval Workshop.” 4th Search Engine Conference, April 9, 1999, Boston, MA.

Page 20: Beyond Seamless Access: Meta-data in the Age of Content Integration

Meta-data applications for content integration and analysis (1 of 3)

What does this have to do with products for the library world?

Today:

– Full-text linking

• ILL/DocDelivery

• ILS linking for holdings

• Publishers & Authors’ Web sites

• Linking services

– Reference linking services provided by CrossRef, SFX, LANL

• Patent data

Tomorrow:

– Users can link directly to any content published by a specific organization simply by highlighting a phrase, sentence, paragraph, or document appearing in any browser, word-processing package, email program, or other application

Page 21: Beyond Seamless Access: Meta-data in the Age of Content Integration

Meta-data applications for content integration and analysis (2 of 3)

– Interwoven threads for subjects, journal titles, authors, collections

– No document boundary, but an information space where a deeper understanding of knowledge within and across domains is facilitated for specific problem solving and decision support

Subjects
• UMLS
• WordNet
• LCSH
• Lexicons
• Dictionaries

Journal Titles
• Ulrich’s Serials Directory
• LC Serials
• Gale Directory

Authors
• Who’s Who
• Wilson Bibliography
• Gale Contemporary Authors
• Authority files from LC
• Community of Science

[Diagram labels: linkbase; article collections; book collections; journal collections; other media]

Page 22: Beyond Seamless Access: Meta-data in the Age of Content Integration

Meta-data Applications for Content Integration and Analysis (3 of 3) Future -- decision support and problem solving

[Diagram labels:
• Meta-data standardization; authority control
• Book directory, collection directory, journal directory, author directory
• Bi-directional linking
• Collections; library holdings; ILL/document delivery; reference linking
• Site-map knowledge-base
• Websites: reviews/annotations, publisher sites, author pages, email/mailing lists, chat rooms, community pages]

Page 23: Beyond Seamless Access: Meta-data in the Age of Content Integration

How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (1 of 2)

XML is nothing but data interchange. It is the application that makes the data reusable, and thus adds functionality and intelligence to it:

In the beginning --> Editing

Generation X --> Look and feel

Intelligence (SGML/XML) --> Semantics:
• Levels of fragmentation
• Schema recognition, namespace handling
• Linking registration and management

--> Viewing/personalized delivery

--> Interactive services, e.g. B2B

--> Software applications, e.g. re-purposing, concurrent editing

Page 24: Beyond Seamless Access: Meta-data in the Age of Content Integration

How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (2 of 2)

XML enables text mining which has become

– increasingly fine grained, subjective, and personal via

• extracting information

• counting by type (quantifying)

• categorizing/filtering

• discovering trends

• capturing critical details

• assessing trends

Note:

Evans, David A. 2000. “Text Mining Workshop.” Fifth Search Engines Conference, Boston, MA.
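
Two of the operations listed above, extracting information and counting by type, become simple once text carries markup. The sketch below assumes a hypothetical tagged article whose element names (org, person, place) and contents are invented for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical tagged text; element names and contents are invented for this sketch.
ARTICLE = """<article>
  <p><org>Acme Library</org> met with <org>Example Press</org> in <place>Hartford</place>.</p>
  <p><person>Pat Doe</person> led a workshop in <place>Boston</place>.</p>
</article>"""


def count_by_type(xml_text, types):
    """Count how many elements of each listed type occur in the text."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in types)


def extract(xml_text, wanted):
    """Return the text of every element of one type, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(wanted)]


counts = count_by_type(ARTICLE, {"org", "person", "place"})
places = extract(ARTICLE, "place")
print(counts)
print(places)
```

Filtering and trend detection would build on the same counts, for example by comparing them across collections or over time.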

Page 25: Beyond Seamless Access: Meta-data in the Age of Content Integration

Role of librarians, and information mediators in the wave of content integration

Every aspect of librarianship is needed. It is a matter of which parts you would like to participate in.
