School of Information Studies Syracuse University Linking Entities in Scientific Metadata Jian Qin, Miao Chen, Xiaozhong Liu, & Andrea Wiggins School of Information Studies, Syracuse University

Linking Scientific Metadata (presented at DC2010)


DESCRIPTION

Linked entity data in metadata records builds a foundation for the Semantic Web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in the RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through a transformation program written in Extensible Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment suggests that incorporating linked entity data into metadata records is methodologically feasible. The paper also argues for the need to change the scientific as well as the general metadata paradigm.


Page 1: Linking Scientific Metadata (presented at DC2010)

School of Information Studies, Syracuse University

Linking Entities in Scientific Metadata

Jian Qin, Miao Chen, Xiaozhong Liu, & Andrea Wiggins

School of Information Studies, Syracuse University

Page 2: Linking Scientific Metadata (presented at DC2010)

The context: Islands of research information

04/10/2023

Linking Entities in Scientific Metadata -- DC2010 2

Data

Projects

Publications

Research interest

Researchers

Page 3: Linking Scientific Metadata (presented at DC2010)

Unlinked entities

Same entity!


Page 4: Linking Scientific Metadata (presented at DC2010)

Duplication of entity data entry


Seamless Daily Precipitation for the Conterminous United States

Metadata:
• Identification_Information
• Data_Quality_Information
• Spatial_Data_Organization_Information
• Spatial_Reference_Information
• Entity_and_Attribute_Information
• Distribution_Information
• Metadata_Reference_Information

Page 5: Linking Scientific Metadata (presented at DC2010)

What’s lacking in scientific metadata?
• Standards focus on describing datasets, not entities
• No mechanism is provided for linking entities
  – It is considered an implementation issue
• Islands of entities lead to duplication of data entry for the same entity
  – Increased costs and time in creating metadata
  – Effects on resource discovery and browsing


Page 6: Linking Scientific Metadata (presented at DC2010)

Defining the research problem


How can we build an interlinked network of entities for a scientific domain?

How can we associate the linked entities with their corresponding metadata records?

Page 7: Linking Scientific Metadata (presented at DC2010)

Linked Data: A solution


Relational database containing entities and relationships
Problem: Not related to metadata records

Metadata records in XML format
Problem: Lack relationships between entities

Solution: Convert to RDF triples (Resource, PropertyType, Value) and embed RDF triples into the metadata records
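As a rough illustration of the Resource / PropertyType / Value shape named on this slide, the following Python sketch stores triples as plain tuples and follows the values that are themselves resource URIs. The base URI `http://example.org/hbes/`, the property names, and the `dataset/1` and `project/1` identifiers are hypothetical placeholders, not values from the HBES data.

```python
# Minimal sketch of RDF-style triples as (resource, property, value) tuples.
# BASE, the vocab/ property names, dataset/1 and project/1 are hypothetical.
BASE = "http://example.org/hbes/"

triples = [
    (BASE + "people/tsiccama", BASE + "vocab/name", "Thomas G. Siccama"),
    (BASE + "people/tsiccama", BASE + "vocab/contributesTo", BASE + "dataset/1"),
    (BASE + "dataset/1", BASE + "vocab/partOf", BASE + "project/1"),
]


def describe(resource, triples):
    """Return every (property, value) pair stated about a resource."""
    return [(p, v) for s, p, v in triples if s == resource]


def neighbors(resource, triples):
    """Return values that are themselves described resources (the links)."""
    subjects = {s for s, _, _ in triples}
    return [v for _, v in describe(resource, triples) if v in subjects]
```

Because a value can be another resource's URI, the same list that describes a person also connects that person to a dataset, which is the property the unlinked relational and XML sources lack on their own.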

Page 8: Linking Scientific Metadata (presented at DC2010)

Linked data: How it works


Page 9: Linking Scientific Metadata (presented at DC2010)

Linked data is


“…a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.”

--Wikipedia, http://en.wikipedia.org/wiki/Linked_Data

Page 10: Linking Scientific Metadata (presented at DC2010)

A case study


Dataset collection search interface at HBES (http://hubbardbrook.org/data/dataset_search.php)

Page 11: Linking Scientific Metadata (presented at DC2010)

Hubbard Brook Ecosystem Study (HBES)
• Long-term ecological research site since the 1960s
• 3,160 hectare reserve
• Six principal organizations & 10 other participants:
  – USDA Forest Service
  – Cornell
  – Dartmouth
  – Syracuse
  – Yale
  – the Institute of Ecosystem Studies (IES)
  – the U.S. Geological Survey
• Over 300 datasets and 2,000 publications available


Page 12: Linking Scientific Metadata (presented at DC2010)

HBES Data Collection
• Focused on entities on the HBES site:
  – Projects
  – Persons
  – Publications
  – Subject interests
  – Datasets
  – Events
• Verified Person and Project information against the Long-Term Ecological Research (LTER) directory if necessary
• Stored the entities in a relational database
• Metadata records in EML format


Page 13: Linking Scientific Metadata (presented at DC2010)

Ecological Metadata Language (EML) Structure and Modules


Page 14: Linking Scientific Metadata (presented at DC2010)

Conditions required for interlinking

• URI-identified entities
• Relationships between these entities
• Relationships between the entities and metadata records


Page 15: Linking Scientific Metadata (presented at DC2010)

Experiment stage 1: Data prep
• Two sets of data:
  – Entities and their relationships
    • Person, subject interest, project, dataset, and paper
    • Many-to-many relations between the entities
  – Sample EML records in XML format
    • Downloaded from HBES website
    • Entity URIs added to the corresponding XML files to be used as semantic identifiers and hyperlinks to the entities
    • 126 XML files in total
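To make the "many-to-many relations between the entities" concrete, here is a minimal sketch of how such a relation is commonly stored in a relational database, using Python's built-in `sqlite3`. The slides do not show the actual schema, so the table names (`person`, `dataset`, `person_dataset`), the column names, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical many-to-many schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person  (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE dataset (id TEXT PRIMARY KEY, title TEXT);
        -- Junction table: one row per (person, dataset) association.
        CREATE TABLE person_dataset (
            person_id  TEXT REFERENCES person(id),
            dataset_id TEXT REFERENCES dataset(id),
            PRIMARY KEY (person_id, dataset_id)
        );
    """)
    # Placeholder rows, not HBES data.
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [("p1", "Person One"), ("p2", "Person Two")])
    conn.executemany("INSERT INTO dataset VALUES (?, ?)",
                     [("d1", "Dataset One"), ("d2", "Dataset Two")])
    conn.executemany("INSERT INTO person_dataset VALUES (?, ?)",
                     [("p1", "d1"), ("p1", "d2"), ("p2", "d1")])
    return conn


def datasets_for(conn, person_id):
    """List dataset titles linked to one person through the junction table."""
    rows = conn.execute(
        "SELECT d.title FROM dataset d "
        "JOIN person_dataset pd ON pd.dataset_id = d.id "
        "WHERE pd.person_id = ? ORDER BY d.id",
        (person_id,),
    )
    return [title for (title,) in rows]
```

A junction table of this kind lets one person relate to many datasets and one dataset to many persons; the same pattern would apply to the other entity pairs listed above.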


Page 16: Linking Scientific Metadata (presented at DC2010)

Entity relationships


Page 17: Linking Scientific Metadata (presented at DC2010)

Experiment stage 2: Converting to RDF
• Toolkit: D2R, a service for converting relational databases into RDF triples and publishing them on the web
  – Turn each table into a class
  – Turn each column into a class property
  – Make each value in a column an instance
  – Assign a URI to each class, property, and instance
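The four mapping rules above can be illustrated without D2R itself. The following Python sketch is not D2R's implementation or API; it only mimics the rules on rows held in memory. The base URI `http://example.org/hbes/`, the `vocab/` and `resource/` path segments, and the function name `table_to_triples` are hypothetical, and the sketch treats each row as one instance, which is only one possible reading of the third rule.

```python
# Standard RDF property used to state that an instance belongs to a class.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/hbes/"  # hypothetical base URI


def table_to_triples(table, key_column, rows):
    """Map rows of one table to (subject, predicate, object) triples.

    table      -> class URI
    column     -> property URI
    row        -> instance URI built from the key column (assumption)
    cell value -> literal object of the property
    """
    class_uri = f"{BASE}vocab/{table}"
    triples = []
    for row in rows:
        instance_uri = f"{BASE}resource/{table}/{row[key_column]}"
        triples.append((instance_uri, RDF_TYPE, class_uri))
        for column, value in row.items():
            # The key is already encoded in the URI; skip it and empty cells.
            if column == key_column or value is None:
                continue
            property_uri = f"{BASE}vocab/{table}_{column}"
            triples.append((instance_uri, property_uri, value))
    return triples
```

For example, `table_to_triples("people", "id", [{"id": "tsiccama", "surName": "Siccama"}])` yields one type triple and one `surName` triple for the same instance URI.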


Page 18: Linking Scientific Metadata (presented at DC2010)


Page 19: Linking Scientific Metadata (presented at DC2010)

Experiment stage 3: Incorporating URIs into XML records

• Add the URIs generated from the D2R software to their corresponding entities in EML records by using an XSL program

• Transform the EML records with inserted URIs into the HTML format for display in a browser
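The experiment performed the URI insertion with an XSL program, which is not reproduced in the slides. As a stand-in, the following Python sketch shows the same idea with the standard library's `xml.etree.ElementTree`: it appends a `personURI` child to each `individualName` element. Matching persons by `surName` and the `uri_by_surname` lookup table are assumptions made for illustration only; the slides do not say how entities were matched.

```python
import xml.etree.ElementTree as ET


def add_person_uris(eml_xml, uri_by_surname):
    """Return the XML with a <personURI> child added to matching names.

    uri_by_surname is a hypothetical lookup from surname to URI; a real
    pipeline would need a more reliable key than the surname alone.
    """
    root = ET.fromstring(eml_xml)
    for name in root.iter("individualName"):
        surname = name.findtext("surName", default="").strip()
        uri = uri_by_surname.get(surname)
        # Skip unknown persons and names that already carry a URI.
        if uri and name.find("personURI") is None:
            ET.SubElement(name, "personURI").text = uri
    return ET.tostring(root, encoding="unicode")
```

Running the function twice on the same record leaves a single `personURI` element, so the step can be repeated safely over a batch of files.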


Page 20: Linking Scientific Metadata (presented at DC2010)

Example of name with URI inserted


Original EML record without URI:

<individualName>
  <givenName>Thomas G</givenName>
  <surName>Siccama</surName>
</individualName>

URI added to individual name element:

<individualName>
  <givenName>Thomas G.</givenName>
  <surName>Siccama</surName>
  <personURI>page/people/tsiccama</personURI>
</individualName>
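Once a name element carries a `personURI`, turning it into a hyperlink for browser display is a small step. The sketch below does this in Python rather than XSL, so it is an illustration and not the transformation used in the experiment. The `site_root` value `http://example.org/` is a hypothetical prefix for the relative URI, and the function name is made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape


def name_to_html(individual_name_xml, site_root="http://example.org/"):
    """Render an <individualName> element as HTML, linked if it has a URI."""
    name = ET.fromstring(individual_name_xml)
    given = name.findtext("givenName", default="").strip()
    sur = name.findtext("surName", default="").strip()
    label = escape(" ".join(part for part in (given, sur) if part))
    uri = name.findtext("personURI", default="").strip()
    if not uri:
        # No URI available: fall back to plain text, as in the original display.
        return label
    href = escape(site_root + uri, quote=True)
    return f'<a href="{href}">{label}</a>'
```

Names without a `personURI` fall back to plain text, so records that have not been linked yet still display.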

Page 21: Linking Scientific Metadata (presented at DC2010)


[Figures: original display of EML record; RDF-enabled display of EML record]

Page 22: Linking Scientific Metadata (presented at DC2010)

Discussion
• Methodology for transforming islands of entities into linked scientific metadata
• A larger scale data set needed to test its scalability
• Potentials:
  – Reducing duplicate entity data entry
  – Applicable to legacy metadata generated using older data model
  – Linking semantic data already published on the web
  – Facilitating data/metadata visualization??


Page 23: Linking Scientific Metadata (presented at DC2010)

DEMO

http://sdl.syr.edu/eml/
