Life Science Database Cross Search and Metadata Maori Ito @ NIBIO

DESCRIPTION

Life science databases are sometimes difficult to understand because descriptive information is missing. I'd like to add metadata to databases to improve search results.

Page 1: Life Science Database Cross Search and Metadata

Life Science Database Cross Search and Metadata

Maori Ito @ NIBIO

Page 2: Life Science Database Cross Search and Metadata

Database integration: a collaboration among 4 ministries with NBDC

• Database Catalog

• Life Science Database Cross Search

• Life Science Database Archive

• Database Reconstructive Integration

Page 3: Life Science Database Cross Search and Metadata

Why Cross Search?

• Easy to use

• Familiar to users

• Appropriate for comparing various kinds of databases

Page 4: Life Science Database Cross Search and Metadata

Sagace

• Search for Biomedical Data & Resources in Japan

Page 5: Life Science Database Cross Search and Metadata

Skeptical Reactions to Search Results…

• Useless…

• Slow…

• What is the advantage?

Page 6: Life Science Database Cross Search and Metadata

What is the most important thing in cross search?

Page 7: Life Science Database Cross Search and Metadata

Simple Answers

• Speed and Accuracy

Page 8: Life Science Database Cross Search and Metadata

Mechanism of Search Engine

1. Crawling

2. Indexing

3. Query Processing

4. Scoring
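The four stages above can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not how Sagace or Hyper Estraier works: the `PAGES` dictionary stands in for crawled pages, and the function names and the plain term-frequency scoring are hypothetical choices made for brevity.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for crawled pages (URL -> text); a real crawler would fetch these.
PAGES = {
    "db1/entry1": "Mouse epithelial adenoma DNA sequence",
    "db1/entry2": "Human protein structure",
    "db2/entry9": "Mouse protein expression in mouse liver",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Indexing: map each term to the pages containing it, with term counts."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] += 1
    return index

def search(index, query):
    """Query processing and scoring: sum term frequencies per page, highest first."""
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Sort by descending score, breaking ties by URL for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = build_index(PAGES)
results = search(index, "mouse protein")
```

Here `results` ranks `db2/entry9` first because it matches both query terms; a production engine would replace the raw counts with a more robust scoring function.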

Page 9: Life Science Database Cross Search and Metadata

Crawling

• Crawl databases and pages with a program

Program

Page 10: Life Science Database Cross Search and Metadata

Indexing

• Split the data into chunks of a convenient size and store them on our own server

External Data

Internal Server
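As a rough sketch of the indexing step, external data can be cut into fixed-size pieces before being stored locally. The function name `chunk_words`, the chunk size of four words, and the `local_store` dictionary (standing in for the internal server) are all assumptions for illustration.

```python
def chunk_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Hypothetical external record; the dict below stands in for the internal server's store.
external_record = "Bacillus subtilis structure deposited in 2008 and updated in 2012"
local_store = {}
for number, chunk in enumerate(chunk_words(external_record, size=4)):
    # Key each chunk by (record id, chunk number) so it can be traced back to its source.
    local_store[("record-1", number)] = chunk
```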

Page 11: Life Science Database Cross Search and Metadata

Query Processing and Scoring

Page 12: Life Science Database Cross Search and Metadata

NIBIO

MEDALS

JCGGDB

NBDC / DBCLS

AgriTogo

Collaborate by using P2P architecture

Under consideration

In case of Hyper Estraier (Search System)

Page 13: Life Science Database Cross Search and Metadata

Back to the simple answers for improvement

• Speed (thanks to Johan-san, Mizuguchi-san and many collaborators)

1. Relax limits on access of DBCLS

(Use a little ingenuity with CSS and images)

• Accuracy

NIBIO

NBDC / DBCLS

Page 14: Life Science Database Cross Search and Metadata

How to improve accuracy?

• What does accuracy mean for a life science database cross search?

• What does accuracy mean for a life science specialist?

Page 15: Life Science Database Cross Search and Metadata

• In general, developers emphasize search algorithms and scoring.

• However, general-purpose results and methods for cross search may not be suitable for life science specialists.

• Data (index files) from life science databases are sometimes difficult to understand immediately.

• It's hard to write a separate crawler program for each database and maintain it.

• (We have no extra …. to make a proper search page like Entrez et al….)

Page 16: Life Science Database Cross Search and Metadata

To Improve Accuracy

• Manually select databases

• Assign weights to crawled databases to improve the ranking
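One simple way to apply such weights is to multiply each hit's raw score by a weight for its source database. The sketch below assumes this multiplicative scheme; the weight values, database names, and the `rerank` function are hypothetical and are not taken from the actual system.

```python
# Hypothetical per-database weights chosen by curators; unlisted databases default to 1.0.
DATABASE_WEIGHTS = {"curated_db": 2.0, "general_db": 1.0, "noisy_db": 0.5}

def rerank(hits, weights, default=1.0):
    """Multiply each hit's raw score by its database weight and sort descending."""
    weighted = [
        (entry, database, raw_score * weights.get(database, default))
        for entry, database, raw_score in hits
    ]
    return sorted(weighted, key=lambda hit: hit[2], reverse=True)

# (entry, database, raw score) tuples as a search engine might return them.
hits = [("A", "noisy_db", 3.0), ("B", "curated_db", 1.0), ("C", "general_db", 1.8)]
ranked = rerank(hits, DATABASE_WEIGHTS)
```

With these assumed weights, entry B from the curated database moves ahead of entry A even though A had the highest raw score.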

Page 17: Life Science Database Cross Search and Metadata

Metadata!

• One way to solve these problems

Difficult to understand data immediately

Page 18: Life Science Database Cross Search and Metadata

If metadata are added to data…

Disease: Epithelial adenoma
Species: Mouse
Keywords: DNA sequence
Last Modified: 2013-01-19

Metadata

Data

Page 19: Life Science Database Cross Search and Metadata

Easy to understand for users

• It can be a guide to improve user experience.

Image

Page 20: Life Science Database Cross Search and Metadata

Easy to understand for crawlers

Disease: Epithelial adenoma
Species: Mouse
Keywords: DNA sequence
Last Modified: 2013-01-19

Metadata
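To illustrate why labeled fields are easier for a crawler than free text, the sketch below parses the example metadata into a dictionary and filters on it. It assumes one `Key:Value` pair per line; the helper names `parse_metadata` and `matches` are hypothetical.

```python
def parse_metadata(block):
    """Parse 'Key:Value' lines into a dict, splitting on the first colon only."""
    metadata = {}
    for line in block.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata

# The example record from the slide, assumed to be one field per line.
record = parse_metadata(
    "Disease:Epithelial adenoma\n"
    "Species:Mouse\n"
    "Keywords:DNA sequence\n"
    "Last Modified:2013-01-19"
)

def matches(metadata, **required):
    """Return True when every required field equals the record's value."""
    return all(metadata.get(key) == value for key, value in required.items())
```

A crawler or search front end could then call `matches(record, Species="Mouse")` to restrict results by species without guessing from the page text.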

Page 21: Life Science Database Cross Search and Metadata

How to use it?

• Mark up data with microdata, in the same way as tags

Last Modified

Title

Image

ID

http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en

Page 22: Life Science Database Cross Search and Metadata

• Google, Yahoo! and Bing decided to use microdata to make their search results more valuable.

• Some vocabularies have already been applied to search results.

• E.g.

Is it a practical suggestion?

Page 23: Life Science Database Cross Search and Metadata

Schema.org

• Provides a collection of schemas (HTML tags)

• Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. (quoted from schema.org)

• We proposed "schema.org" extensions for "BiologicalDatabaseEntry" and "BiologicalDatabase".

• Schema.org proposals : http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals

Page 24: Life Science Database Cross Search and Metadata

Properties for BiologicalDatabaseEntry

entryID additionalType dateCreated

isEntryof description dateModified

taxon image keywords

seeAlso url provider

reference alternativeHeadline

breadcrumb

name inLanguage

Page 26: Life Science Database Cross Search and Metadata

How to mark up?

<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID: <span itemprop="entryID">1556</span>
  Species: <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>

Declaration

Specify the property and mark it up with a normal tag
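A crawler can read such markup back with very little code. The sketch below uses Python's standard `html.parser` to collect `itemprop` values from the slide's example. It is a simplification and not a full microdata parser: it assumes every element has a closing tag (no void elements such as `<img>`), and it flattens the nested taxon item so that its `name` appears at the top level. The class name `ItempropExtractor` is hypothetical.

```python
from html.parser import HTMLParser

class ItempropExtractor(HTMLParser):
    """Collect the text content of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._stack = []  # one entry per open element: an itemprop name or None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Nested items (itemscope) are containers, so their own text is not captured.
        is_leaf = "itemprop" in attributes and "itemscope" not in attributes
        self._stack.append(attributes["itemprop"] if is_leaf else None)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        name = self._stack[-1] if self._stack else None
        if name and data.strip():
            self.properties[name] = data.strip()

# The markup from the slide, with straight quotes.
SAMPLE = """
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID: <span itemprop="entryID">1556</span>
  Species: <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
"""

extractor = ItempropExtractor()
extractor.feed(SAMPLE)
entry = extractor.properties
```

After parsing, `entry` holds the entry ID, organism name, and both dates as plain strings that an indexer can store alongside the page text.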

Page 27: Life Science Database Cross Search and Metadata

And then

• Crawl these microdata

• Reflect them in search results

At present

Within the fiscal year (preparing to reflect)

Image

Page 28: Life Science Database Cross Search and Metadata

Ask for your help

• If this approach proves effective, there may be a chance for it to be reflected in major search engines.

• Please markup your own site or database and give me feedback.

• If you have any suggestions or comments, please let me know.

Page 29: Life Science Database Cross Search and Metadata

Future Perspective

• Keep focusing on accuracy

• Microdata

– Discuss with many scientists and finalize the proposal for the schema.org extension

– Boost numbers of databases

– Make support tools to mark up microdata

• Add appropriate data from high-quality databases

Page 30: Life Science Database Cross Search and Metadata

Thank you for listening!