Life science databases are sometimes difficult to understand because they lack descriptive information. I'd like to add metadata to databases and improve search results.
Life Science Database Cross Search and Metadata
Maori Ito @ NIBIO
Database integration collaboration among 4 ministries with NBDC
• Database Catalog
• Life Science Database Cross Search
• Life Science Database Archive
• Database Reconstructive Integration
Why Cross Search?
• Easy to use
• Familiar to users
• Appropriate for comparing various kinds of databases
Sagace
• Search for Biomedical Data & Resources in Japan
Skeptical Reputation of Search Results…
• Useless…
• Slow….
• What is the advantage?
What is the most important thing in cross search?
Simple Answers
• Speed and Accuracy
Mechanism of Search Engine
1. Crawling
2. Indexing
3. Query Processing
4. Scoring
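The four stages above can be sketched in a few lines. This is only a minimal illustration, not how Sagace or Hyper Estraier works internally: the `PAGES` dictionary is a hypothetical stand-in for crawled content, and the scoring is a plain sum of term counts chosen for brevity.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for crawled content: URL -> page text.
# A real crawler would fetch these pages over HTTP.
PAGES = {
    "db-a/entry/1": "Mouse epithelial adenoma DNA sequence",
    "db-b/entry/7": "Bacillus subtilis protein structure",
    "db-c/entry/3": "Mouse DNA sequence and mouse genome annotation",
}

def crawl(pages):
    """1. Crawling: yield (url, text) pairs."""
    yield from pages.items()

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """2. Indexing: build an inverted index, term -> {url: term count}."""
    index = defaultdict(Counter)
    for url, text in documents:
        for term in tokenize(text):
            index[term][url] += 1
    return index

def search(index, query):
    """3. Query processing and 4. scoring (sum of matching term counts)."""
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties broken by URL for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

index = build_index(crawl(PAGES))
results = search(index, "mouse DNA")
print(results)
```

Because the index is stored on the search server, a query never has to touch the external databases, which is where most of the speed comes from.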
Crawling
• Crawl databases and pages with a program
Indexing
• Split data into chunks of a convenient size and store them on our own server
External Data
Internal Server
Query Processing and Scoring
NIBIO
MEDALS
JCGGDB
NBDC / DBCLS
AgriTogo
Collaborate by using P2P architecture
Under Consideration
In the case of Hyper Estraier (search system)
Back to the simple answers: how to improve
• Speed (thanks to Johan-san, Mizuguchi-san and many collaborators)
1. Relax limits on access to DBCLS
(Use a little ingenuity with CSS and images)
• Accuracy
NIBIO
NBDC / DBCLS
How to improve accuracy?
• What is accuracy for life science database cross search?
• What is accuracy for life science specialists?
• In general, developers emphasize search algorithms and scoring.
• However, general results and methods for cross search may not be suitable for life science specialists.
• Data (Index files) from life science databases are sometimes difficult to understand immediately.
• It’s hard to write and maintain a separate crawler program for each database.
• (We have no extra …. to make a proper search page like Entrez et al….)
To Improve Accuracy
• Manually select databases
• Assign weights to crawled databases to improve the ranking system
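A minimal sketch of per-database weighting follows. The database names, the weight values, and the fallback weight are all hypothetical; the slides do not say how the weights are chosen or combined with the raw score, so simple multiplication is assumed here.

```python
# Hypothetical weights chosen by curators; higher means more trusted.
DATABASE_WEIGHTS = {"curated-db": 2.0, "general-db": 1.0}
DEFAULT_WEIGHT = 0.5  # assumed fallback for databases without a manual weight

def rerank(hits, weights=DATABASE_WEIGHTS, default=DEFAULT_WEIGHT):
    """Multiply each raw score by its database weight and sort descending.

    hits: list of (database, entry_id, raw_score) tuples.
    """
    weighted = [
        (db, entry_id, raw * weights.get(db, default))
        for db, entry_id, raw in hits
    ]
    return sorted(weighted, key=lambda hit: -hit[2])

hits = [
    ("general-db", "g1", 3.0),
    ("curated-db", "c1", 2.0),
    ("unknown-db", "u1", 4.0),
]
ranked = rerank(hits)
print([entry_id for _, entry_id, _ in ranked])
```

With these assumed weights, an entry from a manually selected database can outrank an entry with a higher raw score from an unweighted source.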
Metadata!
• One way to solve these problems
Data that are difficult to understand immediately
If metadata are added to the data…
Disease: Epithelial adenoma
Species: Mouse
Keywords: DNA sequence
Last Modified: 2013-01-19
Metadata
Data
Easy to understand for users
• It can be a guide to improve user experience.
Image
Easy to understand for crawlers
Disease: Epithelial adenoma
Species: Mouse
Keywords: DNA sequence
Last Modified: 2013-01-19
Metadata
How to use it?
• Mark up data with microdata, in the same way as tags
Last Modified
Title
Image
ID
http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en
• Google, Yahoo! and Bing decided to use microdata to make search results more valuable.
• Some vocabularies have already been applied to search results.
• E.g.
Is it a practical suggestion?
Schema.org
• Provides a collection of schemas (HTML tags)
• Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. (quoted from schema.org)
• We proposed “schema.org” extensions for “BiologicalDatabaseEntry” and “Biological Database”.
• Schema.org proposals : http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals
Properties for BiologicalDatabaseEntry
entryID additionalType dateCreated
isEntryof description dateModified
taxon image keywords
seeAlso url provider
reference alternativeHeadline
breadcrumb
name inLanguage
Related Links for our proposal
• WebSchemas proposal ‘Biological Databases’ for schema.org
– http://www.w3.org/wiki/WebSchemas/BioDatabases
• Discussions at BioHackathon
– https://github.com/dbcls/bh12/wiki/Schema.org-extension
• Discussions at BH12.12 (Japanese only)
– http://wiki.lifesciencedb.jp/mw/index.php/BH12.12/schema.org
How to mark up?
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID <span itemprop="entryID">1556</span>
  Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
Declaration
Specify the property and mark up with a normal tag
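For a database with many entries, markup of this kind could be generated rather than written by hand. The following is a minimal sketch, assuming each entry is available as a flat record; the function name `to_microdata` and the `labels` mapping are hypothetical, and nested items such as `taxon` are left out for brevity.

```python
from html import escape

# Proposed type taken from the slides; it is a proposal, not an adopted schema.org type.
ITEMTYPE = "http://schema.org/BiologicalDatabaseEntry"

def to_microdata(record, labels):
    """Render a flat record as microdata spans inside one itemscope div.

    record: property name -> value
    labels: property name -> human-readable label shown before the value
    """
    lines = [f'<div itemscope itemtype="{ITEMTYPE}">']
    for prop, value in record.items():
        label = escape(labels.get(prop, prop))
        lines.append(
            f'  {label}: <span itemprop="{escape(prop)}">{escape(str(value))}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)

record = {"entryID": "1556", "dateCreated": "2008-09-08", "dateModified": "2012-10-24"}
labels = {"entryID": "ID", "dateCreated": "Deposition", "dateModified": "Last update"}
markup = to_microdata(record, labels)
print(markup)
```

Values are passed through `html.escape` so that characters such as `<` in entry data cannot break the surrounding markup.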
And then
• Crawl these microdata
• Reflect them in search results
At Present
Within the fiscal year (preparation to reflect)
Image
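Crawling the microdata could look roughly like the sketch below, which reads `itemprop` values out of a page using only Python's standard `html.parser` module. It is an illustration rather than the crawler used for Sagace: the class and function names are hypothetical, it assumes well-formed nesting, and it only reads property values from element text (not from attributes such as `content` or `href`).

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag

class MicrodataParser(HTMLParser):
    """Collect itemprop values; nested itemscope elements become nested dicts."""

    def __init__(self):
        super().__init__()
        self.items = []        # top-level items found on the page
        self.scopes = []       # stack of item dicts currently open
        self.open_tags = []    # per open tag: "scope", "prop", or None
        self.text_prop = None  # property whose text is being collected
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"@type": attrs.get("itemtype")}
            if prop and self.scopes:
                self.scopes[-1][prop] = item   # nested item, e.g. taxon
            else:
                self.items.append(item)        # new top-level item
            self.scopes.append(item)
            self.open_tags.append("scope")
        elif prop and self.scopes:
            self.text_prop, self.buffer = prop, []
            self.open_tags.append("prop")
        else:
            self.open_tags.append(None)

    def handle_data(self, data):
        if self.text_prop is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags:
            return
        kind = self.open_tags.pop()
        if kind == "scope":
            self.scopes.pop()
        elif kind == "prop":
            self.scopes[-1][self.text_prop] = "".join(self.buffer).strip()
            self.text_prop = None

def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items

PAGE = """
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID <span itemprop="entryID">1556</span>
  Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
"""
entry = extract_microdata(PAGE)[0]
print(entry["entryID"], entry["taxon"]["name"], entry["dateModified"])
```

The same parser works for any site that adopts the markup, so no database-specific crawler code is needed to read fields such as the entry ID or the last-modified date.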
Ask for your help
• If this approach proves effective, there may be a chance for it to be reflected in major search engines.
• Please mark up your own site or database and give me feedback.
• If you have any suggestions or comments, please let me know.
Future Perspective
• Continue to focus on accuracy
• Microdata
– Discuss with many scientists and finalize the proposal for the schema.org extension
– Increase the number of databases
– Make support tools to mark up microdata
• Add appropriate data from high-quality databases
Thank you for listening!