Upload
naiara
View
34
Download
0
Tags:
Embed Size (px)
DESCRIPTION
U.S Geological Survey National Biological Information Infrastructure. Technical Overview: NBII Metadata Clearinghouse May 2008. Mike Frame. Topics for discussion. Metadata CH Background New Metadata CH Design & Demo Underlying Architecture. www. NBII. gov. My. NBII. gov. PORTAL. - PowerPoint PPT Presentation
Citation preview
U.S Geological Survey National Biological Information
Infrastructure
Technical Overview:
NBII Metadata Clearinghouse
May 2008
Mike Frame
Topics for discussion
Metadata CH Background New Metadata CH Design & Demo Underlying Architecture
text
Describe and Discover
www.NBII.gov
PORTAL
My.NBII.gov
Content Management Integrated/Federated SearchCollaboration Services
Database and Web Services
Model ServicesGeospatial Services
ITIS DIGR CatalogThesaurus Mapping Geoparsing CatalogGeo -
referencingDiscovery CatalogOperations
Dublin Core (plus)
UDDI / WSDL ??OGC/ISO FGDC/ISO
Distributed Applications , Databases , Websites , Tools and Models
Consume
Integrated View
DistributedServices
Resource and Service Catalogs
DistributedResources
Resource Catalog
Geospatial Services Catalog
Geospatial Dataset
Resource Clearinghouse
Database and Web Services
Catalog
Model ServicesCatalog
Services Overview
NBII Metadata Resourceshttp://metadata.nbii.gov
http://metadata.nbii.gov
Metadata Resources:FGDC Metadata Program
Tool reviews Training OpportunitiesResources for using the Standard
NBII Clearinghouse
7 Sections make up the FGDC Standard:
1. Identification Information
2. Data Quality Information
3. Spatial Data Information
4. Spatial Reference Information
5. Entity and Attribute Information
6. Data Distribution Information
7. Metadata Reference Information
Some basic metadata facts…about the FGDC Standard
NBII Metadata CH
Rational for Metadata CH Redesign
User Feedback Metadata creation Metadata management Metadata integration with data Open architecture framework Speed and Reliability Data quality Data visualization License Costs
NBII Metadata CH provides: Single portal to information contained in disparate data
management systems Free text, fielded, spatial, and temporal search
capabilities Allow individuals and database managers to distribute
their data while maintaining complete control and ownership
Leverage investment in existing information systems and research• NBII is part of the Mercury Consortium @ ORNL
NBII CH: New Functionalities
Rich Client Interface
Combined search results (status page)
Filterring search results (Facet)
Dynamic sorting of search results
Bookmark brief and full metadata pages
Based on open source technologies:
• Lucene• Solr
NBII CH New Functionalities Cont.. SOA based design
• Web services• RSS services for search results• Portlet support• Search Sharing support
Thesaurus Support Seamless data ordering/data extraction with various data
partners Seamless data visualization integration with external
visualization tools Improved User Statistics Collection
The Clearinghouse is operated for NBII by the Oak Ridge National Laboratory
Over 38,000 records
41 partners contributing metadata records
Ability to search in a variety of ways
Redesigned in 2008
The NBII Clearinghouse
NBII CH Demo NBII Clearinghouse interface:
http://mercdev3.ornl.gov/nbii3/
How does the NBII Clearinghouse work?
How does the NBII Clearinghouse work?
How does the NBII Clearinghouse work?
How does the NBII Clearinghouse work?
Metadata CH RSSWorld Data Center
http://wdc.nbii.gov
NBII Metadata ClearinghouseArchitecture
Metadata CH Architecture
CH Function of the NBII Metadata Program Operated by ORNL• NBII is 1 Organization in Mercury Consortium
Established relationship in 2001 Formerly based on “Blue Angel
Technologies” Currently based on Lucene/Solr Open
Source Technologies
3. Remote users query the index via a Web-based browser
6. Highly detailed data and documentation are downloaded directly from the contributing agency
1. Principal investigators create detailed metadata and data files using local applications or ORNL- OME 2. NBII Mercury collects metadata and key data
from contributing agencies’ servers distributed around the country and builds a centralized index
4. Metadata summaries are returned to the remote users, including links back to detailed information and data at the PIs’ server or data repository
5. Remote users select links to data of interest
Index
Users
Virtual Internet Database
P.I. Summary – John Smith Product A Container: 1; 10/12/2003 Container 2; 01/20/2002 Container 3; 07/05/2001 Product B Container 1; 03/05/1999….
P.I. NameProduct NumberProduct TitleSiteSubject AreaThematic AreaKeywordsetc.
Distributed Data Discovery and Access System
Custom Export
Program
Custom Export
Program
ExistingDatabase
ExistingDatabase
ExistingDatabase
ExistingDatabase
ExistingDatabase
ExistingDatabase
EncryptedXML
EncryptedXML
Index
Metadata exists in remote legacy databases using any
platform, OS or RDBMS
Metadata are extracted into XML files yielding standardized data objects
Harvested metadata are combined at the central site, transformed (if needed), and indexed
Users work with a single, simple, web-like interface to access all data simultaneously
Databases can be of different structures
and content
Export programs are easily written and automated
These files can be remotely harvested via the Internet
Frequent, automated harvesting and complete re-building of the index keeps the aggregate database up to date
No re-programming of existing systems
required
Business as usual for contributing
databases
EncryptedXML
EncryptedXML
Custom Export
Program
Custom Export
Program
Z39.50 or WS
Z39.50 or WS
A Virtual Aggregate Database
NBII CH Design Diagram
Solr Schema for defining the fields Index metadata
records
NBII CH Harvester
FGDC-BIO
Transformed Files MySQLMercury3_harvests_nbii
DB updater tool(custom Java)
Solr Indexer tool(custom java)
XML Beans to extract the contents
SOLR Search Server Extended Lucene Index
UI
Solr Searcher(custom Java Spring)
Web Service Web Service
RSS
Portlets Portlets
External Metadata
http, ftp, web crawl
Future Development Phase II (May 2008 to September 2008):
• Harvester engine to use open source tools (Remove COTS) (Phase I & II)• Portal integration through JSR-168 Portlet standard
• Search portlets, portlets for recent datasets, top most searched words etc..• Web service implementation (Phase I & II):
• Thesaurus support (semantic web integration support)• Gazetteer web service implementation • OGC Catalog Service (include Web Mapping/Coverage/Feature Servers in search)• Universal Description, Discovery, and Integration (UDDI) Directory Services
• Dynamic RSS support, including Geo-RSS support• ISO 19115 support• OpenSearch support• Documentation and Help (Phase I & II)• User Statistics Application modifications
Phase III (October 2008 to January 2009):• Save, Retrieve and Email user queries• Possible integration to OPeNDAP • Web Service Harvesting (OAI)• Internationalization• ????
Search technology using Lucene/SOLR
Lucene• Overview
• Who uses Lucene
Solr• Overview
• Who uses Solr
Lucene Overview
High-performance, full-featured text search engine library written entirely in Java
Mature Apache Open Source Java Project Index speed and integrity, search speed
• uses file based full text and inverted indexing
• is extremely fast with built-in caching
Can easily handle millions of documents Very active mailing list for support
Who uses Lucene
Wikipedia MediaWiki European Bioinformatics Institute Liferay Bigsearch.ca Monster Academic Archive On-line Complete list:
• http://en.wikipedia.org/wiki/Lucene
• http://wiki.apache.org/lucene-java/PoweredBy
SOLR Overview
Open source enterprise search server based on the Lucene Java search library
Apache project, sub-project of Lucene
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML and HTTP
Solr uses Lucene search library and extends it
SOLR Overview Contd..
A Real Data Schema, with Numeric Types, Date fields, Dynamic Fields
Dynamic Faceted Browsing and Filtering
Advanced, Configurable Text Analysis
Highly Configurable and User Extensible Caching
External Configuration via XML
Scalability - Efficient Replication to other Solr Search Servers
Administration Interface is available
Who uses SOLR
CNET Reviews shopper.com AOL Music netflix search.com The Digital Commonwealth mindquarry for complete list:
http://wiki.apache.org/solr/PublicServers
Mercury Instances Demo NBII Clearinghouse interface:
http://mercdev3.ornl.gov/nbii3/
ORNLDAAC interface: http://daac.ornl.gov/
LBA Mercury interface: http://mercdev3.ornl.gov/lba3/
DADDI Mercury interface: http://mercdev3.ornl.gov/daddi3/
GFIS RSS Portal interface: http://www.gfis.net/gfis/home.faces
User Statistics Report Generation Tool
Open source Harvester Re-design (Aperture)
Questions, Comments,
Mike Frame
865 576-3605
Thanks to:
Giri PalanisamySystems Architect and Team LeaderMercury Consortium [email protected]
Vivian HutchisonNBII Metadata Program [email protected]