GEOSS ADC Architecture Workshop

Clearinghouse, Catalogues, Registries

Doug Nebert

U.S. Geological Survey

ddnebert@usgs.gov

February 5, 2008

GEOSS Access Context

[Figure: GEOSS access context diagram. Elements: GEOSS Component and Service Registry; Standards and Special Arrangements Registries; Web Portals and client applications; Offerors; Community Resources; GEOSS Clearinghouse; Catalogues; Services; User. Labeled interactions, numbered 1 to 9: contribute, register, reference, references, search, get list of catalogue services, access, accesses, invoke, operate.]

GEOSS Clearinghouse

• Clearinghouse as a broker to Community Catalogues

• Searches GEOSS Service Registry to identify services that can be searched

• Community Catalogues may either be “harvested” in advance or “searched” at the time of a user query

• Searches received from GEO Web Portal, Community Portals or any other external application acting as a catalog client

• Brief or full responses are marshaled and returned to requesting client as XML
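The choice between brief and full responses can be sketched as a request parameter. This is a minimal sketch, assuming for illustration that the client talks to the Clearinghouse through an OGC CSW 2.0.2 key-value interface; the slides do not state which interface is used. The endpoint URL and the function name `build_getrecords_url` are hypothetical, and the parameter names should be checked against the CSW specification before use.

```python
from urllib.parse import urlencode


def build_getrecords_url(endpoint, keyword, detail="brief", max_records=10):
    """Build a CSW-style GetRecords URL (illustrative, not a tested client)."""
    if detail not in ("brief", "summary", "full"):
        raise ValueError("detail must be 'brief', 'summary' or 'full'")
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        # Controls how much of each record comes back in the XML response.
        "elementSetName": detail,
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{keyword}%'",
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, used only to show the call shape.
url = build_getrecords_url("https://clearinghouse.example.org/csw", "flood", detail="full")
```

A portal might request brief records for a result list and re-query with `detail="full"` once the user selects one entry.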

GEOSS Clearinghouse

Use Case: coordination of Registry and Clearinghouse

• Providers interface with the Registry through a GUI to register components and services.

• The Clearinghouse is routinely updated with selected contents of the Service Registry.

• Portals (both GEO and Community) and other clients search the Clearinghouse through a catalog service interface, i.e., not a GUI

• Searches of the Clearinghouse are accomplished via:
  – 1) metadata held in the Clearinghouse, previously harvested from remote catalogues
  – 2) distributed searches to remote catalogues at the time of the user's search
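The two search paths can be sketched together as follows. This is a minimal sketch under stated assumptions, not the behavior of any tested Clearinghouse: records are plain dictionaries with a `title` key, each remote catalogue is represented by a hypothetical callable that takes the query string and returns a list of records, and the name `search_clearinghouse` is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def search_clearinghouse(query, harvested_records, remote_catalogues, timeout=5.0):
    """Combine a local search of harvested metadata with distributed searches.

    Returns (hits, failed), where failed lists the remote catalogues that
    raised an error or did not answer within the timeout.
    """
    # Path 1: metadata previously harvested into the Clearinghouse.
    needle = query.lower()
    hits = [r for r in harvested_records if needle in r["title"].lower()]

    # Path 2: distributed searches issued at the time of the user's query.
    failed = []
    workers = max(1, len(remote_catalogues))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fn, query): name
                   for name, fn in remote_catalogues.items()}
        for future, name in futures.items():
            try:
                hits.extend(future.result(timeout=timeout))
            except Exception:
                # A slow or unreachable catalogue must not sink the whole search.
                failed.append(name)
    return hits, failed
```

Returning the list of failed catalogues alongside the hits lets a client tell the user that the result set may be incomplete.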

Use Case: coordination of Registry and Clearinghouse

• In the publishing activity, “A,” a GEOSS publisher activates an online service and documents its existence or its data sources in a catalog.

• Activity “B” details the transactions between a publisher who is registering a Component and a Service, on one side, and the Service and Standards registries, on the other.

• Activity “C” shows the GEOSS Clearinghouse discovering eligible services including catalog services in the GEOSS Service Registry and then accessing the found services directly. In some cases, the remote catalogs are set up for real-time distributed query – in others for harvesting or processing the results into a local cache.

• Activity “D” shows the expected interaction between a Web Portal, the Clearinghouse, and the Component and Service Registry.
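Activity “C” can be sketched as a filter over registry entries followed by a split by access mode. This is a minimal sketch: the `RegisteredService` fields, the `"catalogue"` type label, and the `"harvest"` / `"distributed"` mode values are assumptions made for illustration and do not reflect the actual GEOSS Service Registry schema.

```python
from dataclasses import dataclass


@dataclass
class RegisteredService:
    name: str
    service_type: str   # hypothetical label, e.g. "catalogue" or "map"
    access_mode: str    # hypothetical: "harvest" or "distributed"


def plan_catalogue_access(registry_entries):
    """Split the registered catalogue services by how they will be accessed."""
    harvest, distributed = [], []
    for service in registry_entries:
        if service.service_type != "catalogue":
            continue  # only the catalogue subset is planned here
        if service.access_mode == "harvest":
            harvest.append(service.name)
        else:
            distributed.append(service.name)
    return harvest, distributed


entries = [
    RegisteredService("Catalogue A", "catalogue", "harvest"),
    RegisteredService("Map server B", "map", "distributed"),
    RegisteredService("Catalogue C", "catalogue", "distributed"),
]
to_harvest, to_query_live = plan_catalogue_access(entries)
```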

Interaction Diagram – Clearinghouse

Interaction Diagram, continued

Clearinghouse testing

• Three implementations tested
  – GeoNetwork Clearinghouse
  – ESRI Clearinghouse
  – Compusult Clearinghouse

• Three sets of tests were performed
  – Clearinghouse to Service Registry
  – Search of Clearinghouses by GEO Web Portal candidates
  – Clearinghouse to Community Catalogues

Clearinghouse Requirements

• Assessment of GEOSS Clearinghouse candidates is based on fulfillment of the requirements contained in the CFP
  – Requirements contain slight changes vs. the CFP

• Clearinghouse candidates self-assessed against the requirements
  – Compliant except where requirements are ambiguous
  – Expectation that all registered catalog services should be made searchable through each Clearinghouse instance

Clearinghouse trade study: Distributed search vs. Harvest

• Set of evaluation criteria defined, followed by analysis of the alternatives

• Harvest alternative advantage: quick searches. Disadvantage: metadata duplication

• Distributed Search advantage: metadata is maintained closer to the source. Disadvantage: searching takes longer to complete, and there are more chances for the search not to complete.

• Recommend Harvest when possible
  – Harvest only collection metadata
  – Policy of community catalogue must be respected
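The harvest recommendation can be sketched as two checks applied before anything is copied. This is a minimal sketch: the dictionary keys `harvest_permitted`, `records`, and `level`, and the value `"collection"`, are hypothetical names chosen for illustration, not fields of any real catalogue.

```python
def harvest_catalogue(catalogue):
    """Return the records that may be copied into the Clearinghouse cache."""
    # The community catalogue's policy comes first: no permission, no harvest.
    if not catalogue.get("harvest_permitted", False):
        return []
    # Copy collection-level metadata only, never individual items.
    return [r for r in catalogue["records"] if r.get("level") == "collection"]


community_catalogue = {
    "harvest_permitted": True,
    "records": [
        {"title": "Sea surface temperature series", "level": "collection"},
        {"title": "Single scene, one day", "level": "item"},
    ],
}
cached = harvest_catalogue(community_catalogue)
```

A catalogue that returns an empty list here would remain reachable only through distributed search at query time.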

Integration Issues

• Catalogues registered with GEOSS vary widely in the standards they follow. Protocols include:
  – ISO 23950 (Z39.50) “GEO” Profile Version 2.2
    • FGDC (CSDGM Metadata)
    • ANZLIC Metadata
    • ISO 19115 Metadata
  – OGC Catalogue Service for the Web (Version 2.0.1 and 2.0.2)
    • ebRIM Profile (incl. ISO and EO Extension Packages)
    • FGDC Profile
    • ISO 19115 Profile
  – SRU/SRW OpenSearch
  – OAI-Protocol for Metadata Harvesting (OAI-PMH)
  – Dublin/Darwin Core Metadata
  – Web-accessible folder/ftp?

Who are the primary user types?

• Registries

• Clearinghouse

• Catalogues

What resource types should be registered?

• Consider service, data set, data collection (series), items as alternatives and the ability to transition from one to the other.

• Current results are too heterogeneous

What protocols can be expected?

• Let responses to CFP suggest choices

• Support test harness capability to self-test registered catalog service types

• Clearinghouse instances must expose identical service interfaces
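A self-test capability of this kind could look like the following. This is a minimal sketch, assuming each registered catalogue service is wrapped in a hypothetical callable with one identical signature (query string in, list of records out); the probe query and the report wording are illustrative.

```python
def self_test(catalogues, probe_query="water"):
    """Send one probe query to every registered catalogue and report the outcome."""
    report = {}
    for name, search in catalogues.items():
        try:
            result = search(probe_query)
        except Exception as exc:
            report[name] = f"fail: {exc}"
            continue
        if isinstance(result, list):
            report[name] = "pass"
        else:
            report[name] = "fail: response is not a list of records"
    return report
```

Because every wrapper shares the same signature, the same harness can run unchanged against any Clearinghouse instance or catalogue type.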

What metadata formats are found?

• ISO 19139 and Profiles (INSPIRE, ANZLIC, NAP)

• FGDC CSDGM

• Dublin Core

• Darwin Core

What metadata? How should it be presented?

• Need to refine the “core” metadata results that are handled and presented by the Clearinghouse as an intersection of data elements or “Summary” style record synthesized from the remote response
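The "intersection of data elements" idea can be sketched as a set operation. This is a minimal sketch, assuming the remote responses have already been mapped to dictionaries with common element names; that mapping step, and the element names used here, are hypothetical.

```python
def core_elements(records):
    """Element names that appear in every record."""
    key_sets = [set(record) for record in records]
    return set.intersection(*key_sets) if key_sets else set()


def summarize(record, core):
    """Reduce one record to the shared core elements."""
    return {name: record[name] for name in sorted(core)}


responses = [
    {"title": "Land cover 2005", "abstract": "Global land cover.", "bbox": "-180,-90,180,90"},
    {"title": "River gauges", "abstract": "Daily discharge.", "contact": "hydro desk"},
]
core = core_elements(responses)
summaries = [summarize(r, core) for r in responses]
```

An intersection keeps only what every source can supply, so the core shrinks as more heterogeneous catalogues are added; a synthesized "Summary" record with optional elements is the alternative the slide mentions.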

Specific recommendations (agreements for Clearinghouse testing and implementation)

• Performance and scalability need to be addressed, including usage expectations and the type and volume of use

• Typical use cases of query and presentation and load handling need to be included to gracefully handle numerous users and query loads
