Distributed Databases and metadata. G. Bégni, H. Makhmara - MEDIAS-France. July 18, 2004. ENVIROMIS Tomsk.


Page 1: Distributed Databases and metadata

Distributed Databases and metadata

G. Bégni, H. Makhmara - MEDIAS-France

July 18, 2004

ENVIROMIS Tomsk

Page 2: Distributed Databases and metadata

Aims of the presentation

Understanding the principles of metadata and databases.

Making the scientific community aware of the efforts expected in terms of data documentation.

Highlighting the positive impacts of such efforts.

Demonstrating the need for an easy way to access distributed databases.

Page 3: Distributed Databases and metadata

Approach

Presentation of the AMMA context and its constraints: status of the problem.

Reflection on a solution.

Abstract description of the various elements that are part of the solution.

Selection and justification of standards and techniques.

Assessment of selections.

Page 4: Distributed Databases and metadata

AMMA context

Scientific level: multi-disciplinary, multi-scale.

Technical level: multi-format, multi-volume, multi-structure, multi-location.

Cultural level: multi-language, multi-usage, multi-possibilities.

Page 5: Distributed Databases and metadata

Constraints involved

Providing the various communities with the best-suited access to data (language, medium, cost, services…).

Guaranteeing the durability of data wherever they are produced.

Ensuring the durability of services as time goes by (technological developments).

Page 6: Distributed Databases and metadata

Access services

Easy web interface for data research and location (geographical, temporal, thematic, keywords).

Transparent service to access heterogeneous distributed data (possibilities of compiling…).

Homogeneous documentation for heterogeneous data in order to optimise their exploitation.

Page 7: Distributed Databases and metadata

Data durability

Multiple and systematic back-up procedure.

Data transparency in relation to technological changes (hardware, software).

Transparent data exploitation as time goes by.

Page 8: Distributed Databases and metadata

A solution

Fully defined back-up process.

Data storage in standardised formats.

Clear data documentation for future exploitation.

Page 9: Distributed Databases and metadata

Service durability

Services should not depend on any proprietary or « exotic » software.

The quality of a service should not deteriorate as technology changes.

Page 10: Distributed Databases and metadata

A solution

Services based on standards.

Services based on open source.

Page 11: Distributed Databases and metadata

To sum up:

Standardise storage.

Standardise services.

Standardise exploitation.

However, some data formats cannot be standardised (satellite imaging).

Neither can the related services.

Page 12: Distributed Databases and metadata

Principles applied

Every item liable to be standardised should be standardised.

There should be a system gateway based on standards only.

Every item that cannot be standardised should be described in a standardised way.

Page 13: Distributed Databases and metadata

A standard for each element

Data storage: ANSI/ISO SQL, XML.

Data description: FGDC-STD-001-1998 or ISO 19115.

Service description: W3C SOAP.

Catalogue: ANSI/ISO 23950 (Z39.50).

Page 14: Distributed Databases and metadata

Data description: Metadata

Formed from a Greek root (« meta »).

What surpasses, encompasses a subject, a science. (Le Robert Dictionary)

Denoting a nature of a higher order or more fundamental kind. (Oxford Talking Dictionary)

English: metadata. French: métadonnées.

Literally speaking, metadata are data about data.

To be more precise, they are structured sets of information that describe resources.
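
To make "structured sets of information that describe resources" concrete, here is a minimal sketch of one metadata record and its XML serialisation. The class, field names, element names, and sample values are illustrative assumptions; they do not follow the FGDC or ISO 19115 element sets.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class MetadataRecord:
    """A structured description of one dataset (field names are illustrative)."""
    title: str
    abstract: str
    keywords: list = field(default_factory=list)
    # Bounding box as (west, south, east, north) in decimal degrees.
    bbox: tuple = (-180.0, -90.0, 180.0, 90.0)
    begin_date: str = ""   # ISO 8601 date, e.g. "2004-06-01"
    end_date: str = ""

    def to_xml(self) -> str:
        """Serialise the record to a small XML document (hypothetical layout)."""
        root = ET.Element("record")
        ET.SubElement(root, "title").text = self.title
        ET.SubElement(root, "abstract").text = self.abstract
        kw = ET.SubElement(root, "keywords")
        for word in self.keywords:
            ET.SubElement(kw, "keyword").text = word
        west, south, east, north = self.bbox
        ET.SubElement(root, "bbox", west=str(west), south=str(south),
                      east=str(east), north=str(north))
        ET.SubElement(root, "period", begin=self.begin_date, end=self.end_date)
        return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description, invented for the example.
record = MetadataRecord(
    title="Daily rainfall, hypothetical station network",
    abstract="Rain gauge totals, one value per station per day.",
    keywords=["rainfall", "West Africa"],
    bbox=(-20.0, 0.0, 20.0, 25.0),
    begin_date="2004-06-01",
    end_date="2004-09-30",
)
xml_text = record.to_xml()
```

The record says nothing about the values themselves; it only tells a catalogue what the dataset is, where and when it applies, and under which keywords it should be found.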

Page 15: Distributed Databases and metadata

Metadata standards

Metadata have always existed.

An effort of world-wide standardisation has been undertaken for several years.

Several (georeferenced) standards:

1. Content Standard for Digital Geospatial Metadata: FGDC-STD-001-1998.

2. ISO 19115 since the end of 2002.

FGDC is a de facto standard.

Page 16: Distributed Databases and metadata

Advantages

Homogeneous presentation.

Pooled developments.

Possibility to automate data processing.

Comparison of examples:

1. GeoConnections Portal, Canada: http://geodiscover.cgdi.ca

2. Portal on desertification monitoring (OSS/Medias/SCOT): http://geooss.oss.org.tn/geooss

Page 17: Distributed Databases and metadata
Page 18: Distributed Databases and metadata
Page 19: Distributed Databases and metadata
Page 20: Distributed Databases and metadata
Page 21: Distributed Databases and metadata
Page 22: Distributed Databases and metadata

Efforts asked from data providers

Be aware of standards.

Endeavour to describe data as completely as possible.

Use data exchange formats as simple and consistent as possible.

----------------------------------------

Data providers do not have to care about the technical or formal aspects of standards.

Database managers will provide them with easy and user-friendly tools to describe their data.

Page 23: Distributed Databases and metadata
Page 24: Distributed Databases and metadata
Page 25: Distributed Databases and metadata

AMMA INFORMATION SYSTEM ARCHITECTURE

MetaCatalog (Portal to the AMMA I.S)

Meta database (ISO 19115 and/or FGDC)

Data sources, each reached through an exchange protocol: DB AMMASAT, DB LOP, DB SOP

1. Search by criteria (user-friendly interface)

2. Query metadata

3. Retrieve metadata

4. Choose datasets and query data

5. Locate and query datasets from relevant data sources

6. Retrieve datasets
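
The six steps above can be sketched as a small program. This is only an illustration under stated assumptions: the meta database and the three data sources are replaced by in-memory dictionaries, and every identifier, keyword, and value is invented for the example.

```python
# Hypothetical in-memory stand-ins for the meta database and the data sources.
METADATA = [
    {"id": "rain-2004", "source": "DB SOP", "keywords": {"rainfall"}, "year": 2004},
    {"id": "sat-ir-2004", "source": "DB AMMASAT",
     "keywords": {"satellite", "infrared"}, "year": 2004},
    {"id": "rain-2002", "source": "DB LOP", "keywords": {"rainfall"}, "year": 2002},
]

DATA_SOURCES = {
    "DB SOP": {"rain-2004": [12.5, 0.0, 3.2]},
    "DB AMMASAT": {"sat-ir-2004": [271.3, 268.9]},
    "DB LOP": {"rain-2002": [8.1, 4.4]},
}


def search_metadata(keyword, year):
    """Steps 1-3: search by criteria, query the meta database, return records."""
    return [m for m in METADATA if keyword in m["keywords"] and m["year"] == year]


def fetch_datasets(records):
    """Steps 5-6: locate each dataset's source and retrieve the data from it."""
    return {r["id"]: DATA_SOURCES[r["source"]][r["id"]] for r in records}


hits = search_metadata("rainfall", 2004)   # steps 1-3
chosen = hits[:1]                          # step 4: the user picks datasets
data = fetch_datasets(chosen)              # steps 5-6
```

The user only ever talks to the catalogue; the `source` field in each metadata record is what lets the portal route the data query to the right database.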

Page 26: Distributed Databases and metadata

Technical diagram

[Figure; recoverable labels only]

Catalogue service (any user); Edition service (data provider)

Z39.50; YAZ, PHP, ZOOM; ZAP client

Zebra server; Zebra indexer

Other catalogues (GCMD, FGDC Clearinghouse)

Metadata creation and validation: Web forms, XML import; XML records

Page 27: Distributed Databases and metadata

Characteristics

Management of multi-standard metadata: ISO 19115, FGDC, DIF (if an XML schema is available).

Transparent to the data provider.

Transparent to the user.
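
One way a catalogue could handle several metadata standards is to look at the root element of each incoming XML record. The sketch below assumes the root element names commonly used by XML encodings of these standards (`MD_Metadata` for ISO 19115, `metadata` for FGDC, `DIF` for DIF); the mapping and function names are illustrative, not taken from the system described here.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from root element name to metadata standard.
ROOT_TO_STANDARD = {
    "MD_Metadata": "ISO 19115",
    "metadata": "FGDC",
    "DIF": "DIF",
}


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def detect_standard(xml_text):
    """Return the metadata standard suggested by the root element, or None."""
    root = ET.fromstring(xml_text)
    return ROOT_TO_STANDARD.get(local_name(root.tag))
```

A real system would go on to validate the record against the matching XML schema; root-element detection only decides which schema to try.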

Page 28: Distributed Databases and metadata

Data access services

Médias-France is developing generic data access services.

These services have to be self-describing, registered, and have well-known interfaces.

For the moment, we focus our efforts on software permitting access to geographically distant databases (distributed databases).

Page 29: Distributed Databases and metadata

Principle

Each service is registered within a directory server.

Each data source declares what data it serves.

A web portal is used by scientists to locate and request data from different sources.

Data is sent back to the user in a standardised format.
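
The registration principle above can be sketched with a toy directory. The class, method names, service names, and URLs below are hypothetical and stand in for a real directory server.

```python
class Directory:
    """Toy directory server: maps each registered service to what it serves."""

    def __init__(self):
        self._services = {}

    def register(self, name, url, variables):
        """A data source registers itself and declares the variables it serves."""
        self._services[name] = {"url": url, "variables": set(variables)}

    def locate(self, variable):
        """Return the names of services declaring the variable, sorted."""
        return sorted(n for n, s in self._services.items()
                      if variable in s["variables"])

    def url_of(self, name):
        """Return the access URL a service registered with."""
        return self._services[name]["url"]


directory = Directory()
directory.register("station-db", "http://example.org/stations",
                   ["rainfall", "temperature"])
directory.register("satellite-db", "http://example.org/satellite", ["infrared"])
directory.register("model-db", "http://example.org/model", ["rainfall"])

# The portal asks the directory which sources can answer a request.
sources = directory.locate("rainfall")
```

Because sources declare their own content, the portal needs no hard-coded knowledge of where any variable lives; adding a database means one more `register` call.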

Page 30: Distributed Databases and metadata

Implementation

Data sources are under PostgreSQL, flat files or other RDBMS systems.

Each data server is a DODS servlet (Distributed Oceanographic Data System).

The servlet container is Apache Tomcat.

Metadata are in XML files.
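
A client typically asks a DODS server for data through a URL. The sketch below assumes the usual DAP 2 conventions (a response suffix such as `.ascii` appended to the dataset name, then `?` and a constraint expression of the form `variable[start:stride:stop]`); the host, dataset, and variable names are hypothetical.

```python
def dods_url(base, dataset, variable, start, stop, stride=1, response="ascii"):
    """Build a DAP 2 style request URL for one index range of one variable."""
    hyperslab = f"[{start}:{stride}:{stop}]"
    return f"{base.rstrip('/')}/{dataset}.{response}?{variable}{hyperslab}"


# Hypothetical server and dataset: first ten values of a rainfall variable.
url = dods_url("http://example.org/dods/", "rain.nc", "rainfall", 0, 9)
```

The same request shape applies whether the server reads from PostgreSQL or from flat files, which is what makes the data source transparent to the user.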

Page 31: Distributed Databases and metadata
Page 32: Distributed Databases and metadata
Page 33: Distributed Databases and metadata
Page 34: Distributed Databases and metadata
Page 35: Distributed Databases and metadata

Prospects

Develop Web services based on the W3C SOAP recommendation.

Implement a directory service for services.

Hope to share development efforts with other organisations, within the framework of international projects (funded by EC, INTAS…).
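
For the first prospect, a SOAP request to a data access service might be built roughly as follows. The envelope namespace is the SOAP 1.2 one; the application namespace, the `GetDataset` operation, and its parameter are hypothetical, since the planned services are not specified here.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"   # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/amma-data"               # hypothetical service namespace


def build_request(operation, params):
    """Wrap one operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and dataset identifier.
message = build_request("GetDataset", {"datasetId": "rain-2004"})
```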