How will we manage large multidisciplinary scientific datasets
Michel Hoepffner, Nathalie Fourès, Hassan Makhmara and Fernando Niño (Medias-France)
Preservation versus added-value
• The best long-term preservation for scientific data: have it stored in technical centers managed by data experts
• The best added value for scientific data: timely updates of the datasets by the scientists in charge
We think it is possible to solve this apparent dilemma
The actors
• Scientists:
– Research teams, represented by their Principal Investigators
– Users (the scientific community)
– Multidisciplinary international scientific programs: IGBP, WCRP, IHDP, etc.
• Operators:
– National agencies providing data (space agencies, etc.)
– Technical operators like Medias-France
Medias-France in a few words
A SERVICE STRUCTURE:
– DATABASE AND INFORMATION SYSTEM DESIGN AND MANAGEMENT
– PROJECT SUPPORT (OBSERVING SYSTEMS AND NETWORKS, etc.)
– TRAINING (FELLOWSHIPS, etc.)
– CONSULTANCY AND EXPERTISE (GMES, …)
http://medias.obs-mip.fr
Development and management of database and information systems
Sea/Atmosphere/Hydrology | Atmospheric chemistry | Earth Science | Environment | Tools
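Ended:
– Sea/Atmosphere/Hydrology: EMET-CEP, HAPEX-SAHEL, JGOFS1, ELMASIFA, EPITHERME, FETCH*, JGOFS2*
– Atmospheric chemistry: EXPRESSO, IDAF, MOZAIC, PIC DU MIDI, O3OAIRQUAL, IGAC I & S, PRE-ESCOMPTE
– Earth Science: POLLEN, WDC-A, (EPD), (EDDI), FORMAT, APD, CD-Rom Pollen
– Environment: SUD-SAHEL, Images-B/OSS
– Tools: WEB-Site, CDROM Mediterranee, MEDESERT 99, UNISPACE III, GIS Grass, BASS 2000, CEOS-CDROM 2000
In progress:
– Sea/Atmosphere/Hydrology: EMERCASE, GMES, CATCH, PLUVIOM
– Atmospheric chemistry: DEBITS, PREPA1*, IDAF, ESCOMPTE
– Earth Science: DIAFCLEHARESOLVECPCECLIPSE
– Environment: ADAMMDM, OSS/LIFE, AID-CCD, ZA/ORME
– Tools: RICAMARE, GICC1, SEARCH, WEB-Site, ISIS, AMMA, POLKAIMMEDIASEUFOREO
Planned:
– Sea/Atmosphere/Hydrology: AMMA/Histo/Méta, ProMed, S2E/ARGOS, IMFREX, MEDWATER, CHOLCLIM, ENSO/PNEDC
– Atmospheric chemistry: IGAC Meta, PREPA2*, AMMA/Chimie
– Earth Science: PAGES/Metadata, XPROXY, GPS
– Environment: BIODIVALP, GEOLAND/POSTEL, GMES-NOWEDENORE/RETYS, ZA/APM
– Tools: GICC2, CIESIN, ACMADIMEDIAS/ISS, S2E/ARGOS, PLAN BLEU, POSTEL, MAZURKA, GMES/informations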
Database
• Disciplinary:
– Atmosphere: Emet
– Hydrology: Catch
– Palynology: APD
• Multidisciplinary:
– Hapex-Sahel
– Fetch
– Amma (African Monsoon Multidisciplinary Approach)
International coordination
• Member of the PAGES Data Board (Past Global Changes) with the World Data Centers of Boulder (USA) and Bremen (Germany)
• Mirror site of the World Data Center of Boulder (USA)
• Coordinator of various European funded projects (Elmasifa, APD, Format, Search, Ricamare, etc.)
TOOLS
• Management of Internet sites:
– web
– ftp
• Development of data visualization and extraction interfaces
• Database network, with scientists involved in the management of data
The scientific institutions
• Distinguish between:
– Scientific databases (experiments, networks)
– Archives (meteorological services, BRGM, IGN, etc.)
– Collections (museums, …)
• Question the role of the different actors in the scientific world:
– institutions whose sole goal is scientific research (CNRS-INSU, IRD, etc.)
– institutions whose scientific goal coexists with an operational or commercial goal (e.g. BRGM, IGN, etc.)
Organization of the community
• In scientific centers: networks developed between scientists of different projects in order to adopt:
– The same formats for common data
– A common documentation (metadata) by discipline
– Quality control information
• In technical centers providing:
– Database development and maintenance
– Data distribution
Some examples of database organization
Scientific discipline database: an example with pollen data
[Figure: Simplified structure of the Pollen DB, distinguishing scientific entities from technical entities. Tables shown: Authors, Publent, Publication, Pb210, Age scale, Coring, C14, Geochron, Samples, Counts, Taxa, Lithology, Description, Location]
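
As a rough illustration of how a few of the Pollen DB tables might relate, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names (Location, Coring, Samples, Taxa, Counts) come from the slide; every column, key and relationship below is an assumption made for the example, and the inserted rows are illustrative values.

```python
import sqlite3

# Table names come from the slide; every column and key is an assumption.
SCHEMA = """
CREATE TABLE location (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE coring   (core_id INTEGER PRIMARY KEY,
                       site_id INTEGER REFERENCES location(site_id),
                       label TEXT);
CREATE TABLE samples  (sample_id INTEGER PRIMARY KEY,
                       core_id INTEGER REFERENCES coring(core_id),
                       depth_cm REAL);
CREATE TABLE taxa     (taxon_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE counts   (sample_id INTEGER REFERENCES samples(sample_id),
                       taxon_id INTEGER REFERENCES taxa(taxon_id),
                       n INTEGER,
                       PRIMARY KEY (sample_id, taxon_id));
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema in memory and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO location VALUES (1, 'Gulf of Guinea')")
    conn.execute("INSERT INTO coring VALUES (1, 1, 'CH22-KW31')")
    conn.execute("INSERT INTO samples VALUES (1, 1, 36)")
    conn.executemany("INSERT INTO taxa VALUES (?, ?)",
                     [(1, "Podocarpus"), (2, "Rhizophora")])
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)",
                     [(1, 1, 40), (1, 2, 110)])
    return conn


def counts_for_core(conn: sqlite3.Connection, label: str) -> list:
    """Return (depth_cm, taxon, count) rows for one core, joined across tables."""
    return conn.execute(
        """SELECT s.depth_cm, t.name, c.n
           FROM counts c
           JOIN samples s ON s.sample_id = c.sample_id
           JOIN taxa t    ON t.taxon_id = c.taxon_id
           JOIN coring k  ON k.core_id = s.core_id
           WHERE k.label = ?
           ORDER BY s.depth_cm, t.name""",
        (label,),
    ).fetchall()
```

Calling `counts_for_core(build_demo_db(), "CH22-KW31")` lists the counts of that core by depth and taxon. Keeping taxa and counts in separate tables is one common way to avoid repeating taxon names for every sample; the actual Pollen DB may be organized differently.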
Examples of various data
Fossil wood
Taxa identification and the observation of tree rings give information on climate changes
3D megaripples of a gravity-driven system
Conglomerate
Guwaysah formation, Oman (F. Guillocheau, C. Robin)
Sedimentary facies
PALBOT
• Database of the Paleobotany collection of Université Paris 6
• About 15,000 plant fossils of very diverse kinds: macro- and microfossils; permineralized structures; impressions and compressions
Classification Evolution et Biosystématique: Laboratoire de Paléobotanique et Paléoécologie (J. Broutin), Laboratoire Informatique et Systématique (R. Vignes Lebbe)
• Importance for research; numerous holotypes
• Database in progress; accessible at the URL http://albinoni.snv.jussieu.fr
(Source: tmsi-idt-isib, September 1999)
Catalogs – Metadata
• Cruises: 5445 ROSCOP/CSR (Cruise Summary Report) summaries
• Databases and datasets from laboratories of the French community, or acquired through exchange: 306 EDMED (European Directory of Marine Environmental Datasets) description records (82 of them at SISMER)
• "Real-time" observation stations: EDIOS, MAMA
• EUROSEISMIC
Pollen diagrams
Gulf of Guinea, core CH22-KW31 (03°31'1N - 05°34'1E)
Pollen counts per sample. Columns: depth (cm), calendar age, then 36 taxa in this order:
– OLEACEAE: Jasminum, Olea, Olea capensis-type, Olea europaea-type
– PALMAE: Calamus deerratus, Elaeis guineensis, Hyphaene-type, Phoenix reclinata-type
– PANDANACEAE: Pandanus
– PODOCARPACEAE: Podocarpus
– PROTEACEAE: Protea-type
– RHAMNACEAE: Ziziphus-type
– RHIZOPHORACEAE: Rhizophora
– RUBIACEAE: Dictyandra-type, Gardenieae, Hallea-type stipulosa, Ixora-type brachypoda, Keetia-type, Mitragyna-type inermis, Morelia-type senegalensis, Morinda-type, Nauclea-type, Pausinystalia-type, Psydrax subcordata-type, Uncaria-type africana, Vangueria-type
– RUTACEAE: Citrus-type, Clausena anisata, RUTACEAE, Zanthoxylum-type, Zanthoxylum-type xanthoxyloides
– SALICACEAE: Salix
– SAPINDACEAE: Allophylus, Aphania senegalensis, Deinbollia, Dodonaea
36 3530 0 0 0 0 0 0 0 0 0 40 0 0 110 0 0 0 2 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0
44 3633 0 0 0 0 0 0 0 1 0 28 0 0 160 0 0 0 1 2 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2
81 4126 0 0 0 0 0 0 0 0 1 12 0 0 253 0 0 0 1 3 3 0 0 0 2 0 0 0 0 0 0 3 0 0 0 0 0 1
121 4645 0 0 5 3 0 1 0 0 1 10 0 0 204 0 0 0 3 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
142 4929 0 0 6 0 0 0 0 0 0 11 0 1 208 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
182 5458 0 0 6 0 0 0 1 1 1 4 0 1 120 0 0 0 1 1 6 0 0 0 1 0 0 0 0 0 0 0 0 0 2 0 0 0
221 5970 1 0 7 0 0 0 0 0 2 1 0 0 140 0 1 1 4 0 10 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
281 6773 0 0 0 0 0 2 0 0 0 2 0 0 138 0 0 3 0 2 9 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 1
318 7250 0 1 0 0 0 1 0 0 1 1 0 0 152 0 0 0 0 3 5 0 0 0 3 0 0 0 0 0 0 0 1 0 1 0 0 0
380 8080 0 0 10 0 0 0 0 0 4 3 0 3 187 1 0 1 2 3 0 0 0 1 2 0 1 0 0 0 1 1 0 0 0 0 0 0
422 8632 0 0 3 0 0 1 0 0 0 3 0 0 54 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
462 9156 0 0 0 0 0 1 0 0 6 3 0 4 156 0 0 0 1 4 9 1 0 3 2 0 0 0 1 0 0 0 0 0 0 0 0 1
480 9402 0 0 5 0 0 0 0 0 2 2 0 0 191 0 0 1 4 1 2 0 0 0 7 2 0 0 0 0 0 1 1 0 1 0 0 0
522 9953 0 0 2 0 0 0 0 0 0 3 0 0 94 0 0 0 0 2 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
580 10723 0 0 2 0 0 0 0 0 6 5 0 0 116 0 0 0 0 1 3 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
617 11209 0 0 1 0 1 1 1 0 0 2 1 0 222 0 0 0 2 1 5 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
639 11504 0 0 1 0 1 0 1 0 0 1 0 1 71 4 0 0 1 2 3 0 0 0 3 0 0 0 0 0 0 0 1 1 0 0 0 0
642 11537 0 0 1 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
674 11959 0 0 3 0 1 0 1 0 1 2 0 0 132 0 0 0 1 1 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
700 12311 0 0 6 0 0 0 0 0 0 0 0 0 149 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
July 2001, Jean-Pierre Cazet
[Figure: pollen diagram, METEOR M 16415-2 (09°34'N, 19°06'W), Hooghiemstra, H. et al. (1988). Curves shown: Alnus, Betula, Corylus, Tilia, Quercus, Ephedra fragilis-type, Maerua, Artemisia, Gramineae, Chenopodiaceae/Amaranthaceae, Hyphaene, Rhizophora, Hymenocardia, Alchornea, Elaeis guineensis, Cyperaceae, Typha/Sparganium, Tubuliflorae, Liguliflorae, Plantago, fungal spores, total, and Groups I to VII]
The African Pollen Database
• Data:
– Tilia file
– Paradox table
• Free software:
– Tilia
– Calib3
– TGview
The numerical simulations
3D Visualization
Quantification
F. Guillocheau, C. Robin, D. Rouby
Data valorization
Oak migration in Europe during the last deglaciation
Taberlet and Cheddadi, Science (2002)
European Pollen Database
Climate reconstruction 6000 years BP
(Cheddadi et al., Climate Dynamics, 1997)
Objectives given to the databases
• Data archiving, which implies a discussion of what is at stake and of how long the data must be kept
• Data access, which defines the services to be given to the users, with:
– a data policy drawn up by the scientists
– developed by the technical operator, in close relationship with the scientists:
• a data visualisation and extraction interface
• data valorization tools
A coordination to be obtained between the institutions
• Database functioning rules
• Data value and property
• Relationship with industry for an open access to the data
• Setting up of a data policy
Planned
• The setting-up of a data portal:
– This implies the adoption of international standards for metadata formats
– This allows a flexible organization for data management
• The starting up of a community (inter-labs and inter-institutions) working on database interoperability
International standards for the metadata
• In order to develop a portal giving common access to the different databases, the priority is to collect all the descriptive information that makes the data usable.
• Numerous French institutions are working on this, particularly Cnes (space), Medias-France, BRGM (geology), Ifremer (oceanography), etc.
• The international metadata standard they use is "ISO/DIS 19115 Geographic information -- Metadata"
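
To make "descriptive information on the data" more concrete, here is a minimal Python sketch of a metadata record and a completeness check. The field names are illustrative assumptions chosen for this example; they are not the normative ISO 19115 element names, and a real profile would be defined with the scientists of each discipline.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class MetadataRecord:
    """Descriptive information about one dataset (hypothetical fields)."""
    title: str = ""
    abstract: str = ""
    discipline: str = ""
    contact: str = ""
    reference_date: str = ""  # e.g. an ISO 8601 date string
    # (west, south, east, north) in decimal degrees
    bounding_box: Optional[Tuple[float, float, float, float]] = None
    data_server: str = ""     # where the data itself can be obtained


def missing_fields(record: MetadataRecord) -> List[str]:
    """List the fields a data provider still has to fill in."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in ("", None)]


# Example: a partially described dataset.
draft = MetadataRecord(title="Pollen counts", discipline="palynology",
                       contact="M. X")
todo = missing_fields(draft)
```

A check of this kind could sit behind a Web form so that a record only enters the catalog once it is complete.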
A federated and adaptable data management
In this view, shared by Medias-France for the development of « multi-proxy » databases for scientific projects:
• The data expertise stays in the laboratory which produced the data
• The data center centralizes the different initiatives and manages the coordinating system, based on a metadata catalog
• The user asks a single server for access to data coming from the different database initiatives
Objectives
• To ease the effort of archiving scientific data
• To allow the data to be described and located
• To homogenize the ways of access and use
Data archiving
• Data are saved in centers specialized in data storage
• Data are coherent and easily updated
• Access policies are under the control of the scientists in the data centers
Facilities for the search and the localisation of the data
• Data are described by metadata in a standard format
• The tools used to describe the data are easy to use by the data providers (Web forms)
• Management of the metadata and of their access policy is handled by a single entity
Access facilities
• Data pre-visualization and extraction are standardized
• Data is not redundant
• Data access policy is managed by the data providers
Organization (federating unit)
• A center has to manage the metadata
• It gives data centers the necessary tools for data description
• It provides the scientific community with the tools for data search, description and localization
Organization (Scientific Data Centers)
• The data centers manage the data with their own methodologies
• They implement (with the help of the federating unit) a standard interface for data exchange
• They describe their data with the standard tools provided by the federating unit
Scenario of data exploration and extraction
The metadata
• Problem: the user would like to access the data via their metadata
• Example: « I would like to know the pollens studied by M. X in Africa »
One solution: the catalog server
• Principle: the user sends a metadata selection (e.g. "Pollen, X, Africa?") to the catalog server, which looks it up in its metadata base and returns the list of data servers corresponding to the request
[Figure: catalog server workflow. The user reaches the gate (portal), which holds the metadata base (data catalog); each data source (data centers: laboratories, specialized centers) is reached through an exchange protocol]
1. Search by criteria
2. Catalog query
3. Data list corresponding to the criteria
4. Data localization (pointer, address)
5. Query the adequate data source
6. Data extraction
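
The six steps of the catalog workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the metadata fields and the demo datasets ("lab_a", "lab_b", "a-1", ...) are all invented for the example and do not describe an existing Medias-France implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogEntry:
    """One metadata record held by the catalog (fields are illustrative)."""
    discipline: str
    author: str
    region: str
    source: str      # which data source holds the dataset
    dataset_id: str  # pointer used to extract it from that source


class CatalogServer:
    def __init__(self, entries: List[CatalogEntry]):
        self._entries = list(entries)

    def search(self, **criteria: str) -> List[CatalogEntry]:
        """Steps 2 to 4: match the criteria and return data pointers."""
        return [
            e for e in self._entries
            if all(getattr(e, k).lower() == v.lower()
                   for k, v in criteria.items())
        ]


class DataSource:
    """A data center exposing its holdings through one extraction call."""
    def __init__(self, datasets: Dict[str, list]):
        self._datasets = datasets

    def extract(self, dataset_id: str) -> list:
        return self._datasets[dataset_id]


def portal_query(catalog: CatalogServer,
                 sources: Dict[str, DataSource],
                 **criteria: str) -> Dict[str, list]:
    """Step 1 enters here; steps 5 and 6 query each source and extract."""
    return {
        e.dataset_id: sources[e.source].extract(e.dataset_id)
        for e in catalog.search(**criteria)
    }


# Invented demo content: two laboratories, three catalogued datasets.
catalog = CatalogServer([
    CatalogEntry("pollen", "X", "Africa", "lab_a", "a-1"),
    CatalogEntry("pollen", "Y", "Europe", "lab_b", "b-1"),
    CatalogEntry("pollen", "X", "Africa", "lab_b", "b-2"),
])
sources = {
    "lab_a": DataSource({"a-1": ["sample 1", "sample 2"]}),
    "lab_b": DataSource({"b-1": ["sample 3"], "b-2": ["sample 4"]}),
}
result = portal_query(catalog, sources,
                      discipline="pollen", author="X", region="Africa")
```

The catalog only stores pointers, so the data stay with the laboratories that produced them and the portal never needs its own copy.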
Medias-France proposal
• To create ISO 19115 profiles for various scientific disciplines, in close relationship with:
– The PIs of scientific disciplinary programs (GPD, etc.)
– The leaders of international and interdisciplinary scientific programs (WCRP, IGBP, IHDP, etc.)
• To install a catalog server which will exploit these profiles
The data servers
• Are accessible via the catalog server
• Allow data visualization
• Example: the x-proxy server for paleoclimatology
The x-proxy server
• The need: to reach heterogeneous and remote data through a single interface
• The condition: data ownership and management remain with the scientists
An example of an answer to this need
[Figure: a paleoclimatologist (Brest) reaches, through the Internet, a federal center (Medias-France), a European Data Center (Arles) and a World Data Center (Boulder)]
Exchanges example
[Figure: the same network, with the federating center (Toulouse), the European center on pollen (Arles) and the World Data Center on pollen (Boulder)]
• Paleoclimatologist to federating center: “I would like a map of the oak distribution in France, for the last ten years, from 1992 to 2002”
• Federating center to the pollen data centers: “I would like the data on oak pollen in France from 1992 to 2002”
• Pollen data centers to federating center: dataset concerning the oak pollens, in France, dated from 1992 to 2002
• Federating center to paleoclimatologist: oak distribution map in France
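
The exchange above, where one request is forwarded to several pollen centers and their answers are combined, might look like the following Python sketch. The `Request` and `Record` types, the `make_center` helper and all records are hypothetical and exist only to illustrate the fan-out; producing the final map is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Request:
    taxon: str
    country: str
    year_from: int
    year_to: int


@dataclass(frozen=True)
class Record:
    taxon: str
    country: str
    year: int
    site: str


Center = Callable[[Request], List[Record]]


def make_center(records: Iterable[Record]) -> Center:
    """Build a data center that answers requests from its own holdings."""
    held = list(records)

    def answer(req: Request) -> List[Record]:
        return [r for r in held
                if r.taxon == req.taxon
                and r.country == req.country
                and req.year_from <= r.year <= req.year_to]
    return answer


def federate(req: Request, centers: Iterable[Center]) -> List[Record]:
    """Forward one request to every center and merge without duplicates."""
    merged = set()
    for answer in centers:
        merged.update(answer(req))
    return sorted(merged, key=lambda r: (r.year, r.site))


# Invented holdings: the two centers share one record.
world = make_center([
    Record("Quercus", "France", 1995, "site A"),
    Record("Quercus", "France", 1980, "site A"),
])
europe = make_center([
    Record("Quercus", "France", 1995, "site A"),
    Record("Quercus", "France", 2001, "site B"),
    Record("Betula", "France", 1999, "site B"),
])
oak = federate(Request("Quercus", "France", 1992, 2002), [world, europe])
```

Merging through a set is one simple way to keep the combined answer free of redundant records when two centers hold the same data.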
Las-Dods Architecture
[Figure: components shown are the Las client, the Las interface, the Las server, the treatments, the Dods client and two Dods servers]
An example of « multi-proxy » database
Proposition
• To archive the pertinent and essential data, validated by the PI
• To manage and maintain the databases in technical agencies like Medias-France and in research teams
Specific role of Medias-France
• To ensure that the data will be stored after their validation by the scientific community
• To propose specific access and management tools for each type of data
• To develop the "multi-proxy" interface
References (and acknowledgments)
• Communication by Anne-Marie Lezine at the Prospective Symposium of the INSU « Earth Sciences » Division, 24 September 2002, at Vulcania
• Waldteufel’s report: « Les bases de données pour les Géosciences, éléments d’un schéma directeur », published by INSU and CNES in 1999 (http://medias.obs-mip.fr/www/francais/documentation/)
Medias:
At your service