Tim Green
NEFIS Analysis of partner metadata records
15 November 2004
D3: Metadata standards and Keyword Lists
NEFIS Interoperability and The Way Forward
The extraction and analysis of data from a variety of sources in order to draw conclusions for decision-making, when carried out by a human data analyst, involves a number of stages including:
•Identification of appropriate sources
•Evaluation of those sources for relevance and reliability
•Extraction of required data
•Manipulation using tools appropriate for the required purpose
•Interpretation of the compiled results
•Presentation to the appropriate audience(s)
This knowledge generation cycle of retrieval, analysis, publication and
storage remains fundamentally unchanged by the widespread use of
ICTs, but the mechanisms, scale, accessibility and audience reach
have become very different.
Data gathered for a specific purpose for a limited audience can
potentially be retrieved and used for entirely different purposes or
audiences. This carries both benefits and risks: consolidation of
incompatible data could lead to erroneous conclusions with
unpredictable results.
Figure: Number of metadata records from partners (total 63). Horizontal axis: Partner (1 to 11); vertical axis: Number (0 to 16); series: Metadata records, Datasets.
•NFI data: separate metadata and data tables for each variable (3 partners) OR 1 metadata record for the whole NFI dataset (4 partners)
•For 1 dataset there were separate metadata records for each record in the dataset (480). The same fields were filled in for all the data records. Therefore they are treated as one metadata record in this analysis.
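The collapsing step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the analysis: the field names (`dataset`, `title`, `creator`, `relation`) and the sample records are hypothetical, and "the same fields were filled in" is interpreted here as "the same set of non-empty fields".

```python
def filled_fields(record, skip=("dataset",)):
    """Return the names of the fields that actually hold a value."""
    return frozenset(
        name for name, value in record.items()
        if name not in skip and value not in (None, "")
    )


def collapse_records(records):
    """Count records per (dataset, set of filled fields) combination."""
    groups = {}
    for record in records:
        key = (record["dataset"], filled_fields(record))
        groups[key] = groups.get(key, 0) + 1
    return groups


# Hypothetical input: 480 per-row records for one dataset, all with the
# same fields filled in, plus a single record for another dataset.
per_row = [
    {"dataset": "A", "title": f"Row {i}", "creator": "Org X", "relation": ""}
    for i in range(480)
]
single = [{"dataset": "B", "title": "Dataset B", "creator": "", "relation": ""}]

groups = collapse_records(per_row + single)
print(len(groups))  # 2 analysis units instead of 481 records
```

Each key in `groups` can then be counted once in the element statistics, which keeps one heavily subdivided dataset from dominating the results.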
Mandatory elements
Something was input for the mandatory elements in all metadata records (including the refinements of these elements).
Optional (desirable) elements
Return for optional elements ranged from 20% (relation and its refinements) to 94% (audience).
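A return rate of this kind is the share of metadata records in which an element holds any value. The sketch below shows one way to compute it; the record layout and the five sample records are hypothetical and are not the NEFIS data.

```python
def return_rates(records, elements):
    """Percentage of records in which each element is filled in."""
    total = len(records)
    rates = {}
    for element in elements:
        filled = sum(
            1 for record in records
            if record.get(element) not in (None, "", [])
        )
        rates[element] = round(100 * filled / total) if total else 0
    return rates


# Hypothetical sample: 'audience' filled in 4 of 5 records, 'relation' in 1.
sample = [
    {"audience": "Researchers", "relation": "Part of survey X"},
    {"audience": "Policy makers", "relation": ""},
    {"audience": "Researchers"},
    {"audience": "General public", "relation": None},
    {"audience": "", "relation": ""},
]

rates = return_rates(sample, ["audience", "relation"])
print(rates)  # {'audience': 80, 'relation': 20}
```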
Figure: Optional elements (chart; axis 0 to 70).
•Identifier: Should more of the datasets have a specific identifier?
•Coverage: Should more of the datasets have some description of temporal coverage?
Title
Figure: Title (chart; series: Title, Alternative (OriginalLanguage), Alternative (Acronym); axis 0 to 70).
Creator
Figure: Creator (chart; series: Creator PersonalName, Creator CorporateName; axis 0 to 60).
•Some datasets have both organisations and individuals as the creator
•Acronyms of organisation names
Subject
Figure: Subject (chart; series: NEFISThemes, NEFISTerms, NominatedTerms, OrganismName, Classification; axis 0 to 70).
Subject
•Themes and Terms: Number varied greatly from 1 (NEFISTerms and Nominated Terms) to 83.
•Dependent on dataset described, but in some cases, more terms would be useful to help users find the resources.
•Are the Themes, Terms and Nominated Terms used consistently?
•Some misunderstanding of how to use Subject, Themes and Terms. Some Titles were repeated in the Subject. Some Themes entries were not in the list. This should be clarified in the guidelines.
•Nominated Terms: do they contain any terms (or similar terms) already included in the keyword lists? Analysis needs to be done.
•Organism Names: why such a low number (2)? Not included in the Metadata template. In some cases this information was included in the NEFIS Terms/Description. But where species data is given, then the species name should be given in the refinement ’organism names’.
•Classification: Not used at all. Why? Not useful? Or lack of familiarity?
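The outstanding comparison of Nominated Terms against the keyword lists could be approached along the following lines. This is only a sketch under stated assumptions: the keyword list and nominated terms are invented, matching is done on lower-cased text, and "similar" is defined by a `difflib` similarity cutoff of 0.8 chosen for illustration.

```python
from difflib import get_close_matches


def normalise(term):
    """Lower-case a term and collapse repeated whitespace."""
    return " ".join(term.lower().split())


def compare_with_keyword_list(nominated_terms, keyword_list, cutoff=0.8):
    """Label each nominated term as 'in list', 'similar' or 'new'."""
    keywords = [normalise(keyword) for keyword in keyword_list]
    result = {}
    for term in nominated_terms:
        norm = normalise(term)
        if norm in keywords:
            result[term] = ("in list", norm)
            continue
        close = get_close_matches(norm, keywords, n=1, cutoff=cutoff)
        result[term] = ("similar", close[0]) if close else ("new", None)
    return result


# Hypothetical keyword list and nominated terms.
keyword_list = ["forest inventory", "stumpage price", "increment"]
nominated = ["Forest  Inventory", "stumpage prices", "carbon"]

report = compare_with_keyword_list(nominated, keyword_list)
for term, (status, match) in report.items():
    print(term, status, match)
```

Terms labelled "similar" are candidates for replacement by the listed keyword; terms labelled "new" are candidates for addition to the keyword lists.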
Description
Figure: Description (chart; series: DescriptionBasic, DescriptionComprehensive, Table of Contents, Abstract, Quality Report (Basic); axis 0 to 50).
Description
NEFIS Metadata Guidelines
” Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
Guidelines for creation of content: The Description is a potentially rich source of indexable terms, and care should be taken when creating the Description. It should be clearly structured and the main contents should be described in the first paragraph. Best practice recommendation for this element is to use full sentences, as Description is often used to present information to users to assist in their selection of appropriate resources from a set of search results.
Descriptive information can be copied or automatically extracted from the item if there is no abstract or other structured description available. The Description should at least contain the following information: type of resource, aims, contents, background information.”
Description
•Description: varied widely. Some comprehensive. Some basic or very basic (not adding very much information to that given in the title)
•In some cases a basic description would be enough, but in others a more detailed description of the resource would be helpful (essential) to someone searching EFIS in order to evaluate, extract, and interpret information in the resource.
Description
Comprehensive example
Title: Stumpage prices in Finnish non-industrial private forests
” The dataset present information on nominal stumpage prices, paid
in sales of roundwood from Finnish non-industrial private forests.
Prices for the following six major assortments are given: pine, spruce
and birch logs; pine, spruce and birch pulpwood. The regional
breakdown of data is forestry centres (14 in total). The price
information dates back to 1995, and it is updated on a monthly basis.
In Finland, stumpage sales is the dominant sales form in private
forests. They account for approx. 80% of total roundwood sales. ”
Description
Comprehensive example
Title: Annual increment for forest types and tree species
” Annual increment estimates for tree species, forest types, counties and four periods. Productive forest land. Swedish NFI data. All trees at least 1.3 m high are included.
Tree species:
Pine - Scots pine (Pinus ssp excl P. contorta, Larix ssp)
Spruce - Norway spruce (Picea ssp, Abies ssp)
Contorta - Contorta pine (Pinus contorta)
Birch - Birch (Betula pendula, B. pubescens)
Other broadleaves - Other broadleaved trees, oaks, beech and other hardwood trees excluded. Principally aspen (Populus tremula), alder (Alnus incana/glutinosa), sallow (Salix caprea) and rowan (Sorbus aucuparia)
Oak - Oak (Quercus robur/petraea)
Beech - Beech (Fagus sylvatica)
Other hardwood - Other hardwood trees defined by Swedish forestry act, principally common ash (Fraxinus excelsior), wych-elm (Ulmus glabra), lime (Tilia cordata), hornbeam (Carpinus betulus) and cherry (Prunus avium)
Forest type is defined by basal area at breast height (1.3 m above ground level) percentage.
Definition of forest types:
Pine - At least 65 percent pine (Pinus ssp, Larix ssp)
Spruce - At least 65 percent spruce (Picea ssp, Abies ssp)
Mixed coniferous - At least 65 percent conifers, but not pine or spruce forest type.
Mixed coniferous/broadleaved - Nor 65 percent conifers or broadleaved trees.
Broadleaved - At least 65 percent broadleaved trees.
Density 0 - Bare forest land with no trees”
Description
Basic example
Title: Forest map
”The forest map of Europe”
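The difference between a comprehensive and a very basic Description can be approximated by counting how many words the Description adds beyond the Title. The sketch below is a rough heuristic, not part of the NEFIS guidelines; the threshold of five new words is an arbitrary assumption, and the test strings are taken from the example records in this presentation.

```python
import re


def words(text):
    """Set of lower-cased alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def adds_little_beyond_title(title, description, min_new_words=5):
    """True when the description contributes fewer than min_new_words
    words that are not already present in the title."""
    new_words = words(description) - words(title)
    return len(new_words) < min_new_words


basic = adds_little_beyond_title("Forest map", "The forest map of Europe")
comprehensive = adds_little_beyond_title(
    "Stumpage prices in Finnish non-industrial private forests",
    "The dataset present information on nominal stumpage prices, paid "
    "in sales of roundwood from Finnish non-industrial private forests.",
)
print(basic, comprehensive)  # True False
```

A flagged record is only a prompt for human review: as noted in the bullets, a basic description is sometimes enough.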
Description > Quality Report
Comprehensive examples.
Many gave references to other documentation
•http://www.ifn.fr/spip/article.php3?id_article=184
•Guidelines for data collection and processing can be found in: Study on
European forestry information and communication systems. Reports on
forestry inventory and information systems. Volume 2. European Commission.
1997. ISBN 92-827-9848-8
Description in metadata record
“Utilisation of international forest resources information officially published by
FAO and UN-ECE/FAO. The information collected by FAO and UN-ECE/FAO
is based on data questionnaire returns from designated national country
correspondents.
The data presented in the FAO and UN-ECE/FAO publications was transferred
to electronic format and organised in an interactive Internet database …”
Description > Quality > Quantitative measures for NFI data
Figure: Quantitative measures reported for NFI data (chart; categories: NFI datasets, 1 MD record, separate MD records, Availability of SE, SE value, Resampling (yes/no), Resampling percentage, Total sample size, sampling unit; axis 0 to 30).
•4 NFI datasets with 1 metadata record; 3 NFI datasets with separate metadata records for different variables (total 27 records)
•Variables reported include forest land area, standing volume, increment, number of stems (by age class, species). So although there are 27 metadata records, these measures are reported for the 4 variables.
•Some difference in understanding of requirements? 2 groups reported resampling of 1-3%. 1 group of metadata records reported resampling of 100%.
Description > Quality > Quantitative measures for NFI data
Field                                       NFI Record 1                          NFI Record 2
Variablename                                Total volume                          (blank)
Availabilitystandarderror (answer yes/no)   yes                                   No
Standarderror                               1.5 - 5 % (varies between regions)    No
Resampling (answer yes/no)                  yes                                   Yes
ResamplingPercentage                        ca. 1%                                100 % Full sample
Totalsamplesize                             65859                                 100 %
samplingunit                                plot                                  Forest subcompartment
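Differences in interpretation like those between the two records above could be caught by simple plausibility checks at input time. The sketch below is illustrative only: the three rules are assumptions made for this example and are not taken from the NEFIS Metadata Guidelines. The field names and values follow the two records shown.

```python
import re


def first_number(text):
    """Return the first number found in a text, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def check_quality_measures(record):
    """Return a list of possible problems in the quantitative measures."""
    issues = []
    # Assumed rule 1: total sample size should be a plain count.
    if not record.get("Totalsamplesize", "").strip().isdigit():
        issues.append("Totalsamplesize is not a count")
    # Assumed rule 2: if a standard error is available, a value is expected.
    if record.get("Availabilitystandarderror", "").lower() == "yes":
        if first_number(record.get("Standarderror")) is None:
            issues.append("Standarderror value missing")
    # Assumed rule 3: 100 % resampling suggests a full sample was meant.
    if record.get("Resampling", "").lower() == "yes":
        percentage = first_number(record.get("ResamplingPercentage"))
        if percentage is not None and percentage >= 100:
            issues.append("ResamplingPercentage of 100 % or more")
    return issues


record_1 = {
    "Availabilitystandarderror": "yes",
    "Standarderror": "1.5 - 5 % (varies between regions)",
    "Resampling": "yes",
    "ResamplingPercentage": "ca. 1%",
    "Totalsamplesize": "65859",
}
record_2 = {
    "Availabilitystandarderror": "No",
    "Standarderror": "No",
    "Resampling": "Yes",
    "ResamplingPercentage": "100 % Full sample",
    "Totalsamplesize": "100 %",
}

issues_1 = check_quality_measures(record_1)
issues_2 = check_quality_measures(record_2)
print(issues_1)
print(issues_2)
```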
Date
Figure: Date (chart; series: Date, Date refinements; axis 0 to 60).
•refinements: created, valid, available, issued, modified, dataAccepted, dataCopyrighted, dataSubmitted
Type
Figure: Type (chart; series: Type, Type georeferenced; axis 0 to 50).
•18 georeferenced datasets
Format
Figure: Format (chart; series: Format, refinements (extent/medium), reference system; axis 0 to 70).
•18 georeferenced datasets, but only 16 entries for reference system. Information given for reference system ranged from comprehensive to basic
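The gap between georeferenced datasets and reference-system entries is straightforward to list once the records are in tabular form. A minimal sketch follows; the field names (`id`, `georeferenced`, `reference_system`) and the placeholder value `"WGS84"` are hypothetical, and the sample merely mirrors the 18 versus 16 counts reported above.

```python
def missing_reference_system(records):
    """Ids of georeferenced records with no reference system entry."""
    return [
        record["id"] for record in records
        if record.get("georeferenced") and not record.get("reference_system")
    ]


# Hypothetical sample: 18 georeferenced records, 16 with a reference system,
# plus one record that is not georeferenced.
records = [
    {"id": i, "georeferenced": True, "reference_system": "WGS84"}
    for i in range(16)
]
records += [
    {"id": i, "georeferenced": True, "reference_system": ""}
    for i in range(16, 18)
]
records.append({"id": 18, "georeferenced": False, "reference_system": ""})

missing = missing_reference_system(records)
print(missing)  # [16, 17]
```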
Coverage
Figure: Coverage (chart; labels include CoverageSpatial, Point, Box and CoverageTemporal; axis 0 to 60).
Coverage
•Point and Box
•Why is Temporal coverage not used more (46)?
•Spatial. Often given at very broad level (e.g. World, Europe, Finland). A more detailed listing of countries might help users find data, but more time consuming.
•Spatial. For datasets containing information at the subnational level, should the names of the subnational areas be given?