LTER CONTROLLED VOCABULARY WORKSHOP MAY 26-27, 2011
OBJECTIVES
“Scientists seeking data should be able to efficiently and reliably locate LTER datasets through searching, browsing …”
Get feedback on general direction of working group activities
Resolve some specific issues
Decide on “Next Steps”
Products
  Comments to be acted on
  White paper concerning specific issues and “next steps”
Time Activity
9:00 AM Introductions, Review of Agenda
9:15 AM Introduction to the LTER Controlled Vocabulary – Past and Future
10:00 AM Break
10:15 AM Discussion: Locating LTER Data – around-the-room experiences
  What are your experiences with finding LTER data?
  What would be most helpful in finding data in the future?
  Review of “use cases”
11:15 AM Tour of draft LTER Controlled Vocabulary
12-noon Lunch
1:30 PM Feedback to entire group on things in the controlled vocabulary that need improvement
  Things to be removed
  Things to be added
  Things to be reorganized
2:30 PM Break
2:45 PM Discussion of specific issues
  Core areas
  Are related-terms needed, or is a hierarchy sufficient?
  Management of the vocabulary – role of researchers
3:00 PM Next Steps
  How do we engage larger LTER community?
  How much, and what sort of engagement is needed?
4:00 PM Adjourn
THE CHALLENGE
Eclectic use of terms for discovering LTER data makes it difficult to perform reliable or efficient searches
Often several terms for one concept
  One site uses CO2, another Carbon Dioxide, another Carbon-dioxide
  Carbon to Nitrogen Ratio, C:N, C:N Ratio, Carbon-to-nitrogen Ratio
No way to relate broader terms with narrower terms
  Searching on “Landscape Change” doesn’t find data sets related to “desertification” even though desertification is a kind of landscape change
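The "several terms for one concept" problem can be sketched as a small synonym ring that folds spelling variants onto one preferred term. This is a minimal illustration, not the working group's implementation: the variants come from the slide, but the choice of preferred form, the normalization rule, and the names `SYNONYM_RING` and `to_preferred` are assumptions.

```python
import re
from typing import Optional

# Hypothetical synonym ring. The variants are the ones listed on the slide;
# which form counts as "preferred" is an assumption for illustration.
SYNONYM_RING = {
    "carbon dioxide": ["CO2", "Carbon Dioxide", "Carbon-dioxide"],
    "carbon to nitrogen ratio": [
        "Carbon to Nitrogen Ratio",
        "C:N",
        "C:N Ratio",
        "Carbon-to-nitrogen Ratio",
    ],
}


def _key(term: str) -> str:
    """Fold case and treat hyphens and runs of whitespace as one space."""
    return re.sub(r"[\s\-]+", " ", term.strip().lower())


# Reverse index: normalized variant -> preferred term.
LOOKUP = {
    _key(variant): preferred
    for preferred, variants in SYNONYM_RING.items()
    for variant in variants
}


def to_preferred(term: str) -> Optional[str]:
    """Return the preferred term for a keyword, or None if it is not in the ring."""
    return LOOKUP.get(_key(term))
```

With this in place, `to_preferred("Carbon-dioxide")` and `to_preferred("co2")` resolve to the same preferred term, so a search on either spelling can reach the same data sets.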
2006 ANALYSIS OF LTER KEYWORDS
Source               Number of Terms   Number used at 5 or more sites   Most Frequently used
EML Keywords*        2,711             86                               LTER (1002), Temperature (701)
EML Titles           2,480             921                              And (768), Data (394), LTER (350)
DTOC Keywords*       2,774             103                              ARC (1645), Temperature (732)
Bibliography Titles  13,538            1,855                            Of (12,611), Forest (2,050)
* Allows multi-word terms
Only 3.2%!
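The share of terms used at 5 or more sites can be recomputed from the counts in the table. The sketch below does that arithmetic; reading the "Only 3.2%!" callout as the EML Keywords row (86 of 2,711) is an assumption, although the numbers agree.

```python
# (source, number of terms, number used at 5 or more sites), copied from the table.
ROWS = [
    ("EML Keywords", 2711, 86),
    ("EML Titles", 2480, 921),
    ("DTOC Keywords", 2774, 103),
    ("Bibliography Titles", 13538, 1855),
]


def shared_percent(total: int, shared: int) -> float:
    """Percent of all terms that are used at 5 or more sites, to one decimal."""
    return round(100.0 * shared / total, 1)


SHARED = {name: shared_percent(total, shared) for name, total, shared in ROWS}

for name, pct in SHARED.items():
    print(f"{name}: {pct}%")
```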
PAST
We started off by surveying what terms were already being used in a variety of LTER documents
Our goal was to see if there were any existing lexical resources that we could simply adopt
TEST OF LIST VS NBII THESAURUS - 2008
58% of LTER terms were not found in the NBII Thesaurus
Results suggested that we needed to develop our own resource
GOALS FOR DEVELOPMENT OF KEYWORD LIST
Identify a list of preferred terms that would be used by sites in creating metadata documents
Focus on LTER-wide searches
  Want to facilitate cross-site synthesis
  People searching LTER Metacat rather than individual sites are interested in relevant data from multiple sites
Want to hit the “sweet spot” for the number of terms
  Too many terms make keywording documents difficult, and result in searches that return too few datasets
  Too few terms make it hard to locate usably small numbers of datasets
STEPS TAKEN
Assembled list of words already in LTER Metadata (EML documents)
Selected using criteria:
  Keywords shared with GCMD and NBII, or
  Keywords used at more than one LTER site
Reviewed by Information Managers
  Removals and additions were suggested
  Edited based on voting
Created a Draft set of Taxonomies
  Included some additions and deletions
STRUCTURING THE CONTROLLED VOCABULARY
Goal: Improve Searching & Browsing
  Reliability (of all the suitable target documents, what percentage did you find)
  Efficiency (of the documents your search returned, what percentage were suitable)
A list alone is not sufficient to support browsing and sophisticated searching of data – more structure is needed
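The two measures above correspond to what information retrieval usually calls recall (reliability) and precision (efficiency). A minimal sketch of both, using made-up dataset identifiers; the function names and the example sets are assumptions for illustration.

```python
def reliability(suitable: set, returned: set) -> float:
    """Percent of all suitable documents that the search found (recall)."""
    if not suitable:
        return 0.0
    return 100.0 * len(suitable & returned) / len(suitable)


def efficiency(suitable: set, returned: set) -> float:
    """Percent of the returned documents that were suitable (precision)."""
    if not returned:
        return 0.0
    return 100.0 * len(suitable & returned) / len(returned)


# Hypothetical example: four suitable datasets exist, the search returns
# three documents, two of which are suitable.
SUITABLE = {"ds1", "ds2", "ds3", "ds4"}
RETURNED = {"ds1", "ds2", "ds9"}

print(f"reliability: {reliability(SUITABLE, RETURNED):.1f}%")
print(f"efficiency:  {efficiency(SUITABLE, RETURNED):.1f}%")
```

In this example the search misses half of the suitable datasets, and a third of what it returns is not suitable. A controlled vocabulary aims to raise both numbers at once.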
STRUCTURES
[Figure: List, Synonym Ring, Taxonomy, Thesaurus, Ontology, arranged along a “Complexity” axis]
Multiple taxonomies are a Polytaxonomy
TAXONOMIES – RULES OF THE ROAD
Relationships should be independent of context
  Must pass “Some-not-all test”
Each taxonomy should include only one type of entity (listed in Z39.19 section 6.3.2)
  Things and their physical parts (birds, trees, leaves)
  Materials (wood, nitrogen, sand)
  Activities or processes (acidification, production)
  Events or occurrences (germination, death)
  Properties or states of persons, things, materials or actions (age, speed, nitrogen content)
  Disciplines or subject fields (ecology, ornithology)
  Units of measurement (m, km, miles)
  Unique entities (LTER, HJ Andrews Forest)
You can get into trouble if you start “mixing and matching” things within a single taxonomy!
EXAMPLES
Good:
  Forests
    Boreal Forest
    Hardwood Forest
  Grassland
    Tallgrass Prairie
  Tundra
  OK – these are all the same type of entity – all are THINGS
Bad:
  Forests
  Fire
  Ecology
  Mixing THINGS and PROCESSES and DISCIPLINES
Good:
  Rodents
    Mice
    Rats
  OK – Is not dependent on context. Mice and rats are ALWAYS rodents
Bad:
  Desert Plants
    Cacti
    Grasses
  Problem: Context dependent, not all cacti or grasses are desert plants. Some occur in other systems. Fails “Some-not-all” test.
ACTIVITIES
The VOCAB Working Group has created a draft set of 10 taxonomies containing 713 terms
  Includes additional “broader” terms needed for grouping
  Includes synonyms (non-preferred terms)
Some terms originally in the list have been removed because they were perceived to be too ambiguous or context-sensitive to be useful for the purposes of searching or browsing
  E.g., “Aboveground”
Some “related” terms have also been identified
APPROVALS
In 2010 a request for information was forwarded to the LTER Executive Board:
“The Information Management Committee has studied how keywords are used at LTER sites, how LTER keywords relate to external lexicographical resources, and compiled a draft keyword [list]. We request guidance from the LTER Executive Board on how a controlled vocabulary might be implemented within the context of LTER to improve the reliability of data searches.”
The EB generally endorsed the idea of an LTER Controlled Vocabulary, and agreed to help have scientists participate in vetting the list and deciding on next steps (THIS WORKSHOP)
HOW LIST AND POLYTAXONOMY WILL BE USED
Permit use of a browse interface
Make searches more sophisticated
  See “Use case” for searching
  search includes synonyms plus narrower terms and/or related terms
Develop tools to help in adding keywords to LTER metadata documents
  Prototype versions of a couple are already available
  See Keywording “Use Case”
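A search that "includes synonyms plus narrower terms" can be sketched as query expansion over the polytaxonomy. The following is an illustration under stated assumptions, not the LTER search tool: the `NARROWER` and `SYNONYMS` fragments, the dataset identifiers, and the function names are all hypothetical, with only the landscape change and CO2 examples taken from the slides.

```python
# Hypothetical vocabulary fragments, for illustration only.
NARROWER = {"landscape change": ["desertification"]}
SYNONYMS = {"carbon dioxide": ["co2", "carbon-dioxide"]}

# Reverse index so that a non-preferred term resolves to its preferred term.
PREFERRED = {syn: pref for pref, syns in SYNONYMS.items() for syn in syns}


def expand_query(term: str) -> set:
    """Return the term plus all narrower terms (recursively) and their synonyms."""
    start = term.strip().lower()
    start = PREFERRED.get(start, start)
    expanded, pending = set(), [start]
    while pending:
        current = pending.pop()
        if current in expanded:  # guards against cycles in the hierarchy
            continue
        expanded.add(current)
        pending.extend(NARROWER.get(current, []))
    for preferred in list(expanded):
        expanded.update(SYNONYMS.get(preferred, []))
    return expanded


def search(term: str, datasets: dict) -> list:
    """Return ids of datasets that carry any keyword in the expanded query."""
    terms = expand_query(term)
    return sorted(
        ds_id
        for ds_id, keywords in datasets.items()
        if terms & {k.strip().lower() for k in keywords}
    )


# Hypothetical dataset keywords.
DATASETS = {
    "ds-1": ["Desertification"],
    "ds-2": ["CO2"],
    "ds-3": ["Temperature"],
}

print(search("Landscape Change", DATASETS))
print(search("Carbon Dioxide", DATASETS))
```

Here a search on “Landscape Change” reaches the dataset keyworded only with “Desertification”, which is the case the plain keyword list could not handle. Related terms could be added the same way with a third mapping.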
TASK 1: LOCATING DATA
What are your experiences with finding LTER data?
What would be most helpful in finding data in the future?
Review of “Use Cases”
TASK 2: REVIEWING THE LIST & TAXONOMY
Evaluate the utility of the draft polytaxonomy
  Is it better than the existing LTER Metacat interfaces?
Are there large changes that need to be made?
  Elimination of specific taxonomies?
  Creation of new taxonomies?
  Addition of related terms to make a thesaurus?
Are there small changes needed?
  Removal or replacement of terms
CHANGES: IMPLICATIONS FOR SITES
Improvement of existing documents
  Review existing keywords and change to preferred forms
  Note: even without doing this the synonym ring will help improve searching and browsing
Use preferred terms for new documents
  Ideally at least one term from each of the relevant taxonomies
Note: adding new terms to the list would require review of all existing documents to see whether the new terms should be applied to them, so term additions should be rare
Changes in taxonomies and term relationships do not require re-keywording of existing documents
GROUP FEEDBACK
Todd & Margaret
Focus on INTERFACE
  Ways to present the data
    Allow “query within result set”
    Intersect query sets
    Group options – by site, by time
    side by side comparisons
  Be able to find where different types of data intersect
    Can be very difficult due to missing data etc.
    Problem extends beyond query interface
Interface needs to be a higher priority – sooner rather than later
  Recommendation to IMC/NISAC/EB
GROUP FEEDBACK
Rodger and Kristin
Highest level of hierarchy
  Found some things to change or add
    “root production”, “belowground productivity”
  Were generally happy with overall organization
  Need system for adding new keywords – this is just a start
  Intrigued by theory and where we go from here
How does it matter what is in one place or another?
  Want to make sure things are well-organized…
  Data vs research question
  Does not matter where it is when adding to keyword list
Need to have “best practices” for adding keywords
  How will that affect sites?
  How many data sets have no preferred terms?
BEST PRACTICE
At least one word from list
At least one from at least 5 of the 10 taxonomies
Signature datasets should be flagged with “signature dataset” tag
Should include Core area(s)
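The best-practice rules above lend themselves to an automated check of a dataset's keywords. The sketch below reads the second rule as "terms drawn from at least 5 of the 10 taxonomies", which is an interpretation. The taxonomy names, the term-to-taxonomy mapping, and `check_keywords` are hypothetical, since the slides do not list the 10 taxonomies.

```python
# Hypothetical mapping from preferred term to the taxonomy that holds it.
# The taxonomy names are placeholders, not the working group's actual ones.
TERM_TAXONOMY = {
    "forests": "ecosystems",
    "temperature": "properties",
    "primary production": "processes",
    "ecology": "disciplines",
    "dendrometers": "methods",
    "populations": "organizational units",
}

SIGNATURE_TAG = "signature dataset"


def check_keywords(keywords, is_signature=False, min_taxonomies=5):
    """Return a list of best-practice problems; an empty list means none found."""
    normalized = [k.strip().lower() for k in keywords]
    preferred = [k for k in normalized if k in TERM_TAXONOMY]
    taxonomies = {TERM_TAXONOMY[k] for k in preferred}

    problems = []
    if not preferred:
        problems.append("no preferred term from the list")
    if len(taxonomies) < min_taxonomies:
        problems.append(
            f"terms cover {len(taxonomies)} taxonomies, need {min_taxonomies}"
        )
    if is_signature and SIGNATURE_TAG not in normalized:
        problems.append("signature dataset is missing the 'signature dataset' tag")
    return problems
```

A check like this could also answer the question raised in group feedback about how many existing data sets carry no preferred terms, by counting the datasets for which the first problem is reported.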
CORE AREAS
Core area - Problems with definitions
  Some datasets are either none, or all core areas
    Weather data
Change entities to core areas?
  People will want to look for this
  Would not have hierarchy?
    That would be OK – can have related terms
Could link to signature datasets
  Need “signature dataset” keyword – used to weight
  Or prioritize signature datasets for adding preferred terms
Treat as unique: Primary Production (core area)
Data can be applied to MANY core areas - won’t map, e.g. Climate
Try adding core area taxonomy and then add core areas and related terms?
May not be needed or appropriate – we are asking the data catalog to do too much – need catalog of research topics
“SIGNATURE DATA” CONSENSUS
Want to search for signature datasets at top level of the hierarchy
  Needs to be one click away
GROUP REPORTS
Julia and Don
Would be interesting to tally the number of hits for each keyword for each site
  Tally of number of datasets for each site
GIS should be preferred term
  Can mean Geographical Information Science
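The tallies suggested here are straightforward to compute once keywords have been extracted from the metadata. A minimal sketch, assuming records of the form (site, dataset id, keywords); the records themselves and the `tally` function are made up for illustration.

```python
from collections import Counter

# Hypothetical (site, dataset id, keywords) records; real input would be
# extracted from the sites' metadata documents.
RECORDS = [
    ("AND", "ds-1", ["forests", "temperature"]),
    ("AND", "ds-2", ["temperature"]),
    ("ARC", "ds-3", ["tundra", "temperature"]),
]


def tally(records):
    """Count datasets per (site, keyword) pair and datasets per site."""
    keyword_hits = Counter()
    datasets_per_site = Counter()
    for site, _dataset_id, keywords in records:
        datasets_per_site[site] += 1
        # set() so a keyword repeated within one dataset counts once
        for keyword in set(keywords):
            keyword_hits[(site, keyword)] += 1
    return keyword_hits, datasets_per_site


KEYWORD_HITS, DATASETS_PER_SITE = tally(RECORDS)
for (site, keyword), hits in sorted(KEYWORD_HITS.items()):
    print(f"{site}\t{keyword}\t{hits} of {DATASETS_PER_SITE[site]} datasets")
```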
GROUP REPORTS – JULIA & DON
STRUCTURAL CHANGES
Atmospheric processes cross listed under hydrologic properties
  Evapotranspiration should be above transpiration and evaporation
  Snow not under precipitation
Geographical Properties -> Spatial Properties
  Move imagery under that with satellite and photos under that – deprecate Landsat
Methods – field, spatial, lab, analytical subcategories
  Also cores, dendrometers etc. tools could go under this
Entities
  For detailed ones, tried to find other homes
  Diseases to disease and move under bio processes
Levels of organization for communities, populations, species
  Are these useful terms? How often used
Biomes instead of Ecosystems
TASK 3: SPECIFIC ISSUES
Core areas
  Do we need a special taxonomy for core areas?
Are related-terms needed, or is a polytaxonomy (hierarchy) sufficient?
Management of the vocabulary – role of researchers?
Preferred terms – are all really preferred?
  E.g., Permanent forest plots
TASK 4: NEXT STEPS
How do we engage larger LTER community?
  How much, and what sort of engagement is needed?
Requests we should make to the EB or IMC?
  Managing the controlled vocabulary
What technology development is needed, and who should pursue it?
PROPOSED MANAGEMENT PLAN
Anyone can propose adding, editing, deleting or moving terms within the hierarchy, with justification.
Proposals would be evaluated by the Controlled Vocabulary Working Group according to the following criteria:
  The proposed terms should provide clear utility for searching and browsing, and not introduce ambiguity
  The proposed terms should be suitable for inclusion (e.g., not locations or specific taxonomic identifiers)
  Proposed terms should not be redundant with existing term(s) already in the vocabulary
  Terms and their proposed places in taxonomies or thesauri should conform in form with NISO Z39.19-2005 and successor documents (e.g., sections 6.5.1, 8.3)
RECOMMENDATIONS
Best Practices for adding keywords
  Preferred terms (and preferred preferred terms)
Presentation to PIs
  Statistics on numbers of hits
Add workshop participants to VOCAB
Put in supplement proposal for development of search interface
  Write it up now – Shovel Ready!
  Like MALS – need to have all sites sign up with letters of endorsement