Information Systems for Ecological Research
John Porter, University of Virginia
Scalable Information Networks for the Environment - Oct. 30, 2001
WHY have Ecological Databases?
New Science
Long Term
• Long-term studies depend on databases to retain project history
Synthesis
• Use of data for a purpose other than the one for which it was collected
• Regional and global studies requiring data from a large array of sampling locations
Integrated, multidisciplinary projects
• Depend on databases to facilitate sharing of data
Scientists have been successfully conducting research for centuries without databases. We need to focus on information systems that will let us expand our scientific horizons and realize the full potential of our research, not just “business as usual.”
Roadmap
Who will be the users of Ecological Information Systems? What are their needs?
An Idealized Ecological Information Environment
System Needs for Development of an Idealized Information Environment
Users
Scientists
Policy Makers
Conservation and Development Organizations
Students
• Graduate
• Undergraduate
• K-12
Recreational Users
The Ecological Information Challenge
Can we make information available to ecologists:
• in ways they can locate the information they need?
• with information in forms they can readily use?
How can we assure that the information is current and accurate?
Example: Population Ecologist
Long-term time series of population size and composition (his or her own data)
Climatological data for the study site(s)
Comparative population data from other locations or species
Habitat change information
Predator community composition and abundance
Example: Ecosystem Modeler
Specific information on the area being modeled
• Climate
• Species composition & growth rates
• Soil character
Example: Global Change Researcher
Global-scale datasets
• Satellite-derived products
• GIS data layers
Integrated data products
• Comparable data from a large number of sites, worldwide
– From international monitoring programs
– From assembly of information sources collected at local scales
Example: Policy Maker
Environmental policy decisions require data that are regional or national
• World
• Regional
• National
• Local
Data need to be accessible and understandable by non-scientists
How can the needs of all these types of users be met?
What types of systems do we need?
Database Characteristics: “Deep” vs “Wide”
“Deep”
• Relatively few kinds of data
• Large numbers of observations
• Sophisticated query and analysis tools
“Wide”
• Many different types of data
• Smaller number of observations of each type
• Few analysis tools
Ways of Obtaining Needed Data
“Bring out yer Dead*”
• Extract aggregated data from the literature
• Make “educated guesses” about the content of poorly documented or fragmentary data
Examples
• Digitizing graphs in published papers to extract point values
• Piecing together documentation on the meaning of columns of a spreadsheet based on various publications that used the data
*With apologies to “Monty Python and the Holy Grail”
Ways of Obtaining Needed Data
“U-Haul”
• Obtain well-documented, but eclectic, data from information systems
• Analyze and process the data to obtain needed forms of data
Examples
• Get primary production data from 3 LTER sites, each in a different form. Write separate programs to read in each dataset and create an integrated version.
• Get specimen lists for a study area from 5 museums. Break into taxonomic groups and tally the number of species in each group.
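The first “U-Haul” example might look roughly like the Python sketch below. The three site labels, file layouts, column names, and units are invented for illustration and are not actual LTER formats. The point is only that each source needs its own small reader before the records can be merged into one common form.

```python
import csv
import io

# Hypothetical raw files from three sites, each in a different layout.
SITE_A = "year,anpp_g_m2\n1999,412.5\n2000,398.0\n"    # comma-separated, g/m^2
SITE_B = "YEAR\tNPP_KG_HA\n1999\t5210\n2000\t4980\n"   # tab-separated, kg/ha
SITE_C = "1999 301.2\n2000 322.7\n"                    # no header, space-separated, g/m^2


def read_site_a(text):
    """Comma-separated with a header; values already in g/m^2."""
    return [
        {"site": "A", "year": int(row["year"]), "npp_g_m2": float(row["anpp_g_m2"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def read_site_b(text):
    """Tab-separated; convert kg/ha to g/m^2 (10 kg/ha = 1 g/m^2)."""
    return [
        {"site": "B", "year": int(row["YEAR"]), "npp_g_m2": float(row["NPP_KG_HA"]) / 10.0}
        for row in csv.DictReader(io.StringIO(text), delimiter="\t")
    ]


def read_site_c(text):
    """No header; each non-empty line is '<year> <value>' in g/m^2."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        year, value = line.split()
        records.append({"site": "C", "year": int(year), "npp_g_m2": float(value)})
    return records


def integrate():
    """Run each site-specific reader and merge the results into one sorted list."""
    records = []
    for reader, text in ((read_site_a, SITE_A), (read_site_b, SITE_B), (read_site_c, SITE_C)):
        records.extend(reader(text))
    return sorted(records, key=lambda r: (r["year"], r["site"]))


if __name__ == "__main__":
    for record in integrate():
        print(record)
```

Every additional site means another hand-written reader, which is the cost the “U-Haul” label points at.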
Ways of Obtaining Needed Data
“Fast Food”
• Use pre-integrated data from an “integrated” or “value added” database
• Specify the data you need and the system provides it in the form you request
Examples
• A climatological graph comparing sites from the LTER ClimDB system
• KU-Species Analyst
• ORNL Primary Productivity CD
An Idealized Information Environment
[Diagram labels: Researchers; Individual datasets; Project or Site-Based Systems; National/Regional Systems; “Value-Added” or “Integrated” Infobases]
Examples and major system features
Individual Datasets
[Diagram labels: Metadata; Data]
Characteristics of Individual Datasets
Most not originally intended for integration
Esoteric
• Extremely diverse types of data
• High variability in topics, methods and contents
• Metadata of highly variable quality
Data in a wide variety of forms
• ASCII text
• Spreadsheets
• Databases
Tools Needed for Individual Datasets
Tools for Data Entry
• Relational databases, spreadsheets, text editors
Tools for Quality Assurance & Control
• Relational databases, statistical packages
Tools for Capturing Primary Metadata
• Methods
• Glitches
In some cases these needs can be met using a site or project-based system
Site & Project Systems
Characteristics of Site & Project Systems
“Wide” databases
• Wide variety of information types
• Relatively few analysis and query tools
• Data usually in (or near) original forms
Have metadata
• Forms and contents may vary between systems
– Structured (DBMS and structured text)
– Unstructured (variable, ASCII text)
• Usually provide browse and free-text search capabilities
Tools Needed for Site & Project Systems
Metadata Management Systems
• Structured metadata (either relational database, XML or other structured text)
Data Access Systems
• WWW servers, often linked to relational databases
Quality Assurance Systems
• Review, error-checking programs
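As a rough illustration of structured metadata, the sketch below stores two dataset descriptions as XML and runs a simple free-text search over them using Python's standard library. The element names (`dataset`, `title`, `keywords`, `methods`) and the records themselves are hypothetical and do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured metadata for two datasets held by a site system.
METADATA_XML = """
<datasets>
  <dataset id="ds001">
    <title>Small mammal trapping, 1990-2000</title>
    <keywords><keyword>populations</keyword><keyword>mammals</keyword></keywords>
    <methods>Live trapping on permanent grids twice per year.</methods>
  </dataset>
  <dataset id="ds002">
    <title>Daily air temperature and precipitation</title>
    <keywords><keyword>climate</keyword></keywords>
    <methods>Automated meteorological station.</methods>
  </dataset>
</datasets>
"""


def load_metadata(xml_text):
    """Parse the XML into a list of plain dictionaries, one per dataset."""
    root = ET.fromstring(xml_text)
    records = []
    for ds in root.findall("dataset"):
        records.append({
            "id": ds.get("id"),
            "title": ds.findtext("title", default=""),
            "keywords": [k.text or "" for k in ds.findall("keywords/keyword")],
            "methods": ds.findtext("methods", default=""),
        })
    return records


def search(records, term):
    """Case-insensitive free-text search over title, methods and keywords."""
    term = term.lower()
    hits = []
    for rec in records:
        haystack = " ".join([rec["title"], rec["methods"], *rec["keywords"]]).lower()
        if term in haystack:
            hits.append(rec["id"])
    return hits


if __name__ == "__main__":
    catalog = load_metadata(METADATA_XML)
    print(search(catalog, "climate"))
```

Because the metadata is structured, the same records could also be searched field by field (for example, keywords only) or loaded into a relational database.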
Regional & National
Characteristics of Regional & National
“Wide” databases
• Wide array of data types, few tools
• A few are “Deep” databases that focus on a single type of data (e.g. USGS map databases)
Metadata
• Often follows some standard
Some only provide links to data held by projects or individuals - National “Clearinghouses”
Tools Needed for Regional & National
(Similar to Project Systems)
Metadata Management Systems
• Structured metadata (either relational database or XML)
Data Access Systems
• WWW servers, often linked to relational databases
Quality Assurance Systems
• Review, error-checking programs
Integrated “Value Added” Systems
Example: KU-Species Analyst
Climate database integrates data from a number of sites
Characteristics: Integrated “Value Added” Systems
“Deep” databases
• Specialized query tools
• Deal with specific types of data
Can produce data in specialized forms
Draw data from one or more projects or national databases
Tools Needed: Integrated “Value Added” Systems
Tools to “harvest” the data from a variety of sources
Tools to integrate that data into a unified whole
• Often relational databases
Tools for query, output of specialized data products and graphics
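A minimal sketch of the harvest, integrate and query steps, assuming the harvested values have already been converted to a common unit. The site codes, table layout and temperatures are made up for illustration, and the design is not that of any existing system. It uses an in-memory SQLite database from Python's standard library.

```python
import sqlite3

# Hypothetical "harvested" monthly mean air temperatures per site:
# (year, month, mean temperature in degrees C).
HARVESTED = {
    "SITE1": [(2000, 1, -3.5), (2000, 7, 21.0)],
    "SITE2": [(2000, 1, 8.0), (2000, 7, 27.5)],
}


def build_database(harvested):
    """Integrate per-site records into one relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_temp ("
        " site TEXT, year INTEGER, month INTEGER, mean_c REAL,"
        " PRIMARY KEY (site, year, month))"
    )
    for site, rows in harvested.items():
        conn.executemany(
            "INSERT INTO monthly_temp VALUES (?, ?, ?, ?)",
            [(site, year, month, temp) for year, month, temp in rows],
        )
    conn.commit()
    return conn


def compare_sites(conn, year, month):
    """Return (site, mean_c) pairs for one month, ready for a table or graph."""
    cur = conn.execute(
        "SELECT site, mean_c FROM monthly_temp"
        " WHERE year = ? AND month = ? ORDER BY site",
        (year, month),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_database(HARVESTED)
    print(compare_sites(db, 2000, 7))
```

Once the data sit in one table, a cross-site comparison is a single query, which is what makes the “Fast Food” style of access possible.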
NEONs
What roles would a NEON site play?
[Diagram levels: Integrated; Regional; Site; Researchers]
What are the database requirements for each of the elements of the idealized information environment?
Key Elements Needed at Each Level
Site/Project
• Metadata in structured forms, preferably standards-based
National or Network
• Consistent keyword vocabularies
• Standards for metadata content
“Value Added”
• Domain expertise
• Need for structured metadata
• Standards for data products
Making Links Work
[Diagram levels: Integrated; National; Site; Researchers]
Controlled vocabularies: needed for identifying needed data
Spatio-temporal references
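One way to picture how a controlled vocabulary and spatio-temporal references help locate data is the Python sketch below. The vocabulary, the catalog entries and their coordinates are all hypothetical. The bounding-box test is deliberately simple and assumes no box crosses the 180° meridian.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
VOCABULARY = {
    "primary production": {"npp", "productivity", "primary productivity"},
    "climate": {"weather", "meteorology", "climatology"},
}


def normalize_keyword(keyword):
    """Map a free-form keyword to its preferred term, or None if unknown."""
    key = keyword.strip().lower()
    for preferred, synonyms in VOCABULARY.items():
        if key == preferred or key in synonyms:
            return preferred
    return None


@dataclass
class Dataset:
    name: str
    keywords: list
    west: float
    south: float
    east: float
    north: float
    start: date
    end: date


# Hypothetical catalog entries with spatial and temporal coverage.
CATALOG = [
    Dataset("Grassland NPP plots", ["Productivity"], -97.0, 38.5, -96.0, 39.5,
            date(1985, 1, 1), date(2000, 12, 31)),
    Dataset("Coastal weather station", ["weather"], -76.0, 37.0, -75.5, 37.6,
            date(1990, 1, 1), date(2001, 6, 30)),
    Dataset("Forest litterfall", ["NPP"], -72.5, 42.0, -72.0, 42.7,
            date(1970, 1, 1), date(1980, 12, 31)),
]


def find_datasets(datasets, keyword, bbox, start, end):
    """Return names of datasets matching the term and overlapping the box and dates."""
    term = normalize_keyword(keyword)
    if term is None:
        return []
    west, south, east, north = bbox
    hits = []
    for ds in datasets:
        terms = {normalize_keyword(k) for k in ds.keywords}
        in_space = (ds.west <= east and ds.east >= west
                    and ds.south <= north and ds.north >= south)
        in_time = ds.start <= end and ds.end >= start
        if term in terms and in_space and in_time:
            hits.append(ds.name)
    return hits


if __name__ == "__main__":
    print(find_datasets(CATALOG, "primary productivity", (-100, 30, -70, 45),
                        date(1995, 1, 1), date(1999, 12, 31)))
```

Without the shared vocabulary, a search for “primary productivity” would miss datasets tagged “NPP” or “Productivity”; without the coverage fields, it could not exclude data from the wrong place or period.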
Needed - Interfaces
[Diagram levels: Integrated; National; Site; Researchers]
Documented/standards-based interfaces: needed to transfer data and metadata
“Missing Pieces”
Domain expertise applied to developing “integrated” or “value added” systems
Methods for transferring attribution (credit) along with the data and metadata
Summary
There are diverse needs for ecological data
Meeting those needs will require a variety of interlinked information systems
• And the tools, technologies and standards that make the links functional
Roles
Development of a functional information infrastructure for ecology demands the involvement:
• of scientists with expertise in ecology & related disciplines who are willing to participate in system development
• of individuals with technical expertise who can work with those scientists
Example: LTER ClimDB
The LTER Climate Committee needed to develop the standards for database content and needed output forms in consultation with LTER Information Managers
IMs were then able to create a system that met those needs
Baker, K.B., B.J. Benson, D.L. Henshaw, D. Blodgett, J.H. Porter, and S.G. Stafford. 2000. Evolution of a Multisite Network Information System: The LTER Information Management Paradigm. BioScience 50(11):963-978.
Useful References
http://www.ecoinformatics.org
Baker, K.B., B.J. Benson, D.L. Henshaw, D. Blodgett, J.H. Porter, and S.G. Stafford. 2000. Evolution of a Multisite Network Information System: The LTER Information Management Paradigm. BioScience 50(11):963-978.
Michener, W.K., and J. Brunt. 2000. Ecological Data: Design, Management and Processing. Blackwell Science Ltd., London.
Olson, R.J., J.M. Briggs, J.H. Porter, G.R. Mah, and S.G. Stafford. 1999. Managing Data from Multiple Disciplines, Scales, and Sites to Support Synthesis and Modeling. Remote Sensing of Environment 70:99-107.