GeoViQua: Advances in data quality disclosing

Ivette Serral
Center of Research in Ecology and Forestry Applications (CREAF)
contact@geoviqua.org

www.geoviqua.org

Foz d’Iguaçú, 21-23 November 2012

QUAlity aware VIsualisation for the Global Earth Observation system of systems

It is an FP7 project (2011-2014) devoted to showing the quality information embedded in GEOSS data.

10 partners, 7 countries

The problem

GEOSS data is handled through the GEOSS Common Infrastructure (GCI)

• Is there quality information in the GCI?
  – There is some, in the form of ISO 19115 DQ elements and lineage
  – But... not enough
• The GCI does not follow a global model for quality

The GCI is shown and searchable on the GEO Portal

• The GEO Portal search and results
  – are not ranked by quality
  – quality indicators are not easily comparable
  – spatially distributed uncertainty is not included

Community View

Data Quality?

• Many researchers refer to the ‘famous five’ as the common criteria for evaluating spatial data quality:
  – lineage; completeness; consistency; positional accuracy; and attribute accuracy.
• The broad scientific acceptance of these common spatial quality elements does not make them sufficient for every “fitness-for-use” evaluation
  – user requirements can go far beyond the widely accepted ‘famous five’.
• We used semi-structured telephone and face-to-face interviews with a variety of geospatial data users and experts from a number of countries and application domains. More information at: http://www.geoviqua.org/Docs/SubmittedDeliverables/D2_1_GeoViQua.pdf

What about users?

• Users are exceedingly interested in good quality metadata records
  – And information that can help to assess fitness-for-use of the data
• Users find metadata records typically incomplete, with essential data omitted
  – The process of dataset discovery and selection is more difficult
• Users are also interested in ‘soft’ knowledge about data quality
  – Data providers’ comments on the overall quality of a dataset, known data errors, potential data usage
  – Peers’ reviews and recommendations (they contact their peers to obtain suggestions)
  – Dataset provenance, citation and licensing information
    • Citation is incomplete (lack of valid producer contact details), and licensing is often missing
    • Citation: users rely on data from producers with a good reputation
• Currently, some of these cannot be recorded in standard metadata
• Users need to easily and systematically compare metadata records
  – Side-by-side visualisation of all metadata elements would allow geospatial datasets to be compared more effectively,
    • especially when datasets are very similar and differences are hard to distinguish

Quality model is much more than positional accuracy

• There are many quantifiable aspects that can be recorded:
  – Consistency, completeness, positional, thematic and temporal accuracy…
• There are many qualitative aspects that are needed:
  – Lineage (traceability), scientific papers, user feedback, data usage…

GeoViQua Data model treats statistical uncertainties

<gmd:DQ_QuantitativeAttributeAccuracy>
  <gmd:result>
    <gmd:DQ_QuantitativeResult>
      <gmd:valueUnit>m</gmd:valueUnit>
      <gmd:value>
        <gco:Record>3.6</gco:Record>
      </gmd:value>
    </gmd:DQ_QuantitativeResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>

<gmd:DQ_QuantitativeAttributeAccuracy>
  <gmd:result>
    <gmd:DQ_QuantitativeResult>
      <gmd:valueType>
        <gco:RecordType xlink:href="http://www.uncertml.org/distributions/normal">
          Value of the vertical DEM accuracy
        </gco:RecordType>
      </gmd:valueType>
      <gmd:valueUnit>m</gmd:valueUnit>
      <gmd:value>
        <gco:Record>
          <un:NormalDistribution>
            <un:mean>1.2</un:mean>
            <un:variance>3.6</un:variance>
          </un:NormalDistribution>
        </gco:Record>
      </gmd:value>
    </gmd:DQ_QuantitativeResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>

Explicit recognition that errors acceptably fit a Normal distribution with mean 1.2
• An overall positive bias was observed
• (a feature that is difficult to convey by traditional means)
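As an illustration of what a client could do with the second encoding, the following minimal Python sketch reads the mean and variance from an `un:NormalDistribution` fragment and derives a 95% interval and the probability of a positive error. The namespace URI bound to the `un:` prefix, the function name `read_normal` and the chosen summary statistics are assumptions made for this example; they are not part of the GeoViQua model.

```python
import math
import xml.etree.ElementTree as ET
from statistics import NormalDist

# Assumption: namespace URI used here for the "un:" (UncertML) prefix.
UN_NS = "http://www.uncertml.org/2.0"

# Same values as in the slide: mean 1.2, variance 3.6 (unit: m).
FRAGMENT = f"""
<un:NormalDistribution xmlns:un="{UN_NS}">
  <un:mean>1.2</un:mean>
  <un:variance>3.6</un:variance>
</un:NormalDistribution>
"""


def read_normal(xml_text: str) -> NormalDist:
    """Build a NormalDist from the mean and variance in the fragment."""
    root = ET.fromstring(xml_text)
    ns = {"un": UN_NS}
    mean = float(root.findtext("un:mean", namespaces=ns))
    variance = float(root.findtext("un:variance", namespaces=ns))
    # NormalDist expects a standard deviation, not a variance.
    return NormalDist(mu=mean, sigma=math.sqrt(variance))


error_model = read_normal(FRAGMENT)
low = error_model.inv_cdf(0.025)         # lower bound of the central 95% interval
high = error_model.inv_cdf(0.975)        # upper bound of the central 95% interval
p_positive = 1.0 - error_model.cdf(0.0)  # probability that the error is positive

print(f"95% of errors expected between {low:.2f} m and {high:.2f} m")
print(f"probability of a positive error: {p_positive:.2f}")
```

A single scalar such as the 3.6 of the first encoding cannot support either calculation, which is the point the slide makes about the positive bias.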

Two models on data quality are needed

• Producer’s quality metadata
  – In the producer’s metadata records
  – Encoded in the classical ISO 19115/19139
  – Some extensions required
  – Stored in the current catalogues (GEOSS Clearinghouse, etc.)
• User’s quality metadata
  – In independent metadata repositories
  – Linked to producer’s metadata by id
  – Future component of the GCI?
  – Contains comments, “like it”, star ratings, etc.

Advances in quality models: GVQ - producer quality model

http://schemas.geoviqua.org/GVQ/3.1.0

1. Publications. Based on ISO 19115 CI_Citation and extended with ISO 690 elements. Added to a number of quality elements within the metadata document. An existing DQ_ or MD_ element is extended to allow a ‘referenceDoc’ to be added.

2. Discovered issues. Added discovered issue class (e.g., a problem which the producer has identified during generation of a dataset) to the DQ_DataQuality element.

3. Reference datasets used for evaluation. Added to the ‘dataEvaluation’ section of ISO 19157 to allow recording the reference dataset used to assess the quality indicator.

4. Traceability. Added a new ‘metaquality’ type to allow the lineage of a data quality assessment to be recorded, along with its representativity and coverage. This is a requirement of the QA4EO principles.

More information: Lucy Bastin, and a poster in this session room

Advances in quality models: GVQ - user quality model

class Feedback model (UML class diagram; classes and attributes as legible in the figure)

GVQ_FeedbackTarget
  + parent : GVQ_FeedbackTarget
  - target : string
  «XSDelement» + natureOfTarget : MD_ScopeCode

GVQ_FeedbackFocus
  + item : GVQ_FeedbackItem*

«abstract» GVQ_FeedbackFocusType

GVQ_ThematicFocus
  + title : string

GVQ_DatacentricFocus
  - layer : string
  + extent : EX_SpatialTemporalExtent
  - band : string

GVQ_DomainFocus
  - applicationDomainURN : string

GVQ_TagFocus
  + tags : string [0..*]

«abstract» GVQ_FeedbackItem
  «id» - identifier : string

GVQ_FeedbackGroup
  - timestamp : CI_Date
  - user : GVQ_UserInformation
  - roles : GVQ_UserRoleCodeEnum [1..*]

GVQ_UserInformation
  + user : CI_ResponsibleParty [0..1]
  + applicationDomain : string [0..*] {ordered}
  + expertiseLevel : int

GVQ_UserComment
  - comment : String
  - mime-type : String = text/plain

GVQ_Rating
  + ratingValue : int

GVQ_UsageReport
  + usagePurpose : GVQ_ReportAspectCode [0..*]
  + Citation : CI_Citation [0..1]
  + usageDescription : string
  «XSDelement» + alternativeDatasets : MD_Identifier [0..-1]

GVQ_ExternalFeedback
  - resourceURL : String
  - mime : String

GVQ_MetadataOverride
  + alternativeDataQualityEstimate : DQ_DataQuality

GVQ_GeoLabel

«enumeration» GVQ_ReportAspectCode
  Useage = Useage
  Problem = Problem
  FitnessForPurpose = Fitness for Purpose
  Alternatives = Alternatives

«enumeration» GVQ_UserRoleCodeEnum
  CommercialDataProducer = Commercial Data...
  ResearchEndUser = Research End-User
  NonResearchEndUser = Non-research En...
  ScientificDataProducer = Scientific Data...

Association roles legible in the figure: +items (1..*), +secondaryFoci (0..*), +supplementaryFoci (0..*), +primaryFocus (1)

Notes in the diagram:

The target reference identifies the "hard" discussion context. The most common case would be a data set or a sensor service. It unambiguously refers to a thing pre-existing in the domain of discourse - a user cannot freely create a feedback target.

The feedback focus is intended to qualify a "narrow" discussion context similar to a discussion thread. The "narrow" context is always within one "hard" context. The user may create (some types of) feedback focuses.

[The FB Focuses attributes are considered examples]

Together, target and focus constitute the subject of a given feedback item.

A reply points to some other feedback item, but they require IDs for implementation purposes anyway.

http://schemas.geoviqua.org/GVQ/3.1.0

ISO 19115 only provides MD_Usage to report how users apply the dataset in their activities. This is insufficient for GEOSS needs. GeoViQua has elaborated this model from scratch.

A user can submit a GVQ_FeedbackItem in the form of:
• A user comment.
• A rating mark.
• A usage report supported by a citation of a report.
• A link to external feedback (blog pages, Google Docs document, etc.).
• A metadata override that amends a producer metadata value.
• A quality label (GEO Label).

These items are related to a dataset through an identifier.

More information: Lucy Bastin, and a poster in this session room
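To make the idea of feedback items linked to a dataset by identifier more concrete, here is a minimal Python sketch. It is not the GVQ schema: the class and field names loosely follow the diagram, the `target` field is flattened onto each item for brevity, and the helpers `items_for` and `mean_rating` as well as all identifiers and sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class FeedbackItem:
    identifier: str  # id of the feedback item itself
    target: str      # identifier of the dataset the feedback refers to


@dataclass
class UserComment(FeedbackItem):
    comment: str = ""
    mime_type: str = "text/plain"


@dataclass
class Rating(FeedbackItem):
    rating_value: int = 0


@dataclass
class ExternalFeedback(FeedbackItem):
    resource_url: str = ""
    mime: str = ""


def items_for(dataset_id: str, items: List[FeedbackItem]) -> List[FeedbackItem]:
    """Return every feedback item attached to the given dataset identifier."""
    return [item for item in items if item.target == dataset_id]


def mean_rating(dataset_id: str, items: List[FeedbackItem]) -> Optional[float]:
    """Average the rating marks of one dataset; None if it has no ratings."""
    values = [item.rating_value
              for item in items_for(dataset_id, items)
              if isinstance(item, Rating)]
    return mean(values) if values else None


# Hypothetical feedback store; identifiers and values are invented.
feedback = [
    UserComment("fb-1", "dataset-A", comment="Known gap in the 2009 coverage."),
    Rating("fb-2", "dataset-A", rating_value=4),
    Rating("fb-3", "dataset-A", rating_value=2),
    Rating("fb-4", "dataset-B", rating_value=5),
]

print(len(items_for("dataset-A", feedback)), mean_rating("dataset-A", feedback))
```

Because the link is only an identifier, the feedback store can live in a repository that is independent of the producer catalogue, as the slides propose.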

• The GeoViQua Quality Model is explained in the GEOSS Best Practice Twiki: http://wiki.ieee-earth.org/GEOSS_Tutorials
• It has been presented in the AIP5 session and it is a contribution to the GEOSS Standards and Interoperability Forum (SIF).

More information: Anna Riverola

Advances in visualizing metadata quality information

GeoViQua has developed the Q-Rubric tool, an extension of the former NOAA version

• An XSLT tool that converts XML metadata files into an HTML scoring page.

• Analyses every ISO quality metadata element and rates it by presence/absence (one point when the metadata exists, no penalty when it is missing).

• Helps users to evaluate how many metadata elements related to data quality are provided.

• Adds two new information groups related to ISO quality: Quality and Usage.

• GEOSS representation style has been applied to the original Rubric tables.
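The presence/absence scoring described above can be sketched in a few lines of Python. This is only an illustration of the scoring idea, not the Q-Rubric itself (which is an XSLT stylesheet): the selection of elements in `CHECKS`, the function name `rubric_scores` and the sample record are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd"}

# Assumption: a small, illustrative subset of ISO 19139 elements to look for.
CHECKS = {
    "Positional accuracy": ".//gmd:DQ_AbsoluteExternalPositionalAccuracy",
    "Completeness": ".//gmd:DQ_CompletenessOmission",
    "Lineage": ".//gmd:LI_Lineage",
    "Usage": ".//gmd:MD_Usage",
}


def rubric_scores(xml_text: str) -> dict:
    """Give one point per element that is present, zero when it is absent."""
    root = ET.fromstring(xml_text)
    return {label: int(root.find(path, NS) is not None)
            for label, path in CHECKS.items()}


# Hypothetical, heavily truncated metadata record used only for the demo.
SAMPLE = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:report>
        <gmd:DQ_AbsoluteExternalPositionalAccuracy/>
      </gmd:report>
      <gmd:lineage>
        <gmd:LI_Lineage/>
      </gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>
"""

scores = rubric_scores(SAMPLE)
total = sum(scores.values())
print(scores, f"{total}/{len(CHECKS)}")
```

Missing elements simply contribute zero, which mirrors the "no penalty" rule: the total measures how much quality information is documented, not how good the data is.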

• Some results from the GCI:
  – 97203 metadata records held in the Clearinghouse; 96867 analysed
  – 14.79% do not define the mandatory topic category
  – 80.63% do not have any quality element (of any class)
  – Quality: positional accuracy is the most populated class, with 37.77% documented, followed by completeness (36.06%) and logical consistency (18.79%). Only 0.50% relates to thematic accuracy.
  – Lineage: 35.27% do not have any lineage sub-element defined.
  – Usage: 0.60% of elements documented.
• Conclusions:
  – Metadata providers do not comply with the ISO Core Mandatory elements. Many topic categories reach only 75% completeness.
  – This impacts metadata search engines for data discovery requests.

Download it: http://www.geoviqua.org/docs/isoRubricQHTML.xsl

More information: Alaitz Zabala

Advances in visualizing quality information I

Integrating UncertWeb project proposals: Use NetCDF-U

The Network Common Data Form (NetCDF) is one of the primary methods of self-documenting data storage and access in the international geosciences research and education community and beyond.

NetCDF-U Conventions are used to formally qualify the uncertainty information in geospatial data encoded in the netCDF-3 format, by means of concepts from the UncertML best practice of the UncertWeb project.

NetCDF-U Conventions are designed to be fully compatible with the netCDF Climate and Forecast Conventions, the de-facto standard for a large amount of data in the Fluid Earth Science community.

It is now a discussion paper in OGC.

• Much of the data involved in the GeoViQua scenarios is encoded in NetCDF.

• An open source file format.

• Gives strength and freedom to encode metadata.

GeoViQua is developing tools for reading and writing NetCDF-U files and for importing from and exporting to other raster formats.

[Figure: NetCDF file opened with the NASA software Panoply]

[Figure: NetCDF file exported to IMG file and opened with the new tool]

More information: Victor Zaldo

Advances in visualizing quality information II

Integration of Quality Information with OGC Web Map Service: WMS-Q

• WMS 1.3.0 currently does not support the integration of quality information well.

• The current WMS does not define how a data layer can be semantically associated with its corresponding uncertainty layers.

• The WMS-Q specification is proposed to stay, as far as possible, within the bounds of the WMS 1.3.0 specification, requiring as few extensions as possible.

• To integrate dataset-level quality information into the WMS, we propose to slightly expand the “Type” attribute of the “MetadataURL” element to include “unstructured” and “other-structured” options.

• We propose to add a “description” element to the “MetadataURL” element.

• Pixel-level uncertainty information can be encoded using the NetCDF Uncertainty Conventions (NetCDF-U).

• Work tested in the OGC interoperability experiment OWS-9

More information: Jon Blower
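The proposed MetadataURL extension can be pictured with a short Python sketch that builds such an element. The `Format` and `OnlineResource` children follow WMS 1.3.0; the spelling and position of the `description` child, the helper name `metadata_url` and the example URL are assumptions, and the WMS namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)

# The two additional values proposed for the "type" attribute.
PROPOSED_TYPES = {"unstructured", "other-structured"}


def metadata_url(kind, mime, href, description=None):
    """Build a MetadataURL element, optionally with the proposed description."""
    element = ET.Element("MetadataURL", {"type": kind})
    ET.SubElement(element, "Format").text = mime
    ET.SubElement(element, "OnlineResource", {
        f"{{{XLINK}}}type": "simple",
        f"{{{XLINK}}}href": href,
    })
    if description is not None:
        # Assumption: the new element is a plain-text child named "description".
        ET.SubElement(element, "description").text = description
    return element


# Hypothetical quality report attached to a layer.
quality_link = metadata_url(
    kind="other-structured",
    mime="text/xml",
    href="http://example.org/quality/dem-report.xml",
    description="Dataset-level quality report for this layer",
)

print(ET.tostring(quality_link, encoding="unicode"))
```

A client that does not know the new values can still follow the `OnlineResource` link, which is consistent with the aim of requiring as few extensions as possible.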

Advances in visualizing quality information III

Preliminary results from experiments with colour coding:
• Quality should be intercomparable, i.e. the saturation should be intuitively comparable even across hues/categories. Perceptual colour models make this possible.
• Hue represents category, and saturation represents the "Purity for the parcel enrichment" (in percent) or the certainty.

[Figure: maps for the campaigns of 16.12.2006 and 22.03.07, annotated "Nearly uncertain in both campaigns" and "Gain in certainty"]

More information: Simon Thum
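The hue/saturation encoding can be sketched as follows. Note that this example uses the plain HSV model from the Python standard library as a stand-in; HSV is not a perceptual colour model, so it does not give the intercomparable saturation the slide asks for. The function name `quality_colour` and the even spacing of hues are assumptions made for illustration.

```python
import colorsys


def quality_colour(category: int, n_categories: int, certainty: float):
    """Map a category to a hue and a certainty in [0, 1] to a saturation.

    Returns an (R, G, B) tuple of integers in the range 0..255.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    hue = category / n_categories  # hues spread evenly around the colour wheel
    r, g, b = colorsys.hsv_to_rgb(hue, certainty, 1.0)
    return tuple(round(channel * 255) for channel in (r, g, b))


# Fully certain pixels get the pure category colour,
# fully uncertain pixels fade to white whatever their category.
certain = quality_colour(0, 3, 1.0)
uncertain_a = quality_colour(0, 3, 0.0)
uncertain_b = quality_colour(2, 3, 0.0)
print(certain, uncertain_a, uncertain_b)
```

Replacing `colorsys.hsv_to_rgb` with a conversion from a perceptual space would keep the same interface while making equal certainty values look equally saturated across categories.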

Advances in visualizing quality information IV

Creation of a “Carbon Atlas” portal

Combining the possibilities of web mapping with the comparison of models including uncertainty: combination of ncWMS (server) and OpenLayers (client):

1. Possibility to compare models with each other:

ncWMS: Web Map Service for geospatial data stored in CF-compliant NetCDF files (developed and maintained by the Reading e-Science Centre)

2. Creation of a comparison map (based on the IPCC’s visualization method): pixel colour = difference between models, patterns = % of models that agree.

The ncWMS server still needs the ability to associate a pattern with a raster.

More information: Pascal Evano
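One possible reading of the comparison map can be sketched per pixel in Python. The slide does not define "difference" or "agreement" precisely, so both definitions below are assumptions: the difference is taken as the spread between the highest and lowest model value, and the agreement as the share of models whose sign matches the sign of the ensemble mean. The function name and the sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def sign(value: float) -> int:
    """Return -1, 0 or 1 depending on the sign of the value."""
    return (value > 0) - (value < 0)


def compare_pixel(values: Sequence[float]) -> Tuple[float, float]:
    """Summarise one pixel across models.

    Returns (difference, agreement):
      difference: spread between the highest and lowest model value,
                  which would drive the pixel colour;
      agreement:  percentage of models sharing the sign of the ensemble
                  mean, which would drive the overlaid pattern.
    """
    difference = max(values) - min(values)
    ensemble_sign = sign(mean(values))
    agreeing = sum(1 for value in values if sign(value) == ensemble_sign)
    return difference, 100.0 * agreeing / len(values)


# Hypothetical values of four models at the same pixel.
difference, agreement = compare_pixel([1.0, 2.0, -0.5, 3.0])
print(difference, agreement)
```

Applying the function to every pixel yields two rasters, one for colour and one for the pattern, which is the pairing the ncWMS server would need to support.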

Advances in applied scenarios II

Uncertainty assessment for continuous and categorical variables

• Continuous variables: uncertainty of citizens’ meteorological data relative to the official Met Office data. More information: Dan Cornford

• Categorical variables: spatialized quality indicators derived from a satellite image classification, at global, local and pixel level. Several statistical classification methods are used. More information: Eva Sevillano

[Figure panels: Cat1, Cat2, Cat3; Classification probability of success (%); Fidelity]

Advances in including data quality in search

• Quality search integrated in the EuroGEOSS Discovery and Access Broker, to be applied to the GEO Portal.

• Retrieve quality information embedded in metadata

More information: Lorenzo Bigagli

Advances in labelling the quality: the GEO Label

• What is it?
  – The GEO Label is intended to “assist the user to assess the scientific relevance, quality, acceptance and societal needs of the components” (ST-09-02 Task Team, 2010).
• Purposes?
  – be a quality indicator for GEOSS geospatial data and datasets;
  – improve user recognition and trust in datasets that carry a GEO label;
  – assist in searching by providing users with visual clues of dataset quality and relevance; and
  – increase visibility of EO data.
• GEO label development:
  – The GeoViQua project is currently undertaking research to define and evaluate the concept of a GEO label.
  – The development is carried out in three phases:

[Figure: the three development phases, marked "Done!" and "In progress!"]

• Phase I Study:
  – Overall, GEO label questionnaire results show that users and producers agree on the benefits of introducing a GEO label, with no distinct difference between user and producer views.
  – The majority of respondents support an all-in-one drill-down interrogation facility as the key GEO label function.
• Phase II Study:
  – The GEO labels will be a graphical representation generated individually for each dataset in the GEOSS (or other data portals and clearinghouses) based on the quality information that is available for that dataset.
  – Second online questionnaire-based survey to identify the designs that convey quality information to users in the most efficient and comprehensible way.
• Currently:
  – At this stage we are analysing the GEO label study II results to fully define and establish a GEO label that meets the needs of the geodata user community.
  – Phase III: we will create physical prototypes which will be used in a human subject study.

More information: Victoria Lush

The future

• Many possibilities have been shown.
• Now the project enters a development phase in which the concepts presented and the prototypes need to be developed.
• Move the GeoViQua Quality Model towards broader adoption.
• Develop a user feedback system prototype.
• Test search and visualization developments in a GEO Portal replica (ESA contribution)
• Work with the Architecture GEO committees to move some of these contributions towards adoption in the GCI.

Thanks!

Ivette Serral
Center of Research in Ecology and Forestry Applications (CREAF)