A Practical Approach Towards Quality Assessment of Spatial Data and how it can be automated
Antti Jakobsson, Matthew Beare, Jorma Marttinen, Earling Onstein, Lysandros Tsoulos, Frederique Williams
ICC Paris 2011


How to measure quality in geoinformation. Presentation at the International Cartographic Conference, Paris 2011. Based on ESDIN project.



What was ESDIN?

• A project partially funded by the eContentplus programme
• Started in September 2008 and ran for 30 months until March 2011
• Coordinated by EuroGeographics with 20 project partners:

Interactive Instruments
Bundesamt für Kartographie und Geodäsie
Lantmäteriet
National Technical University of Athens
IGN Belgium
Bundesamt für Eich- und Vermessungswesen
Universität Berlin
EDINA, University of Edinburgh
National Agency for Cadastre and Real Estate Publicity Romania
Helsinki University of Technology
IGN France
Kadaster
Kort & Matrikelstyrelsen
Geodan Software Development & Technology
1Spatial
The Finnish Geodetic Institute
National Land Survey of Finland
Institute of Geodesy, Cartography and Remote Sensing
Statens kartverk
EuroGeographics


The ESDIN messages

• Most of the quality assurance processes needed in SDIs can be automated, bringing significant savings to producers and improving quality for users, and we have demonstrated how this can be done
• Quality evaluation has to be done after every phase in an SDI: after the transformation process, edge-matching, generalization… but again these evaluations can be automated
• The use of common measures for INSPIRE Annex I, II and III themes is crucial so that usability can be evaluated
• Basic conformance levels need to be set for INSPIRE at the target level of detail if harmonized products for cross-border, pan-European or global use are required; the minimum for INSPIRE would be meeting logical consistency
• Quality results depend on the product specification, so if a transformation process changes the data, these results should be re-evaluated. For example, if road geometry is changed, the original positional accuracy result is no longer valid, and if levels of detail are not considered, completeness cannot be reported


[Figure: Data flow in SDIs. Producer, INSPIRE and user-community levels, each with its specification (data specification, INSPIRE specifications, user specification) and conformance levels: NSDI conformance levels for reference information, INSPIRE conformance levels (e.g. logical consistency), user-community conformance levels, and INSPIRE quality evaluation metadata requirements for reference data. Data and metadata pass through production/quality control, transformation, conformance testing and quality evaluation, with automation applied along the flow.]


Key successes of ESDIN in the context of quality

• Use of international and open standards
• A common understanding of what quality means with respect to the target specifications and user requirements, and of how to measure it
• Provision of these results in metadata
• Automation of the quality evaluation services


Benefits

Data providers:
• Early data error detection
• Faster product turnaround
• Reduced maintenance costs
• Consistent evaluation procedures

Data consumers:
• Better harmonisation
• Improved spatial analysis
• Confident decision making
• Data that is trusted and usable


ESDIN approach


ESDIN approach to quality



Quality spreadsheets: GEOGRAPHICAL NAMES

Data quality elements and sub-elements (spreadsheet columns):
• Completeness: commission, omission
• Logical consistency: conceptual consistency, domain consistency, format consistency, topological consistency
• Positional accuracy: absolute accuracy, relative accuracy, gridded data position accuracy
• Temporal accuracy: accuracy of a time measurement, temporal consistency, temporal validity
• Thematic accuracy: classification correctness, non-quantitative attribute correctness, quantitative attribute accuracy

Feature type and attributes (spreadsheet rows), with the DQ basic measures entered for each:
• NamedPlace: error rate (Id 7); error count (Id 10)
– inspireId: error count (Id 16)
– name (GeographicalName)
– geometry (GM_Object): CE95 (Id 45)
– type (NamedPlaceTypeValue): error rate (Id 67)
– localType (LocalisedCharacterString): error count (Id 16)
– RelatedSpatialObject (Identifier): error count (Id 16)
– leastDetailedViewingScale: error count (Id 16)
– mostDetailedViewingScale: error count (Id 16)
– beginLifespanVersion: error count (Id 16)
– endLifespanVersion: error count (Id 16)
• Attributes of data type GeographicalName:
– spelling (SpellingOfName)
– language, nativeness, nameStatus, sourceOfName: error count (Id 16)
– pronunciation (PronunciationOfName)
– grammaticalGender, grammaticalNumber: error count (Id 16)
• Attributes of data type SpellingOfName:
– text: error count (Id 19); error rate (Id 67)
– script, transliterationScheme: error count (Id 16)
• Attributes of data type PronunciationOfName:
– pronunciationSoundLink: error count (Id 19)


Sampling/Full Inspection

The cells of DQ basic measures are colour coded. The colours indicate the evaluation procedure:

• Attribute inspection by sampling according to the ISO 2859 series (yellow cell)
• Variable inspection by sampling according to ISO 3951-1 (green cell)
• Full inspection (orange cell)

Features and attributes: sampling (ISO 2859) | full inspection (automatic) | sampling (ISO 3951)

ISO 2859 sets out the principles for testing a sufficient number of items from the whole population by sampling. When the error ratios of data subsets are expressed as two integers, they can be combined into a dataset error rate by dividing the total number of errors by the total c…

If errors exist (error count > 0), the subset should be rejected and corrective action by the producer is needed. It is assumed that the number of errors found is quite small. The customer may be tempted to make those few corrections themselves. This i…
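The two paragraphs above can be illustrated with a short Python sketch. It assumes each subset's result is kept as two integers (error count, item count), sums them across subsets before dividing once, and flags every subset with an error count above zero for rejection. The class, function and tile names and the numbers are hypothetical, not taken from the ESDIN tools.

```python
from dataclasses import dataclass


@dataclass
class SubsetResult:
    """Inspection result for one data subset (e.g. a tile or map sheet)."""
    name: str
    error_count: int  # erroneous items found in the subset
    item_count: int   # items inspected in the subset


def dataset_error_rate(results: list[SubsetResult]) -> float:
    """Combine subset results into one dataset error rate.

    The integers are summed first and divided once; averaging the subset
    rates instead would give a small subset the same weight as a large one.
    """
    total_errors = sum(r.error_count for r in results)
    total_items = sum(r.item_count for r in results)
    if total_items == 0:
        raise ValueError("no items were inspected")
    return total_errors / total_items


def rejected_subsets(results: list[SubsetResult]) -> list[str]:
    """Subsets with error count > 0 are rejected and sent back to the producer."""
    return [r.name for r in results if r.error_count > 0]


if __name__ == "__main__":
    # Hypothetical inspection results for three tiles.
    results = [
        SubsetResult("tile_A", 0, 200),
        SubsetResult("tile_B", 3, 50),
        SubsetResult("tile_C", 1, 750),
    ]
    print(f"{dataset_error_rate(results):.4f}")  # 4 errors / 1000 items -> 0.0040
    print(rejected_subsets(results))             # ['tile_B', 'tile_C']
```

Keeping the raw counts, rather than only the rates, is what lets results from differently sized subsets be added up without bias.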

ISO 3951 variable sampling gives reliable results with small sample sizes. CE95/LE95 is close enough to the upper limit (U) of the standard at the AQL 4 level. ISO 3951 offers clear acceptance criteria based on the sample.
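CE95 appears in the quality spreadsheet as the positional accuracy measure for geometry (Id 45). A minimal Python sketch of computing it from check-point residuals follows, assuming unbiased, roughly circular normal errors, for which 95 % of errors fall within sqrt(-2 ln 0.05) ≈ 2.4477 per-axis standard deviations. It only compares the result with a hypothetical upper limit U; it does not reproduce the ISO 3951-1 acceptance procedure, whose sample sizes and acceptability constants come from the standard's tables. Whether this formula matches the exact definition of measure Id 45 is an assumption.

```python
import math

# Radius containing 95 % of a circular normal error distribution, in units of
# the per-axis standard deviation: sqrt(-2 ln 0.05), about 2.4477.
CE95_FACTOR = math.sqrt(-2 * math.log(0.05))


def ce95(dx: list[float], dy: list[float]) -> float:
    """CE95 from check-point residuals (dataset minus reference), in metres.

    Assumes unbiased, roughly circular normal errors: the per-axis variances
    are estimated from the residuals about zero and pooled into sigma_c.
    """
    if len(dx) != len(dy) or not dx:
        raise ValueError("dx and dy must be non-empty and of equal length")
    n = len(dx)
    var_x = sum(v * v for v in dx) / n
    var_y = sum(v * v for v in dy) / n
    sigma_c = math.sqrt((var_x + var_y) / 2)
    return CE95_FACTOR * sigma_c


if __name__ == "__main__":
    # Hypothetical residuals at four check points and a hypothetical limit U.
    dx = [0.5, -0.3, 0.2, -0.4]
    dy = [0.1, 0.4, -0.5, 0.2]
    upper_limit_m = 1.0
    value = ce95(dx, dy)
    print(f"CE95 = {value:.2f} m, conforms: {value <= upper_limit_m}")
    # prints: CE95 = 0.87 m, conforms: True
```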

Mandatory / Voidable / Optional attributes, according to INSPIRE Data Specifications v3

 


Testing plans



How to utilize the quality model

• The quality model will be transformed into a rule set and conformance levels
• The ELF specifications will include these for the NMCAs
• Automated tools will utilize the rules and conformance levels
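As a sketch of what transforming the quality model into a rule set and conformance levels could look like, the Python below encodes two rows of the Geographical Names quality spreadsheet as rules. The measure Ids (16 and 67) come from that spreadsheet; the checks, the limits, the feature dictionaries and the code-list subset are hypothetical and do not represent the ESDIN or ELF rule format.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative subset of NamedPlaceTypeValue codes, not the full code list.
NAMED_PLACE_TYPES = {"populatedPlace", "hydrography", "landform"}


@dataclass(frozen=True)
class Rule:
    feature_type: str
    attribute: str
    measure_id: int                # DQ basic measure Id from the quality spreadsheet
    kind: str                      # "error count" or "error rate"
    check: Callable[[dict], bool]  # returns True when a feature passes
    limit: float                   # conformance level: highest accepted value


# Hypothetical rule set; the checks and limits are examples only.
RULES = [
    Rule("NamedPlace", "inspireId", 16, "error count",
         lambda f: bool(f.get("inspireId")), limit=0),
    Rule("NamedPlace", "type", 67, "error rate",
         lambda f: f.get("type") in NAMED_PLACE_TYPES, limit=0.05),
]


def evaluate(rules: list[Rule], features: list[dict]) -> list[tuple]:
    """Apply every rule and compare its result with the conformance level."""
    report = []
    for rule in rules:
        subset = [f for f in features if f.get("featureType") == rule.feature_type]
        errors = sum(1 for f in subset if not rule.check(f))
        if rule.kind == "error count":
            value = errors
        else:
            value = errors / len(subset) if subset else 0.0
        report.append((rule.attribute, rule.measure_id, value, value <= rule.limit))
    return report


if __name__ == "__main__":
    features = [
        {"featureType": "NamedPlace", "inspireId": "GN.1", "type": "populatedPlace"},
        {"featureType": "NamedPlace", "inspireId": "", "type": "landform"},
        {"featureType": "NamedPlace", "inspireId": "GN.3", "type": "lake"},
        {"featureType": "NamedPlace", "inspireId": "GN.4", "type": "hydrography"},
    ]
    for row in evaluate(RULES, features):
        print(row)  # (attribute, measure Id, value, conforms)
```

Keeping rules as data rather than code paths means the same engine can run any theme's spreadsheet once it has been converted.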


Quality requirements/Conformance levels

• Use the quality measures to set the requirements
• Consider the nature of reality:
– Feature vagueness
– Change rates
– Themes
• Suggested guidance for positional accuracy
• Suggestion on setting the classification of conformance levels


Positional accuracy


Quality evaluation process

• Step 1: Apply the data quality measure to the data to be checked. The procedure for this is described in the ISO 19113/19114 standards
• Step 2: Report the score for each measure in a report form
• Step 3: Compare the result from step 2 with the defined conformance level
• In addition, two further steps can be taken:
– Step 4: Summarise the conformance results into one result for each data quality element
– Step 5: Summarise the results from step 4 into one overall dataset result
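The five steps can be sketched in Python as follows. Step 1 is shown as a simple error-rate measure, step 2 as a dictionary of reported scores, and step 3 as a comparison against two hypothetical class limits (A and B, with anything worse being C). For steps 4 and 5 the sketch assumes that a summary takes the worst class it contains; the grading example in this presentation is a more nuanced alternative. All element names, limits and scores are illustrative.

```python
# Hypothetical conformance levels per (element, measure):
# (upper limit for class A, upper limit for class B); above B is class C.
LEVELS = {
    ("completeness", "omission rate"): (0.01, 0.05),
    ("logical consistency", "domain error rate"): (0.0, 0.01),
    ("positional accuracy", "CE95 (m)"): (1.0, 2.5),
}


def error_rate(errors: int, items: int) -> float:
    """Step 1: apply a data quality measure (here a simple error rate)."""
    return errors / items


def classify(value: float, a_limit: float, b_limit: float) -> str:
    """Step 3: compare a reported score with its conformance levels."""
    if value <= a_limit:
        return "A"
    if value <= b_limit:
        return "B"
    return "C"


def summarise(scores: dict) -> tuple[dict, str]:
    """Steps 4 and 5, assuming a summary is its worst class ('C' > 'B' > 'A')."""
    per_element = {}
    for (element, measure), value in scores.items():
        per_element.setdefault(element, []).append(
            classify(value, *LEVELS[(element, measure)]))
    # max() on single letters picks the worst class, since 'C' > 'B' > 'A'.
    element_result = {e: max(classes) for e, classes in per_element.items()}
    return element_result, max(element_result.values())


if __name__ == "__main__":
    # Step 2: the report form, one score per measure (hypothetical values).
    scores = {
        ("completeness", "omission rate"): error_rate(12, 600),  # 0.02
        ("logical consistency", "domain error rate"): error_rate(0, 600),
        ("positional accuracy", "CE95 (m)"): 0.87,
    }
    print(summarise(scores))
```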


Grading data example (grade: data quality description)

• Excellent: only class A for all quality measures
• Very good: a majority of A's, but also some B's
• Good: a majority of B's, some A's, no C's
• Adequate: only a very few C's, the others B's or better
• Marginal: a majority of C's, but also some B's
• Not good: no measure reached class B (i.e. all measures in class C)
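The grading table leaves some terms open ("majority", "a very few") and does not cover every combination, so an automated version has to pin them down. The Python sketch below is one possible reading, not the ESDIN definition: "very few" is assumed to mean at most 20 % of the measures, a dataset with no C's and no A majority is graded Good even if it has no A's, and any other combination that includes C's falls to Marginal.

```python
from collections import Counter

# Hypothetical reading of "only a very few C's": at most 20 % of the measures.
VERY_FEW_C_SHARE = 0.2


def grade(classes: list[str]) -> str:
    """Map per-measure conformance classes ('A', 'B', 'C') to a dataset grade.

    Cases the grading table leaves open (for example a minority of C's that
    is more than "very few") fall through to "Marginal" as a conservative choice.
    """
    n = len(classes)
    if n == 0:
        raise ValueError("no quality measures given")
    count = Counter(classes)
    a, c = count["A"], count["C"]
    if a == n:
        return "Excellent"
    if c == n:
        return "Not good"
    if c == 0:
        # No C's: an A majority gives "Very good", anything else "Good".
        return "Very good" if a > n / 2 else "Good"
    if c / n <= VERY_FEW_C_SHARE:
        return "Adequate"
    return "Marginal"


if __name__ == "__main__":
    for example in (["A"] * 4, ["A", "A", "A", "B"], ["B", "B", "A"],
                    ["B"] * 9 + ["C"], ["C", "C", "B"], ["C", "C"]):
        print(example, "->", grade(example))
```

Making the thresholds explicit constants keeps the interpretation visible, so producers and users can agree on it before grades are published in metadata.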


ESDIN approach to quality


Where can you use quality web services?

• If you are a data provider for an SDI:
– For quality control during production (automated), here called conformance testing (this includes edge-matching and generalization)
– For quality evaluation after production (semi-automated)
• If you are the SDI co-ordinator or data custodian:
– For quality audits for process accreditation or data certification, doing conformance testing and/or quality evaluation
• If you are a customer or data user:
– To evaluate usability using metadata information


• Schema/CRS Transformation: transform data in local schemas to the common INSPIRE/ExM data specification framework.
• Data Integration & Conflation: integrate and conflate INSPIRE/ExM data into collaborative datasets.
• Edge Matching: ensure cross-border consistency across neighbouring data.
• Geometric Generalisation: reduce data complexity for effective use at alternate scales.
• Cartographic Enhancement: modify and enhance data for mapping purposes.

Transformation processes are key to data interoperability, harmonisation & re-use.

Quality-controlled automated processes enable consistent and responsive data delivery.

Maintain Data
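The message that quality evaluation has to follow every transformation phase can be sketched as a pipeline in which each stage is followed by its own quality check, and processing stops at the first failing stage. The two stages here are hypothetical stand-ins: a schema mapping from an invented local attribute name ("naam") and a naive vertex-thinning step, not a real generalisation algorithm such as Douglas-Peucker.

```python
Features = list[dict]


def schema_transform(features: Features) -> Features:
    # Hypothetical local schema: the name is stored under "naam".
    return [{"name": f.get("naam", ""), "geom": f["geom"]} for f in features]


def check_names(features: Features) -> list[str]:
    return [f"feature {i}: missing name"
            for i, f in enumerate(features) if not f["name"]]


def generalise(features: Features) -> Features:
    # Naive thinning: keep every second vertex plus the last one.
    out = []
    for f in features:
        g = f["geom"]
        if len(g) <= 2:
            thinned = list(g)
        elif len(g) % 2 == 1:
            thinned = g[::2]  # the last vertex already has an even index
        else:
            thinned = g[::2] + [g[-1]]
        out.append({**f, "geom": thinned})
    return out


def check_geometry(features: Features) -> list[str]:
    return [f"feature {i}: fewer than 2 vertices"
            for i, f in enumerate(features) if len(f["geom"]) < 2]


# Each stage is (name, transformation, quality check run on its output).
STAGES = [
    ("schema transformation", schema_transform, check_names),
    ("geometric generalisation", generalise, check_geometry),
]


def run_pipeline(features: Features, stages=STAGES):
    """Run each stage and its quality check; stop at the first failing stage."""
    for name, transform, check in stages:
        features = transform(features)
        errors = check(features)
        if errors:
            return features, name, errors
    return features, None, []


if __name__ == "__main__":
    data = [
        {"naam": "Tromsø", "geom": [(0, 0), (1, 0), (2, 1), (3, 1)]},
        {"naam": "", "geom": [(5, 5), (6, 6)]},
    ]
    _, failed_stage, errors = run_pipeline(data)
    print(failed_stage, errors)  # schema transformation ['feature 1: missing name']
```

Stopping at the failing stage keeps defects from being hidden or amplified by later steps such as edge matching or generalisation.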


[Figure: Continuous improvement workflow. Labels shown: Access ExM data; source data; ExM data; Evaluate data quality against conformance measures (agreed DQ rules, ExM quality model); quality metadata report; OK: approve compliant data, assured data; Not OK: data improvement report, improve data; continuous improvement.]
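The continuous improvement cycle on this slide can be sketched in Python as evaluate, report, improve and re-evaluate until the data are compliant. The code-list subset, the correction table and the round limit are hypothetical; in practice the improvement step is the producer's corrective action.

```python
# Illustrative subset of NamedPlaceTypeValue codes and hypothetical corrections.
ALLOWED_TYPES = {"populatedPlace", "hydrography", "landform"}
CORRECTIONS = {"lake": "hydrography"}


def evaluate(features):
    """Quality evaluation: return an improvement report (indices of bad features)."""
    return [i for i, f in enumerate(features) if f["type"] not in ALLOWED_TYPES]


def improve(features, report):
    """Data improvement: apply known corrections to the reported features."""
    fixed = [dict(f) for f in features]  # copy, so the input is not modified
    for i in report:
        fixed[i]["type"] = CORRECTIONS.get(fixed[i]["type"], fixed[i]["type"])
    return fixed


def assure(features, max_rounds=3):
    """Evaluate, improve and re-evaluate until no errors remain."""
    for round_no in range(1, max_rounds + 1):
        report = evaluate(features)
        if not report:
            return features, round_no  # approved as assured data
        features = improve(features, report)
    raise RuntimeError(f"data still not compliant after {max_rounds} evaluations")


if __name__ == "__main__":
    data = [{"type": "lake"}, {"type": "landform"}]
    assured, rounds = assure(data)
    print(assured, rounds)  # [{'type': 'hydrography'}, {'type': 'landform'}] 2
```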


[Figure: Architecture of the Quality Evaluation Service. Components shown: rulesets & templates database, object-oriented geospatial rules engine, collaborative web-based rule authoring, web services interface (SOAP, HTTP), Data Quality Evaluation Service, business rules, quality measures, and data for evaluation supplied as a geospatial data file or via a Web Feature Service.]

• Rule Builder: intuitive user interface to author, agree and manage DQ measures.
• DQ Client Application: accessible, easy-to-use, automatic Data Quality Evaluation Service.
• DQ Rules Engine: W3C Web Services interface using open standards to describe and execute geospatial rule evaluation.
• Rule Repository: data quality rules, derived from and guided by the quality model.


Conclusions

• It is important that INSPIRE provides a platform for data quality information: minimum data quality conformance levels are set, together with the ability to report other, user-community-specific conformance levels
• Quality evaluation metadata should be available for automated conformance testing
• Introducing a quality model that uses the same principles for all Annex I themes: we will suggest this as a guideline for INSPIRE implementation
• Introducing conformance levels that can be evaluated semi-automatically or automatically, based on ISO standards
• Automation of quality evaluation and conformance testing can be done for all transformation-related workflows, including schema transformation, generalization and edge matching
• Significant savings potential in quality reporting and data improvement


THANK YOU

More information: www.esdin.eu