Quality and Reliability of CRIS data

A case for euroCRIS?

euroCRIS Membership Meeting

November 1 – 2, 2007, Vienna

Maximilian Stempfhuber

GESIS–IZ Social Science Information Centre

Bonn, Germany

max.stempfhuber@gesis.org

What to expect

No Answers…

…only Questions!

Current situation

• Data within a single CRIS is not up-to-date or correct

• Data harvested from different sources does not match

• Coupling of systems and data is difficult because of different features, data structures / semantics, invalid references, …

• What else?

Data errors

• Single data source
  – Schema level
    • Value out of range
    • Referential integrity violated
    • …
  – Data level
    • Missing value
    • Typing errors
    • Wrong values
    • Duplicates
    • …
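Some of the single-source errors listed above can be detected by simple rules. The following is a minimal sketch, not a method taken from the talk: the record layout, the field names (`id`, `title`, `start_year`, `org_id`) and the year bounds are assumptions made for illustration. Typing errors and wrong values are deliberately left out, since they usually cannot be found without an external reference.

```python
from collections import Counter

# Hypothetical project records; field names and values are illustrative only.
projects = [
    {"id": "p1", "title": "Survey methods", "start_year": 2005, "org_id": "o1"},
    {"id": "p2", "title": "", "start_year": 2006, "org_id": "o1"},
    {"id": "p3", "title": "Panel study", "start_year": 3005, "org_id": "o1"},
    {"id": "p4", "title": "Data archive", "start_year": 2004, "org_id": "o9"},
    {"id": "p5", "title": "survey  methods", "start_year": 2005, "org_id": "o1"},
]
known_org_ids = {"o1", "o2"}  # assumed list of valid organisation identifiers


def normalize(text):
    """Lower-case and collapse whitespace so near-identical titles compare equal."""
    return " ".join(text.lower().split())


def find_errors(records, org_ids, min_year=1900, max_year=2100):
    """Return a sorted list of (record id, error label) pairs."""
    errors = []
    title_counts = Counter(normalize(r["title"]) for r in records if r["title"])
    for r in records:
        if not r["title"]:
            errors.append((r["id"], "missing value"))
        elif title_counts[normalize(r["title"])] > 1:
            errors.append((r["id"], "possible duplicate"))
        if not min_year <= r["start_year"] <= max_year:
            errors.append((r["id"], "value out of range"))
        if r["org_id"] not in org_ids:
            errors.append((r["id"], "referential integrity violated"))
    return sorted(errors)


errors = find_errors(projects, known_org_ids)
for record_id, label in errors:
    print(record_id, label)
```

In a real CRIS, the range and reference checks would normally be enforced by database constraints; the sketch only makes the error categories concrete.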

Data errors (cont.)

• Multiple data sources
  – Schema level
    • Structural heterogeneity
    • Semantic heterogeneity
    • …
  – Data level
    • Contradictory values
    • Different representations
    • Different level of aggregation
    • Duplicates
    • …
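To illustrate the difference between "different representations" and "contradictory values", here is a small sketch under strong assumptions: both sources already share a record key and use the same field names, which is exactly what heterogeneous sources often do not provide. The field names (`leader`, `end_year`) and the sample records are hypothetical.

```python
def normalize_name(name):
    """Map 'Last, First' and 'First Last' to one lower-case 'first last' form."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.lower().split())


def compare_sources(source_a, source_b):
    """List (key, field, value_a, value_b) for fields whose values contradict."""
    conflicts = []
    for key in sorted(source_a.keys() & source_b.keys()):
        rec_a, rec_b = source_a[key], source_b[key]
        for field in sorted(rec_a.keys() & rec_b.keys()):
            val_a, val_b = rec_a[field], rec_b[field]
            if field == "leader":
                # Different representations of the same name are not a conflict.
                val_a, val_b = normalize_name(val_a), normalize_name(val_b)
            if val_a != val_b:
                conflicts.append((key, field, rec_a[field], rec_b[field]))
    return conflicts


# Hypothetical records for the same project, harvested from two sources.
source_a = {"p1": {"leader": "Doe, Jane", "end_year": 2008}}
source_b = {"p1": {"leader": "Jane Doe", "end_year": 2009}}

conflicts = compare_sources(source_a, source_b)
print(conflicts)
```

The name difference disappears after normalization, while the differing end year remains as a genuine contradiction that someone has to resolve.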

Quality of data

• When is an error an error?

• Who decides what is correct?

• How can we correct existing errors?

• How can we prevent future errors?

• What is Quality?

• How can we guarantee it in a CRIS?

What is Quality?

• Degree to which a set of inherent characteristics fulfills requirements (ISO 9000)

• Conformance to requirements (Philip B. Crosby)

• "Fitness for use". Fitness is defined by the customer. (Joseph M. Juran)

• Quality has two dimensions: "must-be quality" and "attractive quality" (Noriaki Kano)

What is Quality?

A quality is a characteristic that a product or service must have. For example, products must be reliable, useable, and repairable. These are some of the characteristics that a good quality product must have. Similarly, service should be courteous, efficient, and effective. These are some of the characteristics that a good quality service must have. In short, a quality is a desirable characteristic.

What is Quality? (cont.)

However, not all qualities are equal. Some are more important than others. The most important qualities are the ones that customers want. These are the qualities that products and services must have.

What is Quality? (cont.)

So providing quality products and services is all about meeting customer requirements. It's all about meeting the needs and expectations of customers. So a quality product or service is one that meets the needs and expectations of customers.

http://www.praxiom.com/iso-definition.htm

What is Quality? (cont.)

The quality of a product or service refers to the perception of the degree to which the product or service meets the customer's expectations. Quality has no specific meaning unless related to a specific function and/or object. Quality is a perceptual, conditional and somewhat subjective attribute.

http://en.wikipedia.org/wiki/Quality

Information Quality

• IQ or data quality denotes the degree of relevance of information in relation to a specific context and information need.
  – Requirements may be user specific or very general
  – Total of all requirements towards information or information products ([information] process oriented view)
  – Information that is fit for use by information consumers (user oriented view)

Information Quality (cont.)

• Business oriented view:
  – Creating your own data and information: constructive information quality
  – Getting data and information from external sources: receptive information quality

http://www.b-i-t-wiki.de/index.php/Informationsqualit%C3%A4t

Criteria for IQ

• Intrinsic value: correctness, objectivity, trustability, reputation

• Information context: relevance, added value, timeliness, completeness, amount of information

• View of information: interpretability, comprehensibility, freedom from manipulation, integrity, freedom from conflicts

• Information access: access to the system, secure access

(Wang & Strong)

Criteria for IQ (cont.)

User-specific view:

• Degree of confidence in the correctness of the information

• Trustability of information on the basis of previous experiences

• Verifiability of information

• Precision of information

• Timeliness of information (Heinrich)

Criteria for IQ (cont.)

For electronic media:

• Internal quality: precision, objectivity, trustability

• Quality of access: accessibility, security

• Quality in context: meaning, added value, timeliness, completeness, information content

• Quality of display: interpretability, comprehensibility, compactness

• Quality of metadata (meta information): existence, adequacy

• Quality of structure: existence, adequacy, traceability

(Königer & Reithmayer)
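Two of the criteria that recur in these lists, completeness and timeliness, can be turned into simple measurable indicators. The sketch below is one possible operationalisation, not a definition from the cited authors: the required fields, the `last_updated` field and the one-year threshold are assumptions chosen for illustration.

```python
from datetime import date


def completeness(record, required_fields):
    """Share of required fields that hold a non-empty value (0.0 to 1.0)."""
    filled = sum(1 for f in required_fields if record.get(f) not in (None, ""))
    return filled / len(required_fields)


def is_timely(record, today, max_age_days=365):
    """True if the record was updated within the last max_age_days days."""
    return (today - record["last_updated"]).days <= max_age_days


# Hypothetical CRIS record; field names are illustrative only.
record = {
    "title": "Panel study",
    "abstract": "",
    "funder": "Example Fund",
    "leader": None,
    "last_updated": date(2006, 5, 1),
}
required = ["title", "abstract", "funder", "leader"]

score = completeness(record, required)
timely = is_timely(record, today=date(2007, 11, 1))
print(score, timely)
```

Criteria such as objectivity or trustability have no comparably simple indicator, which is part of why they need agreement among producers, providers and users.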

Quality and CRISs

• User’s view (determines categories for CRIS quality)

• Data producer’s view (initially creates information and (sometimes) has to maintain it)

• Data provider’s view (has to ensure information quality and quality of service)

Quality and CRISs (cont.)

• Roles: Data producers/researchers, CRIS/service providers, CRIS users

• IQ criteria: Precision, objectivity, trustability, timeliness, completeness, added value, accessibility, …

Does this go beyond the Code of Good Practice?

Who is responsible for which quality criteria (in which phase)?

User’s view

• Do we know the users’ information needs (records, statistics, …)?

• Do we know of canonical needs (to specify pre-structured queries)?

• Do we know how information should be displayed, how it should be browsable, …?

• Do we know how information is used at the user’s site (preferred formats, additional processing)?

CRIS provider’s view

• What scope and content should the CRIS have (= users’ information needs)?

• How can we guarantee completeness?

• How can we guarantee sustainability?

• How should quality criteria be defined for local use of a CRIS?

• And how for federated CRISs?

Data producer’s view

• What support do I have in entering data?

• Who helps me in maintaining it?

• Can I reuse the data I entered in other contexts?

Questions to euroCRIS

• Do we have
  – Generally accepted use cases?
  – A common set of information quality criteria (beyond what is supported by database mechanisms and the CERIF structure)?

• Do we need end-user testing?

• How can we establish IQ in the CRIS community?

• How can we share IQ with other actors?

Thank You!

Dr. Maximilian Stempfhuber

GESIS-IZ Social Science Information Centre

Lennéstr. 30, 53113 Bonn, Germany

max.stempfhuber@gesis.org