Page 1: The Question of Quality

The Question of Quality

Week 9

Most of this presentation is based on the work of Marcos Gonçalves, as cited in the references.

Page 2: The Question of Quality

Goals for this class

• Consider quality in digital libraries
  – How do we define quality?
  – How do we measure quality?
  – How does quality control impact a user?

• The role of logging
  – Helpful information
  – Privacy issues

• The status of DL logging

Page 3: The Question of Quality

Understanding Quality in a DL

• Quality indicators: proposed descriptions of quantities or observable variables that may be related to quality
  – "Measures" is a stronger term; a measure requires validation
  – Gonçalves et al. analyze quality conditions and recommend specific quantities to use

• Dimensions of quality

• Proposed indicators

• Application to DL concerns

Page 4: The Question of Quality

Getting the data

• Where does the data come from?
  – Logging
  – Surveys
  – Focus groups

• Know what information is needed, then choose the method most likely to provide the data.
  – More about the sources of data after we see what we need to know.

Page 5: The Question of Quality

What are we looking for?

• Suppose we are concerned with the quality of the following components of a DL:
  – Digital objects
  – Metadata
  – Collection
  – Catalog
  – Repository
  – Services

• What characteristics do we want each of those to have?

Page 6: The Question of Quality

Dimensions of Quality

• Digital Object
  – Accessibility
  – Pertinence
  – Preservability
  – Relevance
  – Similarity
  – Significance
  – Timeliness

• Metadata Specification
  – Accuracy
  – Completeness
  – Conformance

• Collection
  – Completeness

• Catalog
  – Completeness
  – Consistency

• Repository
  – Completeness
  – Consistency

• Services
  – Composability
  – Efficiency
  – Effectiveness
  – Extensibility
  – Reusability
  – Reliability

Page 7: The Question of Quality

What information do we need - related to Digital Objects

• Accessibility
  – What collection?
  – # of structured streams
  – Rights management metadata
  – Communities to be served

• Pertinence
  – Context
  – Information content
  – Information need

Page 8: The Question of Quality

Information need - Digital Objects, continued

• Preservability
  – Fidelity (lossiness)
  – Migration cost
  – Digital object complexity
  – Stream formats

• Relevance
  – Feature frequency
  – Inverse document frequency
  – Document size
  – Document structure
  – Query size
  – Collection size
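The slide lists the quantities that relevance depends on, but it does not say how they combine. Below is a minimal sketch that assumes a classic TF-IDF weighting. This is a standard technique swapped in for illustration, not necessarily the formula Gonçalves et al. use. Feature frequency is normalized by document size, inverse document frequency uses the collection size, and the summed score is divided by query size. Document structure is not modeled, and the name `tf_idf_score` is hypothetical.

```python
import math
from collections import Counter


def tf_idf_score(query: list[str], doc: list[str], collection: list[list[str]]) -> float:
    """Hypothetical TF-IDF relevance score of `doc` for `query`.

    - feature frequency: count of the term in `doc`, divided by document size
    - inverse document frequency: log(collection size / # docs containing the term)
    - the summed weights are divided by the query size (distinct terms)
    """
    terms = set(query)
    if not terms or not doc:
        return 0.0
    counts = Counter(doc)
    n_docs = len(collection)
    score = 0.0
    for term in terms:
        df = sum(1 for d in collection if term in d)
        if df == 0:
            continue  # a term absent from the collection contributes nothing
        tf = counts[term] / len(doc)
        idf = math.log(n_docs / df)
        score += tf * idf
    return score / len(terms)


collection = [
    ["digital", "library", "quality"],
    ["library", "catalog"],
    ["quality", "metadata", "quality"],
]
# The third document mentions "quality" more often, so it scores higher.
print(tf_idf_score(["quality"], collection[2], collection)
      > tf_idf_score(["quality"], collection[0], collection))  # True
```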

Page 9: The Question of Quality

Information need - Digital Objects, continued

• Similarity
  – The same features as for relevance
  – Also: citation/link patterns

• Significance
  – Citation/link patterns

• Timeliness
  – Age
  – Time of latest citation
  – Collection freshness
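Significance is listed as depending only on citation/link patterns. Here is a minimal sketch that assumes significance can be approximated by the number of incoming citations. The count is normalized so that the most-cited object scores 1.0. Both the normalization and the function name `significance` are assumptions for illustration.

```python
from collections import Counter


def significance(citations: list[tuple[str, str]]) -> dict[str, float]:
    """Normalized in-link count for each cited digital object.

    `citations` holds (citing_id, cited_id) pairs. Each object's number of
    incoming citations is divided by the largest such number, so the most
    cited object scores 1.0. Objects that are never cited do not appear.
    """
    in_links = Counter(cited for _citing, cited in citations)
    top = max(in_links.values(), default=0)
    if top == 0:
        return {}
    return {oid: count / top for oid, count in in_links.items()}


links = [("etd1", "etd2"), ("etd3", "etd2"), ("etd1", "etd3")]
print(significance(links))  # {'etd2': 1.0, 'etd3': 0.5}
```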

Page 10: The Question of Quality

Information need - Metadata Specification

• Accuracy
  – Accurate attributes
  – # attributes in the record

• Completeness
  – Missing attributes
  – Schema size

• Conformance
  – Conformant attributes
  – Schema size
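Each metadata dimension pairs a count with a size, which suggests a ratio. Below is a minimal sketch under that assumption:

- accuracy = accurate attributes / attributes in the record
- completeness = 1 − missing attributes / schema size
- conformance = conformant attributes / schema size

Which attributes count as "accurate" is a human judgment, so it is passed in directly. The date-format rule in `is_conformant` is a hypothetical stand-in for a real schema's constraints.

```python
import re


def accuracy(record: dict[str, str], accurate: set[str]) -> float:
    """# accurate attributes / # attributes in the record."""
    if not record:
        return 0.0
    return len(accurate & set(record)) / len(record)


def completeness(record: dict[str, str], schema: set[str]) -> float:
    """1 - (# schema attributes missing or empty in the record) / schema size."""
    present = {name for name, value in record.items() if value}
    missing = schema - present
    return 1 - len(missing) / len(schema)


def is_conformant(name: str, value: str) -> bool:
    """Hypothetical conformance rule: non-empty, and dates must be YYYY-MM-DD."""
    if not value:
        return False
    if name == "date":
        return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None
    return True


def conformance(record: dict[str, str], schema: set[str]) -> float:
    """# conformant schema attributes / schema size."""
    ok = sum(1 for name, value in record.items()
             if name in schema and is_conformant(name, value))
    return ok / len(schema)


schema = {"title", "creator", "date", "subject"}
record = {"title": "Quality Model for Digital Libraries", "creator": "", "date": "2002-09-16"}
print(completeness(record, schema))  # 0.5: creator is empty, subject is missing
print(conformance(record, schema))   # 0.5: only title and date conform
```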

Page 11: The Question of Quality

Information - Collection and Catalog

• Completeness of the Collection
  – Collection size
  – Size of an “ideal” collection

• Completeness of the Catalog
  – # of digital objects with no metadata
    » Item-level metadata
  – Size of the collection

• Catalog Consistency
  – # of metadata specifications per digital object
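The slide names the counts but not the formulas. Here is a minimal sketch that assumes simple ratios:

- collection completeness = objects held / size of the "ideal" collection
- catalog completeness = 1 − objects with no metadata / collection size

The ideal collection is assumed to be known as a set of identifiers, which in practice is the hard part. The function names are hypothetical.

```python
def collection_completeness(collection: set[str], ideal: set[str]) -> float:
    """Share of the 'ideal' collection that the DL actually holds."""
    return len(collection & ideal) / len(ideal)


def catalog_completeness(collection: set[str], catalog: dict[str, list[str]]) -> float:
    """1 - (# digital objects with no metadata specification) / collection size."""
    undescribed = [oid for oid in collection if not catalog.get(oid)]
    return 1 - len(undescribed) / len(collection)


collection = {"etd1", "etd2", "etd3"}
ideal = {"etd1", "etd2", "etd3", "etd4"}
catalog = {"etd1": ["dublin-core"], "etd2": []}  # etd2 and etd3 lack metadata
print(collection_completeness(collection, ideal))  # 0.75
```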

Page 12: The Question of Quality

Information about the Repository

• Completeness
  – # of collections

• Consistency
  – # of collections
  – Catalog/collection match
    » How well do the catalogs match the collections?
    » Are the catalogs for all the collections at the same level of detail?
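A minimal sketch of the repository measures, under two assumptions:

- Completeness is the number of collections held divided by the number in an "ideal" repository.
- Consistency is the share of collections whose catalog describes exactly the objects in that collection.

The second question on the slide, about level of detail, is not modeled. The function names and the data layout are hypothetical.

```python
def repository_completeness(collections: set[str], ideal: set[str]) -> float:
    """# collections held / # collections in an 'ideal' repository."""
    return len(collections & ideal) / len(ideal)


def repository_consistency(repo: dict[str, tuple[set[str], set[str]]]) -> float:
    """Share of collections whose catalog describes exactly the collection's objects.

    `repo` maps a collection name to a pair:
    (object ids in the collection, object ids described by its catalog).
    """
    if not repo:
        return 0.0
    matching = sum(1 for objects, described in repo.values() if objects == described)
    return matching / len(repo)


repo = {
    "etds": ({"a", "b"}, {"a", "b"}),  # catalog matches the collection
    "images": ({"c"}, set()),          # catalog describes nothing
}
print(repository_consistency(repo))  # 0.5
```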

Page 13: The Question of Quality

Service Information Need

• Composability (ability to be combined to form new services)
  – Extensibility
  – Reusability

• Efficiency
  – Response time

• Effectiveness
  – Precision/recall (of search)
  – Classification
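Precision and recall have standard definitions. Precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The sketch below computes both for one query and times a search call to get its response time. The search function passed to `timed_search` is a hypothetical stand-in for a real DL search service.

```python
import time
from typing import Callable


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / # retrieved; recall = hits / # relevant."""
    found = set(retrieved)
    hits = len(found & relevant)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def timed_search(search: Callable[[str], list[str]], query: str) -> tuple[list[str], float]:
    """Run `search` once; return its results and the response time in seconds."""
    start = time.perf_counter()
    results = search(query)
    return results, time.perf_counter() - start


# 2 of the 4 retrieved documents are relevant; 2 of the 3 relevant ones were found.
print(precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d7"}))
```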

Page 14: The Question of Quality

Service Information, continued

• Extensibility
  – # extended services
  – # services in the DL
  – # lines of code per service manager

• Reusability
  – # reused services
  – # services in the DL
  – # lines of code per service manager

• Reliability
  – # service failures
  – # accesses
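Reliability pairs failures with accesses, and both counts can come from DL logs. Below is a minimal sketch that assumes reliability = 1 − failures / accesses for one service. `AccessLogEntry` is a hypothetical, stripped-down log record, not the XML log standard cited in the references. Extensibility and reusability could follow the same ratio pattern, but how the lines-of-code count enters is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class AccessLogEntry:
    """Hypothetical minimal log record: which service was used and whether it failed."""
    service: str
    failed: bool


def reliability(log: list[AccessLogEntry], service: str) -> float:
    """1 - (# service failures) / (# accesses) for one service."""
    accesses = [entry for entry in log if entry.service == service]
    if not accesses:
        raise ValueError(f"no accesses logged for service {service!r}")
    failures = sum(entry.failed for entry in accesses)
    return 1 - failures / len(accesses)


log = [
    AccessLogEntry("search", failed=False),
    AccessLogEntry("search", failed=True),
    AccessLogEntry("search", failed=False),
    AccessLogEntry("search", failed=False),
    AccessLogEntry("browse", failed=True),
]
print(reliability(log, "search"))  # 0.75
```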

Page 15: The Question of Quality

Making more concrete

• Each of the measures listed gives an idea of the information need

• Exactly what do we measure?

• How do we combine the numbers obtained to get a usable result?

• The following pages describe specific measures and formulas for combining them.

Page 16: The Question of Quality

Digital Object Accessibility

• Basic requirement
  – If a user cannot access the digital object (DO), there is little point in having it in the DL
  – Identified measures:
    » Collection, # structured streams, rights management metadata, communities
  – Said another way:
    » Is the DO present in a collection in the repository?
    » Is there a service that can retrieve and display its content?
    » Are the rights management rules open enough for this user to access it?

Page 17: The Question of Quality

Digital Object Accessibility - formally

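Define $do_x$ = a specific digital object, and $ac_y$ = a specific actor (user)

Accessibility:

$$
acc(do_x, ac_y) =
\begin{cases}
0, & \text{if there is no collection } C \text{ in the DL repository } R \text{ such that } do_x \in C \\[4pt]
\dfrac{\sum_{z \in \mathit{struct\_streams}(do_x)} r_z(ac_y)}{|\mathit{struct\_streams}(do_x)|}, & \text{otherwise}
\end{cases}
$$

where $r_z(ac_y)$ is a rights management rule defined as

$$
r_z(ac_y) =
\begin{cases}
1, & \text{if } z \text{ has no access constraints, or } z \text{ has access constraints and } ac_y \in cm_z \\
0, & \text{otherwise}
\end{cases}
$$

and $cm_z \in Soc(1)$ is a community that has the right to access $z$.

This measure covers only access rights. It does not deal with accessibility problems in actually accessing the streams.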
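The definition above translates almost line for line into code. Below is a minimal sketch under a few assumptions:

- Actor membership ($ac_y \in cm_z$) is modeled as the set of community names the actor belongs to.
- The repository is a list of collections, each a set of object identifiers.
- An object with no streams scores 0, since the formula is undefined in that case.

The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    """A structured stream z of a digital object."""
    name: str
    # Communities with the right to access z; None means z has no access constraints.
    allowed: Optional[set[str]] = None


@dataclass
class DigitalObject:
    oid: str
    streams: list[Stream]


def rights_rule(z: Stream, actor_communities: set[str]) -> int:
    """r_z(ac_y): 1 if z is unconstrained or ac_y is in a community allowed to access z."""
    if z.allowed is None:
        return 1
    return 1 if actor_communities & z.allowed else 0


def accessibility(obj: DigitalObject, actor_communities: set[str],
                  repository: list[set[str]]) -> float:
    """acc(do_x, ac_y): 0 if do_x is in no collection of the repository,
    otherwise the fraction of its structured streams that ac_y may access."""
    if not any(obj.oid in collection for collection in repository):
        return 0.0
    if not obj.streams:
        return 0.0  # assumption: treat an object with no streams as inaccessible
    return sum(rights_rule(z, actor_communities) for z in obj.streams) / len(obj.streams)


thesis = DigitalObject("etd1", [
    Stream("abstract"),            # unconstrained
    Stream("chapter1", {"vt"}),    # restricted to a hypothetical "vt" community
    Stream("chapter2", {"vt"}),
    Stream("appendix"),
])
repository = [{"etd1", "etd2"}]
print(accessibility(thesis, set(), repository))   # 0.5 for an outsider
print(accessibility(thesis, {"vt"}, repository))  # 1.0 for a community member
```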

Page 18: The Question of Quality

An illustration

• NDLTD is the Networked Digital Library of Theses and Dissertations
  – Some institutions require that all theses and dissertations be stored in this DL
  – The student chooses how visible to make the document.
    » Parts of the document may be visible while other parts are not
    » The document, or parts of it, may be visible only to a restricted community.

Page 19: The Question of Quality

Accessibility case

• $etd_x$ is a specific electronic thesis or dissertation (ETD) of interest

• $acc(etd_x, ac_y)$ is
  – 0 if $etd_x$ is not in the collection
  – otherwise $\dfrac{\sum_{z \in \mathit{struct\_streams}(etd_x)} r_z(ac_y)}{|\mathit{struct\_streams}(etd_x)|}$

• where $r_z(ac_y) = 1$
  – if stream $z$ of $etd_x$ is marked "worldwide access", or $z$ is marked "local institution only" and $ac_y \in C$, where $C$ is the community of identifiable members of the local institution

• and $r_z(ac_y) = 0$ otherwise
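For ETDs, the general rights rule reduces to two markings. Below is a minimal sketch that assumes each stream of an ETD carries one marking string. The boolean `is_local_member` stands in for $ac_y \in C$. The marking constants and function names are hypothetical.

```python
WORLDWIDE = "worldwide access"
LOCAL_ONLY = "local institution only"


def etd_rule(marking: str, is_local_member: bool) -> int:
    """r_z(ac_y) for one ETD stream, given that stream's access marking."""
    if marking == WORLDWIDE:
        return 1
    if marking == LOCAL_ONLY and is_local_member:
        return 1
    return 0


def etd_accessibility(in_collection: bool, markings: list[str], is_local_member: bool) -> float:
    """acc(etd_x, ac_y) computed from the per-stream markings of one ETD."""
    if not in_collection or not markings:
        return 0.0
    return sum(etd_rule(m, is_local_member) for m in markings) / len(markings)


mixed = [WORLDWIDE, LOCAL_ONLY, LOCAL_ONLY, WORLDWIDE]
print(etd_accessibility(True, mixed, is_local_member=False))  # 0.5
print(etd_accessibility(True, mixed, is_local_member=True))   # 1.0
```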

Page 20: The Question of Quality

With the numbers

• An example from Virginia Tech (VT)

• For authors whose names begin with A:
  – Unrestricted ETDs: 164
  – Restricted ETDs: 50
  – Mixed ETDs: 5

• Fraction unrestricted for each of the 5 mixed ETDs: 0.5, 0.5, 0.167, 0.1875, 0.6

• Overall measure of accessibility from outside VT (164 + 50 + 5 = 219 ETDs):

$$
\frac{164 \times 1 + 50 \times 0 + 0.5 + 0.5 + 0.167 + 0.1875 + 0.6}{219} \approx 0.76
$$
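The arithmetic above can be checked in a few lines. The counts and per-ETD fractions are copied from the slide; only the variable names are new.

```python
unrestricted = 164                               # each contributes 1
restricted = 50                                  # each contributes 0 from outside VT
mixed_fractions = [0.5, 0.5, 0.167, 0.1875, 0.6]  # one value per mixed ETD

total = unrestricted + restricted + len(mixed_fractions)  # 219 ETDs
acc = (unrestricted * 1 + restricted * 0 + sum(mixed_fractions)) / total
print(f"{acc:.2f}")  # 0.76
```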

Page 21: The Question of Quality

Solidifying Pertinence

• How do we measure something like pertinence?

• It is the relation between the information content of a digital object and the user's information need

• It depends on the user's situation: background, current context, etc.

Page 22: The Question of Quality

Pertinence

• $Inf(do_i)$ represents the information content of digital object $do_i$

• $IN(ac_j)$ is the information need of actor (user) $ac_j$

• $Context(ac_j, k)$ is the combined effect of the social factors that determine the pertinence of $do_i$ to $ac_j$ at time $k$

• Two communities of actors
  – Users, whose information needs we try to satisfy
  – External judges, who are responsible for judging the relevance of a document in response to a query
  – The two groups do not overlap

Page 23: The Question of Quality

Pertinence formula

• Pertinence is defined as

$$
Pertinence(do_i, ac_j, k): Inf(do_i) \times IN(ac_j) \times Context(ac_j, k) \rightarrow \{0, 1\}
$$

  – 1 if $Inf(do_i)$ is judged by $ac_j$ to be informative with regard to $IN(ac_j)$ in context $Context(ac_j, k)$
  – 0 otherwise

• This is a rather formal way of saying that the information is relevant if either the user or a qualified independent judge says it is

Page 24: The Question of Quality

Preservability

• Property of a digital object that describes its state relative to changes in hardware, software, and representation format standards, for example:
  – New recording technologies (replacement of VHS video tapes by DVDs)
  – New versions of software such as Word or Acrobat
  – New image standards such as JPEG 2000

Page 25: The Question of Quality

Digital preservation techniques

• Migration
  – Transform from one format to another
    » Ex. open the document in one format and save it in another, or do an automated transformation

• Emulation
  – Reproduce the effect of the environment originally used to display the material
    » Keep an old version of the software, or have new software that can read the old format

• Wrapping
  – Keep the original format, but add enough human-readable metadata that it can be decoded in the future
    » Note that the material is not directly usable

• Refreshing
  – Copy the stream of bits from one location to another
    » Particularly suitable for guarding against physical deterioration of the medium
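Refreshing is the most mechanical of the four techniques. Below is a minimal sketch that copies the bit stream with `shutil.copyfile`. As an added safeguard not mentioned on the slide, it also compares SHA-256 digests of the source and the copy to confirm the bits are identical. The function names `sha256_of` and `refresh` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh(src: Path, dst: Path) -> str:
    """Copy the bit stream from src to dst and verify that the copy is bit-identical."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected:
        raise OSError(f"refreshing {src} to {dst} failed the fixity check")
    return expected
```

Recording the returned digest alongside the object lets a later refresh detect whether the stored bits have already deteriorated.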

Page 26: The Question of Quality

References

• Gonçalves, M. A., Moreira, B. L., Fox, E. A., and Watson, L. T. “Quality Model for Digital Libraries.” To be published soon.

• Gonçalves, M. A., Luo, M., Ali, M. F., and Fox, E. A. “An XML Log Standard and Tool for Digital Library Logging Analysis.” In Research and Advanced Technology for Digital Libraries, 6th European Conference, ECDL 2002, Rome, Italy, September 16-18, 2002, Proceedings, eds. Maristella Agosti and Constantino Thanos, pp. 129-143.