Terminology in Statistical Information Integration Tasks: What’s the Problem? Open Forum 2003 on Metadata Registries Thursday, January 23, 2003 2:00-2:45 pm Sheila O. Denn

Page 1: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Terminology in Statistical Information Integration Tasks:

What’s the Problem?

Open Forum 2003 on Metadata Registries

Thursday, January 23, 2003, 2:00-2:45 pm

Sheila O. Denn

Page 2: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Introduction

Work undertaken as part of NSF grant (EIA 0131824) to study integration of data and interfaces to work toward a statistical knowledge network.

This talk focuses on results from first phase of metadata user study to determine what kinds of problems users have with terminology and metadata on government statistical web sites.

Page 3: Terminology in Statistical Information Integration Tasks: What’s the Problem?

[Figure labels: agency backend data (×4); agency intermediary: reports, tables, “planned” DB queries (×4); firewall; end user: generally passive reader, little interaction, must do all integration.]

Current Situation: each agency has its own backend data and provides its own intermediary. End user has little opportunity for interaction or active manipulation. Burden of finding information and integrating it across agencies (and occasionally within one agency) is on user.

Page 4: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Goal: In the statistical knowledge network (SKN), each agency has its own backend data, which feeds into a common public intermediary (PI) outside of the firewall. User interfaces link to the PI under user control.

[Figure labels: agency backend data (×4); firewall; public intermediary: variable/concept level, XML-based, single point of access to information from all agencies; Statistical Ontology; Domain Ontologies; Domain Experts; End User Communities; User Interfaces; end users: interact with data from information/concept perspective, not agency perspective.]

Page 5: Terminology in Statistical Information Integration Tasks: What’s the Problem?

What kinds of problems does terminology cause for users?

[Figure: diagrams of the three problem types (Miss, Collision, Categorization), each relating user terms to agency terms.]

Page 6: Terminology in Statistical Information Integration Tasks: What’s the Problem?

What kinds of problems does terminology cause for users?

Misses
  • There is no agency term or concept that is linked to a term or concept that the user is interested in, or
  • The user encounters a term on the system with which she is unfamiliar or about which she has only a vague understanding.
  • Examples:
      • Seasonal adjustment
      • Consumption vs. production
      • Farm profits vs. market value of agricultural products

Page 7: Terminology in Statistical Information Integration Tasks: What’s the Problem?

What kinds of problems does terminology cause for users?

Collisions
  • A user has an understanding of a concept that is different from the way the concept is expressed by the agency.
  • The same term is used differently by different agencies, making integration of data difficult.
  • Can also apply to clusters of terms where it is not clear what the distinction between them is.
  • Examples:
      • Labor, labor force, labor supply, workforce, labor force participation rate, labor market
      • Full-time employment
      • Sector

Categorization: when category groupings do not make sense to the user.
  • Example: Soybeans

Page 8: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Data Collection

In previous work:
  • Transaction logs
  • User queries
  • Interviews

In the first phase of the current study: interviews with agency and non-agency domain experts.

These sources of evidence yielded categories of terms that can cause difficulty.

Page 9: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Categories of Terms

  • Statistical terms
  • Date/currency/time
  • Geography
  • Domain terms
  • User terms

Page 10: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Implications for Vocabulary Support Tools

Goals:
  • Provide a basic level of statistical literacy
  • Not intended to be a highly technical or comprehensive resource
  • Include terms users frequently encounter while browsing statistical agency sites

Sources of Evidence:
  • Terminology used on frequently visited pages
  • Anecdotal evidence from agency and non-agency consultants
  • Metadata user study
  • Web crawl of agency sites

Page 11: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Implications for Vocabulary Support Tools

We need to explore how we can use metadata to map between the user terms and the agency terms, and between terms as used by different agencies.
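One way to picture such a mapping is a small lookup table keyed by the user's term, where each entry records which agency term it corresponds to. The sketch below is illustrative only: the agency names (`AgencyA`, `AgencyB`) and the specific pairings are hypothetical placeholders, not findings from the study, and the function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical mapping records: (user term, agency, agency term).
# Agency names and pairings are placeholders, not results from the study.
MAPPINGS = [
    ("workforce", "AgencyA", "labor force"),
    ("workforce", "AgencyB", "labor supply"),
]


def build_index(mappings):
    """Group mapping records by lower-cased user term."""
    index = defaultdict(dict)
    for user_term, agency, agency_term in mappings:
        index[user_term.lower()][agency] = agency_term
    return dict(index)


def agency_terms(index, user_term):
    """Return {agency: agency term} for a user term, or {} on a miss."""
    return index.get(user_term.strip().lower(), {})


index = build_index(MAPPINGS)
print(agency_terms(index, "Workforce"))
print(agency_terms(index, "soybeans"))  # empty dict: a "miss"
```

An empty result corresponds to a miss, and a result with different agency terms for the same user term is a hint that a collision may be present.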

Users are not likely to browse the glossary as a distinct activity, so they need “just-in-time” vocabulary support.

Vocabulary support should allow users to remain in context, not lose sight of the task they are working on.

Context specificity: explanations should be provided at varying levels of specificity.
  • General (context-free or “universal”)
  • Agency or context-specific (term as used by a particular agency or within a particular domain)
  • Table or statistic-specific (term as it relates to a particular row, column, or statistic)

Page 12: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Implications for Vocabulary Support Tools

Provide explanations of term or concept that are as relevant to the user’s current context as possible.

The most specific explanations available should be offered at the time a user first invokes help.

If there are no explanations appropriate for a specific statistic, row, or column, offer an explanation one level up in generality.

Pathways from specific to general will be based on a statistical ontology currently under development.

The ontology will also be used to provide patterns (templates) for definitions at each level of specificity.
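The "most specific first, then one level up" behavior can be sketched as a lookup that walks a fixed list of levels. This is a minimal sketch under stated assumptions: the talk says the real pathways will come from a statistical ontology still under development, so the fixed three-level order, the `EXPLANATIONS` store, the `explain` function, and the agency name are all hypothetical.

```python
# Levels ordered from most specific to most general (assumed fixed order;
# the talk says real pathways will be derived from an ontology).
LEVELS = ("statistic", "agency", "general")

# Hypothetical explanation store keyed by (term, level, context).
EXPLANATIONS = {
    ("index", "general", None):
        "An index combines numbers measuring different things into a single number.",
    ("index", "agency", "AgencyA"):
        "Placeholder: how AgencyA uses the term 'index'.",
}


def explain(term, statistic=None, agency=None, store=EXPLANATIONS):
    """Return (level, text) for the most specific explanation available."""
    contexts = {"statistic": statistic, "agency": agency, "general": None}
    for level in LEVELS:
        context = contexts[level]
        # Skip levels for which the caller supplied no context.
        if level != "general" and context is None:
            continue
        text = store.get((term.lower(), level, context))
        if text is not None:
            return level, text
    return None  # no explanation at any level


# No statistic-specific entry exists, so the lookup falls back one level.
print(explain("index", statistic="Table 1, row 3", agency="AgencyA"))
```

Returning the level alongside the text lets an interface tell the user how general the explanation is.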

Page 13: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Vocabulary Support Tool Examples

The tools we are working on will provide a basic level of explanation of statistical terms.

Tools may include:
  • Definitions
  • Examples
  • Brief tutorials
  • Demonstrations
  • Interactive simulations
  • Pointers to related terms/concepts
  • Pointers to more complete (or more technical) explanations

Page 14: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Index

An index combines numbers measuring different things into a single number. The single number represents all the different measures in a compact, easy-to-use form. Values for an index can be compared to each other, for example, over time.

[Figure: a “combiner” takes the input values 10.1, 103, 24.759, 6, and 42 and produces index = 12.3.]
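The slide does not say how the combiner arrives at its single number. As one common possibility, the sketch below uses a fixed-weight average of each measure relative to its base-period value, scaled so the base period equals 100. The function name, weights, and sample values are assumptions made for illustration and do not reproduce the figure's numbers.

```python
def fixed_weight_index(values, base_values, weights, base=100.0):
    """Combine several measures into one number.

    Each measure is expressed relative to its base-period value, and the
    relatives are averaged using fixed weights. This is one common form of
    index, assumed here for illustration.
    """
    if not (len(values) == len(base_values) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    relatives = (v / b for v, b in zip(values, base_values))
    return base * sum(w * r for w, r in zip(weights, relatives)) / total_weight


# Hypothetical data: two measures in different units, second weighted 3x.
print(fixed_weight_index([11.0, 210.0], [10.0, 200.0], [1.0, 3.0]))
```

Dividing by the base-period value first is what allows measures in different units to be combined.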

Page 15: Terminology in Statistical Information Integration Tasks: What’s the Problem?

[Figure: line chart of the index value produced by the combiner in each period (vertical axis 12 to 14.5): Jan 12.3, Apr 13.1, Jul 13.9, Oct 14.3.]

The index has increased this year.

Page 16: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Consumer Price Index (CPI)

The Consumer Price Index (CPI) represents changes in prices of all goods and services purchased for consumption by urban households. It combines prices into a single number that can be compared over time.

Items are classified into 8 major groups:
  • Food and Beverages
  • Housing
  • Apparel
  • Transportation
  • Medical Care
  • Recreation
  • Education and Communication
  • Other

Page 17: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Consumer Price Index

[Figure: the eight major groups (food & beverage, housing, apparel, transportation, medical care, recreation, education & communication, other) feed into the CPI combiner. “Telephone” is shown as an example item.]

Page 18: Terminology in Statistical Information Integration Tasks: What’s the Problem?

The Consumer Price Index has increased since 1995.

[Figure: a CPI combiner for each year from 1997 to 2001, with the resulting index values charted by year on a vertical axis running from 160 to 180.]

Page 19: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Antiknock Index, also known as Octane Rating

A number used to indicate gasoline’s antiknock performance in motor vehicle engines. The two recognized laboratory engine test methods for determining the antiknock rating, i.e., octane rating, of gasolines are the Research method and the Motor method. In the United States, to provide a single number as guidance to the consumer, the antiknock index (R+M)/2, which is the average of the Research and Motor octane numbers, was developed.

http://www.eia.doe.gov/glossary/glossary_a.htm
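The antiknock index is a particularly simple combiner, since it is just the average of two numbers. The sketch below implements the (R+M)/2 formula from the definition above; the function name and the sample octane numbers are illustrative assumptions.

```python
def antiknock_index(research_octane, motor_octane):
    """Average of the Research (R) and Motor (M) octane numbers: (R + M) / 2."""
    return (research_octane + motor_octane) / 2


# Illustrative inputs, not measurements of any particular fuel.
print(antiknock_index(91, 83))  # 87.0
```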

Page 20: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Antiknock Index, also known as Octane Rating

[Figure: the Research method (R) and Motor method (M) octane numbers feed the AntiknockCombiner, which outputs (R + M)/2.]

  • Regular: 85 - 88
  • Midrange: 88 - 90
  • Premium: 90 or above

Page 21: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Evaluation

What do we need to evaluate?
  • Technical accuracy
  • Usability of interface
  • “Effectiveness”
      • Is it attractive enough to entice people to use it?
      • Is it helpful?
      • Is it informative?
      • Does it help the user in completion of task?

How do we measure these things?

What other kinds of vocabulary support issues do we need to address?

Page 22: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Other Issues

  • Implementation
  • Ongoing maintenance/responsibility

Page 23: Terminology in Statistical Information Integration Tasks: What’s the Problem?

Project Teams

Metadata User Study Team
  • Carol Hert
  • Stephanie Haas
  • Jenny Fry
  • Lydia Harris
  • Sheila Denn

Vocabulary Support Team
  • Stephanie Haas
  • Ron Brown
  • Cristina Pattuelli
  • Jesse Wilbur

GovStat PIs
  • Gary Marchionini, UNC-CH
  • Stephanie Haas, UNC-CH
  • Carol Hert, Syracuse
  • Catherine Plaisant, UMd
  • Ben Shneiderman, UMd

Page 24: Terminology in Statistical Information Integration Tasks: What’s the Problem?

For More Information

Sheila O. Denn

School of Information and Library Science

University of North Carolina at Chapel Hill

[email protected]

http://ils.unc.edu/govstat/