Tefko Saracevic

EVALUATION in searching

Requirements & criteria

[email protected]; http://comminfo.rutgers.edu/~tefko/

Central ideas

Evaluation is an integral part of searching

But there are a number of:
o contexts & approaches to evaluation
o requirements for evaluation
o criteria used in evaluation

ToC

1. Importance, definitions
2. Contexts & approaches
3. Requirements for evaluation
4. Web evaluation

and some pretty pictures at the end

1. Importance, definitions

Place of evaluation

Definition of evaluation

Dictionary:
1. assessment of value: the act of considering or examining something in order to judge its value, quality, importance, extent, or condition

In searching:
o assessment of search results on the basis of given criteria as related to users and use
o criteria may be specified by users or derived from professional practice, other sources, or standards

Results are judged & with them the whole process, including searcher & searching

Importance of evaluation

Integral part of searching
o always there, wanted or not
   no matter what, the user will in some way or other evaluate what was obtained
o could be informal or formal

Growing problem for all
o information explosion makes finding “good” stuff very difficult

Formal evaluation is part of professional job & skills
o requires knowledge of evaluation criteria, measures, methods
o more & more prized

Place of evaluation

[Diagram: User, Inf. need, Evaluation, Search, Results]

General application

Evaluation (as discussed here) is applicable to results from a variety of information systems:
o information retrieval (IR) systems, e.g. Dialog, Scopus …
o sources included in digital libraries, e.g. Rutgers
o reference services, e.g. in libraries or commercial on the web
o web sources, e.g. as found on many domain sites

Many approaches, criteria, measures, methods are similar & can be adapted for a specific source or information system

2. Contexts & approaches

Broad orientation

Broad context

Evaluating the role that an information system plays as related to:

SOCIETY - community, culture, discipline ...

INSTITUTION - university, organization, company ...

INDIVIDUALS - users & potential users (nonusers)

Roles lead to broad but hard questions as to which CONTEXT to choose for evaluation

Questions asked in different contexts

Social:
o how well does an information system support social demands & roles?
   hardest to evaluate

Institutional:
o how well does it support institutional/organizational mission & objectives?
   tied to objectives of institution
   also hard to evaluate

Individual:
o how well does it support inf. needs & activities of people?
   most evaluations in this context

Approaches to evaluation

Many approaches exist
o quantitative, qualitative …
o effectiveness, efficiency ...
o each has strong & weak points

Systems approach prevalent
o Effectiveness: How well does a system perform that for which it was designed?
o Evaluation related to objective(s)
o Requires choices:
   Which objective, function to evaluate?

Approaches … (cont.)

Economics approach:
o Efficiency: at what costs?
o Effort, time also are costs
o Cost-effectiveness: cost for a given level of effectiveness

Ethnographic approach
o practices, effects within an organization, community
o learning & using practices & comparisons
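A minimal sketch of the cost-effectiveness idea, assuming one simple operationalization: total cost (direct charges plus searcher time converted to dollars) divided by the number of relevant items retrieved. The `SearchSession` class, the hourly rate, and all figures are hypothetical, for illustration only.

```python
from dataclasses import dataclass

@dataclass
class SearchSession:
    """Hypothetical record of one search: direct charges, searcher time, and results."""
    name: str
    charges: float           # direct charges in dollars (assumed)
    minutes: float           # searcher time spent
    hourly_rate: float       # assumed dollar value of one hour of searcher time
    relevant_retrieved: int  # items judged relevant

    def total_cost(self) -> float:
        # Effort and time are also costs: convert time to dollars and add the charges.
        return self.charges + (self.minutes / 60) * self.hourly_rate

    def cost_per_relevant(self) -> float:
        # Cost-effectiveness: cost per unit of effectiveness (here, one relevant item).
        if self.relevant_retrieved == 0:
            return float("inf")
        return self.total_cost() / self.relevant_retrieved


a = SearchSession("Source A", charges=30.0, minutes=30, hourly_rate=40.0, relevant_retrieved=10)
b = SearchSession("Source B", charges=0.0, minutes=90, hourly_rate=40.0, relevant_retrieved=15)
for s in (a, b):
    print(f"{s.name}: total ${s.total_cost():.2f}, ${s.cost_per_relevant():.2f} per relevant item")
```

In this invented example the source with no direct charges costs more in total ($60 vs. $50) once time is counted, yet less per relevant item ($4 vs. $5). Another unit of effectiveness, such as a satisfaction rating, could replace the relevant-item count.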

Prevalent approach

System approach used in many different ways & for many purposes, in evaluation of:
o inputs to system & contents
o operations of a system
o use of a system
o outputs from a system

Also, in evaluation of search outputs for given user(s) and use
o applied on the individual level
   derived from assessments from users or their surrogates, e.g. searchers
o this is what searchers do most often
o this is what you will apply in your projects

3. Requirements for evaluation

No evaluation without them

Five basic requirements for system evaluation

Once a context is selected, you need to specify ALL five:

1. Construct
o A system, process, source
   a given IR system, web site, digital library ...
   what are you going to evaluate?

2. Criteria
o to reflect objective(s) of searching
   e.g. relevance, utility, satisfaction, accuracy, completeness, time, costs …
   on the basis of what will you make judgments?

3. Measure(s)
o to reflect criteria in some quantity or quality
   precision, recall, various Likert scales, $$$ ...
   how are you going to express judgment?

Requirements … (cont.)

4. Measuring instrument
o recording by users or user surrogates (e.g. you) on the measure
   expressing if relevant or not, marking a scale, indicating cost
   people are instruments: who will it be?

5. Methodology
o procedures for collecting & analyzing data
   how are you going to get all this done?
   Assemble the stuff to evaluate (construct)? Choose what criteria? Determine what measures to use to reflect the criteria? Establish who will judge and how the judgment will be done? How will you analyze results? Verify validity and reliability?

Requirements … (cont.)

Ironclad rule: No evaluation can proceed if not ALL five of these are specified!

Sometimes the specifications of some are informal & implied, but they are always there!
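The ironclad rule can be expressed as a simple checklist. This sketch uses a hypothetical `EvaluationPlan` record with one field per requirement; the example plan is invented for illustration. Writing all five down explicitly, even when some would otherwise stay informal, makes a gap easy to spot.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class EvaluationPlan:
    """Hypothetical checklist: one field per basic requirement for system evaluation."""
    construct: Optional[str] = None    # 1. what are you going to evaluate?
    criteria: Optional[str] = None     # 2. on the basis of what will you make judgments?
    measures: Optional[str] = None     # 3. how are you going to express judgment?
    instrument: Optional[str] = None   # 4. who will record the judgments?
    methodology: Optional[str] = None  # 5. how are you going to get all this done?

    def missing(self) -> List[str]:
        # A requirement left as None or blank text counts as unspecified.
        return [f.name for f in fields(self) if not (getattr(self, f.name) or "").strip()]

    def can_proceed(self) -> bool:
        # Ironclad rule: no evaluation proceeds unless ALL five are specified.
        return not self.missing()


plan = EvaluationPlan(
    construct="results of one Scopus search on a user's question",
    criteria="relevance to the user's problem at hand",
    measures="precision; relevant / partially relevant / not relevant",
    instrument="the user judges each retrieved item",
)
print(plan.can_proceed(), plan.missing())
```

Here the plan stops at the methodology: nothing says how the items will be collected, judged, and analyzed, so the check reports it as missing.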

1. Constructs

In IR research: most done on test collections & test questions
o Text Retrieval Conference (TREC)
   evaluation of algorithms, interactions
   reported in research literature

In practice: on use & user level: mostly done on operational collections & systems, web sites
o e.g. Dialog, LexisNexis, various files
   evaluation, comparison of various contents, procedures, commands, user proficiencies, characteristics
   evaluation of interactions
   reported in professional literature

2. Criteria

In IR: Relevance is the basic & most used criterion
o related to the problem at hand

On user & use level: many others
o utility, satisfaction, success, time, value, impact, ...

Web sources
o those + quality, usability, penetration, accessibility ...

Digital libraries, web sites
o those + usability

2. Criteria - relevance

Relevance as criterion (as mentioned)
o strengths:
   intuitively understood, people know what it means
   universally applied in information systems
o weaknesses:
   not static, changes dynamically, thus hard to pin down
   tied to cognitive structure & situation of a user, so disagreements are possible

Relevance as area of study
   basic notion in information science
   many studies done about various aspects of relevance

A number of relevance types exist
o indication of different relations
   need to specify which ones

2. Criteria - usability

Increasingly used for web sites & digital libraries

General definition (ISO)
“extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use”

A number of criteria
o enhancing user performance
o ease of operations
o serving the intended purpose
o learnability: how easy to learn, memorize?
o lostness: how often did users get lost in using it?
o satisfaction
o and quite a few more

3. Measures

In IR: Precision & recall preferred (treated in unit 4)
o based on relevance
o relevance may be judged on two or more levels
   e.g. relevant–not relevant; relevant–partially relevant–not relevant

Problem with recall
o how to find what's relevant in a file?
   e.g. estimate; broad & narrow searching, or union of many outputs, then comparison

On use & user level
o Likert scales, semantic differentials
   e.g. satisfaction on a scale of 1 to x (1 = not satisfied, x = satisfied)
o observational measures, e.g. overlap, consistency
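A minimal sketch of the two IR measures, assuming a three-level relevance scale and a pooled estimate of recall: the union of several search outputs (here a narrow, a broad, and one other search) is judged, and the relevant items found in that pool stand in for all relevant items in the file. Function names, document IDs, and judgments are hypothetical; counting partially relevant items as relevant ("lenient") or not ("strict") is an assumed convention.

```python
from typing import Dict, List, Set

def relevant_items(judgments: Dict[str, str], lenient: bool = False) -> Set[str]:
    """Strict: only 'relevant' counts; lenient: 'partial' counts too (an assumed convention)."""
    accepted = {"relevant", "partial"} if lenient else {"relevant"}
    return {doc for doc, grade in judgments.items() if grade in accepted}

def precision(retrieved: List[str], relevant: Set[str]) -> float:
    """Proportion of retrieved items that are relevant."""
    hits = set(retrieved)
    return len(hits & relevant) / len(hits) if hits else 0.0

def estimated_recall(retrieved: List[str], relevant: Set[str], outputs: List[List[str]]) -> float:
    """Recall against the relevant items found in the pool (union) of several judged outputs.
    The true number of relevant items in the file is unknown, so this is only an estimate."""
    pool = set().union(*outputs)
    relevant_in_pool = pool & relevant
    if not relevant_in_pool:
        return 0.0
    return len(set(retrieved) & relevant_in_pool) / len(relevant_in_pool)


# Hypothetical outputs of three searches on the same question, and judgments on every pooled item.
narrow = ["d1", "d2", "d3", "d4"]
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
other = ["d2", "d11", "d12"]
outputs = [narrow, broad, other]
judgments = {
    "d1": "relevant", "d2": "relevant", "d3": "partial", "d4": "not", "d5": "relevant",
    "d6": "not", "d7": "partial", "d8": "not", "d9": "not", "d10": "not",
    "d11": "relevant", "d12": "not",
}

for name, run in (("narrow", narrow), ("broad", broad)):
    for lenient in (False, True):
        rel = relevant_items(judgments, lenient)
        p, r = precision(run, rel), estimated_recall(run, rel, outputs)
        print(f"{name:6} {'lenient' if lenient else 'strict':7} P={p:.2f} R={r:.2f}")
```

On these invented judgments (strict), the broad search gains recall (0.75 vs. 0.50) at the cost of precision (0.30 vs. 0.50). Relevant items that none of the pooled searches found are never counted, so a pooled recall figure can overstate true recall.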

4. Instruments

People used as instruments
o they judge relevance, scale ...

But which people?
o users, surrogates, analysts, domain experts, librarians ...

How do relevance, utility ... judges affect results?
o who knows?

Reliability of judgments:
o about 50 - 60% for experts
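Reliability of judgments is usually checked by having two or more people judge the same items and comparing their calls. A minimal sketch of two common comparisons, percentage agreement and overlap of the two relevant sets; the judges, document IDs, and calls are hypothetical.

```python
from typing import Dict

def percent_agreement(judge_a: Dict[str, bool], judge_b: Dict[str, bool]) -> float:
    """Share of items judged by both on which they made the same relevant/not-relevant call."""
    common = judge_a.keys() & judge_b.keys()
    if not common:
        return 0.0
    return sum(judge_a[d] == judge_b[d] for d in common) / len(common)

def overlap(judge_a: Dict[str, bool], judge_b: Dict[str, bool]) -> float:
    """Size of the intersection of the two relevant sets divided by the size of their union."""
    rel_a = {d for d, is_rel in judge_a.items() if is_rel}
    rel_b = {d for d, is_rel in judge_b.items() if is_rel}
    union = rel_a | rel_b
    return len(rel_a & rel_b) / len(union) if union else 1.0


# Hypothetical relevance calls by two judges on the same ten documents.
docs = [f"d{i}" for i in range(1, 11)]
calls_a = dict(zip(docs, [True, True, True, False, True, False, False, True, False, False]))
calls_b = dict(zip(docs, [True, False, True, True, True, False, False, False, False, True]))
print(f"agreement {percent_agreement(calls_a, calls_b):.2f}, overlap {overlap(calls_a, calls_b):.2f}")
```

On these invented calls the judges agree on 6 of 10 items (0.60), yet their relevant sets overlap by only 3 of 7 (about 0.43), because shared "not relevant" calls count toward agreement but not toward overlap. A reported reliability figure therefore needs to say which measure it used.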

5. Methods

Includes design, procedures for observations, experiments, analysis of results

Challenges:
o Validity? Reliability? Reality?
   Collection: selection? size?
   Request: generation?
   Searching: conduct?
   Results: obtaining? judging? feedback?
   Analysis: conduct? tools?
   Interpretation: warranted? generalizable?

4. Web evaluation

Criteria

Evaluation of web sources

Web is value neutral
o it has everything from diamonds to trash

Thus evaluation becomes imperative
o and a primary obligation & skill of professional searchers: you
o continues & expands on evaluation standards & skills in the library tradition

A number of criteria are used
o most derived from traditional criteria but modified for the web; others added
o can be found on many library sites
   librarians provide the public and colleagues with web evaluation tools and guidelines as part of their services

Criteria for evaluation of web & Dlib sources

What? Content
o What subject(s), topic(s) covered?
o Level? Depth? Exhaustivity? Specificity? Organization?
o Timeliness of content? Up-to-date? Revisions?
o Accuracy?

Why? Intention
o Purpose? Scope? Viewpoint?

For? Users, use
o Intended audience?
o What need satisfied?
o Use intended or possible?
o How appropriate?

criteria ...

Who done it? Authority
o Author(s), institution, company, publisher, creator:
   What authority? Reputation? Credibility? Trustworthiness? Refereeing?
   Persistence? Will it be around?
   Is it transparent who done it?

How? Treatment
o Content treatment:
   Readability? Style? Organization? Clarity?
o Physical treatment:
   Format? Layout? Legibility? Visualization?
o Usability

Where? Access
o How available? Accessible? Restrictions?
o Links: persistence, stability?

criteria ...

How? Functionality
o Searching, navigation, browsing?
o Feedback? Links?
o Output: Organization? Features? Variations? Control?

How much? Effort, economics
o Time, effort in learning it?
o Time, effort in using it?
o Price? Total costs? Cost-benefits?

In comparison to? Wider world
o Other similar sources?
   where & how may similar or better results be obtained?
   how do they compare?
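One way to apply these criteria consistently across sites is a weighted checklist. In this sketch the evaluator rates each criterion group from 1 (poor) to 5 (excellent); the group keys, the weights, and the ratings are hypothetical, and the weights would change with the user and the intended use.

```python
from typing import Dict, List, Tuple

# Criterion groups from the slides; the weights are hypothetical and would be
# set per user and use (here content and authority are assumed to matter most).
WEIGHTS: Dict[str, float] = {
    "content": 3, "intention": 1, "users_use": 2, "authority": 3, "treatment": 1,
    "access": 1, "functionality": 1, "effort": 1, "comparison": 1,
}

def assess(ratings: Dict[str, int], weights: Dict[str, float] = WEIGHTS,
           weak_below: int = 3) -> Tuple[float, List[str]]:
    """Return the weighted mean of 1-5 ratings and the criteria rated below `weak_below`."""
    missing = weights.keys() - ratings.keys()
    if missing:
        # Every criterion group must be judged before a score is reported.
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    score = sum(weights[c] * ratings[c] for c in weights) / sum(weights.values())
    weak = [c for c in weights if ratings[c] < weak_below]
    return round(score, 2), weak


site_ratings = {
    "content": 4, "intention": 3, "users_use": 4, "authority": 2, "treatment": 5,
    "access": 4, "functionality": 3, "effort": 4, "comparison": 3,
}
score, weak = assess(site_ratings)
print(score, weak)
```

With these example ratings the weighted score is 48/14, about 3.43, but the low authority rating is returned separately, since a single average can hide a weakness serious enough to rule a source out.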

[Figure] Main criteria for web site evaluation (Quality):
o Intention: purpose, scope, viewpoint …
o Authority: reputation, credibility, “About us” …
o Treatment: content, layout, visualization …
o Access: availability, persistence, links …
o Effort: in using it, in learning it, time, cost …
o Functionality: navigation, features, output …
o Users, use: audience, need, appropriateness …
o Content: coverage, accuracy, timeliness …

Evaluation: To what end?

To assess & then improve performance: MAIN POINT
o to change searches & search results for the better

To understand what went on
o what went right, what went wrong, what works, what doesn't, & then change

To communicate with the user
o explain & get feedback

To gather data for best practices
o conversely: eliminate or reduce bad ones

To keep your job
o even more: to advance

To get satisfaction from a job well done

Conclusions

Evaluation is a complex task
o but also an essential part of being an information professional

Traditional approaches & criteria still apply
o but new ones are added or adapted to satisfy new sources & new methods of access & use

Evaluation skills are in growing demand, particularly because the web is value neutral

Great professional skill to sell!

Evaluation perspectives - Rockwell

Evaluation perspectives

Evaluation perspective …

Possible rewards*

* but don’t bet on it!
