Tefko Saracevic
EVALUATION in searching
Requirements
Criteria
http://comminfo.rutgers.edu/~tefko/
Central ideas
Evaluation is an integral part of searching
But there are a number of:
o contexts & approaches to evaluation
o requirements for evaluation
o criteria used in evaluation
ToC
1. Importance, definitions
2. Contexts & approaches
3. Requirements for evaluation
4. Web evaluation
and some pretty pictures at the end
1. Importance, definitions
Place of evaluation
Definition of evaluation
Dictionary:
1. assessment of value
  the act of considering or examining something in order to judge its value, quality, importance, extent, or condition
In searching:
  assessment of search results on basis of given criteria as related to users and use
  criteria may be specified by users or derived from professional practice, other sources or standards
Results are judged & with them the whole process, including searcher & searching
Importance of evaluation
Integral part of searching
o always there - wanted or not
  no matter what, the user will in some way or other evaluate what was obtained
o could be informal or formal
Growing problem for all
o information explosion makes finding “good” stuff very difficult
Formal evaluation part of professional job & skills
o requires knowledge of evaluation criteria, measures, methods
o more & more prized
Place of evaluation
[Diagram labels: User, Inf. need, Search, Results, Evaluation]
General application
Evaluation (as discussed here) is applicable to results from a variety of information systems:
o information retrieval (IR) systems, e.g. Dialog, Scopus …
o sources included in digital libraries, e.g. Rutgers
o reference services, e.g. in libraries or commercial on the web
o web sources, e.g. as found on many domain sites
Many approaches, criteria, measures, methods are similar & can be adapted for specific source or information system
2. Contexts & approaches
Broad orientation
Broad context
Evaluating the role that an information system plays as related to:
SOCIETY - community, culture, discipline ...
INSTITUTION - university, organization, company ...
INDIVIDUALS - users & potential users (nonusers)
Roles lead to broad, but hard questions as to what CONTEXT to choose for evaluation
Questions asked in different contexts
Social:
o how well does an information system support social demands & roles?
  hardest to evaluate
Institutional:
o how well does it support institutional/organizational mission & objectives?
  tied to objectives of institution
  also hard to evaluate
Individual:
o how well does it support inf. needs & activities of people?
  most evaluations in this context
Approaches to evaluation
Many approaches exist
o quantitative, qualitative …
o effectiveness, efficiency ...
o each has strong & weak points
Systems approach prevalent
o Effectiveness: How well does a system perform that for which it was designed?
o Evaluation related to objective(s)
o Requires choices:
  Which objective, function to evaluate?
Approaches … (cont.)
Economics approach:
o Efficiency: at what costs?
o Effort, time also are costs
o Cost-effectiveness: cost for a given level of effectiveness
Ethnographic approach
o practices, effects within an organization, community
o learning & using practices & comparisons
Prevalent approach
System approach used in many different ways & for many purposes, in evaluation of:
o inputs to system & contents
o operations of a system
o use of a system
o outputs from a system
Also, in evaluation of search outputs for given user(s) and use
o applied on the individual level
  derived from assessments from users or their surrogates, e.g. searchers
o this is what searchers do most often
o this is what you will apply in your projects
3. Requirements for evaluation
No evaluation without them
Five basic requirements for system evaluation
Once a context is selected, need to specify ALL five:
1. Construct
o A system, process, source
  a given IR system, web site, digital library ...
  what are you going to evaluate?
2. Criteria
o to reflect objective(s) of searching
  e.g. relevance, utility, satisfaction, accuracy, completeness, time, costs …
  on basis of what will you make judgments?
3. Measure(s)
o to reflect criteria in some quantity or quality
  precision, recall, various Likert scales, $$$ ...
  how are you going to express judgment?
Requirements … (cont.)
4. Measuring instrument
o recording by users or user surrogates (e.g. you) on the measure
  expressing if relevant or not, marking a scale, indicating cost
  people are instruments - who will it be?
5. Methodology
o procedures for collecting & analyzing data
  how are you going to get all this done?
  Assemble the stuff to evaluate (construct)? Choose what criteria? Determine what measures to use to reflect the criteria? Establish who will judge and how the judgment will be done? How will you analyze results? Verify validity and reliability?
Requirements … (cont.)
Ironclad rule: No evaluation can proceed unless ALL five of these are specified!
Sometimes the specification of some is informal & implied, but they are always there!
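The ironclad rule can be sketched as a simple completeness check. This is only an illustration: the class name `EvaluationPlan`, its field names, and the example values are assumptions made here, not something taken from the slides.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class EvaluationPlan:
    """Hypothetical record of the five requirements for an evaluation."""
    construct: str = ""                                 # what is being evaluated
    criteria: List[str] = field(default_factory=list)   # basis for judgments
    measures: List[str] = field(default_factory=list)   # how judgments are expressed
    instrument: str = ""                                # who records the judgments
    methodology: str = ""                               # how data is collected & analyzed

    def missing(self) -> List[str]:
        """Return the names of requirements that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def can_proceed(self) -> bool:
        """True only when all five requirements are specified."""
        return not self.missing()


# Example values are invented for illustration.
plan = EvaluationPlan(
    construct="results of a search on an operational IR system",
    criteria=["relevance"],
    measures=["precision"],
    instrument="the searcher, acting as user surrogate",
)
print(plan.missing())      # ['methodology']
print(plan.can_proceed())  # False
```

An informal or implied specification would still count here as long as something is written down for it; the check only flags requirements that were left out entirely.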
1. Constructs
In IR research: most done on test collections & test questions
o Text Retrieval Conference - TREC
  evaluation of algorithms, interactions
  reported in research literature
In practice: on use & user level: mostly done on operational collections & systems, web sites
o e.g. Dialog, LexisNexis, various files
  evaluation, comparison of various contents, procedures, commands, user proficiencies, characteristics
  evaluation of interactions
  reported in professional literature
2. Criteria
In IR: Relevance basic & most used criterion
o related to the problem at hand
On user & use level: many others
o utility, satisfaction, success, time, value, impact, ...
Web sources
o those + quality, usability, penetration, accessibility ...
Digital libraries, web sites
o those + usability
2. Criteria - relevance
Relevance as criterion (as mentioned)
o strengths:
  intuitively understood, people know what it means
  universally applied in information systems
o weaknesses:
  not static - changes dynamically, thus hard to pin down
  tied to cognitive structure & situation of a user - possible disagreements
Relevance as area of study
  basic notion in information science
  many studies done about various aspects of relevance
Number of relevance types exist
o indication of different relations
  it has to be specified which ones
2. Criteria - usability
Increasingly used for web sites & digital libraries
General definition (ISO)
“extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use”
Number of criteria
o enhancing user performance
o ease of operations
o serving the intended purpose
o learnability - how easy to learn, memorize?
o lostness - how often got lost in using it?
o satisfaction
o and quite a few more
3. Measures
In IR: Precision & recall preferred (treated in unit 4)
o based on relevance
o could be two or more dimensions
  e.g. relevant–not relevant; relevant–partially relevant–not relevant
Problem with recall
o how to find what's relevant in a file?
  e.g. estimate; broad & narrow searching or union of many outputs then comparison
On use & user level
o Likert scales - semantic differentials
  e.g. satisfaction on a scale of 1 to x (1=not satisfied, x=satisfied)
o observational measures
  e.g. overlap, consistency
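A minimal sketch of precision and recall, using the standard set-based definitions with two-level (relevant / not relevant) judgments. The recall estimate follows the "union of many outputs then comparison" idea: relevant items are judged only within the pooled outputs of several searches, so the result is a relative recall, not true recall over the whole file. Document IDs, search outputs, and judgments are invented for illustration.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved items that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set, relevant: set) -> float:
    """Fraction of known relevant items that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Outputs of three hypothetical searches on the same question.
broad = {"d1", "d2", "d3", "d4", "d5", "d6"}
narrow = {"d1", "d2"}
other = {"d2", "d7", "d8"}

# Pool the outputs; judges assess only items in the pool (assumed judgments).
pool = broad | narrow | other
judged_relevant = {"d1", "d2", "d4", "d7"}
assert judged_relevant <= pool  # relevance is only known inside the pool

for name, output in [("broad", broad), ("narrow", narrow)]:
    p = precision(output, judged_relevant)
    r = recall(output, judged_relevant)
    print(f"{name}: precision={p:.2f} relative recall={r:.2f}")
# broad: precision=0.50 relative recall=0.75
# narrow: precision=1.00 relative recall=0.50
```

The example also shows the usual trade-off: the broad search finds more of the known relevant items but at lower precision.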
4. Instruments
People used as instruments
o they judge relevance, scale ...
But which people?
o users, surrogates, analysts, domain experts, librarians ...
How do relevance, utility ... judges affect results?
o who knows?
Reliability of judgments:
o about 50 - 60% for experts
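One way to put a number on agreement between two judges is the overlap of the sets they marked relevant (intersection divided by union). The slides do not say how the 50 - 60% figure was computed, so treating it as overlap is an assumption of this sketch; the judgments below are invented.

```python
def overlap(judge_a: set, judge_b: set) -> float:
    """Share of items marked relevant by both judges among items marked by either."""
    union = judge_a | judge_b
    # Two empty sets are treated as full agreement (a choice made for this sketch).
    return len(judge_a & judge_b) / len(union) if union else 1.0


# Items each hypothetical judge marked as relevant for the same question.
judge_a = {"d1", "d2", "d4", "d7"}
judge_b = {"d1", "d2", "d5"}

print(overlap(judge_a, judge_b))  # 0.4
```

Here the judges agree on 2 of the 5 items that either of them marked relevant, which illustrates how quickly measured results can shift depending on who does the judging.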
5. Methods
Includes design, procedures for observations, experiments, analysis of results
Challenges:
o Validity? Reliability? Reality?
  Collection - selection? size?
  Request - generation?
  Searching - conduct?
  Results - obtaining? judging? feedback?
  Analysis - conduct? tools?
  Interpretation - warranted? generalizable?
4. Web evaluation
Criteria
Evaluation of web sources
Web is value neutral
o it has everything from diamonds to trash
Thus evaluation becomes imperative
o and a primary obligation & skill of professional searchers - you
o continues & expands on evaluation standards & skills in library tradition
A number of criteria are used
o most derived from traditional criteria, but modified for the web, others added
o could be found on many library sites
  librarians provide the public and colleagues with web evaluation tools and guidelines as part of their services
Criteria for evaluation of web & Dlib sources
What? Content
o What subject(s), topic(s) covered?
o Level? Depth? Exhaustivity? Specificity? Organization?
o Timeliness of content? Up-to-date? Revisions?
o Accuracy?
Why? Intention
o Purpose? Scope? Viewpoint?
For? Users, use
o Intended audience?
o What need satisfied?
o Use intended or possible?
o How appropriate?
criteria ...
Who done it? Authority
o Author(s), institution, company, publisher, creator:
  What authority? Reputation? Credibility? Trustworthiness? Refereeing?
  Persistence? Will it be around?
  Is it transparent who done it?
How? Treatment
o Content treatment:
  Readability? Style? Organization? Clarity?
o Physical treatment:
  Format? Layout? Legibility? Visualization?
o Usability
Where? Access
o How available? Accessible? Restrictions?
o Links persistence, stability?
criteria ...
How? Functionality
o Searching, navigation, browsing?
o Feedback? Links?
o Output: Organization? Features? Variations? Control?
How much? Effort, economics
o Time, effort in learning it?
o Time, effort in using it?
o Price? Total costs? Cost-benefits?
In comparison to? Wider world
o Other similar sources?
  where & how similar or better results may be obtained?
  how do they compare?
Main criteria for web site evaluation
[Diagram: Quality, with eight groups of criteria]
o Content: coverage, accuracy, timeliness …
o Intention: purpose, scope, viewpoint …
o Users, use: audience, need, appropriateness …
o Authority: reputation, credibility, “About us” …
o Treatment: content, layout, visualization …
o Access: availability, persistence, links …
o Functionality: navigation, features, output …
o Effort: in using it, in learning it, time, cost …
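The eight criteria groups can be turned into a simple rating sheet, with each group rated on a Likert scale. The sketch below is one possible way to do that, not a method given in the slides: the 1 to 5 scale, the identifier names, the example ratings, and the use of a plain average are all assumptions.

```python
from statistics import mean

# The eight criteria groups for web site evaluation (identifiers chosen here).
CRITERIA = [
    "content", "intention", "users_use", "authority",
    "treatment", "access", "functionality", "effort",
]


def summarize(ratings: dict, scale_max: int = 5) -> dict:
    """Check that every group has a rating on 1..scale_max, then summarize."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"no rating given for: {missing}")
    for c in CRITERIA:
        if not 1 <= ratings[c] <= scale_max:
            raise ValueError(f"rating for {c} is outside 1..{scale_max}")
    return {
        "mean": mean(ratings[c] for c in CRITERIA),       # unweighted average
        "lowest": min(CRITERIA, key=lambda c: ratings[c]),  # weakest group
    }


# Hypothetical ratings of one web site by one judge.
ratings = {
    "content": 4, "intention": 3, "users_use": 4, "authority": 2,
    "treatment": 5, "access": 4, "functionality": 3, "effort": 3,
}
summary = summarize(ratings)
print(summary)  # {'mean': 3.5, 'lowest': 'authority'}
```

A single average hides a lot, so reporting the weakest group alongside it keeps attention on where the source falls short, here authority.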
Evaluation: To what end?
To assess & then improve performance - MAIN POINT
o to change searches & search results for the better
To understand what went on
o what went right, what wrong, what works, what doesn't & then change
To communicate with user
o explain & get feedback
To gather data for best practices
o conversely: eliminate or reduce bad ones
To keep your job
o even more: to advance
To get satisfaction from a job well done
Conclusions
Evaluation is a complex task
o but also an essential part of being an information professional
Traditional approaches & criteria still apply
o but new ones added or adapted to satisfy new sources, & new methods of access & use
Evaluation skills are in growing demand
  particularly because web is value neutral
Great professional skill to sell!
Evaluation perspectives - Rockwell
Evaluation perspectives
Evaluation perspective …
Possible rewards*
* but don’t bet on it!