Approaches to Teaching & Learning Information Retrieval

Sponsored by: SIGs ED, HCI, DL

Efthimis N. Efthimiadis, Moderator

Associate Professor, The Information School, University of Washington, Suite 370 Mary Gates Hall, Box 352840, Seattle, WA 98195-2840, USA. Phone: (off.) 206-616-6077, (sch.) 206-685-9937; Fax: 206-616-3152

[email protected]

Jamie Callan

Associate Professor, Language Technologies Institute (School of Computer Science) and Heinz School of Public Policy and Management, Carnegie Mellon University, 5000 Forbes Avenue, 4502 Newell Simon Hall, LTI, Pittsburgh, PA 15213-8213. Phone: 412-268-4525; Fax: 412-268-6298

[email protected]

Ray R. Larson

Associate Professor, School of Information, University of California, Berkeley, Berkeley, California 94720-4600. Phone: 510-642-6046

[email protected]

Summary

The explosion of the web has made search an integral part of our daily lives. We search for almost any conceivable topic. Web search engines have made search easily approachable to almost everyone. Yet, for information professionals it is more important than ever before to know “how search works” in order to be more effective in their work. Search engines or Information Retrieval systems often appear to searchers as “black boxes.” There is some sort of magic that happens between typing some keywords in a query box and getting back results. This perception contributes to the development of inadequate conceptual models of search.

The panel brings together LIS and CS educators involved in teaching “information retrieval” to discuss experiential learning approaches to teaching IR. Efthimiadis (UW) will present the IR-Toolbox, an interactive system developed for teaching IR processes to Information School students. Ray Larson (UCB) will discuss his approach of using open source IR engines to create a mini-TREC competition environment in class. Jamie Callan (CMU) will talk about the Lemur Toolkit system and its use in teaching undergraduate and graduate students.

The panel presentations of the experiential teaching methods will be followed by an interactive session with the audience. The discussion will focus on the audience’s needs and experiences while learning or teaching information retrieval.

IR-Toolbox (Efthimiadis)

The IR-Toolbox is an experiential teaching tool for learning about information retrieval (IR) systems. Through hands-on interaction, the IR-Toolbox helps students develop their conceptual model of search engines by exploring, visualizing, and understanding IR processes and algorithms without needing to program. The IR-Toolbox presents the following processing steps in sequence: a) Document analysis (e.g., tokenizers [letter, white-space, grammar], stemmers [Porter, Krovetz], and a variety of stop lists), b) Indexing (e.g., ability to browse the inverted file and extract statistics), c) Searching (e.g., ability to enter queries and select weighting algorithms such as IDF, TF-IDF, Okapi/BM25), d) Evaluation (e.g., evaluate results using the TREC evaluation software (trec-eval) and associated TREC collections, presenting recall-precision tables and graphs). The IR-Toolbox uses Lucene as its underlying search engine. Students can interact with the IR-Toolbox at different levels of complexity on individual or group exercises that help them understand the different IR processes and build a more detailed conceptual model of search engines.
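Steps (a) to (c) can be sketched in a few lines of Python. This is a minimal illustration, not the IR-Toolbox or Lucene implementation: the tiny stop list, the toy documents, the function names, and the BM25 parameter values (k1 = 1.2, b = 0.75) are all assumptions chosen for the example, stemming is omitted, and the scoring function is one common BM25 variant among several.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop list (an assumption); real stop lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

def analyze(text):
    """Letter tokenizer, lowercasing, and stop-word removal (no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Return (inverted file, document lengths).

    The inverted file maps term -> {doc_id: term frequency}.
    """
    inverted = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        terms = analyze(text)
        doc_len[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            inverted[term][doc_id] = tf
    return inverted, doc_len

def bm25_search(query, inverted, doc_len, k1=1.2, b=0.75):
    """Rank documents with a BM25 variant whose IDF is never negative."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in analyze(query):
        postings = inverted.get(term, {})
        df = len(postings)  # document frequency of the term
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    "d1": "The inverted file maps terms to documents.",
    "d2": "Stemming and stop lists shape the index terms.",
    "d3": "Evaluation of ranking uses recall and precision.",
}
inverted, doc_len = build_index(docs)
print(sorted(inverted["terms"].items()))  # browse one inverted list
print(bm25_search("index terms", inverted, doc_len))
```

Browsing `inverted` directly corresponds to the "browse the inverted file and extract statistics" activity: the length of each inverted list is the term's document frequency.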

<http://irtoolbox.ischool.washington.edu>
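The evaluation step (d) reports recall-precision tables. The sketch below shows how such a table could be computed for a single query; it is a simplified illustration rather than the trec-eval code itself, and the ranking and relevance judgments are made up for the example.

```python
def precision_recall_points(ranking, relevant):
    """Return (recall, precision) after each rank position of one ranked list."""
    if not relevant:
        raise ValueError("at least one relevant document is required")
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points

def average_precision(ranking, relevant):
    """Sum precision at each relevant document's rank, divided by |relevant|."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def interpolated_precision(points, levels=11):
    """Interpolated precision at evenly spaced recall levels (0.0 to 1.0).

    Interpolation takes the best precision at any recall >= the level.
    """
    table = []
    for i in range(levels):
        level = i / (levels - 1)
        candidates = [p for r, p in points if r >= level]
        table.append((level, max(candidates, default=0.0)))
    return table

# Made-up ranked list and relevance judgments for one query.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

points = precision_recall_points(ranking, relevant)
for level, precision in interpolated_precision(points):
    print(f"recall {level:.1f}  precision {precision:.2f}")
print(f"average precision {average_precision(ranking, relevant):.4f}")
```

Because the relevant document `d4` is never retrieved, recall tops out at 2/3 and the table shows zero precision at the higher recall levels.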

Lemur (Callan)

The Lemur Toolkit has become a popular platform for a wide range of information retrieval teaching and research. Lemur is an open-source toolkit, written in C++, that supports several approaches to document indexing, most of the standard retrieval models, and a set of applications that includes retrieval, clustering, cross-lingual IR, federated search (distributed IR), and summarization. Indri, the search engine in the Lemur project, provides a structured query language for searching large indexes and allows Web server integration for users who just want a high-quality search engine.

The talk will discuss the use of Lemur for class homework, and Lemur’s new educational facilities to support undergraduate and graduate IR classes at institutions with minimal local resources. Lemurproject.org provides web-based access to search engine indices, which allows students to immediately begin writing Java-based programs that implement basic ranking functions. Their programs parse a query, use HTTP Web page requests to retrieve inverted lists from the CMU indexes, rank documents, and upload results to the CMU web-based trec-eval to evaluate their results. Providing web-based access to search engine indices enables students to begin doing interesting homework assignments in the third week of classes, and also provides access to standard corpora (e.g., TREC corpora) that students might not have locally.

<http://www.cs.cmu.edu/~lemur/3.0/overview.html>

<http://www.lemurproject.org/>
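The homework workflow described above (parse a query, retrieve inverted lists, rank documents, submit results for evaluation) might look roughly like the following sketch. It is written in Python rather than Java for brevity. The remote service is replaced by an in-memory dictionary because the actual request URLs and response format are not given here; `fetch_inverted_list`, `FAKE_INDEX`, `NUM_DOCS`, and the run tag are hypothetical names invented for the example. The output lines follow the standard TREC run format (`qid Q0 docno rank score tag`) read by trec_eval; whether the CMU upload service expects exactly this format is an assumption.

```python
import math

# Hypothetical stand-in for the remote index: term -> [(doc_id, tf), ...].
# In the course setup these lists would come from HTTP requests.
FAKE_INDEX = {
    "lemur": [("doc1", 3), ("doc4", 1)],
    "toolkit": [("doc1", 1), ("doc2", 2), ("doc4", 1)],
}
NUM_DOCS = 5  # assumed collection size for the toy example

def fetch_inverted_list(term):
    """Placeholder for the HTTP request that retrieves one inverted list."""
    return FAKE_INDEX.get(term, [])

def rank(query, fetch=fetch_inverted_list, num_docs=NUM_DOCS):
    """Term-at-a-time TF-IDF: add one partial score per posting."""
    accumulators = {}
    for term in query.lower().split():
        postings = fetch(term)
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            accumulators[doc_id] = accumulators.get(doc_id, 0.0) + tf * idf
    # Highest score first; ties broken by document id.
    return sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))

def to_trec_run(query_id, ranked, run_tag="student_run"):
    """Format results as 'qid Q0 docno rank score tag' lines."""
    return [
        f"{query_id} Q0 {doc_id} {position} {score:.4f} {run_tag}"
        for position, (doc_id, score) in enumerate(ranked, start=1)
    ]

for line in to_trec_run("101", rank("Lemur Toolkit")):
    print(line)
```

Passing the fetch function as a parameter keeps the ranking logic independent of where the inverted lists come from, so the same code can be exercised locally before it is pointed at a remote index.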

Cheshire II and Cheshire3 (Larson)

The Cheshire systems are experimental IR systems that have also been used in production environments. They provide support for development of Digital Library (DL) retrieval applications, including support for MARC processing and the Z39.50 IR protocol. They also support Boolean searching as well as a variety of ranking algorithms (including Logistic Regression-Based Probabilistic ranking, Vector Space ranking, Okapi BM25, and a number of other techniques). In addition, the systems support data fusion approaches to combining the results of the different algorithms into a single ranked set. Cheshire has been used by students for "mini-TREC" evaluation competitions in IR courses at Berkeley for many years. It is also used extensively in the UK for production DL systems.
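As an illustration of data fusion, the sketch below combines two result lists with CombSUM over min-max normalized scores. The abstract does not say which fusion methods Cheshire implements, so CombSUM is used here only as a common textbook example; the function names and the scores are made up.

```python
def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] so rankers are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}

def comb_sum(result_lists):
    """CombSUM: sum each document's normalized scores across result lists."""
    fused = {}
    for scores in result_lists:
        for doc_id, s in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    # Highest fused score first; ties broken by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

# Made-up scores from two hypothetical rankers for the same query.
bm25_scores = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
logistic_scores = {"d2": 0.9, "d3": 0.6, "d4": 0.3}

for doc_id, score in comb_sum([bm25_scores, logistic_scores]):
    print(doc_id, round(score, 3))
```

Normalizing before summing matters because the two rankers score on different scales; without it, the ranker with the larger raw scores would dominate the fused list.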