Teaching Information Retrieval With Web-based Interactive Visualization

Peter Brusilovsky
School of Information Sciences, University of Pittsburgh, Pittsburgh, PA 15260. E-mail: [email protected]

Jae-wook Ahn
School of Information Sciences, University of Pittsburgh, Pittsburgh, PA 15260. E-mail: [email protected]

Edie Rasmussen
School of Library, Archival and Information Studies, University of British Columbia, Vancouver BC V6T 1Z1, Canada. E-mail: [email protected]

Interactive visualization is a powerful educational tool, which has been used to enhance the teaching of various subjects from computer science to chemistry to engineering. This paper describes the use of interactive visualization tools in the context of a graduate course in information retrieval, to demonstrate two well-known retrieval models, the Boolean model and the vector space model. The results of five classroom studies with these tools are reported. The impact of the tools on student learning, as well as student attitudes toward the tools, were investigated. The results of the classroom studies indicate that use of interactive visualization in a homework context can result in significant growth of knowledge. The majority of the students recognize the value of interactive visualization and recommend its use in the context of information retrieval courses. The study also demonstrated that visualization focusing on less known and harder to understand topics causes a larger growth of knowledge and is perceived as more useful. This result suggests placing higher priority on the development of visualization tools for harder to understand topics.

Keywords: information retrieval, education, visualization, evaluation, user study

J. of Education for Library and Information Science, Vol. 51, No. 3 (Summer) July 2010. ISSN: 0748-5786. ©2010 Association for Library and Information Science Education.

Introduction

Interactive visualization is a powerful educational tool. Visualization can provide a clear visual metaphor for understanding complicated concepts and uncovering the dynamics of important processes that are usually hidden from the student’s eye (Gordin & Pea, 1995). Visualization has been used to enhance the teaching of various subjects ranging from chemistry (Evans, Yaron, & Leinhardt, 2008) to biology (McClean, et al., 2005) to physics (Perkins, et al., 2006). Computer science is one of the most active application areas for educational visualization research (Naps, et al., 2003). In computer and information science (CIS) education, visualization is used almost exclusively in programming and data structure courses. We can name dozens of papers devoted to visualization of program execution on several levels from machine-level languages (Butler & Brockman, 2001) to high-level languages (Domingue & Mulholland, 1998; Haajanen, et al., 1997; Levy, Ben-Ari, & Uronen, 2003; Tung, 1998) to algorithms and data structures (Hundhausen, Douglas, & Stasko, 2002; Rößling, Schüer, & Freisleben, 2000). Our claim is that Information Science courses could benefit from this powerful technology.

This paper presents our research on using interactive visualization in the non-traditional context of information retrieval courses. Information retrieval has been in the curriculum of many computer, information, and library science departments for more than 30 years. With the maturity of the World Wide Web, information retrieval became an important practical subject. Elements of information retrieval are now taught to students of many different specialties. We think that information retrieval provides an interesting and important application area for exploring the power of interactive visualization. Over the last six years we developed and made publicly available a range of Web-based interactive visualization tools supporting various concepts taught in information retrieval courses. These tools were used for several years by faculty members at the University of Pittsburgh to teach both graduate and undergraduate information retrieval courses. To evaluate the impact of our visualization tools and to solicit student feedback, we ran several classroom studies. The developed set of tools is available on the project home page (http://ir.exp.sis.pitt.edu/res2/resources.php). The tools, which are running on our servers, could be used by anyone interested in teaching or learning information retrieval. This paper summarizes the results of our multi-year effort. To demonstrate our ideas, we present two examples of these interactive Web-based visualization tools for information retrieval. We also report the results of classroom studies evaluating them. To conclude, we discuss the results and prospects for using interactive visualization in the context of information retrieval courses.

Interactive Visualization for Information Retrieval

One of the secrets of the power of interactive visualization is its ability to uncover and present in detail processes that are typically hidden from students’ eyes. Being interactive, visualization programs allow the students to explore these processes step-by-step, with different parameters, gaining as a result a deep understanding of the processes and the concepts behind them. Every field has its critical processes and concepts, which could be better understood with the use of visualization.

The core of a traditional information retrieval (IR) course is a set of models, algorithms and technologies for processing, storing and retrieving textual information. Traditional presentation of this core usually starts with several IR models (such as the Boolean, vector, and probabilistic models and several variations of them) and then follows by explaining how the information is organized and retrieved in each of these models (Baeza-Yates & Ribeiro-Neto, 1999; Korfhage, 1997). The process by which information is retrieved in different models is one of the hardest topics in an IR course for students to grasp, despite being formalized and well understood by the IR research community. We have observed that even Boolean information retrieval, the simplest of the models, is difficult for many students. At the same time, traditional educational tools (research or commercial IR systems) offer little educational help. The process of retrieving information has several steps, from entering the query to matching the query to the documents to prioritizing the results. In an IR system (even an educationally-oriented one) all these steps are hidden from a user: the only thing that a user can observe is the final result, a list of ordered documents. This is exactly a context that could benefit greatly from the use of interactive visualization and this realization formed the starting point for our research (Brusilovsky, 2002). Over the first years of our project we developed and explored interactive visualization tools to visualize the process of retrieving information in several known models: Boolean, fuzzy, vector, and extended Boolean (see Baeza-Yates & Ribeiro-Neto, 1999 and Korfhage, 1997 for the description of these classic models). Since these visualization programs were the first to be developed, we had more opportunities to improve them over the years and to evaluate them in the classroom. For the purpose of this paper we chose two of these model visualization tools to demonstrate the ideas of interactive visualization in the context of IR courses. The following subsections present the most recent versions of interactive visualization programs for the Boolean and vector IR models. The next section reports the results of classroom studies with these models.

Interactive Visualization of the Boolean Information Retrieval Model

The Boolean IR model is the oldest and the simplest of the IR models. In this model, a query is formed by a set of elementary queries (usually keywords) connected by Boolean operators such as AND, OR, and NOT. The mechanism of this model is set theoretical. Every query is associated with a set of matching documents. For an elementary query such as a keyword, the set of matching documents is simply all documents indexed by this keyword. To obtain the set of matching documents for two queries connected by a Boolean operator, one has simply to perform the corresponding set operation on their matching sets (e.g., set intersection for AND, union for OR, complement for NOT). Thus in several steps, a matching set for any complex Boolean query can be found.
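The set-theoretic mechanism described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the index contents, the two keywords, and the helper names are hypothetical, and the document numbers are chosen to mirror the example discussed with Figure 1.

```python
# Hypothetical toy collection: document ids 1..12 and a keyword index
# mapping each keyword to the set of documents indexed by it.
ALL_DOCS = set(range(1, 13))
INDEX = {
    "retrieval": {2, 4, 8, 10, 11, 12},
    "visualization": {2, 7, 8, 12},
}

def match(term):
    """Matching set of an elementary query: all documents indexed by the term."""
    return INDEX.get(term, set())

def boolean_and(a, b):
    return a & b            # set intersection

def boolean_or(a, b):
    return a | b            # set union

def boolean_not(a):
    return ALL_DOCS - a     # complement with respect to the whole collection

def boolean_and_not(a, b):
    return a - b            # set difference

first = match("retrieval")
second = match("visualization")
print(sorted(boolean_and(first, second)))      # [2, 8, 12]
print(sorted(boolean_and_not(first, second)))  # [4, 10, 11]
```

A complex query is evaluated by nesting these calls, one set operation per Boolean operator.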

While it all sounds quite simple and clear, we have found that many of our students have problems understanding how Boolean matching works. Our talks with students have indicated that one of the sources of their troubles is the failure to perceive Boolean operators as operations on sets of matching documents. This is a known problem for everyday users who confuse the informal use of AND, OR, and NOT in everyday language with their formal meaning in set theory and Boolean logic. Surprisingly, we also discovered that some students with good programming backgrounds (i.e., those who have routinely used Boolean operators for writing conditional expressions in their programs) still have problems transferring their knowledge of these operators to the set theory context.

In developing an interactive visualization environment for the Boolean IR model we were trying to achieve two goals: to provide a helpful visual metaphor and to visualize the process of Boolean matching step by step. Figure 1 presents an interface for our environment. The core of this interface is a set of all documents visualized in a table (one document per row). For a sample document set in our system we chose textbook surrogates since this is the type of document most familiar to information science students. The goal of this visual representation is to help students to understand the core principle of this model: every query is associated with a particular subset of all documents. Showing the set of all documents on the screen makes it easy to demonstrate different subsets of the whole set as sets of differently colored rows of the table.

The students explore Boolean matching by writing and executing simple Boolean queries: pairs of elementary queries (terms) connected by a single Boolean operator (OR, AND), or a pair of operators (AND NOT), which simulates set difference. The results of an executed query are shown through row and cell coloring in the document table. The final set of documents is visible as a set of green rows (documents 2, 8, 12 in Figure 1). This is what a standard Boolean search engine would return. The visualization tool, however, attempts to show more by decomposing the standard “black box” matching process into two parts: (1) the process of matching an elementary query to the set of the documents and (2) the process of obtaining a new set from contributing sets by applying different Boolean operators. To achieve this goal, the tool extends the document table with three colored columns of cells. The first and the second column show the results of matching each of the documents to the first and the second query terms respectively. Documents matched to each term are marked as true in the corresponding column. To make the set of matched documents more visible, documents matched to the first term are marked by a red cell background and documents matched to the second term are marked with a blue background. The third column shows the results of matching the whole query using truth values and a green color for matching documents. The color-coding makes the two steps of the Boolean query matching process more transparent. For example, the student can see that only documents 2, 8, and 12, which matched both elementary terms, are included in the final set. In contrast, documents 4, 10, and 11, which matched only the first term, and document 7, which matched only the second term, are not included (see Figure 1).
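The three-column decomposition can be imitated in text form. The sketch below is an assumption-laden illustration rather than the tool's code: the function name and the plain-text layout are hypothetical, and the per-term matching sets are taken from the document numbers quoted for Figure 1.

```python
# Documents matched by each elementary term, as quoted for Figure 1:
# 2, 8, 12 match both terms; 4, 10, 11 only the first; 7 only the second.
DOC_IDS = range(1, 13)
FIRST_TERM_DOCS = {2, 4, 8, 10, 11, 12}
SECOND_TERM_DOCS = {2, 7, 8, 12}

def match_table(doc_ids, first, second, operator):
    """Return one row per document: (id, term-1 match, term-2 match, query match)."""
    combine = {
        "AND": lambda a, b: a and b,
        "OR": lambda a, b: a or b,
        "AND NOT": lambda a, b: a and not b,
    }[operator]
    rows = []
    for doc in doc_ids:
        in_first = doc in first
        in_second = doc in second
        rows.append((doc, in_first, in_second, combine(in_first, in_second)))
    return rows

rows = match_table(DOC_IDS, FIRST_TERM_DOCS, SECOND_TERM_DOCS, "AND")
print("doc  term1  term2  query")
for doc, t1, t2, result in rows:
    print(f"{doc:>3}  {t1!s:<5}  {t2!s:<5}  {result!s:<5}")

final_set = [doc for doc, _, _, result in rows if result]
print(final_set)  # [2, 8, 12]
```

Changing the operator argument to "OR" or "AND NOT" shows how the same two term columns produce a different third column.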

Beyond the term-based search shown in Figure 1, the Boolean IR visualization tool has several other functionalities. In particular, to help the student transfer the understanding of Boolean IR from classic IR to the database context, we have provided a very similar exploration interface where elementary queries are constructed not from keywords as in classic IR but from restrictions on various fields of a database record (e.g., year = 2000 and publisher != “O’Reilly”). Technically, the tool is implemented as a Java servlet working on a dedicated server.
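The database variant follows the same set logic, with field restrictions in place of keywords. A minimal sketch, assuming a hypothetical list of records (the record contents and the select helper are invented for illustration; only the two restrictions come from the example above):

```python
# Hypothetical bibliographic records; only the field names "year" and
# "publisher" are taken from the example restrictions in the text.
RECORDS = [
    {"id": 1, "year": 2000, "publisher": "O'Reilly"},
    {"id": 2, "year": 2000, "publisher": "Addison-Wesley"},
    {"id": 3, "year": 1999, "publisher": "Morgan Kaufmann"},
    {"id": 4, "year": 2000, "publisher": "Morgan Kaufmann"},
]

def select(records, predicate):
    """Matching set of an elementary restriction: ids of records that satisfy it."""
    return {record["id"] for record in records if predicate(record)}

year_is_2000 = select(RECORDS, lambda r: r["year"] == 2000)
not_oreilly = select(RECORDS, lambda r: r["publisher"] != "O'Reilly")

# AND is still a set intersection of the two matching sets.
result = year_is_2000 & not_oreilly
print(sorted(result))  # [2, 4]
```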

Figure 1. Boolean Model Environment. Visualization of matching for a simple Boolean AND query. Documents matching the first elementary query are highlighted.

Interactive Visualization of the Vector Information Retrieval Model

The vector IR model is different from the Boolean model in many aspects. While the query in this model is also a set of terms (keywords), the terms can be weighted, stressing their relative importance. To perform the matching process, the query is converted into a weighted vector of terms. Similarly, all documents are represented as weighted vectors of terms. The weight of a specific term in a document vector represents the collection-adjusted importance of this term in the document content. The matching process is based on vector algebra. The goal of vector matching is to produce a relevance value for each document, which reflects how similar this document is to the query. The relevance value is produced by calculating the Euclidean or angular distance between the query and document vectors. Once relevance values are calculated, all documents can be ordered by their relevance to the query. Vector matching produces a ranked list, not just a subset of documents as Boolean matching does.
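The two relevance measures and the resulting ranking can be sketched as follows. This is a generic illustration of Euclidean distance and the cosine measure, not the tool's code; the three-term vectors and document names are hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Smaller distance means the document is closer to the query."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between the vectors; larger means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical unit-length vectors over a three-term vocabulary.
query = [1.0, 0.0, 0.0]
documents = {
    "D1": [1.0, 0.0, 0.0],
    "D2": [0.0, 1.0, 0.0],
    "D3": [0.6, 0.8, 0.0],
}

# Rank by ascending distance and by descending cosine.
by_distance = sorted(documents, key=lambda d: euclidean_distance(query, documents[d]))
by_cosine = sorted(documents, key=lambda d: cosine_similarity(query, documents[d]),
                   reverse=True)
print(by_distance)  # ['D1', 'D3', 'D2']
print(by_cosine)    # ['D1', 'D3', 'D2']
```

For unit-length vectors the two measures always produce the same ordering; they can disagree when vectors are not normalized.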

The hardest component of the vector matching process to understand is the calculation of the relevance value. While the geometrical nature of the relevance measure (Euclidean or angular distance) is relatively straightforward, the actual relevance values are produced by performing many operations with components of very large vectors. Not surprisingly, the final results of these calculations are much less evident to students than the results of Boolean operations with document sets. As we observed in the classroom, it is frequently hard for students to understand why a specific document was ranked particularly high or low in the resulting list.

As in the case of the Boolean visualization tool, the goal of the vector matching visualization was to uncover the steps of the matching process, which are hidden from the users of vector-based search systems. While these systems never go further than displaying the final relevance value of retrieved documents, we wanted to show how this relevance is calculated in the vector model. To uncover this process, our tool offers a small collection of documents formed from just 7 different terms (so that query and document vectors are very short). Figure 2 shows a sample document in this collection. The students explore vector matching by writing and executing simple vector queries, which are weighted sets of terms. A new query can be formed step by step by adding one term at a time (selected from the same set of 7 terms) together with its weight. The example in Figure 3 shows a query formed by two terms: term Dog with weight 3 and term Fox with weight 1. For simplicity we use integer weights when forming a query; however, after the query is formed, its vector is normalized.
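The query from Figure 3 can be turned into a vector as sketched below. Two points are assumptions: the paper names only Dog and Fox, so the other five vocabulary terms are placeholders, and normalization is taken here to mean scaling to unit Euclidean length, which the paper does not specify.

```python
import math

# Seven-term vocabulary; only "Dog" and "Fox" come from the paper,
# the remaining five names are hypothetical placeholders.
TERMS = ["Dog", "Fox", "Cat", "Bird", "Fish", "Horse", "Cow"]

def build_query(weights):
    """Place integer term weights into a vector ordered like TERMS."""
    return [float(weights.get(term, 0)) for term in TERMS]

def normalize(vector):
    """Scale to unit Euclidean length (assumed normalization scheme)."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        return list(vector)
    return [x / length for x in vector]

raw_query = build_query({"Dog": 3, "Fox": 1})
query = normalize(raw_query)
print([round(x, 3) for x in query])  # [0.949, 0.316, 0.0, 0.0, 0.0, 0.0, 0.0]
```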

The results of the query execution are shown as two tables (on the right side of Figure 4): one table presents ranking based on Euclidean distance and the second presents ranking based on the cosine measure. The more relevant a document is to the query, the smaller the distance measure and the larger the cosine measure. To dig deeper, a student can click on any relevance value (shown in blue) and see in a popup window (bottom of Figure 4) how this value is calculated from the components of the query and document vectors. (The example in Figure 4 uses Euclidean distance.) The vectors for the query and documents are shown on the left alongside the ranking results, so the students can see how the vector components are used in the calculations. By clicking on a document number (shown in blue), students can view the content of each document in a popup window. Thus the visualization tool makes all the traditionally hidden steps of the vector matching process visible: from documents and query to their vectors, from vectors to relevance values, and from relevance values to ranked lists.

Figure 2. A sample document used by the vector model visualization tool. In Figure 4 this document is marked as D2.

In addition to the interactive simulation presented above, the tool also offers a brief tutorial on vector matching. It is implemented using a combination of Java servlets and client-side JavaScript programs.

Figure 3. The process of forming a query in the vector model visualization tool.

Figure 4. The results of query execution produced by the vector model visualization tool. The popup window at the bottom shows how the Euclidean distance value of 0.71 was produced for document D2 given the document and query vectors shown above.

Classroom Evaluation of Interactive Visualization

Study Design

To examine whether our visualization tools for teaching information retrieval are effective in a real educational process, we conducted several studies with students of information retrieval courses. In each of these studies we evaluated several visualization tools. This section reports results of five classroom studies performed in the context of a graduate course, “Information Storage and Retrieval,” offered annually at the School of Information Sciences, University of Pittsburgh. These studies were performed between Fall 2001 and Fall 2005. The Boolean and vector model visualization tools, which are the focus of this paper, were evaluated in each of the five studies. Table 1 shows the number of students involved. Over the five semesters, a total of 95 students completed the study, with a mean of 19 students per class (SD = 7.58).
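The summary figures quoted above can be re-derived from the per-study class sizes listed in Table 1. The short check below assumes that the reported SD is the sample standard deviation (n - 1 in the denominator), which is what reproduces the published value.

```python
import statistics

# Number of students in each of the five studies (Table 1).
class_sizes = [26, 27, 10, 13, 19]

total = sum(class_sizes)
mean = statistics.mean(class_sizes)
sample_sd = statistics.stdev(class_sizes)  # sample SD, n - 1 denominator

print(total, mean, round(sample_sd, 2))  # 95 19 7.58
```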

All studies of our visualization tools have the same multi-stage design presented in Figure 5. The goal of this design was to assess both objectively and subjectively whether the visualization tools are actually working as useful aids to help the students better understand the underlying concepts. As an objective measure, we used knowledge gain from pre-test to post-test. As a subjective measure, we used student answers to a questionnaire about the systems and their features. The studies of different tools were performed during different weeks of the semester. In a given week, the students learned the concepts of the subject (i.e., Boolean or vector model) during a lecture and the instructor introduced the corresponding visualization tool, encouraging them to use it to deepen their understanding of what they had learned in the classroom.

At the end of the lectures devoted to the Boolean and vector space models, the students took a pre-test and were given homework assignments which, among other things, specifically required them to use the tools introduced in the lecture for actively solving hands-on problems. The role of this homework was to engage the students in active exploration of the visualization tools in the context of problem solving. Figure 6 shows an example of the homework on the vector space model.

To evaluate the results of their learning, the students took a post-test before the next lecture (exactly one week from the first lecture on the topic), and filled in a questionnaire about the tools and their impact. The post-test was identical to the pre-test. The questionnaire included 14 questions, seven for each tool (Table 2). A five-point Likert scale was used for the questions, ranging from 1 (Strongly Agree) to 5 (Strongly Disagree).

                      Study 1   Study 2   Study 3   Study 4   Study 5   Total
Number of students    26        27        10        13        19        95

Table 1: Classroom Studies of Boolean and Vector Model Visualization Tools.

Figure 5. The classroom study procedure.

The role of the questionnaire was to evaluate student attitudes to the visualization tools and their features. However, the questionnaire alone was not sufficient for a reliable evaluation. To make sure that the student’s positive (or negative) attitude is grounded in the student learning experience, we had to check whether student knowledge increased after using this tool, and examine the correlation between knowledge gain and attitude. Knowledge increase (positive knowledge gain) after using the tool can support the student’s positive attitude and provide additional evidence in favor of the educational effectiveness of the tool. In contrast, the lack of knowledge increase (or knowledge decrease) would cause us to question the educational value of the tools even in the presence of positive feedback. Note that the format of a classroom study does not allow us to use knowledge gain alone as a reliable indicator of the effectiveness of the tools. While we attempted to minimize student learning from other sources by placing the pre-test after the lecture presentation of the corresponding topic, we were not able to prevent students from using other sources of knowledge during their work on the homework assignment. Only a controlled lab study could assure that the registered knowledge gain was the result of student work with the tools. Given that our main goal was to explore the value of the tools in a real educational process, we accepted the shortcomings of the knowledge gain measure and applied it as a secondary indicator.

To calculate knowledge gain, which measures an increase in the student’s knowledge of the topic, we used pre- and post-test scores [Equation (1)]. Because of the minor differences in the number of pre/post-test questions over the study semesters, we used normalized pre- and post-test scores (both were normalized to range from 0 to 10) to calculate the knowledge gain. This allowed us to evaluate the effect of the tools over all five semesters.

Knowledge Gain = Normalized Post-Test Score - Normalized Pre-Test Score    (1)
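Equation (1) can be illustrated with a small sketch. The paper states only that scores were normalized to a 0 to 10 range; the linear rescaling by the maximum attainable score used here, along with the function names and the sample numbers, is an assumption made for illustration.

```python
def normalize_score(raw_score, max_score, scale=10.0):
    """Linearly rescale a raw test score to the range 0..scale (assumed scheme)."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return scale * raw_score / max_score

def knowledge_gain(pre_raw, pre_max, post_raw, post_max):
    """Equation (1): normalized post-test score minus normalized pre-test score."""
    return normalize_score(post_raw, post_max) - normalize_score(pre_raw, pre_max)

# Hypothetical student: 6 of 12 questions correct before, 9 of 12 after.
gain = knowledge_gain(pre_raw=6, pre_max=12, post_raw=9, post_max=12)
print(gain)  # 2.5
```

Normalizing before subtracting is what makes gains comparable across semesters whose tests had slightly different numbers of questions.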

Figure 6. Homework assignment example.

Q1  Before I used this simulation, I understood the Boolean/vector model very well.
Q2  The visual nature of the system helped me to understand the Boolean/vector IR model better.
Q3  The interactive nature of this system (in contrast with a set of static pictures) is very important for learning.
Q4  The interface for the system was easy to understand.
Q5  I think that in the context of the Information Retrieval course the system should become one of the key course tools.
Q6  I would recommend the system to a friend who is taking an IR course next semester.
Q7  The system should be used in teaching information retrieval.

Table 2: Survey Questions for the Subjective Feedback Analysis.

With two evaluation measures engaged, we can formulate two formal hypotheses to assess whether our Web-based interactive visualization tools are educationally effective.

H1: The student will feel that the visualization tools are helpful in understanding crucial concepts of information retrieval.

More specifically,

H1-1: The subjects will answer positively to the questions asking about the effectiveness of the visualization tools.

H2: Student knowledge will increase after their work with the tools.

That is,

H2-1: The knowledge gain calculated as a difference between post- and pre-test scores will be positive.

The following subsections report the results of the data analysis performed to evaluate these two hypotheses for each of the visualization tools.

Subjective Feedback Analysis

This section analyzes students’ subjective feedback on the visualization tools. Seven questions per model were asked each semester (some semesters also included one additional question). Students’ responses indicated a value from 1 to 5, where 1 is the most positive answer to each question. The distribution of student answers is shown in Tables 3 and 4 and Figures 7 and 8.

                     Q1      Q2      Q3      Q4      Q5      Q6      Q7
Strongly Agree       0.297   0.317   0.422   0.391   0.125   0.234   0.250
Agree                0.500   0.476   0.469   0.438   0.484   0.609   0.641
Neutral              0.125   0.190   0.109   0.141   0.344   0.141   0.063
Disagree             0.063   0.016   0.000   0.031   0.047   0.016   0.047
Strongly Disagree    0.016   0.000   0.000   0.000   0.000   0.000   0.000

Table 3: User Feedback on the Boolean Model Tool.

Figure 7. User feedback for the Boolean model visualization tool.

Q1 differs from the rest of the questions since it is focused on the students’ prior knowledge about the corresponding information retrieval model. As the data shows, about 80% of the subjects agreed or strongly agreed that they understood the Boolean model. In contrast, only 23% agreed or strongly agreed that they understood the vector model. This result is consistent with the objective analysis reported in the Knowledge Gain Analysis section, where the average pre-test score of the Boolean model was 7.87 out of 10.0 whereas that of the vector model was only 4.58.

The next two questions (Q2 and Q3) assessed the usefulness of the visual and interactive nature of the tools in the classroom. About 80% to 90% of the students gave positive answers (Strongly Agree or Agree) for both the Boolean and the vector models. Almost no negative answers were submitted. Moreover, the interactive nature of the system was valued even higher than its visual nature. This is strong evidence in favor of interactive visualization.

Question 4 asked how easy it was to understand the interface of the visualization tools. Overall, the students showed a high level of satisfaction with the Boolean tool interface (83%) and slightly lower satisfaction with the interface of the vector model tool (72%). This indicates that the latter interface may need further improvement.

                     Q1      Q2      Q3      Q4      Q5      Q6      Q7
Strongly Agree       0.017   0.250   0.322   0.217   0.233   0.317   0.400
Agree                0.217   0.567   0.610   0.500   0.583   0.617   0.500
Neutral              0.367   0.150   0.051   0.267   0.150   0.067   0.067
Disagree             0.267   0.017   0.000   0.017   0.017   0.000   0.017
Strongly Disagree    0.133   0.017   0.017   0.000   0.017   0.000   0.017

Table 4: User Feedback on the Vector Space Model Tool.

Figure 8. User feedback for the vector space visualization tool.

Questions 5 to 7 asked in different words about the importance and usefulness of the tools in the context of an information retrieval course. Students’ answers confirmed that the vector model tool, which helped them with a less well known topic, was most valuable for them. Over 80% of the students agreed that the vector tool should become one of the key course tools and over 90% wanted to recommend the system to students taking the course next semester. Corresponding numbers for the Boolean tool (over 60% and over 80%) are lower, indicating that this tool was seen as less valuable. This may be caused by the simplicity of the Boolean model, which was relatively well understood by most of the students before the start of the course. Yet, the feedback for both tools is overwhelmingly positive: both tools were considered important and useful by a clear majority of students. Moreover, about 90% of the students agreed or strongly agreed that both tools should be used in teaching information retrieval courses.
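The percentages quoted in this section can be recomputed by adding the Strongly Agree and Agree rows of Tables 3 and 4. The sketch below assumes that the seven values in each row correspond to Q1 through Q7 in order, a reading that is consistent with every figure cited in the text; the function name is hypothetical.

```python
# Strongly Agree and Agree rows of Table 3 (Boolean) and Table 4 (vector),
# assumed to be ordered Q1..Q7.
BOOLEAN = {
    "Strongly Agree": [0.297, 0.317, 0.422, 0.391, 0.125, 0.234, 0.250],
    "Agree":          [0.500, 0.476, 0.469, 0.438, 0.484, 0.609, 0.641],
}
VECTOR = {
    "Strongly Agree": [0.017, 0.250, 0.322, 0.217, 0.233, 0.317, 0.400],
    "Agree":          [0.217, 0.567, 0.610, 0.500, 0.583, 0.617, 0.500],
}

def positive_share(table):
    """Share of positive answers (Strongly Agree + Agree) for Q1..Q7."""
    return [round(sa + a, 3)
            for sa, a in zip(table["Strongly Agree"], table["Agree"])]

boolean_positive = positive_share(BOOLEAN)
vector_positive = positive_share(VECTOR)
for number, (b, v) in enumerate(zip(boolean_positive, vector_positive), start=1):
    print(f"Q{number}: Boolean {b:.1%}, vector {v:.1%}")
```

Under this reading, Q1 gives roughly 80% versus 23%, Q4 roughly 83% versus 72%, and Q7 roughly 89% versus 90%, matching the figures discussed above.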

Knowledge Gain Analysis

As described earlier, we measured the pre- and post-test scores for two educational visualization tools for information retrieval: one for the Boolean and one for the vector model. Table 5 shows the basic statistics comparing the scores. We can first observe that there are increases between pre- and post-test scores for both models. Test scores for the Boolean model increased by 0.59 and scores for the vector model increased by 1.80. These differences were statistically significant (paired t-test) for both the Boolean and vector model tools (p = 0.03 and p < 0.01 respectively). This indicates that there was a measurable knowledge gain after the students used the visualization tools.
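For readers unfamiliar with the paired t-test, the statistic is the mean of the per-student score differences divided by its standard error. The sketch below uses invented scores for five students purely to show the computation; the study's raw per-student data are not reproduced in this paper, and the p-value lookup (from a t distribution with n - 1 degrees of freedom) is omitted.

```python
import math
import statistics

def paired_t_statistic(pre_scores, post_scores):
    """t = mean(d) / (sd(d) / sqrt(n)), where d = post - pre for each student."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("scores must be paired")
    differences = [post - pre for pre, post in zip(pre_scores, post_scores)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error

# Hypothetical normalized scores (0..10) for five students.
pre = [4, 5, 3, 6, 5]
post = [6, 6, 5, 7, 8]

t_value = paired_t_statistic(pre, post)
degrees_of_freedom = len(pre) - 1
print(round(t_value, 2), degrees_of_freedom)
```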

It is interesting to observe that the starting knowledge of the Boolean model was relatively high. In this context it was a challenge for the tools to further increase student knowledge. Indeed, the increase was quite moderate, although statistically significant. In contrast, the starting knowledge of the vector model was relatively low, less than 50%. In this context, the visualization tools helped to achieve a more impressive knowledge gain, almost a 40% increase from the level of knowledge reported by the pre-test. This allows us to hypothesize that interactive visualization tools could be especially useful for teaching hard-to-understand topics, where regular educational means may be less effective.
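The "almost a 40% increase" figure is the gain expressed relative to the pre-test score. As a small worked example, the following Python sketch recomputes the absolute and relative gains from the test scores in Table 5; the names `TABLE_5_SCORES` and `gains` are introduced here for illustration.

```python
# Pre- and post-test scores from Table 5.
TABLE_5_SCORES = {
    "Boolean": {"pre": 7.87, "post": 8.46},
    "Vector": {"pre": 4.58, "post": 6.38},
}


def gains(pre, post):
    """Return (absolute gain, gain relative to the pre-test score)."""
    absolute = post - pre
    return absolute, absolute / pre


for model, scores in TABLE_5_SCORES.items():
    absolute, relative = gains(scores["pre"], scores["post"])
    print(f"{model}: +{absolute:.2f} points, {relative:.1%} relative to the pre-test")
```

For the vector model this gives a gain of 1.80 points, or about 39.3% of the pre-test score, while for the Boolean model the 0.59-point gain corresponds to about 7.5%.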

It was also interesting that the standard deviations of the pre-test scores were just 2.38 and 2.23 for the Boolean and vector models, respectively. These values are relatively small, suggesting that the starting levels of student knowledge within each topic were quite similar. Since these data were collected over five semesters of teaching the course, they could be used as a reliable estimate of student starting knowledge when preparing educational visualizations and other pedagogical material for the course.

Subjective Feedback versus Knowledge Gain

In the previous sections, we saw that there was an increase in students' knowledge after they took the information retrieval classes, and that they supported the use of visualization tools in the classes. That is, there was an objective effect on their knowledge, and the source of the effect was assumed to be the visualization tools, based on the students' positive responses to the questions asking them about the usefulness of the tools. To confirm this assumption, we examined the relationship between the objective knowledge gain and the subjective answers from the students. We chose questions #2 and #3 from the questionnaires, which directly asked about the usefulness of the visual (Q #2) and interactive (Q #3) nature of the tools. The answers to these questions were compared with student knowledge gain scores for the Boolean and the vector model separately. Figures 9 and 10 show the results of these comparisons.

             Boolean model      Vector model
             Mean     SD        Mean     SD
Pre-test     7.87     2.38      4.58     2.23
Post-test    8.46     2.17      6.38     2.78

Table 5: A Comparison of Test Scores for the Two Models.

Here, three out of four graphs show a positive relationship between the subjective answers of the students and their actual knowledge gain. The students with higher knowledge gain scores tend to agree more with the statement on the usefulness of the visual nature of the Boolean tool (Figure 9, left). For the statement on the interactive nature of the Boolean tools, the effect is less pronounced and reversed (Figure 9, right). In terms of the vector visualization tools, the students with higher knowledge gain scores tend to be more positive about the visual and the interactive nature of the tools, although the effect is weaker than in the previous case (see Figure 10).

Figure 9. Comparison of subjective user feedback against knowledge gain for Boolean tools.

Figure 10. Comparison of subjective user feedback against knowledge gain for vector tools.

Overall, it can be observed that students who learned more (higher knowledge gain) reacted more positively to the visualization tools for both models, Boolean and vector.
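The paper does not state how the comparisons in Figures 9 and 10 were computed. One simple way to relate questionnaire answers to knowledge gain, offered here only as an assumed sketch, is to average the gain (post-test minus pre-test) within each answer category. The records below are hypothetical, not data from the study, and `LIKERT_ORDER` and `mean_gain_by_response` are illustrative names.

```python
from collections import defaultdict
from statistics import mean

# Ordered from most to least positive, matching the questionnaire scale.
LIKERT_ORDER = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]


def mean_gain_by_response(records):
    """Average knowledge gain (post-test minus pre-test) for each answer category.

    `records` is an iterable of (answer, pre_score, post_score) tuples.
    Categories with no respondents are omitted; answers that are not in
    LIKERT_ORDER are ignored.
    """
    gains = defaultdict(list)
    for answer, pre, post in records:
        gains[answer].append(post - pre)
    return {answer: mean(gains[answer]) for answer in LIKERT_ORDER if answer in gains}


# Hypothetical records, NOT data from the study.
sample = [
    ("Strongly Agree", 4, 8),
    ("Strongly Agree", 5, 8),
    ("Agree", 5, 7),
    ("Agree", 6, 7),
    ("Neutral", 6, 7),
    ("Disagree", 7, 7),
]

for answer, gain in mean_gain_by_response(sample).items():
    print(f"{answer}: mean gain {gain:.2f}")
```

A positive relationship of the kind described above would show up as mean gains that fall as the answers move from "Strongly Agree" toward "Strongly Disagree".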

Discussion

This paper presents our experience in developing and using a set of interactive visualization tools for teaching and learning information retrieval. By using the power of interactive visualization, our tools allow a teacher to introduce important models of information retrieval in a way that cannot be replicated by traditional whiteboard and slides, and with almost no preparation time (simply by deciding which examples to show to cover the main set of ideas). The instructor can easily accommodate very different audiences by adjusting the number of examples to show, the speed, and the granularity of presentation.

Using these interactive visualizations as learning tools further increases their value. They let the students switch from passive learning-by-reading to active and interactive exploratory learning. By exploring a number of different examples with interactive visualization tools, students get a chance to achieve a better understanding of complex IR topics. The results of our classroom studies indicate that use of interactive visualization in a homework context can result in significant growth of knowledge. The vast majority of the students recognize the value of interactive visualization and recommend its use in the context of information retrieval courses. The study also demonstrated that visualization focusing on less known and harder-to-understand topics causes a larger growth in knowledge and is perceived as more useful. This result suggests placing higher priority on the development of visualization tools for harder-to-understand topics.

Our results stress the importance of developing, collecting and sharing tools for teaching information retrieval. This is consistent with the culture of sharing in IR research. A number of groups maintain Web pages of IR resources such as search software, text processing utilities, evaluation packages and topical bibliographies. We argue that similar collections of IR educational resources should be established and maintained. The Web makes it possible for researchers and educators to make their contributions to research and teaching available to others, which makes it much easier for others to adopt them. We have contributed to this process by developing a range of interactive visualization tools for teaching and learning information retrieval. We have demonstrated that these visualization tools have value for students learning the concepts behind specific information retrieval models. We have made these tools freely available to any IR educators or students wishing to use them on our IR resources page (http://ir.exp.sis.pitt.edu/res2/resources.php), along with similar educational resources developed by other teams. We welcome readers who wish to try these tools and hope that more educational repositories like ours will be established in the near future.

In our own future work we plan to continue exploration of interactive visualization for information retrieval. We hope to increase the number of visualization systems available for use in information retrieval courses and to perform additional formal studies of these tools.

Acknowledgements

Most of the tools in the original suite of learning environments were developed under the guidance of the first author by MSIS students of the School of Information Sciences at the University of Pittsburgh. Our School is fortunate to have many bright students and the author thanks them for their devotion and creativity. Special thanks go to Olena Scherbinina, Archana Sulebele, and Anand Banvasi, the developers of the Boolean and vector model visualization tools presented in this paper.

References

Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York, NY: ACM Press.

Brusilovsky, P. (2002, June 24–29). Web-based interactive visualization in an information retrieval course. Paper presented at the ED-MEDIA'2002, World Conference on Educational Multimedia, Hypermedia and Telecommunications, Denver, CO.

Butler, J. E., & Brockman, J. B. (2001). A Web-based learning tool that simulates a simple computer architecture. SIGCSE Bulletin - inroads, 33(2), 47–50.

Domingue, J., & Mulholland, P. (1998). An effective Web based software visualization learning environment. Journal of Visual Languages and Computing, 9(5), 485–508.

Evans, K., Yaron, D., & Leinhardt, G. (2008). Learning stoichiometry: A comparison of text and multimedia formats. Chemistry Education: Research and Practice, 9, 208–218.

Gordin, D. N., & Pea, R. D. (1995). Prospects for scientific visualization as an educational technology. Journal of the Learning Sciences, 4(3), 249–279.

Haajanen, J., Pesonius, M., Sutinen, E., Tarhio, J., Teräsvirta, T., & Vanninen, P. (1997). Animation of user algorithms on the Web. Paper presented at the VL '97, IEEE Symposium on Visual Languages.

Hundhausen, C. D., Douglas, S. A., & Stasko, J. T. (2002). A meta-study of algorithm visualization effectiveness. Journal of Visual Languages and Computing, 13(3), 259–290.

Korfhage, R. R. (1997). Information storage and retrieval. New York, NY: Wiley.

Levy, R. B.-B., Ben-Ari, M., & Uronen, P. A. (2003). The Jeliot 2000 program animation system. Computers and Education, 40(1), 1–15.

McClean, P., Johnson, C., Rogers, R., Daniels, L., Reber, J., Slator, B. M., et al. (2005). Molecular and cellular biology animations: Development and impact on student learning. Cell Biology Education, 4(2), 169–179.

Naps, T., Rößling, G., Anderson, J., Cooper, S., Dann, W., Fleischer, R., et al. (2003). Evaluating the educational impact of visualization. ACM SIGCSE Bulletin, 35(4), 124–136.

Perkins, K., Adams, W., Dubson, M., Finkelstein, N., Reid, S., Wieman, C., et al. (2006). PhET: Interactive simulations for teaching and learning physics. The Physics Teacher, 44(1), 18–23.

Rößling, G., Schüer, M., & Freisleben, B. (2000). The ANIMAL algorithm animation tool. Paper presented at the 5th annual SIGCSE/SIGCUE ITiCSE conference on Innovation and technology in computer science education, Helsinki, Finland.

Tung, S.-H. S. (1998). Visualizing Evaluation in Scheme. Lisp and Symbolic Computation, 10(3), 201–222.
