Informing through User-Centered Exploratory Search and ...proceedings.informingscience.org/InSITE2008/IISITv... · Informing through User-Centered Exploratory Search & Human-Computer

Issues in Informing Science and Information Technology Volume 5, 2008

Informing through User-Centered Exploratory Search and Human-Computer Interaction

Strategies

Panagiotis Petratos California State University, Stanislaus, CA, USA

[email protected]

Abstract In this article the subject of Informing through user-centered Exploratory Search and Information Retrieval utilizing human-computer interaction strategies is analyzed. Exploratory Search is a new field that has sprung from the more general Information Retrieval. Informing Science is a trans-discipline which transcends a large variety of fields and seeks how to best inform all the clients of interest. One facet of Informing Science, the process of elucidating the best methods of informing inquiring clientele, is served by user-centered Exploratory Search and human-computer interaction strategies. This work explains a human factors method which allows the comparison of the performance of multiple IR systems and can enhance the comparative topic focused IR search quality. This human factors method also allows the human participants to provide their IR explicit feedback and record these judgments as a gold standard for future comparison. This hu-man factors method is tested by established statistical analysis and allows the statistical compari-son of the IR performance of a selection of IR systems. This work also demonstrates the results of this human factors method after testing it upon three leading IR systems, Google, Yahoo and Live Search.

Keywords: Information Retrieval Systems, Human Computer Interaction, Exploratory Search.

Introduction Information Retrieval is a well defined discipline with solid foundations in mathematics and other sciences. The tools utilized for Information Retrieval are created and developed from mathemati-cal equations and scientific methods gracefully employed from analysis, trigonometry, geometry, statistics and probabilities. Mathematics is not the only science which contributes elements to Information Retrieval. Computer Science, Information Science and Library Science also contrib-ute elements to Information Retrieval. The tools utilized for Information Retrieval are typically developed in conjunction with very powerful computers such as clusters and very large databases of corpora. Furthermore, with the proliferation of the Web the scope of Information Retrieval is

broadened to address ubiquitous sources of information, mobile computing de-vices and users, as well as multiple for-mats of information beyond documents, i.e. biological, musical, visual and vari-ous technological formats such as XML.

The basic aim of text retrieval, a more traditional sub-field of Information Re-trieval, is to match user queries to documents. In its’ purest form Informa-

Material published as part of this publication, either on-line or in print, is copyrighted by the Informing Science Institute. Permission to make digital or paper copy of part or all of these works for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage AND that copies 1) bear this notice in full and 2) give the full citation on the first page. It is per-missible to abstract these works so long as credit is given. To copy in all other cases or to republish or to post on a server or to redistribute to lists requires specific permission and payment of a fee. Contact [email protected] to request redistribution permission.

Informing through User-Centered Exploratory Search & Human-Computer Interaction Strategies

706

tion Retrieval serves a higher level objective, a more general aim, to assist the searcher in locating the information she seeks.

This general aim can be served by roughly dividing the work into two substantial tasks. The first task is assisting the searcher to express her information need in the most lucid way, clearly understandable for both the human user as well as the machine computer. The second task is to match the searcher’s clearly defined query to the most relevant information available. The second task is machine related and typically involves matching and retrieval algorithms resident in the information retrieval system internals.

The first task is user related and typically involves human computer interaction strategies to en-hance the information retrieval process. From this first user centered task a new research field has sprung called Exploratory Search (White, Drucker, Marchionini, Hearst, & Schraefel, 2007). The definition of Exploratory Search is elucidated by the IR methods which synthesize human com-puter interaction strategies to elicit and illuminate user search requests, semantic meanings, pref-erences, explicit and implicit relevance feedback to enhance information retrieval search quality.

Informing Science is an emerging trans-discipline which transcends a large variety of fields, from computer science, engineering, information systems, library science, social work, technology, communications, design, journalism in all its forms, to education. From a teleological point of view Informing Science researchers gracefully utilize information technology with epistemolo-gies drawn from all the aforementioned fields in order to best inform their clients (Cohen, 1999).

Also from a teleological point of view, one facet of Informing Science, the process of elucidating the best methods of informing inquiring clientele, is served by user-centered exploratory search and human-computer interaction strategies (Petratos, 2007).

Herein a comparative study is presented of three leading IR systems, Google, Yahoo and Live Search. A team of human subjects is selected according to diverse and balanced criteria. The hu-man factors method presented herein serves the IR search quality enhancement by providing a gold standard. A collective of human-computers is syllogistically designed to serve as a co-operating framework for the IR experiments. The tasks that are better suited to humans are as-signed to the participants and the tasks that can be automated are assigned to the machines. A se-ries of IR experiments is conducted to investigate whether there is overlap exact as well as partial among the selected IR systems, how it can be quantified, how it is distributed and also how search quality can be enhanced. The ensuing IR statistical results show that overlap exists among the selected three IR systems and demonstrate the comparative performance of these IR systems.

Exploratory Search Areas In this segment the research directions followed in the new field of Exploratory Search are de-scribed. In synopsis the areas of Exploratory Search are the following.

Web Retrieval, Exploratory Search Interfaces, Implicit and Explicit Relevance Feedback, Faceted Search Interfaces, Directed Search.

These areas of Exploratory Search are all connected with user-centered design and user-system interactions as well as are all related with modern aspects of information technology and Inform-ing Science. User-centered design and user-centered activities as well as web retrieval are the central themes which transcend all other Exploratory Search areas (White, Muresan, & Marchionini, 2006).

Petratos

707

Web Retrieval The web is becoming an increasingly important area of interest for information retrieval research-ers due to a plethora of challenges it presents. A few of the most important challenges include the dynamic nature of the web, the gradually increasing and diverse content of the web, the various heterogeneous technological formats of information on the web, as well as the progressively ris-ing number of users on the web (Petratos, 2006). Consequently these challenges provide a fertile ground for new approaches to the Information Retrieval process. For example, as the amount of information is increasing within a specific topic there is an increased need for clarification of the search instructions and guidance especially to inexperienced users in order to best utilize all the available search capabilities of the Information Retrieval System (Rodden, Ruthven, & White, 2007). In addition, inexperienced users will appreciate an intuitive interface which presents in separate rows a synopsis of the text as well as all the images along with their captions found in a document in a convenient standard-sized thumbnails array, see Figure 1. Also, even the experi-enced users will appreciate the more precise guidance and control provided by organizing all the available information into easy to understand categories, see Figure 3. Categories allow a presen-tation of a birds’ eye view of the information in an easy to view, organized and tidy arrangement of top level hierarchies giving the freedom to the user to drill down in a desired hierarchy reach-ing the contained document synopses and arrays of standard-sized small icons of all the images and their captions. Hence, the focus of Information Retrieval researchers is increasingly concen-trated on finding new methods for enhancing Web Retrieval.

Exploratory Search Interfaces Exploratory Search Interfaces are well suited for users who frequently embark on web search ex-ploration. Experienced users may embark on web search exploration for new knowledge.

Figure 1. Exploratory Search Interfaces may prove very helpful for inexperienced users.


708

For instance, a good analogy in the traditional paper world is a scenario in which a book reader browses through a volume of an encyclopedia to discover new knowledge on a specific topic which falls under a broader conceptual area.

An illustrative paradigm is a scenario of a museum sight-seeing tour where the user seeks in the museum collections of works of art broader conceptual area for previously unseen paintings with Greek mythology themes by artists of British origin.

Exploratory Search Interfaces are also an ideal match for the inexperienced user who often does not know what to seek for and requires guidance during her exploration for new information. For instance, a user who seeks to find out if a specific symptom may be associated with a condition, what are other related known symptoms to this condition, and what are the possible therapies, if any available.

Figure 2. Inexperienced user guidance through associated key-phrases and search refinement.

An illustrative example is a scenario of a user who seeks for blurriness of the vision and the In-formation Retrieval System also retrieves a set of associated key-phrases found in related docu-ments of the answer set (Figure 2).

The associated key-phrases may include other symptoms such as diplopia, double vision, speech ataxia, problems with organization and synchronization of speech and movement ataxia, loss of coordination, which may be related with the initial user query.

The previously unseen symptoms are presented to the user. If the user who possibly may have experienced one or more of the previously unseen symptoms selects and includes them in a new search the information retrieval results may improve and possibly present to the user an associ-ated condition.

Although this search exploration may produce useful information, truly there is no automated, computerized panacea to replace the expert diagnosis provided by an experienced Medical Doc-tor. This simple information retrieval paradigm should only be taken as an illustrative example of a preliminary first step to inform the client.

Implicit and Explicit Relevance Feedback Information Retrieval has been receiving the benefits from relevance feedback innovations for more than three and a half decades (Rocchio, 1971). Early relevance feedback methods have been relying on explicit responses from users in order to simply perform query expansion by including additional search terms to the initially issued query.

Petratos

709

As information technology progressed more powerful computer systems became economically viable. These new more powerful computer systems gradually allowed more sophisticated schemes to be developed for eliciting relevance feedback from users.

For instance, users were put in a position of selecting multiple states by clicking on check boxes, list boxes, or data grids, selecting and marking sentences or paragraphs, reading and selecting synopses of documents, answering detailed questions about user preferences in order to create and save the individual profile of each user, etc. All these and more relevance feedback methods are listed under the Explicit Feedback category in Table 1.

More recently a new relevance feedback trend of more implicit, unobtrusive, inconspicuous and even stealth techniques is emerging. Under this new trend relevance feedback is not elicited in an explicit fashion by directly engaging the user in an activity which will take her away from her normal searching routine (Kelly & Teevan, 2003).

Instead the user is closely monitored by the system which tracks specific user activities in a cov-ert manner in order to rapidly analyze the collected data and reveal what are the likely relevant documents from observing her normal search behavior (White, Ruthven, & Jose, 2002).

For instance, users are monitored to record the time they take to read text, view a video, image or other non-readable object, listen to audio excerpt of a book or other acoustic file, record the mouse clicks, scrolling, and keystrokes on the keyboard, etc. (Kelly & Belkin, 2001). All these and more relevance feedback methods are listed under the Implicit Feedback category in Table 1.

Table 1: Relevance Feedback Methods.

Implicit Feedback Explicit Feedback

Time taken to Read, View, or Listen Select documents

Unprompted Selecting Specify keywords

Unprompted Marking Mark sentences, paragraphs

Creating, saving, or deleting a file Answer questions about user’s interests

Reading text, or Viewing video, images, or other non-readable objects, Eye tracking Answer questions to refine the initial search

Listening to audio books, music, or other acoustic files

Select a mutual exclusive state by clicking a radio but-ton, a spin button, or a choice from a combo box

Find a word or phrase in a page, document, book, issue query

Select multiple states by clicking on check boxes, list boxes, or data grids

Bookmarking, Scrolling Select a grade of a Likert scale by moving a slider bar

Key-strokes, type, edit, copy, paste, link, email, publish

Rate books, documents, synopses, images, or other non-readable objects

Printing Rank information retrieval results

A new research direction that is currently explored by Information Retrieval researchers is to de-duce what exactly the user is seeing on the computer in front of her by tracking her eye move-ment (Salojarvi, Puolamaki, & Kaski, 2005). For instance a user could be reading a specific text segment of the page displayed on the screen and hence that text segment would carry more weight than the unseen text.


710

Eye tracking may also be combined with mouse clicks in order to detect associations between the two user activities, which could be used as combined implicit feedback signals (Joachims, Granka, Pan, Hembrooke, & Gay, 2006).

Even in the case where there is a weak association between the two user activities, the click streams are flowing constantly and in the long term may yield more useful information if they are tracked and logged over a longer period of time. However, the technical difficulties with long term eye tracking are considerable as they require specialized dedicated hardware.

Faceted Search Interfaces As the number of authors and users on the Internet increase the quantities of available online in-formation experience an auxesis. As the amounts of available online information rise there is an increased need for clarity and succinct, pithy presentation of information to the user.

The total cost of ownership of a traditional computer information system is significantly reduced if the user interface is well designed and hence the users are more contented and more productive. Popular computer applications such as spreadsheets and document editors are frequently used by inexperienced users who benefit especially from intuitive and easy to understand graphical user interfaces.

Overall both experienced and inexperienced users benefit from clear, lucid, and easy to under-stand graphical user interfaces. In addition, the particular idiosyncrasies of online information retrieval systems aforementioned above create an increased need for clarity and unambiguous presentation of information. Hence, the essential characteristics of online information retrieval systems are instilled in a user interface design which is lucid, saphes, clear, ergonomic as well as laconic.

Figure 3. Faceted Search Interfaces present an immediate overview of the information and allow the freedom to drill down in easy to understand categories, as shown on the right

sourcing data from nobelprize.org on the left (nobelprize.org, 2007).

Facets are conceptual categories, which are created to organize the presentation of all the avail-able data from a large database into an easy to view concise set of conceptual groups. There are

Petratos

711

two types of facets, flat and hierarchical. Hierarchical facets contain multiple levels of items or-ganized in sub-categories, whereas flat facets contain only a single level of items.

For example, in Figure 3 the Facets are Gender, Country, Affiliation, Prize, and Year (Hearst, 2006). Under the Facet Gender two sub-categories are male and female, under the Facet Prize six sub-categories are chemistry, economics, literature, medicine, peace and physics. Faceted search interfaces allow the user fluid, flexible navigation, easy understanding and maintaining control of the search.

Directed Search Directed search is a search where the user employs the assistance of an information retrieval sys-tem because she desires to find out more specific or detailed information within a more general subject.

Figure 4. Inexperienced user guidance through key-phrases and search refinement.

An illustrative example is a scenario of an inexperienced user who seeks to find out more specific detailed information on something about multiple sclerosis (Figure 4). The information retrieval system accepts the user query and provides guidance to the user. The guidance is in the form of selected associated sub-topics which are presented as hyperlinks to the user.

According to which sub-topic the user will select the information retrieval system presents her a different answer set. Hence, if the user clicks on symptoms Answer Set A is shown, if she clicks


712

on diagnosis, tests Answer Set B is displayed, if she clicks on causes, risk factors Answer Set C is portrayed and if she clicks on treatment Answer Set D is presented.

Exploratory Search Experimentation The traditional evaluation methods for information retrieval systems are Precision and Recall which are very useful for understanding the effectiveness of information retrieval systems.

Precision = (Relevant Retrieved) / (Retrieved)

Recall = (Relevant Retrieved) / (Relevant)

Precision and Recall have been studied in depth and are well established and well documented methods of information retrieval evaluation. In addition to these evaluation methods, with the proliferation of online information retrieval systems, also known as search engines, more tools may be useful to understand efficiency, redundancy of information and avoid duplication of re-trieval results.

Statistical Data Analysis An experimental framework can be designed which supports the exploratory search paradigm. The experimental framework is designed to allow user participation, human computer interaction, whilst including three of the leading commercial search engines, Google, Yahoo and Live Search.

The experimental framework allows the participation of human subjects in information retrieval sessions with enhanced human computer interaction which allows them to provide explicit rele-vance feedback to the system. The users are researchers from the California State University, Sta-nislaus and have been selected in a diverse and balanced approach to capture a sample uniform representation of their information retrieval preferences and responses (Table 2).

Table 2: User Population

User ID Gender Age > 30 Native English

1 0 0 0

2 0 1 1

3 1 0 0

4 1 1 1

In a recently published article at MIT Technology Review entitled “The evolution of web search” by Norvig the director of research at Google the same method is outlined which is utilized for enhancement of IR accuracy and search quality assurance (Norvig, 2008). Specific queries are selected randomly whilst selected users are employed to examine and evaluate how good the Google IR results are. The users are external contractors who are employed to examine the Google IR results and offer their judgments which are recorded for comparison purposes as a gold standard.

The first part of the information retrieval experiment called for users to run a specific query against all three selected search engines and compare the results to gain a better understanding of the associated overlap. The query was Q1=“Shakespeare’s metaphor theme” and the tasks to be executed included identifying the exact as well as the partial matches of the documents returned in the corresponding answer sets.

Petratos

713

An exact match is referred to herein as the identical document which is found at the matching location indicated by the same Internet address written in the absolute path of the Uniform Re-source Locator. A partial match is referred to herein as a similar document which is located at a similar Internet address which includes at least the same domain name and may have a different relative path to the document. The granularity of how much different is the address and the doc-ument is beyond the scope of this study and can be the subject of future research by including fuzzy indicator controls to attribute a similarity percentage to documents 70%, 80%, 90%. Each answer set processed contained one hundred documents for a total of 300 documents. The overlap in the corresponding answer sets of various search engines may be easily detectable by presenting to the user an easy to understand exploratory search visual comparison.

Figure 5. Exploratory Search visual comparison of the results produced by various IR systems.

In Figure 5 the user views a small sample subset of the data from the first part of the information retrieval experiment in order to provide a visual proof of concept whilst averting feeling over-whelmed by the visual information overload of the very large ensuing data sets. The user selects three search engines and issues her query. The results are grouped by color for easy visual com-parison. The green dots represent results by Google, Search Engine A, the blue dots represent results by Yahoo, Search Engine B and the magenta dots represent results by Live Search, Search Engine C. The overlap of the results in the answer sets is clearly identifiable in the diagram, green-blue dots correspond to overlap A-B, Google-Yahoo, green-magenta dots correspond to overlap A-C, Google-Live Search and blue-magenta dots correspond to overlap B-C, Yahoo-Live Search. The overall overlap is clearly identifiable by the ellipse in the middle which is formed by blue-magenta-green dots.

Table 3: Associations of Q1 Overlap

Overlap G-Y G-LS Y-LS

Exact Co-location: 8 15 13

Partial Co-location: 6 4 7


714

The next part of the IR experiment is to process the complete results which are comprised of three large data sets which contain a total of 300 documents. In Table 3 the associations of overlap are shown, G-Y are the Google-Yahoo co-locations, G-LS are the Google-Live Search co-locations, Y-LS are the Yahoo-Live Search co-locations.

Figure 6. Exact Overlap of the results produced by three leading IR systems.

In Figure 6 all the exact Q1 overlap results are shown. Yahoo and Live Search exhibit a higher number of exact matches in the early IR stages of 1-20 documents whilst Google rises much slower, then accelerates and intersects Yahoo at the later IR stages of 60-80 documents. Live Search exhibits the largest number of exact matches overall compared to the other two IR sys-tems.

Table 4: Q1 Overlap by IR system

Overlap Google SE_A Yahoo SE_B Live Search SE_C

Exact Match: 22 20 27

Partial Match: 8 12 8

In Table 4 the overlap by IR system is shown. Live Search exhibits the largest number of exact matches followed by Google and then Yahoo. As far as partial matches are concerned Yahoo comes first followed by Google and Live Search which are tied at eight partial matches.

Petratos

715

Figure 7. Partial Overlap of the results produced by three leading IR systems.

In Figure 7 all the partial Q1 overlap results are shown. Yahoo and Live Search exhibit a higher number of partial matches in the early IR stages of 1-20 documents whilst Google rises much slower, then accelerates and intersects Live Search and Yahoo in the subsequent IR stages of 20-30 documents. Yahoo accelerates and exceeds the other two in the subsequent IR stages of 30-40 documents. Yahoo exhibits the largest number of partial matches overall compared to the other two IR systems.

Table 5: Possible Indicators of User Experimenta-tion Cost and Ease of Participation.

User Effort User Will

Navigating Effort1 Willingness to Explore1

Browsing Effort2 Willingness to Browse2

Feedback Effort1 Willingness to provide Feedback1

Cognitive Effort, time Willingness to Learn more

1. multiple categories, see Table 1

2. Read, View, Listen, see Table 1 Naturally when human subjects are involved in IR experimentation in order to examine, provide feedback or evaluate exploratory search systems the investigator should also consider the associ-ated cost which includes the required time and user effort to conduct the experiments (Ke-skustalo, Järvelin, & Pirkola, 2006). In Table 5 some of the possible indicators of cost evaluation of user experimentation are listed. User effort is a cost which may represent impediment to the experiments while user willingness is an advantage which may represent progress for the experi-ments.


716

Table 6: Comparative Ranking on Q1 by Experts and IR systems Expert Rank

Google SE_A Rank

Yahoo SE_B Rank

Live Search SE_C Rank

Documents

1 1 1 1 D1

2 2 4 7 D2

3 7 2 5 D3

4 4 3 3 D4

5 5 9 10 D5

6 6 10 6 D6

7 3 8 9 D7

8 8 5 8 D8

9 9 7 2 D9

10 10 6 4 D10

The next part of the IR experiment called for a comparative relevance ranking by human subjects and IR systems. In Table 6 the top ten documents corresponding to query Q1 are ranked accord-ing to the expert and also according to Google, Yahoo and Live Search. Notice that there are no ties in the ranks. This no-ties case is simpler than the tie-corrected case which follows subse-quently.

Table 7: Spearman rank correlation coefficient

Expert/Google SE_A Expert/Yahoo SE_B Expert/Live

Search SE_C

0.806060606 0.587878788 0.127272727

The Spearman rank correlation coefficient is computed and the individual correlations of the ex-pert to Google, Yahoo and Live Search are listed in Table 7. The first is a strong positive correla-tion, the second is a positive correlation and the third is a weak positive correlation.

Figure 8. Scatter plots for independent, non-tied ranking results of Expert/Google SE_A.

Petratos

717

The results are graphed and in Figure 8 we see a strong positive correlation between the two re-sult sets of the expert and Google SE_A.

Figure 9. Scatter plots for independent, non-tied ranking results of Expert/Yahoo SE_B.

In Figure 9 we see a positive correlation between the two result sets of the expert and Yahoo SE_B. Notice that the data points now are more scattered than before.

Figure 10. Scatter plots for independent, non-tied ranking results of Expert/Live Search SE_C.

Additionally, in Figure 10 we see a weak positive correlation between the two result sets of the expert and Live Search SE_C. Notice that the data points now are even more scattered than the two previous cases.


718

Table 8: Comparative Ranking on Q1 with Document Weights

Google SE_A Rank; Document Weight

Yahoo SE_B Rank; Doc-ument Weight

Live Search SE_C Rank; Document Weight

Documents

1; 0.9 1; 0.9 1; 0.9 D1

2; 0.8 4; 0.6 7; 0.3 D2

7; 0.3 2; 0.8 5; 0.5 D3

4; 0.6 3; 0.7 3; 0.7 D4

5; 0.5 9; 0.1 10; 0.1 D5

6; 0.4 10; 0.1 6; 0.4 D6

3; 0.7 8; 0.2 9; 0.1 D7

8; 0.2 5; 0.5 8; 0.2 D8

9; 0.1 7; 0.3 2; 0.8 D9

10; 0.1 6; 0.4 4; 0.6 D10

The next stage is to take into account the more complex tie-corrected case. The tie-corrected case can occur by a couple of conditions. The first condition is if multiple experts assign the same weight to two or more documents. The second condition is if the computed document weights which are used to estimate the document rankings coincide for two different documents ensuing to a tie of ranks. In Table 8 the cells colored blue green and red correspond to tied ranks 9 and 10.

Table 9: Spearman tie-corrected rank correlation coefficient

Expert/Google SE_A Expert/Yahoo SE_B Expert/Live Search

SE_C

0.803030303 0.584848485 0.142424242

The Spearman tied-corrected rank correlation coefficient is computed and the individual correla-tions of the expert to Google, Yahoo and Live Search are listed in Table 9. The first is a strong positive correlation, the second is a positive correlation and the third is a weak positive correla-tion.

Notice that the correlations have slightly changed now with the tie-correction compared to before without it, see Table 7. The tie-correction change is significant in the weak positive correlation.

Petratos

719

Figure 11. Scatter plots for tie-corrected ranking results of Expert/Google SE_A.

The results are graphed and in Figure 11 we see a strong positive correlation between the two re-sult sets of the expert and Google SE_A.

Figure 12. Scatter plots for tie-corrected ranking results of Expert/Yahoo SE_B.

In Figure 12 we see a positive correlation between the two result sets of the expert and Yahoo SE_B. Notice that the data points now are more scattered than before.


720

Figure 13. Scatter plots for tie-corrected ranking results of Expert/Live Search SE_C.

Finally, in Figure 13 we see a weak positive correlation between the two result sets of the expert and Live Search SE_C. Notice that the data points now are even more scattered than the two pre-vious cases.

In Table 10 the overlap of exact matches is listed. Yahoo and Live Search exhibit a higher num-ber of exact matches in the early IR stages of 1-20 documents whilst Google rises much slower, then accelerates and intersects Yahoo at the later IR stages of 60-80 documents. Live Search ex-hibits the largest number of exact matches overall compared to the other two IR systems.

Table 10: Q1 Overlap of Exact Matches

User ID Documents Google SE_A Yahoo SE_B Live Search SE_C

1 1 0 0 0

1 2 0 1 0

1 3 1 0 1

1 4 0 2 0

1 5 0 0 0

1 6 0 3 2

1 7 0 4 0

1 8 0 0 0

1 9 0 0 3

1 10 0 5 4

1 11 0 6 0

1 12 0 0 5

1 13 0 7 6

1 14 2 0 0

1 15 0 0 7

1 16 0 0 8

Petratos

721



1 17 0 0 9

1 18 3 0 0

1 19 0 8 10

1 20 0 0 0

1 21 0 9 0

1 22 4 10 11

1 23 0 0 12

1 24 0 0 0

1 25 0 11 0

2 26 5 0 0

2 27 0 12 0

2 28 0 0 0

2 29 0 0 0

2 30 0 0 0

2 31 0 0 0

2 32 0 0 0

2 33 0 0 0

2 34 0 13 13

2 35 6 0 0

2 36 0 14 0

2 37 0 15 0

2 38 0 0 0

2 39 0 0 0

2 40 0 0 0

2 41 0 0 0

2 42 0 0 0

2 43 7 0 0

2 44 0 0 0

2 45 0 0 0

2 46 0 0 0

2 47 8 0 14

2 48 9 16 15

2 49 0 0 0

2 50 0 0 0

3 51 0 0 16

3 52 0 0 0

3 53 0 0 17

3 54 10 0 0

3 55 0 0 0


722



3 56 0 0 0

3 57 11 0 0

3 58 0 0 18

3 59 0 0 0

3 60 12 0 19

3 61 0 0 20

3 62 13 0 0

3 63 14 0 0

3 64 0 0 0

3 65 0 0 0

3 66 15 0 0

3 67 16 0 0

3 68 0 0 21

3 69 0 17 0

3 70 0 0 0

3 71 0 0 22

3 72 0 0 0

3 73 0 0 23

3 74 0 0 0

3 75 17 0 0

4 76 0 18 0

4 77 18 0 0

4 78 0 0 24

4 79 0 19 0

4 80 0 0 0

4 81 0 0 25

4 82 0 0 0

4 83 0 0 0

4 84 0 0 0

4 85 19 0 0

4 86 0 0 0

4 87 0 0 0

4 88 0 0 26

4 89 20 0 0

4 90 21 20 0

4 91 0 0 27

4 92 0 0 0

4 93 0 0 0

4 94 22 0 0

Petratos

723



4 95 0 0 0

4 96 0 0 0

4 97 0 0 0

4 98 0 0 0

4 99 0 0 0

4 100 0 0 0

In Table 11 the overlap of partial matches is listed. Yahoo and Live Search exhibit a higher num-ber of partial matches in the early IR stages of 1-20 documents whilst Google rises much slower, then accelerates and intersects Live Search and Yahoo in the subsequent IR stages of 20-30 doc-uments. Yahoo accelerates and exceeds the other two in the subsequent IR stages of 30-40 docu-ments. Yahoo exhibits the largest number of partial matches overall compared to the other two IR systems.

Table 11: Q1 Overlap of Partial Matches


1 1 0 1 0

1 2 0 0 1

1 3 0 2 0

1 4 0 0 0

1 5 0 3 0

1 6 0 0 0

1 7 1 0 2

1 8 0 0 0

1 9 0 0 0

1 10 0 0 0

1 11 0 0 0

1 12 0 0 0

1 13 2 0 0

1 14 0 0 0

1 15 0 0 0

1 16 0 0 0

1 17 0 0 0

1 18 0 4 3

1 19 3 0 0

1 20 4 0 0

1 21 0 0 0

1 22 0 0 0

1 23 0 0 0


724



1 24 5 0 0

1 25 0 0 0

2 26 0 5 0

2 27 0 0 0

2 28 0 0 0

2 29 0 0 0

2 30 0 6 0

2 31 0 0 4

2 32 0 0 0

2 33 0 0 0

2 34 0 0 0

2 35 0 7 0

2 36 0 0 0

2 37 0 0 0

2 38 0 0 0

2 39 0 0 0

2 40 0 8 0

2 41 0 0 0

2 42 0 0 0

2 43 0 0 0

2 44 0 0 0

2 45 0 0 0

2 46 0 0 0

2 47 0 0 0

2 48 0 0 0

2 49 0 0 0

2 50 0 0 0

3 51 0 0 0

3 52 0 0 0

3 53 0 0 0

3 54 0 0 0

3 55 0 0 0

3 56 0 0 0

3 57 0 0 0

3 58 0 0 0

3 59 0 0 5

3 60 0 0 0

3 61 0 0 0

3 62 0 0 6

Petratos

725



3 63 0 0 0

3 64 6 0 0

3 65 0 0 0

3 66 0 0 0

3 67 0 9 0

3 68 7 0 0

3 69 0 0 0

3 70 0 0 0

3 71 0 0 0

3 72 0 0 0

3 73 0 0 0

3 74 0 0 0

3 75 0 0 0

4 76 0 0 7

4 77 0 0 0

4 78 0 0 0

4 79 0 0 0

4 80 0 0 0

4 81 0 0 0

4 82 0 0 0

4 83 0 0 0

4 84 0 10 0

4 85 0 0 0

4 86 0 0 0

4 87 8 0 0

4 88 0 0 0

4 89 0 0 8

4 90 0 0 0

4 91 0 0 0

4 92 0 0 0

4 93 0 0 0

4 94 0 0 0

4 95 0 11 0

4 96 0 0 0

4 97 0 0 0

4 98 0 12 0

4 99 0 0 0

4 100 0 0 0


726

The purpose of this work is to offer new insights into the information retrieval process and illu-minate the issues which affect the principal client for whom the information is intended.

Informing through exploratory search utilizes alternative information retrieval strategies which involve and engage the human user who is the primary client of informing. Various methods of eliciting and utilizing user relevance feedback have been presented along with evaluation meth-ods more suitable for exploratory search interfaces.

In synopsis, the last five decades have been very fruitful for computing research generating great advances in the field of computing science. The current computer automation and computing power possible are orders of magnitude greater than what they were a few years ago. Still this work shows that human subjects can be very valuable especially for the enhancement of search quality of IR systems.

Conclusion In conclusion, an IR experimental framework which supports the involvement and participation of human subjects in the IR experiments has been presented and tested. Human computer interac-tion plays an important role in the examination of the documents, as well as providing the gold standard for comparison of IR rankings.

The comparative statistical results given by the Spearman correlations for distinct as well as tie-corrected IR rankings, favor Google, whilst Yahoo and Live Search follow in that order. From the results of the IR experiments it is clear that overlap exists among leading IR systems.

In the case of exact overlap Yahoo and Live Search exhibit a higher number of exact matches in the early IR stages of 1-20 documents whilst Google rises much slower, then accelerates and in-tersects Yahoo at the later IR stages of 60-80 documents. Yahoo exhibits the largest number of partial matches overall compared to the other two IR systems.

As far as partial overlap is concerned Yahoo and Live Search exhibit a higher number of exact matches in the early IR stages of 1-20 documents whilst Google rises much slower, then acceler-ates and intersects Live Search and Yahoo in the subsequent IR stages of 20-30 documents. Sub-sequently in partial overlap Yahoo accelerates and exceeds the other two in the subsequent IR stages of 30-40 documents. Yahoo exhibits the largest number of partial matches overall com-pared to the other two IR systems.

Future trends in these areas include work on similarities of corpuses, i.e. web sites and the effects of human factors involvement on multiple established IR similarity statistics such as the cosine, the overlap, Dice and Jaccard. Furthermore the topic-focused approach can be applied to multiple genres, themes, publications, authors as well as multiple information sources i.e. literary works, email, blogs, scientific journals, etc.

From the three tested IR systems, Google, Yahoo and Live Search the IR system that exhibits the highest number of exact overlap is Live Search whereas the IR system that exhibits the highest number of partial overlap is Yahoo.

References Cohen, E. (1999). Reconceptualizing information systems as a field of the transdiscipline informing sci-

ence: From ugly duckling to swan. Journal of Computing and Information Technology, 7(3), 213-219.

Hearst, M. (2006). Design recommendations for hierarchical faceted search interfaces. Proceedings of the ACM SIGIR 2006 Workshop on Faceted Search, ACM, New York.

Petratos

727

Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting click-through data as implicit feedback. Proceedings of the ACM SIGIR 2005 International Conference, ACM, New York, 154-161.

Kelly, D., & Belkin, N. (2001). Reading time, scrolling and interaction: Exploring implicit sources of user preferences for relevance feedback. Proceedings of the ACM SIGIR 2001 International Conference on Research and Development in Information Retrieval, ACM, New York, 408 - 409.

Kelly, D., & Teevan, J. (2003). Implicit feedback for inferring user preference: A bibliography. ACM SI-GIR Forum, 37(2), 18-28.

Keskustalo, H., Järvelin, K., & Pirkola, A. (2006). The effects of relevance feedback quality and quantity in interactive relevance feedback: A simulation based on user modeling. Lecture Notes in Computer Sci-ence, 3936, 191-204. Berlin, Heidelberg: Springer.

Norvig, P. (2008). The evolution of web search. MIT Technology Review, 111(1), 32-33.

Petratos, P. (2006). Information retrieval systems: A perspective on human computer interaction. Journal of Issues in Informing Science and Information Technology, 3(1), 511-518. Retrieved from http://informingscience.org/proceedings/InSITE2006/IISITPetr231.pdf

Petratos, P. (2007). Information retrieval systems: A human centered approach. Interdisciplinary Journal of Information, Knowledge, and Management, 2(1), 17-32. Retrieved from http://ijikm.org/Volume2/IJIKMv2p017-032Petratos331.pdf

Rocchio, J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313 - 323). Englewood Cliffs, New Jer-sey: Prentice Hall.

Rodden, K., Ruthven, I., & White, R. (Eds.). (2007). Web information seeking and interaction. Proceedings of the ACM SIGIR 2007 Workshop on Web Information Seeking and Interaction, ACM, New York.

Salojarvi, J., Puolamaki, K., & Kaski, S. (2005). Implicit relevance feedback from eye movements. Pro-ceedings of ICANN 2005 Artificial Neural Networks: Biological Inspirations, Lecture Notes in Com-puter Science, 3696, 513-518. Berlin, Heidelberg: Springer.

White, R., Drucker, S., Marchionini, G., Hearst, M., & Schraefel, M. (Eds.). (2007). Exploratory search and HCI: Designing and evaluating interfaces to support exploratory search interaction. Proceedings of the ACM SIGCHI 2007 Workshop on Exploratory Search and HCI, ACM, New York.

White, R., Muresan, G., & Marchionini, G. (Eds.). (2006). Evaluating exploratory search systems. Proceed-ings of the ACM SIGIR 2006 Workshop on Evaluating Exploratory Search Systems, ACM, New York.

White, R., Ruthven, I., & Jose, J. (2002). The use of implicit evidence for relevance feedback in web re-trieval. Proceedings of the 24th BCS-IRSG European Colloquium on IR Research ECIR 2002 Lecture Notes in Computer Science, 2291, 93-109. Berlin, Heidelberg: Springer.

Biography Panagiotis Petratos is Assistant Professor of Computer Information Systems at California State University, Stanislaus. His research inter-ests include information retrieval systems, human computer interac-tion, networking, computer security and biometrics enabled computer systems.