A Comparative Evaluation of Tabular and Graphical Presentation Styles for Information Retrieval Search Results

Jeffrey Morgan and Greg Michaelson

Department of Computing and Electrical Engineering, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, UK.

Abstract

There has been considerable debate about the relative merits of tabular and graphical presentation styles but experimental results are contradictory. One source of these inconsistencies might lie in mismatches between spatial/symbolic tasks and symbolic/spatial presentations. A comparative evaluation of tabular and graphical presentation styles for information retrieval search results was conducted, where spatial or symbolic data retrieval tasks were matched with appropriate spatial or symbolic data presentations. However, when sixty participants performed ten presentation-matched information retrieval tasks, no performance advantages were observed. We suggest that an additional dimension of data relevance to task takes precedence over the match between task and presentation.

1. Introduction

A typical information retrieval (IR) search returns relevant and irrelevant results and users must examine them to find the relevant ones. We believe that the traditional linear list presentation style for search results is inadequate for performing the results evaluation task (Morgan 2000, Morgan & Michaelson 2000b). We have therefore been developing presentation styles for search results that support users when evaluating search results for relevance. An improved presentation able to support the results evaluation task should:

1. help users to deal with large numbers of results by summarising them;

2. make the relationships between the results explicit;

3. enable users to quickly identify interesting characteristics and trends within the results;

4. provide tools to search and sort the results.

Data visualisation uses a variety of techniques to present large amounts of data in a pictorial form so that it can be understood more clearly and so that patterns and trends in the data can be seen more easily (Tukey 1977, Bertin 1981, Tufte 1983). Comprehending a large amount of data can be an enormous cognitive task. Visualisation enables people to perform this task more easily because it exploits the highly developed human visual perceptual system (Neisser 1963, Treisman & Gormican 1988, Enns & Rensink 1990, Healey et al. 1996). Data visualisation has many applications in science (McCormick et al. 1987), engineering (Porter 1994), medicine (Mahoney 1996), and business (Wright 1997).

While visualisation seems particularly promising for inspecting information retrieval search results, tables are a well established presentation technique. It is therefore important to investigate whether or not data visualisation offers advantages over tables in information retrieval.

The next section reviews previous work comparing the performance of users with tables and graphs. Section 3 describes the hypotheses developed from this review. Section 4 describes the presentation styles and section 5 describes the comparative evaluation. The results are set out in section 6 and the conclusions are presented in section 7.

2. Tables versus Graphs

There have been a considerable number of studies comparing the effectiveness of tabular and graphical presentations of information. The earliest studies date back to the 1920s. Washburne (1927) conducted the largest study to date on the display of quantitative information with tables and graphs. Several thousand participants performed three types of task using a variety of textual, tabular and graphical presentation styles. The task types were: reading specific values, comparing values and identifying trends. Washburne concluded that bar charts should be used for complex comparisons; pictographs should be used for simple comparisons; line graphs should be used for dynamic comparisons; and that tables should be used for reading specific values.

Washburne's study began a debate that, after seventy years of research, is still producing equivocal results. For example, Carter (1947), Feliciano, Powers & Bryant (1963), Benbasat & Schroeder (1977), Tullis (1981), and Meyer et al. (1997) found that the users of graphical presentations performed tasks more accurately than the users of tabular presentations, whereas Grace (1966), Wainer & Reiser (1976), Lucas (1981), and Remus (1984) found that the users of tabular presentations performed tasks more accurately than the users of graphical presentations. Furthermore, MacDonald-Ross (1977), Ives (1982), DeSanctis (1984), Dickson et al. (1986), and Coll et al. (1994) found mixed results.

Washburne's study has been cited in many of the reports comparing tables and graphs and has been used as a basis for choosing between graphs and tables. Meyer (1997) recently re-examined this study and found that only a few of Washburne's conclusions were supported by his data. Thus, acceptance of Washburne's recommendations may have led to inappropriate use of tables and graphs and to incorrect assumptions being made in comparative studies.

Several researchers, such as Jarvenpaa et al. (1985), Scaife & Rogers (1996), and Lohse (1997), believe the equivocal results were caused by the lack of a theoretical framework both for creating graphs and for choosing between graphs and tables; without such a framework, authors are unable to produce accurate graphs. For example, Cleveland (1984) studied the graphs in one volume of the journal Science and found that 30% of them were either incomplete, misleading or not fully explained. In the chapter “Graphical Integrity”, Tufte (1983) describes how many authors, through lack of training or ability, produce graphs that are at best simply misleading and at worst deceptive. There are many how-to books for producing graphs, such as those by Bertin (1981) and Tufte (1983). Some of these books are based on experimental work, such as those by Cleveland (1985) and Kosslyn (1989). Many, however, are based on the author's intuition and, although they provide insight into graph design, many of their recommendations are not supported by empirical results (Legge et al. 1989, Spence & Lewandowsky 1990).

Vessey (1991), on the other hand, believes that the equivocal results are due to the lack of attention paid to the different types of tasks performed in those studies. In a re-examination of a number of studies on tables and graphs, Vessey differentiated between symbolic and spatial tasks. Symbolic tasks are typically data acquisition tasks that require specific data values to be retrieved. Spatial tasks, referred to by Larkin & Simon (1987) as diagrammatic tasks, involve judgement, comparison and inference about the relationships within data.

When applied to the graphs versus tables debate, Vessey's cognitive fit theory predicts that users will perform well when they answer symbolic questions with a symbolic representation, such as a table, and also when they answer spatial questions with a spatial representation, such as a graph. Vessey argues that most tables versus graphs studies have mixed symbolic and spatial questions. Any performance improvements accrued when the participants answered symbolic questions with tables were cancelled out when they used tables to answer spatial questions. Similarly, any performance improvements accrued when the participants answered spatial questions with graphs were cancelled out when they used graphs to answer symbolic questions. Vessey found that subjects performed well when using tables to answer symbolic questions and when using graphs to answer spatial questions.

Studies performed by Boehm-Davis et al. (1987) and Peters et al. (1988) have shown that matching the type of data retrieval task with the representation of the data leads to improved performance. The results of Vessey's meta-analysis add to this evidence.

Factors other than the task affect participants' performance when using tables and graphs. Petre & Green (1993) found that people have different levels of ability to read graphs, which is affected by the quality of the graph. Lohse (1997), using Pinker's (1990) model of perception and cognition for graph comprehension, describes how the reader's working memory capacity affects their ability to use graphs: well designed graphs are easier to use because they put less load on the reader's working memory.

Coll et al. (1994) found that the performance of the participants in their study correlated with their educational background: the engineering students out-performed the business students when using graphs, and the business students out-performed the engineering students when using tables. They proposed that the background of the engineering students made them more familiar with graphs, whereas business students were more familiar with tables.

Despite the equivocal results of the graphs versus tables studies, there is a growing consensus in the literature to support Vessey's conclusion that no single representation will be suitable for completing all types of tasks. Therefore, it is important to consider the demands of the results evaluation task when designing new presentations of results.

However, the literature indicates that both tables and graphs can be appropriate.

We observe that tables are symbolic representations that are useful for answering symbolic questions that require the retrieval of specific data values. In contrast, graphs are spatial representations that are useful for answering spatial questions that require data to be compared and related. Both tables and graphs can be used to answer symbolic and spatial questions but there is a predicted performance advantage of matching the representation to the task (Boehm-Davis et al. 1987, Peters et al. 1988, Vessey 1991).

3. Hypotheses

Guided by the literature reviewed in the previous section, we formed two groups of hypotheses concerned with tabular versus graphical presentations and static versus re-orderable presentations.

3.1 Tabular vs. Graphical Presentation Styles

We wished to explore whether users could filter results more quickly and more accurately by using graphical presentations than by using tabular presentations. Vessey’s (1991) conclusions predict that users of tabular presentations will perform symbolic tasks faster and more accurately than users of graphical presentations:

• Hypothesis 1a. The users of tabular presentations of results will perform a set of symbolic results evaluation tasks significantly faster than the users of graphical presentations.

• Hypothesis 1b. The users of tabular presentations of results will perform a set of symbolic results evaluation tasks significantly more accurately than the users of graphical presentations.

Vessey (1991) also predicts that users of graphical presentations will perform spatial tasks faster and more accurately than users of tabular presentations:

• Hypothesis 1c. The users of graphical presentations of results will perform a set of spatial results evaluation tasks significantly faster than the users of tabular presentations.

• Hypothesis 1d. The users of graphical presentations of results will perform a set of spatial results evaluation tasks significantly more accurately than the users of tabular presentations.

3.2 Static vs. Re-orderable Presentation Styles

We also wished to investigate whether users could filter results more quickly and more accurately using re-orderable presentations than by using static presentations.

Tukey (1977) developed the idea that interactive visualisations could be used for exploratory data analysis. Bertin (1981) discussed the usefulness of manually reordering the information in a visualisation so that trends and relationships can be more easily identified and so that interesting information can be grouped together to make comparison easier. Another useful manipulation would be to automatically sort the information based on one or more dimensions. Sorting by author name, for example, would group together all documents written by the same author.

Spatial tasks require users to identify relationships within a set of results. It was predicted that re-orderable presentation styles would enable users to complete spatial tasks more quickly and more accurately than static presentation styles:

• Hypothesis 2a. The users of re-orderable presentations of results will perform a set of spatial results evaluation tasks significantly faster than the users of static presentations.

• Hypothesis 2b. The users of re-orderable presentations of results will perform a set of spatial results evaluation tasks significantly more accurately than the users of static presentations of results.

Vessey's (1991) cognitive fit theory matches tabular presentations with symbolic tasks. Tables presented on paper are not re-orderable so the ability to reorder the results was not thought to be an advantage for performing symbolic tasks. It was predicted that there would be no speed or accuracy difference between the static and re-orderable presentations for symbolic tasks:

• Hypothesis 2c. There will be no significant difference in time taken to perform a set of symbolic results evaluation tasks between the users of re-orderable presentations of results and the users of static presentations.

• Hypothesis 2d. There will be no significant difference in accuracy achieved when performing a set of symbolic results evaluation tasks between the users of re-orderable presentations of results and the users of static presentations.

4. The Comparison Systems

Our study comparatively evaluated six different presentation styles for IR search results: a linear list, text tables, bar charts, matrix charts, a scatter plot and a document map. Except for the document map, these systems use existing data presentation and visualisation techniques but are novel for presenting IR search results. The document map is a new visualisation technique for IR search results. The documents were sequentially numbered in each of the six presentations. These systems are now described briefly. More detail may be found in Morgan (2000), Morgan & Michaelson (2000a) and Morgan & Michaelson (2000b).

4.1 Linear List

The linear list presentation was the baseline for comparison. The results were displayed in a format typically used by IR systems (figure 1). Users were able to search for terms such as author names and keywords through a simple string search facility. No document similarity information was presented.

4.2 Text Tables

The text tables presentation provided 1D and 2D tables of search terms. There was a 1D table for each of the four categories of search terms: keywords, author names, publication years and publication types. Each 1D table provided a list of search terms and the numbers of the documents they occurred in.

The 2D tables provided a join of two 1D tables. For example, the authors x publication years table showed the author names on one axis and the publication years on the other. The elements of the table showed the numbers of the documents that contained the search terms on the axes. There was one 2D table for each combination of two 1D tables: authors x publication years, authors x publication types, authors x keywords, keywords x publication years and keywords x publication types.
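The construction of the 1D and 2D tables can be illustrated with a minimal Python sketch. It is not the software used in the study: the document records, field names and the functions `one_d_table` and `two_d_table` are hypothetical and serve only to show how a 2D table can be formed as a join of two 1D tables.

```python
from collections import defaultdict

# Hypothetical search results: document number -> metadata (not the study's data).
documents = {
    1: {"authors": ["Hill"], "year": 1993, "keywords": ["polymorphism", "types"]},
    2: {"authors": ["Hanus"], "year": 1988, "keywords": ["logic", "types"]},
    3: {"authors": ["Hill", "Hanus"], "year": 1993, "keywords": ["polymorphism"]},
}


def one_d_table(docs, field):
    """Map each search term in `field` to the document numbers it occurs in."""
    table = defaultdict(list)
    for number in sorted(docs):
        value = docs[number][field]
        terms = value if isinstance(value, list) else [value]
        for term in terms:
            table[term].append(number)
    return dict(table)


def two_d_table(row_table, col_table):
    """Join two 1D tables: each cell lists the documents containing both terms."""
    return {
        (row, col): sorted(set(row_docs) & set(col_docs))
        for row, row_docs in row_table.items()
        for col, col_docs in col_table.items()
    }


authors = one_d_table(documents, "authors")
years = one_d_table(documents, "year")
authors_by_year = two_d_table(authors, years)
```

In this sketch, a question such as "documents written by Hill in 1993" reduces to reading the single cell `authors_by_year[("Hill", 1993)]`.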

A 2D document similarity table was provided that listed the document numbers along both axes. The elements of this table showed the numerical similarity between each document (figure 2).

The numeric measure of similarity between each document was calculated by performing a latent semantic analysis (LSA) on the documents (Deerwester et al. 1990). LSA represents documents in a multi-dimensional space where similar documents are placed near one another and dissimilar documents are placed far apart. The document similarity was calculated by finding the distance between the documents in the multi-dimensional space.

4.3 Bar Charts

The bar charts presented the frequency of occurrence of the search terms (figure 3). Four bar charts displayed the search terms, one for each of the four categories of search terms: keywords, author names, publication years and publication types. Each search term was represented by a bar and the length of the bar was the frequency of occurrence of the search term in the documents. Each bar chart could be sorted in descending order of frequency.

The bar charts did not provide document similarity because no viable method was found for presenting document similarity as a bar chart (Morgan 2000).

4.4 Matrix Charts

Matrix charts present 2D tables of quantities (Carmichael & Sneath 1969, Ling 1973, Marsh 1992). Each table element represents a quantity as a filled circle; the larger the quantity, the greater the diameter of the circle. Here, the matrix charts were the same as the 2D text tables, except that the elements of the matrix charts represented the frequency of occurrence of the search terms with filled circles. The document similarity matrix chart was identical to the document similarity text table except that the numerical similarity values were represented by filled circles (figure 4). The rows and columns of the matrix charts could be sorted to show the search terms in descending order of frequency.

4.5 Scatter Plot

Scatter plots have been used to present information in 3D spaces (Stuetzle 1987, Fisherkeller et al. 1988, Donoho et al. 1988). The scatter plot presented each document as a square in a 3D space. The 3D co-ordinates of the squares were the first three dimensions of the multi-dimensional space used to calculate document similarity.

The scatter plot showed document similarity directly because the proximity of the squares in the 3D space represented their similarity; the closer documents were placed to each other in the space, the greater their similarity (figure 5).
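The relationship between the multi-dimensional space, document similarity and the scatter plot co-ordinates can be sketched in Python as follows. The co-ordinates below are invented for illustration and would in practice come from a latent semantic analysis of the documents. The use of Euclidean distance is an assumption, since the distance measure is not specified here, and the names `lsa_space`, `distance`, `most_similar` and `scatter_coordinates` are hypothetical.

```python
import math

# Hypothetical co-ordinates of four documents in a 4-dimensional LSA space.
# Real co-ordinates would be produced by a latent semantic analysis.
lsa_space = {
    3: (0.9, 0.1, 0.3, 0.2),
    12: (0.8, 0.2, 0.3, 0.1),
    27: (0.1, 0.9, 0.5, 0.4),
    36: (0.2, 0.8, 0.4, 0.9),
}


def distance(a, b):
    """Euclidean distance (an assumed metric) between two documents."""
    return math.dist(lsa_space[a], lsa_space[b])


def most_similar(selected, count):
    """Return the `count` documents closest to `selected`, nearest first."""
    others = [doc for doc in lsa_space if doc != selected]
    return sorted(others, key=lambda doc: distance(selected, doc))[:count]


def scatter_coordinates():
    """Keep only the first three dimensions for plotting in a 3D space."""
    return {doc: coords[:3] for doc, coords in lsa_space.items()}
```

Under these assumptions, a smaller distance means greater similarity, so `most_similar(3, 2)` lists the two documents that would appear nearest to document 3.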

The scatter plot was explored using scrolling lists of search terms. Selecting a search term from a scrolling list highlighted the documents on the scatter plot that contain the search term.

4.6 Document Map

Document maps visualise a set of documents as a set of numbered squares arranged in a square matrix (Morgan & Michaelson 2000b). Document maps are information displays that are explored by selecting search terms from the same scrolling lists as the scatter plot. Document maps present the similarity between a selected document and the remaining documents by rearranging the squares so that documents similar to the selected document are placed near it (figure 6). The squares are also shaded to emphasise the similarity; the darker the shade of grey, the greater the similarity.

5. Evaluation

5.1 Participants

Sixty staff and students of the Department of Computing and Electrical Engineering at Heriot-Watt University participated in the evaluation. The age range of the 52 male and 8 female participants was 21 to 42 years with a median age of 26 years. Each participant had used at least one IR system and had normal or corrected to normal vision (self reported). Each participant was given a book token worth 5 pounds.

5.1 Search Results

Each presentation provided information about the results of the query “functional programming” issued to the SEL-HPC¹ article archive. The answers to the tasks were objective and were obtained by interacting with the presentations and reading information from them. No prior knowledge of functional programming was assumed or required.

5.2 Design

The experiment comparatively evaluated the six presentation styles. The experiment had one independent variable: presentation style (linear list / text tables / bar charts / matrix charts / scatter plot / document map).

The experiment had two dependent variables: the time taken to complete a task and the score awarded for that task.

The experiment had a single factor design with 6 levels. A between subjects design was used to avoid transfer of learning effects that might have been introduced by a within subjects design. Ten participants were assigned to each experimental group as shown in table 1.

¹ http://hypatia.dcs.qmw.ac.uk/SEL-HPC/Articles

5.3 Tasks

The results evaluation task involves identifying relevant documents and the interesting relationships within them. Ten results evaluation tasks were devised to test the performance of users with the six presentations. The ten tasks were representative of those used in comparative studies of presentation styles (Powers et al. 1984, Nugent & Broyles 1992, Coll et al. 1994, Meyer et al. 1997). Furthermore, the tasks were representative of results evaluation tasks rather than general IR search tasks which tend to be more open-ended. Users perform results evaluation tasks in the context of an overall IR search goal. Users may issue several queries before their search goal is satisfied and therefore have to evaluate several sets of results.

Each task asked a specific question about search terms and document similarity information. The hypotheses required both symbolic and spatial questions. Symbolic questions require retrieval of specific data values whereas spatial questions deal with the relationships within that data. As well as asking questions about search term frequency and document similarity, the ten tasks were also spatial or symbolic in nature.

The tasks were grouped into three types: direct representation, visual advantage and document similarity. Table 2 shows the symbolic or spatial nature of the three groups of tasks and the information required to complete them. The three groups of tasks are now described in detail.

5.3.1 Direct Representation

The direct representation questions were symbolic tasks that required participants to read the answers directly off the presentation of results or use the facilities of the presentation to obtain the answer:

1. Write down the document numbers of all the documents written in 1988.

2. Write down the document numbers of all the documents written by Hanus.

3. Write down the document numbers of all the documents written by Hill in 1993.

4. Write down the document numbers of all the documents that contain the word “polymorphism” in the title and keyword fields.

5.3.2 Visual Advantage

The visual advantage questions were spatial tasks that users were thought to be able to answer more quickly by making an initial visual scan of the available information to identify the likely candidates for the answer. The likely candidates would then be investigated in more detail:

5. Write down the year in which the most number of documents was published.

6. Write down the surnames of the top three authors that published the most number of documents (any type of documents).

7. Write down the surname of the author that published the most number of documents as articles.

8. Write down the word that occurs most frequently in the title and keyword fields.

5.3.3 Document Similarity

The document similarity questions were spatial tasks that tested the ability of the presentations of results to show document similarity.

They required the participants to read the similarity values directly from the presentations. Note that the participants were not required to evaluate the similarity between any documents. This had three advantages. First, evaluating document similarity is very subjective. By removing a subjective element from the evaluation the reliability of the results should be improved. Second, it reduced the time and effort required by participants to perform the tasks. The aim was to remove a cause of fatigue, boredom and loss of concentration which could have affected the results. Third, differences between participants' familiarity with the topic of the query were unimportant.

The document similarity questions were:

9. Write down the document numbers of the five documents that are most similar to document number 36, entitled “Fixing some space leaks without a garbage collector”.

10. Lloyd Allison has written six documents including document number 3, entitled “Lazy dynamic-programming can be eager”. Of the five other documents written by Lloyd Allison, write down the document numbers of the two documents that are most similar to document number 3.

The linear list and bar charts presentations did not provide document similarity information and were excluded from the comparisons of document similarity.

5.4 Procedure

Each participant was given a set of instructions that explained the presentation software they would be using. Each participant was also given a set of simple training exercises which enabled them to familiarise themselves with the software.

Each participant was asked to complete the tasks after they had read the instructions and completed the training exercises.

After completing the tasks each participant was asked to fill in a questionnaire that collected qualitative information about their experience with the presentation.

6. Results

6.1 Quantitative Results

The time and score data for each question were averaged by question type for each participant. For each participant, the mean time taken to answer the direct representation questions was the mean time taken to answer questions 1, 2, 3, and 4. The mean time taken to answer the visual advantage questions was the mean time taken to answer questions 5, 6, 7, and 8. The mean time taken to answer the document similarity questions was the mean time taken to answer questions 9 and 10. The scores were similarly averaged over question type for each participant.
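The averaging by question type can be sketched in Python as below. The grouping of question numbers follows the description above; the per-question times are invented for illustration, and the names `QUESTION_TYPES` and `average_by_type` are hypothetical.

```python
from statistics import mean

# Question numbers belonging to each question type, as described in the text.
QUESTION_TYPES = {
    "direct representation": [1, 2, 3, 4],
    "visual advantage": [5, 6, 7, 8],
    "document similarity": [9, 10],
}


def average_by_type(per_question):
    """Average one participant's per-question values (times or scores) by type."""
    return {
        qtype: mean(per_question[q] for q in questions)
        for qtype, questions in QUESTION_TYPES.items()
    }


# Hypothetical times in seconds for one participant (illustrative values only).
times = {1: 30, 2: 20, 3: 40, 4: 30, 5: 60, 6: 80, 7: 70, 8: 50, 9: 120, 10: 100}
averages = average_by_type(times)
```

The same function applies unchanged to the scores, because it only depends on the mapping from question number to value.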

Table 3 shows the means and standard deviations of the time taken by each presentation group broken down by question type. Table 4 shows the means and standard deviations of the scores of each presentation group broken down by question type.

6.1.1 Tabular vs. Graphical Presentation Styles

For the tabular and graphical presentation styles, the following comparison was made for the direct representation and visual advantage questions:

Text      Bar      Matrix   Scatter  Document
Tables    Charts   Charts   Plot     Maps
|______|  |_________________________________|
Tabular               Graphical
    |_____________________|

The bar charts visualisation did not present document similarity information so the following comparison was made for the document similarity questions:

Text      Bar      Matrix   Scatter  Document
Tables    Charts   Charts   Plot     Maps
|______|           |_________________________|
Tabular                    Graphical
    |________________________|

Hypotheses 1a and 1c

Table 5 shows the mean times taken by the tabular and graphical presentation groups broken down by question type. An ANOVA revealed a significant interaction between presentation type and question type: F(2,18)=35.08, p<0.001 (figure 7). However, a Scheffe post hoc analysis revealed that the significant differences were between question types within the two presentation types rather than between the two presentation types. Since there was no significant difference between the tabular and graphical presentations for any question type, hypotheses 1a and 1c were not supported.

Hypotheses 1b and 1d

Table 6 shows the mean scores of the tabular and graphical presentation groups broken down by question type. An ANOVA revealed a significant interaction between presentation type and question type: F(2,18)=11.73, p<0.005 (figure 8). However, a Scheffe post hoc analysis revealed that the significant differences were between question types within the two presentation types rather than between the two presentation types. Since there was no significant difference between tabular and graphical presentations for any question type, hypotheses 1b and 1d were not supported.

Discussion

There was no difference in speed or accuracy between the tabular and graphical presentation groups for the symbolic or spatial questions. None of the hypotheses based on Vessey's (1991) predictions for the performance of tabular and graphical presentations were supported. Boehm-Davis et al. (1987), Peters et al. (1988) and Vessey (1991) recommend that a data retrieval task should be matched with an appropriate data representation. The results of the evaluation illustrate a case where the use of this recommendation did not produce a performance advantage. This outcome might have been caused by the number of participants in the experiment being too low to produce statistical significance.

We propose an alternative explanation that considers whether data retrieval tasks are affected by the data as well as by the representation of that data. For example, consider a symbolic task that requires the retrieval of the average age of several groups of people from a table. Also consider a spatial task that requires the comparison of the average ages of the same groups of people with a bar chart. These tasks might be performed more quickly and more accurately if the table and bar chart present the average ages as well as the individual ages. The performance of these data retrieval tasks might be affected by how closely the data itself matches the data retrieval task as well as its presentation. It is likely that in the evaluation, the search term and document similarity information was so useful for performing the tasks that any performance differences between the tabular and graphical presentations were not relevant.

In the experiments performed by Boehm-Davis et al. (1987) and Peters et al. (1988), and in the meta-analysis performed by Vessey (1991), the type of data retrieval task and the type of data representation were varied but the information that was presented was not varied. These researchers do not seem to have considered how the suitability of the data for performing data retrieval tasks might affect their recommendations.

Our studies suggest that the performance predictions made by Boehm-Davis et al. (1987), Peters et al. (1988) and Vessey (1991) need to be extended to consider how well the data supports the data retrieval task, as well as how well the presentation of that data supports the data retrieval task.

6.1.2 Static vs. Re-orderable Presentation Styles

This sub-section compares the time and score performance of the static presentation styles with the re-orderable presentation styles. Here, the linear list, text tables and scatter plot presentations were static and the bar charts, matrix charts and document map presentations were re-orderable.

The following comparison was made for the direct representation and visual advantage questions:

Linear  Text    Scatter     Bar     Matrix  Document
List    Tables  Plot        Charts  Charts  Maps
|______________________|    |______________________|
         Static                   Re-orderable
            |___________________________|

The linear list and bar charts presentations did not provide document similarity information so the following comparison was made for the document similarity questions:

Text    Scatter     Matrix  Document
Tables  Plot        Charts  Maps
|_______________|   |________________|
     Static            Re-orderable
        |___________________|

Table 7 shows the mean times taken by the static and re-orderable presentation groups. An ANOVA revealed a significant difference in time taken: F(1,9)=26.0, p<0.001. There was a significant time saving when using re-orderable rather than static presentations.
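For readers unfamiliar with the F ratio, the following Python sketch computes it for a simple one-way between-groups ANOVA. It is illustrative only: the exact ANOVA model used in the evaluation is not described here, so this is not a reproduction of the analysis, and the data values and the function name `one_way_anova` are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical mean task times in seconds (not the study's data).
static_times = [10.0, 12.0, 14.0]
reorderable_times = [4.0, 6.0, 8.0]
f_ratio, df_between, df_within = one_way_anova([static_times, reorderable_times])
```

A large F ratio indicates that the difference between the group means is large relative to the variation within the groups; the p value is then obtained from the F distribution with the two degrees of freedom.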

Table 8 shows the mean scores of the static and re-orderable presentation groups. An ANOVA revealed a significant difference in scores achieved: F(1,9)=25.17, p<0.001. There was a significant increase in accuracy when using re-orderable rather than static presentations. Hypothesis 2a Table 9 shows the mean times taken by the static and re-orderable presentation groups broken down by question type. An ANOVA revealed a significant interaction between presentation type and question type: F(2,18)=16.9, p<0.001 (figure 9). A Scheffe post hoc analysis revealed that the re-orderable presentation groups were significantly faster than the static presentation groups for the document similarity questions (p<0.005). There was no significant difference in time taken between the static and re-orderable presentation groups for the visual advantage questions. Hypothesis 2a was therefore only partially supported. Hypothesis 2b Table 10 shows the mean scores of the static and re-orderable presentation groups broken down by question types. An ANOVA revealed a significant interaction between presentation type and question type: F(2,18)=13.4, p<0.001 (figure 10). A Scheffe post hoc analysis revealed that the re-orderable presentation groups were significantly more accurate that the static presentation groups for the document similarity questions (p<0.05). There was no significant difference between the static and re-orderable presentation groups for the visual advantage questions. Hypothesis 2b was therefore only partially supported. Hypothesis 2c There was no significant difference in time taken to complete the direct representation questions between the static and re-orderable presentation groups. Hypothesis 2c was supported.


Hypothesis 2d

There was no significant difference in the accuracy of the answers to the direct representation questions between the static and re-orderable presentation groups. Hypothesis 2d was supported.

Discussion

Table 11 summarises the significant differences between the static and re-orderable presentations. The re-orderable presentation groups performed the document similarity questions significantly faster and significantly more accurately than the static presentation groups. Although no significant difference was found for the visual advantage questions, these results lend support to Bertin's (1981) idea that enabling users to reorder information helps them deal with relationships in that information.

6.2 Qualitative Results

The participants were asked to subjectively rate the presentation of results they evaluated in four categories: (1) ease of use; (2) ease of learning; (3) suitability for helping users filter results; and (4) whether they would use the presentation for filtering results on a regular basis. Responses were captured in a post-experiment questionnaire using the following statements:

1. The system was easy to learn.
2. The system was intuitive and easy to use.
3. The system is suitable for helping users to filter the results of an information retrieval system query.
4. I would be happy to use a system like this for filtering the results of an information retrieval system query on a regular basis.

Participants recorded their level of agreement with these statements on a four-point scale:

Disagree 1 2 3 4 Agree

Table 12 shows the percentage agreement with the four statements.
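The formula used to turn the four-point ratings into a percentage agreement is not stated. One plausible reading, sketched below, is the sum of a group's ratings divided by the maximum possible sum. This is an assumption: it is consistent with the values in Table 12 all being multiples of 2.5 for groups of ten participants, but under this reading the lowest possible value is 25% rather than 0%. The function name and the ratings are hypothetical.

```python
def percentage_agreement(ratings, scale_max=4):
    """Sum of ratings as a percentage of the maximum possible sum.

    Assumed formula; the paper does not state how the percentages were derived.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > scale_max for r in ratings):
        raise ValueError("ratings must lie between 1 and scale_max")
    return 100.0 * sum(ratings) / (scale_max * len(ratings))


# Hypothetical ratings from ten participants for one statement (NOT study data).
example_ratings = [4, 4, 3, 4, 3, 4, 3, 3, 2, 3]
agreement = percentage_agreement(example_ratings)
print(f"{agreement:.2f}%")
```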

The participants rated the bar charts and matrix charts presentations higher than the text tables presentation for ease of learning; they rated the bar charts as equally easy to use as the text tables; and they rated all four visualisations higher for suitability for filtering results than the text tables. The quantitative results of the evaluation showed that the users of the text tables presentation performed as well as the users of the visualisations.

This paradox of users preferring systems that perform less well is not new in the human-computer interaction field. Nielsen & Levy (1994) conducted a meta-analysis of the correlation between user performance and subjective satisfaction. They found that there was a positive correlation as would be expected, i.e. that users prefer systems that enable them to perform better, where better can mean faster and with greater accuracy. Nielsen and Levy also found several cases where the reverse was true. For example, Grudin & MacLean (1984) found that some users preferred to use a data selection method that was slower than another method that they had used during training. MacLean et al. (1985) found that some users preferred to use a slower method of data entry (as long as it was not too much slower than the faster method).

One reason why the text tables presentation was given a lower subjective rating even though user performance was high, was that users form an initial impression of a user interface which can affect their perception of its usability. For example, in a study of information systems, Hiltz & Johnson (1990) found that if people perceive computers to be difficult to use before they use them, the same people were more likely to express dissatisfaction with the computers after four months of use. Kurosu & Kashimura (1995) found that the aesthetics of an interface were highly correlated with perceived ease of use; the more attractive an interface looked, the easier users thought it would be to use. Kurosu and Kashimura asked subjects to rate the attractiveness and perceived ease of use of 26 designs for an automated teller machine. Tractinsky (1997) thought that Kurosu and Kashimura's result was because the Japanese culture values aesthetics. Tractinsky repeated Kurosu and Kashimura's experiment in Israel, a country which he describes as having a more “action-oriented” culture. Tractinsky found that design aesthetics were even more closely related to perceived usability than in Japan. The text tables presentation could therefore have been given lower subjective ratings because it was not as visually appealing as the graphical presentations.

7. Conclusions

The results of this study helped to identify a new research question for further tables versus graphs studies, and recommendations for developers of tabular and graphical as well as static and re-orderable presentation styles.

7.1 A New Research Question for Tables versus Graphs Studies

There was no difference between the tabular and graphical presentations for speed or accuracy. The performance differences predicted by the literature for the tabular and graphical presentations were not observed. We suggest that this was because the data presented was closely matched to the data retrieval tasks. Matching data and data retrieval tasks might therefore take precedence over matching spatial and symbolic tasks with spatial and symbolic presentations. This poses a new research question for the visualisation community: If the data retrieval task is matched with an appropriate data representation, does the suitability of the data for performing the data retrieval task support or inhibit the performance of that task?

Experiments investigating this question would need to vary the data representation (symbolic / spatial), the data retrieval task (symbolic / spatial), and the suitability of the data for performing that task (closely matched to the task / poorly matched to the task).
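The three factors named above, each with two levels, define a 2 x 2 x 2 factorial design. A minimal sketch of enumerating the resulting conditions follows; the variable names and level labels are illustrative assumptions, not part of any planned experiment.

```python
from itertools import product

# Factor levels as described in the text; the labels are illustrative.
data_representations = ["symbolic", "spatial"]
retrieval_tasks = ["symbolic", "spatial"]
data_suitability = ["closely matched", "poorly matched"]

# Every combination of the three factors is one experimental condition.
conditions = [
    {"representation": rep, "task": task, "suitability": suit}
    for rep, task, suit in product(
        data_representations, retrieval_tasks, data_suitability
    )
]

for number, condition in enumerate(conditions, start=1):
    print(number, condition)
```

Crossing all three factors gives eight conditions, which makes it possible to separate the effect of data suitability from the effect of matching the task with the presentation.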

7.2 Recommendation for Developers of Tabular and Graphical Presentations Since there was no difference in performance between the users of the tabular and graphical presentations, it seems that search terms and document similarity information can be presented as effectively with tables as with graphics. This is an important result for developers of presentations of results. Each graphical presentation took significantly longer to implement than the tabular presentation. However, the evaluation participants gave higher subjective ratings to the graphical presentations. There is therefore a trade-off: users prefer graphical presentations but they can perform equally well with tabular presentations which can be implemented more quickly.

Developers therefore need to decide whether committing the resources to building graphical presentations is worthwhile. This decision must be balanced with the results of the research conducted by Kurosu & Kashimura (1995) and Tractinsky (1997) who found that interface aesthetics were highly correlated with perceived ease of use, and that of Hiltz & Johnson (1990) who found that perceived difficulty of use was more likely to lead to dissatisfaction after use. Therefore, although tabular and graphical presentations might provide equivalent performance, and tabular presentations are faster to implement, long term user satisfaction and ultimately user acceptance might be best served with graphical presentations. This needs to be tested with further experimentation.

7.3 Static and Re-orderable Presentations

A tabular or graphical presentation of the search term frequency and document similarity information should provide automatic reordering operations such as sorting.

Acknowledgements

The authors would like to thank the EPSRC who provided the Ph.D. studentship that funded this work, and the staff and students of Heriot-Watt University that took part in the evaluation.

References

Benbasat, I. & Schroeder, R. (1977), “An experimental investigation of some MIS variables”, MIS Quarterly 1(1), pp. 37-50.

Bertin, J. (1981), Graphics and Graphic Information Processing, Walter de Gruyter.

Boehm-Davis, D., Holt, R., Koll, M., Yastrop, G. & Peters, R. (1987), The effects of different data base formats on information retrieval, in “Proceedings of the Human Factors Society - 31st Annual Meeting”, pp. 983-987.

Carter, L. F. (1947), “An experiment on the design of tables and graphs used for presenting numerical data”, Journal of Applied Psychology 31, pp. 640-650.

Carmichael, J. W. & Sneath, P. N. A. (1969), “Taxonometric maps”, Syst. Zool. 18, pp. 402-415.

Cleveland, W. S. (1984), “Graphs in scientific publications”, The American Statistician 38, pp. 261-269.

Coll, R. A., Coll, J. H. & Thakur, G. (1994), “Graphs and tables: A four-factor experiment”, Communications of the ACM 37(3), pp. 77-86.

DeSanctis, G. (1984), “Computer graphics as decision aids: Directions for research”, Decision Science 15(4), pp. 463-487.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. & Harshman, R. (1990), “Indexing by latent semantic analysis”, Journal of the American Society for Information Science 41(6), pp. 391-407.

Dickson, G. W., DeSanctis, G. & McBride, D. J. (1986), “Understanding the effectiveness of computer graphics for decision support: A cumulative experimental approach”, Communications of the ACM 29(1), pp. 40-47.

Donoho, A. W., Donoho, D. L. & Grasko, M. (1988), “MACSPIN: Dynamic graphics on a desktop computer”, Journal of the American Statistical Association, 8(4), pp. 51-58.

Enns, J. T. & Rensink, R. A. (1990), “Sensitivity to three-dimensional orientation in visual search”, Psychological Science 1(5), pp. 323-326.

Feliciano, G. D., Powers, R. D. & Bryant, E. K. (1963), “The presentation of statistical information”, Audio and Visual Communication Review 13(11), pp. 32-39.

Fisherkeller, M., Friedman, J. H. & Tukey, J. W. (1988), PRIM-9: An interactive multidimensional data display and analysis system, in W. S. Cleveland & M. E. McGill, eds, “Dynamic Graphics for Statistics”, Wadsworth and Brooks Cole, pp. 91-109.

Grace, G. L. (1966), “Application of empirical methods to computer-based system design”, Journal of Applied Psychology 50, pp. 442-450.

Grudin, J. & MacLean, A. (1984), Adapting a psychophysical method to measure performance and preference tradeoffs in human-computer interaction, in “Proceedings of INTERACT '84”, pp. 737-741.

Healey, C. H., Booth, K. S. & Enns, J. T. (1996), “High-speed visual estimation using preattentive processing”, ACM Transactions on Computer-Human Interaction 3(2), pp. 107-135.

Hiltz, S. R. & Johnson, K. (1990), “User satisfaction with computer mediated communication systems”, Management Science 30(6), pp. 739-764.

Ives, B. (1982), “Graphical user interfaces”, MIS Quarterly 6(1), pp. 15-47.

Jarvenpaa, S., Dickson, G. W. & DeSanctis, G. (1985), “Methodological issues in experimental IS research: Experiences and recommendations”, MIS Quarterly, 9(2), pp. 141-156.

Kosslyn, S. M. (1989), “Understanding charts and graphs”, Applied Cognitive Psychology 3, pp. 185-225.

Kurosu, M. & Kashimura, K. (1995), Apparent usability vs. inherent usability, in “CHI '95 Conference Companion”.

Larkin, J. H. & Simon, H. A. (1988), “Why a diagram is (sometimes) worth ten thousand words”, Cognitive Science 11, pp. 65-99.

Legge, G. E., Gu, Y. & Luebaker, A. (1989), “Efficiency of graphical perception”, Perception and Psychophysics 46(4), pp. 365-374.

Ling, R. F. (1973), “A computer generated aid for cluster analysis”, Communications of the ACM 16, pp. 355-361.

Lohse, G. L. (1997), Models of graphical perception, in M. G. Helander, T. K. Landauer & P. V. Prabhu, eds, “Handbook of Human-Computer Interaction”, North-Holland, pp. 107-135.

Lucas, H. C. (1981), “An experimental investigation of the use of computer-based graphics in decision making”, Communications of the ACM 27(7), pp. 757-768.

MacDonald-Ross, M. (1977), “How numbers are shown”, Audio and Visual Communication Review 25(4), pp. 359-409.

MacLean, A., Barnard, P. J. & Wilson, M. D. (1985), Evaluating the human interface of a data entry system: User choice and performance measures yield different trade-off functions, in “People and Computers: Designing the User Interface”, Cambridge University Press.

Mahoney, D. P. (1996), “The art and science of medical visualization”, Computer Graphics World 19(7), pp. 25-30.

Marsh, S. (1992), “The interactive matrix chart”, SIGCHI Bulletin 24(4), pp. 32-38.

McCormick, B. H., DeFanti, T. A. & Browne, M. D. (1987), “Visualization in scientific computing”, Computer Graphics 21(6), pp. 1-14.

Meyer, J. (1997), “A new look at an old study on information display: Washburne (1927) reconsidered”, Human Factors 39(3), pp. 333-340.

Meyer, J., Gopher, D. & Levy, J. (1997), Discrimination between functions in tables and graphs, in “Proceedings of the Human Factors and Ergonomics Society 41st Annual Meeting”, pp. 1348-1351.

Morgan, J. (2000), Supporting Information Retrieval System Users by Making Suggestions and Visualising Results, Ph.D. Thesis, Heriot-Watt University, Edinburgh, UK.

Morgan, J. & Michaelson, G. (2000a), “The Design and Comparative Evaluation of Six Presentation Styles for Information Retrieval Search Results”, In Preparation.

Morgan, J. & Michaelson, G. (2000b), “Visualising Information Retrieval Search Results with Document Maps”, In Preparation.

Nielsen, J. & Levy, J. (1994), “Measuring usability: Preference vs. performance”, Communications of the ACM 37(4), pp. 67-75.

Neisser, U. (1963), “Decision time without reaction time: Experiments in visual scanning”, American Journal of Psychology 76, pp. 376-385.

Nugent, W. A. & Broyles, J. W. (1992), Assessment of graphics and text formats for system status displays, in “Proceedings of the Human Factors and Ergonomics Society 36th Annual Meeting”, pp. 1464-1468.

Peters, R. D., Yastrop, G. T. & Boehm-Davis, D. (1988), Predicting information retrieval performance, in “Proceedings of the Human Factors Society - 32nd Annual Meeting”, pp. 301-305.

Petre, M. & Green, T. R. G. (1993), “Learning to read graphics: Some evidence that 'seeing' an information display is an acquired skill”, Journal of Visual Languages and Computing 4, pp. 55-70.

Pinker, S. (1990), “A theory of graph comprehension”, in R. Freedle, ed., Artificial Intelligence and the Future of Testing, Lawrence Erlbaum.

Porter, S. (1994), “Engineering visualization”, Computer Graphics World 17(11), pp. 23-25.

Powers, M., Lashley, C., Sanchez, P. & Shneiderman, B. (1984), “An experimental comparison of tabular and graphic data presentation”, International Journal of Man-Machine Studies 20, pp. 545-566.


Remus, W. (1984), “An empirical investigation of the impact of graphical and tabular data representations on decision making”, Management Science 30(5), pp. 553-541.

Scaife, M. & Rogers, Y. (1996), “External cognition: how do graphical representations work?”, International Journal of Man-Machine Studies 45, pp. 185-213.

Spence, I. & Lewandowsky, S. (1990), “Displaying proportions and percentages”, Applied Cognitive Psychology 5, pp. 61-77.

Stuetzle, W. (1987), “Plot windows”, Journal of the American Statistical Association 82, pp. 466-475.

Tractinsky, N. (1997), Aesthetics and apparent usability: Empirically assessing cultural and methodological issues, in “Proceedings of CHI '97”.

Treisman, A. & Gormican, S. (1988), “Feature analysis in early vision: Evidence from search asymmetries”, Psychological Review 95, pp. 15-48.

Tufte, E. R. (1983), The Visual Display of Quantitative Information, Graphics Press.

Tukey, J. W. (1977), Exploratory Data Analysis, Addison Wesley.

Tullis, T. S. (1981), “An evaluation of alphanumeric, graphic and color displays”, Human Factors 23(5), pp. 541-550.

Vessey, I. (1991), “Cognitive fit: A theory-based analysis of the graphs versus tables literature”, Decision Sciences 22, pp. 219-240.

Wainer, H. & Reiser, M. (1976), “Assessing the efficacy of visual displays”, Proceedings of the American Statistical Association, pp. 89-92.

Washburne, J. N. (1927), “An experimental study of various graphic, tabular and textual methods of presenting quantitative material”, Journal of Educational Psychology, 18, pp. 361-376,465-476.

Wright, W. (1997), “Business visualization applications”, IEEE Computer Graphics and Applications 17(4), pp. 66-70.


Figure 1. A linear list of the details of three documents.

Figure 2. A text table presentation of the numerical similarity between three documents.


Figure 3. A bar chart presentation of the number of documents published between 1996 and 1998.


Figure 4. A matrix chart presentation of the numerical similarity between three documents.

Figure 5. A 2D scatter plot presentation of the similarity between twelve documents; the closer the documents, the greater the similarity.

Figure 6. A document map presentation of the similarity between nine documents; the closer a document is to the selected document, the greater the similarity to it.


Figure 7. Interaction between the type of presentation (tabular and graphical) and the type of question (direct representation, visual advantage and document similarity) for time taken.

[Chart: Interaction between Presentation Type (Tabular and Graphical) and Question Type (Direct Representation, Visual Advantage and Document Similarity) for Score Achieved. x-axis: Question Type; y-axis: Score (%); series: Tabular, Graphical.]

Figure 8. Interaction between the type of presentation (tabular and graphical) and the type of question (direct representation, visual advantage and document similarity) for scores achieved.

[Chart: Interaction between Presentation Type (Tabular and Graphical) and Question Type (Direct Representation, Visual Advantage and Document Similarity) for Time Taken. x-axis: Question Type; y-axis: Time (Seconds); series: Tabular, Graphical.]

[Chart: Interaction between Presentation Type (Static and Re-orderable) and Question Type (Direct Representation, Visual Advantage and Document Similarity) for Time Taken. x-axis: Question Type; y-axis: Time (Seconds); series: Static, Reorderable.]

Figure 9. Interaction between the type of presentation (static and re-orderable) and the type of question (direct representation, visual advantage and document similarity) for time taken.

[Chart: Interaction between Presentation Type (Static and Re-orderable) and Question Type (Direct Representation, Visual Advantage and Document Similarity) for Score Achieved. x-axis: Question Type; y-axis: Score (%); series: Static, Reorderable.]

Figure 10. Interaction between the type of presentation (static and re-orderable) and the type of question (direct representation, visual advantage and document similarity) for score achieved.


                                   Presentation Style
Linear List   Text Tables   Bar Charts   Matrix Charts   Scatter Plot   Document Map
10 People     10 People     10 People    10 People       10 People      10 People

Table 1. The experiment design.

Task Group              Task Type   Information Required to Complete the Task
Direct Representation   Symbolic    Search Term Frequency
Visual Advantage        Spatial     Search Term Frequency
Document Similarity     Spatial     Document Similarity

Table 2. The types of evaluation tasks and the information required to answer them.

                 Direct Representation    Visual Advantage      Document Similarity
                 Mean      S.D.           Mean      S.D.        Mean      S.D.
Linear List      61.50     14.76          380.78    117.40      -         -
Text Tables      41.65      9.35           72.70     22.90      188.05     44.30
Bar Charts       52.43     16.28           66.75     25.37      -         -
Matrix Charts    81.05     28.97          105.70     29.85      163.65     78.47
Scatter Plot     46.73     11.20          166.65     38.75      245.50    111.16
Document Map     40.73      8.94          204.65     30.93       56.00     13.84

Table 3. Means and standard deviations (S.D.) of the time (seconds) taken by each presentation group. The Linear List and Bar Charts presentation groups did not provide document similarity information.


                 Direct Representation    Visual Advantage      Document Similarity
                 Mean      S.D.           Mean      S.D.        Mean      S.D.
Linear List       97.15     6.23           60.83    22.51       -         -
Text Tables       99.65     1.11          100.00     0.00       93.00      8.23
Bar Charts       100.00     0.00           92.50    12.08       -         -
Matrix Charts     96.25    11.96           91.25    18.48       90.00     12.91
Scatter Plot     100.00     0.00           95.00    15.81       53.00     14.57
Document Map      98.75     3.95           95.83     9.00       99.00      3.16

Table 4. Means and standard deviations (S.D.) of the percentage scores of each presentation group: 100% is the maximum score. The Linear List and Bar Charts presentation groups did not provide document similarity information.

                        Presentation Type
Question Type           Tabular    Graphical
Direct Representation    41.65      55.23
Visual Advantage         72.70     135.94
Document Similarity     188.05     155.05

Table 5. Mean time (seconds) taken by the tabular and graphical presentation groups broken down by question type.
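The Graphical column of Table 5 appears to be the unweighted mean of the per-group means in Table 3 for the four graphical presentations (bar charts, matrix charts, scatter plot and document map). This is an observation from the tabulated figures rather than a stated procedure. The sketch below recomputes the column under that assumption; the names `TABLE_3_TIMES` and `graphical_means` are illustrative.

```python
from statistics import mean

QUESTION_TYPES = ("Direct Representation", "Visual Advantage", "Document Similarity")

# Mean times (seconds) copied from Table 3. None marks a group that did not
# provide document similarity information.
TABLE_3_TIMES = {
    "Bar Charts": (52.43, 66.75, None),
    "Matrix Charts": (81.05, 105.70, 163.65),
    "Scatter Plot": (46.73, 166.65, 245.50),
    "Document Map": (40.73, 204.65, 56.00),
}


def graphical_means(table):
    """Average the group means per question type, skipping missing cells."""
    result = {}
    for index, question_type in enumerate(QUESTION_TYPES):
        values = [row[index] for row in table.values() if row[index] is not None]
        result[question_type] = mean(values)
    return result


graphical = graphical_means(TABLE_3_TIMES)
for question_type, value in graphical.items():
    print(f"{question_type}: {value:.2f}")
```

The recomputed values agree with Table 5 to within rounding of the published two-decimal figures.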

                        Presentation Type
Question Type           Tabular    Graphical
Direct Representation    99.65      98.75
Visual Advantage        100.00      93.65
Document Similarity      93.00      80.67

Table 6. Mean percentage scores of the tabular and graphical presentation groups broken down by question type.


Presentation Type
Static     Re-orderable
157.81     97.86*

*significantly faster at p<0.001

Table 7. Mean time (seconds) taken by the static and re-orderable presentation groups.

Presentation Type
Static     Re-orderable
85.74      95.34*

*significantly more accurate at p<0.002

Table 8. Mean percentage scores of the static and re-orderable presentation groups.

                        Presentation Type
Question Type           Static     Re-orderable
Direct Representation    49.96      58.07
Visual Advantage        206.70     125.70
Document Similarity     216.78     109.83*

*significantly faster at p<0.005

Table 9. Mean time (seconds) taken by the static and re-orderable presentation groups broken down by question type.


                        Presentation Type
Question Type           Static     Re-orderable
Direct Representation   98.93      98.33
Visual Advantage        85.28      93.19
Document Similarity     73.00      94.50*

*significantly more accurate at p<0.05

Table 10. Mean percentage scores of the static and re-orderable presentation groups broken down by question type.

                        Presentation Type
Question Type           Time                     Accuracy
Direct Representation   Static = Re-orderable    Static = Re-orderable
Visual Advantage        Static = Re-orderable    Static = Re-orderable
Document Similarity     Re-orderable < Static    Re-orderable > Static

= no significant difference
< significantly faster
> significantly more accurate

Table 11. Summary of the significant differences between the static and re-orderable presentation groups.

                 Ease of     Ease of    Suitability      Regular
                 Learning    Use        for Filtering    Use
Linear List      100.00      82.50      60.00            72.50
Text Tables       95.00      92.50      77.50            82.50
Bar Charts        97.50      92.50      85.00            80.00
Matrix Charts     87.50      77.50      85.00            80.00
Scatter Plot      90.00      80.00      80.00            75.00
Document Map      97.50      87.50      85.00            87.50

Table 12. Percentage agreement of all participants with four subjective statements for each presentation type: 0% is complete disagreement; 100% is complete agreement.
