
Cross-disciplinary research into improving Internet-based research methodology applied to the interpretation of natural language quantifiers

Maria Buckley

MSc. Computational Linguistics, August 2003

Supervisor: Dr. Carl Vogel

Declaration

I hereby declare that this thesis is entirely my own work (all joint work and debts to the literature are duly acknowledged in the text) and that it has not been submitted as an exercise for a degree at any other university. Furthermore, I hereby give permission to the Library of Trinity College, University of Dublin, to loan or copy the thesis upon request (this permission covers only single copies made for study purposes, subject to normal conditions of acknowledgement).

September 11, 2004

Maria Buckley


Acknowledgements

Firstly, I would like to thank my supervisor, Dr. Carl Vogel, for all of his help, support, guidance, knowledge and encouragement over the last six years.

I would also like to say a big thank you to my parents, Eva & Declan, my sister Catriona and my brother John for everything they have done for me always. I really appreciate it all.

Thanks to everyone in the Computational Linguistics Group and to all of my friends from college and home.


Contents

1 Introduction

1.1 Introduction
1.2 Motivations
1.3 Structure of the Thesis
1.4 Summary

2 Conducting Experiments on the Web

2.1 Introduction
2.2 Methods of carrying out experiments
2.2.1 Face–to–Face Experiments
2.2.2 Telephone Interviews
2.2.3 Mail Experiments
2.2.4 Email Surveys
2.3 Web–based Experiments
2.3.1 Advantages of using the Internet for experiments
2.3.2 Disadvantages to using the Internet for research
2.4 Sampling on the Internet
2.5 Sampling on the Internet and Response Rates
2.6 Bias
2.7 Replicating Experiments
2.8 Demand Characteristics of Experiments
2.8.1 The good subject
2.8.2 Volunteers
2.8.3 Undergraduate Students
2.8.4 Rewards for participation
2.8.5 The Experiment setting
2.9 Conclusion

3 Introduction to the System

3.1 Introduction
3.2 Overview of the System
3.3 Architecture of the System
3.4 System Users
3.4.1 Experiment Manager
3.4.2 Experimenter
3.4.3 Subjects
3.5 Conclusion

4 Creating Experiments

4.1 Introduction
4.2 Experiment Details
4.3 Designing the Experiment
4.3.1 Experiment Details and Instructions
4.3.2 Slides in the Experiment
4.4 Implementation of Experiment Creation
4.4.1 The Vector
4.4.2 Changes to Labelling
4.5 Overview of main methods in this class
4.6 Conclusion

5 Presenting Experiments

5.1 Introduction
5.2 Viewing an Experiment
5.3 Implementation
5.3.1 Parsing the comment slide
5.3.2 Parsing the final slide
5.3.3 Parsing the main file
5.3.4 Recording the subjects’ answers
5.4 Conclusion

6 Analyzing Experiments

6.1 Introduction
6.2 Analyze Results
6.3 Implementation
6.4 Opening the result files
6.5 Parsing the results file
6.5.1 Results by Subject in original program
6.5.2 Results by subject in new program
6.5.3 Results by Subject
6.5.4 Results by Slide
6.5.5 Results by Question
6.5.6 How often a particular answer occurred
6.6 Conclusion

7 Research into Quantifiers

7.1 Introduction
7.2 Quantifiers and Frequency Expressions – Introduction
7.3 Scales
7.3.1 Introduction to Scales
7.3.2 Research in the area of quantifiers and scales
7.3.3 Conclusion
7.4 Context
7.4.1 Introduction to research on quantifiers and context
7.4.2 Research into quantifiers and context
7.4.3 Conclusion on quantifiers and context
7.5 Focus
7.5.1 Introduction to focus
7.5.2 Research into quantifiers and focus
7.5.3 Conclusion to quantifiers and focus
7.6 Conclusion

8 Replications of Experiments

8.1 Introduction
8.2 Motivations for replicating experiments
8.3 First Replication
8.3.1 Original Experiment
8.3.2 Replication
8.3.3 Results and Discussion
8.3.4 Future work
8.4 Second Replication
8.4.1 Future Work
8.5 Discussion
8.6 Conclusion

9 Conclusion

9.1 Introduction
9.2 Summary of work
9.3 Future Work

Bibliography

A Examples

List of Figures

2.1 Example of posting sent to a music newsgroup

3.1 System Architecture
3.2 Screen shot of the options available on the experimenter’s home-page

4.1 Screen shot of where the experimenter inputs the experiment name
4.2 Screen shot entering details about the experiment
4.3 Screen shot of entering the instructions
4.4 Screen shot of the options available when creating an experiment
4.5 Screen shot of the choices on this page
4.6 Example of resulting system file from Figure 4.5
4.7 Example of resulting from typesetting of file in Figure 4.6
4.8 Depiction of the Created Experiment Vector
4.9 How multiple choice questions are saved

5.1 Screen shot of subject entering username and password for an experiment
5.2 Screen shot of subject participating in an experiment
5.3 Example of a Comment Slide
5.4 Determining whether each slide marks the end of a section or not

6.1 Screen shot of how many have participated in an experiment
6.2 Screen shot of the options available to the experimenter
6.3 Example Participant Data File
6.4 Example Subject File
6.5 Results by Subject
6.6 Screen shot of results by subject
6.7 Results by Slide
6.8 Screen shot of results by slide
6.9 Results by Question
6.10 Screen shot of results by question

7.1 List of expressions used in Hakel’s experiment
7.2 9 point scale for expressions of frequency

8.1 List of expressions used in Hakel’s experiment

List of Tables

8.1 Results from our experiment
8.2 Results from our experiment
8.3 Bass Replications
8.4 Bass Replications

Abstract

This thesis deals with web–based experimentation. A web–based experimentation system developed in the Computational Linguistics Group, Trinity College, Dublin is described. The description covers how to use it, how it is implemented, what improvements have been added to it and what facilities could be added in the future. Topics such as response rates, demand characteristics and sampling techniques are all raised in the context of web–based experimentation. Research in the area of quantification is reported. Replications of experiments in this area were carried out, and their results are discussed.

Chapter 1

Introduction

1.1 Introduction

The main topic of this thesis is web–based experimentation. The advantages, disadvantages and issues related to carrying out experiments on the web are dealt with. Related issues such as demand characteristics, bias and sampling are raised. My own experiences with these issues are discussed, and I give my own suggestions for the best way to sample on the web.

A major part of my Masters involved the development of a web–based experimentation system. One contribution of this research is the repair of many errors in an inherited system. Another is the addition of facilities to the different parts of the system to improve its functionality. Alongside developing the system, I did some research into the area of quantification and carried out replications of two experiments in this area. Replicating experimental designs with alternative sampling techniques is an important part of scientific methodology. In some cases, I added further verification methods to ensure the reliability of the claimed results.

This introductory chapter outlines the motivations behind the areas worked on. It also gives a brief outline of the structure of this thesis, indicating where the issues just named are discussed in depth.

1.2 Motivations

My final year project for my undergraduate degree involved carrying out face–to–face experiments in the human sciences. I thoroughly enjoyed the experience; however, it was extremely time–consuming, and transcribing the data could be quite tedious. Previous students in the Computational Linguistics Group, Trinity College, Dublin had worked on a web–based experimentation system. When I was introduced to it, I could see the enormous benefit and value of having such a system available. I decided that the development of this system would form a major part of my Masters. However, I first spent time learning Java, as I had only completed an introductory course four years previously. I also wanted to keep up my linguistic knowledge. I therefore chose to research the area of quantification and to carry out experiments in this area using the web–based experimentation system.

1.3 Structure of the Thesis

Chapter 1 – This chapter acts as an introduction to the thesis. It discusses the motivations behind the thesis as well as providing a synopsis of the remaining chapters.

Chapter 2 – This chapter outlines the other avenues open for carrying out experiments. It then discusses the advantages and pitfalls of carrying out experiments on the web. Demand characteristics can play a very important role in experiments and must be acknowledged and controlled for. The various ways in which demand characteristics can bias results are described, and some brief suggestions are made as to how to avoid them. Sampling on the web is quite challenging, and the techniques involved are outlined in this chapter. I also discuss my own experiences with the difficulties of web–based sampling and suggest ways to avoid some of these difficulties.

Chapter 3 – The chapter first introduces the overall idea of the web–based experimentation system worked on as part of this thesis (Hourihane, 2002; Ryan, 2001; McGowan, 1999). Secondly, the architecture of the system is presented in some detail. The system is designed for three levels of user functionality: system manager, experimenter and experimentee. Finally, the chapter presents in detail these different types of users and the options available to each of them.

Chapter 4 – This chapter concerns itself with how experiments can be created. It details how an experimenter would create an experiment and the various options available during this process. The functionality described is complete in the sense of describing the sorts of items and presentation constraints an experimenter might adopt. Obviously, this does not cover the full range of possible experiments across even a single subject area, as it is focused on experiments presented so far within the described system. Moreover, the system is undoubtedly also useful for experiments and designs that were not originally anticipated. The remainder of this chapter shows how this creation facility is implemented.

Chapter 5 – This chapter deals with how experiments are presented to both subjects and experimenters. Many screen shots are provided to give a clear picture of this process. As one might expect, experimenters’ views of an experiment are distinct from those of participants. Again, the implementation of this aspect of the system is given. In particular, the chapter details how the files are parsed and the improvements made to this parsing.

Chapter 6 – This chapter describes how the results of experiments can be analysed within this system and outlines the many options available to the experimenter. This part of the system was greatly overhauled from the original program that I inherited (Hourihane, 2002), and these improvements are outlined. This dissertation research revises the data structures so that experimenters can view their results in consolidated fashion. Building on these structures, one area for future work is incorporating more sophisticated statistical analysis of the data.


Chapter 7 – Research into quantifiers is reported in this chapter. The other stream of my research was the interpretation of natural language quantifiers, with the aim of replicating and extending past research using web–based sampling. Natural language quantifiers are vague and imprecise, and yet they are frequently used in conversation, textbooks, etc. It is difficult to find a paragraph of English that lacks quantifying expressions of some sort or other. The research reported in this chapter is divided into three main sections: quantifiers and scales, quantifiers and context, and quantifiers and their focus effects. Experiments carried out in these areas and their results are discussed. I also suggest possibilities for future experiments.

Chapter 8 – Two experiments were carried out using the web–based experimentation system. They were replications of experiments described in the previous chapter. Replication is important to science in establishing the reliability of results. These replications are described, and correlations between the results are discussed. Both replications had very high correlations with the original experiments, which indicates that the results are reliable. Additional tweaks were made in some cases, and the correlations there also confirmed reliability.

Chapter 9 – This chapter provides a brief synopsis of the whole thesis. The main results are summarised. Suggestions are also made for future work – e.g. extensions to the system as well as experiments which could be carried out.

1.4 Summary

This chapter has outlined the main motivations behind this thesis, briefly mentioned the main results of the different areas, and provided a textual overview of the thesis as a whole. To briefly recapitulate, the research reported in this dissertation makes three main contributions:

• substantial modifications and additions to a system for running web–based experiments;

• analysis of ‘best practice’ in Internet sampling methodology;

• analysis and replication of empirical research into human interpretation of English quantifying expressions.

Chapter 2

Conducting Experiments on the Web

2.1 Introduction

Researchers in the human sciences often conduct experiments in order to test their hypotheses and to further investigate established theories. When carrying out experiments, many options are available to them: face–to–face experiments, telephone interviews, mail questionnaires, email surveys and, finally, web–based experiments. It is this final option which is advocated and used in this thesis. However, I will first briefly outline the advantages and disadvantages of the other platforms, and then discuss in detail the merits of using web–based experiments. I also discuss issues that must be considered when carrying out experiments: sampling methods, demand characteristics, response rates and bias. Some of the material here is derived from Buckley and Vogel (2003a).¹

2.2 Methods of carrying out experiments

This section outlines the other options available to experimenters when carrying out experiments involving human participants. Although it concentrates on the human sciences, this section, like the thesis as a whole, does not address medical science at all. However, the tools produced and described could have applications in medical experimental scenarios.

2.2.1 Face–to–Face Experiments

Face–to–face sessions are a frequently used format for experiments. In psychology, subjects tend to be undergraduate university students (§2.8 discusses this tendency for undergraduates to be used in psychology experiments), and the experiments with them are usually face–to–face ones. There are many advantages to this method.

¹ With permission of the co–author, the supervisor of this dissertation, in some places the text overlaps with that of Buckley and Vogel (2003a), and in other places it has been adapted.


Firstly, subjects interact with the experimenter. If there is something they do not understand, they can ask for clarification on an experimental question. Equally, experimenters can ask the subjects for an explanation of their answers or for feedback on the experimental methodology. Secondly, subjects are less likely to back out of participating than with some of the other methods, as the experimenter is present with them. It seems that social pressure from co–presence is powerful.

However, this method also has many shortcomings:

• cost

• time involved

• bias

Face–to–face experiments can be quite costly. The venue must be rented, travel to the venue must be paid for both the subjects and the experimenter, and any equipment needed must be bought or rented.

There is also the difficulty of finding a time that suits both the subject and the experimenter.

This type of experiment also leaves itself open to experimenter bias, which is a very serious issue and should be avoided (this topic is discussed further in §2.6). The system described in this thesis avoids experimenter bias in this sense. Subjects never come face–to–face with the experimenter, and so are not influenced by the experimenter’s age, sex, colour etc. Equally, the experimenter cannot influence the subject’s judgements through facial expressions, body language or tone of voice, which are issues with face–to–face experiments. Another possibility is for individuals to take part in an experiment which involves interacting with a computer, e.g. marking their judgements of sentences on lines on a computer screen. Such an experiment would be completed under the observation of an experimenter, and so it does not eliminate the biases mentioned above.

2.2.2 Telephone Interviews

Nearly all households² have telephones, and so there is a large population available from which to obtain a telephone sample. It is relatively quick and cheap to contact subjects in this manner. Phone numbers can be generated at random, and results are usually available shortly after the experiment. However, a problem with this form of research is the unwillingness of subjects: some may feel that getting a call from a stranger is inappropriate. Also, although nearly all homes have telephones, it may prove challenging to contact subjects during working hours. Random generation of numbers carries some risk of interviewing the same individual twice. Arbitrary selection of numbers from printed directories is also possible. Physical materials cannot easily be used in this form of research without co–ordinating through multiple contacts, e.g. via post before the telephone contact. For example, it would not be possible to show images to subjects over the telephone.

² Except in places described as the third world.
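Random generation of numbers can at least avoid dialling the same number twice by drawing without replacement. Java is used for this sketch because the experimentation system described in this thesis relies on Java. The sketch is purely illustrative: the class name `RandomDialler` and the seven-digit local number format are hypothetical assumptions, not part of any system described here.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of random-digit sampling for a telephone survey.
// The seven-digit local number format is an illustrative assumption.
final class RandomDialler {
    private static final int NUMBER_SPACE = 10_000_000; // 0000000 to 9999999

    /** Draws {@code count} distinct numbers, so that no number is dialled twice. */
    static Set<String> draw(int count, long seed) {
        if (count < 0 || count > NUMBER_SPACE) {
            throw new IllegalArgumentException("count must be between 0 and " + NUMBER_SPACE);
        }
        Random rng = new Random(seed);
        // LinkedHashSet keeps the order of drawing and silently rejects repeats.
        Set<String> numbers = new LinkedHashSet<>();
        while (numbers.size() < count) {
            numbers.add(String.format("%07d", rng.nextInt(NUMBER_SPACE)));
        }
        return numbers;
    }
}
```

Removing duplicate numbers reduces the risk but does not eliminate it. The same individual may still be reached on two different numbers, for example at home and at work.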


2.2.3 Mail Experiments

Questionnaires are posted out to potential subjects, who are asked to return them by mail. There are no venue costs involved, so this method is inexpensive, with only photocopying and postal costs. Also, subjects can complete the experiment at a time suitable to them. However, this convenience is also a drawback of the method. Subjects may take quite a long time to participate or may forget to take part altogether, leading to non–response bias (§2.5 discusses response rates).

2.2.4 Email Surveys

Many people now use email on a daily basis, so email surveys are proving quite popular. They are quick and cheap to administer. However, people may not respond. The experimental design is also quite limited, as the experiments can really only take the form of simple questionnaires. This limits the number of studies which can use this form of research. §2.5 gives an example of the calls for participation that were sent to newsgroups and email discussion groups.

2.3 Web–based Experiments

With the advent of the World Wide Web, there is now a new platform from which to conduct experiments. More than 100 million people from all over the world use the Internet on a frequent basis.³ There are many issues to be considered when using the Internet for research.

2.3.1 Advantages of using the Internet for experiments

As mentioned in §2.2.1, face–to–face experiments have many shortcomings, and web–based experiments overcome many of them. Firstly, web–based experiments are, in principle, extremely cheap to administer. The only start–up costs are a computer and Internet Service Provider charges. A computer is not always necessary to buy if the experimenter has academic access to one, and the ISP charges are in many cases nil.

Secondly, once the experimenter has created an experiment, it is available on the Internet. The experimenter does not need to be online when the subjects are participating, at least in the experiments I discuss. Admittedly, some online designs may lack advantages that are potentially available, or actually available, in the system described here. Subjects do not have to participate at a particular time; they can take part at a time suitable to them. This may actually lead to more people being willing to take part. (Attracting subjects and the response rate issue are discussed in §2.5.)

³ This is based on the Internet Domain Survey count of January 2003 (http://www.isc.org/ds/WWW-200301/index.html) of 171,638,297 hosts. This does not equate to a count of people online, though, as many hosts are multi-user and many users connect through multiple hosts. Still, the conservative estimate of ‘daily’ users seems reasonable on this basis, if not verifiable. The website was verified in August 2003.

Subjects do not need to travel to a particular location to take part; they may participate from their home, work etc. Location is simply not a factor. It is equally convenient for the experimenter, who does not need to travel in order to recruit subjects. Such travel can be time–consuming: some market researchers travel to supermarkets to obtain subjects, and psychology researchers to universities.

Data collection in traditional face–to–face experiments can be both time–consuming and error–prone; for example, experimenters could make typing mistakes when entering the data into a computer. Web–based experiments automate this process. The resulting data is available quickly, and human transcription errors are eliminated.

Web–based experiments are also very flexible. They allow experimenters to use many different types of questions and formats, some of which may not be achievable in an email or telephone survey. Graphical and audio materials could potentially be used, although as yet they are not available in the general experiment server described here.⁴ Equally, using so many different types of questions in face–to–face experiments could prove difficult. Web–based experiments also allow randomisation, which is a very useful facility (the merits of randomisation are outlined further in §2.6). Randomisation can be achieved with the click of a button, which is far more straightforward than implementing it with other methods.
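Randomising presentation order typically amounts to shuffling the list of slides. The sketch below is a hypothetical illustration, not the system’s actual code. The class `SlideOrder`, the method `randomise` and the idea of seeding the shuffle with a per-subject number are all assumptions. Seeding makes each subject’s order reproducible, which can help when results are later analysed by slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: names are illustrative and not taken from the
// experimentation system described in this thesis.
final class SlideOrder {

    /**
     * Returns a new list holding the same slides in a random order.
     * The input list is not modified, so the experimenter's original
     * ordering remains available for analysis.
     */
    static <T> List<T> randomise(List<T> slides, long seed) {
        List<T> order = new ArrayList<>(slides);      // defensive copy
        Collections.shuffle(order, new Random(seed)); // Fisher-Yates shuffle
        return order;
    }
}
```

For example, `SlideOrder.randomise(slides, subjectNumber)` would give each subject a different but reproducible order. Here `subjectNumber` is a hypothetical identifier, such as an index assigned at login.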

Anonymity is possible with web–based experiments. Subjects are not required to enter their name, and since the experiment is conducted online, the subjects and experimenter have no personal contact. Subjects may therefore be more likely to answer honestly if they think that no one will know what they answered. If they feel that their answers will be associated with them, they may be more likely to give what they believe is the correct answer. Clearly, this would bias the results and should be avoided (§2.8.1 deals with this demand characteristic issue of subjects trying to be good subjects).

However, perhaps the most advantageous aspect of using the Internet for administering experiments is sampling. As noted in §2.3, more than 100 million people use the Internet on a regular basis, so there is a potentially very large subject pool. Birnbaum (2000) reports receiving high volumes of data in a short space of time. This is extremely useful, as researchers often believe that more subjects means greater generalizability to the population as a whole. The Internet also provides a means of obtaining samples that would otherwise not be possible. One such population might be stay–at–home mothers. Another case might be a researcher in China seeking native German speakers. The Internet crosses cultural and language barriers, allowing subjects from all over the world to be targeted. Chapter 8 offers qualifications of this potential, derived from the experiments discussed in Chapter 7.

⁴ Increased modality also restricts the participant pool, in that not all participants are able to view graphics or hear audio files. Indeed, the current system probably excludes the visually impaired from participating in experiments it administers. Accessible browser interaction has not yet been explored in the context of this research.


2.3.2 Disadvantages to using the Internet for research

Despite these many advantages of using the Internet, there are also many issues which need to be addressed. These were discussed in Buckley and Vogel (2003a).

Mann and Stewart (2000) discuss the issue of computer literacy. The experimenters themselves, as well as the participating subjects, need to be somewhat computer literate. However, most subjects will have the necessary skills. Virtually all subjects who come across the call for participation will be using email or the Internet. They will therefore be capable of using scrollbars, entering text, clicking check boxes and pressing buttons, which are the skills usually needed to participate in a web–based experiment. The system described and used for experiments in this thesis requires subjects and experimenters to have only a limited amount of technical knowledge.

Computer software can often pose difficulties when conducting web–based experiments, because participants will have different software on their computers; for example, some subjects might use Internet Explorer and others Netscape. When carrying out experiments on the web, one should therefore ensure that such differences between web browsers do not affect participation, and testing on a variety of browsers is advisable. One factor that could nonetheless exclude some subjects is whether Java is enabled on their computer. Some subjects disable Java because they do not wish to have certain Java applications running on their computer. Such subjects would be excluded from the sample of any experiment run on the web–based experimentation system described in Chapter 3.

Experiments carried out on the web have problems concerning response times. Response times are very important to researchers when analysing results, but they can be difficult to record accurately. With this system, it is difficult to know how much of the computer’s processing time is being devoted to a particular user. As noted in Buckley and Vogel (2003a), this was not historically an issue with stand–alone computer methods, as each user had a dedicated computer workstation. However, advances in computer hardware and software now enable multitasking on PC architecture. As a result, it is increasingly unclear whether a software package for experiment presentation owns the system clock for the purpose of calculating precise reaction times. Networked timings amplify the uncertainty of absolute timings. These issues constitute a major open problem for Internet–based laboratories.

Multiple participation is another issue with web–based experiments. Even if an experiment is controlled with a password, it is possible that subjects could participate more than once. However, if experimenters check IP addresses and email addresses, multiple participation can be controlled.
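The check on IP and email addresses mentioned above might be sketched in Java as follows. This is a minimal sketch only, and the class and method names are hypothetical rather than taken from the system. The system described in Chapter 3 logs IP addresses itself, while individual email addresses are managed by the experimenter (see §2.5), so combining both keys in one class is purely illustrative. Because several people may share one IP address (for example, in a household or behind a proxy), a match is better treated as a flag for the experimenter to review than as proof of repeat participation.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: flags a likely repeat participant if either the IP address
// or the (normalised) email address has already been seen for this experiment.
class ParticipationRegistry {
    private final Set<String> seenIps = new HashSet<>();
    private final Set<String> seenEmails = new HashSet<>();

    /** Records the pair and returns true only if neither key has been seen before. */
    boolean register(String ipAddress, String email) {
        String normalisedEmail = email.trim().toLowerCase(Locale.ROOT);
        if (seenIps.contains(ipAddress) || seenEmails.contains(normalisedEmail)) {
            return false; // possible repeat: refer to the experimenter for review
        }
        seenIps.add(ipAddress);
        seenEmails.add(normalisedEmail);
        return true;
    }

    public static void main(String[] args) {
        ParticipationRegistry registry = new ParticipationRegistry();
        System.out.println(registry.register("192.0.2.10", "a@example.org"));     // true
        System.out.println(registry.register("192.0.2.10", "b@example.org"));     // false: same IP
        System.out.println(registry.register("198.51.100.7", "A@Example.org ")); // false: same email
    }
}
```

Normalising case and whitespace before comparison is a design choice. Without it, trivially different spellings of the same address would slip through.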

If subjects are participating from their own homes, they are more likely to drop out than if they are involved in a face–to–face experiment. They do not have to answer to anyone, so drop–out rates are an issue. My own experiments, discussed in Chapter 8, had an average participation rate of 50% (59% for one experiment and 41% for the other). Response rates are discussed in §2.5. It is not clear how response rates should be measured in this setting. Nor is it clear how relevant they are, provided that participant bias, and the representativeness of samples with respect to the full population, are well controlled.


Equally, with subjects participating on their own, there is no interaction with the experimenter. This is usually advantageous, as it reduces experimenter bias (see §2.6 for a further discussion of experimenter bias). However, it becomes a disadvantage if subjects do not understand some aspect of the experiment and need guidance. This can be overcome by giving all subjects a contact email address through which they can reach the experimenter for further clarification. However, in the age of spam email, researchers need to take care to set up temporary email addresses, separate from their normal addresses, to avoid being inundated over time by unwanted mail. This relates to the issues about soliciting participation discussed in §2.5.

Non–response bias is difficult to account for in online experimentation, as it is not easy to determine how many people read a message on a newsgroup compared to how many responded. However, with the methodology that we used, it is possible to at least determine how many people responded to the announcement and how many actually participated. The responses to the experiments carried out for this thesis are reported in §2.5.

Although many extraneous variables are controlled for in web–based experiments, there are situational variables that are not, e.g. the temperature, lighting and background noise of the room the subject is in. Equally, we must trust that any pre–experiment questionnaire (about demographics, handedness, visual acuity, etc.) is completed with the same veracity as the experimental items themselves, and by default the experimenter should be trusting. More generally, one has to trust that participants have the attributes they claim to have. However, I believe that these variables are of less importance than the variables which are controlled for in web–based experiments.

2.4 Sampling on the Internet

Some researchers do not consider using the Internet for research, as they believe that Internet samples are demographically skewed and biased. Hewson, Yule, Laurent, and Vogel (2003) point out that the Internet is often viewed as being dominated by educated and employed males. However, they argue against this view and support the claim that “the Internet–user population represents a vast and diverse section of the general population that is rapidly moving beyond the select group of technologically proficient professionals” (Hewson et al., 2003, p. 26). Hewson and Charlton (2003) also carried out an experiment with an Internet and a non–Internet sample, and found that the Internet sample was the more diverse. The Internet is used by an ever increasing number of people in employment, academic and recreational settings, which makes Internet samples increasingly representative of the general population. Hewson et al. (2003) report some studies which support this claim. For example, Smith and Leigh (1997) found that an Internet and a non–Internet sample were demographically similar. Conclusions from this study even suggested that the Internet sample was more representative of the general population than samples drawn using traditional methods. Hewson et al. also report a study by Krantz, Ballard, and Scher (1997) that found the Internet sample to be more diverse in race and location than the non–Internet sample, as well as having a wider age range.

Hewson et al. (2003) also make two further claims. Firstly, many experiments do not actually require samples that are representative of the general population. Secondly, many traditional experiments have used homogeneous samples. The latter point is borne out in this thesis, where nearly all of the studies reported in Chapter 7 used university graduates as their sample.

2.5 Sampling on the Internet and Response Rates

Although there are, as mentioned in §2.3.1, many advantages to recruiting a sample on the web, doing so can actually prove rather challenging. There are various options open to the web–researcher. One could post an advertisement for an experiment on web pages, and people who wish to take part can then use the link to access the experiment. Similarly, notices about the experiment could be posted in Internet chat rooms. Lists of email addresses can be bought, and the researcher could email potential subjects. However, as noted in Hewson et al. (2003), these methods raise many issues with regard to non–response bias. For example, it may not be possible to determine how many people are using a chat room at the time, or how many actually read the notice.

For the experiments carried out for this thesis, I followed the guidelines in Hewson et al. (2003), who suggest that probabilistic sampling should be used for research in the human sciences. Announcements can be sent to a randomly selected list of newsgroups, and this is what I did.

An example of the posting which was sent to newsgroups and email discussion groups is shown in Figure 2.1.⁵ (See Appendix A for a sample of the responses generated in the newsgroups as a result of this call for participation.)

Whether discussion lists or newsgroups are used, as can be seen in Figure 2.1, we adopt the practice of signalling the situation with an informative subject line, e.g. “Off–topic call for participation in an online experiment”. Our experience is that it is best if the body of the message assures potential participants that their email addresses are not collected or passed on to other parties. Indeed, our system does not even collect individual addresses; these are managed by the experimenter. For security purposes, IP addresses are logged in order to control for multiple participation by subjects.

In the first instance, it was decided that I would post announcements to randomly selected high–traffic, high–membership newsgroups. The idea behind this was that if membership is high, more people will see the announcements. Equally, it was thought that many people would be using a high–traffic group, so we would potentially obtain a large sample. However, this turned out not to be the case. Despite posting to more than 60 newsgroups over a period of two and a half months, only 77 responses were received, with 32 eventually participating in the experiment.

⁵ Recent spam activity suggests that this may well result in a plethora of spam responses from uninterested parties. No clear solution for weeding out such replies automatically and without hassle is currently available, although spam detection software obviously exists (O’Brien & Vogel, 2003) and is a topic of research. The problem for which there is currently no viable answer is how to post such a URL while targeting only the intended groups. Experimenter interaction is currently an essential filter for the garbage responses that will come through.

Greetings from Dublin, Ireland,

I am a postgraduate student and am running a small experiment which involves assessing responses to English words. You may have seen past calls for participation that I have posted for related experiments. However, this call is for a different experiment which I am carrying out at the moment.

I can assure you of at least two things:
1) your email address will not be recorded
2) I am aware of the complications involved in sampling methods

To participate you will have to navigate through webpages with one question on each page. It takes no longer than 10 minutes to complete the experiment. The experiment does not involve music but is mentioned here to secure a diverse range of participants.

If you would like to participate, please send an email to [email protected] before July 31, 2003 (preferably sooner) and I will send in return the url and password necessary to access the experiment.

Participation is anonymous and interested parties may contact my department to request a final copy of the ultimate report if they so desire.

Thanks for your time,

Maria Buckley

Figure 2.1: Example of posting sent to a music newsgroup

High–traffic groups seem attractive because they have more potential readers, but this clearly does not lead to high response rates. As a result, quieter groups were deemed more suitable: although they have fewer readers, their members may be less desensitized to noise. Our next approach was therefore to post announcements to high–membership but low–traffic groups. It was thought that our announcement would then not get lost among a high number of messages, but could still potentially be read by many subjects. Over a period of 2 weeks, postings were sent to 17 newsgroups and 24 responses were received. Moderated discussion lists with high membership were also used; however, only 2 responses were received from them. Sixteen subjects in total took part in the experiment. One additional issue is associated with cyclic features of the calendar and accompanying human activities: at present it is not known how the Internet activity of the general populace relates to the annual calendar. Hewson and Charlton (2003) also report low numbers of responses from postings to newsgroups. One of the reasons they give for this is that there are now many experiments on the web, so they are no longer a novelty.

Another issue with opinion polls, whether carried out on the web, on television or via other media, is that the total number of respondents is often not given. For example, the Sky News television station has a daily poll relating to a topical news item of the day. Results of the poll are announced during the day, which could clearly influence potential subjects’ opinions; however, no indication is given of how many have actually voted at any point. The same is true of a daily poll posted on the www.manutd.com website. It is possible to access the results so far, but again it is not possible to see how many have participated. Another issue in both of these cases is multiple participation, as subjects can very easily vote on numerous occasions.

A further problem with the way poll results are reported arose in the 1980 American Presidential election campaign, in which Ronald Reagan defeated Jimmy Carter. The controversy arose because the results of the votes cast on the East coast of America were broadcast before the polling booths had closed on the West coast. The West coast includes California, then the most populous state, with 45 electoral college votes in 1980. Voters on the West coast who heard the results from the East coast could easily have been influenced by them and based their own vote on them. One significant part of the electorate therefore had different information on which to base its votes than the other. Concern about the resulting bias in voting on the West coast in that election led to a change in journalistic practice.

2.6 Bias

Bias from extraneous variables plays a very important role in experiments, and experimenters try at all costs to eliminate bias from their experiments. There are many sources of bias.

Experimenters themselves are often a major source of bias (see §2.3.2). They can unknowingly influence subjects’ responses through their body language (in face–to–face experiments) or their personality. They may also change the tone of their voice on the correct answer (in both face–to–face and telephone experiments). Moreover, it is basically impossible for experimenters to speak in the same tone and manner to all subjects, which means that different words may be emphasised for different subjects. These factors bias the results of the experiment, which is what should be avoided. If web–based experiments are used, this is not an issue. Experimenters do not ask the questions, and all subjects see the questions in the same manner, so this form of bias is eliminated.

Another form of bias comes from the ordering of the questions, as the order of presentation has been found to influence judgements. Greenbaum and Quirk (1970) found that, in a group of sentences, the first sentence will be judged differently from the rest. If the same order is used for all subjects, it will be difficult to determine whether the answer given for a question is based on the answer given before it or not. If questions are randomised, each question will be preceded and followed by different questions on each run of the experiment. This randomisation controls for ordering bias and should be implemented in all experiments. For multiple–choice questions, it is also recommended that the order of the answer choices be randomised, as subjects may tend to choose the first answer simply because it is the most convenient. Randomising the answers prevents this effect from systematically biasing the results. Both forms of randomisation are possible with the system described in this thesis.
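Per–participant randomisation of question order and of answer–choice order can be sketched in Java with the standard library's Collections.shuffle. Given a Random source, this produces a random permutation in which all orders are approximately equally likely. The class name Randomiser is hypothetical, and this is not the system's actual code. In practice, the order actually presented to each participant would also need to be recorded, so that any residual order effects can be analysed afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: randomise question order and answer-choice order per participant.
class Randomiser {
    /** Returns a shuffled copy of the items; the caller's list is left untouched. */
    static <T> List<T> shuffledCopy(List<T> items, Random rng) {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, rng); // random permutation of the copy
        return copy;
    }

    public static void main(String[] args) {
        List<String> questions = List.of("Q1", "Q2", "Q3", "Q4");
        List<String> choices = List.of("agree", "neutral", "disagree");
        Random rng = new Random(); // use new Random(seed) to reproduce a given run
        System.out.println("Question order: " + shuffledCopy(questions, rng));
        System.out.println("Choice order:   " + shuffledCopy(choices, rng));
    }
}
```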

A general issue with experiments is that subjects may answer differently at the beginning of an experiment than at the end, for many possible reasons. Schutze (1996) claims that nervousness at the beginning of the test, fatigue at the end, and practice effects may all influence judgements, and so must be controlled for. If the questions are randomised, different questions will be at the beginning and end of the experiment each time. As fatigue and boredom can be problems, a short experiment may prove more beneficial than a long one. With our system, an experiment can be whatever length an experimenter chooses, and ordering effects may be controlled for separately.

If there are different conditions in an experiment, an independent groups design should be used, i.e. subjects should be randomly assigned to the different conditions.
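Random assignment to conditions can be sketched in Java as follows. The participant list is shuffled, and participants are then dealt to the conditions in turn, so that group sizes differ by at most one. This balanced scheme is one common choice and is assumed here rather than prescribed by the text; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of an independent groups design: shuffle the participants,
// then deal them round-robin into the conditions.
class ConditionAssigner {
    static Map<String, List<String>> assign(List<String> participants,
                                            List<String> conditions, Random rng) {
        if (conditions.isEmpty()) {
            throw new IllegalArgumentException("at least one condition is required");
        }
        List<String> pool = new ArrayList<>(participants);
        Collections.shuffle(pool, rng); // random order decides who goes where
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String condition : conditions) {
            groups.put(condition, new ArrayList<>());
        }
        for (int i = 0; i < pool.size(); i++) {
            String condition = conditions.get(i % conditions.size());
            groups.get(condition).add(pool.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> subjects = List.of("s1", "s2", "s3", "s4", "s5");
        System.out.println(assign(subjects, List.of("A", "B"), new Random()));
    }
}
```

Dealing in turn after shuffling, rather than drawing each condition independently at random, avoids badly unbalanced groups in small samples.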

2.7 Replicating Experiments

Cowart (1996) reports that Labov (1975) found that subjects can give different judgements at different times. One option would be to get subjects to make more than one judgement. However, this opens up the possibility that subjects would base their second judgements on their first. A better option would be to replicate the experiment on a different set of subjects and then compare and correlate the results. An experiment should also be reliable. It is considered reliable if, were subjects to participate in it again at a later date, any differences in their judgements would reflect what is being analysed rather than inconsistencies in the experiment. Again, one way to investigate such reliability is to replicate the experiment. Valid experiments are ones which investigate what they aim to investigate. Examples of measures that would not be considered valid are punctuality as a measure of intelligence, or shortness of hair as a measure of athletic ability. Experiments should also be practical, in the sense that they should be easy both to administer and to participate in. The system described in this thesis certainly facilitates practical experiments. However, whether an experiment is reliable and valid depends on the experimental design.
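To compare and correlate the results of an original run and a replication, one option is Pearson’s correlation coefficient over per–item mean judgements: r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²). The Java sketch below assumes that each experiment has already been reduced to one mean rating per item, listed in the same item order. The choice of Pearson’s r (rather than, say, a rank correlation), the names and the example numbers are all assumptions for illustration.

```java
// Hypothetical sketch: Pearson's r between per-item mean judgements from an
// original run (x) and a replication on a different set of subjects (y).
class ReplicationCorrelation {
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays of length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // If either series is constant, sxx or syy is 0 and the result is NaN.
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] original = {4.1, 2.3, 3.8, 1.5};    // invented per-item means
        double[] replication = {3.9, 2.6, 3.5, 1.8}; // invented per-item means
        System.out.printf("r = %.3f%n", pearson(original, replication));
    }
}
```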

2.8 Demand Characteristics of Experiments

There are variables influencing a subject’s behaviour and judgements which cannot be controlled for. As noted in Rosnow (2002), Orne (1959) was the first to make use of the term demand characteristics. They are referred to in Rosnow (2002, p. 2) as “uncontrolled task–orienting cues in an experimental situation”. This section discusses such demand characteristics and how they may affect the outcome of experiments. The following sections first deal with how some subjects try to be good subjects. They then discuss whether volunteer or non–volunteer subjects, and undergraduate or non–undergraduate subjects, should be used in experiments. Next, the various types of rewards that subjects receive for participating in experiments are outlined and the merits of such rewards debated. Finally, I discuss the way in which the setting of an experiment can influence subjects’ judgements.

2.8.1 The good subject

One issue that must be considered when carrying out an experiment is the influence the subjects themselves can have on it. Subjects participating in an experiment often try to be good subjects. Orne (1959) noticed when he carried out experiments that subjects asked afterwards whether they had been good subjects or whether they had ruined the experiment. This indicates that subjects may be trying to give answers which are correct for the study but which may not actually be their true answers. This is unfortunate, as experimenters want honest, spontaneous answers that reflect subjects’ true judgements. Answers of any other kind do not represent what really happens, or what is the case for the population at large. Rosenthal and Rosnow (1975) suggest that one way of obtaining subjects’ true responses is to disguise the experiment, so that subjects do not realise that they are taking part in one.

2.8.2 Volunteers

Another contested issue when recruiting subjects for an experiment is whether volunteer or non–volunteer subjects should be used. There are merits to using both. Orne (1959) suggested that volunteer subjects are especially interested in being good subjects. As Rosnow (2002, p. 4) states, “the mere act of volunteering is an implicit commitment to comply with whatever demands are inherent in the experimental situation”. As noted by Orne (1962), volunteers try to validate the experimental hypothesis. If someone has offered to participate, the reason may well be that they are interested in the area of the experiment, or, if the area is unknown to them, simply interested in participating in experiments. Rosenthal and Rosnow (1975) note that people who are interested in the area of an experiment are more likely to volunteer. Such volunteers may have previously taken part in a similar experiment. In that case they would not be naive subjects and would be aware of the requirements of the experimenter. Again, this is not what is wanted from a subject, as it could bias the results. Silverman (1977, p. 89) suggests that volunteers are “more intelligent, better educated... more sociable, more arousal–seeking, less conventional, ... younger than non–volunteers”. Each of these factors could influence the results. For example, if only volunteers were used in a study, the results might not represent the population in general but be biased towards a young, very well educated section of it. This might lead one to suggest that non–volunteers should be used as subjects in experiments. However, Silverman (1965) concluded from an experiment that forced participation led subjects to try to distort the results of the experiment.

From the above discussion, I believe that, since there are disadvantages to both volunteers and non–volunteers, the optimal solution would be to have half volunteers and half non–volunteers in a study. Alternatively, an experiment could be replicated so that both types are accounted for. A study which includes both volunteers and non–volunteers should overcome the biases outlined above, and should therefore be more representative of the general population than either group on its own.

2.8.3 Undergraduate Students

Another issue in the area of demand characteristics is that of using college students as participants. Silverman (1977) reports studies by Smart (1966) and Schultz (1969) which conclude that most of the subjects (70–80%) in social psychology and experimental psychology experiments are college students. Half of these students come from introductory psychology courses. They are psychology students, clearly involved in the area of the experiment and possibly with some knowledge of it, and this may bias the results of the experiment. Schutze (1996) suggested that linguists should not be used in linguistic experiments, and I believe the same should hold for psychology students in psychology experiments. A psychology student, because of extra knowledge in the area, may not give the same judgement that an engineering student or a fireman would. A study would be more beneficial if it provided information on the behaviour of people in general.

It is also likely that these college students would know the experimenter, who could be their professor and so have status or power over them. Some subjects could be influenced by this: they might try to be good subjects and to validate the study. College students often hope that their participation will benefit science or research in general (Orne, 1962).

2.8.4 Rewards for participation

Subjects are sometimes rewarded for participating in an experiment with money or course credit. I would have expected this to bias results, as subjects who benefit from taking part may try harder to be good subjects. However, on the payment issue, MacLeod (1999) noted that payment did not actually influence subjects’ judgements, as these were the same with or without payment. This suggests that it is not necessary to pay subjects, although payment may still be necessary to entice them to take part.

Equally, college students are sometimes given course credit for participating in experiments. In my opinion, this leaves open the possibility that subjects will try to validate the experimental hypothesis because they are being rewarded for taking part. However, one would imagine that if payment did not influence a subject’s judgements, then course credit would not either. Further work should be carried out in this area to establish whether there are such effects or not.

2.8.5 The Experiment setting

Silverman (1977) suggests that there are advantages to carrying out experiments in laboratories (experimenters can have the experiment set up in advance, etc.). However, he identifies one distinct disadvantage. He believes that subjects will act and perform differently in such an unfamiliar location than in a location known to them. Silverman (1977, p. 2) says that humans would be expected “to show atypical behaviour in an atypical environment”. The system described in this thesis allows subjects to take part from their own personal workstations. They are therefore in a familiar environment and should not display atypical behaviour.

This section has discussed many of the demand characteristics which can influence experiments. It is important to overcome as many of them as possible, and measures should be put in place to achieve this. Possible solutions suggested by Orne (1962) are quasi–controls: making the subjects co–investigators, or obtaining information from them about the demand characteristics. One example is to question subjects before or after the experiment to determine which factors influenced their judgements or decisions. Subjects could be asked what the experiment was about. Experimenters can then compare what the subjects thought it was about with what their actual performance conveyed. Finally, an enquiry procedure could be used, in which subjects do not take part in the experiment itself. Instead, they are asked what they would think its aim was if they were approached to take part. However, these techniques equally have demand characteristics inherent in them.

2.9 Conclusion

There are many methods available for carrying out experiments, and these were outlined briefly at the beginning of this chapter. Carrying out experiments on the Internet is the focus of this thesis, so the advantages and disadvantages of Internet experimentation were covered in detail. Sampling on the Internet is quite controversial. I discussed the issues inherent in it and presented my own sampling experiences and the difficulties encountered. I concluded that high–membership, low–traffic groups should be targeted when recruiting a sample online. The final section dealt with the various demand characteristics which can influence subjects’ judgements. Two solutions to avoid such bias were suggested: firstly, using half–volunteer, half non–volunteer subjects and, secondly, not using undergraduate students. I also suggested that further research should investigate the influence of both money and course credit on subjects’ performance in experiments.

Chapter 3

Introduction to the System

3.1 Introduction

This chapter¹ introduces the system, which will be described in greater detail in later chapters as its component functions, and the data structures and algorithms that utilize them, are described. It is a system which allows experiments to be conducted on the Web. This chapter reports on a vastly overhauled design that developed out of prior work on earlier prototypes (Hourihane, 2002; Ryan, 2001; McGowan, 1999; Kenny, 1998). Other Web-based experiment services of course exist (e.g. Keller, Corley, Corley, Konieczny, & Todirascu, 1998), and more sophisticated data analysis packages exist (e.g. SPSS, SAS, Data Desk). However, to our knowledge this system is one of the few intended as general–purpose central servers, not targeted at any particular human science. It also has the potential to incorporate both traditional and recent automated data analysis processing. Its applicability to some human sciences, such as medical science, is nonetheless limited, both in its current state and in principle: the system does not support tissue sampling, for example. In §3.2 an overview of the functionality of the implemented system is given, and its architecture is described in §3.3. Finally, the hierarchy of user types for whom functionality is provided is discussed in §3.4.

3.2 Overview of the System

This tool allows researchers to create complicated experiments driven by Java programs (building in selective randomisation of materials if desired, for example), without having to program themselves. The created experiment can then be run on the Web, allowing subjects across the world to participate. Access to the experiments is restricted by passwords controlled by experimenters. The results of the experiment are recorded as participants complete the materials. There is a facility for the experimenter to view the recorded results, with different options for viewing them, such as viewing the results for a particular subject, or all of the results for a particular question.

¹ This chapter derives from Buckley and Vogel (2003c), joint work with the co–author of that paper, my dissertation supervisor, who approves of the textual overlaps between this chapter and that paper.
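The two result views mentioned above, all answers of one subject or all answers to one question, amount to grouping the recorded responses by a different key. The Java sketch below assumes a hypothetical Response record with subject, question and answer fields. The thesis does not describe the system’s actual result format here (only that results are stored as text files with LaTeX–oriented mark–up), so this is illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of two result views over recorded answers.
class ResultViews {
    /** One recorded answer; field names are illustrative (requires Java 16+). */
    record Response(String subjectId, String questionId, String answer) {}

    /** All answers grouped by subject, with subjects sorted by identifier. */
    static Map<String, List<Response>> bySubject(List<Response> all) {
        return all.stream().collect(
                Collectors.groupingBy(Response::subjectId, TreeMap::new, Collectors.toList()));
    }

    /** All answers grouped by question, with questions sorted by identifier. */
    static Map<String, List<Response>> byQuestion(List<Response> all) {
        return all.stream().collect(
                Collectors.groupingBy(Response::questionId, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Response> all = List.of(
                new Response("s1", "q1", "yes"),
                new Response("s1", "q2", "no"),
                new Response("s2", "q1", "no"));
        System.out.println(bySubject(all).get("s1").size());  // 2
        System.out.println(byQuestion(all).get("q1").size()); // 2
    }
}
```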

Materials in an experiment may be located all on one page or on a series of pages (hereafter referred to as slides). Researchers may want the slides to appear in a random order, or may want a specific order experienced by all participants (see §2.6 for a discussion of order effects). Researchers may wish to present textual or graphical items on each slide, and may similarly want multiple items on a slide to be randomly placed or to appear in a specific order. The researcher may want open–text questions (hereafter referred to as simple questions) or closed multiple–choice questions. Each slide can have any amount of text, questions or graphics on it.
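The materials described above (slides holding text, graphics, simple questions and multiple–choice questions, in fixed or random order) could be modelled along the following lines. This Java sketch is only an illustration under that assumption. The class and field names are hypothetical, and the thesis does not present the system’s actual Java classes here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical data model for experiment materials (requires Java 16+ for records).
class SlideModel {
    interface Item {}
    record TextItem(String text) implements Item {}
    record GraphicItem(String imageUrl) implements Item {}
    /** Open-text ("simple") question. */
    record SimpleQuestion(String id, String prompt) implements Item {}
    /** Closed multiple-choice question. */
    record MultipleChoiceQuestion(String id, String prompt, List<String> choices) implements Item {}

    /** A slide holds any number of items, shown in a fixed or a random order. */
    record Slide(String id, List<Item> items, boolean randomiseItems) {
        List<Item> itemsForDisplay(Random rng) {
            if (!randomiseItems) {
                return items; // fixed order chosen by the experimenter
            }
            List<Item> copy = new ArrayList<>(items);
            Collections.shuffle(copy, rng);
            return copy;
        }
    }

    public static void main(String[] args) {
        Slide slide = new Slide("slide1", List.of(
                new TextItem("Read the sentence and answer below."),
                new SimpleQuestion("q1", "Paraphrase the sentence."),
                new MultipleChoiceQuestion("q2", "How natural is it?",
                        List.of("natural", "odd", "unacceptable"))),
                false);
        System.out.println(slide.itemsForDisplay(new Random()));
    }
}
```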

In a manually constructed experiment, a slide corresponds to a single page in a printed booklet that a participant may encounter. Alternatively, it corresponds to a consolidated display of information that can be presented to a participant with a photographic slide. However the materials are ordered, the researcher must be able to track the answers made by participants while knowing the exact order in which participants encountered slides, questions and text. The system does this bookkeeping automatically and prevents participants from adopting any order other than the one they are presented with. The responses (and the materials themselves) are saved in a format that is amenable both to automated data analysis and to direct inclusion in a printed document. We have saved both the materials and the recorded data in text files using a mark–up designed for convenient typesetting with LaTeX. Participants may scroll up or down a slide. However, they may not move backwards to earlier slides, nor forwards to subsequent slides without completing the one they face.

The remainder of this chapter describes the architecture of the system and then discusses the different user types. Figure 3.1 depicts the overall structure of the system. Straight dashed lines indicate data and system interdependencies that happen through message passing governed by the server manager. Straight dotted lines indicate communication that may happen directly between the experimenter and participant (e.g. via email outside the system). Curved lines indicate the central control of the server manager over the overall system. The interactions between the experiment manager, participant and experimenter can be seen here and are discussed in §3.4.
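The navigation rules above (no moving back, no moving on without completing the current slide, and automatic recording of the order in which slides were seen) can be sketched in Java as a small session object. The slide order is assumed to be fixed, possibly after randomisation, when the session starts. The output line uses an invented \response{...} macro, because the thesis does not give its actual mark–up. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a forward-only participant session that logs the order of slides seen.
class ForwardOnlySession {
    private final String subjectId;
    private final List<String> slideOrder; // fixed (possibly randomised) when the session starts
    private final List<String> log = new ArrayList<>();
    private int position = 0;

    ForwardOnlySession(String subjectId, List<String> slideOrder) {
        this.subjectId = subjectId;
        this.slideOrder = List.copyOf(slideOrder);
    }

    boolean finished() {
        return position >= slideOrder.size();
    }

    String currentSlide() {
        if (finished()) {
            throw new IllegalStateException("experiment already completed");
        }
        return slideOrder.get(position);
    }

    /** Records the answer and advances; there is deliberately no method for going back. */
    void submit(String answer) {
        if (answer == null || answer.isBlank()) {
            throw new IllegalArgumentException("slide must be completed before moving on");
        }
        // Invented LaTeX-style macro; real answers would need LaTeX special characters
        // (%, &, _ etc.) escaped before typesetting, which is omitted here.
        log.add(String.format("\\response{%s}{%d}{%s}{%s}",
                subjectId, position + 1, currentSlide(), answer));
        position++;
    }

    List<String> log() {
        return List.copyOf(log);
    }

    public static void main(String[] args) {
        ForwardOnlySession session =
                new ForwardOnlySession("s01", List.of("slide3", "slide1", "slide2"));
        while (!session.finished()) {
            System.out.println("Showing " + session.currentSlide());
            session.submit("example answer");
        }
        session.log().forEach(System.out::println);
    }
}
```

The position in the presentation order is logged alongside the slide identifier. As a result, order effects can be analysed later even when every participant saw a different order.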


[Diagram components: Experiment Server System; Server Manager; Experiment Manager; Experiment Creation & Modification; Experiment Display Tool; Data Analysis Tool; Experiment & Result Data Files; Experimenter; Participant]

Figure 3.1: System Architecture


3.3 Architecture of the System

This system employs a client/server architecture, which is versatile, message–based and modular. Input and output occur in the client program. When input is received, it is sent as a request to the server, which executes the request and sends the results back to the client, which then outputs them. The alternative architecture for the system was stand–alone software, where each subject would participate on a separate personal computer workstation. This does have advantages, e.g. potentially more accurate time–keeping, since there is no risk of Internet traffic delays interfering with reaction times. However, if stand–alone methods are used, each subject’s results are recorded on a separate computer. The resulting data would then have to be collected manually, which is extremely time–consuming. Moreover, with current PC operating systems, it is no longer clear that an individual program’s control of the central processing unit (CPU) and clock is any different from that in the UNIX setting. Our system is constructed for that setting, with multi–user multi–tasking of the processor in mind. That is, it is no longer clear that stand–alone systems provide reliable reaction time data even when run on general–purpose isolated PCs.

The client/server architecture was therefore chosen for our system, as it has to cater for many different users, from experimenters to subjects, who may use the system simultaneously. Improving the control of time lags and the accuracy of time recording remains an open research problem in this area. The results for any experiment are all stored in the same experimenter-controlled directory, so there are no data-collection or transcription issues. (It is conceivable that an experiment might be designed around interactive graphical or voice interfaces, which would create transcription issues of their own; however, this system accepts only textual and button-pushing interaction with experimenters and participants, although graphical items may be included as stimuli.)
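The request/response cycle described above can be sketched in Java with plain sockets. This illustrates the client/server pattern only: the class RequestResponseDemo and its one-line "ECHO" protocol are invented for this example and do not reproduce the system's actual messages or its web front end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: one request/response round trip between a client and
// a server on the local machine. The one-line "ECHO ..." protocol is invented here.
class RequestResponseDemo {

    /** Server side: execute the request and produce the result to send back. */
    static String handle(String request) {
        if (request != null && request.startsWith("ECHO ")) {
            return "OK " + request.substring("ECHO ".length());
        }
        return "ERROR unknown request";
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Client side: send one request to a one-shot local server and return its reply. */
    static String roundTrip(String request) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept();
                     BufferedReader in = reader(conn);
                     PrintWriter out = writer(conn)) {
                    out.println(handle(in.readLine()));
                } catch (IOException e) {
                    // A failed connection simply ends this one-shot server thread.
                }
            });
            serverThread.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(request);         // input is sent to the server as a request
                String reply = in.readLine(); // the client waits for the result, then outputs it
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping handle separate from the networking code mirrors the modularity claimed above: the server's behaviour can be tested without any connection at all.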

A paramount design feature of our system is the requirement that the experimenter does not need any programming skills: only basic Internet skills are necessary. The creation tool is presented through a user–friendly Graphical User Interface (GUI). All options are chosen simply by pressing buttons or clicking check–boxes, so the interface is very straightforward to use, which is especially helpful for researchers with limited computing experience. Even less computing prowess is demanded of experiment participants.

3.4 System Users

The system supports three different kinds of users, with differing forms of access and permissions. The following sections describe the access and permissions of each type of user.


3.4.1 Experiment Manager

The experiment manager has control over the whole system. When new experimenters wish to use the system to create, present, delete and analyse experiments, they get in touch with the experiment manager. The experiment manager creates an account for the experimenter, and an email is sent to them with the details and password for their own individual home page hosted on the system. This home page contains the facilities for the experimenter to create (see Chapter 4), view, run, delete (see Chapter 5), edit, and analyse the results of experiments (see Chapter 6). The experiment manager can access all experimenters' home pages at any time.

Also, when an experimenter deletes a file, it is moved to a trash directory, whose contents the experiment manager later deletes. The trash directory thus acts as a simple backup mechanism in case experimenters wish to reverse a deletion.
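The trash mechanism can be sketched with the standard java.nio.file API. The class TrashBin and its method names are hypothetical; the sketch assumes that the trash directory and the experiment files live on the same file system, so that Files.move amounts to a rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: "deleting" moves an item into a trash directory, so the
// deletion can be reversed until the experiment manager empties the trash.
class TrashBin {
    private final Path trashDir;

    TrashBin(Path trashDir) {
        this.trashDir = trashDir;
    }

    /** Moves a file or directory into the trash and returns its new location. */
    Path delete(Path item) {
        try {
            Files.createDirectories(trashDir);
            Path target = trashDir.resolve(item.getFileName());
            // REPLACE_EXISTING overwrites an older trashed item of the same name
            // (a non-empty directory of that name would still cause an exception).
            return Files.move(item, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reverses a deletion by moving the named item back into its original folder. */
    Path restore(String name, Path originalParent) {
        try {
            return Files.move(trashDir.resolve(name), originalParent.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```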

3.4.2 Experimenter

The experimenter has their own home page, hosted on the server, from which they can create, edit and delete experiments. It is also possible to view details of existing experiments, view a run-through of experiments and analyse results from existing experiments. Figure 3.2 shows the options available to experimenters. Note that the visual design of the web pages was not considered during this Master's project, and the current pages are by no means optimally designed; this is certainly an area for future work.

The full functionality open to the experimenter in creating an experiment is discussed in Chapter 4. The emphasis is on ensuring that the experimenter has maximal control over participation in the experiment and over the data (Hewson et al., 2003). Thus, the experimenter associates with each experiment a name unique among the experiments they ‘own’, and a password that any potential subject needs in order to access it. This prevents participants from simply surfing into an experiment, which would leave the experimenter unable to target demographic properties properly (e.g. for a particular survey it might be important to target a very focused subject pool, perhaps via a topical discussion list, so that certain attitudes held by participants about the topic can easily be estimated in advance; see Section 2.5 for further discussion of participant solicitation issues).2 Chapter 5 outlines the facility through which experiments are run on the Web and how this process is implemented. The process is automated: when the experimenter creates an experiment, it is automatically given its own web page from which subjects can participate. Chapter 6 describes the analysis tool, outlining the options available for analysis and giving suggestions for future statistical tools; the implementation of this part of the system is also discussed.

2Of course, nothing stops an experimenter from yielding aspects of control by, for example, just posting the URL and password to a newsgroup or an experiment portal.


Figure 3.2: Screen shot of the options available on the experimenter’s home-page


3.4.3 Subjects

Subjects receive the URL and password necessary to access the experiment from the experimenter. It is not at present possible to supply individualized passwords for each participant. Although this is a feasible extension, it is not clear that the consequent account-management responsibilities for the experiment manager would be worthwhile. Participants can access the URL at any time convenient to them.

In many experiments, it is necessary that each subject participate only once. The experimenter can prevent multiple participation by requiring potential subjects to email them for the necessary access details. The system maintains a time-stamped log of the IP addresses that access it; therefore, to some extent (modulo multiple participants sharing an IP address), it is possible to identify multiple attempts to participate made by a single individual. The experimenter may check whether they have previously received an email from that IP address or user name, and if so, need not send the access details to that subject. Currently, such checks must be mediated by the system managers. A further improvement to the system would be a method that identifies IP addresses which have already used the program and notifies the experimenter. Also, each experiment has a unique user name and password, so subjects who have the access details for one experiment will not thereby be able to access another.
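The improvement suggested above, flagging IP addresses that have already been seen, might look like the following minimal sketch. AccessLog is a hypothetical class; as noted above, a flag is only a hint for the experimenter, since several participants may share one address and one participant may appear under several.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the suggested improvement: a time-stamped log of IP
// addresses that reports when an address has already accessed the experiment.
class AccessLog {
    private final Map<String, List<Instant>> accessesByIp = new HashMap<>();

    /** Records an access and returns true if this IP address was already in the log. */
    boolean record(String ipAddress, Instant when) {
        List<Instant> times = accessesByIp.computeIfAbsent(ipAddress, ip -> new ArrayList<>());
        boolean seenBefore = !times.isEmpty();
        times.add(when);
        return seenBefore;
    }

    /** All recorded access times for one address (empty if it has never been seen). */
    List<Instant> accesses(String ipAddress) {
        return Collections.unmodifiableList(accessesByIp.getOrDefault(ipAddress, Collections.emptyList()));
    }
}
```

A true result from record is where the experimenter could be notified automatically, rather than relying on the system managers to inspect the log.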

In some experiments, a very large number of subjects may be needed and it may not be necessary for them all to be unique; in this case there is no need to check email addresses, and the experimenter could simply post the URL with the initial call for participation. We suspect, however, that response rates (if it is even appropriate to measure response rates in the Internet setting on a par with response rates to absolute numbers of distributed physical materials) can be enhanced by not taking this approach, since direct communication better reassures subjects that their anonymity is preserved and their addresses are not recorded.

3.5 Conclusion

This chapter has introduced the web–based experimentation system which has been expanded and used for this thesis. Firstly, an overview of the system was given, covering the system's interaction with the servers, its different sections and its users. The architecture of the system was then outlined. Finally, I described each user's role and the facilities available to them, with brief references to improvements that could be made to the system. The next chapter outlines how an experimenter creates an experiment and the various options available during this process; the implementation of this aspect of the system is also detailed.

Chapter 4

Creating Experiments

4.1 Introduction

This chapter describes in detail how experiments are created using this web–based experimentation system. I outline how an experimenter goes about designing an experiment, with screen shots of what the experimenter sees during this process. The final sections explore how this side of the system is implemented, including a description of the methods and data structures used.

4.2 Experiment Details

When designing the creation tool, it was assumed that an experimenter who has been given access to the system will want to track more than one experiment (or separate conditions of a single experiment) at a time. As a result, when an experiment is created, it is given its own directory where all its files (presentation files, result files) are stored. Each experiment must have a name that is unique among the experiments currently on the system; if a unique name is not given, the experimenter is prompted until one is. Figure 4.1 shows the screen where the experimenter inputs the name of the experiment.

The experimenter must also provide a user login and password for each experiment. This ensures that a subject who has participated in one experiment cannot simply take part in another, provided the experimenter can control, or is confident about, the likelihood of one individual participating from several addresses. However, there may be cases where security is less of an issue, and experimenters may post the link to the web page, together with the user name and password for an experiment, in a chat room or on a notice board. These details (experiment name, user name, password) are simply typed into text boxes, and the system stores them in variables for later processing.
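The two requirements above, a unique experiment name and separate access details per experiment, can be sketched as follows. ExperimentRegistry is a hypothetical name; returning false from create is the point at which the interface would prompt the experimenter for another name. Passwords are held in plain text purely to keep the sketch short; a deployed system would store salted hashes instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: experiment names must be unique, and each experiment has
// its own user name and password, so access to one does not grant access to another.
class ExperimentRegistry {
    private static final class Credentials {
        final String user;
        final String password; // plain text only for brevity in this sketch

        Credentials(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    private final Map<String, Credentials> experiments = new HashMap<>();

    /** Returns false (so the caller can prompt again) if the name is already taken. */
    boolean create(String name, String user, String password) {
        if (experiments.containsKey(name)) {
            return false;
        }
        experiments.put(name, new Credentials(user, password));
        return true;
    }

    /** Checks a subject's login against the details of one particular experiment. */
    boolean login(String name, String user, String password) {
        Credentials c = experiments.get(name);
        return c != null && c.user.equals(user) && c.password.equals(password);
    }
}
```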



Figure 4.1: Screen shot of where the experimenter inputs the experiment name


4.3 Designing the Experiment

4.3.1 Experiment Details and Instructions

When these details have been completed, the experimenter may enter details about the experiment for their own future reference. This is a very useful facility, as the experimenter can simply read these details to recall what a particular experiment is about without having to view the whole experiment (see Figure 4.2).

Figure 4.2: Screen shot of entering details about the experiment

Next there is the option of having a slide which presents instructions to the subjects, to ensure they completely understand what is required of them. As this component accepts open running text, an experimenter can use it for a wide range of purposes; the key point is that this slide does not request direct feedback. Conceivably, an experiment could consist of this slide's text and a subsequent slide of response clarification, but the purpose of this slide is pre–experiment briefing. It is also on this page that the experimenter must decide whether or not the slides in the experiment are to be grouped in sections. If sections are wanted, the checkbox seen in Figure 4.3 should be checked.

Figure 4.3: Screen shot of entering the instructions


4.3.2 Slides in the Experiment

Once these details have been completed, the experimenter moves on to creating the slides in the experiment. An experiment can contain as few or as many slides as the experimenter wishes. There are various options available to the experimenter (see Figure 4.4).

Figure 4.4: Screen shot of the options available when creating an experiment

An experimenter may choose to have every slide in the experiment randomised, no slides randomised, or slides randomised in sections. This refers to the sequence in which an individual participant experiences the experimenter's materials. This is a very important feature, as randomisation guards against ordering bias, which can significantly distort the experiment's results when there is an interaction among items.

As an example of where one might want some slides randomised and some not within one experiment, one might always want a slide at the beginning which obtains demographic information from the subject (to randomise slides, the experimenter simply clicks the appropriate checkbox). The slides containing the actual experimental questions would then be randomised together in a block, and perhaps a final slide asking for feedback on the experiment would not be randomised, and so would always appear last. Another example would be a stimulus on one slide followed by a series of slides, all of which are randomised. The system is designed so that participants cannot independently navigate forward or backward through the materials, ensuring that experimenters have control over the experimental design and its realisation for each individual participant.
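One way to realise these randomisation options is sketched below, under the assumption (not taken from the system's code) that each slide carries a "random position" flag and an "end of section" flag, as in the comment slide described in Chapter 5. Within each section, the flagged slides are shuffled among the positions that flagged slides occupy, while unflagged slides, such as an opening demographic slide or a closing feedback slide, stay in place. SlideOrder is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of one way to realise the randomisation options.
// Slides flagged random are shuffled among the positions held by random slides
// within the same section; unflagged slides never move.
class SlideOrder {

    static List<Integer> presentationOrder(boolean[] randomPosition, boolean[] endsSection, Random rng) {
        int n = randomPosition.length;
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);                      // start from the authored order
        }
        int sectionStart = 0;
        for (int i = 0; i < n; i++) {
            if (endsSection[i] || i == n - 1) { // the last slide always closes a section
                shuffleFlagged(order, randomPosition, sectionStart, i, rng);
                sectionStart = i + 1;
            }
        }
        return order;
    }

    // Shuffles the random-flagged slides in positions from..to (inclusive) among themselves.
    private static void shuffleFlagged(List<Integer> order, boolean[] randomPosition,
                                       int from, int to, Random rng) {
        List<Integer> positions = new ArrayList<>();
        List<Integer> slides = new ArrayList<>();
        for (int p = from; p <= to; p++) {
            if (randomPosition[p]) {
                positions.add(p);
                slides.add(order.get(p));
            }
        }
        Collections.shuffle(slides, rng);
        for (int k = 0; k < positions.size(); k++) {
            order.set(positions.get(k), slides.get(k));
        }
    }
}
```

Passing in the Random object keeps the sketch testable; the resulting order would then be logged for each participant, as discussed in Chapter 3.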

On each slide, the experimenter has many choices, as can be seen in the screen shot (Fig. 4.4). The first line of the slide indicates which slide number is being accessed. The next line gives three options. If the back button1 is pressed, the previous slide is accessed. If the next slide button is clicked, the next slide in the experiment is opened. Finally, there is the option to insert a new slide, which opens a new page as seen in Fig. 4.4.

The next line on the slide gives options for the different types of input on the slide; to choose an option, the relevant button must be pressed. Simple questions are open-ended questions. Multiple Choice Questions present the subjects with a question and a choice of answers. Text and Graphics present some text and graphics. When, for example, a multiple choice question is chosen, a check–box is displayed beside it; if this box is checked and the delete button is pressed, the multiple choice question is deleted, together with its associated answer possibilities. Underneath this line there are various check boxes, which can be checked or left unchecked; these randomise the slides, text and questions. Finally, when the whole experiment is complete, the Finish button is clicked. Figure 4.5 shows what the page in Figure 4.4 looks like once some choices have been made (e.g. a simple question, a multiple choice question, text).

When an experiment is created, a web page is automatically generated which stores the experiment and which subjects access when participating in it. Thus, the experimenter has access through a remote server to a facility that makes a complicated combination of functionality available, without having to program, even in HTML.

Figure 4.6 provides an example of the LaTeX markup associated with the sample materials shown, as the user would view them, in Figure 4.5. LaTeX macros have been defined which make it easy for the experimenter to incorporate examples of materials and data files into the documentation of their results: the text in Figure 4.6 will appear as in Figure 4.7.

1This refers to the back button supplied on the page, not the one supplied by the browser. Using the browser's ‘back’ button is a null operation.


Figure 4.5: Screen shot of the choices on this page


\begin{slide}{0}{norandom}
\textRandom{false}
\multiRandom{false}\eomultiRandom
\simpleRandom{false}\eosimpleRandom{}
\begin{txts}
\txtin{This is some text}
\end{txts}
\simplequestion{0}{This is a simple question?}\eosimplequestion{0}
\multiquestion{0}{This is a multiple choice question}\eomultiquestion{0}
\noOfAns{0}{2}
\random{0}{false}
\begin{0}{answers}
\answer{0}[0]{Answer 1}
\answer{0}[1]{Answer 2}
\end{0}
\eoqs{endqs}
\MultiQuestions{1}
\SimpleQuestions{1}
\Sentences{1}
\end{slide}

Figure 4.6: Example of resulting system file from Figure 4.5


• Is text randomized? false

• Are multiple choice questions randomized? false

• Open-text question is randomized? false

• Textual experimental items:

1. This is some text

0 This is a simple question?

0 This is a multiple choice question end of multiple choice question: 0

• Number of choices 0

• Choices to Q 0 are randomized: false

– question (0): [ : 0]Answer 1

– question (0): [ : 1]Answer 2

• Number of Multiple Choice Questions: 1

• Number of Simple Choice Questions: 1

• Number of Items: 1

Figure 4.7: Example of the result of typesetting the file in Figure 4.6


4.4 Implementation of Experiment Creation

4.4.1 The Vector

The main data structure for this part of the program is a vector. A vector is a dynamic array of objects: it grows or shrinks according to the requirements of the program as it executes. Arrays could not be used here because their size is fixed once they are allocated, and there is no way of knowing in advance how many questions or pieces of text the experimenter will add to the experiment. Linked lists, being dynamic, could perhaps have been used; however, vectors were chosen as they offer both dynamic allocation and direct indexing of individual cells.

When a slide is added, elements are added to the vector, and if the experimenter removes a slide, the elements related to that slide are removed from the vector. This flexibility is extremely useful when creating an experiment, as it allows experimenters to add or remove as many slides, texts and questions as they wish.
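The add/remove behaviour can be illustrated with java.util.Vector. The sketch below assumes, as described later in this section, that each slide begins with a Slide marker element; the class SlideVector is hypothetical, and it also assumes that no text or question is literally the string Slide.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical sketch: the experiment held in a java.util.Vector of strings.
// A slide starts with the marker "Slide"; removing a slide removes every
// element from its marker up to (not including) the next slide's marker.
class SlideVector {
    static final String SLIDE = "Slide";

    /** Index of the marker of slide number k (0-based), or -1 if there is none. */
    static int slideStart(List<String> v, int k) {
        int seen = -1;
        for (int i = 0; i < v.size(); i++) {
            if (SLIDE.equals(v.get(i)) && ++seen == k) {
                return i;
            }
        }
        return -1;
    }

    /** Removes slide k and all of its elements; the vector shrinks accordingly. */
    static void removeSlide(Vector<String> v, int k) {
        int start = slideStart(v, k);
        if (start < 0) {
            return;
        }
        int end = slideStart(v, k + 1);
        if (end < 0) {
            end = v.size(); // the last slide runs to the end of the vector
        }
        v.subList(start, end).clear(); // subList is a live view, so clear() removes the range
    }
}
```

Direct indexing (v.get(i)) and in-place removal are exactly the two properties cited above for preferring a vector over a fixed-size array or a linked list.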

The original class for creating experiments only allowed slides and text (sentences) to be randomised. However, the system is more useful if simple and multiple choice questions can also be randomised, and this has now been developed and integrated into the system. To do this, many items were added to the files and the data structure was changed, which in turn affected the other parts of the system, so changes to the other classes (the classes which present and analyse experiments) were required. In the original system, the vector held information on whether the slides were in sections and whether each slide was randomised, followed by the text, simple and multiple choice questions of the experiment. New elements were added to the vector to record whether the different types of question are randomised.

Different labels are used to differentiate between the different types of information being added to the experiment file, e.g. labels for slides, randomisation, graphics, text, open–ended questions and multiple–choice questions.

When an experiment is created, elements of the vector record whether or not the slides are in sections, indicated by Yes or No. When a slide is added to an experiment, an element is added to indicate a new slide (Slide), followed by an element recording whether the slide is in a random position. Two further elements then indicate whether the questions on this particular slide are randomised. This information is parsed and stored in variables, which are read when the experiment file is being created. For example, if simple questions are to be randomised, the variable holding this information will contain Random.

If text, graphics or simple questions are added to the current slide, two new elements are added to the vector, the first holding the data type as described above. If, for example, some text is being added, T is placed in the first new element. This signals that the layout for text, e.g. the text box, should be added to the screen. The second of the two new elements then holds the actual text input by the experimenter. T represents the text data type; the other types are described presently. For multiple–choice questions, however, the process is a little different, as more elements must be added. Once again, the first new element contains the type, here M, to indicate that a multiple–choice question follows, and the layout for multiple choice questions is added to the screen. The next element holds the number of answers to the multiple–choice question, followed by the question itself. The next element holds either Random or NoRandom, depending on whether the answers to the question should be presented in random order. The last elements store the answers to the multiple choice question. When simple (open) questions are added, the data type S is added along with the actual question. If more text were now added after the simple question, the same process as described above would take place, with the data type T and the text being added. The vector itself is not written to file. Figure 4.8 shows what a created vector would look like.

/C This is where instructions could go /F The last slide

/Slide /Random /Random /No Random Random

/T This is some text presented to the subjects /S Simple Q here

/M 2 Multiple Choice Question goes here /Random Answer1 Answer2

Figure 4.8: Depiction of the Created Experiment Vector
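The element layout described above and depicted in Figure 4.8 can be written out as a small Java class. ExperimentVector and its method names are hypothetical; the order of the two question-randomisation flags (simple questions first, then multiple choice) is an assumption, since the text does not specify it.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical sketch of the element layout described in Section 4.4.1:
// a sections flag, then per slide "Slide" + three flags, then (type, data)
// pairs, with multiple-choice questions taking extra elements.
class ExperimentVector {
    final Vector<String> v = new Vector<>();

    ExperimentVector(boolean sectioned) {
        v.add(sectioned ? "Yes" : "No");     // are slides grouped in sections?
    }

    void addSlide(boolean randomPosition, boolean randomSimple, boolean randomMulti) {
        v.add("Slide");
        v.add(flag(randomPosition));         // is the slide in a random position?
        v.add(flag(randomSimple));           // are simple questions randomised? (assumed order)
        v.add(flag(randomMulti));            // are multiple-choice questions randomised?
    }

    void addText(String text) {
        v.add("T");
        v.add(text);
    }

    void addSimpleQuestion(String question) {
        v.add("S");
        v.add(question);
    }

    void addMultipleChoice(String question, boolean randomAnswers, List<String> answers) {
        v.add("M");
        v.add(String.valueOf(answers.size())); // number of answers
        v.add(question);
        v.add(flag(randomAnswers));            // present the answers in random order?
        v.addAll(answers);
    }

    private static String flag(boolean random) {
        return random ? "Random" : "NoRandom";
    }
}
```

Building the last line of Figure 4.8 with this sketch gives the same sequence of cells: M, 2, the question, Random, Answer1, Answer2.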


4.4.2 Changes to Labelling

When the experiment is presented on the web, the files created during this process must be parsed. All files contain labels which identify the type of information that follows in the slide. When a new slide is added to an experiment, labels are added to the file to signify this. Equally, when for example a simple question is added to the slide, a simple question label is added to the file to show that the question which follows is a simple question. The original labelling system did not lend itself to straightforward parsing. Information was read from the end of one label to the beginning of the next label, which was not systematic. To parse text, for instance, the system was told to read between a text label and a simple question label; this clearly fails if there are no simple questions on the slide, since the text is then not parsed. There were similar problems with parsing simple and multiple choice questions, and with parsing the randomisation information. The system clearly needed to be changed so that parsing works regardless of what information the experimenter has chosen for each slide. Labels are now added to mark the end of each piece of information, e.g. the end of a text. All information is now parsed correctly: to parse some text, the system reads from the beginning of the text to the label which marks its end.
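Label-delimited parsing of this kind can be sketched with String.indexOf, using the label names visible in Figure 4.6. LabelParser is a hypothetical class; because each item is read from its own start label to its own end label, a slide with no simple question no longer disturbs the parsing of the other items.

```java
// Hypothetical sketch of label-delimited parsing with String.indexOf, using the
// label names visible in Figure 4.6. Each item is read from its own start label
// to its own end label, so it no longer matters which item type comes next.
class LabelParser {

    /** Returns the text between start and end (searching from index from), or null if absent. */
    static String between(String file, String start, String end, int from) {
        int s = file.indexOf(start, from);
        if (s < 0) {
            return null;
        }
        s += start.length();
        int e = file.indexOf(end, s);
        return e < 0 ? null : file.substring(s, e);
    }

    static String simpleQuestion(String slide, int n) {
        return between(slide, "\\simplequestion{" + n + "}{", "}\\eosimplequestion{" + n + "}", 0);
    }

    static String multiQuestion(String slide, int n) {
        return between(slide, "\\multiquestion{" + n + "}{", "}\\eomultiquestion{" + n + "}", 0);
    }
}
```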

In the original system, there were often problems with the answers to the multiple choice questions, because no information was stored with groups of answers to indicate which question they belonged to. Often the answers to the first question would be output for every multiple choice question on a slide. The counter used to number each multiple choice question is now also stored with that question's answers. An example of what is now saved is seen in Figure 4.9.

0 Are you male/female? end of multiple choice question: 0

Number of choices 2

Choices to Q 0 are randomized: true

• question (0): 0 : male

• question (0): 1 : female

Figure 4.9: How multiple choice questions are saved

This shows that this is the first multiple choice question on this slide (i.e. question 0), and this number is recorded on each line. There are two answer choices for this question. The final two lines show that male is answer 0 for question 0 and female is answer 1 for question 0.
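Storing the question counter with each answer is what allows answers to be grouped correctly. The sketch below reads markup of the form \answer{q}[a]{text} shown in Figure 4.6 and groups the answers by q; AnswerGrouper is a hypothetical name, and the sketch assumes that answers appear in answer-number order and contain no closing brace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: because each stored answer carries its question's counter,
// answers can be grouped per question instead of reusing the first question's answers.
class AnswerGrouper {
    // Matches markup of the form \answer{q}[a]{text}, as in Figure 4.6.
    private static final Pattern ANSWER =
            Pattern.compile("\\\\answer\\{(\\d+)\\}\\[\\d+\\]\\{([^}]*)\\}");

    static Map<Integer, List<String>> group(String slideMarkup) {
        Map<Integer, List<String>> byQuestion = new TreeMap<>();
        Matcher m = ANSWER.matcher(slideMarkup);
        while (m.find()) {
            int question = Integer.parseInt(m.group(1));
            // Answers are kept in the order they appear in the file.
            byQuestion.computeIfAbsent(question, q -> new ArrayList<>()).add(m.group(2));
        }
        return byQuestion;
    }
}
```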


4.5 Overview of main methods in this class

As in all of the classes, there is a method which is triggered each time a button is pressed. This method contains many if statements, one for each button: when a button is pressed, the appropriate section of the method is triggered and the instructions within it are executed. Often this involves adding a new layout to the page, e.g. if the text button is pressed, the text box appears. A separate method uses switch statements to distinguish between the different types of layout which may be added. Pressing a button also involves adding elements and information to the vector, as discussed in Section 4.4.1; again, switch statements are used depending on the type of information added. For example, if some text is added, T is added to the vector. A further method reads the information that the user enters, e.g. whether questions should be randomised. This information is stored in variables and accessed in the method which writes the experiment.tex file.
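The combination of one button handler with a switch over item types can be sketched without any GUI library by modelling the page layout as a list of descriptions. CreationHandler, its button names and its layout strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical sketch of the pattern: one handler with an if branch per button,
// and a switch over the item type. GUI layout is modelled as a list of strings.
class CreationHandler {
    final Vector<String> elements = new Vector<>();
    final List<String> layout = new ArrayList<>();
    boolean finished = false;

    void buttonPressed(String command) {
        if (command.equals("Text")) {
            addItem('T');
        } else if (command.equals("Simple")) {
            addItem('S');
        } else if (command.equals("Multiple")) {
            addItem('M');
        } else if (command.equals("Finish")) {
            finished = true; // writing the experiment file is omitted in this sketch
        }
    }

    private void addItem(char type) {
        elements.add(String.valueOf(type)); // record the item type in the vector
        switch (type) {                     // choose the layout to add to the page
            case 'T':
                layout.add("text box");
                break;
            case 'S':
                layout.add("question box");
                break;
            case 'M':
                layout.add("question box with answer boxes");
                break;
            default:
                throw new IllegalArgumentException("unknown item type " + type);
        }
    }
}
```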

4.6 Conclusion

Firstly, I described how an experimenter creates an experiment, giving details of the different options available during this process, e.g. randomisation and sections. The main data structure used in implementing the creation process is a vector, and the make-up of this vector was outlined, with an example of what a vector looks like after an experiment has been created. The new labelling techniques were also discussed. Finally, I gave a brief overview of the other main methods used during the creation process.

Chapter 5

Presenting Experiments

5.1 Introduction

This chapter discusses the presentation of experiments in the system, the extension of which is a prime contribution of this dissertation research. The chapter shows the view which subjects see when participating in an experiment. Finally, I show how this part of the system is implemented by outlining the main methods used in it.

5.2 Viewing an Experiment

There are two options when viewing an experiment. Firstly, the experimenter can view a run–through of an experiment without recording answers. This facility is useful while constructing and editing the materials for an experiment, e.g. when an experimenter decides that a slide is no longer needed and must be deleted.1 The experimenter accesses this facility by choosing the experiment and clicking on the View Show button. Secondly, subjects can view and participate in experiments. They are given a link to the experiment, load it into their web browser and then participate while their results are recorded. When a subject participates in an experiment, their results are recorded in a separate file. Result files are identified by their .res extension. Each .res file name takes the form experimenter's name + thread number + timestamp + .res. The timestamp is used to ensure that each .res file name is unique. As noted in Hourihane (2002), the timestamp counts time since January 1, 1970, 00:00:00 GMT; it is measured in milliseconds, as the 13-digit example that follows shows. An example of a .res file name would be maria1059661341880.res, where maria is the name of the experimenter. Figure 5.1 shows a subject logging into an experiment; the user name and password are allocated by the experimenter.

1Due to time constraints on this thesis, full editing facilities are not currently available. At present, if the edit option is chosen by the experimenter, the experiment is shown on the screen, but pressing buttons has no effect. However, the changes that should be made to the vector (described in Chapter 4) during the editing process are in place, e.g. if questions are added to a slide, elements are added to the vector, and if questions are removed from a slide, the corresponding elements are removed from the vector. Thus all that is required to make the editing option work correctly is to add the layout aspects, in other words to make the text boxes appear when the text button is pressed, and so on. This is the same as what is used in this method to present experiments, and so should not require much work. I have changed any label differences that existed in this method as a result of the changes to the labels discussed later in this chapter.

Figure 5.2 then shows what the subject sees when participating in the experiment designed in Chapter 4. The system allows substantial amounts of text as items and relies on users to employ the scroll–bars to see it all. In Figure 5.2, the top two boxes display fixed text created by the experimenter. The third box is an open text area for participant replies. The fourth box shows the text of a multiple choice question, with the answers given beneath. It is not possible to move on to the next slide without entering some text for each open text question and selecting a response to each multiple–choice item.
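The rule that a subject cannot move on until every item has been answered reduces to a simple check. SlideValidator is a hypothetical name; treating whitespace-only input as an empty answer, and using -1 for "no selection", are assumptions of this sketch.

```java
import java.util.List;

// Hypothetical sketch of the rule that a subject may only move on when every
// open-text question has some text and every multiple-choice item has a selection.
class SlideValidator {
    /**
     * @param openAnswers the text typed into each open question on the slide
     * @param selections  the chosen answer index for each multiple-choice item, or -1 if none
     */
    static boolean canAdvance(List<String> openAnswers, int[] selections) {
        for (String answer : openAnswers) {
            if (answer == null || answer.trim().isEmpty()) {
                return false; // whitespace-only input counts as no answer (assumption)
            }
        }
        for (int choice : selections) {
            if (choice < 0) {
                return false; // a multiple-choice item is still unanswered
            }
        }
        return true;
    }
}
```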

When the subject has finished participating in the experiment, the results are recorded and saved. Result files also timestamp the responses.
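The result-file naming scheme described above can be sketched as follows. ResultFileName is hypothetical, and the thread number is simply passed in, since the text does not say how it is obtained. System.currentTimeMillis() returns milliseconds since 1 January 1970 UTC; read that way, the 13-digit timestamp in maria1059661341880.res corresponds to 31 July 2003.

```java
// Hypothetical sketch of the naming scheme described above:
// experimenter's name + thread number + millisecond timestamp + ".res".
class ResultFileName {

    static String name(String experimenter, int threadNumber, long millisSinceEpoch) {
        return experimenter + threadNumber + millisSinceEpoch + ".res";
    }

    /** Uses the current time; System.currentTimeMillis() counts milliseconds since 1 January 1970 UTC. */
    static String nameNow(String experimenter, int threadNumber) {
        return name(experimenter, threadNumber, System.currentTimeMillis());
    }
}
```

Because the parts are concatenated without separators, a name could in principle be split in more than one way (thread 1 followed by a timestamp, or thread 11 followed by a shorter one); a separator character between the fields would remove this ambiguity.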


Figure 5.1: Screen shot of subject entering username and password for an experiment


Figure 5.2: Screen shot of subject participating in an experiment


5.3 Implementation

This class2 has three main functions:

1. Parse the experiment files

2. Output the experiment to the subject/experimenter

3. Record the results of the user input

The first two functions are implemented together, as the information is read, parsed and then output to the subject/experimenter.

Although the experiment has been created and information regarding the number of slides, randomisation etc. recorded, it is in this class that this information is actually used; it is here, for example, that randomisation is implemented.

There are three methods to parse the files and output the experiment to the subject.

5.3.1 Parsing the comment slide

This method reads the comment slide. The comment slide holds information about:

• the pre–experiment comment

• whether the slides are in sections or not

• the number of slides in the experiment

• whether each slide appears in a random position

• whether each slide marks the end of a section or not

An example of the comment slide containing such information is seen in Figure 5.3. It shows the LaTeX realisation (top half of the figure) of the comment slide (bottom half of the figure).

Firstly, using the indexOf method to read between labels, the system parses the experiment comment, which usually contains instructions to subjects. This is stored in a variable which is read when it is time to display the comment slide to the subject.

Then the number of slides in the experiment is parsed, converted to an integer and stored in a variable. This variable is then used to create various arrays whose size depends on the number of slides in the experiment. In other words, each array is created and initialised with the size held in the variable numberOfSlides, so its size differs from experiment to experiment, depending on the number of slides in that particular experiment.

2This system is implemented in Java, which has basic notions, such as class, object and method, that no one in the set consisting of my supervisor has yet understood. People like him should think of classes as being to objects what cookie cutters are to cookies: the former can make many of the latter. In other words, an object is an instance of a class. While the system is implemented in Java because of the web portability afforded by the Java Virtual Machines provided by web-enabled browsers, the data structures and algorithms are described at a level that would abet reimplementation of components in other languages.


Experimenter: maria

Pre-Experiment Comment: please answer the questions that follow

Sectioned: false

Total Number of Slides 2

Slide no. 0 randompositiontrue

Slide no. 1 randompositiontrue

Slide no. 0 EndSectionfalse

Slide no. 1 EndSectionfalse

\begin{experiment}{maria}
\preexperimentcomment{please answer the questions that follow}
\Sections{false}
\TotalNumSlides{2}
\Slide{0}randomposition{true}
\Slide{1}randomposition{true}
\Slide{0}EndSection{false}
\Slide{1}EndSection{false}
\end{experiment}

Figure 5.3: Example of a Comment Slide
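Reading the comment slide of Figure 5.3 with indexOf, and then sizing the per-slide arrays from the parsed slide count, might look like this. CommentSlide is a hypothetical class; the label names are taken from the figure, and the sketch assumes the comment itself contains no closing brace.

```java
// Hypothetical sketch of parsing the comment slide of Figure 5.3 with indexOf.
// Label names are taken from the figure; the class and field names are invented.
class CommentSlide {
    String comment;
    boolean sectioned;
    int numberOfSlides;
    boolean[] randomPosition;
    boolean[] endSection;

    /** Text between the first occurrence of start and the next occurrence of end. */
    static String between(String file, String start, String end) {
        int from = file.indexOf(start);
        if (from < 0) {
            throw new IllegalArgumentException("missing label " + start);
        }
        from += start.length();
        int to = file.indexOf(end, from);
        if (to < 0) {
            throw new IllegalArgumentException("unterminated label " + start);
        }
        return file.substring(from, to);
    }

    static CommentSlide parse(String file) {
        CommentSlide c = new CommentSlide();
        c.comment = between(file, "\\preexperimentcomment{", "}"); // assumes no '}' in the comment
        c.sectioned = Boolean.parseBoolean(between(file, "\\Sections{", "}"));
        c.numberOfSlides = Integer.parseInt(between(file, "\\TotalNumSlides{", "}"));
        // The array sizes are only known at run time, once the slide count has been read.
        c.randomPosition = new boolean[c.numberOfSlides];
        c.endSection = new boolean[c.numberOfSlides];
        return c;
    }
}
```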

As noted in Chapter 4, slides can be grouped in sections (which is not the same as each slide being divided into sections). If this option is chosen, the appropriate variable is set to true; otherwise it stores false. This variable is again found using the indexOf method. A loop is then used to parse each slide's information on whether or not the slide marks the end of a section. This method is shown in Figure 5.4.

Next, the system needs to know whether each slide is to be randomised. A method similar to the one outlined in Figure 5.4 is used: each slide's entry is parsed to see whether it says random or not, and the variable is set to true or false accordingly. It again takes into account the varying number of digits in the slide number (single digits, double digits, etc.).


for each slide
    if it is the last slide
        this is clearly the end of a section, so the variable should hold true
    else if there are more than 9 slides, i.e. into double digits
        use the indexOf method to read between labels and store true or false in the variable
    else if there are fewer than 10 slides, i.e. single digits
        do as above, except that indexOf will read one digit fewer

Figure 5.4: Determining whether each slide marks the end of a section or not
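The flag parsing in Figure 5.4 can be sketched without any special handling for single- versus double-digit slide numbers. This is a minimal sketch that assumes the markup format shown in Figure 5.3 (for example `\Slide{1}EndSection{false}`). The class and method names are hypothetical. The idea is to build the complete label for each slide, including the braces around the slide number, and then read up to the closing brace with indexOf. With this approach `\Slide{1}` can never match `\Slide{10}`. The same helper serves for both the EndSection and randomposition flags.

```java
import java.util.Arrays;

// Hypothetical helper: reads per-slide boolean flags from the experiment
// header markup of Figure 5.3 (e.g. "\Slide{1}EndSection{false}").
class SlideFlagReader {

    /** Returns the value of flagName for slides 0..numberOfSlides-1. */
    static boolean[] readFlags(String header, String flagName, int numberOfSlides) {
        boolean[] flags = new boolean[numberOfSlides];
        for (int n = 0; n < numberOfSlides; n++) {
            // The closing brace after n means "\Slide{1}" never matches "\Slide{10}".
            String label = "\\Slide{" + n + "}" + flagName + "{";
            int start = header.indexOf(label);
            if (start < 0) {
                continue; // flag absent for this slide: leave it false
            }
            start += label.length();
            int end = header.indexOf('}', start);
            flags[n] = Boolean.parseBoolean(header.substring(start, end));
        }
        // As in Figure 5.4, the last slide always ends a section.
        if (flagName.equals("EndSection") && numberOfSlides > 0) {
            flags[numberOfSlides - 1] = true;
        }
        return flags;
    }

    public static void main(String[] args) {
        String header = "\\TotalNumSlides{2}\\Slide{0}randomposition{true}"
                + "\\Slide{1}randomposition{true}\\Slide{0}EndSection{false}"
                + "\\Slide{1}EndSection{false}";
        System.out.println(Arrays.toString(readFlags(header, "randomposition", 2)));
        System.out.println(Arrays.toString(readFlags(header, "EndSection", 2)));
    }
}
```

On the Figure 5.3 header, this sketch reports both slides as randomised and marks only the last slide as the end of a section.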


5.3.2 Parsing the final slide

This method simply reads the post-experiment comment. This comment is designed to enable the experimenter to thank the subject for participating in the experiment or to provide feedback on the motives behind the experiment. The method outputs this comment, recorded by the experimenter, to the subject when they reach the appropriate point in the experiment.

5.3.3 Parsing the main file

The main file holds all information regarding the content of each slide, e.g. whether text and questions are randomised, the actual text, simple questions and multiple choice questions. Labels are used in files to identify the type of information which will follow; e.g. simplequestion is a label which shows that a simple question will follow.

This is the method in this class which has had the most changes made to it. As noted in Chapter 4, new labels were added to the file to make parsing easier and to avoid errors. This meant that much of this method in the presentation class was changed to incorporate the new labels employed in the creation of the experiment.

In other words, when reading a piece of text, a simple question or a multiple choice question, the methods now read between different labels than in the program I inherited. This ensures that the correct items are read on each iteration. The file is now parsed correctly whatever options the experimenter chooses.

Previously, for example, some text would only be parsed correctly if a simple question followed the text, because the system was programmed to parse a piece of text by reading between a text label and a simple question label. Clearly this caused errors when the experimenter had not chosen a simple question, as then a simple question label would not follow the text. If the experimenter had chosen a multiple choice question, a completely different type of label would follow the text label, which would cause errors when the system attempted to parse the piece of text.
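The difference between the old and new parsing approaches can be illustrated with a small helper that reads an item between its own start and end labels. This is a sketch only. The labels `\text`, `\endtext` and `\multichoice` are hypothetical stand-ins, not the system's actual label names. Reading from `start + startLabel.length()` ensures that no characters of the item itself are skipped. A missing label yields null rather than an out-of-bounds error.

```java
// Hypothetical sketch: read an item between its own start and end labels.
class LabelReader {

    /**
     * Returns the trimmed text between startLabel and the next endLabel,
     * searching from index 'from', or null if either label is missing.
     */
    static String readBetween(String src, String startLabel, String endLabel, int from) {
        int start = src.indexOf(startLabel, from);
        if (start < 0) {
            return null;
        }
        start += startLabel.length(); // first character after the label
        int end = src.indexOf(endLabel, start);
        if (end < 0) {
            return null;
        }
        return src.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // The text is followed by a multiple choice question, not a simple question.
        String slide = "\\text Please answer the questions that follow \\endtext"
                + "\\multichoice What is your favourite colour? \\endmultichoice";
        // New style: read up to the text's own end label.
        System.out.println(readBetween(slide, "\\text", "\\endtext", 0));
        // Old style: read up to a simple question label, which is absent here.
        System.out.println(readBetween(slide, "\\text", "\\simplequestion", 0));
    }
}
```

The first call recovers the sentence. The second returns null, which mirrors the failure described above.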

As noted in Chapter 4, there are many options available to the experimenter when creating an experiment. It is possible to have:

• text – sentences (e.g. Please answer the questions that follow)

• simple questions – open-ended questions (e.g. What is your name?)

• multiple choice questions – questions which give answer choices (e.g. What is your favourite colour: red, yellow, blue or green?)

• graphics – a picture/image (this is presently not quite functional)

Text

The method to randomise the text (sentences) already existed; however, a slight modification was needed to ensure that it worked correctly. The original method correctly parsed whether the sentences should be randomised, and it also read each sentence and stored it in an array for output to the subject. The method to randomise these sentences was also in place. However, it was only ever called for one sentence, because there was no loop to make it randomise each sentence. For the first sentence it would generate the random number and store the sentence in its new position in the new array. It did not complete this process for the remaining sentences, and so the text could not be randomised. A loop was therefore put in place to ensure that this happened for each sentence. Finally, an additional if statement was added at the end of this method to tell the system to do nothing if, according to the value of the appropriate variable, the sentences did not have to be randomised.
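The repaired sentence randomisation can be sketched as a loop that places every sentence, not just the first, at a randomly chosen free position of a new array. The sketch also includes the "do nothing" branch for slides whose text is not randomised. The class and method names are hypothetical, and the retry-until-free strategy is an assumption based on the description above.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the repaired text-randomising step.
class SentenceRandomiser {

    /**
     * Returns the sentences unchanged if randomise is false; otherwise places
     * each sentence at a random free position of a new array.
     */
    static String[] arrange(String[] sentences, boolean randomise, Random rng) {
        if (!randomise) {
            return sentences; // the "do nothing" branch
        }
        String[] placed = new String[sentences.length];
        for (String sentence : sentences) {        // the loop that was missing
            int pos;
            do {
                pos = rng.nextInt(placed.length);  // retry until a free slot is found
            } while (placed[pos] != null);
            placed[pos] = sentence;
        }
        return placed;
    }

    public static void main(String[] args) {
        String[] text = {"First sentence.", "Second sentence.", "Third sentence."};
        System.out.println(Arrays.toString(arrange(text, true, new Random())));
    }
}
```

`java.util.Collections.shuffle` would produce an equivalent uniform ordering with less code. The explicit loop is shown here because it mirrors the method described in the text.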

Simple Questions

The program of Hourihane (2002) simply read in the simple questions and stored them in an array. There was no facility for randomising the simple questions, so this method was overhauled to incorporate it.

First, the system must ascertain whether the simple questions are to be randomised, and this is stored in a variable. The number of simple questions must also be parsed and stored in a second variable, variable2. If the randomisation variable is set to true (indicating that the simple questions on this slide should be randomised), four arrays are created: A1, A2, A3 and A4. Their size is governed by the number of simple questions on the slide, which can be any number. A4 holds the actual simple questions in the order in which the experimenter entered them.

A random number, say Z, is then generated. The Zth element of A1 is set to 1 to show that this question is about to be displayed and should not be displayed again. If a random number generated later picks an element of A1 that already contains a 1, that question has already been displayed, so another random number is generated. Next, the ith position of A2 (i is set to 0 at the beginning of this method) is set to hold the Zth element of A4, i.e. the question itself. The variable i is incremented, and the process is repeated for each simple question.

When this method terminates, every element of A1 is 1, indicating that each question has been displayed. A2 holds all of the simple questions in a random order, and the questions are presented to the participant in this order. To keep track of the original numbers of these questions for later analysis of the results, A3 stores the original number of each question. When the questions and the subjects' answers are recorded in the result files, it is this original number that is recorded with the question. As a result, each question has the same original number beside it in every participant's result file (rather than the number relating to its presentation), which makes the analysis straightforward. Multiple choice questions are parsed and randomised in the same way.
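The four-array scheme can be sketched as follows. This is a minimal sketch, not the system's code. The field names a2 and a3 simply follow the A-numbering in the text. Question numbers are 0-based here, which is an assumption of the sketch.

```java
import java.util.Random;

// Hypothetical sketch of the A1-A4 randomisation of simple questions.
class QuestionRandomiser {
    final String[] a2;  // A2: questions in presentation order
    final int[] a3;     // A3: original number of each presented question

    QuestionRandomiser(String[] a4, Random rng) {   // A4: questions as entered
        int n = a4.length;
        int[] a1 = new int[n];  // A1: set to 1 once a question has been placed
        a2 = new String[n];
        a3 = new int[n];
        for (int i = 0; i < n; i++) {
            int z;
            do {
                z = rng.nextInt(n);   // random number Z
            } while (a1[z] == 1);     // already displayed: draw again
            a1[z] = 1;
            a2[i] = a4[z];            // the question itself
            a3[i] = z;                // its original number, kept for analysis
        }
    }

    public static void main(String[] args) {
        String[] a4 = {"What is your name?", "How old are you?", "Where are you from?"};
        QuestionRandomiser r = new QuestionRandomiser(a4, new Random());
        for (int i = 0; i < r.a2.length; i++) {
            System.out.println("presented " + i + ", original " + r.a3[i] + ": " + r.a2[i]);
        }
    }
}
```

The invariant is that a4[a3[i]] equals a2[i] for every i. This is what allows answers to be filed under the original question number, whatever order the subject saw.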


5.3.4 Recording the subjects' answers

This is the class where the .res file is written and saved. The .res extension is a name arbitrarily chosen for this system (and inherited). The contents are just the responses, in text format, with LaTeX-style markup. As the subjects' answers are read, they are parsed and saved into a .res file. As noted above, questions and answers are recorded with their original number, not the number associated with the order of their presentation. This ensures that questions are stored in the correct positions of the arrays and that the analysis of results is straightforward and error-free.
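Recording answers under their original numbers can be sketched by pairing each presented answer with its entry in A3. This sketch assumes the `\SimpleAnswer<n>{answer}` style and the closing `\SimpleAnswer{ending}` label visible in Figure 6.3. The class and method names are hypothetical, and the exact .res layout written by the system may differ.

```java
// Hypothetical sketch: write simple answers under their original question numbers.
class ResWriter {

    /**
     * answers[i] is the subject's answer to the i-th question as presented;
     * originalNumbers[i] (array A3) is that question's original number.
     */
    static String simpleAnswers(String[] answers, int[] originalNumbers) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < answers.length; i++) {
            sb.append("\\SimpleAnswer").append(originalNumbers[i])
              .append('{').append(answers[i]).append("}\n");
        }
        sb.append("\\SimpleAnswer{ending}\n"); // marks the end of the simple answers
        return sb.toString();
    }

    public static void main(String[] args) {
        // The subject saw original questions 2, 0 and 1, in that order.
        System.out.print(simpleAnswers(new String[] {"Joe Bloggs", "100", "Ireland"},
                                       new int[] {2, 0, 1}));
    }
}
```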

Other methods in this class include one to generate the random numbers used to randomise the slides, text and questions, a method to add different types of layout, and methods to send information to the server.

5.4 Conclusion

This chapter has covered the experiment presentation part of the system. First, the options available for viewing an experiment were shown. I then discussed in detail the implementation involved in presenting experiments, including pseudocode and descriptions of the main methods involved in the process, and explained how I implemented methods to randomise the simple and multiple choice questions. The next chapter demonstrates how the results of experiments are analyzed using the system.

Chapter 6

Analyzing Experiments

6.1 Introduction

This chapter discusses the analysis side of the program. It explores the analysis tool from the experimenter's point of view, and some screen shots are included to give a step-by-step view of analysing the results. It also describes how this aspect of the system is implemented, discusses the improvements which have been made to the original system, and suggests possibilities for future improvements.

6.2 Analyze Results

The aim of this class is to provide the experimenter with a means of analyzing the results of experiments (Chapter 4 gives an overview of the levels of functionality available to the three user categories). The experimenter does not need to look at the individual result (.res) files (see §5.2 for an explanation of .res files), because this class allows the experimenter to view the results on a webpage. Many different options are available to the experimenter. First, the experimenter chooses the experiment whose results they wish to analyze; they may analyze only their own experimental data. The experimenter is then told how many subjects have participated in the experiment (see Figure 6.1).

The experimenter then reaches the options for analyzing the results. As can be seen from Figure 6.2, many options are available, and the experimenter simply clicks the button for the one he/she would like to view.


Figure 6.1: Screen shot of how many have participated in an experiment


Figure 6.2: Screen shot of the options available to the experimenter


6.3 Implementation

When I inherited the program (McGowan, 1999; Ryan, 2001; Hourihane, 2002), the facility for analysing the results existed; however, there were some problems with it.

6.4 Opening the result files

When a subject participates in an experiment, the results are stored in a unique .res file. The name of each .res file is stored in a subject file, which holds the names of all of the result files for a particular experiment. Figure 6.3 shows an example of a result file and Figure 6.4 shows an example of a subject file. The subject files are stored in the experiment directory, which is itself a subdirectory of the experimenter's directory.

\experimenter{paper}
\begin{experiment}{Wed May 14 14:10:33 GMT+00:00 2003}

\begin{slide}{0}
\response time{Wed May 14 14:17:25 GMT+00:00 2003}
\SimpleQ{0}{This is a simple question?}
\SimpleAnswer0{This is the answer to the simple question}
\MultiQ{0}{This is a multiple choice question}
\final user response{0}{Answer 1}
\SimpleAnswer{ending}
\final user responseX{ending}
\end{slide}{0}

\end experiment time{ Wed May 14 14:17:25 GMT+00:00 2003}
\SimpleAnswer{ending}
\final user responseX{ending}

Figure 6.3: Example Participant Data File

expname11060005035518.res
expname11060084535587.res
expname11060119756473.res

Figure 6.4: Example Subject File

In the original Analyze–Results class, when the experimenter typed in the name of the experiment they wished to analyze, the system would correctly locate the directory of the experiment and also the correct subject file, which holds the names of all of the result files. However, there were problems when it sent the names of the .res files to the server. I explain these problems next.


In the old system, the method to open the .res files was incorrect. When the system attempted to open the .res files, it would not parse far enough into the file names to obtain the full file name to open. On the first attempt it would try to open a .re file, on the second attempt a .r file, and so on. Clearly this meant that the result files did not open, since they are in fact .res files. As a result there were many out-of-bounds and Bad File number errors. For the files to open correctly, the full name needs to be parsed, including the full extension, i.e. .res.

Through a process of trial and error, it emerged that a method was needed which parsed the result files differently depending on the number of result files in the experiment. A variable, named variable in the code, stores the offset needed to parse the file names correctly.

• If there is one result file, then variable = 2

• If there are two result files, then variable = variable

• If there are three result files, then variable = variable - 1

However, if there are more than three result files, then variable = variable + 1. This value is used in the formula for locating the next file name, file = file + filelength + variable, where filelength is size (the length of the string) + 18 (thread number + timestamp). (This used to be size + 17 in the original program.) This successfully opens all of the result files on each iteration.
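Because every name in the subject file ends in .res (Figure 6.4), an alternative is to split the subject file on that extension, which avoids per-count offset arithmetic altogether. This is offered only as an alternative sketch, not the method used in the system. It assumes that no experiment name itself contains the string ".res". The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Alternative sketch (not the system's method): split a subject file into
// result file names by locating each ".res" extension.
class SubjectFileSplitter {

    static List<String> resultFileNames(String subjectFile) {
        List<String> names = new ArrayList<>();
        String ext = ".res";
        int from = 0;
        int end;
        while ((end = subjectFile.indexOf(ext, from)) >= 0) {
            // trim() also copes with names separated by newlines
            names.add(subjectFile.substring(from, end + ext.length()).trim());
            from = end + ext.length(); // the next name starts right after ".res"
        }
        return names;
    }

    public static void main(String[] args) {
        String subjectFile = "expname11060005035518.resexpname11060084535587.res"
                + "expname11060119756473.res";
        for (String name : resultFileNames(subjectFile)) {
            System.out.println(name);
        }
    }
}
```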


6.5 Parsing the results file

6.5.1 Results by Subject in original program

In the original class there was a method to output the results by subject. However, it was in some ways inadequate. For each slide it did the following: it located the beginning and end of the slide in question and stored them in two variables. This has been carried over to the updated class, because it ensures that only questions on this slide are looked at (recall that a slide may contain text, i.e. sentences, simple questions, i.e. open-ended questions, and multiple choice questions). It then searched for the first SimpleAnswer and stored it in a two-dimensional array indexed by subject number and slide number. Because this array had only two dimensions, it could not distinguish between different questions on the same slide.

There was no loop of any kind, so the program only ever read the first simple question on any slide. There was no facility for randomising simple questions in the program at this stage. Had there been one, this method of analysing results would not have handled it properly, because randomisation could make the first question on each slide recorded in the .res files different each time.

The same applied to multiple choice questions: the program would only ever find the first multiple choice question and then stop parsing. Again, had there been loops in place and a randomisation facility, this tool would not have been able to deal with it. Sometimes even the first multiple choice question would not appear, because of problems accessing the answers to the multiple choice questions. Nothing was recorded with the answers to show which questions they corresponded to, so when the program tried to print out an answer it could not locate it.

There were also problems with the actual parsing of the multiple choice and simple questions. Questions were parsed by locating the appropriate label (e.g. SimpleAnswer) and telling the program to read from the end of one label to the next label and store the answer in a variable. However, this class used incorrect offsets from which to start reading, which meant that the first letters of an answer were often lost. For example, it would tell the system to skip too far past the SimpleAnswer label, perhaps as far as the second letter of the answer, and start reading from there. This is clearly inadequate, because the answer would then be missing its first two letters.

6.5.2 Results by subject in new program

This whole area was completely overhauled. Clearly the two-dimensional array (discussed in §6.5.1) was not adequate. A structure was needed that would distinguish between subjects, slides and questions. Because vectors grow dynamically, it was originally thought that this type of data structure should be used to store the results. The results were initially stored in a three-dimensional vector (number of subjects × number of results × number of answers). Storing the answers in this way was straightforward, but accessing them and outputting them when iterating by subject or slide was quite difficult. It was felt that this method of storage would not give as much flexibility, or as many options, as a three-dimensional array would. A three-dimensional array, [numSubjects][numResults][numAnswers], was therefore used, and this is the basis for how all of the results are output.

First, the result file is opened. This contains all of the subject's results for each slide; an example is shown in Figure 6.3. The beginning of a slide is marked by beginSlide and the end of a slide by end slide, and these are the first two things looked for when the file is parsed. A while loop iterates over the slides. It uses the indexOf method to find the beginning and end of the slide in question and stores them in two variables. This means that only questions within these boundaries will be parsed on this iteration.

Simple questions are parsed first. A simple question is marked by the label SimpleAnswer. However, as noted in Chapter 4, I changed the labelling system. This included a change to the formatting of the .res files, which now include a label SimpleAnswerending marking the end of the simple answers. This label is written as soon as there are no more simple answers, and the same happens for multiple choice questions. Previously there was no label to mark the end of the simple questions, and this caused problems when parsing the file.

In the current system, before parsing the simple question answers, the program locates the last simple question within the boundaries of this slide and stores its position in a variable. It then parses the simple questions until it reaches the SimpleAnswerending label, at which point it knows there are no more to be parsed. A loop that counts through the simple questions, i.e. while(counter<numberSimpleQuestions), could not have been used here, because the number of simple questions is not known in advance and varies from slide to slide. With the method used, it is irrelevant how many questions there are, since each is always parsed exactly once. The program locates each SimpleAnswer label, reads the answer that follows it, and stores the answer in the appropriate place in the three-dimensional array according to subject number, slide number and question number.

Recall that questions or items may be randomised; thus, it is not possible simply to add each simple question to the array in the order in which it is found. Simple questions are stored in the order in which the subject sees them, but with their original number recorded. This means that the experimenter's first question will always be numbered 1, and so on. For example, one subject's result file may look like this:

• SimpleAnswer3 Joe Bloggs

• SimpleAnswer1 100

• SimpleAnswer2 Ireland

However, another subject's result file may look like this:

• SimpleAnswer1 101

• SimpleAnswer3 Pat Dolan

• SimpleAnswer2 England

So, in this example slide, although the questions appeared in different orders, question 1 always concerns numbers, question 2 places and question 3 names, and so the answers will all be stored correctly.

In the previous program, questions were stored in the order in which they were parsed, so answers to different questions would be recorded in the wrong places. Also, the data structure being used was only a two-dimensional array and so did not account for all three important pieces of information: subject numbers, slide numbers and question numbers. Thus the data could not be properly manipulated or analyzed. Now the program keeps the original question number with the question. It reads in this number (e.g. x) before it reads the answer to the question, and the answer is then stored in [numSubject][numSlide][x].

This SimpleAnswer label then becomes the marker from which to start looking for the next simple question, which ensures that no question is read more than once.

There is also a very small method here which keeps track of the highest simple question number (i.e. how many simple questions there were), so that the program knows where to start adding the multiple choice questions. 1 is added to this variable to account for the fact that the multiple choice questions start at 0.
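The parse described in this section can be sketched for one slide as follows. The assumptions are these: the text passed in is one slide's portion of a .res file, answers use the `\SimpleAnswer<x>{answer}` form shown in Figure 6.3, and the array has been allocated large enough in advance. Question numbers are 0-based, as in Figure 6.3. The class and method names are hypothetical. The sketch reads each answer starting immediately after the label, files it under its original number x, and resumes searching from the end of that answer. It stops at the ending label and returns the highest number seen plus one, i.e. where multiple choice answers could start.

```java
import java.util.Arrays;

// Hypothetical sketch of the per-slide simple-answer parse.
class SimpleAnswerParser {

    static final String LABEL = "\\SimpleAnswer";
    static final String ENDING = "\\SimpleAnswer{ending}";

    static int parseSlide(String slideText, String[][][] results, int subject, int slide) {
        int stop = slideText.indexOf(ENDING);
        if (stop < 0) {
            stop = slideText.length();
        }
        int highest = -1;
        int pos = slideText.indexOf(LABEL);
        while (pos >= 0 && pos < stop) {          // the ending label itself is excluded
            int numStart = pos + LABEL.length();  // first character after the label
            int open = slideText.indexOf('{', numStart);
            int close = slideText.indexOf('}', open);
            int x = Integer.parseInt(slideText.substring(numStart, open).trim());
            results[subject][slide][x] = slideText.substring(open + 1, close);
            highest = Math.max(highest, x);       // track the highest question number
            pos = slideText.indexOf(LABEL, close); // this answer is the next start point
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        String[][][] results = new String[2][1][3];
        parseSlide("\\SimpleAnswer2{Joe Bloggs}\\SimpleAnswer0{100}"
                + "\\SimpleAnswer1{Ireland}\\SimpleAnswer{ending}", results, 0, 0);
        parseSlide("\\SimpleAnswer0{101}\\SimpleAnswer2{Pat Dolan}"
                + "\\SimpleAnswer1{England}\\SimpleAnswer{ending}", results, 1, 0);
        System.out.println(Arrays.toString(results[0][0]));
        System.out.println(Arrays.toString(results[1][0]));
    }
}
```

Although the two subjects saw the questions in different orders, both rows come out in the same canonical order: number, place, name.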

The multiple choice questions are parsed in the same way. The marker for the last multiple choice question is found, and the program parses until it reaches it. Once again, each multiple choice question parsed marks the point from which the next one is parsed.

When all the questions have been parsed, the program looks for the beginning of the next slide, starting from the previous end slide, and parses the questions on that slide in the same way.

The end result of the above method is a three-dimensional array with all of the results stored in it.

6.5.3 Results by Subject

The next methods of note are those which display the results to the experimenter. First, the experimenter can view the results by subject: the system outputs all of subject 1's results, followed by subject 2's results, and so on. The pseudocode for this is given in Figure 6.5. As can be seen in the figure, it first states the subject number and then the slide number. Each of the subject's answers to each question on this slide is printed to the screen. When it has shown all of the answers for that slide, it moves on to the next one. When each slide has been accessed, it moves on to the next subject.

An example of what the experimenter sees when this option is chosen is shown in Figure 6.6.


while counter1 is less than the number of result files (i.e. while there is a result file to be parsed)
    while counter2 is less than the number of slides (i.e. while there is a slide to be parsed)
        find the number of questions on this slide by looking in the questionsInSlideArray at the index corresponding to the variable counter2, and store this in the variable numQs
        while counter3 is less than numQs
            print the three-dimensional array [counter1][counter2][counter3]
            increment counter3
        increment counter2 and reset counter3 to 0
    increment counter1 and reset counter2 to 0

Figure 6.5: Results by Subject
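Figure 6.5 translates into three nested loops over the three-dimensional array. This sketch assumes the results are held in a `String[][][]` indexed [subject][slide][question], and it takes the per-slide question counts from an array standing in for questionsInSlideArray. The class name and output layout are hypothetical. Output is built in a string so that it can be sent to a webpage or printed.

```java
// Hypothetical sketch of Figure 6.5: subject, then slide, then question.
class ResultsBySubject {

    static String bySubject(String[][][] results, int[] questionsInSlideArray) {
        StringBuilder out = new StringBuilder();
        for (int subject = 0; subject < results.length; subject++) {              // counter1
            out.append("Subject ").append(subject).append('\n');
            for (int slide = 0; slide < questionsInSlideArray.length; slide++) {  // counter2
                out.append("  Slide ").append(slide).append('\n');
                int numQs = questionsInSlideArray[slide];
                for (int q = 0; q < numQs; q++) {                                 // counter3
                    out.append("    ").append(results[subject][slide][q]).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][][] results = {{{"100", "Ireland"}}, {{"101", "England"}}};
        System.out.print(bySubject(results, new int[] {2}));
    }
}
```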


Figure 6.6: Screen shot of results by subject


while counter1 is less than the number of slides (i.e. while there is a slide to be parsed)
    find the number of questions on this slide by looking in the questionsInSlideArray at the index corresponding to the variable counter1, and store this in the variable numQs
    while counter2 is less than the number of subjects (i.e. while there is a subject file to be parsed)
        while counter3 is less than numQs
            print the three-dimensional array [counter2][counter1][counter3]
            increment counter3
        increment counter2 and reset counter3 to 0
    increment counter1 and reset counter2 to 0

Figure 6.7: Results by Slide

6.5.4 Results by Slide

To print the results by slide, the algorithm in Figure 6.7 was used. It outputs all of the answers given by all subjects for slide 1, followed by all of the answers given by all subjects for slide 2, and so on.

An example of what the experimenter sees when this option is chosen is shown in Figure 6.8. Labels are printed to identify the start of each new slide, and all of the answers to that slide follow. A further facility could be added to print, alongside the answers, the questions to which they relate. This would be straightforward, since all of the information is stored in easily accessible arrays.

There was a method for outputting the results by slide in the original program. It attempted to output all of each slide's results using the two-dimensional array from the previous method. Naturally this ran into problems, since only the first simple question and the first multiple choice question from each subject had been recorded, as mentioned in Section 6.5.1.
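The by-slide traversal in Figure 6.7 only reorders the loops of the by-subject version, so that the slide loop is outermost. This sketch uses the same assumptions as the by-subject case: a hypothetical `String[][][]` indexed [subject][slide][question], and an array standing in for questionsInSlideArray. The question count is looked up by slide index, since the number of questions is a property of the slide, not of the subject.

```java
// Hypothetical sketch of Figure 6.7: slide, then subject, then question.
class ResultsBySlide {

    static String bySlide(String[][][] results, int[] questionsInSlideArray) {
        StringBuilder out = new StringBuilder();
        for (int slide = 0; slide < questionsInSlideArray.length; slide++) {  // counter1
            out.append("Slide ").append(slide).append('\n');
            int numQs = questionsInSlideArray[slide];  // indexed by slide
            for (int subject = 0; subject < results.length; subject++) {      // counter2
                for (int q = 0; q < numQs; q++) {                             // counter3
                    out.append("  ").append(results[subject][slide][q]).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][][] results = {{{"100"}, {"Ireland", "Joe Bloggs"}},
                                {{"101"}, {"England", "Pat Dolan"}}};
        System.out.print(bySlide(results, new int[] {1, 2}));
    }
}
```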


Figure 6.8: Screen shot of results by slide


while counter1 is less than the number of slides (i.e. while there is a slide to be parsed)
    find the number of questions on this slide by looking in the questionsInSlideArray at the index corresponding to the variable counter1, and store this in the variable numQs
    while counter2 is less than numQs
        while counter3 is less than the number of result files
            print the three-dimensional array [counter3][counter1][counter2]
            increment counter3
        increment counter2 and reset counter3 to 0
    increment counter1 and reset counter2 to 0

Figure 6.9: Results by Question

6.5.5 Results by Question

To print the results by question, that is, all of question 1's answers followed by all of question 2's answers and so on, the algorithm in Figure 6.9 is used.

An example of what the experimenter sees when this option is chosen is shown in Figure 6.10.

This is a useful facility, because during analysis the experimenter may wish to view all of the answers to a particular question rather than all of a subject's or a slide's results.
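The by-question traversal in Figure 6.9 makes the subject loop innermost, so that each question's answers from all subjects are printed together. As in the other output sketches, the `String[][][]` layout [subject][slide][question] and the array standing in for questionsInSlideArray are assumptions, and the class name is hypothetical.

```java
// Hypothetical sketch of Figure 6.9: slide, then question, then subject.
class ResultsByQuestion {

    static String byQuestion(String[][][] results, int[] questionsInSlideArray) {
        StringBuilder out = new StringBuilder();
        for (int slide = 0; slide < questionsInSlideArray.length; slide++) {   // counter1
            int numQs = questionsInSlideArray[slide];
            for (int q = 0; q < numQs; q++) {                                  // counter2
                out.append("Slide ").append(slide)
                   .append(", question ").append(q).append('\n');
                for (int subject = 0; subject < results.length; subject++) {   // counter3
                    out.append("  ").append(results[subject][slide][q]).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][][] results = {{{"100", "Ireland"}}, {{"101", "England"}}};
        System.out.print(byQuestion(results, new int[] {2}));
    }
}
```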


Figure 6.10: Screen shot of results by question


6.5.6 How often a particular answer occurred

Sometimes it may be useful for the experimenter to see how often a particular answer occurred for a particular question. This method allows them to input the slide number, the question number and the answer they wish to check for. The program first locates the appropriate slide number and then the question number. It then parses that section of the array for each subject, comparing the answer given by the subject with the answer entered (using the compareTo method). Each time the answer is found, a counter is increased. When all sections of the array have been accessed, the result is output to the experimenter.
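The frequency option reduces to a single loop over subjects at a fixed [slide][question] position. This sketch assumes the same hypothetical `String[][][]` layout as the output sketches. It treats a null entry as "no answer recorded", which is an assumption. String.compareTo returns 0 exactly when the two strings are equal, so the test below is equivalent to equals.

```java
// Hypothetical sketch of the answer-frequency option.
class AnswerFrequency {

    /** Counts how many subjects gave 'answer' to the given question on the given slide. */
    static int count(String[][][] results, int slide, int question, String answer) {
        int counter = 0;
        for (String[][] subject : results) {
            String given = subject[slide][question];
            if (given != null && given.compareTo(answer) == 0) {
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String[][][] results = {{{"red"}}, {{"blue"}}, {{"red"}}};
        System.out.println(count(results, 0, 0, "red"));
    }
}
```

Exact matching is strict. Comparing `given.trim()` with equalsIgnoreCase would be a more forgiving option for free-text answers, if that suits the analysis.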

6.6 Conclusion

This chapter has described the data structures and algorithms used for analysis. Complexity analysis is not undertaken here, since complexity will not become significant within the system until more sophisticated data analysis modules are added. Nonetheless, the system presented improves on the inherited system: data is retrieved from individual files into a canonically ordered 3-d array, which may subsequently be processed in any number of ways. The inherited data analysis feature amounted to displaying individual files. To some extent the current one does the same; the difference is that the files are now parsed into a data structure which can be further processed as well as displayed.

Chapter 7

Research into Quantifiers

7.1 Introduction

This chapter provides an overview of some of the literature on quantifiers, frequency expressions and probability expressions. It is divided into three main sections: mapping quantifiers onto scales (see §7.3), how context influences the way in which quantifiers are interpreted (see §7.4) and, finally, how quantifiers lead to different focus patterns (see §7.5). Experiments which have been carried out in each of these areas are discussed and critiqued, and in the course of the discussion some changes to these experiments are suggested.

7.2 Quantifiers and Frequency Expressions – Introduction

Quantifiers, frequency expressions and probability expressions are an integral part of natural language. They can be found, for example, in scientific textbooks, the Constitution and everyday conversation. However, these expressions are vague and imprecise. Even quantifiers with seemingly universal force are at least subject to implicit restriction:

(7.1) It always rains in Ireland.

The sentence in (7.1) may well be accepted as true by natives of Ireland, even though it is clearly not true in the most literal sense of universal quantification over events or times, even when restricted to particular regions of Ireland.

Barwise and Cooper (1981) classify quantifiers as monotone increasing or monotone decreasing. A quantifier is monotone increasing if a true quantified statement remains true when its predicate is made less restrictive. For example, Many fans went to the football match early entails Many fans went to the football match. A quantifier is monotone decreasing if a true quantified statement remains true when its predicate is made more restrictive: if Few fans went to the match, then few fans went to the match early. Quantifiers are classified as non-monotone when neither of the above holds. Examples are:

• monotone increasing – many, several, at least two

• monotone decreasing – no, few, at most two

• non–monotone – exactly two
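The classification can be checked mechanically for quantifiers whose truth depends only on counts. This sketch assumes that "Q A are B" is decided by n = |A| and k = |A ∩ B|. That holds for at least two, at most two and exactly two. The vague many and few would need a numeric threshold, which is precisely what the scale studies below try to establish. Making the predicate less restrictive can only raise k, and making it more restrictive can only lower it. The sketch therefore tests whether truth survives k → k + 1 (increasing) or k → k − 1 (decreasing). It only checks models up to a bound, so it gives evidence rather than proof. All names are hypothetical.

```java
import java.util.function.BiPredicate;

// Hypothetical sketch: check monotonicity of count-based quantifiers on small models.
class Monotonicity {

    // test(n, k): is "Q A are B" true when |A| = n and |A ∩ B| = k?
    static final BiPredicate<Integer, Integer> AT_LEAST_TWO = (n, k) -> k >= 2;
    static final BiPredicate<Integer, Integer> AT_MOST_TWO = (n, k) -> k <= 2;
    static final BiPredicate<Integer, Integer> EXACTLY_TWO = (n, k) -> k == 2;

    /** Increasing: a less restrictive predicate (k to k + 1) never makes a truth false. */
    static boolean increasing(BiPredicate<Integer, Integer> q, int maxN) {
        for (int n = 0; n <= maxN; n++)
            for (int k = 0; k < n; k++)
                if (q.test(n, k) && !q.test(n, k + 1)) return false;
        return true;
    }

    /** Decreasing: a more restrictive predicate (k to k - 1) never makes a truth false. */
    static boolean decreasing(BiPredicate<Integer, Integer> q, int maxN) {
        for (int n = 0; n <= maxN; n++)
            for (int k = 1; k <= n; k++)
                if (q.test(n, k) && !q.test(n, k - 1)) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println("at least two: " + increasing(AT_LEAST_TWO, 10) + " " + decreasing(AT_LEAST_TWO, 10));
        System.out.println("at most two:  " + increasing(AT_MOST_TWO, 10) + " " + decreasing(AT_MOST_TWO, 10));
        System.out.println("exactly two:  " + increasing(EXACTLY_TWO, 10) + " " + decreasing(EXACTLY_TWO, 10));
    }
}
```

The three results reproduce the list above: at least two is increasing only, at most two is decreasing only, and exactly two is neither.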

Moxey, Sanford, and Dawydiak (2001) describe quantifiers as being downward monotone if they license negative polarity items (see footnote 1). Another feature of quantifiers is that they can be denials. This is when something that “has been presupposed ... is now being denied” (Moxey et al., 2001, p. 430). Denials are made during communication. An example of a denial is Not many people like cheese, do they? (Moxey et al., 2001, p. 431).

Research in the area of these expressions has focused on a few main areas. Many researchers have attempted to provide a scale for such quantity-denoting expressions, i.e. an ordering in which, for example, very often would be ranked higher than often. This area is discussed in §7.3. Another issue is whether the context in which quantifying expressions appear influences how they are interpreted, i.e. whether subjects base their judgement on that context (§7.4). Finally, the focus effects of quantifiers are dealt with in §7.5, which investigates which areas of focus quantifiers lead to. Barton and Sanford (1990, p. 84) describe focus effects as arising because some quantifiers “cause the attention of the reader to focus upon relations relating to the quantified assertion, while others do not”.

7.3 Scales

7.3.1 Introduction to Scales

Much work in the area of quantifiers has focused on trying to map quantifiers onto a scale. As mentioned in §7.2, the question is whether it is possible to have an ordering for quantifiers: whether, for example, many would come before or after most, or whether some of the time would come before or after usually.

There are four main types of scale: nominal, ordinal, interval and ratio.

• Nominal: This is a categorical type of scale. There is no ordering and no structure, and no numerical meaning is attached to the categories. Examples would be male/female or flat/hotel/hostel/house.

1 A negative polarity item is an expression which usually occurs in the scope of ordinary negation. An example is give a damn: Few people give a damn about that is acceptable, whereas Many people give a damn about that is not (Moxey et al., 2001).


• Ordinal: This type of scale measures differences in rank order. The categories form a rank order along a continuum. This type of scale allows one to determine whether one item is less than or more than another item.

• Interval: This type of scale measures differences in magnitude. The intervals are all the same size, i.e. equidistant. Interval scales have numeric properties: it is possible to add and subtract values. An example would be the Fahrenheit temperature scale. Many studies in this chapter have subjects assign values to quantifiers on interval scales.

• Ratio: This is similar to an interval scale, except that it has an absolute zero point. It therefore supports more numeric operations: multiplication and division as well as addition and subtraction.

In the studies reported in this chapter, subjects are asked to judge the quantifiers in varying ways. Some studies ask the subjects to assign values between 1 and 100, others ask for percentages, and others require subjects to indicate their judgement on a line. Having subjects assign a value between 1 and 100 would appear to be the same as having them assign a percentage, since in both cases they are essentially giving a value between 1 and 100. I have informally asked some people about this, and about whether they think their judgements would differ depending on how they were asked to make them; all responses have been negative. They view giving a number between 1 and 100 as x times out of 100, which they see as the same as giving a percentage. When a numerical value is wanted, it might be more worthwhile to put no limit on the values subjects can give and simply ask for the value they think appropriate for the quantifier. It is, of course, possible that some subjects would view giving a value between 1 and 100 differently from giving a percentage.

7.3.2 Research in the area of quantifiers and scales

A study by Hakel (1968) set out to produce a scale for frequency expressions. One hundred university subjects took part in the experiment, which was in the form of a questionnaire. Subjects were told that the aim of the experiment was to determine what the expressions meant to them. This experiment was a replication of an experiment carried out by Simpson (1944). However, Simpson (1944) had subjects assign percentages to the expressions, whereas in this experiment numerical values were assigned. Subjects had to give numerical values between 1 and 100 for 20 frequency expressions (see Figure 7.1).

The expressions (see §7.2 for further discussion of these types of expressions) appeared on their own on the questionnaire. There was no context involved, i.e. they did not appear in sentences or in combination with other phrases. A high correlation between the two sets of results would thus be expected, since there is very little difference in the methodology of the two experiments (§7.3.1 discusses this further).

All of the quantifiers appeared on the same sheet of paper. No information is provided about whether the quantifiers appeared in the same order for each subject or in a different order. However, it would appear that they were in the same order for each subject, as an example of the questionnaire is given. No mention is made of, for example, constructing an arbitrary ordering and its reverse and randomly assigning the two to participants, nor of other techniques for avoiding effects from the order of the items. This potentially makes the data obtained from the experiment noisy, as ordering effects are likely (§2.6 deals with ordering effects in experiments). For example, now and then appears after not often on the questionnaire. It is thus likely that subjects based the value they assigned to now and then on the value they had already assigned to not often. If a subject had assigned 20 to not often, this would probably lead to now and then receiving a much higher value than 20. However, if subjects had been required to give a value to now and then on its own, perhaps it would not obtain such a high value. The results showed that the median value for not often was 16, whereas the median value assigned to now and then was 34.

Figure 7.1: List of expressions used in Hakel’s experiment

almost never, always, about as often as not, frequently, generally, hardly ever, never, not often, now and then, occasionally, often, once in a while, rarely, rather often, seldom, sometimes, usually, usually not, very seldom, very often
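For illustration, the counterbalancing technique mentioned above (an arbitrary ordering and its reverse, randomly assigned to participants) can be sketched in Python. This is a minimal sketch, not Hakel’s procedure; the function names reversal_orders and assign_forms are hypothetical, and the base order is simply the order of the expressions in Figure 7.1.

```python
import random

# The 20 expressions from Figure 7.1, used here as an arbitrary base order.
EXPRESSIONS = [
    "almost never", "always", "about as often as not", "frequently",
    "generally", "hardly ever", "never", "not often", "now and then",
    "occasionally", "often", "once in a while", "rarely", "rather often",
    "seldom", "sometimes", "usually", "usually not", "very seldom",
    "very often",
]


def reversal_orders(items):
    """Return the base order and its reverse: the two counterbalanced forms."""
    return [list(items), list(reversed(items))]


def assign_forms(subject_ids, items, seed=None):
    """Randomly give each subject one of the two forms, keeping the two
    groups as equal in size as possible by shuffling a balanced label list."""
    rng = random.Random(seed)
    forms = reversal_orders(items)
    labels = [i % 2 for i in range(len(subject_ids))]  # 0, 1, 0, 1, ...
    rng.shuffle(labels)
    return {sid: forms[label] for sid, label in zip(subject_ids, labels)}
```

Reversal only balances position effects that are roughly linear across the list; in the reversed form not often and now and then are still neighbours, so a specific carry-over between adjacent items is not broken up. This is one reason per-subject randomisation (§2.6) is the stronger design.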

Despite this, the results of the experiment produced a scale with always having the highest median value and never having the lowest. There was considerable cross–subject variability in the individual results. However, Simpson (1944) obtained similar results, with a 99% correlation between the rank orders of the medians, which demonstrates that the orderings of the expressions obtained in the two experiments were very similar. Thus, it would appear that subjects, as a group, are quite stable when assigning values to expressions when there is no context involved. This correlation also suggests that the experiment is reliable (see §2.7 for a further discussion of reliability in experiments).
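The rank-order comparison behind such a correlation can be illustrated with a short sketch of Spearman’s rank correlation, i.e. Pearson’s correlation computed on tie-averaged ranks. The report does not state which rank-order coefficient was used, so Spearman’s rho is an assumption here, and the median values in the usage example are invented for illustration; they are not Hakel’s or Simpson’s data.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented medians for the same five expressions in two studies.
    study_a = [100, 80, 34, 16, 0]
    study_b = [98, 85, 40, 10, 0]
    print(spearman(study_a, study_b))  # identical rank orders, so rho is 1.0
```

Note that a rank correlation only compares orderings: two studies can agree perfectly on the order of the expressions while disagreeing considerably on the values assigned to them.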

In order to combat the problem of ordering effects, experimental designs should have the materials randomised for each subject (see §2.6). I have replicated this experiment by Hakel (1968) with a different experimental design and methodology. Replication of experiments is an important part of the scientific method, in that results which cannot be replicated are thereby shown to be unreliable, as are the conclusions drawn from them. In my replication, the order of the materials is randomised for each subject (see Buckley and Vogel (2003a) and also Chapter 3 for a description of a system which implements such randomisation in experiments). This design and the results obtained are discussed in Chapter 8.
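Per-subject randomisation can be sketched in a few lines of Python, assuming the materials are a list of item strings. This is an illustrative sketch only, not the system described in Chapter 3 and Buckley and Vogel (2003a); the function name and the study_seed string are hypothetical. Seeding the generator with the subject identifier means that each subject’s order can be regenerated later when the responses are analysed.

```python
import random


def order_for_subject(items, subject_id, study_seed="hakel-replication"):
    """Return an independently shuffled copy of items for one subject.

    The generator is seeded from the (hypothetical) study seed and the
    subject id, so the same subject always gets the same order, while
    different subjects get independent random orders.
    """
    rng = random.Random(f"{study_seed}:{subject_id}")
    order = list(items)  # copy, so the master list is left untouched
    rng.shuffle(order)
    return order
```

For example, order_for_subject(["always", "never", "often"], 7) returns a permutation of the three expressions that is identical every time subject 7 is processed.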

An extension of this experiment would be to have subjects assign a numerical value without restricting them to a specific range. It is possible that subjects would still only provide values between 1–100; however, it would be interesting to see whether they view some expressions as meaning more than 100.

The next study, by Bass and O’Connor (1974), deals with both scales and context. It is discussed here but is also relevant to the next section, §7.4.

Bass and O’Connor (1974) carried out a magnitude estimation experiment. Featherston (2002) describes magnitude estimation as a method in which subjects provide judgements using numerical values, with no restrictions on the judgement scale, and each judgement is relative either to one’s own previous judgements or to the judgement made of a reference item. There are clearly ordering effect issues inherent in this: subjects will base their judgements of quantifiers on judgements they have already given, which may be quite different from the judgements they would give if there were no reference item involved in the experiment. Bass and O’Connor (1974) had two aims: firstly, to determine reliable means and standard deviations of magnitude estimations, and secondly, to investigate whether these results depended on whether the issues presented in the experimental items were important or unimportant. The important issues involved air pollution and the Vietnam war. The unimportant issues were the amount of rainfall in Nepal² and the worms in the street after a rain storm. It was suggested that subjects would provide different judgements depending on whether the issue in question was important to them or not. Within the experiment, subjects rated these topics on a 5-point scale ranging from very important to very unimportant to them. If subjects rated a topic midway between the two end points (i.e. as neither important nor unimportant), their results were deleted from the experiment.

² Presumably the subjects were not from Nepal: students were used, and I assume these came from the University of Rochester and from a high school in the same area, as this is where the authors are located. Otherwise, views of importance might be distinctly different for this item.

One hundred and seventy-five subjects took part in this experiment. There were 39 frequency expressions and 44 expressions of amount. The frequency expressions were always, continually, constantly, frequently if not always, very often, a great deal of the time, very frequently, a great many times, usually, often, frequently, quite often, rather frequently, commonly, fairly often, fairly many times, sometimes, some of the time, to some degree, now and then, occasionally, once in a while, not often, not very often, fairly infrequently, infrequently, rather seldom, very seldom, rarely, very infrequently, seldom if ever, hardly at all, hardly ever, very rarely, almost never, seldom, none of the time, not at all, never. The expressions of amount were all, an exhaustive amount of, almost entirely, completely, an extraordinary amount of, almost completely, an extremely abundant amount of, an extreme amount of, a great amount of, a great deal of, very much, a full amount of, a lot of, much, quite a bit of, a good bit of, a considerable amount of, pretty much, fairly much, an ample amount of, an adequate amount of, a moderate amount of, some, to some extent, to some degree, somewhat, a limited amount of, a little, a small amount of, very little, a slight amount of, a meager amount of, a scanty amount of, a minimum amount of, a trifling amount of, scarcely any, a trivial amount of, an insignificant amount of, hardly any, none. The subjects first had to assign a number to sometimes and then, based on the value they gave to sometimes, assign values to the other frequency expressions. The values they gave had to be whole numbers greater than or equal to 0. When judging expressions of amount, subjects initially gave a value to some and then to the rest of the expressions. There were two different conditions.
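Because each subject’s numbers are anchored to their own value for sometimes (or some), raw estimates from different subjects are not directly comparable. One common way of handling this in magnitude estimation, offered here only as an illustrative sketch and not as Bass and O’Connor’s actual analysis, is to divide each estimate by the subject’s own reference value and then summarise the ratios across subjects. The median is used rather than the geometric mean because expressions such as never may legitimately receive 0; the function names are hypothetical.

```python
from statistics import median


def normalise_to_reference(responses, reference):
    """Divide each of one subject's estimates by that subject's estimate
    for the reference expression. Returns None if the reference value is
    missing or not positive, since the ratios are then undefined."""
    ref_value = responses.get(reference)
    if ref_value is None or ref_value <= 0:
        return None
    return {expr: value / ref_value for expr, value in responses.items()}


def pooled_medians(all_responses, reference):
    """Median normalised value per expression across usable subjects."""
    pooled = {}
    for responses in all_responses:
        ratios = normalise_to_reference(responses, reference)
        if ratios is None:
            continue  # skip subjects with an unusable reference value
        for expr, ratio in ratios.items():
            pooled.setdefault(expr, []).append(ratio)
    return {expr: median(ratios) for expr, ratios in pooled.items()}
```

With two subjects who give sometimes 10 and 50 and often 20 and 75, the normalised values for often are 2.0 and 1.5, so pooled_medians reports 1.75 for often and 1.0 for the reference expression itself.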

1. Condition 1:

• expressions of frequency with important topic

• expressions of amount with unimportant topic

2. Condition 2:

• expressions of amount with important topic

• expressions of frequency with unimportant topic

The aim was to see whether subjects assigned different or similar values to the expressions when they were embedded in the two different types of context.

The materials were presented in 5 different orders, which goes some way towards combating the ordering effects that can have an impact on results. However, I have designed a replication of this experiment which uses a thoroughly random ordering for each presentation of the materials, so that ordering effects are not systematic across subjects (see §2.6 for further discussion of ordering effects).

Bass and O’Connor (1974) found that, for both expressions of frequency and expressions of amount, there was no significant difference between the values which subjects assigned for important and unimportant issues. This means that the authors’ initial hypothesis was not supported: whether a topic is important or not does not appear to influence how subjects judge expressions. This is interesting because importance would seem to be one strong component of contextualised interpretation. However, the authors did produce five scales of the expressions, with between 4 and 9 points. The nine-point scale for expressions of frequency is shown in Figure 7.2 (Bass & O’Connor, 1974). The percentages in brackets show the percentage overlap between the points on the scale. It is clear from the scale below that there is a large amount of overlap between the expressions, even though there are only 9 expressions on the list. In the 4-point scale there is far less overlap; thus it was found that the more points there are on the scale, the more overlap there will be.

8. Always (24%)
7. Continually (21%)
6. Very often (24%)
5. Quite often (42%)
4. Fairly many times (6%)
3. Sometimes (45%)
2. Occasionally (16%)
1. Not very often (7%)
0. Never

Figure 7.2: 9-point scale for expressions of frequency

I carried out a replication of this experiment, and its results are discussed in Chapter 8. However, one problem I see with this experiment is that there are far too many judgements for the subjects to make; there are too many items. As noted in Schutze (1996), boredom and fatigue can play a part, and I believe that in an experiment as long as this one they could well be factors influencing the subjects and biasing the results. It might be more worthwhile to carry out an experiment with fewer expressions of frequency and amount; such results should be less biased and more stable.

Moxey and Sanford (1993a) discussed these and other studies and concluded that it will be quite difficult to obtain a large scale for quantifiers.

7.3.3 Conclusion

This section discussed some studies that have attempted to provide a scale for quantifiers. I have also suggested some changes to these experiments which could be carried out in order to validate the existing results as well as to extend further the work in the area of scales.

7.4 Context

7.4.1 Introduction to research on quantifiers and context

This section discusses whether context influences subjects’ judgements, i.e. whether, when a quantifier is placed in a sentence or situation, subjects judge the quantifier based on its placing in that situation.


7.4.2 Research into quantifiers and context

Bass and O’Connor (1974) found (as discussed in §7.3.2) that context did not influence the subjects’ judgements: whether a topic was important or not did not influence how the subjects judged the quantifiers. I personally found this quite surprising, as surely in a sentence like Most people want jobs, most is going to indicate a high percentage, whereas in a sentence like Most people enjoy eating worms, most would indicate a much lower percentage, which would show that the context does play a part in the judgements.

Pepper and Prytulak (1974) note how Hakel (1968) and Simpson (1944) found that subjects are quite stable in their judgements for expressions which are not in contexts. It was also found that some expressions which were at neither end nor in the middle of the frequency distribution were less variable than the other expressions. The suggestion is that there will be differences in how subjects judge sentences like those described here; examples are given below. Pepper and Prytulak (1974, p. 96) investigated two effects of “context upon variability in the definition of some of the intermediate expressions with inexact meanings”. The two effects of context were:

1. Pepper and Prytulak (1974) suggest that variability would increase as the difference between an expression’s no-context definition and the context’s estimated frequency increased. In other words, in low-frequency contexts subjects’ judgements should vary more for high-frequency expressions than for low-frequency expressions. An example given is California very often has sizable earthquakes, which is a low-frequency context for a high-frequency expression. Results obtained from subjects for such a sentence would be expected to be highly variable. However, less variability would be expected for California seldom has sizable earthquakes, as this contains a low-frequency expression (Pepper & Prytulak, 1974, p. 96).

2. Higher-frequency expressions are more flexible than lower-frequency expressions. The example given in Pepper and Prytulak (1974, p. 96) is that it is more acceptable to say Airplanes very often crash than Hollywood Westerns seldom contain shooting. Less variability should therefore be found among the subjects’ judgements of lower-frequency expressions. Pepper and Prytulak (1974, p. 96) thus predict that “the higher an expression’s no–context definition, the greater the variability of its definition over diverse contexts”.

In this study (Pepper & Prytulak, 1974), 33 paid subjects gave a definition for five quantitative expressions in each of six conditions: two high-frequency contexts, two low-frequency contexts, one moderate-frequency context and one no-context condition.

Five quantifying expressions from Hakel’s (1968) list were used (see Figure 7.1 for the complete list), and these expressions represented the entire range of frequencies excluding the most and least frequent expressions on the list, always and never. The five expressions used in this experiment were almost never, seldom, sometimes, frequently and very often. These were written on separate cards together with the contexts. Each subject received six cards, each of which they read aloud five times. Each time, they had to incorporate one of the quantitative expressions and then replace the blank with the most appropriate number from 0 to 100 (see §7.3 for a discussion of the literature about quantifiers and scales).

Firstly, it was found that when expressions and contexts are put together, the judgements given to the expressions move away from their no-context values towards the estimated frequency of the context event. Secondly, as predicted, there was greater variance in the subjects’ judgements “as the discrepancy between the expression’s no–context definition and its context’s estimated frequency increased, with the effect being stronger for the low–frequency contexts” (Pepper & Prytulak, 1974, p. 100).
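The relationship tested here, between discrepancy and variability, can be made concrete with a small sketch. For each expression–context cell it computes the discrepancy between the expression’s mean no-context value and the context’s estimated frequency, together with the standard deviation of subjects’ judgements in that cell; the prediction is that the standard deviation grows with the discrepancy. The data structures, function name and numbers are hypothetical, estimated frequencies are assumed to lie on the same 0–100 scale, and this is not Pepper and Prytulak’s own analysis.

```python
from statistics import mean, stdev


def variability_by_discrepancy(no_context, judgements, context_freq):
    """Pair each (expression, context) cell's discrepancy with its spread.

    no_context:   {expression: [judgements given with no context]}
    judgements:   {(expression, context): [judgements in that context]}
    context_freq: {context: estimated frequency of the context event, 0-100}
    Returns (expression, context, discrepancy, sd) tuples, largest
    discrepancy first.
    """
    rows = []
    for (expr, ctx), values in judgements.items():
        discrepancy = abs(mean(no_context[expr]) - context_freq[ctx])
        rows.append((expr, ctx, discrepancy, stdev(values)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Invented numbers for a low-frequency context, echoing the example above.
    no_context = {"very often": [80, 85, 90], "seldom": [10, 15, 20]}
    context_freq = {"earthquakes": 5}
    judgements = {
        ("very often", "earthquakes"): [20, 50, 80],
        ("seldom", "earthquakes"): [2, 5, 8],
    }
    for row in variability_by_discrepancy(no_context, judgements, context_freq):
        print(row)
```

In the invented data, very often in the earthquake context has a discrepancy of 80 and a standard deviation of 30, while seldom has a discrepancy of 10 and a standard deviation of 3, which is the pattern the prediction describes.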

As a result of all of this variability, it was claimed that the judgements given by subjects to psychologists might not be very useful, as they usually involve quantitative expressions.

A weakness of this experiment was that the subjects were paid. If subjects are paid for something, there is a possibility that an excessive desire to give the “correct” answer could influence the results. However, this is not always the case, and the opposite may also be true (see §2.8 for a further discussion of such demand characteristics). One suggestion would be to replicate this experiment with a wider range of subjects who are not paid.

In the replications which I have carried out (see Chapter 8), the subjects were users of the Internet, which tends to give quite a broad-ranging sample of individuals in terms of age, sex and location. As noted in §2.4, Hewson et al. (2003) concluded that the samples obtained on the Internet were in fact representative of the general population.

Having to read the card aloud five times in this experiment seems a bit unusual. I believe it could mean that these results do not reflect how people generally interpret such statements, as in natural circumstances people do not read statements aloud five times before comprehending them. A more realistic set of results might be obtained if the subjects were simply told to read the card.

Weber and Hilton (1990) carried out a study to investigate how the base rate and the context (the severity of medical conditions) influence how subjects interpret different types of probability expressions. An example of the materials presented to the subjects is:

After your annual medical check–up, your doctor tells you that there is a slight chance that you will develop an ulcer during the next year. What do you think is your doctor’s estimate of the probability of your developing an ulcer during next year? (Weber & Hilton, 1990, p. 784)

Eighty-five undergraduate subjects took part. They obtained extra course credit for taking part (I discuss demand characteristics in §2.8). Subjects judged 12 statements altogether. They indicated the probability by making a mark on a 10 cm line, which ranged from 0% at the low end to 100% at the high end. Having completed all 12 judgements, they then had to judge the probability of themselves developing each medical condition.
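Scoring a line-mark response reduces to a simple proportion: the distance of the mark from the 0% end divided by the length of the line. A minimal sketch, assuming distances are measured in centimetres from the low end (the report does not describe how the marks were measured); the function name is hypothetical.

```python
LINE_LENGTH_CM = 10.0  # length of the response line described above


def mark_to_percentage(distance_cm, line_length_cm=LINE_LENGTH_CM):
    """Convert a mark's distance from the 0% end into a percentage."""
    if not 0.0 <= distance_cm <= line_length_cm:
        raise ValueError("mark lies outside the response line")
    return 100.0 * distance_cm / line_length_cm
```

A mark 2.5 cm from the low end, for instance, scores 25.0. The resolution of this measure depends on how precisely the marks are read off, which is one practical difference from asking for a number directly.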

The results showed that the base rate and the severity of the medical conditions do influence the interpretation of the probability expressions. This conclusion seems very plausible, and it appears likely that if the experiment were carried out with different contexts, the results would also show that the contexts influence how subjects judge statements. For example, the severity of weather conditions may influence subjects’ judgements of how often certain events may or may not occur. However, there was an unusual result: even when severity and base rate were controlled for, the medical situations still influenced the judgements made, and so a second experiment was carried out to investigate this further.

The materials were similar to those of the experiment just described. There were 71 undergraduate subjects. The difference in procedure was that instead of marking points on a line, subjects gave percentages. They also gave a value between 1 and 10 according to how serious they judged each medical condition to be. As well as the judgements from the first experiment, they also had to judge how likely it was that other people similar to themselves would develop the medical conditions.

The results of this experiment showed that the base–rate estimates which subjects gave for other people were significantly higher than the estimates they gave for themselves (i.e. how often they thought something would happen to others was higher than how often they thought it would happen to themselves). The authors also concluded that subjects prefer to give judgements as numerical values rather than representing them graphically, as in the experiment previously described. This is based on the proportion of variance accounted for when numerical judgements were used, which was found to be higher than when the line was used (Weber & Hilton, 1990).

Their final experiment manipulated the base–rates and the severity of the stimulus events, “thus providing more than just correlational evidence of their effect on evaluations of probability words” (Weber & Hilton, 1990, p. 787). Forty-one undergraduates took part. The statements were created by combining adjectives for base–rates with adjectives for the severity of medical conditions. Subjects were told to pretend they had visited a doctor for an annual examination. They were then presented with statements of the likelihood of developing various conditions and had to judge the probability they thought the doctor had in mind. They were required to use a numerical scale from 0% to 100% but could also provide interval estimates.
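Building statements by crossing factors, as described above, is a full factorial combination, which can be sketched with itertools.product. Only the base–rate adjectives common and rare, the three conditions and the likely wording are taken from the text; the severity adjectives, the second probability word and the exact template are hypothetical placeholders, so this shows the technique rather than Weber and Hilton’s actual materials.

```python
from itertools import product

# Hypothetical factor levels: "common"/"rare", the three conditions and
# "likely" appear in the text; "possible", "mild" and "severe" are
# placeholders chosen for illustration.
PROBABILITY_WORDS = ["likely", "possible"]
BASE_RATES = ["common", "rare"]
SEVERITIES = ["mild", "severe"]
CONDITIONS = ["influenza", "skin rash", "migraine"]

TEMPLATE = ("Doctor: It is {word} that you will develop a {rate}, "
            "{severity} type of {condition} in the next year.")


def build_statements():
    """Return one statement for every combination of the four factors."""
    return [
        TEMPLATE.format(word=w, rate=r, severity=s, condition=c)
        for w, r, s, c in product(PROBABILITY_WORDS, BASE_RATES,
                                  SEVERITIES, CONDITIONS)
    ]
```

With these placeholder levels the design has 2 × 2 × 2 × 3 = 24 statements; each added factor level multiplies the number of items, which bears on the fatigue concerns raised in §7.3.2.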

An ANOVA found that base–rate and severity of medical condition significantly influenced subjects’ judgements, i.e. significantly different values were given for the likelihood of developing medical conditions depending on the base–rates. For example, when the base–rate was common, as in “Doctor: It is likely that you will develop a common type of influenza in the next year” (Weber & Hilton, 1990, p. 787), the average judgement was 60.8, whereas when the base–rate was rare, the average judgement was 44.3. Similar results were found for the other medical conditions (skin rash and migraine) with these two base–rates. Furthermore, which medical condition was combined with which word also had a significant effect on the judgements subjects gave. The reasons given for this significance are the base–rate differences, the severity of the medical conditions and other differences between the medical conditions.
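Weber and Hilton’s design has several factors, so their analysis was a multi-factor ANOVA; the core idea can nonetheless be illustrated with the one-way case, for instance comparing judgements under a common and a rare base–rate. The sketch below computes the F statistic as the between-group mean square divided by the within-group mean square. It is illustrative only, with hypothetical data in the usage line; in practice a library routine such as scipy.stats.f_oneway also returns the associated p-value.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for two or more groups of numbers.

    Returns (F, df_between, df_within), where F is the between-group
    mean square divided by the within-group mean square.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For the hypothetical groups [1, 2, 3] and [4, 5, 6], one_way_anova_f returns an F of 13.5 with 1 and 4 degrees of freedom.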

It would be interesting to see whether the numerical values given for quantifier expressions are stable across various contexts; in other words, to carry out an experiment investigating whether subjects give similar values for quantifiers in different contexts as well as when there is no context involved. If there were large differences between the values that subjects gave to quantifiers in these different situations, then I think this would show that it will be extremely difficult ever to arrive at a complete scale for quantifiers.

Wallsten, Fillenbaum, and Cox (1986) carried out two experiments. Firstly, they aimed to see whether base–rates or expectations influence how probability expressions are interpreted. Secondly, they hoped to replicate the work of Pepper and Prytulak (1974) (discussed at the beginning of this section). In the first experiment the subjects were all meteorologists, who were chosen because uncertainty is important to them. One issue to note is that the meteorologists were told before they took part that their judgements would be used at a meeting of their association. I found this most unusual, as it is likely to have an impact on the judgements the subjects give; they may try to validate the experimental hypothesis.

Each subject was sent a questionnaire to fill out. There were four basic contexts and four probability expressions: likely, possible, chance and slight chance. The results obtained were highly variable. However, this variability was similar to that found in other studies, so the fact that these subjects were meteorologists did not seem to influence the results (see Chapter 2 for a further discussion of participant motivation). Further comparisons should nevertheless be made with studies which use the same range of expressions and a more random sample, and the results should be correlated to ensure that the results of this experiment are valid. I find it difficult to believe that they would be correlated, as meteorologists would surely judge such sentences differently from most subjects, since uncertainty is clearly more a part of meteorologists’ lives than it is of most people’s. This is similar to Schutze’s (1996) reason for saying that linguists should not be used in grammaticality judgements: they have linguistic training which most subjects do not. The same applies here, as most people in the population do not have meteorological training.

The second main result was that event base–rate was a highly influential variable, as the values given by subjects to the medical contexts “varied as a positive function of event base–rate” (Wallsten et al., 1986, p. 576).

The second experiment was carried out to investigate this phenomenon further, and was run on a computer. There were 36 different scenarios with different base–rates, involving weather and person topics. Nine probability expressions (sure, likely, probable, good chance, possible, poor chance, unlikely, improbable, doubtful) and nine frequency expressions (common, usually, frequently, often, sometimes, unusual, seldom, rarely, uncommon) were used in the experiment. Subjects had to indicate what probability the expert who made a prediction would have had in mind. The sentence appeared on the computer screen with a line underneath it, marked 0 at the left end and 1 at the right end. Subjects used the arrow keys to place the cursor on the line. They were then asked to indicate in the same manner the lowest and highest probabilities that the expert had in mind. The results showed that both the probability and the frequency expressions influenced how subjects interpreted the statements: the judgements they gave changed according to the probability of the context and also according to the expression itself.


Overall, the two experiments showed that base–rates influence how subjects interpret probability expressions. It is mostly the high and neutral expressions which are influenced by base–rate, whereas the lower expressions are not influenced as much.

Moxey and Sanford (1993b) report on two studies which were carried out to investigate

• whether prior expectation influences the interpretation of quantifiers

• whether quantifiers provide information about the experimenter’s assumptions

Their first experiment investigated whether the interpretation of quantifiers depends on context and whether subjects’ prior expectations influence their interpretations. Four hundred and fifty undergraduate students took part. They were asked to take part in the study before the start of a lecture (see §2.8 for discussion of using undergraduate subjects who receive course credit or who are paid). There were three conditions (scenarios), and each subject saw only one condition. The quantifiers used were few, a few, very few, only a few, quite a few, not many, many, very many, quite a lot and a lot.

The three conditions involved three situations which were shown to carry different prior expectations. They involved:

• those who enjoyed a residents’ Christmas party – the social event of the year

• those who were convinced by a speech given at a party conference about education cuts on British universities

• those female students who prefer to be examined by female doctors.

For example, the residents’ Christmas party condition was: The residents association Christmas party was held last night in the town hall. QUANT of those who attended the party enjoyed what could be called the social event of the year. (QUANT is replaced by a quantifier from the list given above.) Then, on a separate sheet of paper, the subjects were asked a question relating to the condition. For example, for the above condition the question was What percentage of the residents do you think enjoyed the Christmas party? (Moxey & Sanford, 1993b, p. 77).
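Generating the stimuli and allocating them between subjects, so that each subject sees a single statement, can be sketched as follows. Only the party scenario is quoted in the text, so the other two scenarios are left out. The balanced, block-wise allocation is an assumption: the report says only that each subject saw one condition. The function names are hypothetical.

```python
import random
from itertools import product

QUANTIFIERS = ["few", "a few", "very few", "only a few", "quite a few",
               "not many", "many", "very many", "quite a lot", "a lot"]

# Only the party scenario is quoted in the text; the others would go here.
SCENARIOS = {
    "party": ("The residents association Christmas party was held last night "
              "in the town hall. QUANT of those who attended the party enjoyed "
              "what could be called the social event of the year."),
}


def fill(template, quantifier):
    """Substitute the quantifier for QUANT, capitalised because QUANT
    begins a sentence in these templates."""
    return template.replace("QUANT", quantifier.capitalize())


def assign_cells(subject_ids, seed=None):
    """Give each subject exactly one (scenario, quantifier) statement,
    drawing cells in freshly shuffled blocks so cell sizes differ by at
    most one."""
    rng = random.Random(seed)
    cells = list(product(SCENARIOS, QUANTIFIERS))
    assignment, block = {}, []
    for sid in subject_ids:
        if not block:  # start a new shuffled block of all cells
            block = cells[:]
            rng.shuffle(block)
        scenario, quantifier = block.pop()
        assignment[sid] = (scenario, fill(SCENARIOS[scenario], quantifier))
    return assignment
```

With one scenario and ten quantifiers there are ten cells, so 20 subjects would each see one statement and every quantifier would be seen by exactly two of them.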

The results of this experiment (Moxey & Sanford, 1993b) showed, firstly, that the quantifier very significantly influenced subjects’ judgements; in other words, the quantifier used in the statement affected how the subjects judged it. Secondly, the results show that the topic of the statement (the context) and the quantifier interact significantly with each other. Thirdly, the authors’ hypothesis was supported: if subjects’ expectations for an event are low, then quantifiers which normally lead to large proportions will lead to smaller proportions in that case. An example of this might be if the quantifier a lot were used with the low base–rate statement about how many female students prefer to be examined by female doctors. The proportion that a lot is taken to represent may be distinctly less than if it were used in a high base–rate situation. However, the opposite was found not to hold, i.e. quantifiers which denote small proportions do not lead to large proportions if the expectation is large. I would question this, as it would seem to me that if a small-proportion quantifier were used in a high base–rate situation, the proportion it denotes would increase and be higher than if it were used in a low base–rate situation. For example, surely the quantifier few is going to denote more in example 1 than in example 2, as Irish football fans would clearly be happy if Ireland won a game, compared to when they lost a game as in 2.

1. Few of the Irish football fans enjoyed Ireland’s 6–0 win against Germany.

2. Few of the Irish football fans enjoyed Ireland’s 0–6 defeat by Germany.

The overall conclusion of this study (Moxey & Sanford, 1993b), though, was that “proportions associated with quantifiers depend upon prior expectation” (Moxey & Sanford, 1993b, p. 79). Another side result of this experiment was that low-ranking quantifiers are not very well differentiated from each other (see §7.3). This result could be investigated in an experiment in its own right to see whether subjects do in fact find them difficult to differentiate.

A second study was also carried out. Moxey and Sanford (1993b) put forward the proposal that quantifiers provide more information than proportions alone. They propose that quantifiers also provide information about prior expectations, both of the speaker and of the listener, and suggest that different quantifiers have different prior expectations associated with them. Quantifiers provide information on three different levels (Moxey & Sanford, 1993b).

1. L1 – the quantifier is used to convey the proportion involved. This corresponds to the first experiment described above, in which subjects assign proportions to quantifiers.

2. L2 – the information the quantifier gives the listener about the speaker’s prior expectation of the proportion involved.

3. L3 – the information the quantifier gives the listener about what the speaker thinks the listener’s own prior expectation of the proportion was.

This experiment investigated whether it is possible to identify levels 2 and 3 from these quantifiers. Nine hundred undergraduate subjects participated in the experiment, again before the start of a lecture. The same 3 conditions and 10 quantifiers as in the first experiment were used; however, the question presented to subjects was different. In order to investigate Level 2, the following question was asked (Moxey & Sanford, 1993b, p. 83):

Before the writer went to the party, and before knowing how many of the guests enjoyed the party, what % do you think the writer had expected to enjoy the party?


Then in order to investigate Level 3, this question was asked:

Suppose the writer of the article is asked to say what percentage he thinks you believed to enjoy the party before reading his article. Note you are being asked your opinion about the proportion of the guests that the writer believed you to have expected prior to reading his article

(See §7.3 for a further discussion of percentages vs frequency scales.)

The results for Level 2 (Moxey & Sanford, 1993b) showed that the topic of the condition had a significant impact on the percentage subjects gave for the writer’s prior expectations. Quantifiers and topics interacted only weakly. The authors believed that one of the main results was that subjects gave lower Level 2 estimates for a few and quite a few than for the other quantifiers, indicating that the subjects believed the speakers to have low prior expectations for these quantifiers (Moxey & Sanford, 1993b). Also, when subjects were asked about the writer’s expectations, they used their own expectations as a basis. With regard to the Level 3 results, topic again significantly affected the subjects’ judgements, and a few and quite a few again had low levels of expectation. A result that differed from those obtained for Level 2 was that very different values were obtained for few and very few compared to not many. There are different interpretations at Level 2 and Level 3, and so there are different expectations involved on the side of the listeners and the speakers. The overall results of these experiments showed that the values which subjects assign to high-proportion quantifiers are based on base–rate expectations; they are a function of context. This was not the case for lower-proportion quantifiers. The second experiment showed that it is possible for quantifiers to provide information about the expectations of writers.

Moxey and Sanford tend to have each subject make only one judgement in their experiments, so as to avoid within–subjects bias. It would be worthwhile replicating these experiments with subjects making judgements on all of the statements in a condition, to determine whether the same results are obtained. The statements would be fully randomised in order to eliminate ordering effects. If an experiment is replicated and similar results are obtained, with a high correlation between the two sets of results, this indicates that the results are reliable. If, however, the results are not similar and are not correlated, there may be a flaw in the experimental design. If an experiment is valid and reliable, the same results should be obtained on another run of the experiment.
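The reliability check described in this paragraph amounts to correlating item-level results from two runs of the same experiment. A minimal Python sketch follows, assuming each run has been reduced to one mean judgement per statement, listed in the same order. The function names, the example values and the 0.9 cut-off are hypothetical illustrations and are not taken from Moxey and Sanford's studies.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(vx * vy)


def looks_reliable(original, replication, threshold=0.9):
    # Hypothetical cut-off: treat the replication as agreeing with the
    # original run when the item-level correlation reaches the threshold.
    return pearson_r(original, replication) >= threshold


# Hypothetical mean percentage estimates for five statements in one condition.
original = [62.0, 48.5, 30.0, 21.0, 12.5]
replication = [65.0, 45.0, 33.5, 18.0, 15.0]
print(round(pearson_r(original, replication), 3))
print(looks_reliable(original, replication))
```

A correlation only shows that the two runs order the items alike; a replication could still shift every estimate up or down, so comparing the means themselves would be a sensible complementary check.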

7.4.3 Conclusion on quantifiers and context

From the experiments discussed in this section, it is clear that although Bass and O’Connor (1974) found that context did not influence subjects’ judgements, context usually does play a part, as the remaining studies found: subjects’ judgements of a quantifier change depending on the context.


7.5 Focus

7.5.1 Introduction to focus

This last section discusses research into quantifiers and focus. Different quantifiers lead to different focus patterns; these patterns, and the experiments carried out to investigate them, are outlined in this section.

7.5.2 Research into quantifiers and focus

Barton and Sanford (1990) say that quantifiers “control those aspects of a quantified statement to which people attend” (Barton & Sanford, 1990, pg. 82), while also providing information on the proportions which they denote. The Frequency Signalling Theory is used to describe the situation in which some individual does something frequently although usually only a small number of people do this. The Focus Control Theory applies if the quantifier directs focus to the proportion involved and to the reason why the proportion is what it is. Barton and Sanford (1990, p. 84) conclude that “certain expressions cause the attention of the reader to focus upon relations relating to the quantified assertion, while others do not”.

An experiment was carried out to test this. The quantifiers used in this experiment were few, a few, only a few, occasionally, only occasionally and seldom. Sixteen university students participated. There were 16 conditions and subjects made a judgement in each condition. The subjects were given two sheets of paper: one with the materials and the other an answering grid. The answering grid consisted of lines with the word subject at one end, the word object at the other end and the word neither in the middle. If the subjects thought the sentence said something special about its subject, they circled subject; if they thought it said something special about its object, they circled object. If they thought the sentence was not saying anything special about either the subject or the object, they circled neither.

The results showed that the quantifiers provide the information needed for making an attribution3. Secondly, it was found that occasionally and a few do not lead the reader to focus on finding a reason for the proportion or frequency in question, whereas few, only a few, only occasionally and seldom all point towards low proportions or frequencies, which in turn focuses the reader’s attention on finding a reason for the low proportion.

An obvious alteration to this experiment would be to have subjects answer a direct question about whom they thought the sentence said something about, rather than placing a marker on a line, and then give a percentage indicating how certain they are of this judgement. The results could then be compared with those obtained in this experiment to see whether the change in methodology changed the results.

Sanford, Moxey, and Paterson (1996) outline three experiments which show that quantifiers do not only provide information on proportions of things; they can also be distinguished by properties of focus. Differences in focus are “differences in the availability of the various subsets that constitute the logical representation of quantified statements” (Sanford et al., 1996, p. 144). The authors claim that these differences in focus influence which quantifiers are chosen during production, as well as how subjects understand quantifiers during discourse.

3The Attribution Theory is concerned with the factors and conditions which influence subjects’ patterns of explanation (Barton & Sanford, 1990).

Focus sets can be explained with an example: Some of the football fans went to the game.

1. The reference set is “non–empty set of those fans who went to the game”

2. The complement set is “a possible set of fans who did not go to the game”
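In set-theoretic terms, the two focus sets in this example are the intersection and the difference of the set of fans with the set of people who went to the game. The short Python sketch below makes this concrete; the function name and the individuals in the toy model are invented purely for illustration.

```python
def focus_sets(restrictor: set[str], scope: set[str]) -> tuple[set[str], set[str]]:
    """Return (reference set, complement set) for 'QUANT of the A are B'.

    reference set:  members of A that are B  (A intersected with B)
    complement set: members of A that are not B  (A minus B)
    """
    return restrictor & scope, restrictor - scope


# Hypothetical model for "Some of the football fans went to the game".
fans = {"ann", "bob", "cal", "dee"}
went_to_game = {"ann", "bob", "eve"}  # eve went, but is not one of the fans

refset, compset = focus_sets(fans, went_to_game)
print(sorted(refset))   # ['ann', 'bob']  fans who went
print(sorted(compset))  # ['cal', 'dee']  fans who did not go
```

Note that some requires the reference set to be non-empty, which is why the text describes it as a "non–empty set", whereas the complement set is only a possible set and may be empty.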

Previous studies (e.g. Moxey & Sanford, 1987) have shown that some quantifiers lead subjects to focus on the reference set, while others lead them to focus on the complement set. For example, if the follow–up sentence were They watched it with enthusiasm, They would refer to the reference set; if it were They watched it on TV instead, They would refer to the complement set. Moxey and Sanford (1987) found that negative quantifiers allow complement set reference, whereas positive quantifiers require reference set reference. Sanford et al. (1996) carried out three experiments to test this claim.

The first experiment had three basic sentence frames with 10 quantifiers. The sentence frames were of the form: QUANT of the parents allowed their children to the club. They ...

The quantifiers used were:

• negative quantifiers: not quite all, not all, less than half, not many, few

• positive quantifiers: nearly all, almost all, more than half, many, a few

(§7.2 discusses the classification of quantifiers in generalised quantifier theory. Recall that a quantifier is monotone increasing if a true quantified statement remains true when its predicate is made less restrictive, monotone decreasing if it remains true when its predicate is made more restrictive, and non–monotone if neither holds.)
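The monotonicity definitions just recalled can be checked mechanically on a small finite domain by treating a quantifier as a relation between a restrictor set A and a predicate set B. The Python sketch below does this by brute force. It covers only quantifiers with precise truth conditions; vague items such as few or many are left out because they have no fixed definition here, and the choice of N = 2 and the four-element domain are assumptions made purely for illustration.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of a small finite universe as a frozenset."""
    items = sorted(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


# Quantifiers as relations Q(A, B); "at least/at most/exactly" use N = 2.
QUANTIFIERS = {
    "at least 2": lambda a, b: len(a & b) >= 2,
    "at most 2": lambda a, b: len(a & b) <= 2,
    "exactly 2": lambda a, b: len(a & b) == 2,
    "more than half": lambda a, b: len(a & b) > len(a) / 2,
}


def monotonicity(q, universe) -> str:
    """Classify q as increasing, decreasing or non-monotone in B."""
    subs = list(subsets(universe))
    up = down = True
    for a in subs:
        for b in subs:
            if not q(a, b):
                continue
            for b2 in subs:
                if b <= b2 and not q(a, b2):
                    up = False    # true, but false once B is made less restrictive
                if b2 <= b and not q(a, b2):
                    down = False  # true, but false once B is made more restrictive
    if up:
        return "increasing"
    if down:
        return "decreasing"
    return "non-monotone"


for name, q in QUANTIFIERS.items():
    print(name, monotonicity(q, {1, 2, 3, 4}))
```

A check over one small domain can only refute monotonicity, never prove it in general, but it is a quick way to see why at least N patterns with the positive quantifiers and at most N with the negative ones.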

There were 300 undergraduate subjects, each making one judgement (§2.8 discusses demand issues such as using undergraduate subjects in experiments). Each sentence was printed on a separate sheet of paper. Subjects had to read the sentence and then write a continuation that made sense. When they had done that, they were asked to turn over the sheet and indicate whom They referred to, choosing from options corresponding to the different focus sets. For the example Few of the parents allowed their children to the club. They ..., one of the five options for the focus sets was the parents who allowed their children to go (examples from Sanford et al. (1996, pg. 147)).

The results showed that positive quantifiers nearly always (93% of the time) led to reference set continuations, whereas negative quantifiers led to both types of focus reference, with complement set continuations 71% of the time. The content of the continuations was then analysed and classified as reason–not (the reason why something did not happen), reason–true (the reason why something did happen), consequence (the consequence or result of the statement) or other. Negative quantifiers led to reason–not continuations, whereas positive quantifiers most often led to other continuations, followed by reason–true continuations.
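Figures such as 93% and 71% are proportions of coded continuations within each quantifier polarity. The Python sketch below shows that tally under the assumption that each continuation has already been coded as a (polarity, focus set) pair; the coded data are a tiny invented sample, not Sanford et al.'s responses.

```python
from collections import Counter

# Hypothetical coded continuations: (quantifier polarity, focus set referred to).
coded = [
    ("positive", "reference"), ("positive", "reference"),
    ("positive", "reference"), ("positive", "complement"),
    ("negative", "complement"), ("negative", "complement"),
    ("negative", "reference"), ("negative", "complement"),
]


def percentages(responses, polarity):
    """Percentage of each focus category among responses of one polarity."""
    counts = Counter(focus for pol, focus in responses if pol == polarity)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {focus: 100 * n / total for focus, n in counts.items()}


print(percentages(coded, "positive"))  # {'reference': 75.0, 'complement': 25.0}
print(percentages(coded, "negative"))  # {'complement': 75.0, 'reference': 25.0}
```

The same function applies unchanged to the content coding (reason–not, reason–true, consequence, other) by storing that label in place of the focus set.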

The aim of the second experiment was to investigate further the claim that “complement set focus occurs with negatives over a full range of proportion denotations” (Sanford et al., 1996, pg. 149). The positive and negative quantifiers consisted of the modifiers nearly and not quite with arguments in the range 0–100%. The list was:

• negative quantifiers: not quite 10%, not quite 30%, not quite 50%, not quite 70%,not quite 90%

• positive quantifiers: nearly 10%, nearly 30%, nearly 50%, nearly 70%, nearly 90%

The same procedure as in the first experiment was used. Three hundred volunteer undergraduate subjects participated in the experiment.

The results once again showed that positive quantifiers led mostly to reference set continuations, whereas negative quantifiers led to a variety of continuations: 53% were complement set continuations, 27% reference set continuations and 6% generalisations (continuations discussing the general population referred to in the sentence). When the content of the continuations was analysed, the Other class was found to be the one most often used; a continuation was classed as Other when it was not judged to be of the Reason–true, Reason–not or Consequence type. Forty three percent of the continuations for negative quantifiers, and 58% of those for positive quantifiers, were of the Other type.

The third experiment was quite different from the first two. The authors used a self–paced reading time task. The intuition was that if subjects read a quantified sentence followed by a sentence which does not suit the preferred focus pattern, processing of the second sentence will be disrupted. Forty eight unpaid volunteer subjects took part.

The most striking result was that, for continuations referring to the complement set, reading times were longer after positive quantifiers and shorter after negative quantifiers, whereas for continuations referring to the reference set, reading times were longer after negative quantifiers than after positive ones. On the basis of an ANOVA, the authors concluded that “negative quantifiers are perhaps processed more slowly than positive quantifiers” (Sanford et al., 1996, pg. 152). So the focus effects found in the first two experiments did influence how easily the subjects read the subsequent sentences. It is worth noting here the predictions concerning processing times made by Barwise and Cooper (1981): processing difficulty would be greatest for non–monotone quantifiers, somewhat less for negative (decreasing) quantifiers and smallest for increasing quantifiers. Barwise and Cooper (1981, p. 104) “predict that response latencies for verification tasks involving decreasing quantifiers would be somewhat greater than for increasing quantifiers, and that for the non–monotone it would be still greater”.
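The reading-time result is a crossover pattern in a 2 × 2 design (quantifier polarity × focus set of the continuation). The Python sketch below computes the cell means and the interaction contrast that an ANOVA would test for significance; it does not itself perform an ANOVA, and the reading times are invented values chosen only to reproduce the direction of the pattern described above.

```python
from statistics import mean

# Hypothetical reading times (ms) for the continuation sentence,
# keyed by (quantifier polarity, set the continuation refers to).
reading_times = {
    ("positive", "reference"): [1500, 1480, 1520],
    ("positive", "complement"): [1790, 1820, 1760],
    ("negative", "reference"): [1650, 1700, 1660],
    ("negative", "complement"): [1560, 1590, 1590],
}

cell = {key: mean(times) for key, times in reading_times.items()}

# Extra reading time for a complement-set continuation relative to a
# reference-set one, computed separately for each polarity.
pos_cost = cell[("positive", "complement")] - cell[("positive", "reference")]
neg_cost = cell[("negative", "complement")] - cell[("negative", "reference")]

# Interaction contrast: the crossover shows up as pos_cost > 0 and neg_cost < 0.
interaction = pos_cost - neg_cost
print(pos_cost, neg_cost, interaction)
```

A real analysis would also need the variability within cells (and, with repeated measures, within subjects) to decide whether such a contrast is reliable.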

The overall conclusion from these three experiments is that positive and negative quantifiers do have different focus effects. Negative quantifiers lead mostly to complement set focus but do allow other types of focus, whereas positive quantifiers never lead to complement set focus.

Moxey et al. (2001) investigate which aspects of negativity lead to complement set references. They proposed that it is denials of presuppositions, and not downward monotonicity, which lead to these complement set references. Three experiments were carried out to investigate this.

The first experiment had three main predictions:

1. the results obtained in earlier studies (Moxey & Sanford, 1987; Sanford et al., 1996) would be found again when using a free continuation task instead of the usual continuations which start with the plural pronoun they. The continuations given would be similar to those given when subjects are forced to start the continuation sentence with they. If these results were found again, this would provide further evidence that they are reliable.

2. quantifiers which lead to denials will produce complement set references, whereas quantifiers which lead to affirmations will not produce complement set references.

3. denials would cause Reason Why Not type continuations (i.e. a reason why an event did not occur), e.g. Few of the fans went to the football match. They watched it on TV instead (Moxey et al., 2001, p. 430)

Both monotone increasing and monotone decreasing quantifiers were used, and the monotone decreasing quantifiers included both denials and affirmations. Some quantifiers which had already been tested in earlier studies were also used. There were thirty simple declarative sentences, each of which was combined with each of the eight quantifiers, giving 240 different sentences.
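The arithmetic of this design (thirty sentences fully crossed with eight quantifiers, 240 items, and 240 subjects who each see one sentence) can be sketched in Python as follows. The frame and quantifier labels are hypothetical placeholders, and the assumption that every subject received a different item is a reading of the numbers above rather than something the study states explicitly.

```python
import random
from itertools import product

# Hypothetical placeholders for the thirty sentences and eight quantifiers.
frames = [f"frame_{i:02d}" for i in range(1, 31)]
quantifiers = [f"quant_{j}" for j in range(1, 9)]

# Fully crossed design: every sentence combined with every quantifier.
items = list(product(quantifiers, frames))
print(len(items))  # 240


def assign_one_item_per_subject(items, n_subjects, seed=0):
    """Give each subject exactly one item, using every item equally often."""
    if n_subjects % len(items) != 0:
        raise ValueError("number of subjects must be a multiple of the item count")
    pool = items * (n_subjects // len(items))
    rng = random.Random(seed)
    rng.shuffle(pool)
    return dict(enumerate(pool, start=1))  # subject id -> (quantifier, sentence)


assignment = assign_one_item_per_subject(items, 240)
```

Assigning exactly one item per subject is what makes this a purely between-subjects design, avoiding the within-subjects bias discussed earlier in this chapter.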

There were three categories to classify the continuations into:

• Reason Why Not (RWN)

• Reason Why

• Other

The subjects were 240 students from the University of Glasgow (§2.8 outlines demand issues involved in using university students). Each subject was shown one sentence and then wrote two sentences following it, “with the instruction to simply write a continuation that they thought made sense”.

The results found that monotone decreasing quantifiers produce mostly Reason–Why–Not responses and monotone increasing quantifiers produce mostly Reason–Why responses.The results also confirmed that denials produce more Reason–Why–Not responses.

When the continuations were analysed, 5% were complement set references and only 34% were reference set references. This confirmed the earlier hypothesis that negative quantifiers mostly lead to complement set reference. Monotone increasing quantifiers usually led to reference sets, while monotone decreasing quantifiers led to complement sets.


The final, and main, hypothesis made by the authors was that, within the group of monotone decreasing quantifiers, affirmations would produce fewer complement set references than denials. The results confirmed this.

The second experiment aimed to reproduce the results of the previous experiment, the difference being that this time the subjects were forced to start the continuation sentence with They. Some of the sentences also included the connective because, which in previous work was found to “amplify the trend toward complement set focus” (Moxey et al., 2001, p. 436).

There were four conditions:

1. At most N – monotone decreasing

2. No more than N – monotone decreasing

3. At least N – monotone increasing

4. No less than N – monotone increasing

There were 20 sentence stems for each condition. Subjects had to produce a continuation either:

• starting with the simple pronoun e.g. They

• starting with the connective because followed by a simple pronoun e.g. They

Examples were:
No less than 10% of the fans went to the game. They ...
No less than 10% of the fans went to the game because they ...

There were 320 unpaid undergraduate subjects. Each subject saw only one item (recall that §2.8 covers demand issues). Each subject was given a sheet of paper with a partial sentence on it and had to complete the sentence with something that made sense. An example is: No less than 10% of the fans went to the game. They ... (Moxey et al., 2001, p. 436). Then, when told to, they turned over the sheet of paper and indicated whom they had meant the pronoun to refer to.

The results showed that monotone decreasing quantifiers most often led to Reason–Why–Not responses (46%), whereas monotone increasing quantifiers most often led to Reason–Why responses (59%).

Secondly, there was a significant difference between affirmations and denials for monotone decreasing quantifiers. Although the connective because was used in some of the sentences, there was no significant difference between sentences with the connective and those without it. There was also no difference between quantifiers referring to a proportion and quantifiers referring to a number. The results also confirmed the major hypothesis that denials lead to complement set focus. That is, in sentences like No more than 10% of the fans went to the match, subsequent anaphora were interpreted with the pronoun attached to the fans who did not go to the game (complement set), rather than to those who did (reference set).

One odd result was that the expression no less than 10% led to complement set references 25% of the time. This would not be expected, as it is an increasing quantifier. One reason could be that it is a double negative (no and less than are each negative in their own right, so together they form a double negative). Another reason could be that the expectation from world knowledge is large while 10% is quite a small percentage; the authors propose that there is a mismatch between the two.

This led to the third experiment, in which 10% was replaced by 90% (and, in the number conditions, the number 80 was used), so that there were four quantifier and connective combinations involving numbers and proportions:

1. no less than 90% of the ...

2. no less than 90% of the ... because

3. no less than 80 of the ... because

4. no less than 80 of the ...

There were 80 subjects, all first year undergraduate students who volunteered to participate in the experiment. The procedure was exactly the same as in the second experiment. The results showed essentially no complement set references, which supports the expectation mismatch hypothesis as the explanation for the unusual result obtained in the second experiment.

The main conclusions are, thus, that denials lead to complement set references and that these in turn lead to responses whose content is of the Reason–Why–Not type: reasons why the statement is false or the event did not happen.

Another area of potential research would be to explore the relationship between mapping quantifiers onto a scale on the one hand, and complement sets and their focus effects on the other. For example, one could investigate the scalar projection of complement set size estimation. Another possibility would be to investigate how double positives (e.g. much more than, lots more than) and positive–negative combinations (e.g. no more than) work: which focus sets they lead to, as well as the scalar projection of their set size estimation.

Sanford, Williams, and Fay (2001) argue that positive and negative quantifiers produce different patterns of attentional focus, supporting the reliability of the claims from the preceding discussion. Positive quantifiers tend to produce reference set continuations, whereas negative quantifiers lead to complement set continuations. It is often difficult to establish whom they refers to when it begins a continuation sentence (this is usually decided from the subjects’ own judgements or from those of independent judges). To overcome this difficulty, the authors propose using the include(x,y) relation, which maps individuals onto sets in order to decide whether complement set reference has occurred (Sanford et al., 2001). Based on the findings of Moxey et al. (2001) discussed above, Sanford et al. (2001) propose that only denials will lead to complement set reference continuations.


The first experiment presented subjects with a quantified clause followed by a question which was essentially a judgement on whether the character in the quantified sentence was part of the reference set or the complement set. An example given is: QUANT of the students had a car, and that includes Sophie. Did Sophie have a car? Yes? No? Don’t know? The denials used in the experiment were not many, few and no more than 10%, and the affirmations were at most 10%, a few and at least 10%. Given the authors’ hypothesis above, denials would be expected to lead to the character in the sentence being assigned to the complement set. Twenty unpaid volunteer subjects took part.
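One way to read this task is that each answer to the Sophie question settles which set the word that was taken to pick out. The Python sketch below encodes that reading: Yes places Sophie among the students who had a car (reference set), No places her among those who did not (complement set), and Don't know leaves attachment undetermined. This coding is an assumption made for illustration, not Sanford et al.'s own scoring procedure, and the function and enum names are hypothetical.

```python
from collections import Counter
from enum import Enum


class Attachment(Enum):
    REFERENCE = "reference set"
    COMPLEMENT = "complement set"
    UNDETERMINED = "undetermined"


def classify_answer(answer: str) -> Attachment:
    """Map an answer to 'Did Sophie have a car?' onto a focus set (assumed coding)."""
    normalised = answer.strip().lower()
    if normalised == "yes":
        return Attachment.REFERENCE
    if normalised == "no":
        return Attachment.COMPLEMENT
    return Attachment.UNDETERMINED  # "Don't know" or anything else


# Hypothetical answers collected for one quantifier.
answers = ["Yes", "no", "Don't know", "No"]
tally = Counter(classify_answer(a) for a in answers)
print({k.value: v for k, v in tally.items()})
```

Tallying these categories separately for denials and affirmations gives the kind of comparison on which the asymmetry reported below rests.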

The results showed that there was more reference set focus for positives than for denials, and that negative quantifiers (denials) led to the complement set focus pattern. Finally, “there is an asymmetry in focus patterns: Denial–forming quantifiers do not rule out reference set patterns, whereas those that do not form denials almost block complement attachment” (Sanford et al., 2001, p. 1097).

The second experiment aimed to investigate the focus effects observed with the including relation by using a self–paced reading paradigm. There were three conditions: the first used the quantifiers few and a few, the second at most 10 and not many, and the third no more than 10 and at least 10. There were 48 paid subjects and 32 passages. Each passage was presented one sentence at a time on a Visual Display Unit (VDU), and the subjects had to read it as fast as they could while still understanding it. They then had to answer a question at the end.

For the passages using few and a few, a difference in focus was found: few allowed both complement set and reference set reference, whereas a few led only to reference set focus. For the passages using not many and at most 10%, there was complement set focus for the former and reference set focus for the latter. For the final set of passages, at least 10% led to more reference set focus, whereas no more than 10% (a negative quantifier) showed no preference for either set. These results again indicate that negative quantifiers generally lead to complement set reference.

7.5.3 Conclusion to quantifiers and focus

This section has discussed experiments investigating the relationship between quantifiers and different focus effects. They show that quantifiers provide more than just information about proportions: quantifiers also lead subjects to focus on different types of sets.

7.6 Conclusion

This chapter has explored the area of quantifiers. It was divided into three different areas. Experiments which have been carried out were discussed and critiqued, and I have also made many suggestions for experiments which could be carried out in the future. The next chapter will discuss two experiments which I carried out; they are replications of experiments reported in this chapter.

Chapter 8

Replications of Experiments

8.1 Introduction

This chapter1 outlines two separate experiments that have been carried out using thesystem described in this thesis. Both experiments are replications of experiments discussedin Chapter 7. The results of these replications are analyzed and correlated with the resultsdiscussed in Chapter 7.

8.2 Motivations for replicating experiments

It is worth replicating experiments in order to confirm results that have been previously reported. Replication shows that the reported results were not due to some other factor that was not accounted for, and helps ensure that results are credible, reliable and valid (see §2.7). Experiments that seem particularly open to demand characteristics (see §2.8) should be replicated with as many of those demand characteristics as possible removed or controlled for. Equally, experiments can be replicated in order to check unexpected results. I used a slightly different methodology from the one used in the original experiments, and this might influence the results.

8.3 First Replication

8.3.1 Original Experiment

As discussed in §7.3.2, a study by Hakel (1968) also attempted to obtain a scale for frequency expressions. One hundred university subjects took part in the experiment, which took the form of a questionnaire. Subjects were told that the aim of the experiment was

1This chapter derives from Buckley and Vogel (2003b), joint work with the co–author of that paper. The co–author, my dissertation supervisor, approves of textual overlaps between this chapter and that paper.



to determine what the expressions meant to them. They had to give a numerical value between 1 and 100 for 20 frequency expressions, all of which appeared on the same sheet of paper. There is no information regarding whether the order of these expressions was randomised for each subject. The results showed that always had the highest median value and never the lowest. Hakel (1968) reports that variability is widespread in the results. His results were similar to those of Simpson (1944), with a 99% correlation between the rank orders of the medians.

The next section describes two replications which I carried out. §2.7 explains the scientific value of replicating experiments generally.

8.3.2 Replication

I replicated the above experiment using the web–based experimentation tool described in this thesis (see §3, §4 and §5 for a discussion of this system). The first screen presented to subjects contained the instructions for the experiment, which were virtually the same as those given in the original experiment. The next screen contained the questionnaire itself: all 20 quantifiers, each with a text box for the subject’s judgement. Subjects could therefore supply any text they wished; in practice, they supplied numeric values. Figure 8.1 shows the quantifiers used in the experiment.

Although the Hakel paper gives no information regarding randomisation, the order of the quantifiers on this screen was randomised for each subject in order to remove ordering effects. We also added a further dimension to this experiment. The original experiment left itself open to context effects: subjects may base their judgement for one quantifier on the judgement they had given for a previous one. We therefore obtained judgements on the same set of quantifiers a second time, but with only one quantifier appearing on each screen. The aim was to see whether the judgements people make differ markedly when they are presented with just one quantifier rather than with a whole list of quantifiers on the same screen. This modification to the original experimental design was intended to provide a within–experiment cross–check on the reliability of projections of the set of quantifiers onto a linear ordering.

The research uses the same set of English temporal quantifying expressions employed by Hakel (1968) (see Figure 8.1).

The web–based system on which the replication is built was designed to automate randomised presentation of multiple items in a single visual field, as well as to distribute items individually across multiple visual fields. The user could not navigate backwards across visual fields during the experiment, so ordering effects are strictly controlled. In the one–item–per–screen part of the experiment, the order of the 20 screens was also randomised. Thirty two subjects took part altogether. The mean age was 41, with 8 females and 25 males. Responses were not analysed by sex.
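The presentation scheme just described can be summarised as a per-subject sequence of screens. The Python sketch below is a hypothetical illustration of that scheme, not the implementation of the system described in §3 to §5: one screen lists all 20 expressions in a random order, followed by 20 single-expression screens in a fresh random order. The function name and the per-subject seed are assumptions.

```python
import random

# The 20 expressions of Figure 8.1.
EXPRESSIONS = [
    "almost never", "always", "about as often as not", "frequently",
    "generally", "hardly ever", "never", "not often", "now and then",
    "occasionally", "often", "once in a while", "rarely", "rather often",
    "seldom", "sometimes", "usually", "usually not", "very seldom",
    "very often",
]


def session_plan(expressions, seed):
    """Build the sequence of screens shown to one subject (hypothetical).

    The first screen lists every expression in a per-subject random order;
    it is followed by one screen per expression, in a fresh random order.
    """
    rng = random.Random(seed)  # a per-subject seed keeps the plan reproducible
    all_on_one = rng.sample(expressions, k=len(expressions))
    one_per_screen = [[e] for e in rng.sample(expressions, k=len(expressions))]
    return [all_on_one] + one_per_screen


# An iterator can only be advanced, mirroring the rule that subjects
# cannot navigate back to an earlier screen.
screens = iter(session_plan(EXPRESSIONS, seed=7))
first_screen = next(screens)
print(len(first_screen))
```

Drawing both orders from one seeded generator means that a subject's plan can be regenerated exactly when the responses are checked, while different subjects still see different orders.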

To test the results, I examined three issues. The first is a test similar to that constructed by


almost never, always, about as often as not, frequently, generally,
hardly ever, never, not often, now and then, occasionally,
often, once in a while, rarely, rather often, seldom,
sometimes, usually, usually not, very seldom, very often

Figure 8.1: List of expressions used in Hakel’s experiment

Hakel (1968), who measured the correlation of rank orderings. Two further tests were also planned. One was an alternative test of the similarity of rank orderings (both within the experiment and with respect to the results of Hakel (1968)), using a chi–squared test. The other was a correlation of the mean scalar scores for each of the temporal quantifiers examined, both within the experiment and with respect to the replicated study.

8.3.3 Results and Discussion

Table 8.1 shows the mean scores, median values and standard deviations for the quantifying expressions in the replication experiment reported here. The expressions are ordered according to their median scores. The average standard deviation of 9.93 is large and shows that the values given for the expressions are very variable. Looking at some of the individual results, one subject’s frequently was another subject’s very often, usually, generally, often or rather often; each of these expressions was assigned the value 75 by some subjects. Not often was the most variable expression, with a standard deviation of 20.03. This is due to two large outliers: two subjects gave values of 90 or more for it. There is a 99% correlation between the median values of the results here and the median values found by


Table 8.1: Results from our experiment

Frequency Expression     Mean    Median   Standard Deviation
always                   99.55   95        1.30
frequently               76.73   80        9.92
very often               84.10   75        5.65
usually                  79.67   65       11.70
generally                72.45   60       15.45
about as often as not    45.97   50       13.57
often                    71.97   35        9.84
rather often             66.36   25       18.28
now and then             25.70   20       13.96
sometimes                34.18   15       13.17
usually not              17.09   12       15.61
occasionally             30.67   10       14.83
seldom                   14.61   10        8.75
not often                23.85   10       20.03
once in a while          17.88    6        9.97
hardly ever               7.42    5        4.66
very seldom               8.66    3        5.52
rarely                    7.27    2        3.84
almost never              3.64    1        2.45
never                     0.03    0        0.17


Hakel (1968). Thus, it is possible to draw the same conclusion as Hakel did, namely that subjects are “exceedingly stable about being exceedingly imprecise”.
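The summary figures quoted for Table 8.1 can be checked directly from its standard deviation column. The Python sketch below transcribes that column (with the extraction-damaged row read as almost never) and recomputes the average standard deviation and the most variable expression; it uses only values printed in the table.

```python
from statistics import mean

# Standard deviations transcribed from Table 8.1 (single-screen presentation).
SD = {
    "always": 1.30, "frequently": 9.92, "very often": 5.65, "usually": 11.70,
    "generally": 15.45, "about as often as not": 13.57, "often": 9.84,
    "rather often": 18.28, "now and then": 13.96, "sometimes": 13.17,
    "usually not": 15.61, "occasionally": 14.83, "seldom": 8.75,
    "not often": 20.03, "once in a while": 9.97, "hardly ever": 4.66,
    "very seldom": 5.52, "rarely": 3.84, "almost never": 2.45, "never": 0.17,
}

average_sd = mean(SD.values())
most_variable = max(SD, key=SD.get)
print(f"average SD = {average_sd:.2f}")   # average SD = 9.93
print(f"most variable: {most_variable}")  # most variable: not often
```

Both values agree with the figures given in the text above.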

Additionally, a correlation between the rank orderings obtained in the original experiment and in the replication was computed. The correlation was 95%, demonstrating a high positive correlation between the rankings in the original work and those obtained here (where the items were randomly ordered in two different senses). Separately, a chi square test of significance was carried out to see whether the rank ordering of our results differed from the rank ordering of the results in Hakel’s (1968) experiment. A value of 1.19 was obtained with 19 degrees of freedom, so the difference between the rank orderings of the two sets of results is not significant at the p < 0.05 threshold.

As mentioned, the subjects in this experiment were required to give a second judgement for each quantifier, this time with each quantifier on a separate screen, which makes it possible to double–check individual responses. When items are presented all together in one visual field, even in a randomised order for each subject, responses to one item may influence responses to others, which could have an impact on the results. Because reverse navigation through the materials is not possible, the modified replication reported here allows both control of order effects and reliability checking of individual responses. There is a 99% correlation between the mean values obtained for the expressions when they were on separate screens and when they were all on the same screen. The greatest variation was for the expression now and then, with a mean of 25.70 when it was on the same screen as the other quantifiers, but 33.77 when it appeared on its own screen.

As in the comparison with Hakel (1968), two additional tests were carried out to examine the rank ordering of the temporal quantifiers, this time across the two presentation conditions. First, the correlation between the ranks of items presented in a single visual field and those presented individually was 99%. Secondly, a chi square test gave a value of 0.213 with 19 degrees of freedom, again showing no significant difference between the rank orderings of the quantifiers when they are presented on the same screen or on different screens. This suggests that subjects provided stable judgements for the quantifiers, and that whether the quantifiers appear on the same screen or on different screens does not influence the results.
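The within-experiment comparisons in the last two paragraphs can be recomputed from the means in Table 8.2. The Python sketch below calculates a Pearson correlation of the means, a rank correlation, and the expression with the largest gap between conditions. The text does not say which rank correlation coefficient was used, so Spearman's rho (Pearson's formula applied to ranks, with ties given their average rank) is an assumption here; with these values it comes out at about 0.99.

```python
from math import sqrt

# (separate screens, same screen) means transcribed from Table 8.2.
MEANS = {
    "always": (99.03, 99.55), "very often": (81.85, 84.10),
    "usually": (76.12, 79.67), "frequently": (75.12, 76.73),
    "often": (73.18, 71.97), "rather often": (72.18, 66.36),
    "generally": (70.91, 72.45), "about as often as not": (45.24, 45.97),
    "sometimes": (35.21, 34.18), "now and then": (33.77, 25.70),
    "occasionally": (28.52, 30.67), "once in a while": (22.52, 17.88),
    "not often": (19.42, 23.85), "usually not": (17.15, 17.09),
    "seldom": (15.39, 14.61), "very seldom": (10.79, 8.66),
    "rarely": (8.82, 7.27), "hardly ever": (8.33, 7.42),
    "almost never": (5.12, 3.64), "never": (0.06, 0.03),
}


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # positions i..j share the mean rank
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


separate = [s for s, _ in MEANS.values()]
same = [t for _, t in MEANS.values()]

r_means = pearson_r(separate, same)            # correlation of the means
rho = pearson_r(ranks(separate), ranks(same))  # Spearman's rho on the ranks
largest_gap = max(MEANS, key=lambda e: abs(MEANS[e][0] - MEANS[e][1]))
print(round(r_means, 3), round(rho, 3), largest_gap)
```

The largest gap is for now and then, matching the text; the chi square statistic is not reproduced because the text does not specify how the expected values were formed.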

8.3.4 Future work

There is a large body of research into quantifiers and whether they map to points on a scale, partly because people prefer to respond to questions containing quantifiers rather than to questions containing numbers. Moxey and Sanford (1993a) review research carried out in this area of mapping quantifiers onto a scale. Much of this research, including the experiments discussed in this chapter, has involved subjects making judgements about all of the quantifiers in the experiment, which raises within–subjects bias issues. Perhaps the experiment described in this chapter should be replicated again, this time with each subject making a judgement for only one quantifier.

Moxey and Sanford (1993a) also point out that, because quantifiers are vague, their scalar values could vary over time within an individual. A further experiment could have subjects judge the same quantifiers again at a later time and then measure how well the two sets of results correlate. However, the experimental paradigm adopted here, which attracts participants from the Internet, does not obviously lend itself to longitudinal studies. Certainly, this is an area to explore. Experiments like those of Hakel (1968) recruit undergraduate students who are required to participate as part of their coursework. Such experiments face their own difficulties in guaranteeing longitudinal participation, as discussed in §2.8.

Table 8.2: Results from our experiment

Frequency expression    Mean (separate screens)    Mean (same screen)
always                  99.03                      99.55
very often              81.85                      84.10
usually                 76.12                      79.67
frequently              75.12                      76.73
often                   73.18                      71.97
rather often            72.18                      66.36
generally               70.91                      72.45
about as often as not   45.24                      45.97
sometimes               35.21                      34.18
now and then            33.77                      25.70
occasionally            28.52                      30.67
once in a while         22.52                      17.88
not often               19.42                      23.85
usually not             17.15                      17.09
seldom                  15.39                      14.61
very seldom             10.79                      8.66
rarely                  8.82                       7.27
hardly ever             8.33                       7.42
almost never            5.12                       3.64
never                   0.06                       0.03

Moxey and Sanford (1993a) conclude that, since quantifiers are quite vague and confusing, their “communicative impact may perhaps be found in other aspects of their meaning”, which leads them to concentrate their research on the focus effects of quantifiers. In the work presented here, complement sets and their accompanying focus effects have not been explored at all. It would be interesting to examine the relationship between the scalar interpretation of quantifiers and their focus effects.

(8.1) Sandy often runs to campus.

(8.2) Leslie rarely runs to campus.

For example, in (8.1) referential access is highlighted for those events in which Sandy runs to campus, while in (8.2) there is easy access both to the events of Leslie running and to the complement set of events in which Leslie adopts other modes of transport to arrive at campus. The projection of temporal quantifiers onto a linear scale may seem at odds with virtually all work in model-theoretic semantics. Even so, the robust reliability demonstrated in making those projections has to be explained. If further work were to consider the scalar projection of complement set size estimation, it would provide an additional reliability check on the scalar results. It might also demonstrate that work in formal semantics could profit from an alternative construal of temporal quantification.

8.4 Second Replication

The second replication that I carried out was of Bass and O’Connor (1974), a magnitude estimation experiment in which one hundred and seventy-five subjects participated. Subjects first had to assign a number to sometimes and then judge the remainder of the words relative to the value they had given for sometimes. There were two conditions, one for expressions of frequency and one for expressions of amount (this experiment is discussed in detail in Chapter 7). Seventeen subjects participated in the replication. However, one subject’s results had to be deleted because they were extreme outliers: values such as 1,000,000 were given for many of the judgements. This subject may have been trying to distort the results of the experiment. It does not make sense to assign a value of a million to both always and none of the time, since these expressions are clearly at opposite ends of the frequency distribution. It is generally recommended that, if an outlier causes the results not to be normally distributed, the outlier’s results should be deleted and the remaining results reanalysed. I followed this procedure, so only 16 subjects’ results were used in the analysis.
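The outlier above was identified by inspection. A simple screening rule for magnitude estimates could be used instead. Because magnitude estimates are ratio judgements, the rule compares each subject's answers with the per-expression median of all subjects on a logarithmic scale. The function name, the data layout and the cut-off of 1.0 (a typical deviation of a factor of ten) are assumptions made for this hypothetical sketch. Any subject it flags would still need to be inspected before their results are deleted.

```python
import math
from statistics import median


def flag_outlier_subjects(responses, threshold=1.0):
    """Flag subjects whose answers sit far from the group on a log10 scale.

    responses: {subject_id: {expression: value}}
    threshold: hypothetical cut-off; 1.0 means the subject's typical answer
    differs from the per-expression group median by more than a factor of 10.
    Non-positive values are skipped because their logarithm is undefined.
    """
    expressions = {e for answers in responses.values() for e in answers}
    group_median = {
        e: median(a[e] for a in responses.values() if e in a) for e in expressions
    }
    flagged = []
    for subject, answers in responses.items():
        deviations = [
            abs(math.log10(value / group_median[e]))
            for e, value in answers.items()
            if value > 0 and group_median[e] > 0
        ]
        if deviations and median(deviations) > threshold:
            flagged.append(subject)
    return flagged


# Hypothetical toy data: s4 answers 1,000,000 to everything.
EXAMPLE = {
    "s1": {"always": 90, "never": 1, "sometimes": 20},
    "s2": {"always": 80, "never": 2, "sometimes": 25},
    "s3": {"always": 95, "never": 1, "sometimes": 15},
    "s4": {"always": 1_000_000, "never": 1_000_000, "sometimes": 1_000_000},
}
print(flag_outlier_subjects(EXAMPLE))  # prints ['s4']
```

Using medians for both the group reference and each subject's typical deviation keeps a single extreme subject from distorting the yardstick used to judge everyone else.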

The mean values for the frequency expressions in our replication and those obtained in the Bass and O’Connor (1974) experiment can be seen in Table 8.3. There are some differences in the mean values given for the expressions. However, the correlation between the two sets of results is 96%. This is not as high as the correlation obtained for the previous experiment, but it is still a high correlation and shows that the results are quite similar. The expression constantly showed the greatest difference between the two means, with a difference of 30.93. This is somewhat surprising, as constantly seems to be at the high end of the frequency scale. I would have expected quite ambiguous phrases, such as fairly many times or to some degree, to show bigger differences between the means. However, this was not the case.

Although there is a high correlation between the mean results, there is not a high correlation between the standard deviations of the two sets of results. This correlation is only 65%, which shows a considerable difference in the spread of the responses. The standard deviations for my replication are much higher than those for the Bass and O’Connor (1974) results. I believe the reason is that each expression the subjects were judging appeared on a separate page. Because subjects could not navigate backwards through the experiment, they had no previous answers on which to base their judgements. In the original experiment, subjects could see the values they had given in previous judgements and were perhaps basing their judgements on these, which would make their results less variable.

The results for the expressions of amount can be seen in Table 8.4. The mean results for the expressions of amount are also highly correlated, at 98%. The expression an extreme amount of had the greatest difference between the mean values given in the two experiments. Again, I would not have expected this; I would rather have expected a phrase such as to some extent to show the biggest difference. As with the frequency expressions, the standard deviations of the two sets of results are not highly correlated, which shows that the spread of responses differs considerably between the two data sets. An extraordinary amount of had the highest standard deviation in the replication experiment.

One weakness I find with this experiment is the number of judgements that the subjects had to make. As noted in Schutze (1996) and discussed in Chapter 2, subjects are prone to fatigue and boredom towards the end of an experiment, and their judgements may be less reliable at this point. This experiment is quite long and requires many judgements, and I believe fatigue could be influencing the subjects' later responses. However, the replication did counteract such effects. The original experiment used six fixed orders, whereas the replication presented the expressions in a completely randomised order. As a result, any effects of fatigue are spread across all of the expressions rather than concentrated on those that happen to appear last.
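Per-subject randomisation of the kind used in the replication can be sketched as below. The thesis does not describe how its randomisation was seeded. Deriving each subject's random seed from a subject identifier is an assumption made here, as are the function name and the seed prefix. This choice means the order shown to any subject can be recomputed later during analysis.

```python
import random


def presentation_order(items, subject_id, seed_prefix="exp1"):
    """Return a fresh random order of items for one subject.

    The generator is seeded from the (hypothetical) experiment prefix and the
    subject id, so the same subject always gets the same order, while
    different subjects get independent orders.
    """
    rng = random.Random(f"{seed_prefix}:{subject_id}")
    order = list(items)  # copy, so the master list is left unchanged
    rng.shuffle(order)
    return order


ITEMS = ["always", "often", "sometimes", "rarely", "never"]
order_s001 = presentation_order(ITEMS, "s001")
```

Unlike a small set of fixed orders, this gives every subject their own permutation of the full item list.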

8.4.1 Future Work

I believe it would be worthwhile to carry out an experiment with the same expressions, but where the subjects do not have to base their judgements on the value they gave to sometimes/some. Given the large number of expressions (83 in total), I believe it is possible that subjects are no longer basing their judgements on sometimes/some by the end of the experiment. If this experiment were carried out, its results could be analysed and correlated with those of the replication. If the correlation between the results were high, then I believe it would be possible to conclude that subjects are in fact not basing their judgements on sometimes/some. If the results were not correlated, then subjects are basing their judgements on sometimes/some.

Table 8.3: Replication of Bass and O’Connor (1974), expressions of frequency

Expression of frequency           Mean (replication)    Mean (Bass and O’Connor)
sometimes                         12.53                 19.42
always                            82.88                 58.01
continually                       80.44                 20.16
constantly                        80.63                 49.70
frequently if not always          57.00                 45.24
very often                        40.16                 42.45
a great deal of the time          45.03                 41.37
very frequently                   49.94                 40.02
a great many times                45.06                 39.28
usually                           44.00                 39.18
often                             35.38                 37.64
frequently                        38.44                 36.07
quite often                       45.94                 35.39
rather frequently                 36.22                 34.44
commonly                          41.22                 32.97
fairly often                      35.88                 32.64
fairly many times                 33.13                 30.65
some of the time                  12.97                 18.01
to some degree                    13.09                 15.52
now and then                      13.78                 15.19
occasionally                      14.56                 14.92
once in a while                   8.47                  10.22
not often                         6.19                  7.78
not very often                    5.66                  7.23
fairly infrequently               6.47                  6.99
infrequently                      7.69                  6.47
rather seldom                     6.09                  6.42
very seldom                       4.78                  4.72
rarely                            2.59                  4.56
very infrequently                 8.69                  4.54
seldom if ever                    2.38                  3.69
hardly at all                     2.16                  3.47
hardly ever                       2.72                  3.34
very rarely                       2.01                  2.99
almost never                      1.81                  2.63
seldom                            4.19                  0.33
none of the time                  0.19                  0.17
none at all                       0.25                  0.15
never                             0.25                  0.08

Table 8.4: Replication of Bass and O’Connor (1974), expressions of amount

Expression of amount              Mean (replication)    Mean (Bass and O’Connor)
some                              16.16                 18.63
all                               76.50                 66.12
an exhaustive amount of           54.66                 59.27
almost entirely                   62.16                 57.61
completely                        69.38                 57.35
an extraordinary amount of        69.31                 54.46
almost completely                 55.34                 51.38
an extremely abundant amount of   49.66                 48.89
an extreme amount of              63.47                 48.20
a great amount of                 43.44                 41.56
a great deal of                   41.78                 41.36
very much                         36.69                 40.59
a full amount of                  50.00                 40.50
a lot of                          36.69                 37.10
much                              34.41                 35.14
quite a bit of                    38.19                 34.24
a good bit of                     28.31                 32.65
a considerable amount of          39.06                 31.44
pretty much                       31.66                 30.04
fairly much                       36.09                 27.70
an ample amount of                34.00                 26.22
an adequate amount of             23.34                 24.07
a moderate amount of              20.59                 21.80
to some extent                    7.94                  13.42
to some degree                    11.41                 13.10
a limited amount of               7.81                  11.75
a little bit of                   4.13                  9.57
a small amount of                 3.63                  7.81
somewhat                          8.75                  7.51
comparatively little              3.66                  7.22
a little bit of                   4.44                  7.20
not much                          4.28                  7.02
a small degree of                 3.31                  5.27
very little                       3.00                  5.21
a slight amount of                3.38                  5.09
a meager amount of                3.34                  4.28
a scanty amount of                3.06                  3.68
a minimum amount of               3.94                  3.64
a trifling amount of              7.72                  3.13
scarcely any                      2.22                  2.98
a trivial amount of               6.38                  2.85
an insignificant amount of        2.03                  2.48
hardly any                        2.56                  2.28
none of the time                  0.09                  0.15

A possible alteration, however, would be for the value the subject gave to some/sometimes to appear on each slide in the experiment, so that they see it each time they make a judgement on an expression.
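The proposed alteration could be implemented by generating each page's prompt with the subject's own reference judgement included. The sketch below is hypothetical; the function name and the wording of the prompt are illustrative only.

```python
def build_pages(expressions, reference_word, reference_value):
    """Build one prompt per page, each repeating the subject's reference judgement.

    reference_word is the anchor expression (e.g. "sometimes") and
    reference_value is the number this subject assigned to it.
    """
    reminder = f'You gave "{reference_word}" the value {reference_value}.'
    return [
        f'{reminder} What value would you give "{expression}"?'
        for expression in expressions
    ]


pages = build_pages(["often", "rarely"], "sometimes", 10)
for page in pages:
    print(page)
```

Because the reminder is built once and prefixed to every prompt, the anchor value shown is identical on every page.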

8.5 Discussion

Schutze (1996) argues at length about the empirical methods used in linguistic theory, focusing both on the factors that impinge on linguistic judgements and on proper methods for eliciting judgements. Given formal model-theoretic semantic approaches to interpreting quantifiers, it isn’t clear that forcing a scalar interpretation of temporal quantifiers is well founded. Nevertheless, this chapter does demonstrate a resounding reliability in adult judgements about projections of temporal quantifiers onto a linear scale. This addresses one of the criticisms that Schutze (1996) makes about the comparability of experiments that purport to depend on linguistic judgements. It does so through what might otherwise seem ‘just’ a replication of the results of prior work (see §2.7 for a discussion of the merits of replicating experiments). The research demonstrates that, even with a wider subject pool, with controls against ordering effects, and with rigorous testing of correlations, the judgements remain reliable.

Thus, it is clear that the results obtained with an experimental paradigm that appeals to a more diverse subject pool than the original (and potentially, though not actually, a larger subject pool) are reliable. The materials are controlled against effects of order of presentation. There are also additional reliability checks, in which the items to be classified are presented a second time, in individual random orders. The results are compared with the earlier results both in the mean scalar value assigned to each quantifier and in the overall rank ordering. The results are also compared internally between the single-visual-field presentation and the randomized individual-item presentation for each subject. In this latter comparison, correlations of mean scalar value per quantifier and of rank orders were also analyzed. The primary finding was a verification of the earlier results, with high positive correlations and no evidence of significant differences in distributions for the rank ordering.

8.6 Conclusion

This chapter has detailed two different experiments, the originals of which were discussed in Chapter 7. I carried out replications of these two experiments using the web–based experimentation system described in previous chapters. I reported the results of the replications and correlated them with the results of the original experiments. High correlations were found between the mean values, validating the previous results and showing the experiments to be reliable. Suggestions were made for future work in these two areas.

Chapter 9

Conclusion

9.1 Introduction

This chapter concludes the thesis. It provides a synopsis of the topics dealt with and the work carried out, and recapitulates the main results. I will also provide some suggestions for future work which follow naturally from the work I carried out for this thesis.

9.2 Summary of work

The main part of this thesis involved work on a web–based experimentation system. In this thesis I have described the modifications which I had to make to the system to improve its functionality, along with the additional facilities which were added to it. The most notable additions are the randomisation of questions, the new labelling systems, and the new data structure for storing data to allow more flexible access to them. Chapters 4, 5 and 6 detailed these issues and my approaches to them.

I also (mainly in Chapter 2) reported research into the area of carrying out experiments on the web. This first included a discussion of the advantages and disadvantages of the various other methods available for carrying out experiments. I then focused on web–based experiments and why the web is a useful platform for carrying out experiments, while also mentioning its weaknesses. Demand characteristics, sampling and bias are three areas of vital importance when carrying out experiments, as they can all influence the results which are obtained. I discussed these areas and also suggested some solutions to control for them. Examples include using half volunteer, half non–volunteer subjects in experiments; posting to high-membership, low-traffic newsgroups to obtain a web–based sample; and randomisation of materials. Sample bias may well exist in Internet sampling for some research questions. However, it seems that in general many research questions do not have a likely bias that derives from using the Internet as a sampling tool. Experiments discussed in Chapter 7, and critiqued partly on their narrow sampling strategies, were replicated using Internet sampling, and the replications produced results that were highly correlated with the originals.



These results are the main empirical contribution of this work. Previous research into the area of quantifiers was also reported. I discussed quantifiers in relation to obtaining a scale for them, how context influences their interpretation and, finally, the focus effects which they lead to. Many experiments were reported, and suggestions were made as to where their methodologies or sampling procedures could be improved. For example, many of the experiments relied on undergraduate students, which should be avoided because of the bias it introduces.

9.3 Future Work

Many improvements can be made to the web–based experimentation system described in this thesis. Firstly, individualised usernames and passwords could be generated for each subject. Secondly, the experimenter could be automatically notified if a subject tries to participate in an experiment from an IP address which has already taken part. Thirdly, the layout of the webpages could be enhanced. Fourthly, the formatting of files could be modified to include markup strategies that are more compatible with a wider range of presentation and analysis packages. Finally, and most significantly, some statistical analysis tools could be added to the analysis side of the system. This would be a great improvement, as experimenters could then check whether their results were significant. They could use the chi square test of significance, a t–test, or a range of other measures used in inferential statistics.
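As an illustration of the kind of tool that could be added, the following sketch computes Welch's t statistic for two independent samples, together with the Welch–Satterthwaite degrees of freedom, using only the Python standard library. It is a hypothetical example and not part of the existing system. Turning t and the degrees of freedom into a p-value requires the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: two independent samples, each with at least two observations and
    not both with zero variance (otherwise the standard error is zero).
    Uses sample variances (n - 1 denominator), as the test requires.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.3f}, df = {df:.1f}")  # prints t = -1.000, df = 8.0
```

Welch's version is used here rather than Student's pooled-variance t-test because it does not assume that the two groups have equal variances. That assumption is doubtful when, as in the Bass and O’Connor replication, the spread of responses differs between samples.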

I carried out two replications based on the quantifier research which was discussed; however, many more replications could be carried out based on the suggestions I have made. For example, many of the experiments involved undergraduate subjects, and these experiments should really be carried out using a more diverse sample.

Bibliography

Barton, S. B. & Sanford, A. J. (1990). The control of attributional patterns by the focusing properties of quantifying expressions. Journal of Semantics, 7, 81–92.

Barwise, J. & Cooper, R. (1981). Generalised quantifiers and natural language. Linguistics and Philosophy, 4, 159–219.

Bass, B. & O’Connor, E. (1974). Magnitude estimations of frequency and amount. Journal of Applied Psychology, 59, 313–320.

Birnbaum, M. H. (2000). Psychological Experiments on the Internet, chap. Decision making in the lab and on the web, pp. 3–34. Academic Press.

Buckley, M. & Vogel, C. (2003a). Improving Internet Research Methods: A Web Laboratory. In International Association for development of the information society.

Buckley, M. & Vogel, C. (2003b). Quantifying Temporal Quantifiers. Poster at AICS 2003.

Buckley, M. & Vogel, C. (2003c). A Web Laboratory for Improved Internet Research Methods. Tech. rep. TCD-CS-2003-XX, Computational Linguistics Group, Trinity College, University of Dublin.

Cowart, W. (1996). Experimental Syntax: Applying Objective Methods to Sentence Judgements. SAGE Publications.

Featherston, S. (2002). Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. University of Tuebingen.

Greenbaum, S. & Quirk, R. (1970). Elicitation experiments in English: Linguistic Studies in use and attitude. University of Miami Press.

Hakel, M. (1968). How often is often? American Psychologist, 23, 533–534.

Hewson, C. & Charlton, J. P. (2003). Administration of the Multidimensional Health Locus of Control Scale: a Comparison of Internet and non–Internet Methods. Bolton Institute of Higher Education.

Hewson, C., Yule, P., Laurent, D., & Vogel, C. (2003). Internet Research Methods – a practical guide for the social and behavioural sciences. Sage Publications.



Hourihane, F. (2002). Automated User Administration for a Web-Based Experiment Server in the Human Sciences. http://www.cs.tcd.ie/courses/csll/projects4.html. BA (Mod) CSLL, Fourth Year Project, Trinity College, University of Dublin.

Keller, F., Corley, M., Corley, S., Konieczny, L., & Todirascu, A. (1998). WebExp: A Java Toolbox for Web-Based Psychological Experiments. Tech. rep. HCRC/TR-99, University of Edinburgh.

Kenny, S. (1998). A Generic Automatic Experiment Creation and Presentation Tool. http://www.cs.tcd.ie/courses/csll/projects4.html. BA (Mod) CSLL, Fourth Year Project, Trinity College, University of Dublin.

Krantz, J. H., Ballard, J., & Scher, J. (1997). Comparing the results of laboratory and World Wide Web samples of the determinants of female attractiveness. Behaviour Research Methods, Instruments and Computers, 29, 264–269.

MacLeod, C. M. (1999). The item and list methods of directed forgetting: test differences and the role of demand characteristics. Psychonomic Bulletin and Review, 6 (1), 123–129.

Mann, C. & Stewart, F. (2000). Internet Communication and Qualitative Research – A Handbook for Researching Online. Sage Publications.

McGowan, C. (1999). Extension of a Web-Based Environment for Cognitive Science Experiments, with Testing for Metaphor. http://www.cs.tcd.ie/courses/csll/projects4.html. BA (Mod) CSLL, Fourth Year Project, Trinity College, University of Dublin.

Moxey, L. & Sanford, A. J. (1987). Quantifiers and Focus. Journal of Semantics, 5, 189–206.

Moxey, L. & Sanford, A. (1993a). Communicating Quantities. Lawrence Erlbaum Associates.

Moxey, L. & Sanford, A. (1993b). Prior expectations and the interpretation of natural language quantifiers. European Journal of Cognitive Psychology, 5, 73–91.

Moxey, L., Sanford, A., & Dawydiak, E. (2001). Denials as Controllers of Negative Quantifier Focus. Journal of Memory and Language, 44, 427–442.

O’Brien, C. & Vogel, C. (2003). Spam Filters: Bayes vs. Chi-squared; Letters vs. Words. In International Symposium on Information and Communication Technologies.

Orne, M. T. (1959). The nature of hypnosis: Artifact and essence. Journal of Abnormal and Social Psychology, 58, 277–299.


Orne, M. T. (1962). On The Social Psychology of the Psychological Experiment: With Particular Reference to Demand Characteristics and Their Implications. American Psychologist, 17 (11), 776–783.

Pepper, S. & Prytulak, L. S. (1974). Sometimes Frequently Means Seldom: Context Effects in the Interpretation of Quantitative Expressions. Journal of Research in Personality, 8, 95–101.

Rosenthal, R. & Rosnow, R. L. (1975). The Volunteer Subject. John Wiley & Sons.

Rosnow, R. L. (2002). The Nature and Role of Demand Characteristics in Scientific Inquiry. Prevention and Treatment, 5, Article pre0050037c.

Ryan, N. (2001). Web-based experimentation in Cognitive Science. http://www.cs.tcd.ie/courses/csll/projects4.html. BA (Mod) CSLL, Fourth Year Project, Trinity College, University of Dublin.

Sanford, A., Moxey, L., & Paterson, K. (1996). Attentional focusing with quantifiers in production and comprehension. Memory and Cognition, 24 (2), 144–155.

Sanford, A., Williams, C., & Fay, N. (2001). When being included is being excluded: A note on complement set focus and the inclusion relation. Memory and Cognition, 29 (8), 1096–1101.

Schultz, D. P. (1969). The human subject in psychological research. Psychological Bulletin, 72, 214–228.

Schutze, C. (1996). The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. The University of Chicago Press.

Silverman, I. (1965). Motives underlying the behavior of the subject in the psychological experiment. Presented at the Symposium titled “Ethical and methodological problems in social psychological research”, American Psychological Association, Chicago, Illinois.

Silverman, I. (1977). The Human Subject in the Psychological Laboratory. Pergamon Press.

Simpson, R. (1944). The specific meanings of certain terms indicating differing degrees of frequency. Quarterly Journal of Speech, 30, 328–330.

Smart, R. (1996). Subject selection bias in psychological research. Canadian Psychologist, 7 (a), 115–121.

Smith, M. A. & Leigh, B. (1997). Virtual subjects: using the Internet as an alternative source of subjects and research environment. Behaviour Research Methods, Instruments and Computers, 29 (4), 496–505.


Wallsten, T. S., Fillenbaum, S., & Cox, J. A. (1986). Base Rate Effects on the Interpretations of Probability and Frequency Expressions. Journal of Memory and Language, 25, 571–587.

Weber, E. U. & Hilton, D. J. (1990). Contextual Effects in the Interpretations of Probability Words: Perceived Base Rate and Severity of Events. Journal of Experimental Psychology: Human Perception and Performance, 16 (4), 781–789.

Appendix A

Examples

This appendix provides a sample of the responses which were posted in reply to my call for participation:

• Indeed, a number of possible responses to your words spring quickly to mind....

• I only have one response: I wish for you an offline experience. Go away (please).

• Actually, this could be fun and change the statistics.

• Sniff sniff, smells like a troll to farm e-mail addresses

• This explains the required web cam.

• IOW, it’s a psych test. Want to see if you can analyze us and prove something, either that we are a “crazy” social subgroup or we aren’t. Which is it this week?

• Sounds a lot like spam too.

• Maria is William F. your father? say howdy to Klan Moriarty & tip a pint’o guiness (or 2) for me

• You’re trying to get us to do your homework ? Its really Jenna Bush!! Like ALL the Bush’s, she needs it to be handed to her!

• No, no, no. It should read like this:

REQUEST FOR URGENT BUSINESS RELATIONSHIP FROM DUBLIN IRELAND

FIRST, I MUST SOLICIT YOUR STRICTEST CONFIDENCE IN THIS EXPERIMENT. THIS IS BY VIRTUE OF ITS EXPERIMENTAL MODALITIES AS BEING UTTERLY CONFIDENTIAL AND ’TOP SECRET’. I AM SURE AND HAVE CONFIDENCE OF YOUR ABILITY AND RELIABILITY TO PARTICIPATE IN AN UNDERTAKING OF THIS GREAT MAGNITUDE INVOLVING A PENDING RESEARCH PROJECT REQUIRING MAXIIMUM CONFIDENCE.

I AM TASKED WITH A RESEARCH PROJECT INVOLVING THE RECOGNITION OF VARIOUS AND SUNDRY ENGLISH WORDS WHICH ARE PRESENTLY TRAPPED IN MY NATIVE COUNTRY. IN ORDER TO SUCCESSFULLY COMPLETE THE PROJECT I SOLICIT YOUR ASSISTANCE TO ENABLE ME TO TRANSFER INTO YOUR ACCOUNT THE IMPOUNDED VOCABULARY.

BY VIRTUE OF MY STATUS AS A STUDENT I CANNOT TRANSFER THE WORDS ON MY OWN NAME. I THEREFORE HAVE BEEN DELEGATED AS A MATTER OF TRUST BY MY COLLEAGUES OF THE UNIVERSITY TO LOOK FOR AN OVERSEAS PARTNER INTO WHOSE ACCOUNT WE WOULD LIKE TO TRANSFER THE FILES, CONTAINING 21,320 (TWENTY TWO THOUSAND, THREE HUNDRED) ENGLISH WORDS. WE HAVE AGREED TO SHARE THE VOCABULARY THUS: 1. ALL RECOGNIZED VERBS FOR THE ACCOUNT OWNER 2. ALL ADJECTIVES AND PRONOUNS FOR US 3. ALL NOUNS AND ARTICLES TO BE USED TO SETTLE ALL LOCAL AND FOREIGN VOCABULARY RECOGNITION NEEDS.

PLEASE, NOTE THAT THIS TRANSACTION IS 100TRANSFERLATEST SEVEN (7) SCHOOL DAYS FROM THE DATE OF THE RECEIPT OF THE FOLLOWING INFORMATIOM: YOU USERID NAME, YOUR PASSWORD, YOUR MOTHERS MAIDEN NAME AND THE PIN NUMBER FOR YOUR ATM CARD. WE WILL USE YOUR NAME AND PASSWORD ONLY TO TRANSFER THE FILES INTO YOUR ACCOUNT.

WE ARE LOOKING FORWARD TO DOING THIS BUSINESS WITH YOU AND SOLICIT YOUR CONFIDENTIALITY IN THIS TRANSATION. PLEASE ACKNOWLEDGE THE RECIEPT OF THIS CORRESPONDENCE SOONEST.

• That’s going to be a bit difficult if the account gets revoked because of inappropriate posting, now isn’t it? Still, I expect she’ll learn something from the experience.

• Hey Maria! We also need somebody for a great experiment. Listen to this! We need a girl to stand on the head of somebody we know while drinking lots of water. See? Even President Bush is laughing over this one. Listen guys, Americans are so good to organize everything with everything placed in the proper spot and so on, so why don’t we get them busy to make a big party? President Bush must be good at this, I’m sure. Even Secretary of State Colin Powell looks like a good guest when he comes in peace. Yes, just like I said, he is highly intelligent, If he wasn’t he wouldn’t be able to occupy such a position, the man is just a little too excited about technology and power. Right? These guys need to quit their endeavors to fix the world, It’s time to fix the human kind. Victory is when man prevails over bureaucracy and laws by reason of mercy.
