
Cloning Identities via Chatbot Learning

Andy Walton

BSc Information Systems 2004/2005

The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

(Signature of student) _______________________________


Summary

The objective of this project was to investigate the extent to which chatbot technologies could be used to “clone” the identity of an individual. In order to do this, I had to conduct research into chatbots and the way that they learn, choose a methodology by which to develop the project, design and implement the system, and finally conduct an evaluation, via comparison to other chatbots and qualitative analysis from users, to answer the above question.

The completed system models the personality of Joe Strummer, a well-respected musician who sadly died in December 2002. It is based on AIML, a derivative of XML, and runs in a user’s web browser. It is kindly hosted by Pandorabots.com, at http://www.pandorabots.com/pandora/talk?botid=ffeb5bc2ae353514.


Acknowledgements

Huge thanks to Eric Atwell, my supervisor, for his help and willingness to point me in the right direction over the past year, and Brandon Bennett, my assessor, for his positive feedback at the Mid-Project Report stage and at the progress meeting.

Thanks also to the other three-quarters of The Attic Project - Mike (vocals), CJ (guitar) and Andrew (drums) - for lending their punk rock knowledge to the evaluation of this project.

Last but not least, thanks to my housemate Dave, for reminding me that sometimes, just sometimes, a trip to the pub is just what a man needs.

This project is of course dedicated to the late and very great Joe Strummer, who has provided both an excellent choice of subject and much listening material over the last seven months. I now feel like I know him better than some members of my own family.


1. INTRODUCTION
1.1 AIM
1.2 OBJECTIVES
1.3 MINIMUM REQUIREMENTS
2. BACKGROUND RESEARCH
2.1 OVERVIEW OF CHATBOT TECHNIQUES
2.2 CHATBOT TRAINING METHODS
2.3 ISSUES RELATING TO CORPORA
3. PROJECT SCHEDULING
3.1 IDENTIFICATION OF DELIVERABLES
3.2 PROJECT SCHEDULE
3.3 REVISIONS TO SCHEDULE
3.4 SELECTION OF METHODOLOGY
3.4.1 Waterfall Model
3.4.2 The Spiral Model
3.4.3 Prototyping models
3.4.4 Final selection
4. DESIGN
4.1 CHOICE OF SOFTWARE
4.1.1 AIML
4.1.2 Elizabeth
4.1.3 JFred
4.1.4 Final Selection
4.2 CHOICE OF DEVELOPMENT TECHNIQUE
4.2.1 Pandorawriter
4.2.2 Least Frequent Word Approach
4.3.3 Combined Pandorawriter/Least Frequent Word Approach
4.3.4 Combined Pandorawriter/Least Frequent Word Approach with additions
4.3.5 Final Selection
4.4 CHOICE OF SUBJECT
4.5 METHODOLOGY FOR SYSTEM IMPLEMENTATION
4.5.1 Prototyping
4.5.2 Acquisition of Corpus
4.5.3 Conversion of Corpus
4.5.4 Identification of Keywords
4.5.5 Additions to Generated Input
5. IMPLEMENTATION
5.1 PROTOTYPING
5.2 ACQUISITION OF CORPUS
5.3 CONVERSION OF CORPUS
5.4 IDENTIFICATION AND ADDITION OF KEYWORDS
5.5 ADDITIONS TO GENERATED INPUT
6. EVALUATION
6.1 IDENTIFICATION OF CRITERIA FOR EVALUATION OF SYSTEM
6.2 EVALUATION PART 1: TEST CONVERSATION AND COMPARISON TO ALICE
6.2.1 Evaluation plan
6.2.2 Evaluation results
6.3 EVALUATION PART 2: QUALITATIVE ANALYSIS
6.3.1 Evaluation plan
6.3.2 Evaluation results
7. DISCUSSION AND CONCLUSIONS
8. BIBLIOGRAPHY
APPENDIX A – PROJECT REFLECTIONS
APPENDIX B – SCHEDULE GANTT CHART
APPENDIX C – GLOSSARY OF TERMS
APPENDIX D – SAMPLE AIML FILE
APPENDIX E – SCREENSHOTS
APPENDIX F – LIST OF SYSTEM FILES


1. Introduction

1.1 Aim

The aim of this project is to investigate the extent to which a chatbot can assume the

personality of a specific human through conversational learning.

1.2 Objectives

• Conduct a programme of background reading on how chatbots learn, the essence of human identity, and methods that can be used to train chatbots
• Produce a survey of the literature covered in the background reading
• Produce a development plan for the system, stating what methodologies are to be used and the reasons for these choices
• Select specific humans or entities to model, and select corresponding data for the chatbots
• Devise a method for converting training set data into chatbot-readable form (AIML)
• Produce a personality modelling system based on the methodologies described earlier and the entities selected for modelling


1.3 Minimum Requirements

• A survey of literature on how chatbots learn
• A development plan for the system, stating methodologies to be used
• A method or system for converting input data into chatbot-readable form (AIML)
• A basic prototype system which exhibits some of the personality traits of the specified individual


2. Background Research

2.1 Overview of chatbot techniques

The first step in this project was to conduct a study into the way that chatbots can learn and the methodologies that could be used to train them. I also needed to conduct research into the essence of human identity, in order to get a better idea of how to use these methodologies effectively in the creation of my system. Before we look at this, though, it is worth examining how different chatbots store their conversation data. All chatbots use a scripting language of some kind to store information. One of the most popular chatbots is ALICE, or the Artificial Linguistic Internet Computer Entity to give it its full title. ALICE uses a language called AIML to store information to be recalled in a chat situation. As Wallace explains in [1], AIML is a derivative of XML, the versatile eXtensible Markup Language. It represents language as a series of key phrases (known as patterns) and responses to those key phrases (known as templates). The following is an example of some simple AIML code:

<category>

<pattern> HELLO </pattern>

<template> Hi! How are you? </template>

</category>


This snippet of code is an example of an AIML category. Each category contains an input from the user (the pattern) and an output from the chatbot (the template). In this instance, the chatbot will respond to the input “hello” with the output “Hi! How are you?”. Note that everything between the pattern tags is in capitals: AIML interpreters automatically convert input to capitals and remove punctuation before patterns are matched.
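This normalisation can be sketched roughly in Python (an approximation only; the exact rules vary between AIML interpreters):

```python
import string

def normalise(text: str) -> str:
    """Approximate AIML input normalisation: strip punctuation,
    collapse whitespace, and convert to capitals before matching."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.upper().split())

print(normalise("Hello!"))        # HELLO
print(normalise("How are you?"))  # HOW ARE YOU
```

Under this scheme, the inputs “hello”, “Hello!” and “HELLO” all match the same pattern.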

There are, naturally, more functions available than simple pattern matching. (ALICE and other AIML interpreters would be pretty poor conversationalists if this were not the case!) In fact, [2] shows us that there is an extensive list of tags that AIML can recognise. The beauty of AIML, however, and the reason for its acceptance within the AI community, is its simplicity and ease of use. It is possible to create very complex and involved conversational scripts using only a few simple tags; in fact, this was one of the primary principles on which AIML was founded ([3]). Using the <that> tag to refer to the subject of the last piece of dialogue, and <srai> tags to implement recursion, AIML combines power and ease of use.
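As a small sketch of these two tags (assuming the greeting category shown earlier), <srai> can reduce a synonymous input to an existing pattern, while <that> can make a reply conditional on the bot’s previous output:

```xml
<category>
  <pattern>HI THERE</pattern>
  <!-- Recursion: treat "hi there" as if the user had said "hello" -->
  <template><srai>HELLO</srai></template>
</category>

<category>
  <pattern>FINE THANKS</pattern>
  <!-- Only matches if the bot's last output was "Hi! How are you?" -->
  <that>HI HOW ARE YOU</that>
  <template>Glad to hear it!</template>
</category>
```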

However, not all chatbot systems are based on AIML. The Elizabeth system, developed by Peter Millican, uses its own scripting language, more similar to a standard programming language than the markup-based AIML. [4] explains fully how the system works, but it is fundamentally based on a “notation character” system, with a different character at the beginning of each line of the script denoting how the system is to handle it. (Is it, for example, a pattern to be matched? A response to a pattern? Or a welcome message?) Elizabeth also uses a system of input and output transfers to ensure that input is in a system-readable form, and output is in a form that makes sense to the user. As [5] explains, an example of this could be changing “Mum” to “Mother” on input (to prevent the system having to have two separate yet virtually identical sets of responses, one for the word “Mother” and one for “Mum”).
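The input-transfer idea can be sketched in Python (an illustration of the concept only, not Elizabeth’s actual script syntax):

```python
# Canonicalise synonyms on input so one set of responses covers both forms.
INPUT_TRANSFERS = {"MUM": "MOTHER", "DAD": "FATHER"}

def apply_input_transfers(sentence: str) -> str:
    """Replace each word with its canonical form before pattern matching."""
    words = sentence.upper().split()
    return " ".join(INPUT_TRANSFERS.get(word, word) for word in words)

print(apply_input_transfers("my mum is here"))  # MY MOTHER IS HERE
```

A matching set of output transfers would then restore user-friendly forms in the bot’s replies.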

What these systems mean is that training a chatbot is essentially a case of converting natural language into a form that can be read by a chatbot (i.e. AIML or Elizabeth script). There are many ways of doing this, some more practical than others in a given situation. The raw script can simply be typed in, as you would a program written in, say, Java, to produce a chatbot that will probably work fairly effectively. This is one of the main methods for editing Elizabeth scripts ([4]), but it can be done in AIML as well. One of the strengths of AIML over Elizabeth is that not only is it freely available over the Web, but it can be interacted with online without requiring any downloads. This has given rise to websites like Pandorabots.com, where users can create their own collection of AIML-compatible robots, trained in a variety of different ways, and post them on the Internet for others to use. One


such method is the “train” interface. The Pandorabots tutorial offered in [7] is largely based

around this interface. In this case, users converse with the robot as normal, but have the

option to click a button and alter the response of the robot at any given stage of the

conversation. The program will then generate AIML script for the chatbot to use in future

conversation based on the altered responses. However, purists would argue that in these

cases the chatbot is not actually being trained. Rather, it is simply being directly told what to

do and carrying out those instructions in a way that all computers do. If the chatbot is to

“learn”, it must be able to generate its rules in a more natural way.

2.2 Chatbot training methods

One interesting way of doing this is provided by AIML’s Targeting function. This is similar to the Training interface provided by Pandorabots.com and discussed previously, but provides an AIML-based rather than web-based interface. In layman’s terms, this means that the bot can learn through a natural conversational process, with the user telling the bot that it has made a mistake and the bot asking what its response should be under the circumstances, in a much more “natural” way. An example of this (taken from [1]) is provided below:

The art of AIML writing is most apparent in creating default categories, which provide noncommittal replies to a wide range of inputs. Here is a very nice example found on the alicebot-general mailing list:

User: Where do I find the webpage with Australopithecus?

Alice: Where do you want to find it?

User: Bad Answer Alice, time to Learn a new Line

Alice: How would you answer "Where do I find the webpage with Australopithecus?"

User: I would say: "http://www.ancientskulls.net/australopithecus.html"

Alice: Thanks, human. Try asking me again.

User: Where do I find the webpage with Australopithecus?

Alice: http://www.ancientskulls.net/australopithecus.html
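The category that Targeting writes out after an exchange like this might look something like the following (a sketch; the pattern is normalised to capitals with punctuation removed):

```xml
<category>
  <pattern>WHERE DO I FIND THE WEBPAGE WITH AUSTRALOPITHECUS</pattern>
  <template>http://www.ancientskulls.net/australopithecus.html</template>
</category>
```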


In this example, the user creates a default AIML category using pre-defined keywords (the sentence “Bad Answer Alice, time to Learn a new Line” triggers the Targeting process). Learning in this way allows chatbots to build up a larger repertoire of responses over the course of natural conversation. However, while this may be fine for chatbots that have been designed to chat about a particular topic, it is of little use for larger-scale chatbots such as ALICE, whose purpose is to emulate all of human speech and to be able to provide an involving conversation about any topic. In order to do this, we need a way of translating natural language straight into AIML on a large scale. This is done using a corpus.

2.3 Issues relating to corpora

The word “corpus” is defined by the Natural Language Processing dictionary ([8]) as “a large body of natural language text used for accumulating statistics on natural language text. The plural is corpora. Corpora often include extra information such as a tag for each word indicating its part-of-speech, and perhaps the parse tree for each sentence”. It is effectively a training set of language. As the definition suggests, a corpus generally has markup tags on it, similar to AIML. In fact, it is the similarity between the XML used to mark up a corpus and the XML used in AIML that makes AIML so effective at tasks such as this, as pointed out in [5] and [6]; this has contributed greatly to the widespread acceptance of ALICE by the chatbot community. Converting a corpus to AIML is usually a case of simply converting one form of XML to another, while converting a corpus to Elizabeth script involves major changes. A Java application can be used to convert the corpus into AIML (in [6] the corpus is in English, while in [5] the corpus is an Afrikaans one). However, even converting into AIML is not always as simple as it may first appear.
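At its simplest, the conversion can be sketched as pairing alternating turns into categories, here in Python for illustration (a hypothetical, already-cleaned corpus; real corpora also need their markup tags handled):

```python
import string

def to_category(pattern: str, template: str) -> str:
    """Turn one utterance/reply pair into an AIML category, normalising
    the pattern as an AIML interpreter would (capitals, no punctuation)."""
    norm = pattern.translate(str.maketrans("", "", string.punctuation)).upper()
    return (f"<category>\n  <pattern>{norm}</pattern>\n"
            f"  <template>{template}</template>\n</category>")

# Alternating speaker turns: each turn becomes a pattern,
# and the turn that follows it becomes the template.
turns = ["hello", "Hi there!", "how are you", "I am fine thank you. And yourself?"]
aiml = "\n".join(to_category(p, t) for p, t in zip(turns[::2], turns[1::2]))
print(aiml)
```

The complications discussed below are precisely the reasons why real conversions are rarely this simple.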

Firstly, there are myriad tagging schemes for natural language, each subtly different from the last. The differences between tagging systems are explored in [9]; however, I feel they are only tangentially relevant to this project. Secondly, and perhaps more importantly, corpora often contain mistakes or inaccuracies in the tagging used on them, and often the dialogue within the corpora does not tie in well with the nature of chatbot conversation. A chatbot conversation has a highly defined structure: two participants (the human and the chatbot) take regular turns to speak in (usually) short sentences. Observe the following example:


HUMAN: hello

CHATBOT: Hi there!

HUMAN: how are you

CHATBOT: I am fine thank you. And yourself?

Human conversation rarely, if ever, follows these patterns. Rather, there are often more than two speakers, who frequently interrupt each other (meaning they do not take regular turns) and who occasionally break into long monologues. ([6]) As such, it is worth stating that not all texts would make good corpora, even if they are tagged correctly. Although there are instances of texts such as a wad of newspapers read aloud being used as corpora, in reality successful conversion into AIML depends on the corpus being as close to actual human speech as possible. Of course, choosing a corpus for use in a project is often a case of taking whatever corpora are available and selecting the most suitable. An interesting side note to this is that it should not be assumed that all chatbot scripts are designed to produce a chatbot that chats in exactly the same way that a human does.

[10] documents the creation of a different kind of chatbot: a means of accessing information about the Muslim holy text the Qur'an. In this instance, the chatbot was designed to take an ayyaa (verse) from the Qur'an as input, and match it to an output giving the next ayyaa, amongst other information. Designed as a teaching tool for Muslims, it would of course be useless to try to train this chatbot from a standard conversation-style corpus. As such, corpus choice should take into account the suitability of the corpus to the task, as well as its apparent quality. In terms of the tagging and syntax errors which frequently abound in corpora (for example, spelling mistakes or a missing closing tag), the only way to avoid them is to go through the corpus and manually edit it until it features fully legitimate XML.

In this project I will need to consider issues like this carefully. I need to select a subject (or preferably subjects) suitable to be modelled in this way. Suitable candidates will need to have a sizeable corpus of conversational data that I can use to model them. As it is obviously extremely unlikely that any of the individuals shortlisted would have ever taken part in a study designed to create an official corpus, I will have to look elsewhere to find a training set of data. One possibility is modelling a singer or lyricist. In this case, it would need to be an individual with a long career and a lot of produced output, for example Joe


Strummer of punk band The Clash. The interesting thing about using a celebrity lyricist such as Joe Strummer is that the project could take on many different directions, thanks to the fact that song lyrics and interview transcripts are both readily available. Song lyrics could be used to produce a chatbot not dissimilar to the Qur'an educational aid documented in [10], with a given lyric or lyric fragment producing any number of responses (perhaps the next line of the song, the song and/or album that the lyric was culled from, or the meaning of the lyric).

While this in itself may not necessarily be in keeping with the aim of the project (after all, a system of this nature could hardly be said to be modelling a personality), when combined with more traditional chatbot behaviour using a corpus of interview transcripts this could prove effective, as if the user were engaging Joe Strummer in a conversation specifically about his lyrics. Another avenue which could be taken is that of modelling a lecturer making posts on the School of Computing newsgroups. In this case it would need to be a very regular poster across a wide range of groups. Someone who posts only on module groups, for example, would be of little use, as their posts would probably be quite detailed and technical. What would be needed in this case would be someone who made posts on more "chatty" groups (such as local.talk.humour), to give a better idea of their conversational style.

Corpus-to-AIML conversion is not simply a case of converting each corpus tag to an “equivalent” AIML tag. In order to produce a sensible conversation, chatbots must be able to do a lot more than basic pattern matching. For example, a chatbot needs to be able to refer to what was said previously in the conversation in order to give its responses some context. [5] discusses two methods of doing this: the first word approach and the least frequent word approach. The first word approach is based on the concept that the first word of a sentence is the key to producing a sensible response. While this is not necessarily true all the time, the first word approach is useful for producing “catch-all” responses which can be used, in the event of a full pattern match not occurring, to give a (usually) coherent yet general reply.
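In AIML, such a catch-all can be expressed with a wildcard pattern (a sketch; the * matches any continuation of the sentence):

```xml
<category>
  <pattern>WHAT *</pattern>
  <template>Why do you ask so many questions all the time?</template>
</category>
```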

HUMAN: What is your favourite food?

CHATBOT: Why do you ask so many questions all the time?

In my example, the chatbot does not recognise the specific question “what is your favourite

food?”. It does, however, recognise that a sentence starting with “what” will almost always be

a question, and that “Why do you ask so many questions all the time?” is a response that,


while not that useful from a conversational point of view, does at least make sense for just about every question that could be asked. The least frequent word approach is a little more complex in its implementation. It works on the basis that words which do not appear very much in the corpus as a whole, but appear in a particular sentence, have a high information value. In other words, they are probably the focus of the sentence, because their use is so infrequent that they are not likely to just “crop up” in normal conversation. [5] gives a detailed description of how the least frequent word is established within the corpus, but in essence it involves “tokenising” the whole corpus (i.e. producing a list of all the words in the corpus and their frequencies). Once this is done, words with a high information value in any given sentence can be identified. Any words with a low frequency and high information content can then be adopted by AIML as topics if the least frequent word approach is in operation.
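This tokenising and topic-selection process can be sketched in Python (using a toy corpus for illustration):

```python
# Least frequent word approach: tokenise the corpus, count word
# frequencies, then treat the rarest word in a sentence as its
# probable topic.
from collections import Counter

corpus = [
    "i like the clash",
    "i like the jam",
    "i like the clash",
]
frequencies = Counter(word for sentence in corpus for word in sentence.split())

def least_frequent_word(sentence: str) -> str:
    """Return the word in the sentence with the lowest corpus frequency."""
    return min(sentence.split(), key=lambda word: frequencies[word])

print(least_frequent_word("i like the jam"))  # jam
```

Here “jam” appears only once in the whole corpus, so it carries the most information in its sentence and would become the topic.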


3. Project Scheduling

In any major project such as this, adhering to a tight schedule is vital. As Thomsett puts it in [11], “A project is like a journey. Like all journeys, if you want to arrive safely it is sensible to do some planning before you start.” This maxim applies to this chatbot project just as much as it does to any larger software development project.

3.1 Identification of deliverables

The first step in planning this project was an identification of deliverables. I identified four major deliverables, each of which had a number of sub-deliverables. They are:

1. A mid-project report containing a review of relevant literature
2. A draft chapter with table of contents
3. A completed identity cloning chatbot system
4. A completed project report

3.2 Project Schedule

In order to help me produce these deliverables with the highest level of efficiency possible, I

devised a project schedule Gantt chart. The Gantt chart can be found in Appendix B.


Fundamentally, my plan was to give myself plenty of time to do the necessary background reading on this project, and to ensure that my Mid-Project Report was completed with time to spare before the 10th of December deadline. The sub-deliverables I identified for the Mid-Project Report were: preparation of a draft chapter on background research, preparation of a neat schedule chart, and production of a final chapter on background reading. I intended to leave the holiday period after that free for exam revision. After the Christmas break I concentrated on implementing the designed system, producing the implementation write-up as I went along in order to minimise fuss at the end of the project. I also ensured that my draft chapter and table of contents were produced before the deadline. After the demonstration of the system on the 19th of March, I used the Easter period to finish evaluating the created system and writing the report, before handing it in on the 27th of April 2005.

3.3 Revisions to schedule

Of course, no schedule can be perfect. It is impossible to foresee and allow for every possible complication during the production of a project of this magnitude, just as it is impossible to precisely estimate how long it will take to complete a given task. The first alteration to the schedule came with the prototyping stage. I had originally intended to produce a prototype system very early on in the project, in order to help gain perspective on the background research I was doing at the time. However, while conducting this research I realised that it would be a poor idea to produce the prototype at that stage. It would have been a time-consuming process, and the benefits I would have reaped in terms of increased understanding of chatbot technologies would have been minimal compared to those of simply using the time to further my knowledge of chatbots. I detailed my reasons for altering the schedule in this way in my Mid-Project Report.

It is also worth considering that I had not conducted a formal analysis into which chatbot technology I intended to use at this point, and as such could have found myself developing a prototype in a language that I later decided not to use for the final solution. This would have devalued a prototype created at this stage yet further. As such, I decided to postpone the prototyping stage until the time came to actually implement the system, once full analyses of which language to use and how the system was to be developed had been conducted. The prototype could then be treated as a precursor to the main system, to help identify bugs and flaws in the implementation plan, rather than a more general exercise in chatbot technique.


Apart from that, progress ran close to the proposed schedule until I reached the stage of

actually implementing the system according to my implementation plan. The problem here

was that I had underestimated the time that it would take to implement the tags that would

allow the chatbot to respond to inputs based on the recursion of keywords. I had originally

planned to produce the report and solution closely in parallel, but in practice I realised that

the best way to ensure the implementation of the solution did not overrun massively was to

put the writing of the report on hiatus and concentrate solely on implementing the system.

This allowed me to have the implementation of the system finished by the time I needed to

produce my draft chapter and table of contents. As a consequence of this, I was forced to

postpone the evaluation of my system and the production of the implementation and evaluation

write-ups until after the progress meeting, but this did not greatly affect me, as I had

deliberately left three weeks before the deadline of the report with little to do in the original

schedule, in anticipation of a situation like this arising. Overall, I was pleased with how I

stuck to the original schedule. At no point during this project did I feel as if I was

mismanaging my time.

3.4 Selection of Methodology

However, successful management of a project of this nature requires more than simply

creating a schedule to adhere to. A methodology for development (sometimes referred to as

a “process model”) must be employed to ensure that the developer has the clearest possible

picture of the development process. In [12], Humphrey states: “A software process (model)

methods, and practices necessary to develop a software system”. In other words,

methodologies for project development are useful for clarifying the project process by

providing a clear framework detailing how the stages in a project should be structured. In this

section of the report, I will evaluate each potential model and choose which methodology to

use to implement the system.

3.4.1 Waterfall Model

Many different software processes have been developed over the years. However, by far the

most famous and successful is the Waterfall Model. The Waterfall Model is commonly held

to be the forerunner to many more “modern” forms of process modelling ([13]). It was

proposed in 1970 by W.W. Royce in [14], and is a highly structured form of project

management. Development is split into a series of phases, whose specifics differ


depending on the views of the developer but which all contain the general stages Problem

Specification, Analysis, Design, Implementation, Integration and Evaluation. The principle

behind the model is that each phase is completed fully before the next one is moved on to.

(For example, the system is fully analysed in order to allow the design phase to begin. Once

the design phase is complete the system can be implemented according to that design and

so on.) Phases are completed in a strict order. The name “waterfall model” comes from the

concept that once the process “falls” to the next stage it cannot go back up (so a developer

cannot go back to the design phase of a project that has begun its implementation, for

example), in much the same way that water cannot flow back up a waterfall ([15]). In practice,

as [16] demonstrates, “feedback loops” are often incorporated into the model so that if a

problem is encountered at a given phase, or if it becomes apparent that a phase was

implemented incorrectly, the appropriate corrections can be made.

The highly structured nature of this methodology provides both its advantages and

disadvantages. It is true that many proponents of the object-oriented paradigm consider

the waterfall model an anachronism, and in some ways this is the case. [13] points out that it

is overly rigid in its insistence that each phase should be completed before the next one

begins, and that phases should only be returned to as a last resort. Also, a working version

of the product is not produced until relatively late in the process, meaning that major

redesigns could be in order if the working version proves to be unsatisfactory. It could be

said, too, that since the steps of the process are rarely completed strictly in order, the

model itself becomes largely redundant ([16]). However, in reality the waterfall model can still play

a valuable part in this kind of development process. Its emphasis on completing each stage

before attempting the next means that each stage is tested and documented before the next

is attempted, with evidence of progress generated at each stage, as illustrated in [17]. Also,

it is an easy model to follow. Even inexperienced developers can clearly see what stages

they have to complete and what those stages should feature.

An illustrative example of the waterfall model (taken from [16]) is shown on the following

page. Note the clearly defined stages in the model, with each stage following sequentially

from the last. The version shown has “maintenance” arrows showing feedback loops. This is

a neat example of how different developers use the waterfall model in different ways. Here it

is treated as a kind of continuous process, with the developers constantly going back to the

various stages of the process and analysing how they can be improved. This is one way to

work around the perceived shortcoming of the model if the system is to be used over a long

period of time.


3.4.2 The Spiral Model

The spiral model is a derivative of the waterfall model. It was developed by Boehm in 1986

as a more realistic “real world” alternative to the waterfall model in [18]. It consists of a series

of “task regions” in a similar way to the waterfall model. For example, a “six task region”

model may have the regions “customer communication”, “planning”, “risk analysis”,

“engineering”, “construction and release” and “customer evaluation”. The idea is that once

these six tasks have been completed a deliverable will be produced. If that deliverable is

acceptable, the task regions will be undertaken again to produce the next deliverable ([19]).

For example, the first deliverable could be a prototype, the second an area of functionality of

the main system, and so on. The development of the system “spirals out” with each

completion of the task regions. It allows working products to be viewed and evaluated very

early on, and the project can easily be cancelled at the risk assessment phase if it is going


awry, as detailed in [19]. The main advantage of the spiral model is its focus on risk

assessment ([20]). With a focus on the risks involved in a project analysed from the start, the

developer does not discover late in the day that their system fails to meet its

requirements, as can happen with the waterfall model. However, this risk

assessment focus can also be the downfall of the model. [20] points out that risk assessment

can be expensive, and as such the spiral model does not make sense for smaller-scale

projects. Also, risk assessment is a skilled discipline that requires suitably trained staff.

This is in contrast to the ease with which the waterfall model can be implemented.

As with most methodologies, the clearest way to display this is with a diagram. This figure is

taken from [19], and shows the development process “spiralling out” through several

iterations of the various stages.

3.4.3 Prototyping models

The term “prototyping models” is an umbrella term for a family of process models that are

based around the concept of producing several small prototypes of the core functionalities of

systems. These prototypes are generally continually embellished, one item of functionality at

a time, until the final product has been produced. There are three main types of prototyping

model. The first is Rapid prototyping. This is where a “quick and dirty” ([20]) prototype is


produced. If it is deemed acceptable, the prototype is discarded (on the basis that it was

nothing more than a rough demonstration of the idea for the finished product) and a new

system is developed based on the prototype. Incremental prototyping is similar, but the

prototype is developed with a little more care and then used as the basis for the actual

implemented system. Evolutionary prototyping is closely related to incremental prototyping,

but with less interaction with the client. Developers are largely left to pursue their own

designs in creating the prototype. As there is no “client” as such in this project, this

distinction is largely academic for the purposes of this evaluation. All these models are well

documented in [20]. Prototyping models are useful in that they allow the progress of the

system in development to be closely monitored, but their unstructured nature and lack of

clearly defined and documented stages can leave them vulnerable to overrunning and

make them very difficult to schedule.

The diagram below is taken from [20]. It shows the incremental prototyping model.

Note the small “jumps” in the development as each small prototype is produced. The

diagrams of the Rapid and Evolutionary prototyping methodologies are very similar, but the

Rapid methodology in particular features fewer, but larger jumps.

3.4.4 Final selection

Overall, I feel that the Waterfall model of development is the one that is most suited to this


project. While it is sometimes viewed as being simplistic, the fact is that for a project of this

scale the perceived limitations of the model apply less. This project needs a methodology

which will allow me to develop the system in a clearly structured way, with the opportunity to

document every stage, rather than one which places a large focus on risk assessment and

prototyping. While I do intend to develop a prototype of the system to aid with my

implementation, I feel that the constant prototyping seen in the Prototyping set of models, as

well as the Spiral model to a limited extent, is not required. The Spiral model is more suited

to larger projects where risk is a major factor and the option to shelve or postpone the

project is a viable one. In the situation I am in, of course, there is no question of the system

development being cancelled, and as such the advantages of the Spiral model are rendered

largely redundant. The prototyping models could provide more of a viable alternative, but the

fact that they do not encourage easy documentation at each stage of the process is a major

drawback. The Waterfall model allows the process to be broken down into several small stages

which are ideal for the development of a project of this size, and allow me to implement and

document the system in an efficient and effective way.


4. Design

4.1 Choice of software

There are several different chatbot technologies available to me, most of which are freeware.

As mentioned previously, by far the most successful of these is AIML, used to create the

widely popular A.L.I.C.E. chatbot. However, with a range of alternatives to consider it would

be foolish of me at this stage not to evaluate the available systems as to which would be

most suitable for developing a project of this type.

4.1.1 AIML

AIML is a derivative of XML. It is very popular with the chatbot community due to its simple

yet powerful nature, as well as the fact that it is an Open Source project and therefore

completely free. AIML’s strength lies in the fact that, whilst it is possible to create extremely

intelligent chatbots using the system, it is entirely based on simple pattern-response

type templates. Dr. Richard Wallace, the creator of A.L.I.C.E., states that he believes this

back-to-basics approach is the key to chatbot success: “You don’t need a complex

theory of learning, neural nets, or cognitive models … Our stimulus-response model is as

good a theory as any other for these cases, and certainly the simplest.” ([1]) It is the

straightforward nature of AIML that gives it its flexibility, as with the use of just a few simple

AIML tags (for example, recursion is elegantly implemented via the use of a simple

“recursion” tag, <srai/>) a set of rules can be produced which can replicate practically any


kind of conversational scenario.
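To make this concrete, here is a minimal pair of hypothetical AIML categories (an illustration of my own, not drawn from any existing bot) showing the pattern-response structure and the recursion tag at work:

```xml
<aiml version="1.0">
  <!-- A simple pattern-response category -->
  <category>
    <pattern>WHO ARE YOU</pattern>
    <template>I am a chatbot written in AIML.</template>
  </category>

  <!-- A recursive category: an alternative phrasing is reduced
       to the canonical pattern above via the <srai> tag -->
  <category>
    <pattern>TELL ME WHO YOU ARE</pattern>
    <template><srai>WHO ARE YOU</srai></template>
  </category>
</aiml>
```

Any input matching the second pattern is re-submitted as “WHO ARE YOU”, so both phrasings share a single response.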

It is worth considering, too, that over the course of my background research for this project I

became quite familiar with the workings of AIML, and as such would not need to learn a new

language in order to begin implementation on the project. Another note is that projects in a

similar vein to the one that I am attempting here have already been implemented in AIML,

meaning that AIML’s ability to replicate a human personality has already been tried and

tested. The best example of this is the John Lennon Artificial Intelligence Project [21]. This is

an attempt by Triumph PC to recreate the personality of John Lennon via an AIML system.

The system works well, although the conversation can at times feel a little “directed”, as if

there are certain questions the bot really wants the user to ask.

4.1.2 Elizabeth

Elizabeth is effectively an adaptation of Joseph Weizenbaum’s ELIZA chatbot, designed to

include features similar to AIML, such as recursive pattern matching and substitution [22].

Like AIML, Elizabeth is based on a stimulus-response type system and has its responses

dictated by a set of scripted rules. Unlike AIML, however, Elizabeth’s scripts are text-based

rather than XML-based, and read more like a simple programming language. The

ELIZA program simply worked by turning user input around in the following way [23]:

I am having a very bad day today

Did you come to me because you were having a very bad day today?

The overall effect was intended to be like that of talking to a psychologist. This is worth

bringing up because the input and output transformations required to produce ELIZA’s

responses are still very much in evidence in Elizabeth’s scripts [24], for example:

I mum => mother

I dad => father

O i am => YOU ARE

O you are => I AM

The knock-on effect of this for using Elizabeth in this project is that, while Elizabeth

may be just as able as AIML to produce a coherent conversation (and in fact the ease with

which the scripts allow answers to be turned into further questions means that in some


applications Elizabeth would probably be the preferable tool to use) it is unlikely that

Elizabeth would be able to model the conversational habits of a specific individual in the

same way. Elizabeth scripts are more complex than AIML documents, and therefore lose

some of AIML’s flexibility. The characters at the start of the lines in the example are “script

commands” [4]. These denote how each line of the script is to be treated by Elizabeth, for

example “K” denotes a keyword pattern (analogous to AIML’s <pattern> tag set) and “R”

denotes a response pattern (analogous to AIML’s <template> tag set). [24]

However, there are many situations in which the script command system seems inelegant

compared to AIML. For example, AIML can be given a list of random responses to move the

conversation along if input is used which it does not recognise. Elizabeth, on the other hand,

has separate script commands for void input, null input, and unrecognised input. This means

that Elizabeth scripts are often large compared to AIML rules, making the language unsuitable for a large

and complex project such as this one. Another factor to consider is that, while I do have a

little Elizabeth experience, I am not as well versed in Elizabeth scripting as I am in AIML

scripting, meaning the implementation would take longer to complete if Elizabeth were to be

used.

4.1.3 JFred

JFred stands for Java-based Functional Response Emulation Devices. It is an Open Source

chatbot technology in the form of a Java applet. [25] It is quite different in its operational

style to both AIML and Elizabeth, in that it works on a system of probabilities. JFred scripts

are full of sections like this: [26]

action: HELLO

priority: 7

Hi there.

Hello, who are you?

Greetings.

The HELLO at the top of the segment is the keyword to be matched, and the “priority” field

underneath dictates the probability (relative to probability values of all the other keywords) of

one of the responses at the bottom being chosen if the user inputs something close to

“Hello”. This use of “fuzzy” logic, coupled with several responses for each keyword in the

style of AIML’s <random> tags means that JFred can produce a good conversation with

fewer of the “I’m sorry, I don’t understand” style face-saving lines that other chatbot systems


may have to use. However, this has two major downsides when compared to AIML and

Elizabeth. The first is that fuzzy logic and random responses mean that the responses of the

chatbot will be by nature unpredictable, even to its creator. This would be a hindrance when

creating a personality-replicating chatbot, as it increases the chance of the robot saying

something out of character. Secondly, JFred scripts are extremely large and complex, as

unlike AIML and Elizabeth no support is provided for the use of multiple scripts in one

chatbot to keep script sizes down. This means that JFred would be extremely

cumbersome to use when developing an ambitious system such as the one proposed here.

4.1.4 Final Selection

Over the course of my research I looked at several other chatbot systems, such as the

NativeMinds software from Verity Response [27]. However, I felt that these three were the

only real options that I had to choose from in the creation of my chatbot system. Other

systems were generally too specialised (for example, NativeMinds is a chatbot creation tool

designed to create a virtual sales rep for large company websites, which is patently not what I am

looking for here) or too simplistic (many small Java applet type chatbots exist but they would

simply not be practical to implement this project in). On the whole, I decided that AIML

would be the best system to implement this chatbot in. It is very easy to use, and crucially

has the level of flexibility required to make the personality replicating chatbot a success.

4.2 Choice of development technique

Something else to consider carefully in the development of this project is the development

technique that I will use to convert files from the standard conversational corpus into an

AIML-readable format. There are several different ways of achieving this aim.

4.2.1 Pandorawriter

Pandorawriter is a feature of www.pandorabots.com. It acts as a parser for text

conversations, allowing conversations which have been correctly formatted to be

automatically converted into AIML files that any AIML-based chatbot can use [28]. This is an

interesting feature, and with a very large corpus it is possible that this method alone could

produce a chatbot that could replicate a personality well enough to satisfy the aims of this


project. However, in practice this methodology would not produce a chatbot with the ability to

replicate the personality of the subject unless I was able to obtain a corpus with the answer

to every single question that users would be likely to ask. Another weakness in this

methodology is that, while responses can be fairly free-form, multi-sentence answers

including punctuation, inputs can only be single-sentence queries with no punctuation. This

means that any corpus I acquire will be severely limited in its potential usefulness unless I

was to go through each file and reword all the inputs to conform to this specification.
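As a sketch of the kind of category such a conversion would produce (the details here are my own illustration, not Pandorawriter’s documented output), a single-sentence, unpunctuated input paired with a free-form response might become:

```xml
<!-- Hypothetical AIML generated from one question/answer pair.
     The input side is a single sentence with no punctuation;
     the response side may be multi-sentence and punctuated. -->
<category>
  <pattern>WHAT WAS TOURING LIKE</pattern>
  <template>It was exhausting, but I loved it. You see the world.</template>
</category>
```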

4.2.2 Least Frequent Word Approach

This is a system based on the concept that each word in a sentence has “information

content”. Let’s take an example:

The giraffe escaped from the zoo by jumping the fence

Here, the individual words which comprise this sentence have different levels of information

content. Information content is best described as “the amount which the presence of a word

in a sentence contributes to the reader’s understanding of that sentence”. The words “The”,

“From” and “By” have low information content, as they are very frequent words which appear

in a great number of sentences in the English language. However, the words “Giraffe”,

“Escaped” and “Zoo” are much less frequent words, and as such have a much higher level of

information content. In fact, it would be possible for a person to tell what the full sentence

was generally about just by reading those three words. This is significant to chatbots as their

responses can be tailored so that rather than matching whole sentences, they can simply

match keywords with a high level of information content in the knowledge that a response

relating to a giraffe will probably make sense for a sentence that includes the word “Giraffe”.

However, this approach is not without its problems. Mainly, there is the issue of just how to

decide which word in a sentence has the highest level of information content. Is the least

frequent word in our example “Giraffe” or “Zoo”? Making these decisions takes time and may

require a level of knowledge about the subject of the sentence on the part of the chatbot

creator, something which cannot always be assumed. Secondly, if there is no form of

Pandorawriter-style automation, the bulk of the corpus may have to be input by hand, an

extremely laborious and lengthy process.
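In AIML terms, matching on a single high-information word rather than a whole sentence can be sketched with wildcard patterns. The categories below are a hypothetical illustration using the giraffe example above; `_` and `*` each match one or more words, and further variants would be needed to catch the keyword at the very end of an input:

```xml
<!-- Hypothetical keyword categories: any input containing GIRAFFE
     gets a giraffe-related response, regardless of the rest of the
     sentence. -->
<category>
  <pattern>_ GIRAFFE *</pattern>
  <template>I heard a giraffe escaped from the zoo by jumping the fence.</template>
</category>

<!-- Variant for inputs that begin with the keyword -->
<category>
  <pattern>GIRAFFE *</pattern>
  <template>I heard a giraffe escaped from the zoo by jumping the fence.</template>
</category>
```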


4.2.3 Combined Pandorawriter/Least Frequent Word Approach

A more effective stratagem may well be the combination of the two previously mentioned

approaches. Pandorawriter could be used to read in the bulk of the acquired corpus as

before, and the least frequent word approach could then be applied to the data when it was

already in AIML format. The least frequent word approach could take advantage of AIML’s

recursion (<srai>) tags, by looking at each sentence input using Pandorawriter, analysing it

for the word with the highest information content, and then setting up a series of categories

that recurse back to the original sentence based on that word. This would mean that

whenever the user entered an input featuring a keyword, the system would produce the

response associated with that keyword. For example, if the user were to

give the input “what is your favourite kind of pizza?”, and “pizza” was a keyword in the

corpus (being, as it is, a word with a fairly high information content; there is little chance of

the word “pizza” occurring in a sentence that is not at least tangentially connected to pizza)

then the system would give the response to the question for which “pizza” was the recursion

keyword.

While this system is not the most elegant way of designing a chatbot, it does have the

advantage of allowing the creation of a chatbot with a high breadth of knowledge relatively

quickly. Chatbots trained in this way will possibly give unpredictable responses, but will

generally be able to carry out conversations on a wide range of topics with the amount of

AIML coding required kept down to a bare minimum. In the example, the original question

could have been “When was the last time you ate pizza?”, with the response being “I ate a

pizza last week”. While not being an ideal answer to the question “What is your favourite

kind of pizza?”, this response does at least make enough sense to keep a conversation

flowing. Templates may also need some alteration from the corpus originals; for example,

simple “yes” or “no” answers would be no good in this case.
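The pizza example can be sketched as follows (hypothetical categories written for illustration, not taken from the implemented system): the keyword category recurses, via <srai>, to the canonical question from the corpus.

```xml
<!-- Canonical question/answer pair taken from the corpus -->
<category>
  <pattern>WHEN WAS THE LAST TIME YOU ATE PIZZA</pattern>
  <template>I ate a pizza last week.</template>
</category>

<!-- Keyword category: an input ending in PIZZA (punctuation is
     stripped during AIML normalisation) recurses to the canonical
     question, so "What is your favourite kind of pizza?" also
     receives the "I ate a pizza last week" response. Further
     variants (PIZZA *, _ PIZZA *) would catch the keyword in
     other positions. -->
<category>
  <pattern>_ PIZZA</pattern>
  <template><srai>WHEN WAS THE LAST TIME YOU ATE PIZZA</srai></template>
</category>
```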

4.2.4 Combined Pandorawriter/Least Frequent Word Approach with additions

While the approach discussed above is a good way of developing a chatbot with a broad

conversational range such as the one required for this project, there is a danger that the

chatbot may not seem human enough. Even when using an automated system for corpus

inclusion, to have enough keywords to allow the bot to converse completely freely would

require a vast amount of time to set up and a large corpus. I proposed that a better solution


would be to develop the system from a reasonably sized corpus as detailed above, and then

perform a series of tests on it to establish any holes in the “conversational knowledge” of the

chatbot. These holes could then be filled manually using bespoke conversational categories.

There are other ways that a bot trained in this way could be made to appear more “human”.

One of these is the use of “catch-all” statements. These are outputs which appear when the

user enters an input which does not correspond to a known keyword. This means that the

bot can stay “in character”, as it were, even when asking the user to rephrase or choose

another question. There is an excellent explanation of how these catch-all statements work

in AIML at [7]. Addition of these would greatly improve the conversational skills of the

chatbot. Another is giving the chatbot some direct information about the individual it is

supposed to be emulating. By this I refer to giving the chatbot the ability to answer simple

but direct questions like “Where were you born?” with equally specific answers.
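A catch-all of this kind is conventionally written as a wildcard-only category, along the lines of the following sketch (the responses are invented for illustration; an in-character bot would phrase them in its subject's voice):

```xml
<!-- Ultimate default category: matches any input that no more
     specific category has caught, and picks one reply at random -->
<category>
  <pattern>*</pattern>
  <template>
    <random>
      <li>You've lost me there. Ask me something else.</li>
      <li>I'm not sure what you mean. What else do you want to know?</li>
    </random>
  </template>
</category>
```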

4.2.5 Final Selection

Overall, I felt that this fourth strategy for development was the one which would give me the

greatest scope for modelling the personality of my subject. It would allow me to create a

chatbot which speaks in a coherent and human-like way, while still maintaining a broad

knowledge base of conversational topics.

4.4 Choice of subject

During my preliminary investigation into how chatbots can be trained and their general

workings, I began to think about what type of subject to model. The subject needed to be

someone who has a large corpus of their conversations available in a text-based format. The

obvious choice here would be some kind of celebrity (or at least someone who has

conducted many interviews). However, I felt that a case could be made for attempting to

model someone who has made a lot of posts on a particular message board as well. This

prospect intrigued me as this kind of subject would make for a project that was a lot easier to

evaluate (on the basis that I could simply collect data from people who know the individual

personally as to how much the chatbot has assumed the personality of the subject, which

would of course be practically impossible with a figure in the public eye). The downside to

using a subject of this type is that collating a corpus would be a very long and difficult

process. Even once a series of posts had been acquired, there would be no guarantee that


they would be in the pattern-template format required by AIML. Even with those that were, it

would sometimes be the case that the subject was asking questions (i.e. providing the

pattern part of the conversation that is supposed to be provided by the user in a chatbot

conversation) rather than giving answers.

With interview transcripts, however, that is not really an issue. In fact, it is a very desirable

property of interviews in the context of this project that they tend to follow, at least

approximately, the same question- response format as a chatbot conversation. In short, the

trade off between modelling a public figure using interview transcripts against using an

individual posting on a messageboard (in this case it would probably be one of the School of

Computing newsgroups) was one of suitability of corpus against ease of evaluation. I

decided to model the former, a public figure. My basis for this was that I believed very

strongly that this system would succeed or fail based on the strength of the corpus it was being

trained from. While I could build a system based on the School of Computing newsgroups

and save time on the evaluation, it is unlikely that this system would be a fully satisfactory

solution to the problem. As such, I felt that using interview transcripts was the way forward.

Of course, the next question was whose personality to model. There are many millions of

interviews available in web-based format, but evidently not all of them would be useful in this

particular project. I needed to select a subject who had given many interviews of a

reasonably personal nature. It would also be useful to select someone of whom I had

some prior background knowledge, or at least a reasonable idea of who they are and their

accomplishments. As such, I decided to model a figure from the music industry. In my

personal life, I am a keen musician and I have a good knowledge of industry figures and

practices, which I feel gave me a good advantage in the completion of this project. I

conducted a brief study into the backgrounds of a few potential candidates. During this

study, I realised that as well as requiring a good corpus, I would also need to model

someone whose personality came across well in interviews. For example, originally I

considered the well-known singer Mick Jagger, but found that, although there were a lot of

interviews available with him, many of them were quite formulaic and did not really provide

an insight into his personality. An example of this can be found at [29]. One individual who

really did stand out was Joe Strummer, the former frontman of the highly respected punk

band The Clash, and a solo artist until his untimely death in 2003. Not only is there a large

selection of interviews with Strummer available, but as a slightly less well-known figure in

the industry I found many of these interviews to be conducted in a fairly informal style,

perfect for my needs in this project. I am familiar with the works of The Clash, and so is my

project supervisor (in fact, Joe Strummer’s name was mooted as a potential candidate in a

meeting very early on in the project). Bearing this in mind, I thought that Joe Strummer would

be an ideal candidate for this project.

4.5 Methodology for system implementation

Having performed my research into chatbot learning techniques, and made my choices

regarding software tools, methodology and subject to be modelled at the design phase, my

next task is to produce a plan for the implementation of my chatbot system. I propose that

the implementation of the system be broken down into five distinct phases: Prototyping,

Acquisition of Corpus, Conversion of Corpus, Addition of Keywords, and Additions to

Generated Input. All phases in my methodology will have their own interim deliverables.

4.5.1 Prototyping

The first phase in the methodology will be to build a prototype system similar to the one

which will constitute my final solution to the problem. The reasoning behind this is that it

would obviously be extremely detrimental to the project if I were to begin the implementation

of my system only to find that some hitherto unforeseen factor would prevent me from

implementing the system correctly. By building a prototype, which will work in the same way

as the full-size chatbot but on a smaller scale (in the sense that it will only know a few

simple key phrases), I can ensure that any problems I encounter during the implementation

of my main solution will not be due to any fundamental flaws in the design of the system.

Prototyping is an extremely common starting point for large projects of all types. In [30] the

author neatly summarises the necessity of prototyping, saying that “a working model

provides a much clearer picture of the system to be developed than an entire library full of

user requirements, system specifications (and) data dictionaries”. This is very much the

case with my project too. I feel that building a smaller scale version of the system to be

implemented will be of far greater benefit to me than performing any number of paper-based

descriptions of my plan for solving the task. The deliverable for this phase will be a working

prototype of the chatbot system, which will have AIML files structured in the same way as

proposed in the full system (i.e. each response will be based on a keyword) and will use

more than one AIML file. It will be developed by manually coding with AIML, as opposed to

any of the “train” interfaces available on the Pandorabots site.


4.5.2 Acquisition of Corpus

The next phase of the implementation will be to acquire the corpus that will be used to

provide the basis of the chatbot’s responses. This may seem simple enough, but in fact this

phase more than any other could shape the success or failure of this project. If the corpus

that I compile is poor or generates one-dimensional responses, then it will have a serious

impact on the final system’s ability to model the personality of Joe Strummer. In [5], Abu

Shawar and Atwell describe the ideal characteristics of a training corpus such as this one.

They point out that ideally the corpus needs to feature two speakers in a structured format

taking structured turns to speak. Fortunately, this almost perfectly describes magazine and

website interviews. There are many of these interviews available online, and they will

convert into AIML quickly and easily. The caveat here is that, while acquiring a

large corpus should not be difficult, care must still be taken to ensure that the content of the

interviews is relevant. It is important to build up a corpus which features interviews of Joe

Strummer talking about a range of different topics. If the interviews are several repetitions of

the same themes, then the chatbot will have gaps in its knowledge, which could prove to be

a major hindrance. The deliverable for this phase will be a set of interviews in the format

described previously. The set will be a good size (this is difficult to quantify, as interviews

will all be different lengths and I do not want to process extra interviews unnecessarily if the

corpus is already large enough, but I am aiming for approximately ten interviews).

4.5.3 Conversion of Corpus

With the corpus acquired, the next stage will be to convert the corpus into a machine-

readable form. This is not a large phase of the implementation. As mentioned previously, the

program that I will use to generate AIML files from my interview transcripts will be

Pandorawriter. Pandorawriter requires its input to be in a specific format, as detailed in [28].

Queries can only consist of one sentence, and the text must be of a strict question-answer

format. As such, I will need to take my interviews and remove any superfluous text. (For

example, many interviews of this type have the name of the person speaking at the

beginning of a line. This would obviously be fairly redundant in a chatbot system such as this

one.) I will also check the interviews to ensure they are in the correct question and answer

repeated format, and that all queries are only one sentence long. While this may seem like a

fairly constrictive rule, in practice I don’t think it will affect the development of my chatbot too

much. A sizeable proportion of questions such as these will be one sentence queries


anyway, and even those that aren’t can be easily re-worded into a synonymous one-sentence

equivalent. Once the interviews are prepared they can be converted into AIML by

Pandorawriter. I will keep the AIML files for each interview separate to aid the editing

process. If all the AIML used in this project were to be condensed into one huge file, it would

be very difficult to work with. The deliverable for this phase is a series of AIML files, one for

each interview, that feature the questions of the interviewer as patterns and Strummer’s

responses as templates.
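For illustration, a prepared question-answer pair and the category it would become after conversion might look as follows (the pair is invented for this example, and the exact markup Pandorawriter emits may differ slightly):

```xml
<!-- Prepared plaintext input: one-sentence question, then the answer.
     Question and answer are hypothetical, for illustration only. -->
<!-- Q: What do you think of graffiti?
     A: Graffiti is art, man. -->

<!-- Resulting AIML category: the question becomes the pattern
     (normalised to upper case), the answer becomes the template. -->
<category>
<pattern>WHAT DO YOU THINK OF GRAFFITI</pattern>
<template>Graffiti is art, man.</template>
</category>
```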

4.5.4 Identification of Keywords

With the AIML files to be edited created, the fourth phase of the project can begin. Here I

need to look at each individual pattern, identify the word with the highest information content

in it and create a series of new patterns that recur to that word. For example:

<category>
<pattern>_ GRAFFITI</pattern>
<template><srai>GRAFFITI</srai></template>
</category>

<category>
<pattern>GRAFFITI _</pattern>
<template><srai>GRAFFITI</srai></template>
</category>

<category>
<pattern>_ GRAFFITI *</pattern>
<template><srai>GRAFFITI</srai></template>
</category>

This AIML code means that any input containing the word “Graffiti” will trigger a pre-defined

template with the pattern GRAFFITI. This is the purpose of the <srai> tags. The patterns

match all the different ways in which the word “Graffiti” could appear in a sentence.

“_ GRAFFITI” means that “Graffiti” is the last word in the sentence, “GRAFFITI _” matches

“Graffiti” as the first word in a sentence, and “_ GRAFFITI *” matches “Graffiti” in the middle of

a sentence. When all patterns in the generated AIML file have been treated in this way, the

chatbot will have the ability to respond to many different questions based on these

recursions and the templates provided in the interviews. I have decided to select the

keywords and implement the AIML code relating to recursion manually. The reason for this is


that I do not want the responses my final product gives to be influenced by any factor other

than the finished AIML files. Using a program to generate the keywords would create extra

unknowns which could cloud the final effectiveness of my chatbot system. Also, I am a keen

music fan with an understanding of the music industry, and I feel that for a lot of patterns my

prior knowledge of the subject will be an effective tool for selecting keywords. My

deliverables for this phase are a set of AIML files with all keywords identified and recursion

tags implemented.
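For completeness, the reductions shown above all point to a base category whose pattern is the bare keyword and whose template holds the answer taken from the corpus. A sketch of such a base category (the response text here is invented for illustration, not a real corpus answer):

```xml
<!-- Base category targeted by the <srai> reductions above.
     The template text is hypothetical, standing in for a
     genuine answer from the interview corpus. -->
<category>
<pattern>GRAFFITI</pattern>
<template>Graffiti was everywhere in those days, man.</template>
</category>
```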

4.5.5 Additions to Generated Input

Once the keywords have been identified and the recursion patterns implemented, the

chatbot will have most of its functionality in place. However, it will probably not be able to

chat fully coherently with just the set of keywords it has been given. In order to complete the

personality modelling, I will need to add a few AIML rules to make the conversation appear

more lifelike. These include “catch-all” statements. [7] shows us how the Knowledge Web

system works using an AIML chatbot. The system produces responses according to what

areas of the Knowledge Web it matches from the AIML scripts used. However, it is quite

possible (especially in a chatbot project such as this one) that nothing will be matched. As

such, there needs to be a response (or collection of responses) that will be used in this

instance. This needs to be some kind of a conversation starter like “Did I tell you about (x)?”

where (x) is a topic that the chatbot has a good selection of AIML rules for. At this stage, I

will test the chatbot with a series of conversations to ensure that there are no previously

unobserved holes in the chatbot’s knowledge. If I discover any major areas that the chatbot

has no response for, or that the chatbot responds to in a nonsensical way, I will manually

create patterns and templates for them. In this case, and with the “catch all” statements, I will

choose responses based firstly on whether Strummer has said anything in the corpus of

interviews that could be considered a direct response to that question. If not, I will use my

own knowledge of the subject to come up with a plausible answer. The deliverables for this

final phase of the implementation are the completed AIML files to run the finished chatbot.


5. Implementation

As stated in the Design section, my methodology for the implementation of this system was

split into five distinct stages, each with its own set of deliverables. This is a brief run-down

of my observations during each stage and the problems that I encountered.

5.1 Prototyping

As time was obviously a factor in every aspect of completing this project, I felt that it was

important that my prototype was not overly detailed. As long as the core functionality of the

chatbot was successful, its actual ability to converse sensibly was not an issue. To this end, I

created a new Pandorabot called “Proto1”. I selected a short interview at random from a

Google search (the interview I used, available at [31], was an interview for a customer

experience site with the founder of the wiki encyclopedia site www.wikipedia.org) and used

Pandorawriter to generate an AIML file from its plain text file. With the AIML file generated, I

could then go through each question asked in the interview and identify a keyword for it,

implementing tags to make the keyword recur back to the answer given to the question in the

way detailed in the Design part of this report. Finally, I added a few simple categories of my

own to ensure that the system would be able to accept manually entered tags at the final

stage of the project. These were merely simple questions like “what is your name?” with an

appropriate response.

With this simple prototype in place, I could now run a series of tests in the form of a


conversation script to determine how successful the implementation had been. The tests all

went exactly as planned, with the prototype chatbot providing the anticipated responses to

both the templates selected using recursion from the patterns and the templates from the

categories that I had added manually. My prototype chatbot was a success and I could now

move on to the creation of my main system. I felt that the prototyping phase was a good way

to start the actual implementation of this project, as it allowed me to get a good feel for

working with this kind of system, especially in terms of programming raw AIML. Screenshots

of my prototype running can be found in Appendix E, Figures 1 and 2.

5.2 Acquisition of Corpus

With the prototype completed, my next task was to acquire the corpus to build the final

system from. A Google.com search for “joe strummer” produces 547,000 results ([32]), so I

knew I had plenty of material to work with! After a period of browsing through websites about

Joe Strummer (I felt it would be useful to acquire background knowledge about Strummer,

especially in the final phases of the implementation process) I began to acquire a series of

interview transcripts which I felt were good potential material for the corpus. I was looking for

interviews which were in an easily convertible format, which featured Strummer talking

naturally so as to be the best possible approximation of his personality, and which, when

combined together into a corpus, covered a wide range of topics in order to give the chatbot

the broadest knowledge base possible. Finally, I acquired a corpus of eight suitable

interviews. While this was slightly fewer than the ten interviews that I had originally planned

to acquire, the length and diversity of topics in some of the interviews more than made up for

it. My final selection of interviews for the corpus was:

1. A 2003 interview for Punk Magazine by Judy McGuire. This interview was an obvious

choice for the corpus, as it is of a good length and features Strummer discussing a

range of topics in a detailed yet conversational style, with a special bias on his

musical history. It is available at [33].

2. A 2000 interview for The Big Takeover by Jack Rabid. This interview is shorter than I

would have preferred, but it has the advantage of discussing in detail Strummer’s return

from obscurity in the late 1990s, a topic which would be very useful for the chatbot to

be able to cover in detail. It is available at [34].

3. A 2002 interview for CRC Radio by Mariah Hasagawa. Another relatively short

interview, I selected this for the corpus on the basis that it contains a few good details

on Strummer’s private life, especially his family. It is available at [35].


4. A 2001 interview for Unpop fanzine by Shawna Kenney. This is a fairly lengthy

interview, detailing a lot of Strummer’s childhood and his feelings on the current

music scene. Again, it is conducted in a fairly informal and conversational style,

making it all the more useful for a chatbot system such as this one. It is available at

[36].

5. A 1999 interview for The Music Monitor by Howard Petruziello. This is an extremely

lengthy and in depth interview (the longest used in the corpus), mainly covering

Strummer’s later career with the Mescaleros but also featuring some good

information and opinion on his days with The Clash. It is apparent that Strummer and

Petruziello are acquainted as they talk in a chatty and friendly manner. It is available

at [37].

6. A 2003 interview for Perfect Sound Forever by Jason Gross. This interview is of a

good length, focusing for the most part on the early years of Strummer’s career with

The Clash. It is available at [38].

7. A 2003 interview for the San Antonio Current by David Peisner. This interview was

conducted in 2003 but not printed until 2005, after Strummer’s death. This interview

shows a slightly more reflective and mellow side to Strummer. It is of special interest

as it is one of the last interviews that Strummer gave. It is available at [39].

8. A 2001 interview for VH1 by C. Bottomley and Rebecca Shapiro. This interview is

quite short, but I felt it was worth inclusion due to the fact that it details a lot of the

meanings behind Strummer’s later songs, possibly more of an insight into

Strummer’s personality than anything else. It is available at [40].

Once I had acquired these eight interviews, I felt that I had all the information I needed to

begin the next phase of my implementation. All of these interviews are in the standard

question-answer-repeat format that is required for Pandorawriter, and they cover all areas

of Strummer’s life in his later years, with him talking about his family and childhood all the

way through to discussions about his current projects and his work with The Clash. It is

worth noting that all the interviews I collated for this corpus are from Strummer’s second

career phase, after his “comeback” in 1998. The reason for this is that I did not want to

combine interviews from this period with interviews from the Clash period of the 1970s and

1980s, as any person can expect to go through a fairly large change in personality between

their 20s and their 40s and I didn’t want the chatbot’s responses to be a strange mix of the

two. I chose the later period rather than the early period purely on the basis that there is a lot

more readily available information and interview transcripts on Strummer from his later years

than his Clash days.


5.3 Conversion of Corpus

This next stage in my project was quite straightforward, although rather time-consuming. As

detailed in the previous section, my aim here was to convert the text into machine-readable

form, and then into AIML via Pandorawriter. This meant the removal of any extraneous data,

such as names at the start of sentences, and rephrasing of some of the questions put to

Strummer so that they were comprised of only one sentence. In some cases this was easy.

For example, [35] features the question “Do you think that The Clash will be inducted into the

Music Hall of Fame next year? (The Clash is eligible to be inducted in the winter of 2002)” In

this case it was an easy decision to remove the bracketed part of the sentence and acquire a

perfectly acceptable template. Some questions, on the other hand, required more extensive

editing. An example of this can be found in [37], where Petruziello asks “Switching gears,

there was a thing you did a couple of years ago called the Electric Doghouse. Was that a

one-off?” This question needed to be re-worked into a single sentence without losing its

essence. I eventually used “Was your Electric Doghouse project from a couple of years ago

a one off?” There were many examples of this kind of compression being required across

the eight interviews.

Once this process was completed, I was ready to use Pandorawriter to convert the plaintext

into AIML. Here, I hit a problem when I failed to realize that the structure of the interviews

needed to follow a set format, with a line of whitespace between every pattern and every

template, not just every category. I had not realized this during my prototyping phase as the

interview used for that was fortunately already in that format. In order to change this, I had to

go back to the files I was editing and ensure that every question and answer were clearly

distinct from each other. With this rectified, I converted all the processed interviews, and

saved them as interview1.aiml, interview2.aiml… interview8.aiml.

5.4 Identification and Addition of Keywords

In terms of time taken, this was by far the longest phase of the project implementation. My

task here was to go through each generated AIML file, identifying the words in the patterns

with the most information content and adding additional categories so that when an input

was given that matched one of the keywords, the input would recur back to the relevant

answer given in the corpus. As with the prototype stage, I did this by entering raw AIML code

directly into the Pandorabots AIML editor. There were few issues with this phase of the


implementation, with the challenge at this stage coming from the sheer bulk of the work to be

accomplished. I did encounter a few small problems with AIML documents refusing to

compile effectively (Pandorabots parses the AIML dialogue entered in its freehand AIML

editor after every save), but they were mostly due to human error. I found that the best way

around these issues was to parse the work that I had done after every new category was

implemented. I achieved my deliverable for this stage of the implementation of a set of AIML

files with the relevant keywords identified and implemented via recursive patterns. Appendix

E, Figure 4 shows the AIML window which Pandorabots provides for raw AIML coding, with

some sample code.

At this stage, some of the corpus required a degree of alteration. This was principally to

ensure that the responses given by the chatbot made sense given the keyword system of

pattern matching used in the program. For example, Strummer often refers to something the

interviewer has said in the question as "it" in his answer. A good example of this can be

found in [38], where Strummer answers the question "What side of Joe Strummer are we

seeing with Rock Art and the X-ray Style?" with "It's the tender, youthful side!". With ROCK

ART AND THE XRAY STYLE used as a keyphrase in this response, the chances are that

"It's the tender, youthful side!" would not make a sensible answer when that keyphrase was

entered by the user. As such, I changed the response to "Rock Art and the X-ray Style is the

tender, youthful side of me!". This phrase is far more likely to make sense in a range of

conversational situations than the original. I also had to decide what to do with templates

that had the same keyword as another template. To get around this, I either used a

refinement of the keyword (e.g. rather than using the keyword "punk" in a question about

punk attitude, I actually used the keyphrase "punk attitude"), or I put both templates in the

same category as the keyword if no refinement could be made, placing them between

<random> tags to make AIML choose between them at random. This meant that responses

from some interviews ended up in the AIML files for others, but as the chatbot does not

distinguish between which AIML file its responses come from, this makes no difference to its

overall operation. I also removed some questions which I felt were irrelevant. The only

interview in which I removed significant amounts of material was [37], where Strummer

engages in a very familiar, monosyllabic conversation with Petruziello for a few response

pairs, which I felt were of no use for the purposes of this project. I gave my robot the name

StrumBot.
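As a sketch of this merging (the keyphrase and both responses here are invented for illustration), two templates sharing the keyphrase PUNK ATTITUDE could be combined as:

```xml
<!-- Two corpus responses merged under one keyphrase category.
     AIML picks one <li> element at random on each match.
     Keyphrase and response text are hypothetical examples. -->
<category>
<pattern>PUNK ATTITUDE</pattern>
<template>
<random>
<li>Punk attitude is about doing it all yourself.</li>
<li>We always had that attitude, right from the start.</li>
</random>
</template>
</category>
```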

5.5 Additions to Generated Input


Once the final AIML files were generated, it was time to put the finishing touches to my

system. The AIML files contained the main bulk of the chatbot’s knowledge of Joe Strummer,

but without a few extra categories to increase the “human-ness” of the chatbot they would

not alone be able to model his personality accurately. My first task was to add the “catch-

all” statements that I described in the design section. These statements matched the pattern

“*” (i.e. all input). However, because of the way the Knowledge Web works (as described in

[7]), these statements in fact match all input that cannot be matched by any other pattern. For

the template of these statements, I wrote some of my own statements in the “style of Joe

Strummer”. While I appreciate that it could be construed as taking liberties with the integrity

of the AIML files to use templates that were not, in fact, recorded as being uttered by

Strummer, I felt that I was justified in doing this. There were no statements in the corpus

where Strummer did not understand what was being asked, and as phrases such as these

are elemental in the creation of chatbots I had to implement them in the system using

estimations of what Strummer would have actually said in the circumstances.

For the record, there were a few instances of Strummer mishearing the question and asking

for it to be repeated [34,36,39] although these were never anything more than a simple

“excuse me?” [39]. I used the <random> tags to implement the phrases. This means that

when a no-keyword situation occurs, the chatbot will choose one of the templates at random

to say. I decided this was the most realistic way of implementing this feature, as real people

do not always ask for sentence clarification or try to change the topic (in essence the

purpose of this type of phrase) in precisely the same way.
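A minimal sketch of such a catch-all category follows. The “Excuse me?” line echoes Strummer’s own response noted above; the conversation-starter line is my own invented approximation, not corpus material:

```xml
<!-- Catch-all category: "*" is matched only when no other
     pattern fits the input. One fallback is chosen at random.
     The second <li> is a hypothetical conversation starter. -->
<category>
<pattern>*</pattern>
<template>
<random>
<li>Excuse me?</li>
<li>Did I tell you about my days with The Clash?</li>
</random>
</template>
</category>
```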

With these statements in place, the next feature I wanted to add to the bot was a

rudimentary knowledge of some basic facts about Joe Strummer. This way, if a user asks

the chatbot a direct question such as “where were you born?” the chatbot can respond with a

simple, dedicated message rather than responding with either a template corresponding to

the keyword “born” or a no-keyword message. The best way to do this, I found, was using

the Pandorabots “properties” feature. This feature allows the bot to be assigned a series of

properties which can then be referred to in conversation. For example, I created a property

"firstjob" with the value "busker". Now, by entering the AIML code <bot name="firstjob"/> I

could get the chatbot to state the value of that property (i.e. "busker"). I then created a file

called "properties.aiml", which asks simple "Where were you born?" type questions to allow

the user to establish a few simple facts about Strummer, apart from the detail given in the

interviews. I implemented this feature because the interviews used in the corpus

did not really provide enough background to allow the chatbot to answer simple questions

like this (they are not really interview-type questions). Appendix E, Figure


3 shows a final list of the system files used in the program. Appendix E, Figure 5 shows a list

of the system’s properties, and the Pandorabots “properties” interface. I also used the

Pandorabots “train” interface to add extra data. Appendix E, Figure 6 illustrates this. Figure 7

of the same appendix shows the finished chatbot window in a sample conversation.
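A sketch of one such properties category, using the "firstjob" property described above (the question pattern and the wording of the reply are invented for illustration):

```xml
<!-- Category drawing on a bot property rather than the corpus.
     <bot name="firstjob"/> expands to the property value "busker".
     Pattern and reply wording are hypothetical examples. -->
<category>
<pattern>WHAT WAS YOUR FIRST JOB</pattern>
<template>My first job was as a <bot name="firstjob"/>.</template>
</category>
```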


6. Evaluation

6.1 Identification of criteria for evaluation of system

With the system implemented, my task now is to evaluate its success in solving the original

problem. In order to do this, I need to find a set of suitable criteria to evaluate with. At this

point, it is important to remember the original aim of the project, “To establish the extent to

which a chatbot can assume the personality of a specific human through conversational

learning”. The criteria that I choose to evaluate my solution need to be as clear a gauge as

possible of how well the personality of Joe Strummer has been modelled by my chatbot.

Choosing evaluation criteria for a project such as this one is notoriously difficult. The

personality of a given individual is a fundamentally qualitative thing. It is simply not the case

that a person’s personality can be expressed as a series of figures or other statistical data.

The most effective way of evaluating this system would be to have someone who knew the

real Joe Strummer’s personality use the chatbot, and compare the two. This would allow me

to get some good quality qualitative feedback on the responses that the chatbot made.

However, while this method of evaluation would be useful, it would probably not give a clear

enough picture of the performance of the system on its own. I need some slightly more

quantitative data. I feel that a good way of doing this would be to compare the chatbot

system to ALICE. As mentioned before, ALICE is the three-time Loebner Prize-winning chatbot

([41]) developed by Dr. Richard Wallace. The reason I feel that this would be a useful

comparison for my personality emulating chatbot is that ALICE has minimal personality. It is


developed on the principle that conversations are monitored and if the botmaster sees that

the system has produced a response that he deems to be unacceptable, he alters the AIML

files governing ALICE’s responses. As such, ALICE is intended to be an exercise in breadth

of conversational knowledge and ability to converse lucidly rather than an attempt to create a

full artificial personality.

My plan for this second part of the evaluation is to conduct a series of conversational tests

on my chatbot. These alone will help me to evaluate the bot’s general ability to hold a good

conversation. It is fairly self-evident that for a chatbot to emulate the personality of a human

successfully, it needs to be able to converse well in its own right. I will then perform these

same tests on ALICE. By comparing the results obtained from my chatbot with those

obtained from ALICE, I can gain a different perspective on the extent to which the chatbot

has assumed a personality. When combined with the qualitative data from the initial

evaluation of someone with prior knowledge of Strummer using the chatbot, I will be able to

build up a picture of not only how well the chatbot chats in a general sense, but also the

extent to which it has assumed a different personality compared to a non-personality

emulating chatbot and how well that particular personality matched that of Joe Strummer.

6.2 Evaluation part 1: Test conversation and comparison to ALICE

6.2.1 Evaluation plan

This part of the evaluation will be split into three separate sections. The first will be a simple

test to ascertain how successfully the chatbot can maintain a viable conversation. I will chat

to the chatbot for five minutes and record how many times it gives a response that I consider

to be “un-human”. I will then perform the same test on the ALICE bot as a control. The idea

here is that if the bot displays a score close to the score posted by ALICE, then it will have

successfully chatted for five minutes without making an unreasonable number of bad

responses, and as such can be considered effective at general conversation. I am not

performing this test to ascertain the personality modelling aspect of the chatbot, merely its

ability to converse, so I will not be taking into account how “in-character” these messages

seem.

With this test complete, I will then move on to asking a series of personal questions of the


bots. There will be ten questions:

• “What is your name?”

• “Where were you born?”

• “How old are you?”

• “Do you have a family?”

• “What kind of music do you like?”

• “What is your job?”

• “How are you today?”

• “What was the last book you read?”

• “Are you a chatbot?”

• “Do you have a punk attitude?”

These questions have been chosen to illustrate the difference in responses between the Joe

Strummer chatbot and ALICE. The idea is that some questions are ones which the two

chatbots should answer very differently, while others should produce more subtle

distinctions. “What kind of music do you like?” and “Do you have a punk attitude?” are both

questions which my system should produce dynamic, unique responses for, as they are both

asking about specific areas of Joe Strummer’s life that have no real bearing on ALICE.

Questions like “What is your job?”, and “Where were you born?” will be more revealing in the

context of this evaluation. They should illustrate the difference in how the two chatbots

answer simple questions that both are likely to be asked. An interesting point is the question

“Are you a chatbot?”. In order to keep the simulation of identity as real as possible, my

system speaks as if it were Joe Strummer, rather than a chatbot pretending to be Joe

Strummer. ALICE, on the other hand, should give a very different answer. When I have

compiled the results of the two question sets, I will perform a qualitative analysis on them,

based on how different they are. Here, I will be looking for the responses of the Strummer

chatbot to be significantly different in style as well as content to ALICE’s responses. If they

are, it is a strong indicator that the chatbot models personality, rather than existing solely to

provide conversation.

The final section of this part of the evaluation will consist of three conversations, 15

response pairs in length, on a range of topics. I will try to engage both chatbots in

conversations about their “families“, music, and a general conversation where I will let the

topic of conversation “flow”, as it were. My aim here is similar to the one stated in the

previous test. I wish to discover how the responses of the Strummer chatbot compare with


the responses of ALICE over a conversation, rather than just in the form of one-reply

answers. As with the previous tests, I will be looking at the style in which the Strummer

chatbot responds compared to ALICE, as well as what it actually responds with.

6.2.2 Evaluation results

In the first test, my system produced five responses over the five-minute period which

indicated that the system had not recognised the input being made, and one response which

featured a misinterpretation of a recognised keyword. The misinterpreted question was “what

kind of music do your kids like?”, which produced a lengthy response about Strummer’s

background as a writer, due to the keyword “music”. There were 28 inputs made over the

five minutes, which corresponds to a misinterpretation rate of approximately 3.5%, and

misrecognition rate of 18%. ALICE, by contrast, had two inputs that were not recognised and

one that was misinterpreted over the course of 25 entries (the misinterpretation being a

response to the phrase “I’m off now” with the rather humorous “Do your friends call you

’off’?”). This gives ALICE a misinterpretation rate of approximately 4% and a misrecognition

rate of 8%. When considering these results, it is worth noting that the ALICE system ran more slowly than the StrumBot system, almost certainly because ALICE is a vastly popular chatbot system, whereas I was the only user of my chatbot at the time. This meant that ALICE gave out fewer responses in the allotted five-minute timespan.

The results given by both chatbots for the list of questions I prepared are given in the table below:

Question: "What is your name?"
StrumBot: Joe Strummer
ALICE: My name is ALICE.

Question: "Where were you born?"
StrumBot: I was born in Ankara, Turkey
ALICE: ALICE was born in Bethlehem, Pennsylvania.

Question: "How old are you?"
StrumBot: Well, I was born on 21st August 1952 so work it out!
ALICE: I am 9 years old.

Question: "Do you have a family?"
StrumBot: I've got my family with me right now.
ALICE: My "father" and "mother" is Dr. Richard S. Wallace.

Question: "What kind of music do you like?"
StrumBot: I was in awe of music. I was a very late-comer to playing music. Most kids might learn something in their youth, but I didn't have any musical knowledge at all, except that I listened. From the Rolling Stones onwards, I listened obsessively.
ALICE: all

Question: "What is your job?"
StrumBot: Musician
ALICE: My job is chatting on the internet. I am a paid client.

Question: "How are you today?"
StrumBot: Did you catch the new Weezer album? Its killer!
ALICE: Ah. I am doing very well. How are you unknown person?

Question: "What was the last book you read?"
StrumBot: I like to read because it's the opposite of being on the go. Reading is the perfect antidote.
ALICE: "Linux for Dummies" by John Maddog Hall.

Question: "Are you a chatbot?"
StrumBot: What the hell is a chatbot?
ALICE: Yes.

Question: "Do you have a punk attitude?"
StrumBot: Punk is alive and kicking because I am signed to a punk rock label. If I did not have this chance, I'm sure I'd be dead.
ALICE: Doesn't everyone have a punk attitude. I have a great programmer.

The StrumBot answer to the question “What kind of music do you like?” was in fact a much

longer soliloquy on how Strummer discovered music, but I have abridged it in the table for

space reasons. Notice that StrumBot uses a "catch-all" phrase when asked the question "How are you today?", as the input does not correspond to a specified keyword.

When conducting the third set of tests in this phase of the evaluation, the first thing I noticed

was that StrumBot’s responses were significantly longer than those of ALICE, thanks to the

fact that they were generally pulled from interviews rather than existing as custom-built chatbot responses. In the first two conversations (i.e. the ones on specific topics), StrumBot

lasted four responses before beginning to repeat itself. ALICE managed six before it started

to change the subject. It is an interesting point that ALICE tried to shift the conversation

away from the topic (or at least appeared to) once we had talked about it for a little while,

while StrumBot merely repeated itself. In my opinion, the responses of ALICE came across

as more “robotic” than those of my system. An example of this is:

Human: so you dont have any brothers or sisters
ALICE: Interesting deduction. Do you mean "clones?"

This is of course not a response that either a human or a chatbot attempting to model a

human would give! StrumBot tends to give far fewer questions as responses than ALICE,

which could again be attributed to the fact that the corpus StrumBot has been trained from is

one of interviews in which Strummer was the one being questioned, not the questioner.

6.3 Evaluation part 2: Qualitative Analysis

6.3.1 Evaluation plan

As stated previously, this phase of the analysis is intended to gain some qualitative feedback

on the way my chatbot operates. Attempting to perform a structured analysis such as the

one in the previous section could only be useful up to a point in a project such as this, and I

feel it is important to simply gauge opinion from users. In order for this feedback to be

genuinely useful, I need to choose my evaluators carefully. Originally, I asked the webmasters of some of the Joe Strummer fansites on the Internet if they would be

interested in taking part, but sadly they did not respond in time to allow me to evaluate their

feedback properly. Fortunately, I had the answer to this issue closer to home. As mentioned

before, I am a musician, and I play with a punk band whose members are all keen Clash

fans. I felt that, if I could not use someone who knew Strummer personally, the next best

thing would be to have the system evaluated by people with a good knowledge of his work.

In this evaluation, I will simply ask the three test subjects to independently use the system

for fifteen minutes, then report back to me on how close they feel the responses of the

chatbot were to what Strummer may have said in real life when in a given conversational

situation, how “human” the chatbot seems in conversation, and where they felt

improvements could be made (bearing in mind that none of the subjects has a natural

language processing background and only one is a Computing student). At this point, it is

worth noting that although fifteen minutes may not seem like a great deal of time to be using

a chatbot system, in fact the systems tested in the prestigious Loebner prize competition are

only tested for ten.

6.3.2 Evaluation results

Overall, feedback from this stage of the evaluation was positive. All three subjects said that

they were impressed with the level of knowledge the system has about Joe Strummer, both

in terms of general factual knowledge about Strummer’s history and the people he

associated with in the days of The Clash. This detail largely comes from [37], in which

Strummer discusses his history for some time. Two of the subjects also commented on how

the system had a wide range of conversational topics, although they did concede that most

of these topics were music related in some way. In response to the first question I asked

them, all three said that they felt the responses provided by the chatbot were close to the

way that Strummer would have responded in real life. All subjects made the point, however,

that, although the bot never said anything genuinely "out of place" while they were using the

system, they did feel that occasionally the system gave good responses out of context, in the sense that sometimes it was easy to see that the bot was responding to an input based on a keyword alone, and that the question it was actually answering was subtly different from the one entered.

When asked to comment on the “human-ness” of the bot, all users were for the most part

positive. One user expressed frustration when he entered three "no-keyword" inputs consecutively, and the supposedly random "catch-all" statement generator produced the

same statement three times, ruining the flow of the conversation. Aside from this, the

subjects said that they had found their chatbot conversations quite interesting, and two

commented on times when they had entered a question and the bot had responded in a

“very impressive” way. I later discovered that this was in response to the question “tell me

about your time with The Clash”, to which the bot responds with a few sentences about the

early days of The Clash beginning with “Man, my time with The Clash seems so long ago

now…”. However, the subjects did point out that the system sometimes needed more depth

in certain conversational areas, as they had tried to engage the system in a conversation

and it had started repeating itself. One also made the point that it was sometimes hard to get

a conversation started with the bot, and they felt that users who were not aware of the

history of Joe Strummer would be at a disadvantage. None of the users said that they would

genuinely believe they were in a messenger conversation with Joe Strummer if they had to

use the bot for an extended period of time, but the general feeling was that the system could

be trusted to reply to most statements in a way which seemed believable as the personality

of Joe Strummer.

In response to the question about improving the chatbot, one subject pointed out that he

would like to see some support for the system being able to recognise Clash lyrics. In fact,

this was a functionality that was mooted at an early stage, but I dropped the idea on the

basis that this would detract from the chatbot’s main function of being a personality emulator.

Another said that he would have liked the chatbot to have more support for people who were not fully aware of what Joe Strummer did, rather than having a large range of in-depth detail

about his history. One point that all three subjects made is that they would have liked to have

seen the chatbot being able to have more "in-depth" conversations about certain subjects,

rather than simply knowing "a little about a lot". The nature of the "no-keyword" responses

was also mentioned, specifically the way that the supposedly random system for selecting

which response to use did not always seem so random, but I feel that that is an issue with

AIML as opposed to the implementation of my chatbot.

7. Discussion and Conclusions

In summary, I am pleased with the final system that I have developed. I feel that the system

displays many of Strummer’s personality traits, and it is quite possible to have an engaging

and lucid conversation for a good period of time with the chatbot. Looking at the evaluation

results, it is clear that the system has a higher error rate than ALICE, although frankly this is to be expected on the basis that ALICE is a triple Loebner prize-winning bot. Although the

first test (the five-minute conversation) was simply an exercise to try and get some concrete

figures on how well the chatbot can hold a conversation, at this point it should probably be

noted that ALICE has better no-keyword messages than StrumBot. One criticism of

StrumBot that was made by my small focus group, and that I noticed myself while testing, is

that the random message generator inbuilt into AIML was perhaps not the best way to

implement these messages. Too often, the same no-keyword message is produced when the user makes repeated no-keyword inputs. These messages were also limited by the fact

that I had to write them myself in the style of Strummer, due to limitations of the corpus. If I were to undertake this task again, I would place a focus on finding at least one interview in which Strummer specifically asks for a question to be clarified or refuses to answer a question, as these would probably have made ideal no-keyword responses.
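For illustration, a no-keyword fallback of this kind can be written in AIML as a wildcard category. The example below is a sketch rather than a transcript of the actual StrumBot script; the response lines are hypothetical placeholders written in a similar spirit (the Weezer line is borrowed from one of StrumBot's genuine catch-all replies):

```xml
<!-- Catch-all category: the * pattern matches any input that no other
     category has claimed. The interpreter picks one <li> at random each
     time, which is why a small list can easily repeat itself. -->
<category>
  <pattern>*</pattern>
  <template>
    <random>
      <li>You've lost me there, man. Run that by me again?</li>
      <li>I don't know about that. Did you catch the new Weezer album? Its killer!</li>
      <li>Ask me about the music, that's what I know.</li>
    </random>
  </template>
</category>
```

A larger pool of <li> responses, ideally drawn from genuine interview asides, would reduce the visible repetition that the evaluators noticed.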

When asking the two chatbots questions to evaluate their differing styles, the first thing that I

noticed was that StrumBot's responses tended to be a lot longer and more in-depth than the

comparatively clipped responses of ALICE. As I have mentioned previously, ALICE’s

responses can come across as slightly cold and “robotic”, reinforcing my theory that ALICE

represents a minimal personality benchmark for my system to be tested against. In

comparison, StrumBot’s responses are warm and for the most part conversational. The

tendency of the chatbot to mimic Strummer’s habit of producing lengthy answers which

reference his past is, I feel, a successful area of the system, as traits like these are among the

most obvious manifestations of personality. However, during this test my chatbot failed to

recognise one of the questions, producing a no-keyword response instead, and there were

two examples of questions which were not entirely answered in context. (The answers to

“What was the last book you read?” and “Do you have a punk attitude?” were, for me, not

out of context but nevertheless probably not indicative of the answers that Strummer himself

would have given in the situation.) While this particular phase was not supposed to test the

chatbot’s level of conversational ability, at the same time these represent answers that the

real Strummer would not have given, and as such detract from the overall level of personality

modelling taking place.

The test in which I contrasted the results of StrumBot and ALICE over a conversation period

did not really provide any useful insight that the previous test had not already given me. The only new realisation I gained was that StrumBot does not have the depth of

knowledge to continue a conversation on one specific topic for more than a few response

pairs. This is an area in which the chatbot obviously does not model the real Joe Strummer. I

feel that this particular limitation is one which would be almost impossible to iron out in a

project developed in the timescale of this one. I realised quite early on in this project that I

could have spent any amount of time on the implementation stage of this chatbot and still not

fully modelled the personality of Strummer. After all, no one has ever won a silver medal in the Loebner prize (the silver medal being the award given to a bot which convinces half of its judges that they are speaking to a real person, as detailed in [41]). As

such, I chose breadth of knowledge over conversational depth. It is interesting that my test

subjects also noticed this lack of conversational depth. In retrospect, this is an area of my

system which could have been improved. Perhaps a way in which I could have done this is

use of the AIML <that> tag. The <that> tag allows the chatbot to refer back to its previous utterance as "that" (e.g. "I like wakeboarding" / "What is that?", where "that" means wakeboarding). I did

not implement this system due to time constraints, but in an ideal world it would have been

nice to have provided support for this feature.
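As a sketch of how such a category might have looked (the patterns and wording here are hypothetical, not taken from the StrumBot script), a <that> element lets a category match against the bot's own previous utterance, giving the reply context:

```xml
<!-- The <that> element matches the bot's previous output (normalised to
     uppercase), so this category only fires if the bot has just said
     "I like wakeboarding" and the user then asks "what is that". -->
<category>
  <pattern>WHAT IS THAT</pattern>
  <that>I LIKE WAKEBOARDING</that>
  <template>Wakeboarding? It's riding a board over the water while being towed along.</template>
</category>
```

Chains of such categories would allow StrumBot to stay on one topic for several response pairs instead of repeating itself.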

Most of the limitations discussed in the second part of the evaluation come from the nature

of the corpus. The interviews that this chatbot was constructed from follow the same

conversational pattern as a chatbot conversation. However, they are almost always

conducted by a journalist who knows the history of their subject reasonably well and usually

wants to talk about career matters. This has a bearing on the finished system in two ways.

The first is that simple questions and other responses such as no-keyword outputs need to be manually implemented. Once all the interview scripts had been converted into AIML and the keywords implemented, the chatbot was able to talk at length about episodes of

Strummer’s life from the 1970s and 1980s, but not able to answer the question “Who are

you?”. This bears out the point made by one of the test subjects when he asked for “more

support for people who do not know who Strummer is”. Of course, no interviewer would ask

that of an interviewee in such a context, but the real Strummer would be able to answer the

question. Another example of the limitations of the corpus is that its bias toward Strummer’s

career rather than his private life means that the chatbot has areas of conversation about which it knows little, but about which the real Strummer would obviously have known a great deal.

Fundamentally, of course, the only way to make a true Joe Strummer chatbot (or indeed a chatbot designed to fully simulate the personality of any figure) would be to obtain a corpus not just of interviews, but of transcripts of every conversation Strummer had over the course of an

average day. The chatbot I designed does not model Strummer's personality; rather, it

models the personality Strummer assumed during interviews, when talking to music

journalists who he may or may not have known previously. All these small factors affect the

way that the chatbot responds. If Strummer were just talking to a normal person whom he

had just met, as chatbot conversations tend to assume is the case, then it is a fair

assumption that his responses may well have been different. Overall, I was pleased with the

quality of this system. I felt that the use of keywords to pattern match was simple but

effective, and when combined with a few manually selected responses it produced a chatbot

which can quite plausibly hold a believable conversation in the guise of Joe Strummer. If I

were to extend this piece of work, there are two areas I would focus on. The first is the corpus

the chatbot was trained from. While the interviews used were, for the most part, ideal for

modelling the chatbot’s responses, it would have been useful to have had some transcripts

of conversations from Strummer’s personal life to give a new side to the available responses

of the chatbot. As I mentioned before, I would also like to look at increasing the ability of the

chatbot to hold a conversation over a sustained period of time, possibly by implementing

<that> tags to allow it to refer to earlier utterances in the conversation.

To use the system and decide for yourself, simply point your browser to:

http://www.pandorabots.com/pandora/talk?botid=ffeb5bc2ae353514

Please bear in mind that the life of chatbots is limited, and while I will do my best to ensure

StrumBot stays online for as long as possible, I cannot guarantee the system will be online

indefinitely. Alternatively, you can go to the Pandorabots website and browse for StrumBot

using the “browse” tags at the top of the screen.

8. Bibliography

1. Wallace, R. (2001), The Slashdot Interview,

http://www.pandorabots.com/pandora/pics/wallaceaimltutorial.html, [15th March 2005]

2. ALICE A.I. Foundation (2005), AIML 1.0.1 tag set,

http://www.alicebot.org/documentation/aiml101.html, [18th January 2005]

3. Bush, N. (2001), AIML Formal Specification, http://www.alicebot.org/TR/2001/WD-aiml/,

[19th November 2004]

4. Millican, P. (2003), Elizabeth Conversation Program documentation, available from

http://www.etext.leeds.ac.uk/elizabeth/

5. Abu Shawar, B. and Atwell, E. (2003), Using the Corpus of Spoken Afrikaans to generate

an Afrikaans chatbot, Southern African Linguistics and Applied Language Studies

2003. 21: 283-294

6. Abu Shawar, B. and Atwell, E., (2003) Using dialogue corpora to retrain a chatbot

system, In: Proceedings of CL2003: International Conference on Corpus

Linguistics, Archer D, Rayson P, Wilson A & McEnery T (eds), pp. 681-690

7. Aimless, D. and Umatani, S., (2002), A tutorial for adding knowledge to your robot,

http://www.pandorabots.com/botmaster/en/tutorial?ch=1, [8th April 2005]

8. Wilson, B. (2004), The Natural Language Processing Dictionary,

http://www.cse.unsw.edu.au/~billw/nlpdict.html#corpus, [10th December 2004]

9. Atwell, E. et al., (2000), A comparative evaluation of modern English corpus grammatical

annotation schemes, ICAME journal No. 24, pp. 7-24

10. Abu Shawar, B. and Atwell, E. (2004), Accessing an Information System by Chatting, In:

Meziane, F and Metais,E (editors) Natural Language Processing and Information

Systems, pp. 407-412, Springer-Verlag

11. Thomsett, R. and Thomsett, C. (2005), The Busy Person's Project Management Book, p.

13, http://www.thomsett.com.au/main/projectbook/SmallProjectBook.pdf, [20th April

2005]

12. Humphrey, W.S. (1989), Managing the Software Process, Addison-Wesley.

13. FactGuru (2001). Fact Guru Object Oriented Software Engineering knowledge base,

http://www.site.uottawa.ca:4321/oose/index.html#waterfallmodel, [12th January 2005]

14. Royce, W.W. (1970), Managing the Development of Large Software Systems,

Proceedings of IEEE WESCON, August 1970, pp. 1-9

15. Whatis.com (2003), Waterfall Model, Whatis.com definitions,

http://searchvb.techtarget.com/sDefinition/0,,sid8_gci519580,00.html, [14th January

2005]

16. Als, A. and Greenridge, C. (2003), The Waterfall Model,

http://scitec.uwichill.edu.bb/cmp/online/cs22l/waterfall_model.htm, [15th January 2005]

17. Landay J. (2001), The Software Life Cycle,

http://bmrc.berkeley.edu/courseware/cs169/spring01/lectures/lifecycle/sld001.htm, [18th

January 2005]

18. Boehm, B. (1986), A spiral model of software development and enhancement, ACM

SIGSOFT Software Engineering Notes, Volume 11, Issue 4, pp. 14-24

19. Als, A. and Greenridge, C. (2003), The Spiral Model,

http://scitec.uwichill.edu.bb/cmp/online/cs22l/spiralmodel.htm, [18th January 2005]

20. Zheng, A. (2005), CS341 Lecture 2,

http://www.cs.uwlax.edu/~zheng/CS341Spring05/Lecture2.ppt, [19th March 2005]

21. Triumph PC (1999), The John Lennon Artificial Intelligence Project,

http://triumphpc.com/john-lennon-project/index2.shtml, [23rd February 2005]

22. Millican, P. (2004) Elizabeth’s Homepage, http://etext.leeds.ac.uk/elizabeth/ [22nd

February 2005]

23. Laven, S. (2003), Eliza by Joseph Weizenbaum, available at

http://www.spaceports.com/~sjlaven/eliza.htm, [23rd February 2005]

24. Millican, P. (2003), Elizabeth.txt, Elizabeth system files, available from

http://www.etext.leeds.ac.uk/elizabeth/

25. Garner, P. and Nathan, P.X. (1998), JFred Chat Server,

http://homepage.mac.com/rgarner1//JFRED/, [12th March 2005]

26. Garner, P. and Nathan, P.X. (1998), jrldocs.txt, JFred help documentation

27. Verity Inc., (2005), Verity Response, http://www.verity.com/products/response/, [12th

March 2005]

28. Aimless, D. (2002), Pandorawriter: Converts Dialog to AIML Categories,

http://www.pandorabots.com/botmaster/en/aiml-converter-intro.html [13th March 2005]

29. VH1 Staff (2004), Mick Jagger: Sympathy for the Romeo, VH1.com,

http://www.vh1.com/artists/interview/1492269/10142004/jagger_mick.jhtml [23rd

February 2005]

30. Nichols, F. (1993), Prototyping: Systems development in record time, Journal of

systems management (Sept. 1993)

31. Hurst, M. (2005), Interview, Wikipedia’s Jimmy Wales, Good Experience Blog January

2005, available at http://www.goodexperience.com/blog/archives/000124.php

32. www.google.com (2005)

33. McGuire, J. (2003), Joe Strummer Interview, Punk Magazine, available at

http://www.punkmagazine.com/morestuff/joe_strummer.html

34. Rabid, J. (2000), The One and Only Joe Strummer, The Big Takeover. 45, 46, available

at http://www.bigtakeover.com/strummer.htm

35. Hasagawa, M. (2002), Joe Strummer- Putting a scare into the heart of all things

corporate, CRC Radio. 10, available at

http://www.crcradio.net/issue10/Joestrummer.html

36. Kenney, S. (2001), Joe Strummer: Still punk after all these years?, Unpop September

2001, available at http://www.unpop.com/features/int/strummer.html

37. Petruziello, H. (1999), Rock Art and the Strummer Style, Music Monitor December

1999, available at http://www.penduluminc.com/MM/December99/strummer.html

38. Gross, J. (2003), Joe Strummer, Perfect Sound Forever January 2003, available at

http://www.furious.com/perfect/joestrummer.html

39. Peisner, D. (2003), Joe Strummer R.I.P., San Antonio Current 2005, available at

http://www.zwire.com/site/news.cfm?newsid=6560601&BRD=2318&PAG=461&dept_id=

484045&rfi=6

40. Bottomley, C. and Shapiro, R (2001), 7 questions with Joe Strummer, VH1 January

2003, available at

http://www.vh1.com/artists/interview/1446683/08152001/strummer_joe.jhtml

41. 2005 Loebner Prize Committee, http://loebner.net/Prizef/loebner-prize.html, [10th April

2005]

Appendix A – Project Reflections

Looking back on the work I have done on this project over the previous seven or so months,

I feel that it has been a rewarding experience. It is fair to say that I had never undertaken a

body of work this great before, and as such I learned many lessons on how to prepare a

report of this size. My main piece of advice to future students when undertaking this kind of

project is “think very long and hard about your project topic and minimum requirements at an

early stage”. I was lucky in that I already had a fair idea of what I wanted to do for the project

(either something based on natural language processing, as proved to be the case, or a bio-

inspired computing application) and I managed to tie the project in to one of my interests, but

if I had not carefully considered the choice of topic within the first couple of weeks of the final

year, I could well have been forced into making a choice of project that I did not enjoy or

even fully understand. It is important, I feel, for students to remember that they are

effectively “stuck” with their choice of project after the first few weeks of term, and as such

cannot afford to be undertaking a task that they will be unsuited to when so much of their

degree programme rests upon that task.

Another important lesson to be learned here is that of project scheduling. While I certainly do

not want to give the impression that I left the report to the last minute, the fact that my

implementation phase overran (this is discussed in more detail in section 3.3) meant that I

had to up my workload towards the end of the project to ensure that my report writing stayed

“on track”. It is important that students give their schedules a degree of flexibility, as in real

life projects will almost always be susceptible to “slippage” and falling behind schedule, and

a student naive enough to produce a schedule which does not take this into account is at a

disadvantage. It is also important to ensure that the schedule produced is not too

“punishing”. Students should allow break days and days with minimal or easier workload in

their schedules, as I found that taking time off in this way allowed me to concentrate more

when the time came to write the major sections of the report. Another point which should not

be underestimated is the importance of remaining calm under pressure. Final Year Projects

are a huge piece of work which count for a high proportion of the overall degree

classification, but students who let themselves become frustrated or overworked by this fact

are not helping themselves. Having the right attitude towards completing the task is

something that project supervisors and co-ordinators cannot help with; it is up to the student

to ensure they help themselves to complete the task in the most efficient manner possible.

Apart from the aforementioned slippage of the project during the implementation phase, I

was pleased with how I scheduled the task. If I was to undertake a project of this type again,

I would allow more time for the evaluation phase of the project. Projects such as this are

harder to evaluate successfully than ones which can have their success expressed as a range of statistics, and I wish I had appreciated at the outset how hard the project would be to evaluate effectively. I would also have tried to document what I was doing at each point

during the implementation phase of the project, in order to make that phase easier to write

up.

During this project, I did not have to deal with many third parties. However, knowing how to

deal with third parties is another important skill in the creation of projects such as this. I did

make a polite request to a few Joe Strummer fans to help with my evaluation, but as nothing came of that, I had no further contact with them. The only other third parties were the

subjects who helped with my evaluation. As long as students remain polite when requesting

help from third parties such as this, and clearly state what the role of the third party is to be,

they should not have any problems.

Appendix B – Schedule Gantt chart

Appendix C – Glossary of Terms

AIML – Artificial Intelligence Markup Language. A markup language used for chatbot scripts.

ALICE – Artificial Linguistic Internet Computer Entity. A very well known and highly

respected chatbot system.

Elizabeth – Another chatbot system

Information Value – The concept that some words in a sentence have more relevance to

the meaning of that sentence than others

JFred – Another chatbot system

Keyword – A word used by a chatbot to recognise an input

Keyphrase – A group of keywords

Least Frequent Word Approach – A method of obtaining keywords from text based on the

frequency of words in normal conversation

Methodology – A framework for the implementation of a system

Pandorabots – A website which hosts AIML- based chatbots

Pandorawriter – A tool for converting plaintext into AIML

Prototyping Model – A methodology involving the creation of several small prototypes

Spiral Model – A methodology involving several iterations of the development process to

produce a series of deliverables

StrumBot – The name I gave my system on the Pandorabots website.

Waterfall Model – A methodology involving the development of a system in clearly defined

stages

XML – eXtensible Markup Language. A markup language suitable for a wide range of tasks.

AIML is based on this language.

Appendix D – Sample AIML file

Here is a complete transcript of the AIML file interview3.aiml. It is included here to give a

sense of the structure of an AIML file. Note the <srai> tags, which redirect the wildcard keyword patterns back to a response from the original interview, and the <random> tags featured in the RADIO keyword.

<?xml version="1.0" encoding="UTF-8"?> <aiml version="1.0"> <category> <pattern> Do you think that mainstream radio is in a rut right now </pattern> <template> Yeah. I think that the corporation is running it and will always make it appeal to the lowest common denominator. I think we're going to have to forget about the radio and just go back to word of mouth. </template> </category> <category> <pattern> RADIO </pattern> <template> <random> <li>I think that the corporation is running radio and will always make it appeal to the lowest common denominator. I think we're going to have to forget about the radio and just go back to word of mouth.</li> <li>Europeans are even more susceptible to the crap on mainstream radio than Americans!</li> </random> </template> </category> <category> <pattern> _ RADIO </pattern> <template> <srai>RADIO</srai> </template> </category> <category> <pattern> RADIO _ </pattern> <template> <srai>RADIO</srai> </template> </category> <category> <pattern> _ RADIO * </pattern>

Andy Walton Cloning Identities via Chatbot Learning

. 58

<template> <srai>RADIO</srai> </template> </category> <category> <pattern> INDIE </pattern> <template> <random> <li> I would like it if indie music was on the radio. It ain't going to happen unless someone buys a radio station or something, but I can't see that happening. Anyway, it's good to be sent back to the underground. There's always a good side to bad things and the good side to this is that at least everyone has to go back down. </li> <li> I'd just like to say to everybody that it's best to check out the independent life: the independent stores and the independent everybodies. We should try not to give our money to any corporations, if we can help it. </li> </random> </template> </category> <category> <pattern> Do you think more independent music should be put on the radio </pattern> <template> <srai>INDIE</srai> </template> </category> <category> <pattern> _ INDIE </pattern> <template> <srai>INDIE</srai> </template> </category> <category> <pattern> INDIE _ </pattern> <template> <srai>INDIE</srai> </template> </category> <category> <pattern> _ INDIE * </pattern> <template> <srai>INDIE</srai> </template> </category> <category> <pattern> Why do you write music </pattern> <template> I don't know why. I can't do anything else. I'd like to be a cartoonist but I can't draw for shit. </template> </category> <category> <pattern> CARTOONIST </pattern>


  <template>I'd like to be a cartoonist but I can't draw for shit.</template>
</category>

<category>
  <pattern>_ CARTOONIST</pattern>
  <template><srai>CARTOONIST</srai></template>
</category>

<category>
  <pattern>CARTOONIST _</pattern>
  <template><srai>CARTOONIST</srai></template>
</category>

<category>
  <pattern>_ CARTOONIST *</pattern>
  <template><srai>CARTOONIST</srai></template>
</category>

<category>
  <pattern>Is there anywhere you have not been to that you would like to go to</pattern>
  <template>Yeah, I've never been to South America.</template>
</category>

<category>
  <pattern>SOUTH AMERICA</pattern>
  <template>I've never been to South America</template>
</category>

<category>
  <pattern>_ SOUTH AMERICA</pattern>
  <template><srai>SOUTH AMERICA</srai></template>
</category>

<category>
  <pattern>SOUTH AMERICA _</pattern>
  <template><srai>SOUTH AMERICA</srai></template>
</category>

<category>
  <pattern>_ SOUTH AMERICA *</pattern>
  <template><srai>SOUTH AMERICA</srai></template>
</category>

<category>


  <pattern>* biking from * to *</pattern>
  <template>Oh wow. That would be a trip that I would like to take.</template>
</category>

<category>
  <pattern>DO YOU ENJOY TOURING</pattern>
  <template>Yeah, you've gotta have some kind of stamina though.</template>
</category>

<category>
  <pattern>FAMILY</pattern>
  <template>I've got my family with me right now.</template>
</category>

<category>
  <pattern>_ FAMILY</pattern>
  <template><srai>FAMILY</srai></template>
</category>

<category>
  <pattern>FAMILY _</pattern>
  <template><srai>FAMILY</srai></template>
</category>

<category>
  <pattern>_ FAMILY *</pattern>
  <template><srai>FAMILY</srai></template>
</category>

<category>
  <pattern>THE SIMPSONS</pattern>
  <template>
    <random>
      <li>I hate missing The Simpsons.</li>
      <li>Everyone in Europe loves The Simpsons. We couldn't live without it.</li>
    </random>
  </template>
</category>

<category>
  <pattern>What bands do you like</pattern>
  <template>I'm into Weezer.</template>
</category>

<category>
  <pattern>WEEZER</pattern>
  <template>
    <random>
      <li>I'm into Weezer myself</li>
      <li>I love Weezer but my kids are into Incubus. They're educating me there.</li>
    </random>
  </template>
</category>


<category>
  <pattern>_ WEEZER</pattern>
  <template><srai>WEEZER</srai></template>
</category>

<category>
  <pattern>WEEZER _</pattern>
  <template><srai>WEEZER</srai></template>
</category>

<category>
  <pattern>_ WEEZER *</pattern>
  <template><srai>WEEZER</srai></template>
</category>

<category>
  <pattern>_ INCUBUS</pattern>
  <template><srai>WEEZER</srai></template>
</category>

<category>
  <pattern>INCUBUS _</pattern>
  <template><srai>WEEZER</srai></template>
</category>

<category>
  <pattern>_ INCUBUS *</pattern>
  <template><srai>WEEZER</srai></template>
</category>

</aiml>
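The listing above relies throughout on a keyword-reduction idiom: wildcard patterns such as `_ WEEZER` and `_ WEEZER *` all `<srai>` to a single canonical category, so any utterance containing the keyword produces that keyword's response. The following Python sketch is purely illustrative of this idiom (the dictionary and `respond` helper are hypothetical, not part of the submitted system, and the replies are abbreviated):

```python
# Illustrative sketch of the AIML keyword-reduction idiom: any input
# containing a keyword is reduced (as <srai> does) to one canonical
# response. Hypothetical helper, not part of the submitted system.
CANONICAL = {
    "WEEZER": "I'm into Weezer myself",
    "INCUBUS": "I'm into Weezer myself",  # the INCUBUS categories srai to WEEZER
    "FAMILY": "I've got my family with me right now.",
}

def respond(utterance):
    # AIML normalises input to uppercase before pattern matching
    words = utterance.upper().split()
    for keyword, reply in CANONICAL.items():
        if keyword in words:  # covers the _ KEYWORD, KEYWORD _ and _ KEYWORD * forms
            return reply
    return None  # would fall through to the no-keyword rules in general.aiml
```

In the real system this matching is of course performed by the Pandorabots AIML interpreter; the sketch only makes the reduction step explicit.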


Appendix E – Screenshots

Figure 1

Figure 2


Figure 3

Figure 4


Figure 5

Figure 6


Figure 7


Appendix F – List of system files

Here is a list of the system files on the enclosed CD. Each is an AIML file used to provide a different layer of functionality to the chatbot.

• general.aiml – This is the AIML that governs the no-keyword responses.

• properties.aiml – This is the file that governs the bot's ability to use its properties information.

• update.aiml – This is the list of simple questions I added after the interviews had been converted to AIML.

• interview1.aiml – This is the file that was generated from [33]

• interview2.aiml – This is the file that was generated from [34]

• interview3.aiml – This is the file that was generated from [35]

• interview4.aiml – This is the file that was generated from [36]

• interview5.aiml – This is the file that was generated from [37]

• interview6.aiml – This is the file that was generated from [38]

• interview7.aiml – This is the file that was generated from [39]

• interview8.aiml – This is the file that was generated from [40]

There are also transcripts of the original interviews available on the CD:

• Strummerinterview 1 – A transcript of [33]

• Strummerinterview 2 – A transcript of [34]

• Strummerinterview 3 – A transcript of [35]

• Strummerinterview 4 – A transcript of [36]

• Strummerinterview 5 – A transcript of [37]

• Strummerinterview 6 – A transcript of [38]

• Strummerinterview 7 – A transcript of [39]

• Strummerinterview 8 – A transcript of [40]
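
Since each of the AIML files above must compile cleanly when uploaded to Pandorabots, a quick well-formedness check before upload is useful. AIML is a dialect of XML, so Python's standard library parser suffices. The sketch below is illustrative only (the inline sample stands in for one of the files listed above; `count_categories` is a hypothetical helper, not part of the submitted system):

```python
# Sanity-check that an AIML file is well-formed XML and report how many
# <category> rules it defines. Illustrative sketch: the inline sample
# stands in for one of the interviewN.aiml files on the CD.
import xml.etree.ElementTree as ET

SAMPLE_AIML = """<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0">
  <category>
    <pattern>WEEZER</pattern>
    <template>I'm into Weezer myself</template>
  </category>
  <category>
    <pattern>_ WEEZER</pattern>
    <template><srai>WEEZER</srai></template>
  </category>
</aiml>
"""

def count_categories(aiml_text):
    """Parse AIML text and return the number of top-level categories.

    Raises xml.etree.ElementTree.ParseError if the file is malformed,
    which is exactly the situation this check is meant to catch.
    """
    root = ET.fromstring(aiml_text)
    return len(root.findall("category"))

print(count_categories(SAMPLE_AIML))  # prints 2
```

A malformed file (for example, a `<category>` left open across a page boundary when copying from this appendix) would raise a `ParseError` here rather than failing silently on the server.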