Studying the language and structure in non-programmers ...web.engr.oregonstate.edu/~burnett/CS589and584/CS589-papers/Nat… · these "ndings with the requirements imposed by popular

Int. J. Human-Computer Studies (2001) 54, 237}264doi:10.1006/ijhc.2000.0410Available online at http://www.idealibrary.com on

Studying the language and structure innon-programmers’ solutions to programming problems

JOHN F. PANE, CHOTIRAT &&ANN'' RATANAMAHATANA AND BRAD A. MYERS

Computer Science Department and Human Computer Interaction Institute,Carnegie Mellon University, Pittsburgh, PA 15213, USA.email: pane#[email protected]; http://www.cs.cmu.edu/&NatProg.

Programming may be more di$cult than necessary because it requires solutions to beexpressed in ways that are not familiar or natural for beginners. To identify what isnatural, this article examines the ways that non-programmers express solutions toproblems that were chosen to be representative of common programming tasks. Thevocabulary and structure in these solutions is compared with the vocabulary andstructure in modern programming languages, to identify the features and paradigms thatseem to match these natural tendencies as well as those that do not. This information canbe used by the designers of future programming languages to guide the selection andgeneration of language features. This design technique can result in languages that areeasier to learn and use, because the languages will better match beginners' existingproblem-solving abilities.

( 2001 Academic Press

1. Introduction

Programming is a very di$cult activity. Some of the di$culty is intrinsic to program-ming, but this research is based on the observation that programming languages makethe task more di$cult than necessary because they have been designed without carefulattention to human}computer interaction issues (Newell & Card, 1985). In particular,programmers are required to think about algorithms and data in ways that are verydi!erent than the ways they already think about them in other contexts. For example,a typical C program to compute the sum of a list of numbers includes three kinds ofparentheses and three kinds of assignment operators in "ve lines of code; in contrast, thiscan be done in a spreadsheet with a single line of code using the sum operator (Green& Petre, 1996). The mismatch between the way programmers think about a solution andthe way it must be expressed in the programming language makes it more di$cult notonly for beginners to learn how to program, but also for people to carry out theirprogramming tasks even after they become more experienced.

The Natural Programming Project seeks to in#uence future programming languages bycollecting a human-centered body of facts that can be used to guide design decisions(Myers, 1998). This will allow the language designer to be more aware of the areas ofpotential di$culty, as well as suggest alternate approaches that more closely match theways that people think. This article describes a pair of new studies that examine thelanguage and structure of problem solutions written by non-programmers, and contrasts

1071-5819/01/020237#28 $35.00/0 ( 2001 Academic Press

238 J. F. PANE E¹ A¸.

these "ndings with the requirements imposed by popular modern programminglanguages.

The "rst study focuses on children because they are the audience for a new program-ming language the authors are designing. In addition, children are less likely to beprogrammers, so their responses should reveal problem-solving techniques that have notbeen in#uenced by programming experience. The exercises in this study are drawn fromthe domain of computer games and animated stories, because children are often interest-ed in building these kinds of programs. The second study then examines how the resultsof the "rst study generalize to a broader audience and a di!erent domain.

2. Related work

There has been a wealth of relevant research in the "elds of Psychology of Programmingand Empirical Studies of Programmers. In these "elds, programming is often de"ned asa process of transforming a mental plan that is in familiar terms into one that iscompatible with the computer (e.g. Lewis & Olson, 1987). Among others, Hoc andNguyen-Xuan (1990) have shown that many bugs and di$culties arise because thedistance between these is too large. This concept is called closeness of mapping by Green& Petre (1996, p. 146): &&The closer the programming world is to the problem world, theeasier the problem-solving ought to be2. Conventional textual languages are a longway from that goal.'' In that article, they provide an extensive set of cognitive dimensions,which can be used to guide and evaluate language designs. More concretely, Soloway,Bonar and Ehrlich (1989) found that the looping control structures provided by modernlanguages do not match the natural strategies that most people bring to the program-ming task. Furthermore, when novices are stumped they try to transfer their knowledgeof natural language to the programming task. This often results in errors because theprogramming language de"nes these constructs in an incompatible way (Bonar & Solo-way, 1989). For example, then is interpreted as afterwards instead of in these conditions.Many similar "ndings are summarized in an earlier report (Pane & Myers, 1996). Whilethese studies identify many of the problems with existing languages, they do not prescribesolutions. The goal of the current work is to discover alternatives that can avoid orovercome these problems.

Striving for naturalness does not necessarily imply that the programming languageshould use natural language. Programming languages that have adopted natural-lan-guage-like syntaxes, such as Cobol (Sammet, 1981) and HyperTalk (Goodman, 1987),still have many of the problems that are listed above, as well as other usability problems.For example, Thimbleby, Cockburn and Jones (1992) list many ways that HyperTalkviolates the human}computer interaction principle of consistency. There are also manyambiguities in natural language that are resolved by humans through shared context andcooperative conversation (Grice, 1975). Novices attempt to enter into a human-likediscourse with the computer, but programming languages systematically violate humanconversational maxims because the computer cannot infer from context or enterinto a clari"cation dialog (Pea, 1986). The use of natural language may compound thisproblem by making it more di$cult for the user to understand the limits ofthe computer's intelligence (Nardi, 1993). However, these arguments do not imply thatthe algorithms and data structures should not be close to the ways people think about

LANGUAGE AND STRUCTURE IN PROBLEM SOLUTIONS 239

the problem. In fact, Bruckman and Edwards (1999) have found that leveraging users'natural-language-like knowledge in a more formalized syntax is an e!ective strategy fordesigning end-user programming languages.

There are many motivations for why a more natural programming language might bebetter. Naturalness is closely related to the concept of directness which, as part of directmanipulation, is a key principle in making user interfaces easier to use. Hutchins, Hollanand Norman (1986) describe directness as the distance between one's goals and theactions required by the system to achieve those goals. Reducing this distance makessystems more direct and therefore easier to learn. User interface designers and re-searchers have been promoting directness at least since Shneiderman (1983) identi"ed theconcept, but it has not been a consideration in most programming language designs.

User interfaces in general are also recommended to be natural so they are easier tolearn and use and will result in fewer errors. For example, Nielsen (1993, p. 126)recommends that user interfaces should &&speak the user's language'' which includeshaving good mappings between the user's conceptual model of the information and thecomputer's interface for it. One of Hix and Hartson's usability guidelines is to UseCognitive Directness (1993, p. 38), which means to &&minimize the mental transformationsthat a user must make. Even small cognitive transformations by a user take e!ort awayfrom the intended task.'' Conventional programming languages require the programmerto make tremendous transformations from the intended tasks to the code design.

The current studies are similar to a series of studies by Lance Miller (1974, 1981) in the1970s. Miller examined natural language procedural instructions generated by non-programmers and made a rich set of observations about how the participants naturallyexpressed their solutions. This resulted in a set of recommended features for computerlanguages. For example, Miller suggested that contextual referencing would be a usefulalternative to the usual methods of locating data objects by using variables and travers-ing data structures. In contextual referencing, the programmer identi"es data objects byusing pronouns, ordinal position, salient or unique features, relative referencing orcollective referencing (Miller, 1981, p. 213).

Although Miller's approach provided many insights into the natural tendencies ofnon-programmers, there have only been a few studies that have replicated or extendedthat work. Biermann, Ballard and Sigmon (1983) con"rmed that there are many regulari-ties in the way people express step-by-step natural language procedures, suggesting thatthese regularities could be exploited in programming languages. Galotti and Ganong(1985) found that they were able to improve the precision in users' natural languagespeci"cations by ensuring that the users understood the limited intelligence of therecipient of the instructions. Bonar and Cunningham (1988) found that when userstranslated their natural-language speci"cations into a programming language, theytended to use the natural-language semantics even when they were incorrect for theprogramming language. It is surprising that the "ndings from these studies haveapparently not had any direct impact on the designs of new programming languages thathave been invented since then.

The studies reported in this article di!er from the prior art in several ways. For example,

f Miller's studies used verbose problem statements, raising the risk that the languageused in the participants' responses was biased by the materials. In fact, one of the

240 J. F. PANE E¹ A¸.

frequently observed keywords actually appeared in the problem statement that wasgiven to the participants. The current studies take great care to minimize this kind ofbias by using terse descriptions along with graphical depictions of the problemscenarios.

f Miller's studies placed constraints on the participants' solutions, such as: they werebroken into steps, each line of the text is limited to 80 characters; steps had to beretyped completely in order to edit them; and, a minimum of "ve steps was required ina solution. The current studies are much less constrained, allowing users to write ordraw as much or as little text and pictures as they need to convey their solutions.

f Miller's tasks were typical database problems from the era of his studies. The currentstudies investigate a broader range of tasks that incorporate modern graphical userinterfaces and media such as animations.

f Miller's participants were all college students. The current studies investigate a broaderage range.

Thus, the current studies may yield more reliable information about the naturalexpressions of a wider audience, on a broader range of algorithms and domains.

3. The studies

In the studies reported here, participants were presented with programming tasks andasked to solve them on paper using whatever diagrams and text they wanted to use.Before designing the tasks, the authors enumerated a list of essential programmingtechniques and concepts that are needed to program various kinds of applications. Theseinclude: use of variables, assignment of values, initialization, comparison of values andBoolean logic, incrementing and decrementing of counters, arithmetic, iteration andlooping, conditionals and other #ow control, searching and sorting, animation, multiplethings happening simultaneously (parallelism), collisions and interactions among objectsand response to user input.

Because children often express interest in creating games and animated stories, the "rststudy focused on the skills that are necessary to build such programs. The authors chosethe PacMan video game as a fertile source of interesting problems that require theseskills. Instead of asking the participants to implement an entire PacMan game, varioussituations were selected from the game because they touch upon one or more of theabove concepts. This allowed a relatively small set of exercises to broadly cover most ofthe concepts in a limited amount of time. The skills that were not covered in the "rststudy were covered in the second, which used scenarios that involved database manip-ulation and numeric computation.

A risk in designing these studies is that the experimenter could bias the participants bythe language used in asking the questions. For example, the experimenter cannot justask: &&How would you tell the monsters to turn blue when the PacMan eats a power pill?''because this may lead the participants to simply parrot parts of the question back in theiranswers. To avoid this, a collection of pictures and QuickTime movie clips weredeveloped to depict the various scenarios, using very terse captions. This enabled theexperimenter to show the depictions and ask vague questions to prompt the participantsfor their responses. An example is shown in Figure 1. Full details about these studies,


including copies of the materials and complete tabulation of the results are available ina supplementary report (Pane, Ratanamahatana & Myers, 2000).

4. Study one

The "rst study examines children's solutions to a set of tasks that would be necessary toprogram a computer game.

4.1. PARTICIPANTS

Fourteen "fth graders at a Pittsburgh public elementary school participated in thisstudy. The participants were equally divided between boys and girls, were racially diverseand were either 10 or 11 years old. All of the participants were experienced computerusers, but only two of them (both boys) said they had programmed before. All of theanalyses in this article examine only the 12 non-programmers. The participants wererecruited by sending a brief note and consent form to parents. The participants receivedno reward other than the opportunity to leave their normal classroom for half an hourand the opportunity to play a computer game for a few minutes.

4.2. MATERIALS

A set of nine scenarios from the PacMan game were chosen and graphical depictions ofthese scenarios were developed, containing still images or animations and a minimalamount of text. The topics of the scenarios were: an overall summary of the game, howthe user controls PacMan's actions, PacMan's behavior in the presence and absence ofother objects such as walls, what should happen when PacMan encounters a monsterunder various conditions, what happens when PacMan eats a power pill, scorekeeping,the appearance and disappearance of fruit in the game, the completion of one level andthe start of the next, and maintenance of the high score list. Figure 1 shows one of thescenario depictions. The participants viewed the depictions on a color laptop computerand wrote their solutions on blank unlined paper.

4.3. PROCEDURE

After a brief interview to gather background information, participants were shown eachscenario and asked to write down in their own words and pictures how they would tellthe computer to accomplish the scenario. When a response was judged to be incompleteor unsatisfactory, the experimenter attempted to elicit additional information by askingthe participant to give more detail, by demonstrating an error in the existing answer, orby asking questions that were carefully worded to avoid in#uencing the responses. Thesessions were audiotaped.

4.4. CONTENT ANALYSIS

The authors developed a rating form to be used by independent raters to analyse eachparticipant's responses. Each question on the form addressed some facet of the par-ticipant's problem solution, such as the way a particular word or phrase was used or

FIGURE 1. Depiction of a problem scenario in study one.

242 J. F. PANE E¹ A¸.

some other characteristic of the language or strategy that was employed. Many ofthese questions arose from the results of a pilot study. In addition, preliminary reviewof the participant data revealed trends in the solutions that the authors thoughtwere important, so the rating form was supplemented with questions to explore theseas well.

Each question was followed by several categories into which the participant's res-ponses could be classi"ed. The rater was instructed to look for relevant sentences in theparticipant's solution, and classify each one by placing a tickmark in the appropriatecategory, also noting which problem the participant was answering when the sentencewas generated. Each question also had an other category, which the rater marked whenthe participant's utterance did not fall into any of the supplied categories. When they didthis, they added a brief comment.

Five independent raters categorized the participants' responses. These raters wereexperienced computer programmers, who were recruited by posting to Carnegie MellonUniversity's electronic bulletin boards and were paid for their assistance. They weregiven a one-page instruction sheet describing their task. Each analyst "lled out a copy ofthe 17-question rating form for each of the participants. Figure 2 shows one of thequestions from the rating form for study one.

4.5. RESULTS

The participants' solutions ranged from one to seven pages of handwritten text anddrawings. The raters were instructed to use each utterance (statement or sentence) as theunit of text to analyse. Since each rater independently partitioned the text into theseunits, the total number of tickmarks di!ered across raters, so the results are normalizedby looking at the proportion of the tickmarks credited to each category rather than theraw counts. Although there were variances among the results from individual raters,

3. Please count the number of times the student uses these various methods to express concepts aboutmultiple objects. (The situation where an operation a!ects some or all the objects or when di!erentobjects are a!ected di!erently.)

(a) 1*

2*

3*

4*

5*

6*

7*

8*

9*

Thinks of them as a set or subsets of entities and operates on those or speci"es them with plurals.Example: Buy all of the books that are red.

(b) 1*

2*

3*

4*

5*

6*

7*

8*

9*

Uses iteration (i.e. loop) to operate them explicitly.Example: For each book, if it is red, buy it.

(c) 1*

2*

3*

4*

5*

6*

7*

8*

9*

Other (please specify)

FIGURE 2. A question from the rating form for study one.


their ratings were generally similar. So the results are reported as averages across allraters (n"5) and all the non-programmer participants (n"12).

The results for each rating form question are summarized with an overall prevalencescore followed by frequency scores for each category sorted from most frequent to leastfrequent. The prevalence score measures the average count of occurrences that each raterclassi"ed for each participant when answering the current question. In study one, thisscore varies from 1.0 to 23.2, indicating the relative amount of data that was available tothe raters in answering the question. The frequency scores then show how thoseoccurrences were apportioned across the various categories, expressed as percentages.The frequencies may not sum to exactly 100% due to rounding errors. The examples arequoted from the participants' solutions. Table 1 summarizes the results that follow,which are sorted into four general categories: the overall structure of the solutions, theways that certain keywords are used, the kinds of control structures that are used and themethods used to e!ect various aspects of computation.

4.6. OVERALL STRUCTURE

4.6.1. Programming style. The raters classi"ed each statement or sentence in the solu-tions into one of the following categories based on the style of programming that it mostclosely matches.Prevalence: 22.7 occurrences per participant.

f 54%*production rules or event-based, beginning with when, if or after.Example: =hen PacMan eats all the dots, he goes to the next level.

f 18%*constraints, where relations are stated which should always hold.Example: PacMan cannot go through a wall.

f 16%*other (98% of these were classi"ed by the raters as declarative statements).Example: ¹here are 4 monsters.

f 12%*imperative, where a sequence of commands is speci"ed.Example: Start with this image. Play this sound. Display 00Player One Get Ready.''

TABLE 1Summary of results from the study. Items with frequencies below 5% do not appear

Overall structureProgramming style Perspective Modifying state54% Production rules/events 45% Player or end-user 61% Behaviors built into objects18% Constraints 34% Programmer 20% Direct modi"cation16% Other (declarative) 20% Other (third-person) 18% Other12% Imperative

Pictures67% Yes

KeywordsAND OR ¹HEN67% Boolean conjunction 63% Boolean disjunction 66% Sequencing29% Sequencing 24% To clarify or restate a prior item 32% &&Consequently'' or &&in that case''

8% &&Otherwise''5% Other

Control structuresOperations on multiple objects Complex conditionals ¸ooping constructs95% Set/subset speci"cation 37% Set of mutually exclusive rules 73% Implicit5% Loops or iteration 27% General case, with exceptions 20% Explicit

23% Complex boolean expression 7% Other14% Other (additional uses of exceptions)

ComputationRemembering state Mathematical operations Insertion into a data structure56% Present tense for past event 59% Natural language style * incomplete 48% Insert "rst then reposition others19% &&After'' 40% Natural language style * complete 26% Insert without making space11% State variable 17% Make space then insert6% Discuss future events Motions 8% Other5% Past tense for past event 97% Expect continuous motion

Sorted insertion¹racking progress Randomness 43% Incorrect method85% Implicit 47% Precision 28% Correct non-general method14% Maintain a state 20% Uncertainty without using &&random'' 18% Correct general method

18% Precision with hedging15% Other

244J.

F.P

AN

EE¹

A¸

.


4.6.2. Perspective. Beginners sometimes confuse their role or perspective while theyare developing a program. Instead of thinking about the program from the perspective ofthe programmer, they might adopt the role of the end-user of the program, or in the caseof games and stories, one of the characters portrayed by the program. The ratersclassi"ed the participants' statements according to the perspective or role that theyindicated.Prevalence: 23.2 occurrences per participant.

f 45%*player's or end-user's perspective.Example: =hen I push the left arrow PacMan goes left.

f 34%*programmer's perspective.Example: If arrow for Player 1 is &&left'' move PacMan left.

f 20%*other (99% of these were classi"ed by the raters as third-person perspective).Example: If he eats a power pill and he eats the ghosts, they will die.

4.6.3. Modifying State. The raters examined places where the participants were makingchanges to an entity.Prevalence: 4.6 occurrences per participant.

f 61%*behaviors were built into the entity, in an object-oriented fashion.Example. Get the big dot and the ghost will turn colors2

f 20%*direct modi"cation of the properties of entities.Example: After eating a large dot, change the ghosts from original color to blue.

f 18%*other.

4.6.4. Pictures. In addition to the above classi"cations done by the raters, the experi-menter examined each solution to determine whether pictures were drawn as part of thesolution.

f 67%*included at least one picture.f 33%*used text only.

4.7. KEYWORDS

4.7.1. AND. The raters examined the intended meaning when the participants used theword AND.Prevalence: 6.3 occurrences per participant.

f 67%*boolean conjunction.Example: If PacMan is travelling up and hits a wall, the player should2

f 29%*for sequencing, to mean next or afterward.Example: PacMan eats a big blinking dot, and then the ghosts turn blue.

f 3%*otherExample: Every level the fruit should stay for less and less seconds.

4.7.2. OR. The raters examined the intended meaning when the participants used theword OR.Prevalence: 1.5 occurrences per participant.

246 J. F. PANE E¹ A¸.

f 63%*boolean disjunction.Example: To make PacMan go up or down, you push the up or down arrow key.

f 24%*clarifying or restating the prior item.Example: When PacMan hits a ghost or a monster, he loses his life.

f 8%*meaning otherwise.f 5%*other.

4.7.3. THEN. The raters examined the intended meaning when the participants used theword ¹HEN.Prevalence: 2.2 occurrences per participant.

f 66%*sequencing, to mean next or afterward.Example: First he eats the fruit, then his score goes up 100 points.

f 32%*meaning consequently or in that case.Example: If you eat all the dots then you go to a higher level.

f 1%*to mean besides or also.f 1%*other.

4.8. CONTROL STRUCTURES

4.8.1. Operations on multiple objects. The raters examined those statements thatoperate on multiple objects, where some or all of the objects are a!ected by theoperation.

Prevalence: 6.1 occurrences per participant.

f 95%*set and subset speci"cations.Example: When PacMan gets all the dots, he goes to the next level.

f 5%*loops or iteration.Example: d5 moves down to d6, d6 moves to d7, etc., until d10 which is kicked o+ thehigh score list.

4.8.2. Iteration or looping constructs. The raters examined those statements that wereeither implicit or explicit looping constructs.


f 73%*implicit, where only a terminating condition is speci"ed.Example: Make PacMan go left until a dead end.

f 20%*explicit, with keywords such as repeat, while and so on, etc.f 7%*other.

4.8.3. ELSE or equivalent clauses. The raters looked for occurrences of E¸SE clauses orequivalent constructs in the participants' solutions. They simply counted these, withoutclassifying them further.



4.8.4. Complex conditionals. The raters examined those statements that specify condi-tions with multiple options.


f 37%*a set of mutually exclusive rules.Example: When the monster is green he can kill PacMan. When the monster is bluePacMan can eat the monster.

f 27%*a general condition, subsequently modi"ed with exceptions.Example: When you encounter a ghost, the ghost should kill you. But if you have a powerpill you can eat them.

f 23%*Boolean expressions.Example: After eating a blinking dot and eating a blue and blinking ghost, he should getpoints.

f 14%*other (95% of these either listed the exception "rst or did not list a general case).Example: If he gets a [power pill] then if you run into them you get points.

4.9. COMPUTATION

4.9.1. Remembering state. The raters examined the methods used to keep track of statewhen an action in the past should a!ect a subsequent action.


f 56%*using present tense when mentioning the past event.Example: =hen PacMan eats a special dot he is able to eat the ghosts.

f 19%*using the word after.Example: After using up the power pill, the ghosts can eat PacMan again.

f 11%*using a state variable to track information about the past event.Example: When the monster is blue PacMan can eat the monster.

f 6%*mentioning the future event at the time of past event.Example: When PacMan gets a shiny dot, then if you run into the ghosts, you get points.

f 5%*using the past tense when mentioning the past event.Example: In about 10 s, if PacMan didn+t eat it take it ow again.

f 4%*other.

4.9.2. Tracking progress. The raters examined the methods used to keep track ofprogress through a long task.


f 85%*all or nothing, where tracking is implicit or done with sets.Example: When PacMan gets all the dots, he goes to the next level.

f 14%*using counting, where a variable such as a counter tracks the progress.Example: When PacMan loses 3 lives, it1s game over.

f 1%*other.

4.9.3. Mathematical operations. The raters examined the kinds of notations used tospecify mathematical operations.


248 J. F. PANE E¹ A¸.

f 59%*natural language style, missing the amount or the variable.Example: When he eats the pill, he gets more points2

f 40%*natural language style, with no missing information.Example: When PacMan eats a big dot, add 100 points to the score.

f 0%*programming language style (count"count#20).f 0%*mathematical style (count#20).

4.9.4. Motions. The raters examined the participants' expectations about whethermotions of objects should require explicit incremental updating.


f 97%*expect continuous motion, specifying only changes in motion.Example: PacMan stops when he hits a wall.

f 2%*continually update the positions of moving objects.f 1%*other.

4.9.5. Randomness. The raters examined the methods used by the participants' inexpressing events that were supposed to happen at uncertain times or with uncertaindurations.


f 47%*using precision, where no element of uncertainty is expressed.Example: Put the new fruit in every 30 s.

f 20%*using words other than random to express the uncertainty.Example: ¹he fruit will go away after a while.

f 18%*using precision with hedging to express uncertainty.Example: After around 3 or 4 more seconds the fruit disappears.

f 15%*other (often the action was tied to another event).Example: Put a fruit on the screen when PacMan is running out of power.

f 0%*used the word random.

4.9.6. Insertion into a data structure. The raters examined the methods used by theparticipants to insert an element into the middle of an existing sequence of elements.


f 48%*inserting "rst, repositioning other elements afterwards.f 26%*no mention of making room for the inserted element.f 17%*making space by repositioning others, then inserting the element.f 8%*other.

4.9.7. Sorted insertion. The raters examined the methods used by the participants todetermine the correct place to insert an element into a sorted list.


f 43%*using an incorrect method, with missing or incorrect details.f 28%*a method that is correct for the current data, but not a correct general solution.f 18%*a correct general method that would work for any data.f 10%*other.


4.10. DISCUSSION

Combined discussion of the two studies appears in Section 6.

5. Study two

To see whether the observations from the "rst study would generalize to other domainsand other age groups, a second study was conducted. This study used database accessscenarios that are more typical of business programming tasks and was administered toa group of adults as well as a group of children similar to the participants in study one.

5.1. PARTICIPANTS

Nineteen adults from the Carnegie Mellon University community, ranging in age from18 to 34, participated in the study (10 men, 9 women). In addition, 22 "fth graders, ages10 or 11, participated (13 boys, 9 girls). These "fth graders were recruited from the samePittsburgh public elementary school as study one, but it was a new academic year sonone of the participants from study one were involved in study two. The participantswere racially diverse. Although the children spanned a range of academic abilities, all ofthe Carnegie Mellon participants had strong academic backgrounds.

Of the adults, only "ve had never programmed before (2 men, 3 women). Of thechildren, 14 said they had never programmed before (11 boys, 3 girls). There is reason tobelieve that some of the children who claimed to be programmers did not accuratelyanswer this question because they did not really know what programming is. Nonethe-less, only those participants who said they never programmed were included in theanalysis that follows.

The adult participants were recruited by word of mouth and signed the usual humansubject consent forms. The children were recruited by sending a brief note and consentform to parents. The adult participants received no reward for their participation; thechildren had an opportunity to leave their normal classroom for a half hour and weregiven a snack at the end of their participation.

5.2. MATERIALS

A set of 11 scenarios were created, representing a progression of problems that a pro-grammer might encounter in the process of creating and manipulating a database ofnames and numeric values. These scenarios were chosen to cover some of the essentialconcepts of programming that were not addressed in study one and to further elucidatesome of the results from that study. As in study one, graphical depictions of thesescenarios were developed. In this case they contained before and after pictures ofdatabase values in a tabular layout, with graphical annotations highlighting the di!er-ences between the before and after pictures, along with a minimal amount of text thatwas carefully chosen to avoid biasing the participants&& responses. The topics of thescenarios were: entering values into the correct rows of a table, adding certain values ineach row to produce a column of sums, discarding the smallest or largest value from eachrow when calculating the sum, assigning nominal values to each row depending ontextual attributes or numeric ranges, producing a numerically sorted summary table with

No. First name Last name Average score Performance

1 Sandra Bullock 30002 Bill Clinton 600003 Cindy Crawford 5004 Tom Cruise 50005 Bill Gates 60006 Whitney Houston 40007 Michael Jordan 200008 Jay Leno 500009 David Lettermen 700

10 Will Smith 9000

Question 5Af Describe in detail what the computer should do to obtain these results.

No. First name Last name Average score Performance

1 Sandra Bullock 3000 Fine2 Bill Clinton 60000 Extraordinary3 Cindy Crawford 500 Poor4 Tom Cruise 5000 Fine5 Bill Gates 6000 Fine6 Whitney Houston 4000 Fine7 Michael Jordan 20000 Extraordinary8 Jay Leno 50000 Extraordinary9 David Lettermen 700 Poor

10 Will Smith 9000 Poor

FIGURE 3. Depiction of a problem scenario in study two.

250 J. F. PANE E¹ A¸.

entries for only the rows with the highest sums, adding or subtracting a "xed value toevery value in a column, deleting rows from the table or adding rows to it, and zeroing allthe values in a column. Figure 3 shows one of the scenario depictions. The depictionswere displayed to the participants on paper and they wrote their solutions directly on theproblem pages.

5.3. PROCEDURE

The same procedure was used as in study one, except the sessions were not audiotaped.

5.4. CONTENT ANALYSIS

Once again a form was developed, similar to the one used in study one, so thatindependent raters could analyse the data. This rating form had 18 questions. Becausethe performance of the "ve analysts in the "rst study was satisfactory, there was generalagreement among them and the task was very tedious, the authors decided that threeanalysts were su$cient for the second study. The analysts from the "rst study werepermitted to return for this study because there was no reason to expect their prior


participation to have a material a!ect on the results. So three analysts from the priorstudy analysed the participants' responses in this study.

5.5. RESULTS

The participants answers typically consisted of one to "ve sentences in response to eachof the 11 questions. Once again, there was general agreement among the raters. Theperformance of adults was generally similar to the performance of children. So, theresults reported below are averages across the raters (n"3) and all of the non-program-mer participants (n"19, 5 adults and 14 children).

As in study one, the results for each question are summarized with an overallprevalence score followed by frequency scores for each category. The prevalence scoremeasures the average count of occurrences that each rater classi"ed for each participantwhen answering the current question. In study two, this score varies from 0.2 to 11.5,indicating the relative amount of data that was available to the raters in answering thequestion. The frequency scores then show how those occurrences were apportionedacross the various categories, expressed as percentages. The frequencies may not sum toexactly 100% due to rounding errors. The examples are quoted from the participants'solutions. Table 2 summarizes the results that follow, which are sorted into three generalcategories: the ways that certain keywords are used, the kinds of control structures thatare used and the methods used to e!ect various aspects of computation.

5.6. KEYWORDS

5.6.1. AND. The raters examined the intended meaning when the participants used theword AND.


f 47%*boolean conjunction.Example: Erase Bill Clinton and Jay ¸eno.

f 43%*sequencing, meaning next or afterward.Example: Crossed out the highest score and added the lower scores.

f 5%*other.f 4%*to specify a range.

Example: Fine is between 3000 and 20 000.

5.6.2. AND as a Boolean operator. The raters examined the answers to two questionsthat were likely to elicit Boolean expressions. If the word AND appeared in a Booleanexpression, the raters determined whether it was used correctly.


f 76%*incorrect, interpreting as Boolean conjunction would not give the intendedresult.Example: Everybody whose name starts with the letter G and ¸ would be in the blackgroup.

f 24%*correct, Boolean conjunction is intended meaning.Example: If AvgScore *1000 and (10 000, say Fine.

TABLE 2Summary of results from the second study. Items with frequencies below 5% do not appear

KeywordsAND OR B;¹

47% Boolean conjunction 100% Boolean disjunction 92% To mean &&except''43% Sequencing 8% Other5% Other NO¹

100% Low precedence ¹HENAND as a Boolean operator 91% Sequencing76% Incorrect 7% &&Consequently''24% Correct

Control structuresOperations on multiple Complex conditionals97% Sets and subsets, including plurals 45% Set of mutually exclusive conditions

36% Dependent clause cannot stand alone16% Nested conditions

ComputationSet construction Specifying open intervals Sorting46% Plurals 35% &&Above'' is exclusive 37% &&Alphabetical'', etc.18% &&Each'' or &&every'' 22% &&Above'' is inclusive 36% &&From A to Z'', etc.16% Naming a column of the table 22% Powers of ten 11% Concrete example14% &&All'' 15% Other 9% Provide a key to a sort operator

5% Mathematical notationSet manipulation Deleting an element from a data structure45% Set inverse Specifying closed intervals 73% No hole expected after deletion29% Set di!erence 35% &&From2to'' is inclusive 25% Repaired a hole after deletion22% Disjoint or mutually exclusive sets 19% Powers of ten5% Other 10% Mathematical notation Inserting an element into a data structure

9% Other 75% Insert without making spaceComplete speci,cation of ranges 9% &&Between'' used inconsistently 16% Make space then insert50% Correct 7% &&From2to'' used inconsistently 6% Insert then make space50% Incorrect 6% &&Between'' is inclusive

5% Ends of interval speci"ed separately Sorted insertion46% Incorrect method

Mathematical operations 34% Correct non-general method52% Natural language style * complete 13% Correct general method40% Other 6% Insert then sort

252J.

F.P

AN

EE¹

A¸

.


5.6.3. OR. The raters examined the places where the participants used the word OR asa Boolean operator to see if it was used correctly.


f 100%*correct.Example: [Score] in the hundreds or less is poor.

5.6.4. NOT. The raters examined the places where the participants used the word NO¹

as a Boolean operator to see what operator precedence was intended.Prevalence: 0.1 occurrences per participant.

f 100%*low precedence: NO¹ A or B means NO¹ (A or B).Example: ¹he Gold group [contains the people] with the ,rst two letters in their lastname that are not ¸e or Ga.

5.6.5. BUT. The raters examined the intended meaning when the participants used theword B;¹.


f 92%*to mean except.Example: Add every element in the row, but the maximum.

f 8%*other.f 0%*to mean and.

5.6.6. THEN. The raters examined the intended meaning when the participants used theword ¹HEN.


f 91%*sequencing, to mean next or afterward.Example: Add up all the scores in each row, then subtract the lowest score in each row.

f 7%*to mean consequently or in that case.Example: If their name begins with a G or an ¸ then put them in the Black group.

f 1%*besides or also.f 1%*other.

5.7. CONTROL STRUCTURES

5.7.1. Operations on multiple objects. The raters examined statements that operate onmultiple objects, where some or all of the objects are a!ected by the operation.


f 97%*set or subset speci"cations, including the use of plurals.Example: Select the four highest scores of the participants.

f 3%*loop or iteration.Example: Match the last name and ,ll the score until there is no more input.

f 1%*other.

5.7.2. Complex conditionals. The raters examined statements specifying conditions withmultiple options.

254 J. F. PANE E¹ A¸.


f 45%*a set of mutually exclusive conditions.Example: If average score is less than 1000, performance is poor. If average score isbetween 1000 and 10 000, performance is ,ne. If average score is more than 10 000,performance is extraordinary.

f 36%*a condition with a dependent clause that cannot stand alone.Example: If the people's last name start with G or ¸ they are on the black team. If notthey are on the gold team.

f 16%*nested conditions.Example: If average score is in the hundreds it1s poor. ¸ess than 10 000 is ,ne.

f 3%*other.

5.8. COMPUTATION

5.8.1. Set construction. The raters examined the places where sets are used, to determinehow those sets were constructed.


f 46%*using plurals.Example: Add the scores of 3 rounds.

f 18%*using the words each or every.Example: Add the score in every round.

f 16%*naming a column of the table.Example: Add 10 000 points to Round 1 and Round 3.

f 14%*using the word all.Example: Subtract [20 000 from] all elements in Round 22

f 4%*enumerating the members of the set.f 1%*other.

5.8.2. Set manipulation. The raters examined the ways that subsequent sets are createdafter an initial related set has been created.


f 45%*using set inverse, where the leftover items are operated on.Example: If the last name begins with G or ¸, they are in the Black group. ¹he rest are inthe Gold group.

f 29%*set di!erence, where some items are removed from the speci"ed set.Example: Add all the Rounds up except the highest score to get ¹O¹A¸.

f 22%*constructing disjoint or mutually exclusive sets.Example: Black is for G and ¸. Gold is for B, C, H, J and S.

f 5%*other.

5.8.3. Complete specixcation of ranges. The raters examined the participants' statementsthat specify a range of integers, to see whether all of the possibilities were covered withoutholes or overlaps.



f 50%*correct.Example: Scores below 1000 are poor. Scores from 1000 to 10 000 are ,ne. Any scoresabove 10 000 are extraordinary.

f 50%*incorrect.

5.8.4. Specifying open intervals. The raters examined the participants' statementsspecifying open intervals, where all values beyond a single boundary are speci"ed.


f 36%*words such as above, below, greater than or less than were intended to beexclusive.Example: ¹he performance of the person with the average scores below 1000 is con-sidered as poor (the participant then used good for 1000).

f 22%*words such as above, below, greater than or less than were intended to beinclusive.Example: Poor would be below 999 (the participant then used poor for 999).

f 22%*powers of 10 were used to specify the range.Example: If your score is in the hundred1s your performance is poor.

f 15%*other.f 5%*mathematical notation, with inequality operators such as &&''' or &&)''.

Example: if score (1000, performance"poor.

5.8.5. Specifying closed intervals. The raters examined the participants' statementsspecifying closed intervals, where both boundaries are speci"ed for a range of values.

1.2 occurrences per participant.

f 35%*from2 to, the symbol &&-'', or similar notations are intended to be inclusive.Example: ¹he performance of ones whose average scores from 1000 up to 10 000 isconsidered as a ,ne performance (the participant then assigned ,ne to both 1000 and10 000).

f 19%*powers of 10 were used to specify the range.Example: If your score is in the thousands, you are ,ne.

f 10%*mathematical notation, with inequality operators such as &&''' or &&)''.Example: 1000(x(9999; performance",ne.

f 9%*other.f 9%*between is used with an inconsistent meaning at each end of the interval.

Example: If the average score is between 1000 and 10 000, the performance is ,ne (theparticipant then assigned ,ne to 1000, and extraordinary to 10 000).

f 7%*from2 to, the symbol &&-'', or similar notations are used with an inconsistentmeaning at each end of the interval.Example: Scores from 1000 to 10 000 are xne (the participant then assigned ,ne to 1000,and extraordinary to 10 000).

f 6%*between is intended to be inclusive.Example: Score between 1000 and 10 000 is ,ne (the participant then assigned ,ne toboth 1000 and 10 000).

256 J. F. PANE E¹ A¸.

f 5%*speci"ed each end of the interval separately.f 0%*between is intended to be exclusive.f 0%2from2 to, the symbol &&-'' or similar notations are intended to be exclusive.

5.8.6. Mathematical operations. The raters examined the kinds of notations used by theparticipants' in specifying mathematical operations.


f 52%*natural language style, with no missing information.Example: Add 10 000 points to the scores in Rounds 1 and 3.

f 40%*other (which includes natural language style, with missing amount or variable).Example: Add up the scores of each person but don1t add the highest number (missingvariable).

f 4%*mathematical notation.Example: Column for r2"x!20 000.

f 4%*programming language notation.

5.8.7. Sorting. The raters examined the participants' solutions to see how sorting opera-tions were expressed.

Prealence: 1.3 occurrences per participant.

f 37%*using keywords such as alphabetical or numerical.Example: Sort the table alphabetically.

f 36%*using expressions like from A to Z or from the lowest to the highest.Example: Put the 4 highest scores2 in a di+erent table from the highest to smallest.

f 11%*using a concrete example from the current situation.Example: Put him in number 6 because his last name comes before Jordan but afterHouston.

f 9%*using a sort key, such as sort according to score.Example: Insert Elton John in order of the last name.

f 4%*using words like ascending or descending.Example: Sort &&total score'' column in descending order.

f 4%*other.

5.8.8. Deleting an element from a data structure. The raters examined the methods usedto delete an element from the middle of an existing sequence of elements, to see whetherthey expected a hole to be left behind.


f 73%*no hole was expected after the deletion.Example: ¹ake out Bill and Jay then put Elton John in.

f 25%*"xed a hole after the deletion.Example: Delete Row 2 and 8, moving everyone down to any unoccupied Rows.

f 2%*other.

5.8.9. Insertion into a data structure. The raters examined the methods used to insert anelement into the middle of an existing sequence of elements to see whether they expectedthat items would have to be arranged to make space for the new element.



f 75%*no mention of making room for the new element.Example: Put Elton John in the records in alphabetical order.

f 16%*make room for the element before inserting it.Example: ;se the cursor and push it down a little and then type Elton John in the freespace.

f 6%*make room for the element after inserting it.f 4%*other.

5.8.10. Sorted insertion. The raters examined the methods used to determine the correctplace to insert an element into a sorted sequence of elements.


f 46%*using an incorrect method, with missing or incorrect details.Example: Insert row between number 5 and 7 and name it Elton John.

f 34%*a method that is correct for the current data, but not a correct general solution.Example: Put him in number 6 because his last name comes before Jordan but afterHouston.

f 13%*a correct general method that would work for all data.Example: Insert Elton John into the table in alphabetical order of the last name.

f 6%*insert then sort.Example: Add Elton John, and then sort the table alphabetically.

f 2%*other.

6. Discussion of results

This section contains discussion of the combined results from the two studies. In additionto interpretation of the results, this section includes some recommendations on how theprogramming system might be made more natural.-

6.1. PROGRAMMING STYLE

The majority of the statements written by the participants were in a production-rule orevent-based style, beginning with words like if or when. However, the raters observeda signi"cant number of statements using other styles, such as constraints, other declar-ative statements (that were not constraints) and imperative statements.

The dominance of rule- or event-based statements suggests that a primarily imperativelanguage may not be the most natural choice. One characteristic of imperative languagesis explicit control over program #ow. Although imperative languages have if statements,they are evaluated only when the program #ow reaches them. The participants' solutionsseem to be more reactive, without attention to the global #ow of control. Whenimperative statements were used, it was usually for local #ow of control. The declarative

-These recommendations are not restricted to the programming language in isolation, but encompass theentire programming system, which includes the programming environment (editor, debugger, etc.), as well asthe language. In modern programming systems these components all work in tandem, so it is useful to considerhow the "ndings of this study might impact the entire system.

258 J. F. PANE E¹ A¸.

style seems to have been primarily used for setting up the scenario (data, characters,objects, etc.) of the program. Many of the constraints that were observed in this studywere graphical in nature, such as objects that had certain "xed positions relative to oneanother or limitations on where those objects could go. The event-based style is used byseveral popular end-user programming environments such as Visual Basic, Lingo forMacromedia's Director and HyperTalk for HyperCard, although these systems haveusability problems of their own (see, for example, Thimbleby et al., 1992).

This mix of styles suggests that designers might be able to improve usability by notlimiting the language to a single style. Di!erent styles seem to be more natural fordi!erent parts of the programming task.

6.2. OPERATIONS ON MULTIPLE OBJECTS

The results from both studies highlight an important area where today's popularprogramming languages di!er from the natural expressions used by the participants: theway that operations are performed on multiple objects. Most popular languages requireiterative operation on the objects, one at a time, while the participants strongly preferredto use set and subset expressions or plurals, to specify the operations in aggregate. Miller(1974, 1981) made similar observations in his studies.

It has been well established in the literature that loops are a hotspot of di$culty anderrors for novice programmers (du Boulay, 1989). And in many cases a loop is a morecomplicated and contorted way to specify operations that the participants were able toexpress easily and succinctly with aggregate set operations. New languages mightsupport these aggregate operations, thus eliminating many of the cases where loopswould otherwise be necessary.

Another requirement imposed by loops is the need to use extra variables to countiterations, #ag terminating conditions or hold the current object being operated upon.This is even true in &&high level'' looping constructs such as mapcar in Lisp. The aggregateoperations preferred by the participants reduce the need for these variables, which areanother known area of di$culty for beginners (du Boulay, 1989). Spreadsheets providea few aggregate operators, such as sum, but this feature is not generalized across all of theoperators.

However, the participants did use looping constructs in a few cases, and the languageshould support these as well. Often, these loops use until to specify a terminatingcondition, while other times the terminating condition is implicit in phrases such as andso on or etc. In deciding the exact loop control structures to provide, the languagedesigner should consider prior empirical studies which found that novices expect theterminating condition to be checked continuously, and the loop to halt the instant thecondition is satis"ed, rather than waiting until all of the subsequent statements inside theloop have been executed one last time (du Boulay, 1989).

6.3. SET CONSTRUCTION AND MANIPULATION

Study two illustrates a variety of ways that the participants construct sets: usingplurals, the keywords each, every or all or by naming columns in a table. Once they hadcreated a set, the participants often used operations such as inverse or di!erence to create


related sets. However, they sometimes preferred to create a separate disjoint set fromscratch.

6.4. COMPLEX CONDITIONALS AND NOT

The participants used a number of ways to avoid writing complex Boolean conditionals.For example, they often wrote a series of mutually exclusive simple rules instead ofa more complex conditional.

Also, they would sometimes express a general case followed by exceptions, as in:

if A do something unless B.

Notice that the equivalent Boolean expression that would be required to accomplishthis in many programming languages involves not only a conjunction, but also thenegation of the exception clause:

if A and not B do something.

The try2catch exception mechanisms in C##, Java, Lisp and other languagessupport this tendency by putting the general case "rst and listing the exceptions later, butother control structures in these languages do not. It might be useful to support the use ofunless clauses throughout the language.

The raters found very few uses of negation. This is consistent with earlier "ndings thatexpressing negative concepts is more di$cult than a$rmative ones (Wason, 1959).

When the participants did use the not operator they gave it low precedence, which iscontrary to the precedence that it has in most programming languages. A subsequentstudy found the use of not to be inconsistent: sometimes it was used with high precedenceand other times with low precedence; and using parentheses was not e!ective to clarifyprecedence (Pane & Myers, 2000). Operator precedence errors were among the high-frequency bugs observed by Spohrer and Soloway (1986) in novice programs in a tradi-tional programming language. However, in a recent study of a natural language styleprogramming language, Bruckman and Edwards (1999) found that operator precedenceerrors were very infrequent. Further study is warranted to determine how languagesshould deal with issues of precedence.

6.5. MATHEMATICAL OPERATIONS

In study one, all of the mathematical operations were expressed in a natural languageform; the raters found no mathematical or programming language notations. In studytwo, they found a very small amount of mathematical and programming notationsamong the adults' solutions. The vast preference for natural language mathematicaloperations should be supported by the programming language. However, more concisemathematical notation may be still necessary for calculations that are more complexthan the ones required by the tasks in these studies.

Many of the mathematical expressions were missing either the variable on which tooperate or the amount of the operation. This might be solved by providing slots thatmake the missing information more obvious, or by entering into a dialog with the user,with questions such as how much? or to what?

260 J. F. PANE E¹ A¸.

6.6. SPECIFICATIONS OF RANGES AND INTERVALS

In study two, the raters found that the participants were only about 50% success-ful in specifying ranges without holes or overlaps. Adults were more successful thanchildren, possibly because they had mathematical notations for inequality in theirarsenal. The children never made use of these mathematical notations. Instead they usedpowers of 10, or natural language expressions of inequality such as above or greaterthan. However, the participants were inconsistent about whether these latter termswere inclusive or exclusive. Adults achieved 100% accuracy when they used mathemat-ical notations, suggesting that these are a better choice for audiences who understandthem.

6.7. TACKING PROGRESS AND REMEMBERING STATE

The participants often avoided the use of variables to track progress in a task. This is notsurprising because, as mentioned above, variables are an area of di$culty for noviceprogrammers (du Boulay, 1989). Instead of variables, the participants preferred to useterms like all or none to detect when the task is "nished. When they needed to usehistorical information to make decisions about present actions (or present information tomake decisions about future actions), the participants usually did not use state variablesto record the information. Instead they used future and past tenses to refer to the neededinformation. State variables are the only way to accomplish this in most programminglanguages. The challenge for language designers is to "nd ways to accommodate themore natural preferences.

6.8. AND, OR AND BUT

The raters found that often the word and was used as a sequencing word rather than asa Boolean operator. Also, in study two the raters examined the Boolean uses of and, andfound that 75% were used in situations where the or operator would be required toachieve the desired e!ect in today's programming languages, as well as the querylanguages used for most database search engines. For example, a subject said, &&if youscore 90 and above'', but the score cannot simultaneously be 90 and greater than 90.Because the natural uses of and have such diverse meanings, and most of them areinconsistent with the Boolean operator, designers of future language should considersubstituting a di!erent name or symbol for this operator.

Or and but appeared too rarely in these studies to draw "rm conclusions withoutfurther research. When or was used, a Boolean interpretation would result in correctresults. The infrequent use of or may be because disjunctive expressions are cognitivelymore di$cult than conjunctive ones (Bourne, 1966).

6.9. THEN

The raters found that the most popular use of the word then is for sequencing orspecifying that an action should happen after "nishing a prior action. This is inconsistentwith its use in most programming languages, where it means consequently. This con"rmsan earlier observation by du Boulay (1989).


6.10. DATA STRUCTURE OPERATIONS: INSERTION, DELETION, SORTING

When the participants were inserting and deleting data elements, they often did notconsider issues about storage space that come up when working with the array datastructures in most popular programming languages. This suggests that a built-in list-likedata structure such as in the Lisp language, may be more natural.

The participants seemed to expect sorting to be a basic operator that they could utilizein their solutions, using expressions like alphabetical or from A to Z. When they wereasked to provide an algorithm for sorting, they were rarely able to do this in a correctgeneral way.

6.11. RANDOMNESS AND UNCERTAINTY

The raters did not "nd any uses of the word random in study one. Instead, theparticipants either expressed things with precision or used other ways of expressinguncertainty. Sometimes they tied the uncertain event to some other event that wouldhappen at some unknown time. Perhaps the system could supply the uncertainty that isimplicit in phrases like about 3 s.

6.12. OBJECT ORIENTED

Some aspects of object-oriented programming were apparent in the participants' solu-tions. Entities were treated as if they have state and an ability to respond to requests foraction. However, there was no evidence in these studies of other aspects of object-oriented programming such as inheritance or polymorphism. Cypher and Smith (1995)found in user studies that inheritance hierarchies cause di$culty for children. Evenamong professional programmers, researchers have found that full-#edged object-oriented programming is not necessarily natural (DeH tienne, 1990; Glass, 1995).

6.13. MOTION AND OTHER DOMAIN SPECIFIC NEEDS

The participants expected objects to move on their own, so their behaviors were similarto real-world objects. This is in contrast to the incremental way that animation isaccomplished in many systems. This may not come up on all programming tasks andthus might not be considered a language issue. But similar issues can arise in otherdomains and the usability of the programming system can bene"t from analysis of thespeci"c needs of the particular domains in which it will be used. One way may be toprovide domain-speci"c features in the programming language.

6.14. PICTURES

In study one, the experimenter counted how many participants used pictures or dia-grams in their solutions, and found that two-thirds of them did. All of these picturesappeared early in the solutions, when setup and layout were being de"ned. Programmingsystems should accommodate this form of graphical speci"cation in addition to textualspeci"cation.

262 J. F. PANE E¹ A¸.

7. Conclusion and future work

A large part of the programming task is to take a mental plan for solving a problem andtransform it into the particular programming language being used. These studies attemptto capture these plans before they undergo the transformation into a programminglanguage. Ideally, the distance between the plans and the programming language wouldbe minimal. However, these studies identify many places where an unnecessarily largegap is imposed by the features and requirements of today's programming languages.

Programming is a task of precision, and one reason that the programming languagesmay di!er from these natural language solutions is that programming languages aremore formal and facilitate the expression of solutions with more precision. Indeed, thereis a large amount of imprecision and underspeci"cation in the participants'work and it isimportant to "nd ways to help beginners to make their speci"cations more complete. Inmany cases, however, the structure and algorithms of the natural language solutions aresatisfactory, but are in a di!erent style than is allowed in today's programming lan-guages. Future systems should support these approaches, rather than requiring thesolutions to be transformed into di!erent, less natural forms.

The new programming language that the authors are designing for children has beenin#uenced signi"cantly by these studies. For example, it will support an event-based styleof programming as well as aggregate data access through set creation and manipulation.In order for this latter feature to be e!ective, it is necessary to improve the accuracy ofquery speci"cation; these studies show many serious problems with the boolean oper-ators and, or and not. A subsequent study proposes and tests several alternatives totextual boolean expressions, with promising results (Pane & Myers, 2000).

These studies, along with the results of other human-centered research about pro-gramming, are resources that can be used to guide and evaluate programming languagedesigns. In addition to the new language for children, the Natural Programming Project isusing this approach to design several new languages in other domains where it would beuseful for non-programmers to have the capabilities of programming. These includeauthoring of multimedia presentations, programs for inclusion on worldwide web pages,and scripts for automating repetitive tasks in direct manipulation interfaces. Hopefully,this approach will lead to languages that are easier to learn and use than existinglanguages. The results reported in this article should also be useful to the designers ofother new programming languages.

The authors wish to thank the following people for their contributions to this research: WayneGray, Albert Corbett, John Chang, Carol Beavers, John Meighan, the participants and analysts,and the anonymous reviewers of an earlier draft of this article.

References

BIERMANN, A. W., BALLARD, B. W. & SIGMON, A. H. (1983). An experimental study of naturallanguage programming. International Journal of Man-Machine Studies, 18, 71}87.

BONAR, J. G. & CUNNINGHAM, R. (1988). Bridge: tutoring the programming process. In J. PSOTKA,L. D. MASSEY & S. A. MUTTER, Eds. Intelligent ¹utoring Systems: ¸essons ¸earned,pp. 409}434. Hillsdale, NJ: Lawrence Erlbaum Associates.


BONAR, J. & SOLOWAY, E. (1989). Preprogramming knowledge: a major source of misconceptionsin novice programmers. In E. SOLOWAY & J. C. SPOHRER, Eds. Studying the Novice Pro-grammer, pp. 325}353. Hillsdale, NJ: Lawrence Erlbaum Associates.

BOURNE, L. E. (1966). Human Conceptual Behaviour. Boston: Allyn & Bacon.BRUCKMAN, A. & EDWARDS, E. (1999). Should we leverage natural-language knowledge? An

analysis of user errors in a natural-language-style programming language. Proceedings of theConference on Human Factors in Computing Systems, pp. 207}214. Pittsburgh, PA: ACMPress.

CYPHER, A. & SMITH, D. C. (1995). KidSim: end user programming of simulations. Proceedings ofCHI+95 Conference on Human Factors in Computing Systems, pp. 27}34. Denver: ACM.

DED TIENNE, F. (1990). Di$culties in designing with an object-oriented programming language:an empirical study. Proceedings of INTERACT +90 Conference on Computer-Human Factors,pp. 971}976. Cambridge, England.

DU BOULAY, B. (1989). Some di$culties of learning to program. In E. SOLOWAY & J. C. SPOHRER, Eds.Studying the Novice Programmer, pp. 283}299. Hillsdale, NJ: Lawrence Erlbaum Associates.

GALOTTI, K. M. & GANONG III, W. F. (1985). What non-programmers know about programming:natural language procedure speci"cation. International Journal of Man}Machine Studies, 22,1}10.

GLASS, R. L. (1955). OO claims* naturalness, seamlessness seem doubtful. Software Practitioner.GOODMAN, D. (1987). ¹he Complete HyperCard Handbook. New York: Bantam Books.GREEN, T. R. G. & PETRE, M. (1996). Usability analysis of visual programming environments:

a &cognitive dimensions' framework. Journal of <isual ¸anguages and Computing, 7, 131}174.GRICE, H. P. (1975). Logic and conversation. In P. COLE & J. MORGAN, Eds. Syntax and Semantics

III: Speech Acts. New York: Academic Press.HIX, D. & HARTSON, H. R. (1993). Developing;ser Interfaces: Ensuring;sability ¹hrough Product

and Process. New York, New York: John Wiley & Sons, Inc.HOC, J.-M. & NGUYEN-XUAN, A. (1990). Language semantics, mental models and analogy. In

J.-M. HOC, T. R. G. GREEN, R. SAMURhAY & D. J. GILMORE, Eds. Psychology of Program-ming, pp. 139}156. London: Academic Press.

HUTCHINS, E. L., HOLLAN, J. D. & NORMAN, D. A. (1986). Direct Manipulation Interfaces.Hillsdale, NJ: Lawrence Erlbaum Associates.

LEWIS, C. & OLSON, G. M. (1987). Can principles of cognition lower he barriers to programming?In G. M. OLSON, S. SHEPPARD & E. SOLOWAY, Eds. Empirical Studies of Programmers: Second=orkshop, pp. 248}263. Norwood, NJ: Ablex.

MILLER, L. A. (1974). Programming ny non-programmers. International Journal of Man}MachineStudies, 6, 237}260.

MILLER, L. A. (1981). Natural language programming: styles, strategies, and contrasts. IBMSystems Journal, 20, 184}215.

MYERS, B. A. (1998). Natural programming: project overview and proposal. Human}ComputerInteraction Institute Technical Report CMU-HCIL-98-100. Pittsburgh, PA: Carnegie MellonUniversity.

NARDI, B. A. (1993). A Small Matter of Programming: Perspectives on End ;ser Computing.Cambridge, MA: The MIT Press.

NEWELL, A. & CARD, S. K. (1985). The prospects for psychological science in human}computerinteraction. Human}Computer Interaction, 1, 209}242.

NIELSEN, J. (1993). ;sability Engineering. Chestnut Hill, MA: AP Professional.PANE, J. F. & MYERS, B. A. (1996). ;sability issues in the design of novice programming systems.

School of Computer Science Technical Report CMU-CS-96-132. Pittsburgh, PA: CarnegieMellon University.

PANE, J. F. & MYERS, B. A. (2000). Tabular and textual methods for selecting objects from a group.Proceedings of <¸ 2000: IEEE Symposium on <isual ¸anguages. Seattle, WA (to appear).

PANE, J. F., RATANAMAHATANA, C. A. & MYERS, B. A. (2000). Analysis of the language and structurein non-programmers1 solutions to programming problems. School of Computer Science Tech-nical Report. Pittsburgh, PA: Carnegie Mellon University.

264 J. F. PANE E¹ A¸.

PEA, R. (1986). Language-independent conceptual &&bugs'' in novice programming. Journal ofEducational Computing Research, 2.

SAMMET, J. E. (1981). The early history of COBOL. In R. WEXELBLAT, Ed. History of Programming¸anguages. New York: Academic Press.

SHNEIDERMAN, B. (1983). Direct manipulation: a step beyond programming languages. IEEEComputer, 16, 57}69.

SOLOWAY, E., BONAR, J. & EHRLICH, K. (1989). Cognitive strategies and looping constructs: anempirical study. In E. SOLOWAY & J. C. SPOHRER, Ed. Studying the Novice Programmer,pp. 191}207. Hillsdale, NJ: Lawrence Erlbaum Associates.

SPOHRER, J. G. & SOLOWAY, E. (1986). Analyzing the high frequency bugs in novice programs. InE. SOLOWAY & S. IYENGAR, Eds. Empirical Studies of Programmers, pp. 230}251. Washington,DC: Ablex Publishing Corporation.

THIMBLEBY, H., COCKBURN, A. & JONES, S. (1992). HyperCard: an object-oriented disappoint-ment. In P. GRAY & R. TOOK, Eds. Building Interactive Systems: Architectures and ¹ools,pp. 35}55. New York: Springer-Verlag.

WASON, P. C. (1959). The processing of positive and negative information. Quarterly Journal ofExperimental Psychology, 11.

Documents

Studying the language and structure in non-programmers ...web.engr.oregonstate.edu/~burnett/CS589and584/CS589-papers/Nat… · these "ndings with the requirements imposed by popular