[to be published in Theory and Practice in Functional-Cognitive Space, edited by María de los Ángeles Gómez González, Francisco José Ruiz de Mendoza Ibáñez and Francisco Gonzálvez-García (John Benjamins, 2014+)]

Cognitive functionalism in language education

Richard Hudson

University College London, United Kingdom

Abstract

Functional pressures on language are always cognitive, and cognitive pressures are always functional, so cognitivism and functionalism combine to explain the structure of lexicogrammar - the continuum of lexicon and grammar - and also the statistics of language usage. As an example, the paper shows how Word Grammar explains the difficulty of centre-embedding in terms of dependency syntax combined with a general cognitive principle of binding, and also the benefits of non-canonical word orders (such as extraposition) in the lexicogrammar. These reordering options are part of the formal academic language that children learn through education, and education should be guided by linguistic research. This is a research area that calls for far more effort and collaboration with other disciplines.

Keywords

Word Grammar, word order, education, syntax, children

1. Cognitive functionalism

The terms cognitive and functional are often combined, as in ‘functional-cognitive

space’ (Gonzálvez-García and Butler 2006), ‘usage-based functionalist-cognitive

models’ (Butler 2006) or ‘cognitive-functional linguistics’ (espoused by a number of

university departments). This is a healthy development, but it is important to remember

that each term names a distinct set of assumptions. In linguistics, cognitivism applies

the insights of cognitive science, including cognitive psychology, to the study of

language, on the assumption that language is subject to the same constraints and

principles as other areas of cognition. Functionalism, on the other hand, seeks functional

explanations for language in terms of general assumptions such as the principle of

contrast (minimize ambiguity). Cognitivism need not seek functional explanations, and

functionalism need not seek cognitive underpinnings. Nevertheless, it makes perfect

sense to combine them because (as I shall argue below) functional pressures on

language are always cognitive pressures, and the effects of cognition on language are

always functional. This dual perspective is one of the attractions for me of Chris

Butler’s work, along with his unflagging determination to listen, learn and understand

his colleagues.

Functional pressures must always be cognitive for three reasons: it is only through

cognition that they apply to language, it is only because language is an example of

cognition that they apply at all, and they cover the full range of cognitive processes as

applied to language. To show the significance of these three claims, imagine a

functional analysis which is completely divorced from cognition, such as a branch of the

mathematical theory of communication. This would analyse the elements of any

communication, such as a message, a medium, a sender, a receiver and a code, and the

properties that any code would have to have in order to allow efficient communication.

There would be nothing in the analysis about the code’s users, its history or its social

significance. The only questions would involve efficient communication: how to

measure it, and how to design a code so as to maximize it.

In contrast, as soon as we bring cognition into the discussion the questions

multiply. How easy is the code to learn? How does it change diachronically? What is its

social significance as an important badge of group membership? How does it balance

the needs of the speaker (e.g. for brevity) against those of the hearer (e.g. for

explicitness)? Butler puts the complexities well in the following passage (Butler

2006:1):

“If we are to study language as communication, then we will need to take into

account the properties both of human communicators and of the situations in

which linguistic communication occurs. Indeed, a further important claim of

functionalism is that language systems are not self-contained with respect to such

factors, and therefore autonomous from them, but rather are shaped by them and

so cannot be properly explained except by reference to them. Linguists who make

this claim ... undoubtedly form the largest and most influential group of functional

theorists. The main language-external motivating factors are of two kinds: the

biological endowment of human beings, including cognition and the functioning

of language processing mechanisms, and the sociocultural contexts in which

communication is deeply embedded. We might also expect that a functionalist

approach would pay serious attention to the interaction between these factors and

the ways in which languages change over time, although in practice this varies

considerably from one model to another.

The question of motivation for linguistic systems is, of course, not a simple one.

Much of the formalist criticism of functionalist positions has assumed a rather

naïve view of functional motivation, in which some linguistic phenomenon is

explicable in terms of a single factor. Functionalists, however, have never seen

things this way, but rather accept that there may be competing motivations,

pulling in different directions and often leading to compromise solutions.”

This complex and sophisticated view of the pressures that shape languages has

been expressed recently as ‘stable engineering solutions satisfying multiple design

constraints, reflecting both cultural-historical factors and the constraints of human

cognition.’ (Evans and Levinson 2009:1). For Evans and Levinson, the most significant

property of language is the enormous diversity, which they hope to explain in relation to

the multiple (and competing) design constraints. My only disagreement – a minor

quibble about terminology – concerns their contrast between ‘cultural-historical’ and

‘the constraints of human cognition’: cultural-historical facts are themselves ultimately

facts about human cognition. If the English word for ‘cat’ is CAT, this is only true

because English speakers know it, act upon it and transmit it to the next generation. This

is a very different kind of cognitive fact from the fact that working memory is limited,

but cognitive it is nevertheless. I should therefore like to reword the quotation: ‘stable

engineering solutions satisfying multiple cognitive design constraints, reflecting both

variable cultural-historical knowledge and the permanent and universal constraints of

human cognition.’ Similarly, Butler’s ‘sociocultural contexts’ are only relevant to the

extent that they are part of speakers’ cognition.

If it is true that functional pressures are always cognitive, it is equally true that

cognitive pressures are always functional, in the sense that they push language towards

a better solution for one of the many competing design constraints. This claim is hard to

test in the absence of a closed list of design constraints, so we might treat it as a premise

to guide us in the search for design constraints: whenever we find a fact about language

which seems to relate to cognition, we must find a design constraint to mediate between

language and cognition. To take an elementary example, why does English rank the

speaker above the addressee in the pronoun system, so that the presence of the speaker

in a group forces the choice of we regardless of who else is in it? Even more

interestingly, why do so many other languages do the same? True, some languages

distinguish inclusive and exclusive pronouns for ‘we’, but (so far as I know) no

language has a word for ‘you’ which may or may not include the speaker. Presumably

the explanation lies in cognition, but it must include a design constraint such as the

paramount importance of talking about oneself – a sad comment on human nature,

perhaps, but apparently true.

If language is subject to functional pressures, what effects do these pressures

have? If their effects are always cognitive, as I am suggesting, they must affect our

minds first and foremost, and it is only via our minds that they affect our behaviour; so

if I choose the word we rather than you to refer to a group including my addressee as

well as myself, this is because my mind contains a ‘lexicogrammar’ which assigns each

of these words a meaning which dictates this choice. (The term lexicogrammar is a very

useful term from Systemic Functional Grammar for the continuum of lexicon and

grammar which has more recently been rediscovered by cognitive linguists – Butler and

Taverniers 2008). The pressure shapes the lexicogrammar, which in turn affects our

behaviour. But is it only via the lexicogrammar that functional pressures can affect our

behaviour? The answer depends on how we define ‘lexicogrammar’, but there are some

functional pressures whose effects clearly fall outside any familiar definition.

For example, if you and I are talking, we are more likely to understand each other

if only one of us is talking at a time, for the simple reason that listening and talking

compete for the same mental resources of attention. As with any pressure, this comes

with a cost – a competing pressure that has to be balanced against it. If you are talking,

and I have something to say, not only do I have to wait, but I also may have to take my

place in a queue along with others who also have something to say. Consequently

different communities develop different behavioural norms, ranging from complete

anarchy to the rigid rules of committee meetings; and these norms affect our speaking

behaviour in a striking way (Hudson 1996:133). But they cannot be part of the language

system if this simply controls the ways in which words are combined, pronounced and

interpreted. On the other hand, the rules for speaking or staying silent are equally clearly

related to the language system, because they govern its use – when to use language and

when not.

Some functional pressures clearly do affect the content of the language system,

and others clearly don’t. But in between these two extremes, we find ‘weak’ pressures,

where some kind of language behaviour is not actually dictated by the system, but is

nevertheless typical throughout the community. An example that comes to mind is the

use of directional expressions in English. If my wife is downstairs and asks me to join

her, I believe I would say I’ll come down in a minute rather than simply I’ll come in a

minute, even though the down is completely optional, and, in the situation concerned,

completely uninformative. And I believe the same is true of any English speaker

describing almost any movement or position which could be related to the deictic ‘here’.

So in all the following examples, the bracketed expression is grammatically optional

and situationally predictable, but nevertheless expected:

(1) I went (over) to Ben’s place the other day.

(2) It’s (up) in the spare bedroom.

(3) I’m driving (down) to Cardiff tomorrow.

I have no research evidence to support this claim, but my hunch is that the

bracketed words are much more likely to be uttered than omitted. What is supported by

research is the idea that our learning of language is ‘usage-based’ (Barlow and Kemmer

2000, Bybee 2010, Hudson 2007b, Tomasello 2003), which means that we maintain a

mental record of the statistical patterns in other people’s behaviour; so a statistical

tendency in other people’s behaviour may become part of my own behaviour (with the

obvious feed-back effects on other speakers).

But why should English speakers show this particular pattern? It might be just an

arbitrary pattern which we reinforce in each other, like the pronunciation patterns which

are so well documented in quantitative sociolinguistics (Hudson 1996: chapter 5). But

much more likely is that we have created our own local ‘functional pressure’ to specify

deictic locations and directions, regardless of the hearer’s needs. If so, this would be an

example of a functional pressure being created by collective linguistic behaviour, and

then being learned and applied by every novice speaker. It would be reflected in the

lexicogrammar by the particles which are tailor-made for this precise purpose, but their

use is not governed by categorical rules. How, then, do we decide whether or not to use

them?

This question is very similar to the one that arises in quantitative dialectology. For

example, given that we all have a choice between a velar and an alveolar nasal in the

suffix -ing (as in walking or walkin’), how do we choose between them? Labov and his

colleagues and followers have shown very clearly that each speaker’s choices reflect

rather precisely the choice-patterns of the speakers who have served as their models, but

there is no agreed cognitive model for the mechanism of choosing. What I have

suggested elsewhere is that a model should take the form of a cognitive network with

dynamic activation levels which trigger choices (Hudson 2007a). Once a model is in

place, it could be extended to non-categorical functional pressures such as the one

discussed above. This is a major research challenge because it isn’t at all obvious how

to build the network needed, but the project would certainly reveal a lot about the

cognitive architecture behind human language.

The general challenge that linguistic theory faces is to relate functions to

structures: how to build a model of language structure which takes account of functional

pressures. The current proliferation of theories, including theories whose names contain

the word functional, testifies to the difficulty of this project. One basic question is

whether the functions might be so closely integrated into the system that they become

part of it. Some theories do merge functions and structures in this way, but in my

opinion it is a mistake; I shall consider two very different theories: Optimality Theory

and Systemic Functional Grammar.

Optimality Theory is the extreme case because each functional pressure is

represented directly as either a faithfulness constraint or a markedness constraint within

the system (Newmeyer 2010); for instance, the process that inserts an epenthetic vowel

in horses is triggered by the difficulty of pronouncing two adjacent sibilants. The

trouble with building pressures into the system in this way is that it turns the pressures

into concepts, so they only apply to the extent that speakers have the relevant concepts;

but the fact is that adjacent sibilants (for instance) are hard to pronounce whether or not

we ‘know’ this conceptually.

Systemic Functional Grammar keeps the functional pressures outside the system,

but analyses the structure so that it reflects the functions closely. Both the paradigmatic

system-networks and the syntagmatic structures of syntax are organised into a small

number of ‘metafunctions’ – ideational, interpersonal and textual – each of which is

responsible for a different set of functional pressures. This means that a clause has three

different syntactic structures: an ideational structure for the basic referential meaning,

an interpersonal structure showing how the speaker and addressee relate to this

meaning, and a textual structure showing how it relates to what has been said already

(Butler 1985, Halliday 1994). My objection in this case is that the analysis

misrepresents the relation between functions and structures by concealing the tensions

and conflicts. In my opinion, it would be much nearer to the truth to say that we try to

use a single structure to perform a number of very different jobs at the same time, so

there is no sense in which a single clause can dedicate one entire structure to each job.

For example, the clause Does she love me? uses she love me to describe a situation

(ideational), uses me and does she to relate it to the speaker and the hearer

(interpersonal), and she to relate it to the previous discourse; but these words are all

closely integrated in a single structure, where the redundant does is the price we pay for

this particular ‘engineering solution’ to the problem of satisfying these conflicting

pressures.

But even if some attempts to relate structures to functions have been unsuccessful,

we can all celebrate the twentieth century’s strong movement towards functionalism.

Whatever we may think of specific theories, they are all trying to go beyond the mere

analysis and description of language structures by looking for explanations. More

recently, we have a separate movement towards cognitive analyses of language

structures which explain how these structures relate to the rest of cognition. If we can

marry the two strands, functional and conceptual, into a single cognitive-functional

linguistics, then we have some hope of really understanding how language works.

2. Syntactic structure: Word order and dependency geometry

One area of language structure which has generated some particularly promising

functional explanations is word order. Why are some basic orders so much more

common than others? And why do languages provide so many alternative orders?

Cognitive explanations have always been prominent in the sense that terms such as

‘given’ and ‘new’ have been used to capture some kind of mental reality, but it is only

recently that these analyses have been able to build on work in cognitive science. One

especially promising link relates word order to limitations on working memory; perhaps

the best-known exponent of this link is Hawkins, who argues that basic word orders

evolve so as to minimize demands on working memory (Hawkins 1994, Hawkins 1999,

Hawkins 2001). I find his evidence and arguments compelling, and agree with his

general conclusions.

However, any discussion of the effects of functional pressures on syntactic

structure presupposes some general theory of syntactic structure, and I believe

Hawkins’s case would be even stronger under a different set of assumptions. For him,

syntactic structure is phrase structure, so words are related to each other only via shared

‘mother’ nodes; so even if two words are adjacent, there is no direct syntactic relation

between them. This analysis is not a helpful basis for explaining why syntax favours

adjacency; nor is it promising as a basis for a cognitive theory of syntax because it

raises the obvious question: why can’t we link words directly to one another, using the

same mental apparatus that we use in relating events or objects in other areas of life?

For example, if we can represent the members of our family as individuals with direct

relations to other individuals, why can’t we do the same with the words in a sentence?

A much better basis for syntax, in my opinion, is dependency structure, in which

the relations between individual words are paramount. Like phrase structure,

dependency structure has many different interpretations in different theoretical

packages, so I shall select the package that I prefer, which (unsurprisingly) is the one I

created: Word Grammar (Hudson 2007b, Hudson 2010, Butler 2013). Figure 1 shows

the syntactic structure for a very simple example, Cows eat grass, in a typical, but

simplified, phrase-structure representation compared with a Word-Grammar

dependency structure. In these diagrams we are only concerned with the basic geometry

of the diagram, so labels are unnecessary; but in a complete analysis the labels (or more

accurately, the classification that they imply) are essential. The main point of the

diagram for present purposes is that the phrase-structure analysis puts two links between

eat and grass and three between eat and cows, whereas the dependency analysis has a

single link in both cases.

Figure 1: Phrase structure and dependency structure

Seen from the perspective of cognitive science these two structures allow very

different predictions. Most pertinently, the first structure predicts that the order of verb

and object is irrelevant to processing difficulty, because the geometry would be exactly

the same for Cows grass eat as for Cows eat grass. In contrast, the second structure

predicts the opposite, as can be seen in Figure 2. For the dependency analysis, Cows

grass eat ought to be harder to process because working memory has to hold the cows –

eat dependency for longer than in the first figure.

Figure 2: A contrast between phrase and dependency structures
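The contrast can be checked with a small sketch. It assumes that each sentence is coded as (dependent, head) pairs of word positions, and it uses two illustrative counts that are not taken from any Word Grammar software: the number of words standing between a dependent and its head, and the number of dependencies that are open at each gap between adjacent words. The function names are hypothetical.

```python
from typing import List, Tuple

Dependency = Tuple[int, int]  # (dependent position, head position), 0-based


def words_between(deps: List[Dependency]) -> List[int]:
    """Words standing between each dependent and its head (0 if adjacent)."""
    return [abs(dep - head) - 1 for dep, head in deps]


def open_dependencies(n_words: int, deps: List[Dependency]) -> List[int]:
    """Dependencies spanning each gap between word i and word i + 1."""
    return [
        sum(1 for dep, head in deps if min(dep, head) <= gap < max(dep, head))
        for gap in range(n_words - 1)
    ]


# "Cows eat grass": both nouns depend on the adjacent verb (position 1).
svo = [(0, 1), (2, 1)]
# "Cows grass eat": the same two dependencies, but the verb comes last.
sov = [(0, 2), (1, 2)]

print(words_between(svo), open_dependencies(3, svo))  # [0, 0] [1, 1]
print(words_between(sov), open_dependencies(3, sov))  # [1, 0] [1, 2]
```

On this coding the verb-final order keeps the dependency between cows and eat open across one extra word, so two dependencies are open at once just before eat.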

In such simple examples, and for adult speakers, the differences are trivial; but

child-language research has shown that such differences do matter for novices with

small working memories. For example, small children use adjective-noun combinations

more frequently before the verb (e.g. big book fall) than after it (e.g. see big book). This

is easy to explain in terms of functional pressures from working memory, because the

separating word adds its mental demands to the existing dependency, so the processing

demands of Cows grass eat are less evenly distributed than those of Cows eat grass. In

contrast, determiner-noun combinations show the reverse pattern, being more common

after the verb (e.g. see that book) than before it (e.g. that book fall) (Ninio 1994). Once

again, this pattern is easy to explain if nouns depend on determiners (as they do in Word

Grammar - Hudson 1990:268-76). Figure 3 summarises the patterns, showing how

frequency follows predicted difficulty due to dependency patterns.

Figure 3: Dependency density in child language

One of the attractions of dependency analysis is the possibility of measuring the

relative processing difficulty of different structures. Various measures are available.

One, which I have called ‘dependency distance’, consists of a simple count of the

number of words that separate a word from the word on which it depends (Hudson

2007b:124-9) and is very similar to the distance metric developed by Ted Gibson for

experimental work which clearly confirms the importance of dependency distance

(Gibson 2002) in adult language. However, I now believe that a more appropriate

measure for some patterns would be ‘dependency density’, the number of dependencies

being held in memory at any given moment. This measure is most easily illustrated with

so-called ‘centre-embedded’ structures.

These sentences are so hard to process syntactically that ordinary adult

experimental subjects simply give up on syntax. Consider, for example, sentence (4).

(4) The patient who the nurse who the clinic had hired met Jack.

When presented with a list of sentences to be judged as either grammatical or

ungrammatical, many people accept this sentence (and others like it), although it

actually doesn’t make sense either syntactically or semantically. The problem can be

seen in the simplified Word-Grammar analysis in Figure 4, which shows how the first

who introduces a relative clause which should have the nurse as its subject, but which

isn’t there.

Figure 4: An incomplete centre-embedded sentence

The main point of this example is to show how easily our working memory can

run out of resources, and how well this can be predicted by the extreme dependency

density at the point indicated by the dotted line. To see this, imagine yourself reading

this sentence a word at a time, and in slow motion. Figure 5 shows the state of play in

your mind just after reading the word clinic. (The dependency between the and clinic

can be ignored because it is so easily and quickly completed.)

Figure 5: Part of the way through an incomplete sentence

At this point in time, you have to hold five incomplete dependencies in your

mind, each looking for a word (labelled in the diagram) which you haven’t yet read:

The top dependency is looking for verb a, a non-dependent finite verb, for the patient to depend on.

The next dependency down was set up by the first who, and is looking for a dependent finite verb b.

Word c is needed for the nurse to depend on.

Similarly, words d and e are needed by the second who and the clinic.

The problem for you, as the reader, is that you’re looking for five finite verbs,

each of which is represented in your working memory simply as ‘some finite verb’.

Why this should be a problem isn’t completely clear, but the following explanation

strikes me as plausible.

One of the most important activities in your mental life is to recognise that two

concepts which are represented separately are in fact the same – that the person on the

phone is your friend, or that the next street on the right is the one you’re looking for. To

achieve this, you merge concepts (by binding them to one another) when they are the

same, so whenever you have two concepts with similar specifications (e.g. ‘finite verb’)

and similar activity level at the same time, you tend to merge them unless you have

reasons for keeping them separate (Hudson 2010:91-102) – that is, merging is the

default which can only be prevented by extra mental effort. In the case of this sentence,

you can safely merge verbs b and c as bc because it’s almost certain that the nurse will

turn out to be the subject of the verb expected by who; and similarly for d and e. The

trouble is that by this time your working memory is having to hold a lot of information

(five remembered words plus five dependencies plus from three to five anticipated

words) and hasn’t got the resources needed to keep these very similar nodes separate, so

it simply merges bc with de into a single dependent finite verb bcde, and once had hired

appears, the finite verb had is accepted as the merged bcde, even though this means that

it has to double as the expected complement of two who’s at the same time.
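The bookkeeping behind this account can be laid out in a few lines. The sketch below is not a cognitive model or a parser: it simply treats the five anticipated verbs a to e as sets of labels, applies the two safe merges and then the overloaded merge described above, and compares the number of anticipated finite verbs with the number of finite verbs that a sentence actually supplies. The merge helper and the use of a bare count as the acceptance test are assumptions made purely for illustration.

```python
# Anticipated finite verbs after "The patient who the nurse who the clinic",
# labelled a-e as in the text; each value names the word that needs the verb.
pending = {
    "a": "The patient",
    "b": "who (first)",
    "c": "the nurse",
    "d": "who (second)",
    "e": "the clinic",
}


def merge(groups, x, y):
    """Bind the groups containing labels x and y into one anticipated verb."""
    gx = next(g for g in groups if x in g)
    gy = next(g for g in groups if y in g)
    if gx is gy:
        return groups
    return [g for g in groups if g is not gx and g is not gy] + [gx | gy]


separate = [frozenset(label) for label in pending]   # a, b, c, d, e
safe = merge(merge(separate, "b", "c"), "d", "e")    # a, bc, de
overloaded = merge(safe, "b", "d")                   # a, bcde

verbs_in_sentence_4 = ["had", "met"]                 # finite verbs in (4)
verbs_in_grammatical_completion = ["hired", "treated", "died"]

print(len(separate), len(safe), len(overloaded))                  # 5 3 2
print(len(verbs_in_sentence_4) == len(overloaded))                # True
print(len(verbs_in_grammatical_completion) == len(safe))          # True
```

With only the safe merges, three finite verbs are expected, which is what a grammatical completion supplies. Once bc and de are also merged, two are enough, which is all that (4) offers.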

Of course, the sentence fragment in Figure 5 could have been completed

grammatically, as in Figure 6. This is grammatical because each of the two who’s has a

separate verb to head its relative clause; but it is virtually impossible to process.

Figure 6: A complete centre-embedded sentence

This example explains the difficulty of the famous ‘centre-embedding’ or ‘self-

embedding’ pattern, but of course such sentences are vanishingly rare in actual

performance because we all know how hard they are to process, whether as speaker or

as hearer. Fortunately, English offers an alternative way to express the same ideas. If we

want to attach a relative clause to a noun, we have the option of ‘extraposing’ it by

pretending that it actually depends on the next verb up. For example, in (5), the relative

clause that I bought last week is attached directly to goldfish, but in (6), which means

the same, it is extraposed so that it takes its position as a dependent of died.

(5) The goldfish that I bought last week has died.

(6) The goldfish has died that I bought last week.

This extraposition can be thought of as a mental operation that converts the basic

default structure (5) into one that is easier to process; but it is different from a classic

Chomskyan transformation because it can short-circuit the planning process so that it

applies while the planned words are still only partly specified. The point is that if we see

a complicated structure developing in our minds, we have ways to avoid it such as the

use of extraposition – an ‘engineering solution’ to the problem of syntactic complexity.

As can be seen from the partial structure in Figure 7, the extraposed version

delays the dependency between goldfish and that so that it doesn’t have to be processed

at the same time as the subject link from has to the goldfish.

Figure 7: A simple example of extraposition

Applying extraposition to the unprocessable (7) produces (8), which is easy to

understand.

(7) The patient who the nurse who the clinic hired treated died.

(8) The patient died who the nurse treated who the clinic hired.

Moreover, extraposition reveals the ungrammaticality of the first version of the

sentence, (4), repeated below as (9).

(9) *The patient who the nurse who the clinic had hired met Jack.

(10) *The patient met Jack who the nurse who the clinic had hired.

Perhaps the most interesting characteristic of extraposed sentences is that,

although they are much easier to process than their unextraposed equivalents, they are

structurally more complex because of the extra dependency between the extraposed

word and the higher verb (in Figure 7, the dependency between has and that). This extra

dependency coexists with all the dependencies found in the unextraposed sentence

(which, for simplicity, I omitted from Figure 7). The general conclusion is that ease of

processing is not a general matter of ‘complexity’, but of the distribution of processing

load: evenly distributed processing load is easy, but a high concentration of load in one

area is much harder.
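A rough count makes the point about distribution concrete. The sketch assumes a simplified coding of (7) and (8) in line with the analysis above: nouns depend on their determiners, the determiner of a subject depends on its verb, each who depends on the noun it modifies and takes the finite verb of its clause as a dependent, and an extraposed who has one extra dependency on the verb it follows. Density is approximated as the number of dependencies spanning each gap between adjacent words, which is an assumption of this sketch and not a definition from the text.

```python
from typing import List, Tuple

Dependency = Tuple[int, int]  # (dependent position, head position), 0-based


def density_profile(n_words: int, deps: List[Dependency]) -> List[int]:
    """Dependencies spanning each gap between word i and word i + 1."""
    return [
        sum(1 for dep, head in deps if min(dep, head) <= gap < max(dep, head))
        for gap in range(n_words - 1)
    ]


# Comments below read "dependent -> head".
# (7) The patient who the nurse who the clinic hired treated died.
#     0   1       2   3   4     5   6   7      8     9       10
centre_embedded = [
    (1, 0), (0, 10),   # patient -> The, The -> died
    (2, 1), (9, 2),    # who -> patient, treated -> who
    (4, 3), (3, 9),    # nurse -> the, the -> treated
    (5, 4), (8, 5),    # who -> nurse, hired -> who
    (7, 6), (6, 8),    # clinic -> the, the -> hired
]

# (8) The patient died who the nurse treated who the clinic hired.
#     0   1       2    3   4   5     6       7   8   9      10
extraposed = [
    (1, 0), (0, 2),    # patient -> The, The -> died
    (3, 1), (3, 2),    # who -> patient, plus the extra who -> died
    (6, 3),            # treated -> who
    (5, 4), (4, 6),    # nurse -> the, the -> treated
    (7, 5), (7, 6),    # who -> nurse, plus the extra who -> treated
    (10, 7),           # hired -> who
    (9, 8), (8, 10),   # clinic -> the, the -> hired
]

for name, deps in [("(7)", centre_embedded), ("(8)", extraposed)]:
    profile = density_profile(11, deps)
    print(name, len(deps), "dependencies, profile", profile, "peak", max(profile))
```

On this coding (8) has twelve dependencies against ten for (7), yet its peak is three, whereas (7) reaches six just before clinic and still has five open just after it.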

Extraposition of a relative clause is not the only way to redistribute processing

load. In fact, English has a rich supply of grammatical solutions, each geared to a

different set of problematic sentences. The list below hints at the main ways in which

we can tweak a sentence’s syntax to suit our particular communicative purposes and

processing needs. The little formulae are meant as a guide to the particular examples

rather than as a correct generalisation of the process concerned.

(11) It-extraposition:

From: That you were able to help her so easily | is good. [1 | 2]

To: It | is good | that you were able to help her so easily. [it | 2 | 1]

(12) Heavy-NP shift:

From: Put | all the food that we’re going to need for the party and that we can’t freeze | on this shelf. [1 | 2 | 3]

To: Put | on this shelf | all the food that we’re going to need for the party and that we can’t freeze. [1 | 3 | 2]

(13) Dative shift:

From: Let’s give | something to remind her of all the good times she had with us | to Mary. [1 | 2 | to 3]

To: Let’s give | Mary | something to remind her of all the good times she had with us. [1 | 3 | 2]

(14) Subject delay:

From: A wonderful old oak tree with a tree-house in its branches | stands | in the corner. [1 | 2 | 3]

To: In the corner | stands | a wonderful old oak tree with a tree-house in its branches. [3 | 2 | 1]

(15) There-insertion:

From: A dog | is | in the garden. [1 | 2 | 3]

To: There | is | a dog | in the garden. [there | 2 | 1 | 3]

(16) Front-shifting:

From: I bumped into someone I met at a party given by our neighbours | last night. [1 | 2]

To: Last night | I bumped into someone I met at a party given by our neighbours. [2 | 1]

(17) It-clefting:

From: I bumped into someone I met at a party given by our neighbours | last night. [1 | 2]

To: It was last night that I bumped into someone I met at a party given by our neighbours. [it was 2 | that 1]

(18) Wh-clefting:

From: I bumped into someone I met at a party given by our neighbours | last night. [1 | 2]

To: Last night was when I bumped into someone I met at a party given by our neighbours. [2 | was when | 1]

(19) Passivization:

From: All the books that I’ve read by him | have | impressed | me. [1 | 2 | 3 | 4]

To: I | have | been impressed | by all the books that I’ve read by him. [4 | 2 | been 3 | by 1]
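The bracketed formulae in this list can be read as simple reordering recipes, as the following sketch shows for (11) and (15). It assumes that a sentence has already been cut into its numbered segments by hand; the render function and the way formulae are written as Python lists are hypothetical conveniences, and, like the formulae themselves, they describe the particular examples, not the grammatical process.

```python
from typing import Dict, List, Union

# Integers pick out numbered segments; strings are words the pattern adds.
Formula = List[Union[int, str]]


def render(segments: Dict[int, str], formula: Formula) -> str:
    """Spell out a formula such as [1, 2] or ["it", 2, 1] as a sentence."""
    words = " ".join(
        segments[item] if isinstance(item, int) else item for item in formula
    )
    return words[0].upper() + words[1:] + "."


# (11) It-extraposition: [1 | 2] becomes [it | 2 | 1]
clause = {1: "that you were able to help her so easily", 2: "is good"}
print(render(clause, [1, 2]))
print(render(clause, ["it", 2, 1]))

# (15) There-insertion: [1 | 2 | 3] becomes [there | 2 | 1 | 3]
dog = {1: "a dog", 2: "is", 3: "in the garden"}
print(render(dog, [1, 2, 3]))
print(render(dog, ["there", 2, 1, 3]))
```

The segments stay the same in each pair; only their order and the small added words change.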

Each of these patterns is firmly embedded in the grammar of English, with its own

rules and its own effects; and in each case it makes good sense to see it as an

‘engineering solution’ to some kind of functional demand on the speaker or hearer – in

other words, as an important tool that any mature user of English can apply effectively.

Which brings us to the language education which is needed in order to turn us all into

‘mature users’.

3. Language education

Education is, by definition, an interference with a child’s ‘natural’ development, an

attempt to direct that development in particular ways chosen by the adult world. For

some people, the notion of ‘language education’ is a contradiction because language

develops naturally under its own logic, so all it needs is raw data to trigger the built-in

grammatical system and to provide a vocabulary. In this view, second-language

teaching should just ‘expose’ children to comprehensible input (Krashen 1982), and

much the same philosophy dominated first-language English teaching for some decades

(Kolln and Hancock 2005). However, there is now a well-articulated and influential

body of opinion which sees language development in a very different way, with

education playing a major role (Hudson 2004).

The cognitive functionalism with which we started implies that each language

evolves to support the tasks that its users have to perform, so we expect, and find, as

much diversity among languages as among language users. This amounts to a rejection

of the romantic notion of ‘natural language’, which is language ‘as nature intended’,

unspoilt by human institutions such as schools (Chomsky 1987, Chomsky 2011, Olson

and others 1991). There is very little ‘natural’ about the language that you and I know,

and that allows me to write these words, and you to read them. You and I both spent

years of our childhood not only learning the skills of reading and writing, but also

‘academic language’ – the language of school, of universities and of a great deal of

adult life. This academic language has been shaped over the centuries by the need to

talk about mathematics, geography and literature, and by the need to argue, hypothesise,

reason and explain. It has also been standardized, but this is a relatively minor element

in its history compared with the enormous developments triggered by complex

communicative demands. The fact is that two generations of theoretical linguists have

used modern English as an example of a ‘natural language’ without worrying about the

many ways in which we interfere with our language, or even noticing them. If diversity

and cultural adaptation are normal for languages, then a ‘natural language’ is simply one

that is (more or less) adapted to its culture; and from this point of view, modern English,

with all its richness and complexity, is just as natural as a very simple language such as

Pirahã, which has evolved to fit a very simple culture (Everett 2008).

Let’s assume, therefore, that a complex society such as ours needs a complex

language, and that complex language requires education so that children can move

beyond childish and casual language development. What kinds of language do children

need to be taught in school? Part of the answer is obvious and uncontroversial: they

need to be taught ‘relevant’ language which they won’t learn outside school. What

makes language experience relevant is, of course, a social and even a political decision,

according to what ‘society’ deems necessary for adult functioning. Our society

generally agrees that school leavers should be able to cope with more formal and

academic styles, both in spoken and written modes, and though these notions are

inherently vague, there is enough agreement for examination boards to design public

tests of competence in these areas.

But what does this mean, in concrete terms, for first-language teaching? What

does ‘formal academic language’ contain that children won’t learn anyway from

ordinary linguistic interaction outside the school? This is a research question for

linguistics, but the research that it defines is remarkable for its paucity. There has been a

great deal of work by psychologists on the global statistics of vocabulary growth; for

example, one estimate (Bloom and Markson 1998) suggests that, starting at 30 months,

we typically learned 3.6 new words per day in our pre-school years, rising to 6.6 words

up to 8 years (the age when we typically become independent readers), then rising to

12.1 words. This particular estimate stops at age 10, but another research report (Nagy

and Herman 1987) estimates that the typical school-leaver (year 11) knows about

40,000 words, which implies a rate of about 3,000 words per year, or just under 10 per

day – roughly the same figure as for primary children. However, very few linguists

seem to have done research in this area (Hatch and Brown 1995).
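
The per-day figures quoted above can be checked with a little arithmetic. The sketch below assumes that 'per day' means per calendar day, with 365 days in a year; this is an assumption made here, not something stated in the text, and the variable names are purely illustrative.

```python
DAYS_PER_YEAR = 365  # assumption: "per day" means per calendar day

# Rate implied by the Nagy and Herman (1987) estimate, as quoted above.
words_per_year = 3000
words_per_day = words_per_year / DAYS_PER_YEAR

# Daily rates from the Bloom and Markson (1998) estimate, as quoted above.
daily_rates = {"pre-school": 3.6, "up to 8 years": 6.6, "8 to 10 years": 12.1}
yearly_equivalents = {
    stage: rate * DAYS_PER_YEAR for stage, rate in daily_rates.items()
}

print(f"3,000 words per year is about {words_per_day:.1f} words per day")
for stage, per_year in yearly_equivalents.items():
    print(f"{stage}: about {per_year:.0f} words per year")
```

On this assumption, 3,000 words per year comes to a little over 8 words per day, which lies between the two primary-age rates of 6.6 and 12.1.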

Even more striking is the lack of research on grammatical development during the

school years – what develops, and how it can be encouraged. The most important source

of information on what develops is still the work on syntax done in the 1980s by

Katharine Perera (Perera 1984, Perera 1990, Perera 1994), with a few rather minor

recent additions such as my own (Hudson 2009). One of the conclusions that emerges

very clearly from this research is that children’s grammatical repertoire – the range of

constructions that they know with sufficient confidence to actually use in their own

writing – is still growing right through the school years. For example, they are learning

new conjunctions and prepositions (such as although, unless and in spite of), new ways

of using non-finite verbs (such as after ‘when’ or on their own as adverbial clauses), and

new details that are on the borderline between grammar and vocabulary (such as the

prepositions selected by particular words, e.g. tired of but bored with). As to how it can

be encouraged, we now have solid research evidence that sensibly planned grammatical

instruction can have a considerable effect on children’s writing and reading skills

(Hancock 2009, Myhill and others 2010, Myhill 2011, Chipere 2003), so the way

forward is clear: English teachers can help children to develop grammatically by

judicious use of direct instruction.

I should like to finish by returning to the list of grammatical tools that English

provides for communicating complex ideas. These tools illustrate the potential for direct

instruction in grammar. The linguistic demands of adult life go well beyond mere

details of style such as formal and informal vocabulary (as in contrasts such as TRY

versus ATTEMPT). None of these details will help school leavers to deal with complex

communication – reading other people’s attempts to put complex ideas into words, or

writing their own attempts. The fact is that adult life often depends on this ability, and

there are significant benefits not only for those who succeed, but also for those who are

trying to communicate with them. Unlike Pirahã culture, our culture includes complex

messages. Our language has adapted over the centuries to these functional demands,

and now contains a large number of tools for effective communication, namely

extraposition and all the other structures listed in (11) to (19). School leavers could, and

arguably should, be consciously aware of these tools – that they exist, how they affect

syntax, how they can help, and maybe even their technical names. Grammarians know

and understand all the linguistic details, psychologists know how the tools help with

processing, cognitive-functional theorists know how to integrate the linguistic details

with functional demands and culture, and educationalists know how to teach such

things. The only missing element is collaboration.

References

Barlow, Michael and Kemmer, Suzanne. 2000. Usage based models of language.

Stanford: CSLI.

Bloom, Paul and Markson, Lori. 1998. “Capacities underlying word learning”. Trends

in Cognitive Sciences 2: 67-73.

Butler, Christopher. 1985. Systemic Linguistics: Theory and applications. London:

Batsford.

Butler, Christopher. 2006. “Functionalist Theories of Language”, in Encyclopedia of

Language & Linguistics, Keith Brown (ed.), 696-704. Oxford: Elsevier.

Butler, Christopher. 2013. “Word grammar”, in Theories and Methods in Linguistics

(Wörterbücher der Sprach- und Kommunikationswissenschaft), Johannes Kabatek

& Bernd Kortmann (eds.). Berlin: Mouton de Gruyter.

Butler, Christopher and Taverniers, Miriam. 2008. “Layering in structural-functional

grammars”. Linguistics 46: 689-956.

Bybee, Joan. 2010. Language, Usage and Cognition. Cambridge: Cambridge University

Press.

Chipere, Ngoni. 2003. Understanding Complex Sentences: Native Speaker Variation in

Syntactic Competence. London: Palgrave Macmillan.

Chomsky, Noam. 1987. “Chomsky on grammar teaching. Noam Chomsky interviewed

by Lillian R. Putnam”. Reading Instruction Journal 1987.

Chomsky, Noam. 2011. “Language and Other Cognitive Systems. What Is Special

About Language?”. Language Learning and Development 7: 263-278.

Evans, Nicholas and Levinson, Stephen. 2009. “The Myth of Language Universals:

Language diversity and its importance for cognitive science”. Behavioral and

Brain Sciences 32: 429-492.

Everett, Daniel. 2008. Don't Sleep, There Are Snakes: Life and Language in the

Amazonian Jungle. Pantheon.

Gibson, Edward. 2002. “The influence of referential processing on sentence

complexity”. Cognition 85: 79-112.

Gonzálvez-García, Francisco and Butler, Christopher. 2006. “Mapping functional-cognitive

space”. Annual Review of Cognitive Linguistics 4: 39-96.

Halliday, Michael. 1994. An Introduction to Functional Grammar (2nd Edition).

London: Arnold.

Hancock, Craig. 2009. “How linguistics can inform the teaching of writing”, in The

Sage Handbook of Writing, Roger Beard, Debra Myhill, Jeni Riley, & Martin

Nystrand (eds.), 194-207. London etc: Sage.

Hatch, Evelyn M. and Brown, Cheryl. 1995. Vocabulary, Semantics, and Language

Education. Cambridge: Cambridge University Press.

Hawkins, John. 1994. A Performance Theory of Order and Constituency. Cambridge:

Cambridge University Press.

Hawkins, John. 1999. “Processing complexity and filler-gap dependencies across

grammars”. Language 75: 244-285.

Hawkins, John. 2001. “Why are categories adjacent?”. Journal of Linguistics 37: 1-34.

Hudson, Richard. 1990. English Word Grammar. Oxford: Blackwell.

Hudson, Richard. 1996. Sociolinguistics (Second edition). Cambridge: Cambridge

University Press.

Hudson, Richard. 2004. “Why education needs linguistics (and vice versa)”. Journal of

Linguistics 40: 105-130.

Hudson, Richard. 2007a. “English dialect syntax in Word Grammar”. English Language

and Linguistics 11: 383-405.

Hudson, Richard. 2007b. Language Networks: The New Word Grammar. Oxford:

Oxford University Press.

Hudson, Richard. 2009. “Measuring maturity”, in SAGE Handbook of Writing

Development, Roger Beard, Debra Myhill, Martin Nystrand, & Jeni Riley (eds.),

349-362. London: Sage.

Hudson, Richard. 2010. An Introduction to Word Grammar. Cambridge: Cambridge

University Press.

Kolln, Martha and Hancock, Craig. 2005. “The story of English grammar in United

States schools”. English Teaching: Practice and Critique 4: 11-31.

Krashen, Stephen. 1982. Principles and Practice in Second Language Acquisition.

Michigan: Pergamon.

Myhill, Debra. 2011. “Grammar for designers: how grammar supports the development

of writing”, in Applied Linguistics and Primary School Teaching, Sue Ellis &

Elspeth McCartney (eds.), 81-92. Cambridge: Cambridge University Press.

Myhill, Debra, Lines, Helen, and Watson, Annabel. 2010. “Making meaning with

grammar: A repertoire of possibilities”. METAphor 2: 1-10.

Nagy, William and Herman, Patricia. 1987. “Breadth and depth of vocabulary

knowledge: Implications for acquisition and instruction”, in The nature of

vocabulary acquisition, Margaret McKeown & Mary Curtis (eds.), 19-35.

Hillsdale NJ: Lawrence Erlbaum.

Newmeyer, Frederick. 2010. “History and Philosophy of Linguistics: an interview with

Frederick J. Newmeyer”. ReVel 8.

Ninio, Anat. 1994. “Predicting the order of acquisition of three-word constructions by

the complexity of their dependency structure”. First Language 14: 119-152.

Olson, Gary, Faigley, Lester, and Chomsky, Noam. 1991. “Language, politics and

composition: a conversation with Noam Chomsky”. Journal of Advanced

Composition 11: 1-35.

Perera, Katharine. 1984. Children's Writing and Reading. Analysing Classroom

Language. Oxford: B. Blackwell in association with A. Deutsch.

Perera, Katharine. 1990. “Grammatical differentiation between speech and writing in

children aged 8 to 12”, in Knowledge About Language and the Curriculum,

Ronald Carter (ed.), 216-233. London: Hodder and Stoughton.

Perera, Katharine. 1994. “Child Language Research: Building on the Past, Looking to

the Future”. Journal of Child Language 21: 1-7.

Tomasello, Michael. 2003. Constructing a Language: A Usage-based Theory of

Language Acquisition. Harvard University Press.
