[to be published in Theory and Practice in Functional-Cognitive Space, edited by María de los Ángeles Gómez González, Francisco José Ruiz de Mendoza Ibáñez and Francisco Gonzálvez-García (John Benjamins, 2014+)]
Cognitive functionalism in language education
Richard Hudson
University College London, United Kingdom
Abstract
Functional pressures on language are always cognitive, and cognitive pressures are always functional, so cognitivism and functionalism combine to explain the structure of lexicogrammar (the continuum of lexicon and grammar) and also the statistics of language usage. As an example, the paper shows how Word Grammar explains the difficulty of centre-embedding in terms of dependency syntax combined with a general cognitive principle of binding, and also the benefits of non-canonical word orders (such as extraposition) in the lexicogrammar. These reordering options are part of the formal academic language that children learn through education, and education should be guided by linguistic research. This is a research area that calls for far more effort and collaboration with other disciplines.
Keywords
Word Grammar, word order, education, syntax, children
1. Cognitive functionalism
The terms cognitive and functional are often combined, as in ‘functional-cognitive
space’ (Gonzálvez-García and Butler 2006), ‘usage-based functionalist-cognitive
models’ (Butler 2006) or ‘cognitive-functional linguistics’ (espoused by a number of
university departments). This is a healthy development, but it is important to remember
that each term names a distinct set of assumptions. In linguistics, cognitivism applies
the insights of cognitive science, including cognitive psychology, to the study of
language, on the assumption that language is subject to the same constraints and
principles as other areas of cognition. Functionalism, on the other hand, seeks functional
explanations for language in terms of general assumptions such as the principle of
contrast (minimize ambiguity). Cognitivism need not seek functional explanations, and
functionalism need not seek cognitive underpinnings. Nevertheless, it makes perfect
sense to combine them because (as I shall argue below) functional pressures on
language are always cognitive pressures, and the effects of cognition on language are
always functional. This dual perspective is one of the attractions for me of Chris
Butler’s work, along with his unflagging determination to listen, learn and understand
his colleagues.
Functional pressures must always be cognitive for three reasons: it is only through
cognition that they apply to language, it is only because language is an example of
cognition that they apply at all, and they cover the full range of cognitive processes as
applied to language. To show the significance of these three claims, imagine a
functional analysis which is completely divorced from cognition, such as a branch of the
mathematical theory of communication. This would analyse the elements of any
communication, such as a message, a medium, a sender, a receiver and a code, and the
properties that any code would have to have in order to allow efficient communication.
There would be nothing in the analysis about the code’s users, its history or its social
significance. The only questions would involve efficient communication: how to
measure it, and how to design a code so as to maximize it.
In contrast, as soon as we bring cognition into the discussion the questions
multiply. How easy is the code to learn? How does it change diachronically? What is its
social significance as an important badge of group membership? How does it balance
the needs of the speaker (e.g. for brevity) against those of the hearer (e.g. for
explicitness)? Butler puts the complexities well in the following passage (Butler
2006:1):
“If we are to study language as communication, then we will need to take into
account the properties both of human communicators and of the situations in
which linguistic communication occurs. Indeed, a further important claim of
functionalism is that language systems are not self-contained with respect to such
factors, and therefore autonomous from them, but rather are shaped by them and
so cannot be properly explained except by reference to them. Linguists who make
this claim ... undoubtedly form the largest and most influential group of functional
theorists. The main language-external motivating factors are of two kinds: the
biological endowment of human beings, including cognition and the functioning
of language processing mechanisms, and the sociocultural contexts in which
communication is deeply embedded. We might also expect that a functionalist
approach would pay serious attention to the interaction between these factors and
the ways in which languages change over time, although in practice this varies
considerably from one model to another.
The question of motivation for linguistic systems is, of course, not a simple one.
Much of the formalist criticism of functionalist positions has assumed a rather
naïve view of functional motivation, in which some linguistic phenomenon is
explicable in terms of a single factor. Functionalists, however, have never seen
things this way, but rather accept that there may be competing motivations,
pulling in different directions and often leading to compromise solutions.”
This complex and sophisticated view of the pressures that shape languages has
been expressed recently as ‘stable engineering solutions satisfying multiple design
constraints, reflecting both cultural-historical factors and the constraints of human
cognition.’ (Evans and Levinson 2009:1). For Evans and Levinson, the most significant
property of language is its enormous diversity, which they hope to explain in relation to
the multiple (and competing) design constraints. My only disagreement, a minor
quibble about terminology, concerns their contrast between ‘cultural-historical’ and
‘the constraints of human cognition’: cultural-historical facts are themselves ultimately
facts about human cognition. If the English word for ‘cat’ is CAT, this is only true
because English speakers know it, act upon it and transmit it to the next generation. This
is a very different kind of cognitive fact from the fact that working memory is limited,
but cognitive it is nevertheless. I should therefore like to reword the quotation: ‘stable
engineering solutions satisfying multiple cognitive design constraints, reflecting both
variable cultural-historical knowledge and the permanent and universal constraints of
human cognition.’ Similarly, Butler’s ‘sociocultural contexts’ are only relevant to the
extent that they are part of speakers’ cognition.
If it is true that functional pressures are always cognitive, it is equally true that
cognitive pressures are always functional, in the sense that they push language towards
a better solution for one of the many competing design constraints. This claim is hard to
test in the absence of a closed list of design constraints, so we might treat it as a premise
to guide us in the search for design constraints: whenever we find a fact about language
which seems to relate to cognition, we must find a design constraint to mediate between
language and cognition. To take an elementary example, why does English rank the
speaker above the addressee in the pronoun system, so that the presence of the speaker
in a group forces the choice of we regardless of who else is in it? Even more
interestingly, why do so many other languages do the same? True, some languages
distinguish inclusive and exclusive pronouns for ‘we’, but (so far as I know) no
language has a word for ‘you’ which may or may not include the speaker. Presumably
the explanation lies in cognition, but it must include a design constraint such as the
paramount importance of talking about oneself – a sad comment on human nature,
perhaps, but apparently true.
If language is subject to functional pressures, what effects do these pressures
have? If their effects are always cognitive, as I am suggesting, they must affect our
minds first and foremost, and it is only via our minds that they affect our behaviour; so
if I choose the word we rather than you to refer to a group including my addressee as
well as myself, this is because my mind contains a ‘lexicogrammar’ which assigns each
of these words a meaning which dictates this choice. (Lexicogrammar is a useful term
from Systemic Functional Grammar for the continuum of lexicon and grammar, a notion
which has more recently been rediscovered by cognitive linguists; see Butler and
Taverniers 2008.) The pressure shapes the lexicogrammar, which in turn affects our
behaviour. But is it only via the lexicogrammar that functional pressures can affect our
behaviour? The answer depends on how we define ‘lexicogrammar’, but there are some
functional pressures whose effects clearly fall outside any familiar definition.
For example, if you and I are talking, we are more likely to understand each other
if only one of us is talking at a time, for the simple reason that listening and talking
compete for the same mental resources of attention. As with any pressure, this comes
with a cost – a competing pressure that has to be balanced against it. If you are talking,
and I have something to say, not only do I have to wait, but I may also have to take my
place in a queue along with others who also have something to say. Consequently
different communities develop different behavioural norms, ranging from complete
anarchy to the rigid rules of committee meetings; and these norms affect our speaking
behaviour in a striking way (Hudson 1996:133). But they cannot be part of the language
system if that system simply controls the ways in which words are combined, pronounced and
interpreted. On the other hand, the rules for speaking or staying silent are equally clearly
related to the language system, because they govern its use – when to use language and
when not.
Some functional pressures clearly do affect the content of the language system,
and others clearly don’t. But in between these two extremes, we find ‘weak’ pressures,
where some kind of language behaviour is not actually dictated by the system, but is
nevertheless typical throughout the community. An example that comes to mind is the
use of directional expressions in English. If my wife is downstairs and asks me to join
her, I believe I would say I’ll come down in a minute rather than simply I’ll come in a
minute, even though the down is completely optional, and, in the situation concerned,
completely uninformative. And I believe the same is true of any English speaker
describing almost any movement or position which could be related to the deictic ‘here’.
So in all the following examples, the bracketed expression is grammatically optional
and situationally predictable, but nevertheless expected:
(1) I went (over) to Ben’s place the other day.
(2) It’s (up) in the spare bedroom.
(3) I’m driving (down) to Cardiff tomorrow.
I have no research evidence to support this claim, but my hunch is that the
bracketed words are much more likely to be uttered than omitted. What is supported by
research is the idea that our learning of language is ‘usage-based’ (Barlow and Kemmer
2000, Bybee 2010, Hudson 2007b, Tomasello 2003), which means that we maintain a
mental record of the statistical patterns in other people’s behaviour; so a statistical
tendency in other people’s behaviour may become part of my own behaviour (with the
obvious feedback effects on other speakers).
But why should English speakers show this particular pattern? It might be just an
arbitrary pattern which we reinforce in each other, like the pronunciation patterns which
are so well documented in quantitative sociolinguistics (Hudson 1996: chapter 5). But
it is much more likely that we have created our own local ‘functional pressure’ to specify
deictic locations and directions, regardless of the hearer’s needs. If so, this would be an
example of a functional pressure being created by collective linguistic behaviour, and
then being learned and applied by every novice speaker. It would be reflected in the
lexicogrammar by the particles which are tailor-made for this precise purpose, but their
use is not governed by categorial rules. How, then, do we decide whether or not to use
them?
This question is very similar to the one that arises in quantitative dialectology. For
example, given that we all have a choice between a velar and an alveolar nasal in the
suffix -ing (as in walking or walkin’), how do we choose between them? Labov and his
colleagues and followers have shown very clearly that each speaker’s choices reflect
rather precisely the choice-patterns of the speakers who have served as their models, but
there is no agreed cognitive model for the mechanism of choosing. What I have
suggested elsewhere is that a model should take the form of a cognitive network with
dynamic activation levels which trigger choices (Hudson 2007a). Once a model is in
place, it could be extended to non-categorial functional pressures such as the one
discussed above. This is a major research challenge because it isn’t at all obvious how
to build the network needed, but the project would certainly reveal a lot about the
cognitive architecture behind human language.
The general challenge that linguistic theory faces is to relate functions to
structures: how to build a model of language structure which takes account of functional
pressures. The current proliferation of theories, including theories whose names contain
the word functional, testifies to the difficulty of this project. One basic question is
whether the functions might be so closely integrated into the system that they become
part of it. Some theories do merge functions and structures in this way, but in my
opinion it is a mistake. To illustrate, I shall consider two very different theories: Optimality Theory
and Systemic Functional Grammar.
Optimality Theory is the extreme case because each functional pressure is
represented directly as either a faithfulness constraint or a markedness constraint within
the system (Newmeyer 2010); for instance, the process that inserts an epenthetic vowel
in horses is triggered by the difficulty of pronouncing two adjacent sibilants. The
trouble with building pressures into the system in this way is that it turns the pressures
into concepts, so they only apply to the extent that speakers have the relevant concepts;
but the fact is that adjacent sibilants (for instance) are hard to pronounce whether or not
we ‘know’ this conceptually.
Systemic Functional Grammar keeps the functional pressures outside the system,
but analyses the structure so that it reflects the functions closely. Both the paradigmatic
system-networks and the syntagmatic structures of syntax are organised into a small
number of ‘metafunctions’ – ideational, interpersonal and textual – each of which is
responsible for a different set of functional pressures. This means that a clause has three
different syntactic structures: an ideational structure for the basic referential meaning,
an interpersonal structure showing how the speaker and addressee relate to this
meaning, and a textual structure showing how it relates to what has been said already
(Butler 1985, Halliday 1994). My objection in this case is that the analysis
misrepresents the relation between functions and structures by concealing the tensions
and conflicts. In my opinion, it would be much nearer to the truth to say that we try to
use a single structure to perform a number of very different jobs at the same time, so
there is no sense in which a single clause can dedicate one entire structure to each job.
For example, the clause Does she love me? uses she love me to describe a situation
(ideational), uses me and does she to relate it to the speaker and the hearer
(interpersonal), and she to relate it to the previous discourse; but these words are all
closely integrated in a single structure, where the redundant does is the price we pay for
this particular ‘engineering solution’ to the problem of satisfying these conflicting
pressures.
But even if some attempts to relate structures to functions have been unsuccessful,
we can all celebrate the twentieth century’s strong movement towards functionalism.
Whatever we may think of specific theories, they are all trying to go beyond the mere
analysis and description of language structures by looking for explanations. More
recently, we have a separate movement towards cognitive analyses of language
structures which explain how these structures relate to the rest of cognition. If we can
marry the two strands, functional and cognitive, into a single cognitive-functional
linguistics, then we have some hope of really understanding how language works.
2. Syntactic structure: Word order and dependency geometry
One area of language structure which has generated some particularly promising
functional explanations is word order. Why are some basic orders so much more
common than others? And why do languages provide so many alternative orders?
Cognitive explanations have always been prominent in the sense that terms such as
‘given’ and ‘new’ have been used to capture some kind of mental reality, but it is only
recently that these analyses have been able to build on work in cognitive science. One
especially promising link relates word order to limitations on working memory; perhaps
the best-known exponent of this link is Hawkins, who argues that basic word orders
evolve so as to minimize demands on working memory (Hawkins 1994, Hawkins 1999,
Hawkins 2001). I find his evidence and arguments compelling, and agree with his
general conclusions.
However, any discussion of the effects of functional pressures on syntactic
structure presupposes some general theory of syntactic structure, and I believe
Hawkins’s case would be even stronger under a different set of assumptions. For him,
syntactic structure is phrase structure, so words are related to each other only via shared
‘mother’ nodes; so even if two words are adjacent, there is no direct syntactic relation
between them. This analysis is not a helpful basis for explaining why syntax favours
adjacency; nor is it promising as a basis for a cognitive theory of syntax because it
raises the obvious question: why can’t we link words directly to one another, using the
same mental apparatus that we use in relating events or objects in other areas of life?
For example, if we can represent the members of our family as individuals with direct
relations to other individuals, why can’t we do the same with the words in a sentence?
A much better basis for syntax, in my opinion, is dependency structure, in which
the relations between individual words are paramount. Like phrase structure,
dependency structure has many different interpretations in different theoretical
packages, so I shall select the package that I prefer, which (unsurprisingly) is the one I
created: Word Grammar (Hudson 2007b, Hudson 2010, Butler 2013). Figure 1 shows
the syntactic structure for a very simple example, Cows eat grass, in a typical, but
simplified, phrase-structure representation compared with a Word-Grammar
dependency structure. In these diagrams we are only concerned with the basic geometry
of the diagram, so labels are unnecessary; but in a complete analysis the labels (or more
accurately, the classification that they imply) are essential. The main point of the
diagram for present purposes is that the phrase-structure analysis puts two links between
eat and grass and three between eat and cows, whereas the dependency analysis has a
single link in both cases.
Figure 1: Phrase structure and dependency structure
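The link counts in Figure 1 can be checked mechanically. The following Python sketch is only an illustration: it assumes the simplified tree S[cows VP[eat grass]] (the node labels S and VP and the helper `links_between` are hypothetical names introduced here), encodes both analyses as undirected edge lists, and counts the links on the shortest path between two words.

```python
from collections import deque

# Hypothetical encodings of Figure 1 as undirected edge lists.
# Simplified phrase structure: S -> cows, S -> VP, VP -> eat, VP -> grass.
PHRASE_STRUCTURE = [("S", "cows"), ("S", "VP"), ("VP", "eat"), ("VP", "grass")]
# Dependency structure: cows and grass both depend directly on eat.
DEPENDENCY = [("eat", "cows"), ("eat", "grass")]


def links_between(edges, start, goal):
    """Number of links on the shortest path from start to goal (breadth-first search)."""
    graph = {}
    for x, y in edges:
        graph.setdefault(x, []).append(y)
        graph.setdefault(y, []).append(x)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two words are not connected


for name, edges in [("phrase structure", PHRASE_STRUCTURE), ("dependency", DEPENDENCY)]:
    print(name, links_between(edges, "eat", "grass"), links_between(edges, "eat", "cows"))
```

This prints 2 and 3 for the phrase-structure analysis and 1 and 1 for the dependency analysis. Link counts alone ignore linear order, so they would be the same for Cows grass eat; the order effect discussed next depends on how far apart a dependent and its head are.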
Seen from the perspective of cognitive science these two structures allow very
different predictions. Most pertinently, the first structure predicts that the order of verb
and object is irrelevant to processing difficulty, because the geometry would be exactly
the same for Cows grass eat as for Cows eat grass. In contrast, the second structure
predicts the opposite, as can be seen in Figure 2. For the dependency analysis, Cows
grass eat ought to be harder to process because working memory has to hold the cows –
eat dependency for longer than it does in Cows eat grass.
Figure 2: A contrast between phrase and dependency structures
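The contrast in Figure 2 can be expressed as dependency distance, the measure discussed later in this section: the number of words separating a word from its head. In this minimal Python sketch, which assumes that cows and grass both depend on eat, each word's head is given as an index (None for the root); the function name `dependency_distances` is an illustrative assumption, not an established tool.

```python
def dependency_distances(heads):
    """Words intervening between each non-root word and its head.

    heads[i] is the index of the head of word i, or None for the root.
    """
    return [abs(i - h) - 1 for i, h in enumerate(heads) if h is not None]


# Cows and grass both depend on eat in each order.
canonical = dependency_distances([1, None, 1])   # Cows eat grass
verb_final = dependency_distances([2, 2, None])  # Cows grass eat
print(canonical, verb_final)
```

The verb-final order lengthens the cows–eat dependency by one intervening word, a difference that the phrase-structure geometry does not register.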
In such simple examples, and for adult speakers, the differences are trivial; but
child-language research has shown that such differences do matter for novices with
small working memories. For example, small children use adjective-noun combinations
more frequently before the verb (e.g. big book fall) than after it (e.g. see big book). This
is easy to explain in terms of functional pressures from working memory: in see big book
the separating word big adds its mental demands to the existing dependency between see
and book, just as grass does to the cows – eat dependency in Cows grass eat, so the
processing demands are less evenly distributed than in big book fall or Cows eat grass. In
contrast, determiner-noun combinations show the reverse pattern, being more common
after the verb (e.g. see that book) than before it (e.g. that book fall) (Ninio 1994). Once
again, this pattern is easy to explain if nouns depend on determiners (as they do in Word
Grammar; see Hudson 1990:268-76). Figure 3 summarises the patterns, showing how
frequency follows predicted difficulty due to dependency patterns.
Figure 3: Dependency density in child language
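The patterns in Figure 3 can be restated as a simple count: for each gap between adjacent words, how many dependencies span it. The Python sketch below assumes the Word-Grammar analyses described above (adjectives depend on nouns, nouns depend on determiners, and the subject or object depends on the verb); the function name and the index encoding are illustrative assumptions, and Ninio's (1994) frequency data are not modelled here.

```python
def density_profile(n_words, deps):
    """For each gap between adjacent words, count the dependencies spanning it.

    deps is a list of (dependent, head) pairs of word indices.
    """
    return [
        sum(1 for d, h in deps if min(d, h) <= gap < max(d, h))
        for gap in range(n_words - 1)
    ]


# Word indices 0, 1, 2 run from left to right.
EXAMPLES = {
    "big book fall": [(0, 1), (1, 2)],   # big -> book, book -> fall
    "see big book": [(1, 2), (2, 0)],    # big -> book, book -> see
    "that book fall": [(1, 0), (0, 2)],  # book -> that, that -> fall
    "see that book": [(1, 0), (2, 1)],   # that -> see, book -> that
}

for phrase, deps in EXAMPLES.items():
    profile = density_profile(3, deps)
    print(f"{phrase:15} profile={profile} max={max(profile)}")
```

In each pair, the order reported as more frequent (big book fall, see that book) is the one whose maximum density is 1 rather than 2.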
One of the attractions of dependency analysis is the possibility of measuring the
relative processing difficulty of different structures. Various measures are available.
One, which I have called ‘dependency distance’, consists of a simple count of the
number of words that separate a word from the word on which it depends (Hudson
2007b:124-9) and is very similar to the distance metric developed by Ted Gibson for
experimental work (Gibson 2002) which clearly confirms the importance of dependency
distance in adult language. However, I now believe that a more appropriate
measure for some patterns would be ‘dependency density’, the number of dependencies
being held in memory at any given moment. This measure is most easily illustrated with
so-called ‘centre-embedded’ structures.
These sentences are so hard to process syntactically that ordinary adult
experimental subjects simply give up on syntax. Consider, for example, sentence (4).
(4) The patient who the nurse who the clinic had hired met Jack.
When presented with a list of sentences to be judged as either grammatical or
ungrammatical, many people accept this sentence (and others like it), although it
actually doesn’t make sense either syntactically or semantically. The problem can be
seen in the simplified Word-Grammar analysis in Figure 4, which shows how the first
who introduces a relative clause which should have the nurse as its subject, but whose
verb is missing.
Figure 4: An incomplete centre-embedded sentence
The main point of this example is to show how easily our working memory can
run out of resources, and how well this can be predicted by the extreme dependency
density at the point indicated by the dotted line. To see this, imagine yourself reading
this sentence a word at a time, and in slow motion. Figure 5 shows the state of play in
your mind just after reading the word clinic. (The dependency between the and clinic
can be ignored because it is so easily and quickly completed.)
Figure 5: Part of the way through an incomplete sentence
At this point, you have to hold five incomplete dependencies in your
mind, each looking for a word (labelled in the diagram) which you haven’t yet read:
- The top dependency is looking for verb a, a non-dependent finite verb, for the patient to depend on.
- The next dependency down was set up by the first who, and is looking for a dependent finite verb b.
- Word c is needed for the nurse to depend on.
- Similarly, words d and e are needed by the second who and the clinic.
The problem for you, as the reader, is that you’re looking for five finite verbs,
each of which is represented in your working memory simply as ‘some finite verb’.
Why this should be a problem isn’t completely clear, but the following explanation
strikes me as plausible.
One of the most important activities in your mental life is to recognise that two
concepts which are represented separately are in fact the same – that the person on the
phone is your friend, or that the next street on the right is the one you’re looking for. To
achieve this, you merge concepts (by binding them to one another) when they are the
same, so whenever you have two concepts with similar specifications (e.g. ‘finite verb’)
and similar activity level at the same time, you tend to merge them unless you have
reasons for keeping them separate (Hudson 2010:91-102) – that is, merging is the
default which can only be prevented by extra mental effort. In the case of this sentence,
you can safely merge verbs b and c as bc because it’s almost certain that the nurse will
turn out to be the subject of the verb expected by who; and similarly for d and e. The
trouble is that by this time your working memory is having to hold a lot of information
(five remembered words plus five dependencies plus from three to five anticipated
words) and hasn’t got the resources needed to keep these very similar nodes separate, so
it simply merges bc with de into a single dependent finite verb bcde, and once had hired
appears, the finite verb had is accepted as the merged bcde, even though this means that
it has to double as the expected complement of two who’s at the same time.
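The binding account can be illustrated with a deliberately crude toy model in Python. Everything in it is an assumption made for illustration, not Word Grammar's activation-based mechanism: expectations are records with a label and a specification, a hypothetical `capacity` parameter stands in for working-memory limits, same-specification nodes are merged whenever more nodes are open than the capacity allows, and each incoming finite verb fills the most recently opened expectation.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    label: str  # anticipated word(s) this node stands for, e.g. "bc"
    spec: str   # what kind of word it is waiting for


def merge(x, y):
    """Bind two nodes into one node standing for both anticipated words."""
    return Expectation(x.label + y.label, x.spec)


def merge_under_load(nodes, capacity):
    """While more nodes are open than capacity allows, merge the most recent
    node with the nearest earlier node that has the same specification."""
    nodes = list(nodes)
    while len(nodes) > capacity:
        for i in range(len(nodes) - 1, 0, -1):
            j = next((k for k in range(i - 1, -1, -1)
                      if nodes[k].spec == nodes[i].spec), None)
            if j is not None:
                merged = merge(nodes[j], nodes[i])
                nodes = nodes[:j] + [merged] + nodes[j + 1:i] + nodes[i + 1:]
                break
        else:
            break  # no two open nodes share a specification
    return nodes


def read_finite_verbs(nodes, verbs):
    """Each finite verb fills the most recently opened expectation.
    Returns the bindings and the expectations left unfilled."""
    open_nodes = list(nodes)
    bindings = []
    for verb in verbs:
        if open_nodes:
            bindings.append((verb, open_nodes.pop().label))
    return bindings, open_nodes


DEP, ROOT = "dependent finite verb", "non-dependent finite verb"
# State after "clinic", with the safe merges b+c and d+e already made.
state = [Expectation("a", ROOT),
         merge(Expectation("b", DEP), Expectation("c", DEP)),
         merge(Expectation("d", DEP), Expectation("e", DEP))]

for capacity in (2, 3):
    bindings, unfilled = read_finite_verbs(merge_under_load(state, capacity), ["had", "met"])
    print(capacity, bindings, [n.label for n in unfilled])
```

With the smaller capacity, had is taken as the merged verb bcde and met as a, so nothing is left unfilled and the incomplete sentence (4) is wrongly accepted; with the larger capacity, the toy keeps bc and de apart and finds a still waiting for its verb.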
Of course, the sentence fragment in Figure 5 could have been completed
grammatically, as in Figure 6. This is grammatical because each of the two who’s has a
separate verb to head its relative clause; but it is virtually impossible to process.
Figure 6: A complete centre-embedded sentence
This example explains the difficulty of the famous ‘centre-embedding’ or ‘self-
embedding’ pattern, but of course such sentences are vanishingly rare in actual
performance because we all know how hard they are to process, whether as speaker or
as hearer. Fortunately, English offers an alternative way to express the same ideas. If we
want to attach a relative clause to a noun, we have the option of ‘extraposing’ it by
pretending that it actually depends on the next verb up. For example, in (5), the relative
clause that I bought last week is attached directly to goldfish, but in (6), which means
the same, it is extraposed so that it takes its position as a dependent of died.
(5) The goldfish that I bought last week has died.
(6) The goldfish has died that I bought last week.
This extraposition can be thought of as a mental operation that converts the basic
default structure (5) into one that is easier to process; but it is different from a classic
Chomskyan transformation because it can short-circuit the planning process so that it
applies while the planned words are still only partly specified. The point is that if we see
a complicated structure developing in our minds, we have ways to avoid it such as the
use of extraposition – an ‘engineering solution’ to the problem of syntactic complexity.
As can be seen from the partial structure in Figure 7, the extraposed version
delays the dependency between goldfish and that so that it doesn’t have to be processed
at the same time as the subject link from has to the goldfish.
Figure 7: A simple example of extraposition
Applying extraposition to the unprocessable (7) produces (8), which is easy to
understand.
(7) The patient who the nurse who the clinic hired treated died.
(8) The patient died who the nurse treated who the clinic hired.
Moreover, extraposition reveals the ungrammaticality of the first version of the
sentence, (4), repeated below as (9).
(9) *The patient who the nurse who the clinic had hired met Jack.
(10) *The patient met Jack who the nurse who the clinic had hired.
Perhaps the most interesting characteristic of extraposed sentences is that,
although they are much easier to process than their unextraposed equivalents, they are
structurally more complex because of the extra dependency between the extraposed
word and the higher verb (in Figure 7, the dependency between has and that). This extra
dependency coexists with all the dependencies found in the unextraposed sentence
(which, for simplicity, I omitted from Figure 7). The general conclusion is that ease of
processing is not a general matter of ‘complexity’, but of the distribution of processing
load: evenly distributed processing load is easy, but a high concentration of load in one
area is much harder.
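The contrast between overall complexity and the distribution of load can be checked by counting the dependencies that span each gap between adjacent words. The analyses below are simplified assumptions in the spirit of Figures 5 to 7, not the chapter's own diagrams: determiners head their nouns, each who depends on its antecedent noun, the verb of each relative clause depends on who, subjects depend on their verbs, and in (8) each extraposed who has one extra dependency on the next verb up; the extraction link between who and its verb is omitted. The function name is illustrative.

```python
def density_profile(n_words, deps):
    """Dependencies spanning each gap between adjacent words.

    deps holds (dependent, head) index pairs; a word may have two heads.
    """
    return [
        sum(1 for d, h in deps if min(d, h) <= gap < max(d, h))
        for gap in range(n_words - 1)
    ]


# (7) The patient who the nurse who the clinic hired treated died.
#      0   1       2   3   4     5   6   7      8     9       10
CENTRE_EMBEDDED = [
    (0, 10),  # The -> died (subject)
    (1, 0),   # patient -> The
    (2, 1),   # who -> patient
    (9, 2),   # treated -> who
    (3, 9),   # the -> treated (subject)
    (4, 3),   # nurse -> the
    (5, 4),   # who -> nurse
    (8, 5),   # hired -> who
    (6, 8),   # the -> hired (subject)
    (7, 6),   # clinic -> the
]

# (8) The patient died who the nurse treated who the clinic hired.
#      0   1       2    3   4   5     6       7   8   9      10
EXTRAPOSED = [
    (0, 2), (1, 0),
    (3, 1), (3, 2),    # who -> patient, plus the extra link who -> died
    (6, 3), (4, 6), (5, 4),
    (7, 5), (7, 6),    # who -> nurse, plus the extra link who -> treated
    (10, 7), (8, 10), (9, 8),
]

for name, deps in [("(7)", CENTRE_EMBEDDED), ("(8)", EXTRAPOSED)]:
    profile = density_profile(11, deps)
    print(name, "dependencies:", len(deps), "profile:", profile, "peak:", max(profile))
```

Under these assumptions the extraposed version has more dependencies (12 against 10) but a peak density of 3 rather than 6. The value 5 at the gap after clinic in (7) corresponds to the five incomplete dependencies listed earlier for the same point in the sentence.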
Extraposition of a relative clause is not the only way to redistribute processing
load. In fact, English has a rich supply of grammatical solutions, each geared to a
different set of problematic sentences. The list below hints at the main ways in which
we can tweak a sentence’s syntax to suit our particular communicative purposes and
processing needs. The little formulae are meant as a guide to the particular examples
rather than as a correct generalisation of the process concerned.
(11) It-extraposition:
From: That you were able to help her so easily | is good. [1 | 2]
To: It | is good | that you were able to help her so easily. [it | 2 | 1]
(12) Heavy-NP shift:
From: Put | all the food that we’re going to need for the party and that we
can’t freeze | on this shelf. [1 | 2 | 3]
To: Put | on this shelf | all the food that we’re going to need for the party and
that we can’t freeze. [1 | 3 | 2]
(13) Dative shift:
From: Let’s give | something to remind her of all the good times she had
with us | to Mary. [1 | 2 | to 3]
To: Let’s give | Mary | something to remind her of all the good times she
had with us. [1 | 3 | 2]
(14) Subject delay:
From: A wonderful old oak tree with a tree-house in its branches | stands | in
the corner. [1 | 2 | 3]
To: In the corner | stands | a wonderful old oak tree with a tree-house in its
branches. [3 | 2 | 1]
(15) There-insertion:
From: A dog | is | in the garden. [1 | 2 | 3]
To: There | is | a dog | in the garden. [there | 2 | 1 | 3]
(16) Front-shifting:
From: I bumped into someone I met at a party given by our neighbours | last
night. [1 | 2]
To: Last night | I bumped into someone I met at a party given by our
neighbours. [2 | 1]
(17) It-clefting:
From: I bumped into someone I met at a party given by our neighbours | last
night. [1 | 2]
To: It was last night that I bumped into someone I met at a party given by
our neighbours. [it was 2 | that 1]
(18) Wh-clefting:
From: I bumped into someone I met at a party given by our neighbours | last
night. [1 | 2]
To: Last night was when I bumped into someone I met at a party given by
our neighbours. [2 | was when | 1]
(19) Passivization:
From: All the books that I’ve read by him | have | impressed | me. [1 | 2 | 3 |
4]
To: I | have | been impressed | by all the books that I’ve read by him. [4 | 2 |
been 3 | by 1]
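For heavy-NP shift (12), a crude word count shows why the shifted order helps. This Python sketch assumes, purely for illustration, that segment 1 is the verb and that each later segment attaches to it at its first word; this is a simplification, not a published metric. It counts the words intervening between the verb and each dependent under the From and To formulae.

```python
# Segments of example (12), numbered as in the formulae.
SEGMENTS = {
    1: "Put",
    2: "all the food that we're going to need for the party and that we can't freeze",
    3: "on this shelf",
}


def apply_formula(order):
    """Reorder the numbered segments, e.g. (1, 3, 2) for the shifted version."""
    return [SEGMENTS[n] for n in order]


def distances_from_verb(segments):
    """Words between the verb (first segment) and the first word of each later segment."""
    distances, intervening = [], 0
    for segment in segments[1:]:
        distances.append(intervening)
        intervening += len(segment.split())
    return distances


for formula in [(1, 2, 3), (1, 3, 2)]:
    d = distances_from_verb(apply_formula(formula))
    print(formula, d, "total:", sum(d))
```

Under this crude measure the total falls from 16 to 3: the long object no longer separates the verb from on this shelf, and the object itself begins only three words after the verb.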
Each of these patterns is firmly embedded in the grammar of English, with its own
rules and its own effects; and in each case it makes good sense to see it as an
‘engineering solution’ to some kind of functional demand on the speaker or hearer – in
other words, as an important tool that any mature user of English can apply effectively.
This brings us to the language education that is needed in order to turn us all into
‘mature users’.
3. Language education
Education is, by definition, an interference with a child’s ‘natural’ development, an
attempt to direct that development in particular ways chosen by the adult world. For
some people, the notion of ‘language education’ is a contradiction because language
develops naturally under its own logic, so all it needs is raw data to trigger the built-in
grammatical system and to provide a vocabulary. In this view, second-language
teaching should just ‘expose’ children to comprehensible input (Krashen 1982), and
much the same philosophy dominated first-language English teaching for some decades
(Kolln and Hancock 2005). However, there is now a well-articulated and influential
body of opinion which sees language development in a very different way, with
education playing a major role (Hudson 2004).
The cognitive functionalism with which we started implies that each language
evolves to support the tasks that its users have to perform, so we expect, and find, as
much diversity among languages as among language users. This amounts to a rejection
of the romantic notion of ‘natural language’, which is language ‘as nature intended’,
unspoilt by human institutions such as schools (Chomsky 1987, Chomsky 2011, Olson
and others 1991). There is very little ‘natural’ about the language that you and I know,
and that allows me to write these words, and you to read them. You and I both spent
years of our childhood learning not only the skills of reading and writing, but also
‘academic language’: the language of school, of universities and of a great deal of
adult life. This academic language has been shaped over the centuries by the need to
talk about mathematics, geography and literature, and by the need to argue, hypothesise,
reason and explain. It has also been standardized, but this is a relatively minor element
in its history compared with the enormous developments triggered by complex
communicative demands. The fact is that two generations of theoretical linguists have
used modern English as an example of a ‘natural language’ without worrying about the
many ways in which we interfere with our language, or even noticing them. If diversity
and cultural adaptation are normal for languages, then a ‘natural language’ is simply one
that is (more or less) adapted to its culture; and from this point of view, modern English,
with all its richness and complexity, is just as natural as a very simple language such as
Pirahã, which has evolved to fit a very simple culture (Everett 2008).
Let’s assume, therefore, that a complex society such as ours needs a complex
language, and that complex language requires education so that children can move
beyond childish and casual language development. What kinds of language do children
need to be taught in school? Part of the answer is obvious and uncontroversial: they
need to be taught ‘relevant’ language which they won’t learn outside school. What
makes language experience relevant is, of course, a social and even a political decision,
according to what ‘society’ deems necessary for adult functioning. Our society
generally agrees that school leavers should be able to cope with more formal and
academic styles, in both spoken and written modes, and though these notions are
inherently vague, there is enough agreement for examination boards to design public
tests of competence in these areas.
But what does this mean, in concrete terms, for first-language teaching? What
does ‘formal academic language’ contain that children won’t learn anyway from
ordinary linguistic interaction outside the school? This is a research question for
linguistics, but the research that it defines is remarkable for its paucity. There has been a
great deal of work by psychologists on the global statistics of vocabulary growth; for
example, one estimate (Bloom and Markson 1998) suggests that, from the age of 30 months,
we typically learn 3.6 new words per day in our pre-school years, rising to 6.6 words per
day up to age 8 (the age when we typically become independent readers), and then to
12.1 words per day. This particular estimate stops at age 10, but another research report
(Nagy and Herman 1987) estimates that the typical school-leaver (year 11) knows about
40,000 words, which implies a rate of about 3,000 words per year, or about 8 per day,
roughly the same figure as for primary children. However, very few linguists
seem to have done research in this area (Hatch and Brown 1995).
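The conversion from Nagy and Herman's annual figure to a daily one is simple division. Assuming a 365-day year (an assumption, since their estimate is stated per year), the figures work out as follows:

```latex
\frac{40{,}000\ \text{words}}{3{,}000\ \text{words/year}} \approx 13.3\ \text{years},
\qquad
\frac{3{,}000\ \text{words/year}}{365\ \text{days/year}} \approx 8.2\ \text{words/day}.
```

The daily rate falls between the 6.6 and 12.1 words per day that Bloom and Markson give for younger children.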
Even more striking is the lack of research on grammatical development during the
school years – what develops, and how it can be encouraged. The most important source
of information on what develops is still the work on syntax done in the 1980s and early
1990s by Katharine Perera (Perera 1984, Perera 1990, Perera 1994), with a few rather
minor recent additions such as my own (Hudson 2009). One conclusion that emerges
very clearly from this research is that children’s grammatical repertoire – the range of
constructions that they know with sufficient confidence to actually use in their own
writing – is still growing right through the school years. For example, they are learning
new conjunctions and prepositions (such as although, unless and in spite of), new ways
of using non-finite verbs (such as after when or on their own as adverbial clauses), and
new details that are on the borderline between grammar and vocabulary (such as the
prepositions selected by particular words, e.g. tired of but bored with). As to how it can
be encouraged, we now have solid research evidence that sensibly planned grammatical
instruction can have a considerable effect on children’s writing and reading skills
(Hancock 2009, Myhill and others 2010, Myhill 2011, Chipere 2003), so the way
forward is clear: English teachers can help children to develop grammatically by
judicious use of direct instruction.
I should like to finish by returning to the list of grammatical tools that English
provides for communicating complex ideas. These tools illustrate the potential for direct
instruction in grammar. The linguistic demands of adult life go well beyond mere
details of style such as the choice between formal and informal vocabulary (as in TRY
versus ATTEMPT). None of these stylistic details will help school leavers to deal with
complex communication, whether reading other people’s attempts to put complex ideas
into words or writing their own. The fact is that adult life often depends on this ability,
and there are significant benefits not only for those who succeed, but also for those who
are trying to communicate with them. Unlike Pirahã culture, ours routinely involves
complex messages. Our language has adapted over the centuries to these functional demands,
and now contains a large number of tools for effective communication, namely
extraposition and all the other structures listed in (11) to (19). School leavers could, and
arguably should, be consciously aware of these tools – that they exist, how they affect
syntax, how they can help, and maybe even their technical names. Grammarians know
and understand all the linguistic details, psychologists know how the tools help with
processing, cognitive-functional theorists know how to integrate the linguistic details
with functional demands and culture, and educationalists know how to teach such
things. The only missing element is collaboration.
References
Barlow, Michael and Kemmer, Suzanne. 2000. Usage based models of language.
Stanford: CSLI.
Bloom, Paul and Markson, Lori. 1998. “Capacities underlying word learning”. Trends
in Cognitive Sciences 2: 67-73.
Butler, Christopher. 1985. Systemic Linguistics: Theory and applications. London:
Batsford.
Butler, Christopher. 2006. “Functionalist Theories of Language”, in Encyclopedia of
Language & Linguistics, Keith Brown (ed.), 696-704. Oxford: Elsevier.
Butler, Christopher. 2013. “Word grammar”, in Theories and Methods in Linguistics
(Wörterbücher der Sprach- und Kommunikationswissenschaft), Johannes Kabatek
& Bernd Kortmann (eds.). Berlin: Mouton de Gruyter.
Butler, Christopher and Taverniers, Miriam. 2008. “Layering in structural-functional
grammars”. Linguistics 46: 689-956.
Bybee, Joan. 2010. Language, Usage and Cognition. Cambridge: Cambridge University
Press.
Chipere, Ngoni. 2003. Understanding Complex Sentences: Native Speaker Variation in
Syntactic Competence. London: Palgrave Macmillan.
Chomsky, Noam. 1987. “Chomsky on grammar teaching. Noam Chomsky interviewed
by Lillian R. Putnam”. Reading Instruction Journal 1987.
Chomsky, Noam. 2011. “Language and Other Cognitive Systems. What Is Special
About Language?”. Language Learning and Development 7: 263-278.
Evans, Nicholas and Levinson, Stephen. 2009. “The Myth of Language Universals:
Language diversity and its importance for cognitive science”. Behavioral and
Brain Sciences 32: 429-492.
Everett, Daniel. 2008. Don't Sleep, There Are Snakes: Life and Language in the
Amazonian Jungle. New York: Pantheon.
Gibson, Edward. 2002. “The influence of referential processing on sentence
complexity”. Cognition 85: 79-112.
Gonzálvez-García, Francisco and Butler, Christopher. 2006. “Mapping functional-
cognitive space”. Annual Review of Cognitive Linguistics 4: 39-96.
Halliday, Michael. 1994. An Introduction to Functional Grammar (2nd Edition).
London: Arnold.
Hancock, Craig. 2009. “How linguistics can inform the teaching of writing”, in The
SAGE Handbook of Writing Development, Roger Beard, Debra Myhill, Jeni Riley, &
Martin Nystrand (eds.), 194-207. London: Sage.
Hatch, Evelyn M. and Brown, Cheryl. 1995. Vocabulary, Semantics, and Language
Education. Cambridge: Cambridge University Press.
Hawkins, John. 1994. A Performance Theory of Order and Constituency. Cambridge:
Cambridge University Press.
Hawkins, John. 1999. “Processing complexity and filler-gap dependencies across
grammars”. Language 75: 244-285.
Hawkins, John. 2001. “Why are categories adjacent?”. Journal of Linguistics 37: 1-34.
Hudson, Richard. 1990. English Word Grammar. Oxford: Blackwell.
Hudson, Richard. 1996. Sociolinguistics (Second edition). Cambridge: Cambridge
University Press.
Hudson, Richard. 2004. “Why education needs linguistics (and vice versa)”. Journal of
Linguistics 40: 105-130.
Hudson, Richard. 2007a. “English dialect syntax in Word Grammar”. English Language
and Linguistics 11: 383-405.
Hudson, Richard. 2007b. Language Networks: The New Word Grammar. Oxford:
Oxford University Press.
Hudson, Richard. 2009. “Measuring maturity”, in SAGE Handbook of Writing
Development, Roger Beard, Debra Myhill, Martin Nystrand, & Jeni Riley (eds.),
349-362. London: Sage.
Hudson, Richard. 2010. An Introduction to Word Grammar. Cambridge: Cambridge
University Press.
Kolln, Martha and Hancock, Craig. 2005. “The story of English grammar in United
States schools”. English Teaching: Practice and Critique 4: 11-31.
Krashen, Stephen. 1982. Principles and Practice in Second Language Acquisition.
Oxford: Pergamon.
Myhill, Debra. 2011. “Grammar for designers: how grammar supports the development
of writing”, in Applied Linguistics and Primary School Teaching, Sue Ellis &
Elspeth McCartney (eds.), 81-92. Cambridge: Cambridge University Press.
Myhill, Debra, Lines, Helen, and Watson, Annabel. 2010. “Making meaning with
grammar: A repertoire of possibilities”. METAphor 2: 1-10.
Nagy, William and Herman, Patricia. 1987. “Breadth and depth of vocabulary
knowledge: Implications for acquisition and instruction”, in The nature of
vocabulary acquisition, Margaret McKeown & Mary Curtis (eds.), 19-35.
Hillsdale NJ: Lawrence Erlbaum.
Newmeyer, Frederick. 2010. “History and Philosophy of Linguistics: an interview with
Frederick J. Newmeyer”. ReVel 8.
Ninio, Anat. 1994. “Predicting the order of acquisition of three-word constructions by
the complexity of their dependency structure”. First Language 14: 119-152.
Olson, Gary, Faigley, Lester, and Chomsky, Noam. 1991. “Language, politics and
composition: a conversation with Noam Chomsky”. Journal of Advanced
Composition 11: 1-35.
Perera, Katharine. 1984. Children's Writing and Reading. Analysing Classroom
Language. Oxford: B. Blackwell in association with A. Deutsch.
Perera, Katharine. 1990. “Grammatical differentiation between speech and writing in
children aged 8 to 12”, in Knowledge About Language and the Curriculum,
Ronald Carter (ed.), 216-233. London: Hodder and Stoughton.
Perera, Katharine. 1994. “Child Language Research: Building on the Past, Looking to
the Future”. Journal of Child Language 21: 1-7.
Tomasello, Michael. 2003. Constructing a Language: A Usage-based Theory of
Language Acquisition. Cambridge, MA: Harvard University Press.