Play as Precursor of Phonology and Syntax

CHRIS KNIGHT

From The Evolutionary Emergence of Language: social function and the origins of linguistic form, eds Chris Knight, Michael Studdert-Kennedy & James R Hurford. Cambridge University Press, Cambridge, UK. ISBN 0 521 78696 7. 2000

The theme of language as play suggests inquiries into non-cognitive uses of language such as that found in riddles, jingles, or tongue twisters — and beyond this into the poetic and ritual function of language, as well as into parallels between language and ritual, language and music, and language and dance. It also provides an explanation for the obvious fact that so much in language is non-optimal for purposes of communicating cognitive information. Morris Halle (1975: 528)

Primate vocalisations are irrepressible, context-bound indices of emotional states, in some cases conveying additional information about the sender’s condition, status and/or local environment. Speech has a quite different function: it permits communication of information concerning a shared, conceptual environment — a world of intangibles independent of currently perceptible reality.

A suite of formal discontinuities is bound up with this fundamental functional contrast. Whereas primate vocalisations are not easily faked, human speech signals are cognitively controlled, linked arbitrarily to their referents and ‘displaced’ – hence immune from contextual corroboration (Burling 1993). The meanings of primate gestures/calls are evaluated on an analog, ‘more/less’ scale; speech signals are digitally processed (Burling 1993). When combined, primate signals and associated meanings blend and grade into one another; the basic elements of speech are discrete/particulate (Abler 1989; Studdert-Kennedy 1998). Primate recipients evaluate details of signalling performance; in speech, the focus is on underlying intentions, with listeners compensating for deficiencies in performance (Grice 1969; Sperber and Wilson 1986). Primate vocal signals prompt reflex responses; in speech, computational processes mediate between signal and message (Deacon 1997).

If primate calls do not reflect details of cognition, we may ask how it became possible in the human case for vocalisations to express conceptual processes. Insofar as a chimpanzee may be said to think in concepts, conveying these will involve facial expression, position, posture and bodily motion (Köhler 1927; Menzel 1971; Plooij 1978). Humans intuitively use the same method: when an initially functional action is replayed for purposes of communication, success is achieved through direct iconic expression of the thought (McNeill 1992). For either species, it is much simpler and more effective to involve any or all manipulable parts of the body rather than accept restriction to just hands, or just voice.

Against this background, one school of thought concludes that in the absence of a conventional code, humanity’s earliest signs can only have worked as gestural replicas or icons (Hewes 1973; Kendon 1991; Armstrong, Stokoe and Wilcox 1995). During the course of human evolution — so runs the basic argument — thought gestures of the kind occasionally observed among apes (Köhler 1927; Plooij 1978) become habitually deployed. Through frequent use, these become curtailed and conventionalised, leading eventually to a system of arbitrary signs.

Recently established sign languages illustrate how iconic gestures become reduced to conventionalised shorthands, sometimes within a generation (Kegl, Senghas and Coppola 1998). Even following conventionalisation, sign languages remain more iconic than spoken ones. Yet they exhibit essentially the same hierarchical, embedded structure as spoken language, and are acquired by children just as naturally (Bellugi and Klima 1975, 1982). It appears, then, that the ‘language organ’ central to Chomskyan theory works as well with visuo-manual gesture as with sound. Had the evolution of syntactical competence been driven by motor control for vocal communication, as argued by Lieberman (1985), this outcome would seem difficult to explain. Even in spoken language, syntax remains to a significant extent iconic (Haiman 1985), leading Givón (1985: 214) to treat iconicity as ‘the truly general case in the coding, representation and communication of experience’, arbitrary convention being ‘a mere extreme case on the iconic scale’. Acceptance of this principle logically excludes a vocal origin for the representational functions of language: apart from the special case of sound symbolism or onomatopoeia, it is not easy to see how iconic resemblances can be made using sound alone.

But if a language of visual signs was initially adaptive, why would it subsequently have been phased out? By comparison with manual signing, vocal communication saves time and energy, liberates the hands for other tasks and is effective around corners or in the dark. Proponents of an originally gestural modality explain the transition to a vocal one in these terms. But, asks MacNeilage (1998: 232), if the advantages of vocalising are so decisive, how and why did visuomanual gesture take precedence in the first place? Why start with an inefficient modality and then switch to an efficient one? Why not resort to the appropriate modality from the outset? For MacNeilage, the gestural theory encounters ‘an insuperable problem’ at this point (1998: 232).

A further difficulty — according to MacNeilage — is that few entities in the real world allow a natural linkage between iconic gestures in both visual and vocal modalities. Admittedly, one might represent ‘lion’ by pouncing and roaring. Translation into a purely vocal medium is here straightforward: just omit the pounce. However, most referents are not iconically identifiable by sound. Iconic signing, moreover, exploits spatial dimensionality, an option not available in vocal-auditory signalling. This in turn implies very different principles of phonological organisation in the two modalities. Given the associated translation problems, how could the posited modality switch to vocal speech have occurred?

On the basis of such objections, MacNeilage (1998: 238) makes the strong claim that ‘the vocal-auditory modality of spoken language was the first and only output mechanism for language’. This coincides with Dunbar’s (1996: 141) view that gesture was never necessary – ‘it can all be done by voice’.

Statements of this kind, however, pose the central question of precisely how it could all be done. At what point and through which mechanisms did it become technically feasible to communicate details of conceptual thinking by exclusively vocal means?

Precursors of Compositional Speech

Prominent recent models of the evolution of speech suggest a two-stage process beginning with the appearance of referentially functional ‘words’. In Bickerton’s (1996: 51) view, ‘syntax could not have come into existence until there was a sizeable vocabulary whose units could be organized into complex structures’. Studdert-Kennedy (1998) likewise considers words to have emerged at an early stage. In his view, it was a steady increase in the size of the ancestral population’s vocabulary which necessitated the radical restructuring of the vocal apparatus characteristic of modern Homo sapiens (Lieberman 1984).

Such models begin with a simple, limited lexicon, and then derive complexity from vocabulary expansion and related challenges premised upon the prior existence of words. The basic reasoning (cf. Studdert-Kennedy 1998) is as follows. Ancestral speakers increasingly needed multiple semantic distinctions, but had only limited articulatory resources to achieve this. Some primate species possess up to 30 holistically distinct vocalisations, each with its special meaning. Humans required more than this. The solution was to independently recycle the components of formerly holistic signals. This involved reduplicating each signal with variability at only certain positions – as in ‘flim-flam’ or ‘higgledy-piggledy’. If just one component – say, the initial consonant – could be varied, while holding the remainder invariant, this would allow a vastly expanded lexicon. The argument is that during human evolution, this ‘particulate’ principle increasingly supplanted the ‘holistic’ principle of primate signalling. The development drove changes in physiology and anatomy allowing vocalisers to control lip muscles independently of tongue muscles, these independently of the soft palate and so on. The human vocal tract was in this way progressively differentiated into independently controllable parts (Studdert-Kennedy 1998: 208–209).
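
The arithmetic behind this ‘particulate’ argument can be illustrated with a minimal sketch. Only the figure of 30 holistic calls comes from the text above; the onset and rime inventories, and all function names, are hypothetical and chosen purely for illustration. The sketch contrasts a repertoire of unanalysable calls with one in which each syllable slot is filled independently, and shows how varying a single initial consonant multiplies the available signals.

```python
# Hypothetical inventories, invented for illustration only (not from the source).
ONSETS = ["p", "t", "k", "m", "n", "s"]
RIMES = ["a", "i", "u", "an", "in", "un"]


def holistic_repertoire(n_calls: int) -> int:
    """Each call is an unanalysable whole, so distinct signals equal the number of calls."""
    return n_calls


def particulate_repertoire(onsets, rimes, syllables_per_word: int) -> int:
    """Count distinct signals when every syllable is an independent onset + rime choice."""
    syllable_count = len(onsets) * len(rimes)
    return syllable_count ** syllables_per_word


def vary_one_slot(frame: str, onsets) -> list[str]:
    """Hold the remainder of a signal constant and vary only the initial consonant."""
    return [onset + frame for onset in onsets]


print(holistic_repertoire(30))                    # 30 calls, 30 signals
print(particulate_repertoire(ONSETS, RIMES, 2))   # (6 * 6) ** 2 two-syllable forms
print(vary_one_slot("in", ["t", "d", "p", "b"]))  # one frame, four contrasting forms
```

On these assumed inventories, twelve recyclable components already yield over a thousand two-syllable forms. The sketch counts possible forms only; it says nothing about which forms would be used or trusted, which is the question the chapter goes on to raise.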

Note that in this scenario, ‘words’ are already being used before the evolution of the distinctively human vocal apparatus, hence prior to any correspondingly enhanced competence in differentiating syllables. Studdert-Kennedy (1998: 211) acknowledges that this evolutionary sequence bears no relationship to the stages through which children pass in acquiring speech:

If the assumption that differentiation of the hominid protosyllable evolved in response to pressure for increased vocabulary is correct, the onset of differentiation before the first words in modern children must be a relatively late evolutionary novelty, selected and inserted into the developmental sequence for whatever facilitatory effect it may have on later processes of differentiation.

Studdert-Kennedy, then, acknowledges that his model addresses one issue only to face us with an additional puzzle. If evolving humans first used words and only then began differentiating syllables, why is it that children nowadays do just the opposite, first learning to differentiate syllables and only then deploying words?

Children start babbling at an early age, when they are also displaying capacities for thinking. But at first, these two activities – babbling and thinking – remain unconnected. The infant is not thinking through its babbling. Then, at about age two, ‘the curves of development’ of intellect and transmission, previously separate, ‘meet and join to initiate a new form of behavior’ (Vygotsky 1986: 82). As the child’s cognitive faculties gain control over the former babbling vocal transmission system, thought at last becomes verbal while transmission becomes intellectual. Speech is the result.

By comparison with primates, birds often display remarkable vocal ability, yet outputs lack cognitive significance (Marler 1998). As in the case of animal communication generally, cognition and vocal transmission never meet. Although this can be explained by reference to neurophysiological deficits, fundamentally the reasons are social. Cognition and communication are intrinsically divergent functions, subject to radically contrasting Darwinian selection pressures (Ulbaek 1998). Cognition is likely to enhance fitness even where social strategies are individualistically competitive; this is not true of communication. Why share valuable information with competitors who may turn out to be direct rivals? Why pass over reliable sensory evidence in favour of information received only second-hand? In resisting deception, animals respond preferentially to signals whose intrinsically hard-to-fake characteristics guarantee their reliability. This sets up selection pressures against evolution in the direction of speech.

But what if the signals simply don’t matter? Suppose certain internal variations within a primate vocal sequence reflect intentional manipulation expressed only as ‘idle play’. Provided no risks are entailed, conspecifics might respond with relaxed ‘play’ vocalisations of their own. If such call-and-response exchanges served bonding functions, sophisticated capacities for detecting and producing signal variety might evolve. We would then have the paradox that signals could be intentionally manipulated, but only on condition that little of social importance was conveyed.

This idea may have wider application than has previously been suspected. Gelada monkeys accompany their relaxed, ‘friendly’ social interactions with a wide range of subtly different vocalisations (Richman 1976, 1987). These include nasalised grunts, long, melodically complex inhalations, stop consonants, fricatives and glides, a range of vowel quality differences, tight voicing, muffled voicing, pitch variations and so forth. Geladas also employ a variety of rhythms and melodies. Rhythms may be fast, slow, staccato, glissando, first-beat accented or end-accented. Melodies may have evenly spaced musical intervals covering a range of two or three octaves.

Moreover, geladas in groups accurately synchronise their complex and varied vocalisations (Richman 1978). This ability is remarkable, for it involves high-speed modulation of the signal stream in response to conspecifics’ anticipated contributions to each rhythmic sequence, with vocalisers switching between digitally contrastive alternatives. In human speech, vowels and consonants are, of course, not objective, physical units but psychologically defined entities; the fact that geladas can accurately echo and replicate one another’s vocal alternations suggests that they, too, must be processing acoustic parameters of the signal stream in a digital, categorical way (cf. Harnad 1987).

Chimpanzee males often give ‘long calls’ together in chorus, striving to match the acoustic characteristics of each other’s vocalisations (Mitani and Brandt 1994). Such chorusing and duetting leads to some local standardisation of call variants, so that neighbouring communities may even display ‘dialectical’ differences (Mitani et al. 1992). Each such distinctive chorus might almost amount to a ‘signature’ of local group identity (cf. Arcadi 1996; Mitani et al. 1992; Ujhelyi 1998). Where calls must carry over considerable distances, there is selection for salient, discrete form (Marler 1975: 16). These and comparable primate calls may be richly structured, the capacities underlying them constituting plausible precursors of the vocal competences drawn upon by humans in speech (Ujhelyi 1998).

Still more impressive are the vocalisations of those songbirds which can generate an extensive repertoire by recombining the same basic set of minimal acoustic units — avian equivalents of ‘phonemes’ and ‘syllables’. Each species has special rules for generating songs in this way. In the case of swamp sparrows, for example, each syllable is made up of two to six different notes, themselves meaningless, arranged in a distinctive cluster. The constituent notes are all drawn from a restricted species-wide repertoire of six note types with a set of rules for assembling them into a song (Marler and Pickett 1984).

Apart from speech, the only other animal signals displaying comparable structure are the learned songs of humpback whales (Payne, Tyack and Payne 1983) and other cetaceans. ‘Phonological syntax’, as Marler (1998: 10–11) terms such combinatorial creativity, is not found among nonhuman primates. Admittedly, chimpanzees construct their pant-hoots and gibbons their songs by assembling novel sequences from more basic recyclable units. But in their case each individual adopts for life just one combinatorial pattern, not a variable repertoire (Marler and Tenaza 1977).

Although categorically perceived, the minimal acoustic units of birdsong do not function in the manner of speech phonemes: that is, they play no role in selecting between overall meanings. Marler (1998: 11) describes ‘syntactical’ birdsong as ‘impoverished in referential content, but rich in idle emotional content’. The term ‘idle’ is well chosen here, testifying to the close relationship between such variability and the leisured creativity of animal ‘play’. Like play, syntactical creativity in animal signalling reflects inner realities, not functional demands or environmental stimuli. ‘The variety’, writes Marler (1998: 12),

is introduced, not to enrich meaning, but to create diversity for its own sake, to alleviate boredom in singer and listener, perhaps with individual differences serving to impress the listener with the singer’s virtuosity, but not to convey knowledge.

In this respect, such signalling differs not only from speech, but also from those other calls of birds, cetaceans or primates which do have meanings. Where alarms or other calls must convey reliable information, this can only be at the expense of ‘syntactical’ creativity or play.

‘Phonological’ Versus ‘Lexical’ Syntax

Acknowledging this dynamic, Marler (1998: 10–11) distinguishes between ‘phonological syntax’ on the one hand and ‘lexical syntax’ on the other. Phonological syntax we have just discussed. Lexical syntax in the animal world would be the rule-governed assembly and reassembly not just of phonetic representations but of semantic ones. Neither birds nor primates show evidence of syntax of this kind.

In a thought experiment, we might imagine vervet monkeys syntactically ‘playing’ with combinations of calls such as those warning of eagles, leopards or snakes (Cheney and Seyfarth 1990). Why is it that in real life, this never happens? In this and other cases, neurophysiological limitations have been invoked to explain observed or postulated deficits in the signalling of primates other than modern humans (e.g. Bickerton 1990, 1996, 1998). Such explanations, however, overlook a deeper problem. Combining carefree, ‘playful’ signalling with life-and-death functional communication is logically paradoxical. Central to the very definition of play is that no immediate function is served, no compulsion applied. If animals could freely ‘play’ with signals conveying life-and-death meanings, then the result would be more than ‘creativity’ — it would be fatal unreliability and confusion.

Against this background, the puzzle of speech is that digital alternations among low-energy signals carry weighty social consequences. Substituting a ‘d’ for a ‘t’ in English, for example, will turn ‘tin’ into ‘din’ or ‘mat’ into ‘mad’. Speakers may make such phonemic substitutions to construct utterances which, if accepted as relevant, earn corresponding social status (Dessalles 1998). Just one consonant can decide between relevance and irrelevance, or life and death — between, say, ‘We will meet you tomorrow’ and ‘We will eat you tomorrow’. While this may be conceptualised as ‘extraordinary power’ (Studdert-Kennedy 1998: 202), it is important also to appreciate the social costs. How can changes in socially contestable meanings be left to the discretion of individuals who, to secure such changes, need only substitute one low-cost signal — one vowel or consonant — for another? How can listeners vest trust in a system as apparently arbitrary and open to abuse as this?

One fact is certain: in the animal world, sceptical recipients would insist on making any such substitutions costly, precluding a role for low-energy signals in deciding between socially contestable meanings (Zahavi and Zahavi 1997). This alone rules out the idea that ‘lexical pressure’ — in advance of ritually enforced signal reliability (cf. Power, this volume) — can have driven the evolution of syllabic differentiation or the associated restructuring of the human vocal tract. In seeking to explain early vocal preadaptations for speech, then, we appear to have no alternative but to invoke ‘play’, on the model of birdsong and the song sequences of cetaceans.

Language and Animal Play

It is known that children derive substantial cognitive benefits from the sense of mastery and well-being associated with imaginative play (Piaget 1962; Vygotsky 1978; Bjorklund and Green 1992; see also Bruner, Jolly and Sylva 1976). Human infants from around 18 to 24 months start playing ‘pretend’, a critical development prefiguring more advanced levels of mind-reading competence (Leslie 1987; Dunn and Dale 1984). Representational play with realistic toys begins at about the age when children first acquire referential words (Bates 1976). Sequences of thematically related representational play roughly coincide with first use of syntactic combinations in expressive language (Bates et al. 1979; McCune-Nicolich and Bruskin 1982). From then on, young children’s most elaborate use of language occurs not in reality-bound, functional contexts but during make-believe play. ‘In play, as in fiction’, to quote one study (French et al. 1985: 24), ‘one has the freedom to violate the way things really are in favour of transitory transformations of reality’. As an instrument of ‘displaced reference’ (Hockett 1960), speech has exactly this function.

Maternal responsiveness is strongly correlated with complexity and preplanning in childhood representational play (Spencer and Meadow-Orlans 1996). No mother could play with her infant if she were intent on ‘winning’; she must know how to ‘lose’. In the animal world, too, if a normally dominant individual is to play with a subordinate, it must experiment with ‘losing’. Wherever inequalities exist, players must renounce physical advantages – or there will be no game. For play to flourish, safety and security must be sufficient to allow participants freedom to explore the full range of their locomotor, cognitive and social capacities, trusting in the intentions of others. In all this, suggestive parallels with language are hard to avoid.

What makes an animal’s play gestures so different from the displays staged when under serious competitive pressure? Clearly, freedom from anxiety is decisive in making the difference. ‘Play’, as one specialist has noted (Shultz 1979: 10),

only seems to occur when the animal is essentially free of survival pressures — when it is not suffering from the heat, the cold, or the wet, when it is not being harrassed by predators, and when it is free of various physiological pressures such as hunger, thirst, drowsiness or sex.

For play to be possible, vulnerable individuals must feel able to afford the luxury of ‘losing’ without suffering the costs. Whereas male-male sexual contests or other fights focus repetitively on a narrow repertoire of locomotor routines, those engaged in ‘play fights’ may ring the changes on a varied repertoire. In play, losers and winners willingly exchange roles — a pattern reminiscent of turn-taking in conversational speech. Play participants gain cognitive benefits through identification with alternate roles in succession. Syntactical competence involves ‘playing’ with basic ‘who-does-what-to-whom’ categories such as Agent, Theme and Goal (Chomsky 1981). Social ‘pretend play’ draws on comparable capacities, and suggests a likely context for the evolution of such competence.

Where winning is not the intention, the play versions of actions need not be acted out in full — low-cost ‘tokens’ may suffice. In Kendon’s (1991) model of language origins, conceptual communication begins with the partial, tokenistic acting out of sequences whose significance was originally functional. Worden (1998) persuasively traces syntactical competence to its roots in social intelligence. Prior to the emergence of language, it would have been in the tokens of social play that such internal intelligence became externalised most fully.

The difference between a play representation and its serious functional prototype is categorical. A puppy which mistook a play bite for its real counterpart would respond inappropriately, just as would a human listener unable to ‘read behind’ the literal meanings of words (Grice 1969; Sperber and Wilson 1986; Baron-Cohen 1995). A play bite resembles a real bite. But by being patently inserted in a nonfunctional context, it acquires a wholly different meaning (Bateson 1973: 150–166). When a preliminary signal is used to indicate ‘What follows is play!’, the effect is to systematically reverse the meanings of subsequent signals. For example, a dog may solicit play by lowering its head so as to appear nonthreatening; it wags its tail while crouched on its forelimbs, hindquarters raised (Bekoff 1977). In a pattern reminiscent of grammar, such a ‘play bow’ may introduce the rest of the sequence. The fact that a preliminary signal here reverses the ‘literal’ meanings of subsequent ‘attacks’, rather than simply augmenting or blending with them, suggests a plausible phylogenetic starting point for more complex forms of transformative, discrete/combinatorial signalling such as those involved in speech.

True imitation among apes has been most convincingly documented not in contexts of technical problem solving but during play (Visalberghi and Fragaszy 1990). Juveniles in the Arnhem Zoo, for example, have been observed amusing themselves by walking single file behind an adult group member, deliberately imitating their target’s limping or otherwise distinctive gait (de Waal 1996: 72). It is in such imaginative games — in these instances suggestive of subversive humour or even ‘name calling’ — that young chimpanzees approximate most closely to the conceptual richness and creativity of speech.

Language and Laughter

‘Mimesis’ is Donald’s (1991) term for putative early human emotional displays which, in being adapted to serve intentionally communicative functions, are brought increasingly under cognitive control. Children playing chase games provide familiar examples, as they fill the air with partly simulated screams. Inevitably, on hearing distant alarms, it may be difficult for others to distinguish real from fictional danger. Among primates, selection pressures have clearly acted to minimise such risks.

Noisy play among young primates is relatively rare, a fact which has been explained also by the danger of attracting predators (Biben 1998: 171). Where play is accompanied by vocalising, as when squirrel monkeys ‘play peep’ (Biben 1998: 171) or frolicking chimpanzees ‘laugh’ (Goodall 1986: 371), the sounds may assist in ‘framing’ other activities as ‘pretend’ versions of their serious prototypes. Instances of double-deception — deceptively signalling ‘play’ to trick and defeat an opponent — are not reported in the literature on primate ‘Machiavellian’ intelligence. Primate vocalisations, then, appear to differ from manual or whole-body gestures in one crucial respect: being reserved for reliable communication, they resist bifurcation into ‘pretend’ versions on the one hand and ‘real’ prototypes on the other. In the human case, this evolutionary constraint has evidently been overcome — a fact pointing to the impact upon social communication of distinctively human levels of safety, social security and corresponding freedom to play.

Homo sapiens possesses radically enhanced capacities for producing vocal signals which, like play bites, can be thought of as ‘displaced’ or ‘fictional’. Playful ‘screams’ are one example. Others are to be found in the games used by mothers to prompt their babies to laugh. One such trick is to hide and then suddenly reappear, to the exclamation ‘Boo!’ (Bruner and Sherwood 1976). There is a risk that instead of laughing, the baby may cry. This will almost certainly happen if the ‘Boo!’ is emitted by a stranger. But provided the context is reassuring, the baby should overcome its initial fear response, constructing an alternative referential frame which reverses the sound’s ‘literal’ meaning. Laughter gives expression to the baby’s sense of mastery and relief. Involved here is a minor revolution: the very signal most likely to cause alarm is, given sufficient trust, the surest way to elicit laughter in the child (Sroufe and Wunsch 1972).

The same principle applies to teasing, tickling and humour more generally. Young chimpanzees often engage in ‘tickling’ games, laughing all the while. The tickle gestures are aggressive actions, but only in pretend forms (Goodall 1986: 371). In humour of the human verbal kind, a train of thought in one frame of reference bumps up against an anomaly: an event or statement that makes no sense in the context of what has come before. The anomaly can be resolved by shifting to a different frame of reference, in which the event at last makes sense (Koestler 1964). Recall the baby who for a split second may have been puzzled by its mother’s ‘Boo!’ It laughs when it can place the signal in a different context, reversing its former meaning. More sophisticated jokes work in a similar way.

Pinker (1998: 552) points out that such frame shifting is not limited to the challenges of appreciating jokes. Involved here is none other than the principle of relevance (Sperber and Wilson 1986) on which the very possibility of language depends. The semantic meanings of words, taken literally, are abstract and often irrelevant. In terms of their currently perceptible contexts, they may be inappropriate — like a mother’s ‘Boo!’ to her child. But as with babies displaying a sense of humour, human listeners do not leave matters there. On hearing such inappropriate abstractions and irrelevancies, they respond by adopting whatever frame of reference is required to make sense of them, amending or even reversing literal meanings as necessary. The aim is always to delve behind surface appearance in search of the signaller’s underlying intention, which may be quite different (Grice 1969; Sperber and Wilson 1986).

According to Eibl-Eibesfeldt (1989: 138), the sounds characteristic of human laughter may be traced back to the rhythmic mobbing calls of group-living primates:

The loud utterance of laughter is derived from an old pattern of behavior of mobbing, in which several group members threaten a common enemy. Thus it is a special case of aggressive behavior and this component retains its original significance. If we laugh aloud at someone, this is an aggressive act, bonding those who join in the laughter. Common laughter thus becomes a bonding signal between those who are common aggressors.

Chimpanzees ‘laugh’ when they ‘play fight’; here, the laughter indicates that the accompanying ‘aggressive’ behaviour is only ‘pretend’ (Goodall 1986: 371). We have then, as Pinker (1998: 546) points out, two candidates for precursors to human laughter: (1) a signal of collective mobbing or aggression and (2) a signal of ‘pretend’ aggression. These, however, are not mutually exclusive: pranks which are cruelly effective in puncturing outsiders’ pretensions may amuse insiders for precisely that reason.

Laughter is contagious, irrepressible and energetically demanding. Unlike dispassionate speech, it acts as a powerful bonding mechanism. As Eibl-Eibesfeldt (1989) points out, such bonding typically reflects an in-group/out-group dynamic: collusive laughter between allies is likely to be at the expense of targets outside the group. If we assume complex structures of dominance and status to have characterised early human social life, laughter – like the antics of de Waal’s chimp juveniles in the Arnhem Zoo – is likely to have signalled outbreaks of collective insubordination to those in authority. As Pinker (1998: 551) writes:

No government has the might to control an entire population, so when events happen quickly and people all lose confidence in a regime’s authority at the same time, they can overthrow it. This may be the dynamic that brought laughter — that involuntary, disruptive, and contagious signal — into the service of humor. When scattered titters swell into a chorus of hilarity like a nuclear chain reaction, people are acknowledging that they have all noticed the same infirmity in an exalted target. A lone insulter would have risked the reprisals of the target, but a mob of them, unambiguously in cahoots in recognizing the target’s foibles, is safe.

Laughter, then, may testify to the importance of humour as a levelling device among early human hunter-gatherers (cf. Lee 1988), helping to sustain distinctively human levels of in-group trust and mutuality on which speech in turn depends.

Can this understanding of laughter be extended to explain also the emergence of speech? Might phonology and syntax have arisen as the reverse side — the in-group ‘playful’ redeployment — of ‘ritual’ behaviour evolved originally for purposes of aggressive coalitionary display? When choral chanting and other such vocal display is used simply to demarcate in-group/out-group boundaries, form becomes everything, meaning nothing (Staal 1986: 57). Let me quote Staal (1986: 57) on how Vedic literature becomes ‘meaningless’ when adapted for purposes of pure ritual:

Entire passages that originally were pregnant with meaning are reduced to long ‘o’s’. This is precisely what distinguishes mantras from the original verse: to be made into a mantra, and thus fit for ritual consumption, a verse has to be subject to formal transformations, operations that apply to form and not to meaning…

Ritual traditions have obvious social significance in that they identify groups and distinguish them from each other. They give people, in that hackneyed contemporary phrase, ‘a sense of identity’. That identity, however, is often due to distinctions that rest upon meaningless phonetic variations. Thus the Jaiminīya and Kauthuma Rānāyanīya schools differ from each other by such characteristics as vowel length, or because the former uses ‘a’ when the latter uses ‘o’. Up to the present time, the Vedic schools themselves are distinguished from each other by such variations of sound that can more easily be explained in grammatical than in religious terms.

If this is accepted, then in the evolutionary past, group-on-group ritual display may plausibly have set up selection pressures for vocal imitation, syllabic differentiation and control — all in the complete absence of meaning. Along such lines, one might visualise ‘war dances’ to the accompaniment of assertive choral chanting, the whole display being mounted whenever a group felt threatened by local opposition. On each occasion when danger passed, however, we need not suppose complete cessation of the performance. Instead, on the model of play fighting, we might envisage elements of the formerly ‘meaningless’ display becoming redeployed internally for more complex conceptual and communicative ends. We might even follow Pinker (1998: 551) in linking successful outcomes with outbreaks of laughter. Incipiently language-like properties of both vocal and whole-body play — discussed earlier — would now characterise in-group communication, with recently evolved mimetic skills yielding a system more complex and syntactical than anything known before.

Play and the Emergence of Language

Many Darwinian attempts to explain the evolutionary emergence of language have been gradualist. By contrast, Maynard Smith and Szathmáry (1995: 279-309) view the origins of speech — together with other aspects of symbolic culture — as a ‘major evolutionary transition’ occurring late in human evolution. Building on this idea, I have modelled this development as one culminating in revolutionary social change (Knight 1991, 1996, 1998, 1999; Knight et al. 1995). This would locate Pinker’s (1998: 551) ideas about irreverent humour within a broader context of revolutionary social upheaval. Let me now, in this new context, integrate this body of theory with the previous discussion of play.

In the scenario I favour (cf. Knight 1998, 1999), coalition members assert group identity through locally distinctive patterns of chanting and other such ritual display, coming under pressure to imitate and synchronise with ‘friendly’ signals (cf. Studdert-Kennedy, this volume). As in any choral ensemble, attention to internal cues is valued as an indication of commitment to the coalition, in-group status being conferred accordingly (cf. Power, this volume). Given enhanced choral diversification and frequent breaks or changes, maintenance of overall synchrony and coherence relies heavily on information conveyed internally through brief, low-energy signals. Discernible at close range, syllables differentiated by subtle vowel modulations and consonantal contrasts serve this function. Selection pressures in this context drive evolutionary differentiation of the upper vocal tract. Whereas the ‘lexical pressure’ model presupposes speech from the outset, this model makes no such assumptions. Citing known biological precedents and respecting Darwinian constraints, it may better explain the emergence of a high-speed, low-cost, digital encoding medium available for subsequent exaptation to serve speech functions.

Conclusion: The Emergence of Syntactical Speech

In all mammalian species, it is the young who invest most energy in play. As with human speech, there is a genetically determined ‘critical period’ for engaging in social play to maximum cognitive advantage. An animal deprived of play opportunities during infancy may later show a deficit in normal social skills (Biben 1998). In the human case, childhood play is not phased out but rather preserved in the elaboration of adult symbolic competence and performance (Huizinga 1970; Bruner et al. 1976: 534–704). By contrast, the playfulness of young animals is for the most part inhibited with the onset of sexual maturity. Sexual competition can provoke lethal conflict. As animals mature, their play correspondingly becomes closely involved in the determination of social rank. With increasing frequency, play fights become real fights, whereupon the play stops. Adulthood for most primates is challenging and risky, affording relatively few opportunities for that trust and abandon which is the hallmark of genuine play.

The distinctively human counterdominance strategies intrinsic to ‘sham menstruation/sex strike’ (Knight, Power and Watts 1995; Power and Aiello 1997; Power and Watts 1996, 1997) drive the emergence of symbolic culture by extending ‘play’ into the domain of adult relationships. Siblings and more distant relatives who might otherwise have been pitched into direct sexual rivalry are bonded in playful coalitionary opposition to the out-group. By retaining close bonds with kin-related females (cf. Power 1998, 1999, this volume), each coalition is enabled to extract increasing levels of mating effort from males. The outcome is ‘bride service’, an arrangement characteristic of hunter-gatherers, in which in-marrying males bring regular meat or other provisioning under supervision from their in-laws (Knight 1991, 1999). While this amounts to ‘economic exploitation’, Darwinian considerations clarify why minimal resistance is to be expected. In-marrying males are gaining access to the group’s fertile females; moreover, they are provisioning their own probable offspring. Combative coalitions formed to secure such outcomes, meeting little organised resistance, should be highly stable. They are familiar ethnographically as unilineal lineages and clans.

What is the significance of all this for language evolution? The key point is that ‘lexical syntax’ (Marler 1998) presupposes digital as opposed to analog distinctions between meanings. Like distinctions between the face values of banknotes, such contrasts depend entirely on collective agreement. Take the case of kinship terms — an obvious initial focus for any human language. In hunter-gatherer kinship terminologies, ‘sister’ is defined in opposition to the contrastive term ‘wife’. Primates could not sustain belief in such contrastive meanings, even if they had the cognitive competence. This is because their kin coalitions are neither categorically bounded nor stable. A close female relative from one standpoint will therefore be a less close relative — potentially a mate — from another. Instead of being categorically — in the eyes of a stable collectivity — ‘sister’ or ‘wife’, each female will be more or less either according to individual standpoint. Primate politics determine that other social meanings will be similarly graded and contested.

Within human systems of ‘fictive’ kinship, a woman is ‘our sister’ (or a man ‘our brother’) because the collectivity asserts it to be so. Children engaged in games of ‘let’s pretend’ may likewise assert, ‘this rag is mummy’ or ‘that stick is a horse’ (Leslie 1987). In stratified societies, specified persons on a similar basis may count as ‘the government’ while certain small pieces of paper count as ‘money’. Not necessarily dependent upon verbal language, such ‘institutional facts’ are expressions of collective intentionality (Searle 1998). To uphold them is a social, moral and, in a most fundamental sense, religious challenge (Durkheim 1965). To confuse ‘sister’ with ‘wife’, after all, would be more than mere semantic or cognitive error: it would be a violation (Lévi-Strauss 1969). The same would apply if you visited my home and confused our family tablecloth with the doormat. Transgression of such categorical boundaries amounts to sacrilege. Words would lose all meaning if such boundaries could not be enforced.

The main institutional fact, the condition of all others, is that the collectivity exists. To represent this fact is to assert group self-identity, defined in opposition to the out-group. Such boundary maintenance requires serious effort, presupposing costly signals, not mere tokenistic substitutes. I have argued elsewhere (Knight 1999) that as group-living ancestral humans came under corresponding pressure to perform their war dances or sing their mantras, they shared in representing ‘the sacred’ as an emblem of group-level solidarity and identity (cf. Durkheim 1965). In this chapter I have suggested, however, that during intervening periods of relaxation, as the performers periodically dispersed, these same representational techniques became available for redeployment in a quite different, essentially playful, atmosphere. Intentions were now once again those of distinct individuals, partitioning their shared representational resources accordingly. Processes of trust-based abbreviation and conventionalisation in this context generated a growing repertoire of low-cost tokens which, while expressive of merely personal intentions, nonetheless retained the social authority and communicable status of the whole. ‘Words’ were in this way ‘authorised’, that is, endowed by the ritual collective with performative force (cf. Austin 1978; Bourdieu 1991).

Finally, we may return to the ‘insuperable’ problem posed by MacNeilage (1998). When, how and why did the modality switch to vocal speech occur?

MacNeilage’s basic argument, we may recall, is that if the vocal-auditory modality was adaptive during the later stages of human speech evolution, it must therefore have been equally adaptive from the outset. This argument would have force if it could be confirmed that the social contexts of language use remained invariant throughout the course of human evolution. But if changing social strategies are built into our models, there is no reason to suppose that a modality which is adaptive during one period must remain equally adaptive later. Where social contexts are ‘Machiavellian’, as is the case among primates (Byrne and Whiten 1988), constraints operate to obstruct the emergence of low-cost, conventional — in other words fakeable — signalling (Zahavi and Zahavi 1997). We have seen that in the primate case, the need to retain intrinsic signal credibility precludes playful cognitive expressivity in the vocal-auditory channel. Until this problem was solved, conceptual signalling had therefore to rely on a different modality. We may suppose that hominid use of the hands and body — whose manipulability had originally evolved in the service of noncommunicative functions — came increasingly to serve this novel purpose. Unfettered cognitive manipulability, however, was inconsistent with signal credibility (cf. Knight 1998). Mimesis (Donald 1991) may in this light have emerged in the human lineage as a compromise between these opposing pulls: hard-to-fake signals became manipulable, but only within limits. Costly, hard-to-fake and for that reason intrinsically convincing ‘song and dance’ remained central to communication wherever resistance to deception remained high.

As exogamous kin-coalitions became repeatedly successful and correspondingly stable, however (Knight 1991), the outcome was a radical intensification of in-group trust. Not only did this allow costs to be cut through adoption of conventional shorthands. A corollary was the establishment, through collective intentionality, of semantic meanings in the form of digitally contrastive collective representations. In arriving at shorthands for these, we would expect ‘conspiratorial whisperers’ (cf. Krebs and Dawkins 1978) to resort to the cheapest, most efficient available encoding medium. Considerations of speed and efficiency in this new context drove progressive exaptation of the phonological system, yielding syntax in the Chomskyan sense — an autonomous level of structure serving as a ‘switchboard’ (Newmeyer 1991) between the formerly disparate systems of vocal transmission and conceptual representation.

Acknowledgement

I would like to thank Catherine Arthur and Michael Studdert-Kennedy for their critical comments on an earlier version of this chapter.

References

Abler, W. 1989. On the particulate principle of self-diversifying systems. Journal of Social and Biological Structures 12: 1–13.

Arcadi, A. C. 1996. Phrase structure of wild chimpanzee pant hoots: patterns of production and interpopulation variability. American Journal of Primatology 39: 159–178.

Armstrong, D. F., W. C. Stokoe and S. E. Wilcox. 1995. Gesture and the Nature of Language. Cambridge: Cambridge University Press.

Austin, J. L. 1978. How to Do Things with Words. Oxford: Oxford University Press.

Baron-Cohen, S. 1995. Mindblindness: An essay on autism and theory of mind. Cambridge, MA: MIT Press.

Bates, E. 1976. Language and Context. New York: Academic.

Bates, E., L. Benigni, I. Bretherton, L. Camaioni and V. Volterra. 1979. The Emergence of Symbols: Cognition and communication in infancy. New York: Academic.

Bateson, G. 1973. A theory of play and fantasy. American Psychiatric Association Psychiatric Research Reports 2 (1955). Reprinted in G. Bateson, Steps to an Ecology of Mind. London: Paladin, pp. 150–166.

Bekoff, M. 1977. Social communication in canids: evidence for the evolution of a stereotyped mammalian display. Science 197: 1097–1099.

Bellugi, U. and E. S. Klima. 1975. Aspects of sign language and its structure. In J. F. Kavanagh and J. E. Cutting (eds), The Role of Speech in Language. Cambridge, MA: MIT Press, pp. 171–203.

Bellugi, U. and E. S. Klima. 1982. From gesture to sign: deixis in a visual-gestural language. In R. J. Jarvella and W. Klein (eds), Speech, Place, and Action: Studies of language in context. New York: Wiley.

Biben, M. 1998. Squirrel monkey play fighting: making the case for a cognitive training function for play. In M. Bekoff and J. A. Byers (eds), Animal Play. Cambridge: Cambridge University Press, pp. 161–182.

Bickerton, D. 1990. Language and Species. Chicago and London: University of Chicago Press.

Bickerton, D. 1996. Language and Human Behaviour. London: University College London Press.

Bickerton, D. 1998. Catastrophic evolution: the case for a single step from protolanguage to full human language. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 341–358.

Bjorklund, D. F. and B. Green. 1992. The adaptive nature of cognitive immaturity. American Psychologist 47: 46–54.

Bourdieu, P. 1991. Language and Symbolic Power. Cambridge: Polity.

Bruner, J. S. and V. Sherwood. 1976. Peekaboo and the learning of rule structures. In J. S. Bruner, A. Jolly and K. Sylva (eds), Play: Its role in development and evolution. Harmondsworth: Penguin, pp. 277–285.

Bruner, J. S., A. Jolly and K. Sylva (eds). 1976. Play: Its role in development and evolution. Harmondsworth: Penguin.

Burling, R. 1993. Primate calls, human language, and nonverbal communication. Current Anthropology 34: 25–53.

Byrne, R. and A. Whiten (eds). 1988. Machiavellian Intelligence: Social expertise and the evolution of intellect in monkeys, apes, and humans. Oxford: Clarendon.

Cheney, D. L. and R. M. Seyfarth. 1990. How Monkeys See the World. Chicago: University of Chicago Press.

Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.

de Waal, F. 1996. Good Natured: The origin of right and wrong in humans and other animals. Cambridge, MA: Harvard University Press.

Deacon, T. 1997. The Symbolic Species: The co-evolution of language and the human brain. London: Penguin.

Dessalles, J.-L. 1998. Altruism, status and the origin of relevance. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 130–147.

Donald, M. 1991. Origins of the Modern Mind: Three stages in the evolution of culture and cognition. Cambridge, MA: Harvard University Press.

Dunbar, R. I. M. 1996. Grooming, Gossip and the Evolution of Language. London and Boston: Faber and Faber.

Dunn, J. and N. Dale. 1984. I a daddy: a 2-year-old’s collaboration in joint pretence with sibling and with mother. In I. Bretherton (ed), Symbolic Play: The development of social understanding. Orlando and London: Academic, pp. 131–157.

Durkheim, E. 1965/1912. The Elementary Forms of the Religious Life. New York: Free Press.

Eibl-Eibesfeldt, I. 1989. Human Ethology. Hawthorne, NY: Aldine de Gruyter.

French, L. A., J. Lucariello, S. Seidman and K. Nelson. 1985. The influence of discourse content and context on preschoolers’ use of language. In L. Galda and A. D. Pellegrini (eds), Play, Language and Stories: The development of children’s literate behavior. Norwood, NJ: Ablex.

Givón, T. 1985. Iconicity, isomorphism and non-arbitrary coding in syntax. In J. Haiman (ed), Iconicity in Syntax. Amsterdam and Philadelphia: Benjamins, pp. 187–219.

Goodall, J. 1986. The Chimpanzees of Gombe: Patterns of behavior. Cambridge, MA and London: Belknap.

Grice, H. 1969. Utterer’s meanings and intentions. Philosophical Review 78: 147–177.

Haiman, J. 1985. Introduction. In J. Haiman (ed), Iconicity in Syntax. Amsterdam and Philadelphia: Benjamins.

Halle, M. 1975. Confessio grammatici. Language 51: 525–535.

Harnad, S. 1987. Categorical Perception: The groundwork of cognition. Cambridge: Cambridge University Press.

Hewes, G. W. 1973. Primate communication and the gestural origin of language. Current Anthropology 14: 5–24.

Hockett, C. F. 1960. The origin of speech. Scientific American 203: 89–96.

Huizinga, J. 1970/1949. Homo Ludens: A study of the play element in culture. London: Granada.

Kegl, J., A. Senghas and M. Coppola. 1998. Creation through contact: Sign language emergence and sign language change in Nicaragua. In M. DeGraff (ed), Language Creation and Change: Creolization, Diachrony and Development. Cambridge, MA: MIT Press.

Kendon, A. 1991. Some considerations for a theory of language origins. Man (N.S.) 26: 199–221.

Knight, C. 1991. Blood Relations: Menstruation and the origins of culture. New Haven, CT, and London: Yale University Press.

Knight, C. 1996. Darwinism and collective representations. In J. Steele and S. Shennan (eds), The Archaeology of Human Ancestry: Power, sex and tradition. London and New York: Routledge, pp. 331–346.

Knight, C. 1998. Ritual/speech coevolution: a solution to the problem of deception. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 68–91.

Knight, C. 1999. Sex and language as pretend-play. In R. I. M. Dunbar, C. Knight and C. Power (eds), The Evolution of Culture. Edinburgh: Edinburgh University Press, pp. 228–229.

Knight, C., C. Power and I. Watts. 1995. The human symbolic revolution: a Darwinian account. Cambridge Archaeological Journal 5: 75–114.

Koestler, A. 1964. The Act of Creation. New York: Dell.

Kohler, W. 1927. The Mentality of Apes (trans. Ella Winter). London: Routledge and Kegan Paul.

Krebs, J. R. and R. Dawkins. 1978. Animal signals: information or manipulation? In J. R. Krebs and N. B. Davies (eds), Behavioural Ecology: An evolutionary approach. Oxford: Blackwell, pp. 282–309.

Lee, R. B. 1988. Reflections on primitive communism. In T. Ingold, D. Riches and J. Woodburn (eds), Hunters and Gatherers, Vol. 1: History, evolution and social change. Chicago: Aldine, pp. 252–268.

Leslie, A. 1987. Pretence and representation: the origins of ‘theory of mind’. Psychological Review 94: 412–426.

Lévi-Strauss, C. 1969. The Elementary Structures of Kinship. London: Eyre and Spottiswoode.

Lieberman, P. 1984. The Biology and Evolution of Language. Cambridge, MA: Harvard University Press.

Lieberman, P. 1985. On the evolution of human syntactic ability: Its preadaptive bases – motor control and speech. Journal of Human Evolution 14: 657–668.

MacNeilage, P. 1998. Evolution of the mechanism of language output: comparative neurobiology of vocal and manual communication. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 222–241.

Marler, P. 1975. On the origin of speech from animal sounds. In J. F. Kavanagh and J. Cutting (eds), The Role of Speech in Language. Cambridge, MA: MIT Press, pp. 11–37.

Marler, P. 1998. Animal communication and human language. In G. Jablonski and L. C. Aiello (eds), The Origin and Diversification of Language. Wattis Symposium Series in Anthropology. Memoirs of the California Academy of Sciences, No. 24. San Francisco: California Academy of Sciences, pp. 1–19.

Marler, P. and R. Pickett. 1984. Species-universal microstructure in the learned song of the swamp sparrow (Melospiza georgiana). Animal Behaviour 32: 673–689.

Marler, P. and R. Tenaza. 1977. Signaling behavior of wild apes, with special reference to vocalization. In T. Sebeok (ed), How Animals Communicate. Bloomington, IN: Indiana University Press, pp. 965–1033.

Maynard Smith, J. and E. Szathmáry. 1995. The Major Transitions in Evolution. Oxford: Freeman.

McCune-Nicolich, L. and C. Bruskin. 1982. Combinatorial competency in play and language. In K. Rubin and D. Pepler (eds), The Play of Children: Current theory and research. New York: Karger, pp. 30–40.

McNeill, D. 1992. Hand and Mind: What gestures reveal about thought. Chicago and London: University of Chicago Press.

Menzel, E. W. 1971. Communication about the environment in a group of young chimpanzees. Folia primatologica 15: 220–232.

Mitani, J. C. and K. L. Brandt. 1994. Social factors influence the acoustic variability in the long-distance calls of male chimpanzees. Ethology 96: 233–252.

Mitani, J. C., T. Hasegawa, J. Gros-Louis, P. Marler and R. Byrne. 1992. Dialects in wild chimpanzees? American Journal of Primatology 27: 233–244.

Newmeyer, F. J. 1991. Functional explanation in linguistics and the origins of language. Language and Communication 11: 3–28.

Payne, K., P. Tyack and R. Payne. 1983. Progressive changes in the songs of humpback whales (Megaptera novaeangliae): a detailed analysis of two seasons in Hawaii. In R. Payne (ed), Communication and Behavior of Whales. AAAS Selected Symposia Series. Boulder, CO: Westview Press, pp. 9–57.

Piaget, J. 1962. Play, Dreams, and Imitation in Childhood. New York: Norton.

Pinker, S. 1998. How the Mind Works. London: Penguin.

Plooij, F. X. 1978. Some basic traits of language in wild chimpanzees. In A. Lock (ed), Action, Gesture and Symbol: The emergence of language. London and New York: Academic.

Power, C. 1998. Old wives’ tales: the gossip hypothesis and the reliability of cheap signals. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 111–129.

Power, C. 1999. Beauty magic: the origins of art. In R. I. M. Dunbar, C. Knight and C. Power (eds), The Evolution of Culture. Edinburgh: Edinburgh University Press, pp. 92–112.

Power, C. and L. C. Aiello. 1997. Female proto-symbolic strategies. In L. D. Hager (ed), Women in Human Evolution. New York and London: Routledge, pp. 153–171.

Power, C. and I. Watts. 1996. Female strategies and collective behaviour: the archaeology of earliest Homo sapiens sapiens. In J. Steele and S. Shennan (eds), The Archaeology of Human Ancestry. London and New York: Routledge, pp. 306–330.

Power, C. and I. Watts. 1997. The woman with the zebra’s penis: gender, mutability and performance. Journal of the Royal Anthropological Institute (N.S.) 3: 537–560.

Richman, B. 1976. Some vocal distinctive features used by gelada monkeys. Journal of the Acoustical Society of America 60: 718–724.

Richman, B. 1978. The synchronization of voices by gelada monkeys. Primates 19: 569–581.

Richman, B. 1987. Rhythm and melody in gelada vocal exchanges. Primates 28: 199–223.

Searle, J. R. 1998. The Construction of Social Reality. London: Penguin.

Shultz, T. R. 1979. Play as arousal modulation. In B. Sutton-Smith (ed), Play and Learning. New York: Gardner, pp. 7–22.

Spencer, P. E. and K. P. Meadow-Orlans. 1996. Play, language, and maternal responsiveness: a longitudinal study of deaf and hearing infants. Child Development 67: 3176–3191.

Sperber, D. and D. Wilson. 1986. Relevance: Communication and cognition. Oxford: Blackwell.

Sroufe, L. A. and J. P. Wunsch. 1972. The development of laughter in the first year of life. Child Development 43: 1326–1344.

Staal, F. 1986. The sound of religion. Numen 33: 33–64.

Studdert-Kennedy, M. 1998. The particulate origins of language generativity. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 202–221.

Ujhelyi, M. 1998. Long-call structure in apes as a possible precursor for language. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 177–189.

Ulbaek, I. 1998. The origin of language and cognition. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 30–43.

Visalberghi, E. and D. M. Fragaszy. 1990. Do monkeys ape? In S. T. Parker and K. Gibson (eds), ‘Language’ and Intelligence in Monkeys and Apes: Comparative developmental perspectives. Cambridge: Cambridge University Press, pp. 247–273.

Vygotsky, L. 1978. Mind in Society. Cambridge, MA: Harvard University Press.

Vygotsky, L. 1986. Thought and Language. Cambridge, MA: MIT Press.

Worden, R. 1998. The evolution of language from social intelligence. In J. R. Hurford, M. Studdert-Kennedy and C. Knight (eds), Approaches to the Evolution of Language: Social and cognitive bases. Cambridge: Cambridge University Press, pp. 148–166.

Zahavi, A. and A. Zahavi. 1997. The Handicap Principle: A missing piece in Darwin’s puzzle. New York and Oxford: Oxford University Press.