10 The Motor Theory of Speech Perception*

CHRISTOPHER MOLE

There is a long-standing project in psychology the goal of which is to explain our ability to perceive speech.
The project is motivated by evidence that seems to indicate that the cognitive processing to which speech sounds are subjected is somehow different from the normal processing employed in hearing. The Motor Theory of speech perception was proposed in the 1960s as an attempt to explain this specialness. It is currently enjoying a renewal of interest, partly on account of our developing understanding of mirror-neurons (the existence of which is suggestive but not conclusive) and partly on account of some recent work using Transcranial Magnetic Stimulation (Fadiga et al. 2002).

* This work was done while the author held the William Alexander Fleet Fellowship. Thanks are due to Ms Julia Fleet, who sadly died while this volume was in preparation.

This essay has two parts. The first is concerned with the Motor Theory's explanandum and shows that it is rather hard to give a precise account of what the Motor Theory is a theory of. The second part of the essay identifies problems with the explanans: there are difficulties in finding a plausible account of what the content of the Motor Theory is supposed to be. The agenda of both parts is rather negative, and problems will be uncovered rather than solved. In the concluding section, I shall suggest where one might look if one wants to solve the Motor Theory's problems, but it is unclear whether the Motor Theory's problems ought to be solved, or whether the whole theory should be abandoned.

I

Psychologists were first persuaded that speech perception is unlike the perception of other sounds by the failure of attempts to build reading machines for the blind. Nowadays our computers do a good job of rendering a written text into speech, but it was not always so easy. After the Second World War there were lots of recently blinded people, for whom a machine that could read aloud would have been a very good thing. There was also rather little computing power available.
The task of building a machine that would turn text into speech seemed to be a practically impossible one, but the task of building a machine that would make some sort of distinct noise for each of the letters in a text seemed straightforward enough. The more ambitious task of building a machine that would make a distinct noise for each of the separate speech sounds in the text (that is, each of the phonemes) also looked like a real possibility. Such a task may have been computationally tractable, but as a substitute for reading it was completely hopeless.

The thing that made the project hopeless was that the listener's ears just couldn't keep up. If one's reading machine was making its sounds at a pace that was anything like the pace at which our mouths make sounds when we speak, then it was making sounds at a rate that was far too fast for the listener to resolve. If the sounds were given slowly enough for the listener to resolve them, then they came far too slowly to communicate a text effectively. Whichever sounds are allocated to individual letters or phonemes, the resulting auditory presentation of words takes much longer to comprehend and puts a much greater load on working memory than written text or speech. Training does little to help. As Alvin Liberman, one of the first and most prominent researchers on speech, puts it, 'Only the sounds of speech are efficient vehicles for phonetic structures; no other sounds, no matter how artfully contrived, will work better than about one tenth as well' (Liberman 1990).

The blind war veterans stood no chance at all of learning to recognize the rapidly presented sounds of the 'reading machines', but they had, like the rest of us, learned in infancy how to recognize the phonemes that make up speech, and these phonemes can be resolved at a very fast rate indeed. Why, then, could they not learn to resolve any other sounds at anything approaching the same pace? It seems that somehow speech must be special.
Perhaps its specialness has something to do with its being learned so early on. Perhaps it is special because it is massively more familiar than other sounds, and massively more practiced, but perhaps it is special in other ways, too. Language so often is.²

² Not everyone in the psychology of speech perception has signed on to the 'speech is special' view, but dissent is certainly rare. The dissenting view can be found in the discussion of Miller's Auditory-Perceptual Pointer Model in Klatt (1989), or in Fowler's 'Direct Realist' theory (Fowler 1986).

To be told that speech is special is to be told something of interest, but it is not [...] scientific theory that purports to [...]. [...] (many of them based at Haskins Laboratories, where the work on reading machines had been carried out) uncovered [...] thought to illuminate the way in which speech is special. [...] the attention of psychologists more widely were: the categorical perception of speech, the lack of invariance, the 'duplex' [...], and the McGurk effect. I'll explain each of these in turn.

Categorical Perception

To understand the categorical nature of speech perception, consider the notion of a discrimination function. A discrimination function gives, for each magnitude of [...], the smallest discernible difference from that magnitude. By presenting listeners with pairs of sounds, and asking whether they sound the same or different, we can work out, for any magnitude along any dimension, what the smallest discernible difference is. And with this data, we can plot a graph showing, for each pitch we have tested, how much deviation is needed from that pitch for two sounds to sound different. We can plot a graph showing how much [...] is needed for two sounds to sound different, and so on for various other dimensions along which sounds vary. These [...] discrimination function [...].

For simple acoustic properties, [...]. If we test a subject's ability to discern [...] and louder sounds, we don't come across any [...] from which deviations are much more easily detected [...] from slightly louder or slightly quieter sounds. The [...] one pitch from another changes smoothly as a function of [...]. The same is true for volume and timbre. This isn't true if the sound we are testing is a speech sound, such as the syllable [...], and if the dimension of variation is a dimension that is relevant to differences [...], such as the one that can make a difference to which [...] is heard. Then there is a particular magnitude such that deviations from that magnitude are much easier to detect than deviations from the previous and [...]. On the graph on which we plot the smallest discernible differences, there will be [...] become extremely sensitive to small variations. [...] example in some detail.

The syllables /pa/ and /ba/ both begin with bilabial plosive consonants. That is to say that the initial sound in both is produced by closing the lips, allowing a little air to build up behind them, and then releasing that air. They differ in the moment during this performance at which the vocal cords begin to vibrate. For a /ba/ the vocal cords vibrate almost as soon as the air is released. A /pa/ is produced when the vocal cords vibrate a little later.
The time between the release of the air held at the lips and the beginning of the vocal cord vibration is called voice-onset time (VOT). VOT can be varied continuously and a spectrum of phonemes can be artificially produced, each of which differs from the preceding phoneme only by, say, an increase in VOT of 10 ms. If we listen to each of the sounds in this continuously varying spectrum they are not heard as varying continuously. We don't hear a sequence of /ba/s gently sloping off into penumbral cases which gradually become recognizable as /pa/s. Instead, the first half of the spectrum is all heard as more or less the same /ba/ sound, and the second half of the spectrum is all heard as more or less the same /pa/ sound, and there is a narrow band in the middle of the spectrum, when VOT is around 26.8 ms, where subjects differ as to how they hear the sound. At this transition point subjects can recognize very slight variations in VOT, since these slight variations are enough to move the sound from one category to the next. From the point of view of the subject, the difference between a syllable with VOT of 26.8 ms and a syllable with VOT of 36.8 ms sounds like a really big difference, whereas a difference of the same objective magnitude, between, say, VOT of 30 ms and of 40 ms, sounds like a very slight difference. In the first pair, one of the syllables sounds like a /ba/ and the other sounds like a /pa/. In the second pair, both syllables fall on the same side of the boundary and are heard as more or less indistinguishable /pa/s.³ This distinctive discrimination function is found for many of the dimensions of variation that, like VOT, make the difference between two consonants, although it isn't found for vowels.

This finding by itself is rather unremarkable. It isn't surprising to find a difference between the discrimination functions for complex sorts of variation, such as variation in VOT, and the discrimination functions for simple sorts of variation, such as variation in pitch.
And it isn't surprising that we exploit the categorical perception of such variations in the boundaries that we use to mark semantically relevant differences in speech. Something similar to the categorical perception of VOT holds for the perception of colors: we perceive two reds as more similar than a red and a yellow, even though the objective difference between the wavelengths may be the same. If you were designing a communication system with colored flags, you'd assign different meanings to red and yellow, instead of making the difference between two reds a semantically significant one. The categorical perception of speech sounds might be used in the speech code in something like this way, without the fact that [...] boundaries coincide with the boundaries of categorical perception indicating any connection between categorical perception and the specialness of speech perception.

³ This is a slight simplification of the experimental procedure used, but not, I think, a significant one. The usual procedure is not pair-wise comparison but comparison of triples. Subjects are presented with a pair of neighboring sounds, A and B, and then presented with a third sound, X, which is just the same as either A or B. Their task is to discern which of the two initial sounds X repeats. With a VOT continuum where neighboring items differ by 10 ms, subjects perform close to chance on the ABX task for almost all of the spectrum except for the point (when VOT is around 26.8 ms) at which A is heard as a /ba/ and B as a /pa/.

Categorical perception might be a feature of the perception of complex [...], a feature that the speech code makes use of, but that is not a feature of speech perception as such. Two things suggest this. First, it is found that there is categorical perception of some non-speech sounds, as when, for example, musicians show categorical perception for semitone boundaries, or when normal people show unlearned categorical perception effects for certain 'buzz, noise, and relative timing continua' (Harnad 1987: 9). Second, and especially, categorical perception is found in some creatures that lack language (and lack anything that might be thought of as a proto-language). The categorical perception of speech sounds was found in chinchillas by Kuhl and Miller (1975), in Japanese quail by Kluender et al. (1987), and in the Mongolian gerbil by Sinnott and Mosteller (2001).

This is not to say that categorical perception doesn't contribute to the specialness of speech, only that the categorical nature of speech perception [...] in understanding the way in which speech is special. There is, in fact, good evidence that categorical perception is related to the specialness of speech, coming from the fact that the categorical perception of speech is affected by the role it plays in language. We know that it is affected in this way because we know that the pattern of categorical perception that listeners show for speech sounds is affected by the pattern of phonemic contrasts that can make for a difference in meaning in their native language. Speakers of English, in which the difference between /l/ and /r/ can be a semantically relevant difference, perceive a continuous change from one to the other as a categorical change. Their discrimination function has the trough that is characteristic of categorical perception. In Japanese, the difference between /l/ and /r/ is not a difference that ever distinguishes two phonemes. (That is to say that the difference between whether an /l/ or an /r/ was said never makes a difference to which word the speaker uttered, unlike in English, when [...]

[...] the existence of a mapping from simple features of the stimuli to perceived categories is not the norm, the absence of such a mapping cannot be evidence of specialness. It does not illuminate the specialness of speech.
It is tempting to think that one could create the appearance of an invariance problem for any categorization task if one started with a sufficiently low-level description of the input. One can recognize a large number of faces viewed at various angles and in various lights, and one can recognize them beneath a wide range of hats, spectacles, false noses, and so on. The invariances which one exploits in face recognition are at such a high level of description that if one were trying to work out how it was done given a moment-by-moment mathematical description of the retinal array, it might well appear impossible. There is surely no simple pattern of retinal stimulation that always and only occurs when I see a face as being my brother's. My ability to recognize him is not a matter of my being triggered by some simple property of the array he projects to my retina. A well-trained boy scout can recognize granny knots, reef knots, and sheet bends by touch alone, but the features that he uses when making these discriminations would be extremely hard to recover from a moment-by-moment presentation of the pressure that each knot exerts on his fingertips. There is certainly no profile of fingertip pressures that is always or only associated with granny knots, and so the same sort of invariance shown by the acoustic signal for speech is shown by the haptic signal for knots. But it would be absurd to draw any conclusions about the specialness of boy scout knot-perception on the basis of this invariance. It is simply that the boy scout does not categorize on the basis of simple features that can be discerned in the moment-by-moment description of fingertip pressures. The speech spectrographs for which we find invariance in the speech signal are moment-by-moment descriptions of the low-level properties presented to the ear. It is no surprise that they fail to show any features that correspond with the phonemes we hear in speech, and it is no indication of specialness.
Duplex Perception

Duplex perception is a strange phenomenon and it occurs in a strange context, making it rather hard to interpret, but in their article 'A Specialization for Speech Perception', Liberman and Mattingly (1989) rest their whole case for the existence of cognitive resources that are devoted solely to speech processing on the phenomenon of duplex perception. Duplex perception occurs when headphones are used to play a different sound to each ear. More specifically, it occurs when the sound given to the first ear is a speech sound: a syllable like /da/ or /ga/, but a speech sound that has been doctored so that the initial burst of acoustic energy is absent. The result of this doctoring is that the sound, if heard in isolation, is ambiguous between /da/ and /ga/. The sound which is presented to the other ear is just that burst of acoustic energy needed to disambiguate the doctored sound: the burst of rising frequency sound that, if added to the first sound, would make it sound like a /ga/, or the burst of falling frequency sound that would make it sound like a /da/. This second sound, if heard in isolation, sounds like a little chirp. It does not sound like speech. Here is Liberman and Mattingly's (1989) account of what it's like to hear this combination. (I've replaced their jargon with mine):

Listeners hear two sounds, one at each ear. At the ear receiving the [second sound], they hear a non-speech chirp, just as they do when the [second sound] is presented in isolation. At the ear receiving the [first sound] they hear /da/ or /ga/. But, surprisingly, these latter percepts are not ambiguous, as they were when the [first sound] is presented in isolation; rather, they are unambiguously determined to be /da/ or /ga/ by the [fact about whether the second sound is a chirp of rising frequency or falling frequency], just as when the undivided syllable is presented in the normal way. (Liberman and Mattingly 1989: 490)

Perhaps this result is, as Liberman and Mattingly say, a surprising one, but is it evidence of specialness? One would not expect this mingling of the sounds presented at either ear if simple non-speech sounds were presented, but, as we emphasized above, simple sounds are not the relevant control group. To see if the duplex effect shows speech to be special, we should compare speech sounds to non-speech sounds of comparable complexity. When Fowler and Rosenblum (1990) did this, comparing the duplex perception of syllables with the duplex perception of wooden and metal door slams, the speech sounds behaved in more or less the same way as the non-speech sounds. Duplex perception seems not to indicate specialness.

The McGurk Effect

The story so far is this. We are trying to understand how it is that speech perception differs from normal perception in such a way that speech sounds can be resolved much faster than other sounds. We have looked at three phenomena that are said to illuminate this specialness. Two of these phenomena (the lack of invariance and duplex perception) we found to tell us nothing about the specialness of speech per se.
They did nothing more than point towards some ways in which the perception of complex, composite sounds can be expected to differ from the perception of simple sounds. The other phenomenon we have looked at is the categorical perception of speech. We found there to be good evidence that this phenomenon is related to the specialness that we are trying to understand, but we also saw some good reasons to doubt that this relationship is an especially intimate one. We turn now to the fourth of the phenomena that have been thought to cast light on speech's specialness. This is the phenomenon known as the McGurk effect.

In the McGurk effect, the syllable that a speaker is heard to have said is found to be influenced by lip movements that the speaker is seen to produce, as well as by the acoustic information given to the hearer's ear (McGurk and MacDonald 1976). The effect occurs in the following way: a video is taken of a speaker repeating the syllable /ga/ and an auditory recording is made of the speaker repeating the syllable /ba/. When the auditory recording is heard alone, listeners accurately recognize the syllable heard as a /ba/. If listeners hear these syllables while watching the video of appropriately timed /ga/s being mouthed, then they are subject to an illusion in which the sound heard is reported as being /da/.

The illusory syllable splits the difference between the syllable heard and the syllable seen. /ba/, which gets presented to the ears, differs from /ga/, which gets presented to the eyes, in its place of articulation. /b/s are bilabial (which is to say that they are articulated at the lips), while /g/s are velar (which is to say that they are articulated towards the back of the throat). What listeners hear in the McGurk effect is a /d/, which is an alveolar consonant, made towards the middle.
The effect may be thought of as a somewhat surprising instance of the context effects that we discussed under the heading of 'The Invariance Problem'. The facts about which phonemes a burst of sound is heard to contain are, as we saw, influenced by a great many features of the context of the sound. What the McGurk effect shows is that context effects are not limited to effects of a sound's auditory context. The effect is an effect of visual context on heard sound.

We were unmoved by the invariance problem because context effects are the norm for the perception of complex stimuli. The McGurk effect is more impressive because cross-modal context effects are less obviously normal. But cross-modal context effects are not entirely exceptional. If the McGurk effect shows that there is something special about speech, it is not because there is anything special about the fact that speech is a stimulus that is subject to influence from concurrently presented visual stimuli. Lots of stimuli other than speech are subject to that sort of influence. The influence is most frequently discussed in connection with examples from outside the auditory domain, such as the illusion of self-motion produced by motion in the periphery of the visual field (Lee and Lishman 1975). Cross-modal effects are found in the auditory domain, too. The McGurk case is not the only case in which visual and auditory modalities combine in illusory ways, and so it does not show that such illusory combinations are special to speech. Saldaña and Rosenblum (1993) have shown that judgments of whether a cello sounds like it is being plucked or bowed are subject to McGurk-like interference from visual stimuli. Sound and vision can also interact to produce visual illusions, not just auditory ones.
The number of flashes that a subject seems to see can be influenced by the number of concurrent tones that he hears (Lewald and Guski 2003).⁴ It is not special to speech that sound and vision can interact to produce hybrid perceptions, influenced by both modalities, without the subject's being aware of the influence.

⁴ A vivid demonstration, described in Kamitani and Shimojo (2001), can be found at: <http://www.cns.atr.jp/~kmtn/audiovisualRabbit/index.html>

This is not to say that the McGurk effect shows us nothing special about speech. The McGurk effect does reveal an aspect of speech that is in need of a special explanation because the McGurk effect is of a much greater magnitude than analogous cross-modal context effects for non-speech sounds. Although non-speech sounds are influenced by vision in much the same way that speech sounds are influenced in the McGurk effect, they do not seem to be influenced to the same extent. The particular degree of influence from vision on what seems to the subject to be the auditory perception of speech does seem to be an effect that needs to be explained by the postulation of something special about speech processing.

This is worth emphasizing because a quantitative difference between speech perception and the perception of other sounds may be explained by reference to a quantitative kind of specialness on the part of speech. Given that sounds in general are somewhat susceptible to McGurk-like effects, we do not need to postulate very much specialness to explain why speech is distinguished from other sounds by the degree of its susceptibility to such effects. Perhaps the unusually high susceptibility of speech sounds to the McGurk effect is explained by the fact that the contexts in which speech sounds are heard are, to a greater extent than are the contexts of other sounds, occasions where the source of the sound is visible and where the visual information is a potential source of useful disambiguating information. The existence of other auditory-visual cross-modal illusions shows that there are mechanisms in place by which visual stimuli can influence the perception of sound. The fact that speech sounds are unlike other sounds in the degree to which it is useful to make fine discriminations, and the fact that speech sounds are unlike other sounds in the frequency with which visual information from the sound source is available for helping with such discriminations, could together explain why the mechanisms of cross-modal influence (not [...] themselves) come to be especially influential on the perception of speech.

The magnitude of the McGurk effect does reveal something special about the psychology of speech perception, but the [...] of the McGurk effect might just be that the [...] mechanisms of audiovisual interaction are especially active for speech on account of the [...] availability of occasions on which they [...], and the [...] utility of their doing so.

Our final verdict on the [...] the Motor Theory of speech perception is this: the [...] to be explained, but the phenomena that have been [...] as if they were revealing of the ways in which speech is special turn out to tell us [...]. The McGurk effect does show [...] fails to make [...] needs. It fails to tell us whether [...] of speech is special because [...] differs from normal auditory [...].

[...] supposed to support, and that purports to give an explanation of them, and of the specialness with which we began.

Any interpretation of a scientific theory is probably mistaken if the theory is interpreted as saying something trivial, or something very obvious. It is equally likely to be mistaken if the theory is interpreted as saying something obviously false.
The most discussed [...] is the Motor Theory of Speech Perception. The task of saying what the contents of this theory are proves to be much harder than one might expect. This is not because the theory has not been given a canonical statement, but because [...].

The canonical statement of the theory was given in 1985 when Liberman and Mattingly wrote 'The Motor Theory of Speech Perception Revised'. They tell us that 'The first claim of the Motor Theory, as revised, is that the objects of speech perception are the intended phonemic gestures of the speaker'. 'First and fundamentally', we are told, '[...] is perception of gesture' (Liberman and Mattingly 1985: [...]). How are we to understand this claim? There are at least two possibilities, suggested by a familiar distinction from discussions of perceptual epistemology. In those discussions, we often encounter the distinction between two different perceptual relations, distinguished in natural language by the difference between perceiving an entity and perceiving that something or other is the case. There are, corresponding to these two perceptual relations, at least two ways in which Liberman and Mattingly's claim about the object of speech perception could be interpreted. It could be a claim about the sort of thing that goes in the y place in true sentences of the form 'He heard y' (when the hearing in question is an instance of speech hearing).
Or, alternatively, it could be a claim about the sort of thing that goes in place of the P in true sentences of the form 'He heard that P' (when the hearing in question is an instance of speech hearing).

When Liberman and Mattingly talk of perceiving 'gestures', what they mean is that when we hear a /b/ the object of our perception is a bilabial plosive gesture; that when we hear a /n/ we hear an alveolar nasal gesture; and so on. What isn't clear is which of the two perceptual relations these gestures are supposed to be the objects of.

On a first reading, Liberman and Mattingly are claiming that when a listener hears speech, it is true that he hears intended phonemic gestures. If this is the correct reading of their claim, then their claim is surely true. Sentences of the form 'x heard s' are true if and only if there is something identical to s that x heard. This context for 's' is an extensional one. So, for example, it is true that Miss Scarlett heard the gunshot just if it is true that there is something that Miss Scarlett heard, and true that that thing was the gunshot. It doesn't matter whether she recognized it as a gunshot, or even if she has any concept of gunshots. If Miss Scarlett thought that she was hearing a champagne bottle being opened, but the sound was in fact that of a gun firing, then it is nonetheless true that Miss Scarlett heard the gunshot. She heard it; she was just mistaken about what she heard.

Understood in this way, as a claim about the object of the x heard y relation, the 'first claim of the Motor Theory' is uncontroversial. A speech act is a sequence of intended phonemic gestures, so the truth of sentences of the form 'He heard intended phonemic gestures' is guaranteed by the existence of truths of the form 'He heard the speech act'. Perhaps, like Miss Scarlett, we do not know what it is that we are hearing. The claim that we hear phonemic gestures is compatible with the claim that we hear such gestures unbeknownst to us.

This claim is true, and obviously so, but it won't do as an interpretation of what Liberman and Mattingly intend, because it can't do the work that the Motor Theory is supposed to do. To claim that the perception of speech is the perception of gesture in this sense is not to identify a feature that makes speech special. Even if speech perception were exactly the same as normal audition, the object of perception in this sense would still be the gesture. If we want the Motor Theory to be making a non-obvious claim, we should understand it to be making a claim about the other of the two sorts of perceptual relation. It must make a claim about the object of the relation x heard that P.

More is required for the truth of sentences with the form 'x heard that P' than was required for the truth of 'x heard s', because this context is an intensional one. Although Miss Scarlett can truly say, on learning about the circumstances of the death, 'I heard the gunshot', she cannot truly say that she heard that there was a gunshot. Not if she took it for the opening of a bottle. Hearing that there was a gunshot requires (among other things) that the event be heard as being a gunshot. If the Motor Theory claims that intended phonemic gestures are the objects of speech perception in the sense that speech perception involves perceiving that there were certain intended phonemic gestures, then the theory is committed to our hearing speech as being a sequence of certain phonemic gestures.

The claim that we hear speech as being phonemic gestures is rather counter-intuitive, and it is easy to produce an argument showing it to be false.
Suppose we have a listener who, being in the grip of some false theory about the phonemic gestures, believes that /b/ is not a bilabial plosive, but an alveolar trill. Such a listener is easy to imagine. It is equally easy to imagine that such a listener is listening to a speech replete with instances of /b/, and that he is taking the experience at face value. He need not believe he is subject to any sort of illusion. If hearing the /b/s in the speech involved hearing them as bilabial plosives, then this thinker would be guilty of some sort of irrationality, just as one who believed that no gun had been fired would be guilty of irrationality if he persisted in his belief after having experienced a sequence of events as being gunshots. A false theory about phonetics is not so readily refuted: speech perception doesn't present us with the underlying gestures as a part of the content of experience.

We could make the same point in more Wittgensteinian tones: one who is searching for a labiodental fricative may need a look-up chart to tell him when he has successfully found one. One who is searching for a red flower famously needs no such look-up chart. If phonemic gestures were given in the contents of experience, then 'labiodental fricative' would behave like 'red' in this respect. The contents of experience have to be non-inferentially given, and phonemic gestures aren't given in that way. It is not the case that when x perceives speech, x perceives that certain phonemic gestures were made.

It can seem that the Motor Theory is stuck with an irresolvable dilemma. Either it is making a claim about the relation of hearing, or it is making a claim about the relation of hearing that. If it is making the first claim, then it is saying something true, but something that cannot contribute to our understanding of the specialness of speech.
If it is making the second claim, then it is saying something demonstrably false.

This dilemma only arises because we take the Motor Theory to be making a claim about the object of a perceptual relation in which the subject is a person. Can the theory avoid these problems if it retreats to making a claim about a subpersonal relation? The first horn of the dilemma remains: 'x heard s' is an extensional context for s, whatever we put in the x place, so the identity of speech with phonemic gesturing guarantees trivially that phonemic gestures are perceived when speech is. The move to a subpersonal perceiving subject can't help here. But perhaps it helps with the dilemma's other horn.

The second horn of the dilemma is a place where the tactic of moving to a subpersonal perceiving relation seems more promising. The problems at that horn were problems that arose because the theory seemed wrongly to convict a certain kind of thinker of irrationality. These are problems that the move to the subpersonal may help with, since the notions of rationality and irrationality are notions that lose their grip when we move to the subpersonal.

To avoid the problems set out above, the Motor Theory needs to be interpreted as making a claim about a subpersonal perceiving relation, and it needs this perceiving relation not to be an extensional one, or else the problems associated with the first horn of the above dilemma will arise again. How should we understand this notion of a subpersonal, non-extensional 'perceiving that' relation? When we were at the personal level, we had some intuitive grasp of the way in which the personal 'perceiving that' relation fails to be extensional, but at the subpersonal level, much more work is needed if we are to understand the source of the non-extensionality of the 'perceiving that' relation. At the personal level, hearing that there was a gunshot requires hearing the sound as a gunshot.
If we are to make sense of the Motor Theory as claiming that speech is represented as phonetic gestures at the subpersonal level, then we shall need a subpersonal notion of representing as, corresponding to the personal-level notion of hearing as. It is a natural thought that the way to understand this notion is through some kind of connection with particular concepts. But, for reasons akin to those we've already seen, a conceptually demanding notion won't serve the motor theorist's purposes. One who lacks the concepts of phonemic gestures can nonetheless hear what's being said to him, and even if the thinker has those concepts, they do not seem to be engaged just because speech is being perceived.

The situation we are in is this: The Motor Theory makes the claim that gestures are the objects of speech perception. We are trying to understand what this could mean. We have seen that [...] a perceptual relation that avoids [...]. A subpersonal representation [...] notion of carrying information. [...] information about some properties of a thing [...] carrying information about [...] concepts of that thing, and this notion of the [...] aspect of a thing allows us to individuate [...] a representation of speech is [...] the appearance is misleading [...] because phonemes are individuated by the lip [...]. A glance at the International Phonetic Alphabet [...] by the sort of movement made (plosives, [...] and so on are ways of moving the mouth parts). What it is [...] is for its pronunciation to involve [...] representation that carries information about the [...] information about the lip movements [...].

The thing that is distinctive about the approach of Liberman and Mattingly and other Motor Theorists [...] of representing speech as phonemic gestures [...] instead to the thinker's capacity [...]. It is surely possible to hear speech sounds that one cannot produce oneself. The child born with an ill-formed mouth does not, of course, face deafness. Nor is the poor mimic unable to hear what it is beyond him to imitate [...] those with regional accents [...] the Motor Theory the finding by MacNeilage, Rootes, and Chase (1967) that people who have been pathologically incapable of [...] their articulators are nonetheless able to perceive speech (Liberman and Mattingly 1985: 24). On account of these findings, they moved from a claim about the vocal tract itself to a claim about an internal model of the vocal tract. The theory as revised does not claim that we actually use our mouths and throats in hearing speech, but that the perception of speech involves the use of 'an internal, innately specified vocal-tract synthesizer' (Liberman and Mattingly 1985: 26).

This move from a claim about the vocal tract to a claim about an internal model of the vocal tract brings with it a loss of clarity because it is not immediately obvious what it takes for a bit of neural apparatus to constitute an internal model of the vocal tract.
Several suggestions could be made to help us understand the claim. One such suggestion would start with the observation that there are some contexts in which one system can be said to model another just if the model can be used to generate reliable predictions about the system modeled. This is the sense of 'model' in use when a load-bearing spring is said to model an inter-molecular force, the effects of LSD are said to model schizophrenia, and, perhaps, some computer programs are said to model the weather. If the Motor Theorist's claim that the apparatus of speech perception includes a model of the vocal tract is understood as a claim that involves this sense of modeling-as-prediction-generation, then problems arise along just the lines that we have already seen. There is a problem with saying that some part of our brain generates reliable predictions about vocal tract gestures if these predictions are personal-level states: normal perceivers of speech make no such predictions. And there is a problem if the 'predictions' in question are subpersonal representations encoding information about the vocal tract: any subpersonal state that encodes information about phonemes also encodes information about vocalic gestures, on account of phonemes being individuated by the vocalic gestures that produce them.

There is, however, another sense of 'model' on which the Motor Theorist's claims stand more chance of being both plausible and explanatory. We can say that one system models another if the first behaves in a way analogous to the behavior of the second, and if it does so for analogous reasons.⁵ If one system is a model of another in this sense then it can be said to represent that system, and the particular states in the model that occupy the same functional role as a particular part of the system modeled can be said to represent those particular parts.
This may give us a sense of 'represent' that we can use to understand the Motor Theorist's claim that we represent speech as phonemic gesture.

⁵ This is really just a dynamic version of the common or garden concept of a model as we find it applied to model trains, and the like. It is nothing to do with the technical, logician's sense of 'model'.

If one system can be said to model another in this sense (as opposed to the less demanding and already rejected sense of modeling-as-prediction-generating), then there must be a high degree of symmetry between the causal architecture of the model and that of the thing modeled. If the system modeled includes two states, both of which [...] single feature of the system modeled, then the corresponding states of the model must also share their origins. Similarly, if two states of the system modeled have different explanations, there should be a difference in the way the analogous states arise in the model. Moreover, [...] requires that a [...] affairs, that state must be a genuinely unified state. If the rain in Bristol and the flooding in Wales are both caused by the arrival of [...] from the north, then my meteorological model is not good [...] Bristol as rainy is a result of having access to information about rainfall, and its representing Wales as flooded is a result of having access to some other separate body of information about river flow. The causal isomorphism requirement can't be met by gerrymandering a disjunctive state out of [...] bodies of information.⁶

⁶ Spelling out when exactly a state is genuinely unified and when gerrymandered is, of course, no easy matter. For our purposes the intuitive notion will have to serve.

For the brain to contain a model of the vocal tract (and so, in this sense, for it to be able to represent speech as phonetic gestures), [...] the brain gets from the sounds at the ear to the [...] must include a part in which the processing of [...] with the treatment received by sounds as they pass [...] and mouth.

Could the brain's processing of speech proceed in a way that would satisfy this non-gerrymandered causal isomorphism requirement, so that, in virtue of modeling the vocal tract, the brain could rightly be said to represent vocal tract gestures? I think not, but I do think that we have now arrived at the correct way to understand the content of the Motor Theory of Speech Perception. The Motor Theory should be understood as a claim about the existence in the brain of a causally isomorphic model of the vocal tract. It may be that there is such a model, but, for a couple of reasons that we shall now turn to, it does not seem to be at all likely that there is.

There are two ways for the brain's processing of speech to model the vocal tract, and each is problematic. The model could work backwards, taking as input the acoustic profiles which the vocal tract [...] them to find the phonetic intentions that [...] going. This is the most obvious way for such a model to work, but the model could, alternatively, work forwards. The model could try out a whole range of various inputs, and use these to generate representations of various acoustic profiles which it then compares to the acoustic profile that has been encoded by the ear. When it finds a match between one of the generated acoustic profiles and the profile perceived, it can identify the input that produced the match.

These two alternative ways of using a model correspond to the two strategies that, in the psychological literature, are given the unlovely names 'analysis by analysis' and 'analysis by synthesis'. An analogy will help to clarify the difference between the two approaches.
Suppose that Mr Jones is playing notes, one at a time, on the piano, and that Mr Smith has the job of finding out which notes Mr Jones is playing. To help him in his task, Smith is seated in the same room as Jones, and at the keyboard of an exactly similar piano. There are two tactics Smith can use. The speediest tactic would be to lift the lid of his piano, press the sustain pedal so that the strings are not dampened, and watch to see which string resonates. This will be the string that corresponds to the note Jones is playing. The second tactic is for Smith to press each of the notes on his keyboard, one after the other, and listen to hear when the note he plays sounds the same as the note Jones plays. In each case, Smith uses his piano as a model of Jones's. The first tactic is analogous to analysis by analysis. The second tactic is analogous to analysis by synthesis.

The method of analysis by analysis is the more efficient of the two. To meet the causal isomorphism requirement, a speech processor which could successfully detect the /d/ at the beginning of 'di' and the /d/ at the beginning of 'du' would have to do so by the same means, for both /d/s result from the same pattern of gestures. But, as we saw in our discussion of the lack of invariance, the difference in the following vowel causes this pattern of gestures to produce different effects on the features of the acoustic profile. The degree to which there is a lack-of-invariance problem, as discussed above, shows that there can be no model of the vocal tract that satisfies the causal isomorphism requirement and conducts successful analysis by analysis. The causal isomorphism requirement calls for a single part of the model detecting all and only, for example, tongue-backing, while the lack-of-invariance problem tells us that there is no feature of the acoustic profile such that a device that operated as a detector of that feature would be responding to all and only tongue-backing.
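The two tactics open to Smith can be put in schematic form. What follows is only a toy sketch under assumptions that are not part of the analogy as stated: keys are numbered in the MIDI style, the pianos are tuned in equal temperament, and an 'acoustic profile' is reduced to a single frequency. The function names are hypothetical labels for the two strategies.

```python
import math

def synthesize(key: int) -> float:
    """Forward model: frequency in Hz sounded by a key (assumed equal temperament, A4 = key 69)."""
    return 440.0 * 2 ** ((key - 69) / 12)

def analysis_by_analysis(heard_hz: float) -> int:
    """Work backwards: read the key directly off the sound, like watching which string resonates."""
    return round(69 + 12 * math.log2(heard_hz / 440.0))

def analysis_by_synthesis(heard_hz: float, keys=range(21, 109)) -> int:
    """Work forwards: try every key in turn and keep the one whose sound best matches."""
    return min(keys, key=lambda k: abs(synthesize(k) - heard_hz))

# Jones plays middle C (key 60 on the assumed numbering); Smith recovers it both ways.
heard = synthesize(60)
recovered_backwards = analysis_by_analysis(heard)
recovered_forwards = analysis_by_synthesis(heard)
```

In this toy case the backwards route succeeds only because each key has one fixed frequency, so the forward mapping can be inverted. The lack-of-invariance problem is precisely the claim that speech offers no such fixed mapping from gesture to acoustic feature, while the forwards route pays for its generality by having to try out every candidate.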
Perhaps because they are aware of the tension between their claims about lack of invariance and the possibility of analysis by analysis, the advocates of the Motor Theory have typically accepted the prima-facie less plausible analysis by synthesis account, according to which the model of the vocal tract in the brain generates several representations of acoustic profiles, and then compares the profiles it has generated to the acoustic profile heard, so that, on finding a match, it is able to identify whatever input to the model of the vocal tract produced a representation that corresponds to the profile presented.

Even if an initial bit of analysis by analysis is used to reduce the set of profiles that must be generated to a set of plausible candidates, the task of analysis by synthesis seems so vast that it could only be successfully completed in a realistic time frame if the candidate profiles are produced by massively parallel processing. There is a huge number of possible things that you could be doing with your mouth at any time, and the analysis by synthesis approach requires that a model of the vocal tract try each one of them out to see whether the acoustic consequences it generates match the sound heard. A single model of the vocal tract trying out each of these possibilities in series would have to be working at a colossal rate for speech to be perceived in real time.

Analysis by synthesis is only plausible if parallel processing is employed, but parallel processors fail to meet the 'no gerrymandering' clause in the causal isomorphism requirement on modeling. To see that they must do so, suppose there are two models working in parallel, one of which tries out the lip movements corresponding to 'du' and the other of which tries out the lip movements corresponding to 'da'. On one occasion the sound presented is a 'du' and the profile produced by the first model gets matched to the profile of the sound heard.
On another occasion the sound heard is a 'da' and the second model produces the match. In both cases a /d/ is recognized, and so to meet the causal isomorphism requirement there must be a single state featuring in the recognition of both sounds, but for there to be such a state is for there not to be separate paths operating in parallel. Analysis by synthesis is implausible unless the synthesizing models operate in parallel, but models operating in parallel fail to meet the causal isomorphism requirement, and the states of models operating in parallel therefore fail to count as representations of parts of the vocal tract.

We have tried various ways to interpret the claims of the Motor Theory of Speech Perception, but found none of them to be both plausible and meaningful as an account of how speech perception is done. We have also found that the evidence that has been thought to recommend the Motor Theory's approach is wanting. This might lead us to give the whole thing up. Nonetheless, I claimed above that I would end with a gesture in the direction of a place where these problems could be solved. That place is, I think, closer to the spirit of the original Motor Theory than it is to the more sophisticated theory that was developed in the light of the evidence and arguments that have been reviewed here. We rejected the idea that the apparatus of speech perception is the apparatus of speech production because the perception of speech that one cannot produce is so obviously possible.
We saw that this consideration led the Motor Theorists to change their claims to claims about internal models of the vocal tract. They went from a claim about there being just one system to a claim about two systems, one of which was a model of the other. This seems to me to have been a source of unnecessary difficulties. There was no need for the original one-system [...] to be abandoned so entirely. The Motor Theorist can [...] claim that there is a single common mechanism of speech production and perception, and that this mechanism represents phonemic gestures, without being [...] idea that the capacity to produce speech always [...] capacity to perceive it. A single common mechanism of production and perception [...] the function of directing speech production and perception when other things are equal; [...] case that whatever the system is able to perceive it is [...]. There are plenty of ways in which the performance of a combined production/comprehension system could be impaired on the [...] resources of speech production, [...] the same [...]. Overlaps in processing resources [...] the lexicon, presumably, serves both, as do some [...] grammatical analysis. We might understand the Motor Theory as proposing that the overlaps in representational resources continue out to the less abstract levels of representation, needed to get the mouth to move in the right way, and needed to get us into a position to know which words are said.

This is a sketch for the sort of proposal that might be made. If it is to be developed, then we shall need to be a lot clearer about the truth conditions of the various sorts of representation-postulating claims that can be made in subpersonal cognitive psychology.

References

Best, C. and McRoberts, G. (2003). 'Infant Perception of Non-native Consonant Contrasts that Adults Assimilate in Different Ways'. Language and Speech, 46: 183-216.
Espy, K. A., Molfese, D. L., Molfese, V. J., and Modglin, A. (2004). 'Development of Auditory Event-Related Potentials in Young Children and Relations to Word-Level Reading Abilities at Age 8 Years'. Annals of Dyslexia, 54(1): 9-38.
Fadiga, L., Craighero, L., Buccino, G., and Rizzolatti, G. (2002). 'Speech Listening Specifically Modulates the Excitability of Tongue Muscles: A TMS Study'. European Journal of Neuroscience, 15: 399-402.
Fowler, C. A. (1986). 'An Event Approach to the Study of Speech Perception from a Direct-Realist Perspective'. Journal of Phonetics, 14: 3-28.
Fowler, C. A. and Rosenblum, L. D. (1990). 'Duplex Perception: A Comparison of Monosyllables and Slamming Doors'. Journal of Experimental Psychology: Human Perception and Performance, 16(4): 742-54.
Harnad, S. (1987). Categorical Perception: The Groundwork of Cognition. Cambridge: Cambridge University Press.
Ivry, R. B. and Justus, T. C. (2001). 'A Neural Instantiation of the Motor Theory of Speech Perception'. Trends in Neurosciences, 24: 513-15.
[...]. Journal of [...].
Klatt, D. H. (1989). 'Review of Selected Models of Speech Perception', in W. Marslen-Wilson (ed.), Lexical Representation and Process. Cambridge, Mass.: MIT Press.
Kluender, K. R., Diehl, R. L., and Killeen, P. R. (1987). 'Japanese Quail can Learn Phonetic Categories'. Science, 237: 1195-7.
Kuhl, P. K. and Miller, J. D. (1978). 'Speech Perception by the Chinchilla: Identification Functions for Synthetic VOT Stimuli'. Journal of the Acoustical Society of America, 63: 905-17.
Lane, H. (1965). 'The Motor Theory of Speech Perception: A Critical Review'. Psychological Review, 72: 275-309.
Lee, D. N. and Lishman, J. R. (1975). 'Visual Proprioceptive Control of Stance'. Journal of Human Movement Studies, 1: 87-95.
Lewald, J. and Guski, R. (2003). 'Cross-modal Perceptual Integration of Spatially and Temporally Disparate Auditory and Visual Stimuli'. Cognitive Brain Research, 16: 468-78.
Liberman, A. (1990). 'Afterthoughts on Modularity and the Motor Theory', in I. G. Mattingly and M. Studdert-Kennedy (eds.), Modularity and the Motor Theory of Speech Perception. Hillsdale, NJ: Lawrence Erlbaum.
Liberman, A. and Mattingly, I. G. (1985). 'The Motor Theory of Speech Perception Revised'. Cognition, 21: 1-36.
Liberman, A. and Mattingly, I. G. (1989). 'A Specialization for Speech Perception'. Science, new series, 243(4890): 489-94.
McGurk, H. and MacDonald, J. (1976). 'Hearing Lips and Seeing Voices'. Nature, 264: 746-8.
MacNeilage, P. F., Rootes, T. P., and Chase, R. A. (1967). 'Speech Production and Perception in a Patient with Severe Impairment of Somesthetic Perception and Motor Control'. Journal of Speech and Hearing Research, 10: 449-67.
Mann, V. A. and Repp, B. H. (1980). 'Influence of Vocalic Context on Perception of the [ʃ]-[s] Distinction'. Perception and Psychophysics, 28: 213-28.
Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A. M., Jenkins, J. J., and Fujimura, O. (1975). 'An Effect of Linguistic Experience: The Discrimination of [r] and [l] by Native Speakers of Japanese and English'. Perception and Psychophysics, 18: 331-40.
Saldaña, H. M. and Rosenblum, L. D. (1993). 'Visual Influences on Auditory Pluck and Bow Judgments'. Perception and Psychophysics, 54(3): 406-16.
Sinnott, J. M. and Mosteller, K. W. (2001). 'A Comparative Assessment of Speech Sound Discrimination in the Mongolian Gerbil'. Journal of the Acoustical Society of America, 110(4): 1729-32.