Elsevier

Cognition

Volume 99, Issue 2, March 2006, Pages 113-129

Spatial representation of pitch height: the SMARC effect

https://doi.org/10.1016/j.cognition.2005.01.004

Abstract

Through the preferential pairing of response positions to pitch, here we show that the internal representation of pitch height is spatial in nature and affects performance, especially in musically trained participants, when response alternatives are either vertically or horizontally aligned. The finding that our cognitive system maps pitch height onto an internal representation of space, which in turn affects motor performance even when this perceptual attribute is irrelevant to the task, extends previous studies on auditory perception and suggests an interesting analogy between music perception and mathematical cognition. Both the basic elements of mathematical cognition (i.e. numbers) and the basic elements of musical cognition (i.e. pitches) appear to be mapped onto a mental spatial representation in a way that affects motor performance.

Introduction

“…The results are clear-cut and unequivocal. High tones are phenomenologically higher in space than low ones. …The fact that on any place-theory of hearing the lowest tones would fall at the apex and the highest tones at the base of the cochlea opposite the oval window no more means that we hear the world upside down than the inversion of the retinal image forces us to stand on our heads to see the world right side up. The experiments were done, however, not so much with auditory theory in mind as with the query as to whether the results would throw any light on the moot question of the apparent auditory movement which is set up by tones of different pitch when presented in succession.”

Pratt, 1930

Pitch, primarily determined by frequency (Moore, 2003), is classified in many languages by using terms having a spatial connotation such as “high” and “low” (e.g. Chinese, English, French, German, Italian, Polish and Spanish). Melara and Marks (1990) showed that, in speeded reaction tasks, responding to the written word “high” is faster in the presence of a high pitch, whereas responding to “low” is faster in the presence of a low pitch. They argued that crosstalk occurs between written words and pitch at a semantic level of processing, which suggests that the association between verbal labels denoting spatial positions and sound frequency is not arbitrary. In music psychology, the structural relations among perceived pitches have been described with geometrical models derived from multidimensional scaling of relative pitch judgments (Shepard, 1982; Ueda & Ohgushi, 1987). A model which is typically regarded as an effective metaphor for pitch perception in both musically naïve people and trained musicians is in the shape of an ascending spiral having circular (chroma) and vertical rectilinear (octave) components (Ueda & Ohgushi, 1987). Also, the standard music notation system maps pitches to vertical locations, whereby notes corresponding to higher pitches are represented with higher spatial positions on the staff.

Stumpf (1883) was looking for the origin of the association between pitch height and the vertical spatial dimension which emerged consistently across languages. He argued that cross-modality associative bonds and relations of similarity are used as effective metaphors but, in fact, no real spatial characterization is intrinsic to the tonal sensation. It was only in 1930 that Pratt put forward the hypothesis of a real pitch-space correspondence after observing that the specific succession of the tones in a musical phrase can generate a sensation of apparent movement (e.g. when the notes of the diatonic scale from C3 to C4 are presented successively, almost everybody perceives an upward movement). He asked participants to locate on a numbered scale running from the floor to the ceiling the position of tones sounded by a hidden loudspeaker having variable location. Five tones were used which bore the octave-relation to each other and had frequencies of 256, 512, 1024, 2048, and 4096 Hz. Participants were asked to indicate by one of the numbers on the scale the region from which the tone seemed to come. Every participant consistently judged the tones to be placed in the order, from top to bottom, 4096, 2048, 1024, 512, and 256 Hz. Thus, Pratt concluded that “…of two tones of different pitch the one of greater frequency is called higher, not because of any extraneous associations with altitude, but simply because it is perceived as occupying a higher position in phenomenological space” (Pratt, 1930, p. 283). However, as Pratt (1930) himself and Trimble (1934) noticed, it is possible to interpret those results in the light of habits in pitch discrimination, in which case localization would be a matter of discrimination on a pitch continuum rather than discrimination on a spatial continuum (i.e. participants systematically assigned locations on the basis of the pitch, even if not instructed to do so, for sound localization is very poor in the vertical dimension).
Moreover, the only expedient aimed at preventing the use of words such as “high” and “low” to mean perceived spatial location, but which could also evoke the associated concept of pitch height, was labelling locations with numbers, and the number-to-location assignment was not counterbalanced between participants (small numbers were always assigned to low and large numbers to high locations). Trimble (1934; Experiment 2) replicated Pratt's (1930) results with a fixed sound source and nine different tones, which did not bear the octave relation (500, 600, 700, 850, 1200, 1750, 1900, 2250, and 3950 Hz). He asked participants to record the apparent displacement of a phantom sound source from the horizontal plane both at the beginning and at the end of an “ascending” and a “descending” series of pitches in rapid succession. In addition, participants were instructed to draw on a chart the apparent course made by the phantom sound. As expected, higher frequency tones were localized higher in space than lower frequency tones and musically “ascending” series followed a bottom-to-top trajectory, whereas “descending” series followed a top-to-bottom trajectory. More than 30 years later, Roffler and Butler (1968) again replicated Pratt's (1930) results with nine tones (250, 400, 600, 900, 1400, 2000, 3200, 4800 and 7200 Hz), and showed that the association between pitch height and vertical locations persisted even when the distance of the viewer from the panel and the viewer's orientation were manipulated. They also found similar results in congenitally blind people and in 4–5 year-old children who, according to their self report, were unaware of the use of the words “high” and “low” to describe differently pitched sounds.

The major shortcoming of Pratt (1930), Trimble (1934) and all the subsequent studies arguing for a spatial representation of pitch height (e.g. Mudd, 1963; Roffler & Butler, 1968) was that participants were explicitly instructed to estimate the spatial position of sounds. For example, Mudd (1963) tested pitch height for “associative spatial stereotype”, following Fitts and Deininger (1954) who had defined the operational measurement of population stereotype as the determination of the relative frequency with which each permissible response is made to a stimulus, when instructions do not specify what response is considered as appropriate. On each trial, Mudd's participants listened to two stimuli of different frequency. After listening to the stimulus pair, they were asked to move a peg from the centre reference hole in a pegpanel, whose position was assumed to correspond to the first stimulus, and plug it into another hole in the panel to represent the second stimulus. The spatial difference between the two pegholes was taken to represent the difference between the auditory stimuli being compared. Interestingly, Mudd found frequency to have both a horizontal and a vertical associative spatial stereotype (i.e. it was multidimensional), the latter being considerably stronger. Higher-frequency pitches were assigned to right/up locations and lower-frequency pitches were assigned to left/down locations. The procedure used to measure the “associative spatial stereotype” was considered adequate by Mudd (1963) on the basis of two arguments: (a) the verbal instructions emphasised that participants could and should plot anywhere on the panel and (b) the cartoon illustrations used to facilitate understanding of the task were designed to avoid, or at least to hold constant, any tendency to establish a response bias. However, the instructions requested that pitch height be intentionally represented in terms of visuo-spatial metrics.
As a consequence, one cannot say whether pitch height is spontaneously or mandatorily mapped onto space, and whether the assignment of high-frequency pitches to high (or right) locations and low-frequency pitches to low (or left) locations generalizes to a context where participants' main concern is not to locate or to represent pitches in space.
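The relative-frequency measure of population stereotype described above can be illustrated with a minimal sketch. The function name, the response labels and the placements below are hypothetical and serve only as an illustration; they are not data or procedures from Mudd (1963) or Fitts and Deininger (1954).

```python
from collections import Counter


def response_proportions(responses):
    """Return the relative frequency of each response label in a list."""
    counts = Counter(responses)
    total = len(responses)
    return {label: count / total for label, count in counts.items()}


# Hypothetical placements chosen for a higher-frequency second tone
# (illustrative values only, not data from any cited study).
placements = ["up", "up", "right", "up", "down", "up", "right", "up"]

proportions = response_proportions(placements)
print(proportions)
```

Under this reading, a response that attracts a clearly larger proportion than the alternatives would count as the stereotyped response for that stimulus.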

In addition to the theoretical relevance of the question as to whether pitch height possesses intrinsic spatial characteristics, knowledge of the mental representation of pitch may have applications in ergonomics. A general rule for interface design, indeed, should be to maintain correspondence when possible. Moreover, although compatibility effects decrease in magnitude with practice, significant effects remain that do not disappear even with extended practice and stimulus-response incompatibility cannot be overcome by training (Dutta & Proctor, 1992). In the present study, we tested the spatial representation of pitch height in more restrictive conditions than previous studies did. We asked whether pitch height would prime and speed up manual spatial responses when the task does not require consideration of the spatial location of the stimuli.

We explored the spatial representation of pitch height through the pairing of pitch to different response positions. In speeded choice reaction tasks, whenever dimensional overlap occurs between stimuli and responses, the mapping of stimuli to responses influences speed and accuracy of performance (Kornblum, Hasbroucq, & Osman, 1990), a phenomenon known as stimulus-response compatibility (SRC), which has been of considerable interest both in ergonomics and in experimental psychology since the pioneering studies of Fitts and colleagues (Fitts & Deininger, 1954; Fitts & Seeger, 1953). Dimensional overlap between stimuli and responses can occur in their spatial location, so that stimuli appearing on the right are responded to faster with a right than a left response key (Fig. 1A), and stimuli appearing up are responded to faster with an upper than a lower response key, irrespective of whether the keys lie on a frontal or on a transverse plane (Cho & Proctor, 2003; Vu et al., 2000).

Spatial SRC effects with stimuli accessing a mental spatial representation and responses mapped in the physical space have also been reported. Dehaene et al. (1990, 1993) showed that, with parity or magnitude judgments on centrally presented numbers, responses to large numbers are faster with the right than the left key, whereas responses to small numbers are faster with the left than the right key, irrespective of the responding hand (Spatial–Numerical Association of Response Codes or SNARC effect; Fig. 1B). Therefore, number magnitude is thought to be mapped onto a spatial representation, each number being represented as a point on a left-right mental line.

We reasoned that mapping sets of elements onto spatial positions might represent a useful cognitive tool across different domains (see also Gevers, Reynvoet, & Fias, 2003), and that if spatial codes were assigned to pitch height, performance would be better when pitch (cognitive) location corresponded to response location than when it did not (Fig. 1B).
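One common way to quantify such a compatibility effect is the difference in mean reaction time between non-corresponding and corresponding pitch-response pairings. The sketch below assumes a vertical key arrangement and uses hypothetical trial values; the names `COMPATIBLE` and `compatibility_effect` are illustrative and do not reproduce the analysis reported in the article.

```python
from statistics import mean

# Assumed corresponding pairings for vertically aligned keys:
# high pitch with the upper key, low pitch with the lower key.
COMPATIBLE = {("high", "up"), ("low", "down")}


def compatibility_effect(trials):
    """Mean RT on non-corresponding trials minus mean RT on corresponding trials (ms)."""
    corresponding = [rt for pitch, key, rt in trials if (pitch, key) in COMPATIBLE]
    non_corresponding = [rt for pitch, key, rt in trials if (pitch, key) not in COMPATIBLE]
    return mean(non_corresponding) - mean(corresponding)


# Hypothetical trials as (pitch, response key, reaction time in ms);
# the values are invented for illustration and are not experimental data.
trials = [
    ("high", "up", 480),
    ("high", "down", 520),
    ("low", "down", 490),
    ("low", "up", 530),
]

effect = compatibility_effect(trials)
print(effect)
```

A positive value indicates faster responses when pitch height and response position correspond, which is the direction predicted if spatial codes are assigned to pitch height.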

Section snippets

Experiment 1

With Experiment 1 we aimed at testing whether any SRC effect was present when pitch height and response position were varied orthogonally, in a task where nonmusicians were asked to compare the frequency (and not the spatial location) of two pure tones. Higher/lower judgments, when referred to the perceived location of a sound source as in Pratt's original work, call into question a different concept than higher/lower judgements referring to pitch height. Polysemantic words, such as bank, refer

Experiment 2

In Experiment 2, nonmusicians were required to classify sounds as being produced by wind or percussion instruments (i.e. they had to perform a musical instrument identification task). To proceed with the analogy between music psychology and mathematical cognition, our identification task can be regarded as the equivalent of a parity judgment task. In a parity judgment task participants are required to decide whether a number is even or odd, instead of processing number magnitude. Nonetheless they

Experiment 3

Rationale and stimuli were identical to Experiment 2, but participants were trained musicians (i.e. they could sight-read scores).

Conclusion

We asked a group of musically naïve participants to perform a pitch comparison task, and a different group of musically naïve participants and a group of musicians to perform a musical instrument identification task on sounds having different pitch. A SMARC effect (i.e. high-frequency pitches favouring up responses and low-frequency pitches favouring down responses) was present both when pitch was task relevant, and when it was task irrelevant. Moreover, when pitch height was task irrelevant, a

Acknowledgements

Preparation of this manuscript was supported in part by grants from the European Commission (RTN Grant HPRN-CT-2000-00076) to BB and CU and a grant from MIUR to CU. We would like to thank Prof. Diana Deutsch for her useful suggestions.

References (30)

  • W. Gevers et al. The mental representation of ordinal sequences is spatially organized. Cognition (2003)
  • A.J. Marcel. Conscious and unconscious perception: An approach to the relations between phenomenal experience and perceptual processes. Cognitive Psychology (1983)
  • Y.S. Cho et al. Stimulus and response representations underlying orthogonal stimulus-response compatibility effects. Psychonomic Bulletin and Review (2003)
  • S. Dehaene et al. The mental representation of parity and number magnitude. Journal of Experimental Psychology: General (1993)
  • S. Dehaene et al. Is numerical comparison digital: Analogical and symbolic effects in two-digit number comparison. Journal of Experimental Psychology: Human Perception and Performance (1990)
  • D. Deutsch. Musical illusions and paradoxes (1995)
  • D. Deutsch. Grouping mechanisms in music
  • D. Deutsch. The puzzle of absolute pitch. Current Directions in Psychological Science (2002)
  • Deutsch, Hamaoui, K., & Henthorn, T. (in...
  • D. Deutsch et al. Absolute pitch is demonstrated in speakers of tone languages. Journal of the Acoustical Society of America (1999)
  • D. Deutsch et al. Absolute pitch, speech and tone language: Some experiments and a proposed framework. Music Perception (2004)
  • A. Dutta et al. Persistence of stimulus-response compatibility effects with extended practice. Journal of Experimental Psychology: Learning, Memory and Cognition (1992)
  • P.M. Fitts et al. S-R compatibility: Correspondence among paired elements within stimulus and response codes. Journal of Experimental Psychology (1954)
  • P.M. Fitts et al. S-R compatibility: Spatial characteristics of stimulus and response codes. Journal of Experimental Psychology (1953)
  • S. Kornblum et al. Dimensional overlap: Cognitive basis for stimulus-response compatibility. A model and taxonomy. Psychological Review (1990)