Abstract
In this paper I challenge the widely held assumption that loudness is the perceptual correlate of sound intensity. Drawing on psychological and neuroscientific evidence, I argue that loudness is best understood not as a representation of any feature of a sound wave, but rather as a reflection of the salience of a sound wave representation; loudness is determined by how much attention a sound receives. Loudness is what I call a quantitative character, a species of phenomenal character that is determined by the amount of attention that an underlying perceptual representation commands. I distinguish quantitative from qualitative character; even qualitative characters that represent degrees of sensible magnitudes are phenomenally and functionally distinct from quantitative characters. A bifurcated account of phenomenal character emerges; the phenomenal is not exhausted by the qualitative.
Notes
Psychophysical models of the relationship between loudness and sound intensity in pure tones date back at least to Fechner (1860). Stevens (1961) proposes another influential model on which the perceived loudness of a pure tone is related to its sound intensity by a power law. It has long been acknowledged that factors other than sound intensity play a role in the determination of loudness (a point which will be emphasized in the next section of this paper), but despite these confounds it remains standard to think of loudness as more-or-less determined by sound intensity. For example, Scharf (1978) defines loudness as “the attribute of a sound that changes most readily when sound intensity is varied,” and Epstein and Marozeau (2012) state that loudness is “the primary perceptual correlate of physical sound intensity.”
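As a rough illustration of how the two psychophysical models mentioned in this note differ, the following sketch contrasts a Fechner-style logarithmic law with a Stevens-style power law. The function names, the scaling constants, the 40 dB reference level, and the exponent of 0.3 (a commonly cited value for loudness as a function of sound intensity) are assumptions chosen for illustration; they are not taken from this paper.

```python
import math

# Assumed threshold intensity in W/m^2 (the conventional 0 dB reference).
THRESHOLD = 1e-12


def db_to_intensity(level_db, threshold=THRESHOLD):
    """Convert a level in dB to an intensity in W/m^2."""
    return threshold * 10 ** (level_db / 10)


def fechner_sensation(intensity, threshold=THRESHOLD, k=1.0):
    """Fechner-style log law: sensation grows with the log of the intensity ratio."""
    return k * math.log10(intensity / threshold)


def stevens_loudness(intensity, reference=1e-8, exponent=0.3, k=1.0):
    """Stevens-style power law: loudness grows as a power of the intensity ratio.

    The reference of 1e-8 W/m^2 (40 dB) and exponent of 0.3 are illustrative
    assumptions, not values reported in the paper.
    """
    return k * (intensity / reference) ** exponent


if __name__ == "__main__":
    for level in (40, 50, 60, 70):
        i = db_to_intensity(level)
        print(level, round(fechner_sensation(i), 3), round(stevens_loudness(i), 3))
```

On these assumed values, each 10 dB step adds a constant increment on the Fechner-style scale but multiplies the Stevens-style value by roughly 2, which is one way to see why the two laws make different predictions about how loudness grows with intensity.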
The “what it is like” locution as a description for phenomenal character originates in Nagel (1974).
For example, O’Callaghan (2007) offers an analysis of loudness amenable to an externalist representationalist account of phenomenal character. “Loudness,” he says, “depends on the intensities of a sound’s spectral constituents” – the various frequencies that comprise the sound (pp. 86–87).
Related to this is the phenomenon of loudness constancy: for at least some sounds, reports of source loudness remain constant, and accurate, even as distance from the source (and thereby proximal intensity) changes. The mechanism by which loudness constancy is achieved is not fully understood. One hypothesis is that source loudness is recovered from encoded intensity as a function of distance from the source, but this requires an accurate representation of source distance, which subjects systematically underestimate. Another possibility is that loudness constancy is achieved not as a function of distance information, but rather as a function of information about reverberant sound energy recovered from the proximal stimulation (Zahorik & Wightman, 2001).
See, e.g., Suzuki and Takeshima (2004). This trend reverses with very high frequency sounds for reasons that would take us too far afield to explain here; for present purposes, it is enough to appreciate that encoded intensity, and thereby loudness, depends not only on proximal intensity but also, in part, on frequency.
This is why playing both keys together on the piano sounds like a single rough and beat-y tone, rather than two distinct tones as with C and E.
These numbers are idealized for the sake of demonstration. The amount of neural activity in a critical band is determined by myriad factors. For example, as I have already noted, the amount of neural activation caused by a tone depends in part on the tone’s frequency, so the lower-frequency critical bands contribute less neural activity than the mid-range ones. For more on the relationship between loudness and critical bands, see, e.g., Schreiner and Malone (2015).
This simplified description eschews interesting and important details regarding, e.g., whether and how top-down attention influences this process, how bottom-up inputs from various feature maps are aggregated, and whether visual attention is subserved by one or more than one saliency map. For further discussion, see e.g. Treisman and Sato (1990), Itti and Koch (2000), and Burrows and Moore (2009), respectively.
Since Kayser et al.’s study, several saliency map models of auditory attention have demonstrated improved predictive power by incorporating additional feature maps. Even in these improved models, intensity (sometimes under the guise of ‘envelope’, the “shape” of a waveform from which intensity information may be gleaned) remains a dominant factor in salience prediction. See, e.g., Kaya and Elhilali (2012).
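To make the idea of aggregating feature maps concrete, here is a deliberately simplified sketch of a temporal saliency curve built from several per-frame feature maps. It is not the model of Kayser et al. or of Kaya and Elhilali; published models use center-surround filtering and more elaborate normalization. The feature names, the sample values, and the weights (with intensity weighted most heavily) are hypothetical and chosen only for illustration.

```python
def normalize(values):
    """Rescale a feature map to the range [0, 1]; a flat map contributes nothing."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def saliency_curve(feature_maps, weights):
    """Weighted sum of normalized feature maps, one value per time frame."""
    n_frames = len(next(iter(feature_maps.values())))
    curve = [0.0] * n_frames
    for name, values in feature_maps.items():
        weight = weights.get(name, 0.0)  # features without a weight are ignored
        for t, v in enumerate(normalize(values)):
            curve[t] += weight * v
    return curve


def most_salient_frame(curve):
    """Winner-take-all: index of the frame with the highest combined salience."""
    return max(range(len(curve)), key=curve.__getitem__)


# Hypothetical per-frame feature values for a four-frame sound.
example_maps = {
    "intensity": [0.1, 0.2, 0.9, 0.3],
    "pitch_change": [0.0, 0.5, 0.1, 0.0],
}
# Hypothetical weights reflecting the dominance of intensity.
example_weights = {"intensity": 0.7, "pitch_change": 0.3}

example_curve = saliency_curve(example_maps, example_weights)
winner = most_salient_frame(example_curve)
```

With these assumed values the frame with the highest intensity wins even though another frame has the largest pitch change, which mirrors the point that intensity tends to dominate salience prediction without being its sole contributor.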
Another possible explanation is that the model is wrong with respect to encoded intensity’s contribution to auditory salience. This seems unlikely. To be wrong in a way that delivers these results (that there is no difference in the influence of encoded intensity on salience between ‘rely’ and non-‘rely’ judgments) instead of the expected results (that ‘rely’ judgments involve more influence of encoded intensity on salience than non-‘rely’ judgments), the model would have to be dramatically wrong about how intensity-based salience is determined. Given that their intensity-only model predicts which stimulus is judged as more salient nearly as well as the more complicated model, it would be surprising if they were deeply mistaken about determining salience from intensity.
The induced loudness reduction (ILR) study is interesting for another reason. The component of the auditory evoked potential (AEP) that correlates with loudness is the N1-P2 deflection; reductions in loudness correspond to decreasing N1-P2 amplitude. The N1 component of the N1-P2 complex is associated with change detection, including the occurrence of deviant and oddball stimuli (Pratt, 2011). The N1 potential begins around 100 ms after stimulus onset and is believed to be generated by feature traces antecedent to integrated representations of auditory objects. It is hypothesized that feature integration of auditory objects occurs between 150–200 ms after stimulus onset, which overlaps with the end of the N1 component and the beginning of the P2 (Näätänen & Winkler, 1999). In other words, AEP activity that corresponds to loudness also corresponds to the processing immediately before and throughout the generation of a feature-integrated percept. This is precisely where we should expect to see loudness-related activity on the view that loudness reflects salience. After all, the saliency map is the guide by which bottom-up attention directs its resources, and, on FIT, it is this directing of attention that binds features at the attended location. If there is a saliency map for auditory bottom-up attention, we should expect it to arise at the interface of feature traces and feature-integrated auditory objects, and it would seem that the N1-P2 complex is where we should expect to locate that sort of processing. This does not, of course, constitute evidence that loudness is a presentation of salience; there is much more to say about the N1-P2 complex than I have discussed here. I raise it only to suggest that current understanding of the N1-P2 complex is consistent with loudness reflecting salience.
Näätänen and Winkler (1999) point out that one way in which feature-integrated auditory objects differ from feature-integrated visual objects is that the “medium” of visual object formation is space while the “medium” of auditory object formation is time. Hence, the analog of a saliency map for auditory attention is unlikely to be a topographic spatial representation, and so the notion of a “location” is here an analogy for whatever it is that “temporal maps” encode.
More precisely, but less memorably: many sounds are salient largely because they are encoded as intense; as I have stressed, encoded intensity is not the sole contributor to a sound’s salience.
Hence, it is in principle discoverable that an apparently metathetic sensory dimension—one that bears the heuristic hallmarks described above—is actually prothetic, and vice versa. For example, Stevens was surprised to discover that apparent saturation does not bear a linear relationship to objective saturation, and thereby counts as a prothetic sensory dimension by his lights (Panek & Stevens, 1966).
Indeed, it is an open question in neuroscience whether our brains feature a multisensory magnitude estimator; Baliki et al. (2009) suggest that the insula may be a hub for “how much” representation. It would be beyond the pale to consider such a possibility if magnitudes across modalities were not phenomenally comparable.
An anonymous reviewer points out that silence is sometimes highly salient, as when we describe a tense silence as “deafening,” or when we are awoken by the TV being turned off. Salient silences such as these stand in apparent opposition to the claims that a sound loud to no degree is inaudible and that we cannot be aware of inaudible sounds, for intuitively it is something we hear that wakes us up, and acute awareness of silence that renders it “deafening.” I think there are a few ways that salient silences may be accommodated without violating my salience-based view about loudness. One is to hold that silence may be salient, but not in virtue of its (nonexistent) audible qualities; by extension, it could be held that we may be aware that it is silence without being phenomenally aware of silence. Another is to maintain that what is heard in these cases is not silence but some other event, perhaps an event composed of a sound and its cessation. A broader perspective on salience and its phenomenology must surely feature an analysis of salient silence. However, it is beyond the scope of this paper to present and defend such an account.
An anonymous reviewer comments that because this view relativizes loudness to (types of) perceivers according to the operation of their attention systems, it would seem to permit the possibility of “loudness inversion,” such that distal sound stimuli that are very loud to one perceiver are very soft to her invert, and vice versa. This is correct. One significant difference between loudness inversion (and inversion of quantitative character more generally) and, e.g., hue or pitch inversion (and inversion of qualitative character more generally) is that, whereas the latter would be difficult if not impossible to detect from the invert's behavior, the former would be nearly impossible to miss. Someone loudness-inverted relative to me, for example, would strain to hear rock concerts and cover her ears in agony at the sound of pin-drops. This is a welcome distinction. My account of quantitative character is, after all, a functional one, so the fact that loudness inversion between individuals predicts “inverted” behavior is a feature of the view.
I am grateful to an anonymous reviewer for raising this point.
I am grateful to an anonymous reviewer for raising this point.
Ascription of salience to stimuli and their features is pervasive in the attention literature. Ascription of salience to feature representations and locations arises within the literature on computational models of bottom-up attention, e.g. Koch and Ullman (1987). Ascription of salience to actions arises within literature that conceives of attention as selection for action, e.g. Kerzel and Schönhammer (2013), including views on which bottom-up attention subserves a “belief optimizing” function of perception by determining the best locations to conduct “experiments” aimed at reduction of uncertainty, i.e. by determining locations of eye saccades (Parr & Friston, 2019).
This is not to say that qualitative characters like hue and pitch pose no threat to externalist representationalism; a defense of externalism about qualitative content must still contend with empirical data to the effect that qualitative continua do not correspond to distal features in the orderly way the view predicts. The difference between the case against externalism about qualitative character and the case against externalism about quantitative character is that the latter is not constituted principally out of these failures of correspondence; quantitative characters are subject to “coincidental covariation,” as Pautz (2015) calls it, and salience attribution is a competing explanation for quantitative character that has no counterpart for qualitative character.
References
Arieh, Y., & Marks, L. E. (2011). Measurement of loudness, part II: Context effects. In M. Florentine, A. N. Popper, & R. R. Fay (Eds.), Loudness (pp. 57–87). Springer.
Baliki, M. N., Geha, P. Y., & Apkarian, A. V. (2009). Parsing pain perception between nociceptive representation and magnitude estimation. Journal of Neurophysiology, 101(2), 875–887. https://doi.org/10.1152/jn.91100.2008
Burrows, B. E., & Moore, T. (2009). Influence and limitations of popout in the selection of salient visual stimuli by area V4 neurons. Journal of Neuroscience, 29(48), 15169–15177. https://doi.org/10.1523/jneurosci.3710-09.2009
Dretske, F. (1995). Naturalizing the mind. MIT Press.
Epstein, M., & Marozeau, J. (2012). Loudness and intensity coding. Oxford Handbooks Online. https://doi.org/10.1093/oxfordhb/9780199233557.013.0003
Fechner, G. T. (1860). Elemente der Psychophysik. Breitkopf und Härtel.
Glasberg, B. R., & Moore, B. C. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society, 50(5), 331–342.
Huang, N., & Elhilali, M. (2017). Auditory salience using natural soundscapes. The Journal of the Acoustical Society of America, 141(3), 2163–2176. https://doi.org/10.1121/1.4979055
Itti, L. (2005). Models of bottom-up attention and saliency. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of attention (pp. 576–582). Elsevier Academic Press.
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10–12), 1489–1506. https://doi.org/10.1016/s0042-6989(99)00163-7
Kandel, E. R., Schwartz, J. H., Jessell, T. M., Siegelbaum, S. A., & Hudspeth, A. J. (Eds.). (2013). Low-level visual processing: The retina. In Principles of neural science (5th Ed., pp. 1339–1388). McGraw-Hill Medical.
Kaya, E. M., & Elhilali, M. (2012). A temporal saliency map for modeling auditory attention. In 2012 46th Annual Conference on Information Sciences and Systems (CISS). https://doi.org/10.1109/ciss.2012.6310945
Kayser, C., Petkov, C. I., Lippert, M., & Logothetis, N. K. (2005). Mechanisms for allocating auditory attention: An auditory saliency map. Current Biology, 15(21), 1943–1947. https://doi.org/10.1016/j.cub.2005.09.040
Kerzel, D., & Schönhammer, J. (2013). Salient stimuli capture attention and action. Attention, Perception, & Psychophysics, 75(8), 1633–1643. https://doi.org/10.3758/s13414-013-0512-3
Koch, C., & Ullman, S. (1987). Shifts in selective visual attention: Towards the underlying neural circuitry. Matters of Intelligence. https://doi.org/10.1007/978-94-009-3833-5_5
Liao, H.-I., Kidani, S., Yoneya, M., Kashino, M., & Furukawa, S. (2015). Correspondences among pupillary dilation response, subjective salience of sounds, and loudness. Psychonomic Bulletin & Review, 23(2), 412–425. https://doi.org/10.3758/s13423-015-0898-0
Liao, H.-I., Yoneya, M., Kidani, S., Kashino, M., & Furukawa, S. (2016). Human pupillary dilation response to deviant auditory stimuli: Effects of stimulus properties and voluntary attention. Frontiers in Neuroscience. https://doi.org/10.3389/fnins.2016.00043
Lycan, W. G. (1996). Consciousness and experience. MIT Press.
McDermott, J. H. (2013). Audition. In K. Ochsner & S. M. Kosslyn (Eds.), The Oxford handbook of cognitive neuroscience, Volume 1: Core topics. Oxford University Press.
Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125(6), 826–859. https://doi.org/10.1037/0033-2909.125.6.826
Nagel, T. (1974). What is it like to be a bat? Philosophical Review, 83, 435–450.
O’Callaghan, C. (2007). Sounds: A philosophical theory. Oxford University Press.
Panek, W., & Stevens, S. S. (1966). Saturation of red: A prothetic continuum. Perception & Psychophysics, 1(1), 59–66. https://doi.org/10.3758/bf03207823
Parr, T., & Friston, K. J. (2019). Attention or salience? Current Opinion in Psychology, 29, 1–5. https://doi.org/10.1016/j.copsyc.2018.10.006
Pautz, A. (2015). The real trouble with phenomenal externalism: New empirical evidence for a brain-based theory of consciousness. In R. Brown (Ed.), Consciousness inside and out: Phenomenology, neuroscience, and the nature of experience (pp. 237–298). Springer.
Pratt, H. (2011). Sensory ERP components. In E. S. Kappenman & S. J. Luck (Eds.), The Oxford handbook of event-related potential components. Oxford University Press.
Röhl, M., & Uppenkamp, S. (2012). Neural coding of sound intensity and loudness in the human auditory system. Journal of the Association for Research in Otolaryngology, 13(3), 369–379. https://doi.org/10.1007/s10162-012-0315-6
Scharf, B. (1978). Loudness. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception: IV. Hearing. Academic Press.
Schmidt, F. H., Mauermann, M., & Kollmeier, B. (2020). Neural representation of loudness: Cortical evoked potentials in an induced loudness reduction experiment. Trends in Hearing, 24, 1–13. https://doi.org/10.1177/2331216519900595
Schreiner, C., & Malone, B. (2015). Representation of loudness in the auditory cortex. https://doi.org/10.1016/B978-0-444-62630-1.00004-4
Siegel, E. H., & Stefanucci, J. K. (2011). A little bit louder now: Negative affect increases perceived loudness. Emotion, 11(4), 1006–1011. https://doi.org/10.1037/a0024590
Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64(3), 153–181. https://doi.org/10.1037/h0046162
Stevens, S. S. (1960). The psychophysics of sensory function. American Scientist, 48(2), 226–253.
Stevens, S. S. (1961). To honor Fechner and repeal his law: A power function, not a log function, describes the operating characteristic of a sensory system. Science, 133, 80–86.
Suzuki, Y., & Takeshima, H. (2004). Equal-loudness-level contours for pure tones. The Journal of the Acoustical Society of America, 116(2), 918–933. https://doi.org/10.1121/1.1763601
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136. https://doi.org/10.1016/0010-0285(80)90005-5
Treisman, A., & Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16(3), 459–478. https://doi.org/10.1037/0096-1523.16.3.459
Tye, M. (2000). Consciousness, color, and content. MIT Press.
Wang, C.-A., & Munoz, D. P. (2014). Modulation of stimulus contrast on the human pupil orienting response. European Journal of Neuroscience, 40(5), 2822–2832. https://doi.org/10.1111/ejn.12641
Wang, N., Kreft, H. A., & Oxenham, A. J. (2015). Loudness context effects in normal-hearing listeners and cochlear-implant users. Journal of the Association for Research in Otolaryngology, 16(4), 535–545. https://doi.org/10.1007/s10162-015-0523-y
Wu, W. (2014). Attention. Routledge.
Zahorik, P., & Wightman, F. L. (2001). Loudness constancy with varying sound source distance. Nature Neuroscience, 4(1), 78–83. https://doi.org/10.1038/82931
Ethics declarations
Conflict of interest
The author has no relevant financial or non-financial interests to disclose.
Additional information
I thank Joe Levine, Louise Antony, Hilary Kornblith, Lisa Sanders, and Miles Tucker for thoughtful discussions on the issues here raised. I also thank two anonymous referees for their helpful commentary on an earlier version of this paper. I am indebted to Joe Levine for his comments and criticisms at each stage of this paper’s development.
Cite this article
Soland, K. Does loudness represent sound intensity? Synthese 200, 100 (2022). https://doi.org/10.1007/s11229-022-03665-3
DOI: https://doi.org/10.1007/s11229-022-03665-3