Does loudness represent sound intensity?

  • Original Research
  • Synthese

Abstract

In this paper I challenge the widely held assumption that loudness is the perceptual correlate of sound intensity. Drawing on psychological and neuroscientific evidence, I argue that loudness is best understood not as a representation of any feature of a sound wave, but rather as a reflection of the salience of a sound wave representation; loudness is determined by how much attention a sound receives. Loudness is what I call a quantitative character, a species of phenomenal character that is determined by the amount of attention that an underlying perceptual representation commands. I distinguish quantitative from qualitative character; even qualitative characters that represent degrees of sensible magnitudes are phenomenally and functionally distinct from quantitative characters. A bifurcated account of phenomenal character emerges; the phenomenal is not exhausted by the qualitative.

Notes

  1. Psychophysical models of the relationship between loudness and sound intensity in pure tones date back at least to Fechner (1860). Stevens (1961) proposes another influential model on which the perceived loudness of a pure tone is related to its sound intensity by a power law. It has long been acknowledged that factors other than sound intensity play a role in the determination of loudness (a point which will be emphasized in the next section of this paper), but despite these confounds it remains standard to think of loudness as more or less determined by sound intensity. For example, Scharf (1978) defines loudness as “the attribute of a sound that changes most readily when sound intensity is varied,” and Epstein and Marozeau (2012) state that loudness is “the primary perceptual correlate of physical sound intensity.”

  2. The “what it is like” locution as a description for phenomenal character originates in Nagel (1974).

  3. Notable presentations of externalist representationalism about phenomenal character include Dretske (1995), Lycan (1996), and Tye (2000).

  4. For example, O’Callaghan (2007) offers an analysis of loudness amenable to an externalist representationalist account of phenomenal character. “Loudness,” he says, “depends on the intensities of a sound’s spectral constituents” – the various frequencies that comprise the sound (pp. 86–87).

  5. Related to this is the phenomenon of loudness constancy; for at least some sounds, reports of source loudness remain constant, and accurate, even as distance from the source (and thereby proximal intensity) changes. The mechanism by which loudness constancy is achieved is not fully understood. One hypothesis is that source loudness is recovered from encoded intensity as a function of distance from the source, but this requires an accurate representation of source distance, which subjects systematically underestimate. Another possibility is that loudness constancy is achieved not as a function of distance information, but rather as a function of information about reverberant sound energy recovered from the proximal stimulation (Zahorik & Wightman, 2001).

  6. See, e.g., Suzuki and Takeshima (2004). This trend reverses with very high frequency sounds for reasons that will take us too far afield to explain; for present purposes, it is enough to appreciate that encoded intensity, and thereby loudness, depends not only on proximal intensity but also, in part, on frequency.

  7. This is why playing both keys together on the piano sounds like a single rough and beat-y tone, rather than two distinct tones as with C and E.

  8. These numbers are idealized for the sake of demonstration. The amount of neural activity in a critical band is determined by myriad factors. For example, as I have already noted, the amount of neural activation caused by a tone depends in part on the tone’s frequency, so the lower-frequency critical bands contribute less neural activity than the mid-range ones. For more on the relationship between loudness and critical bands, see, e.g., Schreiner and Malone (2015).

  9. E.g. Wang and Munoz (2014), Liao et al. (2016), Huang and Elhilali (2017).

  10. See, e.g. Itti and Koch (2000), Itti (2005).

  11. This simplified description eschews interesting and important details regarding, e.g., whether and how top-down attention influences this process, how bottom-up inputs from various feature maps are aggregated, and whether visual attention is subserved by one or more than one saliency map. For further discussion, see e.g. Treisman and Sato (1990), Itti and Koch (2000), and Burrows and Moore (2009), respectively.

  12. Since Kayser et al.’s study, several saliency map models of auditory attention have demonstrated improved predictive power by incorporating additional feature maps. Even in these improved models, intensity (sometimes under the guise of ‘envelope’, the “shape” of a waveform from which intensity information may be gleaned) remains a dominant factor in salience prediction. See, e.g., Kaya (2012).

  13. Another possible explanation is that the model is wrong with respect to encoded intensity’s contribution to auditory salience. This seems unlikely. To be wrong in a way that delivers the observed results (no difference in the influence of encoded intensity on salience between ‘rely’ and non-‘rely’ judgments) rather than the expected results (more influence of encoded intensity on salience in ‘rely’ judgments than in non-‘rely’ judgments), the model would have to be dramatically wrong about how intensity-based salience is determined. Given that their intensity-only model predicts which stimulus is judged more salient nearly as well as the more complicated model does, it would be surprising if they were deeply mistaken about how salience is determined from intensity.

  14. The ILR study is interesting for another reason. The component of the auditory evoked potential (AEP) that correlates with loudness is the N1-P2 deflection; reductions in loudness correspond to decreasing N1-P2 amplitude. The N1 component of the N1-P2 complex is associated with change detection, including the occurrence of deviant and oddball stimuli (Pratt, 2011). The N1 potential begins around 100 ms after stimulus onset and is believed to be generated by feature traces antecedent to integrated representations of auditory objects. It is hypothesized that feature integration of auditory objects occurs between 150 and 200 ms after stimulus onset, which overlaps with the end of the N1 component and the beginning of the P2 (Näätänen & Winkler, 1999). In other words, AEP activity that corresponds to loudness also corresponds to the processing immediately before and throughout the generation of a feature-integrated percept. This is precisely where we should expect to see loudness-related activity on the view that loudness reflects salience. After all, the saliency map is the guide by which bottom-up attention directs its resources, and, on FIT, it is this directing of attention that binds features at the attended location. If there is a saliency map for auditory bottom-up attention, we should expect it to arise at the interface of feature traces and feature-integrated auditory objects, and it would seem that the N1-P2 complex is where we should expect to locate that sort of processing. This does not, of course, constitute evidence that loudness is a presentation of salience; there is much more to say about the N1-P2 complex than I have discussed here. I raise it only to suggest that current understanding of the N1-P2 complex is consistent with loudness reflecting salience.

  15. Näätänen and Winkler (1999) point out that one way in which feature-integrated auditory objects differ from feature-integrated visual objects is that the “medium” of visual object formation is space while the “medium” of auditory object formation is time. Hence, the analog of a saliency map for auditory attention is unlikely to be a topographic spatial representation, and so the notion of a “location” is here an analogy for whatever it is that “temporal maps” encode.

  16. More precisely, but less memorably: many sounds are salient largely because they are encoded as intense; as I have stressed, encoded intensity is not the sole contributor to a sound’s salience.

  17. Hence, it is in principle discoverable that an apparently metathetic sensory dimension—one that bears the heuristic hallmarks described above—is actually prothetic, and vice versa. For example, Stevens was surprised to discover that apparent saturation does not bear a linear relationship to objective saturation, and thereby counts as a prothetic sensory dimension by his lights (Panek & Stevens, 1966).

  18. Indeed, it is an open question in neuroscience whether our brains feature a multisensory magnitude estimator; Baliki et al. (2009) suggest that the insula may be a hub for “how much” representation. It would be beyond the pale to consider such a possibility if magnitudes across modalities were not phenomenally comparable.

  19. An anonymous reviewer points out that silence is sometimes highly salient, as when we describe a tense silence as “deafening,” or when we are awoken by the TV being turned off. Salient silences such as these stand in apparent opposition to the claims that a sound loud to no degree is inaudible and that we cannot be aware of inaudible sounds, for intuitively it is something we hear that wakes us up, and acute awareness of silence that renders it “deafening.” I think there are a few ways that salient silences may be accommodated without violating my salience-based view about loudness. One is to hold that silence may be salient, but not in virtue of its (nonexistent) audible qualities; by extension, it could be held that we may be aware that it is silence without being phenomenally aware of silence. Another is to maintain that what is heard in these cases is not silence, but some other event: an event composed of a sound and its cessation, perhaps. A broader perspective on salience and its phenomenology must surely feature an analysis of salient silence. However, it is beyond the scope of this paper to present and defend such an account.

  20. An anonymous reviewer comments that because this view relativizes loudness to (types of) perceivers according to the operation of their attention systems, it would seem to permit the possibility of “loudness inversion,” such that distal sound stimuli that are very loud to one perceiver are very soft to her invert, and vice versa. This is correct. One significant difference in the possibility of loudness inversion (and quantitative character more generally) compared to, e.g., hue or pitch inversion (and qualitative character more generally) is that where it would be difficult if not impossible to detect the latter based on the invert's behavior, it would be nearly impossible to miss in the former case. Someone loudness-inverted relative to me, for example, would strain to hear rock concerts and cover her ears in agony at the sound of pin-drops. This is a welcome distinction. My account of quantitative character is, after all, a functional one, so the fact that loudness inversion between individuals predicts “inverted” behavior is a feature of the view.

  21. I am grateful to an anonymous reviewer for raising this point.

  22. I am grateful to an anonymous reviewer for raising this point.

  23. Ascription of salience to stimuli and their features is pervasive in the attention literature. Ascription of salience to feature representations and locations arises within the literature on computational models of bottom-up attention, e.g. Koch and Ullman (1987). Ascription of salience to actions arises within literature that conceives of attention as selection for action, e.g. Kerzel and Schönhammer (2013), including views on which bottom-up attention subserves a “belief optimizing” function of perception by determining the best locations to conduct “experiments” aimed at reduction of uncertainty, i.e. by determining locations of eye saccades (Parr & Friston, 2019).

  24. This is not to say that qualitative characters like hue and pitch pose no threat to externalist representationalism; a defense of externalism about qualitative content must still contend with empirical data to the effect that qualitative continua do not correspond to distal features in the orderly way the view predicts. The difference between the case against externalism about qualitative character and the case against externalism about quantitative character is that the latter is not constituted principally out of these failures of correspondence; quantitative characters are subject to “coincidental covariation,” as Pautz (2015) calls it, and salience attribution is a competing explanation for quantitative character that has no counterpart for qualitative character.
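
For reference, the two psychophysical models mentioned in note 1 are conventionally written along the following lines. This is a sketch in standard textbook notation, not notation taken from this paper: the symbols L (loudness), I (sound intensity), I_0 (a reference intensity), the constants k and c, and the exponent α are all assumptions introduced here for illustration, and the value α ≈ 0.3 is only the commonly cited approximate exponent for loudness as a function of intensity.

```latex
% Conventional textbook forms; all symbols are illustrative assumptions.
% L: loudness, I: sound intensity, I_0: reference intensity,
% k, c: scaling constants, \alpha: power-law exponent.
\begin{align}
  L_{\text{Fechner}} &= k \,\log\!\left(\frac{I}{I_0}\right) \\
  L_{\text{Stevens}} &= c \, I^{\alpha}
\end{align}
% Effect of a 10 dB increase, i.e. multiplying I by 10:
\begin{align}
  L_{\text{Fechner}}(10\,I) - L_{\text{Fechner}}(I) &= k \log 10 \\
  \frac{L_{\text{Stevens}}(10\,I)}{L_{\text{Stevens}}(I)} &= 10^{\alpha} \approx 2
  \quad \text{for } \alpha \approx 0.3
\end{align}
```

On the logarithmic form, each tenfold increase in intensity adds a fixed increment of loudness; on the power-law form, it multiplies loudness by a fixed factor (roughly a doubling under the assumed exponent). Both forms treat loudness as a function of intensity alone, which is the assumption the paper goes on to question.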

Author information

Corresponding author

Correspondence to Kim Soland.

Ethics declarations

Conflict of interest

The author has no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

I thank Joe Levine, Louise Antony, Hilary Kornblith, Lisa Sanders, and Miles Tucker for thoughtful discussions on the issues here raised. I also thank two anonymous referees for their helpful commentary on an earlier version of this paper. I am indebted to Joe Levine for his comments and criticisms at each stage of this paper’s development.

About this article

Cite this article

Soland, K. Does loudness represent sound intensity? Synthese 200, 100 (2022). https://doi.org/10.1007/s11229-022-03665-3

  • DOI: https://doi.org/10.1007/s11229-022-03665-3
