1 Introduction

In contrast with remembering or imagining, a defining characteristic of perceptual experience is that it affords us with an awareness of how the world is now,Footnote 1 as we experience it. Even perceptual hallucinations and illusions seem, albeit erroneously, to be experiences of something that is occurring or unfolding at the time they are experienced rather than at some point in the past or future. This is as true of experience in each individual sense-modality — vision, touch, taste, smell, etc. — as it is of perception as a whole.Footnote 2 Nevertheless, each modality detects and processes stimuli on a slightly different timescale and is subject to differing delays in transmission, transduction, communication and processing. Sounds, for example, take longer to reach the ears than light does to reach the eyes. Conversely, auditory processing is typically faster than visual processing (Vroomen and Keetels 2010: 871) and has a higher temporal resolution. The conduction of nerve impulses from tactile stimuli, on the other hand, is much slower, with the amount of time these take to travel throughout the body varying in proportion to distance (ibid.). This creates a prima facie difficulty in explaining how, despite these variations across the senses, we seem to experience a unified and coherent perceptual ‘now’. Indeed, as yet there is no consensus as to precisely how the brain solves this temporal binding problem (§2), or whether stimuli from different sensory modalities are in fact unified or ‘bound’ together as occurring simultaneously.Footnote 3

One mechanism that has been proposed in response to this problem is that perceptual processing is divided into a series of discrete ‘temporal windows’, each lasting a short, but measurable period of objective time — typically between 30 and 60 ms. These minimal processing units, or “functional moments” (Pöppel 1970), are claimed to co-ordinate and structure the processing of sensory information irrespective of the originating modality. On the simplest version of this view, stimuli that fall within the same temporal window are experienced as occurring simultaneously, while stimuli that fall within successive windows are experienced as occurring before or after one another, depending on the order of the relevant temporal windows. The view thus proposes that (a) our conscious experience of simultaneity and temporal order across multiple sense-modalities can be explained in terms of (b) the temporal structure of perceptual processing in the brain. Specifically, it takes this processing to be segmented into discrete and successive intervals of objective time.

In this paper I evaluate two of the main sources of evidence for the temporal windows hypothesis, namely periodicity in reaction times (§3) and inter-sensory binding (§4). I argue that in each case, though the existence of such effects is suggestive, it does not establish the version of the hypothesis that its original proponents (e.g. Pöppel 1970, 1997, 2009) favour. Indeed, the evidence is compatible with the existence of many such windows, each differing in its temporal properties and functional role. While this is not in itself a novel view,Footnote 4 it suggests a need for greater precision in defining and describing various types of temporal windows. To this end, I propose a partial taxonomy of temporal window types and their characteristics which facilitates a fuller description of their nature and interrelations (§5). Finally, I evaluate some of the philosophical implications of the kind of multi-window models of perceptual processing that better accommodate the available empirical evidence (§6). This in turn suggests fruitful avenues for future empirical and philosophical work that bears not only upon the temporal structure of perceptual processing, but our understanding of mental states or processes more generally (cf. Herzog et al. 2020).

2 The Temporal Binding Problem

The perception of external stimuli via different sensory modalities as simultaneous or successive must take into account delays in: (i) transmission, e.g. of light to the eyes or sound to the ears; (ii) transduction via the relevant sensory surfaces; (iii) communication via the nervous system; and (iv) processing by the brain. Each of these processes takes a measurable amount of objective time, with the precise duration depending on the sense-modality and stimulus type in question. In auditory perception, for example, sound waves take time to reach the ears (transmission), which convert their mechanical energy into electrical impulses (transduction) that are then communicated via the auditory nerves to the auditory cortex where they are processed. Similarly for vision, olfaction, and so on. The resulting temporal resolution and processing lag therefore differ between modalities, such that events that are objectively simultaneous in the world are not necessarily simultaneous at the skin or sensory surfaces, nor are they processed simultaneously within the brain (cf. Harrar et al. 2016).

This raises a puzzle concerning how experiences of simultaneity, succession and duration are possible across multiple sensory modalities, and in what sense we can be said to perceive the world as it is ‘now’ via multiple senses, rather than as being ‘smeared out’ across a range of different times. Call this the temporal binding problem (cf. Pöppel and Bao 2014).

2.1 Temporal Binding

The phenomenon of temporal binding concerns the “grouping together of separate events occurring at different time points into one coherent and meaningful event sequence” (Buehner 2010: 202; emphasis removed). This is typically operationalised by psychologists as occurring when two or more stimuli are subjectively experienced as closer together in time than they actually are.Footnote 5 For example, a sound occurring 100 ms before or after a flash causes the flash to be perceived 5 ms earlier or later than it really is, respectively; the sound ‘drags’ or “ventriloquizes” the apparent timing of the flash (Vroomen and de Gelder 2004; Chen and Vroomen 2013). This temporal binding effect can occur both within and across sensory modalities such that stimuli experienced via two or more modalities can seem to occur simultaneously despite differences in onset, detection, and processing times.

In practice, the subjectively experienced timing and sequence of events can differ from their objective timing and sequence due to a variety of factors and effects. These include temporal ventriloquism (ibid.), cross-modal influences such as the McGurk effect (McGurk and MacDonald 1976), and adaptation or “temporal recalibration” (Vroomen and Keetels 2010: 878). The prevalence of such temporal effects suggests the existence of a mechanism, or mechanisms, that group together events that are likely to be causally or semantically related. By analogy with feature integration theory (Treisman and Gelade 1980), events may be integrated, or “bound”, by attributing them to a common source, as in the case of inter-sensory binding (§4). Alternatively, they may be linked in virtue of, for example, falling within a certain minimum interval or processing cycle (§3).

Crucially, the temporal binding problem cannot be solved by simply offsetting or delaying the content of each sensory channel to compensate for transduction and processing delays. In many cases, the relevant delay will vary depending on a range of factors, such as the distance of the source object (in audition), bodily location (in touch), and stimulus intensity. Moreover, introducing unnecessary delays into perceptual processing in order to give time to process all the data together would slow it down and decrease temporal resolution to the point where it is no longer useful for accurately planning and tracking bodily actions. Nor is the problem simply a matter of temporal precision or resolution, though this presumably also plays a role, with audition and vision, for example, yielding more precise and so higher resolution representations of temporal information than, say, taste or smell. The problem is that without some method or principle of alignment, the information in different sensory processing streams is not directly comparable, since the times of detection and/or processing do not accurately track the objective temporal order of external events, at least at the early stages of perceptual processing.Footnote 6 What is needed, it would seem, is a common frame of reference.

2.2 The Temporal Windows Hypothesis

According to what I will call the temporal windows hypothesis, events are experienced as simultaneous or otherwise closer together in time in virtue of falling within the same processing cycle, or ‘temporal window’. The duration and timing of these windows are thought to be determined by the relevant brain rhythm — e.g. the alpha rhythm (Cecere et al. 2015; Bastiaansen et al. 2020) — which may be reset by the onset of some salient task-relevant stimulus. Thus, instead of the processing that gives rise to perceptual experience being mathematically continuous and infinitely divisible, it consists of a series of discrete non-overlapping units, or “functional moments” (Pöppel 1970), arranged successively in objective time. Events that fall within the same temporal window are experienced as, or judged to be, occurring simultaneously.Footnote 7 Events that fall within different temporal windows are experienced as, or judged to be, occurring before or after one another, depending on the order of their respective temporal windows. Moreover, temporal windows may be used to explain the perception of felt duration, with events occupying sub-fractions or multiples of the overall window duration (Merino-Rajme 2014).
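The core rule of the hypothesis can be made concrete with a minimal sketch. It assumes windows of a fixed period counted from the most recent phase reset, and it implements only the simplest reading on which same-window events are judged simultaneous. The function names, the 30 ms period and the event times are illustrative assumptions, not values drawn from any of the cited studies.

```python
import math


def window_index(event_ms: float, period_ms: float, reset_ms: float = 0.0) -> int:
    """Index of the discrete window containing an event.

    Windows are counted from the most recent phase reset (reset_ms),
    e.g. the onset of a salient stimulus. All names are hypothetical.
    """
    return math.floor((event_ms - reset_ms) / period_ms)


def judged_relation(a_ms: float, b_ms: float, period_ms: float,
                    reset_ms: float = 0.0) -> str:
    """Judged temporal relation of event a to event b on the simplest view."""
    index_a = window_index(a_ms, period_ms, reset_ms)
    index_b = window_index(b_ms, period_ms, reset_ms)
    if index_a == index_b:
        return "simultaneous"
    return "before" if index_a < index_b else "after"


# Two events 20 ms apart inside one 30 ms window are judged simultaneous.
print(judged_relation(5, 25, period_ms=30))
# Two events only 10 ms apart that straddle a boundary are judged ordered.
print(judged_relation(25, 35, period_ms=30))
# A phase reset at 20 ms moves the boundary, so the same pair is now bound.
print(judged_relation(25, 35, period_ms=30, reset_ms=20))
```

On this sketch the judged relation depends on where the window boundaries fall as well as on the objective interval between events, which is why the phase reset matters to the hypothesis.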

The temporal windows hypothesis explains temporal binding in terms of the processing of relevant stimuli being grouped within the same temporal window, which in turn determines the perceived timing of events. This offers an attractively simple solution to the temporal binding problem: irrespective of the originating sense-modality, perceptual stimuli are grouped together, or ‘bound’, into a series of discrete temporal windows, thereby enabling the perception of synchrony and temporal order across modalities. Combined with appropriate corrections to compensate for processing lag, the hypothesis thus gives a unified and reductive account of temporal binding and synchrony perception across the senses on the basis of a single theoretical posit: the temporal window.

This hypothesis also has implications for the metaphysics of perceptual experience. That the experienced order of events depends directly upon the contents of these windows, and only indirectly upon the objective order of events, makes the temporal windows hypothesis a candidate for an intentional, aka “retentional” (Dainton 2018), view of temporal experience according to which temporal properties are represented by, or within, individual processing cycles. This contrasts with extensional views, such as Naïve Realism (ibid.), according to which the temporal properties of subjective experience are, in the normal case, directly inherited from or identical to the objective temporal properties of external events.Footnote 8 Though it is not the aim of this paper to explore the implications of this metaphysical distinction — something that has been done extensively elsewhereFootnote 9 — it is significant for the philosophy of perception, as well as temporal experience more generally, that an empirical hypothesis about the nature of sensory processing may have some bearing upon these longstanding debates (cf. Callender 2008, 2017; Montemayor 2012; Herzog et al. 2020).

This topic also bears upon philosophical questions concerning the continuity (or otherwise) of experience (Dainton 2014), its process- or state-like nature (Steward 1997; Soteriou 2013), and the nature and duration of the psychological or ‘specious’ present (James 1890; Hoerl 2013). Many of the philosophers participating in these debates, however, have tended to abstract away from the multisensory nature of perceptual experience and the neural processing that underlies it, instead treating experience as a uniform and homogenous whole.Footnote 10 But if the temporal windows hypothesis is correct, these details are of crucial importance to explaining our experience of a unified and integrated perceptual ‘now’, as well as describing the operation of the causal mechanisms responsible for structuring experience over time (cf. Cecere et al. 2015). It is therefore incumbent upon philosophers of perception and mind to better understand these psychological phenomena and abstractions in order to evaluate their implications — an issue I return to below (§6).

3 Periodicity

Perhaps the strongest evidence for the existence of discrete cycles in perceptual processing comes from the discovery of periodicities in reaction times (RTs) for perceptual tasks. A 1993 study by Dehaene provides a particularly clear illustration.Footnote 11 In a visual conjunction task, Dehaene observed periodicities of approx. 30 ms across multiple subjects. RTs were measured to an accuracy of 1 ms. To avoid statistical artefacts, the raw data were analysed using Fast Fourier Transforms to extract cyclical patterns. Dehaene found that “responses are not distributed randomly with respect to stimulus presentation, but are emitted more frequently at regularly recurring time intervals after the stimulus first appeared” (ibid. 267). That is, participants were significantly more likely to respond at multiples of 30 ms — e.g. 210, 240 or 270 ms — after the initial stimulus than they were at other times — e.g. 200, 230 or 250 ms — suggesting the existence of a periodic processing cycle.
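The logic of extracting a periodicity from RTs can be illustrated with a small sketch. Note that it does not reproduce Dehaene's analysis: it swaps the Fast Fourier Transform for a direct Fourier sum evaluated at a list of candidate periods, and it runs on synthetic RTs constructed for the purpose (clusters 30 ms apart with a few milliseconds of jitter). The function names, the jitter pattern and the 210 ms starting point are all illustrative assumptions.

```python
import cmath
import math


def periodicity_power(rts_ms, period_ms):
    """Squared magnitude of the Fourier sum of the RTs at one candidate
    period, divided by the number of responses."""
    total = sum(cmath.exp(-2j * math.pi * rt / period_ms) for rt in rts_ms)
    return abs(total) ** 2 / len(rts_ms)


def dominant_period(rts_ms, candidate_periods_ms):
    """Candidate period at which the RTs cluster most strongly."""
    return max(candidate_periods_ms,
               key=lambda period: periodicity_power(rts_ms, period))


# Synthetic data (NOT Dehaene's): responses cluster at 210, 240, 270 and
# 300 ms after stimulus onset, each cluster spread by a fixed jitter pattern.
JITTER_MS = [-3, -2, -1, -1, 0, 0, 0, 1, 1, 2, 3]
synthetic_rts = [210 + 30 * cycle + jitter
                 for cycle in range(4) for jitter in JITTER_MS]

best = dominant_period(synthetic_rts, range(10, 61))
print(best)
```

Because every RT is measured from stimulus onset, the sum is only large if the clusters share a common phase across trials. Without a phase reset at stimulus onset the contributions from different trials would tend to cancel.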

The resulting periodicities were only apparent when the data were analysed on a per-subject basis, with individual subjects showing “substantially different oscillation periods” (ibid. 266). That the effect was not apparent when multiple subjects’ data were averaged together (ibid. 268) helps to explain why such periodicities are not more widely observed, and rules out their being an artefact of the experimental setup or analysis. The duration of the periodicity also varied between tasks, with auditory tasks yielding shorter cycles than visual ones, and complex conjunction tasks yielding longer cycles than simpler feature detection tasks in both modalities (see below). Nevertheless, Dehaene’s analysis identifies both fixed and variable portions of RTs, concluding that “a response is generally initiated after four to seven processing cycles”, a feature which “remains remarkably constant across variations in task difficulty” (ibid. 267). With a duration of 225±40 ms, the fixed portion “can be tentatively ascribed to stimulus transduction and motor response” (ibid.), while the variable portion varies on a periodic basis. Moreover, as Pöppel (1970) also observed, these results cannot be explained in terms of a free-running oscillator, but rather “the phase of the oscillation must be reset on each trial” (Dehaene 1993: 267) by the onset of the experimental stimulus, since if this were not the case then no periodicity would be apparent.

3.1 Simple Views

Dehaene’s study and others like it, however, are open to multiple interpretations. One possibility is that the timing of, and interaction between, sensory processes is co-ordinated by a central ‘master clock’ with a frequency of around 30 Hz, yielding a temporal window duration in the order of 30–40 ms. Call this the Simple View of perceptual processing, or sv for short. This is the view of Pöppel (1970), though he has since adopted a hierarchical model (Pöppel 1997, 2009). But while sv is consistent with Pöppel and Dehaene’s findings for visual tasks, the periodicity for auditory tasks is closer to 80 Hz, giving a temporal window period of just 12 ms (Dehaene 1993: 268). More challenging tasks or combinations of modalities yield temporal windows of other durations, calling into question the whole idea of a unified “central intermittency” (Pöppel 1970).
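The window durations quoted here follow from the reciprocal relation between the frequency of a cycle and its period. As a worked check, using only the frequencies given above:

```latex
T = \frac{1}{f}, \qquad
T_{30\,\mathrm{Hz}} = \frac{1}{30\ \mathrm{s^{-1}}} \approx 33.3\ \mathrm{ms}, \qquad
T_{80\,\mathrm{Hz}} = \frac{1}{80\ \mathrm{s^{-1}}} = 12.5\ \mathrm{ms}
```

The first value falls within the 30–40 ms range attributed to sv, while the second is the figure that the text rounds to 12 ms for auditory tasks.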

Moreover, it is unclear that periodicities in RTs are wholly attributable to perceptual processing. Dehaene’s experimental tasks require subjects to perform a physical action, such as pressing a button, in response to conscious awareness of sensory stimuli. The resulting periodicities might therefore be due to cyclical processing in the motor system or consciousness itself as opposed to being purely perceptual. Indeed, observation of periodicities in motor tasks suggests that these too involve some form of cyclical processing (Reimer and Hatsopoulos 2010; Buzsáki 2006). This goes against Dehaene’s hypothesis that the fixed portion of the RT is due to transduction and motor response with the variable, and so periodic, portion being due to perception. Alternatively, both systems may involve cyclical processes of similar or differing durations, making it difficult to attribute the resulting periodicity to perception alone.

Even if sv were correct, however, as characterised above the view fails to differentiate between competing hypotheses concerning perceptual processing within and across the senses. These concern whether events that are processed at different points within the same temporal window are experienced as (1) simultaneous, (2) standing in no defined temporal order, or (3) standing in a defined temporal order with respect to one another.Footnote 12 We can characterise the corresponding variants of sv as follows:

  • sv1 Events that fall within the same temporal window are experienced as simultaneous.

  • sv2 Events that fall within the same temporal window may be experienced as simultaneous or non-simultaneous, but not as before or after one another.

  • sv3 Events that fall within the same temporal window may be experienced as either (i) simultaneous with, (ii) before, or (iii) after one another.

Advocates of sv, including Pöppel (1970), typically assume sv1, but Dehaene’s data may be better accommodated by sv2 or sv3. These yield an explanation of temporal binding in which events can be perceived as being closer together in time than they actually are, while leaving open that there may be a lower bound upon the detection of simultaneity, as opposed to temporal order, at least in certain modalities.
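The difference between the three variants can be laid out as a simple lookup of which experienced relations each one permits for two events within the same temporal window. This is only a restatement of the definitions above in code, and the identifiers are hypothetical labels introduced for illustration.

```python
from enum import Enum


class Relation(Enum):
    SIMULTANEOUS = "simultaneous"
    NON_SIMULTANEOUS_UNORDERED = "non-simultaneous, no defined order"
    BEFORE = "before"
    AFTER = "after"


# Relations that each variant allows for two events in the SAME window.
PERMITTED_WITHIN_WINDOW = {
    "sv1": {Relation.SIMULTANEOUS},
    "sv2": {Relation.SIMULTANEOUS, Relation.NON_SIMULTANEOUS_UNORDERED},
    "sv3": {Relation.SIMULTANEOUS, Relation.BEFORE, Relation.AFTER},
}


def is_permitted(variant: str, relation: Relation) -> bool:
    """Whether a variant allows the relation for same-window events."""
    return relation in PERMITTED_WITHIN_WINDOW[variant]


# sv2 allows experienced non-simultaneity without experienced order.
print(is_permitted("sv2", Relation.NON_SIMULTANEOUS_UNORDERED))
print(is_permitted("sv2", Relation.BEFORE))
```

Laid out this way, sv1 is the most restrictive variant, and only sv2 allows events to be experienced as non-simultaneous without any experienced order.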

The sv2 variant in particular fits well with data from studies comparing temporal order judgements (ToJ) with simultaneity judgements (SJ), which suggest that the latter have a shorter, and so more precise, threshold than the former (Vroomen and Keetels 2010). Thus, subjects can correctly perceive certain pairs of stimuli as being non-simultaneous despite being unable to reliably report which stimulus came first (ibid. 872). Numerous explanations have been proposed for this discrepancy,Footnote 13 but it is possible that subjects simply use different criteria for reporting simultaneity and temporal order. On this view, subjects are more inclined to report events as being non-simultaneous even if they are unsure of their temporal order than they are to report one event as occurring before or after another, despite the former logically entailing the latter. Furthermore, the experimental data only show that subjects are at chance in reporting the experienced order of events, not that they fail to experience them as standing in any particular order. It is therefore possible that the discrepancy between ToJ and SJ thresholds is attributable to an operational failure, e.g. of short-term memory or other epistemic factors that affect subjects’ abilities to accurately access or report their experiences, rather than a difference in the nature or content of experience itself.

A more pressing problem for all three variants of sv, however, is that the threshold for perceived simultaneity differs between sensory modalities (ibid. 874). Auditory stimuli as little as 10 to 15 ms apart, for example, are typically experienced as distinct sounds rather than as a single event, suggesting a much shorter processing window for audition than for vision. Indeed, this is precisely what Dehaene’s study suggests, with auditory feature detection tasks exhibiting a periodicity of between one half and one third the duration of the periodicity in comparable visual tasks, though the visual periodicity is not a straightforward multiple of the auditory periodicity.

While these results can to some extent be accommodated by sv2 or sv3, since these views allow for a degree of internal temporal structure, this cannot explain the variations in periodicity between tasks of differing complexity, or which involve different sense-modality pairings. Rather, it suggests that far from being a central feature of sensory processing, as adherents of sv claim, temporal window duration may be modality- or even task-specific.

3.2 Modality-Specific Views

One response to the above objection to sv would be to posit that the relevant cycles lie in the early stages of perceptual processing and so are modality-specific. We can thus imagine versions of sv1 through sv3 in which events in each individual sense-modality that fall within the same temporal windows are experienced as either (1) simultaneous, (2) simultaneous or non-simultaneous, but not in any defined order, or (3) simultaneous, before or after one another. Call these variants mv1 through mv3, respectively. Despite better accommodating the evidence, however, the resulting views have less explanatory power than sv since they fail to explain the perception of synchrony across multiple senses. Thus, while this would go some way towards explaining temporal binding within sensory modalities, e.g. vision, it does not seem to offer any particular advantage over non-temporal window based views in inter-modal cases.

Furthermore, if each sense-modality has a different temporal window length, then mv1 and mv2 arguably make solving the temporal binding problem even harder, since events that are experienced as simultaneous or as standing in no defined temporal order in one sense-modality, e.g. audition, may well be experienced as standing in different temporal relations to events experienced in another sense-modality, e.g. vision.Footnote 14 Given that explaining how events in different sensory modalities can be experienced as simultaneous or successive is one of the motivations for appealing to the notion of a temporal window in the first place, it is difficult to see how this represents an improvement upon alternative views, or how it explains the periodicities in Dehaene’s and others’ data for cross-modal tasks. As a solution to the temporal binding problem, then, modality-specific views look to be a non-starter. They can, however, form a component of a more comprehensive multi-window account, as discussed in §5.

4 Inter-Sensory Binding

The second main source of evidence for the existence of temporal windows comes from inter-sensory binding. It is well-known that the brain uses information from multiple sense-modalities to improve the accuracy of spatial perception (e.g. Welch and Warren 1980). Subjects are typically more accurate in spatial tasks that involve, for example, visual and tactual cues than they are in comparable visual- or tactual-only tasks, particularly where one or both signals are noisy. This is known as the multisensory enhancement effect. If stimuli from multiple modalities are integrated or ‘bound’ together in time, one might expect a comparable enhancement in temporal accuracy.

In fact, RTs in multisensory tasks are known to improve upon performance in any one of the contributing modalities alone (Miller 1986). This rules out a simple ‘race model’ in which subjects react as soon as the first modality is detected in favour of a model where the additional modalities contribute to and improve upon task performance. The precise mechanism for this is as yet unknown,Footnote 15 but a small yet significant temporal enhancement effect has been demonstrated by Harrar et al. (2016). Several features of this study are significant for the current debate.

4.1 Window of Simultaneity

In a time-sensitive audiovisual task, Harrar et al. demonstrated an improvement in performance for stimuli experienced via multiple modalities as compared with any one modality alone (ibid.). Crucially, this temporal enhancement effect was most pronounced for stimuli that were objectively simultaneous, i.e. simultaneous in the world, as opposed to at the sensory surfaces or in the brain. This shows that the brain is capable of detecting and compensating for transmission, transduction and processing delays in order to accurately determine whether events are truly simultaneous across multiple sensory modalities, as opposed to merely being detected or processed simultaneously. While the precise mechanism for this is unknown, it suggests the existence of a sophisticated and finely-tuned system for the detection of objective simultaneity at a relatively early stage of perceptual processing. Moreover, the temporal window for this effect is no longer than 10 ms, and so has a much higher resolution than the kinds of temporal window discussed above. Indeed, the actual window could be even shorter since this was at the limit of the resolution of Harrar et al.’s data, suggesting the existence of a lower bound upon the detection of simultaneity across, and possibly within, sensory modalities independently of the periodicity considerations discussed in §3.

There is as yet, however, no evidence to suggest that these ‘windows of simultaneity’ are successive in time (i.e. discrete), as opposed to a continuously rolling threshold or “sliding window” (Doerig et al. 2019) for the detection of simultaneity. The existence of such a threshold would be neither controversial nor, I would argue, of great significance for the ontology of perceptual experience as one would naturally expect the neural mechanisms responsible for processing and comparing signals across sensory channels to have some finite resolution. Harrar et al., however, go on to argue that the brain actively works to bring the temporal order of events at later stages in the processing stream into closer alignment with the order of external events. Since such processing already operates at a near-optimal rate, this appears to be achieved by slowing down some of the faster elements of, for example, auditory processing, which is typically faster than visual processing, while simultaneously speeding up the slower aspects of visual processing.Footnote 16 This brings the time of processing events in the brain into closer alignment with the objective timing of events in the world, thereby helping to offset transmission and processing delays.Footnote 17
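The distinction drawn at the start of this passage, between discrete successive windows and a continuously rolling threshold, can be sketched as two different tests of simultaneity. The 10 ms width is borrowed from the upper bound reported above purely for illustration, and both function names are hypothetical.

```python
import math


def same_discrete_window(a_ms: float, b_ms: float, width_ms: float) -> bool:
    """Discrete model: two events are bound only if they fall in the same
    fixed, successive window, wherever its boundaries happen to lie."""
    return math.floor(a_ms / width_ms) == math.floor(b_ms / width_ms)


def within_sliding_threshold(a_ms: float, b_ms: float, width_ms: float) -> bool:
    """Sliding model: two events are bound whenever the interval between
    them is no greater than the threshold, regardless of absolute timing."""
    return abs(a_ms - b_ms) <= width_ms


WIDTH_MS = 10  # illustrative value only

# The same 4 ms interval, presented at two different absolute times.
for a_ms, b_ms in [(2, 6), (8, 12)]:
    print(a_ms, b_ms,
          same_discrete_window(a_ms, b_ms, WIDTH_MS),
          within_sliding_threshold(a_ms, b_ms, WIDTH_MS))
```

The sliding threshold treats both pairs alike, whereas the discrete model separates the second pair because it straddles a window boundary. A dependence of judged simultaneity on the phase of stimulus presentation is thus the kind of evidence that would distinguish the two models.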

4.2 Window of Multisensory Integration

While the temporal enhancement effect supports the existence of a sub-10 ms window of simultaneity at a relatively early stage of perceptual processing, other cross-modal phenomena suggest the existence of a longer duration window. We can tolerate delays between vision and speech, for example, in the region of 75–125 ms before the relevant stimuli no longer seem to originate from the same source object (Vroomen and Keetels 2010: 874). Consider, for example, watching a video with a slightly desynchronised audio track. Up to a point, the voice seems to come from the mouth of the person who appears to be speaking on screen. After that point, however, one no longer seems to experience a single object or event, but rather two distinct sources: one visual, the other auditory. Similarly, the McGurk effect, in which seen lip movements affect what subjects seem to hear, only occurs when audio and visual stimuli fall within a limited temporal window (McGurk and MacDonald 1976). Though speech perception may be something of a special case, similar effects occur when a single stimulus is presented to other senses non-synchronously — in taste and smell, for example (Lim and Johnson 2011: 288).

Unlike periodicity effects, which vary between the senses, and indeed between tasks, the window for this kind of sensory binding appears to be relatively stable across modalities. This suggests the existence of a common mechanism that enables stimuli detected via multiple modalities to be attributed or ‘bound’ to the same source object or event (cf. O’Callaghan 2017). With a duration of approximately 100 ms, the resulting ‘window of multisensory integration’ is much longer than any of those considered so far, and so unlikely to be a result of the same mechanism responsible for periodicity effects. Nor are the events within a single window perceived as simultaneous, as per sv1. It does, however, share some other features of sv, and in particular sv3, which allows for an internal temporal ordering of events within each temporal window.

Crucially, the window of multisensory integration appears to be periodic rather than a sliding threshold. The strongest evidence for this comes from the sound-induced flash illusion (sifa) in which subjects are shown simultaneous audiovisual stimuli — a flash and a beep — followed by an auditory-only stimulus, i.e. a second beep. Many subjects report the experience of an illusory second flash at the same time as the second auditory stimulus (Shams et al. 2000). This is thought to be caused by the erroneous categorisation of the first and second stimuli as events of the same type.Footnote 18

Importantly, sifa only occurs when the first and second stimuli are presented in rapid succession. Cecere et al. (2015) hypothesised that the illusory flash is only experienced when both the initial audiovisual and subsequent auditory-only stimuli fall into the same cycle of the brain’s alpha rhythm, which has a frequency of 10–12 Hz. Conversely, when the second stimulus falls into a subsequent alpha cycle, no illusory flash is reported. By experimentally manipulating the timing of the alpha cycle via a salient stimulus, Cecere et al. were able to predict the occurrence of the illusory second flash with a reasonable degree of accuracy — a finding since replicated by Keil and Senkowski (2017). This suggests that (i) inter-sensory binding occurs on a periodic or discrete basis, rather than as a continuously sliding window, and (ii) the effects of inter-sensory binding apply only within a single alpha cycle, and do not persist into subsequent cycles. The alpha rhythm also slows down with age, lengthening the duration of the resulting temporal window (Surwillo 1961). A similar age-related effect has been shown in studies of sifa (McGovern et al. 2014), suggesting that both effects may well be tied to the same periodic rhythm.

5 A Partial Taxonomy

Having reviewed some of the evidence for the existence of temporal windows, it is clear that this does not support the simple view of perceptual processing (sv) according to which there is just one multimodal window that co-ordinates perceptual processing. Specifically, sv1 through sv3 are inconsistent with the evidence that periodicity in RTs varies between modalities and tasks. Moreover, modality-specific views such as mv1 through mv3 do not explain the experience of simultaneity and succession across multiple modalities, nor the multisensory enhancement or inter-sensory binding effects discussed in §4. However, this does not preclude them from forming a component of a more complex multi-window model (cf. Montemayor 2013). Indeed, consideration of cross-modal phenomena, such as sifa, strongly suggests the existence of multiple such windows with differing durations, functional roles, and other characteristics. To date, however, there has been little attempt to systematically categorise these windows, resulting in a lack of precision in the scientific literature concerning the existence and nature of temporal windows and their relation to perceptual experience. Indeed, the terms ‘temporal window’ and ‘window of integration’ are themselves ambiguous, and used more or less interchangeably to identify a variety of different phenomena, from a simple rolling threshold or degree of tolerance to the existence of discrete processing cycles within which certain functions, such as inter-sensory binding, are performed.

In this section I propose a basic taxonomy or framework of possible temporal window types and their characteristics that aims to accommodate the multiplicity of timescales and ways in which perceptual processing occurs (§5.1), along with how the resulting windows are interrelated (§5.2). The taxonomy is partial, and I do not propose or defend any particular model of temporal processing or the neural mechanisms that underpin it. Rather, by setting out a range of properties and distinctions, the taxonomy aims to facilitate more precise characterisation and investigation of such models, as well as highlighting important areas for future empirical and philosophical research.

5.1 Properties of Temporal Windows

We can define a temporal window as the distinct timescale or interval of objective time over which a given neural process, or set of processes, operates.Footnote 19 In addition to their varying roles in perceptual processing (see below), for each window type we can identify a range of additional properties, or dimensions along which it may vary. These include:

  (1) Period: the objective duration, or range of durations, of the relevant window type.Footnote 20

  (2) Uniformity: whether the window is of fixed or variable length; e.g. depending upon task or modality.

  (3) Periodicity: whether the window (i) consists of discrete cycles that occur successively in objective time, (ii) is a rolling tolerance or threshold (i.e. a ‘sliding window’), or (iii) is non-recurrent.

  (4) Resettability: periodic windows may be resettable by some other factor, such as a salient stimulus or the deployment of attention, or monotonic, i.e. non-resettable.Footnote 21

  (5) Modality: whether the window is (i) unimodal, i.e. relates to a single sensory modality or channel, (ii) primarily associated with one sense-modality or channel, but with cross-modal influences from other modalities, (iii) multimodal, i.e. relates to multiple sense-modalities or channels, or (iv) amodal, i.e. not associated with any particular sense-modality or channel.

  (6) Temporal structure: whether events falling within the same temporal window are experienced as (i) synchronous, (ii) either synchronous or non-synchronous, but without standing in any defined temporal order, or (iii) standing in some defined temporal order.

  (7) Tense: for temporally structured windows, events may be experienced as either (a) tensed, i.e. having A-theoretical properties of past, present and/or future, or (b) tenseless, i.e. having B-theoretical before/after properties.Footnote 22

The above list is not exhaustive, but enables various fine-grained distinctions to be drawn between possible temporal window types, avoiding the potential for conflation and terminological confusion. This in turn enables a more precise characterisation and classification of the temporal windows posited by various models of temporal perception. The various windows discussed in §§3–4, along with a representative sample of others from the philosophical literature, for example, may be more accurately characterised as per Table 1, arranged in order of approximate duration (the numbered columns correspond to the features listed above).

Table 1 Examples of temporal window types

Of course, not all of these windows exist. The multimodal window of simultaneity proposed by Harrar et al. and the unimodal simultaneity windows of Montemayor (2013), for example, are clearly intended as alternatives. Similarly, the window of multisensory integration and window of conscious perception (Herzog et al. 2020), both of whose contents are consciously accessible, and the ‘sensorial present’ (Montemayor 2013), which is unconscious, offer competing explanations of inter-sensory binding. Pöppel and Bao (2014), on the other hand, propose a two- to three-second window to explain various higher-level cognitive effects such as the experience of perceptual presence and conscious thought, closely corresponding to Montemayor’s ‘phenomenal present’, which operates on a similar timescale, and Wittmann’s (2011) ‘experienced moment’. These in turn are compatible with Merino-Rajme’s (2014) quantum theory of duration perception, for which no specific duration of the perceptual quanta — a form of temporal window — is given. This highlights an emerging consensus concerning the role of temporal windows in conscious experience. The details of this, however, are difficult to assess since the precise characteristics of the relevant temporal windows are either underspecified or unknown, as highlighted by the question marks in Table 1.

This highlights the need for additional empirical research to ascertain, for example, whether windows of simultaneity are discrete or rolling, and uni- or multimodal. Indeed, the distinction between discrete and rolling windows has particular significance for the metaphysics of experience (§6), but is often left implicit by those writing about temporal windows, a term that should arguably be reserved for the former. This in turn highlights the need for greater clarity in both philosophical and empirical research to spell out the precise details of proposed models for the temporal structure of experience.
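One way to see how the seven properties of §5.1 discipline the description of a proposed window is to treat them as fields of a record, with unknown or underspecified values left empty. The following is a minimal sketch under that assumption. All type and field names are hypothetical, and the example entry is purely illustrative: it is not a row of Table 1, and its only figure is the 30–60 ms range mentioned in the Introduction.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional, Tuple


class Periodicity(Enum):
    DISCRETE = "discrete cycles"
    SLIDING = "rolling tolerance or threshold"
    NON_RECURRENT = "non-recurrent"


class Modality(Enum):
    UNIMODAL = "unimodal"
    CROSS_MODAL = "one modality with cross-modal influences"
    MULTIMODAL = "multimodal"
    AMODAL = "amodal"


class TemporalStructure(Enum):
    SYNCHRONOUS = "synchronous"
    UNORDERED = "no defined temporal order"
    ORDERED = "defined temporal order"


class Tense(Enum):
    TENSED = "A-theoretical (past/present/future)"
    TENSELESS = "B-theoretical (before/after)"


@dataclass(frozen=True)
class WindowType:
    """A candidate temporal window; None marks an unspecified property."""
    name: str
    period_ms: Optional[Tuple[float, float]] = None   # (1) min and max duration
    uniform: Optional[bool] = None                     # (2) True = fixed length
    periodicity: Optional[Periodicity] = None          # (3)
    resettable: Optional[bool] = None                  # (4)
    modality: Optional[Modality] = None                # (5)
    structure: Optional[TemporalStructure] = None      # (6)
    tense: Optional[Tense] = None                      # (7)

    def unspecified(self) -> List[str]:
        """Names of the properties a model leaves open."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    # Illustrative entry only; the values are assumptions, not Table 1 data.
    example = WindowType(
        name="illustrative window",
        period_ms=(30.0, 60.0),
        periodicity=Periodicity.DISCRETE,
    )
    print(example.unspecified())
```

Listing the empty fields in this way plays the same role as the question marks in Table 1: it makes explicit which characteristics a given proposal has yet to settle.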

5.2 Inter-Window Relations

The identification of multiple temporal windows does not yet settle the question of how, if at all, these windows are related. Here we can identify a further range of possible inter-window relations, including:

  (A) Synchronisation: whether temporal windows of different types or functions are either free-floating or synchronised, e.g. phase-locked, with respect to one another.

  (B) Dependency: whether such windows are causally or constitutively dependent upon one another.

  (C) Encapsulation: whether the processing in distinct windows is informationally and/or computationally encapsulated with respect to one another. That is, does processing within one window have access to, or compute over, information processed within the other window?

For example, lower-level unisensory windows may be phase-locked with higher-level multisensory mechanisms that detect the timing of events across multiple sense-modalities such that the former always (or normally) form a subinterval of the latter. Conversely, the timing of distinct temporal windows may be largely or entirely independent such that there is no causal or constitutive relationship between them. In this case, the timing of distinct temporal windows will float freely with respect to one another, or be brought into alignment only by certain triggers, such as the detection of some salient, high-intensity or consciously attended stimulus, as in the case of the window of multisensory integration (§4.2). A considerable amount of empirical work remains to be done in order to ascertain the precise mechanisms by which temporal windows are co-ordinated or synchronised, if indeed they are.

6 The Structured Present

Ruling out simple temporal window-based views leaves open the possibility that perceptual processing is divisible into multiple such windows. This better accommodates the empirical evidence by positing a collection of distinct temporal windows that together ground, or constitute, the experience of events as occurring successively or simultaneously across multiple sensory modalities. We can think of such windows as forming a loosely hierarchical structure, with shorter duration windows, such as the window of simultaneity, being subsumed by longer duration windows; e.g. the window of multisensory integration. In this section I evaluate some implications of multi-window views for the metaphysics of experience.

The question of whether temporal windows form a strict hierarchy in which every window except one is subordinate to some other window is equivalent to the question of whether (a) shorter duration windows — the ‘lower’ levels in the hierarchy — are always nested within longer duration windows — the ‘higher’ levels in the hierarchy — or (b) the former sometimes straddle boundaries between the latter, in which case they form a looser kind of structure. In either case, however, we can usefully talk of the structured present (sp) in which the experience of an apparently unified perceptual ‘now’ is explained in terms of the activity of multiple ‘levels’ or layers of temporal processing, not all of which need be consciously accessible.Footnote 23 In such a model, which includes a broad range of multi-window views, each level operates on a different timescale and is more or less coordinated with other levels to yield the experience of simultaneity and succession over time.
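The difference between options (a) and (b) can be illustrated with a small sketch for two levels of discrete, periodic windows. It assumes, purely for simplicity, that each level has a fixed period and phase expressed in whole milliseconds. The function names and the durations used are hypothetical and are not drawn from any of the models discussed.

```python
from typing import List


def onsets(period_ms: int, phase_ms: int, horizon_ms: int) -> List[int]:
    """Onset times of successive discrete windows up to horizon_ms."""
    return list(range(phase_ms, horizon_ms + 1, period_ms))


def is_strictly_nested(short_period_ms: int, short_phase_ms: int,
                       long_period_ms: int, long_phase_ms: int,
                       horizon_ms: int) -> bool:
    """True if no shorter window straddles a boundary between longer windows.

    A shorter window occupying [s, s + short_period_ms) straddles a boundary b
    of the longer level when s < b < s + short_period_ms.
    """
    long_boundaries = onsets(long_period_ms, long_phase_ms, horizon_ms)
    last_full_onset = horizon_ms - short_period_ms
    for s in onsets(short_period_ms, short_phase_ms, last_full_onset):
        if any(s < b < s + short_period_ms for b in long_boundaries):
            return False
    return True


if __name__ == "__main__":
    # (a) Phase-locked and commensurable: 50 ms windows inside 200 ms windows.
    print(is_strictly_nested(50, 0, 200, 0, 1000))
    # (b) Same periods, but the longer level is offset by 20 ms.
    print(is_strictly_nested(50, 0, 200, 20, 1000))
    # (b) Aligned phases, but 60 ms does not divide 200 ms.
    print(is_strictly_nested(60, 0, 200, 0, 1000))
```

On these assumptions strict nesting requires both that the periods are commensurable and that the levels are phase-locked. Either a phase offset or a mismatch in period is enough to produce the looser structure described in (b).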

According to sp, then, the experience of how things perceptually are at the time they are experienced consists in, or is grounded in, a series of discrete or continuous processes taking place over different, but overlapping intervals of objective time. As such, rather than one perceptual moment ‘replacing’ or being succeeded by another, as per sv and mv, experience is constantly being renewed with the ‘upper’ layers of the temporal hierarchy providing continuing context for ‘lower’ layers of discrete sensory processing. Taking the relevant timescales into account, what emerges is a picture of the perceptual or psychological present that is not strictly linear, but ‘smeared out’ over objective time. This contrasts with William James’s (1890) notion of the “specious present”, according to which subjective or experienced duration need not correspond to the objective duration of the corresponding experience, though it is not strictly incompatible with it. Rather, it posits that experiences possess a temporal grain or microstructure, the coarseness of which varies across the various levels of the temporal hierarchy.

There is, however, a difficulty in moving from talking about perceptual processing to talk of perceptual experience. Indeed, this is a temporal corollary of the traditional mind–body problem (Chalmers 1995). Though fully addressing this issue lies beyond the scope of this paper, given the naturalistic — and in the author’s view plausible — assumption that such processing grounds, realises or is otherwise constitutive of experience,Footnote 24 one would expect the temporal structure of perceptual processing to place corresponding constraints upon the phenomenal character of experience. These constraints may not be directly accessible to introspection, but will have consequences for, or place limits upon, the phenomenal character of experiences that are introspectible in a way that is of general interest to philosophers of mind and perception. Given this simplifying assumption, what consequences does sp have for our understanding of perceptual and temporal experience? In the remainder of this section, I focus on two issues: the granularity of perceptual experience (§6.1) and the duration of the experiential ‘now’ (§6.2).

6.1 Temporal Grain

The first consequence concerns the ‘chunky’ or granular nature of perceptual processing. While philosophers have tended to assume that perceptual experience is mathematically continuous (cf. Dainton 2014) or homeomerous, i.e. consisting of similar parts (Phillips 2009: 96–109), sp suggests that this view is false. If experiences are grounded in cyclical or periodic processes that are not themselves divisible into the same kind of units, then the assumption is incorrect. This does not mean that the neural processes that ground the relevant periodic processing are not continuous — something that will depend ultimately upon fundamental physics. Nor does it mean that experiential contents or objects are experienced as non-continuous or ‘gappy’ (cf. Rashbrook 2011). It does, however, place limits upon the minimum duration of such processing that can be regarded as giving rise to experience, since at least one cycle of at least some, or possibly all, of the relevant temporal windows is presumably necessary to generate a conscious percept.

If sp is correct then there is, strictly speaking, no such thing as perceptual experience at a time, where this is taken to mean a metaphysical instant.Footnote 25 Rather, perceivers like us are only capable of having experiences over a certain interval, this corresponding to the longest temporal window that is necessary for the generation of conscious experience. We should therefore be wary of attributing representational content to experience at a given time. At best, such content will be attributable over some defined interval, but since sp posits multiple processing cycles of varying durations, this makes it difficult to identify a single unified content of experience. Rather, the resulting content, or contents, will depend upon the precise interval in question. Alternatively, intentional content may be attributable only to a given temporal window. In this case the content of experience would consist of the content at some defined level in the processing hierarchy (say, the window of multisensory integration) or else the conjunction of the contents of all currently active temporal windows. Naïve Realist or extensionalist views of temporal experience, on the other hand, take experienced temporal structure to mirror or “inherit” the objective temporal structure of external events (Phillips 2014: 142). But this neglects to account for the existence of temporal grain, which is a feature of experience that is not inherited in this way. sp thus poses a prima facie challenge to both representational and non-representational views of temporal experience.

The granularity of perceptual processing also has consequences for the individuation of experiences. sp posits multiple overlapping windows at different levels in the processing hierarchy. In the absence of strict inter-level synchronisation (§5.2), this means there are no natural temporal boundaries by which the perceptual processing that grounds perceptual experience can be individuated. Were such processing homeomerous and so, at least in theory, infinitely divisible, any point in time might equally be considered the ‘start’ or ‘end’ of a given experience. sp, however, identifies multiple possible ‘joints’ at which perceptual episodes may be carved, corresponding to the boundaries between temporal windows at various levels in the processing hierarchy. Some of those joints, however, may cut across processes at other levels in the hierarchy that do not have a joint at that time. This makes it difficult to individuate discrete experiential episodes on the basis of underlying perceptual processes, since different levels in the hierarchy will overlap one another across different intervals of objective time. Consequently, unless there are points at which all temporal windows are brought into synchronisation, or some privileged level in the hierarchy, perceptual episodes have no natural start or end points other than the beginning or end of a conscious episode (cf. Tye 2003: 97).

Despite its granular structure, then, sp alone does not mandate any particular boundaries by which experiences can be individuated without cross-cutting at least some temporal windows. As such, perceptual processing is neither wholly continuous, since it consists of multiple layers of cyclical processes, nor wholly discrete, since these processes overlap in objective time in various ways. If this is correct, then the temporal microstructure of perceptual processing is, as Phillips (2009: 97) puts it, “lumpy”, rather than entirely discrete or smooth. Indeed, it is likely a complex hybrid of both continuous and discrete processing. Absent further justification, then, we should be wary of abstracting perceptual episodes from the longer stretches of experience of which they form a part.Footnote 26

6.2 The Perceptual Present

The second consequence of sp concerns the duration of the perceptual present or ‘now’. Following James (1890), many philosophers have endorsed the idea that the experiential present consists of an extended interval of time: the so-called specious present. On a Naïve Realist or “extensional” view of experience (Dainton 2018), this corresponds to an interval of objective time. On intentional, or “retentional” (ibid.), views the duration of the specious present will depend upon time-indexed contents of experience, and so is ‘specious’ in the sense of not matching the objective duration of the interval experienced. Nevertheless, the processing that grounds our experience of the present is standardly assumed to occupy a specific interval of clock time, variously argued to be somewhere between 30 ms and three seconds. Similarly, one might hypothesise that the minimal unit of perceptual processing, and so experience, lies somewhere in this range.

According to sp, however, the precise duration of these units will depend upon the level in the temporal hierarchy that one focuses on. Given what we know about inter-sensory binding (§4), one might identify the longest temporal window in the processing hierarchy as corresponding to the perceptual ‘now’. If so, this would include the processing of events that are subjectively experienced as non-simultaneous or successive, and so more than just the perceptual present. Rather, the perceptual ‘now’ would correspond to the experience of a short interval of subjective time. Conversely, were one to choose the shortest temporal window, e.g. the window of simultaneity, the resulting interval may be of insufficient duration to correspond to any conscious experience at all, and so fail to target the intended phenomenon. The answer, it would seem, must lie somewhere in between. But, absent further empirical justification, the selection of any specific level in the processing hierarchy will be to some extent arbitrary since it is, on plausible naturalist assumptions, the activity of the hierarchy as a whole that grounds the experience of temporal order, simultaneity and duration.

If sp is correct, then, questions about the precise duration of the perceptual present, or minimal unit of experience, are poorly founded. Just as we cannot individuate perceptual episodes by the start or end of the underlying perceptual processes, there need be no single unit which corresponds to the experience of the perceptual ‘now’. Rather, perceptual processing consists of a series of temporally overlapping processes occupying varying intervals of objective time, and which are roughly contemporaneous with the events or objects that they enable the perception of. The idea of the structured present may be developed within an extensionalist or a retentionalist framework. However, in either case one should be wary of abstracting away from the temporal grain of experience, or assuming the existence of some arbitrary minimal unit or “functional moment” (Pöppel 1970) of experiential processing from which longer experiences are necessarily composed. Instead, the temporal properties of experience may be explained by the properties of the structured present as a whole, rather than any one of its component parts.

7 Conclusion

The temporal windows hypothesis aims to provide a simple, unified explanation of the perceptual experience of simultaneity and succession across the senses. However, the ‘temporal window’ label belies a wealth of different mechanisms and phenomena ranging from a simple rolling threshold or tolerance to sophisticated forms of multimodal integration. Far from providing a straightforward solution to the temporal binding problem, the empirical evidence points to a more complex picture in which multiple ‘levels’ or layers of perceptual processing combine information from different sense-modalities and functional mechanisms to create the experience of a rich multisensory ‘now’. If this kind of multi-window view is correct then, absent further functional considerations, perceptual processing is neither mathematically continuous nor wholly discrete, but consists of multiple windows or ‘chunks’ of varying temporal resolution.

As the above-cited and other studies demonstrate, fine-grained differences in the temporal structure of perceptual processing are amenable to empirical investigation. It is therefore possible to design experiments to advance our understanding of the relevant temporal windows, their precise properties and roles. Cataloguing the characteristics and functions of these windows will in turn enable the formulation of new hypotheses and research questions concerning the nature of perceptual processing and experience that have yet to be explicitly addressed. Elucidating the temporal microstructure of experience remains a central task for perceptual psychology and neuroscience, with implications for the philosophy and metaphysics of perception, temporal experience and consciousness alike.