cognitively sophisticated moves as inferring the (hidden) causes of our current observations, and using that hypothesis to predict future observations, both as we passively monitor and actively intervene in the world. It is theory-laden and model-rich.

We have no trouble believing that a fundamental part of our exquisite attunement to environmental contingencies involves sensitivity to (and the ability to make use of) inter- and crossmodal correlations in sensory signals. Sensitivity to temporal and spatial (e.g., across the retina) correlations could underwrite many functional advantages, including the ones Clark highlights, such as reducing sensory bandwidth and drawing attention to salient departures from expectations. In this sense we share Clark's belief that predictive1 coding is likely to be a ubiquitous and fundamental principle of brain operation; neural nets are especially good at computing correlations. However, we don't think that evidence for predictive1 coding warrants a belief in predictive2 coding. And it is only from predictive2 coding that many of Clark's larger implications follow.

Clark makes the move from predictive1 coding to predictive2 coding largely by relying on an innovative account of binocular rivalry offered by Hohwy et al. (2008). In Clark's somewhat simplified version of their proposal, the experienced alternation between seeing the face stimulus presented to one eye and the house stimulus presented to the other is explained by a knowledge-driven alternation between rival hypotheses (face at location x, house at location x), neither of which can account for all of the observations. According to Clark, the reason the images don't fuse and lead to a visual steady state is that we know that faces and houses can't coexist that way.
If this knowledge-driven account is the correct way to understand something as perceptually basic as binocular rivalry, then predictive2 coding can begin to look like a plausible, multilevel and unifying explanation of perception, action and cognition: perception is cognitive and inferential; inference perceptual; and all of it is active. But while the predictive2 coding model of binocular rivalry may be consistent with much of the data, it is far from the only possible explanation of the phenomenon. Here is an outline of a reasonable predictive1 coding account: Given the generally high level of cross-correlation in the inputs of our two eyes, the left eye signal would predict1 greater correlation with the right eye than is currently in evidence; this would weaken the inputs associated with the left eye, unmasking the inputs associated with the right eye, which would predict1 cross-correlated left eye signals . . . and so on. However far this particular proposal could be taken, the point is that one can account for the phenomenon with low-level, knowledge-free, redundancy-reducing inhibitory interactions between the eyes (see, e.g., Tong et al. 2006). After all, binocular rivalry also occurs with orthogonal diffraction gratings, indicating that high-level knowledge of what is visually possible needn't be the driver of the visual oscillation; humans don't have high-level knowledge about the inconsistency of orthogonal gratings. In general, although not every pair of stimuli induces bistable perceptions, the distinction between those that do and those that don't appears to have little to do with knowledge (see Blake [2001] for a review).

Adopting a predictive2 coding account is a theoretical choice not necessitated by the evidence. It is hardly an inconsequential choice. Using predictive2 coding as a GUT of brain function, as Clark proposes, is problematic for several reasons. The first problem is with the very idea of a grand unified theory of brain function.
There is every reason to think that there can be no grand unified theory of brain function because there is every reason to think that an organ as complex as the brain functions according to diverse principles. It is easy to imagine knowledge-rich predictive2 coding processes employed in generating expectations that we will confront a jar of mustard upon opening the refrigerator door, while knowledge-free predictive1 coding processes will be used to alleviate the redundancy of sensory information. We should be skeptical of any GUT of brain function.

There is also a problem more specific to predictive2 coding as a brain GUT. Taking all of our experience and cognition to be the result of high-level, knowledge-rich predictive2 coding makes it seem as if the world that we experience and think about is a projection of our minds. Western philosophy has been down this lonely and unproductive road many times. It would be a shame if the spotlight that Clark helpfully shines on this innovative work in neuroscience were to lead us back there.

Attention and perceptual adaptation

doi:10.1017/S0140525X12002245

Ned Block^a and Susanna Siegel^b
^a Department of Philosophy, New York University, New York, NY 10003; ^b Department of Philosophy, Harvard University, Cambridge, MA 02138.
ned.block@nyu.edu
ssiegel@fas.harvard.edu
http://www.nyu.edu/gsas/dept/philo/faculty/block/
http://www.people.fas.harvard.edu/~ssiegel/

Abstract: Clark advertises the predictive coding (PC) framework as applying to a wide range of phenomena, including attention. We argue that for many attentional phenomena, the predictive coding picture either makes false predictions, or else it offers no distinctive explanation of those phenomena, thereby reducing its explanatory power.

According to the predictive coding view, at every level of the visual/cortical hierarchy, there are two kinds of units: error units and representation units.
Representations propagate downward in the visual hierarchy whereas error signals propagate upward. Error in this sense might be better called "discrepancy," since it is the discrepancy between what the visual system predicts (at a given level) and what is represented at that level. Clark advertises the predictive coding (PC) framework as applying to a wide range of phenomena, including attention, which Clark says "is achieved by altering the gain (the 'volume,' to use a common analogy) on the error-units" (sect. 2.3, para. 6). We argue that for many attentional phenomena, the predictive coding picture either makes false predictions, or else it offers no distinctive explanation of those phenomena, thereby reducing its explanatory power.

Consider a basic result in this area (Carrasco et al. 2004), which is that attention increases perceived contrast by enhancing "the representation of a stimulus in a manner akin to boosting its physical contrast" (Ling & Carrasco 2006, p. 1243). A cross-modal study using auditory attention-attractors (Störmer et al. 2009) showed that the contrast-boosting effect correlated with increased activity in early stages of visual processing that are sensitive to differences in contrast among stimuli. The larger the cortical effect, the larger the effect on perceivers' judgments.

Increasing the contrast of a stimulus has an effect on the magnitude of perceptual adaptation to that stimulus, causing greater threshold activation in the tilt after-effect and longer recovery time. Ling and Carrasco (2006) showed that attending to a stimulus while adapting to that stimulus has the same effect as increasing the contrast of the adapting stimulus. After attending to the adaptor (70% contrast), the contrast sensitivity of all observers was equivalent to the effect of adapting to an 81–84% contrast adaptor.

How do these results look from a PC perspective?
Suppose that at time t1, the perceiver is not attending to the left side of space but nonetheless sees a striped grid on the left with apparent contrast of 70%. Because there is no movement or other change, at time t2, the visual system predicts that the patch will continue at 70%. But at t2 the perceiver attends to the patch, raising the apparent contrast to, say, 82%. Now at t2 there is an error, a discrepancy between what is predicted and what is "observed." Since the PC view says attention is turning up the volume on the error representations, it predicts that at t3 the signal (the represented contrast) should rise even higher than 82%. But that does not happen.

Commentary/Andy Clark: Predictive brains, situated agents, and the future of cognitive science BEHAVIORAL AND BRAIN SCIENCES (2013) 36:4 25

There are two important lessons. First, the initial changes due to attending come before there is an error (at t2 in the example), so the PC viewpoint cannot explain them. Second, the PC view makes the false prediction that the changes due to attending will be magnified.

Sometimes PC theorists assume the error signal is equal to the input. Perhaps this identification makes some sense if the perceiver's visual system has no "expectations," say because the eyes have just opened. But once the eyes have opened and things in the environment are seen, it makes no sense to take the error signal to be the sensory input.

The PC picture also seems to lack a distinctive explanation of why attention increases spatial acuity. Yeshurun and Carrasco (1998) showed that increased attention can be detrimental to performance when resolution was already on the border of too high for the scale of the texture, increasing acuity to the point where the subject does not see the forest for the trees. Too little attention can also be detrimental, making it harder to see the trees.
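The arithmetic of the contrast example above (the t1, t2, t3 sequence) can be made concrete with a small sketch. It is only an illustration of the commentary's reasoning: the linear update rule, the function name `updated_estimate`, and the particular gain values are assumptions introduced here, not part of Clark's account or of any published predictive coding model. The contrast figures (70% predicted, 82% apparent) are the ones used in the example.

```python
def updated_estimate(predicted, observed, gain):
    """Toy linear update (an assumption for illustration only):
    new estimate = prediction + gain * (observed - predicted)."""
    error = observed - predicted  # the "discrepancy" at this level
    return predicted + gain * error


# Values from the example: 70% predicted at t2, 82% apparent contrast once attended.
predicted_t2 = 70.0
observed_t2 = 82.0

# Hypothetical gains: 1.0 means no amplification of the error signal;
# values above 1.0 stand in for attention "turning up the volume".
for gain in (1.0, 1.5, 2.0):
    estimate_t3 = updated_estimate(predicted_t2, observed_t2, gain)
    print(f"gain={gain}: estimate at t3 = {estimate_t3}%")
```

Under this toy reading, any gain above 1 pushes the represented contrast at t3 past 82%, which is the prediction the commentary says is not borne out. A different update rule could behave differently, so the sketch shows only how the objection runs, not that every formulation of predictive coding is committed to it.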
Yeshurun and Carrasco varied resolution of perception by presenting textured squares (such as the one in Fig. 1) at different eccentricities (the more foveal, the better the resolution). But they also varied resolution by manipulating the focus of spatial attention: With the eyes focused at the center, they attracted attention to the left or to the right. Combining contributions to resolution from eccentricity and attention, they found that there was an optimal level of resolution for detecting the square, with detection falling off on both ends. Single-cell recordings in monkey visual cortex reveal shrinking receptive fields (the area of space that a neuron responds to) in mid-to-high level vision, specifically in V4, MT, and LIP, and this shrinkage in receptive fields is a contributor to explaining the increase in acuity (Carrasco 2011).

Does the PC framework have a distinctive explanation of attentional effects on spatial acuity, in terms of "gain in error-units"? If, due to the level of acuity, one does not see the square, then the prediction of no square will be confirmed, and there will be no discrepancy ("error") to be magnified. Since the gain in error units is the only distinctive resource of the PC view for explaining attentional phenomena, the view seems to have no distinctive explanation of this result either. Can the predictive coding point of view simply borrow Carrasco's explanation? That explanation is a matter of shrinkage in receptive fields of neurons in the representation nodes, not anything to do with prediction error, so the predictive coding point of view would have to concede that attention can act directly on representation nodes without a detour through error nodes.

Finally, attention to certain items, for example random dot patterns, makes them appear larger. Anton-Erxleben et al.
(2007) showed that the size of the effect is inversely related to the size of the stimulus, explaining the result in terms of receptive field shift (such shifts are also observed from single cell recordings in monkey visual areas; Womelsdorf et al. 2006). This explanation depends on the retinotopic and therefore roughly spatiotopic organization common to many visual areas, not on error units. Neurons whose receptive fields lie on the periphery of the pattern shift their receptive fields so as to include the pattern, moving the portion of the spatiotopically represented space to include the pattern, resulting in the representation of the pattern as occupying a larger area. Here too, predictive coding offers no distinctive explanation.

The facts of attention and adaptation do not fit well with the predictive coding view or any picture based on how "sensory neurons should behave" (Lochmann et al. 2012) rather than the facts of how they do behave. Without a distinctive explanation of these facts, the explanatory promises of predictive coding are overdrawn.

Attention is more than prediction precision

doi:10.1017/S0140525X12002324

Howard Bowman,^a Marco Filetti,^a Brad Wyble,^b and Christian Olivers^c
^a Centre for Cognitive Neuroscience and Cognitive Systems, and the School of Computing, University of Kent at Canterbury, Kent CT2 7NF, United Kingdom; ^b Department of Psychology, Syracuse University, Syracuse, NY 13244; ^c Department of Cognitive Psychology, Faculty of Psychology and Education, VU University Amsterdam, 1081 BT Amsterdam, The Netherlands.
H.Bowman@kent.ac.uk
M.Filetti@kent.ac.uk
bwyble@gmail.com
c.n.l.olivers@vu.nl
http://www.cs.kent.ac.uk/people/staff/hb5/
http://www.cs.kent.ac.uk/people/rpg/mf266/
www.bradwyble.com
http://olivers.cogpsy.nl

Abstract: A cornerstone of the target article is that, in a predictive coding framework, attention can be modelled by weighting prediction error with a measure of precision.
We argue that this is not a complete explanation, especially in the light of event-related potential (ERP) data showing large evoked responses for frequently presented target stimuli, which thus are predicted.

The target article by Andy Clark champions predictive coding as a theory of brain function. Perception is the domain in which many of the strongest claims for predictive coding have been made, and we focus on that faculty. It is important to note that there are other unifying explanations of perception, one being that the brain is a salience detector, with salience referring broadly to relevance to an organism's goals. These goals reflect a short-term task set (e.g., searching a crowd for a friend's face), or more ingrained, perhaps innate motivations (e.g., avoiding physical threat). A prominent perspective is precisely that one role of attention is to locate, and direct perception towards, salient stimuli.

Figure 1 (Block & Siegel). A display of one of the textured figures (the square on the right) used by Yeshurun and Carrasco (1998). The square appeared at varying degrees of eccentricity. With low resolution in peripheral locations, attention improved detection of the square; but with high resolution in central locations, attention impaired detection.

The target article emphasises the importance of evoked responses, particularly EEG event-related potentials (ERPs), in adjudicating between theories of perception. The core idea is that the larger the difference between an incoming stimulus and the prediction, the larger the prediction error and thus the larger the evoked response. There are indeed ERPs that are clearly modulated by prediction error, for example, the Mismatch