Perceptual learning rules based on reinforcers and attention
Abstract
How does the brain learn those visual features that are relevant for behavior? In this article, we focus on two factors that guide plasticity of visual representations. First, reinforcers cause the global release of diffusive neuromodulatory signals that gate plasticity. Second, attentional feedback signals highlight the chain of neurons between sensory and motor cortex responsible for the selected action. We here propose that the attentional feedback signals guide learning by suppressing plasticity of irrelevant features while permitting the learning of relevant ones. By hypothesizing that sensory signals that are too weak to be perceived can escape from this inhibitory feedback, we bring attentional learning theories and theories that emphasized the importance of neuromodulatory signals into a single, unified framework.
Perception improves with training
Visual perception improves with practice. A birdwatcher sees differences between birds that are invisible to the untrained eye. To gain an understanding of perceptual learning one can compare the perception of bird experts to the perception of subjects with other interests [as explained in detail in 1]. Alternatively, one can study perceptual learning in the laboratory, an approach that has produced many important insights. Training improves perception, even in adult observers, provided they are willing to invest some effort in the task. Subjects typically have to train for a few hundred trials per day over a few days before perceptual improvements are noticeable. Under these conditions, subjects can even become better at discriminating between basic features, for example, between subtle variations in the orientation or motion direction of a visual stimulus [2]. Once perceptual learning has occurred, it is persistent and can last for many months [3] or years [4]. The learning effects are often specific, so that perceptual improvements in a particular version of a task do not generalize to other versions. Performance improvements are not observed, for example, if the test stimulus has a different orientation [4-6], motion direction [7,8] or contrast [9,10] than the trained stimulus. Moreover, the training effects are often retinotopically specific. After training in one region of the visual field, the improvement in performance does not transfer to other visual field locations [4,6-9,11,12], although special learning procedures can cause better generalization [13,14].
What are the mechanisms that determine perceptual learning? The improvements in perception could be the result of changes in sensory representations, but they could also be the result of the way that sensory representations are read out by decision-making areas [15-17]. Furthermore, the role of selective attention in learning is unclear. It appears to be important for some forms of learning [10,18-22], but not for others [8]. Similarly, in some cases learning takes place without giving explicit feedback to the subject about the accuracy of the responses [23], while in other cases such feedback facilitates learning [24]. The goal of the present review is to put together recent neurophysiological and psychological findings into a coherent theoretical framework for perceptual learning. In the context of this framework, we will discuss attention-gated reinforcement learning (AGREL), a model that posits that selective attention and neuromodulatory systems jointly determine the plasticity of sensory representations [25]. We will proceed by proposing a generalization of this model that provides a new, neurophysiologically plausible demarcation between the conditions where attention is required for learning and the conditions where it is not.
Neuronal correlates of perceptual learning
How does the visual cortex change as a result of perceptual learning? The available evidence implicates early sensory representations but also higher association areas in perceptual learning, although the relative contribution of low-level and high-level mechanisms is under debate [15,16]. With regard to early visual representations, functional imaging studies have revealed increases in neuronal activity at the representation of the location in the visual field that is trained [26], often with particularly strong effects in the primary visual cortex [27-31]. Moreover, neurophysiological studies have demonstrated that neurons in early visual areas change their response properties during perceptual learning [5,32-34]. For example, when monkeys are trained to judge the orientation of a stimulus, neurons in the primary visual cortex (area V1) become sensitive to small variations around the trained orientation. In addition to area V1, increased sensitivity of neurons to category boundaries is also observed in higher visual areas, including area V4 [32,35], the inferotemporal cortex [36], area LIP [37] and the prefrontal cortex [38]. From a functional point of view, the amplified representation of feature values close to a category boundary is useful, because small changes in input should lead to categorically different behavioral responses [39,40]. If the stimuli that need to be classified differ in multiple feature dimensions, the increases in sensitivity caused by training are strongest for the features that distinguish between categories and weaker for features that do not [41,42]. Thus, neurons in many areas of the visual cortex change their tuning in accordance with the arbitrary categories imposed by a task.
A few studies have directly compared plasticity in lower and higher visual areas and found stronger effects in the higher areas [15,36,43], although there are also tasks where the changes in lower areas predominate [27,28]. Some tasks require a categorization on the basis of features that are easy to discriminate, and subjects can learn these tasks as new stimulus-response mappings that may not depend on plasticity in the visual cortex. Genuine perceptual learning paradigms, on the other hand, train subjects to perceive small variations in features invisible to the untrained eye, and may therefore engage plasticity in the visual cortex. Ahissar and Hochstein [19,44] demonstrated that learning in easy (or low precision [45]) tasks generalizes across locations and feature values, suggestive of plasticity at high representational levels, while training in higher precision tasks is more specific to the trained stimulus implicating lower representational levels.
Theories of perceptual learning therefore have to explain where, when and why plastic changes occur. A particularly challenging question for these theories is how learning effects occur in early sensory areas, remote from the areas where perceptual decisions are made and task performance is monitored. What signal informs the sensory neurons to become tuned to the feature variations that matter? In what follows we consider two important routes for these effects to reach sensory areas: diffuse neuromodulatory systems and feedback connections propagating attentional signals from higher to lower areas.
Global neuromodulatory systems that gate plasticity
There are a number of neuromodulatory systems that project broadly to most areas of the cerebral cortex and deliver information about the relevance of stimuli and the association between stimuli and rewards. The two neuromodulatory systems that are most often implicated in neuronal plasticity are dopamine and acetylcholine. The substantia nigra and the ventral tegmental area are dopaminergic structures in the midbrain that project to the basal ganglia and the cerebral cortex. In an elegant series of studies (reviewed in [46]), Schultz and co-workers demonstrated that dopamine neurons code deviations from reward expectancy. They respond if a reward is given when none was expected and also to stimuli that predict rewards, causing a surge of dopamine in the basal ganglia and cerebral cortex [47]. Because the increase in the dopamine concentration signals that the outcome of a behavioral choice is better than expected, it is beneficial to potentiate active synapses and thereby increase the probability that the same choice will be made again in the future. In slice preparations of the basal ganglia, dopamine has indeed been shown to control synaptic plasticity [48]. Moreover, there is in vivo evidence for the control of plasticity by dopamine. If transient dopamine signals are paired with an auditory tone, the representation of this tone is expanded in the auditory cortex [49].
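The idea that a neuromodulatory signal codes the deviation from reward expectancy can be illustrated with a few lines of code. The following is a minimal sketch, not a model taken from the studies cited above: it assumes a single running estimate of expected reward that is updated with a delta rule, and the function name `simulate_prediction_error` and the `learning_rate` parameter are hypothetical. It only covers the signal at the time of reward delivery and does not model responses to reward-predicting stimuli.

```python
def simulate_prediction_error(rewards, learning_rate=0.2):
    """Return the reward prediction error on every trial.

    Assumption for illustration: the expectation is one scalar that moves
    a fraction `learning_rate` toward each observed reward.
    """
    expected = 0.0
    errors = []
    for reward in rewards:
        delta = reward - expected      # positive: better than expected
        errors.append(delta)
        expected += learning_rate * delta
    return errors


# Thirty rewarded trials followed by one trial on which the reward is omitted.
errors = simulate_prediction_error([1.0] * 30 + [0.0])
print("first reward:", errors[0])
print("fully expected reward:", errors[29])
print("omitted reward:", errors[30])
```

Under these assumptions the error is large for the first, unexpected reward, shrinks toward zero once the reward is expected, and becomes negative when an expected reward is omitted.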
Acetylcholine is another neuromodulator that has been linked to synaptic plasticity. Neurons in the basal forebrain project to the cortex to supply acetylcholine. These neurons also respond to rewards [50], although the relation between their activity and reward prediction is not as well understood as for the dopamine neurons. If artificial stimulation of the basal forebrain is paired with an auditory tone, then the representation of this tone in the auditory cortex increases [51,52]. Thus acetylcholine promotes neuronal plasticity in vivo and it also influences synaptic plasticity in cortical slice preparations [53]. Other studies have shown that acetylcholine is necessary for plasticity, because a reduction of the cholinergic input reduces cortical plasticity [54] and impairs learning [55-57]. These results, taken together, provide strong support for the idea that learning and plasticity of cortical representations are controlled by neuromodulatory systems that change their activity in relation to rewarding stimuli or stimuli that predict reward.
Role of selective attention in learning
Visual attention provides a second route for signals about behavioral relevance to reach the visual cortex. There is substantial evidence for a role of selective attention in determining what is learned and what is not (here we will not consider the effects of ‘general attention’ or arousal [58] and do not use the word ‘attention’ for the effects of neuromodulators). One powerful approach for studying the role of attention in learning is given by the ‘redundant relevant cues’ method [18,59]. The subjects have to learn to associate stimuli with responses and can use multiple features of a stimulus, e.g. color and shape, to determine the correct response. The critical manipulation is that the subjects are cued to direct their attention selectively to one of the features and not to the other. In these situations, they usually learn to use the attended feature and even exhibit an increase in perceptual sensitivity for this feature [18], while they do not learn to use the other, redundant feature even though it is presented and rewarded equally often. As a result, the subjects cannot perform the task after the training phase if the attended feature is taken away so that they are forced to use the redundant feature. Thus, in these cases attention to a feature determines which representations undergo plasticity and which do not. Similar effects occur for spatial attention, because perceptual learning is particularly pronounced for stimuli at attended locations [e.g. 22].
How do these feature-based and spatial attentional effects reach the early levels of the visual system? The most likely route is through feedback connections that run from the higher areas back to lower areas of the visual cortex [60,61]. Cortical areas involved in response selection feed back to sensory areas so that objects relevant for behavior are represented more strongly than irrelevant ones. Such a direct relationship between behavioral relevance and visual selection was made very explicit in the ‘premotor theory’ of attention [62,63]. When a stimulus is selected for a behavioral response, the relevant features automatically receive attention. This theory is supported by experiments on eye movements, as attention is invariably directed to items that are selected as target for an eye movement [64-66].
Neurophysiological findings provide support for the coupling between movement selection and spatial as well as feature-based attention. During visual search, for example, the representation of the features of the item that is searched for, as well as its spatial location, is enhanced in the visual cortex [67] and also in the frontal eye fields (area FEF) [68], and the frontal eye fields may feed back to cause attentional selection in the visual cortex [69]. The same is true in other tasks where visual stimuli compete for selection. Figure 1 shows data from a study where monkeys were trained to select the circle at the end of a curve that was connected to a fixation point as the target for an eye movement (Figure 1b), while ignoring a distractor curve [70]. Neurons in area FEF responded to the appearance of the stimulus in their receptive field, although their initial response did not discriminate between the target curve and the distractor (Figure 1c). After a short delay, however, responses evoked by the target curve became much stronger than those evoked by the distractor (striped bar), and this enhanced activity is a neuronal correlate of target selection [71]. A similar selection signal is observed in area V1 (Figure 1c), where responses evoked by the relevant, attended curve are enhanced over the responses evoked by a distractor curve, in a later phase of the response.
These results support the idea that the appearance of a visual stimulus, be it a target or distractor, initially triggers the rapid propagation of activity from lower to higher areas of the visual cortex through feedforward connections (Figure 1a) [72,73]. This phase is followed by an epoch where neurons in the frontal cortex that code different actions engage in a stochastic competition. The cells that code the action that wins the competition have stronger responses than the neurons that lose, and feed back to the representation of the selected object in the visual cortex [74,75], causing a response enhancement that is a correlate of selective attention [76]. Such a counterstreams model [77] requires reciprocal connections between the visual and frontal cortex, so that actions that are selected in the frontal cortex provide feedback to neurons that gave input for this particular choice (orange neurons in Figure 1a), thus highlighting the circuits in the visual cortex that determine the course of action. Reinforcement learning theories (like AGREL, see below) hold that actions are selected stochastically, so that the same visual stimulus can give rise to different actions and therefore also different patterns of feedback [25] (see Box 1).
Such a coupling between motor selection in the frontal cortex and attentional effects in the visual cortex during action selection is useful for guiding plasticity because plasticity occurs for connections between neurons that are important for the selected response. This computational idea is supported by psychological studies showing that attention gates learning [18,59,78-80]. Moreover, a recent pharmacological study that investigated the role of different receptors in feedforward and feedback processing demonstrated that feedback connections have a larger proportion of NMDA-receptors than feedforward connections [81]. This result suggests that the feedback connections might gate perceptual learning [82] by activating NMDA receptors [83]. Another hypothetical route for feedback connections to gate plasticity involves acetylcholine receptors that are involved in selective attention [84] and also play a role in the gating of synaptic plasticity, as was discussed above.
Interactions between attention and diffuse reinforcement learning signals
So far we have reviewed evidence for the gating of plasticity by neuromodulatory systems as well as selective attention. Roelfsema and van Ooyen proposed a framework called ‘attention-gated reinforcement learning’ (AGREL) [25] that holds that these two signals are complementary and jointly determine plasticity (Box 1). The network receives a reward for a correct choice, while it receives nothing if it makes an error. After the action, neuromodulators are released into the network to indicate whether the rewarded outcome is better or worse than expected (Figure 1) [46]. If the network receives more reward than expected, the neuromodulators cause an increase in strength of the connections between active cells, so that this action becomes more probable in the future; the opposite happens for actions with a disappointing outcome. The second signal is the attentional feedback during action selection that ensures the specificity of synaptic changes. Although the neuromodulators are released globally, the synaptic changes occur only for units that received the attentional feedback signal from the response selection stage during action selection. AGREL causes feedforward and feedback connections to become reciprocal, in accordance with the anatomy of the cortico-cortical connections. Consequently, the neurons that give most input to the winning action also receive most feedback. As a result, only sensory neurons involved in the perceptual decision change their tuning, while the tuning of other neurons remains the same. The attentional feedback signal thereby acts as a credit assignment signal, highlighting those neurons and synapses that are responsible for the outcome of a trial, thus increasing the efficiency of the learning process substantially [85]. 
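The interplay of the two signals described above can be sketched as a three-factor weight update. This is a simplified illustration and not the published AGREL equations: it assumes that the weight change is the product of a global scalar `delta` (the neuromodulatory signal), the presynaptic and postsynaptic activities, and an attentional feedback value per postsynaptic unit. The function name `gated_update` and all numerical values are hypothetical.

```python
def gated_update(weights, pre, post, feedback, delta, lr=0.1):
    """Return new weights after one gated update.

    weights[i][j] connects presynaptic unit i to postsynaptic unit j.
    delta       one global scalar, the same for every synapse
    feedback[j] attentional feedback reaching postsynaptic unit j
                (0.0 means the unit was not part of the selected pathway)
    """
    return [
        [w + lr * delta * pre[i] * post[j] * feedback[j]
         for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


weights = [[0.5, 0.5], [0.5, 0.5]]
pre = [1.0, 0.0]        # only the first input unit is active
post = [0.8, 0.8]       # both postsynaptic units are equally active
feedback = [1.0, 0.0]   # only the first postsynaptic unit receives feedback

better = gated_update(weights, pre, post, feedback, delta=0.5)
worse = gated_update(weights, pre, post, feedback, delta=-0.5)
print(better)
print(worse)
```

In this sketch the global signal reaches every synapse, yet only the connection from the active input to the unit that received feedback changes. It is strengthened when the outcome is better than expected and weakened when it is worse, while the equally active unit without feedback keeps its weights.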
A remarkable result is that under some conditions, the global neuromodulatory signal combined with the attentional feedback signal gives rise to learning rules that are as powerful as supervised learning schemes, like error-backpropagation, although the learning scheme operates by trial and error and is plausible from a neurophysiological point of view. Learning rules that combine the two factors, like AGREL, can reproduce the effects of categorization learning if there is a direct mapping of stimuli onto responses. They steepen the tuning curve of sensory neurons at the boundary between categories and cause a selective representation of ‘diagnostic’ features that matter for the task. It is still an open question whether these reinforcement learning models can be adapted to explain perceptual learning in tasks that require a comparison between stimuli presented at different times, for example in delayed match-to-sample tasks where subjects have to judge whether two sequentially presented stimuli are the same. These tasks require the comparison between a memory trace of the first stimulus and the perceptual representation of the second stimulus, while the existing reinforcement learning models do not have such a working memory.
Perceptual learning without attention
Although the studies reviewed above demonstrate that attention gates learning, there are also forms of perceptual learning that occur without attention. Watanabe and his colleagues [8] demonstrated that perceptual learning can occur for stimuli too weak to be perceived, if they are paired with the detection of another stimulus. In one of their experiments [86], subjects monitored an RSVP (rapid serial visual presentation) stream for target digits that were presented on a background of moving dots (Figure 2a). Unbeknownst to the subjects, the target digits were consistently paired with a very weak motion stimulus in one direction and, remarkably, the subjects became better at detecting motion in the paired direction. It is unlikely that they directed their attention to this subthreshold motion stimulus, and yet they learned.
Seitz and Watanabe [87] proposed that neuromodulatory signals can explain these findings if the successful detection of a target letter in the RSVP stream generates an internal reward. In accordance with this view, the pairing of subliminal stimuli only results in learning if they are paired with successfully detected target digits and not if they are paired with targets that are missed [88]. Moreover, it is possible to replace the internal reward by an external one. A recent experiment tested subjects who were deprived of water and food for several hours and then exposed to an orientation that was paired with water as reward, and also to another orientation not paired with water [89]. The subjects became better at discriminating between orientations around the paired orientation. These findings, taken together, indicate that learning occurs for subliminal, task-irrelevant features if they are paired with external or internal rewards, which presumably cause the release of neuromodulatory factors such as dopamine and acetylcholine [87].
Reconciliation of new results and theories
It is evident that the studies reviewed so far agree about the role of neuromodulatory signals but also that they appear to contradict each other regarding the role of selective attention. Some studies demonstrated an important role for attention in learning while others demonstrated learning for unattended, irrelevant and even imperceptible stimuli. Theories about the mechanisms for learning may appear to be equally contradictory. While AGREL [25] stresses the importance of attention, the model by Seitz and Watanabe [87] indicates that the coincidence of a visual feature and an internal or external reward is sufficient for learning.
To resolve these apparent contradictions, we propose that the attentional feedback signal that enhances the plasticity of task-relevant features in the visual cortex also causes the inhibition of task-irrelevant features so that their plasticity is switched off. We further propose that stimuli that are too weak to be perceived escape from the inhibitory feedback signal so that they are learned if consistently paired with the neuromodulatory signal. This proposal can explain why studies using stimuli close to or below the threshold for perception observed task-irrelevant perceptual learning while studies using supra-threshold stimuli invariably implicate selective attention in learning. A recent study [90] directly compared task-irrelevant learning for a range of stimulus strengths and indeed observed that learning only occurred for motion strengths at or just below the threshold for perception but not for very weak or strong stimuli (Figure 2b). It is easy to understand why very weak motion signals are not learned because they hardly activate the sensory neurons (Figure 2c, left). According to our proposal, the strong motion signals could interfere with the primary letter detection task and are therefore suppressed by the attentional feedback that also blocks plasticity (Figure 2c, right). Threshold stimuli, however, might stay ‘under the radar’ of this attentional inhibition mechanism so that they are not suppressed and can be learned if consistently paired with the neuromodulatory signal (Figure 2c, middle).
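The proposed dependence of task-irrelevant learning on stimulus strength can be made concrete with a toy calculation. The sketch below is only an illustration of the proposal under stated assumptions and is not fitted to any data: sensory drive and attentional suppression are both modeled as logistic functions of stimulus strength, plasticity is their gated product scaled by the neuromodulatory signal, and every function name and parameter value (`half_point`, `threshold`, `slope`) is hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def sensory_drive(strength, half_point=0.15, slope=20.0):
    """Assumed activation of sensory neurons; near zero for very weak stimuli."""
    return logistic(slope * (strength - half_point))


def feedback_suppression(strength, threshold=0.3, slope=20.0):
    """Assumed attentional inhibition; engages once the stimulus exceeds threshold."""
    return logistic(slope * (strength - threshold))


def plasticity(strength, reward_signal=1.0):
    """Plasticity needs sensory drive, a reward signal, and absence of suppression."""
    return reward_signal * sensory_drive(strength) * (1.0 - feedback_suppression(strength))


for strength in (0.0, 0.2, 0.6):
    print(strength, round(plasticity(strength), 3))
```

With these assumed parameters, plasticity is low for the weakest stimulus, highest for the stimulus just below the suppression threshold, and low again for the strong stimulus. The qualitative inverted-U shape, not the particular numbers, is the point of the sketch.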
Recent results of Tsushima, Sasaki and Watanabe [91] provide further support for this view. They measured the interference caused by irrelevant motion stimuli in the letter detection task of Figure 2 and found an unexpected dependence on signal strength. Weak motion stimuli interfered more than suprathreshold motion stimuli, and an fMRI experiment revealed that they caused stronger activation of motion sensitive area MT+. The reason for the enhanced activation of MT+ by threshold stimuli was observed in the dorsolateral prefrontal cortex (DLPFC), a region that generates attentional inhibition signals. The suprathreshold stimulus activated DLPFC, which then suppressed MT+, while the threshold stimulus did not (Figure 2c). These psychophysical and fMRI results, taken together, indicate that the weak motion signals can indeed escape from the attentional control system so that they can be learned [91].
Concluding remarks
We conclude that there is substantial evidence for an important role for neuromodulatory reward signals and selective attention in the control of perceptual learning. These two factors can act in concert to implement powerful and neurobiologically plausible learning rules in the cortex. The neuromodulatory signals reveal whether the outcome of a trial is better or worse than expected, while the attentional feedback signal highlights the chain of neurons between sensory and motor cortex responsible for the selected action.
We here proposed that the attentional feedback signals guide learning by suppressing plasticity of irrelevant features while permitting the learning of relevant ones. By hypothesizing that sensory signals that are too weak to be perceived can escape from this inhibitory feedback, we have brought attentional learning theories, like AGREL [25], and theories that emphasized the importance of neuromodulatory signals, like the model of Seitz and Watanabe [87], into a single unified framework.
In most studies on task-irrelevant perceptual learning, attention was focused on the primary RSVP task, which was in close proximity to the threshold stimulus to be learned. If task-irrelevant learning and attention-dependent learning are manifestations of a single unifying learning rule, it should even be possible to influence task-irrelevant learning by shifts of selective attention. A recent study [92] manipulated spatial attention by presenting two RSVP streams while instructing subjects to attend to only one of them. Task-irrelevant learning occurred for subthreshold stimuli close to the relevant RSVP stream but not for stimuli close to the irrelevant one. In this case the selective attentional signal that gates plasticity has a different origin than the attentional signal in the AGREL model: it could now either come from the instruction to attend to one of the RSVP streams or from the response selection stage of the RSVP task. Thus, even the learning of subliminal task-irrelevant stimuli can be brought under attentional control by changing the relevance of nearby suprathreshold stimuli.
In the introduction we asked how sensory neurons can be informed about the relevance of stimuli so that they can sharpen their tuning for features that are important for behavior. The present framework requires two such signals: a global, neuromodulatory signal that signals the rewarded outcome of a trial and an attentional credit assignment signal that restricts plasticity to those sensory neurons that matter in the decision. If acting in concert, these factors can give rise to biologically realistic learning rules that are as powerful as error-backpropagation. Future studies could test the predictions of this new perceptual learning theory, and unravel the mechanisms underlying the interactions between learning, selective attention and reward signaling at the systems level as well as at the cellular and molecular level (Box 2).
Acknowledgements
The work on AGREL was supported by an NWO-Exact grant. PRR was supported by an NWO-VICI and an HFSP grant and TW by NIH R21 EY017737, NIH R21 EY018925, NIH R01 EY015980-04A2, NIH R01 EY019466, NSF BCS-PR04-137, NSF BCS-0549036, and HFSP-RGP0018.
Glossary
AGREL: attention-gated reinforcement learning
Feedforward connection: propagates information from lower to higher levels
Feedback connection: propagates information from higher back to lower levels
Perceptual learning: improvement of perception through learning
Selective attention: behavioral selection of one representation over another one
Neuromodulatory systems: systems that release neuromodulators to code the rewarded outcome of a trial
RF: receptive field
Reference List
Full text links
Read article at publisher's site: https://doi.org/10.1016/j.tics.2009.11.005