Contributions of the striatum to learning, motivation, and performance: an associative account
Abstract
It has long been recognized that the striatum is composed of distinct functional sub-units that are part of multiple cortico-striatal-thalamic circuits. Contemporary research has focused on the contribution of striatal sub-regions to three main phenomena: learning of associations between stimuli, actions and rewards; selection between competing response alternatives; and motivational modulation of motor behavior. Recent proposals have argued for a functional division of the striatum along these lines, attributing, for example, learning to one region and performance to another. Here, we consider empirical data from human and animal studies, as well as theoretical notions from both the psychological and computational literatures, and conclude that striatal sub-regions instead differ most clearly in terms of the associations being encoded in each region.
Anatomical and functional delineations of the striatum
Early anatomical studies delineated striatal sub-regions in terms of their afferent and efferent cortical projections (Figure 1), demonstrating that the dorsolateral region of the striatum (i.e., putamen) is primarily connected to sensory and motor cortices. In contrast, a dorsomedial region (i.e., caudate) is connected with frontal and parietal association cortices, whereas the ventral striatum is connected with limbic structures, including the amygdala, hippocampus, and medial orbitofrontal and anterior cingulate cortices [1,2]. Over the past few decades, these striatal divisions have played central roles in theoretical and empirical work across psychological domains.
First, theories of associative learning, which address how relationships between stimuli, actions, and rewards become encoded in the brain, have attributed different types of associative learning to distinct dorsal and ventral regions of the striatum [3,4]. Dissociable dorsal regions have also been identified by research that contrasts automatic performance of well-learned motor-programs with tasks that require high-level ‘executive’ attention or cognitive control [5,6]. In particular, in the motor-skill literature, medial and lateral regions of the dorsal striatum are often reported to be involved in early learning and well-trained performance, respectively [7–10]. More recently, learning versus performance of motor behavior has instead been attributed to ventral versus dorsal striatal regions; specifically, it has been proposed that, whereas the ventral striatum supports both learning and performance, the dorsal striatum is only critical for performance [11]. Others have postulated a dorsal-ventral distinction with respect to how incentives modulate performance, arguing that the ventral striatum encodes motivational variables and communicates their significance to dorsal regions responsible for response implementation [12,13]. In the present review, we discuss key findings from this broad and divergent literature and contrast accounts that delineate striatal sub-regions in terms of learning, performance, or motivation with theories that emphasize the content and nature of associative encoding.
Learning and the striatum
An extensive body of work has focused on the role of the striatum in facilitating two different types of associative learning: Pavlovian learning, in which, through repeated pairings, initially neutral conditioned stimuli (CSs) come to elicit reflexive behaviors in anticipation of the subsequent occurrence of appetitive or aversive events, and instrumental learning in which an organism learns to perform actions that increase the probability of obtaining reward or avoiding punishers [14]. Instrumental learning is further divided into goal-directed learning, which is driven by representations of the outcomes of actions – their value and causal antecedents, and habit learning, through which actions come to be automatically elicited by the stimulus environment, without any explicit reference to their consequences [15].
Considerable evidence has accumulated implicating the ventral striatum (VS) in Pavlovian learning: transient dopamine (DA) release in the VS in response to primary food rewards shifts, across training, to the onset of reward-predictive cues, and CSs that signal food reward produce changes in neuronal firing patterns in the VS [16,17]. In contrast, distinct sub-regions of the dorsal striatum appear to be involved in habitual and goal-directed instrumental conditioning. In rodents, lesions of the dorsolateral striatum (DLS) disrupt the acquisition of habits, whereas lesions of the dorsomedial striatum (DMS) impair goal-directed learning [18–20]. Likewise, in humans, activity in the DMS has been found to correlate with computations of action-outcome contingency, a hallmark of goal-directed learning, whereas activity in a region of the right posterior DLS was found to track the behavioral development of habits (Figure 2a) [21–23].
Computational approaches to understanding the functions of the striatum are dominated by reinforcement-learning (RL) theory [24]. In one class of RL algorithms called ‘model-free’ (referring to the absence of an internal model of the world), a reward prediction error (RPE) signal is used to incrementally update reward expectations assigned to particular states of the world or to actions available in those states [25]. One RL model initially proposed as an account of striatal function is the actor/critic model [26], in which a critic module learns to anticipate rewards associated with various states of the world, analogous to Pavlovian conditioned expectations, whereas an actor module learns a policy corresponding to the probability of performing a particular action given some state, analogous to learning instrumental actions. Importantly, in this model, the RPE signals generated by the critic are used to update both the state-based reward expectations in the critic and the action probabilities in the actor. In support of this view, human fMRI studies have found that VS activity correlates with RPEs during tasks that feature exclusively Pavlovian reward associations [27,28], consistent with a role for this region in implementing the critic, whereas tasks involving instrumental actions have been shown to recruit both ventral and dorsal striatum [27,29,30].
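The actor/critic scheme described above can be illustrated with a minimal sketch. It assumes a single-state task with two actions, so the RPE reduces to the difference between the obtained reward and the critic's current expectation. The function names, reward probabilities, and learning rates are hypothetical choices made for illustration and are not taken from the studies cited here.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(n_trials=2000, alpha_critic=0.1, alpha_actor=0.1, seed=0):
    """Single-state actor/critic learner (illustrative assumptions throughout)."""
    rng = random.Random(seed)
    reward_prob = [0.8, 0.2]  # hypothetical reward probability of each action
    state_value = 0.0         # critic: reward expectation for the single state
    prefs = [0.0, 0.0]        # actor: preference for each action
    for _ in range(n_trials):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        # One-step task with no successor state, so the RPE is simply
        # the obtained reward minus the critic's expectation.
        rpe = reward - state_value
        state_value += alpha_critic * rpe   # critic update
        prefs[action] += alpha_actor * rpe  # actor update, driven by the same RPE
    return state_value, softmax(prefs)


if __name__ == "__main__":
    value, policy = run_actor_critic()
    print("critic expectation:", round(value, 3))
    print("actor policy:", [round(p, 3) for p in policy])
```

The point of the sketch is that one error signal trains both modules: the critic's expectation drifts towards the average reward obtained under the current policy, while the actor gradually comes to favor the more frequently rewarded action.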
A major limitation of the actor/critic model is that it cannot account for the known differences between goal-directed and habitual instrumental actions, and the differential functions of the DMS and DLS in supporting these mechanisms. Specifically, the actor/critic model, using a general appetitive RPE signal, is entirely model-free, and thus fails to provide an account of goal-directed performance and its implementation by the DMS. This shortcoming has been addressed by the proposal that goal-directed instrumental behavior can be accounted for by means of a ‘model-based’ type of RL, in which the agent encodes a rich model of the transition structure between states of the world, and uses this model, alongside knowledge of the current value of available outcomes, to perform on-line computations of the expected future value of taking particular actions [25]. In spite of the conceptual appeal of mapping quantitative model-based and model-free RL signals to the DMS and DLS respectively, very few human studies have empirically assessed this hypothesis thus far. One such study found evidence in support of the postulated computational dissociation [31], whereas another study, using a similar design, instead found evidence for a linear mix of model-based and model-free signals within the same overlapping areas [32]. Further work is needed to ascertain the extent to which model-based and model-free RL computations adequately capture the differential contributions of DMS and DLS to goal-directed and habitual learning, respectively.
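The contrast between the two classes of algorithm can be made concrete with a small sketch of an outcome devaluation test. It assumes two actions that each lead deterministically to one outcome; the action and outcome names, the learning rate, and the helper functions are hypothetical and serve only as illustration. The model-free learner caches action values through RPE updates, whereas the model-based learner recomputes values on-line from the transition structure and the current outcome values.

```python
def train_cached_values(outcome_of, utility, n_trials=200, alpha=0.1):
    """Model-free learning: cache one value per action via RPE updates."""
    q = {action: 0.0 for action in outcome_of}
    for _ in range(n_trials):
        for action, outcome in outcome_of.items():
            rpe = utility[outcome] - q[action]
            q[action] += alpha * rpe
    return q


def model_based_values(transition, utility):
    """Model-based evaluation: expected current utility under the transition model."""
    return {
        action: sum(p * utility[outcome] for outcome, p in outcomes.items())
        for action, outcomes in transition.items()
    }


def devaluation_demo():
    # Hypothetical task: each action deterministically yields one outcome.
    outcome_of = {"press_left": "pellet", "press_right": "sucrose"}
    transition = {a: {o: 1.0} for a, o in outcome_of.items()}
    utility = {"pellet": 1.0, "sucrose": 1.0}

    cached = train_cached_values(outcome_of, utility)

    # Devalue one outcome (e.g., by satiety) with no further training.
    utility["pellet"] = 0.0
    planned = model_based_values(transition, utility)
    return cached, planned


if __name__ == "__main__":
    cached, planned = devaluation_demo()
    print("model-free (cached) values after devaluation:", cached)
    print("model-based values after devaluation:", planned)
```

Under these assumptions, the cached value of the action that earns the devalued outcome remains high until the action is experienced again, whereas the model-based value drops immediately. This is the behavioral signature that distinguishes habitual from goal-directed control in devaluation tests.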
Motor performance
There is considerable evidence to implicate the ventral striatum in generating skeletomotor reflexes elicited by Pavlovian cues [33,34]. Lesions as well as transient inactivation of the VS significantly impair previously acquired conditioned responses (CRs) to food-paired CSs. In particular, a medial part of the nucleus accumbens (Nacc) called the core, distinct from a more lateral part called the shell (Figure 1), has been shown to mediate the retrieval and expression of associations between CSs and unconditioned stimuli (CS-US associations) [33,34].
A large body of research has also implicated the dorsal striatum in the implementation of already learned instrumental motor behaviors, often with dissociations emerging between the DLS and DMS [7–10]. For example, using a serial reaction time (SRT) task, in which participants respond to a sequence of consecutively presented stimuli, several neuroimaging studies have reported that, whereas the DMS appears to be active during learning of novel sequences, the DLS is active during performance of well-learned sequences [7,8] (but see [35] for evidence of learning-related decreases in DLS activity). Notably, neurophysiological studies in non-human primates [9], as well as in rodents [10], have also found dissociable contributions of the DMS and DLS to early versus late stages of training.
The DLS and DMS also appear to differ in their contribution to the inhibition of competing but incorrect responses, a process that is generally thought to involve a voluntary, cognitive suppression of automatic responding. Response inhibition is commonly studied using the Go/No Go task, in which an infrequent (No Go) stimulus signals that performance of an action that is usually rewarded will result in the omission of reward or in punishment. Neuroimaging research has implicated the DMS, more strongly than the DLS, in inhibiting responding on No Go trials [36,37]. Indeed, numerous studies have found selective involvement of the DMS in various tasks that require cognitive control and working memory [6,35,38], consistent with the strong anatomical connections of this area to prefrontal and parietal association cortices. In Box 1, we relate the literature on skill-learning and cognitive control to that discussed in the above section on associative learning. Additional evidence for the specialized contributions of the DLS and DMS to automatic and cognitively controlled performance, respectively, comes from investigations of neuropathology, in particular from studies on Parkinson's disease (Box 2).
One interpretation of the motor-skill literature is that the DMS and DLS can be distinguished in terms of their respective contributions to the acquisition versus performance of motor behavior [10]. However, this hypothesis is challenged by the finding that both lesions and transient inactivation of the DMS abolish the sensitivity of previously acquired actions to outcome devaluation and contingency degradation, two behavioral assays of goal-directed performance [20]. Thus, DMS disruptions impair the expression of goal-directed behavior, suggesting that this structure plays a critical role during performance. Likewise, the proposal that the dorsal striatum is critical only for the performance of instrumental actions, whereas the ventral striatum supports both their learning and performance [11], is challenged by the finding that blockade of NMDA receptors in the DMS during action-outcome learning abolishes sensitivity to outcome devaluation in subsequent tests [19].
Motivation
Another function attributed to the striatum, and to the ventral striatum in particular, is that of motivation. Cues that indicate that a certain amount of reward is available given successful performance of an instrumental action, or even of a complex cognitive task, elicit increases in VS activity proportional to the amount of signaled reward and these signals correlate with the degree of performance enhancement found for larger compared to smaller rewards [12,13]. Paradoxically, whereas increasing rewards tend generally to improve performance, the opportunity to earn very large rewards has also been shown to have a deleterious influence, a phenomenon known in the psychological literature as choking. Recent neuroimaging studies have implicated the VS in these detrimental, as well as in the facilitating, effects of incentives on performance [39,40].
Cues that signal reward delivery independently of whether or not an instrumental action is performed can nevertheless invigorate instrumental performance, a phenomenon termed Pavlovian-instrumental transfer (PIT) [41,42]. These effects also appear to be largely dependent on the VS [43–45]. For example, amphetamine injection into the Nacc enhances PIT, without affecting base rates of instrumental responding [45]. Importantly, PIT effects emerge even when the instrumental action earns a different reward than that signaled by the cue and are attenuated by general motivational shifts from hunger to satiety [42], suggesting that the cue induces a general motivational state (i.e., general PIT). However, under certain training conditions, PIT effects exhibit a clear selectivity, such that instrumental responding is enhanced specifically for an action that earns the same reward as that signaled by the Pavlovian cue, suggesting the involvement of outcome-specific representations (i.e., specific PIT). Findings from rodent lesion and inactivation studies suggest that the Nacc shell and core may mediate specific and general PIT, respectively [41]. More recently, the involvement of the medial VS in a form of PIT that may depend on general motivational processes [46], and of the ventrolateral striatum in specific PIT [47], has been demonstrated in human neuroimaging studies (Figure 2b). A more detailed comparison of the functional anatomy of humans and rodents is provided in Box 3.
Another important function recently attributed to the ventral striatum is the hedonic evaluation of stimuli, termed ‘liking’, which is commonly assessed using measures of affective facial reactions [48]. Unlike PIT and a range of other reward-oriented behaviors, including approach and consumption, behavioral expressions of liking are unaffected by amphetamine injection into the Nacc [43,44]. Instead, such responses are altered by blockade or stimulation of Nacc opioid receptors [44,49], suggesting that dissociable neurobiological substrates in the VS mediate motivational and hedonic processes. Notably, although both dopaminergic and opioidergic manipulations of the Nacc modulate the firing of ventral pallidal (VP) neurons in response to reward-proximal Pavlovian cues, only opioid manipulations alter VP firing in response to unconditioned stimuli, suggesting that the separation of motivational and hedonic processes is preserved throughout the Nacc-VP circuit [44].
An associative account of striatal function
The evidence reviewed here has implicated the ventral and dorsal striatum (both the DLS and DMS) in the learning as well as the performance of reward-related behaviors. It is therefore unlikely that these regions differ functionally in terms of their respective contributions to learning versus performance [10,11]. Rather, a more parsimonious interpretation is that striatal regions support dissociable associative learning strategies that may respectively dominate at various stages of training, depending on the task [3,50,51]. Specifically, the ventral striatum is involved in the encoding of Pavlovian associations, supporting generation of conditioned skeletomotor responses, whereas the DMS is involved in the encoding of goal-directed instrumental actions and the DLS in the encoding of habitual stimulus-response associations. From this perspective, selective activation of the VS or DMS during early stages of training reflects the respective dominance of Pavlovian and goal-directed instrumental processes, rather than learning per se.
Findings implicating the ventral striatum in incentive-based performance [12,13,39,40] can arguably also be accounted for in terms of the role of this structure in the expression of Pavlovian conditioned responses. For example, performance of an instrumental action that involves approach towards a food location may be facilitated by the presence of Pavlovian cues that elicit compatible conditioned reflexes (i.e., directed at the same location). Conversely, performance of highly skilled motor behavior or of instrumental responses that necessitate approach towards aversive stimuli might be impaired by incompatible reflexes elicited by Pavlovian cues [39]. Another potential means by which Pavlovian associations might produce both facilitatory and detrimental incentive effects on performance is through the elicitation of habits. Specifically, Pavlovian retrieval of sensory-specific features of unconditioned stimuli might evoke stimulus representations that have been previously linked to particular instrumental responses through stimulus-response learning and that, consequently, elicit habitual performance of those responses at the point of Pavlovian retrieval [52]. Depending on whether such responses are compatible or incompatible with the instrumental actions needed to obtain the reward, a behavioral effect of either facilitation or impairment might occur.
Finally, Pavlovian retrieval of affective aspects of unconditioned stimuli contributes to the elicitation of hedonic, emotional, conditioned responses indicative of ‘liking’ [48]. Indeed, in this capacity, Pavlovian processes may also play a role in the estimation of outcome utility, central to accounts of goal-directed instrumental performance. This notion is particularly compelling given that CRs themselves exhibit sensitivity to outcome devaluation procedures, as we discuss further in the section below. It is also consistent with the strong projections between the VS and the medial orbitofrontal cortex (mOFC), an area well known for its involvement in utility estimation [53,54].
Challenges and further directions
RL theories of behavioral control attempt to characterize the instantiation of, and arbitration between, various associative processes and, further, to map such processes – in the form of distinct algorithms – to different striatal sub-regions. Although there is mounting evidence in favor of this approach, a number of key challenges still remain.
First among these is the question of whether Pavlovian signals in the ventral striatum are model-free, model-based, or both. Current computational accounts of Pavlovian learning in the ventral striatum propose that such learning is model-free: that is, based on general appetitive RPE signals that are devoid of specific outcome representations and, thus, insensitive to changes in outcome value. This notion is strongly challenged by the fact that Pavlovian CRs, as well as BOLD signals in the VS, show clear sensitivity to outcome-specific devaluation [55–57]. Attempts to resolve this apparent inconsistency have included the proposal that preparatory (e.g., approach) and consummatory (e.g., chewing) CRs may be model-free and model-based, respectively, and that these different algorithms may be implemented by the core and shell of the Nacc, respectively [58]. Although promising, this revised RL account faces some problems; most notably, the Nacc core and shell have both been shown to be necessary for the effects of outcome-specific devaluation on preparatory CRs [56,57]. Nevertheless, it is clear that humans, as well as other animals, are capable of learning about the specific features of Pavlovian outcomes and that the VS appears to play a role in such effects.
A second question concerns the role of the striatum in aversive learning and in processing novel stimuli. Understanding the role of the striatum in aversive learning represents a major challenge. RL theory has focused almost exclusively on the role of reward in Pavlovian and instrumental processes. Indeed, because of our focus on such computational accounts, our own discussion has been geared towards appetitive learning, a bias that is also explained, in part, by a general emphasis in the literature on reward processing in the striatum, with processing of aversive events being primarily attributed to other regions, such as the amygdala, anterior insula and lateral OFC [59–61]. However, the neuroimaging literature is profoundly inconsistent on this point, with some studies reporting increased VS activity in aversive contexts (Figure 2c) [62–64] and others reporting decreased activity in this area during the prediction, learning, and receipt of aversive outcomes [65,66]. Likewise, whereas some studies have reported that aversive stimuli inhibit the activity of midbrain DA neurons (e.g., [67]), others have found that they elicit phasic DA release in the VS (e.g., [68]).
One possible reason for these variable findings might be that ventral striatal responses are strongly context dependent. A clear example of context-dependent value encoding comes from a study in which the firing of ventral pallidal (VP) neurons in response to an intense salt solution was measured in rodents while in a normal homeostatic state versus a salt-deprived state. Behavioral measures of hedonic processing revealed that the solution was strongly aversive when rats were in a normal state, but became pleasant in the salt-deprived state. Intriguingly, the response patterns of VP neurons closely tracked such behavioral changes, showing a dramatic increase in response to the salt solution in the deprived relative to the normal state [69]. Thus, the same stimulus was perceived, and neurally encoded, as either pleasant or aversive depending on the subject's internal context. Precisely how such context-dependent encoding effects become manifest within the striatum will be an important area of future research.
In addition to aversive and appetitive encoding, DA neurons across the mesolimbic, mesocortical and nigrostriatal pathways have been shown to respond phasically to novel environmental stimuli [70], regardless of their particular valence (i.e., appetitive, aversive, or neutral). In the VS specifically, responses to novel stimuli have been shown with fast-scan voltammetry and other techniques measuring extracellular DA concentrations, as well as with single-unit recordings and fMRI [71–73]. An important aspect of encoding novel events is that they may serve as a basis for exploration. In this sense, it behooves the organism to effectively treat novelty as a rewarding event, thus promoting approach towards and search of unfamiliar, but potentially richly rewarding, environments. Indeed, some behavioral evidence from rodents suggests that novelty may serve as an instrumental reinforcer, such that rats will press a lever that produces an apparently neutral light stimulus more than a lever that does not yield any outcome [74]. Several modified RL algorithms have been proposed that incorporate novel event signaling, either as a surrogate of reward or as a component of the estimated state value [75].
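One way to picture novelty acting as a surrogate of reward is the following sketch, which is loosely modeled on the lever-press observation described above. It is not a reconstruction of any specific published algorithm: the exponentially decaying bonus, its parameters, and the function names are assumptions introduced purely for illustration.

```python
def novelty_bonus(n_exposures, bonus_size=1.0, decay=0.5):
    """Hypothetical novelty signal that shrinks with each prior exposure."""
    return bonus_size * decay ** n_exposures


def simulate_lever(produces_stimulus, n_presses=10, alpha=0.3):
    """Track the cached value of a lever that never delivers primary reward."""
    value = 0.0
    exposures = 0
    history = []
    for _ in range(n_presses):
        reward = 0.0  # no primary reward on either lever
        if produces_stimulus:
            # Novelty is treated as if it were reward (an assumption).
            reward += novelty_bonus(exposures)
            exposures += 1
        value += alpha * (reward - value)  # standard RPE update
        history.append(value)
    return history


if __name__ == "__main__":
    print("lever with light:", [round(v, 3) for v in simulate_lever(True)])
    print("lever with no outcome:", simulate_lever(False))
```

Under these assumptions, the lever that produces the light transiently acquires positive value, which would bias responding towards it, and this value fades as the stimulus becomes familiar. The alternative formulation mentioned in the text, in which novelty enters as a component of the estimated state value, is not shown here.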
Another area where outstanding questions remain concerns corticostriatal interactions. Although there is overwhelming evidence for a role of the DLS in performance of well-learned motor programs [5,7–10,18], consistent with the characterization of this area by RL theory as the site where habits are ultimately stored and expressed, some data indicate that over-trained responses can be independent of the DLS specifically [76] and of DA more generally [77]. On these grounds, it has been suggested that reinforcement learning in the striatum provides a basis for successful Hebbian learning in sensory and premotor cortices and that, with extended training, control is transferred to these less plastic, but considerably faster, cortico-cortical projections [78]. Additional support for this view comes from neuroimaging studies showing that, with extremely extended training (i.e., several weeks), slowly evolving BOLD signals in the primary motor cortex (M1) begin to discriminate between practiced and novel sequences [79].
Conversely, tasks such as deductive reasoning and problem solving, which are known to depend largely on high-level association cortices and which have no obvious connection to reward learning, nonetheless seem to strongly recruit the DMS [80,81], suggesting that this structure implements far more complex functions than those outlined by RL theory. More generally, these issues highlight the importance of considering the interplay between the striatum and cortex in accounting for the specialization of striatal sub-regions.
Another important consideration is whether striatal sub-regions differ in terms of the mechanism underlying selection between alternative responses. In RL theory, one simple way to implement action selection in either a model-based or model-free learner is to use a soft-max distribution [24,25], in which a free parameter controls the degree to which choices are biased towards the highest valued action. However, in many cases, the basis for exploration of non-optimal response alternatives, permitting discovery of actions that are more rewarding than those sampled thus far, is likely more principled than that afforded by the soft-max rule. For example, exploratory sampling might be guided by uncertainty about the relationships between actions and rewards [82]. One possibility is that model-based processes implement selection based on such relative uncertainty estimation, whereas the habit system uses the blunter soft-max rule. Alternatively, the selection mechanism for habitual, as well as Pavlovian, systems might be better characterized by simple drift diffusion models (DDMs) [83], in which, at every instant, noisy ‘evidence’ is accumulated for each response alternative until a threshold, serving as the decision criterion, is reached. DDMs have been shown to successfully capture perceptual [84] and value-based [85] decision-making, as well as the firing rates of neurons in the lateral intraparietal area of the monkey brain [84]. A major avenue for future work will be to determine how striatal regions differ, or are similar, in their implementation of response selection, as well as to develop a better understanding of the role of corticostriatal interactions in such response selection functions.
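The two selection rules discussed above can be sketched as follows. The soft-max function uses a free parameter, here called beta, to control how strongly choices favor the highest-valued action. The diffusion simulation adopts the common two-alternative simplification in which a single relative-evidence variable drifts between an upper and a lower bound. That simplification, along with all parameter values and function names, is an assumption made for illustration.

```python
import math
import random


def softmax_choice_probs(values, beta):
    """Soft-max rule: beta = 0 gives random choice, large beta gives greedy choice."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_ddm(drift, threshold=1.0, noise_sd=1.0, dt=1.0,
                 max_steps=10000, seed=None):
    """Accumulate noisy relative evidence until one of two bounds is reached."""
    rng = random.Random(seed)
    evidence = 0.0
    for step in range(1, max_steps + 1):
        noise = noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        evidence += drift * dt + noise
        if evidence >= threshold:
            return "upper", step  # first response alternative selected
        if evidence <= -threshold:
            return "lower", step  # second response alternative selected
    return "undecided", max_steps


if __name__ == "__main__":
    print(softmax_choice_probs([1.0, 0.0], beta=0.0))
    print(softmax_choice_probs([1.0, 0.0], beta=5.0))
    print(simulate_ddm(drift=0.05, seed=1))
```

One difference worth noting is that the soft-max rule yields only choice probabilities, whereas the diffusion process yields both a choice and a decision time (the number of steps to threshold), which is part of why such models are fitted to response-time data.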
Concluding remarks
In this article, we have reviewed evidence implicating the striatum as a whole in a number of distinct processes underlying reinforcement-related motor behavior: in learning of both instrumental actions and Pavlovian conditioned responses, in the expression of such learned behaviors, and in controlling the motivation to respond. We have noted that, rather than being divided along lines of learning versus performance, striatal sub-regions appear to implement distinct forms of associative encoding. Specifically, the ventral striatum is more involved in Pavlovian conditioned responses, whereas the dorsal striatum is involved in instrumental action. Moreover, there is a dissociation within the dorsal striatum, between medial and lateral structures, in the implementation of goal-directed and habitual instrumental strategies. Finally, through its role in the learning and expression of Pavlovian conditioned responses, rather than, perhaps, through its role in motivation per se, the ventral striatum supports a range of modulatory influences on instrumental performance, including general invigoration (e.g., general PIT), response selection (specific PIT), and potentially even goal-directed outcome evaluation.
The question of how dissociable striatal modules, supporting distinct associative processes, compete and cooperate is at the center of the associative account of striatal function [25,86]. Although much is now known about how striatal regions differ, much less is understood about the mechanisms by which they interact with each other and with the cortex. Future work will need to move beyond the functional segregation perspective and focus instead on characterizing how distinct circuits integrate to produce coordinated cognitive and motor behavior.
Full text links
Read article at publisher's site: https://doi.org/10.1016/j.tics.2012.07.007
Read article for free, from open access legal sources, via Unpaywall: https://europepmc.org/articles/pmc3449003?pdf=render