Original Articles
Minimal coherence among varied theory of mind measures in childhood and adulthood
Introduction
Although philosophers and psychologists have long been interested in how we think about other people’s thoughts (see Obiols and Berrios, 2009, Wellman, 2017 for historical review), Premack & Woodruff first introduced the term ‘theory of mind’ (ToM) in a 1978 paper that examined whether chimpanzees could infer human goals. The authors considered such mental state inferences to be evidence for a ToM—the capacity to represent the mental states of others. The term was quickly applied to human cognition research, and the subsequent forty years have seen a rapid increase in articles investigating ToM across age groups and methodologies (for recent reviews, see Henry et al., 2013, Mahy et al., 2014, Slaughter et al., 2015, Schaafsma et al., 2015, Schurz et al., 2014).
This wealth of ToM research has involved the creation of dozens of ToM measures, including tasks assessing false belief understanding (Wimmer & Perner, 1983), pragmatic language comprehension (Baron-Cohen et al., 1999, Happé, 1994, White et al., 2009), the ability to infer mental states from photographs of the eye region (Baron-Cohen et al., 2001, Baron-Cohen et al., 2001), and reaction time when responding to actors’ beliefs (Apperly, Warren, Andrews, Grant, & Todd, 2011). Despite the surface differences between such measures, the field often considers all these social-cognitive paradigms to capture ToM. As a result, any individual paper may select just one or two tasks in order to examine how ToM relates to another ability or differs between groups. However, theoretical proposals and recent reviews of neuroimaging and behavioral research suggest that ToM may not be a single construct (Apperly, 2012, Frith and Frith, 2008, Gerrans and Stone, 2008, Schaafsma et al., 2015, Schurz et al., 2014). In spite of these proposals, the extent to which varied ToM assessments relate to one another, and whether such measures do in fact capture a unitary construct, remains underexplored empirically.
The social cognitive literature contains long-standing theoretical discussions about the nature of ToM and its measurement. Much early work in this area was focused on false belief tasks (e.g., Frith and Happé, 1994, Bloom and German, 2000), but more recent theoretical accounts have tackled the broader coherence of ToM. For example, Gerrans and Stone (2008) contrasted accounts of ToM as a domain-specific module versus accounts of ToM as multiple low-level domain-specific social processes intersecting with domain-general abilities including metarepresentation and executive function. Apperly (2012) similarly compared conceptual theories of ToM—which would argue for coherence among tasks—with cognitive theories, in which ToM is modelled not as a state of conceptual knowledge but as an interactive process spanning multiple cognitive abilities. Consistent with the latter perspective, Schaafsma et al. (2015) surveyed the vast array of different tasks measuring ToM and argued for the deconstruction of ToM into varied component processes (e.g., gaze processing, tracking intentions) rather than for ToM to be considered a single construct. In this framework, relations between ToM tasks could be due to common non-ToM demands (e.g., language, executive function) or due to common conceptual demands of specific types of ToM (e.g., false belief reasoning), rather than a broader conceptual coherence among all types of mental state reasoning.
In spite of this extensive theoretical discussion, empirical tests of ToM’s unidimensionality have been limited. Papers introducing new ToM tasks often examine their relation with one or two existing tasks (e.g., Peterson and Slaughter, 2009, Beaumont and Sofronoff, 2008, Devine and Hughes, 2013), but this literature may be biased to include positive relations, as new tasks that fail to show such relations may remain unpublished. Similarly, research comparing clinical and neurotypical groups on ToM batteries (e.g., Brent et al., 2004, Rosenblau et al., 2015) does not directly comment on the underlying structure of ToM because group differences across tasks do not necessitate that performance on these tasks is correlated within subgroups. In the realm of neuroimaging research, meta-analytic evidence of overlapping activation across ToM tasks (e.g., Schurz et al., 2014, Molenberghs et al., 2016) does not necessarily indicate that such tasks tap into the same underlying mental process in particular individuals.
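The point that group differences across tasks need not imply within-group correlations can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the authors' analysis: the group labels, means, sample size, and the helper names `pearson` and `simulate_group` are all assumptions chosen for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_group(rng, n, mean):
    """Draw two task scores independently around the same group mean."""
    task_a = [rng.gauss(mean, 1.0) for _ in range(n)]
    task_b = [rng.gauss(mean, 1.0) for _ in range(n)]
    return task_a, task_b


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000                # hypothetical participants per group

# Hypothetical groups: the second scores lower on BOTH tasks on average,
# but within each group the two tasks are statistically independent.
ctrl_a, ctrl_b = simulate_group(rng, n, mean=0.0)
clin_a, clin_b = simulate_group(rng, n, mean=-2.0)

r_control = pearson(ctrl_a, ctrl_b)
r_clinical = pearson(clin_a, clin_b)
r_pooled = pearson(ctrl_a + clin_a, ctrl_b + clin_b)

print(f"within control:  r = {r_control:.2f}")
print(f"within clinical: r = {r_clinical:.2f}")
print(f"pooled sample:   r = {r_pooled:.2f}")
```

Under these assumptions the within-group correlations stay near zero while the pooled correlation is substantial, because the shared group difference alone produces covariation between the tasks.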
More targeted work has directly examined the relation between ToM measures in single samples. Some of the earliest work on this question examined relations between false belief measures in early childhood, finding, for example, that children who understood that others could have false beliefs about an object’s location also understood that others could have false beliefs about an object’s appearance (e.g., Carlson and Moses, 2001, Hughes et al., 2000). This coherence among false belief measures in preschoolers is consistent with meta-analytic evidence that developmental trajectories of false belief acquisition are unaffected by task type (Liu et al., 2008, Wellman et al., 2001). More recently, researchers have also examined the relation between advanced theory of mind tasks at older ages. In middle childhood, there are significant correlations between children’s ability to answer explicit questions about mental states based on stories and their ability to answer similar questions based on video clips (Devine and Hughes, 2013, Devine and Hughes, 2016). Likewise, adults who are skilled at inferring complex emotional and mental states from pictures of the eyes show similar inferential skills when presented with pictures of the whole face and with spoken language (Meinhardt-Injac, Daum, Meinhardt, & Persike, 2018).
These existing studies of the coherence among ToM measures, however, are confounded by two important factors. First, such studies often use measures that assess conceptually similar aspects of ToM (e.g., all false belief tasks or all tasks that involve explicitly inferring complex emotional states). Thus, coherence among tasks may be driven not by a common component underlying all mental state reasoning, but rather by a conceptual commonality specific to one particular aspect of ToM. Second, the tasks used in existing studies often have very similar non-ToM cognitive demands (e.g., processing facial information). This confound means that such studies cannot address whether ToM represents a single construct, as similar performance on these tasks may reflect shared non-ToM component processes (Apperly, 2012, Gerrans and Stone, 2008). Thus, testing a wide array of diverse ToM measures would help establish whether ToM is a unitary construct.
A limited body of research has examined more diverse sets of ToM tasks within single samples and has produced inconclusive findings. For example, although some research has found that ToM tasks spanning modalities load onto a single factor in middle childhood (Devine et al., 2016, Osterhaus et al., 2016), other research has found evidence for much weaker patterns of relations on similar tasks in the same age range (Hayward and Homer, 2017, Rice et al., 2016). Further, even papers finding that one set of ToM tasks load onto a single factor have found that other ToM measures do not (Devine et al., 2016, Osterhaus et al., 2016), preventing conclusions about the coherence of ToM. Perhaps due to this lack of direct empirical research into the unidimensionality of ToM, a large number of studies continue to consider ToM a unitary construct, employing only one or two measures in order to capture ToM. Only by testing relations across tasks that assess different facets of ToM (e.g., false belief versus hidden emotions) and vary in their other non-ToM cognitive demands can we directly assess underlying ToM coherence (as opposed to coherence among other domains). This empirical exploration into the structure of ToM has both theoretical and practical relevance to the study of social cognition.
To behaviorally address the question of whether varied ToM measures form a unitary construct, we selected a range of widely-used ToM measures designed to capture individual differences in adult and child performance across a variety of specific tasks and modalities which have been argued to be important components of ToM (e.g., verbal versus non-verbal, affective versus cognitive, deliberate versus automatic). The goal of this project was not to replicate literature examining the order of ToM concept acquisition (e.g., Wellman & Liu, 2004) or to determine if a narrow range of ToM tasks (e.g., affective verbal tasks or visual implicit tasks) were related to one another. We instead started with a broad slate of tasks, consistent with theoretical arguments that a diverse set of tasks might be the best route to understanding varied manifestations of ToM (Apperly, 2012). If these varied tasks did not cohere with each other, it would set the stage for future, more targeted work examining components of ToM. If, on the other hand, coherence emerged even on diverse tasks, such a finding would be strong evidence for unity in ToM.
We examined structure across these diverse ToM tasks in both children and adults, as the underlying structure of understanding others’ thoughts may vary across development. Specifically, we administered multiple measures of ToM in early childhood (four-year-olds and six-year-olds), middle childhood (children aged 7–12), and adulthood. We selected a varied set of tasks for each age group, as older individuals are often at ceiling on measures (e.g., false belief tasks) appropriate for younger ages (Hughes, 2016, Lagattuta et al., 2015).
In our analysis of whether ToM measures were interrelated, several developmental patterns of results were possible. First, across all ages, different ToM measures could converge on a single factor. Second, children, but not adults, could show a single ToM factor. This would suggest there is a unitary mental inference ability early in development that becomes more task-specific with age. Third, adults, but not children, could show convergence of ToM measures, potentially indicating that years of social experience crystallize ToM differences. In these scenarios, the middle childhood group could serve as an intermediary point between the preschoolers and adults. Finally, ToM might not form a unitary construct within any age group. Although conclusions from this study are necessarily limited to the specific set of tasks used, each of these potential findings has theoretical and practical implications for our understanding of ToM development, serving as a springboard for future research.
Participants
We initially collected data from 40 four-year-olds (14 males; average age 54 months), 38 six-year-olds (17 males; average age 79 months), and 40 children aged 7–12 (20 males, average age 10.09 years). Our analyses suggested that ToM measures were not related to each other. To ensure that these results were not due to limited power, we then increased our sample size. Specifically, we targeted a sample size for each age group that would have 80% power to detect moderate correlations
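A target sample size of this kind can be approximated with the standard Fisher z method for correlations. The sketch below is illustrative only: the excerpt does not state the exact effect size or alpha level used, so the value r = 0.3 (a common convention for a "moderate" correlation), the two-tailed alpha of .05, and the function name `required_n` are assumptions, not the authors' reported procedure.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect correlation r (two-tailed test).

    Uses the Fisher z approximation:
        N = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    n = ((z_alpha + z_power) / math.atanh(r)) ** 2 + 3
    return math.ceil(n)


# Assumed "moderate" effect size; the source does not give the exact value.
print(required_n(0.3))
```

Under these assumptions the approximation calls for roughly 85 participants per group; a smaller assumed correlation would require a larger sample.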
Descriptive Statistics
All ToM tasks produced a wide range in performance (Table 3; see Supplemental Materials for histograms for all tasks across all age groups). Given that several tasks could only produce a limited range of values (e.g., integer scores from 0 to 4), non-parametric tests were conducted on the data. Consistent with previous research, six-year-olds scored higher than four-year-olds on all tasks the groups had in common: the false belief index (Mann-Whitney U test = 3426.0, p < .001), the
Discussion
In the current study, we examined the relations between varied ToM measures at three time points across development. For the tasks used, no clear structure underlying ToM emerged for any developmental period. Specifically, after controlling for potential confounding variables (e.g., age, verbal ability), ToM tasks were minimally correlated in early childhood, in middle childhood, and in adulthood, a finding which was supported by Bayesian analysis that endorsed the null hypothesis. This finding
Funding
This work was supported by internal funds from the University of Maryland, provided to ER, and from Texas State University, provided to KRW.
Declaration of Competing Interest
None.
Acknowledgements
The authors wish to thank Seleste Braddock, Viviana Bauman, Robert Cai, Shannon Coveney, Callie de la Cerda, Laura Anderson Kirby, Sydney Maniscalco, Dustin Moraczewski, Jacqueline Thomas, Daniel O’Young, Kayla Velnoskey, Brieana Viscomi, and Marieke Visser for their assistance.
References (86)
- et al. Two reasons to abandon the false belief task as a test of theory of mind. Cognition (2000)
- et al. Connecting the dots from infancy to childhood: A longitudinal study connecting gaze following, language, and explicit theory of mind. Journal of Experimental Child Psychology (2015)
- et al. The differentiation of executive functions in middle and late childhood: A longitudinal latent-variable analysis. Intelligence (2014)
- et al. Measuring theory of mind across middle childhood: Reliability and validity of the silent films and strange stories tasks. Journal of Experimental Child Psychology (2016)
- et al. Implicit and explicit processes in social cognition. Neuron (2008)
- et al. Autism: Beyond “theory of mind”. Cognition (1994)
- et al. 33-month-old children succeed in a false belief task with reduced processing demands: A replication of Setoh et al. (2016). Infant Behavior and Development (2019)
- Theory of mind grows up: Reflections on new research on theory of mind in middle childhood and adolescence. Journal of Experimental Child Psychology (2016)
- et al. Limits on theory of mind use in adults. Cognition (2003)
- et al. Mentalizing regions represent distributed, continuous, and abstract dimensions of others' beliefs. NeuroImage (2017)