1 Introduction

The last two decades produced a tremendous amount of research on the processes underlying moral judgments. The standard way to test alternative descriptive theories of moral judgment is by asking subjects to evaluate (amongst others) trolley dilemmas, which pit one moral theory against another. In the standard description of the ‘Switch’ trolley dilemma, a switch can be pulled to redirect a train that is out of control to a different track. If nothing is done, the train will kill five people who are standing on the train’s track. If the switch is pulled, the five people will be saved, but a person who is standing on the track onto which the train is redirected will be killed (Foot 1967). In the ‘Push’ version of this scenario, the only option available to save the five people is to push a heavy person who happens to be standing on a footbridge above the tracks into the trolley’s path (Thompson 1985). Trolley dilemmas and comparable harm-based moral dilemmas are ‘sacrificial’ dilemmas because they involve sacrificing (at least) one person to save a greater number (Kahane 2015).

Given the structure of sacrificial dilemmas, choosing to act (e.g. switching tracks, or pushing the heavy person off the bridge) can be classified as making a utilitarian moral judgment. Choosing to remain passive can be classified as making a deontological moral judgment (the judgment can be classified as such, even though the reason for a subject’s inaction might be action-aversion, rather than explicit endorsement of deontological principles; these factors must not be confounded, see Gawronski et al. (2016)). More generally, any judgment that serves the greater overall good (usually by saving the greater number of lives) can be classified as ‘utilitarian’ (Conway et al. 2018, p. 242). Conversely, any judgment that does not serve the greater good is classified as ‘deontological’. Research in moral psychology has shown that the majority of people approve of intervening in Switch but disapprove of intervening in Push (Cushman et al. 2006; Greene et al. 2009; Hauser et al. 2007; Waldmann et al. 2012).

Why do people make utilitarian moral judgments in Switch and deontological moral judgments in Push? Existing moral psychological research has addressed a number of personal (Baez et al. 2017; Bartels and Pizarro 2011; Gao and Tang 2013; Koenigs et al. 2011; Mendez et al. 2005) as well as situational factors (Greene et al. 2009; Starcke et al. 2012; Valdesolo and DeSteno 2006) that correlate with utilitarian and deontological moral judgments. Also, moral philosophical work has extensively discussed the normative implications and questions that arise based on the moral psychological findings (see Bruers and Braeckman 2014 for an overview). This literature review assesses which situational factors influence utilitarian moral judgments in sacrificial moral dilemmas.

What are situational factors? Switch and Push are different – the action type in the former case, to name just one example, is switching a lever, while it is pushing a person in the latter. Should we count features of the case such as ‘action type’ as situational factors? Alternatively, must situational factors be independent of the case description? This review uses the following definition of a situational factor:

Situational factor: Factor f is a situational factor in case c with answer options a1 (classified as utilitarian option) and a2 (classified as deontological option) if and only if f does not affect the classification of a1 or a2 as utilitarian or deontological and f is not a dispositional factor of the agent.

Hence, ‘action types’ are situational factors, as are stress-levels, or wordings of the case, but anti-social disorders or intelligence are not.

Studying the influence of situational factors on moral judgments relates to the broader psychological project of describing, explaining, and understanding the nature of moral judgments. As will be shown, the effects of some known situational factors are well explained by dual process theory, which predicts that utilitarian judgments are typically the products of controlled cognitive processes that overturn automatic emotional responses (Conway et al. 2018; Greene et al. 2004; Greene et al. 2001). For example, it has been shown that inducing positive mood increases utilitarian judgments in some cases (Strohminger et al. 2011). Relatedly, presenting cases in a foreign language increased the frequencies of utilitarian responses, probably by stimulating cognitive control (Corey et al. 2017; Costa et al. 2014; Geipel et al. 2015a, b; Muda et al. 2018). So, according to dual process theory, utilitarian responses to sacrificial dilemmas should increase in frequency when a) subjects experience less negative affect (as when they are in good spirits) and/or b) subjects can exert more cognitive control (which is helped by, for example, using a foreign language).

Researchers have also consistently found order effects on moral judgments (Liao et al. 2012; Petrinovich and O’Neill 1996; Schwitzgebel and Cushman 2012). The typical response pattern (to wit, acting in Switch is judged permissible, acting in Push is judged impermissible) occurs to a lesser extent when subjects are first presented with Push and then with Switch (in which case acting in Switch is often also judged impermissible). Effects of the order of presentation do not obviously fit with dual process theory. Though the order of presentation may increase affect (for example, if emotionally arousing dilemmas are presented first), or decrease cognitive control (for example, through an effort to maintain consistency between cases), previous studies have found no difference in emotional involvement between the relevant dilemmas (Nakamura 2013; Horne and Powell 2013).Footnote 1 Instead, to explain the impact of the situational factor ‘order of presentation’, some researchers have proposed that moral judgments are sensitive to the locus of intervention (to wit, whether the subject has to interact with an object that causes harm [in which case utilitarian moral judgments increase in frequency] or with a subject that is itself harmed [in which case deontological moral judgments increase in frequency]; Wiegmann and Waldmann 2014).

The aim of this literature review is to assess whether the prediction of dual process theory is borne out in the literature by assessing whether situational factors that inhibit cognitive control and/or increase affect lower utilitarian judgments. Moreover, this review will explore which further situational factors, such as order effects, have been shown to influence moral judgments in sacrificial dilemmas.

Studying the influence of situational factors on moral judgement is of great practical relevance. Ideally, a good understanding of that influence could help individuals and societies to improve moral decision making. To that effect, it would be helpful to identify and understand commonalities between situational factors that have an influence on utilitarian moral judgment. Therefore, to take a first step in that direction, this literature review will also propose and critically examine an organizing classification of situational factors into classes and examine the pooled effects per class on utilitarian moral judgements vis-à-vis the predictions of dual process theory. Moreover, a more fine-grained classification of different situational factors could potentially illuminate correspondences to more fine-grained types of utilitarian and deontological moral judgement.

2 Theory

2.1 Previous Research and Key Concepts

2.1.1 The Dual Process Theory of Moral Judgment

The dual process theory of Greene and colleagues says that both cognitive and affective processes play a causal role in generating moral judgments (Greene et al. 2001, 2004; Greene 2008). Moral judgments are produced neither by a purely rationalistic nor by a purely affective moral faculty, but “instead, they are influenced by a combination of automatic emotional responses and controlled cognitive processes with distinctive cognitive profiles” (Amit et al. 2014, p. 340).

Hence, the dual process theory incorporates insights from the formerly prominent Kohlbergian and the social intuitionist model of moral judgement, but it does not unify them: it is, strictly speaking, an alternative to either theory. In contrast to the Kohlbergian rationalist model, some moral judgments are driven entirely by fast, intuitive responses without the influence of reasoning. In contrast to the social intuitionist model, some moral judgments are driven entirely by slow, controlled responses that can override the influx of intuition. Thus, the dual process theory postulates that there are two distinct types of moral judgments, driven by two distinct processes, which have come to be known as system 1 and system 2 processing (Kahneman 2012). Figure 1 illustrates the dual process theory.

Fig. 1

A dual process model of moral judgment. System 1 and System 2 processing are illustrated related to person A’s and B’s judgment process, respectively. Pointed arrows refer to causal processes that are only sometimes active, solid arrows display causal relations (own illustration)

System 1 processing refers to fast processing that is often associated with processes and mechanisms that originate early in human genealogy. Hence, Greene and colleagues write that “the social-emotional responses that we’ve inherited from our primate ancestors (due, presumably, to some adaptive advantage they conferred), shaped and refined by culture-bound experience, undergird the absolute prohibitions that are central to deontology” (Greene et al. 2004, p. 389). System 1 processing has been associated with the aforementioned deontological moral judgments.

System 2 is associated with controlled, effortful processing. Greene and colleagues note that the ‘moral calculus’ that defines utilitarianism “is made possible by more recently evolved structures in the frontal lobes that support abstract thinking and high-level cognitive control” (Greene et al. 2004, p. 389). System 2 processing has been associated with the aforementioned utilitarian moral judgments.

Support for a dual-process theory of moral judgment comes from neuroimaging studies (e.g. Greene et al. 2001). Utilitarian judgments are often accompanied by greater activity in brain regions associated with controlled processing. Conversely, activation in brain areas associated with social-emotional processing has been shown to be correlated with deontological judgments.

Importantly for this review, the activation in cognition related areas of the brain that correlates with utilitarian moral judgment cannot readily be explained by a social intuitionist model of moral judgment. Though the social intuitionist model makes room for reasoning in the genesis of moral judgment (see Haidt 2001), it postulates that such controlled processes are rare and the exception in moral judgment. Greene and colleagues, however, have found that these processes occur very often, which lends support to the dual-process theory as opposed to the social intuitionist model.

In contrast to the Kohlbergian rationalist model of moral judgment, the dual process theory can also accommodate the central positive claim of the social intuitionist model of moral judgment: moral judgments are sometimes driven by affect.

Therefore, the dual process model of moral judgment seems like a valuable starting point in evaluating the nature of moral judgments. In assuming this starting point, it is helpful to keep in mind that the moral philosophical distinction between utilitarian and deontological moral judgements may not perfectly correlate with the moral psychological system 1 / system 2 distinction, which refers to the causal processes that influence moral judgements. For example, moral judgements may often on reflection fall in line with deontological principles, which seems to involve system 2 processing at some point (e.g. Sauer 2017). For this review, focus will be on the moral psychological distinction and thus the pertinent point is whether utilitarian moral judgements in a given decision situation are affected by system 1 or system 2 processes.

An important question, however, concerns the evidence for the claim that both types of moral judgments (driven by system 1 and system 2 processes, respectively) are completely distinct rather than two sides of the same coin. That is, is a utilitarian moral judgment simply a non-deontological moral judgment, but based on a completely distinct underlying process?

2.1.2 Sacrificial Moral Dilemmas

The question about the relation of deontological and utilitarian moral judgment has been addressed in recent research using so-called sacrificial moral dilemmas and, at the same time, research using such dilemmas provides the second pillar of support for a dual process theory.

The best known sacrificial moral dilemma is the trolley problem, which has baffled ethicists for decades (Foot 1967; Thompson 1985). In philosophical ethics, the trolley problem is used in an attempt to weigh up different first-order normative theories, such as deontology and utilitarianism, against one another. The idea behind thought experiments like the trolley problem is to ‘test’ a theory against the moral intuitions of professional philosophers by applying the theory to a suitable test case and to see whether the theory under scrutiny would give the right results (cf. Di Nucci 2013). For example, consider the standard ‘Switch’ trolley scenario that Greene et al. (2001) adopted from Foot (1967) and Thompson (1985)Footnote 2:

Switch: You are at the wheel of a runaway trolley quickly approaching a fork in the tracks. On the tracks extending to the left is a group of five railway workmen. On the tracks extending to the right is a single railway workman. If you do nothing the trolley will proceed to the left, causing the deaths of the five workmen. The only way to avoid the deaths of these workmen is to hit a switch on your dashboard that will cause the trolley to proceed to the right, causing the death of the single workman. Is it appropriate for you to hit the switch in order to avoid the deaths of the five workmen?

In the switch dilemma, the utilitarian response is to pull the switch since, roughly speaking, doing this would save the greater number of people. Moral philosophers like Thompson (1985) use scenarios like the trolley experiment to understand and dissect their intuitions about moral theories. Since it is commonly accepted, for instance, that it is proper to pull the switch in the switch case, the fact that utilitarianism gives the ‘right’ recommendation, in this case, is taken, by philosophers, as providing support for utilitarianism as a normative ethical theory.

In contrast, the following standard ‘Push’ dilemma, adopted by Greene et al. (2001) from Foot (1967) and Thompson (1985), was specifically designed to elicit a utilitarian response that would conflict with the intuitions of many people by recommending an action that seems to be wrongFootnote 3:

Push: A runaway trolley is heading down the tracks toward five workmen who will be killed if the trolley proceeds on its present course. You are on a footbridge over the tracks, in between the approaching trolley and the five workmen. Next to you on this footbridge is a stranger who happens to be very large. The only way to save the lives of the five workmen is to push this stranger off the bridge and onto the tracks below where his large body will stop the trolley. The stranger will die if you do this, but the five workmen will be saved. Is it appropriate for you to push the stranger on to the tracks in order to save the five workmen?

Thus, in the Switch case, utilitarianism seems to get it right, but in the seemingly analogous push case, utilitarianism gets it wrong. The push dilemma can now be employed in an argument against utilitarianism based on the following considerations: if it seems wrong to perform the ‘utilitarian’ response in the push dilemma, then this is a reason against utilitarianism, because utilitarianism gives the intuitively wrong results – it does not cohere with normative intuitions (Kamm et al. 2016). Since there are no morally relevant differences between the switch and the push case according to utilitarianism (since what is morally relevant is relative to moral theory) any misgivings that one might have with the utilitarian response must be due to utilitarianism itself. Hence, utilitarianism ought to be rejected, or so Thompson argued.

Being clear about the philosophical background of sacrificial dilemmas is important because it helps to see that they are designed to yield conflicting judgments depending on whether one accepts deontology or utilitarianism. To make sure that eventual conflicts in intuition are due to a conflict in normative theory, philosophers took great care to design the dilemmas so that the judgment or action recommended by utilitarianism would be opposed to the judgment or action recommended by deontology, irrespective of the details of the case.

Sacrificial moral dilemmas only recently became a topic for empirically minded moral psychologists, with the advent of dual process theory of moral judgment. The fact that they are designed to bring out differences in ethical theory makes them a suitable starting point for experimental research on moral judgments. Beginning with the work of Greene et al. (2001), Greene et al. (2004), and Koenigs et al. (2011) there is now a large battery of sacrificial dilemmas being used in experimental research on moral judgments.Footnote 4

At the same time, the use of sacrificial dilemmas has been amply criticized, mainly because they seem to pose unrealistic and invalid test-cases for real moral judgments (Bloom 2011; Pizarro et al. 2003). Anderson (2018) as well as Kamm et al. (2016), for example, have recently argued on theoretical grounds that sacrificial dilemmas incorporate too many uncontrolled variables to draw clear conclusions about a dual process theory; indeed, they argue, some of the found factors might be morally relevant and thus reason to doubt the central claims of dual process theory.

2.1.3 Influences on Moral Judgments in Sacrificial Dilemmas

The use of sacrificial dilemmas has indeed brought to the fore a host of influencing factors on moral judgment, which allowed researchers to establish correlations between various situational factors and both deontological and utilitarian moral judgment. A major strand of the literature has been concerned with establishing correlations (situational factor to moral judgment) and inferring something about the underlying processes that account for the correlation. The established influences can be distinguished into two broad classes: situational and personal influences.

For example, to name but a few, amongst the personal influences, it has been found that psychopathy (Gao and Tang 2013; Koenigs et al. 2011; Patil 2015), cognitive and emotional impairments, alcohol dependence (Khemiri et al. 2012), and gender (Fumagalli et al. 2010; La Olivera Rosa et al. 2016) influence moral judgment. Amongst the situational factors, it has been found that, for example, cognitive load (Greene et al. 2008) and time pressure in answering the dilemma (Suter and Hertwig 2011) decrease the frequency of utilitarian moral judgment, whereas cognitive control (Conway and Gawronski 2013) and incidental positive affect (Strohminger et al. 2011) increase it.

A pernicious finding in research on personal influences on moral judgment has been that seemingly maladaptive or at least unwelcome personality traits, such as psychopathy, have been linked with increased frequency of utilitarian moral judgment (Bartels and Pizarro 2011; Koenigs et al. 2011). It would seem, in light of these findings, that utilitarianism as a normative theory is put into question. To sidestep these controversial issues, one could focus on situational factors, whose influence on moral judgment does not straightforwardly invite inferences about the validity of either utilitarianism or deontology.

On the face of it, and in contrast to the criticism mentioned at the end of the previous section, the effects of both personal and situational influences on moral judgment are by and large in line with a dual process theory of moral judgment. On the one hand, those factors that (plausibly) either deactivate system 1 processing or activate system 2 processing increase the frequency of utilitarian moral judgments. On the other hand, those factors that (plausibly) either activate system 1 processing or deactivate or inhibit system 2 processing decrease the frequency of utilitarian moral judgment.

At the same time, the finding that seemingly morally irrelevant personal and situational factors influence moral judgments itself gives rise to a potential point of criticism. Even if dual process theory is correct, what is the relation of the different influencing factors to moral judgment? According to dual process theory, influencing factors must affect moral judgment either by reducing the onset of system 1 activation or by increasing the onset of system 2 activation, as illustrated in Table 1.

Table 1 Two Pathways to Influencing Utilitarian Moral Judgments

As the table shows, the effects of situational factors are logically constrained to two possibilities: Inhibition System 1/Activation System 2 and Activation System 1/Inhibition System 2. Still, for any situational factor that influences moral judgment, it is not revealed whether the effect is due to one side of either effect pair, or to a combination of them. In other words, the pathways shown in Table 1 illustrate routes to influencing moral judgment, but it is an open question how those pathways are built.
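The two pathways described above can be written down as a small lookup. The following is only an illustrative sketch of the predictions as stated in this section, not part of the review's analysis; the names `System`, `Effect`, and `predicted_direction` are hypothetical and introduced here for illustration.

```python
from enum import Enum


class System(Enum):
    SYSTEM_1 = "automatic, affective processing"
    SYSTEM_2 = "controlled, cognitive processing"


class Effect(Enum):
    ACTIVATION = "activation"
    INHIBITION = "inhibition"


def predicted_direction(system: System, effect: Effect) -> int:
    """Direction predicted by dual process theory for utilitarian judgments.

    Returns +1 (more frequent) when System 1 is inhibited or System 2 is
    activated, and -1 (less frequent) when System 1 is activated or
    System 2 is inhibited.
    """
    increases = {
        (System.SYSTEM_1, Effect.INHIBITION),
        (System.SYSTEM_2, Effect.ACTIVATION),
    }
    return 1 if (system, effect) in increases else -1


# Hypothetical usage: a factor assumed to activate System 2.
print(predicted_direction(System.SYSTEM_2, Effect.ACTIVATION))
```

Note that the lookup only encodes the predicted direction; as the text stresses, it says nothing about which side of an effect pair actually produces an observed effect.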

Given the tremendous amount of research on this topic, it is timely to conduct a literature review to systematically investigate what situational influences have been found, and how they affect moral judgment. More specifically, to further explore the influence of situational factors on utilitarian moral judgment, and to track its relations to dual process theory, the following hypotheses are proposed.

2.2 Hypotheses

Dual process theory posits that the frequency of utilitarian moral judgments is positively influenced by controlled cognitive processes and negatively influenced by automatic affective processes. H1 investigates the predictions of dual process theory, where H1a concerns the first, system 2 related, and H1b the second, system 1 related, prediction. H2 investigates the influence of a situational factor that cannot be explained by dual process theory.

2.2.1 Hypothesis 1a

If cognitive control is high (System 2 activated), there will be increased frequency for utilitarian moral judgments.

2.2.2 Hypothesis 1b

If affect is high (System 1 activated), there will be decreased frequency for utilitarian moral judgments.

2.2.3 Hypothesis 2

If the locus of intervention refers to a subject that is harmed by the intervention (vs. one that is not harmed by the intervention) then there will be decreased frequency for utilitarian moral judgments.

3 Method

The pre- and post-analysis reports of the study are registered in OSF at https://osf.io/c38r7/?view_only=56174b305b3840c8b99ded7a8797b7ad.

3.1 Search Strategy and Selection Criteria

After an initial screening of the literature, the following databases and search terms were identified to conduct the systematic literature review following PRISMA guidelines (Moher et al. 2009).

The databases used were PsychArticles, Psychological and Behavioral Sciences Collection, PsycINFO, PSYINDEX, and Philosopher’s Index.

The literature search used the following (combinations of) search terms (N = number of results per combination of search terms)Footnote 5:

The search terms in the second conjunct of each search listed in Fig. 2 were selected based on the initial screening of the literature. For each search term, there was evidence from initially screened literature that the search term might be related to the activation/inhibition of System 1 or System 2 processing or to the causal representation of a sacrificial dilemma and thus relevant for the present study. For example, consider the search term “time” in the top bracket of Fig. 2. Manipulating response times may affect System 1 / System 2 processing and thus it is a potentially relevant factor in assessing people’s utilitarian responses in sacrificial moral dilemmas.

Fig. 2

Overview of search terms

After removing duplicates from the pre-selection of literature (based on database search and additional search), each article has been assessed for relevance in two steps. In a first step, abstracts and sometimes full texts were screened for relevance, applying the selection criteria. In the second step, the remaining full texts were analyzed for relevance, applying the selection criteria.

The selection criteria were as follows. Studies had to be in English and published in peer-reviewed journals.Footnote 6 The studies had to measure moral judgements based on sacrificial dilemmas, where there are two action types that could be classified as utilitarian and deontological according to the above operationalization.

Studies that report a failed manipulation check were excluded from the analysis because they did not allow an inference as to whether there was no effect because the manipulation failed or because there is no relevant connection between the dependent and independent variable(s).

3.2 Literature Search

The literature search with the search terms and databases specified in Fig. 2 above was performed by the authors in September 2018 and 915 items were found. After removing duplicates, the list contained 372 items. The search was duplicated by another rater [anonymized 2], with extensive previous experience in conducting PRISMA reviews. After the authors screened for eligibility, 295 articles were removed and 77 articles were left for full-text analysis. A further 24 articles were excluded upon close reading (based on the exclusion criteria specified above). 16 additional articles were added based on identification through sources by the authors. That is, additional articles were found by examining the reference lists of our search results. Thus, the final list contained 53 articles for analysis. Figure 3 provides a graphical overview of the selection process.
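As a quick arithmetic check, the screening counts reported above can be chained together. The sketch below uses only the numbers given in this paragraph. It assumes, in line with the statement in Section 3.1 that duplicates were removed from the combined database and additional search, that the 16 additional articles are already part of the pool before screening; if they were instead added after full-text exclusion, the final count would be 69 rather than the reported 53.

```python
# Counts as reported in the text of Section 3.2.
records_after_deduplication = 372
excluded_at_screening = 295
excluded_at_full_text = 24

# Articles left for full-text analysis after eligibility screening.
full_text_assessed = records_after_deduplication - excluded_at_screening

# Articles included in the final analysis after close reading.
included = full_text_assessed - excluded_at_full_text

print(full_text_assessed)  # 77
print(included)            # 53
```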

Fig. 3

PRISMA 2009 flow diagram

All analyzed articles, as well as the relevant studies (k = 82; there were sometimes multiple relevant studies per article), are reported in Table 5 (see appendix) to enable a compact overview over the essential findings of the analyzed studies.Footnote 7

Figure 3 illustrates the literature search using a PRISMA diagram.

Effect sizes were either taken directly from the paper, or computed from reported inferential or descriptive statistics (Lenhard and Lenhard 2017). Three of the analyzed studies did not contain sufficient information to calculate effect sizes; for these, the reported effect sizes were provided by the authors on request (see Appendix I for an overview of the studies). An exploratory analysis revealed a total of 32 different situational factors that influence moral judgments in sacrificial dilemmas.
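To illustrate what computing an effect size from reported statistics can look like, the sketch below shows two textbook conversions to Cohen's d for two independent groups: one from means and standard deviations, one from a t statistic. This is an assumption-laden illustration rather than the procedure actually used in the review, which relied on the cited calculator; the function names and the example numbers are hypothetical.

```python
import math


def cohens_d_from_means(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_variance = (
        (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    ) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


def cohens_d_from_t(t_value, n_1, n_2):
    """Convert an independent-samples t statistic to Cohen's d."""
    return t_value * math.sqrt(1 / n_1 + 1 / n_2)


# Hypothetical example: two groups of 50 subjects each.
d_descriptive = cohens_d_from_means(5.0, 2.0, 50, 4.0, 2.0, 50)
d_inferential = cohens_d_from_t(2.5, 50, 50)
print(round(d_descriptive, 3), round(d_inferential, 3))
```

Both routes give the same value for the same underlying data, which is why a study reporting only a t statistic and group sizes can still enter the pooled analysis.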

3.3 Situational Factor Subgroup Classification

The author classified the situational factors tested in the reviewed studies along two dimensions. The first dimension was the system (System 1 or System 2) presumably affected by the situational factor. We adopted the classification of situational factors according to their presumed effect on System 1 and System 2 processing from the study that reported the situational factor. For example, when study x sought to evaluate the effect of the situational factor ‘cognitive control’ on utilitarian moral judgements by activation of System 2, we adopted the classification of cognitive control as activating System 2. Some studies considered situational factors that, according to the respective study’s author(s), could not be associated with an effect on the inhibition or activation of System 1 or System 2 processing. We classified these situational factors as ‘Other.’ As a ‘sanity check,’ we also considered whether the properties associated with System 1 and System 2 were in line with the classification proposed by the authors (cf. Kahneman 2012; Greene 2015). The initial classification was done by the author, and spot-checked by Dorothea Mischkowski. We did not have a second rater for all reviewed studies nor did we formally calculate inter-rater reliability because we adopted the ratings from our reviewed studies. This procedure tracks the reliability of the situational factor classifications in the reviewed literature. Therefore, our classification will be as problematic as the common association of properties (e.g. controlled, effortful, or slow) and their associated stimuli (e.g. priming cognitive control) with a particular system (e.g. System 2). This is a point to which we will return in the discussion section.

The second dimension of classification aimed to capture properties of the reviewed situational factors beyond their presumed effect on System 1 or System 2 processing. The author categorized situational factors into two distinct sets of factors, Judge and Presentation, where the Presentation set has two further subsets, Sacrifice and Victim(s), as illustrated by Fig. 4:

Fig. 4

Classes of Situational Factors

The categorization was informed by the independent variable that was modified to alter moral judgments. Conceptually, we can distinguish factors that belong to the presentation of a situation from factors that affect the judge or observer of a situation.

A situational factor pertains to the Presentation class if and only if the independent variable modified in the experiment alters a feature of (the description of) the dilemma, such as duration of presentation, language, wording, or other (morally irrelevant) content of the dilemma (k = 35, N = 4284). Examples are the medium of presentation, the extent to which the actor in the dilemma has to exert personal force (e.g. by switching tracks), the language in which the dilemma was written, or whether or not responses to the dilemma had to be given under time pressure.

Factors belonging to the Presentation set are conceptually distinct from factors belonging to the Judge set. A situational factor pertains to the Judge set if and only if it does not pertain to the Presentation class (k = 35, N = 7291). Less formally, situational factors of the Judge class affected the experience or state of the subject but not by altering any details of the dilemma at hand. Examples are the incidental serotonin level or cognitive load under which subjects were placed, which do not alter (the description of) the dilemma, but rather alter the decision situation by affecting the judge directly. Naturally, how a situation is presented will eventually affect the judge (if it has any effect as a situational factor), but it stands to reason that there might be a difference in the extent to which factors belonging to the Presentation set influence utilitarian moral judgement vs factors from the Judge set.

Within the Presentation set, two relevant sub-sets were distinguished: the Sacrifice (k = 3, N = 1239) and Victim(s) set (k = 9, N = 1569). Again, the classification was done by [anonymized 1] and spot-checked by [anonymized 2] and based on a conceptual distinction: some factors of the Presentation set may alter features of the individuals that might be harmed by performing the relevant action in a sacrificial dilemma (such as pulling the switch), which the subject has then to perform or to rate for its acceptability. An example of this category is whether the judging subject is itself the potential sacrifice. This is conceptually different from factors that alter features of those that would be harmed if the subject decides not to act (e.g. not to pull the switch in the Switch dilemma). So, the Victim(s) set pertains to independent variables that alter features of the individuals that might be harmed by abstaining from performing the relevant action in the dilemma (such as pulling the switch), which the subject then has to perform or to rate for its acceptability. Examples of this category are the severity of the harm that potentially befalls the victims of the dilemma (e.g. death vs non-lethal harm) or the relationship the potential victims have to the judge.
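The second classification dimension can be pictured as a small data structure. The sketch below is only one possible encoding of the definitions given above (Presentation if and only if the manipulation alters the dilemma or its description, Judge otherwise, with Sacrifice and Victim(s) as optional subsets of Presentation). All class and field names are hypothetical, and the example factors are the ones named in this section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FactorClass(Enum):
    JUDGE = "Judge"
    PRESENTATION = "Presentation"


class PresentationSubset(Enum):
    SACRIFICE = "Sacrifice"   # features of those harmed by acting
    VICTIMS = "Victim(s)"     # features of those harmed by not acting


@dataclass(frozen=True)
class SituationalFactor:
    name: str
    alters_dilemma_description: bool
    subset: Optional[PresentationSubset] = None

    def __post_init__(self):
        # Sacrifice and Victim(s) are defined only within Presentation.
        if self.subset is not None and not self.alters_dilemma_description:
            raise ValueError("subsets apply to Presentation factors only")

    @property
    def factor_class(self) -> FactorClass:
        # Judge is defined as the complement of Presentation.
        if self.alters_dilemma_description:
            return FactorClass.PRESENTATION
        return FactorClass.JUDGE


factors = [
    SituationalFactor("language of the dilemma", True),
    SituationalFactor("cognitive load", False),
    SituationalFactor("severity of harm", True, PresentationSubset.VICTIMS),
    SituationalFactor("judge is the potential sacrifice", True,
                      PresentationSubset.SACRIFICE),
]
for factor in factors:
    print(factor.name, "->", factor.factor_class.value)
```

Encoding Judge as the complement of Presentation mirrors the "if and only if" definitions in the text, so every factor falls into exactly one of the two classes.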

The proposed conceptual distinctions may – apart from the practical implications mentioned at the outset – serve experimental purposes. It could be hypothesized that there are finer-grained modules that drive moral judgement based on, say, different types of system 1 intuitions.Footnote 8 Clearly, some deontological norms apply only to a subset of sacrificial moral dilemmas (e.g. the doctrine of double effect applies only to ‘Push’ type dilemmas). If we assume that moral intuitions reflect these norms, we should expect that relevantly different dilemmas should result in different response patterns. But to test that experimentally, we need to understand how situational factors differ. The proposed ‘Judge’ vs ‘Presentation’ distinction, for example, could be a first starting point to test that suggestion. ‘Judge’ factors do not affect the situation, and so if there are moral intuitions that track the doctrine of double effect specifically, they should not be influenced by factors of the Judge class. Eventually, it would be desirable to (experimentally) identify effects of the situational factors apart from their effect on utilitarian moral judgment and to use these effects as a basis for proposing an alternative classification. Because this approach first requires an overview of situational factors, which this review provides, the conceptual distinctions are more expedient for the purposes of this review.

3.4 Statistical Methods

Effect sizes were pooled and analyzed in MS Excel and R (4.3.4), using the meta, metafor, and dmetar packages (cf. Harrer et al. 2019).Footnote 9 The analyzed studies vary in several important characteristics (e.g. intervention type). Therefore, we rejected the assumption that all studies along with their effect sizes stem from a single homogeneous population and chose a random-effects as opposed to a fixed-effects model (Borenstein et al. 2009).Footnote 10 According to a random-effects model, the true effect size θk of each study k is part of a distribution of true effect sizes with mean μ. The observed effect size of each study therefore deviates from μ by the sampling error ϵk and by an additional error ζk that stems from the variance in the distribution of true effect sizes. The formula for the random-effects model is as follows:

$$ \hat{\theta}_k = \mu + \epsilon_k + \zeta_k $$

We used the standard DerSimonian-Laird method to estimate the variance of the distribution of true effect sizes.Footnote 11
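The pooling step described above can be sketched in a few lines of Python. This is a minimal illustration of the textbook DerSimonian-Laird estimator, not the analysis code used for this review (which relied on R packages); the function name, the input format, and the Wald-type 95% interval are assumptions made for the example.

```python
import math

def dersimonian_laird(effects, std_errors):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    `effects` and `std_errors` are hypothetical inputs: one effect size
    (e.g. Hedges' g) and one standard error per study.
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Method-of-moments estimate of the between-study variance, truncated at 0.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to each study's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    # Wald-type interval (an assumption; software may use other methods).
    ci95 = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "se": se_pooled, "ci95": ci95, "tau2": tau2, "Q": q}
```

When the estimated between-study variance is zero, the random-effects weights reduce to the inverse-variance weights, so the pooled estimate coincides with the fixed-effect estimate.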

Standard errors for each study k were taken from the study directly or calculated from Cohen’s d and the study’s p value, following Altman and Bland (2011). We assessed between-study heterogeneity using I2 and Cochran’s Q test of heterogeneity (Higgins and Thompson 2002).
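For readers unfamiliar with the Altman and Bland (2011) approximation, the following sketch shows how a standard error can be recovered from an estimate and its two-sided p value. The function name and the example values are hypothetical, and whether the review applied the formula in exactly this form is an assumption.

```python
import math

def se_from_p(estimate, p_value):
    """Approximate a standard error from an estimate and its two-sided p value.

    Uses the Altman and Bland approximation of the z statistic,
    z = -0.862 + sqrt(0.743 - 2.404 * ln(p)), followed by SE = |estimate| / z.
    The approximation breaks down as p approaches 1, where z approaches 0.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    z = -0.862 + math.sqrt(0.743 - 2.404 * math.log(p_value))
    return abs(estimate) / z

# Hypothetical study reporting d = 0.50 with p = 0.05:
# the implied z is close to 1.96, so the SE is close to 0.50 / 1.96.
se = se_from_p(0.50, 0.05)
```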

In case of heterogeneity, we identified and excluded as outliers those studies for which the upper bound of the 95% confidence interval was lower than the lower bound of the pooled effect confidence interval (i.e., extremely small effects), and those for which the lower bound of the 95% confidence interval was higher than the upper bound of the pooled effect confidence interval (i.e., extremely large effects; cf. Viechtbauer and Cheung 2010). With outliers removed, the overall effect was then assessed again.
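The outlier rule amounts to a check for non-overlapping confidence intervals. A minimal sketch, assuming each study is given as a (label, effect size, standard error) tuple and that study-level intervals are Wald-type; the study labels and values below are invented for illustration.

```python
def find_outliers(studies, pooled_ci):
    """Return labels of studies whose 95% CI does not overlap the pooled CI."""
    pooled_lo, pooled_hi = pooled_ci
    outliers = []
    for label, g, se in studies:
        lo, hi = g - 1.96 * se, g + 1.96 * se
        # Entirely below (extremely small) or entirely above (extremely large).
        if hi < pooled_lo or lo > pooled_hi:
            outliers.append(label)
    return outliers

# Hypothetical studies compared against a pooled interval of [0.35, 0.46].
studies = [("A", 0.40, 0.10), ("B", 1.20, 0.20), ("C", 0.05, 0.10)]
flagged = find_outliers(studies, (0.35, 0.46))  # "B" and "C" are flagged
```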

All subsequent statistical evaluations, such as hypothesis tests, were done on the dataset with outliers removed. The studies removed for analysis of the general effect are listed in Table 5 in Supplementary Materials I.

With the outliers removed, we tested for publication bias by inspecting a funnel plot and by applying Egger’s test, which avoids known problems of subjectivity in interpreting the plot.
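Egger’s test can be understood as a simple regression: each study’s standardized effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero signals funnel asymmetry. The sketch below implements that classic unweighted form with hypothetical names; R implementations may use weighted or otherwise modified variants, and the p value is omitted because it requires a t distribution with k − 2 degrees of freedom, which the Python standard library does not provide.

```python
import math

def egger_regression(effects, std_errors):
    """Classic Egger regression: standardized effect regressed on precision."""
    x = [1.0 / se for se in std_errors]                 # precision
    y = [g / se for g, se in zip(effects, std_errors)]  # standardized effect
    k = len(x)
    if k < 3:
        raise ValueError("at least three studies are needed")
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with k - 2 degrees of freedom.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)
    se_int = math.sqrt(s2 * (1.0 / k + mx ** 2 / sxx))
    # t statistic for the intercept; undefined if the fit is exact.
    t = intercept / se_int if se_int > 0 else math.nan
    return {"intercept": intercept, "slope": slope, "se_intercept": se_int, "t": t}
```

The slope of this regression estimates the underlying effect, while the intercept captures the tendency of small (imprecise) studies to report larger effects.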

4 Results

Thirty-six situational factors have been found to have an effect on utilitarian moral judgements in sacrificial moral dilemmas. Across all situational factors, the random-effects model showed a small (Cohen 1988) to desirable (Hattie 2009) effect (g = .40, 95% CI [0.35, 0.46], p < 0.0001). With an I2 value of 61.8% (95% CI [51.5%, 69.9%]) and Q = 212.18 (p < 0.0001), there was evidence for moderate to substantial heterogeneity in the data. After outliers (k = 9) were removed from the dataset, the random-effects model showed a small to moderate effect (g = .36, 95% CI [0.33, 0.40], p < 0.0001), with little evidence of between-study heterogeneity (I2 = 0.0%, 95% CI [0.0%, 27.4%]; Q = 71.39, p = 0.4981). The funnel plot (Fig. 5) showed visual evidence for publication bias. Egger’s test was significant (p < 0.0001), which suggests asymmetry in the funnel that could be caused by publication bias.
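As a rough consistency check, the reported I2 values can be recomputed from Q using the Higgins and Thompson definition. The degrees of freedom are an assumption here: df = k − 1, with k = 82 effect sizes before and k = 73 after removal of the nine outliers.

$$ I^2 = \max\left(0, \frac{Q - df}{Q}\right) \times 100\%, \qquad \frac{212.18 - 81}{212.18} \approx 61.8\%, \qquad \frac{71.39 - 72}{71.39} < 0 \;\Rightarrow\; I^2 = 0\% $$

Under that assumption, both results agree with the reported figures.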

Fig. 5 Funnel plot (with outliers removed)

Table 2 gives an overview of the situational factors and their assigned classes, their type, direction of influence, and their supposed underlying system (through which they exert their influence on moral judgments). Since single situational factors were often addressed by multiple studies, only aggregate effects are reported in Table 3 if there are multiple studies. A complete report of all individual effect sizes, including notifications about where effect sizes have been recalculated, and significance levels per study (k = 82) is provided in Table 5 in the Supplementary Materials I.

Table 2 Overview of Study Designs and Main/Interaction Effects
Table 3 Summary of Exploratory Analysis of Situational Factors of Studies Included in Literature Review (Outliers Removed)

Though all results are reported in this review, it is important to note, as Table 2 illustrates, that just 16 articles out of the 53 analyzed articles reported main effects of situational factors on moral judgment (30%). The majority of articles (N = 37, 70%) did not report main effects of situational factors on utilitarian moral judgments. Instead, most effects were observed only for a subset of sacrificial dilemmas, such as personal dilemmas (in contrast to impersonal dilemmas) or personal high conflict dilemmas (in contrast to personal low conflict dilemmas).Footnote 12

Thus, many findings are restricted to a subset of sacrificial dilemmas. This is an important observation. Previous studies found that the dilemma type already has an effect on moral judgement (e.g. Greene et al. 2009; see also section 4.1 below). ‘Personal’ sacrificial dilemmas like ‘Push’ tend to decrease the rate of utilitarian moral judgment, compared to ‘impersonal’ sacrificial dilemmas like ‘Switch.’ It is thus not always immediately clear whether an increase or decrease in utilitarian moral judgement due to a given situational factor holds for personal dilemmas, impersonal dilemmas, or both. Table 5 specifies the experimental design used in each study. However, a restriction to personal sacrificial dilemmas was not an exclusion criterion in the setup of this review, and all studies that found interaction effects (N = 21) or tested dilemma type as a 1-level factor (N = 16) reported significant effects for personal sacrificial dilemmas only. Therefore, the situational factors reported in this study must be read as situational factors that affect moral judgments in personal sacrificial dilemmas. Only a few situational factors are broader in that they affect moral judgments in sacrificial dilemmas, independently of whether they are personal or impersonal sacrificial dilemmas.Footnote 13

Relatedly, there is an important caveat: there were many interaction effects between different types of situational factors. For example, the situational factor action choice (whether subjects are asked whether they would do action x or whether they would find x acceptable) had a significant impact on moral judgments. Not all studies allowed an inference about the relevant and/or controlled interactions and thus reported effects may sometimes be mere interactions with more ‘basic’ situational factors.

4.1 The Trolley Effect

There was a strong effect of dilemma type, which can be called the ‘trolley effect’. When subjects evaluated personal moral dilemmas, such as push, as opposed to impersonal dilemmas, such as switch, the frequency of utilitarian moral judgments significantly decreased. Eight articles (after outliers were removed) reported significant main effects of dilemma type where the original push and switch dilemmas were used, and the mean effect size pointed to a considerable effect, g = .40 (N = 1271; Duke and Begue 2015; Kvaran et al. 2013; La Olivera Rosa et al. 2016; Manfrinati et al. 2013; Moore et al. 2008; Sachdeva et al. 2015; Shallow et al. 2011; Sylvia et al. 2013).

Nonetheless, as discussed above, dilemma type will not be discussed as a separate situational factor in this review, because it is insufficiently clear what the distinguishing features are that account for the difference between push and switch. One possibility is that the trolley effect is an idiosyncratic result of the very wording of the standard push and switch dilemma. What seems more plausible, however, is that there is a generalizable feature of the push dilemma that accounts for its effect on the frequency of utilitarian moral judgment. This feature, or features, might be, for example, the situational factor personal force, intention, or empathic concern, which are discussed individually below.

4.2 Subgroup Analysis

The effects of situational factors can be pooled by their conceptually defined sub-groups. A more detailed explanation of the situational factors included in each class and their operationalization can be found in Supplementary Materials II.

Across all assessed studies, effect sizes pooled by class ranged between medium and strong effects. For the Judge class, we found g = .34 (SE = 0.029, 95% CI [0.28, 0.40], p < 0.001), for the Presentation class g = .36 (SE = 0.024, 95% CI [0.32, 0.41], p < 0.001), for the Victim class g = .44 (SE = 0.086, 95% CI [0.28, 0.62], p < 0.001) and for the Sacrifice class g = .78 (SE = 0.152, 95% CI [0.48, 1.07], p < 0.001).

Between-study heterogeneity was very low for the Judge, Presentation, and Sacrifice class (I2 = 0%, 4%, 0% respectively), but considerable for the Victim class (I2 = 48%). This suggests that the categorization for the Victim class was too coarse-grained, but that the situational factors within the Judge, Presentation, and Sacrifice classes each had homogeneous effects. For the Sacrifice category, however, there were only two studies. Collapsing the Victim and Sacrifice class into the Presentation class yielded g = .39 (SE = 0.025, 95% CI [0.31, 0.41], p < 0.001, I2 = 20%) for the Presentation class.

There are four key takeaways. First, situational factors of both the Judge and Presentation class had significant pooled medium to desirable effects on utilitarian moral judgement (Hattie 2009). Second, there was no considerable difference in the effect on utilitarian moral judgement between situational factors of both classes. Third, between both classes, there was no evidence for heterogeneity (Q = 1.9423, p = 0.1634). Finally, within their class, the effects of situational factors were as homogeneous as the effects of all situational factors taken together (I2 = 0.0%) for the Judge class (I2 = 0.0%), but not for the Presentation class (I2 = 22%, with Victim and Sacrifice class included). This suggests that the conceptual classification of situational factors presented here might, at least in the case of the Judge class, be expedient for subsequent experimental investigation.
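The between-class heterogeneity test compares two pooled estimates, so Q is referred to a chi-square distribution with one degree of freedom. The sketch below, with hypothetical function names, shows how such a Q statistic arises from two subgroup estimates and how its p value follows; that the reported test had exactly one degree of freedom is an assumption based on the comparison of two classes.

```python
import math

def q_between(g1, se1, g2, se2):
    """Between-subgroup Q for two pooled estimates (inverse-variance weighting)."""
    return (g1 - g2) ** 2 / (se1 ** 2 + se2 ** 2)

def chi2_p_df1(q):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))

# The reported Q = 1.9423 corresponds to p of roughly 0.163.
p = chi2_p_df1(1.9423)
```

Recomputing Q from the rounded class estimates reported above will only approximate the published value, since rounding of g and SE propagates into the statistic.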

4.3 Hypothesis 1a Results

Hypothesis 1a read ‘If cognitive control is high (System 2 activated), there will be increased frequency for utilitarian moral judgments’. As predicted, increased activation of system 2 by situational factors such as abstract mode of thought, cognitive control, time delay, or psychological distance led to an increased frequency for utilitarian moral judgment, with effect sizes ranging from g = 0.24 (small effect) to g = 0.63 (small to medium effect). Likewise, situational factors that plausibly disrupt operation of system 2, like time pressure, incidental stress, and cognitive load, decreased utilitarian moral judgment, with effect sizes ranging from g = 0.28 (small effect) to g = 0.41 (small to medium effect). There were many interaction effects, however. Several situational factors were significant only in interaction with personal sacrificial dilemmas.

The pooled effect size for factors that either inhibited or activated system 2 processing was g = .35 (SE = 0.024, 95% CI [0.30, 0.39], p < 0.001, Q = 29.05, I2 = 0%, 95% CI [0%, 22%]). There was thus a small to medium and significant effect (Cohen 1988) of situational factors associated with system 2 activation or inhibition on utilitarian moral judgement. The findings of the meta-analysis can thus be taken to vindicate hypothesis 1a and thus to validate the predictions of dual process theory.

4.4 Hypothesis 1b Results

Hypothesis 1b read ‘If affect is high (System 1 activated), there will be decreased frequency for utilitarian moral judgments.’ As predicted, if affect was high, and system 1 plausibly activated, through situational factors such as mortality salience or harm severity, there was decreased frequency for utilitarian moral judgments, with effect sizes ranging from g = 0.25 to g = 0.63 (small to medium effect). It is not entirely clear, however, whether, and if so how, the valence of affect plays a role. On the one hand, in line with the hypothesis, situational factors that seem to inhibit positive affect, such as incidental coldness or harm severity, decrease frequency for utilitarian moral judgments. On the other hand, positive affect as well as (feeling) social connectedness increase frequency for utilitarian moral judgments, contrary to hypothesis 1b. Apart from the valence of affect, another possibility is that it is not affect per se that negatively affects the frequency of utilitarian moral judgment through activation of system 1, but empathic concern for the ‘victims’ in the dilemma, as suggested by situational factors such as self, relation to judge, and floweriness. As it stands, the data is not conclusive on the precise mechanism by which system 1 activation relates to the frequency of utilitarian moral judgment. Moreover, there were many interaction effects. Several situational factors were significant only in interaction with personal sacrificial dilemmas.

The pooled effect size for factors that either inhibited or activated system 1 processing was g = .33 (SE = 0.039, 95% CI [0.25, 0.41], p < 0.001, Q = 14.22, I2 = 0%, 95% CI [0%, 40%]).

Again, there was thus a small to medium and significant effect (Cohen 1988) of situational factors associated with system 1 activation or inhibition on utilitarian moral judgement. The findings of the meta-analysis can thus be taken to vindicate hypothesis 1b and thus to validate the predictions of dual process theory.

4.5 Hypothesis 2 Results

Hypothesis 2 read ‘If the locus of intervention refers to a subject that is harmed by the intervention (vs. one that is not harmed by the intervention) then there will be decreased frequency for utilitarian moral judgments.’ Some situational factors could not be related unambiguously to system 1 or system 2. These were grouped in the ‘other’ category. The studies relevant for hypothesis 2 provide evidence that the causal role of the ‘sacrifice’ in sacrificial moral dilemmas has a strong effect on moral judgment, in the direction suggested by the hypothesis. Importantly, the studies analyzed in support of this claim control for interacting factors, such as personal force, and thus support the view that moral judgments are sometimes driven by causal judgments, which is not readily explained by a dual process theory of moral judgment.

The pooled effect size for factors that could not be related unambiguously to system 1 or system 2 was g = .45 (SE = 0.043, 95% CI [0.36, 0.53], p < 0.001, Q = 24.64, I2 = 31%, 95% CI [0%, 61%]).

This can be considered a small to medium effect (Cohen 1988) and within the zone of desirable effects (Hattie 2009). Though the findings in relation to hypotheses 1a and 1b support a dual process theory, this significant finding is not readily explained by dual process theory.

5 Discussion

Overall, the results of this literature review provide moderate evidence in support of hypothesis 1a, according to which activation of system 2 processes increases the frequency of utilitarian moral judgment. Hypothesis 1b, according to which activation of system 1 processes leads to decreased frequency of utilitarian moral judgment, also looks supported by the evidence, but the picture is more ambiguous than in the case of H1a. The main point is that higher affect is not unambiguously related to decreased frequency of utilitarian moral judgments, but the relation seems to be affected by valence such that negative affect behaves in line with H1b, but positive affect does not. In other words, depending on its valence, affect decreases or increases the frequency of utilitarian moral judgments. Given the large number of situational factors, and possible interactions, further investigation of H1b must systematically investigate whether and how increased affect decreases utilitarian judgment. It is tempting to suggest that positive mood would work by increasing the activation of system 2 and thereby account for the effect of some situational factors related to positive affect. It would be interesting to see, in future work, why this should be the case. It might also provide an avenue to investigate some of the findings about the influence of personality factors, such as psychopathy, on moral judgment.

Regarding H2, there were indeed some situational factors, of which causal role is but one, that cannot obviously be traced to system 1 or 2 activation, and that seem best explained by the hypothesis that the locus of intervention affects people’s moral judgments, thus providing support for H2. Moreover, these findings put pressure on the view that the dual process theory of moral judgment can explain the effects of all found situational factors. This, in turn, puts pressure on the claim that a dual process theory of moral judgment can give a complete account of moral judgment. Instead, there are reasons to think that there are domain-general processes, such as causal reasoning, that carry over to moral judgment.

More generally, this review confirms strongly the ‘trolley effect’, the affirmation of action (which corresponds to a utilitarian judgment) in impersonal dilemmas like switch and the disproportionate rejection of action (which corresponds to a deontological response) in personal dilemmas like push. The effect persisted, with considerable effect size, across multiple studies with a large number of participants, as pointed out in the results section.

However, there are important limitations. There is considerable evidence for publication bias in the study of the influence of situational factors on utilitarian moral judgments. The restriction to include only peer-reviewed articles in this review may of course have contributed to that result, although only 17 studies were excluded from analysis based on that criterion. What seems more relevant as a limitation in the context of this review is selective reporting bias. An overwhelming majority of the reported effects of situational factors were significant, which may suggest that different or the same situational factors were also tested but not reported. Though the effect of that possible bias is uncertain, it invites a more tentative interpretation and assessment of the overall small to moderate effect of situational factors on utilitarian moral judgements.

Moreover, despite the persistence of the effect, it is much less clear why the trolley effect or other effects of situational factors on utilitarian moral judgements occur. That no such inference can confidently be made at this point is another relevant finding of this review. Looking into the underlying mechanism serves as an illustration of what can and cannot be learned from studying the effects of situational factors on moral judgment.

To clarify the point, it should be noted that there are really three crucial causal connections that need to be illuminated, as Fig. 6 illustrates: which situational factors there are, which underlying process each factor affects, and which moral inclinations each underlying process affects. Of course, there is also a link between moral inclination and ultimate moral judgment, but this link is beyond the scope of this review.

Fig. 6 Causal Chain of Moral Judgments

The widely shared claim that the trolley effect has to do with the use of personal force does not explain the persistence of the finding in cases where personal force is excluded from the picture (e.g. Nagel and Waldmann 2016). Moreover, of course, spatial proximity and physical contact cannot account for the effect either (Greene et al. 2009). What seems more fundamental even than personal force, and capable of explaining the findings where personal force was excluded from the picture, is intention (Manfrinati et al. 2013). Intending to harm someone, to use someone as a means, plausibly leads to decreased frequency of utilitarian moral judgment in line with an evolutionary argument about what Greene (2015, p. 232) called an “anti-violence gizmo”, an evolved inhibition against intentionally harming others. Providing further support for this view, a systematic review found that personal dilemmas create higher arousal compared to impersonal dilemmas (Christensen et al. 2014).

There are three problems, however, with too quickly reading off from these experiments an insight into the causal chain between situational factor, impact on underlying system, and impact on moral judgment. First, when intention was controlled for, as in Waldmann and Dieterich (2007), the trolley effect persisted. This points to the possibility that a dual process theory does not exhaust explanations of the determinants of moral judgments, as discussed further in section 5.1.1.

Second, and more generally, even if it were clear that a given situational factor, such as those apparently present in personal moral dilemmas, affects moral judgment, it is not clear whether it does so by inhibiting utilitarian tendencies or deontological tendencies. Both should be expected to differ, since this is precisely what dual process theory predicts: both tendencies are built on two distinct processes. Hence, as section 5.1.2 discusses, they should be distinguished.

Finally, there is an ambiguity between the causes for a decrease or increase in utilitarian moral judgment, as Table 4 illustrates. The table is populated based on the reviewed findings and the available information about controlled variables in the reviewed studies (e.g. that blood alcohol level raises positive affect, which indicates that it belongs in the third column). The fact that the underlying effects of situational factors could not always be traced to either cause (activation vs inhibition of either system) suggests that an important theoretical and methodological implication of this review is to invest more in standardizing experimental material and controlling for confounds in future studies, as discussed in section 5.2.

Table 4 Presumed Underlying Effects of Situational Factors

5.1 Theoretical Implications

5.1.1 Possible Limits for Dual Process Theory

The situational factors that could not straightforwardly be sorted into Table 4 with good confidence were [Judge] Incidental disgust, [Judge] Incidental coldness, [Presentation] Intuitiveness, [Presentation] Order, and [Sacrifice] Causal role. Focusing on the latter two factors, for their relation to H2, it can be argued that their effect on utilitarian judgment points to limits for a dual process theory of moral judgment. In other words, given that some experiments controlled for factors that could plausibly affect system 1 or system 2 processing and nonetheless found significant influences on moral judgment, it stands to reason that there are some aspects of moral judgment that are not readily explained by dual process theory. And even though this might be a mere artifact of not uncovering influences of the causal role, or the locus of an intervention, on affect or cognitive control, it is theoretically unlikely that there be such a relation. This seems to complicate the picture discussed in section 2: there is more to a theory of moral judgment than a dual process theory.

In particular, there might be domain general processes that influence moral judgment that cannot clearly be fitted into the system 1 inhibition, system 2 activation model of dual process theory (cf. Rai and Fiske 2011).

At the same time, however, it should be clear that finding seemingly inexplicable effects of situational factors does not falsify dual process theory. Dual process theory might accommodate these findings by turning them on their head, suggesting that they are not findings about moral judgments, but rather findings about causal judgments in a moral context. Less radically, one might ask why causal models or loci of intervention play a role in moral judgment in the first place. Insofar as the answer will be grounded in an evolutionary explanation, it is plausible that a fundamental concern of dual process theory will remain: deontological moral judgments originate in an attempt to navigate the early environment of evolutionary adaptiveness.

Finally, a closer look at Table 4, coupled with a recent suggestion by Gawronski and Beer (2017), might provide further problems for a dual process theory of moral judgment. Gawronski and Beer (2017) pointed out that situational factors may be seen to affect outcomes or norms, e.g. when intentions matter. They write (Gawronski and Beer 2017, p. 630):

[U]tilitarian responses are reflected in a main effect of experimentally manipulated outcomes (i.e., stronger preference for action when it increases overall well-being than when it decreases overall well-being), whereas deontological responses are reflected in a main effect of experimentally manipulated norms (i.e., stronger preference for action when the dilemma involves a prescriptive norm than when the dilemma involves a proscriptive norm).

Deontology deals with norms while utilitarianism deals with values. When the impact of situational factors that increase value corresponds to increased utilitarian moral judgment and those that would imply the transgression of norms affect deontological moral judgments, then the Kohlbergian hypothesis might regain ground: people are affected by moral theory, not only by morally irrelevant situational factors. Table 4 provides some support for this view: amongst the factors that decrease utilitarian moral judgment (ergo increase deontological moral judgment) are many that imply the transgression of norms (all but incidental serotonin, floweriness, stress, time-delay, and time-pressure). The same is the case for factors that increase utilitarian moral judgment (all but testosterone, blood alcohol, foreign language, and medium). Hence, at least some of the findings of this review can be read as a defusing explanation of a dual process theory, suggesting that sensitivity to moral theory is what drives moral judgment.

Of course, this claim would have to be investigated further. An interesting avenue for further research with the Kohlbergian rationalistic paradigm might be to employ the dilemmas used in Kohlberg’s approach, which ask subjects about an action they should perform (e.g. ‘Should Heinz steal the drug?’) and on which subjects are often evenly divided. That is, about half of them are in favor of stealing, the other half is against it.

The Kohlbergian dilemmas pit different kinds of values, rather than normative theories or principles, against one another. However, normative theories are also associated with values, and different types of moral theories will give rise to different endorsements of values and norms. Hence, the dilemmas of the Kohlbergian paradigm could be used in a similar experimental setup as sacrificial dilemmas, varying seemingly irrelevant situational factors, to check whether people are sensitive to norms and values or not. For example, one could change pieces of the story (e.g. Heinz is described as a member of the in- vs the out-group) and see whether this affects the action decision of subjects. In this way, the relevance of reasoning might be contrasted with the influences of affective components on Kohlberg’s very own paradigms.

These possible explanatory gaps or limits for dual process theory may compound already existing criticism, for example that the dual process theory neglects the motivational aspects of decision making (cf. Moll et al. 2008). In future research, it would therefore be interesting to see how motivational aspects aroused by situational factors affect utilitarian moral judgement, and whether the observed effects are in keeping with dual process theory.

5.1.2 Process Dissociation Required

Many studies suffer from a methodological shortcoming in that they equate more utilitarian judgments with less deontological judgments. However, recent articles that use a process dissociation technique were able to show that the picture is more complicated.

In particular, it is not always clear whether a situational factor that can be shown to increase the frequency of utilitarian moral judgments does so by inhibiting a preference for deontological moral judgment or by increasing the subject’s preference for utilitarian moral judgment. However, being clear about this distinction is crucial to come to a full illumination of the link between underlying process and moral inclination (as well as the link between moral inclination and moral judgment).

Conway and Gawronski (2013, p. 217) have raised this point before, putting special emphasis on the fact that the current experimental paradigm does not allow one to examine whether utilitarian and deontological moral inclinations are independent or positively related, since one is simply treated as an opposite of the other. The most important problem is that experimental manipulations are ambiguous between an effect increasing a subject’s inclination toward one normative theory versus decreasing the subject’s inclination toward the other normative theory.
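To make the idea concrete, the following sketch computes separate utilitarian (U) and deontological (D) parameters in the manner of the process dissociation approach. It assumes the standard setup in which subjects judge both congruent dilemmas (harm does not maximize outcomes, so both inclinations reject it) and incongruent dilemmas (harm maximizes outcomes, so only the deontological inclination rejects it). The function name and the example proportions are hypothetical.

```python
def process_dissociation(p_reject_congruent, p_reject_incongruent):
    """Estimate utilitarian (U) and deontological (D) parameters.

    Inputs are the proportions of 'harm is unacceptable' responses in
    congruent and incongruent dilemmas. The model assumes
      p(reject | congruent)   = U + (1 - U) * D
      p(reject | incongruent) = (1 - U) * D
    """
    u = p_reject_congruent - p_reject_incongruent
    if u >= 1.0:
        raise ValueError("D is undefined when U equals 1")
    d = p_reject_incongruent / (1.0 - u)
    return u, d

# Hypothetical subject: rejects harm in 80% of congruent and 40% of
# incongruent dilemmas, giving U = 0.4 and D of about 0.67.
u, d = process_dissociation(0.8, 0.4)
```

Because U and D are estimated separately, a manipulation that lowers D while leaving U unchanged can be told apart from one that raises U, which a single acceptability rating cannot do.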

At the same time, studies that already used process dissociation or related techniques (e.g. Conway and Gawronski 2013; Li et al. 2018), did not uncover findings that differ widely from those reached with conventional techniques. It will, therefore, be interesting to see how the theoretical advantages of process dissociation pan out in experimental practice.

5.1.3 Making Moral Progress

A further theoretical implication of this review is moral philosophical. The sheer quantity of situational factors that have been found to influence moral judgement may deepen existing concerns about the reliability of moral judgement. This is relevant from two perspectives. First, it may put pressure on success theorists in moral epistemology who claim that, by and large, moral judgements track moral truth. Their traditional reply to situational factors has been to argue that at least some of the alleged situational factors are indeed normatively relevant (e.g. Kamm 2019). With a lengthening list of situational factors, however, that defense of the success theorists will become more daunting. Second, these findings put pressure on the view that moral progress as driven by individual moral judgements is real. If it could be shown that moral progress does, in fact, rely on moral judgements unperturbed by situational factors, then the findings of this review may lead to an argument that makes trouble for moral progress.

5.2 Methodological Implications

Given the large number of interaction effects, there is something to be said for controlling for the influence of situational factors by standardizing the dilemmas used in experiments and by making them easily accessible for other experimenters. For example, it was not always possible to ascertain which dilemmas were used in the experiments reviewed in this study. And though many studies contained a reference to the set of dilemmas employed, they did not specify whether the dilemmas were used verbatim or slightly altered. This is particularly pressing for situational factors studied in only a few experiments, such as incidental serotonin, blood alcohol level, or intuitiveness. In these cases, the reported effects might be due to framing or wording differences in the dilemmas used. Without access to the original material, or transparent reporting, it remains an open possibility that some reported effects are really effects of other situational factors, such as action framing or action choice.

A point related to the need for transparency in describing experimental materials is the view that experimental materials should be standardized. That is, in line with efforts already undertaken by, for example, Lotto et al. (2014) or Chan et al. (2016), researchers might want to concentrate their efforts on a shared set of standardized dilemmas and related conditions of presentation (such as medium of presentation, room temperature, or stress levels). The benefits of this proposal would be obvious: if done properly, a more standardized experimental paradigm will help to isolate the effects of situational factors. The relevance of this point is illustrated by the failure to replicate the foreign language effect, as reported by Chan et al. (2016). They found the effect in only some specific moral dilemmas, and there was no interaction with the common personal/impersonal dilemma type. Hence, despite the relative strength of the foreign language effect, it is still an open possibility that it is an artifact of the specific dilemmas that were used to assess it.

At the same time, however, the downside of this policy is that it becomes questionable whether results gained in these studies generalize to the broader category of moral judgments outside of sacrificial dilemmas. Indeed, it has been a perennial criticism of the sacrificial dilemma paradigm that the experimental results allow inferences about the nature and determinants of moral judgments in sacrificial dilemmas but not inferences about moral judgements more generally. In line with this objection, Bostyn et al. (2018) have recently posed dilemmatic situations with real-life consequences and failed to produce typical trolley effects. Hence, there seems to be a real problem with transferring results of moral psychological research gained in the lab straightforwardly to the phenomenon as it occurs ‘in the field’. In other words, results gained from experiments on sacrificial dilemmas are not valid tests of moral judgment (though they are certainly valid and reliable results about a subset of moral judgments).

The results provided in this review should tip the balance in favor of standardizing the experimental material as far as possible (or, alternatively, controlling for their effects). Given the influence of situational factors, it seems fairly certain that experimental results are otherwise invalid. As long as the current experimental paradigm is the best available to study the nature of moral judgment, it will be more beneficial to improve it than to abandon it in an attempt to study moral judgments in more realistic scenarios.

5.3 Practical Implications

Finally, the results reviewed in this study can be taken to have practical implications for policy makers. Policy makers often feel compelled to take into account public debate about issues of public concern, such as taxation, infrastructure, immigration, or family law. Very often, these debates are morally laden and indeed concern moral issues, such as whether well-off people must contribute to public welfare (taxation) or whether non-nationals have a right to be protected from inhumane conditions in other countries (immigration). When policy makers turn to the public’s opinion about such matters, then the reviewed research suggests that care must be taken to present the issues at hand with as little variation in the morally irrelevant factors as possible.

To give a concrete and timely example, suppose that some country’s policymakers ask for a public referendum on whether or not there should be a limit to immigration. The options would be to institute a limit or to refrain from doing so. At least in the case of immigrants who face a threat to their human dignity in their country of origin, there will be a clear deontological case not to institute a limit. It can also be assumed, for the sake of argument, that limiting immigration would not result in the greatest happiness for the greatest number. Hence, this would be a case where both major competing normative theories would advocate the same response: not to institute a limit on immigration. Nonetheless, it is not clear that the public would support such a decision because it might easily be swayed by situational factors. For example, the way the choice is framed (as erecting a limit vs opposing the erection of a limit) or whether it is about stopping people from being killed vs saving them might make a relevant difference in people’s judgment of the matter.

Conversely, knowledge about the effects of situational factors on moral judgement could also be used to sway the public’s opinion on these matters. For example, one may attempt to make the public more supportive of immigration by relying on our insight into the effects of situational factors. Practically, such an approach would be limited by normatively underdetermined choices (that is, even if we could increase utilitarian moral judgements (say) we do not know whether this unequivocally leads to a more pro or con position on immigration). Ethically, such an approach deserves scrutiny because of its close relation with nudging practices and related concerns about manipulation (cf. Wilkinson 2013; Klenk and Hancock 2019; Klenk 2021).

In any case, greater care must be taken to control for situational factors in situations where moral choices have to be made, so that moral principles, norms, or values are the sole distinguishing factor of different choices. In practice, this might remain an ideal, but given the large number of situational factors at play in sacrificial dilemmas, even a small degree of standardization promises to improve the situation.