VARIETIES OF MORAL JUDGMENT

Beyond good and bad: Varieties of moral judgment

William Jiménez-Leal1, Samuel Murray2, Santiago Amaya3, and Sergio Barbosa1,4

1Department of Psychology, Universidad de los Andes
2Mind At Large Lab, Imagination and Modal Cognition Lab, Duke University
3Department of Philosophy, Universidad de los Andes
4Department of Psychology, Universidad del Rosario

Word Count: 15910.

Author Note

The first three authors contributed equally. Correspondence concerning this article should be addressed to William Jiménez-Leal, Universidad de los Andes. E-mail: w.jimenezleal@uniandes.edu.co. Supplementary materials (including materials, preregistrations, raw data and code) are available at: https://osf.io/kja2u/?view_only=573c1541bbd240b397b79229704dfae7.

Abstract

We argue that people regularly encounter situations involving moral conflicts among permissible options. These scenarios, which some have called morally charged situations, reflect perceived tensions between moral expectations and moral rights. Studying responses to such situations marks a departure from the common emphasis on sacrificial dilemmas and widespread use of single-dimension measures. In 6 experiments (n = 1607), we show that people use a wide conceptual arsenal when assessing actions that can be described as suberogatory (bad but permissible) or supererogatory (good but not required). In Experiment 1 we find that people identify actions as suberogatory or supererogatory when using open descriptions to describe them. Experiment 2 shows that they differentially assess these actions in terms of how permissible, optional, and good they consider them. Experiment 3 tests the use of these evaluative dimensions with sacrificial dilemmas. We fail to find differences between these categories when people respond to dilemmas, even when controlling for trait utilitarian tendencies.
By including judgments of blameworthiness and sanction, Experiments 4 and 5 provide additional evidence of the granularity and the moral significance of these evaluations. In Experiment 6 people offered their own explanations of their responses. Qualitative analyses revealed that they frequently appealed to character traits, the presence of rights, and the absence of explicit duties. Taken together, these results suggest a richer spectrum of both situations and concepts relevant to characterizing moral judgment than moral psychologists have generally recognized up to this point.

Keywords: moral rules, moral dilemmas, suberogatory, supererogatory, duty

Beyond good and bad: Varieties of moral judgment

The role of dilemmas in the science of morality

Dilemmas have traditionally played a prominent role in the study of moral cognition. While many have recently registered dissatisfaction with this state of affairs (Andrade, 2019; Bauman, McGraw, Bartels, & Warren, 2014; Dahl & Oftedal, 2019; Everett & Kahane, 2020; Kahane, 2020), considerably less has been said about how, historically, we got here. Dilemmas take on this crucial role because of the assumption that morality is a system of rules. But this assumption is problematic for reasons pertaining to both normative theories of morality and the psychology of moral judgment.

From early on, psychologists interested in moral judgment understood morality as a structured set of rules prohibiting and prescribing certain behaviors. Piaget (1932), for instance, opened his landmark The Moral Judgment of the Child with the statement: "All morality consists in a system of rules, and the essence of all morality is to be sought for in the respect which the individual acquires for these rules" (1932, p. 1).
Fauconnet echoes this sentiment in his Responsibility, stating that moral responsibility is a "quality belonging to those who must...in virtue of a rule be chosen as the passive subjects of a punishment" (1920, p. 11).

Thinking of morality as a system of rules has several theoretical pay-offs. For one, various deontic concepts can be inter-defined through rules. For example, acts are permissible if and only if those acts accord with the rules, and something counts as good only if it accords with the rules. Likewise, the impermissible is whatever constitutes violating a rule, and something is bad insofar as doing it violates a rule (see, for example, Kanger (1971) and Anderson (1958) for two formal attempts to explain inter-definability). This, as we shall see, results in a methodological advantage. If this is true, asking subjects whether something is impermissible, bad, or in violation of a duty comes close to asking one and the same question.

Focusing on rules also provides a concrete way to measure moral development. Under the assumption that maturing moral judgment consists in possessing a greater moral understanding, it is possible to define this understanding in terms of increasing aptitude in applying more sophisticated evaluative rules (Kohlberg & Hersh, 1977). This insight famously informed the stage theory of moral development (Kohlberg, Levine, & Hewer, 1983), which identified moral stages according to the kinds of rules that informed judgments: from pre-conventional rules ("Doing this is bad because I can get punished") to universal, exceptionless principles of impartial justice ("Killing is wrong").

Moral dilemmas are one specific kind of moral encounter (Monin, Pizarro & Beer, 2007).
They characteristically present situations requiring decisions among impermissible options, in the sense that each violates some plausible moral imperative (Sinnott-Armstrong, 1988: 29-30). In this respect, they isolate different sets of rules (egocentric v. altruistic, in-group v. out-group, etc.) that are normally taken to inform moral cognition. Thus, by looking at the choices people make in these situations or by studying how they assess the decisions made by actors depicted in them, it seems possible to better understand which rules inform their judgments of goodness and badness. For similar reasons, moral dilemmas also seem to provide a good instrument to measure moral development and test hypotheses about individual developmental trajectories.

Obviously, theories of moral cognition that fall under this rule-based tradition differ from one another in important respects (see Darley & Schultz, 1990 for discussion). The well-known controversies among them are a sign of this. Kohlberg's research program, for example, was criticized for the use of culturally biased materials (Simpson, 1974; Snarey, 1985), biased samples (Walker, 1984) and the idea of a linear progression in moral development (Rest, 1979). Others criticized the kind of dilemmas used to elicit moral judgment, focusing on their artificial nature or their mundane character (Rosen, 1980; Bauman et al., 2014). At bottom, however, many of Kohlberg's critics agreed that morality was a system of rules that could be studied by means of situations where plausible moral rules conflict with one another.
They just disagreed about how to properly characterize these rules and the situations that best exemplified these conflicts.1

1 One notable exception comes from feminist critiques of ethical theory. Several feminist ethicists have noted that the central preoccupation of moral theorizing prioritizes abstract principles over the particularity of ethical life (Gilligan, 1982, pp. 32-38). The emphasis on abstract generalization obscures the details that are crucially important to a well-lived life (Young, 1987, pp. 61-62).

Rules and Commonsense Morality

Some contemporary psychologists have sought to distance themselves from this early tradition by proposing dual models of moral cognition (for discussion, see Crockett, 2013). We believe, however, that these models are in an important respect a continuation of the rule-based tradition that has dominated the study of moral cognition. While opening up new possibilities to understand the architecture of moral decision-making, the dual process paradigm has inherited from its early predecessors the view of morality as a system of rules.

Dual models are premised on the belief that commonsense morality reflects attitudes that lie on a continuum between full-blown utilitarian and deontological ethics. Differences on this dimension are supposed to be explained in terms of a deep architectural divide (Bartels & Pizarro, 2011; Christensen & Gomila, 2012; Conway & Gawronski, 2013; Djeriouat & Trémolière, 2014; Holyoak & Powell, 2016; Lee & Gino, 2015). Controlled processes produce characteristically utilitarian responses; intuitive processes produce characteristically deontological responses (Bialek & De Neys, 2017; Cushman, Young, & Greene, 2010; Moll & de Oliveira-Souza, 2007).

Among proponents of these models, there is a standing debate as to how deep this divide is.
Some have interpreted it as showing that rules need not be represented as inputs to the decision-making processes that culminate in moral judgment (see Blair, 1995; Greene et al., 2001; Haidt, 2001). Others believe that the rules are always represented, except that sometimes they are only tacitly represented (Mallon & Nichols, 2010; Mikhail, 2011). Still others have proposed characterizing the divide in terms of model-based and model-free processing (Crockett, 2013).

These discussions, however, operate within the rule-based tradition that we wish to challenge here. They concern the architectural or algorithmic processes underlying how moral rules are instantiated in judgments about specific actions. Precisely because of this, they leave untouched the more basic assumption regarding the subject matter of morality itself. After all, the styles of thinking that are supposed to dominate commonsense morality are still conceptualized in terms of systems of moral rules: either expressions of duty or prescriptions to maximize or minimize some valued quantity.

In general, moral rules can be defined as functions from relevant inputs to behavioral imperatives. For example, a rule against murder is a general function from some action's being an instance of murder to an imperative against so acting. As Sidgwick (1981, p. 228) explains, "...rules of duty ought to admit of precise definition in a universal form." Under this definition, many generalizations (but not all of them) count as moral rules: the prohibition against harming innocents, the injunction to maximize saved lives, etc.

Thus, it might be an open question whether the algorithmic processes that result in moral judgments take as inputs explicit representations of these functions. It is possible that not all moral decision-making is based on models shaped by a moral grammar.
Be that as it may, the fact is that researchers working under this new paradigm still classify the outputs of these processes by conformity to the prescriptions of certain kinds of moral rules. The styles of thinking modeled by them, deontology and utilitarianism, which are supposedly characteristic of commonsense morality, are still styles of thinking in accordance with some distinctive moral rules.

It is not surprising, then, that contemporary work on the psychology of moral judgment continues to be dominated by moral dilemmas, in particular sacrificial dilemmas. Observing people's choices or their evaluation of the available options when each choice is made impermissible by deontological or utilitarian rules seems a natural way to measure people's attitudes in terms of these moral frameworks. Whatever the underlying computational processes are, dilemmas seem recommended by the goal of understanding whether commonsense morality embodies deontological or utilitarian rules.

In sum, there is a long tradition, spanning from Piaget to contemporary dual models, that views morality as a rule-based system. Dilemmas appear useful for studying moral judgment in virtue of this underlying assumption. Hence, moving beyond dilemmas requires moving beyond the assumption that morality is a rule-based system. And moving beyond the assumption of morality as a rule-based system requires moving beyond the study of moral judgment in situations where rules conflict.

There are, as we shall see, numerous dimensions of the moral life that are not governed by rules, whether these refer to duties or codify maxims for the maximization of some valuable result. Hence, a science of morality that focuses only on dilemmas and rule-based judgments risks painting a picture of moral cognition that is overly narrow and stilted (Bauman et al., 2014).
We want to challenge the use of dilemmas to study moral judgment because we reject the assumption that morality consists in a structured system of rules. Further, seeing how moral cognition operates beyond the rules provides a fresh perspective for the study of moral judgment. To this end, in what follows we examine moral judgment in cases of suberogatory behavior.

The Supererogatory and the Suberogatory

From time to time, people are faced with the option of doing more than what they are required to do, for example, spending some of their free time volunteering at the local animal shelter or donating a large portion of money to charity. These actions are admirable, despite the fact that failing to do them does not seem to merit condemnation. Some classify these actions as 'supererogatory', going above and beyond the call of duty when failing to go above and beyond is perfectly morally permissible (Archer, 2018).

More controversially, people seem at times to underperform relative to some ideal in a way that is permissible, for instance, not offering to proctor the exam of a sick colleague despite the fact that one is available and the colleague has helped one in the past. Doing this tends to be regarded as bad despite the fact that there is no rule that requires one to pick up the duties of sick colleagues and that no explicit agreement to help each other was made. Some use the label 'suberogatory' to describe this kind of behavior (see Driver, 1992; Hurd, 1998). The behavior is morally objectionable but there is no well-defined duty that it violates.

In failing to do a supererogatory action, one usually does not do anything bad. It's an admirable thing to donate money to charity, but failing to do so is not reprehensible. However, in some situations, failing to do a supererogatory action constitutes suberogatory behavior.
If a tourist asks you for directions, you are completely within your rights to walk away without saying anything. Doing so, though permissible, is bad, whereas giving directions is good despite not being required. This possibility suggests a different kind of conflict that people encounter in their day-to-day experiences of morality: conflicts between equally permissible good and bad options. Here, we refer to these moral encounters that do not constitute real dilemmas as morally charged situations. We claim that these situations, along with the concepts used to evaluate them, offer a distinctive opportunity to study moral judgment.

Sacrificial dilemmas represent an interesting albeit limited subset of what people encounter in their everyday life. Using ecological momentary assessment, Hofmann, Wisneski, Brandt and Skitka (2014) asked a large sample of people to report whether they had witnessed, committed, or heard about a moral situation during the last hour, five times a day for three days. While some situations resemble the kind of conflict expressed in sacrificial dilemmas (e.g., "Reminded waitress I did not pay for my bill when she thought I did"), many of the situations reported seemed more similar to the situations described above. The choice between the competing options didn't seem a matter of aligning oneself with some well-defined rule.

Over-relying on dilemmas risks papering over these distinctions and, more generally, the differences between moral categories that inform varieties of moral judgment. O'Hara, Sinnott-Armstrong, and Sinnott-Armstrong (2010) compared responses to 15 dilemmas where they asked people to rate how wrong, forbidden, inappropriate and blameworthy an action was. They found that "the influence of wording variations on moral judgments was negligible" (p. 552) and they analyzed the small differences found as a matter of magnitude. Likewise, many researchers treat terms like 'forbidden' or 'blameworthy' as linguistic variations of some homogenous moral judgment (Bjorklund, 2003; Cushman, Young, & Hauser, 2006; Greene et al., 2001b; Koenigs et al., 2012). The assumption is that commonsense moral judgment is not granular enough to reflect differences between being forbidden, blameworthy, bad, and so on. It is, instead, a monolithic cognitive product, to which different labels provide different access points.

While there have been calls to use these measures more carefully (Christensen & Gomila, 2012; Monin, Pizarro, & Beer, 2007), it is unclear what the rationale for using different terms could be (cf. Cushman, 2008; Barbosa & Jiménez-Leal, 2017). But, once the repertoire of moral encounters is expanded to include non-dilemma situations, it is possible that nuances and variability in moral cognition will emerge. In sum, by expanding the kind of moral encounters used when empirically probing people's intuitions and by enlarging the dimensions along which these encounters are assessed, we can more adequately study the variation and granularity of moral judgment.

Here we present 6 experiments that study moral judgments in non-dilemma situations. One key feature of these situations is that the correct choice, if there is one, is not obviously settled by appealing to rules. Experiment 1 is an exploratory study that maps out the descriptions people offer of different situations. We find that people describe situations as suberogatory ('bad but permissible') or supererogatory ('good but not required') when supplying open descriptions of them. Experiment 2 shows that judgments of good/bad, permissible/impermissible, and optional/obligatory dissociate when evaluating suberogatory and supererogatory situations.
We also find that people's beliefs about duties negligibly correlate with judgments of goodness, permissibility, obligation, and blame. In Experiment 3, we compare judgments in sacrificial dilemmas to see whether the same distinctions appear. We do not find the same dissociations, which suggests that eliciting these patterns of judgment requires more than giving participants the option to judge along various dimensions. In Experiment 4, we included a measure of praise/blame to see whether it correlates with the main "erogatory" measures. In Experiment 5, we replicated previous findings by using different vignettes that describe more characteristically moral situations adapted from classic philosophical thought experiments about abortion and property rights. We also measured whether beliefs about rights predict any kind of judgment. We find the same pattern of dissociations in these different vignettes and again find a negligible correlation between judgments and individual beliefs about rights. In Experiment 6, we conducted a qualitative study to begin exploring the variety of factors that differentially drive different judgments. Using independent coders for a qualitative analysis, we find that people describe these situations using the language of character traits and rights rather than duties. In fact, in many cases people explicitly state the absence of duties to do anything in our scenarios.

A methodological coda: Even though this research is mostly exploratory, its ideas are developed against a backdrop of well-established findings in moral psychology. We decided to preregister Experiments 2 to 6 because we believe that clearly establishing design and analysis plans can help distinguish the confirmatory and exploratory aspects of our research by specifying our intent.
The procedure, therefore, reduces needless post hoc interpretations (Nosek et al., 2019).2 Materials, data, preregistrations and code for all experiments are available on the OSF page of the project (https://osf.io/kja2u/?view_only=573c1541bbd240b397b79229704dfae7). The IRB of the University (blinded for review) approved this study.

2 The only important deviation from preregistration plans occurred in Experiment 2, where the main statistical analysis proposed (a repeated measures ANOVA) was replaced by a mixed linear model, since it is better suited to model our data.

Experiment 1

The objective of this first study is exploratory. We presented participants with vignettes describing either suberogatory or supererogatory behavior. They were instructed to select words from a list to describe these scenarios and to offer a description in their own words. The goal is twofold. The first aim is to see what language people use to spontaneously describe a moral encounter. The second is to see whether people recognize a distinction between different judgment categories that maps onto the complex category of suberogatory. This requires that people have distinct concepts of permissibility and goodness such that they can describe some behavior as bad but permissible. Hence, we decided to run an exploratory study using word selection and open response to see whether participants utilize the moral categories we aim to study without being prompted.

We expected people to always select more than one word (e.g., "good") and to give descriptions that characterize both the action and the person.

Method

Participants

Ninety-five participants (60 women and 35 men, mean age = 32.42, SD = 10.43), based in the United States and recruited through Prolific Academic, took part in the study in exchange for 40 pence.
Participants were aware that their answers would be anonymous and were monetarily compensated for their participation. The average completion time was 5.2 minutes.

Materials and procedure

We constructed four scenarios, some of which were based on thought experiments by Driver (1992). Each scenario described an individual faced with a choice between a suberogatory and a supererogatory option. Additionally, in order to account for possible asymmetries between actions and omissions (Haidt & Baron, 1996), we created two versions of each scenario that describe either an action or an omission. This generated eight vignettes, described below (suberogatory versions of these vignettes are in brackets):

Newlyweds. Two newlyweds are boarding a plane to go on their honeymoon. Because of a booking error by the airline, the couple does not have seats together. They ask someone, already seated, if they would switch seats so the couple could sit together. The passenger switches seats, and the newlyweds can sit together. [The passenger does not switch seats, and the newlyweds have to sit separately.]

Kidney. Alex is suffering from severe kidney failure and Alex's only hope is to obtain a transplanted kidney. Alex's cousin, Jamie, is the only known compatible donor. Jamie offers to donate the kidney to Alex. [Jamie does not offer to donate the kidney to Alex.]

Mowing. Early one Sunday morning when the neighbors are usually sleeping, Sam notices that the lawn needs to be mowed. Although it is his property and it would be inconvenient to do it later, he decides to not mow the lawn. He knows that starting the lawn mower will probably wake up the neighbors. [Even though he knows that starting the lawn mower will probably wake up the neighbors, he does it anyway. It's his property and it will be inconvenient to mow the lawn later.]

Raffle. During the Christmas party, the secretary publicly announced the results of the office raffle: "Congratulations to Alex, who has won the trip for two to Disney World. She can come up front to claim her prize or she can let a cash equivalent go to a hurricane relief fund." After hearing the news, Alex looked excited: "Even though I have the winning ticket and Disney World sounds fun, I am going to donate the prize to one of the charities." [After hearing the news, Alex looked excited: "I have the winning ticket! Even though I don't really care much about Disney World, I am going to claim the prize anyway."]

Participants were presented with four out of the eight possible variations, so each participant was presented with a suberogatory omission, a suberogatory action, a supererogatory omission, and a supererogatory action. After reading the vignettes, participants completed a word selection task by selecting "the word(s) that you think best describe the situation." The word choices were: "permissible", "impermissible", "required", "good", "obligatory", "allowed", "bad", "optional" and "compulsory". Participants also completed an open description task by offering a description of the situation in their own words. The vignette presentation order was counterbalanced and the order of tasks and words was randomized.

Results

For the word selection task, participants selected 2.6 words on average (see Figure 1). For suberogatory behaviors, people most often chose the words "optional" (27%) and "allowed" (25%), followed by "permissible" (21%). For supererogatory behaviors, "good" (35%), "optional" (28%), and "allowed" (14%) were the most common choices. "Allowed" and "optional" are the most common pair of words used across all vignettes. "Allowed" and "permissible" are more strongly associated with evaluating suberogatory behaviors. Figure 1 summarizes these results.

Figure 1. Mosaic plot for word selection. Height is associated with overall selection frequency and width is associated with prevalence for suberogatory and supererogatory behaviors (Hornik, Zeileis & Meyer, 2006). Darker shades and solid lines represent positive associations, whereas lighter shades and dotted lines represent a negative association within the sub/supererogatory categories. The plot represents a model contrasting observed and expected frequencies of word choices.

In the open descriptions, participants predominantly used character trait descriptors, such as selfish for the suberogatory vignettes and thoughtful for the supererogatory situation (see Figure 2). They rarely used words like "permissible". Also, more nuanced descriptions of the suberogatory situations generally highlighted the optionality of the response, in line with the abstract concept of the suberogatory: "he is allowed to do it" (mowing), "Jamie has a right to her own decision" (kidney), "it is understandable" (kidney). Lastly, people were generally sensitive to the different moral aspects that structure the situations depicted as morally charged. They were, for instance, quick to describe the action/person negatively (mean, rude, etc.), while also recognizing that rights and expectations were at play in the scenarios evaluated. They also recognized the possibility that some behaviors could be bad but permissible:

• "It is a little bit selfish, but then again she has the right to keep her organs"
• "While it was mean, the passenger has the right to refuse"
• "Rude, but it is his lawn"
• "Giving an organ is a big thing to ask. It is something that is optional, and there is no mention of Alex asking him to do it so he is not required to offer"

Figure 2. Wordcloud for suberogatory (left) and supererogatory (right) situations

While these results are exploratory, they provide evidence that moral thinking about the depicted situations is nuanced in ways that seem in line with the categories of suberogatory and supererogatory and more complex than simple good/bad, obligatory/impermissible evaluations. These findings made possible the use of the measures and procedures described in the subsequent experiments.

Experiment 2

In Experiment 1, we found some evidence that people describe certain moral situations in ways that are sensitive to the categories of the suberogatory and the supererogatory. This suggests that people distinguish between permissibility, obligation, and goodness. Here we test for quantifiable differences between these evaluative categories.

We asked people to rate the sub- and supererogatory situations along three dimensions: good/bad, permissible/impermissible, and obligatory/optional. We manipulated the Erogation category (within-subjects, sub- and supererogatory) and the Situation Type (between-subjects, action or omission). We hypothesized that people would judge suberogatory situations as worse than supererogatory situations, but that permissibility and obligatoriness ratings between the two situations would not be significantly different. We also expected permissibility ratings to be significantly higher than obligatoriness ratings in both supererogatory and suberogatory conditions, though the difference between permissibility and obligation would be greater in the suberogatory condition than in the supererogatory condition. We did not expect any differences between judging actions and omissions.
We also collected data on attitudes towards duties, expecting that ratings along the Erogation category would be associated with these attitudes.

Participants

We ran a power analysis for a mixed ANOVA (between-within interaction), assuming an effect size of f = 0.15, using the software G*Power. This analysis suggested a sample size of 272 for a power of 0.95. To account for exclusions, we recruited 311 participants (186 women and 125 men, mean age = 32.77, SD = 11.18) through Prolific Academic. We decided to switch to an alternative data analysis strategy after collecting the data, using mixed linear models, given the problems of repeated measures analyses with independence and distributional assumptions (Singmann & Kellen, 2019). Our sample size, however, is consistent with a power of 0.9, assuming participants can be treated as a random factor to account for within-person response variability, with a mixed design (Singmann & Kellen, 2019; Westfall, 2015; see Supplementary materials for details).

Each person voluntarily participated in the study and received 38 pence as compensation. The average completion time was 3.11 minutes.

Materials and procedure

Materials were the same as in Experiment 1. Each participant saw four of eight scenarios. We manipulated the moral category within participants and action/omission between participants, so each participant saw two supererogatory situations and two suberogatory situations, all of which were either actions or omissions. For each vignette, participants evaluated the situation along different dimensions with a 100-point sliding scale.
The dimensions included degree of permissibility (impermissible = 0, neither impermissible nor permissible = 50, permissible = 100), degree of goodness (bad = 0, neither good nor bad = 50, good = 100), and degree of obligation (optional = 0, neither optional nor obligatory = 50, obligatory = 100). The dimension order was randomized across trials. The scale appeared only with the anchors, the slider was always placed in the center of the scale, and participants were not given a numerical representation of where they placed the slider.

Participants viewed one vignette at a time and the sliders were placed on the same page as the vignette. After completing the study, participants indicated their age, gender, and political orientation (on a five-point Likert scale from very liberal to very conservative). We also collected information about personal sense of duty. Participants indicated their agreement on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree) with the idea that there are duties to respect your neighbors, to help anybody who needs help, and to help one's family members.

Results

Pre-registered analyses were integrated into a set of linear random effect models fitted with the lme4 R library (Bates, Mächler, Bolker, & Walker, 2015; R Core Team, 2019), with participant as a random intercept (see footnote 3). Pairwise comparisons were carried out using the emmeans package (Lenth, 2020), which allows degrees of freedom to be calculated with the Kenward-Roger method and p values to be adjusted with the Tukey method. Confidence intervals for non-standardized simple differences are reported for ease of understanding.

Results are summarized in Table 1 and Figures 3 and 4.
We fitted four nested models to contrast the interactive effect of the manipulated variables (see models 1 and 2) and to test the effect of differences between scenarios and endorsement of norms (models 3 and 4). Goodness of fit indicators (AIC, BIC and deviance) and the chi square test (χ2(2) = 393.2, p < .001) favor selection of model 2. Significant interactions between judgment type and condition suggest differences both between and within the erogation category. Supererogatory behaviors were judged as better and more permissible (M = 86.4, SD = 18.7, n = 622 and M = 82.0, SD = 20.6, n = 622) than suberogatory behaviors (M = 36.6, SD = 23.3, n = 622 and M = 62.2, SD = 29.7, n = 622), regardless of whether they were actions or omissions (Good: t(3411) = -34.89, p < .001, Mdiff = -49.85, 95% CI [-52.2, -47.5]; Permissible: t(3411) = -19.7, p < .001, Mdiff = -19.78, 95% CI [-22.6, -17.0]). Supererogatory behaviors were judged to be slightly more obligatory (i.e., less optional) (M = 30.9, SD = 32.7, n = 622) than the corresponding suberogatory responses (M = 23.5, SD = 26.3, n = 622; t(3411) = -7.47, p < .001, Mdiff = -7.47, 95% CI [-10.8, -4.19]), though this is clearly a smaller effect.

Footnote 3. It can be argued that the data of this experiment could be considered a cross-classified data set. However, the items in each situation type are not completely equivalent, which makes the corresponding items nested within situation type. Additional models were fitted with an additional random term, but since the results are equivalent, we restrict the presentation here to the different conditions as fixed effects.

Interestingly, the size of the difference between permissibility and goodness ratings is vastly different for sub- and supererogatory responses. Within the suberogatory responses, this difference amounts to 25.6 points (Cohen's d = 0.96), while for the supererogatory category this difference is only 4.46 points (Cohen's d = 0.22). That is, goodness and permissibility judgments are very similar within the supererogatory condition, but not for the suberogatory responses, where they are more clearly tracking different aspects of the situation.

Table 1. Summary of Models fitted for Experiment 2

                                        Model 1                         Model 2                         Model 3                         Model 4
Omission                                -2.74*** (-4.65, -0.83)         -6.411*** (-10.500, -2.322)     -7.33*** (-10.41, -4.24)        -7.26*** (-10.36, -4.17)
OBLIGATORY                              -34.30*** (-36.41, -32.20)      -17.449*** (-21.389, -13.509)   -38.12*** (-41.07, -35.17)      -38.12*** (-41.07, -35.17)
PERMISSIBLE                             10.57*** (8.47, 12.67)          21.758*** (17.818, 25.698)      7.58*** (4.63, 10.53)           7.57*** (4.62, 10.52)
SUPER EROGATORY                         25.70*** (23.98, 27.41)         50.771*** (46.830, 54.711)      25.70*** (23.99, 27.41)         25.70*** (23.98, 27.41)
Mowing                                  -7.48*** (-9.90, -5.05)         -                               -7.48*** (-9.89, -5.06)         -7.47*** (-9.89, -5.05)
Newlyweds                               -3.97*** (-6.39, -1.54)         -                               -3.97*** (-6.39, -1.55)         -3.96*** (-6.39, -1.54)
Raffle                                  -0.15 (-2.58, 2.26)             -                               -0.16 (-2.58, 2.26)             -0.15 (-2.57, 2.26)
DUTY TO FAMILY                          -                               -                               -                               -0.21 (-0.99, 0.56)
DUTY TO HELP                            -                               -                               -                               -0.01 (-0.79, 0.77)
DUTY TO NEIGHBORS                       -                               -                               -                               -0.31 (-1.35, 0.71)
Omission: OBLIGATORY                    -                               8.75*** (3.15, 14.35)           7.72*** (3.53, 11.91)           7.718*** (3.526, 11.91)
Omission: PERMISSIBLE                   -                               7.75*** (2.15, 13.35)           6.04*** (1.85, 10.23)           6.042*** (1.85, 10.23)
Omission: SUPER EROGATORY               -                               -1.84 (-7.44, 3.76)             -                               -
OBLIGATORY: SUPER EROGATORY             -                               -41.34*** (-46.92, -35.77)      -                               -
PERMISSIBLE: SUPER EROGATORY            -                               -28.36*** (-33.93, -22.78)      -                               -
Omission: OBLIGATORY: SUPER EROGATORY   -                               -2.07 (-9.99, 5.84)             -                               -
Omission: PERMISSIBLE: SUPER EROGATORY  -                               -3.419 (-11.34, 4.50)           -                               -
Constant                                52.90*** (50.41, 55.40)         39.74*** (36.86, 42.62)         55.17*** (52.41, 57.94)         58.14*** (51.41, 64.87)
N                                       3732                            3732                            3732                            3732
Log Likelihood                          -17580.00                       -17367.00                       -17570.00                       -17569.00
AIC                                     35180.00                        34763.00                        35163.00                        35168.00
BIC                                     35242.00                        34850.00                        35238.00                        35261.00
***p < .01; **p < .05; *p < .1

Our results do, however, show that people strongly distinguish between goodness, permissibility, and obligation. Participants rated supererogatory behaviors as good, optional, and permissible; they rated suberogatory behaviors as bad, optional, and permissible. Despite both being rated permissible, supererogatory behaviors were rated as more permissible than suberogatory behaviors.

Figure 3. Scores by type of judgment. Error bars represent 95% confidence intervals.

We found a smaller interaction between judgment type and the action/omission dimension across erogation conditions. Actions were rated as better (M = 65.1, SD = 32.7, n = 628) than omissions (M = 57.8, SD = 32.3, n = 616), t(1539) = 4.82, p = .001, Mdiff = 7.33, 95% CI [3.71, 10.94], but there were no differences between permissibility and obligation ratings for actions and omissions. The three-way interaction is not explored here, but overall there are no important differences between erogation categories across the action/omission condition except for the goodness judgment, where actions are judged as better than omissions.

We explored the association between ratings for each type of judgment and participants' endorsement of statements about personal duties. We did not find any consistent association between judgment scores and responses pertaining to personal duties (see Table 2). There are significant correlations between endorsing different statements of personal duty (ranging from 0.13 to 0.34), but most correlations between sense of personal duty and different judgment categories were negligible (from -.01 to 0.07) and non-significant.
The one exception is that beliefs about duties to help others significantly correlated with judgments of permissibility, but the correlation is very small.

Table 2. Bivariate correlations between personal norms and judgment scores

                     Permissible   Good       Obligatory   Duty to neighbors   Duty to help
Permissible
Good                 0.55***
Obligatory           -0.24**       -0.22***
Duty to neighbors    -0.03         0.0        -0.02
Duty to help         -0.07**       -0.02      0.07         0.33***
Duty to family       -0.03         -0.02      0.01         0.30***             0.23***
***p < .01; **p < .05; *p < .1

Breaking down responses by scenario reveals some variability across vignettes (see Figure 4). For example, while donating a kidney to a cousin is judged to be better and more permissible than not donating a kidney, the same pattern does not hold in the raffle scenario. In this case, both situations are equally permissible, but donating the raffle prize is better than not. Notice, however, that these small differences do not account for significant variance, according to the model fitting presented in Table 1.

Figure 4. Scores by type of judgment and scenario. Error bars represent 95% confidence intervals

Discussion

These results reveal quantifiable differences between the evaluative categories that are employed in moral judgment. Similar to our results in Experiment 1, we found that judgments of permissibility, obligation, and goodness dissociate when people make judgments of suberogatory behavior. This shows that including additional measures allows variability in moral judgment to emerge. Moreover, beliefs about personal duties were negligibly correlated with different kinds of judgments.
This suggests that people's responses to these situations are not indicative of an underlying adherence to rules or generalized statements about what one ought to do, even though the duties we asked about specifically applied to the scenarios in the study.

We mentioned in the Introduction that the use of dilemmas to study moral judgment reinforces assumptions about the way to measure moral judgment. When assessing the relationships between different measures, researchers have found negligible differences between different judgment categories. We see now that including different measures of moral judgment allows for variability, challenging a central assumption about the relationship between different measures of moral judgment. This raises a question: is the lack of variability between judgment categories a function of the measures used to study moral judgment, or the dilemmas commonly used to elicit judgments? It might be the case that some measurement devices function as self-reinforcing demand effects. Alternatively, there might be something about dilemmas themselves that makes a difference to moral judgment independently of the measures used. To begin answering these questions, we applied our measures to sacrificial dilemmas in Experiment 3 to see how they align with the results of this experiment.

Experiment 3

In this experiment, we consider different kinds of judgments in cases of sacrificial dilemmas. In our previous experiments, we found dissociations between different kinds of judgments. However, the situations used to test these differences differ significantly from sacrificial dilemmas, the stimuli most consistently used to elicit moral judgments. First, harm to another agent is unavoidable in a sacrificial dilemma, so there is no straightforward possibility of supererogatory behavior.
Second, despite being unrealistic, they elicit distinctive emotional reactions (Christensen et al., 2014) as measured by arousal and valence. Third, dilemmas are situations where explicit rules come into conflict (e.g., prohibitions against causing harm and prohibitions against allowing easily preventable harm).

We chose sacrificial dilemmas that have been widely used in empirical studies of moral judgment. Based on an analysis from Christensen et al. (2014), we selected dilemmas that produce the greatest variation in responses (varying in the use of personal force and the inevitability of harm). We also excluded dilemmas where various interactive effects might plausibly drive moral judgments (e.g., when causing harm is self-beneficial). Characteristics and full text of the dilemmas selected are presented in the supplemental materials.

Sacrificial dilemmas are sometimes thought to bring out the contrasts between deontological and utilitarian ethical intuitions, because each set of intuitions usually recommends different behaviors in the face of a sacrificial dilemma. Hence, intuitions about what is right or appropriate indicate alignment with one or the other theory. This suggests that people committed (implicitly or explicitly) to either utilitarianism or deontology might produce different moral judgments in reaction to sacrificial dilemmas. This presents a challenge. Because we are testing for dissociations among different judgment categories, and different normative theories make different recommendations for navigating dilemmas, it is possible that each side will cancel the other out, thereby giving the appearance of similarity between judgment categories.
To ensure that people with utilitarian tendencies do not cancel out people with deontological tendencies, we used the Oxford Utilitarianism Scale (Kahane et al., 2018) to assess trait utilitarianism across two dimensions of utilitarianism, impartial beneficence and instrumental harm.

The character of this study is primarily exploratory, but we expected to replicate some of the main findings in this literature and some of the patterns observed by Christensen et al. (2014). We expected utilitarian responses in personal dilemmas to be associated with lower ratings of goodness and permissibility relative to impersonal dilemmas (Greene et al., 2001b). We also expected behaviors that cause unavoidable deaths to be judged as better than situations where causing death was avoidable (Christensen et al., 2014).

Method

Participants

Given that our objective was first to replicate the observed effects with sacrificial dilemmas and, second, to explore the impact of these variables on our new measures, we ran a power analysis for a linear mixed model treating participants as random effects, using the effect size reported by Christensen et al. (2014) for personal force of r = .75 (equivalent to d = 2.2). This is the smallest of the effect sizes considered. This analysis suggested a sample of 60 participants for our mixed design for a power of .99 (Westfall, 2015). Since we wanted to make sure we would be able to observe differences in our new dependent variables, we aimed to collect data for 300 participants, which would allow us to observe a considerably smaller effect (r = .17, d = .3). A total of 304 participants (153 women and 151 men, mean age = 33.21, SD = 11.62), recruited through Prolific Academic, took part in the study in exchange for 0.50 pence.
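The correspondence between r and d quoted in the power analysis above can be checked with a short sketch. It assumes the common conversion d = 2r / sqrt(1 - r^2) for two equally sized groups; the text does not state which conversion was used, and the function name `r_to_d` is hypothetical.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation-type effect size r to Cohen's d.

    Assumes the equal-group-size conversion d = 2r / sqrt(1 - r^2).
    This is an illustrative assumption, not necessarily the authors' method.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


# Effect sizes mentioned in the power analysis.
for r in (0.75, 0.17):
    print(f"r = {r:.2f} -> d = {r_to_d(r):.2f}")
```

Under this assumption the two values come out at roughly 2.27 and 0.35, close to the rounded figures of 2.2 and .3 given in the text.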
The average completion time was 7.21 minutes and none of the participants had taken part in our previous experiments.

Materials and procedure

Participants saw the four dilemmas and were randomly presented with the version where the main character decides to cause harm (utilitarian response) or allow harm (deontological response). That is, utilitarian/deontological response was a between-subjects factor, while avoidability of the result (avoidable/unavoidable) and personal force (personal/impersonal) were within-subject variables. For each scenario, participants judged the main character's response along dimensions of permissibility, obligation, goodness, and blameworthiness using the same sliders as before. After judging the dilemmas, participants were presented with the items of the Oxford Utilitarianism Scale (OUS) in a random order. Finally, they were asked some basic demographic questions and were asked to rate their experience with this kind of dilemma on a scale from 1 (not at all familiar) to 5 (extremely familiar).

Results

Results are summarized in Table 3 and Figures 5 and 6. As before, we fitted a set of random effect models with judgment type, personal/impersonal, utilitarian/deontological, avoidable/unavoidable, experience with dilemmas, and the OUS score as fixed predictors, allowing for a random effect of participant. Pairwise comparisons were calculated with the same parameters specified for Experiment 2. The OUS had good reliability for both of its subscales (Instrumental Harm: Cronbach's α = .65; Impartial Beneficence: Cronbach's α = .72). The models differed in the specification of the interaction of the fixed effects. The best fitting model includes two-way interactions for all terms with the personal/impersonal dimension. This model is presented in Table 3.

Table 3. Best fitting model for Experiment 3.

                            score
BLAME                       -5.15** (-9.53, -0.77)
OBLIGATORY                  -2.98 (-7.36, 1.40)
PERMISSIBLE                 12.78*** (8.40, 17.16)
PERSONAL                    9.22*** (5.40, 13.04)
UTILITARIAN                 2.46 (-1.03, 5.95)
UNAVOIDABLE                 2.90* (-0.55, 6.35)
BLAME: PERSONAL             4.49** (0.13, 8.85)
OBLIGATORY: PERSONAL        -4.91** (-9.27, -0.55)
PERMISSIBLE: PERSONAL       -4.95** (-9.31, -0.59)
PERSONAL: UTILITARIAN       -14.00*** (-17.19, -10.82)
BLAME: UNAVOIDABLE          0.23 (-4.13, 4.59)
OBLIGATORY: UNAVOIDABLE     -1.82 (-6.18, 2.54)
PERMISSIBLE: UNAVOIDABLE    -4.94** (-9.30, -0.58)
PERSONAL: UNAVOIDABLE       -4.89*** (-7.98, -1.81)
BLAME: UTILITARIAN          6.82*** (2.46, 11.19)
OBLIGATORY: UTILITARIAN     1.50 (-2.86, 5.86)
PERMISSIBLE: UTILITARIAN    -2.37 (-6.74, 1.99)
Constant                    47.87*** (44.53, 51.21)
N                           4864
Log Likelihood              -23017.00
AIC                         46074.00
BIC                         46204.00
***p < .01; **p < .05; *p < .1

There are significant main effects of judgment type and the utilitarian status of the response judged (Deontological M = 52.39, SD = 28.3 and Utilitarian M = 49.18, SD = 28.4, t(3924) = 3.77, p < .001, Mdiff = 3.21, 95% CI [1.61, 4.80]), but not of its being personal (Personal M = 51.64, SD = 25.8 and Impersonal M = 49.87, SD = 30.8, n = 2432, t(3411) = 1.99, p = .046, Mdiff = 1.77, 95% CI [0.17, 3.37]) or avoidable (Avoidable M = 51.46, SD = 25.01 and Unavoidable M = 50.04, SD = 31.4, n = 2432, t(3411) = 1.49, p = .13, Mdiff = 1.42, 95% CI [-0.17, 3.02]).
These variables only have interactive effects, which shows that there is not an overall effect of the manipulation across all judgment dimensions.

Figure 5 shows how personal deontological responses are judged as better (M = 61.75, SD = 28.89, n = 303), considerably more permissible (M = 65.76, SD = 28.56, n = 303), more obligatory (M = 51.38, SD = 33.94, n = 303) and less blameworthy (M = 42.31, SD = 27.59, n = 303) than the corresponding utilitarian responses (Good M = 39.19, SD = 28.12, n = 305; Permissible M = 43.54, SD = 31.06, n = 305; Obligatory M = 36.53, SD = 31.23, n = 305; Blame M = 60.01, SD = 27.22, n = 305), but only for the unavoidable outcomes (all pairwise comparisons significant at 0.01). When the outcome is avoidable, the pattern is the opposite for both impersonal and personal responses. The main effect of personal contact consists in making the deontological response more acceptable (less blameworthy, more obligatory, better and more permissible) than the corresponding utilitarian ratings (see lower panel, Figure 5). Overall this picture is consistent with prior findings where the effect of these variables on dilemma responses is conditional on several factors (see Christensen et al., 2014).

Figure 5. Mean ratings for each judgment type by utilitarian status, personal force and avoidability. Error bars represent 95% confidence intervals.

In the model fitting process, neither experience with dilemmas nor the overall score of the OUS (or its subscales) emerged as a significant predictor of participant judgments.
Moreover, for all but one dilemma type (Unavoidable, Personal) we did not find significant differences between judgments across trait utilitarian tendencies.

We wanted to see, however, whether more fine-grained distinctions in trait utilitarianism might allow unnoticed dissociations to emerge. To do this, we assigned participants to low, medium, and high utilitarianism groups by splitting participants' aggregate scores on the OUS at the 33rd and 66th percentiles. Zooming in on participants with the lowest and highest utilitarian scores (see Figure 6), we see that participants in the Low Utilitarianism Group (left panel) clearly judge causing harm as less permissible, less optional, and more blameworthy than allowing harm. People in the High Utilitarianism Group (right panel) show the opposite pattern, albeit not as distinctive (see Supplemental materials for more information on these comparisons). Note, however, that these patterns only emerge in exploratory, post-hoc analyses. Even with these analyses, which might otherwise be considered methodologically problematic, we were not able to see the clearly marked dissociations observed in Experiment 2.

Figure 6. Mean ratings by type of judgment, grouped by Utilitarianism level (panels) and Utilitarian response (bars)

Permissibility judgments tend to be higher than goodness judgments; even though the two are related, their difference in magnitude suggests that these judgments are tracking different features of the actions depicted in the scenario. This difference is larger and statistically significant for participants with low utilitarian tendencies, who consider harm relatively more permissible (M = 50.95, SD = 30.51) than it is good (M = 44.55, SD = 27.23, t(448) = -2.36, p = .02, Mdiff = -6.40, 95% CI [-11.72, -1.08]).
With participants who score at the higher end of utilitarianism, there is virtually no difference between these judgments (permissibility M = 56.02, SD = 27.52 vs obligatoriness M = 55.35, SD = 26.71, t(338) = -0.23, p = .81, Mdiff = -0.66, 95% CI [-5.45, 5.12]). A similar pattern is found with the obligatoriness levels: participants who score low in utilitarianism consider causing harm optional (as revealed by a one-sample t test against the indifference point, M = 37.09, SD = 31.02, t(227) = -6.79, p < .001, 95% CI [33.98, 41.10]) and participants who score high in utilitarianism consider not causing harm as optional (M = 44.20, SD = 29.81, t(169) = -0.69, p = .48, 95% CI [43.70, 73.0]). These tendencies are statistically significant when explored with a fixed effects model with these terms (see Supplementary materials). Bear in mind that this model is exploring a tendency rather than testing a hypothesis. As such, it lacks confirmatory value.

Discussion

We wanted to see whether adding measures allowed variability in moral judgment to emerge, even when using dilemmas as stimuli. However, when we had people make judgments about dilemmas, the variability we observed in Experiment 2 nearly disappears. This is not an effect driven by Utilitarian members of the participant pool. Even when we separate people according to their utilitarian tendencies (tendencies that pull against making the distinctions outlined in Experiment 2), we do not find strong dissociations between the categories of goodness, permissibility, and obligation. The dissociations are weak enough that they might seemingly justify the claim that "the influence of wording variations on moral judgments was negligible" (O'Hara et al., 2010).

Our results suggest that changes in measurement alone are not sufficient to indicate the underlying complexity of moral judgment.
The kind of situation being evaluated makes a difference to moral judgment. When we limit ourselves to using only dilemmas, we generate an overly narrow view of moral thinking. Worse, it hinders the ability to make useful generalizations about the operations of moral cognition in general.

The lesson is that we need more than different measures. The situation being evaluated appears to have an influence on the shape of moral judgment. Our materials in Experiment 2 provide a first step toward expanding the range of situations used as stimuli to study moral judgment. In our next two experiments, we wanted to introduce different measures and scenarios, as well as explore potential connections between moral judgments and generalized rules.

Experiment 4

In Experiment 3 we found that judgment dimensions do not behave uniformly when used to judge responses in sacrificial dilemmas. That is, utilitarian and deontological responses are judged to be permissible, optional, blameworthy and good at different levels, depending on variables known to impact participant responses (Christensen et al., 2014). However, unlike the pattern observed in Experiment 2, differences between categories were generally non-significant. Moreover, significant differences tended to be small, to arise from interactions, and to be driven mainly by individual differences in trait utilitarianism. This shows that sacrificial dilemmas do not elicit differences in judgment that are found when using different erogation conditions. Conversely, this suggests that not all situations of moral conflict can be assessed in terms of the dimensions proposed.

Additionally, in Experiment 3, participants made judgments of blameworthiness for the response of the person allowing or causing harm. This marks an important difference with the procedure used in Experiment 2.
In Experiment 2, we found that individuals rate scenarios in a way that reflects the categories of suberogatory and supererogatory. However, we did not find that individuals rate suberogatory and supererogatory behaviors as equally permissible and optional, as we had initially hypothesized, despite finding a difference in judgments of goodness.

One explanation of this failure might be that people disapprove of suberogatory behaviors, but the various judgment types (permissibility, obligation, and goodness) do not have any obvious place for people to register this disapproval. Hence, disapproval might be skewing judgments in a way that drives the differences between permissibility and obligatoriness across scenarios. To correct for this, and to replicate the findings of Experiment 2, we ran the experiment again with an additional judgment type of blameworthiness.

Participants

We recruited 311 participants (166 women and 145 men, mean age = 31.95, SD = 11.00) through Prolific Academic. Sample size was determined using the rationale of Experiment 2. All participants voluntarily agreed to participate in the study and were monetarily compensated (38 pence). The average completion time was 4 minutes.

Materials and procedure

We used the same materials and procedures from Experiment 2. We included an additional measure for participants to rate the blameworthiness of the character in the vignette (praiseworthy = 0, neither praiseworthy nor blameworthy = 50, blameworthy = 100).

Results

Overall, the results replicated the pattern observed in Experiment 2 (see Figure 7). We fitted the same random effect models as in Experiment 2 and performed the same pairwise comparisons. Model 2, which includes the interactive effect of judgment type, omission and erogation condition, was the best model.
Table 4 below presents only the two best fitting models.

Table 4. Summary of Models fitted for Experiment 4

                                        Model 1                         Model 2
Omission                                3.90** (0.19, 7.60)             5.70*** (2.40, 8.90)
GOOD                                    -19.00*** (-23.0, -16.0)        29.00*** (26.0, 32.0)
OBLIGATORY                              -41.00*** (-45.00, -38.00)      -16.00*** (-19.0, -13.0)
PERMISSIBLE                             4.20** (0.47, 7.90)             40.0*** (37.00, 43.00)
SUPER EROGATORY                         -50.00*** (-53.00, -46.00)      5.20*** (3.60, 6.80)
Mowing                                  -                               -1.0 (-4.20, 0.460)
Newlyweds                               -                               -1.400 (-3.70, 0.92)
Raffle                                  -                               0.260 (-2.00, 2.60)
Omission: GOOD                          -12.00*** (-18.00, -7.20)       -14.00*** (-19.00, -9.30)
Omission: OBLIGATORY                    3.70 (-1.50, 9.00)              0.180 (-4.40, 4.80)
Omission: PERMISSIBLE                   -7.90*** (-13.00, -2.70)        -10.00*** (-15.0, -5.70)
Omission: SUPER EROGATORY               3.50 (-1.70, 8.80)              -
GOOD: SUPER EROGATORY                   97.00*** (92.0, 102.0)          -
OBLIGATORY: SUPER EROGATORY             51.00*** (46.0, 56.0)           -
PERMISSIBLE: SUPER EROGATORY            72.00*** (67.00, 77.00)         -
Omission: GOOD: SUPER EROGATORY         -3.00 (-10.00, 4.40)            -
Omission: OBLIGATORY: SUPER EROGATORY   -7.10* (-14.0, 0.36)            -
Omission: PERMISSIBLE: SUPER EROGATORY  -4.80 (-12.00, 2.60)            -
Constant                                62.00*** (59.00, 64.00)         35.00*** (32.00, 38.00)
N                                       4976                            4976
Log Likelihood                          -22765.00                       -23867.00
AIC                                     45566.00                        47761.00
BIC                                     45683.00                        47852.00
***p < .01; **p < .05; *p < .1

As before, both permissibility and goodness ratings were significantly higher for the supererogatory (permissibility M = 85.4, SD = 19.5, n = 622 and goodness M = 85.6, SD = 20.9, n = 622) than for the suberogatory condition (permissibility M = 63.9, SD = 29.9, n = 616 and goodness M = 38.0, SD = 20.5, n = 622); permissibility: t(4580) = -16.08, p < .001, Mdiff = -21.5, 95% CI [-24.3, -18.7]; goodness: t(4580) = -35.54, p < .001, Mdiff = -47.2, 95% CI [-49.9, -45.3]. However, the difference between obligatory ratings was not replicated (t(4580) = 0.22, p = .82, Mdiff = 0.32, 95% CI [-2.77, 3.41]).
Blame ratings differed significantly across the suberogatory and supererogatory conditions. Participants rated suberogatory behaviors as more blameworthy (M = 63.70, SD = 20.77, n = 622) than supererogatory behaviors (M = 15.76, SD = 20.62, n = 622), t(4580) = 35.82, p < .001, Mdiff = 47.9, 95% CI [45.6, 50.2]. As in Experiment 2, there are significant differences between scenarios, with supererogatory responses in the Kidney scenario having the highest positive scores (better, more permissible, more praiseworthy) and suberogatory responses in the Mowing scenario the most negative (worse, less permissible, less optional, and more blameworthy).

Blame ratings were not significantly correlated with statements about personal sense of duty across either scenarios or conditions. The same overall pattern of correlations between the other ratings and statements about duties was observed (see Table 2, Experiment 2).

Supererogatory behaviors are consistently rated as praiseworthy, while suberogatory behaviors are consistently rated as blameworthy. However, the magnitude of praiseworthiness judgments is much greater in the case of supererogatory behaviors. This asymmetry suggests that supererogatory behaviors deserve more praise than the corresponding suberogatory behaviors deserve blame (see Figure 7).
This asymmetry has been repeatedly observed in other studies (Monroe et al., 2018; Pizarro et al., 2007).

It is worth noting that, again as in Experiment 2, there are no important differences between the goodness and permissibility means in the supererogatory condition (M = 85.61, SD = 20.86 vs M = 85.43, SD = 19.46, respectively), while for the suberogatory responses we find a difference of 25.88 points between these (Cohen's d = 1.01), suggesting that participants are considering different information when assessing permissibility and goodness in these situations.

Figure 7. Scores by type of judgment, Experiment 4. Error bars represent 95% confidence intervals

Discussion

In this experiment, we replicated the key findings of Experiment 2. When comparing judgments about suberogatory and supererogatory behaviors, people distinguish between the goodness, permissibility, and blameworthiness of the action. However, people do recognize suberogatory and supererogatory behaviors as equally optional. Despite statistically significant differences between the permissibility of suberogatory and supererogatory actions, participants rated suberogatory behaviors as permissible.

Now that we have provided some evidence for distinctions between these evaluative categories, we turn to another question: what factors differentially affect these judgments? As we saw in Experiment 2, beliefs about personal duties do not relate significantly to people's judgments. The significant difference in blame judgments suggests that perhaps norms of social sanctioning help to inform at least one kind of judgment. To understand the reasoning underlying these judgments, we conducted two additional experiments.
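The standardized difference reported in the Results of Experiment 4 can be approximately recovered from the summary statistics alone. The sketch below assumes Cohen's d computed with the root mean square of the two standard deviations (equal weighting of the two sets of ratings); the paper does not state which variant was used, and the helper name `cohens_d_from_summary` is hypothetical.

```python
import math


def cohens_d_from_summary(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Cohen's d from two means and SDs, pooling the SDs with equal weight.

    Assumption: pooled SD = sqrt((sd1^2 + sd2^2) / 2). The ratings are
    within-participant, so other variants (for example, d based on
    difference scores) would give different values.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)
    return (m1 - m2) / pooled_sd


# Experiment 4, suberogatory condition: permissibility vs. goodness ratings,
# using the rounded means and SDs reported in the text.
d_sub = cohens_d_from_summary(63.9, 29.9, 38.0, 20.5)
print(f"suberogatory permissibility - goodness: d = {d_sub:.2f}")
```

With the rounded values from the text this gives d of about 1.01, in line with the reported effect size.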

Experiment 5

In this experiment we explored how judgments of deserved social sanction are associated with the different kinds of moral judgment identified in previous experiments. Thus far, we have established that some situations can be understood in terms of suberogatory or supererogatory characterizations. In order to understand people's responses to these situations, we need to deploy a richer array of judgment dimensions. In this experiment, we explored whether suberogatory responses are associated with potential behavioral consequences, such as social punishment. Plausibly, social punishment is, as much as expressions of blame, a behavioral marker of underlying moral judgment.

We also explore whether attitudes about rights predict different kinds of judgments. The situations that we describe often bring into conflict an individual right and a kind or generous thing that could be done for others. For example, someone has the right to keep the seat they paid for on the airplane, but it is a kind thing to offer one's seat when asked to switch. Because of this, we thought that attitudes towards rights, instead of duties, might usefully predict moral judgments. This would partially vindicate the rule-based conception of morality, as one might hold that there are generalized functions from rights to prohibited behaviors (even if there are fewer, if any, such functions from rights to prescribed behaviors). Moreover, we found this pattern of reasoning in the open responses for Experiment 1. All of this suggests that personal sense of rights might provide some details about the range of inputs to which moral judgment is sensitive.

Participants

309 participants (165 women and 144 men, mean age = 39.89, SD = 13.76), recruited through Prolific Academic, took part in the study in exchange for 60 pence.
The average completion time was 4.87 minutes and none of the participants had taken part in the previous experiments.

Materials and procedure

We used the same materials and procedures from Experiment 4 with two changes. First, people had to judge whether the response merited social sanction or recognition by using a slider with an underlying scale going from 0 to 100 (social sanction = 0, neither = 50, social recognition = 100). Second, people provided information about their attitudes towards rights by answering these three questions with a 7-point Likert scale (1 = Strongly disagree, 7 = Strongly agree): 1) Everyone has a right to do anything they want on their property (as long as they are not hurting anyone else); 2) Everyone has the right to bodily autonomy; 3) Everyone has a right to make use of their money (or goods) as they see fit.

Results

As in Experiments 2 and 4, we fitted a series of random effect models which resulted in a best fitting model that includes the interactive effect of judgment type, omission, erogation condition and scenario. Crucially, including terms for attitudes about rights did not significantly improve the fit of this model.

Table 5. Summary of models fitted for Experiment 5 (dependent variable: score)

Term                                      Estimate    (lower, upper)
Omission                                   -8.08***   (-10.95, -5.22)
OBLIGATORY                                -22.94***   (-26.74, -19.15)
PERMISSIBLE                                19.74***   (15.94, 23.54)
SANCTION                                   -2.66      (-6.45, 1.14)
SUPER EROGATORY                            44.90***   (42.21, 47.59)
Mowing                                     -5.24***   (-8.25, -2.24)
Newlyweds                                  -5.14***   (-7.04, -3.23)
Raffle                                      3.68**    (0.68, 6.69)
Omission: OBLIGATORY                       15.28***   (9.90, 20.65)
Omission: PERMISSIBLE                       9.13***   (3.75, 14.51)
Omission: SANCTION                          7.98***   (2.60, 13.35)
OBLIGATORY: SUPER EROGATORY               -39.76***   (-45.13, -34.39)
PERMISSIBLE: SUPER EROGATORY              -22.18***   (-27.55, -16.81)
SANCTION: SUPER EROGATORY                 -14.17***   (-19.54, -8.80)
Omission: OBLIGATORY: SUPER EROGATORY      -3.29      (-10.90, 4.31)
Omission: PERMISSIBLE: SUPER EROGATORY     -5.62      (-13.22, 1.99)
Omission: SANCTION: SUPER EROGATORY        -3.65      (-11.26, 3.96)
Constant                                   45.96***   (43.03, 48.90)

N                                          4944
Log Likelihood                             -22788.0
AIC                                        45617.0
BIC                                        45747.0

Note. ***p < .01; **p < .05; *p < .1

The pattern found in Experiments 2 and 4 was replicated here (see Figure 8). Supererogatory behaviors were judged as better and more permissible (M = 85.16, SD = 19.31, n = 618 and M = 84.47, SD = 20.18, n = 618) than suberogatory behaviors (M = 40.26, SD = 23.99, n = 618 and M = 64.55, SD = 30.09, n = 618), regardless of being actions or omissions (Good: t(4619) = -32.72, p < .001, Mdiff = -44.9, 95% CI [-47.3, -42.5]; Permissible: t(4619) = -14.51, p < .001, Mdiff = -19.9, 95% CI [-22.8, -17.1]). Supererogatory behaviors were again judged to be less optional (M = 28.43, SD = 30.61, n = 618) than the corresponding suberogatory responses (M = 24.93, SD = 26.34, n = 618), t(4619) = -2.55, p = .01, Mdiff = -3.50, 95% CI [-6.68, -0.31].
As in Experiment 4, this difference only emerged when an additional measure was included (blameworthiness in Experiment 4 and social sanction in this experiment), suggesting the presence of a joint evaluation effect (Hsee, Blount, Loewenstein, & Bazerman, 1999).

Supererogatory behaviors were judged to merit more social recognition than suberogatory responses (M = 70.49, SD = 25.41, n = 618 vs. M = 41.59, SD = 20.57, n = 618), t(4619) = -21.07, p < .001, Mdiff = -28.9, 95% CI [-31.5, -26.3]. Crucially, both means for social recognition and social sanction are significantly different from the indifference point (50 in our scale): supererogatory, t(617) = 20.1, p < .001, M = 70.5, 95% CI [68.5, 72.5]; suberogatory, t(617) = -10.2, p < .001, M = 41.6, 95% CI [40.0, 43.2]. Mirroring the asymmetric pattern observed for blameworthiness ratings, supererogatory responses deserve more recognition than suberogatory responses deserve sanction. Also, as in the previous studies, there are important differences between scenarios, even when the overall pattern is consistent. For example, not giving up the raffle prize merits neither recognition nor sanction (M = 50.82, SD = 22.06, n = 155), while mowing your lawn early in the morning is clearly disapproved (M = 34.23, SD = 19.76, n = 155).

Figure 8. Scores by type of judgment and condition. Error bars represent 95% confidence intervals.

To explore the association between judgment scores and attitudes towards rights, we first examined the strength of the association among the three attitude items. Correlations are medium to small in size (see Table 6) and there is low internal consistency (Cronbach's alpha = 0.50). We then calculated the correlations between judgment scores and these attitudes towards rights (Table 6).
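The comparison against the indifference point reported above can be approximately checked from the published summary statistics alone. The following is a minimal sketch, not the authors' analysis code: the function name `one_sample_t` and the constant `INDIFFERENCE_POINT` are introduced here for illustration, and the inputs are the rounded means and standard deviations given in the text, so the results only approximate the reported t(617) values.

```python
import math


def one_sample_t(mean, sd, n, reference):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error, n - 1


# Midpoint of the 0-100 slider (social sanction = 0, social recognition = 100).
INDIFFERENCE_POINT = 50

# Rounded summary statistics as reported in the text for Experiment 5.
t_super, df_super = one_sample_t(70.49, 25.41, 618, INDIFFERENCE_POINT)
t_sub, df_sub = one_sample_t(41.59, 20.57, 618, INDIFFERENCE_POINT)

print(f"supererogatory: t({df_super}) = {t_super:.2f}")
print(f"suberogatory:   t({df_sub}) = {t_sub:.2f}")
```

Because the inputs are rounded, small discrepancies in the first decimal place relative to the reported statistics are to be expected.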
Similar to what happened with questions about duties, the correlations are low and only significant for two associations: (1) bodily autonomy and permissibility, and (2) obligatoriness and right to property. Only when considering particular scenarios do moderately stronger correlations start to emerge (for example, permissibility ratings in the Kidney scenario are significantly correlated with attitudes on bodily autonomy, r(309) = .12, p < 0.01).

Table 6. Bivariate correlations between attitudes towards rights and judgment scores

                     Permissible   Good      Obligatory   Sanction   Right to money   Bodily autonomy
Permissible
Good                  0.54***
Obligatory           -0.24**       -0.04***
Sanction              0.24***       0.40***   0.14***
Right to money        0.03          0.04     -0.05         0.01
Bodily autonomy      -0.16**        0.05     -0.14         0.00       0.16***
Right to property     0.05          0.02      0.06*        0.01       0.44***          0.17***

Note. ***p < .001; **p < .01; *p < .05

Discussion

We find the same dissociations observed in Experiments 2 and 4. Including the category of sanctions, however, brings out something interesting. There is a much closer relationship between goodness and sanction than there is between goodness and blame in suberogatory cases. The relationship does not hold in cases of supererogatory behavior, where judgments of goodness were statistically distinct from judgments of sanction (positive recognition, in this case). This suggests that people think suberogatory actors should be sanctioned, though supererogatory actors need not necessarily be positively recognized. This, however, might be a function of the cases we used rather than marking out an intrinsic difference between commonsense thinking about the sub- and supererogatory.

We also identified an interesting relationship between sanction and goodness. Actions rated as bad received similar ratings of sanction. In the previous experiment, we saw that ratings of badness do not align with ratings of permissibility.
Permissibility ratings are more closely aligned with ratings of blame. This suggests that the permissibility of suberogatory behaviors is related to blameworthiness, whereas the badness of suberogatory behaviors is related to public sanction. However, there is an asymmetry between suberogatory and supererogatory behaviors. We discuss this pattern more in the General Discussion.

Lastly, we did not find any interesting correlations between beliefs about rights and different kinds of moral judgment. This was similar to our findings about individual sense of duty in Experiment 2. This suggests the absence of a general function from facts about individual rights to behavior evaluation in moral encounters, providing further evidence that commonsense thinking about morality does not reflect a structured system of rules.

Experiment 6

So far we have shown that judgments of permissibility, goodness, and obligation track distinct features of a situation. Evaluations of supererogatory and suberogatory behaviors bring these distinctions to light. One question that remains is what factors drive these different judgments. The previous experiments offered some suggestions. In Experiment 1, we found that participants used the language of rights to offer justifications for suberogatory behavior. Additionally, they used virtue terms and aretaic categories in free descriptions of the scenarios. Experiments 2, 4 and 5 suggest, however, that people are not explicitly taking into account their attitudes towards duties (Experiments 2 and 4) or rights (Experiment 5). This suggests two things. Judgments of goodness might be sensitive to what people think a decent or virtuous person would do in a situation, while judgments of permissibility track what some individual has the right to do in a given situation.

If these suggestions are correct, it would explain why judgments of permissibility and goodness often coincide: what is decent or virtuous often overlaps with what one has a right to do. However, it also explains why the two concepts dissociate in cases of suberogatory behavior. Suberogatory situations arise when someone has a right to do something that a virtuous or decent person would not do. If this is the case, it is the absence of duties and the corresponding presence of rights that are characteristic of suberogatory situations. These characteristics are, however, not framed as rules, but are instead low-level features of the situation.

In this experiment, we explore the justifications people offer for their judgments of permissibility and goodness to see whether these different factors explain the distinction between these two kinds of judgment. We also include scenarios that are more perspicuously moral than the ones previously used (e.g., lawn mowing or seat ownership). By providing situations with a higher moral charge, we can be more confident that the responses observed so far are not merely tracking the perceived conventional permissibility or social aptness of the behaviors evaluated.

Method

Participants

We recruited 316 participants (160 women and 153 men, mean age = 33.30, SD = 10.81) through Prolific Academic to participate in the study in exchange for 60 pence. Sample size was set to reproduce findings of Experiments 2, 4 and 5 but with a within-participants design in this case, which would allow us to increase statistical power. The average completion time was 10 minutes and none of the participants had taken part in the previous experiments.

Materials and procedure

We constructed two new scenarios and created a suberogatory and supererogatory version of each.
The two scenarios (adapted from Thomson, 1971 and Nozick, 1974, respectively) are described below [suberogatory version in brackets]:

Violinist. Alex is driving home from work on the highway when she gets into an accident that knocks her unconscious. When she wakes up, she finds herself in a hospital bed. She's also connected to another individual through a series of wires and tubes. A doctor enters the room and explains to Alex that she is fine, but the individual she's connected to suffered some severe damage to internal organs. Alex has the right blood type to help, and, since she was unconscious, the doctor decided to connect Alex to keep the other individual alive for the time being. The doctor explains that Alex can unplug herself if she chooses, but the individual will most likely die. The individual will recover from these injuries in about a month (give or take a few days), after which time Alex can unplug herself and leave. After a few hours of pondering what to do, Alex decides to stay plugged in for the month [to unplug herself].

Well. Jones finds a large freshwater source on his property, so he digs a well as a way of claiming the water. A few weeks later, the town where he lives begins experiencing a drought, which was completely unpredictable. Town representatives visit Jones to ask whether they can use his water to alleviate some of the drought. Without Jones' help, the town will likely run out of water in a few days. If Jones donates some of his water, however, he might experience the effects of the drought in the unlikely event that the drought prolongs for too long. After considering what to do, Jones decides to offer his water [declines to offer his water].

To test variation against a known benchmark, we also included the Newlyweds scenario (see Experiment 5). Participants read the informed consent and then rated each one of the three vignettes.
The order of the vignettes was randomized across participants. Each participant was randomly assigned to see either the supererogatory or suberogatory condition of each scenario, and the rest of the procedure was the same as in Experiment 4. The only difference is that, after providing judgments for the dimensions requested, participants were asked to explain their ratings in their own words (cf. Christensen et al., 2014).

Results

Overall, numerical ratings closely followed the pattern observed in prior experiments (see Figure 9). Suberogatory behaviors are rated as worse (M = 34.19, SD = 23.71, n = 468 vs. M = 86.37, SD = 19.80, n = 480), more blameworthy (M = 64.13, SD = 22.46, n = 468 vs. M = 14.92, SD = 19.12, n = 480), and less permissible (M = 28.17, SD = 32.31, n = 468 vs. M = 83.15, SD = 22.27, n = 480) than supererogatory responses (all pairwise comparisons significant). Unlike in Experiment 5, there is no significant difference in judged obligatoriness (suberogatory M = 28.17, SD = 32.31, n = 468 vs. supererogatory M = 26.64, SD = 31.75, n = 480). As before, we fitted random effect models, with a random intercept for participants. However, models with this term resulted in singular fits, due to lack of variation in the random intercept for participant, suggesting an overcomplex random structure (Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017). Therefore, we fitted only fixed effect models, where the best model includes the interaction of judgment type, scenario and erogation condition (results can be seen in Table 7).

Figure 9. Mean scores by type of judgment, Experiment 6. Error bars represent 95% confidence intervals.

There is also significant between-scenario variation.
For example, only in the Well scenario did people judge the suberogatory behavior, not giving water, as significantly less optional than the supererogatory behavior (M = 31.26, SD = 31.75, n = 160 vs. M = 38.81, SD = 34.63, n = 156, Good: t(310) = 2.64, p < .001, Mdiff = 7.55, 95% CI [0.19, 14.9]) [4]. This scenario also produced the most extreme values for suberogatory responses.

[4] Degrees of freedom correspond to the Welch t-test, since the library emmeans (Lenth, 2020) calculated the asymptotic result for this comparison (value from the z distribution).

Table 7. Summary of best model fitted for Experiment 6 (dependent variable: score)

Term                                      Estimate    (lower, upper)
GOOD                                      -18.65***   (-24.33, -12.97)
OBLIGATORY                                -43.06***   (-48.74, -37.38)
PERMISSIBLE                                14.01***   (8.33, 19.70)
SUPER_EROGATORY                           -35.49***   (-41.08, -29.89)
Violinist                                   3.61      (-2.02, 9.24)
Well                                       13.13***   (7.48, 18.79)
GOOD: SUPER_EROGATORY                      77.22***   (69.31, 85.13)
OBLIGATORY: SUPER_EROGATORY                36.02***   (28.11, 43.93)
PERMISSIBLE: SUPER_EROGATORY               50.07***   (42.16, 57.98)
GOOD: Violinist                            -7.79*     (-15.75, 0.17)
OBLIGATORY: Violinist                      10.88***   (2.92, 18.84)
PERMISSIBLE: Violinist                    -10.41**    (-18.37, -2.45)
GOOD: Well                                -25.91***   (-33.91, -17.91)
OBLIGATORY: Well                           10.22**    (2.22, 18.22)
PERMISSIBLE: Well                         -29.89***   (-37.89, -21.89)
SUPER_EROGATORY: Violinist                -17.16***   (-25.07, -9.25)
SUPER_EROGATORY: Well                     -24.19***   (-32.10, -16.28)
GOOD: SUPER_EROGATORY: Violinist           27.74***   (16.56, 38.93)
OBLIGATORY: SUPER_EROGATORY: Violinist     19.66***   (8.47, 30.84)
PERMISSIBLE: SUPER_EROGATORY: Violinist    12.94**    (1.76, 24.13)
GOOD: SUPER_EROGATORY: Well                44.97***   (33.79, 56.16)
OBLIGATORY: SUPER_EROGATORY: Well          16.10***   (4.91, 27.29)
PERMISSIBLE: SUPER_EROGATORY: Well         39.84***   (28.66, 51.03)
Constant                                   58.52***   (54.51, 62.54)

N                                          3792
Log Likelihood                             -17629.00
AIC                                        35306.00

Note. ***p < .01; **p < .05; *p < .1

Qualitative data

For the qualitative analysis of the open responses, we used two coders.
Both were blind to the individual ratings associated with each open response and one coder was completely blind to the objective of the study. Coders used ten predefined categories to sort responses. We report agreement between coders (Cohen's kappa) and results from fitting loglinear models on the classification frequencies.

There were 4003 unique written explanations of ratings (1004 for goodness ratings, 999 for obligatory ratings, 997 for permissibility ratings and 1003 for blame/praise ratings). Coders were given ten categories: Character, when the response included some mention of personality or character traits; Rights (presence or absence), when responses appealed to what the protagonist can morally do, is allowed to do, or its explicit negation; Duties (presence or absence), when responses appealed to what the protagonist is morally obliged to do, or its explicit negation; Neither, when participants appealed to norms that could be considered neither rights nor duties; Outcomes, when the justification was based on the consequences for the "victims", negative or positive; Values, when the actions performed had a clear valence; Justification, when appealing to reasons not based on consequences; and Other, if the response did not fall in the previous categories but was common enough to merit its own category. Coders were given an example of each one of the categories. Raters completed coding independently. They were also told that they could classify any given response in more than one category.

For example, one response to the Well vignette reads: "He didn't have to donate the water, and even if he didn't that wouldn't be necessarily a bad thing." This was rated as indicating 'Absence of Duties' and 'Values'. Another response (to the Newlyweds vignette) reads: "No one should have to switch a seat unless they want to.
They bought the seat they were in and would have been justified in staying in that seat." This was rated as indicating 'Justification' and 'Presence of Rights'.

Raw agreement between coders and Cohen's kappa for each category are presented in Table 8. The category Neither is not presented because there was virtually no agreement in its use. Most of the unique responses were classified into at least 2 categories (58%). Half of the categories show weak reliability between coders (outcomes, values, justification and absence of rights) while the other half shows overall good agreement (presence of duties, character, presence of rights and absence of duties).

Table 8. Cohen's kappa agreement between coders for the eight categories. Columns L and U are the lower and upper limits of Cohen's confidence interval.

Category         Kappa   L      U      % Classified as   % Agreement
Abs duties       0.81    0.79   0.84   11.02             95.68
Pres rights      0.75    0.73   0.77   21.06             90.50
Character        0.69    0.65   0.72    8.61             94.18
Pres duties      0.63    0.58   0.68    5.65             95.25
Outcomes         0.39    0.35   0.43   11.86             85.08
Values           0.38    0.35   0.42   20.28             77.18
Justification    0.37    0.34   0.41   13.7              82.35
Abs rights       0.34    0.27   0.41    3.13             95.15

To determine the association between justification category and the variables manipulated in this experiment, we submitted the classification frequencies to a loglinear analysis with a saturated model including erogation condition (sub- and supererogatory), judgment dimension (goodness, permissibility, obligatoriness, blameworthiness) and justification classification category (presence of duties, character, presence of rights and absence of duties). The three-way loglinear model produced a model that retained all interactive effects (likelihood ratio χ²(0) = 0, p = 1), indicating a significant interaction between the variables fitted to the model (χ²(12) = 102.53, p < 0.001).
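To make the agreement statistic in Table 8 concrete, here is a minimal sketch of Cohen's kappa for two coders deciding whether each response belongs to a single category. It is an illustration under stated assumptions, not the authors' code: the function name `cohens_kappa` and the two lists of codes are hypothetical, and the confidence limits reported in the table are not computed here.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' binary decisions on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equally long, non-empty lists of codes")
    n = len(coder_a)
    # Raw agreement: share of items on which both coders made the same decision.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Share of items each coder assigned to the category.
    p_a = sum(coder_a) / n
    p_b = sum(coder_b) / n
    # Agreement expected by chance, given each coder's base rate.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes: 1 = response assigned to the category, 0 = not assigned.
coder_a = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
coder_b = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

In this toy example the coders agree on 8 of 10 items, yet kappa is considerably lower than the raw agreement because much of that agreement is expected by chance when a category is used rarely. The same logic allows a category in Table 8 to combine high raw agreement with a low kappa.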
Judgments of permissibility and obligatoriness were mainly sensitive to presence of rights, and these explanations appealing to the presence of rights were more prevalent in suberogatory cases. Judgments of permissibility were also justified by appealing mostly to the absence of duties and the presence of rights (>40%), but in this case there is no asymmetry between sub- and supererogatory responses. Judgments of goodness were mostly justified by appealing to character, and again the suberogatory responses were justified by relying on presence of rights. Additionally, there are more character-based justifications for supererogatory responses than for the corresponding suberogatory responses.

The qualitative analysis of the open explanations suggests that people do systematically call to mind considerations of duties and rights when making judgments about permissibility and obligatoriness. While this is not surprising, the interesting thing here is that these considerations are also significant for judgments of blameworthiness and goodness. Duties and rights showed up not as blanket generalizations (as explored in Experiments 2 and 5) but as situated appeals to factors of the situation.

Discussion

As before, we replicated the basic pattern found in our other experiments using two situations that are more clearly moral. This provides strong evidence for our claim that differences in evaluative categories track differences in kinds of moral judgments.

We also used open responses to allow participants to supply their own reasoning for making different moral judgments. Our results are striking. People appear to be sensitive to many kinds of information differentially when making moral judgments. Judgments of praise and blame are sensitive to character considerations.
Judgments of goodness seem primarily sensitive to character considerations, whereas judgments of badness seem sensitive to both character considerations and the presence of rights. This, however, might be a function of the situations we provided. When people made judgments of badness, they often pointed to the fact that the bad behavior was fully within the rights of the individual, even though the behavior was indicative of vicious character. The fact that people mention both points to the relevance of both kinds of information to making a judgment of badness. Judgments of permissibility appear to track the presence of rights and the absence of duties, whereas judgments of obligatoriness track the presence of rights and the presence or absence of duties. We offer an interpretation of this in the General Discussion.

Finally, we should note that the categories used to sort open responses were generated a priori. Bottom-up, data-driven methods might reveal a more perspicuous classificatory scheme that might generate surprising results about the kinds of information relevant to making different kinds of moral judgment.

General Discussion

In this paper, our goal was to develop the idea that morality is not exhausted by a system of rules and explore some of its implications for our understanding of the psychology of moral judgment. To do this, we've made a case for moving beyond the use of dilemmas in the study of moral judgment and proposed a wider array of measures and situations. In this General Discussion, we summarize how our findings apply to the measurement of moral judgment, the cases used to elicit judgments, and the rule-based view of morality.

Measures and Categories of Moral Judgment

We found that judgments across different moral categories are dissociable.
In particular, judgments of goodness and permissibility come apart depending on the situation presented. People form judgments about situations that conform to the category of the suberogatory (bad, but permissible). This challenges the idea that different measures of moral judgment tap into the same underlying reasoning process. The resulting picture of moral judgment is that people are variably sensitive to different kinds of considerations when making different kinds of judgments (see Barbosa and Jiménez-Leal, 2017). While this presents a more granular picture of moral judgment, it also opens up new questions. When do people prefer different kinds of information in making moral judgments? What sorts of considerations predominantly drive different kinds of judgments?

Our results provide some initial suggestions for how to answer these questions. In many of our experiments, ratings of obligation remained consistently low. This indicates that people considered both suberogatory and supererogatory behaviors as optional rather than obligatory. The qualitative data from Experiment 6 suggest that ratings of obligation are predominantly tracking the presence of duties or explicit rules that prohibit or prescribe conduct. Because the situations considered here are not composed of conflicts between rules, there is an absence of duties that define how one ought to behave. Hence, ratings of obligation are low.

Results from Experiment 5 indicate that ratings of badness align with ratings of sanction. That is, the degree to which one rates a suberogatory action as bad is related to the amount of sanction one deserves in light of doing something suberogatory. This is importantly different from the results of Experiment 4, which indicated that ratings of permissibility are aligned with ratings of blame.

These results suggest that folk psychological categories of blame and sanction might dissociate. Because these categories differentially associate with others, it might be the case that these judgments track different features of a situation. Understanding this difference might provide further insight into what judgments of permissibility and goodness indicate about folk psychological evaluation. This is an important result to investigate in future work. Moral psychologists and philosophers have assumed that these constructs significantly overlap and, accordingly, have used them interchangeably (Bendor & Swistak, 2001; Deutsch & Gerard, 1955; Cialdini, Reno, & Kallgren, 1990; Scanlon, 2008; McKenna, 2013; Bennett, 2012). Future work should investigate further whether and under what circumstances there are reliable dissociations between these categories.

Notably, some have mentioned the need for using new measures in studying moral judgment (see Uhlmann, Pizarro, & Diermeier, 2015), arguing that folk psychological categories of judgment are fundamentally directed at personal evaluation rather than behavior evaluation and routinely employ aretaic rather than deontic concepts. Our results suggest that this is partially true. People do show an interest in personal evaluation using aretaic concepts. However, we find that behavior evaluation and deontic concepts also play a significant role. This suggests that social cognition is sensitive to both act-based and person-based considerations, and that it draws on a wide conceptual repertoire in normative evaluation.

Moral Situations

Using a variety of measures is not sufficient to bring out the underlying variability of moral judgment. In Experiment 3 (on sacrificial dilemmas), we did not find similar dissociations between judgment categories as we did in our other experiments.
This was the case even when controlling for underlying ethical tendencies (i.e., low or high trait utilitarianism). Even among low trait utilitarians, different kinds of judgments never significantly differed from each other. This suggests that dilemmas themselves are qualitatively distinct from the kind of moral encounters we used in our experiments.

As discussed earlier, assumptions about methodology and dilemmas mutually reinforce each other. We can now clarify this point further. The use of dilemmas as stimuli functions as a demand effect that gives the appearance of commonsense moral thinking exhibiting a rule-based, hierarchical structure. In providing scenarios that explicitly bring different sets of rules into conflict, researchers have set people up to exhibit judgments that appear to preferentially select some rules over others (deontological prescription vs. maximization principles). This, in turn, blurs the distinctions between different judgment categories. Once we free moral judgment of the constraints of dilemmas, variegated measures better capture the variability and nuance of judgment. That variability is only possible when using situations that do not bring different sets of rules into conflict.

Different situations also allow for a different kind of variability. Moral dilemmas have mostly been constructed out of examples designed to test abstract philosophical principles, which has led to worries about their ecological validity (Kahane, 2015; Dahl & Oftedal, 2019). A related, but perhaps more significant, concern is that dilemmas, because of their abstract and artificial nature, likely occlude differences in moral attitudes that arise from socio-cultural variation.
Thus, in the limited number of studies that have been conducted outside the urban areas of North America and Western Europe, researchers have found responses similar to those found among WEIRD populations (Abarbanell & Hauser, 2010; Barrett et al., 2016; Koenigs et al., 2007; Perkins et al., 2013; Szekely & Miu, 2015; Johnson, Danko, Huang, Park, Johnson, & Nagoshi, 1987).

By contrast, it is likely that responses that are deemed suberogatory in one community, such as tipping 10%, might be considered supererogatory by another one. As cultural variability mediates moral learning, it is plausible to conjecture that different cultures assign varying weights to values, virtues, and rules in the justification of their moral judgments (Graham, Meindl, Beall, Johnson & Zhang, 2016). Consequently, cultural differences can be observed not as variation in a dimension (e.g., being more or less utilitarian) but as a particular pattern of judgments and their justifications. At present, we do not have results that speak directly to these issues, although we currently have relevant work in preparation. We do think, however, that moving beyond dilemmas opens up the possibility for this variability to emerge.

Rule-based Morality

We mentioned at the outset that not all moral situations reflect conflicts between rules. Our results show that when people approach some of these situations and are asked to evaluate them, they bring to bear a variety of considerations, not all of which are codifiable in rules. This strongly suggests that moral judgment does not necessarily rely on the application of abstract moral principles.

In Experiments 2, 4 and 5, we asked participants to fill out questionnaires aimed at measuring general attitudes towards duties and rights.
We failed to find correlations between their attitudes toward abstract claims about the nature of rights and duties and their judgments about particular situations. This reflects a tendency of people to identify features of situations that guide judgment without having explicit representations of abstract principles play a causal role in forming moral judgments (Graham et al., 2013). Likewise, the open responses we collected in Experiment 6 suggest that rules play a role in structuring commonsense moral thinking alongside axiological and characterological considerations, at least when it comes to explaining proffered moral judgments. In particular, as indicated by the responses under the "absence of duties" category, participants judged actions to be good, bad, etc., while explaining that these did not violate a prohibition (in suberogatory cases) or were not generally mandated (in supererogatory cases). Instead, their evaluations seem overwhelmingly responsive to considerations about rights, values, and character traits.

For all we have said, it may turn out that precise algorithms adequately characterize the computations performed in forming moral judgments. Alternatively, moral judgment might be supported by an architecture that functions as a model-based system without having to explicitly represent rules (see Crockett, 2013; Cushman, 2015; Brownstein, 2018). Our interest here, however, does not lie with the computations or cognitive architecture behind moral judgment, but with the content of morality. Our claim is that most research on moral judgment has, thus far, assumed that the content of moral judgment can be measured in terms of its alignment with some structured system of moral rules.
But fixating on rules provides only a partial window into the moral life, one that fails to reveal the complexities and subtleties of commonsense moral judgment. When moral judgment is removed from the narrow frame of sacrificial dilemmas, the appearance of rules in moral judgment evaporates.

Finally, our argument here does not assume that if morality has a rule-based structure, then commonsense moral judgment ought to exhibit perfectly coherent and systematic principles. Even Sidgwick (1981) admits that the maxims of commonsense morality are "somewhat vague generalities" (p. 342). Still, on his view and on the view of those who follow him, commonsense morality can be refined through rigorous theorizing to approximate the structure of an ideal normative theory. It is made of the right sort of stuff, as it were, to function as a rule-based system.

The judgments that we have identified in these experiments, however, imply that this is an incomplete view of the moral framework reflected in moral cognition. Consider that people regularly judge that some behavior is bad but permissible across a range of different scenarios. If there are rules constitutive of commonsense morality, what must they be like such that they allow for such judgments? How can we make sense of such judgments within a rule-governed system? Recall that one key feature of the rule-based system of morality is the inter-definability of moral concepts. Whatever is permissible is in line with the rules of morality; whatever is bad is bad because it goes against those rules.

One option is to say that separate domains of rules govern separate judgments. But that requires giving up on the notion of inter-definability that is central to the rule-based system of morality. Another option is to say that one kind of judgment is properly moral and the other is not.
However, there appears to be no principled way of holding that judgments of permissibility are properly moral while judgments of goodness are not, or vice versa. A third option is to say that there are systematic principles underlying these judgments, but people do not understand what these are. Hence, they are making a mistake in dissociating permissibility and goodness. Lastly, one could argue that morality is self-contradictory.

While all of this is possible in principle, the attempt to defend these answers in the face of the evidence presented here begins to look like the imposition of researcher assumptions rather than an investigation into commonsense moral thinking. At this point, one begins to wonder if the substantive assumption about morality being a system of rules is worth the cost. We suggest that it is not, and that it is time to consider what morality beyond the rules would look like.

References

Abarbanell, L., & Hauser, M. D. (2010). Mayan morality: An exploration of permissible harms. Cognition, 115(2), 207–224.
Anderson, A. R. (1958). A reduction of deontic logic to alethic modal logic. Mind, 67, 100–103.
Andrade, G. (2019). Medical ethics and the trolley problem. Journal of Medical Ethics and History of Medicine, 12, 3.
Archer, A. (2018). Supererogation. Philosophy Compass, 13(3), 1–9. doi:10.1111/phc3.12476
Barbosa, S., & Jiménez-Leal, W. (2017). It's not right but it's permitted: Wording effects in moral judgement. Judgment and Decision Making, 12(3), 308.
Barrett, H. C., Bolyanatz, A., Crittenden, A. N., Fessler, D. M. T., Fitzpatrick, S., Gurven, M., Henrich, J., Kanovsky, M., Kushnick, G., Pisor, A., Scelza, B. A., Stich, S., von Rueden, C., Zhao, W., & Laurence, S. (2016). Small-scale societies exhibit fundamental variation in the role of intentions in moral judgment.
Proceedings of the National Academy of Sciences, 113(17), 4688–4693.
Bartels, D. M., & Pizarro, D. A. (2011). The mismeasure of morals: Antisocial personality traits predict utilitarian responses to moral dilemmas. Cognition, 121(1), 154–161.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. ArXiv Preprint ArXiv:1406.5823.
Bauman, C. W., McGraw, A. P., Bartels, D. M., & Warren, C. (2014). Revisiting external validity: Concerns about trolley problems and other sacrificial dilemmas in moral psychology. Social and Personality Psychology Compass, 8(9), 536–554.
Białek, M., & De Neys, W. (2017). Dual processes and moral conflict: Evidence for deontological reasoners' intuitive utilitarian sensitivity. Judgment and Decision Making, 12(2), 148–167.
Bennett, C. (2012). The expressive function of blame. In J. Coates & N. Tognazzini (Eds.), Blame: Its nature and norms. Oxford University Press.
Björklund, F. (2003). Differences in the justification of choices in moral dilemmas: Effects of gender, time pressure and dilemma seriousness. Scandinavian Journal of Psychology, 44(5), 459–466.
Christensen, J. F., Flexas, A., Calabrese, M., Gut, N. K., & Gomila, A. (2014). Moral judgment reloaded: A moral dilemma validation study. Frontiers in Psychology, 5(JUL), 1–18. https://doi.org/10.3389/fpsyg.2014.00607
Christensen, J. F., & Gomila, A. (2012). Moral dilemmas in cognitive neuroscience of moral decision-making: A principled review. Neuroscience & Biobehavioral Reviews, 36(4), 1249–1264.
Conway, P., & Gawronski, B. (2013). Deontological and utilitarian inclinations in moral decision making: A process dissociation approach. Journal of Personality and Social Psychology, 104(2), 216.
Crockett, M. (2013). Models of morality.
Trends in Cognitive Sciences, 17(8), 363–366.
Cushman, F. (2008). Crime and punishment: Distinguishing the roles of causal and intentional analyses in moral judgment. Cognition, 108(2), 353–380.
Cushman, F., Young, L., & Greene, J. D. (2010). Our multi-system moral psychology: Towards a consensus view. The Oxford Handbook of Moral Psychology, 47–71.
Cushman, F., Young, L., & Hauser, M. (2006). The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm. Psychological Science, 17(12), 1082–1089.
Dahl, F. A., & Oftedal, G. (2019). Trolley dilemmas fail to predict ethical judgment in a hypothetical vaccination context. Journal of Empirical Research on Human Research Ethics, 14(1), 23–32.
Darley, J. M., & Shultz, T. R. (1990). Moral rules: Their content and acquisition. Annual Review of Psychology, 41(1), 525–556.
Djeriouat, H., & Trémolière, B. (2014). The Dark Triad of personality and utilitarian moral judgment: The mediating role of Honesty/Humility and Harm/Care. Personality and Individual Differences, 67, 11–16.
Driver, J. (1992). The suberogatory. Australasian Journal of Philosophy, 70(3), 286–295.
Everett, J. A. C., & Kahane, G. (2020). Switching tracks? Towards a multidimensional model of utilitarian psychology. Trends in Cognitive Sciences.
Fauconnet, P. (2013). La responsabilité. Étude de sociologie. Presses Électroniques de France.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
Gilligan, C. (1982). In a different voice: Psychological theory and women's development. Harvard University Press.
Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013).
Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology (Vol. 47, pp. 55–130). Elsevier.
Graham, J., Meindl, P., Beall, E., Johnson, K. M., & Zhang, L. (2016). Cultural differences in moral judgment and behavior, across and within societies. Current Opinion in Psychology, 8, 125–130.
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293(5537), 2105–2108. https://doi.org/10.1126/science.1062872
Haidt, J. (2001). The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108(4), 814.
Haidt, J., & Baron, J. (1996). Social roles and the moral judgement of acts and omissions. European Journal of Social Psychology, 26(2), 201–218.
Hofmann, W., Wisneski, D. C., Brandt, M. J., & Skitka, L. J. (2014). Morality in everyday life. Science, 345(6202), 1340–1343.
Holyoak, K. J., & Powell, D. (2016). Deontological coherence: A framework for commonsense moral reasoning. Psychological Bulletin, 142(11), 1179.
Hornik, K., Zeileis, A., & Meyer, D. (2006). The strucplot framework: Visualizing multi-way contingency tables with vcd. Journal of Statistical Software, 17(3), 1–48.
Hsee, C. K., Loewenstein, G. F., Blount, S., & Bazerman, M. H. (1999). Preference reversals between joint and separate evaluations of options: A review and theoretical analysis. Psychological Bulletin, 125(5), 576.
Hurd, H. M. (1998). Duties beyond the call of duty. JRE, 6, 3.
Johnson, R. C., Danko, G. P., Huang, Y.-H., Park, J. Y., Johnson, S. B., & Nagoshi, C. T. (1987). Guilt, shame, and adjustment. Personality and Individual Differences, 8(3), 357–364.
Kahane, G. (2015).
Sidetracked by trolleys: Why sacrificial moral dilemmas tell us little (or nothing) about utilitarian judgment. Social Neuroscience, 10(5), 551–560. https://doi.org/10.1080/17470919.2015.1023400
Kahane, G., Everett, J. A. C., Earp, B. D., Caviola, L., Faber, N. S., Crockett, M. J., & Savulescu, J. (2018). Beyond sacrificial harm: A two-dimensional model of utilitarian psychology. Psychological Review, 125(2), 131.
Koenigs, M., Young, L., Adolphs, R., Tranel, D., Cushman, F., Hauser, M., & Damasio, A. (2007). Damage to the prefrontal cortex increases utilitarian moral judgements. Nature, 446, 908–911.
Koenigs, M., Kruepke, M., Zeier, J., & Newman, J. P. (2012). Utilitarian moral judgment in psychopathy. Social Cognitive and Affective Neuroscience, 7(6), 708–714.
Kohlberg, L., & Hersh, R. H. (1977). Moral development: A review of the theory. Theory into Practice, 16(2), 53–59.
Kohlberg, L., Levine, C., & Hewer, A. (1983). Moral stages: A current formulation and a response to critics.
Lee, J. J., & Gino, F. (2015). Poker-faced morality: Concealing emotions leads to utilitarian decision making. Organizational Behavior and Human Decision Processes, 126, 49–64.
Lenth, R. (2020). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.4.5. https://CRAN.R-project.org/package=emmeans
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
McKenna, M. (2013). Conversation and responsibility. Oxford University Press.
Mikhail, J. (2011). Elements of moral cognition: Rawls' linguistic analogy and the cognitive science of moral and legal judgment. Cambridge University Press.
Moll, J., & de Oliveira-Souza, R. (2007). Moral judgments, emotions and the utilitarian brain.
Trends in Cognitive Sciences, 11(8), 319–321.
Monin, B., Pizarro, D. A., & Beer, J. S. (2007). Deciding versus reacting: Conceptions of moral judgment and the reason-affect debate. Review of General Psychology, 11(2), 99–111.
Monroe, A. E., Dillon, K. D., Guglielmo, S., & Baumeister, R. F. (2018). It's not what you do, but what everyone else does: On the role of descriptive norms and subjectivism in moral judgment. Journal of Experimental Social Psychology, 77(March), 1–10. https://doi.org/10.1016/j.jesp.2018.03.010
Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., van't Veer, A. E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences, 23(10), 815–818.
Nozick, R. (1974). Anarchy, state, and utopia (Vol. 5038). New York: Basic Books.
O'Hara, R. E., Sinnott-Armstrong, W., & Sinnott-Armstrong, N. A. (2010). Wording effects in moral judgments. Judgment and Decision Making.
Patil, I., Zucchelli, M., Kool, W., Campbell, S., Fornasier, F., Calo, M., Silani, G., Cikara, M., & Cushman, F. (2020). Reasoning supports utilitarian resolutions to moral dilemmas across diverse measures. Journal of Personality and Social Psychology.
Perkins, A. M., Leonard, A. M., Weaver, K., Dalton, J. A., Mehta, M. A., Kumari, V., Williams, S. C. R., & Ettinger, U. (2013). A dose of ruthlessness: Interpersonal moral judgment is hardened by the anti-anxiety drug lorazepam. Journal of Experimental Psychology: General, 142(3), 612–620.
Piaget, J. (2013). The moral judgment of the child. Routledge.
Pizarro, D., Uhlmann, E., & Salovey, P. (2003). Asymmetry in judgments of moral blame and praise: The role of perceived metadesires. Psychological Science, 14(3), 267–272.
Rest, J. R. (1992). Development in judging moral issues. U of Minnesota Press.
Rosen, B. (1980).
Moral dilemmas and their treatment. In B. Munsey (Ed.), Moral Development, Moral Education, and Kohlberg (pp. 232–263).
Scanlon, T. M. (2008). Moral dimensions. Cambridge, MA: Harvard University Press.
Sidgwick, H. (2019). The methods of ethics. Good Press.
Simpson, E. L. (1974). Moral development research: A case study of scientific cultural bias. Human Development, 17, 81–106.
Singmann, H., & Kellen, D. (2019). An introduction to mixed models for experimental psychology. In D. H. Spieler & E. Schumacher (Eds.), New Methods in Cognitive Psychology. Psychology Press.
Snarey, J. R. (1985). Cross-cultural universality of social-moral development: A critical review of Kohlbergian research. Psychological Bulletin, 97(2), 202–232.
R Core Team. (2019). R: A language and environment for statistical computing.
Szekely, R. D., & Miu, A. C. (2015). Incidental emotions in moral dilemmas: The influence of emotion regulation. Cognition & Emotion, 29(1), 64–75.
Thomson, J. J. (1976). A defense of abortion. In Biomedical ethics and the law (pp. 39–54). Springer.
Uhlmann, E. L., Pizarro, D. A., & Diermeier, D. (2015). A person-centered approach to moral judgment. Perspectives on Psychological Science, 10(1), 72–81.
Westfall, J. (2015). PANGEA: Power analysis for general ANOVA designs. Unpublished manuscript. Available at http://jakewestfall.org/publications/pangea.pdf
Walker, L. J. (1984). Sex differences in the development of moral reasoning: A critical review. Child Development, 55(3), 677–691.
Young, I. M. (1985). Impartiality and the civic public: Some implications of feminist critiques of moral and political theory. Praxis International, 5(4), 381–401.