1 Introduction

Reflective equilibrium is widely used as a method to bring our intuitions into accord with theoretical principles. While reflective equilibrium is used in philosophy in general (Baumberger & Brun, 2021; Cath, 2016; Elgin, 1996), it is particularly seen as a reasonable method (among others) for justifying normative principles in political philosophy (Knight, 2017; Scanlon, 2003; Sinnott-Armstrong et al., 2010; Varner, 2012).

However, difficulties arise when we try to collect a wide range of people’s intuitions as a (provisional) basis for justification in terms of reflective equilibrium. The difficulties arise from the fact that we have to find intuitions latent in our moral practice. Thus, political philosophers do not simply try to gather intuitions: To evoke our intuitions in a controllable manner, they often use thought experiments that consider hypothetical scenarios but purport to reflect real dilemmas we confront, in which the available options do not allow for desirable outcomes in practical contexts. However, the use of thought experiments has itself been challenged on the grounds that the intuitions they arouse are unreliable because the experimental settings are far removed from reality (Anscombe, 1957; Goodin, 1982; O’Neil, 1989). Political philosophers must ward off these challenges.

This paper shows that we can defuse the challenges to the use of thought experiments and thus overcome the difficulties of gathering and inducing intuitions to justify normative principles. We propose a twofold approach. (i) Use survey experiments (if available): Survey experiments are conducted by using a poll-style survey to investigate people’s intuitive views. We show that this allows for the use of thought experiments—more generally, the use of possible cases—to avoid the aforementioned challenges with available samples. (ii) Use a model selection method for coping with our intuitive reactions to thought experiments in a systematic manner. We propose the use of the Akaike Information Criterion (AIC) as a feasible way to select normative principles that would systematically account for the intuitions. This illustrates that a survey experiment and a model selection method can be viewed together as a methodological means of satisfying the epistemic desiderata implicit in reflective equilibrium. To show this, we conduct a survey experiment on two theories of distributive justice, prioritarianism and sufficientarianism. We then analyze the results using a model selection method. Through the combination of the survey experiment and AIC model selection, we are able to show that the refined sufficientarian principle, a widely supported principle of distributive justice, cannot be considered more plausible than the prioritarian principle. This tells us that some changes of certain intuitions revolving around sufficientarianism should be examined (separately), which is an important stage of reflective equilibrium. Thus, our proposed approach can contribute to the development of reflective equilibrium as a method of political philosophy.

2 Reflective equilibrium in practice

2.1 What is reflective equilibrium?

Reflective equilibrium serves as a standard method in philosophy for justifying judgments about what the world is, what the world should be, or what people (should) do. Roughly, reflective equilibrium is the end state of the deliberative process by which we revise our initial beliefs (i.e., initial judgments, or more simply intuitions) about a targeted subject such as justice or knowledge. While the characteristics of these different judgments differ in some important ways, they commonly possess non-inferential warrants for the claims of the targeted subjects.Footnote 1 In the context of this study, we can consider all these opinions as simply “intuitions”, where intuitions are understood to contribute to the coherent system of judgments in regard to a targeted subject.

Intuitions can thus be viewed as initial input to the end state of the deliberative process in reflective equilibrium. This is illustrated by Scanlon (2003, pp. 140–141), who describes reflective equilibrium as a three-stage process. In the first stage, a relevant set of intuitive judgments is identified. The second stage is to formulate or select theoretical principles that would systematically account for these judgments. In the third stage, any mismatches between the principles and the intuitive judgments are resolved by working back and forth between them.

There is an additional point worth noting: In the three-stage process of reflective equilibrium, the theoretical principles account for more than simply psychological facts or mechanisms. Rather, they are meant to cover the content of the intuitions, and the method is supposed to justify principles with the same kind of content. This element of reflective equilibrium requires credible adjustments in the equilibration process in a way that does justice to epistemic desiderata such as parsimony and generality (Brun, 2014, pp. 241–242; Baumberger & Brun, 2021, pp. 7933–7935). This treatment of intuitions suggests that the three-stage process goes beyond the mere achievement of coherence among intuitions as initial input: the adjustment process is required to meet a relevant set of the epistemic desiderata.Footnote 2 After all, the point of reflective equilibrium is the reconstructive process of justifying theoretical principles by adjusting intuitive judgments (Baumberger & Brun, 2021, pp. 7935–7938).

2.2 Two concerns about the use of intuitions in political philosophy

As is well-known, reflective equilibrium is influential in political philosophy. We can partly attribute this to the impact of John Rawls, who was the first to apply this method for the purpose of justifying his two principles of justice. Perhaps even more importantly, as a main subject of political philosophy, justice is distinctly normative—that is, justice has a guiding force that directs people in some collective and compulsory manner. This echoes Rawls’s (1971, pp. 5, 48–49) statement about the notion of reflective equilibrium: “it is a notion characteristic of the study of principles which govern actions shaped by self-examination” which denotes the rules that “determine a proper balance between competing claims to the advantages of social life”.Footnote 3 As such, political philosophers tend to regard our intuitions concerning justice (and morality) as being reflected in the formation of justified judgments about what we ought to do (Copp, 2012; Knight, 2017; Rawls, 1971; Scanlon, 2003).

In political philosophy, two main concerns have been raised with regard to the method of reflective equilibrium. First, some philosophers have expressed concern about manipulating intuitions as initial input to support principles of justice. The intuitions must be folk intuitions rather than those of political philosophers. Otherwise, intuitions cannot play a non-circular role in justifying the principles. In the practice of reflective equilibrium, philosophers examine theoretical principles to see whether they are systematically consistent with our intuitive judgments. In political philosophy, however, appeals to intuitions are based on anticipated folk intuitions, so that political philosophers take their own intuitions to reflect the intuitions of ordinary people. This is often seen in theories of distributive justice; distributive theorists presume that their anticipated intuitions can be equated with people’s reactions to the states of affairs that the theories evaluate in terms of the goodness and/or badness of the states of affairs.Footnote 4 This raises a question as to whether the theories are tested reflectively in light of folk intuitions: The anticipated intuitions may be merely those of the distributive theorists themselves. This may well motivate the skeptics of reflective equilibrium to suspect that the anticipated intuitions are “rigged up” to countenance their proposed theories in a circular manner with respect to full justification (Brandt, 1979, 1990; Hare, 1973; Singer, 1974, 2005).Footnote 5

Second, there have been doubts about the (legitimate) use of thought experiments that purport to elicit intuitions as initial input in ethics and political philosophy. Anscombe’s (1957) criticism of the use of thought experiments is the classic example: Thought experiments treat morally serious issues in such a flippant way as to dismiss the richness of philosophical discussions about normatively significant and sensitive issues involving people’s life and death. The richness in question may have bearings on a key feature of moral principles: They apply to the practical context in which real people confront dilemmatic issues and problems involving various factors concerning just institutions (Goodin, 1982) and moral dilemmas and vicissitudes in real life (O’Neil, 1989). In other words, there may be non-negligible gaps between abstract or purely hypothetical—often modally bizarre—cases (such as Nozick’s utility monster), and actual practical cases. Our intuitions prompted by the former, but not those prompted by the latter, are unreliable as reflective warrants for or against theoretical principles in political philosophy. Although this is not a direct challenge against any appeal to intuitions in reflective equilibrium, it does pose a fundamental problem with the use of the method in question; political philosophers often employ thought experiments including those of a purely hypothetical kind as possible cases. Since their arguments rely on folk intuitions about how to respond to such cases, and since intuitions as initial input are a starting point for reflective equilibrium, proponents of reflective equilibrium must respond to this challenge.

We can defuse the two concerns, however. In response to the first concern, the key point is that the intuitions collected must be folk intuitions, not the intuitions of the political philosophers themselves, in order to avoid the charge of manipulating the intuitions to uphold the principles of justice. In response to the second concern, philosophers have to provide a convincing method for the use of thought experiments in political philosophy. Let us explain this by examining a proposal for the proper use of thought experiments in political and moral philosophy. According to Walsh (2011, pp. 478–480), we can conduct thought experiments properly in light of the distinction between their legitimate and illegitimate uses. The illegitimate use of thought experiments is problematic because it ignores the richness of contexts in which the issues and problems arise, such that thought experiments are naïvely used to show the plausibility of theoretical principles in all logically possible worlds. Many (if not all) bizarre and purely hypothetical cases are meant to accommodate possible worlds far removed from reality (even if nomologically relevant), and it is this accommodation that skeptics question. However, this does not lead us to repudiate any appeal to possible cases. A use of thought experiments is legitimate if it caters to “the contingency of the problems with which applied ethicists characteristically deal” and does not try to “draw conclusions that attempt to accommodate a wide range of merely possible cases rather than the actual case before us” (Walsh, 2011, p. 478; emphasis original). If thought experiments are legitimately used, we may respond to the context-based challenge against the use of thought experiments.

2.3 Survey experiments and model selection

Up to this point, we agree with Walsh’s argument. However, it is not clear how we can legitimately draw on thought experiments in practice. In what follows, we suggest a way to ensure the relevance of appealing to possible cases: Possible cases can be treated in such a way as to satisfy the context-sensitivity of the issues and problems if we conduct survey experiments in which we analyze the results with a proper model selection.

Clearly, this proposal draws on folk intuitions, because the subjects of survey experiments are ordinary people. The survey experiments aim to ensure the sample size required for quantitative analysis and to facilitate the acquisition of a sample representative of the population. Hence, the use of survey experiments can respond to the first challenge against the method of reflective equilibrium. As a first approximation, this proposal also seems promising because people’s intuitions obtained as data through survey experiments may well reflect the contextual interactions of relevant factors and vicissitudes of life. This should help to establish a good start at the first stage of reflective equilibrium, and may well allay the concerns of skeptics about the use of thought experiments.

Nevertheless, the mere use of survey experiments is not sufficient for the legitimate use of possible cases. As a second approximation, we suggest the use of a proper model selection by means of the Akaike Information Criterion (AIC). Before exploring this point, let us see how difficult it is to single out particular cases relevant to an issue (such as abortion) in an ex ante manner. There are two problems at this point. First, apparently relevant cases often have disanalogies to the issue under consideration that are difficult to discern in advance. This renders the (apparently) intuitive fit with theoretical principles worthless. Second, apparently irrelevant cases could be of the type that steers our intuitions in certain directions. We may doubt that the cases at issue are legitimately excluded and thus reach an unconvincing verdict about the proper (un)fit between our intuitions and the proposed theoretical principles. As long as we cling to the method of cases, we must have a criterion for sorting out possible cases in an ex ante manner.

Can we establish such a criterion in an ex ante manner? We doubt it, because we can easily point out illegitimate inclusions and exclusions of possible cases if we carefully look through each particular case. Famous philosophical arguments provide examples of both. Thomson’s (1971) violinist may be seen as an example of illegitimate inclusions: There might be disanalogies between unplugging an individual from the famous unconscious violinist and allowing abortion for pregnant women who were raped (Davis, 1983). Foot’s (original) trolley problem has been questioned as an example of illegitimate exclusions made in order to support the killing-and-letting-die principle: Other variants of the trolley problem cannot be covered by the killing-and-letting-die principle, such as the case where the trolley driver has just died and a passenger must decide whether to turn the trolley around (Thomson, 1976, 1985). To avoid misunderstandings, we do not underestimate the significance of the philosophical discussion over case-based explorations such as Thomson’s violinist and Foot’s trolley problem. Nor do we deny the possibility of establishing an ex ante criterion for sorting out possible cases in a relevant manner. We only claim that there are difficulties in establishing the proper criterion in an ex ante manner, given these famous examples and arguments, and that it may be feasible to have a different manner of dealing with possible cases. (Of course, ours is not the only pertinent way to handle possible cases.)

Our suggestion is as follows: We should use a model selection method for coping with possible cases in thought experiments in an ex post manner. That is, we propose to use AIC-based model selection as a practical method for reconciling intuitions with theoretical principles in a systematic manner, ensuring that the epistemic desiderata, particularly parsimony (simplicity), are honored in the practice of reflective equilibrium. While this method does not directly search for the relevant similarities of possible cases, it leads to the justification of a targeted principle by virtue of the systematic adjustments of intuitions that possible cases evoke; satisfying the epistemic desiderata of generality and (especially) parsimony would guarantee the legitimate use of possible cases, even if they may include irrelevant cases. Obviously, this method reflects the virtue of reflective equilibrium. Let us elaborate on this point in more detail below.

To begin with, let us explain why we suggest the use of AIC. While there are criteria that differ quantitatively from AIC (such as the Bayesian Information Criterion (BIC)), AIC is simply defined and can be seen as generalizable in a perspicuous manner (Forster & Sober, 1994, p. 2). Indeed, AIC is a widely used method for evaluating how well a model fits the obtained data. Roughly, AIC is calculated from the number of independent variables used to construct the model and from the maximum likelihood estimate of the model (i.e., the higher the likelihood of a model with few independent variables yielding the data, the better the model). According to AIC, the best model has the greatest predictive ability measured by estimated likelihood (P(data | model)). AIC aims to achieve the maximum degree of data fit by incorporating a minimal number of independent variables, in keeping with the condition of parsimony as a theoretical virtue for reflective equilibrium practice.
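For reference, the rough description above corresponds to the standard textbook definition of AIC, given here as a sketch rather than as a statement of our exact estimation procedure. The symbols are assumptions introduced for illustration: k denotes the number of estimated parameters of the model (which grows with the number of independent variables), and \hat{L} denotes the maximized value of the likelihood P(data | model).

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\hat{L} = \max_{\theta}\, P(\text{data} \mid \text{model}, \theta)
```

On this definition, a lower AIC value indicates a better model: the term 2k penalizes additional parameters, while the term -2\ln\hat{L} rewards fit to the data.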

We can now state the philosophy underlying model selection as follows: Although multiple models are always maintained, they can be compared and ranked according to specific criteria and based on data. Notably, this is different from the Neyman–Pearson philosophy based on frequentism, which is a theory about which hypothesis should be accepted as true or rejected as false based on existing data. But why is AIC better than the other criteria in our argument?Footnote 6 To see this, let us focus on the comparison of AIC with BIC. While empirical studies often recommend competing models based on both criteria, our reason for choosing AIC over BIC is that the former measures the predictive accuracy of a model based on existing data, without the specific information that certain empirical observations carry (Sober, 2002, 2008; see Otsuka, 2021, p. 55). BIC measures the likelihood (posterior probability) of a model relative to existing data. Importantly, BIC does not necessarily follow the principle of Occam’s razor: the simpler the model, the better. By contrast, AIC recommends a model with higher predictive accuracy for future data, which plausibly favors simplicity. Thus, AIC-based model selection can be used as an ex post way of dealing with possible cases, which incorporates the epistemic virtue of parsimony in practicing reflective equilibrium.

Let us explain this point in more detail. According to AIC, we can comparatively evaluate how well each theoretical principle fits with the data obtained from survey experiments. There are two advantages of using AIC in this way. First, this method takes into account the limited availability of relevant cases that persists in survey experiments. AIC is used to estimate a model’s predictive performance within the confines of the available data. Second, the statistical model selection can be viewed as a reasonable estimate of the maximally relevant set of independent variables that determine the predictive performance of a targeted theoretical principle. Importantly, from a reflective equilibrium perspective, the estimated independent variables coupled with the principle single out the significant features of the principle that pertain to people’s intuitions prompted by possible cases, whether relevant or irrelevant. More concretely, due to the emphasis on parsimony, the principles and parameters will not fit every intuition regarding every case. We can thus hope that intuitions that are misled by problematic cases are effectively not taken into account. Rather, the final model concentrates on a relatively small set of principles and parameters that capture people’s intuitive reactions well overall. In this way, the AIC-based model selection allows us to sidestep significant challenges with an ex ante case selection, dispensing with illegitimate inclusions and exclusions of relevant possible cases. The AIC-based model selection can be seen as a kind of ex post case selection.Footnote 7

We can now say that our proposal serves as the three-stage process of reflective equilibrium in which principles (i.e., models in this context) are adjusted based on intuitions that respond to possible cases in thought experiments. Intuitions as a starting point are input commitments for building or selecting a relevant principle. This is the first stage of reflective equilibrium, and it is carried out using survey experiments. The process of model selection can be seen as achieving the second stage of reflective equilibrium, in which we check whether the principle can systematically account for the intuitions. This is because its epistemic goal is to obtain the best and most parsimonious fit between a model and the data obtained. Since we can grasp the intuitions as the relevant data from survey experiments, a theoretical principle that would pass the AIC test can reasonably be regarded as the best (or at the very least a better) model. More moderately, we can view a model that shows a bad (worse) AIC score as a less plausible model (compared to one which has a better score). For this reason, we can consider this use of AIC as a formal and feasible method to simplify the factors required in the second stage of reflective equilibrium.
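The second stage described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the response counts are hypothetical (they are not data from our experiment), the three candidate models are toy stand-ins for competing principles, and the helper names (`bernoulli_loglik`, `pooled_loglik`, `aic`) are invented here. Each model groups cases that it treats alike and estimates one choice probability per group, so the number of groups is the number of parameters.

```python
import math

def bernoulli_loglik(successes, trials, p):
    """Log-likelihood of the observed choices under choice probability p."""
    eps = 1e-12  # guard against log(0) when p is exactly 0 or 1
    p = min(max(p, eps), 1 - eps)
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

def aic(loglik, k):
    """Standard AIC: 2k - 2 ln L. Lower values are better."""
    return 2 * k - 2 * loglik

# HYPOTHETICAL data: per case, (respondents preferring the equal state, respondents asked).
cases = {"case_a": (70, 100), "case_b": (68, 100), "case_c": (40, 100)}

def pooled_loglik(names):
    """Maximized log-likelihood when all cases in `names` share one probability."""
    s = sum(cases[c][0] for c in names)
    n = sum(cases[c][1] for c in names)
    return bernoulli_loglik(s, n, s / n)  # the MLE is the pooled share

# Toy candidate models: each inner list is a group of cases treated alike.
models = {
    "pooled": [["case_a", "case_b", "case_c"]],
    "two_groups": [["case_a", "case_b"], ["case_c"]],
    "full": [["case_a"], ["case_b"], ["case_c"]],
}

aic_scores = {
    name: aic(sum(pooled_loglik(g) for g in groups), k=len(groups))
    for name, groups in models.items()
}
best = min(aic_scores, key=aic_scores.get)
print(aic_scores, best)
```

With these hypothetical counts, the model with a separate parameter for every case has the highest likelihood, but the two-group model obtains the lowest AIC because the small gain in fit does not offset the penalty for the extra parameter. This is the sense in which the selection honors parsimony.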

Note that AIC model selection does not itself cover the third stage of reflective equilibrium: that any possible systematic disparity between our intuitions and the principle is resolved in such a way as to work back and forth between them.Footnote 8 In AIC model selection, a model is selected simply based on its high predictive performance for future data in a parsimonious manner. As shown above, this can be seen as the second stage of reflective equilibrium, in which the principle of justice is selected that would systematically account for the intuitions that are invoked by possible cases in thought experiments. However, this does not itself involve any change of some existing intuitive judgments that would be an expected result of going back and forth between principles and intuitions.Footnote 9 In our argument, what would be involved in this third stage of reflective equilibrium? Our answer is that the third stage is outside the statistical analysis in our argument: Any modification of certain intuitions should be done separately in light of the results of the survey experiment and the AIC model selection. This separate process can be better illustrated through the use of a test case, which is one of the tasks in the upcoming sections. The results of our test-case analysis will be presented in Sect. 4.3.

3 Testing the theories of distributive justice

In this section and the next, we highlight the practicality and significance of the proposed practice of reflective equilibrium by referring to the debates over theories of distributive justice, in particular the debate between prioritarianism and sufficientarianism. Using a survey experiment and a model selection method, we show that the sufficientarian principle cannot be evaluated as a better theoretical principle than the prioritarian principle. This will serve to illustrate how the proposed method can be exercised as reflective equilibrium in practice. First, our method allows us to examine whether folk intuitions indeed fit well with sufficientarianism, such that many political philosophers would intuitively support the indisputability of a minimal threshold. Second, we may then consider the modification of some intuitions in light of the results of the statistical analysis, which is an important part of working back and forth between principles and intuitions (i.e., the third stage of reflective equilibrium).

3.1 Egalitarianism, prioritarianism, and sufficientarianism

Let us first introduce popular theories of distributive justice. Egalitarianism is certainly the best-known of these theories. Although egalitarianism has variants in terms of people getting the same, being treated the same, or being treated as equals (Arneson, 2013), egalitarianism as defined here is simply concerned with people being equally well-off. According to egalitarianism, it is intrinsically bad if some people are worse off than others.Footnote 10 Many (if not all) political philosophers argue that endorsing the badness of distributive inequalities simpliciter is unreasonable because it is objectionable to claim the intrinsic value of eliminating distributive inequalities by radically reducing the overall welfare of all people (Holtug, 1998; Parfit, 2000; Temkin, 2000). The so-called “leveling down objection” encourages many political philosophers to suggest two different theories: prioritarianism and sufficientarianism. Prioritarians assert that gains in well-being are more valuable, the worse off the person would otherwise be (Arneson, 2022; Hirose, 2015, chap. 4; Holtug, 2007, 2010, chap. 8; Parfit, 2000, pp. 101–106). According to sufficientarianism, whether a person has enough of some goods matters rather than being concerned with inequalities as such (Frankfurt, 1987; Gosseries, 2011; Hirose, 2015, chap. 5; Shields, 2020). These two theories of distributive justice have been seen as attractive alternatives to egalitarianism.

The appeal of the theories has been strengthened by respective refinements. In particular, sufficientarianism has been elaborated in an alluring manner. A refined version of sufficientarianism incorporates two “enough” thresholds, the minimal and maximal thresholds. The minimal threshold is the point where basic needs are met, whereas people above the maximal threshold have good (content) lives in terms of healthy and cultured living (Huseby, 2010, 2017). According to refined sufficientarianism, welfare shortfalls below the minimal threshold are simply (non-gradually) morally bad; welfare shortfalls between the two thresholds become (gradually) worse as their number and sizes increase (Huseby, 2017, p. 74). Refined sufficientarianism powerfully embraces the intuitive aspects of egalitarianism and prioritarianism. It endorses the complex evaluations of inequalities, in that it is not concerned with the badness of distributive inequalities simpliciter, but rather with people’s worse-off positions below the threshold(s). As such, the refined sufficientarian approach to distributive justice has gained popularity in political philosophy.Footnote 11

3.2 The method of cases and reflective equilibrium in practice: the example of the theories of distributive justice

Our interest lies in how sufficientarians attempt to compete with the prioritarian principle. As many political philosophers do, sufficientarians have appealed to people’s intuitions, but exactly how? There are two ways of appealing to intuitions. First, sufficientarians can point to the popularity of the maximizing principle with an income floor rather than Rawls’s difference principle—a prioritarian principle in the not-strict senseFootnote 12—among ordinary people. The popularity of moral principles restricted with a sufficientarian threshold was shown first by Frohlich and Oppenheimer’s (1992, pp. 58–60) laboratory experiments and later replicated in other studies (Bruner, 2018; Bruner & Lindauer, 2020; Lissowski et al., 1991; cf. Inoue et al., 2021). However, this appeal to people’s intuitions is not (explicitly) employed by political philosophers, because they have recourse to the method of cases by illustrating the (im)plausibility of competing theoretical principles. This is the second way of appealing to people’s intuitions.

To illustrate: Sufficientarians raise the so-called “Beverly Hills case” in order to show the plausibility of their sufficientarian proposal (and the untenability of prioritarianism). The Beverly Hills case is as follows: Suppose we must choose between benefiting the rich and benefiting the super-rich. While many ordinary people would intuitively not prioritize the rich in this case, the prioritarian position defies that intuition, espousing a policy of always benefitting the rich rather than the super-rich simply because the rich are worse off than the super-rich (Benbaji, 2006; Crisp, 2003). By contrast, as mentioned above, refined sufficientarianism appeals to people’s intuition in that the different thresholds are germane to the differential degree of moral urgency assigned to the states of affairs involving the thresholds. On this basis, Huseby (2010, p. 183) contends that sufficientarianism (with its use of a maximal threshold above which people should have content lives) can respond to Holtug’s (2007, pp. 149–150) case—which can be dubbed “the Left-Behind case”—against simple sufficientarianism, i.e., sufficientarianism with only one threshold: Only one individual at the threshold level is left behind in the boom of the world economy where everyone else enjoys much better-off positions than hers. Huseby (2010, p. 183) believes that while “[t]he relative deprivation of the person left behind in Holtug’s scenario, makes it very hard for her to be content in an environment in which she is considerably worse off than all others”, the maximal threshold of sufficientarianism would license her claim for “a level of welfare at which she would be content”. As such, sufficientarians use the method of cases against the prioritarian principle by appealing to people’s intuitive responses to the states that the theoretical principles (do not) endorse.

However, as argued in the previous section, it seems reasonable to ask whether ordinary people would truly find no plausibility in the proposed theoretical principles in possible cases. Nor can we ensure that the cases in question involve neither illegitimate inclusions nor illegitimate exclusions; there might be some disanalogies or a result of snubbing relevant cases. These concerns can reasonably be defused if we adopt the method of reflective equilibrium in practice. More specifically, we can conduct a survey experiment using apparently relevant cases (the first stage of reflective equilibrium) and then analyze people’s intuitive responses to the possible cases using AIC (the second stage of reflective equilibrium). We can then compare theoretical principles—here the prioritarian principle and the refined sufficientarian principle—to see which principle better fits the data obtained from a survey experiment; AIC ex post indicates which of the two principles better fits with people’s intuitions. In other words, we do not need to select possible cases before investigating the intuitive judgments. The third stage of reflective equilibrium involves attempts to resolve any inconsistencies between the selected principle and certain existing intuitions. In this context, the modification of some intuitions supporting, for example, sufficientarianism may be considered in light of the results of the survey experiment and the AIC model selection. While, here again, any such modifications must be conducted separately from the statistical analysis, they nonetheless play an important role in our approach, and distinguish it from the method of cases.

Let us further note the relevance of reflective equilibrium in practice to the debate over the two theories of distributive justice. In light of people’s intuitions about cases of an apparently relevant sort, we will require a sophisticated analysis of the distributive theories. As a test case to clarify the significance of our proposal of reflective equilibrium in practice, we will attempt to compare the refined sufficientarian principle that incorporates the two thresholds with the prioritarian principle. Specifically, we will investigate how sensitive ordinary people are to distributive inequalities in the presence of minimal and maximal thresholds. We can thereby evaluate the states of affairs involving the different types of inequalities and worse-off positions below or above the threshold(s). From the viewpoint of reflective equilibrium in practice, it is important to examine (i) whether the state of equality is more supportable than unequal states, (ii) whether ordinary people tend to prioritize the worse off in apparently relevant cases (including the Beverly Hills case and the Left-Behind case), and (iii) whether the multiple thresholds bear on ordinary judgments in such cases. We can then compare prioritarianism with refined sufficientarianism in terms of whether they each systematically fit people’s intuitive judgments. Finally, we can consider modifying some intuitions related to the principles when one of the principles (models) is selected on the basis of AIC.

4 Experiment

The aim of our experiment is fourfold. First, we want to find out whether ordinary people are generally egalitarian or not. Second, we investigate whether ordinary people react significantly to distributive inequalities in a variety of apparently relevant cases (including the Beverly Hills case and the Left-Behind case). Third, we examine whether the prioritarian principle or the refined sufficientarian principle fits better with systematically captured intuitions, presented with the two thresholds, based on the model selection method (Footnote 13). Fourth, we will consider modifying or eliminating certain intuitions in light of the selected principle. For this purpose, we conduct a survey experiment that focuses on how ordinary individuals react to distributive cases of an apparently relevant kind.

4.1 Method

4.1.1 Participants

A private research company (Rakuten Insight, Inc.) was asked to recruit respondents for our online experiment. These respondents had voluntarily applied to the research company to participate in experiments from their homes by answering questions via the Internet. The instructions were presented on their computers. After the experiment, the company randomly chose some of the respondents and paid them a fee of 500 yen (approximately US$5). The experiment took place from March 23 to 29, 2022, with 2,707 subjects (1,352 females, 1,344 males, and 11 NA). The age distribution was 12 respondents in their teens, 397 in their 20s, 398 in their 30s, 520 in their 40s, 471 in their 50s, 450 in their 60s, 455 in their 70s, and 4 in their 80s. Our sample roughly corresponds to the age and gender distribution of the actual population in Japan.

4.1.2 Design and materials

We constructed ten cases based on a between-subject design in which each respondent was randomly assigned to one of the ten cases. Each case was presented as a figure showing two distributive states of affairs that the respondents were requested to evaluate comparatively (Footnote 14). Four features were common to all cases. First, each state had two persons, x and y. Second, the bar heights indicated the levels of each person’s income. Third, the first distributive state had an unequal distribution of income (person x was better off than person y), whereas the two persons enjoyed equal income in the second state. In both states, the sum of income was the same. Fourth, dashed lines were drawn to represent the two thresholds (maximal threshold: 4 million yen per year; minimal threshold: 2.5 million yen per year) in every case. An income of 4 million yen was chosen as the maximal threshold because this is the average annual income in Japan. This can reasonably be seen as a threshold above which people can lead healthy and cultured lives. An income of 2.5 million yen was chosen as the minimal threshold because this is the approximate income qualifying for public assistance in Japan. This can reasonably be regarded as a threshold at which the basic needs of people are met. The difference among the ten cases thus boiled down to whether each income was above or below the minimal and/or maximal thresholds.

The ten cases cover all potentially relevant differences. In Case 3, for example, the first (unequal) state (Society A) has person x, whose income is between the minimal and maximal thresholds, and person y, whose income is below the minimal threshold, while persons x and y have incomes between the minimal and maximal thresholds in the second (equal) state (Society B). The following figure was used in Case 3 (Fig. 1).

Fig. 1 Unequal society A vs equal society B in Case 3

The ten cases can be described in terms of the two thresholds, such that:

Case 1: Society A (Lx > Ly), Society B (Lx = Ly).

Case 2: Society A (Mx > Ly), Society B (Lx = Ly).

Case 3: Society A (Mx > Ly), Society B (Mx = My).

Case 4: Society A (Mx > My), Society B (Mx = My).

Case 5: Society A (Ux > Ly), Society B (Lx = Ly).

Case 6: Society A (Ux > Ly), Society B (Mx = My).

Case 7: Society A (Ux > My), Society B (Mx = My).

Case 8 (the Left-Behind case): Society A (Ux > Ly), Society B (Ux = Uy).

Case 9: Society A (Ux > My), Society B (Ux = Uy).

Case 10 (the Beverly Hills case): Society A (Ux > Uy), Society B (Ux = Uy).

Note: L means a level of income below the minimal threshold. M means a level of income between the minimal and maximal thresholds. U means a level of income above the maximal threshold.
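The claim that the ten cases cover all potentially relevant differences can be checked with a small enumeration. The sketch below is an illustration and not part of the original analysis. It relies on one inference from the design: because total income is the same in both societies and Society B is equal, each person's income in B is the mean of the two incomes in A, so its band must lie between the bands of y and x. The names `BANDS`, `CASES`, and `feasible_patterns` are hypothetical.

```python
from itertools import product

# Income bands from the note above, ordered from lowest to highest.
BANDS = ["L", "M", "U"]
RANK = {band: i for i, band in enumerate(BANDS)}

# The ten cases as listed above:
# (band of x in Society A, band of y in Society A, shared band in Society B).
CASES = {
    1: ("L", "L", "L"),
    2: ("M", "L", "L"),
    3: ("M", "L", "M"),
    4: ("M", "M", "M"),
    5: ("U", "L", "L"),
    6: ("U", "L", "M"),
    7: ("U", "M", "M"),
    8: ("U", "L", "U"),   # the Left-Behind case
    9: ("U", "M", "U"),
    10: ("U", "U", "U"),  # the Beverly Hills case
}


def feasible_patterns():
    """Enumerate band patterns allowed by the assumed constraint.

    Assumption: x is at least as well off as y in Society A, and the
    equal income in Society B (the mean of the two) falls in a band
    between y's band and x's band, inclusive.
    """
    patterns = set()
    for x, y, b in product(BANDS, repeat=3):
        if RANK[y] <= RANK[b] <= RANK[x]:
            patterns.add((x, y, b))
    return patterns


patterns = feasible_patterns()
print(len(patterns), patterns == set(CASES.values()))
```

Under this reading, the listed cases coincide with every band pattern that the common design features permit.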

4.1.3 Procedure

Respondents completed the questions online, in their own time. Before beginning, they read a consent form and were assured of the anonymity of their data. After granting consent, they were presented with a written scenario and a figure (Case 7 is shown below as an example) and were asked to respond to a question:

The following figure shows two societies where two persons, x and y, can gain different levels of income. The blue bars indicate the levels of income (unit: yen) that x and y will get when society A is realized, whereas the black bars indicate the levels of income that x and y will get when society B is realized.

Moreover, the green dashed line represents enough income for one individual to lead a healthy and cultured life. The red dashed line represents enough income for one individual to lead a barely healthy and cultured life.

In this case, which set of incomes do you prefer, the blue bars or the black bars, and how strong is your preference? Please choose the option closest to your view.

[Figure a]

(1) Blue is strongly preferable.

(2) Blue is preferable.

(3) Blue is slightly preferable.

(4) Both are on par.

(5) Black is slightly preferable.

(6) Black is preferable.

(7) Black is strongly preferable.

This question is intended to capture folk intuitive reactions to distributive inequalities in the presence of the two thresholds. Their reactions to the presence of persons below and above each threshold will also be revealed through their answers to this question. We can reasonably expect that the results will elucidate how people’s intuitions are manifested in the face of different states.

4.2 Results

As Fig. 2 shows, respondents showed a general tendency to prefer Society B (an equal society) to Society A (an unequal society). However, Society A was preferred in some cases. Interestingly, there was a difference in people’s preferences between the Left-Behind case and the Beverly Hills case: In the former, more people preferred Society B over Society A, whereas Society A was more often preferred to Society B in the latter. That is, people preferred a state in which no one was left behind, but found inequalities above the threshold tolerable. This seems to illustrate that (1) simple egalitarianism may not suit people’s intuitions in the apparently relevant cases, and that (2) we cannot claim that people are either prioritarian or refined sufficientarian in light of the descriptive statistics based on the two cases; neither the preferences of prioritarians nor those of refined sufficientarians consistently matched the preferences of the respondents in these two cases. This, we believe, supports the use of a model selection method to examine which of these principles befits intuitions systematically captured in the apparently relevant cases.

Fig. 2 People’s preferences in regard to the two distributive states in ten cases

Note: EQ is the number and ratio of respondents who chose answers (5), (6), or (7). IND is the number and ratio of respondents who chose answer (4). UNEQ is the number and ratio of respondents who chose answers (1), (2), or (3).
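The grouping used in the note to Fig. 2 can be written out as a short function. This is a minimal sketch, not the authors' code: the names `collapse` and `summarize` are hypothetical, and the example responses are made up for illustration and are not data from the experiment.

```python
from collections import Counter


def collapse(response: int) -> str:
    """Map a 7-point answer to the three groups used in Fig. 2."""
    if response not in range(1, 8):
        raise ValueError(f"response must be 1..7, got {response!r}")
    if response <= 3:
        return "UNEQ"  # answers (1)-(3): unequal Society A (blue) preferred
    if response == 4:
        return "IND"   # answer (4): both are on par
    return "EQ"        # answers (5)-(7): equal Society B (black) preferred


def summarize(responses):
    """Return {group: (count, ratio)} for one case."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(collapse(r) for r in responses)
    total = len(responses)
    return {g: (counts[g], counts[g] / total) for g in ("EQ", "IND", "UNEQ")}


# Hypothetical answers for a single case (illustration only).
example = [1, 4, 5, 6, 7, 7, 3, 4]
print(summarize(example))
```

Collapsing the scale in this way discards the strength of preference, which is why the descriptive counts are only a first look before the model-based comparison.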

Let us apply the model selection method for the comparative evaluation of prioritarian and sufficientarian principles (statistical models) in terms of the extent to which they fit well with people’s intuitions in the apparently relevant cases. First, we suggest two models, P1 and S1, both of which contain all independent variables and control variables.

P1 consists of the three important independent variables (for details, see the note in Table 1): poorLevel (the variable reflecting the income level of the poor person y in Society A), richLevel (the variable reflecting the income level of the rich person x in Society A), and middleLevel (the variable reflecting the income level of persons x and y in Society B). We also control for participants’ demographic characteristics in the model to ensure accurate and unbiased estimation of the important independent variables (see the note in Table 1). The negative coefficient of poorLevel (Coef_poor = -0.33; p < 0.001) indicates that if the income level of the poor person in Society A increased from below the minimal threshold to above the maximal threshold, respondents would very likely change their preference from Society B to Society A. The positive coefficient of middleLevel (Coef_middle = 0.13; p = 0.031) means that if the income level of the persons in Society B increased from below the minimal threshold to above the maximal threshold, respondents would very likely change their preference from Society A to Society B. Comparing the magnitudes of these coefficients, the level of the poor person has more impact than the level of the persons with average income in Society B. As we see it, P1 approximately represents the prioritarian principle: It echoes intuitions that shift in an egalitarian direction especially when we attend to the level of the poor, and also (somewhat more weakly) when we attend to the level of people in equal Society B.

Table 1 Coefficients of prioritarian model 1 (P1)

As Table 2 shows, S1 is composed of four important variables. The first two are DifMiserableLine (the dummy variable coded 1 if a distributive inequality between the rich person x and the poor person y existed across the minimal threshold in Society A, and 0 otherwise) and DifSufficientLine (the dummy variable coded 1 if a distributive inequality between the rich person x and the poor person y existed across the maximal threshold in Society A, and 0 otherwise). The other two are numDemiserablized (the variable representing the net number of persons who would move across the minimal threshold if unequal Society A were changed into equal Society B) and numSufficiencialized (the variable representing the net number of persons who would move across the maximal threshold if unequal Society A were changed into equal Society B). As can be seen, the two thresholds affected the intuitive judgments of respondents. We also control for participants’ demographic characteristics in the model to ensure accurate and unbiased estimation of the important independent variables (see the note in Table 2). Regarding DifMiserableLine, respondents showed a marginal tendency to prefer equal Society B to unequal Society A if the distributive inequality existed across the minimal threshold (coefficient odds ratio: Coef_min = 0.12; p = 0.096). The threshold sensitivity was confirmed distinctly regarding DifSufficientLine (coefficient odds ratio: Coef_max = 0.29; p < 0.001). These results show, first, that ordinary people attend to the two thresholds, in that they would prefer egalitarian societies when distributive inequalities hold across the two thresholds; and second, and more importantly, that ordinary people are more sensitive to the maximal threshold than the minimal one (Coef_max = 0.29 > Coef_min = 0.12; p < 0.001 and p = 0.096, respectively) (Footnote 15). Thus, we can tentatively say that the statistical results shown in Table 2 barely support the refined sufficientarian principle.
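To make the four S1 variables concrete, the following sketch derives them from the band coding of a case (L, M, U). It is one possible reading of the verbal definitions above, not the authors' coding, which is given in the note to Table 2. In particular, treating the "net number" as the count of persons at or above a threshold in Society B minus the same count in Society A (so that moves below a threshold count negatively) is an assumption, and the function name `s1_variables` is hypothetical.

```python
def at_or_above_min(band: str) -> bool:
    """M and U lie above the minimal threshold; L lies below it."""
    return band in ("M", "U")


def at_or_above_max(band: str) -> bool:
    """Only U lies above the maximal threshold."""
    return band == "U"


def s1_variables(x_a: str, y_a: str, b: str) -> dict:
    """Derive the S1 variables for one case.

    x_a, y_a: bands of the rich person x and the poor person y in Society A.
    b: shared band of both persons in Society B.
    Assumption: net movement = (persons above threshold in B)
                             - (persons above threshold in A).
    """
    return {
        "DifMiserableLine": int(at_or_above_min(x_a) and not at_or_above_min(y_a)),
        "DifSufficientLine": int(at_or_above_max(x_a) and not at_or_above_max(y_a)),
        "numDemiserablized": 2 * int(at_or_above_min(b))
        - (int(at_or_above_min(x_a)) + int(at_or_above_min(y_a))),
        "numSufficiencialized": 2 * int(at_or_above_max(b))
        - (int(at_or_above_max(x_a)) + int(at_or_above_max(y_a))),
    }


# Case 8 (the Left-Behind case): Society A (Ux > Ly), Society B (Ux = Uy).
print(s1_variables("U", "L", "U"))
# Case 10 (the Beverly Hills case): Society A (Ux > Uy), Society B (Ux = Uy).
print(s1_variables("U", "U", "U"))
```

On this reading, the Beverly Hills case scores zero on all four variables, since no inequality crosses a threshold and no one moves across one, whereas the Left-Behind case scores one on each.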

Table 2 Coefficients of sufficientarian model 1 (S1)

Next, following the standard procedure of model selection, we reconstruct P2 and S2 by eliminating insignificant variables. Here, P1 and P2 represent the prioritarian principle and S1 and S2 the sufficientarian principle. Let us first examine the two prioritarian models.

In tandem with the usual model selection process, we build P2 because the coefficient of richLevel is not significant in P1, which would, very likely, indicate the irrelevance of that variable to a prioritarian statistical model. This selection also seems reasonable in the prioritarian theory because, according to prioritarianism, the worse off people are, the more morally important it is to benefit them. With these models in hand, although the AIC of P2 is almost the same as that of P1 shown in Table 3, we can regard P2 as a relevant prioritarian model accommodating the relevant independent variables.

Table 3 Coefficients of prioritarian models 1 and 2 (P1 and P2)

Now let us turn to the two sufficientarian models, S1 and S2.

Under the usual model selection process, S2 is built based only on the respondents’ reaction to the presence or absence of the distributive inequality across the two thresholds. This is because numDemiserablized and numSufficiencialized are not at all significant in S1. Under the sufficientarian theory, any transitional change from Society A to Society B is axiologically irrelevant: We ought to evaluate each state independently and compare them. As Table 4 shows, since the AIC of S2 is slightly smaller than that of S1, we can regard S2 as a more relevant sufficientarian model in terms of accommodating the relevant independent variables.

Table 4 Coefficients of sufficientarian models 1 and 2 (S1 and S2)

We are now in a position to evaluate P2 (the prioritarian statistical model) and S2 (the sufficientarian statistical model) in terms of their fit with the observed data that echo intuitions systematically captured in the apparently relevant cases. The AIC of P2 is 9595.364, whereas that of S2 is 9598.407. The smaller the AIC, the better the model fits, and the gap here is 3.043 in favor of P2. This implies that, at the very least, we cannot claim that S2 fits better than P2.
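The comparison above is simple arithmetic on the two reported AIC values, where AIC = 2k - 2 ln(L) for a model with k estimated parameters and maximized likelihood L. The sketch below reproduces the gap and, as a supplementary summary that is not part of the original analysis, converts the two scores into Akaike weights, a common way of expressing relative support among candidate models. The function name `akaike_weights` is hypothetical; only the two AIC values come from the text.

```python
import math


def akaike_weights(aic_by_model: dict) -> dict:
    """Convert AIC scores into normalized relative likelihoods.

    Each model's relative likelihood is exp(-delta / 2), where delta is
    its AIC minus the smallest AIC among the candidates.
    """
    best = min(aic_by_model.values())
    relative = {
        name: math.exp(-(aic - best) / 2.0) for name, aic in aic_by_model.items()
    }
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# AIC values reported in the text.
aic = {"P2": 9595.364, "S2": 9598.407}

delta = aic["S2"] - aic["P2"]  # gap in favor of P2
weights = akaike_weights(aic)

print(f"AIC gap (S2 - P2): {delta:.3f}")
for name, weight in weights.items():
    print(f"{name}: Akaike weight = {weight:.3f}")
```

The weights only rank the two candidates against each other; they say nothing about whether either model fits the data well in absolute terms.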

5 Summary

The results of our experiment suggest that (1) while ordinary people tend to prefer equal societies to unequal societies, we cannot dismiss the tendency to prefer unequal societies in some cases; (2) ordinary people are more sensitive to the maximal threshold than the minimal one; and (3), most importantly, according to the AIC-based model selection method, we can in no way claim that S2 fits better than P2. In light of these results, we can also state: (4) Some changes to, or more specifically the elimination of, people’s intuitive judgments revolving around refined sufficientarianism may be considered as part of the process of going back and forth between principles and intuitions. No doubt these results are important not only because sufficientarianism has enjoyed a wide range of support from philosophers, but also because we cannot evaluate whether the refined sufficientarian principle is more plausible than the prioritarian principle by the method of cases, i.e., by appealing only to the Beverly Hills case and the Left-Behind case (Footnote 16).

6 Discussion

In this paper, we have shown that survey experiments can be used to demonstrate whether theoretical principles are systematically consistent with people’s intuitions prompted by possible cases. In a case study, we have conducted an experiment on competing principles of distributive justice, refined sufficientarianism and prioritarianism. What is unique about this experiment is that its results differ from—indeed, contradict—what refined sufficientarians try to show using the two single cases, the Beverly Hills case and the Left-Behind case: We find that S2 (the sufficientarian statistical model) cannot be said to fit better than P2 (the prioritarian statistical model). In other words, the experiment shows that the systematically captured folk intuitions did not support what some philosophers and (perhaps) ordinary people find plausible in the context of distributive justice. We can thus say that the particular intuitions evoked by the two particular cases should not simply be taken to speak for or against the principles of distributive justice.

This illustrates the importance of going beyond the method of cases and practicing the method of reflective equilibrium through AIC-based model selection in three respects. First, since AIC measures the predictive accuracy of the model based on the existing data, we can use the model selection method to make reasonable judgments about possible cases. By limiting the complexity of the model, we can make the model easier to use to estimate what a society ought to do without much reducing its predictive power. Second, our appeal to AIC can be regarded as the central stage in the process of reflective equilibrium. That is, as the results of our experiment have shown, the model selected by the AIC scores may run counter to some intuitive reactions to individual cases. Since the goal of AIC is to evaluate the plausibility of relevantly stripped-down models, the proposed method systematizes folk intuitions to pave the way for the justification of the theoretical principle. Third, with the result of model selection, we may expect some intuitions (here, those concerning refined sufficientarianism) to be discarded (tentatively). The practice of reflective equilibrium involves moving back and forth in the system of judgments, which often requires changing intuitions as initial input. In our experiment, particular intuitions in the Beverly Hills case and the Left-Behind case are very likely to be revised in light of the better fit of P2. To be sure, this is not a full endorsement of prioritarianism, but the results of our experiment provide an important challenge to our reliance on what refined sufficientarians and (some) ordinary people find plausible. With these findings in hand, our proposal can reasonably be thought of as a method of reflective equilibrium, in such a way as to distinguish reflective equilibrium essentially from the method of cases. In Rawls’s terms, reflective equilibrium is the systemic equilibrium “reached when someone has carefully considered alternative conceptions of justice and the force of various arguments for them,” as distinct from a mere state in which “general beliefs, first principles, and particular judgments are in line” (Rawls, 2001, pp. 30–31).

It may be objected that the proponents of reflective equilibrium, of whom Rawls is representative, are not committed to the idea that intuitions as initial input should be elicited through unrealistic thought experiments (Footnote 17). This seems to contradict our appeal to model selection using survey experiments that involve possible cases of a not-always-common kind. We concede that this is true. But there is a camp of reflective equilibrium theorists who do not ignore the intuitions of ordinary people stimulated by unrealistic thought experiments (De Vries & Van Leeuwen, 2010; Savulescu et al., 2021). We can thus say that many proponents of reflective equilibrium, at least, cannot ignore what our argument shows.

Admittedly, our argument has limitations. First, our experiment is based on the major debate about theories of distributive justice. Therefore, it remains to be shown whether our argument has broader implications for other philosophical debates. Second, while our experiment has a large sample size, the vast majority of experimental works in philosophy use small samples. It is recognized that AIC is justified in a very general framework and, as a result, provides a crude estimator of the expected discrepancy: one that has a potentially high degree of negative bias in small-sample applications (Cavanaugh, 1997). While our proposal tells us that experiments in philosophy should employ a sample that is as representative of the population as possible, it may not be broadly generalizable to common experiments in philosophy (Footnote 18). Third, since AIC is a measure of the relative quality of a statistical model for a given set of data, if all candidate models fit the data poorly, then AIC may be of no use. This is another limitation of generalizing the use of “imprecise” model selection as a method of reflective equilibrium in practice (Footnote 19). In other words, the AIC-based ex post model selection cannot be an “all-purpose tool” in philosophy. However, unless we use it in the wrong way, we believe it is an effective tool.