Abstract
Responding to recent concerns about the reliability of the published literature in psychology and other disciplines, we formed the X-Phi Replicability Project (XRP) to estimate the reproducibility of experimental philosophy (osf.io/dvkpr). Drawing on a representative sample of 40 x-phi studies published between 2003 and 2015, we enlisted 20 research teams across 8 countries to conduct a high-quality replication of each study in order to compare the results to the original published findings. We found that x-phi studies – as represented in our sample – successfully replicated about 70% of the time. We discuss possible reasons for this relatively high replication rate in the field of experimental philosophy and offer suggestions for best research practices going forward.
Change history
10 August 2018
A Correction to this paper has been published: https://doi.org/10.1007/s13164-018-0407-2
22 June 2021
A Correction to this paper has been published: https://doi.org/10.1007/s13164-021-00559-0
Notes
That is, the proportion of published studies that would replicate, versus fail to replicate, if a high-quality replication study were carried out.
In practice, it can be hard to determine whether the ‘sufficiently similar’ criterion has actually been fulfilled by the replication attempt, whether in its methods or in its results (Nakagawa and Parker 2015). It can therefore be challenging to interpret the results of replication studies, no matter which way these results turn out (Collins 1975; Earp and Trafimow 2015; Maxwell et al. 2015). Thus, our findings should be interpreted with care: they should be seen as a starting point for further research, not as a final statement about the existence or non-existence of any individual effect. For instance, we were not able to replicate Machery et al. (2004), but this study has been replicated on several other occasions, including in children (Li et al. 2018; for a review, see Machery, 2017a, chapter 2).
Note that this page is basically a mirror of the “Experimental philosophy” category of the Philpapers database.
There was some initial debate about whether to include papers reporting negative results, that is, results that failed to reject the null hypothesis using NHST. We decided to do so when such results were used as the basis for a substantial claim. The reason for this was that negative results are sometimes treated as findings within experimental philosophy. For example, in experimental epistemology, the observation of negative results has led some to reach the substantive conclusion that practical stakes do not impact knowledge ascriptions (see for example Buckwalter 2010; Feltz and Zarpentine 2010; Rose et al. in press). Accordingly, papers reporting ‘substantive’ negative results were not excluded.
Note, however, that the more ‘demanding’ paper that was originally selected was not discarded from our list, but remained on it in case a research team with the required resources agreed to replicate it.
It should be noted that two other papers were replaced during the replication process. For the year 2006, Malle (2006) was replaced with Nichols (2006), given that the data kindly provided by the author for Study 1 did not completely match the results presented in the original paper. Note that this does not mean that the effect reported in Study 1 of Malle (2006) is not real, as subsequent studies in the same paper (as well as unpublished replications by the same author) found a similar effect. For the same year, Cushman et al. (2006) proved to be too resource-demanding after all and was replaced by Nahmias et al. (2006).
In this respect, our methodology differed from the OSC’s methodology, which instructed replication teams to focus on the papers’ last study.
Ns were computed not from the total N recruited for the whole study but from the number of data points included in the relevant statistical analysis.
For this analysis, studies for which power > 0.99 were counted as power = 0.99.
For studies reporting statistically significant results, we counted studies for which the original effect size was smaller than the replication 95% CI as successful replications on the ground that, given the studies’ original hypotheses, a greater effect size than originally expected constituted even more evidence in favor of these hypotheses. Of course, theoretically, this need not always be the case, for example if a given hypothesis makes precise predictions about the size of an effect. But for the studies we attempted to replicate, a greater effect size did indeed signal greater support for the hypothesis.
As pointed out by a reviewer on this paper, this criterion might even be considered too stringent. This is because, in certain circumstances in which no prediction is made about the size of an effect, a replication for which the 95% CI falls below the original effect size might still be considered as a successful replication, given that there is a significant effect in the predicted direction. Other ways of assessing replication success using effect sizes might include computing whether there is a statistical difference between the original and replication effect size (which would present the disadvantage of rewarding underpowered studies), or considering whether the replication effect size fell beyond the lower bound of the 95% CI of the original effect size (which returns a rate of 28 successful replications out of 34 original studies, i.e. 82.4%). Nevertheless, we decided to err on the side of stringency.
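The criterion discussed in the two notes above can be illustrated with a short sketch. This is not the project's analysis code (the actual analyses were conducted in R); it assumes correlation-type effect sizes and uses a hypothetical `replication_outcome` helper with a 95% CI computed via the Fisher z-transform:

```python
import math

def fisher_ci(r, n, z_crit=1.959963984540054):
    """95% confidence interval for a correlation r (sample size n),
    via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def replication_outcome(r_orig, r_rep, n_rep):
    """Classify a replication of a significant original effect, on the
    stringent criterion described above: success if the original effect
    size lies inside the replication 95% CI, or below it (a larger-than-
    expected effect counts as success); failure if the CI falls entirely
    below the original effect size."""
    lo, hi = fisher_ci(r_rep, n_rep)
    if r_orig < lo:
        return "success (replication effect larger than original)"
    if lo <= r_orig <= hi:
        return "success"
    return "failure"
```

For instance, an original r = .30 against a replication r = .35 with n = 100 counts as a success, while an original r = .60 against a replication r = .20 with n = 100 counts as a failure under this criterion.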
This analysis was done on the basis of Google Scholar’s citation count (as of March 23rd, 2018).
In a previous version of this manuscript, we reported 30 content-based studies and 5 demographic effects. However, helpful commentaries from readers, including Wesley Buckwalter, led us to revise our classification for Nichols (2004).
A low replication rate for demographic-based effects should not be taken as direct evidence for the nonexistence of variations between demographic groups. Indeed, out of 3 demographic-based effects that failed to replicate, one was a null effect, meaning that the failed replication found an effect where there was none in the original study.
Possible reasons for such transparency might be that (i) experimental philosophy is still a smaller academic community where individual researchers are likelier to be well known to each other and thus able and willing to hold each other accountable, and (ii) research resources (such as online survey accounts) used to be shared among researchers in the early days of the field, thus making questionable research practices more difficult to obscure (see Liao 2015).
One more cynical explanation would simply be that experimental philosophers are less well versed in statistics, and that certain questionable research practices are only available to those who have sufficient skills in this area (i.e., the ability to take advantage of highly complex statistical models or approaches to produce ‘findings’ that are of questionable value).
For example, as of November 2017, the Wikipedia page for “Experimental Philosophy” dedicates a large part of its “Criticisms” section to the “Problem of Reproducibility,” arguing that “a parallel with experimental psychology is likely.”
References
Alfano, M. & Loeb, D. 2014. Experimental moral philosophy. In The Stanford Encyclopedia of Philosophy (Fall 2017 Edition), ed. E. N. Zalta. Retrieved from https://plato.stanford.edu/archives/fall2017/entries/experimental-moral/
American Statistical Association. 2016. American Statistical Association statement on statistical significance and p-values. American Statistical Association. Retrieved from http://www.amstat.org/asa/files/pdfs/P-ValueStatement.pdf
Amrhein, V., and S. Greenland. 2017. Remove, rather than redefine, statistical significance. Nature Human Behaviour. https://doi.org/10.1038/s41562-017-0224-0.
Anderson, S.F., K. Kelley, and S.E. Maxwell. 2017. Sample-size planning for more accurate statistical power: A method adjusting sample effect sizes for publication bias and uncertainty. Psychological Science 28 (11): 1547–1562. https://doi.org/10.1177/0956797617723724.
Baker, M. 2016. Is there a reproducibility crisis? Nature 533 (1): 452–454.
Benjamin, D.J., J.O. Berger, M. Johannesson, B.A. Nosek, E.-J. Wagenmakers, R. Berk, et al. 2018. Redefine statistical significance. Nature Human Behaviour 2 (1): 6–10. https://doi.org/10.1038/s41562-017-0189-z.
Boyle, G.J. in press. Proving a negative? Methodological, statistical, and psychometric flaws in Ullmann et al. (2017) PTSD study. Journal of Clinical and Translational Research.
Brandt, M.J., H. IJzerman, A. Dijksterhuis, F.J. Farach, J. Geller, R. Giner-Sorolla, et al. 2014. The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology 50 (Supplement C): 217–224. https://doi.org/10.1016/j.jesp.2013.10.005.
Buckwalter, W. 2010. Knowledge isn’t closed on Saturday: A study in ordinary language. Review of Philosophy and Psychology 1 (3): 395–406. https://doi.org/10.1007/s13164-010-0030-3.
Button, K.S., J.P. Ioannidis, C. Mokrysz, B.A. Nosek, J. Flint, E.S. Robinson, and M.R. Munafò. 2013. Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14 (5): 365–376. https://doi.org/10.1038/nrn3475.
Casler, K., L. Bickel, and E. Hackett. 2013. Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior 29 (6): 2156–2160. https://doi.org/10.1016/j.chb.2013.05.009.
Cesario, J. 2014. Priming, replication, and the hardest science. Perspectives on Psychological Science 9 (1): 40–48. https://doi.org/10.1177/1745691613513470.
Chambers, C., & Munafò, M. 2013. Trust in science would be improved by study pre-registration. The Guardian. Retrieved from http://www.theguardian.com/science/blog/2013/jun/05/trust-in-science-study-pre-registration
Champely, S. 2018. Package ‘pwr’. Retrieved from http://cran.r-project.org/package=pwr
Chang, A.C., and P. Li. 2015. Is economics research replicable? Sixty published papers from thirteen journals say “usually not”, Finance and Economics Discussion Series 2015–083. Washington, DC: Board of Governors of the Federal Reserve System.
Clavien, C., C.J. Tanner, F. Clément, and M. Chapuisat. 2012. Choosy moral punishers. PLoS One 7 (6): e39002. https://doi.org/10.1371/journal.pone.0039002.
Collins, H.M. 1975. The seven sexes: A study in the sociology of a phenomenon, or the replication of experiments in physics. Sociology 9 (2): 205–224. https://doi.org/10.1177/003803857500900202.
Colombo, M., Duev, G., Nuijten, M. B., & Sprenger, J. 2017. Statistical reporting inconsistencies in experimental philosophy. Retrieved from https://osf.io/preprints/socarxiv/z65fv
Cova, F. 2012. Qu’est-ce que la philosophie expérimentale ? In La Philosophie Expérimentale, ed. F. Cova, J. Dutant, E. Machery, J. Knobe, S. Nichols, and E. Nahmias. Paris: Vuibert.
Cova, F. 2016. The folk concept of intentional action: Empirical approaches. In A Companion to Experimental Philosophy, ed. W. Buckwalter and J. Sytsma, 121–141 Wiley-Blackwell.
Cova, F. 2017. What happened to the trolley problem? Journal of Indian Council of Philosophical Research 34 (3): 543–564.
Crandall, C.S., and J.W. Sherman. 2016. On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology 66 (Supplement C): 93–99. https://doi.org/10.1016/j.jesp.2015.10.002.
Cullen, S. 2010. Survey-driven romanticism. Review of Philosophy and Psychology 1 (2): 275–296. https://doi.org/10.1007/s13164-009-0016-1.
Cumming, G. 2013. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
Cushman, F., Young, L., & Hauser, M. 2006. The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm. Psychological Science 17 (12): 1082–1089.
De Villiers, J., R.J. Stainton, and P. Szatmari. 2007. Pragmatic abilities in autism spectrum disorder: A case study in philosophy and the empirical. Midwest Studies in Philosophy 31 (1): 292–317. https://doi.org/10.1111/j.1475-4975.2007.00151.x.
Del Re, A. C. 2015. Package “compute.es”. Available from https://cran.r-project.org/web/packages/compute.es/compute.es.pdf Accessed 08 Apr 2018.
Doyen, S., O. Klein, D.J. Simons, and A. Cleeremans. 2014. On the other side of the mirror: Priming in cognitive and social psychology. Social Cognition 32 (Supplement): 12–32. https://doi.org/10.1521/soco.2014.32.supp.12.
Dunaway, B., A. Edmonds, and D. Manley. 2013. The folk probably do think what you think they think. Australasian Journal of Philosophy 91 (3): 421–441.
Earp, B.D. 2017. The need for reporting negative results – a 90 year update. Journal of Clinical and Translational Research 3 (S2): 1–4. https://doi.org/10.18053/jctres.03.2017S2.001.
Earp, B.D. in press. Falsification: How does it relate to reproducibility? In Key concepts in research methods, ed. J.-F. Morin, C. Olsson, and E.O. Atikcan. Abingdon: Routledge.
Earp, B.D., and D. Trafimow. 2015. Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology 6 (621): 1–11. https://doi.org/10.3389/fpsyg.2015.00621.
Earp, B.D., and D. Wilkinson. 2017. The publication symmetry test: a simple editorial heuristic to combat publication bias. Journal of Clinical and Translational Research 3 (S2): 5–7. https://doi.org/10.18053/jctres.03.2017S2.002.
Feltz, A., and F. Cova. 2014. Moral responsibility and free will: A meta-analysis. Consciousness and Cognition 30: 234–246. https://doi.org/10.1016/j.concog.2014.08.012.
Feltz, A., and C. Zarpentine. 2010. Do you know more when it matters less? Philosophical Psychology 23 (5): 683–706. https://doi.org/10.1080/09515089.2010.514572.
Fiedler, K., and N. Schwarz. 2016. Questionable research practices revisited. Social Psychological and Personality Science 7 (1): 45–52. https://doi.org/10.1177/1948550615612150.
Findley, M.G., N.M. Jensen, E.J. Malesky, and T.B. Pepinsky. 2016. Can results-free review reduce publication bias? The results and implications of a pilot study. Comparative Political Studies 49 (13): 1667–1703. https://doi.org/10.1177/0010414016655539.
Fraley, R.C., and S. Vazire. 2014. The N-pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PLoS One 9 (10): e109019. https://doi.org/10.1371/journal.pone.0109019.
Franco, A., N. Malhotra, and G. Simonovits. 2014. Publication bias in the social sciences: Unlocking the file drawer. Science 345 (6203): 1502–1505. https://doi.org/10.1126/science.1255484.
Gilbert, D.T., G. King, S. Pettigrew, and T.D. Wilson. 2016. Comment on “estimating the reproducibility of psychological science”. Science 351 (6277): 1037–1037. https://doi.org/10.1126/science.aad7243.
Greene, J.D., R.B. Sommerville, L.E. Nystrom, J.M. Darley, and J.D. Cohen. 2001. An fMRI investigation of emotional engagement in moral judgment. Science 293 (5537): 2105–2108. https://doi.org/10.1126/science.1062872.
Greene, J. D., Morelli, S.A., Lowenberg, K., Nystrom, L.E., & Cohen, J.D. 2008. Cognitive load selectively interferes with utilitarian moral judgment. Cognition 107 (3): 1144–1154.
Grens, K. 2014. The rules of replication. Retrieved November 8, 2017, from http://www.the-scientist.com/?articles.view/articleNo/41265/title/The-Rules-of-Replication/
Heine, S.J., D.R. Lehman, K. Peng, and J. Greenholtz. 2002. What's wrong with cross-cultural comparisons of subjective Likert scales? The reference-group effect. Journal of Personality and Social Psychology 82 (6): 903–918. https://doi.org/10.1037//0022-3514.82.6.903.
Hendrick, C. 1990. Replications, strict replications, and conceptual replications: Are they important? Journal of Social Behavior and Personality 5 (4): 41–49.
Hitchcock, C., and J. Knobe. 2009. Cause and norm. The Journal of Philosophy 106 (11): 587–612.
Ioannidis, J.P.A. 2005. Why most published research findings are false. PLoS Medicine 2 (8): e124. https://doi.org/10.1371/journal.pmed.0020124.
John, L.K., G. Loewenstein, and D. Prelec. 2012. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science 23 (5): 524–532. https://doi.org/10.1177/0956797611430953.
Knobe, J. 2016. Experimental philosophy is cognitive science. In A companion to experimental philosophy, ed. J. Sytsma and W. Buckwalter, 37–52. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118661666.ch3.
Knobe, J. 2003a. Intentional action and side effects in ordinary language. Analysis 63 (279): 190–94.
Knobe, J. 2003b. Intentional action in folk psychology: An experimental investigation. Philosophical Psychology 16 (2): 309–324.
Knobe, J., & Burra, A. 2006. The folk concepts of intention and intentional action: A cross-cultural study. Journal of Cognition and Culture 6 (1): 113–132.
Knobe, J. 2007. Experimental Philosophy. Philosophy Compass 2 (1): 81–92.
Knobe, J., and S. Nichols. 2008. Experimental philosophy. Oxford University Press.
Knobe, J., W. Buckwalter, S. Nichols, P. Robbins, H. Sarkissian, and T. Sommers. 2012. Experimental philosophy. Annual Review of Psychology 63 (1): 81–99. https://doi.org/10.1146/annurev-psych-120710-100350.
Lakens, D. 2013. Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology 4: 863.
Lakens, D., F.G. Adolfi, C. Albers, F. Anvari, M.A.J. Apps, S.E. Argamon, et al. 2017. Justify your alpha: a response to “Redefine statistical significance”. PsyArXiv. https://doi.org/10.17605/OSF.IO/9S3Y6.
Lam, B. 2010. Are Cantonese-speakers really descriptivists? Revisiting cross-cultural semantics. Cognition 115 (2): 320–329.
Lash, T.L., and J.P. Vandenbroucke. 2012. Should preregistration of epidemiologic study protocols become compulsory? Reflections and a counterproposal. Epidemiology 23 (2): 184–188. https://doi.org/10.1097/EDE.0b013e318245c05b.
Li, J., L. Liu, E. Chalmers, and J. Snedeker. 2018. What is in a name?: The development of cross-cultural differences in referential intuitions. Cognition 171: 108–111. https://doi.org/10.1016/j.cognition.2017.10.022.
Liao, S. 2015. The state of reproducibility in experimental philosophy Retrieved from http://philosophycommons.typepad.com/xphi/2015/06/the-state-of-reproducibility-in-experimental-philosophy.html
Locascio, J. 2017. Results blind science publishing. Basic and Applied Social Psychology 39 (5): 239–246. https://doi.org/10.1080/01973533.2017.1336093.
Machery, E., Mallon, R., Nichols, S., & Stich, S. P. 2004. Semantics, cross-cultural style. Cognition 92 (3): B1–B12.
Machery, E. 2017a. Philosophy within its proper bounds. Oxford: Oxford University Press.
Machery, E. 2017b. What is a replication? Unpublished manuscript.
Makel, M.C., and J.A. Plucker. 2014. Facts are more important than novelty: Replication in the education sciences. Educational Researcher 43 (6): 304–316.
Malle, B.F. 2006. Intentionality, morality, and their relationship in human judgment. Journal of Cognition and Culture 6 (1): 87–112.
Maxwell, S.E., M.Y. Lau, and G.S. Howard. 2015. Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? The American Psychologist 70 (6): 487–498. https://doi.org/10.1037/a0039400.
McShane, B.B., D. Gal, A. Gelman, C. Robert, and J.L. Tackett. 2017. Abandon statistical significance. arXiv preprint arXiv:1709.07588.
Munafò, M.R., B.A. Nosek, D.V.M. Bishop, K.S. Button, C.D. Chambers, N.P. du Sert, et al. 2017. A manifesto for reproducible science. Nature Human Behaviour 1 (21): 1–9. https://doi.org/10.1038/s41562-016-0021.
Murtaugh, P.A. 2014. In defense of p-values. Ecology 95 (3): 611–617. https://doi.org/10.1890/13-0590.1.
Nadelhoffer, T., & Feltz, A. 2008. The actor–observer bias and moral intuitions: adding fuel to Sinnott-Armstrong’s fire. Neuroethics 1 (2): 133–144.
Nadelhoffer, T., Kvaran, T., & Nahmias, E. 2009. Temperament and intuition: A commentary on Feltz and Cokely. Consciousness and Cognition 18 (1): 351–355.
Nakagawa, S., and T.H. Parker. 2015. Replicating research in ecology and evolution: Feasibility, incentives, and the cost-benefit conundrum. BMC Biology 13 (88): 1–6. https://doi.org/10.1186/s12915-015-0196-3.
Nahmias, E., S.G. Morris, T. Nadelhoffer, and J. Turner. 2006. Is incompatibilism intuitive? Philosophy and Phenomenological Research 73 (1): 28–53.
Nichols, S. 2004. After objectivity: An empirical study of moral judgment. Philosophical Psychology 17 (1): 3–26.
Nichols, S. 2006. Folk intuitions on free will. Journal of Cognition and Culture 6 (1): 57–86.
Nichols, S., & Knobe, J. 2007. Moral responsibility and determinism: The cognitive science of folk intuitions. Nous 41 (4): 663–685.
Nosek, B.A., and T.M. Errington. 2017. Reproducibility in cancer biology: Making sense of replications. eLife 6: e23383. https://doi.org/10.7554/eLife.23383.
Nosek, B.A., C.R. Ebersole, A.C. DeHaven, and D.T. Mellor. in press. The preregistration revolution. Proceedings of the National Academy of Sciences.
O’Neill, E., and E. Machery. 2014. Experimental philosophy: What is it good for? In Current controversies in experimental philosophy, ed. E. Machery and E. O’Neill. New York: Routledge.
Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349 (6251): aac4716. https://doi.org/10.1126/science.aac4716.
Reuter, K. 2011. Distinguishing the appearance from the reality of pain. Journal of Consciousness Studies 18 (9-10): 94–109.
Rose, D., and D. Danks. 2013. In defense of a broad conception of experimental philosophy. Metaphilosophy 44 (4): 512–532. https://doi.org/10.1111/meta.12045.
Rose, D., E. Machery, S. Stich, M. Alai, A. Angelucci, R. Berniūnas, et al. in press. Nothing at stake in knowledge. Noûs.
Rosenthal, R. 1979. The file drawer problem and tolerance for null results. Psychological Bulletin 86 (3): 638–641. https://doi.org/10.1037/0033-2909.86.3.638.
Schmidt, S. 2009. Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology 13 (2): 90–100. https://doi.org/10.1037/a0015108.
Scott, S. 2013. Pre-registration would put science in chains. Retrieved July 29, 2017, from https://www.timeshighereducation.com/comment/opinion/pre-registration-would-put-science-in-chains/2005954.article
Simmons, J.P., L.D. Nelson, and U. Simonsohn. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22 (11): 1359–1366. https://doi.org/10.1177/0956797611417632.
Simonsohn, U., L.D. Nelson, and J.P. Simmons. 2014. P-curve: A key to the file-drawer. Journal of Experimental Psychology: General 143 (2): 534.
Sprouse, J., & Almeida, D. 2017. Setting the empirical record straight: Acceptability judgments appear to be reliable, robust, and replicable. Behavioral and Brain Sciences 40: e311.
Stroebe, W., and F. Strack. 2014. The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science 9 (1): 59–71.
Trafimow, D., and B.D. Earp. 2017. Null hypothesis significance testing and type I error: The domain problem. New Ideas in Psychology 45: 19–27.
Weinberg, J.M., S. Nichols, and S. Stich. 2001. Normativity and epistemic intuitions. Philosophical Topics 29 (1/2): 429–460.
Woolfolk, R.L. 2013. Experimental philosophy: A methodological critique. Metaphilosophy 44 (1–2): 79. https://doi.org/10.1111/meta.12016.
Young, N.S., Ioannidis, J.P., & Al-Ubaydli, O. 2008. Why current publication practices may distort science. PLoS Medicine 5 (10): e201.
Zalla, T., and M. Leboyer. 2011. Judgment of intentionality and moral evaluation in individuals with high functioning autism. Review of Philosophy and Psychology 2 (4): 681–698.
Acknowledgments
This project could not have been possible without the financial support of multiple organizations. Florian Cova’s work on this project was supported by a grant from the Cogito Foundation (Grant No. S-131/13, “Towards an Experimental Philosophy of Aesthetics”).
Brent Strickland’s work was supported by two grants from the Agence Nationale de la Recherche (Grants No. ANR-10-IDEX-0001-02 PSL*, ANR-10-LABX-0087 IEC).
Matteo Colombo, Noah van Dongen, Felipe Romero and Jan Sprenger’s work was supported by the European Research Council (ERC) through Starting Grant. No. 640638 (“Making Scientific Inferences More Objective”).
Rodrigo Diaz and Kevin Reuter would like to acknowledge funding from the Swiss National Science Foundation, Grant No. 100012_169484.
Antonio Gaitán Torres and Hugo Viciana benefited from funding from the Ministerio de Economía y Competitividad for the project “La constitución del sujeto en la interacción social” (Grant No. FFI2015-67569-C2-1-P & FFI2015-67569-C2-2-P).
José Hernández-Conde carried out his work as a Visiting Scholar at the University of Pittsburgh’s HPS Department. He was financially supported by a PhD scholarship and mobility grant from the University of the Basque Country, and by the Spanish Ministry of Economy and Competitiveness research project No. FFI2014-52196-P. His replication research was supported by the Pittsburgh Empirical Philosophy Lab.
Hanna Kim’s work was supported by the Pittsburgh Empirical Philosophy Lab.
Shen-yi Liao’s work was supported by the University of Puget Sound Start-up Funding.
Tania Moerenhout carried out her work as a Visiting Researcher at the Center for Bioethics and Health Law, University of Pittsburgh, PA (Aug 2016-July 2017).
Aurélien Allard, Miklos Kurthy, and Paulo Sousa are grateful to Rashmi Sharma for her help in the replication of Knobe & Burra (2006), in particular for her help in translating the demographic questions from English to Hindi.
Ivar Hannikainen and Florian Cova would like to thank Uri Simonsohn for his help in discussing the meaning and best interpretation of p-curves.
Finally, we would like to thank all the authors of original studies who accepted to take the time to answer our questions, share their original material and data, and discuss the results of our replication attempts with us.
Additional information
OSF Repository
Details, methods and results for all replications can be found online at https://osf.io/dvkpr/
Software
Most of the analyses reported in this manuscript were conducted using the R {compute.es} and {pwr} packages (Del Re 2015; Champely 2018). We are also indebted to Lakens’ R2D2 sheet (Lakens 2013).
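For readers without R, the kind of power computation performed with {pwr} can be approximated in Python. The following is an illustrative sketch only, not the project's code: it estimates the power of a two-sided, two-sample t-test from a Cohen's d using the normal approximation, and caps the result at .99, mirroring the treatment of power > .99 described in the notes above.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sample(d, n_per_group, alpha_z=1.959963984540054):
    """Approximate power of a two-sided, two-sample t-test for effect
    size d with n_per_group participants per cell, using the normal
    approximation (adequate for moderate-to-large samples)."""
    se = math.sqrt(2 / n_per_group)          # SE of the standardized mean difference
    mu = d / se                              # noncentrality under the alternative
    power = 1 - norm_cdf(alpha_z - mu) + norm_cdf(-alpha_z - mu)
    return min(power, 0.99)                  # power > .99 counted as .99
```

For example, d = 0.5 with 64 participants per group yields power of roughly .80, in line with the conventional benchmark.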
The original article has been revised: Appendix 1 has been corrected.
The original version of this article was revised: Footnote 8 was incomplete in the original version of this article. The original article has been corrected.
Appendices
Appendix 1. List of Studies Selected for Replication
(Crossed-out studies are studies that were planned for replication but did not get replicated.)
*2003
Most cited: Knobe, J. (2003a). Intentional action and side effects in ordinary language. Analysis, 63(279), 190–194. [Study 1] (Content-based, successful, osf.io/hdz5x/).
Random: Knobe, J. (2003b). Intentional action in folk psychology: An experimental investigation. Philosophical Psychology, 16(2), 309–324. [Study 1] (Content-based, successful, osf.io/78sqa/).
*2004
Most cited: Machery, E., Mallon, R., Nichols, S., & Stich, S. P. (2004). Semantics, cross-cultural style. Cognition, 92(3), B1-B12. (Demographic effect, successful, osf.io/qdekc/)
Replacement: Knobe, J. (2004). Intention, intentional action and moral considerations. Analysis, 64(282), 181–187. [Study 1] (Content-based, successful, osf.io/ka5wv/)
Random 1: Nadelhoffer, T. (2004). Blame, Badness, and Intentional Action: A Reply to Knobe and Mendlow. Journal of Theoretical and Philosophical Psychology, 24(2), 259–269. (Content-based, unsuccessful, osf.io/w9bza/).
Random 2: Nichols, S. (2004). After objectivity: An empirical study of moral judgment. Philosophical Psychology, 17(1), 3–26. [Study 3] (Content-based, successful, osf.io/bv4ep/).
*2005
Most cited: Nahmias, E., Morris, S., Nadelhoffer, T., & Turner, J. (2005). Surveying freedom: Folk intuitions about free will and moral responsibility. Philosophical Psychology, 18(5), 561–584. [Study 1] (Content-based, successful, osf.io/4gvd5/).
Random 1: McCann, H. J. (2005). Intentional action and intending: Recent empirical studies. Philosophical Psychology, 18(6), 737–748. [Study 1] (Context-based, null effect, successful, osf.io/jtsnn/).
Random 2: Nadelhoffer, T. (2005). Skill, luck, control, and intentional action. Philosophical Psychology, 18(3), 341–352. [Study 1] (Content-based, successful, osf.io/6ds5e/).
*2006
Most cited: Cushman, F., Young, L., & Hauser, M. (2006). The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm. Psychological Science, 17(12), 1082–1089.
Replacement: Nahmias, E., Morris, S. G., Nadelhoffer, T., & Turner, J. (2006). Is incompatibilism intuitive? Philosophy and Phenomenological Research, 73(1), 28–53. [Study 2] (Content-based, unsuccessful, osf.io/m8t3k/)
Random 1: Knobe, J., & Burra, A. (2006). The folk concepts of intention and intentional action: A cross-cultural study. Journal of Cognition and Culture, 6(1), 113–132. (Content-based, successful, osf.io/p48sa/)
Malle, B. F. (2006). Intentionality, morality, and their relationship in human judgment. Journal of Cognition and Culture, 6(1), 87–112.
Replacement: Nichols, S. (2006). Folk intuitions on free will. Journal of Cognition and Culture, 6(1), 57–86. [Study 2] (Content-based, successful, osf.io/8kf3p/)
Random 2: Nadelhoffer, T. (2006). Bad acts, blameworthy agents, and intentional actions: Some problems for juror impartiality. Philosophical Explorations, 9(2), 203–219. (Content-based, successful, osf.io/bv42c/).
*2007
Most cited: Nichols, S., & Knobe, J. (2007). Moral responsibility and determinism: The cognitive science of folk intuitions. Nous, 41(4), 663–685. [Study 1] (Content-based, successful, osf.io/stjwg/).
Random 1: Nahmias, E., Coates, D. J., & Kvaran, T. (2007). Free will, moral responsibility, and mechanism: Experiments on folk intuitions. Midwest studies in Philosophy, 31(1), 214–242. (Content-based, successful, osf.io/pjdkg/).
Random 2: Livengood, J., & Machery, E. (2007). The folk probably don’t think what you think they think: Experiments on causation by absence. Midwest Studies in Philosophy, 31(1), 107–127. [Study 1] (Content-based, successful, osf.io/7er6r/).
*2008
Most cited: Greene, J. D., Morelli, S. A., Lowenberg, K., Nystrom, L. E., & Cohen, J. D. (2008). Cognitive load selectively interferes with utilitarian moral judgment. Cognition, 107(3), 1144–1154. (Context-based, unsuccessful, but with deviations from the original procedure, see osf.io/yb38c).
Random 1: Gonnerman, C. (2008). Reading conflicted minds: An empirical follow-up to Knobe and Roedder. Philosophical Psychology, 21(2), 193–205. (Content-based, successful, osf.io/wy8ab/).
Random 2: Nadelhoffer, T., & Feltz, A. (2008). The actor–observer bias and moral intuitions: adding fuel to Sinnott-Armstrong’s fire. Neuroethics, 1(2), 133–144. (Context-based, unsuccessful, osf.io/jb8yp/).
*2009
Most cited: Hitchcock, C., & Knobe, J. (2009). Cause and norm. The Journal of Philosophy, 106(11), 587–612. (Content-based, successful, osf.io/ykt7z/).
Random 1: Roxborough, C., & Cumby, J. (2009). Folk psychological concepts: Causation. Philosophical Psychology, 22(2), 205–213. (Content-based, unsuccessful, osf.io/5eanz/).
Random 2: Nadelhoffer, T., Kvaran, T., & Nahmias, E. (2009). Temperament and intuition: A commentary on Feltz and Cokely. Consciousness and Cognition, 18(1), 351–355. (Demographic effect, null effect, unsuccessful, osf.io/txs86/).
*2010
Most cited: Beebe, J. R., & Buckwalter, W. (2010). The epistemic side-effect effect. Mind & Language, 25(4), 474–498. (Content-based, successful, osf.io/n6r3b/).
Random 1: Lam, B. (2010). Are Cantonese-speakers really descriptivists? Revisiting cross-cultural semantics. Cognition, 115(2), 320–329.
Replacement: Sytsma, J., & Machery, E. (2010). Two conceptions of subjective experience. Philosophical Studies, 151(2), 299–327. [Study 1] (Demographic effect, successful, osf.io/z2fj8/).
Random 2: De Brigard, F. (2010). If you like it, does it matter if it’s real? Philosophical Psychology, 23(1), 43–57. (Content-based, successful, osf.io/cvuwy/).
*2011
Most cited: Alicke, M. D., Rose, D., & Bloom, D. (2011). Causation, norm violation, and culpable control. The Journal of Philosophy, 108(12), 670–696. [Study 1] (Content-based, unsuccessful, osf.io/4yuym/).
Random 1: Zalla, T., & Leboyer, M. (2011). Judgment of intentionality and moral evaluation in individuals with high functioning autism. Review of Philosophy and Psychology, 2(4), 681–698.
Replacement: Reuter, K. (2011). Distinguishing the appearance from the reality of pain. Journal of Consciousness Studies, 18(9–10), 94–109. (Observational data, successful, osf.io/3sn6j/).
Random 2: Sarkissian, H., Park, J., Tien, D., Wright, J. C., & Knobe, J. (2011). Folk moral relativism. Mind & Language, 26(4), 482–505. [Study 1] (Content-based, successful, osf.io/cy4b6/).
*2012
Most cited: Paxton, J. M., Ungar, L., & Greene, J. D. (2012). Reflection and reasoning in moral judgment. Cognitive Science, 36(1), 163–177. [Study 1] (Context-based, unsuccessful, osf.io/ejmyw/).
Random 1: Schaffer, J., & Knobe, J. (2012). Contrastive knowledge surveyed. Noûs, 46(4), 675–708. [Study 1] (Content-based, successful, osf.io/z4e45/).
Random 2: May, J., & Holton, R. (2012). What in the world is weakness of will? Philosophical Studies, 157(3), 341–360. [Study 3] (Content-based, successful, osf.io/s37h6/).
*2013
Most cited: Nagel, J., San Juan, V., & Mar, R. A. (2013). Lay denial of knowledge for justified true beliefs. Cognition, 129(3), 652–661. (Content-based, successful, osf.io/6yfxz/).
Random 1: Beebe, J. R., & Shea, J. (2013). Gettierized Knobe effects. Episteme, 10(3), 219. (Content-based, successful, osf.io/k89fc/).
Random 2: Rose, D., & Nichols, S. (2013). The lesson of bypassing. Review of Philosophy and Psychology, 4(4), 599–619. [Study 1] (Content-based, null effect, successful, osf.io/ggw7c/).
*2014
Most cited: Murray, D., & Nahmias, E. (2014). Explaining away incompatibilist intuitions. Philosophy and Phenomenological Research, 88(2), 434–467. [Study 1] (Content-based, successful, osf.io/rpkjk/).
Random 1: Grau, C., & Pury, C. L. (2014). Attitudes towards reference and replaceability. Review of Philosophy and Psychology, 5(2), 155–168. (Demographic effect, unsuccessful, osf.io/xrhqe/).
Random 2: Liao, S., Strohminger, N., & Sripada, C. S. (2014). Empirically investigating imaginative resistance. The British Journal of Aesthetics, 54(3), 339–355. [Study 2] (Content-based, successful, osf.io/7e8hz/).
*2015
Most cited: Buckwalter, W., & Schaffer, J. (2015). Knowledge, stakes, and mistakes. Noûs, 49(2), 201–234. [Study 1] (Content-based, successful, osf.io/2ukpq/).
Random 1: Björnsson, G., Eriksson, J., Strandberg, C., Olinder, R. F., & Björklund, F. (2015). Motivational internalism and folk intuitions. Philosophical Psychology, 28(5), 715–734. [Study 2] (Content-based, successful, osf.io/d8uvg/).
Random 2: Kominsky, J. F., Phillips, J., Gerstenberg, T., Lagnado, D., & Knobe, J. (2015). Causal superseding. Cognition, 137, 196–209. [Study 1] (Content-based, successful, osf.io/f5svw/).
Appendix 2. Pre-replication form
Reference of the paper: ….
Replication team: ….
*Which study in the paper do you replicate? ….
*If it is not the first study, please explain your choice: ….
*In this study, what is the main result you will focus on during replication? Please give all relevant statistical details present in the paper: ….
*What is the corresponding hypothesis? ….
*What is the corresponding effect size? ….
*Was the original effect size:
- Explicitly reported in the original paper
- Not explicitly reported in the original paper, but inferable from other information present in the original paper
- Not inferable from information present in the original paper.
*What is the corresponding confidence interval (if applicable)?
*Was the original confidence interval:
- Explicitly reported in the original paper
- Not explicitly reported in the original paper, but inferable from other information present in the original paper
- Not inferable from information present in the original paper.
*From which population was the sample used in the original study drawn? (Which country, language, students/non-students, etc.)
*Was the nature of the original population:
- Explicitly reported in the original paper
- Not explicitly reported in the original paper, but inferable from other information present in the original paper
- Not inferable from information present in the original paper.
*What was the original sample size (N): ….
*Was the original sample size:
- Explicitly reported in the original paper
- Not explicitly reported in the original paper, but inferable from other information present in the original paper
- Not inferable from information present in the original paper.
*Does the study involve a selection procedure (e.g. comprehension checks)? (YES/NO).
*If YES, describe it briefly: ….
*Were all the steps of the selection procedure (including, e.g., comprehension checks):
- Explicitly reported in the original paper
- Not explicitly reported in the original paper, but inferable from other information present in the original paper
- Not inferable from information present in the original paper.
*Overall, would you say that the original paper contained all the information necessary to properly conduct the replication? (YES/NO).
*If NO, explain what information was lacking: ….
Power analysis and required sample size:
(Please, describe briefly the power analysis you conducted to determine the minimum required sample size. If the original effect is a null effect, just describe the required sample size you obtained by doubling the original sample size.)
Projected sample size:
(Please, describe the actual sample size you plan to use in the replication.)
Appendix 3. Post-replication form
Reference of the paper: ….
Replication team: ….
Methods
Power analysis and required sample size:
(Please, describe briefly the power analysis you conducted to determine the minimum required sample size. If the original effect is a null effect, just describe the required sample size you obtained by doubling the original sample size.)
Actual sample size and population:
(Describe the number of participants you actually recruited, and the nature of the population they are drawn from. Indicate whether the number of participants you actually recruited matched the one you planned on the OSF pre-registration. Describe briefly any difference between the population you drew your sample from and the population the original study drew its sample from.)
Materials and Procedure:
(Describe the procedure you employed for the replication, as you would in the Methods section of a paper. At the end, indicate all important differences between the original study and the replication, e.g. language.)
Results
Data analysis - Target effect:
(Focusing on the effect you singled out as the target effect for replication, describe the results you obtained. Then describe the statistical analyses you performed, detailing the effect size, the significance of the effect and, when applicable, the confidence interval.)
Data analysis - Other effects:
(If the original study included other effects and you performed the corresponding analyses, please, describe them in this section.)
Data analysis - Exploratory Analysis:
(If you conducted additional analyses that were absent from the original study, feel free to report them here. Just indicate whether they were planned in the OSF pre-registration, or exploratory.)
Discussion
Success assessment:
(Did you succeed in replicating the original result? If applicable, does the original team agree with you?)
Cite this article
Cova, F., Strickland, B., Abatista, A. et al. Estimating the Reproducibility of Experimental Philosophy. Rev.Phil.Psych. 12, 9–44 (2021). https://doi.org/10.1007/s13164-018-0400-9