1 Constructive and critical projects in ordinary language philosophy

Ordinary language philosophy involves both constructive and critical projects. The constructive project makes observations about how philosophically significant expressions are ordinarily used and uses those observations to support conclusions about non-linguistic aspects of the world. Austin (1957, p. 8) describes the methodology of ordinary language philosophy as follows:

When we examine what we should say when, what words we should use in what situations, we are looking again not merely at words (or ‘meanings’, whatever they may be) but also at the realities we use the words to talk about: we are using a sharpened awareness of words to sharpen our perception of, though not as the final arbiter of, the phenomena.

The constructive project is exemplified by J. L. Austin’s attempt to clarify the problems of “Freedom” and “Responsibility” through an investigation of the subtly different ways we use the expressions “by mistake”, “by accident”, “intentionally” and “deliberately” (Austin 1957). Austin’s approach to the problem of knowledge of other minds through the examination of parallels between the use of “I know” and “I promise” (Austin 1946), especially as that approach has been reconstructed by Lawlor (2013), is another example of the constructive project.

Contemporary adherents of the constructive project include both armchair and experimental philosophers. For example, contextualists about knowledge, like DeRose (2009) and Ludlow (2005), draw conclusions about the nature of knowledge (at least partly) on the basis of observations about the ordinary use of the word “knows”. Experimental philosophers, for their part, use empirical methods developed in the cognitive sciences to investigate philosophically significant concepts (e.g., knowledge) and, assuming the concepts are veridically applied, the parts of reality that those concepts represent.Footnote 1 Pinillos (2012), for example, begins an experimental investigation of theories of knowledge by saying:

      The central methodological assumption I will be adopting is that information about the behavior and mental states of ordinary people, including careful observation of their deployment of the word ‘knowledge’, can be relevant in assessing [competing theories of knowledge].

      I do not believe that this is an exotic assumption.

While the assumptions underlying the constructive project of ordinary language philosophy may not be “exotic”, the critical project has not found many advocates in contemporary philosophy.Footnote 2 The critical project in ordinary language philosophy involves the charge that philosophers produce “nonsense”, or are led to produce intractable philosophical problems, when they depart from or ignore the way language is ordinarily used. Classic examples of the critical project include Wittgenstein’s (1969, Sect. 10) remark that when one is sitting at a sick man’s bedside, looking attentively into his face, neither the question “I know that a sick man is lying here?” nor the assertion “I don’t know that there is a sick man lying here” makes sense, and Austin’s (1962, p. 15) argument that the word “directly” has been “stretched” by philosophers in discussions of perception to the point that it has become “meaningless”.Footnote 3

One of the rare contemporary advocates of the critical project is Baz (2012a, b, 2014, 2015, 2016, 2018), who argues that “the prevailing program” in contemporary analytic philosophy is fundamentally flawed. On his view, when we are confronted with philosophical thought experiments and asked to judge whether someone knows some proposition, or whether some knowledge ascription is true or false, we don’t actually understand the content of what we are being asked. Examples of such thought experiments include Gettier cases and contextualist “bank” scenarios. Because we don’t understand what we are being asked in such thought experiments, any way we respond will be “unsystematic” (Baz 2012b, p. 46) and will provide only an illusory foundation for philosophical theories. The strategy of this paper is to develop a less radical and more defensible version of Baz’s argument from ordinary language. In the next section, I spell out Baz’s radical version of the critical project of ordinary language philosophy; in Sect. 3, I raise objections to Baz’s version; and in Sect. 4, I discuss experiments that support the revised argument from ordinary language.

2 Baz’s challenge to “the prevailing program”

Baz criticizes a philosophical method that he says is common in “the mainstream of analytic philosophy”. The method aims to develop or test philosophical theories of some subject matter by asking what Baz calls “the theorist’s question”, which asks for judgments about whether or not “our concept of x, or [the expression] ‘x’, applies to some particular case, actual or imagined” (Baz 2012a, p. 1). For example, philosophers have investigated the concept of knowledge by asking whether or not we have intuitions that the concept applies in certain imagined situations (Gettier scenarios, driving through fake barn county, Mr. Truetemp’s miraculously reliable beliefs about the temperature, and so on). Baz calls “the research program that takes answers to the theorist’s question as its primary data ‘the prevailing program’” (p. 1).

It is controversial to describe this particular methodology as the “prevailing program”, but there is little doubt that it is an influential aspect of contemporary philosophy. In particular, experimental philosophers have turned the traditional armchair method of eliciting judgments about scenarios into a branch of cognitive science by running formal experiments. These experiments ask ordinary participants to make judgments about various philosophically significant expressions or concepts, and the experimenters then use those judgments as evidence for or against philosophical theories.Footnote 4

Baz is not alone in wanting to challenge the “prevailing program”. Advocates of the “negative program” in experimental philosophy (Machery et al. 2004; Mallon et al. 2009; Weinberg et al. 2001) have criticized certain adherents of the prevailing program for assuming that the way in which a small subset of human beings apply a concept reveals something about the concept as such. And Cummins (1998) has challenged the prevailing program on the grounds that there is no way of “calibrating” the intuitions it relies on. That is, there is no independent means of determining whether or not they reliably track what they are supposed to track.

Baz’s immediate target is a particular defense of the “prevailing program” against these recent challenges: the one offered by Williamson (2004, 2005, 2007). Williamson denies that what goes on when philosophers ask whether a concept x applies to some imagined or real situation should involve eliciting intuitions as to whether or not the concept applies, where those intuitions serve as evidence that the concept applies or does not. That kind of approach invites embarrassing investigations both into whether philosophers’ intuitions are widely shared and into how we could know that they are reliable indications of the subject matter under investigation; it also unnecessarily psychologizes the evidence available to philosophers. According to Williamson, the question whether a concept x applies to a particular situation can be answered by using our everyday capacity to apply concepts to actual and counterfactual situations (Williamson 2005, p. 12; Williamson 2007, p. 188). Insofar as that everyday capacity is reliable, the application of concepts to cases in philosophy should be reliable as well.Footnote 5 The prevailing program can then proceed to answer the theorist’s question by simply reflecting on whether or not a concept of interest applies to particular actual or counterfactual situations.

Baz criticizes Williamson’s “continuity defense” of the prevailing program for assuming that “what we are invited to do when we are invited (or invite ourselves) to answer the theorist’s question is not essentially different from what we do when, outside philosophy, we judge that, for example, someone knows or does not know this or that” (Baz 2012a, p. 3). Focusing on “know that” and the concept of knowledge, Baz argues that the theorist’s question “is fundamentally different from any question to which we might need to attend as part of our everyday employment of these expressions” (Baz 2012a, p. 4). If the theorist’s question is fundamentally different from everyday questions, then Williamson’s defense of the prevailing program, which ties the reliability of our answers to the theorist’s question to the reliability of our everyday capacity to apply concepts to encountered situations, fails. Baz takes the final sentence in the following Gettier scenario, from Weinberg et al. (2001, p. 443), as an exemplar of a “theorist’s question”:

Bob has a friend, Jill, who has driven a Buick for many years. Bob therefore thinks that Jill drives an American car. He is not aware, however, that her Buick has recently been stolen, and he is also not aware that Jill has replaced it with a Pontiac, which is a different kind of American car. Does Bob really know that Jill drives an American car, or does he only believe it?

Baz maintains that it is a “fundamental assumption” of the “prevailing program” that competent ordinary speakers of English (or whatever language the scenario is written in) who read this scenario understand the question that it concludes with, and are able to give it a meaningful answer. He wants to challenge that assumption, he says, “by way of a form of ordinary language philosophy” (Baz 2015, p. 4).

Baz summarizes his ordinary language procedure for challenging the “fundamental assumption” as follows (pp. 4–5):

Take some version of the theorist’s question—by which I mean, the form of words in which his question is couched—and ask how it might reasonably be understood in the course of everyday discourse, with respect to a case such as the one described by the philosopher. One thing that would then emerge is that, depending on the circumstances in which it arose, there are any number of different senses the similarly worded but non-merely-theoretical question could have—different ways the theorist’s words would, or could, reasonably be understood, depending on the context in which they were uttered or considered, even though the case under consideration remained the same. That would show that, contrary to the fundamental assumption...the words and case by themselves do not suffice for fixing the theorist’s question with a determinate sense, and a correct answer. In other words, it would show...that the theorist, in raising his question apart from any context that would fix his words with a determinate sense, has failed to raise a clear question.

The argumentative core of Baz’s challenge to the fundamental assumption consists in five attempts to show how the theorist’s question (in this case, “Does Bob really know that Jill drives an American car, or does he only believe it?”, asked of the Gettier scenario from Weinberg et al. 2001 described above) might matter in a non-philosophical context (Baz 2012b, pp. 108–115). Baz argues that all of these attempts fail, leaving us without evidence that the theorist’s question might naturally arise in a non-philosophical context. The burden is then on the defender of the “prevailing program” to defend the continuity of the philosophical question with ordinary questions about knowledge. I’ll summarize each of the attempts and Baz’s reasons for thinking that they fail.

Attempt #1: If Bob knows that Jill drives an American car, then he will be in a position to assure others that she drives an American car. Maybe we care about whether or not Bob is in such a position.

Reply: Given that it is stipulated in the Gettier scenario that Jill drives an American car (a Pontiac), there is no reason we, or anyone else who knows as much about the case as we do, would need assurance from Bob that Jill drives an American car. So it’s not clear what point (other than the purely theoretical point of finding out what knowledge is) there would be in asking the question whether Bob really knows, or merely believes, that Jill drives an American car.

Attempt #2: Suppose some third party (“Agent”) needs to know whether Jill drives an American car. Agent might wonder whether she can count on Bob’s assurance that Jill does drive an American car. That would give the question “Does Bob know that Jill drives an American car, or does he merely believe it?” significance in an ordinary context.

Reply: There are two possible ways of understanding Agent’s question about Bob: either Agent knows the basis for Bob’s assurance and can assess it, or she does not. If she does not know the basis for Bob’s assurance, or she’s not in a position to assess it, then her question is not the theorist’s question about Bob. The theorist’s question is whether Bob’s evidence is “good enough” for him to count as knowing, given that Jill does in fact drive an American car. If Agent does know the basis for Bob’s assurance and can assess it, and doesn’t doubt its truth, then her question is whether the fact that Jill had, until recently, driven a Buick gives her sufficient assurance that Jill is currently driving an American car. But that is not the same as the theorist’s question about whether Bob knows that Jill drives an American car.

Attempt #3/4: Imagine that another person (“Judge”) is aware of all of the facts of the Gettier scenario, and his job is to assess whether Bob was in a good enough position to assure Agent that Jill drives an American car. Imagine that Jill is an American politician, Agent is her press secretary, and Bob is Jill’s personal assistant. If Jill is seen driving a foreign car, her enraged constituents will vote her out of office and Agent (the press secretary) will lose her job. One of Bob’s responsibilities is to ensure that Jill is always seen driving an American car; if he fails to do so, that will have negative consequences for both Jill and Agent.Footnote 6 Judge’s question, “Does Bob really know...” is then a question about whether Bob is being sufficiently epistemically vigilant in carrying out his job, given the high stakes.

Reply: The point of Judge’s question still isn’t the same as the point of the theorist’s question. Judge’s question concerns Bob’s epistemic responsibility, so “Judge must put himself in Bob’s position if he is to judge him competently” (p. 111). But from Bob’s perspective, the situation is not a Gettier scenario, so the question does not come to the same thing as the theorist’s question. If the point of the question “Does Bob really know...” is instead simply whether Bob has been doing everything he should be doing with regard to keeping track of what Jill is driving, that too is a different question from the theorist’s question, the point of which is just to investigate whether or not Bob knows.

Attempt #5: The question “Does Bob know...” is just the question whether Bob has a piece of information that the questioner already possesses: whether Bob is aware that Jill drives an American car. Here is an example of this kind of use of “Does [he] know...”, drawn from the Corpus of Contemporary American English:

  • SARA-HAINES: Does he know you sneak off in the middle of the night?

  • SUSIE-ESSMAN: Well, when he turns around and goes like this and I’m not there. And, and you’re not there? Okay. So, he, he knows now. (Inaudible).Footnote 7

Reply: On this reading of the question “Does Bob know that Jill drives an American car”, it would amount to a question about whether Bob is aware that Jill drives an American car, to which the answer is clearly yes—he would not find it informative to be told that she drives an American car. (He already knows that, in the relevant ordinary sense of “knows”.) What Bob is not aware of is that Jill drives a Pontiac, not a Buick. The point of the question “Does Bob know that Jill drives an American car?”, understood in this way (about what Bob is aware of) is not the same as the point of asking the theorist’s question, which concerns whether Bob’s justification, plus the truth of his belief that Jill drives an American car, is sufficient to count as knowledge.

Assuming that there isn’t an example of the question “Does Bob really know...” in ordinary conversation that Baz has overlooked, what is the upshot of this series of failed attempts to associate the theorist’s question about knowledge with various everyday questions about knowledge? Here is Baz’s (2012b, pp. 115–117) account of what is going on:

      My aim is to bring out the anomalousness of her question and thereby to raise doubts about the presumed significance of the answers to it that she and others might give....

      In considering each of the different [everyday encounters with the question “Does Bob really know...”], we saw that the question that the person encountering Bob would naturally ask herself...is importantly different from the question that the theorist has wanted, and taken himself, to be asking. What answering the everyday question would normally involve and require, in each of the different cases, is nothing like what answering the theorist’s question involves and requires....

      There is good reason to suspect that no question that may naturally arise in the everyday [sic] would come to anything like the theorist’s question.

Baz is not alone in observing a disconnection between the “theorist’s question” and everyday questions about knowledge. For example, Bach (2005, pp. 62–63) observes that contextualists about knowledge ascriptions are not justified in treating their responses to the “theorist’s question” (whether someone knows something in a particular context) as representative of ordinary uses of “knows”, because

...outside of epistemology, when we consider whether somebody knows something, we are mainly interested in whether the person has the information, not in whether the person’s belief rises to the level of knowledge. Ordinarily we do not already assume that they have a true belief and just focus on whether or not their epistemic position suffices for knowing. Similarly, when we say that someone does not know something, typically we mean that they don’t have the information.

(Bach is invoking the ordinary sense of “Does he know...” that appears in Baz’s Attempt #5, above.)

If the “theorist’s question” is indeed fundamentally different from “everyday” questions, then any answers that the philosopher receives to her question will not help answer questions about everyday uses of expressions (and vice versa). That would be a serious problem for defenders of the “claim of continuity” (like Williamson) who take responses to the theorist’s question to support or undermine metaphysical theories (of knowledge, for example), as well as experimental philosophers who take answers to the theorist’s question to be evidence for or against theories of the meaning of a particular expression used in ordinary thought and talk (“know”, for example).Footnote 8

In addition to arguing that the theorist’s question could not arise in everyday contexts, Baz argues that we do not even know how to answer the theorist’s question, or how to assess other people’s answers to it, and that seeking answers to it is therefore fundamentally misguided. In order to establish that ambitious conclusion, he argues as follows:

  1. “[T]he point of an everyday question guides us in answering it and in assessing our own and other people’s answers”.Footnote 9

  2. “[T]he theorist’s question has no point, in the relevant [everyday] sense” (Baz 2012a, p. 327).

  3. So it is not surprising that there is substantial disagreement over how to answer the theorist’s question, because there is no everyday point to guide answers to the question.

Other philosophers, reflecting on the practice of asking non-philosophers to respond to versions of the theorist’s question, have expressed thoughts similar to Baz’s first two premises, about the way non-philosophers may have a hard time understanding the theorist’s question:

“...experimental philosophy subjects are ipso facto at a significant disadvantage since it is often a precondition of their participation that they have no idea why anyone would be interested in finding out what the folk think about Gettier scenarios, much less what a Gettier scenario actually is” (Cullen 2010, p. 281).

“...anyone who, like me, has taken a survey when you didn’t have any good feeling for why you were being asked the questions directed at you and so didn’t know what to focus on should be able to appreciate how lost some ordinary person, just being asked about these strange cases on some survey, might be” (DeRose 2011, p. 93).

“...when a person responds to a yes/no survey question (or rates assent on a Likert scale), just what is the conversational context? Who is he or she conversing with, and how do we work out what he or she assumes about the hearer’s beliefs? Frankly, this is a baffling task” (Kauppinen 2007, p. 107).

Baz is therefore making two related arguments against the “prevailing program”. First, the “theorist’s question” (for example, “Does he know that Jill drives an American car?”) lacks any practical “point” or significance, whereas the “point” or significance of an everyday question guides our answers to it. So when participants in an experiment give answers to the theorist’s question, we shouldn’t assume that their answers tell us anything about their competence with the underlying concept that philosophers are interested in investigating. Second, Baz argues that because the “theorist’s question” lacks an everyday point, the question lacks a determinate sense. Both of these arguments are intended to challenge Williamson’s “claim of continuity”.

Do those two arguments stand up to scrutiny? In the next section, I’ll argue that there is experimental evidence that runs counter to the conclusion of the second argument. The first argument is more difficult to dismiss, however, and I’ll show how responding to it requires rethinking how philosophers design both informal (“armchair”) and formal experiments.

3 Responding to Baz’s second argument against “the prevailing program”

Baz’s second, more radical, criticism of “the prevailing program” alleges that there is substantial disagreement over how to answer the theorist’s question, and diagnoses the source of that disagreement as the fact that the theorist’s question lacks a point, in contrast with everyday questions. The most straightforward problem with this argument is that there is no evidence of substantial disagreement about how to respond to Baz’s chosen “theorist’s question” of a kind that would support Baz’s claim that the theorist has “failed to raise a clear question” (Baz 2015, p. 5).

The central piece of empirical evidence that Baz cites in support of his claim of substantial disagreement in response to the theorist’s question is Weinberg et al. (2001). In that study, Weinberg et al. found that while a majority of Westerners tended to say that Bob “only believes” (and doesn’t “really know”) that Jill drives an American car in the Gettier scenario described above, that preference was reversed when East Asian participants and participants from the Indian sub-continent were asked the same question. That is a striking result, and Weinberg et al. argue that it undermines “a sizeable group of epistemological projects—a group which includes much of what has been done in epistemology in the analytic tradition” (Weinberg et al. 2001, p. 429).

The experimental evidence that has accumulated since the publication of Weinberg et al.’s study, however, has not supported the claim of substantial variation in epistemic intuitions (Turri 2016). There have been several failures to replicate the original finding of cultural variation in epistemic intuitions (Machery et al. 2017; Seyedsayamdost 2015; Turri 2013), including a study using exactly the same experimental materials as the original Weinberg et al. (2001) study but using a substantially larger sample size (Kim and Yuan 2015). And recent investigations have indicated that some variability in response to different Gettier cases is systematically related to epistemically significant features of the cases themselves, such as whether the evidence that the protagonist has for their belief is “authentic” or merely “apparent” (Starmans and Friedman 2012).

Blouw et al. (2017) and Turri et al. (2015) argue that there is in fact no epistemically unified category of “Gettier cases”, but rather five different types of case (Blouw et al. 2017, p. 9). These range from “Gettier-1” cases, in which the agent “perceptually detects the truth, and there is a salient but failed threat to the truth of her judgment”, to “Gettier-5” cases, in which “the agent fails to detect the truth, but her judgment is nevertheless made true by a state of affairs dissimilar to what she based her belief on” (p. 10). Goldman’s (1976) fake barn county example illustrates the first type; Gettier’s (1963) “Either Jones owns a Ford, or Brown is in Barcelona” case is the paradigm of the latter. Intermediate Gettier cases include scenarios in which:

  • (Gettier-2: detection, similar replacement) the agent forms a true belief on the basis of “detecting” the relevant truth-maker (forming the belief that there is a pen on a table on the basis of seeing the pen), but then the truth-maker is replaced with a similar truth-maker (another visually indistinguishable pen, for example),

  • (Gettier-3: detection, dissimilar replacement) the agent forms a true belief on the basis of “detecting” the relevant truth-maker (she forms the belief that she has a diamond in her pocket on the basis of purchasing a genuine diamond), but the original truth-maker is replaced by a dissimilar truth-maker (a thief steals the one she bought, but there is, unbeknownst to her, another diamond stitched into her pocket),

  • (Gettier-4: no detection, similar replacement) the agent forms a true belief but fails to “detect” the relevant truth-maker (she forms the belief that she has a diamond in her pocket on the basis of purchasing a fake diamond, which is then stolen, but her belief is made true by a genuine diamond that is slipped into her pocket without her knowledge).

Rates of knowledge attribution differed significantly across the different types of Gettier scenario. At one end, Goldman-style Gettier-1 scenarios elicited knowledge attribution at rates of up to 83% (Turri et al. 2015), not significantly different from the rates for clear cases of knowledge. At the other end, Gettier-5 scenarios (with the same structure as Gettier’s “Barcelona” case) elicited knowledge attribution at a rate of 19%, not significantly different from the rates for clear cases of non-knowledge.Footnote 10 See Table 1 for a summary of the relevant results, based on Figure 1 in Turri et al. (2015); triple vertical bars indicate a significant difference in responses.

Table 1 “Really knows” dichotomous response percentages for Experiment 4 (Turri et al. 2015)

The wider pattern of responses to different types of Gettier cases reported in Blouw et al. (2017), Starmans and Friedman (2012) and Turri et al. (2015), which includes responses to (theoretically) clear cases of knowledge and clear cases of non-knowledge (either cases of false belief, or true beliefs that lack justification), in fact poses a challenge to Baz’s contention that the theorist’s question (which, in Gettier cases, is the question whether the protagonist knows that, e.g., Jill drives an American car) is not “clear” because it lacks a practical point.Footnote 11 If the theorist’s question lacked a sense, as Baz claims, then several findings would be surprising: the consistent levels of knowledge denial that experimenters have found in certain kinds of Gettier cases (around 80%; see Turri 2016, p. 341); the consistent patterns of variation when epistemically significant features of the Gettier cases are varied (see the Appendix for details); and, especially, the much higher rates of knowledge attribution in theoretically clear cases of knowledge (79–90% in Starmans and Friedman 2012 and Turri et al. 2015) than in theoretically clear cases of non-knowledge (8–14% in the same studies).Footnote 12 (All of these experimental studies are described in greater detail in the Appendix.)

Where does this evidence leave Baz’s more ambitious argument? Even if we grant him that the theorist’s question about whether the protagonist in a Gettier case knows something lacks an everyday “point”, a substantial body of evidence fails to support the idea that participants do not understand the content of the question they are posed. If the “theorist’s question” in the Gettier cases genuinely lacked sense, then we would expect to find a pattern of responses to versions of it indicating that participants are failing to understand the question.Footnote 13 But existing experiments do not find such a pattern.Footnote 14

In addition to running into a body of experimental findings that challenge its conclusion, Baz’s more ambitious argument also makes a deeper theoretical mistake: it assumes that there is a sharp cut-off between “everyday” questions, which are raised in contexts where there is some practical point to posing the question, and the “theorist’s question”, which is raised in a context that is stripped of any practical significance (for the participants attempting to answer the question). The assumption is mistaken because the distinction between the “everyday” and the “theoretical” is porous. Purely “semantic” questions come up naturally in everyday conversations, where there is no obvious point to the discussion other than sheer interest in figuring out the meaning of some expression. For example, Niedzielski and Preston (2000) include a collection of 59 recordings of “everyday” or “folk” conversations pertaining to linguistic matters. Those conversations include everyday discussions about the following questions of meaning:

  • Is the word “maturity” associated with “closed-mindedness” or with the ability to do things “wisely” and “correctly”? (pp. 266–267)

  • Does a diary consist only of “notes”, or can it be “reflective” and “book-like” like a journal?

  • Can a “hairdo” be correctly used to describe a man’s hair? (p. 267)

These kinds of folk meta-linguistic discussion can lack a practical “point” in the same way that philosophical debates about the meaning of expressions like “knows” can lack a practical point—there may be no practical issue that turns on which way they are settled.Footnote 15 And yet the participants in these conversations can come to agree on a particular meaning for an expression. There is no principled reason why a similar conversation about the meaning of “knows” couldn’t arise in an “everyday” (non-philosophical) situation.Footnote 16 Theoretical investigations of meaning are continuous with these kinds of everyday meta-linguistic conversations.

4 The insight in Baz’s first argument: the need to diversify experimental contexts

The previous section discussed reasons to reject Baz’s more ambitious second argument that the theorist’s question is not “clear”, and his claim that, because the question lacks an everyday “point”, when we try to answer it we lack “orientation of the kind that is ordinarily provided by a suitable context”. Whatever the theorist’s question lacks, experimental evidence indicates that participants are not responding to it (at least in the case of “know” and knowledge) in a way consistent with the question lacking sense.

But what about Baz’s first argument? It holds that the point of asking the theorist’s question differs from the point of an identically worded question in an everyday context. If so, the way people respond to the question in one context doesn’t necessarily tell us anything about how they would respond to it in the other. I think that Baz’s first argument is indeed an important challenge to standard experimental approaches to investigating the meaning of a term like “knows”. In this section I raise some additional considerations in its support by discussing several experimental case studies. Each lends weight to Baz’s claim that when participants answer the “theorist’s question” about “knows”, detached from the features of ordinary conversation, they may be doing something substantially different from what they ordinarily do when operating with “knows” and the concept of knowledge.

4.1 Varying the motivational context

It is possible that we are missing important dimensions of our concepts by testing them only in theoretical contexts in which participants have no stake in the outcome of their judgments. For example, a development of one of the most dramatic findings in 20th-century social psychology, Asch’s (1956) conformity experiments, shows that varying a participant’s motivational context can affect how they perform an experimental task.

Asch’s conformity experiment involves asking participants to make extremely simple perceptual judgments comparing the lengths of “comparison” lines with the length of a standard line (see Fig. 1). The ease of the perceptual task is shown by the high accuracy of such comparisons (99%) in a control condition, in which participants performed the task without any outside influence. The experimental manipulation placed the participant in a context of social influence, with a group of 6–8 experimental confederates who made unanimously incorrect comparative judgments. In this social influence condition, participants’ responses became significantly less accurate: they conformed with the incorrect judgments of the majority in 36.8% of the trials (Asch 1955, p. 32).

Fig. 1 Stimuli from the perceptual discrimination task used in Asch (1956, Fig. 2); length labels did not appear on the experimental stimuli

Further variations indicate that other manipulations have a significant effect on rates of conformity on the perceptual judgment task. Asch (1956) provides evidence that varying the size of the majority, and the presence or absence of dissenters (both those who report accurate and inaccurate judgments) has an effect on whether participants judge in accordance with the majority. Baron et al. (1996) investigate whether the Asch conformity effect only arises because of the triviality of the perceptual task:

One could dismiss the conformity effect as a laboratory ‘hothouse’ phenomenon that occurs because the potential face-to-face rejection of peers is far more important to participants than their accuracy on some unimportant ‘scientific’ test of perception or social judgment. (Baron et al. 1996, p. 915)

What would happen to the conformity effect if participants were given some additional motivation for performing the perceptual task accurately? To answer that question, Baron et al. used a “lineup” task, in which participants were shown a drawing of a target person and then asked to judge whether the target appeared in a lineup of four individuals in an image presented separately (see Fig. 2).

Fig. 2 “Lineup” task used in Baron et al. (1996, Fig. 1); example “perpetrator” slide is on the left, example “lineup” slide is on the right

Participants were given the lineup task in four conditions, which varied the difficulty of the task (low vs. high) and its importance (low vs. high). The low-difficulty version of the task allowed participants to view the perpetrator slide and the lineup slide for five seconds each, and showed the two-slide sequence twice. In the high-difficulty version, the perpetrator slide was shown only once, for 0.5 seconds. In the low-importance condition, participants were told that they were taking part in a pilot study developing materials to test eyewitness testimony. In the high-importance condition, participants were told that they were calibrating an eyewitness testimony test that would soon be used by police and in courtrooms. They were also told that if they performed in the top 12% for accuracy, they would receive a $20 prize.

Baron et al. found that participants were significantly less subject to the conformity effect in the low-difficulty, high-importance condition than in the low-difficulty, low-importance condition. This supports the idea that participants in the original Asch experiments conformed to the majority at the rates they did partly because of the low importance of their task. But even more interestingly, participants were significantly more likely to conform to an inaccurate group consensus in the high-difficulty, high-importance condition than in the high-difficulty, low-importance condition. Baron et al. (1996, p. 924) explain this finding by observing that when it is difficult to “objectively” verify a particular judgment (because of the short exposure time in the high-difficulty condition), “individuals become increasingly reliant on social information to gauge the accuracy and appropriateness of their views”. The Baron et al. investigation reveals that participants’ responses can be affected by their sense of the point or importance of the experimental task.

Embedding existing experiments on “know” and knowledge in a context where participants have some additional motivation for performing the task would require only a slight divergence from standard experimental investigations of knowledge. For example, one of the more closely studied questions in experimental epistemology is whether knowledge is sensitive to the stakes of being wrong. That is, are people more willing to ascribe “knowledge” to an individual when the consequences of that individual being wrong are trivial than when they are severe?Footnote 17 In all existing experiments probing the concept of knowledge, the stakes are simply stated. It is then assumed that participants will take that statement at face value when asked to judge whether someone “knows” something. The actual stakes, whether for the participants or for those whom they are judging, are not varied.Footnote 18

In contrast to the methods employed by existing studies in experimental epistemology, studies in behavioral economics regularly vary the actual stakes for participants. Stakes can be straightforwardly manipulated by varying monetary rewards (for a review of such experimental approaches see Kamenica 2012). Ariely et al. (2009), for instance, found that increases in monetary stakes improved performance on simple tasks but degraded performance on complex tasks. Such a design is easily extended to investigate the effect of stakes on judgments about knowledge and the meaning of “knows”. Participants could be placed in situations in which the genuine financial consequences of being wrong, either for another person or for themselves, are manipulated. This would show whether self-ascriptions or other-ascriptions of knowledge are sensitive to stakes. Experiments of that form could assess whether effects similar to those observed by Baron et al. (1996) extend to assessments of knowledge.

4.2 Varying awareness of being in an experiment

The “dictator game” is used to probe whether people have a sense of “fairness” in how they allocate a monetary windfall. The game was developed to test the “unfairness” assumption in standard economic theory: “The economic agent is assumed to be law-abiding but not ‘fair’—if fairness implies that some legal opportunities for gain are not exploited” (Kahneman et al. 1986, p. S286). The “dictator” receives (or is told to imagine that she receives) a certain amount of money ($20 in the original study), and is then instructed to decide how much of the windfall to offer anonymously to a recipient. Standard economic theory would predict that the dictator should keep all of the windfall.

Kahneman et al. (1986) offered the dictator a choice between giving $2 and giving $10 to the recipient. The high rate of fair ($10) offers (76%) was taken as evidence against the “unfairness” assumption of standard economic theory as a model of actual human behavior (Kahneman et al. 1986, p. S291). Subsequent dictator game experiments that offered a wider range of response options did not reproduce such high rates of completely fair distribution. For example, only 22% made a 50–50 offer in the dictator experiment with actual pay in Forsythe et al. (1994). Still, extensive evidence from dictator game experiments challenges the “unfairness” assumption of standard economic theory (for a summary of the results of many studies, see Camerer 2003, Table 2.4, and Guala and Mittone 2010).

One methodological worry about using dictator games to challenge the unfairness assumption is that in standard experiments participants are not anonymous. If the dictator’s offer is not genuinely anonymous, we can’t conclude that a sense of fairness alone is driving her altruistic offer. The dictator’s desire to protect her reputation, for example, might (partially) explain why offers diverge from the predictions of standard economic theory. Hoffman et al. (1994) lent experimental weight to this worry by conducting a double-blind dictator game. In that game, neither the experimenters nor the recipients could know individual participants’ offers, and participants knew this. The double-blind design significantly reduced the amounts the dictators offered (half of the dictators offered nothing).Footnote 19

But even in Hoffman et al.’s double-blind experiment, participants are still aware that they are taking part in an experiment. Winking and Mizer (2013) conducted a “natural field experiment” that removes even this residual element of the dictator’s sense that an experimenter is examining her behavior (even if not de re). Their study yielded an astonishing result. When dictators didn’t realize they were participating in an experiment, they made no altruistic offers at all and kept the entire windfall for themselves.

Winking and Mizer’s field experiment involved a pair of confederates. Confederate 1 waited at various bus stops, each within one block of a casino in Las Vegas. When a potential participant also began to wait at the bus stop, Confederate 1 pretended to take a call on a cellular phone and “walked some distance away, facing away from the participant”. Confederate 2 then walked by the participant, and “pretended to notice [casino] chips in his pocket, stopped briefly and claimed to the participant that he was late for a ride to the airport and asked the individual if he/she wanted the casino chips [$20], which he did not have time to cash in” (Winking and Mizer 2013, p. 290). There were three experimental conditions:

  • In condition 1, Confederate 2 simply walked off.

  • In condition 2, when handing over the chips, Confederate 2 told the participant, “I don’t know, you can split it with that guy however you want”, referring to Confederate 1.

  • Condition 3 involved a setup roughly parallel to Hoffman et al. (1994). Participants were aware they were taking part in an experiment, but the experimenter didn’t see how participants allocated the $20 in chips they received.

The results in condition 3 were consistent with laboratory dictator game results, with a mean offer of $5.43. By contrast, no participants in either condition 1 or condition 2 (n = 60) offered any chips to Confederate 1 (p. 291). Winking and Mizer’s experiment shows how dramatic an effect awareness of being in a non-ordinary (experimental) context can have on participants’ behavior.

The Winking and Mizer study shows the dramatic effect of moving the dictator game out of the lab and into the wild. It provides a model for more naturalistic experiments investigating philosophically significant concepts (such as knowledge). With the help of confederates, it would be possible to evaluate covertly how stakes affect the way ordinary speakers assess whether someone knows something. For example, two confederates could play the roles of parent and student at a university open day (open house). The participant would be selected from among those who have volunteered to be guides for prospective students. The student confederate would ask the participant guide for directions to their next appointment (which is scheduled to take place in building B), and then walk away after receiving directions. The parent confederate would then approach the participant guide and ask whether their child knows that their next meeting is in building B. The meeting would be described in one of two ways:

  • Condition 1 (low stakes): the meeting concerns what student clubs are available on campus.

  • Condition 2 (high stakes): the child must be on time for the meeting because they will be interviewed for a full academic scholarship.

Such a design does not vary the stakes for the participant. But it creates a condition in which apparent real-world stakes (for the confederates) can vary while the fact that an experiment is taking place is concealed.

4.3 Conversation versus one-off speech acts, and addressees versus overhearers

Clark (1997) observes that most experimental investigations of language employ unnatural conversational contexts, stripped of normal features of social interaction. Typically such experiments involve making judgments about one-off utterances, which participants cannot query or challenge:

It is difficult to study understanding in the wild, so investigators have developed a variety of laboratory techniques instead. Most of these techniques are built around contrived sentences presented to people isolated from any realistic human activity. (p. 577)

Clark argues that the standard methodological assumption in experimental investigations of meaning is that understanding an utterance is “autonomous”, meaning that it doesn’t require any interaction beyond the passive comprehension of the speaker’s utterance by the audience. Stimuli are usually written or pre-recorded spoken texts that are presented to participants, who are asked to respond to them in various ways, but querying the stimulus or asking for clarification is usually not permitted. For example, the “presupposition assessment task” (Syrett 2007; Syrett et al. 2010; Liao and Meskin 2017; Hansen and Chemla 2017) tests whether participants are willing to accommodate the uniqueness and existence presuppositions of definite descriptions when combined with different types of adjectives.

Fig. 3 Stimuli used in the presupposition assessment task, from Syrett (2007, Appendix E): (a) “Please give me the long rod.” (b) “Please give me the full one.” (c) “Please give me the spotted one.”

The task involves showing participants pairs of objects with varying degrees of a particular property picked out by an adjective F, and then asking the participant to select “the F one” (see Fig. 3). Participants are willing to accommodate both the uniqueness and existence presuppositions of the definite description when asked to select the long rod (i.e., the longer of the two rods). But they tend to refuse the request to hand over “the full one”, because neither jar is completely full (a failure of the existence presupposition of the definite description). They also tend to refuse the request to hand over “the spotted one”, because both disks are spotted (a failure of the uniqueness presupposition). That pattern of responses is taken as evidence that participants associate different standards with different types of adjective. But the task, like many probes used in experimental semantics and pragmatics, is non-naturalistic: participants can’t ask for clarification of the request, or for confirmation that they’ve selected the right object.

Schober and Clark (1989) demonstrate that the audience’s ability to interact with the speaker has significant effects on successful communication. They provide evidence that addressees who can actively interact with speakers represent what the speaker intends to communicate more accurately than “mere overhearers” who passively listen to the same conversations. In one of their experiments, a “director” was seated across from a “matcher”, separated by a barrier that prevented them from seeing each other. The director had a sheet with 16 tangram figures on it, arranged in a random order (see Fig. 4); the first 12 figures on the sheet were numbered 1–12. The matcher had 16 cards with the corresponding tangram figures on them, and ordered slots in which 12 of the cards could be placed. The primary communicative task was for the matcher to arrange 12 cards in the order in which they appeared on the director’s sheet; the director and the matcher could talk to each other as much as they wanted. Each director–matcher pair played the game six times in a row, with the order of the tangram figures randomized each time.

Fig. 4 Tangram figures (Schober and Clark 1989, Fig. 1)

A secondary communicative task involved a third participant, an “overhearer”, who was in the room with the director and the matcher but was instructed not to interact with either. The overhearer was instructed to try to match the same 12 tangram figures that the director and the matcher were trying to match. To make sense of the presence of a silent listener, the director and the matcher were told that the overhearer was a coder who was there to “reduce experimental bias” (p. 222). The overhearer therefore had access to all of the same utterances as the director and the matcher. Even so, Schober and Clark found that the matchers were significantly more accurate than the overhearers: “Matchers started out with 95% correct on Trial 1, and, by Trial 6, they all matched every reference correctly. In contrast, overhearers started out with only 78% correct and only improved to 89% by the last trial” (p. 223). This supports the idea that optimal understanding involves joint activity between speaker and addressee. Standard experimental tasks used to probe the meaning of expressions lack a collaborative component. They may therefore capture only a small slice of typical linguistic understanding, namely the understanding available to overhearers. They would miss the optimal form of understanding that requires collaboration between speaker and addressee.

How could this conversational paradigm be applied to the investigation of “knows” and knowledge? One approach would be to adopt the interview methodology used by Niedzielski and Preston (2000), in which trained fieldworkers recorded open-ended conversations with ordinary speakers about linguistic topics. It would be straightforward to prompt participants to discuss the meaning of “know”, and to steer the conversation towards specific topics of theoretical interest (stakes sensitivity, what participants think of Gettier-style cases, and so on). This kind of approach would have to take steps to avoid the obvious risk of experimenter bias. But it has the potential to reveal not just how participants apply “know” to particular cases, but also their higher-level beliefs about “know” and knowledge.Footnote 20

A different approach would be to adopt a design similar to that used in Schober and Clark (1989). Pairs of participants would be confronted jointly with standard stimuli about “know” (Gettier cases, stakes-sensitivity cases, and so on), and asked to discuss how to classify the cases. That type of design would have the advantage of yielding both “extensional” data about classification, as well as constrained contexts in which to observe meta-linguistic “intensional” data [and potentially new examples of “meta-linguistic negotiation”—see Plunkett and Sundell (2013)] about the meaning of “know”.

5 A revised challenge from ordinary language

The three experimental case studies discussed above provide some empirical support for Baz’s first argument that answers to the “theorist’s question” may not give us an accurate picture of the concepts that speakers employ (knowledge, e.g.) in ordinary circumstances. With these experiments in mind, I propose a new version of the argument from ordinary language as follows:

  1. Standard experimental approaches to the investigation of philosophically significant concepts assume that stripping away conversational or “pragmatic” factors from the experimental context yields a clearer picture of the underlying concepts.

  2. But consider experimental studies in more “ecologically valid” contexts. These may include (i) motivations that go beyond just wanting to perform the experimental task, (ii) participants’ lack of awareness that they are taking part in an experiment, or (iii) an experimental task that involves active collaboration between speakers and addressees. Such contexts may not interfere with or distort the application of the relevant concepts, and they may in fact provide better conditions for applying them. (At least: we don’t yet have a reason to think that by stripping out standard features of ordinary situations in which a concept is applied, we get a more accurate picture of how that concept functions.)Footnote 21

  3. So drawing conclusions about philosophically significant concepts solely on the basis of answers given to the “theorist’s question” in experimental contexts that lack (i–iii) is, so far, unjustified.

The conclusion of this revised challenge from ordinary language to standard experimental ways of investigating meaning is less radical than Baz wants: it doesn’t establish that there is a “fundamental” difference between the theorist’s question and ordinary questions, and it could turn out that these factors (i–iii) only matter in certain cases, and that, say, the way we understand the word “know” isn’t sensitive to different motivations or conversational “points”, or whether people are aware that they are participating in an experiment, or whether the word is used in a collaborative conversation or just an utterance that is directed to mere overhearers. But one advantage of this revised argument is that it does not depend on any contentious (Wittgensteinian or otherwise) conceptions of meaning and understanding in general—it is a challenge grounded in experimental data and some (hopefully not overly contentious) features of non-experimental conversation.

6 Conclusion: “Nobody would really talk that way!”

The revised challenge from ordinary language can be viewed as a modest branch of the critical project in ordinary language philosophy. Endorsing the argument doesn’t require saying that philosophers are speaking “nonsense” when they diverge from ordinary use (as in Malcolm 1951), or that ordinary speakers do not understand what they are being asked when confronted with Gettier scenarios, because such questions could be understood in any number of ways, and the context in which the “theorist’s question” is posed doesn’t provide a way of selecting among those ways (as Baz argues). But it does require some response if philosophers are going to continue to claim that formal or informal experiments illuminate the lexical meanings and concepts that ordinary speakers employ, or (more ambitiously) that such experiments tell us something about the underlying features of reality those meanings and concepts are about.

One way of responding to the revised challenge from ordinary language would involve designing experiments that probe the meaning of “know” (for example) while incorporating some or all of features (i–iii). These are (i) motivations that go beyond just wanting to perform the experimental task, (ii) participants’ lack of awareness that they are taking part in an experiment, and (iii) an experimental task that involves active collaboration between speakers and addressees. Such a response would require some experimental ingenuity. Some possible designs for experiments investigating “knows” and the concept of knowledge (and possibly knowledge itself) are sketched in Sect. 4.

The quote in the title of this paper comes from a story that Keith DeRose tells about Rogers Albritton. DeRose describes his early attempts to develop pairs of examples that were supposed to illustrate the idea that knowledge ascriptions (“S knows that p”) are context-sensitive. DeRose’s early examples involved ascriptions that appeared to say something true, but which were conversationally inappropriate:

My adviser, Rogers Albritton, objected, as near as I can remember, ‘Nobody would really talk that way!’ I replied that it didn’t matter whether people would talk that way. All I needed was that such a claim would be true, and that certainly was my intuition about the truth-value of the claim. He would have none of that, and answered, quite sternly, ‘Look, if you’re going to do ordinary language philosophy—and that’s what you’re doing here—you’d better do it right’...Albritton never explained to me why the examples should be constructed so that what’s said is natural and appropriate beyond insisting that that’s how ordinary language philosophy should be done. (He seemed to think it a point too obvious to require explanation, and I was not about to ask!) (DeRose 2009, p. 51)

In roughest outline, the critical project in ordinary language philosophy can be summed up as a version of Albritton’s objection: it challenges standard ways of investigating the meaning of philosophically significant expressions that ignore the way people “would really talk”.Footnote 22 This paper proposes a revised argument from ordinary language and recommends enriching standard experimental investigations of “know” and knowledge. Both are intended to focus new attention on what would be required to “do ordinary language philosophy right”, at least in an experimental context.