1 Introduction

Proper names, demonstrative pronouns, certain uses of definite or indefinite descriptions, and possibly some further kinds of expressions enable us to refer to particular things. How they do so and whether different expressions enable us to refer differently has been the topic of much debate in the philosophy of language.

According to Frege-inspired semantics, all linguistic expressions have both sense and reference (Frege, 1892), where the sense of an expression is often taken to be something like its descriptive content. Somewhat simplified, reference would come about by objects’ fulfilling the respective descriptions (cf., e.g., Lycan 2008; Powell, 2010; Bianchi, 2015). Russell (1905, 1911) held that this is true for most of the common referring expressions but that some expressions, which he termed logical proper names, refer without such descriptions simply by our acquaintance with objects. Quine (1982, 1974) held that there are no logical proper names and that reference could only be made within a whole linguistic referential apparatus that centrally includes quantification besides a range of other linguistic constructions. Referring expressions are those which can take the position adequate for variables of quantification within a sentence. In effect, Quine held that singular reference must be analysed in terms of general sentences. Strawson (1950, 1959), while accepting that referring terms have descriptive content, argues against Quine that the use of general sentences cannot be explained without recourse to singular sentences. For Strawson, demonstratives are indispensable for explaining reference, but demonstratives are not understood like Russellian logical proper names. What a singular term refers to depends on speakers’ and hearers’ ability to identify the expression’s referent within a spatiotemporal scheme. Authors like Kripke (1980), Kaplan (1978, 1989), Perry (1977, 1979; reprinted in Perry 2000), and others have then forcefully argued that at least some expressions refer directly, much like labels attached to particular objects, effectively rehabilitating Mill’s (1843) semantic theory (cf. Powell, 2010; Ponte & Korta, 2017).

While many philosophers of language accept that Kripke et al. have shown that “the Frege-Russell (disguised (truncated)) description theory of names” (Almog et al., 2015, p. 356, fn7) is unsuccessful, the direct-reference outlook raises many pressing questions that remain largely unanswered. As Bianchi (2015, p. 2) points out, direct reference does not offer a unified theory of linguistic meaning that consistently deals with all kinds of expressions. Moreover, direct-reference accounts raise questions about the relation between language and cognition. While for Frege-inspired semantics, an expression’s cognitive import consists in grasping its sense, direct-reference accounts are not equipped with any semantic feature that could play a cognitive role. Direct reference is just that: direct. Correspondingly, direct linguistic reference appears to depend on an unexplained cognitive capacity to refer to objects. Furthermore, there is a range of linguistic expressions whose meaning direct-reference-inspired accounts have difficulty spelling out, such as propositional attitude reports or empty names, or which have not been extensively dealt with, such as verbs and pronouns or indefinite descriptions.

Among the philosophical accounts of reference, Quine’s (1974) The Roots of Reference stands out in offering an integrated account of the acquisition of linguistic reference and object individuation. Quine argued that reference is not acquired piecemeal (singular term by singular term, so to speak) but by acquiring a whole referential apparatus. Based on a non-referential ability to distinguish bodies and to “individuate” bodies that belong to the same sortal kind, the pivotal step is the acquisition of quantification. Above all, Quine’s great merit is that he strives for a detailed attempt to explain the development of reference to objects. A developmental account can shed light on how speakers can use the referential resources of their language and constrains which theories of reference are cognitively plausible.

Cognitive and developmental psychologists have taken up the question of how we can refer to objects. While several philosophical accounts are on offer of how objects are extracted from sensory stimulation (e.g., Peacocke 1992; Pylyshyn, 2001, 2007; Leslie et al., 1998; Burge, 2009, 2010; Butterfill, 2020), it is currently common in developmental psychology to rely on object files to explain performance on object individuation tasks (cf., e.g., Stavans et al., 2019, for a recent attempt at integrating various findings). This approach fits well with direct reference. Learning a name (or another referring expression) would consist in associating the word with an object file representing the object to which the name (or another referring expression) directly refers. According to this picture, the development of reference would consist in developing a functioning object file system and associating referring expressions with object files. Referring expressions would enable us to speak about objects much like name tags enable us to pick out persons at a conference.

Nonetheless, several aspects of Quine’s theory have been tremendously influential in the empirical research on the development of object cognition and the acquisition of concepts. Especially in the predominating work of Carey, Xu, and their numerous collaborators, aspects of Quine’s work have been used to structure research questions and to explain empirical findings (Carey, 2009; Xu & Carey 1996; Xu et al., 2004; Xu, 2007). Central to this empirical work is the idea that sortal concepts individuate (see Quine 1974, p. 53 ff.). Moreover, Quine stresses that individuation of bodies by sortals is an intermediate step and that reference is only acquired when critical aspects of the referential apparatus are learned. Nonetheless, theorists are commonly quite content to explain reference to objects in terms of an ability to track bodies and a subsequently acquired ability to identify the different bodies that fall under a sortal concept (e.g., Cacchione & Rakoczy 2017).

The prominence of the direct-reference approach notwithstanding, Millian approaches to meaning are confronted with numerous objections (Church, 1949; Carnap, 1947; Quine, 1974; Searle, 1958; Strawson, 1959; Wittgenstein, 1969; Tugendhat 1976; Evans, 1982; Montague, 1970, 1973; Chomsky, 2000; Hinzen & Sheehan, 2015; Dickie, 2015; Glauer & Hildebrandt, 2021), and the currently dominant approaches to object individuation in developmental psychology have likewise come under attack (e.g., Cohen et al., 2002; Krøjgaard, 2000; Krøjgaard et al., 2013; Hildebrandt et al., 2020, 2022).

In this article, we want to critically re-assess Quine’s account of the acquisition of reference and object individuation. While we are not going to argue that incorporating Quine’s whole approach of putting quantification centre-stage will solve theoretical problems in the theory of reference, an analysis of why Quine’s approach is not successful will shed light on what is required for the acquisition of reference and object individuation. We are going to start our discussion with a brief presentation of the central stages of the acquisition of reference, according to Quine (1974). Our critique will then proceed in three steps to show that Quine effectively presupposes what he sets out to explain, namely, the individuation of objects (see Harman 1975, for a similar line of argument and Peacocke 1978, regarding reference to abstract objects). We are going to argue (i) that sortals do not individuate, (ii) that bodies are already objects, and (iii) that the acquisition of variables of quantification presupposes identity. The results will shed light on what a theory of reference has to incorporate, namely, an explanation of the spatiotemporal individuation of ordinary objects. This result is independent of whether reference is direct.

2 Quine on reference

In his seminal The Roots of Reference, Quine (1974) has presented a detailed account of how linguistic reference is acquired from a system of non-referring linguistic expressions through simple, domain-general learning mechanisms. Our developed adult language, especially the language employed in scientific theory, affords a referential apparatus. Pronouns, tenses, and copulas belong to this apparatus, as do quantification and relative clauses. It enables us to talk about particular things—concrete or abstract—and formulate general truths holding for whole domains over which we can now quantify. Quine’s goal is to explain how we come to handle such a referential apparatus starting only with the means to acquire an indicator language, i.e., a language that allows signalling the presence of stimuli of a certain kind.

In presenting Quine’s account, we will distinguish three stages. The first stage comprises acquiring an indicator language and its extension to uses where a word’s sensory basis is absent. The second stage consists in acquiring sortals which involves the individuation of bodies. In the third stage, substituting variables is learned via acquiring relative clauses. Thereby, quantification and the ability to refer to objects are acquired.

2.1 Stage 1: signalling language extended

According to Quine, language acquisition sets out by learning to assent to and produce expressions strictly under those circumstances where sensory stimuli are present that belong to the similarity basis of the expression. A significant step in acquiring words is determining the boundaries of their similarity bases. By recognising the presence of a stimulus that belongs to the similarity basis of a word, speakers can be led to produce a word or to assent to another’s production of a word. “Mama” can be uttered when Mama is around, “red” when there is red, and “water” when there is water. A child’s early language is an indicator language.

Furthermore, this stage of language learning already enables the combination of words to form complex phrases such as ‘yellow paper’ or ‘smiling Mama’ (Quine, 1974, p. 59 ff.). Such combinations are similarly bound to the usage circumstances of their constituents. A combination of words can be uttered when the conditions for each constituent word are met in a certain way. In the case of ‘yellow paper’, a yellow patch and a paper pattern must occur within the same bounded region of the stimulus. According to Quine, such combinations can already result in expressions that can be assented to under all conditions (ibid., p. 65 f.). The assertability of ‘snow is white’ (or, initially, ‘snow white’) does not depend on the occurrence of white snow. The assertability conditions of ‘snow’ are simply such that the assertability conditions for ‘white’ are also generally met. Such combinations can be assented to under all conditions and comprise the first standing sentences.

In the same vein, further forms of combination, for instance those involving certain prepositions, can likewise be learned based on recognising similarities between the circumstances under which they are used. ‘Mama is in the garden’ and ‘Fido is in the house’ share some relation of containment of stimuli (ibid., p. 61). Similarly, truth functions can be approximated by inductively generalising speakers’ verdicts on uses of truth-functional junctors. Speakers might assent to, dissent from, or abstain from compound sentences. Moreover, for each junctor, a verdict of assent, dissent, or abstention on the whole sentence goes along with a characteristic pattern of assent, dissent, or abstention regarding its components. For instance, in the case of disjunction, speakers become disposed to dissent from each disjunct when the disjunction is dissented from and to assent to the whole disjunction when at least one disjunct is assented to. The disjunction is abstained from when one disjunct is dissented from and the other abstained from. Nevertheless, observed verdicts on the truth-functional disjunction leave it open whether the whole sentence is to be abstained from when both disjuncts are abstained from. Verdicts are more primitive three-valued counterparts of truth functions (ibid., p. 75 ff., also cf. Harman, 1975).
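The verdict pattern for disjunction described above can be tabulated in a short sketch. The names `Verdict` and `disjunction_verdict` are illustrative assumptions, not Quine’s notation, and returning `None` is merely one way of marking the case that observed verdicts leave open.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ASSENT = "assent"
    DISSENT = "dissent"
    ABSTAIN = "abstain"


def disjunction_verdict(p: Verdict, q: Verdict) -> Optional[Verdict]:
    """Verdict on 'p or q' given the verdicts on its two disjuncts.

    Returns None for the one case the text describes as left open.
    """
    if Verdict.ASSENT in (p, q):
        return Verdict.ASSENT   # at least one disjunct is assented to
    if p is Verdict.DISSENT and q is Verdict.DISSENT:
        return Verdict.DISSENT  # both disjuncts are dissented from
    if Verdict.DISSENT in (p, q):
        return Verdict.ABSTAIN  # one dissented from, the other abstained from
    return None                 # both abstained from: not settled by observed verdicts


# Print the full three-by-three verdict table.
for p in Verdict:
    for q in Verdict:
        print(p.value, q.value, disjunction_verdict(p, q))
```

The single undetermined cell is what separates this verdict table from a two-valued truth table.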

As Quine points out, this enables quite some detachment from observation situations. We do not just learn to assent to ‘Snow is white’ because we have been exposed to episodes in which an adult assented to ‘Snow is white.’ We can productively combine words we have only been exposed to separately. The result is a language that can serve far more than merely signalling the presence of a certain kind of stimulus but which nonetheless contains no means to refer to objects.

2.2 Stage 2: individuation of bodies by general terms (sortals)

For Quine, language acquisition is becoming tuned to the similarity bases of words and their combinations. Adequate uses of mass terms—which apply to uncountable masses like water or yeast—as well as words that apply to only one thing, like names (‘Mama’ or ‘Fido’), can be learned simply by associating them with occasions in which stimulations that belong to the similarity basis for these words are present. In the case of ‘Mama’ or ‘Fido’, the similarity basis consists of all the views from arbitrary perspectives and partial occlusions, impressions from various modalities, and the continuities between subsequent impressions and over brief stretches when sensory contact is interrupted. Other words are based on similarities between pairs of stimuli (e.g., ‘darker than’) or on specific combinations of stimuli (e.g., ‘in’ or ‘is’ as in ‘Mama is in the garden’ or ‘snow is white’).

According to Quine (ibid., pp. 55 ff.), the acquisition of a word like ‘dog’ cannot proceed in the same way by associating it with a similarity base consisting of such presentations and continuities. General terms like ‘dog’ are associated with a second-order similarity between the similarity bases for each dog. Fido looks (sounds, feels, etc.) one way, Aliki another, and Laika looks still another way. What holds the applications of ‘dog’ together are the similarities between these similarities associated with each dog.

Expressions like ‘dog’, ‘Fido’, and ‘Mama’ differ from expressions like ‘water’ or ‘red’ in that masses and colours are not stably bounded. Pouring two portions of water together still makes water. Moreover, ‘red’ can be assented to whenever red is present, irrespective of its shape. What counts as a dog or as Mama, on the other hand, is very much a matter of shape, of their relatively stable boundaries. ‘Dog’ and ‘Fido’ are still different in that Fido always happens to be there only once. In the case of dogs, however, several dogs can be around, each counting as a dog. They cannot be “poured or lumped together.” Each complete dog provides an occasion to utter ‘dog’ or assent to ‘dog’ utterances. Prolonged pointings accompanying utterances of ‘this is a dog’ must follow one dog. They may not jump from one dog to another (ibid., p. 56). By saying ‘dog’, a presentation is individuated as a dog (a body that belongs to the class of dogs) because the similarity basis of ‘dog’ consists of the second-order similarities between the similarities of presentations of each dog.

Moreover, Quine holds that ‘dog’ can be reduced to ‘same dog as’ (ibid., p. 57 f.). Utterances of ‘same dog as’ are accompanied by pairwise pointings that happen to highlight points that always lie on one particular dog. Other particulars, for instance, geometrical figures, could not serve the same purpose. They can overlap, meaning that one point could lie on several such figures. Besides our patent “body-mindedness” (ibid., p. 54), this is why individuation sets out with bodies and not with abstract particulars like geometrical figures. According to Quine, in acquiring expressions like ‘same dog as’ lies “the inception of the identity predicate” (ibid., p. 58). With their built-in individuation of bodies, the application of sortals allows us to speak about particular bodies of a certain kind.

2.3 Stage 3: objective reference—variables and quantification

Quine, however, sees the application of a referential apparatus “all across the board” (ibid., p. 89) as the central aspect of our sophisticated scientific theory. His main goal is to explain how we acquire the referential linguistic apparatus that enables us to state identity between unspecified terms and refer to any object whatsoever, including what are traditionally considered abstract objects like numbers, sets, functions, or universals. According to Quine, two significant developments lead to acquiring a referential apparatus from the described initial language. The first is learning to use variables. The other is that quantification becomes objectual (ibid., p. 101).

Variables are expressions that serve as placeholders for other expressions. Their importance for a theory of reference can easily be overlooked because they are usually not considered referring terms themselves. However, Quine claims that the variables of quantification take a principal role in our referential scheme. Variables define the position that referring terms can take in a sentence by stripping them of any attribution.

In Quine’s explicitly speculative story (Footnote 1) of how variables enter a learner’s language, the use of variables begins with the acquisition of relative clauses. Relative clauses are initially learned in simple constructions. ‘I bought Fido from a man that found him’ can be turned into a relative clause by substituting ‘that’ or ‘which’ (or, we might add, ‘whom’) for ‘Fido’ and moving it to the front: ‘whom I bought from a man that found him.’ This relative clause can then be applied to ‘Fido’, whereby the initial sentence is reaffirmed (ibid., p. 89): ‘Fido is whom I bought from a man that found him.’ Sentences containing such relative clauses have an analogous structure to ‘Fido is a dog’ such that the relative clause can be seen as a complex predicate.

Relative clauses form sentences that can be asserted in just the same conditions as the original sentence. Both sentences are equivalent. According to Quine, through numerous exposures to such pairs of sentences, the language-acquiring child learns that terms appearing in the argument position of a predicate can be replaced by a relative pronoun to form constructions that can then be used as predicates.

Quine notes that another construction serves the same role as relative clauses but does not require moving the pronoun to the front. Such-that constructions preserve word order. This brings out more clearly that these constructions are formed by substituting pronouns for nouns. The above sentence can thus be rendered as ‘Fido is such that I bought him from a man that found him.’ Something like the variables of formal logic comes about because, as Quine puts it, “[t]angles of cross-reference quickly arise” when a sentence contains several such constructions. ‘Fido is such that I bought him from a man such that he found him’ cannot unambiguously be disentangled. Variables then serve to mark the dependencies on ‘such that’. ‘Fido is an x such that I bought x from a man y such that y found x’ or ‘Fido is a thing x such that I bought x from a man y such that y found x.’ Quine imagines this rectification to be the essence of the relative clause. By learning relative clause constructions, children acquire (implicit) variables.

Quine emphasises on several occasions that these variables are to be understood substitutionally, meaning their role is confined to their replaceability by other expressions. To make this clear, Quine abbreviates ‘a is a thing x such that’ as ‘a vicè x’. When applied to a sentence F(a), in this abbreviated form, the result is ‘a vicè x F(x)’. Furthermore, by “dissociating our ‘a’ and ‘x’ from the category of singular terms” (ibid., p. 95), it can be seen that vicè “makes sense for any grammatical category. You could transform ‘How do you do’ and say ‘Do vicè x How x you x’” (ibid.). According to Quine, the variables acquired by learning to use relative clauses do not range over a domain of objects. They stand in for words (Footnote 2).
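The purely substitutional role of these variables can be illustrated with a minimal sketch that treats ‘a vicè x F(x)’ as token replacement and nothing more. The function name `apply_vice` is a hypothetical label introduced here for illustration; no domain of objects is involved anywhere in the code.

```python
def apply_vice(expression: str, variable: str, template: str) -> str:
    """Replace every whole-word occurrence of `variable` in `template` by `expression`."""
    words = template.split()
    return " ".join(expression if word == variable else word for word in words)


# The substitution works for any grammatical category, not only singular terms.
print(apply_vice("do", "x", "How x you x"))
print(apply_vice("Fido", "x", "I bought x from a man that found x"))
```

Because the operation is defined on strings alone, it yields a sentence for the verb ‘do’ just as readily as for the name ‘Fido’.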

The detachment from any domain of reference, together with its predicative structure, allows the analogical transfer of the such-that construction into other sentential contexts. Most notably, they can be used in the context of categorical statements. Already before the acquisition of variables of quantification, the universal affirmative categorical form [Every α is a β] can be learned by figuring out the unifying similarity between its instances, for example, ‘Every dog is an animal’, ‘All cars have wheels’, etc. Sentences of the form [Every α is a β] can be asserted insofar as the assertability of α is always accompanied by the assertability of β (ibid., p. 66) (Footnote 3). By employing predicative such-that constructions in universal categoricals, the sentence form becomes ‘Everything x such that x is α is an x such that x is β.’ This introduces variables into categorical sentential contexts and thereby creates universally quantified sentences. Existential quantification results analogously from particular categoricals: ‘Something x such that x is α is an x such that x is β’ (ibid., p. 97 ff.).
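For orientation, the two such-that forms can be set beside their conventional renderings in modern first-order notation. This notation is supplied here as an assumption about the intended reading and does not appear in the passage discussed; the universal categorical is given first, the particular categorical second.

```latex
\[
  \forall x\,(\alpha x \rightarrow \beta x)
  \qquad\qquad
  \exists x\,(\alpha x \wedge \beta x)
\]
```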

From this substitutional understanding of variables and quantification, it is one further step to objectual reference. Because categorical constructions are learned from numerous exemplary categoricals involving sortals (which have individuation built-in), and because we do not generally have singular terms for all individuals classified by a sortal, the variables of quantification must range over objects, not just expressions. By saying ‘All dogs are animals’, we say that all instances of the similarity base of ‘dog’ are associated with instances of the similarity base of ‘animal’. In the substitutional sense, this would require a singular term for each dog that could be substituted for the quantification variables. However, Quine maintains, we do not have these singular terms at our disposal and can nonetheless make quantified statements about dogs. As he puts it: “The namelessness of apples and rabbits was what showed us that our variables had gone objectual” (ibid., p. 103). According to Quine, this is reference in the full sense because the objects over which variables range are stripped of any sortals that would individuate them. Saying which kind of things they are is only taken over by the predicates employed in a quantified statement.
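The role that namelessness plays in this step can be made vivid with a toy model in which the substitutional and the objectual reading of a universal sentence come apart. Everything in the sketch is a hypothetical illustration: the three-dog domain, the predicate, and the function names are assumptions introduced here, not part of Quine’s account.

```python
from typing import Callable, Dict, Iterable

# Hypothetical toy domain: three dogs, only two of which bear names.
is_spotted: Dict[str, bool] = {"dog_1": True, "dog_2": True, "dog_3": False}
names: Dict[str, str] = {"Fido": "dog_1", "Laika": "dog_2"}  # dog_3 is nameless


def every_substitutional(names: Dict[str, str], pred: Callable[[str], bool]) -> bool:
    """True iff the predicate holds for every substitution of an available name."""
    return all(pred(bearer) for bearer in names.values())


def every_objectual(domain: Iterable[str], pred: Callable[[str], bool]) -> bool:
    """True iff the predicate holds of every object in the domain, named or not."""
    return all(pred(obj) for obj in domain)


def spotted(dog: str) -> bool:
    return is_spotted[dog]


print(every_substitutional(names, spotted))  # only the named dogs are checked
print(every_objectual(is_spotted, spotted))  # the nameless dog is checked as well
```

In this toy case, ‘Every dog is spotted’ comes out true on the substitutional reading and false on the objectual one, solely because one dog has no name.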

3 Critique

Starting with the idea that sortals individuate, we are going to argue that Quine employs lines of reasoning at each stage of acquisition that effectively presuppose what is to be explained, namely, reference to objects. Concerning sortals, we are not going into the discussion between Quine and Geach (e.g., Geach, 1967, 1973) about relative identity. Our line of argument is more in line with Ayers (1974), who held that the notion of an object is fundamental to human cognition and provides identity criteria, although ‘object’ is not a sortal. Correspondingly, objects appear to have sortal-independent individuation criteria.

3.1 Sortals do not individuate

Quine holds that individuation is acquired by learning to use sortals (see Stage 2 above). This idea strongly influenced empirical research and received support, for instance, from Xu (2002), who argues that linguistic labels play a role in acquiring sortal concepts in infancy. Moreover, sortal individuation is taken to be so central that it is even attributed to infants in a pre-linguistic stage (Cacchione et al., 2016; Mendes et al., 2008, 2011; Santos et al., 2002), and it is argued that it might apply to certain primates (Phillips et al., 2010). In this section, we are going to critically discuss the idea that sortals individuate.

According to Quine (1974), as opposed to names, sortals have second-order similarity bases. Furthermore, sortals, as opposed to names, individuate because they may apply to several cohesive patterns simultaneously, each of which comprises a complete exemplar of the similarity basis of a sortal expression. Moreover, ostensions accompanying their learning “must go with the grain” (ibid., p. 55 f.).

There are several problems with holding that sortals, in the way presented by Quine, serve to individuate bodies (Footnote 4). (i) Irrespective of how we spell out the second-order similarity that allegedly distinguishes sortals from mass terms and names, we do not achieve the individuation of exemplars of a kind. (ii) The simultaneous presentation of several complete members of the similarity basis does not help individuation. (iii) Different kinds of ostensions cannot serve to disambiguate between mass terms, names, and sortals.

3.1.1 Second-order similarity bases

One way of characterising second-order similarities is to say that second-order similarities are similarities between similarities. Say, we have two balls. These two balls are similar concerning their shape. Two cubes are equally similar concerning their shape. Cubes and balls do not have the same shape. Nonetheless, the two pairs are similar regarding the respect in which their members are similar, i.e., shape. In this sense, the similarity of the objects in each pair is similar. However, such a second-order similarity is not what makes different dogs similar to each other. Dogs have similar shapes, smell and sound similarly. They are not just similar concerning how each presentation of a dog is similar to another presentation of that dog.

Learning to use any expression is to become tuned to the correct similarity basis associated with the expression. All presentations that become associated with an expression must be similar in the proper respect. The case is no different for sortal expressions. Although, taken together, the respects in which all presentations of dogs are similar differ from the respects in which all presentations of Fido are similar, this is not a matter of dog similarities being higher-order similarities. Dogs have four legs, fur, pointy teeth, and a tail, and they bark. Fido has all that, and in addition, his fur is spotted and he makes these funny noises when someone rubs his tummy. In this sense, both similarity bases are first-order. The respects in which all dogs are similar comprise a subset of the respects in which impressions of Fido (or any other particular dog) are similar.

There might be another sense in which ‘dog’ has a second-order similarity basis as compared to ‘Fido’. As Quine puts it: “Already in learning the name ‘Fido’ the child depended on the similarity between one presentation of Fido to another, and of one phase of a sustained presentation of Fido to another. In learning the general term ‘dog’ he has to appreciate a second-order similarity between the similarity basis of ‘Fido’ and the similarity bases determining other enduring dogs.” (ibid., p. 56) The idea is that children first learn the similarities among presentations of individual dogs, and then they learn to discern the similarities among these similarities.

However, we must be cautious not to use objects to draw the boundaries between the alleged first-order clusters of similarities. Quine emphasises that the child does not yet individuate objects at the stage of language learning where names and mass terms are acquired. Objects are only individuated by sortal expressions. This means that a presentation’s being of a particular dog (e.g., Fido) cannot be involved in demarcating the boundaries of the similarity basis of the expression. Put differently: All the child has at her disposal to decide which word to use are similarities between what she sees, not the objects that are similar. What, then, could delimit the similarity bases among which second-order similarities are discerned in a way that could serve to individuate the exemplars of a sortal expression?

We can try to spell out the idea by postulating that the similarities between presentations of one dog are commonly closer than the similarities between presentations of several dogs. The idea then seems to be that children get tuned to narrow similarity bases first and then learn that these narrow similarities are all similar, forming a broader similarity basis. Because this broader similarity is extracted from narrow similarities, it is deemed second-order. However, one difficulty is that presentations of one object need not be more similar to each other than presentations of different objects. For instance, two different black dogs lying on the lawn might be more similar than two subsequent presentations of the same dog, first lying, then standing. Based on similarity alone, the similarity bases a child might acquire need not correspond to the objects classified by a sortal expression.
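The black-dogs example can be given a toy numerical form. The feature dimensions and values below are made-up assumptions, chosen only to show that a plain distance measure over presentations need not group them by object; the object labels in the keys are for the reader and are not available to the learner.

```python
import math

# Hypothetical feature vectors: (darkness, height of outline, elongation).
presentations = {
    ("dog_A", "lying"): (0.90, 0.20, 0.80),
    ("dog_A", "standing"): (0.90, 0.70, 0.50),
    ("dog_B", "lying"): (0.88, 0.22, 0.78),
}


def dissimilarity(p, q) -> float:
    """Euclidean distance between two presentations in the toy feature space."""
    return math.dist(presentations[p], presentations[q])


same_dog = dissimilarity(("dog_A", "lying"), ("dog_A", "standing"))
different_dogs = dissimilarity(("dog_A", "lying"), ("dog_B", "lying"))

# With these values, two different dogs are closer than two views of one dog.
print(different_dogs < same_dog)
```

A learner who grouped presentations by closeness alone would, on these numbers, put the two lying dogs together before it put the two presentations of the same dog together.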

At first, this might not appear to be a problem. After all, a language-learning child need not get all boundaries between objects right from the outset. However, the difficulties run deeper. A term’s similarity basis is the result of determining the relative similarity between presentations. Any two subsets of a set of presentations can stand in the narrow-to-wide second-order similarity considered here. This can lead to a wide range of different partitionings into narrow similarity bases and, correspondingly, to different kinds of “objects” individuated by a sortal. Without a further criterion or a special kind of similarity, no division into narrow similarity bases is privileged over the others. Thus, by themselves, narrow similarity bases do not ‘select’ the bodies that are classified by a sortal.
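How quickly the candidate partitionings multiply can be sketched by enumerating every way of dividing a small set of presentations into blocks. The generator below is a standard recursive construction used here only for illustration, and the four presentation labels are hypothetical.

```python
from typing import Iterator, List


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` as a list of non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give `first` a block of its own.
        yield [[first]] + smaller


presentations = ["p1", "p2", "p3", "p4"]
all_partitions = list(partitions(presentations))
print(len(all_partitions))  # number of candidate divisions into narrow bases
```

Already for four presentations there are fifteen ways of grouping them, and nothing in the enumeration itself marks one grouping as the division into bodies.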

More importantly, the similarity basis of any expression can be partitioned into more narrow similarity bases, irrespective of whether it is a sortal, proper name, or mass term. Thereby, ‘red’ could, for instance, be seen as a sortal for different shades of red, ‘water’ for different ways water might appear, and ‘Fido’ for different temporal stages of Fido. By itself, the closeness of similarity cannot distinguish these kinds of terms and, therefore, cannot ground that sortals have their individuation built-in.

There is still a more general difficulty in the idea that individuation resides in second-order similarities. Recall that Quine holds that terms for individual bodies do not individuate. Individuation only comes in when sortals are learned. Sortals enable children to appreciate that any dog is just one among many dogs. Allegedly, second-order similarity is required to distinguish individual dogs as it brings together all the dogs.

However, if distinguishing one dog from another is at the heart of acquiring second-order similarities, learning to use ‘Fido’ would likewise have to rest on second-order similarity. After all, if one is to use ‘Fido’ correctly, one has to distinguish Fido from all the rest (implicitly, this includes all the other dogs). Quine himself notes that there is a hierarchy of ever more encompassing second-order similarities that provide the bases for ever more general terms: from ‘dog’ and ‘rabbit’ via ‘animal’ to ‘body’ or ‘thing’ (Quine, 1974, p. 56). At the same time, he holds that the similarity basis of names like ‘Mama’ or ‘Fido’ involves their boundedness, cohesion, and the constancies that make for Mama’s and Fido’s being bodies. Indeed, these continuities are likewise critical to distinguishing one dog from another. Distinguishing Fido, then, would appear to rest on some encompassing second-order similarity associated with bodies or things. It is hard to see why sortals should help to distinguish individual dogs while names do not help distinguish individual bodies. This undermines the idea that sortals individuate because of their second-order similarity bases.

More seriously still, if we are not to dogmatically claim that the initial ontology must be a bodies-ontology, learning to use a mass term like ‘water’ or ‘red’ would likewise appear to involve second-order similarities that allow a speaker to distinguish water and red from all the rest, thereby individuating colours, liquids, or whatnot.

3.1.2 Simultaneous presentations of several dogs

Letting go of the idea that individuation rests on second-order similarity, we can still discern another idea about what the built-in individuation of sortals might rest on. While there is always only one Fido, red, sugar, or water presentations can come in simultaneous scattered portions. “Similarly we can be confronted by many dogs at once” (Quine, 1974, p. 55). Note that sortals are distinguished from names in that presentations can be scattered over several simultaneous portions of the sensory field. However, there is no reason to think that the simultaneous presentation of several dogs would promote the acquisition of individuation.

Recall that the language-learning child does not yet grasp that ‘Fido’ refers to just one object while ‘dog’ classifies many objects. Nor does the child know that we call ‘one object’ what moves continuously and can appear only once in one’s sensory field. Correspondingly, the child does not know that, unlike ‘dog’, which can apply to many, ‘Fido’ denotes a single object. Learning to apply ‘Fido’ to just one dog involves narrowing down the similarity basis associated with Fido. And learning to apply ‘dog’ to any dog involves becoming sensitive to the similarities among all dog presentations. This does not require the simultaneous presentation of several dogs, nor does the simultaneous presentation appear to make a difference. If one does not already know what counts as one presentation as opposed to several (which is the guiding assumption), being confronted with several presentations simultaneously will simply lead to attuning to a different similarity basis. To distinguish presentations of just Fido from simultaneous presentations of several dogs, it is not necessary to individuate different dogs. Already on a superficial level, a presentation of Fido and a presentation of several dogs look different. Without presupposing that the former is a presentation of one dog and the latter of several, this difference is merely a difference in patterns. A child learns to use ‘Fido’ and ‘dog’ adequately because both expressions have different similarity bases. The case is no different for synchronic and diachronic presentations.

3.1.3 Protracted ostensions that “go with the grain” and single gestalts

We can find still another idea about how sortals involve individuation. For Quine (1974), the kind of ostensions that accompany the learning of an expression can make all the difference as to which similarity basis becomes attached to the expression. Sweeping gestures of ostension can highlight a prolonged presentation of Mama through various visual distortions. Moreover, when Mama is wearing a red shawl, the similarity bases of ‘Mama’ and ‘red’ can be disambiguated by prolonged ostensions that either follow Mama or her red shawl, depending on which expression is to be learned (Quine, 1974, p. 53). For sortal expressions, Quine notes that: “The great difference between the ostensive learning of a name or a mass term like ‘Mama’ or ‘red’ or ‘water’ and the ostensive learning of a general term like ‘dog’ is that the latter must go with the grain. The sustained dynamic pointing that accompanies the word ‘dog’, or the words ‘This is a dog’ or a still longer pleonasm, must not jump dogs; and it can be protracted and repeated as necessary” (ibid., p. 55 f.).

Partitioning the similarity basis of ‘dog’ into sets of presentations associated with particular dogs would thus result from the kind of ostensions that must accompany learning to use a sortal expression. These ostensions would highlight the single, unified Gestalt that characterises presentations of bodies. The idea seems to be that ostensions to bodies must somehow highlight their cohesion, avoiding mixing up the boundaries of several bodies. Which gestures might achieve this? At first, it might seem that following a body with a pointing gesture might help to highlight its boundaries. We have already seen that following Mama or her red shawl can help disambiguate between Mama and red. But this means that it does not highlight Mama’s boundaries, for we can likewise follow her red shawl as an example of red.

Maybe, then, we can make sweeping gestures to highlight the boundaries of a body, its shape. However, sweeping gestures, wiping over a surface or drawing along the boundaries, can likewise be used to highlight a sample of red. The difference then must be that gestures that highlight presentations of dogs “may not jump dogs”, meaning that presentations of red would be allowed to jump patches of red. When several disparate presentations of red are present, we can point to them successively (or, for that matter, sweepingly) and say ‘red, red, red’ or just one long ‘reeeeed’ or ‘this is red’ (pleonastically to prolong ostension). However, we can likewise point to several dogs successively or sweepingly, saying ‘dog, dog, dog’ or ‘doooooog’ or ‘this is a dog and this another’ (again, pleonastically), much in the same manner as for red, and this would enable a child to become attuned to the similarities among dogs. Moreover, pending an understanding of ‘this is a…’ and ‘this is another…’, even changing the idiom here would not help to individuate dogs.

Quine’s attempt to reduce ‘dog’ to the relative mass term ‘same dog as’ does not help either. According to Quine’s account, ‘same dog as’ is sufficient to individuate because “each dog consists of just the points that are on the same dog as some one point” (ibid., p. 58). Therefore, pairwise ostensions accompanying ‘This is the same dog as this’ point to the same dog twice and can thereby provide examples for the similarity basis of ‘same dog as’. This is contrasted with geometrical figures that could overlap. The points to which an ostension is directed could lie on several figures, undermining individuation if ‘same circle as’ is only accompanied by pairwise pointings. For an account of the acquisition of individuation, however, the difficulty is that a language acquiring child who lacks the notion of identity cannot interpret the pairwise ostensions as two pointings to the same dog. Nonetheless, everything hinges on realising that the two pointing gestures are directed towards points that lie on the same dog. To acquire the similarity basis of ‘same dog as’, the child must realise (implicitly) that the two ostensions are directed towards one body. This is individuation already.

Generally, while ostensive gestures can highlight a region in a scene, they cannot determine the kind of similarity that one is to attend to. This holds for ‘Mama’ and ‘Fido’ just as it holds for ‘dog’ and ‘red’. Ostensive gestures cannot determine whether they are meant to hint at the similarity basis of a name, a mass term, or a sortal.

3.2 Quine’s bodies are already objects

Recall that Quine holds that children do not refer to objects from the outset and that names and mass terms can be learned without individuating objects. Individuation only comes in when sortals are acquired, and, at this stage, it is restricted to the individuation of bodies of a certain kind. Moreover, according to Quine, acquiring the relative mass term ‘same dog as’ is only the inception of identity. The full grasp of identity, and thereby reference to objects, requires the acquisition of quantification. Quine’s whole point is to give an account of how reference to objects develops from an ability to use expressions for bodies in a language without a referential apparatus via learning to use sortals and the variables of quantification. In this section, we argue that Quine’s account effectively presupposes what it is meant to explain, namely, reference to particular objects. The problem is that the individuation of bodies by sortals draws on individuative spatiotemporal characteristics of bodies. At the outset, it will be helpful to clarify what Quine means by saying that “man is a body-minded animal among body-minded animals” (ibid., p. 54) and how this need not involve objectual reference.

3.2.1 Bodies and physical objects

When introducing bodies and their importance for language learning, Quine points out that he sees “little point, now or later, in trying to make a notion of body precise. Bodies are things like Mama and Fido and other animals, also apples, cups, chairs” (Quine, 1974, p. 54). He seems to think that the notion of bodies is unproblematic, giving characterisations of what is important about distinguishing bodies repeatedly. Employing Mama as a paradigmatic example of a body, Quine summarises that “Mama comes as a single Gestalt, unified by a range of continuities: displacement, visual distortion, discoloration.” (Quine, 2000, p. 3; cited from Keil 2002, p. 162, note 10).

On the other hand, objects are not merely associated with presentations that are unified by these continuities. Objects have individuation criteria. For instance, physical objects have spatiotemporal individuation criteria (Quine, 1974, p. 54). Thus, distinguishing bodies does not appear to require individuation but only a “readiness to recognise” characteristic continuities (ibid.).

However, looking closer at the characteristic continuities, we see that bodies are distinguished in spatiotemporal terms, much like physical objects. The continuities centrally include continuities of motion and persistence—two fundamentally spatiotemporal notions. The main difference between bodies and physical objects is that ‘physical object’ does not impose any restrictions on the kind of spacetime regions that can individuate. Physical objects can be arbitrarily gerrymandered, whereas bodies are “clumpy”; they are cohesive wholes. However, allowing arbitrarily gerrymandered spacetime regions for the individuation of physical objects does not mean that more restricted, cohesive spacetime regions associated with persisting bodies do not individuate. According to Quine, both physical objects and bodies have spatiotemporal characteristics. Prima facie, spatiotemporal characteristics should individuate bodies just like spatiotemporal characteristics individuate physical objects. [Footnote 5]

3.2.2 Continuities of displacement and deformation

One might defend Quine’s account by arguing that spatiotemporal properties only provide individuation criteria for cognitive systems that have already acquired identity. Learning to employ our linguistic referential apparatus turns spatiotemporal characteristics into individuation criteria. The ability to individuate at all hinges on the acquisition of variables of quantification.

However, employing spatiotemporal continuities to distinguish bodies must already involve individuation. This is because discerning continuities of displacement and deformation requires a frame of reference against which motion and deformation occur. The possession of such a spatiotemporal frame of reference, in turn, ipso facto serves to individuate objects. ‘Frame of reference’ can be characterised as “a standard relative to which motion and rest may be measured; any set of points or objects that are at rest relative to one another enables us, in principle, to describe the relative motions of bodies. A frame of reference is therefore a purely kinematical device, for the geometrical description of motion without regard to the masses or forces involved” (DiSalle, 2020). Notably, a frame of reference is given by a set of points or objects. These points or objects are distinguishable and re-identifiable by their relative position. That is, they are individuated. Moreover, the frame of reference makes any point re-identifiable by its relative distance within the frame of reference. Spatiotemporal frames of reference that are used for interacting with one’s environment are attached to something featural, and whatever such a frame of reference is attached to thereby becomes something that is re-identifiable within that frame of reference. For instance, our global frame of longitudes and latitudes is attached to Greenwich.

Displacement or deformation cannot be discerned without recourse to a frame of reference that provides individuation criteria for objects because discerning displacement or deformation requires discerning something displaced or deformed. To ascribe motion or deformation, one must conceptualise something that remains identical over time. Without a notion of identity, one could only notice changes in patterns, not displacements or deformations. Irrespective of how regular such changes are, they would not lead to discerning one body as opposed to another and, correspondingly, would not unify the patterns to form a body that is displaced or deformed.

3.2.3 Bodies do not overlap

Note that in describing the “inception of the identity predicate” (ibid., p. 58), Quine uses spatiotemporal individuation criteria without restraint to explain why ‘same dog as’ individuates: dogs consist of all the points that lie on the same dog as any one point, that is, dogs do not overlap. The ostensively highlighted points are distinguished in space by their relative position. This requires a spatial frame of reference, and it implies that the points are unequivocally distinguished from each other and can be re-identified; that is, they are individuated. If the points towards which ostensions are directed were not already individuated, pairwise pointings to one dog would not ensure that the expression individuates. Its similarity basis would just contain a range of non-spatially understood continuities. The individuative force of ‘same dog as’, thus, presupposes that children already recognise that dogs are distinguished by the spacetime region they occupy.

Overall, Quine’s account only appears to be successful because the objects allegedly individuated by a sortal are effectively already individuated by their characteristic spatiotemporal continuities (cf. Harman, 1975, for a similar idea to the effect that spatiotemporal individuation is involved in the attributive combination of expressions). Employing the ‘same dog as’ idiom effectively draws on spatiotemporal individuation criteria for bodies.

3.3 The acquisition of variables presupposes identity

Quine introduces variables via the acquisition of relative-clause constructions and their analogical transfer into categorical sentences. ‘I bought Fido from a man’ [Footnote 6] can be turned into a relative clause by substituting a relative pronoun for ‘Fido’: ‘whom I bought from a man.’ The variable character of the relative pronoun comes out more clearly in Quine’s thing-x-such-that formulation of the relative clause: ‘Fido is a thing x such that I bought x from a man.’ The variables in relative-clause constructions allow for the substitution of ‘Fido’ for the variable ν mentioned in the ‘thing ν such that’ operator. The operator is then transferred into categorical sentences on the analogy between relative clauses and predicates. According to Quine, this gives variables of quantification.

3.3.1 ‘Fido is a thing x such that x is F’

According to Quine, the acquisition of the relative clause and, thereby, of first variables is the result of learning, ‘by abundant example’ (p. 94), the interchangeability of sentences of the form ‘I bought Fido from a man’ and ‘Fido is a thing x such that I bought x from a man’. Quine introduces ‘is a thing x such that’ by first introducing ‘Fido is such that I bought him’ as interchangeable with ‘I bought Fido’ and then claiming that the word ‘thing’ is merely used as an adaptor to bring the adjectival ‘is such that I bought him’ into substantival form ‘is a thing x such that…’.

However, this change in formulation comprises a crucial change in what the sentences mean. This is evident from the functioning of the pronouns, which is entirely different in the two formulations. In the adjectival ‘such that’ formulation, the pronoun anaphorically refers to Fido: ‘Fidoᵢ is such that I bought himᵢ from a man’. In the substantival formulation, on the other hand, the pronoun is a non-referring expression that is bound by a quantifier: ‘Fido is a thing xᵢ such that I bought itᵢ from a man.’

Moreover, it is evident that the two sentences say different things. While ‘I bought Fido from a man’ is a simple predication of the form R(a,b), ‘Fido is a thing x such that I bought x from a man’ would have to be formalised as a quantified sentence of the form ∃x(R(a,x) ∧ x = b). Even without formalising the respective sentences, it is clear that the initial formulation is a straightforward predication, claiming that Fido falls under the predicate following the ‘is such that’ (being bought by me from a man). In the ‘thing x’-formulation, by contrast, x is the subject of the predication, and it is said that Fido is identical to this thing x. This becomes clearer when we reverse the order of the sentential components: ‘A thing x such that I bought x is Fido.’ Contrary to Quine’s affirmation that ‘thing’ is merely introduced as an adaptor, the interchangeability of the above sentences is effectively licensed by the identity of Fido and the thing to which the quantified expression applies. Note that the introduction of variables hinges on this interchangeability.
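The contrast can be laid out schematically. The following is a minimal sketch that keeps the abbreviations used above (R for the buying relation, a for the speaker, b for Fido, with ‘from a man’ suppressed); the labels (1) and (2), the witness c, and the derivation steps are added here for illustration and are not Quine’s own notation.

```latex
% Requires amsmath. R, a, b as in the text; c is an arbitrary witness (illustrative).
\begin{align*}
\text{(1)}\quad & R(a,b) \\
\text{(2)}\quad & \exists x\,\bigl(R(a,x) \land x = b\bigr) \\[4pt]
\text{(1)} \Rightarrow \text{(2)}:\quad & R(a,b),\ b = b \ \vdash\ R(a,b) \land b = b \ \vdash\ \exists x\,\bigl(R(a,x) \land x = b\bigr) \\
\text{(2)} \Rightarrow \text{(1)}:\quad & R(a,c),\ c = b \ \vdash\ R(a,b)
\end{align*}
```

Both directions rely on a law of identity: reflexivity in the first, substitution of identicals in the second. On this reconstruction, the interchangeability of the two sentences is available only to someone who already commands identity.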

3.3.2 ‘Thing x such that’ in categorical sentences

One might defend Quine’s account by claiming that neither sentence can be analysed as having its canonical logical form at this stage of language acquisition. Only in hindsight do the sentences we acquire when we learn to speak receive the logical analysis that is adequate within our referential apparatus.

However, learning to use variables comprises a significant shift in the way we use language, which either presupposes what it is to explain, namely, identity/reference, or remains unexplained. Recall that the initial use of ‘I bought Fido from a man’ rests on having tuned in to the similarity bases of the involved expressions: ‘Fido’, ‘I’, ‘bought’, ‘from’, ‘a man’. The formulations involving ‘such that’ and ‘thing x such that’ are now introduced as mere variants of the same sentence. They have the same complex similarity basis. For the adjectival formulation ‘Fidoᵢ is such that I bought himᵢ from a man’, it is easy to see how it is just another way of putting together the similarity basis of the original sentence. The sentence specifies which expression is to be substituted for the pronoun ‘himᵢ’. ‘himᵢ’ anaphorically refers to ‘Fidoᵢ’ and thereby makes the same contribution to the sentence’s similarity basis. If the equivalence of ‘Fido is such that…’ and ‘Fido is a thing x such that…’ is to hold, Quine must be committed to thinking that ‘is a thing x such that… it’ functions like ‘is such that … him’ in the adjectival formulation and simply provides another way of getting at the same similarity basis. After all, the sentence contains all the same components and some additional expressions claimed not to contribute to its similarity basis. Note that, like the adjectival formulation, the substantival formulation states which expression should be substituted for the variables.

Quine now claims that the relative clause is transferred into categorical sentential contexts on the analogy between ‘is a thing x such that’-constructions and predicates: ‘is a thing x such that x is F’ can be used in the same position as the predicate ‘is F’. ‘thing x such that x is F’ thus appears to be employable in the same position as general terms (e.g., ‘F’) in other sentential contexts, notably, in categorical constructions. The transfer of relative clauses to universal categoricals gives sentences like: ‘Everything x such that x is a dog is a thing x such that x is an animal.’

However, in such categorical sentences, the contribution of ‘thing x such that’ to a sentence’s similarity basis cannot be the same as in the particular relative clause. There, the expression that has to be substituted for the variable(s) is explicitly stated (cf. Richards, 1979). This makes transparent how the similarity basis of the sentence depends on the similarity bases of the involved expressions. However, categorical sentences do not state which expression(s) must be substituted for the variable(s). For instance, we cannot go back and forth between a sentence about particular dogs and a quantified sentence by substituting dogs’ names for variables. It is indeed this loss of eliminability, as Quine puts it, that constitutes the usefulness of ‘thing x such that’-constructions in categorical sentential contexts and the acquisition of variables of quantification.
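The loss of eliminability can be illustrated schematically. The sketch below uses class-abstract notation as a stand-in for ‘thing x such that’; this notation and the predicate letters D (‘is a dog’) and A (‘is an animal’) are assumptions introduced for illustration, while R, a, and b are used as in the formalisation of ‘I bought Fido from a man’.

```latex
% Requires amsmath. \{x : \dots\} stands in for `thing x such that' (illustrative assumption).
\begin{align*}
\text{particular:}\quad & b \in \{x : R(a,x)\} \iff R(a,b) \\
\text{categorical:}\quad & \{x : D(x)\} \subseteq \{x : A(x)\} \iff \forall x\,\bigl(D(x) \to A(x)\bigr)
\end{align*}
```

In the first line, the named term b takes the place of the variable and the abstract disappears. In the second, no term is supplied, so the variable survives on the right-hand side as a bound variable of quantification.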

Nonetheless, Quine’s account leaves unexplained how children come to associate any substitution relation with the ‘thing x’ operator and the variables in a quantified sentence. Recall that children have not yet acquired the notion of a thing as a re-identifiable particular. Nor do they have long lists of names for dogs, say, which could be substituted. While variable substitution in relative clauses consists in interchanging variables and explicitly used singular terms, in the case of categoricals, substitution depends on the acquisition of a different rule which somehow serves to determine what can be substituted for the employed variables. Such a rule effectively involves identity criteria for objects. [Footnote 7]

The decisive step for acquiring variables in categorical sentences is to “decouple” ‘thing x such that’ from the anaphoric relation to ‘Fido’. Thereby, the relative clause is transformed from a classification of Fido into a predication about x. As ‘x’ is variable, this “decoupling” is nonsensical if unsupported by substituting something for ‘x’. We cannot say anything about x. For the sake of argument, we can grant Quine that in the case of our particular example sentence, the substitution is licensed by explicitly mentioning Fido. In the case of the categorical sentence, however, no expression that could be substituted is mentioned, and the category of ‘things’ must license the substitutions that give evaluable predications by substituting one thing after the other and checking whether the sentences’ truth conditions are fulfilled. Thereby, the acquisition of variables of quantification presupposes a domain of ordinary objects. This domain cannot be created by attaching bodies to variables already employed in the substitutional sense.

4 Conclusions

This article critically re-assessed Quine’s account of the acquisition of reference and object individuation. We criticised three main points of his developmental story after a brief presentation of the central stages of the acquisition of reference according to Quine (1974). We argued: (i) that sortals do not individuate, because they do not have second-order similarity bases and all other attempts at arguing that sortals individuate effectively presuppose individuation; (ii) that bodies are objects already because the continuities that hold the similarity bases of expressions for bodies together provide individuation criteria; and (iii) that the acquisition of variables of quantification presupposes identity because variable substitutions are licensed by a concealed appeal to a domain of ordinary objects.

At each step of our critique, we aimed to show that Quine (1974) presupposed identity by relying on individuation criteria for ordinary objects, that is, spatiotemporal individuation criteria. In this light, Quine’s (1974) demanding linguistic account of reference presupposes what it aims to explain. It does not show how reference is acquired in the first place but how reference to ordinary objects is generalised to a referential apparatus in a scientifically stratified way that allows reference to anything we find fits our theoretic needs. This is no small achievement. Nevertheless, it does not live up to Quine’s own objective (e.g., ibid., p. 81 f.).

Although Quine’s account is one of the most prominent attempts at giving an indirect theory of reference, this critique does not permit the conclusion that reference must be direct. However, we take it that our critique is diagnostic of what is required for any successful theory of reference. The success of any such account hinges on explaining how cognitive systems can process spatiotemporal individuation criteria for ordinary objects.

In principle, an explanation of spatiotemporal object individuation could be provided by an account of how cognitive systems process spatiotemporal identity criteria for ordinary objects independently of having acquired referring expressions. In this case, the processing of spatiotemporal identity criteria would have to be explained without committing the misunderstanding that spatiotemporal continuities are just some other perceptual feature that could be discerned, like similarities between colour patches, for instance. Transformations and displacements are no mere perceptual features because they involve an implicit understanding of identity. For a feature change to count as a transformation or displacement, it must be located in a non-featural frame of reference. [Footnote 8]

Alternatively, spatiotemporal individuation and reference could be explained conjointly, making clear how cognitive and linguistic development interleave. Such an approach would have to trace the developmental intermediaries that lead to the incremental acquisition of reference and object individuation. This would include an explanation of how the usage rules of referring expressions are extracted from one’s learning history.