1 Introduction

Lopes (2021) has critiqued my discussion of neural networks and dynamical systems theory in relation to phenomenology (Yoshimi, 2016), with the following argument:

  1. Neural networks employ the empiricist theory of abstractionFootnote 1

  2. The empiricist theory of abstraction entails psychologism

  3. Therefore, neural networks entail psychologism

  4. Neural networks are a species of dynamical systems theory

  5. Dynamical systems theory entails psychologism

  6. Therefore, neural networks entail psychologism

I will use “empiricist theory of abstraction” and “psychologism” in the same way Lopes does. The empiricist theory of abstraction is a philosophical theory that “reduces one’s phenomenological ability to intend types… to one’s past history of intending tokens.” Psychologism is “the doctrine that the laws of mathematics and logic can be reduced to or depend on the laws governing thinking.” Since Husserl rejects psychologism, the argument suggests that Husserl cannot be interpreted using neural networks or dynamical systems theory.

Premises 1 and 5 do all the work in Lopes’ argument, and both are false. Thus, the argument is unsound. Unpacking the mistakes in Lopes’ argument provides an opportunity to revisit some old debates in philosophy of cognitive science, in particular debates associated with the “framework wars” that unfolded between different approaches to cognitive science in the 1980s, including classical AI, connectionism, and dynamical systems theory.Footnote 2 Though that war has died down, new skirmishes have taken its place, as new frameworks (and variants on earlier frameworks, like deep learning) vie with each other.

It’s a timely and important topic for readers of this journal. Responding to Lopes provides an opportunity to clarify what a pluralist neurophenomenology looks like, one that incorporates the best insights of multiple paradigms in a common framework, including the most recent advances in such fields as deep learning and Bayesian cognitive science. In a pluralist framework, continuous laws governing the dynamical evolution of sensory expectations can exist alongside formal rules governing language and reasoning. Moreover, these are just two examples within the vast landscape of contemporary cognitive science, where theories and methods at multiple scales complement one another in a coordinated effort to understand the embodied brain and its relationship to cognition and consciousness.

In Section 2, I articulate the form of explanatory pluralism I rely on throughout the paper and develop a few salient features of the view. In Section 3, I argue that neural networks and dynamical systems are philosophically neutral tools and use this observation to disarm the main lines of Lopes’ critique. In Sections 4, 5, and 6 I use historical and conceptual arguments to defend explanatory pluralism as a research strategy in cognitive science, phenomenology, and neurophenomenology. In each case, dynamical systems theory, classical computational theories, and other forms of study complement and inform one another. In Section 7, I address a final component of Lopes’ argument and, in the process, further exemplify the virtues of a pluralist framework, both by showing what a substantive debate in the area looks like and by describing a specific pluralist model of symbolic reasoning.

2 Explanatory pluralism

The background of this paper is a version of explanatory pluralism that was initially developed as an approach to cognitive science. It is the view that “no cognitive or behavioral phenomenon can be exhaustively described by reference to a single temporal or spatial scale or theoretical framework” (Abney et al., 2014). Rather:

different frameworks, paradigms, and tools can be jointly applied to the study of cognition, considered at multiple spatial and temporal scales…The question thus should not be which one scale of analysis or which one theoretical framework is the right one to target and study for a given phenomenon… the issue is which scales and which theoretical frameworks are relevant for the question at hand, and how they relate to each other (Noelle & Yoshimi, 2022).

The topic of explanatory pluralism is extensive; here I focus on a few salient points needed to support my larger argument.Footnote 3

First, the world is studied at many levels and studies at these different levels interact in non-trivial ways. The claim bears emphasis because some deny it. In particular, Fodor—whom Lopes draws on—famously argued that special sciences should be treated as autonomous from lower-level sciences (Fodor, 1997; Fodor, 1974), even if there are ontological dependency relations between them. An analogy clarifies the idea. One can write computer code in a high-level programming language like Python without knowing anything about how electrical circuits work, even though the code is usually run on a computer that is built out of electrical circuits. In the same way, this view goes, one can study the symbolic structures of human cognition without paying close attention to neuroscience, even though human cognition depends on the brain. I have argued against this view elsewhere (Yoshimi, 2011a), as have other like-minded pluralists (Dale, 2008; Giere, 2006).Footnote 4 Across the sciences, researchers at different levels routinely and profitably collaborate. An impressive range of proposals for what these interactions look like in practice has emerged, including: top-down constraints (Abney et al., 2014), bottom-up scaffolding (Abney et al., 2014), overlapping perspectives (Giere, 2006), and heuristic identities (McCauley & Bechtel, 2001).

Second, I define levels in a way that applies both to natural sciences and to phenomenology, and thus facilitates bridging the two. I take a level to be a domain of observable entities or phenomena, where one level N is “above” another level N-1 when the entities in level N-1 are mereological parts of entities in level N.Footnote 5 For example, the level of organs is above the level of atoms, because atoms are mereological parts of organs. Levels of physical reality are associated with “scales”, that is, units of measurement prototypically associated with measurements at that level. I am not assuming that levels are defined by scales; I am making the weaker claim that items at a given level are prototypically associated with certain kinds of measurements or observations, which are in turn associated with standard units of measurement, at least in the physical case. Levels are also prototypically associated with theoretical frameworks and approaches to formal modeling. Thus, in some fields differential equations are common, in others finite state machines. Again, the linkage is non-essential: differential equations can in principle be used in any science but in practice are used where they provide the most insight.

Third, this is not just a type of explanatory pluralism, but also a “perspectival” pluralism (Giere, 2006), according to which we piece together the best picture of reality we can by way of a plurality of perspectives.Footnote 6 I think of it as an updated version of the old story about the blind people and the elephant. Not a bunch of mutually ignorant individuals touching an elephant and coming up with limited theories, but rather a cooperative group of differently-abled individuals collectively developing the fullest picture they can. The idea is well understood within neuroscience, where a range of methods from implanted electrodes to EEG and fMRI are each associated with proprietary temporal and spatial scales and modes of analysis (Bojak & Breakspear, 2014). The same idea applies to phenomenology. As we will see, some phenomenological studies occur at the level of sensory experiences and protentions unfolding on a scale of milliseconds, while other studies occur at the level of symbolic thoughts unfolding on a scale of seconds (and there are other scales besides these). In particular, we will see that the concepts and metaphors used in the literature to describe neural networks and dynamical systems (clouds of meaning, shifting or sculpted similarity spaces, fluxes, etc.) are perfectly compatible with the concepts used to describe computational theories (content identities, rules, universals, transformations, etc.).

3 The philosophical neutrality of neural networks and dynamical systems

Lopes’ first premise is that neural networks employ an empiricist theory of abstraction, and his second premise is that this in turn entails psychologism.Footnote 7 But neural networks are just a formal tool, a model of what neurons or neuron-like elements do when they are wired together. There are hundreds of types of neural network architecture, some geared towards modeling biological neurons, some geared towards building more abstract machine learning models. These tools are logically independent of philosophical theories of abstraction, or more generally theories concerning the ontological status of logic, mathematics, or mental content. In fact, they do not carry any philosophical commitments at all. It is true that neural networks are sometimes used to support philosophical theories, but they are not thereby logically tied together.

One way to see this is to note that people who argue against empiricism, like Fodor, also allow neural networks as part of a theory of how computational mechanisms are implemented in the brain (Fodor & Pylyshyn, 1988). In fact, using neural networks to implement logic gates was a core project of the earliest neural network researchers, McCulloch and Pitts (McCulloch & Pitts, 1943), back in the time before cognitive science—let alone its division into camps—existed. It is not hard to find more examples of theorists (e.g. Chomsky) who are fine with neural networks as tools, but who don’t endorse empiricism.Footnote 8

The situation is similar with dynamical systems theory. Dynamical systems are mathematical objects with a formal mathematical definition. A dynamical system is a map \(\upphi:S\times T\rightarrow S\) that satisfies a few conditions (Hotton & Yoshimi, 2011). Given any initial state in a set of states S and any future time in a set of times T, a dynamical system tells us what future state will occur. The set S is almost always some kind of structured space, e.g. a metric space, or “similarity space”, a fact emphasized by some who use neural networks. What is distinctive about dynamical systems is the visual and conceptual tools used to analyze their state spaces, e.g. phase portraits that show attractors and repellers, or bifurcation diagrams that show how attractors emerge and disappear as parameters are varied.
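
To make the formal definition concrete, here is a minimal sketch in Python (my illustration, not anything from Lopes or the cited works) of a dynamical system as a map \(\upphi:S\times T\rightarrow S\), using the logistic map as the system. The function name `phi`, the parameter values, and the choice of map are purely illustrative.

```python
# A minimal sketch: a discrete-time dynamical system phi : S x T -> S,
# here the logistic map on the unit interval. Illustrative only.

def phi(s: float, t: int, r: float = 3.2) -> float:
    """Return the state reached from initial state s after t time steps
    of the logistic map s_{n+1} = r * s_n * (1 - s_n)."""
    for _ in range(t):
        s = r * s * (1 - s)
    return s

# One of the "few conditions" such a map must satisfy is the semigroup
# property: evolving for t1 + t2 steps equals evolving for t1, then t2.
s0 = 0.1
print(phi(s0, 5))            # state after 5 steps
print(phi(phi(s0, 2), 3))    # same state, reached in two stages
```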

Dynamical systems encompass an extremely broad range of systems. Differential equations are probably the best-known examples of dynamical systems because they are often analyzed using the visual and geometric tools associated with dynamical systems theory. Neural networks are also, as Lopes says, a species of dynamical systems. But many computer programs and even Turing Machines also fit the formal definition. A Turing Machine is a dynamical system on a state space consisting of binary sequences (Delvenne et al., 2004).Footnote 9 Thus, when Lopes says “if one assumes one’s theory of meaning to be explicable in terms of dynamical systems, the only option would seem to be the empiricistic theory of abstraction”, he is contradicting himself, since Turing machines are dynamical systems, Turing machines are the basis of his own preferred theory of mind, and he does not endorse the empiricist theory of abstraction.

Lopes’ mistake is to think that dynamical systems are necessarily empiricist. But neural networks and dynamical systems are logically independent of philosophical theories in metaphysics, epistemology, etc. This should be obvious given what a broad class of systems they encompass. Any science or theory that draws on differential equations, or for that matter deterministic computer programs, is using dynamical systems, i.e. pretty much every science. So Lopes’ fifth premise is false.

Even if neural networks and dynamical systems are philosophically neutral tools, some do make use of them in elaborating philosophical theories. For example, some make a distinction between dynamical systems theory as a formal theory, and the “dynamical hypothesis in cognitive science” (van Gelder, 1998) or the “dynamical stance” (Chemero, 2000), where the latter refer to non-representational forms of explanation that draw on the tools of dynamical systems theory. Although there are valuable insights in these works, I am not endorsing the dynamical hypothesis or stance. As I say in (Yoshimi, 2011b), “ I do not treat dynamical systems theory as an alternative to other approaches to cognitive science (as in, e.g., Van Gelder, 1998), but rather as a mathematical framework that is neutral with respect to cognitive architectures.” So, while I am happy to cite these authors for some purposes, I depart from them when they tie dynamical systems theory to anti-representationalism (Hotton & Yoshimi, 2011; Yoshimi, 2012). I explicitly see value in dynamical systems theory as a philosophically agnostic tool that can be used in multiple ways and at multiple scales in a pluralist framework.

Lopes’ arguments are obviously more plausible if we focus on neural network theorists and dynamical systems theorists explicitly committed to empiricist theories of abstraction and psychologism. But in that case the relevant premises become vacuous, as in “Neural network theories of abstraction that are empiricist are empiricist,” and again, I explicitly distance myself from such views. Lopes thus seems to be faced with a dilemma: his assimilation of connectionism and dynamical systems theory to empiricism is either trivially true, if he ties those tools to philosophical theories, or obviously false, if he accepts that the tools are philosophically neutral.

Lopes seeks to avoid this dilemma by developing arguments according to which using neural networks or dynamical systems theory in cognitive science forces one into an empiricist theory of abstraction. Call these “forcing arguments.” Lopes’ forcing arguments take two broad forms: an egregious form and a substantive form. In neither case do they succeed, but responding to them helps to illustrate some of the larger themes of this paper: that identity and similarity can usefully co-exist in neurophenomenology, and that explanatory pluralism is the right general approach to use when pursuing neurophenomenology.

(1) Some of Lopes’ forcing arguments are based on egregious mis-readings. For example, the claim that neural networks or dynamical systems are never in the same state twice violates the most basic assumptions of both types of model. Similarly for the claim that all that is possible in these models is an account of similarity; that all we get from these models are “similarity amalgams.” We will see that these are fundamental errors: identity conditions follow directly from the definition of neural networks and dynamical systems and are fundamental to their most basic applications. Perhaps Lopes’ claim is that connectionist or dynamical systems models of the brain are never in the same state twice. As we will see, even if that is assumed to be the case for a theoretical model of the brain as a whole, it is not true when we consider parts of the brain.

(2) Some of Lopes’ forcing arguments are substantive but non-decisive. These arguments look at the details of how learning occurs in a specific class of neural networks (feed-forward networks trained using supervised learning) and assimilate these networks to the empiricist theories Husserl critiques. The virtue of this line of argumentation is that it considers the details of neural network architectures and learning algorithms, on the one hand, and how abstraction works in Husserl, on the other. The easy way to see that the argument is not decisive is to note that there are many kinds of neural networks beyond the feed-forward networks Lopes critiques, including neural networks that Lopes himself endorses when he develops his positive account. This is enough to show that neural networks, as tools, can be used in a non-empiricist way. The substantive forcing arguments are considered in more detail in Section 7.

We have seen that premises 1 and 5 of the main argument are false, so that the conclusions (3 and 6) don’t follow. Neural networks and dynamical systems theory do not force us to deny the existence of symbols or to reduce experience to a flux of impressions. Neural networks, dynamical systems theory, and symbolic approaches can co-exist in a common pluralist framework. Over the next few sections I use historical and conceptual arguments to further defend and exemplify explanatory pluralism as a research strategy. We will see how, in both cognitive science and phenomenology, an early emphasis on symbolic processes gave way to dynamical systems and neural network approaches, and how in each case a pluralist resolution ultimately took place.

4 Pluralist cognitive science

In this section I develop the idea of a pluralist cognitive science in two ways, one historical, one conceptual.

Historically, we can start in the 1980s, when neural networks first emerged as a serious alternative to the symbolic approach (neural networks have a longer history, but this is when they became a major force in cognitive science). The basic insight was that the mind is, in many ways, not like a computer. Catching a fly ball or reaching behind a computer to turn it off involves continuous high-bandwidth interactions that are hard to model using a symbol system. Moreover, symbolic systems can be brittle: they rely on rigid rules and tend to operate in a modular, flowchart-like manner (Rumelhart & McClelland, 1987). Neural networks, by contrast, are fault tolerant, degrade gracefully when damaged, and, thanks to their parallelism, can solve problems that involve satisfying many constraints simultaneously. Neural networks also have advantages over their symbolic counterparts in the domain of language (Smolensky, 1987). Understanding language involves fluency and context sensitivity that goes beyond what a formal grammar can capture. For example, some features of semantic content can’t be captured by context-free meanings: “coffee” in “cup with coffee” means something different than “coffee” in “coffee bean” (Smolensky, 1995).

But the very features that got everyone so excited about neural networks also raised worries for what came to be known as the classical view. For example, Fodor and Pylyshyn (1988) noted that if meanings were treated as shifting clouds of connotation, with no stable identity, then logical inference would be impossible. Consider the argument

P1. Turtles are slower than rabbits.

P2. Rabbits are slower than Ferraris.

C. Turtles are slower than Ferraris.

If the meaning of words like “turtle” and “slower than” changes as cognition unfolds—if the meaning of each term changes as we move from one premise to another—then there is no way to draw the inference. In fact, if we can never mean the same things twice all kinds of disastrous consequences follow (Fodor & Lepore, 1999). It arguably becomes impossible to translate sentences, compose complex sentences out of simpler parts, describe truth conditions for sentences, or explain behavior using simple practical syllogisms like “they ran from the bear because the bear was attacking them.” In general we lose all the attractive features of the language of thought hypothesis (Rescorla, 2019) according to which thinking has a linguistic form: complex thoughts are formed out of simpler thoughts, on the basis of combinatory mechanisms that mirror those of language. But if thoughts keep changing their meaning from moment to moment, the language of thought evaporates. Everything sinks in an associationist quicksand.

Problems were raised from other quarters as well. Chomsky argued that any strong form of empiricism was implausible as a general characterization of the mind, given how many genetic structures characterize organisms:

In physiology, no one has ever accepted anything analogous to empiricist dogma with regard to the mind. No one finds it outlandish to ask the question: What genetic information accounts for the growth of arms instead of wings? Why should it be shocking to raise similar questions with regard to the brain and mental faculties? (Chomsky, 1977)

As evolutionary psychologists sometimes put it, why should evolution stop at the neck?

Things get a little messy from here, but what each side did was basically incorporate the best insights of their opponent’s theories into their own theories (or insist that they had recognized the points all along). Fodor and Pylyshyn, as noted above, were perfectly fine with neural networks and most of their features, so long as they were treated as an account of the “neurophysiological implementation” of psychological principles.Footnote 10 On the connectionist side, Smolensky developed an account according to which symbolic structures sufficient to support inference, constituent structure, etc. are approximations of a sub-symbolic level of non-linguistic constituents, which are in turn approximations of the neural level (Smolensky, 1995).Footnote 11 It is worth noting how powerful Smolensky’s approach was: it later developed into optimality theory, according to which linguistic structures emerge as optimal solutions to multiple constraint problems. Optimality theory is the dominant approach to contemporary phonology and is widely deployed in linguistics today (Griffiths, 2019; Prince & Smolensky, 2004).

Similar processes unfolded in response to the nativism / empiricism debate, with connectionists arguing that genetic determinants could be understood as “architectural, computational, temporal and other biases or constraints, without building in representational content” (Elman et al., 1996), and others allowing for various combinations of connectionist ontogeny and nativist phylogeny (Marcus, 2001; Quartz, 1993, 2003).Footnote 12 Today neural network theorists generally understand learning as a process that takes place within a set of genetically mediated structures and constraints. My sense is that this is no longer even a point of contention. For example, the basal ganglia, frontal cortex, and posterior parts of the brain are obviously specified in part by genes, but involve synaptic networks that learn incrementally according to connectionist learning algorithms (O’Reilly & Munakata, 2000; O’Reilly et al., 2020).

All of these approaches are, in one way or another, pluralist, in the sense elaborated above. For example, consider how a person learns a skill based on verbal instruction and guided practice (think of being coached on how to play soccer or on how to perform a dance move). This is obviously a complex, multi-scale phenomenon, involving language, sensori-motor skill, and coordination between the two. The phenomenon can be studied using a mix of multiple theoretical frameworks:

Bayesian modeling has been used to argue that the processes that imprecisely integrate instructed knowledge and induced knowledge are adaptive (Noelle, 2001). Connectionist modeling has shown how symbolic instructions can be represented as patterns of activation (Noelle & Cottrell, 1996), with dynamical systems theory used to understand how those representations are maintained in working memory (Noelle & Zimdars, 2020). Computational cognitive neuroscience models demonstrate how these learning processes could arise from interacting brain systems, with the prefrontal cortex encoding “rules without symbols” (Rougier et al., 2005). In this way, many levels and methods are combined to understand a cognitive phenomenon, without any ideological commitment to this being the “right” mix of computational perspectives (Noelle & Yoshimi, 2022).

The pluralist viewpoint just developed in a historical manner can also be developed conceptually. Dynamical systems are versatile resources: they can be used to describe “clouds of meaning” as well as various kinds of stable state. Indeed, state identities are built into the very definition of a dynamical system. The state space S of a dynamical system is a set of univocal states, any of which can recur in principle. Moreover, recurrence is built into some of the most fundamental concepts of dynamical systems theory. A fixed point, for example, is a point that a dynamical system remains in for all time, that is, an \(s\in S\) such that \(\forall t\in T,\upphi \left(s,t\right)=s\). A fixed point attractor is a fixed point that draws nearby points towards it. The steady state of a pendulum where its bob hangs down is a classic example: whatever state you start the pendulum in, when you let go it will eventually settle down to that state. Individual states can be attracting, but so can sets of states. For example, a pendulum in a natural environment will settle into a state where it is pointing down, but it will flutter and drift a little in the wind.
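
The pendulum example can be put in code. The following is a minimal sketch, assuming a damped pendulum integrated with simple Euler steps; the function name `step`, the damping constant, and the initial condition are all illustrative choices of mine.

```python
import math

# A minimal sketch of a fixed point attractor: a damped pendulum. From an
# arbitrary initial state, the system settles to (theta, omega) = (0, 0),
# the bob hanging straight down. Parameter values are illustrative.

def step(theta, omega, dt=0.01, damping=0.5, g_over_l=9.8):
    """One Euler step of the damped pendulum dynamics."""
    d_omega = -g_over_l * math.sin(theta) - damping * omega
    return theta + dt * omega, omega + dt * d_omega

theta, omega = 2.0, 0.5      # arbitrary initial angle and angular velocity
for _ in range(20000):
    theta, omega = step(theta, omega)

print(round(theta, 4), round(omega, 4))   # approximately 0.0 0.0
```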

These points apply to neural networks (which are, again, a type of dynamical system). Attention is often focused on changing patterns in their activation spaces, but they clearly have stable states that can recur. Indeed, such states are essential to standard interpretations of neural networks. For example, when neural networks are used in classification tasks, their output nodes are typically binary nodes which are trained to respond to specific classes of input. In such cases, each output node reliably produces the same output in response to similar inputs.Footnote 13 Classification of hand-written digits is a standard example: whenever a written “8” is presented to the input nodes of such a network (after it has been successfully trained), a specific output node corresponding to “8” is activated. Even if the inputs vary and change, in cases of correct classification the output state is reliably reproduced exactly. These stable output activations can then be passed to a classical computational system, for example.
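
A toy sketch illustrates the point about recurring output states. The weights below are hand-set stand-ins for what training would produce, not an actual trained classifier, and the names are mine.

```python
import random

# A toy sketch: noisy, never-identical inputs are mapped onto exactly the
# same binary output state, so the output layer recurs even though the
# input layer never does. Weights are hand-set stand-ins for training.

def classify(x, weights, threshold=0.5):
    """Feed-forward pass with binary threshold output nodes."""
    return tuple(int(sum(w_i * x_i for w_i, x_i in zip(w, x)) > threshold)
                 for w in weights)

weights = [(1.0, 1.0, 0.0, 0.0),   # output node 0: responds to pattern A
           (0.0, 0.0, 1.0, 1.0)]   # output node 1: responds to pattern B

pattern_a = (1.0, 1.0, 0.0, 0.0)
for _ in range(5):
    noisy = tuple(p + random.uniform(-0.2, 0.2) for p in pattern_a)
    print(classify(noisy, weights))   # (1, 0) every time: an identical, recurring state
```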

In other cases, we can consider a region of a network’s activation space as itself being effectively stable and repeatable (compare the attracting set of a pendulum in the wind). A classic example is a Hopfield network, which is trained to reliably settle into specific patterns but where noise and other factors can lead the actual state to vary slightly over time (Hotton & Yoshimi, 2011). In the brain, oscillatory patterns correspond to loops that remain in a determinate region of a state space. Standard EEG waves like alpha and beta band oscillations are like this. In fact, pretty much any reliably reproducible “state” of the brain has this kind of oscillatory but repeatable character.Footnote 14
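
A minimal Hopfield-style sketch makes the idea of settling concrete: one pattern is stored with a Hebbian rule, and a corrupted starting state settles back into the stored pattern. This is my illustration, not a model from the cited works.

```python
# A minimal Hopfield-style sketch: a corrupted state settles back into the
# stored pattern, an effectively stable and repeatable region of state space.

stored = [1, -1, 1, -1, 1, -1, 1, -1]
n = len(stored)
# Hebbian weights for a single stored pattern, zero diagonal.
W = [[0 if i == j else stored[i] * stored[j] for j in range(n)] for i in range(n)]

def settle(state, sweeps=5):
    """Repeated threshold-update sweeps until the state stabilizes."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(n):
            net = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if net >= 0 else -1
    return state

noisy = stored[:]
noisy[0], noisy[3] = -noisy[0], -noisy[3]   # corrupt two units
print(settle(noisy) == stored)               # True: the network returns to the attractor
```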

So, the egregious forcing arguments fail egregiously (we return to the substantive arguments below). When Lopes says things like “since this similarity space is constantly shifting due to new input, one cannot think the same thought twice”, he is fundamentally mistaken. To deny that a neural network can be in the same state twice would undermine most uses of neural networks or dynamical systems in cognitive science (e.g. any use of the concept of an attractor). Neural networks and dynamical systems are perfectly consistent with a view that requires content identities. Indeed, neural networks and dynamical systems themselves require these identities.

It is true that Churchland says things that suggest Lopes’ reading, for example, when Churchland refers to the “ever-changing dynamical state of the brain as a whole… [an] all-up pattern of current activation levels… [which] provide an ever-moving, never-repeating cognitive context into which its every sensory input is interpretively received” (P. M. Churchland, 2012, p. 20). From this standpoint, neural activity is a Heraclitean stream that is never quite the same. What is going on here? On the one hand, Churchland is right. In practice, the same total state will never recur in a brain. On the other hand, we just saw that repeating states are of the very essence of neural networks and dynamical systems. We can resolve this impasse by making a simple observation. Even if the entire brain’s state is unlikely to recur, the state of some part of the brain is likely to recur, and this gets more likely the smaller the part. For a single neuron, state-recurrence is all but guaranteed. We can formalize this via the following lawFootnote 15:

Inverse law of recurrence: given n binary state variables and a uniform probability mass function across states, as n increases, the probability of state recurrence at a given time decreases as \(1/2^{n}\)

That is, as the “size” of the state space increases (more dimensions means more possible states), the chances of any one state recurring decrease exponentially. When n = 1, there is a 1/2 chance of state recurrence. When n = 2, there is a 1/4 chance of state recurrence. When n = 20, there is roughly a one in a million chance of state recurrence. Eventually, the chances of recurrence become vanishingly small.
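
A short simulation, under the stated assumptions (independent, uniformly distributed binary state variables), confirms the law numerically; the function name and trial count are illustrative.

```python
import random

# A minimal check of the inverse law of recurrence: the chance that a fixed
# reference state recurs on a given draw is 1 / 2**n, so recurrence becomes
# exponentially rare as the number of binary state variables n grows.

def recurrence_rate(n, trials=100_000):
    reference = tuple(random.randint(0, 1) for _ in range(n))
    hits = sum(tuple(random.randint(0, 1) for _ in range(n)) == reference
               for _ in range(trials))
    return hits / trials

for n in (1, 2, 5, 10):
    print(f"n={n:2d}  simulated={recurrence_rate(n):.5f}  predicted={1 / 2**n:.5f}")
```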

So, depending on how many nodes we consider, recurrence is more or less likely. For one node in a neural network, for example the output node of a trained classifier, recurrence is highly likely. For four or five nodes, it’s less likely, because there are more states, but it can still happen. For the entire visual system, it’s unlikely to happen, though similar states can occur.Footnote 16 For the entire brain the exact same state will probably never recur, because there are too many nodes (hence the spirit of Churchland’s remark). Thus, we can have our ever-changing fluxes alongside our content identities.

5 Pluralist phenomenology

As with cognitive science, we can develop the concept of a pluralist phenomenology first historically then conceptually.

The historical developments in cognitive science surveyed above stimulated parallel developments in phenomenology, which interpreted Husserl variously as friend, foe, or father to research in AI, culminating (at least as I tell the story) in a kind of pluralism. There were a fair number of twists and turns, so here’s a guide:

  • Husserl as part of an anti-AI coalition (early Dreyfus)

  • Husserl as friend of classical AI, part 1 (Dreyfus, McIntyre)

  • Husserl as friend of connectionism and dynamical systems theory (Varela, Van Gelder, Petitot, etc.)

  • Husserl as friend of classical AI, part 2 (Lopes)

  • Husserl as pluralist (Yoshimi)

Beginning in the 1960s, Hubert Dreyfus began criticizing symbolic AI from a phenomenological standpoint, in a series of articles that culminated in his famous book, What Computers Can’t Do (Dreyfus, 1972). He argued that the explicit rules and data structures of AI systems, and their lack of a body, prevent them from having the kind of situational understanding and global insight human beings have. Dreyfus gives numerous examples where perceiving data or applying rules can only occur relative to a pre-existing global sense of a situation: phonemes are only constructed after they have been perceived, goals are post-hoc constructions relative to an initially vague sense of what one is after, rules are only used to solve a problem after global insight has directed the problem solver’s mind in the right direction, etc. In this early critique, Dreyfus drew on Heidegger, Husserl, and Merleau-Ponty, among other broadly Continental philosophers.

In the 1980s, Dreyfus updated Husserl’s status. He was no longer part of an anti-AI coalition, but was now being described as a “father” of AI research and cognitive psychology (Dreyfus & Hall, 1982).Footnote 17 Dreyfus’ goal in making this assimilation was to show how Heidegger’s critique of Husserl could be applied to AI. (If Heidegger critiques Husserl, and Husserl is the father of AI, then Heidegger’s critique of Husserl should transfer to AI and cognitive science). Others in this period also saw links between Husserl, AI, and early cognitive science, but were more sympathetic about the relationship (Edie, 1977; McIntyre, 1986).

Later in the 1980s, as connectionism and dynamical systems theory were making inroads into cognitive science, philosophers began using these frameworks to re-interpret Husserl. For example, Van Gelder interpreted Husserl’s theory of time-consciousness in terms of dynamical systems theory (Van Gelder, 1996), and Petitot formalized Husserl’s account of spatial experience and intentionality using the tools of differential geometry (see his contribution to Petitot et al., 1999). In these quarters, Husserl was not treated as a father of classical AI, but rather as a precursor to the exciting new approaches that were emerging in cognitive science.

Recently the effort to sympathetically link Husserl to classical AI has been revitalized by Lopes (2020, 2022). In fact, Lopes argues that Husserl actually discovered the computational theory of mind. The remarkable thing about this is that, on Lopes’ reading (which I agree with) Husserl discovered this theory on purely phenomenological grounds. The argument is roughly this. Husserl observed that we can draw valid, truth-preserving inferences, but that we often do so quickly, using fragmentary “inauthentic” representations (which are contrasted with authentically drawn inferences where we have a clear sense of each part of the inferential process). If someone tells me a story about rabbits, turtles, and Ferraris, I can quickly draw inferences about it—concluding that the Ferrari would beat the rabbit in a short race—and I normally do this in a quick, automatic way. I could spell out my reasoning, but I need not do so, and need not have a clear sense of why I believe this. It just feels right. If asked, the answer comes to me immediately, in a flash. How is this possible? Some mechanism must explain how these unconscious processes unfold in such a way as to preserve truth. It can’t be purely associationist, because simple associationist systems are not capable of operating in a domain-general way. We might be able to draw some particular inferences using associations, but everything would hang on what contingent associations we had built up in the past. Thus, some kind of unconscious mental mechanism must operate in a domain-general way on arbitrary mental representations. Thus, there is a computational theory of mind in Husserl.

I have been trying to wrap this all into a neat chronology, but the next part of the story actually happened before Lopes’ work. In (Yoshimi, 2009) I argued that there are two systems of belief in Husserl, one involving passive syntheses of sensory experience best described by continuous mathematics and dynamical systems, and another involving active syntheses of explicit beliefs best described using logic and discrete mathematics.Footnote 18 Thus, the whole story about Husserl either being for or against AI is misguided. Husserl can be understood in a pluralist way. Some features of Husserl are best explained using neural networks and dynamical systems; others are best explained using the tools of formal logic; some features are best explained using other frameworks altogether. The focus of that paper was belief, but the point generalizes. Phenomenology is complex, and different features of our phenomenology are best described using different tools.

Evidence that Husserl is best understood in this pluralist way is provided by a topical survey of his published works. Some of Husserl’s many texts emphasize phenomena best modeled using continuous mathematics: for example, the first halves of Analyses of Passive Synthesis and Experience and Judgment, the Lectures on Time Consciousness, and Thing and Space. These texts focus on sensory experience, the dynamics of protention and retention, and sensori-motor constitution. Analyses of the ideas in these texts using dynamical systems theory are in (Albarracin et al., 2022; Yoshimi, 2009, 2016). Other texts emphasize phenomena best described using discrete mathematics and formal logic, especially the more logic-oriented texts, like Philosophy of Arithmetic, Logical Investigations, Formal and Transcendental Logic, and the Early Writings. These texts focus on such topics as formal grammar and the constitution of mathematics and logic.Footnote 19 Analyses of the ideas in these texts using symbolic computational theories are in (Lopes, 2020, 2022). Of course, Husserl did not think he was contradicting himself in conducting these different studies, so we have historical evidence that phenomenology can be pursued in a pluralist framework.

Here again we can develop the point about pluralism not just historically but conceptually as well. Phenomenology studies dynamically changing patterns (of sensory contents, impressions, protentions, retentions, etc.) as well as more stable states. Husserl himself describes consciousness as a “Heraclitean flux” while also allowing for repeatable elements, the timeless essences at the core of his philosophy. Husserl admits that these seem to be in tension, but explicitly defuses that tension:

At first, to be sure, the possibility of a pure phenomenology of consciousness seems highly questionable, since the realm of phenomena of consciousness is so truly the realm of a Heraclitean flux… In spite of that, however, the idea of an intentional analysis is legitimate, since, in the flux of intentional synthesis… an essentially necessary conformity to type prevails and can be apprehended in strict concepts ((Husserl, 2013, p. 49), italics Husserl’s).

It's a striking passage, because it shows how sensitive Husserl himself was to the interplay between non-repeating continuous phenomena on the one hand and stable, repeating phenomena on the other.

We can also understand this point using a phenomenological version of the inverse law of recurrence. On the one hand, the total field of experience is unlikely to ever recur in exactly the same way in a given person’s lifetime. In fact, there may be a Husserlian argument that this is a necessary feature of experience: the retentions associated with current experience are constantly accumulating and forming into an overall sense of our past, which is never quite the same. On the other hand, the parts and moments of experience can recur, and are more likely to recur, the “smaller” they are.Footnote 20

Inverse law of recurrence (phenomenological version): The smaller the mereological scale, the more likely phenomenological recurrence is.

Visual sensations at particular locations in the visual field, or kinesthetic experiences in some determinate region of the body are all but guaranteed to recur, an idea pervasive in Husserl’s work. For example, Husserl discusses “cyclical paths” in section 80 of Thing and Space (Husserl, 1997), which involve “recurrence of… old kinesthetic circumstances.” Moreover, thoughts recur (e.g. for a mathematician contemplating an elementary theorem for the thousandth time), which is essential to Lopes’ argument about Husserl’s computational theory of mind.Footnote 21

So Husserlian phenomenology allows for both constancy and flux. On the one hand, the field of consciousness as a whole is constantly changing. Time consciousness constantly surges and flows in new ways; we see things in constantly changing sensory conditions; the words we hear are surrounded by shifting horizons of connotation; thoughts are girt by penumbras and fringes of fragmentary, inauthentic thought. But within these different kinds of flux there are many forms of invariance and repetition: colors that repeat, sounds that repeat, bodily experiences that repeat, and thoughts that we return to. How else could a mathematician prove a theorem or a logician draw an inference, or how, for that matter, could a parent look lovingly on a child, that same child, that they have loved for so long?

6 Pluralist neurophenomenology

Section 4 culminated in pluralist cognitive science, and Section 5 culminated in pluralist phenomenology, which naturally raises the idea of a pluralist framework for bridging the two. In such a framework the never-repeating pattern of activity unfolding in a whole brain is the basis of a similarly complex, never-repeating stream of consciousness. However, islands of stability and recurrence exist within both structures: pools of neurons that go into the same state repeatedly, providing a basis for the many stabilities of experience Husserl described.

Depending on which spatial or temporal scale we consider, different approaches to neuro-phenomenological inquiry are appropriate. Some studies focus on fast and local dynamics of small populations of neurons and link these dynamics to the similarly fast and “low level” phenomenology of sensory fields.Footnote 22 Other studies focus on the longer time scale of a person’s developing sense of the world, or ability to learn a skill. My own work has focused on dynamical processes governing expectations, while Lopes’ work has focused on inferential processes occurring in language and thought. Here is a sampling of other investigations bridging phenomenology and cognitive science at different spatial and temporal scales, using different formal tools.

Edge detection was independently described by Husserl and by vision scientists. Husserl bases his account on phenomenological reflection: “A ‘gradual’ transition through a gradually changing qualitative graduation yields a border only if the transition first goes very slowly and then proceeds very quickly, and then very slowly again” (Husserl, 2001, p. 194). Marr, Hildreth, and others developed more or less the same idea in the form of a numerical algorithm, which has come to be known as the Marr-Hildreth edge detection algorithm (Marr & Hildreth, 1980). The algorithm looks for places where the second derivative of image brightness changes sign.Footnote 23 For example, when brightness changes slowly, then very quickly, then slowly again (Husserl’s profile), the rate of change (the first derivative of brightness with respect to spatial location) first rises and then falls, so the second derivative goes from positive to negative, crossing zero at the point of steepest change; that zero crossing marks the border. Thus, the algorithm formalizes what Husserl describes phenomenologically. A detailed discussion of edge detection and image segmentation algorithms and how they can be used to formalize Husserlian ideas is in (Petitot, 1999). So, at the scale of visual processing occurring at the millisecond level we have parallel computational and phenomenological stories.
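
A one-dimensional sketch shows the zero-crossing idea; the brightness profile below is an illustrative slow-fast-slow ramp of my own devising, not Marr and Hildreth's full Laplacian-of-Gaussian pipeline.

```python
import math

# A 1D sketch of the Marr-Hildreth idea: an edge is where the second
# derivative of brightness crosses zero. The profile rises slowly, then
# quickly, then slowly again, the kind of transition Husserl describes
# as yielding a "border". Illustrative only.

brightness = [1 / (1 + math.exp(-(x - 10) / 1.5)) for x in range(21)]

# Discrete second derivative at interior positions.
second_diff = [brightness[i - 1] - 2 * brightness[i] + brightness[i + 1]
               for i in range(1, len(brightness) - 1)]

# Report where the second derivative changes sign (the zero crossing).
for i in range(1, len(second_diff)):
    if second_diff[i - 1] > 0 >= second_diff[i]:
        print("edge near position", i + 1)   # prints a position near x = 10
```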

Paul Churchland asks us to “reflect on the common ability to catch an outfield fly ball on the run, or hit a moving car with a snowball” (Churchland, 1981). He describes these as non-linguistic phenomena that are difficult to analyze using the tools of folk psychology and symbolic AI. These abilities can, however, be effectively analyzed using differential calculus and dynamical systems theory. For example, the HKB model (Kelso, 2021; Kelso, 2008) is a dynamical systems model that is used to describe patterns of motor coordination. The canonical example involves rhythmically moving two fingers, either in-phase or anti-phase (either wagging both fingers up and down together, or alternating one up and one down). The model describes how these different patterns of behavior re-organize or “bifurcate” as the speed of finger motion is varied (at high speeds only the in-phase behavior is possible). Husserl makes use of similar mathematical tools when he develops his account of the kinesthetic experience of bodily activity. For example, in section 57 of Thing and Space, he refers to “kinesthetic circumstances” using “a complex of variables (K, K’, K”, …) which are independently variable in relation to one another but in such a way that they form a system wherein each of the variables always has a definite value.” As we saw above, he refers to cyclical paths in these spaces, where the same bodily experiences recur, as for example if someone is repeatedly wagging their fingers in the same way. So here again we have parallel stories at similar temporal scales. The two stories do not match as closely as in the edge detection case, but the tools are the same, and one can easily imagine drawing on these resources to develop a neuro-phenomenological account of rhythmic bodily movements.
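
A minimal simulation of the HKB relative-phase equation illustrates the bifurcation; the parameter values here are illustrative, not fitted to data.

```python
import math

# A minimal sketch of the HKB relative-phase dynamics:
#   d(phi)/dt = -a*sin(phi) - 2*b*sin(2*phi)
# The ratio b/a falls as movement frequency rises; once b/a drops below 0.25
# the anti-phase pattern (phi = pi) loses stability and only in-phase
# coordination (phi = 0) remains. Illustrative parameters.

def settle_phase(phi0, a, b, dt=0.01, steps=20000):
    phi = phi0
    for _ in range(steps):
        phi += dt * (-a * math.sin(phi) - 2 * b * math.sin(2 * phi))
    return phi % (2 * math.pi)

start = math.pi - 0.3                       # begin near anti-phase coordination
print(settle_phase(start, a=1.0, b=1.0))    # ~3.14: anti-phase stable (slow movement)
print(settle_phase(start, a=1.0, b=0.1))    # ~0.0: anti-phase collapses to in-phase
```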

The transition from simply interacting with objects to isolating their features using concepts, and then referring to those concepts in language, is an example which focuses squarely on the “transformation mechanism” Lopes describes, the transition from sensory to conceptual experience. Husserl describes this transition in an extremely nuanced way, referring to a series of stages between simply being among objects, to attending to them and fixing on them, to explicating their features, to identifying concepts, to making judgment using these concepts. For a sense of these various levels, and their subtleties, consider some entries from the table of contents for Experience and Judgment:Footnote 24

  • The retaining-in-grasp of simple apprehension.

  • Explication as elucidation of what is anticipated according to the horizon.

  • The precipitate of explication in habitus.

  • Explication in anticipation and in memory.

  • Multileveled explication and the relativization of the distinction between substrate and determination.

Husserl begins with the structural “pre-figurations” of pre-predicative consciousness, where we simply retain something in grasp. For example, we see a cat roll over, exposing a cute fur pattern, which we retain in grasp during the few seconds of its roll. When it rolls again, we anticipate or pre-figure seeing this pattern again, and begin to disengage the pattern from its context, explicating it as a concept. It is now something we can talk about and refer to, as in “oh yeah, you know, the cat with the fur patch on its belly.” This process leaves a sediment in our experience, a conceptualized content, and in the future we start to see the cat as “the one with a cute fur patch.” So we go from fleeting sensory experiences, to momentary fixings-in-place and retentions of these experiences, to disengaging these experiences and processing them as linguistic units that are stable and re-identifiable. The neuro-phenomenological linkages described by myself and Lopes both apply here, and close attention to relevant texts and sources could jointly produce a rich neurophenomenology of sensory and conceptual experience, and transitions between the two.

7 The substantive forcing arguments

To complete my response to Lopes I address his substantive forcing arguments, according to which supervised learning in feed-forward networks entails the empiricist theory of abstraction, which Husserl critiques in the Logical Investigations. There is value in this line of argument because it involves close attention to the details of supervised learning, on the one hand, and Husserl’s critique of nominalist and empiricist theories (and his positive account of categorial intuition), on the other. It also leads us to consider a model (endorsed by Lopes himself) that merges classicist, connectionist, and dynamical systems considerations in a pluralist way. Thus, we can disarm what’s left of Lopes’ argument and in doing so illustrate how this part of our debate exemplifies the virtues of explanatory pluralism.

Lopes’ basic idea is that training a neural network so that similar inputs produce identical outputs entails a Lockean view of abstraction, whereby “an abstract idea just is the retention of a certain number of aspects or dimensions of the sensuously given features of the individual instances” (p. 8). The view is contrasted with Husserl’s account of categorial intuition, whereby species or universals are given in “acts of a new kind, referring in general fashion to the corresponding Species” (Husserl as quoted on p. 9). Lopes expands on the contrast, quoting Sokolowski:

Categorial objects are radically different from perceptual objects. Categorial objects are discrete units. They are packaged and wrapped with grammar. They are identifiable in a crisp, exact manner, whereas perceptions present an object continuously as the identity within a flow of appearances. (Sokolowski, 2003, p. 116, quoted by Lopes on p. 13)

Many questions arise at this point. First, while it is true that feed-forward neural networks operate on similarity spaces in their input layers, they culminate in an output layer where only one node is active at a time, and where recurrence is common. Thus, they involve a transformation between the input and output layers, which could support categorial intuition (the input layer activations correspond to the retained sensory features; the output layer activations to the new type of act). Second, the whole purpose of hidden layers (sometimes many of them, with deep networks) is to allow for the development of highly complex features, which allow objects that are highly dissimilar to still be grouped together. As more layers are added, more input heterogeneity can be tolerated. Third, many variations on feed-forward networks are possible, including variations that allow for top-down influences. Thus, it is not at all clear that these networks treat abstraction in the Lockean manner Lopes describes.

I think these observations are sufficient to show that Lopes’ arguments on this point are not decisive. My counter isn’t decisive either. More work needs to be done. The exchange shows what a substantive discussion in neurophenomenology looks like. Whether feed-forward networks trained using standard forms of supervised learning can capture categorial intuition is an interesting question. Settling the issue requires close attention to how supervised learning works in feed-forward networks, close attention to the details of Husserl’s argument in Logical Investigation II, and close attention to the question (intimated above in the discussion of Experience and Judgment and Husserliana 34) of how we move in stages from perceptual to conceptual acts. Perhaps for some classes of network, modeling categorial intuition is in fact impossible. Specifying what precisely that class is, and why, would be an important and valuable piece of work.

However, even if Lopes were to succeed in specifying such a class, his forcing argument cannot succeed on the whole (that is, it cannot apply to the use of any type of neural network model in neurophenomenology), as his own example shows. Lopes describes physical mechanisms which, on his view, could implement categorial intuition and avoid psychologism. He says:

neural structures that can implement the sort of variable binding with the universal quantifier that is necessary to avoid psychologism have been discovered (Gallistel, 2018; Kriete et al., 2013; O’Reilly et al., 2014; Stocco et al., 2010). In particular, the communication pathway between the pre-frontal cortex and the basal ganglia “enables an important step of content-independent processing, as in structure-sensitive processing” (O’Reilly et al., 2014, p. 205)…. For our purposes, this is potentially a way to neurally ground what is necessary phenomenologically to avoid psychologism…

It’s a striking statement, in part because Lopes is citing neural network researchers, including my own colleague Dave Noelle and Trent Kriete, the first person to get a Ph.D. from UC Merced, whose dissertation committee I sat on. I say this because I am intimately familiar with these models, and they are precisely the kind of model I have in mind when I say that neural networks are tools that can be used in many ways in a pluralist framework. At any rate, since these are neural network models, and since Lopes endorses them while being a non-empiricist, this shows that neural network models do not force us into an empiricist theory.

The example also sheds further light on explanatory pluralism. I think there is strong reason to believe Lopes is right in citing this research (and wrong in his substantive forcing argument), and that something like the (Kriete et al., 2013) type of neural network could explain categorial intuition. Here is roughly how the story goes. The model treats symbols as being implemented in “stripes” in pre-frontal cortex (PFC). These stripes are a genetically determined feature of the network that support its attractor dynamics, which in turn allow the PFC to maintain “symbol like” activations. These activations are domain general, serving as “pointers” to other symbols and to activity in other areas of the brain. Transitions between these symbol-like activations are mediated by the basal ganglia (BG). The BG-PFC system learns to sequence these symbols in a way that tends to produce reward over time. It’s a striking example of balancing connectionist and classicist ideas in a common framework. As Kriete et al. say, their model “can exhibit the kind of systematicity…typically attributed to a symbol processing system” but note that “it does so using standard neural network mechanisms for both learning and processing” (Kriete et al., 2013). As a result, it has some psychologically realistic limitations as compared with purely classical models: in particular, “neural pointers cannot be nested at arbitrary levels of complexity or depth.” So we get the best features of connectionism, dynamical systems, and the classical view in a pluralist framework, while overcoming their limitations.
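
A toy illustration (emphatically not Kriete et al.'s actual model, which uses learned gating in a biologically detailed network) may help convey the pointer idea: one "stripe" stores an address rather than content, and a gate stands in for basal ganglia control over when the pointer is dereferenced. All names here are hypothetical.

```python
# A toy illustration of neural "pointers": a prefrontal "stripe" stores the
# address of another stripe rather than content, and a gating signal (standing
# in for the basal ganglia) controls when the pointer is dereferenced. The
# same machinery works whatever content is pointed to: content-independent,
# structure-sensitive processing. Hypothetical names throughout.

stripes = {
    "stripe_A": "red",          # content stripe
    "stripe_B": "tomato",       # content stripe
    "stripe_C": "stripe_B",     # pointer stripe: stores an address, not content
}

def dereference(pointer_stripe, gate_open):
    """Read out the content the pointer refers to, if the gate fires."""
    if not gate_open:
        return None              # gate closed: the pointer is merely maintained
    return stripes[stripes[pointer_stripe]]

print(dereference("stripe_C", gate_open=True))   # 'tomato'
stripes["stripe_C"] = "stripe_A"                 # rebind the pointer
print(dereference("stripe_C", gate_open=True))   # 'red': same mechanism, new content
```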

8 Conclusion

In their review of Plato’s Camera, Carruthers and Ritchie say

While [Churchland’s] book has many virtues, it is unfortunate that he repeatedly fails to do justice to his opponents’ views. For the most part he critiques implausible caricatures, rather than engaging sympathetically but critically with the most charitable interpretations of their position (Carruthers & Ritchie, 2012).

This is precisely how I feel about Lopes. While his positive work has many virtues, he repeatedly fails to do justice to his opponents’ views, offering unhelpful caricatures instead. Carruthers and Ritchie are writing from the same standpoint as Lopes, defending classicism against connectionist critiques and emphasizing weaknesses in the connectionist standpoint. But unlike Lopes, they treat their opponent in a sympathetic and constructive way, exemplifying the kind of pluralist nuance and subtlety I have been advocating. After acknowledging the strengths of Churchland’s view, they go on to note limitations. Throughout the discussion they remain open to pluralist collaboration. For example, state space approaches are “by no means implausible” but do not “exclude an important role for innateness.” Connectionist style sculpting of “face space” may take place within an “innately channeled domain-specific learning mechanism specialized for faces.”

Regarding the language of thought and symbolic approaches to cognitive science, they fault Churchland for a “failure to engage with his actual opponents” and then give a nuanced description of current LOT theories that is also sensitive to the considerations motivating Churchland. They reiterate the pluralist idea that symbolic and connectionist mechanisms must somehow co-exist, and then describe several forms of learning and knowledge that don’t fit easily within Churchland’s system, like one-shot learning and episodic memory. Rather than simply present these as limitations of connectionism, they describe ways these phenomena can be accommodated within Churchland’s framework. For example, they propose that standard connectionist learning algorithms are best suited to describing the development of implicit knowledge, which “locate the organism in the here and now, enabling it to know what to expect next or how to effect changes in that environment” (Husserl’s protentions, which I emphasize in my book, come to mind here). They go on to describe the following account of episodic memory:

Such memories are not regions in any one state-space. Rather, they seem to involve the creation of long-term linkages between regions of many different state-spaces, corresponding to the various sensory components of the original experience, in such a way that activations of any one are likely to cause activations of the others. If one recalls an episode of three red tomatoes falling on one's kitchen floor and smashing, for example, then this would seem to require a long-term link between the region of color state-space that represents red and the region of fruit-and-vegetable-space that represents tomatoes, together with the region representing a numerosity of three and the region of location-space that corresponds to one’s kitchen.

In Husserlian terms, we have a subtle interplay between several sensory horizons, categorial intuitions, and linkages between these. It’s fascinating work, especially when combined with Kriete and Noelle’s account of cortical pointers, whereby activations in the PFC-BG circuit (the basis of stable thought contents, like “the tomato incident”) function like pointers to unfolding activity in other areas of the brain, binding them together into one memory.

Cognitive science, phenomenology, and their linkage are complex, and need not be studied within a singular, totalizing framework. I am not the dynamical systems or connectionist partisan Lopes takes me to be. I am not the caricature he associates me with, who does not even allow for repeating states. I accept both the strengths and the limitations of connectionism and dynamical systems theory, and similarly for symbolic cognitive science. Standard learning rules in a neural network are great for describing the way expectations evolve in particular domains. They are not as good at describing the syntactic structure of thought processes or language. The best work integrates both perspectives, for example Carruthers and Ritchie’s proposal about episodic memory or Kriete, Noelle, and colleagues’ work suggesting that symbolic operations are implemented using neural pointers that “cannot be nested at arbitrary levels of complexity or depth.” Similarly for knowledge, learning, perception, emotion, anticipation, temporal awareness, linguistic competence, motor skill, creativity, problem solving, etc.

Though I’d like to say the lesson has been learned, and that everyone else has moved on and is working de facto in a pluralist framework (and in fact, I suspect most working cognitive scientists are pluralists in my sense), totalizing tendencies do persist, for example in those who present active inference, enactivism, or Bayesianism—to list just a few of the more recent trends—as the final theory that can explain everything.Footnote 25 Instead, we should recognize that the mind, the brain, and their linkage are extremely complicated, and best studied using multiple tools, conceptual frameworks, and explanatory paradigms. We can leave the old partisan framework mongering where it belongs—in the history books—raise an eyebrow at those who take their own frameworks too seriously, and then just get on with the hard work of incremental advance.Footnote 26