Is Dretske's Theory of Information Naturalistically Grounded?
How emergent communication channels reference an abstracted ontic framework

Timothy Rogers
Timothy.Rogers@mail.utoronto.ca
Trinity College, University of Toronto
2007

Abstract: By bringing together Dretske's theory of knowledge, Shannon's theory of information, and the conceptual framework of statistical physics, this paper explores some of the metaphysical challenges posed by a naturalistic notion of semantical information. It is argued that Dretske's theory cannot be said to be naturalistically grounded in the world described by classical physics and that Dretske information is not consistent with Shannon information. A possible route to reconciling Dretske's insights with Shannon's theory is proposed. Along the way, an attempt is made to clarify several points of possible confusion about the relationships between Dretske information, Shannon information and statistical physics.

Keywords: Information Theory, Communication channels, Abstract reference, Shannon information, Dretske information, Natural Philosophy

"In the beginning there was information. The word came later."
- Fred Dretske, Preface to Knowledge and the Flow of Information

1. Introduction

In his book Knowledge and the Flow of Information, Dretske claims to present a theory of meaning, belief and (true) knowledge that is naturalistically grounded in the sense that these interpretative processes develop out of "lower-order, purely physical" mechanisms. The keystone of his theory is a notion of objective information that bridges physical and semantic realms, so that information becomes the raw material from which "meaning, and the constellation of mental attitudes that exhibit it", are manufactured [Dretske, pvii].
Dretske describes his project as an exercise in material metaphysics and, although he acknowledges that the project is ambitious ("perhaps too ambitious", he writes), the book is a tour-de-force that pieces together a theory of meaning from the law-like regularities of the material world. But do the pieces really fit together?

My claim is that, from the perspective of the physical sciences, Dretske's material metaphysics is too naive for such an ambitious project. In developing his theory, Dretske tacitly assumes a world of everyday objects and relations, which are physically well-defined, separable and enduring, and which obey exceptionless lawful regularities regardless of whether or not such regularities are known or articulated (i.e. the lawful regularities refer to the world-itself and not the world-as-it-is-understood). He works within the framework of a "classical" ontology in which objectivity, states of affairs, separability, analyticity and a host of other properties can be assumed from the outset. Dretske also imports a very restrictive notion of the lawful regularity of the material world. The problem is that our current understanding of the physics of the world will not support the demands that Dretske places on material metaphysics in order for his theory to hang together. If the notion of "naturalistically grounded" is taken in the narrow sense of grounded in the physical sciences, Dretske's program doesn't work.

This paper explores some of the metaphysical challenges that a naturalistic notion of semantical information poses by bringing together Dretske's theory of knowledge [Dretske], Shannon's theory of information [Shannon], and the conceptual framework of statistical physics [Reichl]. Specifically, it addresses the question: To what extent, if any, is Dretske's instantiation of an informational relationship consistent with Shannon's measure of information and (classical) statistical physics' definition of entropy?
My hope is that the language of statistical physics can provide an overarching framework to clearly cross-reference the commitments and commensurability of these three programs. I argue that Dretske's theory cannot be said to be naturalistically grounded in the world described by classical physics and that Dretske information is not consistent with Shannon information. I explore a possible route to reconciling Dretske's insights with Shannon's theory. Along the way, I try to clarify several points of possible confusion about the relationships between Dretske information, Shannon information and statistical physics.

2. Physical Laws and Tolerable Uncertainty

Dretske's theory rests hard on a particular notion of lawful regularity underwriting the physical realm. The world, according to Dretske, is parsed into objects, events or states of affairs, and information in the world is contingent on the existence of "lawful (exceptionless) dependence" between two or more such parsed out elements. This lawful regularity must be independent of any knowledge or articulation of the laws. It also must be error-free. Given a source of information s, and a signal r, Dretske defines the informational content of the signal in the following way:

A signal r carries the information that s is F = The conditional probability of s's being F, given r (and k), is 1 (but given k alone, less than 1) [Dretske, p64].

Here k refers to what the receiver already knows about the possibilities at the source, an aspect that can be ignored in the present discussion, since my concern is with semantical information in the absence of knowers. Dretske further goes on to clarify his notion of conditional probability:

In saying that the conditional probability (given r) of s's being F is 1, I mean to be saying that there is a nomic (lawful) regularity between these event types, a regularity which nomically precludes r's occurrence when s is not F [Dretske, p245].
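Dretske's definition can be made concrete with a small numerical sketch. The following is only an illustration, not anything found in Dretske: the source states ("hot", "cold"), the signals ("high", "low"), the joint probability tables and the function names are all hypothetical, and the receiver's background knowledge k is ignored, as in the discussion above. Exact fractions are used so that "equal to 1" is tested exactly rather than to within rounding error.

```python
from fractions import Fraction

def conditional_probability(joint, r, is_F):
    """P(s is F | r), computed from a joint table {(s, r): probability}."""
    p_r = sum(p for (s, sig), p in joint.items() if sig == r)
    p_F_and_r = sum(p for (s, sig), p in joint.items() if sig == r and is_F(s))
    return p_F_and_r / p_r

def prior_probability(joint, is_F):
    """P(s is F) with no signal given (background knowledge k is ignored)."""
    return sum(p for (s, _), p in joint.items() if is_F(s))

def carries_information(joint, r, is_F):
    """Dretske's condition: P(F | r) is exactly 1 while P(F) alone is below 1."""
    return conditional_probability(joint, r, is_F) == 1 and prior_probability(joint, is_F) < 1

def is_hot(s):
    # Hypothetical property F of the source.
    return s == "hot"

# Hypothetical exceptionless regularity: "high" occurs only when the source is hot.
exceptionless = {
    ("hot", "high"): Fraction(1, 2),
    ("cold", "low"): Fraction(1, 2),
}

# The same regularity with a hypothetical one-in-a-million exception rate.
eps = Fraction(1, 10**6)
noisy = {
    ("hot", "high"): (1 - eps) / 2,
    ("cold", "high"): eps / 2,
    ("cold", "low"): (1 - eps) / 2,
    ("hot", "low"): eps / 2,
}

print(carries_information(exceptionless, "high", is_hot))
print(carries_information(noisy, "high", is_hot))
print(conditional_probability(noisy, "high", is_hot))
```

On this reading the condition is all-or-nothing: in the second table the conditional probability falls short of 1 by exactly the assumed exception rate, and the signal no longer carries the information that s is F, however small that rate is made.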
The world in which Dretske's theory applies must possess laws to which there can be no exception in principle. And these laws cannot be contingent on any interpretive process (at least not in any naïve way), inasmuch as contingent laws cannot be the basis for their own existence. It is such foundational laws that have the requisite modal qualities (they constrain absolutely what would happen under certain circumstances and what could not happen under any circumstances) and the requisite intentional qualities (they are not simply a set of de facto correlations). According to Dretske, the intentional qualities of the laws provide the foundation for the semantic nature of information. Information "inherits its intentional properties from the lawful regularities on which it depends" [Dretske, p77] and because of information's intentional properties it can support notions of belief and knowledge [Dretske, p171+].

The notion of error-free lawful regularity is essential to Dretske's program. Dretske writes: "Information is what is capable of yielding knowledge, and since knowledge requires truth, information requires it also." [Dretske, p45]. The truth condition of information, for Dretske, is intimately connected to the fact that "If a signal carries the information that s is F, it must be the case that s is F." [Dretske, p63-4]. And this, in turn, he connects with the conditional probability for law-like regularities being exactly unity [Dretske, p66].

How, then, can it be said of Dretske's theory that it is "naturalistically grounded"? In which naturalistically grounded programs are the two requirements for Dretske's theory to be found, namely classical ontology and exceptionless laws? To better frame this question, it is helpful to differentiate two different categories of lawful behaviour that might obtain in a physical system, which I will somewhat arbitrarily call empirical and fundamental laws.
Empirical laws are laws that are taken to be pragmatic, approximate descriptions of the inhabited world. They are contingent on the framework in which they are articulated and make no claims of exceptionless regularity. Ohm's Law is an example. In the first instance, this law expresses a linear relationship between electrical voltage and current, where the linear coefficient is the resistance. More precisely, Ohm's law is a linear approximation to a fuller (and non-universal) description of how voltage and current vary in particular materials, a description that further ignores the effects of induction and capacitance. No material on earth is perfectly resistive. No material on earth obeys Ohm's law. But under certain circumstances the higher order contributions can be ignored and the "law" is applicable to within a limit of tolerable uncertainty [note 1]. The physical laws governing everyday objects and relations are almost universally empirical in this sense. (In the next section I will discuss why this is the case.)

Empirical laws do not meet the necessary conditions for Dretske's theory because they are not error free and they do not bear perfect counterfactual correlation. Therefore, Dretske's theory cannot be said to be grounded in the world described by earth scientists or meteorologists or engineers or any other empirical scientists who work with laws that are taken to be pragmatic and approximate descriptions.

This argument against empirical laws as an adequate naturalistic grounding for Dretske's theory is based on the pragmatic way that laws are used. At the risk of being crushed under the full weight of epistemology, I want to say the real issue here is the nature of truth conditions. For Dretske, the lawful regularity is the source of truth conditions, whereas with empirical science, truth conditions come from a relationship between the laws and the inhabited world.
In fact, the most likely candidate for naturalistically grounding Dretske's theory is the world described by "fundamental physics", where the fundamental laws are taken to be exceptionless or absolute in a sense roughly similar to Dretske's. More specifically, it is the world described by classical physics, where a classical ontology is also assumed. Even though we no longer believe that classical physics describes the world we live in, it can be said to describe a "possible" world (or ontic framework) and we can say of that world that it has fundamental laws. This leads to a central question of this paper: Can the world (ontic framework) described by classical physics ground Dretske's theory? If Dretske's theory is not adequately grounded in this ontic framework, which is both classical and absolutely lawful, then the claim that Dretske's theory is naturalistically grounded in any scientific program is significantly challenged.

Note 1. Some argue that Ohm's law should not be considered a law at all, because it is not universal. Instead the equation describing the linear relationship between voltage and current should be considered a definition of resistance that applies to a certain class of materials, under a certain range of conditions, and to within a certain degree of accuracy.

3. The Phenomenology of Channels and States

By requiring exceptionless lawful regularity between states of affairs, Dretske places very severe demands on material metaphysics. The challenge, which I will explore in this section, is that the parsing of the world into states of affairs and the lawful regularities that are exhibited by such states of affairs are intimately interwoven. In the everyday world that Dretske is describing, both are approximate and neither meets Dretske's conditions for semantical information [note 2].
3.1 Equilibrium Thermodynamics and the Parsing of States [note 3]

Equilibrium thermodynamics describes macroscopic physical systems, which are systems that contain a large (enormously large) number of degrees of freedom. An example of a macroscopic system is a "box" (e.g. a closed container) filled with "particles" (e.g. atoms) which interact through (potentially exceptionless) laws. The box itself may be completely closed and isolated (micro-canonical ensemble) or it may interact with an external, time-invariant environment through the exchange of energy (canonical ensemble) or through the exchange of particles (grand canonical ensemble). However, regardless of the boundary conditions of the box, the system is assumed to be in equilibrium, which means that it is left alone for a (very long) period of time in an environment whose properties do not change.

The remarkable property of a macroscopic physical system in equilibrium is that it settles into a time-independent macro-state that is characterized by only a few "state" (i.e. macro-state) variables, such as volume, pressure, temperature and entropy, despite the fact that the microscopic description of the system (at the level of individual particles) is dynamical and is characterized by an enormous number of microscopic variables. The macro-state variables that characterize the macro-state can be related to averaged properties of the particles or micro-states: averaged over time, for example, or over the different micro-states that are possible. Moreover, these macro-state variables are linked by an "equation of state", like the ideal gas law, such that the macro-state variables can fully describe relations between equilibrium macro-states. (Of course, the microscopic dynamics determine these "empirical" relations between macro-state variables, such as the equation of state.)
Pragmatically, thermodynamic macro-states can change too, but only over very long time-scales, such that, at each moment in time, the macroscopic system remains in equilibrium with its surroundings (the so-called quasi-equilibrium approximation).

The upshot is that, for macroscopic physical systems in equilibrium, the ontic framework is parsed into two levels of lawful regularity, characterized by two different spatiotemporal scales:

1. The micro-state is a complex description of the simultaneous position and momentum of each particle in the system that varies rapidly in space and time.
2. The macro-state is a simple description of the average behaviour of the system (over the dimensions of the "box") which does not vary over space or time. (Or varies uniformly over such a long time-scale that the system is always in equilibrium with its surroundings.)

Note 2. I apologize for the length of this section, but I feel it is important to lay out this program carefully as there is a lot of confusion in the literature around the "physics" of information.

Note 3. This discussion is based on a standard reading of statistical physics, such as L.E. Reichl, A Modern Course in Statistical Physics.

The macro-states are characterized by constraints, which are typically connected to boundary conditions with a homogeneous time-independent environment. An example of a macroscopic constraint is a closed and isolated system, which does not interact with its environment either through the exchange of energy or the exchange of particles. For this case the total energy E is fixed. The micro-states are characterized by probabilities that they will be realized for a given macro-state. For the example of an isolated system, the total energy E is constrained so only micro-states with energy E can be realized and such states are all equally probable (by assumption).
The macro-state variables can be related to the micro-state variables through averaging, where the averaging can be over time or over different micro-states that are possible for a fixed macro-state (an "ensemble" average). These two averages are equivalent if the system is ergodic. The equivalence allows time averages (that can be measured in the lab but not calculated) to be replaced by ensemble averages over all accessible micro-states (that can be calculated but not measured). One such average relationship, which is pivotal for Shannon's theory of information, is the entropy S, which can be defined as:

$$ S = -\sum_i p_i \log(p_i) $$    Equation (1)

where p_i is the probability of micro-state i and the sum is over the ensemble of all possible micro-states. For a closed and isolated system, the micro-states have equal probability p_i = 1/w, where w is the total number of accessible micro-states, and the entropy reduces to S = log w. In this situation the amount of entropy S associated with the macro-state is related to the number of "accessible" micro-states.

The replacement of time averages by ensemble averages is a central trope of statistical physics. This equivalence is only well-defined when the macro-system constraints are independent of time (i.e. equilibrium systems). As discussed below, for a time varying macro-system, the whole program of replacing temporal averages with ensemble averages may not be well grounded. Consequently, the interpretation of "entropy" can be problematic, since the appropriate ensemble, with respect to which entropy is defined, can be ambiguous. Likewise, if, along with Dretske, a specific micro-state is said to be an instantiation of the system, then it is necessary to differentiate whether the instantiation is in time, or within an ensemble, and, if the latter, the constraints defining the ensemble must also be specified. These different notions of instantiation should not be assumed equivalent.
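The entropy of Equation (1), and its reduction to S = log w when all w accessible micro-states are equally probable, can be checked with a short sketch. The choices below are assumptions made only for illustration: the natural logarithm is used (the base is not fixed in the text), w = 8 is arbitrary, and the "peaked" distribution is a hypothetical non-uniform ensemble.

```python
import math

def entropy(probabilities):
    """S = -sum_i p_i log(p_i), natural log; states with p_i = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

w = 8                               # assumed number of accessible micro-states
uniform = [1 / w] * w               # closed, isolated system: p_i = 1/w
peaked = [0.93] + [0.01] * (w - 1)  # hypothetical non-uniform ensemble over the same states

print(entropy(uniform), math.log(w))  # the two values agree: S = log w
print(entropy(peaked))                # smaller: the ensemble is more concentrated
print(entropy([1.0]))                 # a single certain state carries zero entropy
```

The last line corresponds to the limiting case in which only one state is possible: the ensemble has no "valency" and the entropy vanishes.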
3.2 Microscopic Laws and Macroscopic Laws [note 4]

Because the macro-variables are related to the micro-variables in a statistical manner we can distinguish two types of lawful behaviour appropriate to the two levels of description, a distinction that is crucial to Dretske's theory of information.

The lawful regularities that describe the evolution of micro-states (in classical physics) can be taken as fundamental and exceptionless in Dretske's sense. That is, there can be a lawful regularity between two micro-states m1 and m2, such that the conditional probability of m1 given m2 is 1. However, such exceptionless lawful behaviour only obtains in a "micro-canonical ensemble" where the macro-state is completely closed and isolated from the environment (world) [note 5]. For such a system, the lawful regularity is the lawful classical dynamics that describe the evolution of a system of particles from one state to another. These laws are deterministic in the sense that, once the micro-state is specified (as an initial condition, say), all other micro-states are known by virtue of the classical laws. Therefore, the specification of any one micro-state includes all possible information about the system for all time. The laws are exceptionless but there is no longer the concept of a "possible" state, because all states are fixed once one state is fixed. There is no "valency" or degeneracy to the micro-states [note 6]. The notion of information, which is based on the reduction of uncertainty among possible states (i.e. reduction of "multi-valency"), becomes a vacuous concept:

In the limiting case, when the probability of a condition or state of affairs is unity [p(si) =1], no information is associated with, or generated by, the occurrence of si [that condition or state of affairs].
This is merely another way of saying that no information is generated by the occurrence of events for which there are no possible alternatives (the probability of all possible alternatives =0). [Dretske, p12].

Once the constraint of isolation is removed from the system, there is an interaction with the environment which introduces uncertainty into the evolution of the micro-states. (For example, the system may be in contact with a heat bath that exchanges energy with the system.) For such non-isolated systems, the conditional probability of m1 given m2 can never be exactly 1. There will be an element of uncertainty in the interaction with the environment, which cannot be eliminated from the description. If the system is not completely isolated, Dretske's exceptionless lawful behaviour no longer obtains between micro-states [note 7].

Note 4. This discussion is limited to laws of interaction. It is beyond the scope of this paper to address laws of symmetry, such as relativistic invariance.

Note 5. For a classical world, probably the only example of a true micro-canonical ensemble is the "universe" for that world. Parsing the universe into sub-systems will bring forth some amount of uncertainty, however small.

Note 6. This discussion is restricted to classical dynamics and I am sweeping under the rug some significant issues that can be associated with the notion of determinism in chaotic systems. Interestingly, quantum mechanics does admit exceptionless lawful behaviour and multi-valency, although it does not readily admit isolated states of affairs.

On the other hand, the lawful regularities that describe the macro-states are derivative and based on statistical properties of the micro-states. They are obtained in the idealized "thermodynamic limit" of infinite size, infinite time, ergodicity and no changes in the macro-state or in the environment. This idealized limit is never expected to manifest physically; it is taken to be an approximation.
For example, in the micro-canonical ensemble discussed above, the macro-state is completely isolated from the environment (world). Therefore it cannot change. One might trivially say that, for such an isolated system, the conditional probability of macro-state M1 at time t1, given the state M2 at time t2, is 1 because they are the same state by definition. But in such equilibrium, nothing can be said to "flow" from one state to another. In order to get macroscopic dynamics, the constraints must be lifted so the system is "slightly" out of equilibrium. An amount of uncertainty is introduced, from the environment for example, which can be made arbitrarily small, although it can never be zero [note 8].

This introduction of a small amount of noise, equivocation, or uncertainty is not a problem for statistical physics because it can be managed: the impact on the macro-state can be kept within a limit of tolerable uncertainty. However, "tolerable uncertainty" in lawful regularities is a problem for Dretske's theory of information because Dretske requires laws with absolutely no uncertainty so that he can corral all uncertainty into higher order mental attitudes such as meaning and belief. If uncertainty is part of the laws, such that conditional probabilities relating states of affairs are less than unity, then the truth-value of the semantic content of any informational relationship that derives from those laws is uncertain and there is no Dretske information.

From the point of view of equilibrium theory, only completely closed and isolated systems can possibly exhibit the exceptionless lawful regularity that Dretske invokes as the origin of intentionality in informational relationships. But completely closed and isolated systems do not admit multi-valent micro-states at the level where this regularity is found and therefore information cannot be said to flow between different micro-states.
And at the macroscopic level, closed and isolated systems do not exhibit any change in the state of affairs over time, once again eliminating such systems as potential carriers of information.

Note 7. Even in the case of an isolated system, the macroscopic behaviour is derived by "coarse-graining" or partitioning the microscopic system to a finite level of resolution. Such coarse-graining "smears" the microstates, introducing uncertainty and irreversibility. The exceptionless lawful behaviour of the micro-states requires us to observe all the microscopic degrees of freedom. It also requires that there be no chaotic dynamics. But "as soon as we violate these conditions and observe the world at a finite level of resolution (no matter how accurate), chaotic dynamics ensures that we will lose information and entropy will increase" [Bais, p27]. The nature of subjectivity itself may be the culprit here.

Note 8. This is called the quasi-static or quasi-equilibrium approximation.

3.3 Non-Equilibrium Systems

What I have roughly sketched out above is a rigorous formalism for understanding how "approximate" macroscopic laws can emerge from "fundamental" microscopic laws (in a classical system). The formalism shows how and when uncertainties are manageable: why, for example, uncertainties (at the microscopic level) don't cumulatively build errors into the description until all lawful behaviour is erased (at the macroscopic level). This rigorous formalism is very specific in its application, which is to systems in equilibrium. However, the general program (namely that macroscopic laws can be seen as approximate and averaged consequences of more fundamental microscopic laws) is a powerful way to understand how the approximate (empirical) law-like behaviour in our everyday world of objects and relations might be based on fundamental (microscopic) laws.
Even in non-equilibrium, this program is expected to have teeth, although considerably more challenges emerge that further frustrate the certainty in law-like behaviour. There are a few key points to this program to bear in mind:

• The macro-state is isolated from its environment in some way. This isolation can happen as a result of constraints imposed on the overall system or through emergent collective properties or through global conservation laws or ...
• The macro-state is not isolated from its environment in some way. This allows the environment to impinge on the macro-state to bring about macro-dynamics. It also introduces some level of uncertainty in the lawful regularity of the micro-states.
• The macro-states parse the system so that there is an exterior (the environment) and an interior (the micro-states defining the macro-states).
• There is a (sometimes diffuse) boundary that, in some systems, can potentially differentiate and identify a set of macro-states as a macro-object (such as a chair, a puppy or a rock).
• The laws describing macro-states and macro-objects have some degree of uncertainty which, although it may be arbitrarily small in some circumstances, cannot be zero.

Non-equilibrium systems can manifest more robust relations at the macroscopic level than equilibrium systems. For example, we can imagine a system in which two spatially separated macro-states, each with its own set of micro-states, interact through an environment which contains "everything else" as shown in Figure 1.

[Figure 1: Non-equilibrium system parsed into two macro-systems, S and R, plus their environment]

In this situation, there are fully two levels of dynamical description, with distinct spatiotemporal scales. At the "macro" level, there are two (partially isolated) dynamical subsystems S and R that can interact with each other through a pathway or "channel" that involves the macro-states of each system as well as the environment.
Because of this interaction, the variables defining the macro-state can change (which was not the case in equilibrium). Therefore, at any instant we can speak of a macroscopic State of Affairs that specifies the value of the macroscopic variables for each of the two macro-systems. Depending on the overall system, there may be a lawful regularity between the States of Affairs of the two sub-systems, although for the reasons discussed above, such a lawful regularity would not be fundamental or exceptionless. The "channel" would then be the mediator of the regularity.

On a much faster timescale, each "fixed" macroscopic State of Affairs involves a highly dynamical set of micro-states. So each specific macroscopic State of Affairs involves an ensemble of microscopic states of affairs (e.g. in the quasi-equilibrium approximation) [note 9]. It is important to conceptually differentiate the "States of Affairs" at the macroscopic level from the "states of affairs" at the microscopic level. According to the classical ontology in which we are working, only the latter can be said to exhibit exceptionless lawful regularity.

Note 9. Here the notion of entropy gets tricky because the system constraints are no longer fixed. What constitutes the time independent ensemble if the constraints vary in time? Pragmatically, the evolution of the macro-systems must be "slow enough" compared to the dynamics of the micro-states that the system is continually sampling the whole ensemble of accessible states at each "macro-instant" in time (but please don't ask me to define what is meant by "macro-instant").

3.4 Grounding Dretske's theoretical framework

I have been trying to present a coherent story of how "laws" observed in an everyday (macroscopic) world might be naturalistically grounded in a classical system of fundamental, exceptionless, lawful behaviour at the microscopic level. By naturalistically grounded, I mean, among other things, that there is no artifice of observer interaction, cognition, subjectivity, mental states, calibrations, etc. I have been working from the framework of a ruptureless, externalized objectivity, much the same as Dretske claims to do. The macroscopic lawful behaviour can therefore be taken as an emergent phenomenology that can be derived (more or less) mathematically from the statistical behaviour of classical particles given some basic assumptions about their interactions. Well, this is almost true, anyway.

If nothing else, this story shows that naturalistically grounding a theory with exceptionless laws does not come easily. And, where there has been some success, it has been at the expense of admitting a tolerable level of uncertainty. If Dretske's theory cannot tolerate uncertainty (however small) in the lawful relations defining information, statistical physics suggests that it is unlikely his theory can be naturalistically grounded.

The central problem is that Dretske's theory requires laws with "certainty" (exceptionless laws) acting on states with "uncertainty" (multivalent states). However, classical physics can only provide laws with certainty acting on states with certainty and/or laws with uncertainty acting on states with uncertainty. That is to say, uncertainty enters into the states and the laws at the same level of description. The conceit of fundamental physics is that all other natural sciences (which only involve physical mechanism as per Dretske) are derivative, so, from the perspective of physics, Dretske's program is not grounded in any classical, naturalistic framework.

4. The Physics of Information Flow

Entropy is the link between statistical physics and Shannon's notion of information. Recall that in the case of the micro-canonical ensemble above, the entropy was related to the number of accessible micro-states [equation (1)]. In this system, for a fixed set of macro-variables (i.e.
fixed macro-state) there is a range of possible micro-states that the system could realize (i.e. an ensemble). Entropy can be thought of as a particular measure of the size, or "valency", of this ensemble. Equivalently, if all that is specified is the macro-state, there is an uncertainty or unpredictability about which micro-state is realized at any given time, since it could be any one of the micro-states from the ensemble. We can only speak of the probability that a specific micro-state will be realized, given the macroscopic constraints of the ensemble. (In the case of the micro-canonical ensemble, all micro-states are equally probable, but this is not always the case.) As I explore in this section, the definition of entropy is connected to such measures of uncertainty and unpredictability.

4.1 Shannon Information

The leap to a notion of information comes from the following observation: if the system is actually in a specific micro-state then the uncertainty of the system has been reduced because it cannot be in any of the other micro-states. Knowing the actual micro-state provides information about the system in the sense that it reduces uncertainty. The amount of Shannon information depends on the number of accessible micro-states out of which the actual micro-state was selected. Entropy can be used as a measure of this "amount" of information. "The important innovation that Shannon made was to show that the relevance of the concept of entropy considered as a measure of information, was not restricted to thermodynamics, but could be used in any context where probabilities can be defined" [Bais, p19].

Shannon information, like entropy in statistical physics, concerns a constrained macro-system that has a range of possible states (s1, s2, ... sn) whose probabilities of occurrence are (p(s1), p(s2), ... p(sn)), i.e. an ensemble.
For such an ensemble S, the average amount of information generated is defined as:

$$I(S) = -\sum_i p(s_i) \log(p(s_i))$$ Equation (2)

The Shannon ensemble is a much more general concept than the equilibrium ensemble of thermodynamics. It can involve any physically constrained system in which the dynamics are limited to a range of possible states with fixed probabilities. Unlike with statistical physics, where the ensemble is an emergent property of more fundamental, physically determined micro-dynamics and related boundary constraints, with Shannon communication systems, the ensemble and its dynamics are given by construction (an ansatz or a priori set of conditions), which may or may not involve the action of conscious agents.

Shannon information is a property of a dynamical process. Although, along with Dretske, it is possible to associate an amount of information generated by the occurrence of a specific event si, as I(si)=-log(p(si)), Shannon's theory is not concerned with the occurrence of specific events. It is concerned with the communication of a message as a temporal sequence of events. For Shannon, an information generating system is a constrained ensemble of events or accessible states that is ergodically sampled over a sufficiently long time that the trope of replacing temporal averages with ensemble averages is valid. Shannon is concerned with the communication process as a whole in which individual events have significance only in relationship to the whole temporal sequence of which they are a part. The constrained dynamical process establishes the ensemble, the individual events si and the related probabilities for each event.

Shannon information is relative: it measures the reduction in the range of possibilities relative to a physical system that is already constrained. The measure of what could have happened must be independently defined, for example, by the physical context.
Thus we cannot say that a given micro-state has a value for Shannon information until we specify the macroscopic constraints that define the ensemble to which it belongs. Shannon information depends on a relationship between two levels of description: a macro-level that is physically constrained in some way and a micro-level that reduces uncertainty or valency associated with the macro-level. This means that the Shannon information depends on how the physical system is modeled [Bais, p22]. Any sequence of events may be taken as a generator of Shannon information if it can be framed as a sampling of an ensemble in such a way that each event which does happen reduces the range of possibilities of what could have happened in a measurable way.

However, Shannon information is a term that only describes the measure of this reduction in the range of possibilities; it does not say anything about the meaning of the information. Shannon information is "just a function that reduces a set of probabilities to a number, reflecting how many nonzero possibilities there are as well as the extent to which the set of nonzero probabilities is uniform or concentrated" [Bais, p21]. Of course, the precarious relationship between Shannon's theory and the semantic nature of information is the primary interest in Dretske's theory.
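The dependence of Shannon information on the chosen ensemble can be made concrete with a small numerical sketch. The Python below is only an illustration of Equation (2) and of the single-event measure I(si) = -log(p(si)); the function names and the three example ensembles are hypothetical and do not come from Shannon, Dretske or Bais. Logarithms are taken to base 2 by assumption, so the results are in bits.

```python
import math


def surprisal(p, base=2.0):
    """Information generated by a single event of probability p: -log(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p, base)


def shannon_information(probs, base=2.0):
    """Average information I(S) = -sum_i p(s_i) log p(s_i) of an ensemble."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with p = 0 are skipped: they contribute nothing to the average.
    return sum(p * surprisal(p, base) for p in probs if p > 0)


# Hypothetical ensembles (illustrative numbers, not taken from the paper).
coarse = [0.5, 0.5]          # a tight macro-constraint: 2 accessible states
fine = [0.125] * 8           # a looser constraint: 8 equally likely states
skewed = [0.9, 0.05, 0.05]   # 3 states with concentrated probabilities

for name, ensemble in [("coarse", coarse), ("fine", fine), ("skewed", skewed)]:
    print(name, round(shannon_information(ensemble), 4))
```

The same physical event counts for one bit in the two-state ensemble and three bits in the eight-state ensemble, which is the sense in which the measure is relative to how the system is modeled. The skewed ensemble gives less than the maximum for three states, in line with the remark quoted from Bais that the measure reflects how uniform or concentrated the probabilities are.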
4.2 Shannon Communication Systems and Information Flow

The physical set-up, or communication system, in which Shannon information is defined involves the following elements [Shannon p2]:

• the information source produces a message or sequence of messages to be communicated to the receiving terminal;
• the transmitter operates on the message in some way to produce a signal suitable for transmission over the channel;
• the channel is merely the medium used to transmit the signal from transmitter to receiver;
• the receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal;
• the destination is the person (or thing) for whom the message is intended.

The information source is constrained to a range of possible states which establish the ensemble of possible messages. Together the transmitter and channel provide rules or laws that predict the state of the receiver given the state of the source. These laws, which can be pragmatic or empirical laws, depend on the physical setup and they may be either deterministic or probabilistic.

Shannon information is said to "flow" in the system in the following sense: each time a state of the source is "selected", it restricts the possible states of the receiver by virtue of the laws governing the transmitter and channel. As uncertainty is reduced at the source (through selection of a possible state), uncertainty is also reduced at the receiver and Shannon information is said to "flow" from source to receiver. The flow of information is temporal: the object of interest is the sequence of selections at the source.

Typically, the physical set-up defining Shannon information is established specifically to transmit signals. The mechanism for selection at the source is external to the system. This can make the system qualitatively different from the physical systems of statistical physics described above.
For example, the selection mechanism may involve (and usually does involve) the artifice of a conscious agent. Part of Dretske's approach to semantical information is to ground the physical set-up which defines Shannon information as the outcome of a prior physical system in which only natural laws operate (i.e. no consciousness or intentional agents are invoked). That is to say, whereas Shannon takes the physical communication system as a given construction within which the flow of information is theorized, Dretske's more fundamental program is to deduce the emergence of communication systems and their related informational properties from purely physical processes.

The communication system above results in information flow from source S to receiver R. Following Dretske, the information might be said to flow through a channel CH connecting S and R. However, the Dretske channel, CH, includes both the encoding processes that may occur at the transmitter as well as the signal transmission that flows through the Shannon channel. By mapping information flow between two sub-systems S and R in this way, the Shannon communication process can also be related to the interaction between two (non-equilibrium) macrosystems as described in section 3. In this mapping, the channel CH is related to the mutual interaction of the two macrosystems as mediated by the environment.

Both the source S and receiver R can be considered as constrained ensembles with possible states (s1, s2, ... sn) and (r1, r2, ... rn) respectively. Each state of S and R has a corresponding probability, so that the average amount of information for the source I(S) and the receiver I(R) can be defined using equation (2). The quantity of interest for measuring the flow of information from S to R is the transinformation I(S,R) which is the average amount of information generated at S and received by R [see, for example, Lombardi, p25].
The transinformation is given by:

$$I(S,R) = I(S) - E = I(R) - N$$ Equation (3)

where the equivocation, E, is the average amount of information generated at S but not received at R and the noise, N, is the average amount of information received at R but not generated at S. Transinformation is a measure of the average amount of dependency between S and R. If S and R are completely independent the transinformation is 0 and no information flows between S and R. If S and R are completely dependent, the noise and equivocation are zero and all information generated at S flows to R: I(S,R) = I(S) = I(R) [note 10].

Note 10: For a more complete discussion of transinformation, noise and equivocation see Lombardi.

4.3 Grounding the theoretical framework of Shannon information

Shannon information is consistent with statistical thermodynamics, in the sense that his definition of information reduces to the definition of entropy under the appropriate conditions. Shannon information is also consistent with fundamental laws and admits tolerable uncertainties (such as noisy channels). However it is not reliant upon the existence of fundamental laws, so it is also consistent with a world in which there are no exceptionless laws (the pragmatic world of engineering, for example). Compared to Dretske, Shannon information does not make much of a commitment to any particular ontic framework, as long as the system is ergodic and the necessary probabilities can be defined. The trade-off is that Shannon information is merely a quantitative measure which may, or may not, have any connection to our everyday notions of information, apart from the specific connection mentioned above, namely reduction of uncertainty in an ensemble of possible states.

The theoretical framework behind Shannon information makes no claim, and needs to make no claim, that the system in question is "naturalistically grounded", in the sense of Dretske.
This may seem like a strange statement, given that his is a theory of the physical measure of information. What I mean, however, is that Shannon information allows for the introduction of conscious agents in the physical set-up and conscious agents can manipulate the constraints such that the signal will carry meaning. This is different from Dretske's program, in which the claim is that all properties of information (including semantics and meaning) are emergent properties of lower-order physical mechanisms.

In Shannon's theory, the fact that a signal is a temporal sequence is a crucial point, one that doesn't appear to be addressed in Dretske's theory [Lombardi]. The temporality of the signal is the linchpin that enables information to be transmitted without error over channels whose law-like regularities admit tolerable levels of uncertainty or "noise". A sufficiently long temporal signal can be transmitted over a noisy channel by introducing redundant information into the signal such that the global pattern of the signal is constrained; this is how Shannon's theory is applied to the transmission of signals over a noisy channel. The noise may alter the signal unpredictably, but, under certain circumstances, the alteration can be "tracked" by the global constraint and subsequently "undone" at the receiver. The procedure has an effect similar to the averaging procedure for statistical ensembles: it is a way of managing uncertainty.

In Shannon's program, the semantic content of the message is said to be outside of the ken of the theory: the meanings of messages are established by selection mechanisms at the source and coding mechanisms at the transmitter, both of which are physical and cognitive mechanisms that are external parameters of the theory [note 11]. Many take this to indicate that Shannon information has no bearing on the semantic aspects of information.
However, Shannon's program may provide an important insight into the origin of intentionality through signal transmission. This insight comes from Shannon's observation that global coherence properties of the signal can be used to overcome noise in the channel. When a signal is transmitted from a source through a noisy channel, the receiver can learn to "undo" noise by locking onto global, conserved patterns in the signal. These global patterns are related to redundancy or property overlap in the individual components of Shannon information that constitute the signal. In Shannon's communication system, these patterns are externally imposed by the coding mechanism. However, "naturalistically grounded" mechanisms might also lock onto coherence patterns in a system generating a signal. The redundancy in the signal would then be related to patterns which are properties of a set or sequence of instantiations. In this way, the signal can re-present an emergent property of the information source, a property that is averaged or "abstracted" [note 12]. The signal can represent emergent patterns because, unlike Dretske information which refers to a priori states of affairs at the source, Shannon information refers to that which can pass reliably through a noisy channel and which may therefore manifest at a higher level of coarse-graining. Consequently, from emergent semiotic patterns of signal generation, pre-symbolic elements might manifest that reference a meta-level of instantiation.

Note 11: By external, I mean that these mechanisms are not emergent properties of Shannon's theory, unlike the case with Dretske where meaning is an emergent property of the theory.

Note 12: This notion of abstraction is closely related to Dretske's notion of digitalization.
The key property of these pre-symbolic elements is that they are in-formed, that is to say, there is a structure of interiority which allows for abstraction (through re-presentation or referencing) of a specific element from the ontic totality. Could these pre-symbolic elements be intentional?

Taking this one step further, perhaps what naturalistically grounds the parsing of the ontic totality into objects with properties is the spontaneous manifestation of sub-systems where the sub-systems' coherence exceeds the channel noise connecting them, such that in-formation (literally forms manifested interiorly) in this coherence remains intact, roughly in the same way signals are passed through noisy channels in communications engineering. The informational content is some "average" (overlapping) property or pattern of the set of states from which it is constituted [note 13]. Which brings us to semantics.

5. The Physics of Representation

As discussed in the previous section, Shannon's theory of information exploits the statistical properties of ensembles to overcome noise or error in the (empirical or fundamental) law-like regularities that connect the state of affairs at the source with the state of affairs at the receiver such that information can be transmitted without error through noisy channels. This is the key insight of communication theory. What it tells us is that the perfect counterfactual correlation required for Dretske information is not required for Shannon information. At the same time, Shannon information, as applied to emergent phenomenology, is not about a priori States of Affairs in the sense of Dretske, but rather is about collective, or coarse-grained properties or patterns of the source, properties that are essentially dependent on the degree of noise in the channel. With Shannon's theory, the noise establishes the level of graininess in the communication and this, in turn, establishes what can pass for a state of affairs at the source and at the receiver.
Referring back to the figure on page 14, the channel conditions connecting S and R determine the level or graininess of description appropriate to S and R by establishing the level of error-free information that can be transmitted between them. In this section I will attempt to unpack this crucial metaphysical difference between Dretske's approach and that of Shannon.

5.1 The Physics of Dretske Information

Dretske's theory is concerned with the semantic sense of information. For Dretske, a signal carries information whose semantic content is "what the signal is capable of 'telling' us, telling us truly, about another state of affairs" [Dretske, p44]. This semantic sense is to be distinguished from the "meaning" of a signal which, for Dretske, has only an incidental relation, if any, to the information carried by the signal. Dretske's nuclear sense of information is related to knowledge and learning: "A state of affairs contains information about X to just that extent to which a suitably placed observer could learn something about X by consulting it ... Information is what is capable of yielding knowledge, and since knowledge requires truth, information requires it also." [Dretske, p45]

Note 13: This approach is closely related to Brian's book On the Origin of Objects.

Dretske locates the semantic aspect of a signal or message in the particular instantiation that links an informational source and receiver. He writes, "... if information theory is to tell us anything about the information content of signals, it must forsake its concern with averages and tell us something about the information contained in particular messages and signals. For it is only particular messages and signals that have content." [Dretske, p48]. Dretske seems quite ambivalent about what he means by this particularity, which he variously refers to as a state, a state of affairs, an event and a signal.
However, blurring these notions of particularity is problematic because it glosses over the key insight of Shannon theory and of statistical mechanics, namely that a "spatial", or state-based ensemble can replace a "temporal" or sequential dynamics, and while the two systems are interchangeable, the particular instantiations are system dependent and not interchangeable. For example, the notion of "signal" is fundamentally temporal while that of "state" is fundamentally spatial and, as I have discussed earlier, this difference is crucial for relating dynamical processes to statistical ensembles. Along with Lombardi, I will commit Dretske to mean "state" (i.e. that which can be extemporaneously totalized) as this is the meaning consistent with his mathematical formalism and much of his discussion. Statehood is a property of classical ontology, in which space and time are orthogonal and parts can be isolated from the whole as autonomous totalities [note 14].

Dretske goes on to apply Shannon's formulas for information, transinformation and equivocation to individual instances, where a specific state at S (say si) results in a specific state at R (say ri). Unfortunately, as Hockema and Lombardi have pointed out, he mishandles the formalism of Shannon. Dretske defines the transinformation between the states si and ri as:

$$I(s_i, r_i) = I(s_i) - E(r_i)$$ Equation (4)

where

$$E(r_i) = -\sum_j p(s_j/r_i) \log(p(s_j/r_i))$$

In this formula, Dretske uses "the same index 'i' to denote the state of the source and the state of the receiver, as if there were some special relationship between the elements of certain pairs (s,r)" [Lombardi, p28]. Consequently, he makes the erroneous assumption [note 15] that the individual contribution to the equivocation is only a function of the state ri (i.e.
E = E(ri)). As Lombardi points out, the completely generalized application of Shannon's formulas to the particular pair of states si and rj is:

$$I(s_i, r_j) = I(s_i) - E(s_i, r_j)$$ Equation (5)

where

$$E(s_i, r_j) = -\log(p(s_i/r_j))$$

The averaging procedure Dretske uses to define the equivocation as a property only of rj (i.e. the summation over all states in equation 4) is invalid in most cases where Shannon's theory is applied.

Note 14: My own predilection is that information can be imported into a more fundamental ontic framework in which space-time and whole-part are emergent categories, such as the ontic frameworks underwriting quantum mechanics and relativity theory. So I can sympathize with Dretske's ambivalence, but his mathematical treatment and related discussion is not adequate, in my opinion, for this deeper penetration of the notion of information.

Note 15: Erroneous from the point of view of Shannon's theoretical framework.

There are two ways of looking at the consequences of this mishandling. On one hand, Lombardi rejects Dretske's approach outright. "This means we cannot accept Dretske's response to those who accuse him of misunderstanding Shannon's theory: his 'interpretation' of the formulas by means of the new quantities is not compatible with the formal structure of the theory. It might be argued that this is a minor formal detail. However, this point has deep conceptual consequences" [Lombardi, p29]. One such consequence is that the equivocation depends essentially on the communication channel and not only on the receiver. Lombardi writes: "... we can get completely reliable information about the source even through a very low probability state of the receiver, provided that the channel is appropriately designed." [Lombardi, p29].
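Lombardi's point about low-probability receiver states can be checked numerically. The following Python sketch evaluates the pairwise quantities of Equation (5) from a joint distribution p(si, rj). It assumes the sign convention E(si,rj) = -log(p(si/rj)), so that equivocation is non-negative and consistent with I(si) = -log(p(si)); the function name, the state labels and both joint distributions are hypothetical illustrations, not taken from Lombardi or Dretske.

```python
import math


def pointwise_quantities(joint, s, r):
    """Return (I(s), E(s, r), I(s, r)) in bits for one source/receiver pair.

    joint maps (source_state, receiver_state) -> probability.
    Assumes the pair (s, r) has non-zero probability.
    """
    p_s = sum(p for (si, _), p in joint.items() if si == s)
    p_r = sum(p for (_, rj), p in joint.items() if rj == r)
    p_s_given_r = joint.get((s, r), 0.0) / p_r
    info_s = -math.log2(p_s)                # I(s_i) = -log p(s_i)
    equivocation = -math.log2(p_s_given_r)  # assumed sign: E = -log p(s_i/r_j)
    return info_s, equivocation, info_s - equivocation


# Hypothetical channel in which a rare receiver state r1 occurs only with s1.
rare_but_reliable = {("s1", "r1"): 0.01, ("s2", "r2"): 0.99}

# Hypothetical noisy channel with cross-correlations between states.
noisy = {("s1", "r1"): 0.4, ("s1", "r2"): 0.1,
         ("s2", "r1"): 0.1, ("s2", "r2"): 0.4}

print(pointwise_quantities(rare_but_reliable, "s1", "r1"))
print(pointwise_quantities(noisy, "s1", "r1"))
```

In the first channel the receiver state r1 has probability 0.01, yet the equivocation for the pair (s1, r1) is zero, so all of the information generated by s1 is received. In the second channel the equivocation for the same pair is positive because r1 also occurs with s2, which is a property of the channel as a whole and not of r1 alone.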
On the other hand, along with Hockema, we could say that, if Dretske's theory is to be ransomed at all, it only applies to the specific case where each state at the source is perfectly paired with a specific state at the receiver as in Figure 2 below.

[Figure 2: Pairing of States between Source and Receiver in Dretske's Theory. Source states S1, S2, S3; receiver states R1, R2, R3.]

This is quite different from Shannon's theory, where the set of states at the source is correlated with the set of states at the receiver, such that there may be cross-correlations between them as shown in Figure 3 below.

[Figure 3: Cross-Correlation of States between Source and Receiver in Shannon's Theory. Source states S1, S2, S3; receiver states R1, R2, R3.]

And there is the rub. In naturalistically grounded systems, the decoupling of the cross-correlation implies that whatever mechanism establishes the state of the source also establishes the state of the receiver. If such decoupling underwrites all relatedness between sources and receivers, something is missing from the story, namely, an understanding of how new information is generated [note 16]. As discussed in section 3, emergent macro-systems, which would ultimately ground the notion of information in Dretske's program, are coherent entities with interiority and complex interactions with one another. Cross-correlations between states would be the norm. Unlike Dretske's restricted interactions, the more generalized Shannon system allows entanglement which can manifest physically as uncertainty, entropy and heat flow. Because of cross-correlations, the ontic framework of Shannon's theory is not merely the linear superposition of Dretske's instantiated states: the robustness offered by Shannon's framework is commensurate with the program of statistical physics, whereas Dretske's decoupling is not. Furthermore, the Shannon program allows for macro-systems to exhibit interiority, which is not possible in the Dretske scenario.
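The contrast between perfectly paired states and cross-correlated states can be illustrated by computing the average quantities of Equation (3) for three-state channels. The Python below is a sketch under stated assumptions: the joint probability tables are invented for illustration, the function names are hypothetical, and the equivocation and noise are computed as the conditional entropies of S given R and of R given S respectively (in bits), which is the standard reading of those terms.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def channel_summary(joint):
    """Average quantities for a table with joint[i][j] = p(s_i, r_j)."""
    p_s = [sum(row) for row in joint]         # marginal over source states
    p_r = [sum(col) for col in zip(*joint)]   # marginal over receiver states
    h_joint = entropy([p for row in joint for p in row])
    i_s, i_r = entropy(p_s), entropy(p_r)
    equivocation = h_joint - i_r   # generated at S but not received at R
    noise = h_joint - i_s          # received at R but not generated at S
    return {"I(S)": i_s, "I(R)": i_r, "E": equivocation, "N": noise,
            "I(S,R)": i_s - equivocation}


third = 1.0 / 3.0
# Perfect one-to-one pairing of states.
paired = [[third, 0, 0], [0, third, 0], [0, 0, third]]
# Cross-correlated states: each source state mostly, but not only, yields its partner.
crossed = [[0.8 * third, 0.1 * third, 0.1 * third],
           [0.1 * third, 0.8 * third, 0.1 * third],
           [0.1 * third, 0.1 * third, 0.8 * third]]
# Completely independent source and receiver.
independent = [[1.0 / 9.0] * 3 for _ in range(3)]

for name, table in [("paired", paired), ("crossed", crossed),
                    ("independent", independent)]:
    print(name, {k: round(v, 4) for k, v in channel_summary(table).items()})
```

With perfect pairing the equivocation and noise vanish and the transinformation equals I(S). With cross-correlations some information still flows although both E and N are positive, and with independence the transinformation falls to zero.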
The issue of transinformation notwithstanding, Dretske's attempt to locate semantics in the instantiation of particular events or states of affairs at the source moves his theory away from the essence of Shannon information in an even more profound way and this is related to a second misreading of communication theory. One of the fundamental properties Dretske requires of information is that it obey the so-called Xerox principle: If A carries the information that B, and B carries the information that C, then A carries the information that C [Dretske, p57-8]. From this he concludes that information preserving channels cannot allow any equivocation since this would involve information loss that would cumulatively build error into the transmission of information [Dretske p58 and p245]. Dretske invokes communication theory to back up his claim: "Communication theory tells us that the distant links of this communication chain [from A to B to C] will carry progressively smaller amounts of information about C (since the small equivocations that exist between the adjacent links accumulate to make a large amount of equivocation between the end points)." [Dretske p59].

Note 16: As discussed in section 2, we need to get uncertainty into the naturalistically grounded system to speak of information flow. If there is an external conscious agent, choosing the states si, then we can say information flows between S and R, because of the uncertainty in the conscious agent's choices. But this is an imposed uncertainty, not an emergent uncertainty.

If information is localized in the instantiation of a particular state of affairs then the errors build as Dretske says. But this is not how information works in communication theory. The whole strength of communication theory is that the information can pass intact from A to C despite equivocation because of the way it is collectively encoded to overcome the losses in the channel.
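A rough calculation shows how collective encoding keeps errors from accumulating along a chain. The Python sketch below uses the crudest possible code, repetition with majority voting, purely as an illustration; it is not the coding scheme of Shannon's theorem, whose codes achieve reliability without the transmission rate shrinking toward zero. The assumptions are mine: each link is a binary symmetric channel that flips each transmitted copy independently with probability p, the intermediate station decodes and re-sends the bit, and the function names are hypothetical.

```python
from math import comb


def majority_error(p, n):
    """Probability that a majority vote over n noisy copies of a bit is wrong.

    Each copy is flipped independently with probability p; n must be odd.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the vote cannot tie")
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


def chain_error(link_errors):
    """End-to-end bit error when each link decodes and re-sends the bit.

    The final bit is wrong exactly when an odd number of links are in error.
    """
    wrong = 0.0
    for e in link_errors:
        wrong = wrong * (1 - e) + (1 - wrong) * e
    return wrong


p = 0.1  # hypothetical per-copy flip probability on every link
for n in (1, 5, 15):
    per_link = majority_error(p, n)
    print(n, per_link, chain_error([per_link, per_link]))
```

With a single copy per bit (n = 1) the two-link chain is noticeably worse than either link, which is the accumulation Dretske describes. As the redundancy n grows, the per-link error and the end-to-end error both become small, so the message can pass through the chain nearly intact even though every individual transmission remains noisy.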
Unlike Dretske's theory, communication theory provides the means to negotiate tolerable error and does not require error-free lawful regularities. I think Dretske really misses the boat here and he wrongly draws support from communication theory for his claims. 5.2 Dretske's Semantic Relationship Dretske doesn't need Shannon's framework for his nuclear concept of informational content. He only requires perfect counterfactual correlation that is established by nomic, or law-like regularities in the system. The nomic dependence is required to exclude de facto correlations that may be spurious, without informational content and therefore irrelevant [Dretske, p75-7]. And from this definition it is clear why he requires exceptionless, law-like regularity in the system-a property that I have already argued is unlikely to obtain. Dretske's overall program is a grand scheme for locating information, knowledge and truth in objective physical mechanisms, such that higher-order cognitive processes are derivative. Although I have argued that this grand scheme is untenable, at least in classical physics, I believe there are some significant kernels of insight in Dretske's theory. Suppose, for example, we relinquish the notion of fundamental, exceptionless laws and refer to the physical relationship in which Dretske's nuclear concept of information is embedded as an "information-preserving physical channel". This channel is based on nomic regularities (i.e. "laws of physics") and, as Lombardi points out, this setup imparts a physical nature to information and the dynamics of its flow. "Information may be conceived as a physical entity, whose essential feature is its capacity to be generated at one point of the physical space and transmitted to another point." [Lombardi, p35]. The information-preserving physical channel, as Dretske conceives it, preserves information on a state by state basis. 
This is an idealization that can never occur in any physical system for the reasons already discussed. Channels are always noisy as Dretske acknowledges [Dretske, p107-124]. However, the notion of an information-preserving physical channel may have traction in the absence of exceptionless laws if the full statistical power of Shannon's theory is brought to bear. That is to say, noisy channels, in which perfect counterfactual correlation on a state-by-state basis is not achieved, may still have information-preserving capacity and in this way, Dretske's nuclear concept of informational content might be resuscitated. The trick is to find a way to manage tolerable uncertainty. An interesting aspect of this approach is that the information transmitted is not about a state, per se, but rather about an abstracted property of a coherent macrosystem: the information references a smeared, or "digitalized", level of the ontic framework.

The key insight of Dretske that leads to a genuine semantic sense of information is not to be found in his nuclear definition which invokes information-preserving physical channels. Rather the insight comes from his articulation of an irreducible three-fold relatedness that establishes a non-physical information-preserving relation between two ensembles. Specifically, imagine an information source A that is connected to two different receivers B and C through two separate information-preserving physical channels as in Figure 4 below.

[Figure 4: Semantic Information Preserving Channel. Source A; receivers B and C.]

In this case, B and C contain information about A by virtue of the physical channel. But there is also an informational relationship between B and C which is not mediated by a physical channel. Neither is it a "spurious" correlation. This informational relationship has genuine semantic import: B "says something" about C even though there is no direct physical flow of information between them.
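The informational relationship between B and C can be quantified in a toy model. The Python below is a hedged sketch, not Dretske's or Quastler's own construction: A is assumed to be a binary source, B and C each receive A through separate binary channels with independent flip probabilities, and no signal passes between B and C. The function names and parameter values are hypothetical.

```python
import math
from itertools import product


def mutual_information(joint):
    """Average mutual information in bits from a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def common_source_joint(p_a, flip_b, flip_c):
    """Joint distribution of (B, C) when both depend only on a binary source A.

    p_a[a] is the probability that A is in state a; flip_b and flip_c are the
    probabilities that the A-to-B and A-to-C channels flip the state.
    """
    joint = {}
    for a, b, c in product((0, 1), repeat=3):
        p = p_a[a]
        p *= flip_b if b != a else 1 - flip_b
        p *= flip_c if c != a else 1 - flip_c
        joint[(b, c)] = joint.get((b, c), 0.0) + p
    return joint


for flip_b, flip_c in [(0.0, 0.0), (0.1, 0.1), (0.5, 0.1)]:
    joint_bc = common_source_joint([0.5, 0.5], flip_b, flip_c)
    print(flip_b, flip_c, round(mutual_information(joint_bc), 4))
```

When both physical channels are perfect, B carries one full bit about C. With noisy channels the link weakens but persists, and it disappears only when one of the physical channels from A carries no information at all, which shows that the B to C relation is inherited entirely from the common dependence on A.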
Through this irreducible threefold relatedness, a semantic realm can obtain: A is transmitting to both B and C via some physical channel (indicated by solid lines). B and C are isolated from one another in the sense that there is no physical signal passing directly between them. Though B and C are physically isolated from one another, there is nonetheless an informational link between them (indicated by the broken line). According to information theory, there is a channel between B and C. "Because of the common dependence of B and C on A, it will be possible to learn something about C by looking at B, and vice versa. In the sense of information theory, it is correct to say that a channel exists between B and C" [Quastler as quoted by Dretske, in Dretske, p38].

As a concrete example, consider a person [A], as generator of information, repeatedly pointing to a rock [B] and saying the word "rock" [C] at the same time. Two information-preserving physical channels are established: 1) between the source [A] and the physical rock [B], and 2) between the source [A] and the word "rock" [C]. But, through this process, a third, non-physical correlation is also established between the word "rock" [C] and the physical rock [B]. This non-physical, information-preserving channel has a semantic aspect.

Another, more provocative example. Suppose the two states B and C are different states in the same observer (e.g. mental states), caused by an information carrying signal from A (e.g. a tree). A non-causal informational link connects B and C. In the simplest case, this connection is an identity, establishing A as a temporally enduring object. But, drawing from our discussion of Shannon information and the notion of abstraction in section 3, we could also imagine that the informational link between B and C is not an identity but rather a pattern of coherence that points to, or refers to, an abstracted property of the object.
What the three-fold relatedness provides is a mechanism of synchronicity that can stabilize the semantic description at a particular level of graininess. There is much more to say about this, but that is beyond the scope of this paper [note 17].

Note 17: In another unfinished paper, I have tried to connect this three-fold relatedness to the core notion of light in the special theory of relativity, which is another communication-based theory. The threefold relation establishes a synchronization mechanism particular to a frame-of-reference and this synchronization is the origin of space-time and the subject-object ontic framework.

References

Bais, F. Alexander and Farmer, J. Doyne. The physics of information: information theory in the light of thermodynamics, statistical mechanics and nonlinear dynamics. Santa Fe Institute, 2005 [draft]. Available: http://www.illc.uva.nl/HPI/Draft_Physics_and_Information.pdf

Dretske, Fred J. Knowledge and the Flow of Information. Cambridge: MIT Press, 1981.

Hockema, Steve. Private communication.

Lombardi, O. Dretske, Shannon's theory and the interpretation of information. Synthese 144:23-39, 2005.

Reichl, L.E. A Modern Course in Statistical Physics. Austin: University of Texas Press, 1980.

Shannon, C.E. A Mathematical Theory of Communication. The Bell System Technical Journal 27:379-423, 623-56, 1948.