Noname manuscript No. (will be inserted by the editor)

On the ergodic theorem and information loss in statistical mechanics

Andreas Henriksson

Received: date / Accepted: date

Abstract In this article, it is argued that, for a closed classical Hamiltonian system, the ergodic theorem emerges from the Gibbs-Liouville theorem in the limit that the system has evolved for an infinitely long period of time. In this limit, from the perspective of an ignorant observer, who does not have perfect knowledge about the complete set of degrees of freedom of the system, distinctions between the possible states of the system, i.e. the information content, are lost, leading to the notion of statistical equilibrium, where states are assigned equal probabilities. Finally, by linking the concept of entropy, which gives a measure for the amount of uncertainty, with the concept of information, the second law of thermodynamics is expressed in terms of the tendency of an observer to lose information over time.

Keywords Gibbs-Liouville theorem · Ergodic theorem · Statistical equilibrium · Second law of thermodynamics

PACS 05.20.-y · 05.70.-a · 02.50.-r

Andreas Henriksson
St. Olav school, Jens Zetlitzgt. 33, 4008 Stavanger, Norway
E-mail: andreas.henriksson@skole.rogfk.no
ORCID: 0000-0001-9014-4320

Contents

1 Introduction
2 The Gibbs-Liouville theorem
3 Uncertainty and indistinguishability
4 Conservation of classical probability
5 Statistical equilibrium
6 The ergodic theorem
7 Microcanonical probability distribution
8 Non-uniform probability distributions
9 Ergodicity breaking
10 Is information conserved or lost?
11 Entropy as a measure of uncertainty
12 The second law of thermodynamics

1 Introduction

A key feature of quantum mechanics which distinguishes it from classical mechanics is the Heisenberg uncertainty principle [1][2]. It states that there exists a fundamental limit to the precision with which the state of a system on phase space can be determined. For the ultimate purpose of gaining a better understanding of, and illuminating, the key differences between the evolution in time of classical and quantum systems, the concepts and ideas which lie at the foundations of statistical mechanics are in this article revisited and reinterpreted.

In the discussion on the foundations of statistical mechanics, it is important to realize that it is a theory which relies both on the fundamentally deterministic character of the evolution of classical systems, as characterized mathematically by the Gibbs-Liouville theorem [3][4], and on the fact that any given observer of such a system possesses some amount of ignorance about the complete set of degrees of freedom which describe the system. It is this ignorance which leads to the appearance of concepts such as uncertainty, probability and entropy. In fact, it is this ignorance which, from the perspective of the observer, leads to the conclusion that classical systems evolve towards a state of equilibrium where the entropy is at a maximum. In other words, the second law of thermodynamics is proposed to have its origin not in a fundamental law of Nature but rather in the ignorance possessed by the observer of the system.
2 The Gibbs-Liouville theorem

For classical Hamiltonian systems, the Gibbs-Liouville theorem states that the Hamiltonian flow on phase space is incompressible. A necessary and sufficient condition for incompressibility is that the divergence of the Hamiltonian phase flow velocity vanishes, i.e. that

∇ · v = 0   (1)

where

v = (q̇, ṗ) = (∂H/∂p, −∂H/∂q)   (2)

and H is the Hamiltonian of the system. This condition is satisfied identically, since ∇ · v = ∂²H/∂q∂p − ∂²H/∂p∂q = 0 by the commutativity of mixed partial derivatives. The Gibbs-Liouville theorem can be interpreted as a mathematical statement on the deterministic evolution of classical Hamiltonian systems, i.e. that distinctions between the possible states of the system, or, equivalently, the information content within the system, are conserved in time [5].
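As a quick illustration of the incompressibility condition (1), the divergence of the phase flow velocity (2) can be computed symbolically. The following minimal sketch, assuming the SymPy library is available and using a pendulum Hamiltonian chosen purely as an example, confirms that the divergence vanishes identically because the mixed partial derivatives of H commute.

```python
# Minimal sketch (assuming SymPy): verify the incompressibility
# condition (1) for an example Hamiltonian. Since v = (dH/dp, -dH/dq),
# the divergence vanishes because mixed partials of H commute.
import sympy as sp

q, p = sp.symbols('q p', real=True)

# Example Hamiltonian: a pendulum, H = p^2/2 - cos(q).
H = p**2 / 2 - sp.cos(q)

qdot = sp.diff(H, p)    # dq/dt =  dH/dp
pdot = -sp.diff(H, q)   # dp/dt = -dH/dq

divergence = sp.diff(qdot, q) + sp.diff(pdot, p)
print(sp.simplify(divergence))  # prints 0: the flow is incompressible
```

Replacing H with any other twice-differentiable expression in q and p gives the same result, which is the content of the theorem.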
3 Uncertainty and indistinguishability

Even when ignoring the laws of quantum mechanics, which place a fundamental limit on the precision which can be gained, the dynamical evolution of a system is quite complicated. Most systems of interest contain a vast number of particles that interact in complicated ways. For such large systems, it is usually very hard to track the individual evolution of each particle as the system evolves in time. Perfect knowledge about the position and velocity, or momentum, of each individual particle is lost. It is lost not because of a fundamental violation of information conservation but merely because of the difficulty for an observer to keep track of all the degrees of freedom. Therefore, from the perspective of the observer, there is an uncertainty Δq associated with the position of a state and an uncertainty Δp associated with the momentum of a state. For this reason, the observer is unable to determine with absolute certainty the state of the system at any given time. The observer can only determine whether or not the system occupies a state which lies within a given region Ω_j on phase space, whose volume V_{Ω_j} is given by the uncertainties Δq and Δp, i.e.

V_{Ω_j} = Δq Δp   (3)

The volume V_{Ω_j} is thus a measure of how ignorant the observer is about the details of the system, in the sense that the observer cannot locate an individual state to a greater precision than the size of Ω_j. Due to this lack of precision, the observer is unable to distinguish between states that lie within Ω_j. All states within Ω_j, with their individual sets of degrees of freedom, have, from the perspective of the observer, collapsed into a single state whose single set of degrees of freedom is given by q + Δq and p + Δp. This so-called coarse-grained, or mixed, state is not a fundamental, or pure, state of the system. It is a description that averages over all pure states within Ω_j. Put differently, a mixed state ψ_j, j ∈ [1, M], where M is the number of mixed states on phase space, is a subjective representation, by an ignorant observer, of a collection of pure states φ_α, α ∈ [1, N], where N is the number of pure states within Ω_j. As the system evolves in time, the observer is only able to measure the coarse-grained flow, i.e. the jumping from one mixed state ψ_j to a different mixed state ψ_i, i ≠ j. It should be noted that, due to the lack of perfect knowledge about all the relevant degrees of freedom, the observer is unable to predict a unique evolutionary path on phase space along which the system evolves.

4 Conservation of classical probability

Due to the ignorance of the observer, i.e. the observer's inability to distinguish the set of pure states within any given coarse-grained region Ω_j, it is necessary to introduce the notion of probability on phase space. Let P_j be the probability that the system occupies the region Ω_j, and let P_α be the probability that the system occupies the pure state φ_α within Ω_j. If the observer knows with absolute certainty that the system occupies the mixed state ψ_j and not some other state ψ_i, i ≠ j ∈ [1, M], it is given that

P_i = 0, ∀ i ≠ j ∈ [1, M]   (4)

P_j ≡ Σ_{α=1}^{N} P_α = 1   (5)

For continuous systems, the summation is replaced by an integral, i.e.

P_j ≡ ∫_{Ω_j} P_α dV_α = 1   (6)

where dV_α = dq_α dp_α is the phase space volume of the pure state φ_α. If the knowledge possessed by the observer about the coarse-grained flow of the system is not lost over time, then the probability P_j is constant in time, i.e.

dP_j/dt = 0   (7)

In other words, it is assumed that there is no loss of probability from Ω_j to any other coarse-grained region Ω_i, i ≠ j. Written in terms of the probabilities P_α, the condition of no loss of coarse-grained knowledge becomes

dP_j/dt = d/dt ∫_{Ω_j} P_α dV_α = ∫_{Ω_j} (dP_α/dt + P_α ∇ · v) dV_α = 0   (8)

Since this should hold independently of the size of Ω_j, the integrand must vanish identically, i.e.

dP_α/dt + P_α ∇ · v = 0   (9)

This is the continuity equation for probability flow within any given coarse-grained region Ω_j. It is referred to as the Gibbs-Liouville equation for the probability distribution within Ω_j [3][4]. Given that information is conserved within Ω_j, it is thus obtained that the probability distribution P_α is conserved, i.e.

if ∇ · v = 0, then dP_α/dt = 0   (10)

The continuity equation can be rewritten to show that probability is locally conserved within Ω_j. Using the total time derivative of P_α, i.e.

dP_α/dt = ∂P_α/∂t + ∇P_α · v   (11)

and the product rule

∇ · (P_α v) = ∇P_α · v + P_α ∇ · v   (12)

the continuity equation becomes

∂P_α/∂t + ∇ · (P_α v) = 0   (13)

The term ∇ · (P_α v) represents the difference between the probability outflow and inflow for the pure state φ_α. If there is a net probability outflow from φ_α to the rest of Ω_j, i.e. if

∇ · (P_α v) > 0   (14)

then the continuity equation gives that the probability for φ_α decreases with time, i.e.

∂P_α/∂t = −∇ · (P_α v) < 0   (15)

If there is a net probability inflow to φ_α from the rest of Ω_j, i.e. if

∇ · (P_α v) < 0   (16)

then the continuity equation gives that the probability for φ_α increases with time, i.e.

∂P_α/∂t = −∇ · (P_α v) > 0   (17)

In terminology borrowed from quantum mechanics, systems which evolve in such a way that the probability distribution is conserved in time, with a total probability equal to unity, are said to exhibit unitary evolution. The assumption of unitary evolution for quantum systems is a key ingredient in the formulation of quantum mechanics. In classical mechanics, the statement of unitary evolution is a direct consequence of the Gibbs-Liouville theorem, i.e. that information is conserved.

5 Statistical equilibrium

Consider a system which has been closed for a sufficiently long period of time such that the density of pure states within Ω_j, and hence M, does not change with time. In this situation, the probability distribution P_α has no explicit dependence on time. The continuity equation is then reduced to

∇ · (P_α v) = 0   (18)

This is the mathematical condition the system needs to satisfy in order for it to be said to be in statistical equilibrium. In other words, a system is in statistical equilibrium if there is no net probability flow on phase space.
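The equilibrium condition (18) can also be illustrated numerically. The sketch below is a minimal example assuming NumPy, with the pendulum system and the weight e^{−H} chosen purely for illustration: it samples an ensemble from a distribution that depends on the phase space point only through the Hamiltonian, evolves every member with a symplectic leapfrog integrator, and compares coarse-grained histograms before and after. The cell occupation probabilities are essentially unchanged, i.e. there is no net probability flow.

```python
# Minimal numerical sketch (assuming NumPy) of statistical equilibrium,
# eq. (18): a phase-space density depending on (q, p) only through H
# is stationary under the flow. Example system: pendulum H = p^2/2 - cos q.
import numpy as np

rng = np.random.default_rng(0)

def H(q, p):
    return 0.5 * p**2 - np.cos(q)

def sample(n):
    # Rejection-sample from P(q, p) proportional to exp(-H); since
    # min H = -1, exp(-(H + 1)) is a valid acceptance probability.
    pts = []
    while len(pts) < n:
        q = rng.uniform(-np.pi, np.pi)
        p = rng.uniform(-4.0, 4.0)
        if rng.uniform() < np.exp(-(H(q, p) + 1.0)):
            pts.append((q, p))
    return np.array(pts)

def leapfrog(q, p, dt, steps):
    # Symplectic integration of qdot = p, pdot = -sin(q).
    for _ in range(steps):
        p = p - 0.5 * dt * np.sin(q)
        q = (q + dt * p + np.pi) % (2 * np.pi) - np.pi  # keep q in [-pi, pi)
        p = p - 0.5 * dt * np.sin(q)
    return q, p

def coarse_histogram(q, p, bins=6):
    hist, _, _ = np.histogram2d(q, p, bins=bins,
                                range=[[-np.pi, np.pi], [-4, 4]])
    return hist / len(q)  # cell occupation probabilities

ens = sample(20_000)
q0, p0 = ens[:, 0], ens[:, 1]
q1, p1 = leapfrog(q0.copy(), p0.copy(), dt=0.05, steps=400)

drift = np.abs(coarse_histogram(q0, p0) - coarse_histogram(q1, p1)).max()
print(f"max change in cell probability: {drift:.4f}")  # small: no net flow
```

Any other function of H alone would serve equally well as the initial weight; the stationarity is a property of the flow, not of the particular distribution chosen here.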
6 The ergodic theorem

The incompressibility of the Hamiltonian flow implies that the time the system spends in any single pure state, before evolving to the next pure state, is the same for all pure states. If this were not the case, the state points on phase space would lump together, which would signify a violation of information conservation. This implies that over the course of a long period of time, the total time spent by the system in any given pure state is expected to be the same for all pure states. This expectation, which is due to a combination of the Gibbs-Liouville theorem and the law of large numbers, is in this article interpreted to be equivalent to the ergodic theorem of statistical mechanics [6][7][8].

Let n_α denote the number of times the system occupies the pure state φ_α. The total number of times, n, the system occupies the set of N pure states within Ω_j is then

n = Σ_{α=1}^{N} n_α   (19)

The ergodic theorem then says that over a long period of time, such that n is large, the system is expected to occupy all pure states within Ω_j an equal number of times, i.e.

n_α = n_β, ∀ β ≠ α ∈ [1, N]   (20)

such that

n = N · n_α   (21)

7 Microcanonical probability distribution

It is now possible to define the notion of a probability P_α for the pure state φ_α of a closed system from the notion of a relative frequency,

P_α ≡ lim_{n→∞} n_α/n = n_α/(N · n_α) = 1/N   (22)

(It must be emphasized that this relative frequency cannot be obtained from a set of repeated experimental measurements, since the observer, being ignorant, is not able to distinguish between the set of pure states.) Thus, all the pure states within Ω_j are equally probable. This implies that an observer has lost all information, down to the scale of V_{Ω_j}, about the system, since no distinctions can be made between the possible pure states within Ω_j. This is always true for systems in statistical equilibrium. The uniform probability distribution given by equation 22 is commonly referred to as the microcanonical [3], or fundamental [9], probability distribution.
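As a toy illustration of the relative frequency in equation 22, consider the harmonic oscillator, whose energy shell is a circle in phase space traversed at constant angular speed. The minimal sketch below, assuming NumPy, with the cell number and sampling step chosen arbitrarily, samples a long trajectory at a time step incommensurate with the period and bins the visits into N equal cells on the shell; the visit frequencies approach 1/N.

```python
# Minimal sketch (assuming NumPy): relative visit frequencies on the energy
# shell of a harmonic oscillator (H = (p^2 + q^2)/2, unit mass and frequency)
# approach the microcanonical value 1/N of eq. (22).
import numpy as np

N_cells = 12          # coarse-grained cells on the energy shell
n_samples = 200_000   # number of observations along the trajectory
dt = 1.0              # sampling step, incommensurate with the period 2*pi

# Exact trajectory: (q, p) rotates uniformly on the circle of radius sqrt(2E),
# so the phase angle advances linearly in time.
t = dt * np.arange(n_samples)
theta0 = 0.3                      # arbitrary initial phase
theta = (theta0 + t) % (2 * np.pi)

# Bin the phase angle into N equal cells and count visits.
counts = np.bincount((theta / (2 * np.pi) * N_cells).astype(int),
                     minlength=N_cells)
freqs = counts / n_samples

print("visit frequencies:", np.round(freqs, 4))
print("microcanonical 1/N:", round(1 / N_cells, 4))  # frequencies cluster here
```

The oscillator is chosen only because its ergodic behaviour on the energy shell is easy to verify exactly; the equidistribution of the sampled angles follows from dt/(2π) being irrational.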
8 Non-uniform probability distributions

There also exist non-uniform probability distributions. The non-uniformity arises due to interactions that the system has, or has had in the not too distant past, with an environment. In other words, the system is, or was recently, not isolated. Due to the interaction with an environment, the density of states changes with time. If the interaction is uniform on phase space, the density changes uniformly on phase space. However, in general, this is not the case. An interaction, characterized by a potential energy, depends on the specific values of the generalized coordinates. In that scenario, the density of states is a local function on phase space. This has the consequence that the total time spent by the system within any given region on phase space is not necessarily the same as within any other equally-sized region. In other words, the ergodic theorem appears to be violated. Thus, not only is the probability distribution non-uniform when there is a non-negligible net interaction with the environment, it can also change over time.

To put it differently, if there exists an interaction between the system and its environment, as seen from the perspective of an observer of the system, this implies that the observer possesses knowledge, i.e. information, about the interaction. This information is used by the observer when assigning probabilities for the possible states of the system. The fact that the observer possesses some amount of information necessarily means that the probability distribution is non-uniform. It is only at statistical equilibrium, where all information is lost, that the observer assigns a uniform probability distribution.

From the definition of probability in statistical equilibrium, it is clear that the probability for any given pure state decreases as the number of pure states N increases, i.e. as the uncertainty volume increases. In non-equilibrium, where the probabilities are not equal, it is the average probability which decreases as the uncertainty volume increases.

9 Ergodicity breaking

It should be emphasized that the apparent violation of the ergodic theorem is not of a fundamental character. It arises only because the degrees of freedom associated with the environment cannot be excluded when defining the degrees of freedom of the system. In other words, the environment should be included in the definition of the system. If that is done, then there exists no environment, and hence there cannot be any net transfer of energy or particles from, or to, the system. This redefined system, which takes into account all degrees of freedom, even those which the experimenter may think belong to an 'environment', does indeed conserve information, and ergodicity is not broken. The probability distribution for the states of this redefined system is uniform, i.e. all mixed states for any given system, assuming the system has been defined such that no degrees of freedom are forgotten, are equally probable.

In most practical situations, however, there will always exist an environment to any system under study. The question is to what degree this environment interacts with the system. The weaker the interaction, the weaker the ergodicity breaking, and the closer the system will come to a uniform probability distribution.

10 Is information conserved or lost?

At this stage, it is necessary to clarify the notions of information loss and information conservation to avoid confusion. The process of information loss, experienced by an observer, and the notion that information is conserved seem to contradict each other, making it impossible for a given system to reach statistical equilibrium over time unless it started there. The confusion arises due to a key difference between the statistical evolution of the system, as experienced by an ignorant observer, and the deterministic evolution of the system as described by the classical laws of motion. The subtlety which must be emphasized is that the statement of classical determinism, represented mathematically by the Gibbs-Liouville theorem, is a postulate on the fundamental character of classical systems, independent of whether there exists any observer or not, whereas the notion of information loss is observer dependent. The statement of information loss tries to capture the tendency of an observer to become more ignorant over time.

In conclusion, statistical equilibrium is a statement on the amount of knowledge, or information, an observer of the system possesses. In statistical equilibrium, the observer is unable to make any distinctions between the possible states of the system and therefore possesses zero information.
This does not imply that there are no fundamental distinctions between the states of the system. The observer is simply unaware of them. In fact, if the system fundamentally conserves information, i.e. if the distinctions between the possible states of the system exist for all times, then the system is fundamentally never in statistical equilibrium, from the perspective of an observer whose knowledge of the degrees of freedom of the system is perfect and complete. Thus, the fundamental question of how a system, which is initially not in statistical equilibrium, can evolve into a state of statistical equilibrium, should be modified as follows: How can an observer of a given classical system, which flows on phase space according to the Hamilton equations, lose information about the system over time?

11 Entropy as a measure of uncertainty

A measure for the amount of ignorance possessed by the observer, i.e. the amount of uncertainty in the determination of the pure state of the system, should depend on the probability distribution {P_α}. This measure is denoted by S({P_α}) and referred to as the entropy of the system. To obtain a specific form for the entropy as a function of the probability distribution, it is noted that this function should satisfy the following conditions.

i The entropy should be zero when the observer has complete knowledge about the evolution of the system. In other words, if the observer knows with absolute certainty that the system occupies a specific state φ_α, such that P_α = 1 and P_β = 0 ∀ β ≠ α, the entropy must vanish.

ii The entropy should always be either zero or a positive number, i.e. S ≥ 0.

iii The entropy should take a maximum value when the observer is maximally ignorant. This happens when the system is in statistical equilibrium. When all states are equally probable, the observer possesses no partial knowledge which could be used to distinguish between some of the features of the set of states. Thus,

P_α = 1/N ∀ α ∈ [1, N]  →  S({P_α}) = S_max   (23)

iv The entropy should, in statistical equilibrium, be a continuously increasing function of the number of states N. In other words, when N increases, the uncertainty volume V_{Ω_j} increases continuously.

v The entropy should satisfy the following composition law:

S({P_α} · {P_β}) = S({P_α}) + S({P_β})   (24)

This composition law is understood as follows. Let Ω_j be divided into two subregions Ω_j^α and Ω_j^β such that V_{Ω_j} = V_{Ω_j^α} + V_{Ω_j^β}. The states φ_α, α ∈ [1, N_α], belong to Ω_j^α and the states φ_β, β ∈ [1, N_β], belong to Ω_j^β, where N_α + N_β = N. The corresponding probability distributions, {P_α}_{α=1}^{N_α} and {P_β}_{β=1}^{N_β}, satisfy Σ_{α=1}^{N_α} P_α + Σ_{β=1}^{N_β} P_β = 1 and, since they are independent of each other, their product gives the probability distribution associated with the region Ω_j, i.e. P(Ω_j) = {P_α} · {P_β}. The composition law thus states that the total uncertainty within the region Ω_j is the sum of the uncertainties associated with the subregions of Ω_j.

Conditions (i) and (v) suggest that the entropy has a logarithmic dependence on the probability distribution. Condition (ii) suggests that it is necessary to include an additional minus sign in the definition of the entropy. This is seen from the general definition of P_α, i.e.

log P_α = lim_{n→∞} log(n_α/n) = log n_α − lim_{n→∞} log n < 0   (25)

which, for a system in statistical equilibrium, becomes

log P_α = log(1/N) = log 1 − log N = −log N < 0   (26)
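These conditions are straightforward to check numerically for the entropy function −Σ_α P_α log P_α anticipated by this argument. The minimal sketch below, assuming NumPy, verifies conditions (i)-(iii) and the additivity of the entropy over independent distributions, which is the standard information-theoretic reading of the composition law (24).

```python
# Minimal sketch (assuming NumPy) checking the entropy conditions for
# S({P}) = -sum_a P_a log P_a (Boltzmann constant set to 1 for simplicity).
import numpy as np

def entropy(P):
    P = np.asarray(P, dtype=float)
    P = P[P > 0]                      # convention: 0 log 0 = 0
    return -np.sum(P * np.log(P))

N = 8
pure = np.zeros(N); pure[0] = 1.0     # complete knowledge: one certain state
uniform = np.full(N, 1.0 / N)         # statistical equilibrium

print(entropy(pure))                   # (i): 0 for a known pure state
print(entropy(uniform), np.log(N))     # (iii): maximal, equal to log N

# (ii)-(iii): any other distribution on N states has entropy in [0, log N].
rng = np.random.default_rng(1)
P = rng.random(N); P /= P.sum()
assert 0 <= entropy(P) <= np.log(N) + 1e-12

# Composition law for independent distributions: the joint distribution is
# the outer product, and its entropy is the sum of the individual entropies.
Q = rng.random(5); Q /= Q.sum()
joint = np.outer(P, Q).ravel()
print(entropy(joint), entropy(P) + entropy(Q))  # equal up to rounding
```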
Since the entropy function should act as a measure for systems both in and out of statistical equilibrium, i.e. for both uniform and non-uniform probability distributions, it is required to take the statistical average of all the logarithmic contributions to the entropy, i.e.

S({P_α}) ∼ −((n_1/n) log P_1 + ··· + (n_N/n) log P_N)   (27)

∼ −Σ_{α=1}^{N} (n_α/n) log P_α   (28)

∼ −Σ_{α=1}^{N} P_α log P_α   (29)

This entropy function then satisfies conditions (iii) and (iv). With the proportionality constant identified with the Boltzmann constant k_B, it is referred to as the Gibbs entropy [3] and is, in the information-theoretic language, identical to the Shannon entropy [10][11][12]. In conclusion, the entropy of a system measures the amount of uncertainty within the system, and it is given by the Gibbs formula

S({P_α}) = −k_B Σ_{α=1}^{N} P_α log P_α   (30)

In statistical equilibrium, the Gibbs entropy reduces to the Boltzmann entropy [6][13],

S = k_B log M   (31)

It is important to emphasize that entropy is not a physical quantity in the same manner as e.g. energy. It is determined by the probability distribution of the states of the system, and as such it is a quantity which depends both on the specifics of the system and on the ignorance of the observer.

12 The second law of thermodynamics

Given that the probability distribution is conserved, i.e. that the observer does not become more ignorant over time, the entropy, which is a function of the probability distribution, is necessarily also conserved. In practical reality, however, it is most often the case that the observer loses more and more track of the flow of the system as time evolves, thus becoming more ignorant over time. The reason for this is that any moving object, even if initially fairly isolated, will interact with its environment, thus involving more and more degrees of freedom over time and making it increasingly difficult for the observer to avoid information loss. Thus, for any observer, over time, the entropy tends to increase. Therefore, over time, the observer tends to lose information about the system. Eventually, the entropy reaches a maximum value, which is when the system is in statistical equilibrium. At this point, the entropy will not increase further and the observer has stopped losing information. The information has been completely lost. There is zero information left, since the observer is unable to make any type of distinction between the possible states of the system. The second law of thermodynamics is thus, in terms of information, stated as: An observer tends to lose information about any given system over time, until there is none left. It is important to emphasize that the second law is a probabilistic law. The amount of information possessed by an observer about a system is equivalently characterized by the amount of uncertainty, i.e. the entropy, which depends on the probability distribution as discussed earlier.
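This tendency can be illustrated with a coarse-grained version of the Gibbs entropy (30). In the minimal sketch below, assuming NumPy, with the pendulum ensemble and the grid resolution chosen purely for illustration, an ensemble initially confined to a single small phase-space cell is spread into filaments by the Hamiltonian flow; the entropy of the cell occupation probabilities, which is all that an observer limited to the resolution of the cells can compute, grows with time and then saturates, even though the fine-grained dynamics conserves information.

```python
# Minimal sketch (assuming NumPy): growth of the coarse-grained Gibbs
# entropy, eq. (30) with k_B = 1, for an observer with finite resolution.
# Example system: pendulum, H = p^2/2 - cos q, symplectic leapfrog steps.
import numpy as np

rng = np.random.default_rng(0)

# Ensemble initially confined to one small cell: low coarse entropy.
n_pts = 50_000
q = rng.uniform(1.0, 1.2, n_pts)
p = rng.uniform(0.5, 0.7, n_pts)

def coarse_entropy(q, p, bins=24):
    # Entropy of the cell occupation probabilities seen by the observer.
    hist, _, _ = np.histogram2d(q, p, bins=bins,
                                range=[[-np.pi, np.pi], [-3, 3]])
    P = hist.ravel() / len(q)
    P = P[P > 0]
    return -np.sum(P * np.log(P))

dt, steps_per_report = 0.05, 200
for report in range(6):
    print(f"t = {report * steps_per_report * dt:5.1f}  "
          f"S_coarse = {coarse_entropy(q, p):.3f}")  # increases, saturates
    for _ in range(steps_per_report):
        p = p - 0.5 * dt * np.sin(q)
        q = (q + dt * p + np.pi) % (2 * np.pi) - np.pi
        p = p - 0.5 * dt * np.sin(q)
```

Because the initial cell spans a range of energies, neighbouring trajectories have different periods; the ensemble shears along the energy contours, and ever finer filaments fall below the observer's cell resolution, which is precisely the information loss described above.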
Henriksson "On the Gibbs-Liouville theorem in classical mechanics," Philosophy of Science Archive (PhilSci Archive) (2019). http://philsci-archive.pitt.edu/16000/ (accessed 2019-05-12). 6. L.E. Boltzmann "Vorlesungen über Gastheorie, vol II," Verlag von Johann Ambrosius Barth, Leipzig (1898). 7. G.D. Birkhoff "Proof of the ergodic theorem," Proceedings of the National Academy of Sciences (U.S.), 17, p.656-660 (1931). 8. J.v. Neumann "Physical applications of the ergodic hypothesis," Proceedings of the National Academy of Sciences (U.S.), 18, p.263-266 (1932). 9. A.Ya. Khintchine "Fundamental laws of probability," Moscow State University, Moscow, 1927; 2nd ed. GTTI, Moscow (1932). 10. C.E. Shannon "A mathematical theory of communication," The Bell System Technical Journal, 27, p.379-423, 623-656 (1948). 11. E.T. Jaynes "Information theory and statistical mechanics," Physical Review, 106, No.4, p.620-630 (1957). 12. E.T. Jaynes "Information theory and statistical mechanics II," Physical Review, 108, No.2, p.171-190 (1957). 13. L.E. Boltzmann "Vorlesungen über Gastheorie, vol I," Verlag von Johann Ambrosius Barth, Leipzig (1896).