Probabilities in Statistical Mechanics

Wayne C. Myrvold
Department of Philosophy
The University of Western Ontario
wmyrvold@uwo.ca

September 10, 2014

1 Introduction

Probabilities first entered physics in a systematic way in connection with the kinetic theory of gases, according to which a gas consists of a large number of molecules moving about in a haphazard, effectively random way.[1] This theory was developed, at the hands of Maxwell, Boltzmann, and Gibbs, into the science that we (following Gibbs) now call statistical mechanics, a theory whose scope has been extended well beyond the treatment of gases. Though statistical mechanics has grown into a well-established branch of physics, with a substantial array of agreed-upon techniques of calculation and impressive empirical success, there is little agreement on the ultimate rationale for its methods. For this reason, there has arisen a substantial philosophical literature on conceptual issues associated with statistical mechanics. Much of the philosophical discussion deals, in one way or another, with the role of probability in statistical mechanics.

[1] See Brush (1976a,b) for a survey of the early history.

This chapter will review selected aspects of the terrain of discussions about probabilities in statistical mechanics (with no pretensions to exhaustiveness, though the major issues will be touched upon), and will argue for a number of claims. None of the claims to be defended is entirely original, but each deserves emphasis.

The first, and least controversial, is that probabilistic notions are needed to make sense of statistical mechanics. The reason for this, which was in fact the reason that convinced Maxwell, Gibbs, and Boltzmann that probabilities would be needed, is that the second law of thermodynamics, which in its original formulation says that certain processes are impossible, must, on the kinetic theory, be replaced by a weaker formulation according to which what the original version deems impossible is merely improbable.

Second is that we ought not to take the standard measures invoked in equilibrium statistical mechanics as giving, in any sense, the correct probabilities about microstates of the system. We can settle for a much weaker claim: that the probabilities for outcomes of experiments yielded by the standard distributions are effectively the same as those yielded by any distribution that we should take as representing probabilities over microstates.

Lastly (and most controversially): in asking about the status of probabilities in statistical mechanics, the familiar dichotomy between epistemic probabilities (credences, or degrees of belief) and ontic (physical) probabilities is insufficient; the concept of probability that is best suited to the needs of statistical mechanics is one that combines epistemic and physical considerations.

Outline of the chapter. I will set the stage by briefly reviewing the backdrop, in probability theory, against which the founders of statistical mechanics were working. We will then see how probabilities were introduced into statistical mechanics, and review the considerations that led Maxwell, Gibbs, and Boltzmann to conclude that probabilities were needed in statistical mechanics. Since probabilities play somewhat different roles in the two approaches to statistical mechanics that have their roots in the work of Boltzmann and Gibbs, respectively, I will briefly present these approaches. I will then discuss some approaches to justifying the choice of probability measures.
Lastly, I will discuss some puzzling aspects of the use of the standard equilibrium measures, and argue that these puzzles can be resolved either by invoking quantum probabilities, or by construing the probabilities in statistical mechanics as almost objective probabilities.

2 Meanings of "probability"

As Hacking (1975) has amply demonstrated, from the early days of probability theory, there were two distinct concepts that went by the name of "probability." One is an epistemic concept, having to do with degrees of belief. The other, which Hacking calls the aleatory conception, attributes probabilities to events in the world, such as the toss of a coin, which they are thought to possess independently of our knowledge or belief.[2] These need not be regarded as rivals; they are two potentially useful senses in which the word "probability" is used, which we must be careful not to conflate.

Due, perhaps, to concerns about compatibility of objective chance with the presumed determinism of the laws of nature, we find, in the latter half of the 19th century, a frequency conception replacing single-case chance as the favored construal of objective probability. So thoroughly did the notion of single-case chance drop out of discussions that subjectivists in the first half of the 20th century, such as de Finetti and Savage, when arguing against objective notions of probability, omitted it from their lists of notions to be considered and rejected, and Popper (1957, 1959) took himself to be introducing an entirely new idea when he introduced the notion of single-case objective probabilities, which he called propensities.

A central question in understanding the use of probabilities in statistical mechanics is the status of these probabilities. Which notion is in play? Are they epistemic, having to do with our state of knowledge or belief about the system, or are they ontic, properties of the physical systems themselves?
If ontic, should they be thought of in frequentist terms, or in terms of single-case chances?

Textbook introductions of probabilities in statistical mechanics typically begin with the observation that, though the systems considered have very many degrees of freedom, our knowledge of the state of a system is limited to the measured values of a small number of thermodynamic variables. This suggests an epistemic reading of the probabilities. And, indeed, there is a long history of construing statistical mechanical probabilities as purely epistemic.[3] This fits uncomfortably, however, with the idea that the theory to be developed belongs strictly to physics, and that objective laws of thermodynamics are to be recovered on its basis. These considerations suggest an ontic reading. But, in the context of classical physics, with its deterministic laws of motion, a reading of the probabilities as objective chances seems out of the question. This seems to leave some sort of frequentism as the only option for an ontic reading of probabilities in statistical mechanics.

[2] For one clear statement that there are two distinct senses of "probability," and a characterization of the objective notion as single-case chance, see Poisson (1837, 31), quoted in Myrvold (2012a, 74–75).

[3] See Uffink (2011).

Is frequentism a viable option? It is certainly true that there is a close connection between frequency and chance. If I draw at random from an urn, with each ball in the urn having an equal chance of being drawn, then the chance that the drawn ball is black is equal to the relative frequency of black balls in the urn. In an infinite sequence of independent trials (such as, say, repeated rolls of a die), in each of which a certain outcome has chance p of occurring, it can be proven (this is the Strong Law of Large Numbers) that the relative frequency of that outcome will, with chance equal to one, converge to p.
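The convergence asserted by the Strong Law can be illustrated, though of course not proven, by simulation. What follows is a minimal sketch in Python, assuming a fair six-sided die and the standard library's pseudo-random generator; the function name `relative_frequency`, the fixed seed, and the sample sizes are illustrative choices that do not come from the text.

```python
import random


def relative_frequency(n_trials, outcome=6, seed=0):
    """Relative frequency of `outcome` in n_trials simulated rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed (an assumption) so that runs are repeatable
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == outcome)
    return hits / n_trials


if __name__ == "__main__":
    p = 1 / 6  # the chance of the outcome on each trial
    for n in (100, 10_000, 100_000):
        f = relative_frequency(n)
        print(f"n = {n:>7}: relative frequency = {f:.4f}, |f - p| = {abs(f - p):.4f}")
```

A finite run of this kind displays only the behaviour of one particular sequence of trials; the theorem itself concerns the chance of convergence over all infinite sequences.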
Moreover, if we have available to us a long sequence of independent trials of events with equal chances, relative frequency data can be used as evidence about the values of these chances. These considerations do not, however, enable us to define chance in terms of frequencies, as each of them requires a notion of chance distinct from that of relative frequencies to even state. Though the point remains somewhat controversial, there are good reasons to think that frequentism is an inadequate foundation for objective probability.[4]

[4] For further discussion, see Jeffrey (1992), Hájek (1997, 2009).

In light of considerations such as these, the absence of consensus about the status of statistical mechanical probabilities is unsurprising. Neither an epistemic nor an ontic reading seems to be adequate for the job, at least as far as classical statistical mechanics is concerned.

One position that has been adopted is that classical statistical mechanics, rather than being an autonomous science, must borrow its probabilities from quantum mechanics. Though the determinism of classical physics undermined the notion of objective chance, quantum mechanics revived it, as quantum mechanics is often regarded as a fundamentally chancy theory. Can we think of the probabilities used in classical statistical mechanics as quantum mechanical in origin? This possibility will be treated in §8.2.3, below. But, I will argue, such a move is not necessary. Though neither a purely epistemic nor a purely ontic reading of probabilities of statistical mechanics is available in the context of classical physics, the epistemic/ontic dichotomy is not exhaustive. As will be argued in §8.2.2, and in §9, there is a notion of probability that combines epistemic and physical considerations, and that seems to be well suited for the role required of it by statistical mechanics.

3 The Introduction of Probability into Statistical Mechanics

It is useful to distinguish between two sorts of use of the probability calculus.
One sort, which we may call quasi-deterministic, uses, implicitly or explicitly, a version of the Law of Large Numbers to replace, when dealing with a large collection of things, some quantity by its expectation value. For example, in a long enough sequence of tosses of a fair coin, we may take the fraction of tosses that are heads to be one-half, as this will, with high probability, be a good approximation. The hallmark of this use is effective certainty from uncertainty; a large number of individually unpredictable events combine to yield a result that is almost certain. The second sort of use is found in cases in which the deviations of a quantity from its expectation value are not negligible.

Von Plato (1994, 72) credits Krönig (1856) with the first use of probability in the context of the kinetic theory of gases. Krönig's use of probability is of the quasi-deterministic sort: he concludes that out of the irregular motion of molecules regularity would arise (Krönig, 1856, 316). We also find a quasi-deterministic use, in passing, in Clausius (1857, 371-72; 1966, 126).

Unlike his predecessors, Maxwell did not replace a gas of molecules moving with different speeds with one in which all have the same speed, but, rather, investigated the distribution of velocities one should expect to find among the molecules of a gas. Much of his work in kinetic theory is concerned with showing that the distribution of velocities in a gas will be what is now called the Maxwell-Boltzmann distribution. In 1867 he attempted to show that molecular collisions would lead to the Maxwell-Boltzmann distribution of velocities. The argument relies on the assumption (invoked without comment) that the Ehrenfests would later call the Stosszahlansatz (Ehrenfest and Ehrenfest, 1912).
This is the assumption that, for pairs of molecules about to collide, one can assume probabilistic independence of the incoming velocities, and moreover, treat the two molecules as if their velocities are randomly sampled from the distribution of velocities in the gas as a whole. Maxwell shows that the Maxwell-Boltzmann distribution is stationary under collisions, and concludes that this distribution is what collisions will lead to.

Maxwell's use of probabilistic reasoning is of the quasi-deterministic sort. But it was the Maxwell-Boltzmann distribution, which makes it clear that there will be variations of speeds among the molecules of the gas, that led him eventually to conclude that the second law of thermodynamics would hold, at best, with high probability for macroscopic systems.

Boltzmann, in 1868 and, more significantly, in 1872, sought to provide a derivation more satisfactory than Maxwell's. In 1872 he argued that molecular collisions would lead to a decrease in a quantity that he called H, a result known as Boltzmann's H-theorem. Though the proof requires the Stosszahlansatz, it is not highlighted by Boltzmann as a special assumption.

So far, these are all quasi-deterministic, or order-from-disorder, applications of probability. However, if thermodynamic relations are relations between expectation values of quantities defined as averages of molecular properties, then we should expect to find deviations from these relations, though the probability of significant deviations will diminish as the number of molecules increases. Of particular significance is the recognition that the second law of thermodynamics, as originally conceived, cannot, on the kinetic theory, be strictly correct; at best we can expect it to hold, with high probability, to a high degree of approximation, for systems of many degrees of freedom.

Recognition of limitations on the validity of the second law of thermodynamics appears in Maxwell's correspondence starting about 1867.
The key consideration is the issue of reversibility. On the assumption that intermolecular forces depend only on the relative positions of the molecules, the dynamical laws governing molecular motions will be symmetric under time reversal. Therefore, thermodynamic irreversibility cannot be a consequence of dynamical considerations alone. Maxwell's view is that processes that, from the point of view of thermodynamics, are regarded as irreversible, are ones whose temporal inverses are not impossible, but merely improbable. In a letter to the editor of the Saturday Review, dated April 13, 1868, Maxwell draws an analogy between mixing of fluids and balls shaken in a box.

As a simple instance of an irreversible operation which (I think) depends on the same principle, suppose so many black balls put at the bottom of a box and so many white above them. Then let them be jumbled together. If there is no physical difference between the white and black balls, it is exceedingly improbable that any amount of shaking will bring all the black balls to the bottom and all the white to the top again, so that the operation of mixing is irreversible unless either the black balls are heavier than the white or a person who knows white from black picks them and sorts them. Thus if you put a drop of water into a vessel of water no chemist can take out that identical drop again, though he could take out a drop of any other liquid (in Garber et al. 1995, 192–193).

We find similar considerations in Gibbs several years later.

[W]hen such gases have been mixed, there is no more impossibility of the separation of the two kinds of molecules in virtue of their ordinary motions in the gaseous mass without any external influence, than there is of the separation of a homogeneous gas into the same two parts into which it has once been divided, after these have once been mixed.
In other words, the impossibility of an uncompensated decrease of entropy seems to be reduced to improbability (Gibbs 1875, 229; 1961, 167).

It is one thing to acknowledge that violations of the second law will sometimes occur, albeit with low probability. Maxwell went further, asserting that, on the small scale, minute violations of the second law will continually occur; it is only large-scale, observable violations that are improbable.

[T]he second law of thermodynamics is continually being violated, and that to a considerable extent, in any sufficiently small group of molecules belonging to a real body. As the number of molecules in the group is increased, the deviations from the mean of the whole become smaller and less frequent; and when the number is increased till the group includes a sensible portion of the body, the probability of a measurable variation from the mean occurring in a finite number of years becomes so small that it may be regarded as practically an impossibility. This calculation belongs of course to molecular theory and not to pure thermodynamics, but it shows that we have reason for believing the truth of the second law to be of the nature of a strong probability, which, though it falls short of certainty by less than any assignable quantity, is not an absolute certainty (Maxwell 1878b, 280; Niven 1890, 670–71).

William Thomson (1874) provided calculations of the probability of a variety of fluctuations away from the equilibrium state of mixed gases.

In Boltzmann's work, the quasi-deterministic use of probability in his derivation of the H-theorem, together with the tacit nature of his employment of the key probabilistic assumption, the Stosszahlansatz, fostered the impression that the H-theorem followed from molecular dynamics alone.
As we have already noted, Maxwell and the British physicists working on kinetic theory were by this time (1872) keenly aware that there could be no derivation of an irreversible relaxation to equilibrium on the basis of reversible dynamics; in their view, probabilistic assumptions would be needed, and the conclusion to be derived would be that evolution away from macroscopic equilibrium, rather than towards it, is at best improbable, not impossible. There are no hints of reservations of this sort in Boltzmann's work of 1872. It was Loschmidt who, in 1876, drew Boltzmann's attention to reversibility considerations. In his response to Loschmidt, Boltzmann (1877a) acknowledged that there could be no purely dynamical proof of the increase of entropy.[5]

[5] For further discussion of the probabilistic turn in Boltzmann's thinking, see Uffink (2007), Brown et al. (2009).

Thus, in the decade from 1867 to 1877, the major figures involved in the development of statistical mechanics concluded, on the basis of the reversibility argument, that the second law of thermodynamics, as originally conceived, could not be strictly true, and that it must be replaced by a probabilistic version, in which what is deemed impossible in the original version becomes improbable.

4 Revising Thermodynamics

On the kinetic theory, heat is not a substance, and it makes no sense to talk of the heat content of a body. Instead, we distinguish between two modes in which energy may be transferred from one body to another: as heat, or as work done on (or by) the body. The first law of thermodynamics says that, if the internal energy U of a body changes by an amount dU, then this change is equal to the sum of energy transferred as heat and energy transferred as work:

$$ dU = \text{đ}Q + \text{đ}W. \tag{1} $$

The Clausius formulation of the second law of thermodynamics says that there can be no process whose sole net effect is to transfer heat from a cooler to a warmer body.
It follows from the second law that any two reversible heat engines operating between heat reservoirs at given temperatures $T_1$ and $T_2$ have the same efficiency, and that no engine is more efficient than a reversible one. Considerations of the Carnot cycle lead to the conclusion that, if a system goes through a reversible cycle that leaves it in the same thermodynamic state in which it started, then

$$ \oint \frac{\text{đ}Q}{T} = 0. \tag{2} $$

And from this, it follows that there is a function S of the thermodynamic state, such that, in any reversible process, the change in S is given by

$$ \Delta S = \int \frac{\text{đ}Q}{T}. \tag{3} $$

This function (defined only up to an additive constant) is called the entropy.

As already mentioned, on the kinetic theory, we should expect that not all molecules in a gas will have the same velocity, and that, as the molecules bounce around, there will be differences in local averages of kinetic energy of molecules from place to place. Therefore, since, on the kinetic theory, the temperature of a gas is proportional to the mean kinetic energy of its molecules, temperature differences will arise via spontaneous fluctuations, without expenditure of work, in contradiction to the second law. We can also expect, however, that these fluctuations will for the most part be negligible on the macroscopic scale, and large fluctuations will be both rare and unpredictable.

Thus, though the second law of thermodynamics, as originally conceived, is untenable, we can set ourselves the goal of recovering from statistical mechanics a weakened version, which nonetheless would explain the evidence that led to acceptance of the stronger version. What the Clausius version of the second law deems impossible, namely, the transfer of heat from a cooler to a warmer body unaccompanied by a compensating increase of entropy, the revised version declares to be highly improbable.

Maxwell would add a further limitation.
Note that, in the quotation in the previous section, the improbability of reversal of the mixing of the balls is limited to circumstances in which there is no sorting of white from black. For Maxwell, the validity of even the weakened version is restricted to situations in which we are dealing with molecules in bulk and there is no manipulation of individual molecules (Maxwell, 1871, 328-329).[6]

[6] This is what the creature now known as Maxwell's demon is meant to illustrate. The demon is first described in a letter dated December 11, 1867, from Maxwell to P.G. Tait (Knott, 1911, 213–214), and makes its first public appearance in Maxwell's Theory of Heat (1871), in a section entitled "Limitation of the Second Law of Thermodynamics."

On Maxwell's view, the distinction between heat and work is not inherent in a physical process but has to do, rather, with the means available to us to keep track of and manipulate the motion of molecules.

Available energy is energy which we can direct into any desired channel. Dissipated energy is energy we cannot lay hold of and direct at pleasure, such as the energy of the confused agitation of molecules which we call heat (Maxwell 1878a, 221; Niven 1890, 646).

To a being such as Maxwell's demon, able to track individual molecules, "the distinction between work and heat would vanish, for the communication of heat would be seen to be a communication of energy of the same kind as that which we call work" (Maxwell 1878b, 279; Niven 1890, 669).

With the vanishing of the distinction between heat and work also vanishes any possibility of formulating thermodynamics. In particular, since the very definition of thermodynamic entropy requires a distinction between heat and work, for Maxwell, the entropy change associated with a process will not be an intrinsic property of the process. One might add, though, that because of the vast gulf in scales between the macroscopic and the level of individual molecules, for macroscopic phenomena the concepts of heat and work will be sufficiently unambiguous to admit of unproblematic application.[7]

[7] For further discussion of the Maxwellian view of thermodynamics and statistical mechanics, see Myrvold (2011).

If it is a revised version of the second law of thermodynamics that we aim to recover from statistical mechanics, one according to which the processes declared impossible by the original version of the second law are judged improbable, then, it seems, there will be no avoiding the use of probabilistic concepts in statistical mechanics. This renders questions about the status of probabilities in statistical mechanics central to the interpretation of the theory.

Probabilities play somewhat different roles in Boltzmannian and Gibbsian approaches to statistical mechanics. Both make use of the apparatus of phase space and Hamiltonian dynamics. In the next section, this apparatus will be briefly reviewed.

5 Basic concepts of Hamiltonian dynamics

The Hamiltonian formulation of classical mechanics has turned out to be a useful setting for classical statistical mechanics. Consider a system of N degrees of freedom, represented by coordinates $\{q_1, \ldots, q_N\}$. These might, for example, be the 3n position coordinates of n point particles; they might also include angle variables or other parameters.
With each generalized coordinate $q_i$ is associated a conjugate momentum $p_i$. Since the Newtonian equations of motion are second-order in the time derivative, to specify a solution it does not suffice to specify the values of coordinates at a given time. We can, however, specify a solution by specifying values of the coordinates and their rates of change, or, equivalently, by specifying the coordinates and momenta. The 2N-dimensional space whose points are given by specifying the coordinates and momenta of a system with N degrees of freedom is called the phase space of the system.

The dynamics of the system are encoded in a function on phase space called the Hamiltonian, which, for the systems with which we will be concerned, is simply the total energy of the system, expressed in terms of generalized coordinates $\{q_1, \ldots, q_N\}$ and their conjugate momenta $\{p_1, \ldots, p_N\}$. The dynamically possible trajectories through phase space are those that satisfy Hamilton's equations of motion,

$$ \dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i}. \tag{4} $$

These equations define a flow on phase space; there is a function $T_t$ that maps the phase space into itself, such that, if x is the phase point at some time $t_0$, $T_t x$ is the phase point at time $t_0 + t$.

The phase space volume of a subset A of phase space is given by

$$ m(A) = \int_A dq_1 \cdots dq_N \, dp_1 \cdots dp_N. \tag{5} $$

Note that this is defined in terms of canonical coordinates and momenta. It is invariant under canonical transformations, that is, coordinate transformations that preserve the equations of motion (4), but not under arbitrary coordinate transformations. In particular, it makes a difference whether we use momenta or velocities to parameterize the space.
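To make the last point explicit, consider, as an illustrative assumption not made in the text, a single particle of mass $m_0$ moving in one dimension, with momentum $p = m_0 v$ (the symbol $m_0$ is used here only to avoid a clash with the volume measure $m$). The change of variables from $(q, p)$ to $(q, v)$ has constant Jacobian $m_0$, so the two candidate measures of a region $A$ differ by a factor that depends on the mass:

```latex
\[
  dq\,dp \;=\; \left|\frac{\partial(q,p)}{\partial(q,v)}\right| dq\,dv \;=\; m_0\,dq\,dv ,
  \qquad\text{so}\qquad
  \int_{A} dq\,dp \;=\; m_0 \int_{A} dq\,dv .
\]
```

For several particles with Cartesian coordinates, the Jacobian is the product of one such mass factor per momentum component.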
For example, if we consider a system confined to a finite volume that contains two molecules of different masses, then the set of all states in which the more massive molecule has its velocity within certain limits will have larger phase space volume than the set of states in which the less massive molecule has its velocity within the same limits, although, on the measure corresponding to (5) with velocities in place of momenta, these sets would have equal measure.

For any subset A of a phase space Γ, let $T_t(A)$ be the set of points that evolve, in time t, from points in A:

$$ T_t(A) = \{T_t x \mid x \in A\}. \tag{6} $$

It is easy to show that phase space volume is preserved under the dynamical evolution (4),

$$ m(T_t(A)) = m(A). \tag{7} $$

A probability distribution $P_0$ over the state of the system at time $t_0$, together with the phase-space flow $T_t$, determines a probability distribution $P_t$ for any other time $t_0 + t$:

$$ P_t(x \in A) = P_0(x \in T_t^{-1}(A)). \tag{8} $$

If the probability distribution for the state at time t is represented by a density function $\rho(q, p, t)$, this will obey Liouville's equation:

$$ \frac{\partial \rho}{\partial t} + \sum_{i=1}^{N} \left( \frac{\partial \rho}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial \rho}{\partial p_i} \frac{\partial H}{\partial q_i} \right) = 0. \tag{9} $$

It follows from Liouville's equation that any probability distribution given by a density function that is a function of the Hamiltonian will be a stationary distribution.

6 Boltzmannian Statistical Mechanics

6.1 Entropy and probability

As already mentioned, in 1872 Boltzmann proved (with implicit assumption of the Stosszahlansatz) that molecular collisions in a gas would lead to the Maxwell-Boltzmann distribution of velocities. The proof proceeded by defining a quantity that Boltzmann called H and showing that it tends to decrease. For an ideal gas, at least, the negative of H is related to the thermodynamic entropy. Later (1877b), he showed that there is a relation between H and phase-space volume, suggesting that the entropy of a macrostate is related to its phase-space volume.
On a probability measure that assigns probabilities to regions of phase space that are proportional to their phase-space volume, entropy is then connected with probability.

In this section we follow the procedure of Boltzmann (1877b, 1995), which is summarized in Ehrenfest and Ehrenfest (1912). For simplicity, we consider a system that consists of a large number N of identical molecules, each with r degrees of freedom (the generalization to systems consisting of several types of molecules is straightforward). Let μ be the 2r-dimensional phase space of an individual molecule, and let $\Gamma = \mu^N$ be the 2rN-dimensional phase space of the entire system of N molecules. We will assume that we need only consider a finite region of the system's phase space. It might, for example, be a gas confined to a box, with energy known to lie within a small interval $[E, E + \delta E]$. For each molecule, there will be an accessible region of its phase space, consisting of states consistent with the constraints on the system as a whole (every molecule will have its position in the box, and no molecule can have an energy greater than the energy of the whole system).

Partition the accessible region of μ into small regions $\{\omega_i, i = 1, \ldots, m\}$ of equal phase-space volume $[\omega]$, corresponding to small intervals of values of each of the coordinates and momenta. Suppose that the macrostate of the system depends only on the number of molecules whose phase-point lies in each region $\omega_i$.[8] Let $\{n_i, i = 1, \ldots, m\}$ be these occupation numbers, that is, a specification, for each $\omega_i$, of the number of molecules whose state lies in that region; such a specification is called a state distribution. For each state distribution Z there is a corresponding subset $\Gamma_Z$ of Γ, consisting of phase points that yield that state distribution (such a region is called, by the Ehrenfests, a "Z-star").

[8] This is a nontrivial assumption, valid for an ideal gas, but not for systems for which intermolecular potentials make a nonnegligible contribution to the total energy. For such systems, the total energy is not a function only of occupation numbers of a partition of the single-molecule phase space, but depends also on the distribution of pairs of molecules in the two-molecule phase space. See Jaynes (1965, 1971) for discussion. Jaynes' essential point is correct, though it is marred by his taking (13), rather than its generalization (14), as the definition of the Boltzmann entropy.

Define a function H of state distributions,

$$ H(Z) = \sum_{i=1}^{m} n_i \log n_i. \tag{10} $$

This H is the quantity that Boltzmann had argued, in 1872, would be decreased by collisions between molecules in a gas, until it reached its minimum possible value. For large N, we have a relation between H(Z) and the phase-space volume of the Z-star $\Gamma_Z$:

$$ \log\left(m(\Gamma_Z)\right) \approx -H(Z) + C, \tag{11} $$

where C is a constant that depends on N and on the size of the cells in our chosen partition of the molecular phase space μ.

The relation (11) reveals the significance of H as an indication (up to an arbitrary constant) of the volume of phase space occupied by a state description. Moreover, if we take $Z_{max}$ to be the state description that minimizes H (that is, maximizes −H), subject to the imposed constraints, then we find that, for an ideal gas, the quantity

$$ S_B = -k H(Z_{max}) \tag{12} $$

is equal, up to an additive constant, to the thermodynamic entropy.

This suggests a construal of entropy in terms of phase space volume. For any phase point x, let Z(x) be the Z-star containing x, and define

$$ S_B(x) = k \log[m(Z(x))]. \tag{13} $$

One can generalize this to situations in which the macrostate is not a function only of occupation numbers of regions of the single-particle space μ.[9] Suppose the macrostate of the system is defined by the values of a small number of functions $\{X_1, \ldots, X_k\}$.
Partition the accessible phase space Γ into regions corresponding to small intervals of values of these macrovariables; each such region consists of points that, for practical purposes, share the same values of the macrovariables. Then the entropy assigned to a point x is given by

$$ S_B(x) = k \log[m(M(x))], \tag{14} $$

where M(x) is the macrostate containing x.

[9] It is this generalization that is referred to in current presentations of the Boltzmannian approach to statistical mechanics. See, e.g., Lebowitz (1993, 1999); Goldstein (2001).

This gives an appearance of assigning an entropy that is a property of the physical state of the system alone. But note that the value of the Boltzmann entropy depends, not only on the phase point x, but also on the macrovariables chosen to define macrostates (presumably, these are the ones that we are able to measure), and a partition of the macrovariables fine enough that differences within a set are regarded as negligible; this, presumably, has to do with the precision with which we can measure the macrovariables.

Given a partition of the accessible phase space into macrostates, we identify the equilibrium macrostate with the one that has largest phase space volume. The ratio of this volume to the volume of all other macrostates will be of order $10^N$, where N is the number of molecules. If we identify macroscopic systems as those containing a number of molecules roughly on the order of Avogadro's number (that is, on the order of $10^{23}$), the equilibrium macrostate will have vastly larger phase-space volume than the rest of the accessible region of phase space.

6.2 Explaining entropy increase

These considerations give intuitive content to the H-theorem. The move from a non-equilibrium macrostate to the equilibrium macrostate is a move from a region that occupies a vanishingly small volume of the accessible phase space to a region that occupies most (as measured by phase-space volume) of the accessible region of phase space.
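The counting behind relation (11), and the claim that near-uniform state distributions come to dominate the accessible volume, can be checked numerically in a toy model. The sketch below is an illustration under stated assumptions rather than Boltzmann's own procedure: it assumes labelled molecules and equal-volume cells, so that a Z-star has volume $[\omega]^N N!/\prod_i n_i!$, and it takes the constant in (11) to be $C = N \log N + N \log[\omega]$, which is what the leading-order Stirling approximation gives. All function names and the chosen occupation numbers are hypothetical.

```python
from math import comb, lgamma, log


def boltzmann_H(occupations):
    """H(Z) = sum_i n_i log n_i, as in (10); empty cells contribute zero."""
    return sum(n * log(n) for n in occupations if n > 0)


def log_zstar_volume(occupations, cell_volume=1.0):
    """Exact log-volume of the Z-star: log(N! / prod_i n_i!) + N log(cell_volume)."""
    n_total = sum(occupations)
    log_ways = lgamma(n_total + 1) - sum(lgamma(n + 1) for n in occupations)
    return log_ways + n_total * log(cell_volume)


def stirling_rhs(occupations, cell_volume=1.0):
    """-H(Z) + C, assuming C = N log N + N log(cell_volume) (leading-order Stirling)."""
    n_total = sum(occupations)
    return -boltzmann_H(occupations) + n_total * log(n_total) + n_total * log(cell_volume)


def near_equal_fraction(n_total, width):
    """Two equal cells: fraction of the volume with |n_1 - N/2| <= width."""
    ways = sum(comb(n_total, n) for n in range(n_total + 1)
               if abs(2 * n - n_total) <= 2 * width)
    return ways / 2 ** n_total


if __name__ == "__main__":
    z = [4000, 3000, 2000, 1000]  # an arbitrary non-uniform state distribution
    print("exact log-volume :", log_zstar_volume(z))
    print("-H(Z) + C        :", stirling_rhs(z))
    for n_total in (100, 1000, 10_000):
        # Occupation of the first cell within 5% of N of an even split.
        print(n_total, near_equal_fraction(n_total, n_total // 20))
```

In the two-cell case the fraction of volume lying within a fixed relative tolerance of an even split increases with N, which is the small-scale analogue of the dominance of the equilibrium macrostate described above.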
Can such considerations lead to a conclusion that, for a macroscopic system in a non-equilibrium macrostate, the system will, with overwhelming probability, relax to equilibrium? Note that, even if we take the uniform measure on phase space to yield a probability measure, the observation that the equilibrium macrostate dominates the phase space does not suffice for the conclusion. What is required is that, for any non-equilibrium macrostate $M$, most of the states in $M$ are ones that evolve into the equilibrium macrostate. This will be the case if the dynamics are ergodic (see §8.2.1), though ergodicity is not required for the conclusion to go through. Sheldon Goldstein argues that we should expect most phase points in any nonequilibrium macrostate to move into the equilibrium state, on the grounds that

[f]or a nonequilibrium phase point $X$ of energy $E$, the Hamiltonian dynamics governing the motion $X_t$ arising from $X$ would have to be ridiculously special to avoid reasonably quickly carrying $X_t$ into $\Gamma_{eq}$ and keeping it there an extremely long time - unless, of course, $X$ itself were ridiculously special (Goldstein, 2001, 43).

Note, also, that the argument, though it makes reference to the uniform measure on phase space, does not depend sensitively on this measure being regarded as the one we use to judge some initial conditions improbable or "ridiculously special." What is required is that, whatever measure we use to judge probabilities, it agrees with the uniform measure on which sets have small probability.

The argument, as it stands, is symmetric under time-reversal. It supports equally well the conclusion that, with the exception of ridiculously special states, the states in a non-equilibrium macrostate are those that evolved from an equilibrium macrostate a short time before.
Yet, if we run across, say, a thermos bottle that happens to contain warm water and some ice cubes, we don't conclude that this condition probably arose from a state of uniform temperature a short while ago. This leads us to ask what grounds we have for regarding the exceptional states, that give rise to antithermodynamic behaviour, as ridiculously special, with an attendant inference to ridiculously improbable. Indiscriminate application of this sort of reasoning would lead one to regard all out-of-equilibrium states as ridiculously special. Yet systems that are far from thermodynamic equilibrium are not (apparently) rare; they are seemingly ubiquitous. Our experience hardly lends support to the claim that out-of-equilibrium systems are atypical! Of course, it is possible that our experience is misleading. One can imagine scenarios on which what we see is not even close to a fair sample of all that there is, and everything we see is atypical indeed. One such scenario is the Boltzmann-Schuetz cosmology, on which the Universe consists of a vast sea of matter whose overall state is thermal equilibrium, with occasional fluctuations here and there away from equilibrium (Boltzmann 1895, Boltzmann 1995, §90). Though they would be mind-bogglingly rare, there would also be low-entropy regions as large as the observable universe. On such a scenario, the states we see around us would not be typical states, as the very existence of living, experiencing beings requires low-entropy matter. One can, without contradiction, maintain that features that are ubiquitous in our experience are rare in the universe. There is a consequence of this cosmology, however, that Boltzmann seems not to have noticed. 
On such a scenario, the vast majority of occurrences of a given non-maximal level of entropy would be near a local entropy minimum, and so one should regard it as overwhelmingly probable that, even given our current experience, entropy increases towards the past as well as the future, and everything that seems to be a record of a lower entropy past is itself the product of a random fluctuation. Moreover, on such a scenario you should take yourself to be whatever the minimal physical system is that is capable of supporting experiences like yours, and you should regard your apparent experiences of being surrounded by an abundance of low-entropy matter as illusory. That is, you should take yourself to be what has been called a "Boltzmann brain."¹⁰

This is a logically possible scenario. But not only does it involve rejecting judgments of what is typical that are based on experience (which tells us that out-of-equilibrium systems are ubiquitous), it even goes so far as to lead us to reject everything we experience as illusory. Empirical evidence does not support this cosmology. Yet it is physics that brought us to these considerations, physics based on empirical evidence that the world is to be described, at least approximately, as a large number of molecules evolving according to Hamiltonian dynamics. A theory that tells us that the experiments on which it is founded are illusory undermines its own empirical base.

The conclusion to be drawn is that, whatever judgments may be warranted about probabilities of states of things, they are not to be based on considerations of phase-space volume alone.

7 Gibbsian statistical mechanics

The Gibbsian approach involves consideration of probability measures on the phase space of a system. Gibbs thought of probability in frequentist terms, and accordingly enjoined his readers to imagine a great number of independent systems of the same type, all with the same macroscopic properties, but different microstates.
Thus, he referred to ensembles of systems, and thought of the probability assigned to a region $A$ of phase space as closely approximating, for a sufficiently large ensemble of similarly-prepared systems, the fraction of systems in the ensemble whose microstate is in $A$.

10 The term is due to Andreas Albrecht. It first appears in print in Albrecht and Sorbo (2004).

The goal of statistical mechanics is to identify properties of mechanical systems that are analogues of thermodynamic quantities, in the sense that one can demonstrate, on the basis of the laws of mechanics and appropriate probabilistic assumptions, that, with high probability, to a high degree of approximation these properties stand in relations analogous to those of thermodynamics. According to Gibbs,

A very little study of the statistical properties of conservative systems of a finite number of degrees of freedom is sufficient to make it appear, more or less distinctly, that the general laws of thermodynamics are the limit toward which the exact laws of such systems approximate, when their number of degrees of freedom is indefinitely increased (Gibbs, 1902, 166).

Gibbs gave names to ensembles of particular interest: the microcanonical, the canonical, and the grand canonical. The microcanonical ensemble is meant to be appropriate for an isolated system in equilibrium whose energy is known. The canonical ensemble is appropriate for a system in thermal contact with a heat bath of fixed temperature, which can exchange energy but not material with its environment, so that it contains a fixed number of molecules. In a grand ensemble, the number of molecules is not held fixed, as there might, for example, be chemical reactions taking place. A grand canonical ensemble is a grand ensemble in thermal contact with a heat bath.
Though Gibbs spoke of ensembles, in keeping with his frequentism about probability, in what follows we will speak of probability distributions, without commitment as to whether these are to be thought of in frequentist terms. For our purposes, we need only consider in detail the microcanonical and canonical distributions, as the key conceptual issues associated with Gibbsian equilibrium probability measures arise already with them. The reader should be aware, however, that the scope of statistical mechanics is not limited to considerations of systems with a fixed number of degrees of freedom.

7.1 Microcanonical distributions

Consider a system whose total energy is known to lie within a small interval $[E, E + \delta E]$. Suppose also that the system is confined to a finite phase volume within this energy shell (the system might, for example, be a gas confined to a box of finite volume). We define a phase space measure that is uniform, in phase space variables, within the accessible region of the energy shell, and zero outside of it. Since the density function is a function of energy alone, this is a stationary distribution.

If $\Gamma$ is a $2N$-dimensional phase space of a system with $N$ degrees of freedom, the subset $\Gamma_E$ of all points having an energy $E$ is a $(2N-1)$-dimensional surface within $\Gamma$. We can define, as a limiting case, a distribution on this surface, which will be the projection onto the energy surface of the uniform distribution in the energy shell.

7.2 Canonical distributions

A canonical distribution is one given by a density function that takes the form

$$\rho(x) = Z^{-1} e^{-\beta H(x)} \tag{15}$$

for $x$ in the accessible region of the system's phase space. $Z$ is a normalization constant satisfying

$$Z = \int e^{-\beta H(x)}\,dx, \tag{16}$$

where the integral is taken over the accessible region of phase space. $Z$ is a function of the parameter $\beta$, and of any external parameters on which the accessible region of phase space or the Hamiltonian depend. It is known as the partition function.
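To see the normalization (16) at work in a case where the answer is known, here is a minimal numerical sketch. The choices made are assumptions of the illustration, not part of the text: a single one-dimensional harmonic oscillator with Hamiltonian H(q, p) = p²/2m + mω²q²/2, unit mass and frequency by default, and an "accessible region" taken to be a large square box in the (q, p) plane, chosen wide enough that the neglected tails are negligible. For this Hamiltonian the integral over the whole plane is 2π/(βω), which the midpoint-rule estimate should reproduce closely.

```python
import math


def hamiltonian(q: float, p: float, mass: float = 1.0, omega: float = 1.0) -> float:
    """Energy of a 1-D harmonic oscillator (illustrative choice of system)."""
    return p * p / (2.0 * mass) + 0.5 * mass * omega ** 2 * q * q


def partition_function(beta: float, half_width: float = 10.0, n: int = 400) -> float:
    """Midpoint-rule estimate of Z = integral of exp(-beta * H) dq dp.

    The integration region is the square [-half_width, half_width]^2,
    standing in for the accessible region of phase space (an assumption).
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        q = -half_width + (i + 0.5) * h
        for j in range(n):
            p = -half_width + (j + 0.5) * h
            total += math.exp(-beta * hamiltonian(q, p))
    return total * h * h


def canonical_density(q: float, p: float, beta: float, z: float) -> float:
    """Density of the form (15): rho = exp(-beta * H) / Z."""
    return math.exp(-beta * hamiltonian(q, p)) / z


if __name__ == "__main__":
    beta = 1.0
    z = partition_function(beta)
    print("numerical Z:", z, " analytic 2*pi/beta:", 2.0 * math.pi / beta)
    print("density at the origin:", canonical_density(0.0, 0.0, beta, z))
```

The point of the exercise is only that Z carries the dependence on β (and, through the Hamiltonian, on external parameters such as ω); nothing in the argument of the text depends on this particular system.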
Suppose we have two systems $S_1$, $S_2$ that are weakly coupled, so that the total Hamiltonian is approximately the sum of the Hamiltonians of the two systems. Suppose that the two systems initially are characterized by canonical distributions with parameters $\beta_1$ and $\beta_2$, respectively. Then the joint distribution will be an approximately stationary one if and only if $\beta_1 = \beta_2$. This, Gibbs argued, suggests that the canonical distribution is appropriate for representing a system in thermal equilibrium with a heat bath, with a temperature that is a function of $\beta$.¹¹ Considerations of the canonical ensemble applied to an ideal gas lead to the identification

$$\beta = \frac{1}{kT}, \tag{17}$$

where $T$ is the absolute temperature and $k$ is Boltzmann's constant.

11 As Gibbs is careful to point out, this does not amount to a proof, as there are other distributions that share this property (Gibbs, 1902, 35–42).

7.3 The Gibbs Entropy

Gibbs argued that, for systems for which the canonical distribution is appropriate, the quantity

$$S_G = -k\langle \log \rho \rangle = -k \int \rho(x) \log \rho(x)\,dx \tag{18}$$

behaves like the thermodynamic entropy (see Gibbs 1902, 43–45 and Ch. XIV). Though Gibbs' argument is restricted to situations for which the canonical distribution is appropriate, we can consider the quantity

$$S_G[\rho] = -k \int \rho(x) \log \rho(x)\,dx \tag{19}$$

for other distributions. $S_G[\rho]$ is, in some sense, a measure of how "spread out" the probability distribution is.¹² This quantity has come to be known as the Gibbs entropy of the probability distribution given by the density function $\rho$. For any $\rho$, $S_G[\rho]$ is conserved under Hamiltonian flow.

12 For a finite probability space, to the atoms of which are assigned probabilities $\{p_1, p_2, \ldots, p_n\}$, the Gibbs entropy becomes

$$S_G = -k \sum_{i=1}^{n} p_i \log p_i,$$

which is the quantity that Shannon (1948) named the entropy of the probability assignment $\{p_1, p_2, \ldots, p_n\}$, and for which he used the symbol $H$, in analogy with Boltzmann's $H$.

7.3.1 Gibbs entropy and Boltzmann entropy compared

It can be shown that the standard deviation of energy

$$\Delta E = \sqrt{\langle E^2 \rangle - \langle E \rangle^2} \tag{20}$$

yielded by a canonical distribution will, for systems of very many degrees of freedom, be small compared to the expectation value of energy,

$$\frac{\Delta E}{\langle E \rangle} \sim \frac{1}{\sqrt{N}}. \tag{21}$$

Recall that, for macroscopic systems, $N$ is on the order of Avogadro's number, that is, on the order of $10^{23}$, so the deviation in energy is very small indeed. The energy is almost certain to depart only negligibly from its expectation value, and so the canonical distribution can be replaced, for the purpose of calculating expectation values of thermodynamic quantities, with a microcanonical distribution on the energy surface corresponding to the expectation value of energy. Moreover, most of this energy surface will be occupied by the equilibrium macrostate, and there is little difference between calculating the phase-space volume of the energy surface and the volume of its largest macrostate. Thus, for systems in equilibrium with macroscopically many degrees of freedom, the Boltzmann entropy and the Gibbs entropy will be approximately equal, up to a constant, and, crucially, will exhibit the same dependence on external parameters.

Suppose we extend the identification of (19) as entropy to systems other than those in thermal contact with a heat bath. We might even extend this identification to non-equilibrium situations, for which thermodynamic entropy is undefined. Then, because of the measure-preserving property of Hamiltonian flow on phase space, for an isolated system, $S_G$ will not increase with time. This makes it a poor candidate for tracking entropy changes in a process of relaxation to equilibrium.
However, as suggested by Gibbs (1902, 148–151), we can also define a coarse-grained entropy by partitioning the phase space $\Gamma$ into small regions of equal volume, and replacing the probability distribution over microstates by one that is uniform over elements of the partition. The idea is that, if the elements of the partition are smaller than our ability to discriminate between microstates, this smeared probability distribution will yield virtually the same probabilities for outcomes of feasible measurements as the fine-grained distribution. The Gibbs entropy associated with this smeared probability distribution can increase with time.

Recall that the definition of the Boltzmann entropy also requires a coarse-graining of the phase space of the system. The conceptual differences between Boltzmann entropy and coarse-grained Gibbs entropy are not great.

8 Justifying choice of equilibrium measures

The microcanonical distribution is uniform, in phase space variables, within a small energy shell. One might be tempted to think that this is mandated by a straightforward application of the Principle of Indifference, and, indeed, some authors have taken some version of the Principle of Indifference as a fundamental postulate of statistical mechanics.¹³ We should recall, however, the familiar fact that any application of a Principle of Indifference requires some judgment about which possibilities are equiprobable. In the case of a continuum of possibilities, as we have in classical statistical mechanics, an injunction to adopt a uniform probability measure requires specification of which variables the distribution is to be uniform in. A distribution uniform in canonical phase space variables will not be uniform with respect to some other parameterization of the state space of the system. Even if we accept the authority of the Principle of Indifference, we ought to ask, why these variables, rather than some other parameters?
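The dependence of "uniform" on the choice of variables can be seen in a one-line calculation. The following sketch uses an example that is an assumption of the illustration rather than anything in the text: a free particle whose momentum p lies in [0, p_max], described alternatively by its kinetic energy E = p²/2m in [0, E_max]. We ask for the probability that the energy is at most a given fraction of E_max, once under a distribution uniform in p and once under a distribution uniform in E.

```python
import math


def prob_low_energy_uniform_in_momentum(
    energy_fraction: float, p_max: float = 2.0, mass: float = 1.0
) -> float:
    """P(E <= energy_fraction * E_max) when p is uniform on [0, p_max]."""
    e_max = p_max ** 2 / (2.0 * mass)
    # E <= threshold is equivalent to p <= sqrt(2 * m * threshold).
    p_threshold = math.sqrt(2.0 * mass * energy_fraction * e_max)
    return p_threshold / p_max


def prob_low_energy_uniform_in_energy(
    energy_fraction: float, p_max: float = 2.0, mass: float = 1.0
) -> float:
    """P(E <= energy_fraction * E_max) when E is uniform on [0, E_max]."""
    e_max = p_max ** 2 / (2.0 * mass)
    return (energy_fraction * e_max) / e_max


if __name__ == "__main__":
    fraction = 0.25
    print("uniform in momentum:", prob_low_energy_uniform_in_momentum(fraction))
    print("uniform in energy:  ", prob_low_energy_uniform_in_energy(fraction))
```

The two "indifferent" assignments give different probabilities to the same proposition (the square root of the fraction in one case, the fraction itself in the other), which is just the familiar point that a Principle of Indifference yields no verdict until the variables are specified.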
A further problem with adopting a Principle of Indifference in statistical mechanics is that there seems to be no reason for restricting its use to systems in equilibrium. One might be led by such a principle to adopt a probability distribution that is as uniform as possible, subject to compatibility with the current macrostate, even for systems that are far from equilibrium. But, as emphasized by Albert (2000), such a procedure would lead to disastrous retrodictions; a probability distribution of this sort would ascribe high probability to entropy increase both to the future and to the past of the current moment.

Part of the answer to the question of justifying the choice of measure lies in the fact that we are concerned with equilibrium measures. On the Gibbsian approach, thermal equilibrium is not to be thought of as a static state; it is one in which the microstate is constantly changing and the macrostate, though approximately constant most of the time, is subject to frequent tiny fluctuations and much rarer large ones. An ensemble of systems, however, should not exhibit any tendency to change overall, and this means that the equilibrium distributions should be stationary distributions.

As we have seen, it follows from the Liouville equation that, for a conservative system, any distribution given by a density function that is a function of the energy is a stationary distribution. It is thus easy to see that the microcanonical distribution is a stationary one. The question arises whether there might be other stationary distributions that are plausible candidates for an equilibrium probability distribution.

13 See, e.g., Jackson (1968, 8, 83) and Carroll (2010, 166–169). E. T. Jaynes' Principle of Maximum Entropy (Jaynes, 1957a,b) is a version of the Principle of Indifference.
8.1 The hypothesis of uniform a priori probabilities

In an influential textbook published in 1938, Tolman introduced what he called "the fundamental hypothesis" of "equal a priori probabilities for equal regions in the phase space."

Although we shall endeavour to show the reasonableness of this hypothesis, it must nevertheless be regarded as a postulate which can ultimately be justified only by the correspondence between the conclusions which it permits and the regularities in the behaviour of actual systems which are empirically found (Tolman, 1938, 59).

Tolman argues for the reasonableness of this postulate on the basis of Liouville's theorem, which shows that a distribution uniform in phase space is a stationary distribution; this shows that "the principles of mechanics do not themselves include any tendency for phase points to concentrate in particular regions of the phase space."

Under the circumstances we then have no justification for proceeding in any manner other than that of assigning equal probabilities for a system to be in different equal regions of the phase space that correspond, to the same degree, with what knowledge we do have as to the actual state of the system. And, as already intimated, we shall, of course, find that the results which can then be calculated as to the properties and behaviour of systems do agree with empirical findings (Tolman, 1938, 61).

This is reminiscent of an invocation of a Principle of Indifference, albeit not an incautious one that ignores the necessity of a choice of variables over which to impose uniformity. It should be noted, however, that, for Tolman, the postulate is ultimately to be justified by empirical evidence.
8.2 Probabilities from dynamics

8.2.1 Approaches based on ergodic theory

Boltzmann conjectured that

The great irregularity of the thermal motion and the multitude of forces that act on a body make it probable that its atoms, due to the motion that we call heat, traverse all positions and velocities which are compatible with the principle of [conservation of] energy (quoted in Uffink 2007, 40).

This (or rather, a variant of it) has come to be known as the ergodic hypothesis. As stated, it cannot be correct, as the trajectory is a one-dimensional continuous curve and so cannot fill a space of more than one dimension. But it can be true that almost all trajectories eventually enter every open neighbourhood of every point on the energy surface.

Boltzmann argued, on the basis of the ergodic hypothesis, that the long-run fraction of time that a system spends in a given subset of the energy surface is given by the measure that Gibbs was to call microcanonical. Given a Hamiltonian dynamical system, and an initial point $x_0$, we can define, for any measurable set $A$ such that the requisite limit exists, the quantity

$$\langle A, x_0 \rangle_{\mathrm{time}} = \lim_{T \to \infty} \frac{1}{T} \int_0^T \chi_A(T_t(x_0))\,dt, \tag{22}$$

where $\chi_A$ is the indicator function for $A$,

$$\chi_A(x) = \begin{cases} 1, & x \in A \\ 0, & x \notin A. \end{cases} \tag{23}$$

$\langle A, x_0 \rangle_{\mathrm{time}}$, provided it exists, is the fraction of time, in the long run, that a trajectory starting at the point $x_0$ spends in the set $A$.

A dynamical system is said to be ergodic if and only if, for any set $A$ of positive measure, the set of initial points that never enter $A$ has zero measure. It is easily shown that this condition is equivalent to metric transitivity: a dynamical system is metrically transitive iff, for any partition of $\Gamma$ into disjoint subsets $A_1$, $A_2$ such that, for all $t$, $T_t(A_1) \subseteq A_1$ and $T_t(A_2) \subseteq A_2$, either $m(A_1) = 0$ or $m(A_2) = 0$.

Birkhoff (1931a,b) proved that, for any measure-preserving dynamical system, and any measurable set $A$,

1. The limit

$$\langle A, x_0 \rangle_{\mathrm{time}} = \lim_{T \to \infty} \frac{1}{T} \int_0^T \chi_A(T_t(x_0))\,dt \tag{24}$$

exists for almost all points $x_0$. (That is, if $X$ is the set of points for which this limit doesn't exist, $m(X) = 0$.)

2. If the dynamical system is ergodic, then

$$\langle A, x_0 \rangle_{\mathrm{time}} = m(A) \tag{25}$$

for almost all $x_0$.

Consider an ergodic system that has been permitted to evolve in isolation for a long time. If we select a random time to look at it, then the probability that the system is in a subset $A$ of $\Gamma_E$ will be given by $\langle A, x_0 \rangle_{\mathrm{time}}$, which, by Birkhoff's ergodic theorem, is equal to the measure ascribed to $A$ by the microcanonical measure.

Is this a justification for taking the microcanonical measure to be the measure that yields the correct probabilities for an isolated system? Two reservations arise. The first has to do with whether actual systems of interest have ergodic dynamics. Proving this turns out to be very difficult even for some very simple systems. Moreover, there are systems, namely, those to which the KAM theorem applies, that are provably not ergodic.¹⁴

The second is the use of the long-term time average. The picture invoked above, of a system isolated for a very long time and observed at a random time, does not fit neatly with laboratory procedures. One argument that has been given for considering the long-term time average is as follows.¹⁵ Measurements of thermodynamic variables such as, say, temperature, are not instantaneous, but have a duration which, though short on human time scales, is long on the time scales of molecular evolution. What we measure, then, is in effect a time-average over a time period that counts as a very long time period on the relevant scale.

This rationale is problematic. The time scales of measurement, though long, are not long enough that the average over them necessarily approximates the limit in (22); as Sklar (1993, 176) points out, if they were, then the only measured values we would have for thermodynamic quantities would be equilibrium values.
This, as Sklar puts it, is "patently false"; we are, in fact, able to track the approach to equilibrium by measuring changes in thermodynamic variables.

14 See Berkovitz et al. (2006) for discussion of the applicability of ergodic theory.

15 Adapted from Khinchin (1949, 44–45).

As mentioned above, if we are to ask for a probability distribution appropriate to thermodynamic equilibrium, the distribution should be a stationary distribution. The microcanonical distribution is a stationary distribution on $\Gamma_E$. If the system is ergodic, then it is the only stationary distribution among those that assign probability zero to the same sets that it does. For a justification of the use of the microcanonical distribution along these lines, see Malament and Zabell (1980).

8.2.2 Almost-objective probabilities

There is, in the mathematical literature on probability, a family of techniques that is known (somewhat misleadingly) as "the method of arbitrary functions." The idea is that, for certain systems, a wide range of probability distributions will be taken, via the dynamics of the system, into distributions that yield approximately the same probabilities for some statements about the system, because small uncertainties about initial conditions become larger uncertainties about macroscopic variables at a later time.¹⁶

It is plausible, at least, that the dynamics of the sorts of systems to which we successfully apply statistical mechanics exhibit the requisite sort of forgetting of initial conditions. Consider, for example, an isolated system that is initially out of equilibrium (it might, for example, be a cup of hot water with an ice cube in it). It is left alone to relax to equilibrium. Once it has done so, then, it seems, all trace of its former state has been lost, or rather, buried so deeply in the details of the system's microstate that no feasible measurement would be informative about it.
For systems of this sort, a wide class of probability distributions over initial conditions evolve, via Liouville's equation, into distributions that, as far as feasible measurements are concerned, yield probabilities that are indistinguishable from those yielded by the equilibrium distribution.

16 The method of arbitrary functions was pioneered by von Kries (1886) and Poincaré (1912), and elaborated by a number of mathematicians, notably Hopf (1934, 1936). For a systematic overview of mathematical results, see Engel (1992); for the history, see von Plato (1983). See Myrvold (2012a,b) and Frigg (2014) for examples and discussion.

We need not restrict ourselves to states of thermodynamic equilibrium. If we open a thermos bottle and find in it half-melted ice cubes in lukewarm water, it is plausible that no feasible measurement on the system will determine whether the system was prepared a few minutes ago with only a little less ice, or an hour ago with boiling water and a lot of ice. If this is right, then again, a wide variety of probability distributions over initial conditions will evolve into ones that yield virtually the same probabilities for results of feasible measurements.

Ideas of this sort have recently drawn the attention of philosophers; see Strevens (2003, 2011), Rosenthal (2010, 2012), Abrams (2012), and Myrvold (2012a,b) for an array of recent approaches in which the method of arbitrary functions plays a role.

The method does not generate probabilities out of nothing; rather, the key idea is that a probability distribution over initial conditions is transformed, via the dynamical evolution of the system, into a probability distribution over conditions at a later time. Hence any use of the method must address the question: what is the status of the input distributions? Poincaré describes them as "conventions," which, it must be admitted, is less than helpful.
Strevens (2003) is noncommittal on the interpretation of the input probabilities, whereas Strevens (2011) and Abrams (2012) opt for distributions based on actual frequencies.

Savage (1973) suggested that the input probabilities be given a subjectivist interpretation. For the right sorts of dynamics, large differences in subjective probabilities will lead to probability distributions that agree closely on outcomes of feasible measurements; hence the output probabilities might be called "almost objective" probabilities. This suggestion is developed in Myrvold (2012a,b). The conception combines epistemic and physical considerations. The ingredients that go into the characterization of such probabilities are:

• a class $C$ of credence-functions about states of affairs at time $t_0$ that is the class of credences that a reasonable agent could have, in light of information that is accessible to the agent,

• a dynamical map $T_t$ that maps states at time $t_0$ to states at time $t_1 = t_0 + t$, inducing a map of probability distributions over states at time $t_0$ to distributions over states at $t_1$,

• a set $\mathcal{A}$ of propositions about states of affairs at time $t_1$, to which probabilities are to be assigned,

• a tolerance threshold $\varepsilon$ for differences in probabilities below which we regard two probabilities as essentially the same.

Given these ingredients, we will say that a proposition $A \in \mathcal{A}$ has an almost-objective probability, or epistemic chance, if all probability functions in $C$ yield, when evolved via $T_t$ to $t_1$, essentially the same probability for $A$. That is, $A$ has epistemic chance $\lambda$ if, for all $P_0 \in C$, $|P_t(A) - \lambda| < \varepsilon$.

This concept includes an epistemic aspect, as an essential ingredient is the class $C$ of credence-functions that represent reasonable degrees of belief for agents with our limitations.¹⁷ This restriction would be eliminable if, for the events of interest, the dynamical map $T_t$ yielded the same probabilities for absolutely all input measures, but this cannot be.
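The definition can be exercised on a toy system in the spirit of the method of arbitrary functions. Everything specific in the following sketch is an assumption made for illustration: the system is a frictionless wheel whose angular speed is 1 + u revolutions per unit time, with u in [0, 1]; the class C is a small hand-picked list of smooth weight functions over u, standing in for "reasonable" credences; the proposition A is that the wheel's final angle lies in the first half of the circle; and probabilities are computed by a simple midpoint rule.

```python
import math


def evolved_probability(weight, t: float, n: int = 20000) -> float:
    """Probability of A at time t, starting from a credence over u.

    weight: unnormalised density over u in [0, 1] (an input credence).
    A: the final angle, (1 + u) * t modulo one revolution, is below 0.5.
    """
    numerator = 0.0
    normaliser = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        w = weight(u)
        normaliser += w
        if ((1.0 + u) * t) % 1.0 < 0.5:
            numerator += w
    return numerator / normaliser


def epistemic_chance(credences, t: float, eps: float):
    """Return a value lam with |P_t(A) - lam| < eps for every credence in
    the class, or None if no such value exists."""
    probs = [evolved_probability(w, t) for w in credences]
    low, high = min(probs), max(probs)
    if high - low < 2.0 * eps:
        return 0.5 * (low + high)
    return None


# Illustrative stand-in for the class C (not taken from the text).
CREDENCES = [
    lambda u: 1.0,
    lambda u: 2.0 * u,
    lambda u: 2.0 * (1.0 - u),
    lambda u: 6.0 * u * (1.0 - u),
    lambda u: math.exp(-3.0 * u),
]

if __name__ == "__main__":
    for t in (1.0, 200.0):
        print("t =", t, "epistemic chance:", epistemic_chance(CREDENCES, t, 0.01))
```

On this toy model, at short times the input credences disagree sharply about A and no epistemic chance exists at the chosen tolerance; after many revolutions they all yield a probability close to one half. Note that the result is relative to the class of credences: a credence concentrated on a sufficiently narrow band of speeds would spoil the agreement, which is the sense in which the restriction to a class C is not eliminable.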
Physics also plays a key role; the value of an epistemic chance, if it exists, is largely a matter of the dynamics.

Those who hold that epistemic considerations ought not to be brought into physics at all will not be happy with construing statistical mechanical probabilities in this way. However, on the Maxwellian view of thermodynamics and statistical mechanics, on which the fundamental concepts of thermodynamics have to do with our ability to keep track of and manipulate molecular motions, this sort of blending of epistemic and physical considerations is just what one would expect to find in statistical mechanics.

17 Objective Bayesians would hold that this class is a singleton.

8.2.3 Probabilities from quantum mechanics?

So far, we have been considering classical statistical mechanics. However, our world is not classical; it is quantum. Most writers on the foundations of statistical mechanics have assumed, implicitly or explicitly, that the conceptual problems of classical statistical mechanics are to be solved in classical terms; classical statistical mechanics should be able to stand on its own two feet, as an autonomous science, albeit one that gets certain facts about the world, such as the specific heats of non-monatomic gases, wrong.

One argument for this might be that we successfully apply statistical mechanics to systems for which quantum effects are negligible. This is questionable. The classical trajectories through phase space that exhibit antithermodynamic behaviour are unstable under random perturbations. Albrecht and Phillips (2014) estimate the relevance of quantum uncertainty to stock examples such as coin flips and billiard-ball gases, and conclude that "all successful applications of probability to describe nature can be traced to quantum origins."

As emphasized by Albert (2000, Ch. 7), if we consider isolated quantum systems, and assume the usual Schrödinger evolution to be valid at all times, then this leaves us in pretty much the same conceptual situation as in classical mechanics. The dynamics governing the wave-function are reversible; for any state that exhibits the expected thermodynamic behaviour there is a state that exhibits anti-thermodynamic behaviour. Moreover, the von Neumann entropy, the quantum analog of the Gibbs entropy, is conserved under dynamical evolution. Considering nonisolated systems only pushes the problems further out; the state of the system of interest plus a sufficiently large environment can be treated as an isolated system; there will be states of this larger system that lead to antithermodynamic behaviour of the subsystem of interest.

If, however, collapse of the wave-function is a genuinely chancy, dynamical process, then things are different.¹⁸ For any initial state, there will be objective probabilities for any subsequent evolution of the system. Albert (1994, 2000) has argued that these probabilities suffice to do the job required of them in statistical mechanics.

This is indeed plausible, though we lack a rigorous proof. If this proposal is correct, we should expect that, on time scales expected of relaxation to equilibrium, the probability distribution yielded by the collapse dynamics approaches a distribution that is appropriately like the standard equilibrium distribution, where "appropriately like" means that it yields approximately the same expectation values for measurable quantities. It is not to be expected that the equilibrium distribution be an exact limiting distribution for long time intervals. In fact, distributions that are stationary under the usual dynamics (quantum or classical) will not be strictly stationary under the stochastic evolution of dynamical collapse theories such as the Ghirardi-Rimini-Weber theory (GRW) or Continuous Spontaneous Localization theory (CSL), as energy is not conserved in these theories.
However, energy increase will be so small as to be under ordinary circumstances unobservable; Bassi and Ghirardi (2003, 310) estimate, for a macroscopic monatomic ideal gas, a temperature increase on the order of $10^{-15}$ Celsius degrees per year. Thus, it is possible for collapse dynamics to yield relaxation to something closely approximating a standard equilibrium distribution, followed by exceedingly slow warming.

18 See Bassi and Ghirardi (2003) and Ghirardi (2011) for overviews of dynamical collapse theories.

9 Puzzles about equilibrium measures, and a resolution

Do the standard equilibrium measures represent objective features of the physical world, or should they be thought of as degrees of belief that we ought to have about the microstate of the system, given knowledge of the parameters that define the system's thermodynamic state? On either account, we have a puzzle. They are said to be introduced on the basis of our incomplete knowledge of the state of a system. This suggests an epistemic reading. Nevertheless, we generate from them predictions about expectation values and fluctuations around these expectation values, predictions that can be tested by experiment. This suggests an ontic reading; we are not, when we are performing experimental tests of these predictions, probing the system to learn about our beliefs. We seem to require both an epistemic and an ontic reading, in an inconsistent way.

Taken literally, the standard equilibrium measure, applied to an isolated system that has recently relaxed to equilibrium from a non-equilibrium macrostate, is problematic on either an epistemic or ontic reading, as it ascribes high probability to the system's having been in an equilibrium macrostate for an exceedingly long time. Since this was not the case, the measure does not reflect an objective chance distribution, and, since we know it was not the case, it does not represent our epistemic state.
This puzzle is easily resolved if we observe that use of the standard measures does not require a commitment to their representing a correct probability measure over the state of the system, where "correct" might mean either objective chance or a credence function that represents our state of knowledge about the system.

Consider a system that is, at $t_0$, in an equilibrium state with a known energy, subject to some constraint (say, a gas confined by a partition to one half of a box). The constraint is removed, and the system evolves, isolated, to a new equilibrium, at time $t_1$. Suppose, now, we apply at $t_1$ the microcanonical distribution appropriate to the new equilibrium state. This will not be the evolute of our initial probability distribution. However, unless there is some feasible measurement on the system that could be performed at $t_1$ that will discriminate between the system's having been at $t_0$ in the same equilibrium state it is in at $t_1$, and the state it actually was in, the microcanonical distribution will yield virtually the same probabilities as the evolute of our initial probability distribution, and we can use the microcanonical distribution for purposes of calculation, as a surrogate for the evolute of our initial probability distribution.

It seems to be an empirical fact that, for many systems of interest, the current macrostate of the system is all that is relevant to predictions about future measurements.¹⁹ Since a range of macroscopically distinguishable initial conditions can evolve to the same macrostate, this means that, for such systems, a wide range of probability distributions over initial conditions will evolve into distributions that yield substantially the same probabilities for outcomes of future measurements, and conditions are ripe for the existence of epistemic chances, as outlined above.
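The washing-out of differences between initial distributions can be illustrated with a toy model. The following is only a minimal sketch under loud assumptions that are not in the text: the unit square stands in for phase space, Arnold's cat map stands in for the actual dynamics (it is measure-preserving and mixing, which is all the illustration needs), the outcome "x < 1/2" stands in for a feasible coarse-grained measurement, and the two sampled distributions are hypothetical credence functions for a system known to start in the left half. The function names are invented for this example.

```python
import numpy as np

def cat_map(points, steps):
    """Apply Arnold's cat map (x, y) -> (2x + y, x + y) mod 1, `steps` times."""
    x, y = points[:, 0].copy(), points[:, 1].copy()
    for _ in range(steps):
        # Both new coordinates are computed from the old (x, y) pair.
        x, y = (2.0 * x + y) % 1.0, (x + y) % 1.0
    return np.column_stack([x, y])

def prob_left_half(points):
    """Estimate the probability of the coarse outcome 'x < 1/2' from samples."""
    return float(np.mean(points[:, 0] < 0.5))

def sample_initial_credences(n, rng):
    """Two different (hypothetical) credence functions, both confined to the left half."""
    # Spread evenly over the whole left half of the square.
    spread = np.column_stack([rng.uniform(0.0, 0.5, n), rng.uniform(0.0, 1.0, n)])
    # Concentrated on a small patch inside the left half.
    peaked = np.column_stack([rng.uniform(0.1, 0.2, n), rng.uniform(0.3, 0.4, n)])
    return {"spread": spread, "peaked": peaked}

rng = np.random.default_rng(0)
for name, pts in sample_initial_credences(200_000, rng).items():
    before = prob_left_half(pts)
    after = prob_left_half(cat_map(pts, steps=12))
    print(f"{name}: P(x < 1/2) at start = {before:.3f}, after 12 steps = {after:.3f}")
# For comparison, the uniform measure on the whole square assigns this outcome probability 0.5.
```

In this toy setting both evolved distributions give, up to sampling error, the same probability for the coarse outcome as the uniform measure does, even though neither of them is the uniform measure. That is the sense in which the uniform measure can serve as a surrogate for calculating probabilities of feasible measurements.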
As we have characterized epistemic chances, physical as well as epistemic considerations come into their definition, and, as has already been remarked, their values, if they exist, are largely a matter of dynamics. Though they have an epistemic aspect to them, we cannot ascertain their values by consulting our own cognitive states. An agent might have good reason to believe that a proposition has an epistemic chance, without knowing what its value is, either because she doesn't know the exact dynamics of the system (she might be uncertain, for example, whether a roulette wheel is biased), or because (and this is the condition we are in, for typical systems in statistical mechanics) the computational task of forward-evolving a reasonable credence function via the actual dynamics of the system would simply be beyond the computational resources available to her.

This means that the values of epistemic chances are things that we can be uncertain about and can have credences about. Moreover, propositions about the values of epistemic chances can be put to experimental test. Consider some proposition A about a system, having to do with the result of some measurement subsequent to a time $t_1$. Suppose that we have good reason to believe that there is a probability $p^*$ that represents the probability assigned to A by any probability distribution that results from forward evolving, via the actual dynamics of the system, some reasonable credence-function about the state of the system at time $t_0$. We can entertain hypotheses of the form $p^* = p$, for various values of p. Suppose that our conditional credences satisfy

\[ cr(A \mid p^* = p) = p. \tag{26} \]

This is an analogue of the Principal Principle. Just as the Principal Principle turns frequency data into evidence about chances, this analogue turns frequency data from repeated experiments into evidence on which we can update our credences about the value of $p^*$.²⁰

¹⁹ Note that this is not true for retrodictions!

Now consider the hypothesis,

The value of $p^*$ is that given by the microcanonical distribution corresponding to the macrostate of the system at $t_1$.

Note that this hypothesis may be true, even if the microcanonical distribution is ruled out as a candidate for reasonable credence about the state of the system at $t_1$. If it is true, this means that our knowledge of the state of the system at $t_0$ has become irrelevant to the outcome of the measurement performed at $t_1$. The hypothesis is testable, and its experimental verification will give us a justification for use of the microcanonical distribution to calculate probabilities that we assign to outcomes of the experiment. Tolman's fundamental hypothesis of equal a priori probabilities should be replaced with one that says,

The correct probabilities for results of feasible future experiments are those that are yielded by a probability distribution that is uniform over the region of phase space corresponding to a system's macrostate.

Construing the standard equilibrium distributions this way, as surrogates for more complicated distributions that result from time-evolved credence functions over earlier states of affairs, resolves the puzzles associated with them. With regard to the first puzzle, though these probabilities enter into our considerations because of incomplete knowledge of the state of a system, the value of an epistemic chance, if there is one, depends on the dynamics of the system, and is, moreover, the sort of thing that we can formulate testable hypotheses about. The Tolman-inspired approach, on which a posit about probabilities introduced on epistemic grounds is vindicated by experiment, begins to seem less mysterious. What is vindicated, however, is not Tolman's postulate, but the modified version above.
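The way condition (26) turns frequency data into evidence about the value of the epistemic chance can be sketched numerically. This is an illustrative sketch only, resting on assumptions that go beyond the text: the hypotheses about the value are restricted to a finite grid of candidates, repetitions of the experiment are treated as independent conditional on each hypothesis (so the likelihood is binomial), the prior is flat, the microcanonical value is taken to be 0.5, and the frequency data (520 occurrences of A in 1000 trials) are invented for the example. The function name is hypothetical.

```python
import numpy as np

def update_credences(prior, candidates, successes, trials):
    """Posterior credence over hypotheses p* = p after observing frequency data.

    By (26), cr(A | p* = p) = p. Assuming (beyond the text) that repeated
    experiments are independent given p* = p, the likelihood of the data is
    proportional to p**successes * (1 - p)**(trials - successes).
    """
    candidates = np.asarray(candidates, dtype=float)  # each strictly between 0 and 1
    prior = np.asarray(prior, dtype=float)
    log_like = successes * np.log(candidates) + (trials - successes) * np.log1p(-candidates)
    log_post = np.log(prior) + log_like
    log_post -= log_post.max()  # subtract the maximum for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Candidate values of p*, with a flat prior over them (both are assumptions).
candidates = np.linspace(0.05, 0.95, 19)
prior = np.full(candidates.size, 1.0 / candidates.size)

# Invented data: A occurred in 520 of 1000 repetitions.
posterior = update_credences(prior, candidates, successes=520, trials=1000)

p_mc = 0.5  # assumed value given by the microcanonical distribution
idx = int(np.argmin(np.abs(candidates - p_mc)))
print(f"credence in p* = {p_mc}: prior {prior[idx]:.3f}, posterior {posterior[idx]:.3f}")
```

On this grid the candidate closest to the observed frequency is the assumed microcanonical value, so credence in that hypothesis rises after updating. Data with a markedly different frequency would lower it instead, which is the sense in which the hypothesis is testable.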
Typically, there will be a temporal asymmetry in this sort of use of equilibrium distributions. Use of an equilibrium distribution to calculate probabilities for measurements at $t_1$ will only be justified if our past knowledge of the state of the system has been washed out by the evolution of the system, and has become irrelevant for the purpose of anticipating the results of future measurements. Nothing can make our knowledge of the macrostate of the system at time $t_0$ irrelevant for retrodictions about the state of the system at time $t_0$ or before. The source of the asymmetry lies in asymmetry of epistemic access. We can have memories and records of past events, whereas for future events of the sort considered we can typically do no better than to use our knowledge of the current state of the system and evolve it forward. There is no reason for the class C invoked in the characterization of epistemic chances to be invariant under time-reversal, and, typically, it will not be.

²⁰ See Myrvold (2012a,b) for further discussion.

A common objection to the introduction of epistemic considerations into statistical mechanics is that our ignorance of the exact state of the world surely cannot explain why systems behave as they do.²¹ This is correct! The coffee in my cup does not cool because I am ignorant of its exact microstate. A system behaves as it does because of its dynamics, together with initial conditions. Explanations of relaxation to equilibrium will have to involve an argument that the dynamics, together with initial conditions of the right type, yields that behaviour, plus an explanation of why the sorts of physical processes that give rise to the sorts of systems considered don't produce initial conditions of the wrong type (or rather, don't reliably produce initial conditions of the wrong type).

There is a connection, however, between the epistemic considerations we have invoked, and what would be required of an explanation of relaxation to equilibrium.
The processes that are responsible for relaxation to equilibrium are also the processes that are responsible for knowledge about the system's past condition of non-equilibrium becoming useless to the agent. Thus, an explanation of relaxation to equilibrium is likely to provide also an explanation of washing out of the relevance to the future of knowledge about the past. Moreover, an explanation of why no process reliably produces initial conditions that lead to anti-thermodynamic behaviour would also explain the reasonableness of credences that attach vanishingly small credence to such conditions. Our judgments about what sorts of processes occur in nature and our judgments about what sorts of credences are reasonable for well-informed agents are closely linked; if there were processes that could reliably prepare systems in states that lead to anti-thermodynamic behaviour, then it would not be unreasonable for an agent to attach non-negligible credence to the system having been prepared in such a state, and we would adjust our judgments about what are and are not reasonable credences accordingly.

²¹ For a particularly vivid expression of this point, see Albert (2000, 54–65); see also Loewer (2001, 611).

10 Conclusion

Quantum probabilities, viewed as objective chances, sidestep the above-mentioned puzzles associated with statistical mechanical probabilities, which have to do with how to mesh our use of probability with deterministic, reversible dynamics, as the dynamics of collapse theories are neither deterministic nor reversible. Construal of the probabilities in statistical mechanics as epistemic chances also resolves the puzzles associated with them. Moreover, the blending of epistemic and physical considerations employed in their definition is appropriate for statistical mechanics, if the goal is to recover thermodynamics viewed in a Maxwellian light. This is achieved without sacrificing the autonomy of classical statistical mechanics.
11 Acknowledgments

This work was supported by the Social Sciences and Humanities Research Council of Canada.

References

Abrams, M. (2012). Mechanistic probability. Synthese 187, 343–375.
Albert, D. (2000). Time and Chance. Cambridge: Harvard University Press.
Albert, D. Z. (1994). The foundations of quantum mechanics and the approach to thermodynamic equilibrium. The British Journal for the Philosophy of Science 45, 669–677.
Albrecht, A. and D. Phillips (2014). Origin of probabilities and their application to the multiverse. arXiv:1212.0953v2 [gr-qc].
Albrecht, A. and L. Sorbo (2004). Can the universe afford inflation? Physical Review D 70, 063528.
Bassi, A. and G. C. Ghirardi (2003). Dynamical reduction models. Physics Reports 379, 257–426.
Beisbart, C. and S. Hartmann (Eds.) (2011). Probabilities in Physics. Oxford: Oxford University Press.
Berkovitz, J., R. Frigg, and F. Kronz (2006). The ergodic hierarchy, randomness and Hamiltonian chaos. Studies in History and Philosophy of Modern Physics 37, 661–691.
Birkhoff, G. D. (1931a). Proof of a recurrence theorem for strongly transitive systems. Proceedings of the National Academy of Sciences 17, 650–655.
Birkhoff, G. D. (1931b). Proof of the ergodic theorem. Proceedings of the National Academy of Sciences 17, 656–660.
Boltzmann, L. (1872). Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Sitzungsberichte der Kaiserlichen Akademie der Wissenschaften. Mathematisch-Naturwissenschaftliche Classe 66, 275–370. English translation, Boltzmann (1966).
Boltzmann, L. (1877a). Bemerkungen über einige Probleme der mechanischen Wärmetheorie. Sitzungsberichte der Kaiserlichen Akademie der Wissenschaften. Mathematisch-Naturwissenschaftliche Classe 75, 62–100. Reprinted in Boltzmann (1909, 116–122).
Boltzmann, L. (1877b). Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung resp. den Sätzen über das Wärmegleichgewicht. Sitzungsberichte der Kaiserlichen Akademie der Wissenschaften. Mathematisch-Naturwissenschaftliche Classe 76, 373–435. Reprinted in Boltzmann (1909, 164–223).
Boltzmann, L. (1895). On certain questions of the theory of gases. Nature 51, 413–415.
Boltzmann, L. ([1896, 1898] 1995). Lectures on Gas Theory. New York: Dover Publications.
Boltzmann, L. (1909). Wissenschaftliche Abhandlungen. Leipzig: J. A. Barth.
Boltzmann, L. (1966). Further studies on the thermal equilibrium of gas molecules. In Brush (1966b, 88–175). English translation of Boltzmann (1872).
Brown, H. R., W. Myrvold, and J. Uffink (2009). Boltzmann's H-theorem, its discontents, and the birth of statistical mechanics. Studies in History and Philosophy of Modern Physics 40, 174–191.
Brush, S. G. (Ed.) (1966a). Kinetic Theory, Volume 1. The nature of gases and of heat. Oxford: Pergamon Press.
Brush, S. G. (Ed.) (1966b). Kinetic Theory, Volume 2. Irreversible Processes. Oxford: Pergamon Press.
Brush, S. G. (1976a). The Kind of Motion We Call Heat, Book 1. Amsterdam: North-Holland Publishing Company.
Brush, S. G. (1976b). The Kind of Motion We Call Heat, Book 2. Amsterdam: North-Holland Publishing Company.
Carroll, S. (2010). From Eternity to Here. New York: Dutton.
Clausius, R. (1857). Über die Art der Bewegung, welche wir Wärme nennen. Annalen der Physik 100, 353–80.
Clausius, R. ([1857] 1966). The nature of the motion which we call heat. In Brush (1966a, 111–134). Translation of Clausius (1857).
Ehrenfest, P. and T. Ehrenfest (1912). The Conceptual Foundations of the Statistical Approach in Mechanics. New York: Dover Publications.
Engel, E. M. (1992). A Road to Randomness in Physical Systems. Berlin: Springer-Verlag.
Frigg, R. (2014). Determinism and chance. This volume.
Garber, E., S. G. Brush, and C. W. F. Everitt (Eds.) (1995). Maxwell on Heat and Statistical Mechanics: On "Avoiding All Personal Enquiries" of Molecules. Bethlehem, Pa: Lehigh University Press.
Ghirardi, G. C. (2011). Collapse theories. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2011 ed.). http://plato.stanford.edu/archives/win2011/entries/qm-collapse/.
Gibbs, J. W. (1875). On the equilibrium of heterogeneous substances. Transactions of the Connecticut Academy of Arts and Sciences 3, 108–248, 343–524. Reprinted in Gibbs (1961, pp. 55–353).
Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundation of Thermodynamics. New York: Charles Scribner's Sons.
Gibbs, J. W. ([1906] 1961). The Scientific Papers of J. Willard Gibbs. New York: Dover Publications, Inc.
Goldstein, S. (2001). Boltzmann's approach to statistical mechanics. In J. Bricmont, D. Dürr, M. Galavotti, G. Ghirardi, F. Petruccione, and N. Zanghì (Eds.), Chance in Physics, Number 574 in Lecture Notes in Physics, pp. 39–54. Berlin: Springer.
Hacking, I. (1975). The Emergence of Probability. Cambridge: Cambridge University Press.
Hájek, A. (1997). 'Mises redux'-redux: Fifteen arguments against finite frequentism. Erkenntnis 45, 209–227.
Hájek, A. (2009). Fifteen arguments against hypothetical frequentism. Erkenntnis 70, 211–235.
Hopf, E. (1934). On causality, statistics, and probability. Journal of Mathematics and Physics 13, 51–102.
Hopf, E. (1936). Über die Bedeutung der willkürlichen Funktionen für die Wahrscheinlichkeitstheorie. Jahresbericht des Deutschen Mathematiker-Vereinigung 46, 179–195.
Jackson, E. A. (1968). Equilibrium Statistical Mechanics. New York: Dover Publications, Inc.
Jaynes, E. (1957a). Information theory and statistical mechanics, I. Physical Review 106, 620–630. Reprinted in Jaynes (1989, 7–16).
Jaynes, E. (1957b). Information theory and statistical mechanics, II. Physical Review 108, 171–190. Reprinted in Jaynes (1989, 19–37).
Jaynes, E. T. (1965). Gibbs vs Boltzmann entropies. American Journal of Physics 33, 391–398. Reprinted in Jaynes (1989, 79–88).
Jaynes, E. T. (1971). Violation of Boltzmann's H theorem in real gases. Physical Review A 4, 747–750.
Jaynes, E. T. (1989). Papers on Probability, Statistics, and Statistical Physics. Dordrecht: Kluwer Academic Publishers.
Jeffrey, R. (1992). Mises redux. In Probability and the Art of Judgment, pp. 192–202. Cambridge: Cambridge University Press.
Khinchin, A. I. (1949). Mathematical Foundations of Statistical Mechanics. New York: Dover Publications.
Knott, C. G. (1911). Life and Scientific Work of Peter Guthrie Tait. Cambridge: Cambridge University Press.
Krönig, A. (1856). Grundzüge einer Theorie der Gase. Annalen der Physik 175, 315–322.
Lebowitz, J. L. (1993). Boltzmann's entropy and time's arrow. Physics Today 46 (September), 33–38.
Lebowitz, J. L. (1999). Statistical mechanics: A selective review of two central issues. Reviews of Modern Physics 71, 346–357.
Loewer, B. (2001). Determinism and chance. Studies in History and Philosophy of Modern Physics 32, 609–620.
Malament, D. B. and S. L. Zabell (1980). Why Gibbs phase averages work - the role of ergodic theory. Philosophy of Science 47, 339–349.
Maxwell, J. C. (1867). On the dynamical theory of gases. Philosophical Transactions of the Royal Society 157, 49–88. Reprinted in Niven (1890, 26–78).
Maxwell, J. C. (1871). Theory of Heat. London: Longmans, Green, and Co.
Maxwell, J. C. (1878a). Diffusion. In Encyclopedia Britannica (Ninth ed.), Volume 7, pp. 214–221. Reprinted in Niven (1890, pp. 625–646).
Maxwell, J. C. (1878b). Tait's "Thermodynamics". Nature 17, 257–259, 278–280. Reprinted in Niven (1890, pp. 660–671).
Myrvold, W. C. (2011). Statistical mechanics and thermodynamics: A Maxwellian view. Studies in History and Philosophy of Modern Physics 42, 237–243.
Myrvold, W. C. (2012a). Deterministic laws and epistemic chances. In Y. Ben-Menahem and M. Hemmo (Eds.), Probability in Physics, pp. 73–85. Springer.
Myrvold, W. C. (2012b). Probabilities in statistical mechanics: What are they? http://philsci-archive.pitt.edu/9236/.
Niven, W. D. (Ed.) (1890). The Scientific Papers of James Clerk Maxwell, Volume Two. Cambridge: Cambridge University Press.
Poincaré, H. (1912). Calcul des probabilités. Paris: Gauthier-Villars.
Poisson, S.-D. (1837). Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, précédées des règles générales du calcul des probabilités. Paris: Bachelier, Imprimeur-Libraire.
Popper, K. R. (1957). The propensity interpretation of the calculus of probability, and the quantum theory. In S. Körner (Ed.), Observation and Interpretation: a symposium of philosophers and physicists, pp. 65–70. London: Butterworths.
Popper, K. R. (1959). The propensity interpretation of probability. The British Journal for the Philosophy of Science 37, 25–42.
Rosenthal, J. (2010). The natural-range conception of probability. In G. Ernst and A. Hüttemann (Eds.), Time, Chance and Reduction: Philosophical Aspects of Statistical Mechanics, pp. 71–91. Cambridge: Cambridge University Press.
Rosenthal, J. (2012). Probabilities as ratios of ranges in initial-state spaces. Journal of Logic, Language, and Information 21, 217–236.
Savage, L. J. (1973). Probability in science: A personalistic account. In P. Suppes (Ed.), Logic Methodology, and Philosophy of Science IV, pp. 417–428. Amsterdam: North-Holland.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell Systems Technical Journal 27, 379–423, 623–656.
Sklar, L. (1993). Physics and Chance. Cambridge University Press.
Strevens, M. (2003). Bigger than Chaos: Understanding Complexity through Probability. Cambridge, MA: Harvard University Press.
Strevens, M. (2011). Probability out of determinism. In Beisbart and Hartmann (2011, 339–364).
Thomson, W. (1874). Kinetic theory of the dissipation of energy. Nature 9, 441–444.
Tolman, R. C. (1938). The Principles of Statistical Mechanics. Oxford: Clarendon Press. Reprint, Dover Publications, 1979.
Uffink, J. (2007). Compendium of the foundations of statistical physics. In J. Butterfield and J. Earman (Eds.), Handbook of the Philosophy of Science: Philosophy of Physics, pp. 924–1074. Amsterdam: North-Holland.
Uffink, J. (2011). Subjective probability and statistical physics. In Beisbart and Hartmann (2011, 25–49).
von Kries, J. (1886). Die Principien Der Wahrscheinlichkeitsrechnung: Eine Logische Untersuchung. Freiburg: Mohr.
von Plato, J. (1983). The method of arbitrary functions. The British Journal for the Philosophy of Science 34, 37–47.
von Plato, J. (1994). Creating Modern Probability. Cambridge: Cambridge University Press.