1 Introduction

Thermodynamics and statistical mechanics coexist in a collaborative relationship within the envelope of thermal physics. In many presentations of the subject, particularly in undergraduate texts, it is heuristically advantageous to intermingle the macroscopic concepts of thermodynamics with the micro-picture provided by statistical mechanics. And it is, of course, self-evident that statistical mechanicsFootnote 1 needs the basic structure of thermodynamics with inter-theory connecting relationships defining the thermodynamic quantities like internal energy, temperature and entropy. On the other hand, there are some advantages, both aesthetic and mathematical, in producing an account of thermodynamics which makes no reference to the underlying microstructure of the system, as would seem to be one of the aims of (among others) the books of Giles [39] and Buchdahl [17] and the papers of Lieb and Yngvason.Footnote 2 For Buchdahl we have the first law implying the existence of the internal energy function U and Carathéodory’s [23] version of the second law yielding the entropy S and temperature T; for Lieb and Yngvason three sets of axioms accomplish the same task. This, together with an account of the nature of adiabatic processes (as described, for example, in [17, Chaps. 5, 6; 75, Sect. 2.1; 68, Sect. 2.1.1]), provides the basic framework into which the models of statistical mechanics are embedded.

This raises the question of how statistical mechanics and thermodynamics relate to each other. Attempts to answer this question run up against a problem. The neat labels ‘statistical mechanics’ and ‘thermodynamics’ mask the fact that neither theory is a monolithic bloc. Indeed, each has a complicated internal structure with several layers of different theoretical postulates and assumptions. So the question of how statistical mechanics and thermodynamics relate ought to be interpreted as the more complex question of (a) what the internal structure of each theory is and of (b) how the various parts of each theory relate to the various other parts of the other theory. The complexity of the internal structures of both theories, as well as the intricacy of their interrelations, seems to have been somewhat under-appreciated in the philosophical literature on the subject, and so the first aim of this paper is to present an in-depth analysis of the anatomy of both theories and the connections between their parts.Footnote 3

Fig. 1 Schematic representation of the relationship between thermodynamics and statistical mechanics

Figure 1 provides a schematic advance summary of the analysis that we develop in this paper. It sees statistical mechanics and thermodynamics as parallel developments, each decomposed into separate levels representing the stages of theory-based development in which features are added to the system. The cross-interactions between the levels in the two columns contain interventions integral to this development. On the left are the levels for thermodynamics, as described in detail in Sect. 2. These levels are related to each other by adopting special assumptions, beginning at the bottom with basic thermodynamic theory (labelled \(\textsf {TD1}\)). Adding the extensivity assumption to this theory takes us to the next level, the density representation of thermodynamics (labelled \(\textsf {TD2}\)). Augmenting \(\textsf {TD2}\) with the notion of phase transitions and critical phenomena (PTCP) gives thermodynamics with PTCP (labelled \(\textsf {TD3}\)). Finally, supplementing \(\textsf {TD3}\) with a version of the Kadanoff scaling hypothesis leads us to thermodynamics with scaling theory (labelled \(\textsf {TD4}\)).

The parallel development for statistical mechanics is represented on the right of Fig. 1, as described in detail in Sect. 3. The picture here is a little more complicated, involving, as we explain in our discussion, three different paths. At the bottom is the fundamental theory, which we here take to be Gibbsian statistical mechanics (labelled \(\textsf {SM1}\)).Footnote 4 Assuming that the systems to which the theory is applied are large leads us to the next layer, large statistical mechanical systems (labelled \(\textsf {SM2}\)). This marks a branching point in the structure of the theory: three different additions can be made to \(\textsf {SM2}\), resulting in three different branches. Adding the thermodynamic limit to \(\textsf {SM2}\) leads to the statistical mechanics of infinitely large systems (labelled \(\textsf {SM3}\)). Adding renormalization group techniques to \(\textsf {SM2}\) leads to the renormalization group approach to statistical mechanics (labelled \(\textsf {SM4}\)). Finally, adding the analysis of phase transitions for finite systemsFootnote 5 to \(\textsf {SM2}\) leads to the statistical mechanics of finite-system phase transitions (labelled \(\textsf {SM5}\)).

It is our aim in this work to keep the developments of thermodynamics and statistical mechanics as separate as possible, in order to make visible the internal structure of each separate theory. However, as indicated above, on close examination it becomes evident that there are in fact some ‘messages’, both implicit and explicit, sent from statistical mechanics (FSM), that is to say from the microstructure, to thermodynamics, which provides the macrostructure. These are spelled out in FSM–1, FSM–2, FSM–3, FSM–4. In the other direction the connecting relationships from thermodynamics (FTD), labelled FTD–1, FTD–2, FTD–3, identify quantities in statistical mechanics with thermodynamic variables. As we shall see FSM–1 also plays a role in the connecting process and can be seen as in dialogue with FTD–3. The remaining interventions FSM–2, FSM–3, FSM–4, can be viewed as an aid to the clarification of a number of important issues. We discuss these links between elements of both theories in the appropriate places in Sects. 2 and 3.

Much of the recent interest in the relationship between thermodynamics and statistical mechanics has concentrated on PTCP. It is the second aim of this paper to revisit the issue of PTCP in the light of our analysis of the internal structure of the two theories and their interrelations. Doing so will lead us to some unexpected, and we think important, conclusions.

In the modern theory of critical phenomena, dating from the middle of the 1960s,Footnote 6 critical exponents, which classify the type of singular behaviour in the approach to a critical region, play an important role. In our development of thermodynamics in Sect. 2 scaling theory is the final destination with scaling laws relating these critical exponents. However, as already indicated and as described below, thermodynamics is a structured shell into which particular models are embedded, either by the assumption of a phenomenological form for the entropy function or from statistical mechanics. In the absence of such an embedding it is not possible to calculate values for critical exponents, nor to discuss universality. Universality is the idea [54] that all critical situationsFootnote 7 can be divided into universality classes, characterized by the values of their critical exponents and differentiated by a small number of properties of which the most important are the (physical) dimension d of the system and the symmetry group of the order parameter. The first, but not the second, of these plays an important role in our discussions,Footnote 8 in particular in the case of the Ising model, which we shall use as an illustrative example throughout this work. This, the best-known and most thoroughly investigated model in the statistical mechanics of lattice systems, is briefly described in Appendix 2. With the list of critical exponents given there for \(d=\,2\), \(d=\,3\) and \(d\ge 4\), it provides an example of the dependence of these exponents and hence the universality class on the dimension of the system. The dimension d is also of importance in our discussion of scaling theory in Sect. 2.4, of finite-size scaling in Sect. 3.4.2 and of phenomenological renormalization in Sect. 3.4.3(c).

These observations concerning universality classes, together with the inter-theory connecting relationships FSM–2, FSM–3, FSM–4, provide the impetus to investigate and clarify a number of important issues relating to PTCP. These are (not necessarily in the order in which they arise in the discussion):

  1. (i)

    Are infinite systems really necessary in thermodynamics or statistical mechanics and:

    1. (a)

      If so, what for?

    2. (b)

      If they are, is this solely because extensivity is not exactly true in most cases in statistical mechanics?

    3. (c)

      Is the thermodynamic limit irrelevant to thermodynamics or has it already been implicitly applied?Footnote 9

    4. (d)

      Is the thermodynamic limit in statistical mechanics necessary for the implementation of the procedures of the renormalization group?

    5. (e)

      Is there a meaningful way to represent PTCP in finite systems?

  2. (ii)

    Given that, in thermodynamics, critical behaviour involves discontinuities in densities and singularities in response functions, is this necessarily still the case in statistical mechanics?

  3. (iii)

    Are the ideas of enrichment and substantiation helpful in describing the relationship between thermodynamics and statistical mechanics?

  4. (iv)

    Where do reduction and emergence feature in the accounts of the relationship between thermodynamics and statistical mechanics?

As indicated in the title of this work and by the progression between levels in the statistical mechanical column in Fig. 1, we will discuss these issues with a special focus on large systems and infinite systems. In particular we shall address the question as to where realism is to be found: in the study of large systems, because real systems are finite but large (in the sense that they typically have \(\sim 10^{23}\) constituents), or in the thermodynamic limit of an infinite system, because singular behaviour (in susceptibilities and compressibilities) is believed to be experimentally observed, and in theories this arises only in the thermodynamic limit. This broad categorization of large systems is refined in Sect. 4. The process of taking the thermodynamic limit is the determination of the asymptotic properties of a system as it becomes infinitely large. In general this will involve taking d limits, one in each of the linear dimensions of the system, and such a d-dimensionally infinite system, which where appropriate we call a fully-infinite system, is implicitly the object of investigation by scaling theory in Sect. 2.4.Footnote 10 However, relevant to our discussions is the case of a partially-infinite system, where the limit is taken in only \({\mathfrak {d}}<d\) dimensions. Here it is \({\mathfrak {d}}\) rather than d which should count for the critical behaviour as the dimension of the system. The idea underlying our approach to PTCP is that reality lies with fully-finite systems (\({\mathfrak {d}}=0\)) and that the judgment as to whether the large system will show behaviour which in practical terms is indistinguishable from singular behaviour is based on comparing the behaviour of systems of ever increasing size to see whether their properties indicate convergence towards those of the infinite system. In principle, as described in Sect. 4, this limiting process is in all d dimensions. In practice, as we see in our discussion of \(d=2\) transfer matrix calculations in Sect. 3.3, it also has relevance to the case where one limit has already been taken and increasing size is in the remaining dimension.

Thus, as we have indicated, Sects. 2 and 3 trace the steps in our developments of thermodynamics and statistical mechanics with the inter-theory connections between them; with Sect. 3.5 addressing different proposed resolutions to the contradiction between the finiteness of real systems and the perceived necessity of phase transitions being portrayed as singularities in infinite systems. Section 3.6 discusses the proposal of Mainwood [80] for representing the occurrence of phase transitions in finite systems. Using the account of finite-size scaling in Sect. 3.4.2 we propose in Sect. 4 our alternative quantitative account for phase transitions in finite large systems. Section 5 contains some after-thoughts on enrichment, substantiation, reduction and emergence and our conclusions are in Sect. 6.

2 From Classical Thermodynamics to Scaling Theory

Accounts of thermodynamics range from those designed for the practical needs of engineers to those which aim for a degree of mathematical rigour. However, all share some common features and assumptions, some of which are at variance with the insights gained in statistical mechanics. As indicated above, we flag these differences in the form of messages from statistical mechanics (FSM–1 to FSM–4).

2.1 The Structure of Thermodynamics

All accounts of thermodynamics contain (in some form or another) the first law, which establishes the existence of the internal energy function U and the second law which establishes the existence of the entropy S and temperature T. Details are not necessary for the present discussion. The only thing we need to carry forward is the fundamental thermodynamic differential form. Given a thermodynamic system with:

  1. (i)

    One mechanical extensive/intensiveFootnote 11 conjugate variable pair \((X,\xi )\), where X could stand for the volume V or the magnetic moment \(\mathcal {M}\), with conjugate intensive variable \(\xi\) which in the case of V is the (negative) pressure \(-P\) and in the case of \(\mathcal {M}\) is the magnetic field \(\mathcal {H}\);

  2. (ii)

    A (dimensionless) extensive variable N which counts the number of units of mass in the system with a conjugate (intensive) energy \(\mu\), called the chemical potential, carried by each unit of mass;Footnote 12

for a differential change in the space \(\varXi _0\) of the variables \((U,X,N)\) the differential change in the entropy S satisfiesFootnote 13

$$\begin{aligned} \text {d}S=\zeta _1\text {d}U -\zeta _2\text {d}X -\zeta _3\text {d}N, \end{aligned}$$
(1)

where

$$\begin{aligned} \zeta _1:=\,\varepsilon /T,\quad \zeta _2:=\,\xi /T,\quad \zeta _3:=\,\mu /T, \end{aligned}$$
(2)

are couplings. It is clear that the couplings are intensive and dimensionless. That the variables \((U,X,N)\in \varXi _0\) appear as differentials on the right of (1) should be understood as signifying that they are independent variables. This means that the system is thermally, mechanically and chemically isolated with U, X and N fixed by an experimenter. Legendre transformations can be used to replace U and X successively as independent variables by \(\zeta _1\) and \(\zeta _2\). Firstly, with Helmholtz free energy

$$\begin{aligned} \varPhi _1:=\,\zeta _1 U-S, \end{aligned}$$
(3)

we have

$$\begin{aligned} \text {d}\, \varPhi _1=U\text {d}\zeta _1+\zeta _2 \text {d}X +\zeta _3 \text {d}N, \end{aligned}$$
(4)

so that the independent variables are \((\zeta _1,X,N)\in \varXi _1\). The system is in contact with a source of thermal energy at temperature \(T=\varepsilon /\zeta _1\). Secondly, with Gibbs free energy

$$\begin{aligned} \varPhi _2:=\,\zeta _1 U-\zeta _2 X-S, \end{aligned}$$
(5)

we have

$$\begin{aligned} \text {d}\, \varPhi _2=U\text {d}\zeta _1- X\text {d}\zeta _2 +\zeta _3 \text {d}N, \end{aligned}$$
(6)

so that the independent variables are \((\zeta _1,\zeta _2,N)\in \varXi _2\). The system is now, through \(\zeta _2\), also in mechanical contact with its environment, be it a fluid system subject to a pressure P or a magnetic system subject to a field \(\mathcal {H}\). The couplings \(\zeta _1\) and \(\zeta _2\) are referred to as the thermal and field (or mechanical) couplings respectively.

It is tempting to suppose that this process could be taken one step further, interchanging the roles of N and \(\zeta _3\). However, it is not difficult to see that the Legendre transformation implementing this would involve a free energy \(\varPhi _3\) which is constant and can thus without loss of generality be taken to be identically zero. A viable form of thermodynamics must retain (at least) one extensive variable (here we choose that to be N, although we could have used X) which registers the size of the system.

Observing that in thermodynamics the uncontrolled variables remain constant when the corresponding controlled variables are held constant, this is now the point for the first message from statistical mechanics:

FSM–1 Unlike in thermodynamics, extensive variables in statistical mechanics that are uncontrolled quantities fluctuate even when the corresponding controlled variables are kept constant. (In \(\varXi _1\) the energy corresponding to the internal energy U fluctuates, and in \(\varXi _2\) the variable corresponding to X, be it the volume or the magnetic moment, fluctuates. This is borne out by experiment [79].) The variances of the fluctuations are given in terms of response functions and are \({\mathcal {O}}(N)\). This means that standard deviations of fluctuations are \({\mathcal {O}}(\sqrt{N})\) and become negligibly small compared to \({\mathcal {O}}(N)\) variables only in the thermodynamic limit \(N\rightarrow \infty\).
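The scaling stated in FSM–1 can be illustrated with a minimal sketch. It assumes, purely for illustration, a system of N independent two-state units, each contributing 1 with probability p; this toy model and the function name are hypothetical and are not taken from the paper. The total then has mean \(Np\) and variance \(Np(1-p)\), both \({\mathcal {O}}(N)\), so that the ratio of standard deviation to mean falls off as \(1/\sqrt{N}\).

```python
import math

def relative_fluctuation(n_units: int, p: float = 0.5) -> float:
    """Standard deviation divided by mean for a sum of n_units independent
    two-state variables (a toy assumption, not a model from the paper)."""
    mean = n_units * p                    # O(N)
    variance = n_units * p * (1.0 - p)    # O(N)
    return math.sqrt(variance) / mean     # O(1/sqrt(N))

# The ratio shrinks with system size but vanishes only as N -> infinity.
for n in (10**2, 10**4, 10**6, 10**23):
    print(f"N = {n:.0e}: std/mean = {relative_fluctuation(n):.3e}")
```

For a macroscopic size of order \(10^{23}\) units the ratio in this toy model is of order \(10^{-12}\): extremely small, but not zero.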

For fixed N let \((U,X,N){\mathop {\rightarrow }\limits ^{{{\tiny {\text{ A }}}}\,\,}}(U',X',N)\) denote an adiabatic process. It can be shown [69], from Carathéodory’s first version of the second law [23],Footnote 14 that thermodynamic systems are of four types according to whether the adiabatic process gives \(U\le U'\) or \(U\ge U'\) and \(S\le S'\) or \(S\ge S'\) corresponding, respectively, to the possibilities of the temperature and heat capacity being positive or negative.Footnote 15 Standard accounts of thermodynamics concentrate solely on the case where both internal energy and entropy increase, which is the situation where both temperature and heat capacity are positive. We shall restrict our attention to that case.Footnote 16

2.2 Extensivity and the Thermodynamic Limit

Departing from the formulation TD1 of the structure of thermodynamics we ascend the left-hand column in Fig. 1, where it is now useful to consider the embedding of particular models. In this context they are of two types, ones which posit a phenomenological equation of state and ones derived from some microstructure according to the procedures of statistical mechanics. Most examples in the first category (the perfect gas equation, the Weiss-field equation for ferromagnetism and the van der Waals equationFootnote 17) introduce the models in terms of an equation relating the mechanical variable pair \((X,\xi )\) and N to the temperature. However, it is more consonant with our approach to begin with a defining relationship for the entropy surface \(S(U,X,N)\), from which T, \(\xi\) and \(\mu\), or equivalently the couplings \(\zeta _1\), \(\zeta _2\) and \(\zeta _3\), can be calculated using (1). Thus:

  • For the perfect gas

    $$\begin{aligned} S(U,V,N):=\,Nc+{\textstyle \frac{3}{2}}N\ln \Big (\frac{U}{N}\Big )+N\ln \Big (\frac{V}{N}\Big ), \end{aligned}$$
    (7)

    for some constant c,Footnote 18 givingFootnote 19

    $$\begin{aligned} T=\frac{2U\varepsilon }{3N},\qquad P=\frac{NT}{V}. \end{aligned}$$
    (8)
  • For the van der Waals fluid

    $$\begin{aligned} S(U,V,N):=\,Nc+{\textstyle \frac{3}{2}}N\ln \Big (\frac{U}{N}+\frac{ N}{V}\Big )+N\ln \Big (\frac{V}{N}-1\Big ), \end{aligned}$$
    (9)

    giving

    $$\begin{aligned} T= \frac{2}{3}\varepsilon \Big (\frac{U}{N}+\frac{N}{V}\Big ),\quad P= \frac{NT }{V- N}-\frac{\varepsilon N^2}{V^2}. \end{aligned}$$
    (10)

The entropy (7) is a concave function of \((U,V)\), but for (9) it is necessary to take the concave envelope. This is, of course, equivalent in the case of the van der Waals [122] fluid and other phenomenological equations of state to the application of Maxwell’s equal areas rule [81], which avoids the inclusion of unstable states and leads to a first-order gas-liquid phase transition (see Sect. 2.3).
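The passage from the entropies (7) and (9) to the equations of state (8) and (10) can be checked numerically. The following is a minimal sketch under stated assumptions: the energy unit \(\varepsilon\) is set to 1, the constant c to 0, the sample state \((U,V,N)\) is arbitrary, and the derivatives in (1) are approximated by central finite differences. With \(X=V\) and \(\xi =-P\), (1) gives \(\partial S/\partial U=\varepsilon /T\) and \(\partial S/\partial V=P/T\).

```python
import math

EPS = 1.0  # the energy unit epsilon of Eq. (2); the value 1 is an assumption

def s_perfect(U, V, N, c=0.0):
    """Perfect-gas entropy, Eq. (7)."""
    return N * c + 1.5 * N * math.log(U / N) + N * math.log(V / N)

def s_vdw(U, V, N, c=0.0):
    """van der Waals entropy, Eq. (9), before taking the concave envelope."""
    return N * c + 1.5 * N * math.log(U / N + N / V) + N * math.log(V / N - 1.0)

def temperature_and_pressure(S, U, V, N, h=1e-6):
    """T and P from Eq. (1) by central differences:
    dS/dU = EPS/T and dS/dV = P/T (since xi = -P when X = V)."""
    dS_dU = (S(U + h, V, N) - S(U - h, V, N)) / (2 * h)
    dS_dV = (S(U, V + h, N) - S(U, V - h, N)) / (2 * h)
    T = EPS / dS_dU
    return T, T * dS_dV

U, V, N = 3.0, 5.0, 2.0   # arbitrary sample state with V/N > 1
T_pg, P_pg = temperature_and_pressure(s_perfect, U, V, N)
T_vdw, P_vdw = temperature_and_pressure(s_vdw, U, V, N)

# Closed forms quoted in Eqs. (8) and (10)
T_pg_exact = 2 * U * EPS / (3 * N)
P_pg_exact = N * T_pg_exact / V
T_vdw_exact = (2 / 3) * EPS * (U / N + N / V)
P_vdw_exact = N * T_vdw_exact / (V - N) - EPS * N**2 / V**2

print(T_pg - T_pg_exact, P_pg - P_pg_exact)
print(T_vdw - T_vdw_exact, P_vdw - P_vdw_exact)
```

The printed differences should be at the level of the finite-difference error. The sketch does not construct the concave envelope of (9).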

It will be noted that, for both the perfect gas and van der Waals fluid with densities \(u:=\,U/N\) and \(v:=\,V/N\), there exists an entropy density s satisfying

$$\begin{aligned} s:=\,\frac{S(u N,vN,N)}{N}= s(u,v), \end{aligned}$$
(11)

for all \(N>0\), which avoids any reference to the size N of the system. But, of course, these are rather special models and the question arises as to whether entropy, in general, when X replaces V and \(x:=\,X/N\) replaces v, satisfies

$$\begin{aligned} s:=\,\frac{S(u N,{x} N,N)}{N}= s(u,{x}),\qquad \forall \quad N>0. \end{aligned}$$
(12)

For this question the following result is important:

Theorem 1

Equation (12) is true iff

$$\begin{aligned} S(\lambda U,\lambda X,\lambda N)=\lambda S(U,X,N),\qquad \forall \quad \lambda >0, \end{aligned}$$
(13)

is true.

Proof

That (12) follows from (13) is easily seen by taking \(\lambda =\,1/N\) and defining \(s(u,x):=\,S(u,x,1)\).

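In the reverse direction, this last relationship \(s(u,x)=S(u,x,1)\) in fact follows from (12) by setting \(N=1\). Then from (12) \(S(U,X,N)= NS(U/N,X/N,1)\) for all \(N>0\), and applying this with \((U,X,N)\) replaced by \((\lambda U,\lambda X,\lambda N)\) gives \(S(\lambda U,\lambda X,\lambda N)=\lambda N S(U/N,X/N,1)=\lambda S(U,X,N)\), which is (13). \(\square\)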

Equation (13) is the condition that S is an extensive function and it is easily shown from (3) and (5) that the free energies \(\varPhi _1\) and \(\varPhi _2\) are extensive functions if and only if the entropy is an extensive function. But, as pointed out by Menon and Callender [83, Sect. 2] and as we show in Sect. 3.3,

FSM–2 The extensivity of entropy and of free energies assumed in thermodynamics is not exactly true for all systems in statistical mechanics, but is approximately true for large systems.

For entropy the thermodynamic limit in statistical mechanics, assuming it exists,Footnote 20 is given by

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{S(u N,{x} N,N)}{N}= s(u,{x}). \end{aligned}$$
(14)

But for thermodynamics the corresponding formula is (12), without the need for the limiting process. The thermodynamic limit in thermodynamics can thus be regarded as unnecessary or trivially true.
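The contrast between (12) and (14) can be made concrete with a toy entropy. The sketch below assumes, purely for illustration, that a sub-extensive correction \(-\tfrac{1}{2}\ln N\) is added to the perfect-gas entropy (7); this correction term is hypothetical and is not taken from the paper or from any particular statistical mechanical model. For such an entropy S/N depends on N, and the size-free density s(u, v) is recovered only in the limit.

```python
import math

def entropy_density_pg(u, v, c=0.0):
    """Perfect-gas entropy density s(u, v) obtained from Eq. (7)."""
    return c + 1.5 * math.log(u) + math.log(v)

def toy_entropy(U, V, N, c=0.0):
    """Hypothetical non-extensive entropy: Eq. (7) plus an assumed
    sub-extensive correction -0.5*ln(N)."""
    return N * entropy_density_pg(U / N, V / N, c) - 0.5 * math.log(N)

u, v = 1.5, 2.0                      # arbitrary fixed densities
s_limit = entropy_density_pg(u, v)   # right-hand side of Eq. (14)

for N in (10, 10**3, 10**6):
    s_N = toy_entropy(u * N, v * N, N) / N
    # The deviation equals -0.5*ln(N)/N, which tends to zero as N grows.
    print(N, s_N - s_limit)
```

In this toy case (12) fails for every finite N, while (14) holds; the deviation is already below \(10^{-5}\) at \(N=10^{6}\).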

Differentiating (13) with respect to \(\lambda\), and substituting from (1) gives

$$\begin{aligned} S=\zeta _1 U -\zeta _2 X -\zeta _3 N, \end{aligned}$$
(15)

when \(\lambda\) is put equal to 1. From (1) and (15),

$$\begin{aligned} u \text {d}\zeta _1 -{x}\text {d}\zeta _2 -\text {d}\zeta _3 = 0\, , \end{aligned}$$
(16)

which is a version of the Gibbs–Duhem relationship. In terms of densities (15) becomes

$$\begin{aligned} s=\zeta _1 u -\zeta _2 {x} -\zeta _3, \end{aligned}$$
(17)

and substituting \(S=Ns\), \(U=Nu\) and \(X=Nx\) into (1)

$$\begin{aligned} \text {d}s= & {} \zeta _1\text {d}u-\zeta _2\text {d}{x}-(s-\zeta _1 u+\zeta _2 {x}+\zeta _3)\text {d}N/N\nonumber \\= & {} \zeta _1\text {d}u-\zeta _2\text {d}{x}. \end{aligned}$$
(18)

Then, for free-energy densities \(\phi _1:=\,\varPhi _1/N\) and \(\phi _2:=\,\varPhi _2/N\),

$$\begin{aligned} \phi _1= & {} \zeta _1 u-s=\zeta _2 {x} +\zeta _3,\quad \text {d}\phi _1=u\text {d}\zeta _1+\zeta _2\text {d}{x}, \end{aligned}$$
(19)
$$\begin{aligned} \phi _2= & {} \zeta _1 u-\zeta _2{x}-s=\zeta _3,\quad \text {d}\phi _2=u\text {d}\zeta _1-{x}\text {d}\zeta _2. \end{aligned}$$
(20)

These are the fundamental size-free thermodynamic relationships in terms of density variables and density functions. They are exact in thermodynamics but only approximately true, and then only for large systems, in statistical mechanics. The question of large systems and the thermodynamic limit in statistical mechanics is treated in Sects. 3.3, 3.5 and 4.
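As a numerical illustration of (15), the couplings of the perfect gas can be computed from (7) by finite differences and inserted into the Euler-type relation \(S=\zeta _1U-\zeta _2X-\zeta _3N\). This is a minimal sketch: the value of the constant c and the sample state are arbitrary assumptions, and X stands for the volume V.

```python
import math

def S(U, X, N, c=0.7):
    """Perfect-gas entropy, Eq. (7), with X standing for the volume V."""
    return N * c + 1.5 * N * math.log(U / N) + N * math.log(X / N)

def couplings(U, X, N, h=1e-6):
    """zeta_1, zeta_2, zeta_3 read off from Eq. (1) by central differences:
    dS = zeta_1 dU - zeta_2 dX - zeta_3 dN."""
    z1 = (S(U + h, X, N) - S(U - h, X, N)) / (2 * h)
    z2 = -(S(U, X + h, N) - S(U, X - h, N)) / (2 * h)
    z3 = -(S(U, X, N + h) - S(U, X, N - h)) / (2 * h)
    return z1, z2, z3

U, X, N = 3.0, 5.0, 2.0            # arbitrary sample state
z1, z2, z3 = couplings(U, X, N)

# Eq. (15): the residual should vanish up to finite-difference error.
euler_residual = S(U, X, N) - (z1 * U - z2 * X - z3 * N)

# The couplings are intensive: scaling (U, X, N) by 4 leaves them unchanged.
z_scaled = couplings(4 * U, 4 * X, 4 * N)

print(euler_residual)
print([a - b for a, b in zip((z1, z2, z3), z_scaled)])
```

The vanishing residual is also the reason why a further Legendre transformation exchanging N for \(\zeta _3\) yields nothing new for an extensive entropy, as noted in Sect. 2.1.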

2.3 Thermodynamics with PTCP

Having arrived at a formulation of thermodynamics in terms of densities and couplings, we note that the modern theory of PTCP is largely concerned with an investigation and classification of the singular properties of systems (see e.g., [18]), specifically the singularities which could occur on the hypersurface of the entropy density, or the appropriate free-energy density, which defines the state of the system. However, we should be forewarned that the account of statistical mechanics in Sect. 3 concludes that:

FSM–3 The association of PTCP with singularities in the entropy and free-energy densities which is made in thermodynamics can be made in statistical mechanics only for infinite systems.

The association of PTCP with singularities in both TD3 and SM3 leads to a tendency for them to be mistakenly conflated. (We shall discuss this in more detail in relation to limit reduction in Sect. 5.1).

We now consider three thermodynamic spaces, \({\widetilde{\varXi }}_0\), \({\widetilde{\varXi }}_1\) and \({\widetilde{\varXi }}_2\), which correspond respectively to the spaces \(\varXi _0\), \(\varXi _1\) and \(\varXi _2\) defined in Sect. 2.1 except that now densities replace extensive variables. In reverse order, since this is more heuristically transparent:

Fig. 2 A first-order transition showing as a discontinuity of slope in an isothermal section (\(\zeta _1=\zeta _1^\star\)) of \(\phi _2=\zeta _3\) plotted against \(\zeta _2\)

Fig. 3 A first-order transition showing as the linear section \({\mathcal {C}}^\star\) in an isothermal section of the \(\phi _1\) surface

Fig. 4 A first-order transition showing as a horizontal part \({\mathcal {C}}^\star\) of an isotherm of \(\zeta _2\) plotted against x together with the isotherm through the critical point \({\textsf {C}}\). As \(\zeta _1\) varies the ends of \({\mathcal {C}}^\star\) trace the boundary of the coexistence region (shaded)

Fig. 5 A critical point \((\zeta _{1\text {c}} , \zeta _{2\text {c}} )\) in \({\widetilde{\varXi }}_2\). The first-order transition (coexistence curve) \(\zeta _2 = \zeta _2^\star (\zeta _1 )\) is represented by a broken line and the critical isochore, along which the density x takes its critical value \(x=x_\text {c}\), by a dotted line. The directions of the axes of the two relevant scaling fields at the critical point, as described in Sect. 2.4, are shown

  1. (i)

    In the space \({\widetilde{\varXi }}_2\) of the vector \({\pmb {\zeta }}:=\,(\zeta _1,\zeta _2)\) the free-energy density \(\phi _2(\zeta _1,\zeta _2)\) is a surface with normal in the direction \((1,-u,{x})\) and phases are separated by lines of transitions. The simplest example is a line \({\mathcal {L}}^\star\) across which there is a discontinuity of the gradient \(\nabla \phi _2=(u,-{x})\); an isothermal section (\(\zeta _1\) constant) of this surface is shown in Fig. 2. Let \({\pmb {\zeta }}^\star :=\,(\zeta _1^\star ,\zeta _2^\star )\) be a point on \({\mathcal {L}}^\star\), with \(\zeta _3^\star =\phi _2({\pmb {\zeta }}^\star )\). The line \({\mathcal {L}}^\star\) can be regarded as representing the coexistence of two phases with different densities. As \({\pmb {\zeta }}\) is varied across \({\mathcal {L}}^\star\) through \({\pmb {\zeta }}^\star\) there is a first-order phase transition where the densities change discontinuously. In the case of both fluid and magnetic systems a first-order transition will involve a discontinuity of the internal energy density u. In a fluid system there will be a discontinuity of the (physical) density as the system changes between a liquid and a gas. In a magnetic system there will be a discontinuity in the magnetization (or equivalently the magnetization density) as shown for the Ising model in Fig. 9.

  2. (ii)

    In the space \({\widetilde{\varXi }}_1\) of the vector \((\zeta _1,{x})\) the free-energy density \(\phi _1(\zeta _1,{x})\) is a surface convex with respect to x with normal in the direction \((1,-u,-\zeta _2)\), as shown by an isothermal (\(\zeta _1=\zeta _1^\star\)) section in Fig. 3. A first-order transition corresponds to the part of the isotherm, labelled \({\mathcal {C}}^\star\), which is linear with respect to x. At the ends \((\zeta _1^\star ,x^{(\star +)})\) and \((\zeta _1^\star ,x^{(\star -)})\) of \({\mathcal {C}}^\star\) all three couplings \(\zeta _1\), \(\zeta _2\) and \(\zeta _3\) have the same values, as is otherwise shown in Fig. 2. Typically, as \(\zeta _1^\star\) varies along \({\mathcal {L}}^\star\) the ends of \({\mathcal {C}}^\star\) converge to a critical point where the system exhibits a second-order transition. There the densities are continuous but one or more of the response functions (that is to say the curvature components of the free-energy surface) is singular.Footnote 21 A projection of the linear coexistence region in Fig. 3 is shown in Fig. 4, and the situation where the corresponding transition line \({\mathcal {L}}^\star\) terminates is shown in Fig. 5.

  3. (iii)

    The space \({\widetilde{\varXi }}_0\) of the vector \((u,{x})\), in which the entropy density \(s(u,{x})\) is a concave surface, is similar to that for \(\phi _1(\zeta _1,{x})\),Footnote 22 except that now the linear generator \({\mathcal {C}}^\star\) of the coexistence region has endpoints \((u^{(\star +)},x^{(\star +)})\) and \((u^{(\star -)},x^{(\star -)})\). As \({\pmb {\zeta }}^\star\) varies along \({\mathcal {L}}^\star\), \({\mathcal {C}}^\star\) traces out the boundary of a ruledFootnote 23 region on the entropy surface with \({\mathcal {C}}^\star\) converging in one direction to the critical point described in (ii).

Critical exponents at the critical point are associated with the curvature of the coexistence curve in \({\widetilde{\varXi }}_1\) and the coexistence line in \({\widetilde{\varXi }}_2\), and the asymptotic singular behaviour of the (per particle) heat capacities \(c_{x}\) and \(c_\xi\) at constant density and field respectively and a response function \(\varphi _{{T}}\), which in a fluid corresponds to the compressibility and in a magnet to the susceptibility. It will also be useful to include the coefficient of thermal expansion \(\alpha _\xi\). These are defined together with their critical exponents in Appendix 1. The heat capacities \(c_{x}\) and \(c_\xi\) are normally positive and from (105) it follows that, if \(\varphi _{{T}}>0\), then \(c_\xi\) dominates both \(c_{x}\) and \(\alpha ^2_\xi /\varphi _{{T}}\) as \({T}\rightarrow {T}_{\text {c}}\). For the critical exponents \(\upsigma\) and \(\upsigma '\) characterizing the singularity of \(c_\xi\) on approach to the critical point from above and below \(T_\text {c}\), and the analogously defined critical exponents \(\upalpha\) and \(\upalpha '\) characterizing the singularity of \(c_x\), and \(\upgamma\) and \(\upgamma '\) characterizing the singularity of \(\varphi _{{T}}\), as well as \(\upbeta\) characterizing the curvature of the coexistence curve, this means that

$$\begin{aligned} \upsigma \ge \upalpha \, ,\quad \upsigma {}' \ge \upalpha ',\qquad \upsigma '+2\upbeta +\upgamma '\ge 2. \end{aligned}$$
(21)

The condition \(\varphi _{{T}}>0\) is true for a magnetic system and in this case the third inequality in (21) was first established by Rushbrooke [113]. The stronger condition

$$\begin{aligned} \upalpha '+2\upbeta +\upgamma '\ge 2, \end{aligned}$$
(22)

was obtained by Griffiths [42] for both magnetic and fluid systems using the convexity properties of the free energy. In fact it is a consequence of scaling theory (Sect. 2.4) that, for systems with a special symmetry which is present in magnetic systems where, as for the Ising model in Appendix 2, the coexistence curve coincides with the zero field axis, \(\upsigma '=\upalpha '\) and inequalities (21) and (22) become identical. Otherwise \(\upsigma '=\upgamma '\). Griffiths [42] also derived a number of other inequalities. In particular

$$\begin{aligned} \upgamma '\ge \upbeta (\updelta -1), \end{aligned}$$
(23)

where \(\updelta\), given by (109), is the exponent characterizing the (critical) equation of state.
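The inequalities (22) and (23) can be tested against known exponent values. The sketch below uses the standard exact exponents of the two-dimensional Ising model and the classical (mean-field) values; these numbers are quoted here as an assumption from the general literature rather than read from this section (the paper lists its values in Appendix 2), and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction as F

# Standard exponent values, quoted as an assumption (see lead-in):
# alpha' (heat capacity), beta (coexistence curve), gamma' (response
# function) and delta (critical equation of state).
exponents = {
    "Ising d=2": {"alpha_p": F(0), "beta": F(1, 8), "gamma_p": F(7, 4), "delta": F(15)},
    "classical": {"alpha_p": F(0), "beta": F(1, 2), "gamma_p": F(1), "delta": F(3)},
}

def check(e):
    """Return (Eq. (22) holds, Eq. (23) holds, lhs of (22), rhs of (23))."""
    lhs22 = e["alpha_p"] + 2 * e["beta"] + e["gamma_p"]
    rhs23 = e["beta"] * (e["delta"] - 1)
    return lhs22 >= 2, e["gamma_p"] >= rhs23, lhs22, rhs23

for name, e in exponents.items():
    ok22, ok23, lhs22, rhs23 = check(e)
    print(f"{name}: (22) {ok22} with lhs = {lhs22}; (23) {ok23} with rhs = {rhs23}")
```

For both sets of values the two inequalities are satisfied as equalities, which is what scaling theory leads one to expect.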

2.4 Thermodynamics with Scaling Theory

In view of our aim to keep as distinct as possible the developments of thermodynamics and statistical mechanics, we choose here to present scaling theory as a mathematical axiomatization of the properties of PTCP in thermodynamics. As we see below, however, it has deep roots in, and is substantiated by, statistical mechanics, in particular renormalization group theory,Footnote 24 where, in almost all cases,Footnote 25 the realization of this picture of scaling involves approximations and yields scaling forms of only local validity.

Originating in the work of (among others) Widom [126, 127] and Kadanoff [53] our approach is essentially that of Hankey and Stanley [44]. Given here in brief outlineFootnote 26 it is sufficient for an analysis of power-law singularities in the critical region.Footnote 27

Suppose we have the free-energy density of a system in terms of its maximum number of independent couplings. In the discussion above that maximum number was two, but for the moment we generalize to n couplings so the free-energy density is \(\phi _n({\pmb {\zeta }})\), where \({\pmb {\zeta }}:=\,(\zeta _1,\zeta _2,\ldots ,\zeta _n)\), which is represented as a hypersurface of dimension n in the \((n+1)\)-dimensional space \((\phi _n,{\pmb {\zeta }})\). Now suppose that there is a critical region \({\mathcal {C}}\) of dimension \(n-s\). Although \(\phi _n({\pmb {\zeta }})\) itself is continuous and finite across and within \({\mathcal {C}}\) it may have discontinuous first-order derivatives, meaning that \({\mathcal {C}}\) is a region of phase coexistence with a first-order transition when, as is shown in Fig. 2, the phase point crosses through \({\mathcal {C}}\), or it may have singular second-order derivatives in \({\mathcal {C}}\), as is the case in the situation described above where a line of first-order transition terminates at a critical point.Footnote 28

With respect to some origin \({\pmb {\zeta }}^\circ \in {\mathcal {C}}\) a system of orthogonal curvilinear coordinates \(\theta _1,\theta _2,\ldots ,\theta _n\) called scaling fields is constructed. These are smooth functions of the couplings which parameterize \({\mathcal {C}}\) so that \(\theta _1 = \cdots = \theta _s = 0\) within \({\mathcal {C}}\). The scaling fields in this subset are called relevant with those in the remaining subset \(\theta _{s+1},\theta _{s+2},\ldots \theta _n\), called irrelevant, acting as a local set of coordinates within \({\mathcal {C}}\).Footnote 29 The free-energy density \(\phi _n({\pmb {\zeta }})\) is separated into two parts

$$\begin{aligned} \phi _n({\pmb {\zeta }}) = \phi _{\text {smth}}({\pmb {\zeta }}) + \phi _{\text {sing}}(\triangle {\pmb {\zeta }})\, , \end{aligned}$$
(24)

where \(\phi _{\text {smth}}({\pmb {\zeta }})\) is a regular function and, with \(\triangle {\pmb {\zeta }}:=\,{\pmb {\zeta }}-{\pmb {\zeta }}^\circ\), \(\phi _{\text {sing}}(\triangle {\pmb {\zeta }})\), for which \(\phi _{\text {sing}}({\pmb {0}})=0\), contains all the non-smooth parts of \(\phi _n({\pmb {\zeta }})\) in \({\mathcal {C}}\). It is now assumed that \(\phi _{\text {sing}}(\triangle {\pmb {\zeta }})\) can be re-coordinated in terms of the scaling fields so that it is a generalized homogeneous function satisfying the Kadanoff scaling hypothesisFootnote 30

$$\begin{aligned} {\phi }_{\text {sing}}({\lambda }^{y_1}{\theta }_1,\ldots ,{\lambda }^{y_n}{\theta }_n) = {\lambda }^d {\phi }_{\text {sing}} ({\theta }_1,\ldots ,{\theta }_n)\, , \end{aligned}$$
(25)

for all real \(\lambda >0\), where d is the physical dimension of the system, and \(y_j\), \(j=1,2,\ldots ,n\) are scaling exponents satisfying

$$\begin{aligned} y_j > 0,\quad j = 1,\ldots ,s,\qquad y_j < 0,\quad j = s + 1,\ldots ,n. \end{aligned}$$
(26)

The exponents in the first subset are, like the corresponding scaling fields, called relevant and those in the latter subset are called irrelevant.Footnote 31 Of the assumptions made here, that scaling fields can be derived is not particularly demanding; at the very least it is usually straightforward to obtain their linear parts near to the origin. And the division of the free-energy density (24) into smooth and singular parts has very little content until we explore in more detail the consequences of the scaling hypothesis (25), which we now do for the case of a critical point terminating a coexistence curve.

There are many general accounts of scaling theory, treating a variety of critical phenomena. Here we restrict attention to the case of a critical point terminating a line of first-order transitions, as shown in Fig. 5. So we have two critical regions. The first is the critical point with two relevant scaling fields and scaling exponents with axes chosen perpendicular to and along the coexistence curve. For this we shall show that the critical exponents defined in Appendix 1 can be expressed in terms of the two scaling exponents. The second is the coexistence curve which has one relevant and one irrelevant scaling field constructed with respect to some chosen origin (not shown in Fig. 5) on the coexistence curve.

For the sake of further simplifying our presentation we restrict attention to a simple ferromagnetic system with \(\xi :=\,\mathcal {H}\), the magnetic field, \(X:=\,\mathcal {M}\), the magnetization and \(x:=\,m=\mathcal {M}/N\), the magnetization density. The coupling \(\zeta _1\) is the thermal coupling so we relabel it as \(\zeta _{T}=\varepsilon /T\) and \(\zeta _2\) is the field coupling which we relabel as \(\zeta _{{\tiny {\mathcal {H}}}}=\mathcal {H}/T\). This model, of which an example in statistical mechanics is the Ising model described in Appendix 2, has the advantage of having the special symmetry that the coexistence curve lies along the zero-field axis in an interval \(T\in [0,T_\text {c}]\) with \(\mathcal {H}_\text {c}=m_\text {c}=0\). This axis with \(T>T_\text {c}\) is the critical isochore. Thus (referring to Fig. 5) the coexistence curve lies along the \(\zeta _{\tiny {\mathcal {H}}}=0\) axis in an interval \([\zeta _{{T}\text {c}},\infty )\). This same phase diagram for the Ising model, now plotted with respect to the temperature T and the magnetic field \(\mathcal {H}\), is shown in Fig. 8.

We consider separately the critical point and the coexistence curve, beginning with the critical point where we can take the scaling fields to be

$$\begin{aligned} \theta _{T}:=\,\zeta _{T}-\zeta _{{T}\text {c}}=\varepsilon \left( \frac{1}{T}-\frac{1}{T_{\text {c}}}\right) \ge 0, \qquad \theta _{\tiny {\mathcal {H}}}:=\,\zeta _{\tiny {\mathcal {H}}}=\frac{\mathcal {H}}{T}. \end{aligned}$$
(27)

The scaling hypothesis (25) becomes

$$\begin{aligned} {\phi }_{\text {sing}}(\lambda ^{y_{T}}\theta _{T},\lambda ^{y_{\tiny {\mathcal {H}}}}\theta _{\tiny {\mathcal {H}}}) = {\lambda }^d {\phi }_{\text {sing}}(\theta _{T},\theta _{\tiny {\mathcal {H}}})\, , \end{aligned}$$
(28)

and, from (24) and (103),

$$\begin{aligned}&m=- \frac{\partial \phi _{\text {smth}}}{\partial \zeta _{\tiny {\mathcal {H}}}} - \frac{\partial \phi _{\text {sing}}}{\partial \theta _{\tiny {\mathcal {H}}}}, \end{aligned}$$
(29)
$$\begin{aligned}&\frac{\partial {\phi }_{\text {sing}}}{\partial \theta _{\tiny {\mathcal {H}}}}({\lambda }^{y_{T}}{\theta }_{T},{\lambda }^{y_{\tiny {\mathcal {H}}}}{\theta }_{\tiny {\mathcal {H}}}) = {\lambda }^{d-y_{\tiny {\mathcal {H}}}} \frac{\partial {\phi }_{\text {sing}}}{\partial \theta _{\tiny {\mathcal {H}}}}({\theta }_{T},{\theta }_{\tiny {\mathcal {H}}}). \end{aligned}$$
(30)

Since \(m_\text {c}=0\), \({\partial \phi _{\text {smth}}}/{\partial \zeta _{\tiny {\mathcal {H}}}}=0\) at the critical point. For an approach to the critical point along the coexistence curve \(\theta _{\tiny {\mathcal {H}}}=0\) and setting \(\lambda =\,\theta _{T}^{-1/y_{T}}\) in (30) and substituting into (29) gives

$$\begin{aligned} m \simeq -\theta _{{T}}^{(d-y_{\tiny {\mathcal {H}}})/y_{T}}\frac{\partial {\phi }_{\text {sing}}}{\partial \theta _{\tiny {\mathcal {H}}}}(1,0) \sim (T_\text {c}-T)^{(d-y_{\tiny {\mathcal {H}}})/y_{T}}, \end{aligned}$$
(31)

which, on comparison with (107), establishes the identification

$$\begin{aligned} \upbeta =(d-y_{\tiny {\mathcal {H}}})/y_{T}. \end{aligned}$$
(32)
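The route from the scaling hypothesis (28) to the identification (32) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the scaling function `G(u) = -exp(u)`, the function names and the numerical values of `D`, `Y_T` and `Y_H` are all hypothetical choices made only so that the toy \(\phi _{\text {sing}}\) is a generalized homogeneous function for \(\theta _{T}>0\). The sketch checks (28) directly and then estimates \(\upbeta\) from the log-log slope of the singular part of m along \(\theta _{\tiny {\mathcal {H}}}=0\).

```python
import math

# Illustrative values only (assumption): d, y_T and y_H are free parameters here.
D, Y_T, Y_H = 2.0, 1.0, 15.0 / 8.0


def phi_sing(theta_t, theta_h):
    """Toy singular free-energy density for theta_t > 0.

    Built as theta_t**(d/y_T) * G(theta_h * theta_t**(-y_H/y_T)) with the
    hypothetical scaling function G(u) = -exp(u), so that (28) holds exactly.
    """
    u = theta_h * theta_t ** (-Y_H / Y_T)
    return -(theta_t ** (D / Y_T)) * math.exp(u)


def homogeneity_defect(lam, theta_t, theta_h):
    """Relative difference between the two sides of the scaling hypothesis (28)."""
    lhs = phi_sing(lam ** Y_T * theta_t, lam ** Y_H * theta_h)
    rhs = lam ** D * phi_sing(theta_t, theta_h)
    return abs(lhs - rhs) / abs(rhs)


def m_sing(theta_t, step=1e-7):
    """Singular part of m in (29): minus the central-difference derivative
    of phi_sing with respect to theta_h, evaluated at theta_h = 0."""
    return -(phi_sing(theta_t, step) - phi_sing(theta_t, -step)) / (2.0 * step)


def beta_estimate(t1=0.01, t2=0.1):
    """Log-log slope of m_sing between two values of theta_t."""
    return math.log(m_sing(t2) / m_sing(t1)) / math.log(t2 / t1)


if __name__ == "__main__":
    print("defect in (28):", homogeneity_defect(3.0, 0.2, 0.05))
    print("estimated beta:", beta_estimate())
    print("(d - y_H)/y_T :", (D - Y_H) / Y_T)
```

Any other smooth choice of `G` with non-zero derivative at the origin would give the same slope, which is the content of (32): the exponent depends only on d and the scaling exponents, not on the form of the scaling function.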

At this point we could carry out a similar procedure for the response functions in (104) and (105) to determine the critical exponents defined in (106)–(110). However, the analysis can be shortened by a closer examination of the way that the expression (32) for \(\upbeta\) was obtained. From this we see that the scaling exponent \(y_{\tiny {\mathcal {H}}}\) in the numerator indicates that differentiation was once with respect to \(\zeta _{\tiny {\mathcal {H}}}\). And that the approach was in the direction of varying \(\zeta _{T}\) is indicated by the scaling exponent \(y_{T}\) in the denominator. So with the same reasoning it follows from (109) that

$$\begin{aligned} \updelta = y_{\tiny {\mathcal {H}}}/(d-y_{\tiny {\mathcal {H}}}), \end{aligned}$$
(33)

and bearing in mind that the analysis yields singularities for response functions so \(\phi _{\text {smth}}\) can play no role, from (108),

$$\begin{aligned} \upgamma =\upgamma ^\prime =(2y_{\tiny {\mathcal {H}}}-d)/y_{T}. \end{aligned}$$
(34)

When we come to consider \(c_\xi :=\,c_{\tiny {\mathcal {H}}}\), given by (104), the situation becomes a little more complicated, since there are three terms and we need to know which dominates as the critical point is approached. This will depend on the relative magnitudes of \(y_{T}\) and \(y_{\tiny {\mathcal {H}}}\) and it can be shown (Lavis, [67], Sect. 4.5.1) that, in general for a critical point terminating a line of first-order transitions, the exponent associated with approaches tangential to the coexistence curve is smaller (less relevant) than that associated with an approach at a non-zero angle to this curve. These are called respectively weak and strong approaches and in the present context we have \(y_{T}<y_{\tiny {\mathcal {H}}}\), these being respectively the weak and strong exponents. Returning to the formula for \(c_{\tiny {\mathcal {H}}}\) in (104) we see that the third term on the right-hand side would be the one that dominates, meaning that, from (110), \(\upsigma =\upsigma ^\prime = \upgamma\). However, because of the symmetry of the magnetic model \(\zeta _{2\text {c}}=\,\zeta _{{\tiny {\mathcal {H}}}\text {c}}=0\) and the only remaining term is the first, meaning that

$$\begin{aligned} \upsigma =\upsigma ^\prime =(2y_{T}-d)/y_{T}. \end{aligned}$$
(35)

Finally we need to determine the asymptotic form for \(c_x:=\,c_m\) using (105). Here the situation needs a more detailed analysis, from which it can be shown (Lavis, [67], Sect. 4.5.4) that, whether or not the magnetic symmetry applies, cancellation of coefficients leads to an asymptotic form equivalent to that of a second-order derivative with respect to \(\zeta _{T}\); that is,

$$\begin{aligned} \upalpha =\upalpha ^\prime =(2y_{T}-d)/y_{T}. \end{aligned}$$
(36)

This means that it is the asymptotic form of the heat capacity with constant intensive variable (pressure or magnetic field) which is dependent on symmetry. In the magnetic system the exponent is the same as that of the heat capacity with constant extensive variable (the magnetization) and in a fluid, where there is no symmetry, it is equal to that of \(\varphi _{{T}}\), which is the compressibility. Equations (32)–(36) are formulae for the exponents \(\upalpha\), \(\upbeta\), \(\upgamma\) and \(\updelta\) in terms of \(y_{T}\) and \(y_{\tiny {\mathcal {H}}}\). They are, therefore, not independent and two relationships exist between them. These can be expressed in the form \(\upalpha + 2\upbeta + \upgamma = 2\), called the Essam–Fisher scaling law [30], which is a strengthening of the inequality (22), and \(\upgamma ' = \upbeta (\updelta - 1)\), called the Widom scaling law [126], which is a strengthening of the inequality (23).
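The two scaling laws can be confirmed by direct substitution of (32)–(36). The sketch below does this in exact rational arithmetic; the function name `critical_exponents` is hypothetical, and the sample values \(d=2\), \(y_{T}=1\), \(y_{\tiny {\mathcal {H}}}=15/8\) are those usually quoted for the two-dimensional zero-field Ising model, used here only as an illustrative assumption.

```python
from fractions import Fraction


def critical_exponents(d, y_t, y_h):
    """Exponents in terms of the scaling exponents, following (32)-(36)."""
    return {
        "alpha": (2 * y_t - d) / y_t,  # (36)
        "beta": (d - y_h) / y_t,       # (32)
        "gamma": (2 * y_h - d) / y_t,  # (34)
        "delta": y_h / (d - y_h),      # (33)
    }


def essam_fisher_defect(e):
    """alpha + 2*beta + gamma - 2, which should vanish identically."""
    return e["alpha"] + 2 * e["beta"] + e["gamma"] - 2


def widom_defect(e):
    """gamma - beta*(delta - 1), which should vanish identically."""
    return e["gamma"] - e["beta"] * (e["delta"] - 1)


if __name__ == "__main__":
    # Assumed sample values (two-dimensional Ising model).
    exps = critical_exponents(Fraction(2), Fraction(1), Fraction(15, 8))
    for name, value in exps.items():
        print(name, "=", value)
    print("Essam-Fisher defect:", essam_fisher_defect(exps))
    print("Widom defect:", widom_defect(exps))
```

Because both defects are algebraic identities in \(y_{T}\) and \(y_{\tiny {\mathcal {H}}}\), they vanish for any admissible choice of the scaling exponents, not only for the sample values.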

For the coexistence curve, scaling fields, chosen with respect to some arbitrary origin \(\zeta _{T}=\zeta ^\circ _{T}\), \(\zeta _{\tiny {\mathcal {H}}}=0\) are

$$\begin{aligned} \theta ^\prime _{T}:=\,\zeta _{T}-\zeta ^\circ _{{T}}, \qquad \theta ^\prime _{\tiny {\mathcal {H}}}:=\,\zeta _{\tiny {\mathcal {H}}}=\mathcal {H}/T, \end{aligned}$$
(37)

with \(y^\prime _{T}\) and \(y^\prime _{\tiny {\mathcal {H}}}\) irrelevant and relevant exponents respectively. In general it can be shown that relevant exponents are less than or equal to d, meaning in this case that \(0<y^\prime _{\tiny {\mathcal {H}}}\le d\). With primes attached to the exponents and fields, (29) and (30) continue to apply to the magnetization density. If \(y^\prime _{\tiny {\mathcal {H}}}< d\)

$$\begin{aligned} \frac{\partial {\phi }_{\text {sing}}}{\partial \theta _{\tiny {\mathcal {H}}}}(0,0)=0, \end{aligned}$$
(38)

and m is continuous at the origin; there is no first-order phase transition. If \(y^\prime _{\tiny {\mathcal {H}}}= d\) then (38) does not necessarily hold. There may be a contribution to (29) from the derivative of \({\phi }_{\text {sing}}\). This will be the only way in which the magnetization can be discontinuous across the coexistence curve. So a scaling exponent equal to d is a necessary, but not sufficient, condition for a first-order transition. An example of such a first-order transition with an exponent of d is at zero temperature in the one-dimensional Ising model (Sect. 3.4.3(a)). Discontinuities in higher-order derivatives can be treated in a similar way.

2.5 Dimensionality and Phase Transitions

Although, as we have seen, thermodynamics, and particularly its treatment of PTCP, assumes that the system is infinite, the dimension d of the system entered into the discussion in Sect. 2.4. And once dimensionality has entered then finiteness has also appeared. Thus, for example, a two-dimensional system can be viewed as a three-dimensional system of ‘thickness’ one in the third dimension and it is only a small step from there to increase the thickness to two. In Sect. 1 we referred to the classification of singularities in terms of universality classes. This, as we asserted, can be discussed only in the context of statistical mechanics, with d one of the factors determining the universality class of an occurrence of singular behaviour. If the number of directions in which the system is infinite is increased, then its critical behaviour will change from one universality class to another. This is an example of what in scaling and renormalization group theory is called ‘cross-over’.Footnote 32 The dimension of the system affects not just the universality class of singular behaviour but whether it occurs at all. However, that dimension is not d but \({\mathfrak {d}}\le d\), the number of directions in which the system is infinite.Footnote 33 And the final message sent from statistical mechanics to thermodynamics is that:

FSM–4 There exists a lower-critical dimension \(d_{{\tiny {\text{ LC}}}}\) such that, if \({\mathfrak {d}}\le d_{{\tiny {\text{ LC }}}}<d\) singular behaviour can occur in the fully-infinite system but not in the partially-infinite system. If \(d>{\mathfrak {d}}>d_{{\tiny {\text{ LC }}}}\) then singular behaviour can occur in both, but in different universality classes.

3 From Gibbsian Statistical Mechanics to the Renormalization Group

The move from thermodynamics to statistical mechanics is, we shall argue, an enrichment and substantiation of the picture we have of any system under investigation. This operates at two levels. The first is structural, where renormalization group theory embedded in statistical mechanics provides a fuller picture in terms of renormalization group transformations and fixed points than scaling theory embedded in thermodynamics. The second is in the provision of specific models which arise from assumptions about the microstructure of the system. We now consider the development represented by the right-hand column in Fig. 1, beginning with the basic structure of statistical mechanics.

3.1 Inter-Theory Connecting Relationships

Let the microstate of the system be given by a value of the vector variable \({\pmb {\sigma }}\) in the phase space \(\varGamma\). In the case of a fluid system \({\pmb {\sigma }}\) will be a set of values for the positions and momenta of all the particles; for a spin system on a lattice, like the Ising model in Appendix 2, \({\pmb {\sigma }}\) will be the set of values of all the spin variables. The microscopic and macroscopic structure of the system is then determined by the Hamiltonian. This is an explicit function of the independent couplings with the independent extensive variables imposing constraints on \({\pmb {\sigma }}\). Thus we have three cases:

  1. (i)

    When \((U,X,N)\in \varXi _0\) are the independent variables the Hamiltonian is \(\widehat{H}_0({\pmb {\sigma }};X,N)\), with values constrained by

    $$\begin{aligned} \widehat{H}_0({\pmb {\sigma }};X,N)=U, \end{aligned}$$
    (39)

    and \({\pmb {\sigma }}\) constrained, according to the nature of the particular model by X and N.Footnote 34

  2. (ii)

    When \((\zeta _1,X,N)\in \varXi _1\) are the independent variables the Hamiltonian \(\widehat{H}_1({\pmb {\sigma }};\zeta _1,X,N)\) is a linear function of \(\zeta _1\). The constraint (39) is removed but \({\pmb {\sigma }}\) remains constrained by X and N.

  3. (iii)

    When \((\zeta _1,\zeta _2,N)\in \varXi _2\) are the independent variables the Hamiltonian \(\widehat{H}_2({\pmb {\sigma }};\zeta _1,\zeta _2,N)\) is a linear function of \(\zeta _1\) and \(\zeta _2\). The only remaining constraint is from N.

Connecting relationships are now invoked in three stages:

FTD–1 The independent variables in \(\varXi _0\), \(\varXi _1\) and \(\varXi _2\) are endowed with their thermodynamic meanings.

To proceed to the next stage of the inter-theory connecting process we need to give a form in cases (i), (ii) and (iii), respectively, for the entropy, and the free energies \(\varPhi _1\) and \(\varPhi _2\). Case (i) gives the microcanonical distributionFootnote 35 and cases (ii) and (iii) give, respectively, the canonical distribution and the constant pressure or magnetic field distribution. For the sake of simplicity we concentrate exclusively on case (iii), where the Gibbs free energy is defined by

$$\begin{aligned} \varPhi _2(\zeta _1,\zeta _2,N):=\,-\ln \{Z_2(\zeta _1,\zeta _2,N)\}, \end{aligned}$$
(40)

where

$$\begin{aligned} Z_2(\zeta _1,\zeta _2,N):=\,\sum _{\{{\pmb {\sigma }}\}} \exp \{-\widehat{H}_2({\pmb {\sigma }};\zeta _1,\zeta _2,N)\}, \end{aligned}$$
(41)

is the Gibbs partition function.Footnote 36 Then

FTD–2 \(\varPhi _2\) is endowed with its thermodynamic properties and, using (3)–(6),

$$\begin{aligned} U=\frac{\partial \varPhi _2}{\partial \zeta _1},\quad X=-\frac{\partial \varPhi _2}{\partial \zeta _2},\quad \zeta _3=-\frac{\partial \varPhi _2}{\partial N}, \end{aligned}$$
(42)
$$\begin{aligned} \varPhi _1=\varPhi _2+\zeta _2 X,\qquad S=\zeta _1 U-\varPhi _1, \end{aligned}$$
(43)

establishes the connection between U, X, \(\zeta _3\), \(\varPhi _1\) and S and their thermodynamic equivalents.

This completes a sufficient set of the connecting relationships. However, we can make some further links. Suppose that

$$\begin{aligned} \widehat{H}_2({\pmb {\sigma }};\zeta _1,\zeta _2;N):=\,\widehat{U}({\pmb {\sigma }})\zeta _1-\widehat{X}({\pmb {\sigma }})\zeta _2. \end{aligned}$$
(44)

Then, from (40)–(43),

$$\begin{aligned} U=\langle \widehat{U}({\pmb {\sigma }})\rangle ,\qquad X=\langle \widehat{X}({\pmb {\sigma }})\rangle . \end{aligned}$$
(45)

FTD–3 U and X are identified respectively as the expectation values of \(\widehat{U}({\pmb {\sigma }})\) and \(\widehat{X}({\pmb {\sigma }})\) with respect to the probability distribution with density

$$\begin{aligned} \rho ({\pmb {\sigma }};\zeta _1,\zeta _2):=\,\frac{ \exp [-\widehat{H}_2({\pmb {\sigma }};\zeta _1,\zeta _2;N)]}{Z_2(\zeta _1,\zeta _2,N)}. \end{aligned}$$
(46)

And it further follows from (40)–(46) that

$$\begin{aligned} \text {Var}[\widehat{X}({\pmb {\sigma }})]=-\frac{\partial ^2\varPhi _2}{\partial \zeta _2^2} = N\varphi _{{T}}, \end{aligned}$$
(47)

where \(\varphi _{{T}}\) is the response function given by (104). This is an example of a fluctuation–response function relationship. Similar relationships apply to \(\widehat{U}({\pmb {\sigma }})\) and all uncontrolled extensive variables.
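The connecting relationships (42) and (45) and the fluctuation–response relationship can be checked by brute-force enumeration on a very small system. The sketch below is an illustration under stated assumptions, not a model taken from the text: it assumes a ring of n two-state sites with \(\widehat{U}({\pmb {\sigma }})=-\sum _i\sigma _i\sigma _{i+1}\) and \(\widehat{X}({\pmb {\sigma }})=\sum _i\sigma _i\) inserted in the form (44), and all function names are hypothetical. Derivatives of \(\varPhi _2\) are approximated by central differences, and the variance of \(\widehat{X}\) is compared with the curvature of \(\varPhi _2\) in \(\zeta _2\) using the sign convention implied by \(X=-\partial \varPhi _2/\partial \zeta _2\) in (42).

```python
import itertools
import math


def observables(sigma):
    """U-hat and X-hat for a periodic ring of +/-1 spins (assumed model)."""
    n = len(sigma)
    u_hat = -sum(sigma[i] * sigma[(i + 1) % n] for i in range(n))
    x_hat = sum(sigma)
    return u_hat, x_hat


def gibbs(z1, z2, n):
    """Return (Phi_2, <U-hat>, <X-hat>, Var[X-hat]) from (40), (41), (44), (46)."""
    z = u = x = x_sq = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        u_hat, x_hat = observables(sigma)
        weight = math.exp(-(u_hat * z1 - x_hat * z2))  # exp(-H_2), form (44)
        z += weight
        u += weight * u_hat
        x += weight * x_hat
        x_sq += weight * x_hat ** 2
    mean_x = x / z
    return -math.log(z), u / z, mean_x, x_sq / z - mean_x ** 2


def phi2(z1, z2, n):
    return gibbs(z1, z2, n)[0]


def residuals(z1, z2, n, h=1e-4):
    """Differences between finite-difference derivatives of Phi_2 and averages."""
    phi, u, x, var = gibbs(z1, z2, n)
    d_z1 = (phi2(z1 + h, z2, n) - phi2(z1 - h, z2, n)) / (2 * h)
    d_z2 = (phi2(z1, z2 + h, n) - phi2(z1, z2 - h, n)) / (2 * h)
    d2_z2 = (phi2(z1, z2 + h, n) - 2 * phi + phi2(z1, z2 - h, n)) / h ** 2
    # U = dPhi_2/dz1, X = -dPhi_2/dz2, Var[X-hat] = -d^2Phi_2/dz2^2
    return d_z1 - u, -d_z2 - x, -d2_z2 - var


if __name__ == "__main__":
    print(residuals(0.3, 0.2, 6))
```

The three residuals are limited only by the finite-difference step, which indicates that the relationships hold for a finite system and do not rely on any large-N limit.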

3.2 Correlation Function and Correlation Length

As is already evident, thermodynamics is a ‘black-box’ theory with a set of macro-variables some of which are independent and controllable and others whose values change in response to the changes in the independent variables. The only concession made to internal structure was, in Sect. 2.1, to allow a counting of the number N of mass units of the system. Now with the ‘enrichment’ provided by statistical mechanics we are able to record the microstate \({\pmb {\sigma }}\) of the system, which is simply the aggregate of the states of the individual microsystems.

Suppose that we take the d-dimensional hypercubic latticeFootnote 37 \({\mathcal {N}}_d\) with sites \({\pmb {r}}:=\,(n_1,n_2,\ldots ,n_d){\mathfrak {a}}\), for \(n_k=1,2,\ldots ,N_k\) with \(N=N_1N_2\cdots N_d\), where \({\mathfrak {a}}\) is the lattice spacing.Footnote 38 Then, given that the states of the microsystems on sites \({\pmb {r}}\) and \({\pmb {r}}'\) of the lattice are \(\sigma ({\pmb {r}})\) and \(\sigma ({\pmb {r}}')\), respectively, how does the state of one affect the state of the other; that is to say, how are their states correlated? More specifically, how is the correlation between \(\sigma ({\pmb {r}})\) and \(\sigma ({\pmb {r}}')\) affected by:

  1. (i)

    the distance \(|{\pmb {r}}-{\pmb {r}}'|\) between the sites?

  2. (ii)

    the closeness of the thermodynamic state of the system to a critical region?

To begin to answer these questions suppose that, as for the Hamiltonian (44) in the Ising model in Appendix 2, \(\widehat{X}({\pmb {\sigma }})\) is a linear sum of the states on the sites of \({\mathcal {N}}_d\). And (temporarily) suppose that the coupling \(\zeta _2\) takes different values \(\zeta _2({\pmb {r}})\) at the sites. Then, denoting the set of couplings \(\zeta _2({\pmb {r}})\) by the vector \({\pmb {\zeta }}_2\),

$$\begin{aligned} \widehat{H}_2({\pmb {\sigma }};\zeta _1,{\pmb {\zeta }}_2;N):=\,\widehat{U}({\pmb {\sigma }})\zeta _1-\sum _{\{{\pmb {r}}\}}\sigma ({\pmb {r}})\zeta _2({\pmb {r}}) \end{aligned}$$
(48)

and, from (46), the expectation value of \(\sigma ({\pmb {r}})\) is

$$\begin{aligned} \langle \sigma ({\pmb {r}})\rangle = \sum _{\{{\pmb {\sigma }}\}} \sigma ({\pmb {r}}) \rho ({\pmb {\sigma }};\zeta _1,{\pmb {\zeta }}_2)=-\frac{\partial \varPhi _2}{\partial \zeta _2({\pmb {r}})}. \end{aligned}$$
(49)

If the states \(\sigma ({\pmb {r}})\) and \(\sigma ({\pmb {r}}')\) are uncorrelated \(\langle \sigma ({\pmb {r}})\sigma ({\pmb {r}}')\rangle\) will factor into \(\langle \sigma ({\pmb {r}})\rangle \langle \sigma ({\pmb {r}}')\rangle\). So

$$\begin{aligned} \mathsf {\Gamma }({\pmb {r}},{\pmb {r}}';\zeta _1,\{\zeta _2({\pmb {r}})\}):=\,\langle \sigma ({\pmb {r}})\sigma ({\pmb {r}}')\rangle -\langle \sigma ({\pmb {r}})\rangle \langle \sigma ({\pmb {r}}')\rangle =-\frac{\partial ^2\varPhi _2}{\partial \zeta _2({\pmb {r}})\partial \zeta _2({\pmb {r}}')}, \end{aligned}$$
(50)

called the pair correlation function, is a measure of the degree of correlation between \(\sigma ({\pmb {r}})\) and \(\sigma ({\pmb {r}}')\). If all the couplings \(\zeta _2({\pmb {r}})\) are set equal to \(\zeta _2\), it follows from (104) that

$$\begin{aligned} \sum _{\{{\pmb {r}},{\pmb {r}}'\}}\mathsf {\Gamma }({\pmb {r}},{\pmb {r}}';\zeta _1,\zeta _2)=N\varphi _{{T}}, \end{aligned}$$
(51)

which is a fluctuation–response function relationship. If translational invariance is assumed, then \(\mathsf {\Gamma }({\pmb {r}},{\pmb {r}}';\zeta _1,\zeta _2)=\mathsf {\Gamma }(\bar{{\pmb {r}}};\zeta _1,\zeta _2)\), where \(\bar{{\pmb {r}}}:=\,{\pmb {r}}-{\pmb {r}}'\) and

$$\begin{aligned} \sum _{\{\bar{{\pmb {r}}}\}}\mathsf {\Gamma }(\bar{{\pmb {r}}};\zeta _1,\zeta _2)=\varphi _{{T}}\quad \text{ with }\quad \mathsf {\Gamma }^\star ({\pmb {0}};\zeta _1,\zeta _2)=\varphi _{{T}}, \end{aligned}$$
(52)

where \(\mathsf {\Gamma }^\star ({\pmb {k}};\zeta _1,\zeta _2)\) is the Fourier transform of \(\mathsf {\Gamma }(\bar{{\pmb {r}}};\zeta _1,\zeta _2)\).Footnote 39 The correlation length \(\upxi (\zeta _1,\zeta _2)\), given byFootnote 40

$$\begin{aligned} \upxi ^2(\zeta _1,\zeta _2):=\,c({\mathfrak {d}})\frac{\sum _{\{\bar{{\pmb {r}}}\}}|\bar{{\pmb {r}}}|^2\mathsf {\Gamma }(\bar{{\pmb {r}}};\zeta _1,\zeta _2)}{\sum _{\{\bar{{\pmb {r}}}\}}\mathsf {\Gamma }(\bar{{\pmb {r}}};\zeta _1,\zeta _2)} = - c({\mathfrak {d}}) \frac{\nabla ^2_{{\pmb {k}}}\mathsf {\Gamma }^\star ({\pmb {0}};\zeta _1,\zeta _2)}{\mathsf {\Gamma }^\star ({\pmb {0}};\zeta _1,\zeta _2)}, \end{aligned}$$
(53)

is a measure of the distance over which microscopic degrees of freedom are statistically correlated.

We are now able to augment the scaling theory, described in Sect. 2.4, by applying it to the correlation function and correlation length. Again adopting the magnetic model used in Sect. 2.4, suppose that near a critical point these functions can be re-expressed in terms of the scaling fields \(\theta _{T}\) and \(\theta _{\tiny {\mathcal {H}}}\); \(\bar{{\pmb {r}}}\) and \({\pmb {k}}\) can also be treated as scaling fields which, on dimensional grounds, will have exponents \(-1\) and \(+1\) respectively. Then the relationships (52) between the correlation function and the response function \(\varphi _{{T}}\), together with the formula (112) derived from Ginzburg–Landau theory, suggest a scaling formFootnote 41

$$\begin{aligned} \mathsf {\Gamma }(\lambda ^{-1}\bar{{\pmb {r}}};\lambda ^{y_{T}}\theta _{T},\lambda ^{y_{\tiny {\mathcal {H}}}}\theta _{\tiny {\mathcal {H}}})=\lambda ^{\upeta +d-2}\mathsf {\Gamma }(\bar{{\pmb {r}}};\theta _{T},\theta _{\tiny {\mathcal {H}}}), \end{aligned}$$
(54)

for the correlation function, and, hence

$$\begin{aligned} \mathsf {\Gamma }^\star (\lambda {\pmb {k}};\lambda ^{y_{T}}\theta _{T},\lambda ^{y_{\tiny {\mathcal {H}}}}\theta _{\tiny {\mathcal {H}}})=\lambda ^{\upeta -2}\mathsf {\Gamma }^\star ({\pmb {k}};\theta _{T},\theta _{\tiny {\mathcal {H}}}), \end{aligned}$$
(55)

for its Fourier transform. Then, from (53), the scaling form for the correlation length is

$$\begin{aligned} \upxi (\lambda ^{y_{T}}\theta _{T},\lambda ^{y_{\tiny {\mathcal {H}}}}\theta _{\tiny {\mathcal {H}}})=\lambda ^{-1}\upxi (\theta _{T},\theta _{\tiny {\mathcal {H}}}). \end{aligned}$$
(56)

From (52), (55) and (104), \(d-2y_{\tiny {\mathcal {H}}}=\upeta -2\) and setting \(\lambda =|\theta _{T}|^{-1/y_{T}}\) in (56) gives, from (111),

$$\begin{aligned} \upnu =\upnu '=1/y_{T}. \end{aligned}$$
(57)

Then, from (34) and (36), \(\upnu (2-\upeta )=\upgamma\), which is the Fisher scaling law [32] and \(d\,\upnu = 2 - \upalpha\), which is the Josephson hyper-scaling law [52].Footnote 42
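As a hedged numerical illustration of these two laws, take the values usually quoted for the two-dimensional zero-field Ising model, \(d=2\), \(y_{T}=1\) and \(y_{\tiny {\mathcal {H}}}=15/8\); these values are an assumption imported for the example and are not derived in this section. Then, from \(d-2y_{\tiny {\mathcal {H}}}=\upeta -2\) together with (34), (36) and (57),

$$\begin{aligned} \upeta =d+2-2y_{\tiny {\mathcal {H}}}=\tfrac{1}{4},\qquad \upnu =\frac{1}{y_{T}}=1,\qquad \upgamma =\frac{2y_{\tiny {\mathcal {H}}}-d}{y_{T}}=\tfrac{7}{4},\qquad \upalpha =\frac{2y_{T}-d}{y_{T}}=0, \end{aligned}$$

so that \(\upnu (2-\upeta )=\tfrac{7}{4}=\upgamma\) and \(d\,\upnu =2=2-\upalpha\), in agreement with the Fisher and Josephson laws respectively.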

3.3 Transfer-Matrix Methods

As we have already shown, S, \(\varPhi _1\) and \(\varPhi _2\) are all extensive functions of their extensive variables or none of them is. The message FSM–2 sent from statistical mechanics to thermodynamics is that the latter is the case, and in particular that

$$\begin{aligned} \phi _2:=\,\frac{\varPhi _2(\zeta _1,\zeta _2,N)}{N}=\phi _2(\zeta _1,\zeta _2) \end{aligned}$$
(58)

is true only as an approximation for large systems.Footnote 43 We shall now substantiate this claim by considering a particular way to develop statistical mechanical models, namely the method of transfer matrices. Although, of course, statistical mechanics can model systems of microsystems (molecules) moving, as in a fluid, through a continuum of points, transfer matrix methods are restricted to microsystems confined to the points of a lattice. In principle lattices of any dimension can be considered, but we shall, for ease of presentation, consider only the two-dimensional case. A virtue of this development is that it can be clearly seen how it unfolds as the two lattice directions in which the system gets larger and then infinite are applied separately.

Consider a square lattice, of lattice spacing \({\mathfrak {a}}\), with \(N_{{\tiny {\text{ H }}}}\) sites in the horizontal direction, \(N_{{\tiny {\text{ V }}}}\) in the vertical direction, so that \(N=N_{\tiny {\text{ H }}}N_{\tiny {\text{ V }}}\). This situation is like the one considered for finite-size scaling in Sect. 3.4.2, when extensivity can be considered separately in the two directions. Periodic boundary conditions are applied so that the lattice forms a torus with horizontal rings of \(N_{\tiny {\text{ H }}}\) sites and rings in a vertical plane of \(N_{\tiny {\text{ V }}}\) sites.Footnote 44 We suppose that the sites of the lattice are occupied by identical microsystems having \(\nu\) possible states.Footnote 45 The state of the whole system is \({\pmb {\sigma }}:=\,({\tilde{{\pmb {\sigma }}}}_1,{\tilde{{\pmb {\sigma }}}}_2,\ldots ,{\tilde{{\pmb {\sigma }}}}_{N_{\tiny {\text{ H }}}})\), where \({\tilde{{\pmb {\sigma }}}}_i\), the state of the i-th vertical ring of sites, has one of \(N_{\tiny {\text{ R }}}:=\,\nu ^{N_{\tiny {\text{ V }}}}\) values. Given that contributions to the Hamiltonian arise (at least in the horizontal direction) only between first-neighbour sites the Hamiltonian can be decomposed into interactions between neighbouring rings of sites and within rings. The latter can be distributed between interacting pairs of rings so that the Hamiltonian takes the form of the sum of contributions of interactions between rings and it is straightforward to show that the partition function is expressible in the form

$$\begin{aligned} Z_2(\zeta _1,\zeta _2,N)=\text {Trace}\{{\pmb {V}}^{N_{\tiny {\text{ H }}}}\}, \end{aligned}$$
(59)

where \({\pmb {V}}\) is the \(N_{\tiny {\text{ R }}}\)-dimensional transfer matrix with elements consisting of the exponentials of the negatives of the inter-ring interactions. Assuming that \({\pmb {V}}\) is diagonalizable,Footnote 46 it is an elementary algebraic result that its trace is equal to the sum of its eigenvalues, which in decreasing order of magnitude we denote as \(\varLambda ^{(\ell )}(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})\), \(\ell =1,2,\ldots ,N_{\tiny {\text{ R }}}\). Then, from (40) and (59),

$$\begin{aligned} \varPhi _2(\zeta _1,\zeta _2,N)=-\ln \{[\varLambda ^{(1)}(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})]^{N_{{\tiny {\text{ H }}}}} +\cdots +[\varLambda ^{\left( N_{\tiny {\text{ R }}}\right) }(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})]^{N_{{\tiny {\text{ H }}}}}\}. \end{aligned}$$
(60)

As we can see the factors \(N_{\tiny {\text{ H }}}\) and \(N_{\tiny {\text{ V }}}\) of N are ‘buried’ at different places in this expression and it is clear that the extensivity condition (58) is not satisfied and the negative aspect of the message FSM–2 from statistical mechanics to thermodynamics is justified. However, we can make some progress because, if all the elements of \({\pmb {V}}\) are strictly positive, as will usually be the case, an important theorem of Perron [104] (see also, [37, p. 64], [67, p. 673]) states that the largest eigenvalue of \({\pmb {V}}\) is real, positive and non-degenerate. This means that, in the approximation when \(N_{\tiny {\text{ H }}}\) becomes large,

$$\begin{aligned} \varPhi _2(\zeta _1,\zeta _2,N)\simeq - N_{\tiny {\text{ H }}}\ln \{\varLambda ^{(1)}(\zeta _1,\zeta _2, N_{\tiny {\text{ V }}})\} \end{aligned}$$
(61)

with extensivity achieved in the horizontal direction. Two strategies emerge at this point:

The first is to calculate an expression of the form

$$\begin{aligned} \varLambda ^{(1)}(\zeta _1,\zeta _2, N_{\tiny {\text{ V }}}):=\,[\psi (\zeta _1,\zeta _2)]^{N_{\tiny {\text{ V }}}}, \end{aligned}$$
(62)

valid in the limit \(N_{\tiny {\text{ V }}}\rightarrow \infty\) and giving

$$\begin{aligned} \phi _2(\zeta _1,\zeta _2)=- \ln \{\psi (\zeta _1,\zeta _2)\} \end{aligned}$$
(63)

in the limit \(N\rightarrow \infty\). If this calculation can be carried out it is an effective proof of the existence of the thermodynamic limit,Footnote 47 which achieves complete extensivity, with free-energy density given by (63). It is, however, a strategy that has been successfully applied in only a few cases, of which Onsager’s [98] solution of the two-dimensional zero-field Ising model and Baxter’s [11] solution of the eight-vertex model are the most well-known instances.

In the absence of a complete solution as represented by (63), the strategy most often adopted is to treat \(N_{\tiny {\text{ V }}}\) as a parameter indexing a sequence of models. That is

$$\begin{aligned} \varPsi ^{(N_{\tiny {\text{ V }}})}(\zeta _1,\zeta _2):=\,\varLambda ^{(1)}(\zeta _1,\zeta _2, N_{\tiny {\text{ V }}}) \end{aligned}$$
(64)

and

$$\begin{aligned} \phi ^{(n)}_2(\zeta _1,\zeta _2)\simeq - \frac{\ln \{\varPsi ^{(n)}(\zeta _1,\zeta _2)\}}{n}. \end{aligned}$$
(65)

In the case of the Ising and similar semi-classical models it can be shown by a method due to Peierls [103] that \(\phi ^{(n)}_2(\zeta _1,\zeta _2)\) is a smooth function for all \(n>0\) which exhibits maxima in response functions. A quantitative analysis using finite-size scaling theory (see Sect. 3.4.2) shows that such maxima become increasingly steep for increasing values of n, with convergence to the singularity associated with the transition in the two-dimensionally infinite system as \(n\rightarrow \infty\); in particular, for the two-dimensional zero-field Ising model, they converge to the corresponding singularities in Onsager’s solution. However, in view of the discussion later in this work it should be noted that the limiting process is singular. Although the maxima in the finite-\(N_{\tiny {\text{ V }}}\) models converge to the singularities in the \(N_{\tiny {\text{ V }}}=\infty\) model, they remain of a different (non-singular) character however large \(N_{\tiny {\text{ V }}}\) becomes.
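The second strategy can be illustrated numerically. The following is a minimal sketch, not the construction used in the cited works: it assumes a zero-field Ising strip with the same dimensionless nearest-neighbour coupling `K` in both directions and periodic boundary conditions around each ring of `n` sites (widths `n >= 3` are the meaningful ones, since for smaller rings the periodic bond degenerates). The function names are hypothetical. It builds the symmetric transfer matrix, compares the full expression of the form (60) with the largest-eigenvalue approximation (61), and evaluates the per-site free energy (65).

```python
import itertools
import numpy as np

def strip_transfer_matrix(n, K):
    """Symmetric transfer matrix for an Ising strip with n sites per ring.

    Assumptions: coupling K in both directions, zero field,
    periodic boundary conditions around the ring.
    """
    states = np.array(list(itertools.product([1, -1], repeat=n)))
    # Interaction within one ring (site i with site i+1, cyclically).
    ring = K * np.sum(states * np.roll(states, -1, axis=1), axis=1)
    # Interaction between corresponding sites of neighbouring rings.
    between = K * (states @ states.T)
    return np.exp(between + 0.5 * (ring[:, None] + ring[None, :]))

def free_energy_exact(V, n_h):
    """-ln Tr V^{n_h}, evaluated stably from all eigenvalues."""
    lam = np.linalg.eigvalsh(V)  # ascending order
    top = lam[-1]
    return -(n_h * np.log(top) + np.log(np.sum((lam / top) ** n_h)))

def free_energy_largest(V, n_h):
    """Approximation (61): keep only the largest eigenvalue."""
    return -n_h * np.log(np.linalg.eigvalsh(V)[-1])

def phi_n(n, K):
    """Per-site free energy (65) for a strip of width n."""
    V = strip_transfer_matrix(n, K)
    return -np.log(np.linalg.eigvalsh(V)[-1]) / n

if __name__ == "__main__":
    V = strip_transfer_matrix(3, 0.3)
    print(free_energy_exact(V, 500), free_energy_largest(V, 500))
    print([phi_n(n, 0.3) for n in (3, 4, 5, 6)])
```

Because the matrix has dimension \(2^n\), this direct approach is limited to narrow strips, which is one reason the sequence of models indexed by \(N_{\tiny {\text{ V }}}\) is analysed with finite-size scaling rather than by brute force.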

The pair correlation function and correlation length were defined in Sect. 3.2. In terms of this transfer matrix formulation it can be shown [67, Sect. 11.1.3] that in the limit \(N_{\tiny {\text{ H }}}\rightarrow \infty\)

$$\begin{aligned} \upxi (\zeta _1,\zeta _2,N_{\tiny {\text{ V }}}) \simeq - {\mathfrak {a}}{\left\{ \ln \left| \varOmega _2(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})\right| \right\} }^{-1}, \end{aligned}$$
(66)

where \({\mathfrak {a}}\), the lattice spacing, is now the distance between neighbouring rings of sites,

$$\begin{aligned} \varOmega _2(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}}):=\,{\varLambda ^{(2)}(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})} /{\varLambda ^{(1)}(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})} \end{aligned}$$
(67)

and

$$\begin{aligned} \mathsf {\Gamma }_2({\pmb {r}},{\pmb {r}}';\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})\sim \exp \{-|{\pmb {r}}-{\pmb {r}}'|/\upxi (\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})\}, \end{aligned}$$
(68)

in the limit \(|{\pmb {r}}-{\pmb {r}}'|\rightarrow \infty\), where \({\pmb {r}}\) and \({\pmb {r}}'\) lie on the same vertical ring of sites which establishes an asymptotic form for \(f_d(|\bar{{\pmb {r}}}|/\upxi )\) in (112).
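Formula (66) can be tried out on the simplest possible case. The sketch below assumes a one-dimensional Ising chain, so that each ring consists of a single site and the transfer matrix is 2x2; the dimensionless couplings `K` and `h` and the function names are illustrative choices, not the variables of the text. At zero field the eigenvalue ratio is \(\tanh K\), which gives a closed form against which the numerical value can be compared.

```python
import numpy as np

def chain_transfer_matrix(K, h):
    """2x2 transfer matrix of the one-dimensional Ising chain.

    Assumption: K and h are dimensionless nearest-neighbour and
    field couplings; each ring consists of a single site.
    """
    s = np.array([1.0, -1.0])
    return np.exp(K * np.outer(s, s) + 0.5 * h * (s[:, None] + s[None, :]))

def correlation_length(V, spacing=1.0):
    """Correlation length from (66): -a / ln|Lambda_2 / Lambda_1|."""
    lam = np.sort(np.abs(np.linalg.eigvalsh(V)))[::-1]
    return -spacing / np.log(lam[1] / lam[0])

if __name__ == "__main__":
    for K in (0.5, 1.0, 2.0):
        xi = correlation_length(chain_transfer_matrix(K, 0.0))
        print(K, xi, -1.0 / np.log(np.tanh(K)))
```

The correlation length grows without bound as `K` increases, consistent with the chain having its only critical point at zero temperature.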

The situation where \(N_{\tiny {\text{ H }}}\rightarrow \infty\) and \(N_{\tiny {\text{ V }}}\) is finite corresponds to that to be discussed in Sect. 3.4.2, below, for finite-size scaling, where here \({\mathfrak {d}}:=\,1\) and the thickness of the lattice \(\aleph :=\,N_{\tiny {\text{ V }}}\), with a maximum in \(\varphi _{T}\) and in other response functions signalling an incipient singularity.Footnote 48 The eigenvalue ratio \(\varOmega _2(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})\) can also be used as a means of detecting an incipient singularity, but in a slightly different way. Since, in Onsager’s solution for the Ising model, the largest eigenvalue is degenerate along the first-order transition line below the critical temperature [27, p. 194], we expect that \(\varOmega _2(\zeta _1,\zeta _2,N_{\tiny {\text{ V }}})\) will begin, as \(N_{\tiny {\text{ V }}}\) is increased, to form a ‘plateau’ with small (negative) slope for small temperatures. The end of this plateau, where the negative curvature is a maximum, can then be construed as the location of an incipient singularity.Footnote 49 The finite-size scaling argument of Sect. 3.4.2 can be applied to all these quantities, showing that the maxima converge towards the infinite-system critical value as \(N_{\tiny {\text{ V }}}\) increases. However, for finite \(N_{\tiny {\text{ V }}}\) we cannot expect these locations to coincide exactly. These perceptions are given further weight by the phenomenological renormalization group procedure described in Sect. 3.4.3(c).

As we have already indicated, the use of transfer matrix methods to determine exact solutions for infinite systems leads into our discussion in Sect. 3.5.1 of the thermodynamic limit. In a similar way our account of incipient singularities resulting from an analysis of systems with \(N_{\tiny {\text{ V }}}\) finite leads into our discussion of phase transitions in finite systems in Sect. 3.6.

3.4 The Renormalization Group Method

Once it became evident, around the turn of the twentieth century, that the exponents associated with a critical point, both in experimental systems and theoretical models, were not those derived from classical models, like the van der Waals equation, an interest developed in determining their exact values, in experimental systems and also in theoretical models, where of course it was also necessary in many cases to derive the critical temperature. Before the advent of renormalization group methods the most successful way to do this was by using high- and low-temperature series. These were very successful in obtaining critical temperatures and exponents at second-order critical points. However, although they can be adapted to deal with first-order transitions, this is not their main strength and they are also not designed to map out the whole picture of phase transition curves in thermodynamic space. This contrasts with the renormalization group methods developed in the late sixties and early seventies. They are able (when they work) to deal not only with critical points but also with curves of first-order and second-order transitions. However, any account of these methods should be preceded by some words of warning, like those of John Cardy. As he says [25, pp. 28–29]:

Not only are the words ‘renormalization’Footnote 50 and ‘group’Footnote 51 examples of unfortunate terminology, the use of the definite article ‘the’ which usually precedes them is even more confusing. It creates the misleading impression that the renormalization group is a kind of universal machine through which any problem may be processed, producing neat tables of critical exponents at the other end. This is quite false. It cannot be stressed too strongly that the renormalization group is merely a framework, a set of ideas, which has to be adapted to the nature of the problem at hand. In particular, whether or not a renormalization group approach is quantitatively successful depends to a large extent on the nature of the problem, but lack of success does not necessarily invalidate the qualitative picture it provides.

Here we shall concentrate solely on the approach to the renormalization group which is usually referred to as happening in ‘real space’, in contrast to the approach initiated by Wilson [128], where renormalization is performed in wave-vector space, resulting in expansions in the parameter \(\epsilon :=\,4-d\).Footnote 52

The core of real-space renormalization group (RSRG) methods is the construction of a semi-group of transformations on the independent couplings, or functions thereof. There is a variety of procedures for doing this. Many are based on the block-spin method of Kadanoff [53], and another popular technique is decimation, where the states of a proportion of the microsystems are summed out of the partition function. In fact decimation applied to the one-dimensional Ising model, or related models like the Potts model (see, e.g. [67] Sect. 15.5.1, and Sect. 3.4.3(a) below), is one of the few examples of an exact RSRG transformation. Most transformations involve approximations, which means that the critical exponents are approximations with, in many cases, no obvious way to make improvements, unlike series methods where, in principle and often with a great deal of labour, improvements are made by extending the series.

In essence the RSRG transformation involves some fractional reduction in the number of degrees of freedom. It would, therefore, seem to follow that there must have been a prior application of the thermodynamic limit. Whether this is required for the renormalization group and, more generally, whether it is needed at all in the statistical mechanics of critical phenomena is a question that we return to in Sect. 3.5, following a brief account of the ideas involved in the RSRG.

3.4.1 General Theory

Underlying the semigroup of transformations on couplings, which is the real-space renormalization group, is a mapping from a lattice \({\mathcal {N}}\) to a lattice \({\widetilde{{\mathcal {N}}}}\). For the sake of simplicity we suppose that both are hypercubic lattices with periodic boundary conditions. Then:

  1. (i)

    The number of sites N and \(\widetilde{N}\) of \({\mathcal {N}}\) and \({\widetilde{{\mathcal {N}}}}\) are related by \(\widetilde{N}= N/\lambda ^d\), where \(\lambda >1\).

  2. (ii)

    The lattice spacings \({\mathfrak {a}}\) and \({\tilde{{\mathfrak {a}}}}\) of \({\mathcal {N}}\) and \({\widetilde{{\mathcal {N}}}}\) are related by \({\tilde{{\mathfrak {a}}}}=\lambda {\mathfrak {a}}\).

  3. (iii)

    The size of \({\widetilde{{\mathcal {N}}}}\) is reduced by a length scaling \(|{\tilde{{\pmb {r}}}}|= |{\pmb {r}}|/\lambda\).

The renormalization group is constructed by imposing onto the lattice transformation a statistical mechanical transformation. To do this we modify the Hamiltonian (44) to

$$\begin{aligned} \widehat{H}^\prime _2({\pmb {\sigma }};\zeta _0,{\pmb {\zeta }};N):=\,N\zeta _0 +\widehat{H}_2({\pmb {\sigma }};{\pmb {\zeta }};N), \end{aligned}$$
(69)

where, for reasons that will become evident below, we have added a term including a trivial coupling \(\zeta _0\) and, as in the presentation of scaling theory at the beginning of Sect. 2.4, generalized the number of non-trivial couplings from two to n, with \({\pmb {\zeta }}:=\,(\zeta _1,\zeta _2,\ldots ,\zeta _n)\).Footnote 53 The terminology ‘trivial’ signals the fact that, if in (46) \(\widehat{H}_2\) is replaced by \(\widehat{H}^\prime _2\) and \(Z_2\) by

$$\begin{aligned} Z^\prime _2(\zeta _0,{\pmb {\zeta }},N):=\,\sum _{\{{\pmb {\sigma }}\}} \exp \{-\widehat{H}^\prime _2({\pmb {\sigma }};\zeta _0,{\pmb {\zeta }},N)\}, \end{aligned}$$
(70)

then the probability density function is left unchanged and

$$\begin{aligned} \varPhi _2({\pmb {\zeta }},N):=\,-\ln \{\exp (N\zeta _0)Z^\prime _2(\zeta _0,{\pmb {\zeta }},N)\}. \end{aligned}$$
(71)

Bearing in mind the remarks of Cardy, given above, a successful application of this method depends on being able to construct relationships between the couplings \(\zeta _0,{\pmb {\zeta }}\) in the system on \({\mathcal {N}}\) and the couplings \(\tilde{\zeta }_0,\tilde{{\pmb {\zeta }}}\) in the system on \({\widetilde{{\mathcal {N}}}}\), done in such a way that the values for the couplings for \({\mathcal {N}}\) place it in a critical region if and only if the same is the case for the values of the couplings for \({\widetilde{{\mathcal {N}}}}\). Since the critical properties of a system are contained within the partition function the invariance

$$\begin{aligned} Z^\prime _2(\tilde{\zeta }_0,\tilde{{\pmb {\zeta }}},\widetilde{N})=Z^\prime _2(\zeta _0,{\pmb {\zeta }},N) \end{aligned}$$
(72)

of that function is a sufficient guarantee; and this is achieved by the relationship

$$\begin{aligned} \exp \{-\widehat{H}^\prime _2({\tilde{{\pmb {\sigma }}}};\tilde{\zeta }_0,\tilde{{\pmb {\zeta }}},\widetilde{N})\}=\sum _{\{{\pmb {\sigma }}\}} w({\pmb {\sigma }},{\tilde{{\pmb {\sigma }}}})\exp \{-\widehat{H}^\prime _2({\pmb {\sigma }};\zeta _0,{\pmb {\zeta }},N)\}, \end{aligned}$$
(73)

where the weight function \(w({\pmb {\sigma }},{\tilde{{\pmb {\sigma }}}})\) satisfies

$$\begin{aligned} \sum _{\{{\tilde{{\pmb {\sigma }}}}\}} w({\pmb {\sigma }},{\tilde{{\pmb {\sigma }}}})=1. \end{aligned}$$
(74)
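That (73) and (74) together yield the invariance (72) is a one-line calculation, sketched here in the notation of the surrounding equations. Summing (73) over the states \({\tilde{{\pmb {\sigma }}}}\), exchanging the order of the two summations and using (74) gives

$$\begin{aligned} Z^\prime _2(\tilde{\zeta }_0,\tilde{{\pmb {\zeta }}},\widetilde{N})=\sum _{\{{\tilde{{\pmb {\sigma }}}}\}}\exp \{-\widehat{H}^\prime _2({\tilde{{\pmb {\sigma }}}};\tilde{\zeta }_0,\tilde{{\pmb {\zeta }}},\widetilde{N})\} =\sum _{\{{\pmb {\sigma }}\}}\Bigg [\sum _{\{{\tilde{{\pmb {\sigma }}}}\}} w({\pmb {\sigma }},{\tilde{{\pmb {\sigma }}}})\Bigg ]\exp \{-\widehat{H}^\prime _2({\pmb {\sigma }};\zeta _0,{\pmb {\zeta }},N)\} =Z^\prime _2(\zeta _0,{\pmb {\zeta }},N). \end{aligned}$$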

Running over the set of states \({\tilde{{\pmb {\sigma }}}}\) in (73) will, in principle, produce recurrence relationshipsFootnote 54

$$\begin{aligned} \tilde{\zeta }_j=\mathscr {K}_j({\pmb {\zeta }}),\quad j=1,2,\ldots ,n, \end{aligned}$$
(75)

and for \(\tilde{\zeta }_0\) a recurrence relationship which we choose, for convenience, to express in the form

$$\begin{aligned} \tilde{\zeta }_0=\lambda ^d[\zeta _0+\mathscr {K}_0({\pmb {\zeta }})]. \end{aligned}$$
(76)

The ‘in principle’ caveat entered here is important. As we shall see it is rarely possible to implement this programme and to choose a weight function without some kind of approximation being applied. And it is frequently the case that consistency can be achieved only by increasing the value of n from its initial value. When this happens it is necessary, in order to apply repeated iterations, to back-track and for the extra couplings to be included from the start.

The importance of (76) is that it can be used, together with (71) and (72) to obtain the relationship

$$\begin{aligned} \phi _2(\tilde{{\pmb {\zeta }}})=\lambda ^d\phi _2({\pmb {\zeta }})-\lambda ^d\mathscr {K}_0({\pmb {\zeta }}), \end{aligned}$$
(77)

between the free-energy densities per lattice site at \({\pmb {\zeta }}\) and \(\tilde{{\pmb {\zeta }}}\). Then, given that (75) can be iterated to produce a sequence of points \({\pmb {\zeta }}^{(0)}\rightarrow {\pmb {\zeta }}^{(1)}\rightarrow {\pmb {\zeta }}^{(2)}\rightarrow \cdots\) in \(\varXi _2\),

$$\begin{aligned} \phi _2({\pmb {\zeta }}^{(0)}) = \sum _{s=0}^{\infty }{\frac{1}{\lambda ^{sd}}}{\mathscr {K}}_0({\pmb {\zeta }}^{(s)}), \end{aligned}$$
(78)

is the free-energy density at an initial point \({\pmb {\zeta }}^{(0)}\). Although this result seems to imply the need for an infinite number of iterations, this is clearly not possible in practical computations. It is, therefore, fortunate that it is usually found that this series converges after a very few iterations, allowing densities and response functions to be calculated (see the discussion in Sect. 3.5.1).
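The series (78) can be tested on a case where the transformation is exact. The sketch below assumes the zero-field one-dimensional Ising chain with dimensionless Hamiltonian \(-K\sum \sigma _i\sigma _{i+1}\) and decimation of alternate sites (\(\lambda ^d=2\)). Under that convention, which is an assumption of this illustration rather than a formula taken from the text, summing out one site gives \(\tilde{K}=\tfrac{1}{2}\ln \cosh 2K\) and \(\mathscr {K}_0(K)=-\tfrac{1}{2}\ln 2-\tfrac{1}{4}\ln \cosh 2K\), and the series should reproduce the known free-energy density \(-\ln (2\cosh K)\).

```python
import math

def decimate(K):
    """One lambda = 2 decimation step for the zero-field Ising chain.

    Returns the renormalized coupling and the regular term K_0 of (76),
    under the assumed convention H = -K * sum(sigma_i * sigma_{i+1}).
    """
    log_cosh = math.log(math.cosh(2.0 * K))
    K_new = 0.5 * log_cosh
    K0 = -0.5 * math.log(2.0) - 0.25 * log_cosh
    return K_new, K0

def free_energy_density(K, iterations=60):
    """Truncated series (78) with lambda^d = 2."""
    total, weight = 0.0, 1.0
    for _ in range(iterations):
        K_next, K0 = decimate(K)
        total += weight * K0
        K, weight = K_next, weight / 2.0
    return total

if __name__ == "__main__":
    for K in (0.0, 0.5, 1.5):
        print(K, free_energy_density(K), -math.log(2.0 * math.cosh(K)))
```

In this example the neglected remainder after \(S\) terms is \(2^{-S}\) times a bounded free-energy density, so the truncation error halves with every additional iteration.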

A fixed point \({\pmb {\zeta }}^\star\) of (75) is associated with either a single-phase region or a critical region \({\mathcal {C}}\) in \(\varXi _2\). To analyze its nature we linearize with \([{\pmb {\triangle }}{{\pmb {\zeta }}}^{(s)}]^{\tiny {\text{ T }}}:=\,{\pmb {\zeta }}^{(s)}-{\pmb {\zeta }}^{\star }\) to give \({\pmb {\triangle }}{\pmb {\zeta }}^{(s+1)}\simeq {{\pmb {L}}}^{\star }{\pmb {\triangle }}{\pmb {\zeta }}^{(s)}\), where \({\pmb {L}}^{\star }\) is the fixed-point value of the matrix \({\pmb {L}}\) with elements \(L_{ij}:=\,\partial \mathscr {K}_i/\partial \zeta _j\). In general \({\pmb {L}}^{\star }\) is not symmetric, with different left and right eigenvectors \({\pmb {w}}_j\) and \({\pmb {x}}_j\) for the eigenvalue \(\varLambda _j\). It can then be shownFootnote 55 that in a neighbourhood of the fixed point there exist scaling fields \(\theta _j = \theta _j({\pmb {\triangle }}{\pmb {\zeta }})\), \(j=1,2,\ldots ,n\) which are smooth functions of the couplings with

$$\begin{aligned} \theta _j(0) = 0,\qquad&\theta _j^{(s+1)} = \varLambda _j\theta _j^{(s)}, \end{aligned}$$
(79)
$$\begin{aligned} \theta _j \simeq \quad {\pmb {w}}_j\centerdot {\pmb {\triangle }}{\pmb {\zeta }}, \qquad&{\pmb {\triangle }}{\pmb {\zeta }}\simeq \sum _{j=1}^n {\pmb {x}}_j \theta _j, \end{aligned}$$
(80)

which is a realization of the relationship between scaling fields and couplings described in Sect. 2.4.

From (79) \(\theta _j^{(s+k)} = \varLambda ^{k+s}_j\theta _j^{(0)}\) and the semi-group character of this transformation implies that \(\varLambda _j=\,\lambda ^{y_j}\), for \(j=1,2,\ldots ,n\) and a set of exponents \(y_1,y_2,\ldots ,y_n\). Then, in a neighbourhood of the fixed point \({\pmb {\zeta }}^\star\) the couplings \(\zeta _j\) and \(\tilde{\zeta }_j\) in (77) can be expressed as

$$\begin{aligned} \zeta _j=\zeta _j^\star +\sum _{i=1}^n x_i^{(j)}\theta _i,\qquad \tilde{\zeta }_j=\zeta _j^\star +\sum _{i=1}^n x_i^{(j)}\lambda ^{y_i}\theta _i, \end{aligned}$$
(81)

where \({\pmb {x}}_i:=\,(x_i^{(1)},x_i^{(2)},\ldots ,x_i^{(n)})\). In (77) the function \(\mathscr {K}_0({\pmb {\zeta }})\) is regular. So in a region around \({\pmb {\zeta }}^\star\) the singular part \(\phi _{\text {sing}}(\triangle {\pmb {\zeta }})\) of \(\phi _2({\pmb {\zeta }})\), with \(\phi _{\text {sing}}(0)=0\), can be re-expressed in terms of the scaling fields to give

$$\begin{aligned} {\phi }_{\text {sing}}({\lambda }^{y_1}{\theta }_1,\ldots ,{\lambda }^{y_n}{\theta }_n) = {\lambda }^d {\phi }_{\text {sing}} ({\theta }_1,\ldots , {\theta }_n), \end{aligned}$$
(82)

which is a substantiation of (25).
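The two steps used above, from the semi-group property to the power-law form of the eigenvalues and from (77) to (82), can be spelled out as follows. The sketch assumes that each eigenvalue is positive and can be regarded as a continuous function \(\varLambda _j(\lambda )\) of the scale factor, which is an idealization since in practice \(\lambda\) takes discrete values. Two successive transformations with scale factors \(\lambda _1\) and \(\lambda _2\) are equivalent to a single transformation with scale factor \(\lambda _1\lambda _2\), so that

$$\begin{aligned} \varLambda _j(\lambda _1\lambda _2)=\varLambda _j(\lambda _1)\varLambda _j(\lambda _2),\quad \varLambda _j(1)=1 \quad \Longrightarrow \quad \varLambda _j(\lambda )=\lambda ^{y_j},\qquad y_j=\frac{\ln \varLambda _j}{\ln \lambda }. \end{aligned}$$

Since \(\mathscr {K}_0({\pmb {\zeta }})\) is regular, the singular parts on the two sides of (77) must satisfy \(\phi _{\text {sing}}(\tilde{{\pmb {\zeta }}})=\lambda ^d\phi _{\text {sing}}({\pmb {\zeta }})\), and expressing \({\pmb {\zeta }}\) and \(\tilde{{\pmb {\zeta }}}\) through (81) in terms of the scaling fields \(\theta _i\) and \(\lambda ^{y_i}\theta _i\) reproduces (82).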

3.4.2 Finite-Size Systems

This treatment of criticality, which plays an important role in our understanding of PTCP in real systems (see Sect. 4), was initiated by Fisher [33] and Fisher and Barber [34].Footnote 56 For simplicity we suppose, as in Sect. 3.3, that the system under consideration consists of N identical microsystems on the sites of a d-dimensional hypercubic lattice \({\mathcal {N}}_d\) with \(N_k\) sites in the k-direction and \(N_1N_2\ldots N_d=N\). A partially-infinite system of thickness \(\aleph :=\,[N^{({\mathfrak {d}})}]^{1/(d-{\mathfrak {d}})}\), where \(N^{({\mathfrak {d}})}:=\,N_{{\mathfrak {d}}+1}N_{{\mathfrak {d}}+2}\ldots N_d\), is obtained if the thermodynamic limit \(N_k\rightarrow \infty\) is taken only for \(k=1,2,\ldots ,{\mathfrak {d}}<d\). In a fully-finite system \({\mathfrak {d}}=0\) and \(N^{({\mathfrak {d}})}=N\). We denote the critical region in the partially-infinite system, when \({\mathfrak {d}}>d_{{\tiny {\text{ LC }}}}\), by \({\mathcal {C}}({\mathfrak {d}};\aleph )\), with \({\mathcal {C}}(d;\infty )={\mathcal {C}}\). Finite-size scaling theory can be applied both to a partially-infinite system, where there is the possibility of a critical region consisting of some kind of singular behaviour, and a fully-finite system where there is not. In a fully-finite system or a partially infinite system with \({\mathfrak {d}}\le d_{{\tiny {\text{ LC }}}}\) the critical region is replaced by:

Definition 1

For a fully-finite, or partially-infinite system with \({\mathfrak {d}}\le d_{{\tiny {\text{ LC }}}}\), a region \({\mathcal {I}}{\mathcal {S}}({\mathfrak {d}};\aleph )\) in the space of couplings is one of incipient singularity,Footnote 57 if, in the limit \(\aleph \rightarrow \infty\), it maps into a critical region \({\mathcal {C}}\) of the infinite system.

Expressed in a slightly different way, a system has an incipient singularity at certain size-dependent values of its couplings if, as the system size \(\aleph\) is increased, those values converge to ones where thermodynamic functions exhibit properties that have no finite limits.

The basic assertion of finite-size scaling is that \(\theta _\aleph :=\,1/{\aleph }\), which is a measure of the inverse of the finite linear extent of the system measured in units of lattice spacing, can be treated as another scaling field with \(y_\aleph =1\), meaning that \(\theta _\aleph\) is a relevant scaling field, and \(\theta _\aleph =0\) for the infinite system. The only condition required for this is that the system is sufficiently large for the renormalization group transformation in the space of all the other couplings to be unmodified by the finite size of the system. That is to say, that the renormalized couplings can be represented in the system. For simplicity we confine our attention to the simple magnetic system used in Sect. 2.4. The critical region for the infinite system is just a critical point \(T=T_\text {c}\), \(\mathcal {H}=0\) with scaling fields \(\theta _{T}\) and \(\theta _{\tiny {\mathcal {H}}}\), given by (27), measuring departures from this point. When the system has finite thickness (\(\theta _\aleph \ne 0\)), the incipient singularity is at a different temperature, but because of the symmetry of the system still with \(\mathcal {H}=0\). Again, for simplicity, attention will be restricted to the zero-field axis where two temperatures come into play:

  1. (i)

    For a system of finite thickness \(\aleph\), \({\widetilde{T}}(\aleph )\) is the shift temperature such that, as \(\aleph \rightarrow \infty\), \({\widetilde{T}}(\aleph )\rightarrow T_\text {c}\), the temperature at which the infinite system has a singularity. If \({\mathfrak {d}}>d_{{\tiny {\text{ LC }}}}\) then \({\widetilde{T}}(\aleph )\) is also a critical temperature, but for the system of finite thickness. If \({\mathfrak {d}}\le d_{{\tiny {\text{ LC }}}}\), and in particular when \({\mathfrak {d}}=0\) and the system is fully-finite, \(T={\widetilde{T}}(\aleph )\) is a quasicritical temperature [34] which is exhibited by a maximum in the susceptibility.Footnote 58 This temperature is an example of an incipient singularity. In keeping with the other assumptions of scaling theory it is assumed that this convergence is algebraic, so, with scaling field

    $$\begin{aligned} {\tilde{\theta }}_{T}(T,\aleph ):=\,\varepsilon \left( \frac{1}{T}-\frac{1}{{\widetilde{T}}(\aleph )}\right) , \end{aligned}$$
    (83)

    the condition

    $$\begin{aligned}\tilde{\triangle }(\aleph )&:=\,\theta _{T}(T)-{\tilde{\theta }}_{T}(T,\aleph ) =\varepsilon \left( \frac{1}{{\widetilde{T}}(\aleph )}-\frac{1}{T_\text {c}}\right) \nonumber \\& =\theta _{T}({\widetilde{T}}(\aleph ))=- {\tilde{\theta }}_{T}(T_\text {c},\aleph ) \simeq C_\text {s}\aleph ^{-\upchi }\quad \text{ as } \aleph \rightarrow \infty , \end{aligned}$$
    (84)

    where \(\upchi >0\) is the shift exponent, is sufficient to ensure convergence.

  2. (ii)

    \({\mathring{T}}(\aleph )\), called the rounding temperature, is an important, but rather more elusive, property of the system. It is the temperature at which the susceptibility first shows significant deviation from that of the fully-infinite system. With

    $$\begin{aligned} {\mathring{\theta }}_{T}(T,\aleph ):=\,\varepsilon \left( \frac{1}{T}-\frac{1}{{\mathring{T}}(\aleph )}\right) , \end{aligned}$$
    (85)

    it is supposed that

    $$\begin{aligned}\mathring{\triangle }(\aleph )&:=\,{\tilde{\theta }}_{T}(T,\aleph )- {\mathring{\theta }}_{T}(T,\aleph )=\varepsilon \left( \frac{1}{{\mathring{T}}(\aleph )}-\frac{1}{{\widetilde{T}}(\aleph )}\right) \nonumber \\& ={\tilde{\theta }}_{T}({\mathring{T}}(\aleph ),\aleph )=- {\mathring{\theta }}_{T}({\widetilde{T}}(\aleph ),\aleph )\simeq C_\text {r}\aleph ^{-\uptau },\quad \text{ as } \aleph \rightarrow \infty , \end{aligned}$$
    (86)

    where \(\uptau >0\) is the rounding exponent.

Fig. 6 Scaling around the critical point C, showing the curves \({\mathring{\theta }}(T,\aleph )=0\) and \({\tilde{\theta }}(T,\aleph )=0\) of rounding and shift temperatures

Scaling around the infinite-system critical point is shown in Fig. 6. Our interest in this work is in the occurrence of an incipient singularity; so henceforth the assumption is that \({\mathfrak {d}}\le d_{{\tiny {\text{ LC }}}}\).Footnote 59 Thus we have three relevant scaling fields with the critical region of the infinite system at the origin \((\theta _{T},\theta _{\tiny {\mathcal {H}}},\theta _\aleph )=(0,0,0)\). However, this is not the complete picture; in general there will be a number of irrelevant scaling fields, which parametrize the critical region and affect its asymptotic properties. For the sake of simplicity we just include the most nearly relevantFootnote 60 of these, designated as \(\theta _\star\), with exponent \(y_\star <0\). Then on the zero-field axis (82) is replaced by

$$\begin{aligned} {\phi }_{\text {sing}}(\lambda ^{y_{T}}\theta _{T},\lambda ^{y_\star }\theta _\star ,\lambda \theta _\aleph ) = {\lambda }^d {\phi }_{\text {sing}}(\theta _{T},\theta _\star ,\theta _\aleph )\, . \end{aligned}$$
(87)

As we have already seen, singular parts of thermodynamic functions like densities and response functions are obtained by differentiations with respect to the scaling fields. In particular, for the susceptibility \(\varphi _{T}\), given by (108),

$$\begin{aligned} \varphi _{T}(\theta _{T},\theta _\star ,\theta _\aleph ) = {\lambda }^{\upomega } \varphi _{T}(\lambda ^{y_{T}}\theta _{T}, \lambda ^{y_\star }\theta _\star ,\lambda \theta _\aleph )\ , \end{aligned}$$
(88)

with \(\upomega :=\,2y_{\tiny {\mathcal {H}}}-d=\upgamma /\upnu\), where \(\upgamma\) is given by (34) and \(\upnu :=\,1/y_{T}\), given in (57), is the critical exponent of the correlation length. Asymptotic behaviour in a neighbourhood of the critical point, that is when \(|\theta _{T}|\ll 1\), is then as usual exposed by choosing the scale parameter \(\lambda =\,|\theta _{T}|^{-1/y_{T}}\), giving

$$\begin{aligned} \varphi _{T}(\theta _{T},\theta _\star ,\theta _\aleph ) = |\theta _{T}|^{-\upgamma } \varphi _{T}(\pm 1,{\mathfrak {X}}_\star ,{\mathfrak {X}}_\aleph )\ , \end{aligned}$$
(89)

where the \(\pm 1\) branches of \(\varphi _{T}(\pm 1,{\mathfrak {X}}_\star ,{\mathfrak {X}}_\aleph )\) apply to the cases \(\theta _{T}>0\) and \(\theta _{T}< 0\), respectively, and \({\mathfrak {X}}_\star (T,\aleph ):=\,|\theta _{T}(T)|^{-y_\star \upnu }\theta _\star\), \({\mathfrak {X}}_\aleph (T,\aleph ):=\,|\theta _{T}(T)|^{-\upnu } \aleph ^{-1}\) are scaling functions. In a similar way, with \(\lambda =\,\aleph\),

$$\begin{aligned} \varphi _{T}(\theta _{T},\theta _\star ,\theta _\aleph ) = \aleph ^\upomega \varphi _{T}({\mathfrak {X}}_\aleph ^{-1/\upnu },{\mathfrak {X}}_\star ^{1/y_\star },1 )\ . \end{aligned}$$
(90)

In the thermodynamic limit \(\aleph \rightarrow \infty\), it follows from (89) that the susceptibility has the form

$$\begin{aligned} \varphi _{T}(\theta _{T},\theta _\star ,0) =A_{T}^{(\pm )}({\mathfrak {X}}_\star ) |\theta _{T}|^{-\upgamma }, \end{aligned}$$
(91)

where the amplitudes

$$\begin{aligned} A_{T}^{(\pm )}({\mathfrak {X}}_\star ):=\,\varphi _{T}(\pm 1,{\mathfrak {X}}_\star ,0), \end{aligned}$$
(92)

which are, in general, different for \(\theta _{T}>0\) and \(\theta _{T}< 0\), are dependent on \(\theta _{T}\) by virtue of the presence of the irrelevant scaling field \(\theta _\star\). This contribution will become small, as \(|\theta _{T}|^{-y_\star \upnu } \rightarrow 0\) for \(|\theta _{T}| \rightarrow 0\), eventually becoming negligible for sufficiently small \(|\theta _{T}|\). The susceptibility will then display an asymptotic algebraic singularity of the form

$$\begin{aligned} \varphi _{T}\simeq A_{T}^{(\pm )}(0) |\theta _{T}|^{-\upgamma }\ ,\quad \text{ as }~~~|\theta _{T}|\rightarrow 0\ . \end{aligned}$$
(93)

The singularity is a divergence, if \(\upgamma > 0\), which is generally the case for response functions.

Given that both (89) and (90) are valid, and that a finite statistical mechanical system cannot exhibit non-analytic behaviour, whereas singular behaviour does occur at critical points in the limit of infinite system size, the scaling function \(\varphi _{T}(\pm 1,{\mathfrak {X}}_\star ,{\mathfrak {X}}_\aleph )\) in (89) must exhibit asymptotic behaviour of the form

$$\begin{aligned} \varphi _{T}(\pm 1,{\mathfrak {X}}_\star ,{\mathfrak {X}}_\aleph ) \simeq B_{T}^{(\pm )}({\mathfrak {X}}_\star )\, {\mathfrak {X}}_\aleph ^{-\upomega }. \end{aligned}$$
(94)
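That the form (94) removes the divergence can be checked directly. A short verification, using only (89), the definition of \({\mathfrak {X}}_\aleph\) and the relation \(\upomega =\upgamma /\upnu\), gives

$$\begin{aligned} \varphi _{T}\simeq |\theta _{T}|^{-\upgamma }B_{T}^{(\pm )}({\mathfrak {X}}_\star )\left( |\theta _{T}|^{-\upnu }\aleph ^{-1}\right) ^{-\upomega } =B_{T}^{(\pm )}({\mathfrak {X}}_\star )\,|\theta _{T}|^{\upnu \upomega -\upgamma }\,\aleph ^{\upomega } =B_{T}^{(\pm )}({\mathfrak {X}}_\star )\,\aleph ^{\upomega }. \end{aligned}$$

The power of \(|\theta _{T}|\) cancels, so the susceptibility stays finite for finite \(\aleph\) and grows as \(\aleph ^{\upomega }\), in agreement with the prefactor in (90).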

Since the susceptibility has maxima along the curve \(T={\widetilde{T}}(\aleph )\) of shift temperatures in Fig. 6 these maxima will be in one of the branches of \(B_{T}^{(\pm )}({\mathfrak {X}}_\star )\) with the other branch being a monotonically decreasing function of \({\mathfrak {X}}_\star\) in the vicinity of \({\mathfrak {X}}_\star =0\). Along the curve of shift temperatures, from (84), \({\mathfrak {X}}_\star ({\widetilde{T}}(\aleph ),\aleph )\simeq C_\text {s}^{-\upnu y_\star }\aleph ^{\upchi \upnu y_\star }\theta _\star\) and \({\mathfrak {X}}_\aleph ({\widetilde{T}}(\aleph ),\aleph )\simeq C_\text {s}^{-\upnu }\aleph ^{\upchi \upnu -1}\). On this curve \(\theta _\star \ne 0\), and if it is supposed that the two shift functions have the same asymptotic dependence on \(\aleph\), the shift exponent will be related to \(y_{T}=1/\upnu\) and \(y_\star <0\) by \(\upchi =y_{T}/(1-y_\star )\) with the shift amplitude \(C_\text {s}\simeq [{\mathfrak {X}}_\star ({\widetilde{T}}(\aleph ),\aleph )/{\mathfrak {X}}_\aleph ({\widetilde{T}}(\aleph ),\aleph )\theta _\star ]^\upchi\).

As already indicated finite-size corrections to the pure power-law behaviour of \(\varphi _{T}\), as described by (93), will begin to be observed whenever the system is finite (with \(\theta _\aleph :=\,\aleph ^{-1} \ne 0\)) at the rounding temperature \({\mathring{T}}(\aleph )\). It has been argued [31] that this is the temperature at which the size \(\aleph\) of the system is of the same order as the correlation length \(\xi (T)\).Footnote 61 It follows from (111) that \(|{\tilde{\theta }}_{T}({\mathring{T}}(\aleph ),\aleph )|^{-\upnu } \aleph ^{-1}\simeq C\), where C is a constant, which establishes, from (86), that \(C=\,C_\text {r}\) and the rounding exponent \(\uptau =1/\upnu =y_{T}\) with \(\upomega =\upgamma \uptau\). Thus on the basis of some plausible assumptions we have the condition \(\upchi <\uptau\), which, for large systems, motivates the disposition of the curves in Fig. 6.
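The exponent relations of this subsection are simple enough to tabulate. The sketch below evaluates \(\upchi =y_{T}/(1-y_\star )\) and \(\uptau =y_{T}\), and uses the leading term of (84) to estimate how the shift temperature approaches \(T_\text {c}\). All numerical values (exponents, amplitude, critical temperature) are hypothetical and chosen only for illustration; `eps` stands for the constant written as \(\varepsilon\) in (83).

```python
def shift_exponent(y_T, y_star):
    """Shift exponent chi = y_T / (1 - y_star), with y_star < 0."""
    return y_T / (1.0 - y_star)

def rounding_exponent(y_T):
    """Rounding exponent tau = y_T = 1 / nu."""
    return y_T

def shift_temperature(T_c, aleph, C_s, chi, eps=1.0):
    """Shift temperature from the leading term of (84).

    Solves eps * (1/T_shift - 1/T_c) = C_s * aleph**(-chi) for T_shift.
    The amplitude C_s and the default eps are illustrative only.
    """
    return 1.0 / (1.0 / T_c + C_s * aleph ** (-chi) / eps)

# Hypothetical exponents, not values for any particular model.
y_T, y_star = 1.0, -1.0
chi = shift_exponent(y_T, y_star)
tau = rounding_exponent(y_T)

if __name__ == "__main__":
    print("chi =", chi, "tau =", tau)
    for aleph in (10, 100, 1000):
        print(aleph, shift_temperature(2.0, aleph, 1.0, chi))
```

For any \(y_\star <0\) the first function returns a value smaller than \(y_{T}\), which is the condition \(\upchi <\uptau\) stated above.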

3.4.3 Renormalization Schemes

The practical implementation of the renormalization group procedure in Sect. 3.4.1 involves the choice of a weight function and leads to recurrence relationships between systems related by a size parameter \(\lambda\), together with a method for calculating the free-energy density which satisfies the scaling relationship. In (a) and (b) in this section we give examples of the implementation of two of the most commonly used weight functions and in (c) we briefly outline a different scheme which, using transfer matrix methods, relates the correlation lengths of systems of different sizes.

For d-dimensional lattices, most weight functions are based on a division of the lattice \({\mathcal {N}}\) into equal blocks of \(\lambda ^d\) sites. The mapping from \({\mathcal {N}}\) to \({\widetilde{{\mathcal {N}}}}\) is given by associating each lattice site \({\tilde{{\pmb {r}}}}\in {\widetilde{{\mathcal {N}}}}\) with a block of sites in \({\mathcal {N}}\) denoted by \({{\mathcal {B}}}({\tilde{{\pmb {r}}}})\).

(a):

The decimation weight function. For this weight function the sites of \({\widetilde{{\mathcal {N}}}}\) consist of a subset of the sites of \({\mathcal {N}}\), chosen so that \({\widetilde{{\mathcal {N}}}}\) forms a lattice which is isomorphic to \({\mathcal {N}}\). So we can take \({\tilde{{\pmb {r}}}}\in {\mathcal {B}}({\tilde{{\pmb {r}}}})\) with

$$\begin{aligned} w({\pmb {\sigma }},{\tilde{{\pmb {\sigma }}}}):=\,\prod _{\{{\tilde{{\pmb {r}}}}\}}\delta ^{\hbox {kr}}({\tilde{\sigma }}({\tilde{{\pmb {r}}}})-\sigma ({\tilde{{\pmb {r}}}}))\, . \end{aligned}$$
(95)

The effect of this is that the summation on the right-hand side of (73) is a partial sum over all the sites of the lattice \({\mathcal {N}}\) except those of \({\widetilde{{\mathcal {N}}}}\). For a range of one-dimensional models (including the Ising and Potts models), which can be solved exactly using transfer matrix methods, exact RSRG decimation transformations can also be obtained. For the one-dimensional ferromagnetic case of the Ising model it can be shown [67, 87] that the most convenient variables are not those given in Appendix 2 but rather \(\zeta _1:=\,\tanh ([2J+H]/2T)\), \(\zeta _2:=\,\exp (-2H/T)\), and for \(\lambda =\,2\), with the partial summation in (73) over alternate sites, the recurrence relationships take the form

$$\begin{aligned} \tilde{\zeta }_1= {\frac{4{\zeta _1}^2 -(1-{\zeta _2})({\zeta _1}^2-1)}{4+(1-{\zeta _2})({\zeta _1}^2-1)}},\qquad \tilde{\zeta }_2 = {\frac{{\zeta _2}^2{(1 +{\zeta _1})}^2+{(1-{\zeta _1})}^2}{2(1+{\zeta _1}^2)}}. \end{aligned}$$
(96)

It is then not difficult to show that there is a fixed point \(\zeta _1=\zeta _2=1\) (\(T=H=0\)), with both scaling exponents equal to \(d=1\). As we saw in the discussion of scaling theory in Sect. 2.4, an exponent equal to the dimension of the system is indicative of the possibility of a first-order transition. In this case the critical point is at zero temperature on the zero field line, meaning that the first-order coexistence curve has contracted to a point coinciding with the critical point at zero-temperature. At this point there is a first-order transition across the zero-field axis with a change of sign of the magnetization. It can also be shown that the curve

$$\begin{aligned} \zeta _2=\left( \frac{1-\zeta _1}{1+\zeta _1}\right) ^2, \end{aligned}$$
(97)

which corresponds to the interaction J between microsystems being set to zero, is invariant under (96). At every point it has exponents 0 and \(-\infty\); the first of these is marginal, which indicates that the line consists of fixed points, and the second that it is ‘infinitely attractive’ to points not on the line. The end points of the line \(\zeta _1=1\), \(\zeta _2=0\) (\(H=\infty\), \(T=0\)) and \(\zeta _1=0\), \(\zeta _2=1\) (\(T=\infty\), \(H=0\)) are fixed points in their own right in the invariant subspaces \(T=0\) and \(H=0\) respectively. The phase diagram is shown in Fig. 7. Of course, for reasons just explained, the one-dimensional Ising model is less interesting than the two-dimensional model where the ferromagnetic critical point is not at zero temperature. So, suppose that we try to carry out the same procedure in that case. A possibility is to choose blocks of two sites as shown in Fig. 8. The lattice \({\widetilde{{\mathcal {N}}}}\) consists of the black sites and the partial summation in (73) is over the spin states on white sites. But this will create an interaction between the four sites surrounding each white site. So we would need to back-track and increase n from two to three, inserting this interaction from the beginning. But this would in turn generate an interaction between nine sites. And so on. This proliferation of interactions is typical of the problems encountered with decimation. The usual trick is to cut off the proliferation at a certain level. Such an approximation for this model was investigated by Wilson [128] with a rather poor outcome compared to the known exact results.

(b):

The majority-rule weight function. This weight function was introduced by Niemeijer and van Leeuwen [90, 91]. The first step in assigning \({\tilde{\sigma }}({\tilde{{\pmb {r}}}})\) for the block \({\mathcal {B}}({\tilde{{\pmb {r}}}})\) can be described in terms of the ‘winner takes all’ voting procedure used in some democracies. Given that each microsystem has \(\nu\) states and that among the sites of \({\mathcal {B}}({\tilde{{\pmb {r}}}})\) one of the \(\nu\) states occurs more than any other, \({{\tilde{\sigma }}}({\tilde{{\pmb {r}}}})\) is assigned to have this value. If \(\nu :=\,2\) and the number of sites \(\lambda ^d\) in a block is odd, this rule works; a case in point being the treatment of the Ising model on the triangular lattice with a block of nine sites (\(\lambda :=\,3\)) by Schick et al. [115]. But unless these conditions hold it is clear that the simple majority rule is not sufficient to determine \(\sigma ({\tilde{{\pmb {r}}}})\) for every configuration of the block. A ‘tie’ can occur in the voting procedure and a strategy must be adopted to deal with such cases. One possibility is to assign to \(\sigma ({\tilde{{\pmb {r}}}})\) one of these predominating values on the basis of equal probabilities. In some cases this may not, however, be the most appropriate choice. In their work on the Ising model using a square first-neighbour block (\(\lambda =\,2\)) Nauenberg and Nienhuis [86] divided the configurations with equal numbers of up and down spins between block spins up and down with equal probabilities. The rule (one of four) which they chose ensured that the reversal of all the spins in the block reversed the block spin.

(c):

Phenomenological renormalization. The idea of finite-size scaling, introduced in Sect. 3.4.2, leads quite naturally [3, Sect. IV] to the RSRG method developed by Nightingale [93]. The essential feature of finite-size scaling is that, for a d-dimensional system, infinite in \({\mathfrak {d}}\) dimensions and of thickness \(\aleph\), the quantity \(1/{\aleph }\) is treated as an additional scaling field \(\theta _{\aleph }\). If attention is restricted to the simple magnetic system with the two other scaling fields \(\theta _{T}\) and \(\theta _{\tiny {\mathcal {H}}}\), the response function \(\varphi _{T}\) satisfies the scaling relationship (88). A similar inclusion of \(\theta _\aleph\) in the scaling relationship (56) for the correlation length gives

$$\begin{aligned} \upxi (\lambda ^{y_{T}}\theta _{T},\lambda ^{y_{\tiny {\mathcal {H}}}}\theta _{\tiny {\mathcal {H}}}, \lambda \theta _\aleph )=\lambda ^{-1}\upxi (\theta _{T},\theta _{\tiny {\mathcal {H}}},\theta _\aleph ). \end{aligned}$$
(98)

With the slight change of notation \({\upxi }^{({\aleph })}(\theta _{T},\theta _{\tiny {\mathcal {H}}}):=\,{\upxi }(\theta _{T},\theta _{\tiny {\mathcal {H}}},\theta _\aleph )\), (98) can be regarded as relating the correlation lengths of two similar systems denoted by \({{\mathcal {L}}}_{{\mathfrak {d}}}({\aleph })\) and \({{\mathcal {L}}}_{{\mathfrak {d}}}({\widetilde{\aleph }})\) with couplings \(\zeta _{T},\zeta _{\tiny {\mathcal {H}}}\) and \(\tilde{\zeta }_{T},\tilde{\zeta }_{\tiny {\mathcal {H}}}\) and thicknesses \(\aleph\) and \({\widetilde{\aleph }}:=\,\aleph /\lambda\), \(\lambda >1\), respectively. The relationship (98) can be reexpressed as

$$\begin{aligned} {\upxi }^{({{\aleph }})}(\theta _{T},\theta _{\tiny {\mathcal {H}}}) =\lambda {\upxi }^{({{\tilde{\aleph }}})}({{{\tilde{\theta }}}}_{T}, {{{\tilde{\theta }}}}_{\tiny {\mathcal {H}}}), \end{aligned}$$
(99)

where

$$\begin{aligned} {{{\tilde{\theta }}}}_{T}=\theta _{T}({{{\tilde{\zeta }}}}_{T}, {{{\tilde{\zeta }}}}_{\tiny {\mathcal {H}}})= \lambda ^{y_{T}}\theta _{T}(\zeta _{T},\zeta _{\tiny {\mathcal {H}}}),\quad {{{\tilde{\theta }}}}_{\tiny {\mathcal {H}}}=\theta _{\tiny {\mathcal {H}}}({{{\tilde{\zeta }}}}_{T}, {{{\tilde{\zeta }}}}_{\tiny {\mathcal {H}}})= \lambda ^{y_{\tiny {\mathcal {H}}}}\theta _{\tiny {\mathcal {H}}}(\zeta _{T},\zeta _{\tiny {\mathcal {H}}}) \end{aligned}$$
(100)

relate the scaling fields for \({{\mathcal {L}}}_{{\mathfrak {d}}}({\aleph })\) and \({{\mathcal {L}}}_{{\mathfrak {d}}}({\widetilde{\aleph }})\). These relationships form the basis of Nightingale’s phenomenological renormalization method, where the correlation lengths for systems of the two widths are obtained from transfer matrix calculations using (66). In the case of one scaling field (\(\mathcal {H}=0\)) the method yields the critical temperature fixed point \(\theta ^\star _{T}:=\,\theta _{T}={\tilde{\theta }}_{T}\) and the thermal exponent for a number of different models [60, 93, 94, 118], which, in the cases where exact results are known, are at a high level of accuracy.Footnote 62
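The majority-rule assignment described under (b) can be sketched in a few lines of Python. This is only an illustration for two-state microsystems with spin values \(\pm 1\). The function names are ours, and the deterministic tie-break (a tied block takes the value of its first site) is a hypothetical choice, made because it reverses the block spin when all the spins in the block are reversed; it is not claimed to be one of the four rules considered by Nauenberg and Nienhuis [86].

```python
import itertools
import random


def majority_block_spin(block, rng=random):
    """Assign a block spin (+1 or -1) by simple majority.

    A tie is resolved by choosing either value with equal probability.
    """
    total = sum(block)
    if total != 0:
        return 1 if total > 0 else -1
    return rng.choice((1, -1))


def antisymmetric_block_spin(block):
    """Majority rule with a deterministic tie-break (hypothetical rule).

    A tied block takes the value of its first site, so reversing every
    spin in the block always reverses the block spin.
    """
    total = sum(block)
    if total != 0:
        return 1 if total > 0 else -1
    return block[0]


# Square block of four sites (lambda = 2): list all sixteen configurations
# and pick out those with equal numbers of up and down spins.
configs = list(itertools.product((1, -1), repeat=4))
tied = [c for c in configs if sum(c) == 0]
up = sum(1 for c in tied if antisymmetric_block_spin(c) == 1)
print(len(tied), "tied configurations,", up, "assigned to block spin up")
```

For a block with an odd number of two-state sites, such as the nine-site block mentioned above, the sum of the spins is never zero and neither tie-break is invoked.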

Fig. 7

The trajectory flows for the renormalization group transformation of the one-dimensional Ising model
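The statements made about the recurrence relationships (96) can be checked numerically. The following Python sketch implements (96) and the curve (97) as written; the function names, the finite-difference Jacobian and the starting point \(\zeta _1=\zeta _2=0.5\) are our own illustrative choices. It evaluates the exponents at \(\zeta _1=\zeta _2=1\), tests that points on (97) are mapped to themselves, and follows one trajectory from a point off the curve.

```python
import math


def decimate(z1, z2):
    """One step of the recurrence relationships (96) for lambda = 2."""
    a = (1.0 - z2) * (z1 * z1 - 1.0)
    new_z1 = (4.0 * z1 * z1 - a) / (4.0 + a)
    new_z2 = (z2 * z2 * (1.0 + z1) ** 2 + (1.0 - z1) ** 2) / (2.0 * (1.0 + z1 * z1))
    return new_z1, new_z2


def zero_coupling_curve(z1):
    """Value of zeta_2 on the curve (97)."""
    return ((1.0 - z1) / (1.0 + z1)) ** 2


def jacobian(z1, z2, h=1e-6):
    """Central-difference estimate of the 2x2 Jacobian of (96)."""
    a_plus, a_minus = decimate(z1 + h, z2), decimate(z1 - h, z2)
    b_plus, b_minus = decimate(z1, z2 + h), decimate(z1, z2 - h)
    return [
        [(a_plus[0] - a_minus[0]) / (2 * h), (b_plus[0] - b_minus[0]) / (2 * h)],
        [(a_plus[1] - a_minus[1]) / (2 * h), (b_plus[1] - b_minus[1]) / (2 * h)],
    ]


def flow(z1, z2, steps):
    """Iterate (96) and return the list of visited points."""
    points = [(z1, z2)]
    for _ in range(steps):
        z1, z2 = decimate(z1, z2)
        points.append((z1, z2))
    return points


# At (1, 1) the off-diagonal entries of the Jacobian vanish, so the diagonal
# entries are the eigenvalues; exponents are their logarithms to base 2.
jac = jacobian(1.0, 1.0)
exponents = [math.log(jac[0][0], 2), math.log(jac[1][1], 2)]
print("exponents at (1, 1):", exponents)

# Follow a trajectory started off the curve (97); the last column is the
# distance of zeta_2 from the curve at the current zeta_1.
for z1, z2 in flow(0.5, 0.5, 6):
    print(z1, z2, z2 - zero_coupling_curve(z1))
```

The last column shrinks rapidly from one step to the next, which is the numerical counterpart of the curve being ‘infinitely attractive’.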

Fig. 8

Two site blocks for the first-neighbour Ising model on a square lattice. The lattice \({\mathcal {N}}\) consists of both white and black sites and \({\widetilde{{\mathcal {N}}}}\) of only black sites

3.5 The Thermodynamic Limit

In the development of statistical mechanics represented by the right-hand column in Fig. 1 system size appears twice. Firstly, in the passage from \(\textsf {SM1}\) to \(\textsf {SM2}\), where the system becomes large yielding approximate extensivity. This is needed for the discussion of finite-size phase transitions represented by \(\textsf {SM5}\). Secondly, in the other branch from \(\textsf {SM2}\), via the thermodynamic limit, to an infinite system represented by \(\textsf {SM3}\). This entails the identification of the infinite statistical mechanical system SM3 with thermodynamics, or at least the version, labelled TD3 in Fig. 1, of thermodynamics with some PTCP defined. But are SM3 and TD3 actually identical? The answer is clearly ‘no’. TD3 is the result of a development in the left-hand column of Fig. 1, from the basic structure through the assumption of extensivity to a grafting on of a picture of PTCP, in the manner of Pippard [105] or Buckingham [18]. On the other hand, as we have just indicated, SM3 is the result of a statistical mechanical development in the right-hand column in Fig. 1. It retains its microstructure with a probability distribution, and in most cases it is the result of the implementation of the thermodynamic limit for a particular model, the best-known examples being the two-dimensional zero-field Ising model and the eight-vertex model. Thus it should be recognised for later reference (see Sect. 5.1) that this way of understanding the relation between thermodynamics and statistical mechanics involves the unwarranted conflation of two quite different pictures. One can, however, argue that SM3 is an enrichment of TD3, since the former has all the features of the latter together with the extra ones provided by microstructure and precise results concerning critical values and exponents. That having been said, one may still question whether the thermodynamic limit is:Footnote 63

  (1) Necessary, in principle, because statistical mechanics is not complete without it.

  (2) Useful because calculations become much simpler in the thermodynamic limit and the relationship FSM–3 of SM3 to TD3 makes it easier to identify the order of phase transitions.

Although both of these possibilities deserve consideration, it is the first which has received the most attention, principally because of the role of the thermodynamic limit in the understanding of PTCP; this will be discussed in detail in Sect. 3.5.1.

In this work we propose, in Sect. 4, a particular view of the usefulness of the thermodynamic limit in the context of phase transitions in finite systems. However, it is pertinent to note the range of possible circumstances calling for the use of the thermodynamic limit. In particular one might suppose an additional kind of necessity interposed between the two items in our list:

(1a) Necessary in practice, because calculations for particular models are not tractable without its use.

However, of course, tractability, and hence necessity in practice, is ephemeral, evolving (one might hope) with an increase in computing power and technical ingenuity into mere usefulness.

3.5.1 Phase Transitions in Infinite Systems

The argument for the necessity in principle of the thermodynamic limit for PTCP effectively involves asserting the truth of the contradictory set of propositions:

P–IA:

PTCP occur in nature.

P–IB:

PTCP occur in nature as discontinuities in densities (first-order transitions) and as singularities in response functions (higher-order transitions).Footnote 64

P–IIA:

PTCP in thermodynamics are defined by singularities in derivatives of first or higher order in the free energies and are treated as such using scaling theory.

P–IIB:

PTCP must necessarily be represented in thermodynamics by singularities.

P–IIIA:

PTCP should be able to be modelled in statistical mechanics.

P–IIIB:

PTCP should be modelled in statistical mechanics in the same way that they are in thermodynamics.

P–IV:

Real systems are of finite size.

P–V:

Thermodynamic functions for finite systems in statistical mechanics are regular functions.

P–VI:

Thermodynamic functions for infinite systems in statistical mechanics can show singularities.

For later use it is relevant to compare this list with that of Callender [22, p. 589] (repeated by Mainwood [80, pp. 13–14]):

CP–I:

Real systems have finite [size].

CP–II:

Real systems display phase transitions.

CP–III:

Phase transitions occur when the partition function has a singularity.Footnote 65

CP–IV:

Phase transitions are governed/described by classical or quantum statistical mechanics (through [the partition function]).

A number of items in our list are indisputable and are not included in Callender’s list:

  • That PTCP are defined in thermodynamics by singularities can be confirmed by a visit to the thermodynamics section of any academic library (P–IIA is true). Whether it is necessary for thermodynamics to be formulated in this way (that P–IIB should be accepted), given a possible denial that PTCP occur in nature as singularities (that P–IB is true), is a different question.

  • The joint assertions that thermodynamic functions are regular for finite systems but can have singularities for infinite systems (included in our list as P–V and P–VI, respectively, but not contained in Callender’s list) are facts about the mathematical structure of statistical mechanics which cause the total list to be contradictory.

And on Callender’s list:

  • It is difficult to argue that phase transitions do not occur in real systems (that P–IA (CP–II) is false), although it is plausible to deny that they arise as some kind of singularities (to argue that P–IB (not in Callender’s list) is false), on the grounds that a first-order transition (say that between liquid water and water vapour) may look like a sudden change of density, but on closer observation would turn out to be a very steep continuous change. Likewise, apparent singularities in compressibility in fluids and susceptibility in magnets may just be very steep maxima.

  • It is also difficult to argue that real systems are not finite (that P–IV (CP–I) is false), given that no system takes up the whole of the universe.Footnote 66 A sort of argument could be constructed on the basis that no system is completely isolated, but this would mean accepting the need for computation, not with an infinite system as envisaged here, but with a system joined to a complicated and largely undetermined environment.

  • If the ability to model PTCP were not deemed to be a necessary part of statistical mechanics (P–IIIA (CP–IV) is rejected), then most of the work on statistical mechanics in the last half century and more would be pointless. It is, however, relevant here to mention the work of the late Ilya Prigogine (in particular, [106]). Although, in a sense, he accepts P–IIIA, it is a radically different form of statistical mechanics that he has in mind. From the assertion that “[a]s long as we consider merely a few particles, we cannot say if they form a liquid or gas” (ibid, p. 45) he concludes that “[s]tates of matter as well as phase transitions are ultimately defined by the thermodynamic limit. \(\ldots\) Phase transitions correspond to emerging properties. They are meaningful only at the level of populations and not of single particles” (op. cit.). This entails for him the reformulation of statistical mechanics so that the underlying dynamics is not that of trajectories but of measure.Footnote 67

There remain P–IB and P–IIIB, which together with P–IIB are equivalent to CP–III, and we now consider the consequences of denying one or both of them.

  (i) If P–IB is accepted, that is that PTCP in nature do occur as singularities, then it is clearly necessary for thermodynamics to represent them in this way; P–IIB must be accepted. Then we seem to be driven toward the conclusion that statistical mechanics should model them in the same way (that is the acceptance of P–IIIB) which leads back to the contradiction. This is avoided by denying P–IIIB. Then PTCP can be modelled in statistical mechanics without singularities, by, for example, transfer matrix methods, while at the same time admitting that this is not the situation in reality.

  (ii) If P–IB is denied then it can be argued either:

    (a) That it is not necessary for thermodynamics to model PTCP as singularities (P–IIB is false). In this case P–IIIB can be accepted, with PTCP modelled without singularities in statistical mechanics, with thermodynamics reformulated to do the same.

    or

    (b) That in statistical mechanics PTCP should be modelled without singularities, but because for large systems steep maxima in response functions and steep changes in densities look very much like singularities and discontinuities, it is still necessary (on the grounds of tractability and simplicity) to model PTCP in thermodynamics as singular behaviour; P–IIB is accepted and P–IIIB is rejected.Footnote 68

So, given that all of P–I to P–VI are accepted, is there any way out of the paradox? One radical approach, which has already been noted, is that due to Prigogine, where statistical mechanics is reformulated to ‘build in’ the thermodynamic limit.Footnote 69 Somewhat similar, but less radical, is the approach of Robert Batterman, a philosopher of physics who has written extensively on questions related to phase transitions, the renormalization group and the thermodynamic limit [4,5,6,7,8,9]. Rather than formulating a novel form of the mechanics underlying statistical mechanics, his argument, following the lead of Kadanoff [57], is that the renormalization group is itself a novel approach, revolutionary in the sense of Kuhn [62], which has the thermodynamic limit built in. His starting point is that thermodynamicsFootnote 70

is correct to represent [phase transitions] mathematically as singularities. (A: [5, p.234].)

And:

Further, without the thermodynamic limit, statistical mechanics would completely fail to capture a genuine feature of the world. Without the thermodynamic limit, in fact, statistical mechanics is incapable even of establishing the existence of distinct phases of systems. (B: op. cit.)  

If there is any doubt about his view of real systems, this is dispelled by his forthright assertion that he wants

to champion the manifestly outlandish proposal that despite the fact that real systems are finite, our understanding of them and their behaviour requires, in a very strong sense, the idealization of infinite systems and the thermodynamic limit. (C: ibid, p. 231.)

‘Outlandish’ or not, his position is one which would appear, in our experience, to be that adopted implicitly or explicitly by many working physicists, including, albeit in a radical sense as indicated above, Prigogine, and also Kadanoff [55, p. 238], who asserts that the “existence of a phase transition requires an infinite system. No phase transitions occur in systems with a finite number of degrees of freedom”. Kadanoff calls this the “extended singularity theorem” [57, pp. 154–156] because “these singularities have effects that are spread out over large regions of space” [58, p. 24]. Having asserted that

the idea that we can find analytic partition functions that “approximate” singularities is mistaken, because the very notion of approximation required fails to make sense when the limit is singular, [which it is in this case because the] behaviour at the limit (the physical discontinuity, the phase transition) is qualitatively different from the limiting behaviour as that limit is approached. (D: ibid, p. 236)

Batterman’s proposal for resolving the puzzle is to resort to the renormalization group. In the next section this possibility is examined.

3.5.2 Infinite Systems and the Renormalization Group

‘Infinity’ as it arises in accounts of renormalization group methods consists not so much in the limiting process, evident in, say, Onsager’s solution of the zero-field two-dimensional Ising model, whereby the dimensions of the system are taken to infinity, but rather in the perception that to make the method intelligible one must be working with a system which is already infinite [101, 102]. To spell this out, a renormalization group scheme consists of the following:

  (i) In the space of couplings (or of functions thereof) a semigroup of transformations is derived which generates recurrence relationships under which any critical regions are invariant.

  (ii) In this ‘dynamic system’ the critical regions are the basins of attraction of critical fixed points. And there are sinks associated with non-critical regions (phases) of the system.

  (iii) A critical fixed point determines the universality class of the system at each point in its basin of attraction, with an associated set of critical exponents.

  (iv) In general a system may be able to be in more than one universality class determined by the symmetry group of the Hamiltonian when there is a particular relationship between the couplings.Footnote 71

It is clear that this way to do statistical mechanics is very different from the standard procedures (mean-field and other classical approximations, series expansions and exact solutions). So much so that, as we have already indicated, it is characterized by Kadanoff [57] as a Kuhn-type revolution, a view endorsed by Batterman [8]. The argument presented by Batterman concerning the whole question of singularities/real singular systems/the thermodynamic limit needs to be carefully rehearsed and for this his [8] tribute to Leo Kadanoff provides the clearest account.

He presents his view in contradistinction to that of Jeremy Butterfield who contends [21, p. 1077] that: “The use of the infinite limit \(\ldots\) is justified, despite N being actually finite, by its being mathematically convenient and empirically correct (up to the required accuracy)”. For an understanding of Batterman’s view two quotes are particularly useful. In the first he asserts that:

the RG is not just a theory of the critical point, but rather it is a theory of the critical region. And this covers large but finite systems. So contrary to the line of reasoning presented [by Butterfield] the explanation of the behaviour of real finite systems requires the use of mathematical infinities, but does not require there to be infinite real systems. (E: [8, p.571].)

At this point we have cause to be grateful to a referee of his paper, who objected that this quote was actually in line with “the claims of those supporting the idea that real phase transitions aren’t sharp”. In response to this Batterman added a footnote in which he clarified his position in the following way:

It seems to me that if one is going to hold that the use of the infinite limits is a convenience, then one should be able to say how (even if inconveniently) one might go about finding a fixed point of the RG transformation without infinite iterations. I have not seen any sketch of how that is to be done. The point is that the fixed point, as just noted, determines the behaviour of the flow in its neighbourhood. If we want to explain the universal behaviour of finite large systems using the RG, then we need to find a fixed point and, to my knowledge, this requires an infinite system. (F: op. cit.)

So to summarize his view (using the labelled quotes A–F, given above):

  (a) Phase transitions are real discontinuities in experimental systems (A). [An acceptance of P–IA,B and P–IIA,B].

  (b) The thermodynamic limit is needed in statistical mechanics to exhibit phase transitions (B). [An implicit acceptance of P–V and P–VI and an endorsement of P–IIIA,B.]

  (c) Real systems are finite but in order to understand them we need the idealization of infinite systems and the thermodynamic limit (C). [An endorsement of P–IV, and more.]

  (d) The idea that the study of large systems can play a role here is wrong because the properties of large systems and infinite systems are qualitatively different (D).

  (e) To represent the situation correctly we need to engage with mathematical singularities but not real infinite systems (E).

  (f) We need infinite iteration (of the RG transformation) to obtain fixed points (and all the information they provide) (F).

And to summarize the summary of Batterman’s position:

Although phase transitions in real systems are accompanied by singular behaviour, and in statistical mechanical models this singular behaviour is exhibited only by infinite systems, we don’t need infinite systems, just the use of mathematical singularities, these being required to derive the fixed points in renormalization group calculations.

At this point we wish to challenge the last part of this statement by providingFootnote 72 the ‘sketch’ that Batterman (quote F) requires of the means of the determination of renormalization group fixed points.

The first thing to note is that the recurrence relationships (75) and (76) are derived (almost always with some approximations involved) between the couplings of two finite systems with sizes N and \(\widetilde{N}\) with \(N/\widetilde{N}=\lambda ^d>1\). Once this is done no point in the space of couplings is intrinsically associated with a system of a particular size and, by the same token, fixed points, obtained from (75) with \(\tilde{\zeta }_j=\zeta _j\), are not associated with infinite systems. However, if we were to choose to associate a particular system-size N with the first point of a trajectory, it would be necessary to assume only that we are working with a system large enough to allow the required number of iterations.Footnote 73 (Hence the inclusion of SM2 in the path from SM1 to SM4 in Fig. 1.) As Norton [97, p. 222] says, fixed points are the “limit points” of the sequences generated by the recurrence relationships; the “mathematical pegs on which to hang limit properties” which are never reached in a finite number of iterations. They do not arise from an investigation of the properties of infinite limit systems, and, although they are properties of the transformation, iteration is not always needed for their determination. In some simple cases, like the one-dimensional Ising model described in Sect. 3.4.3, the fixed points can be extracted by direct analytic solution of the fixed point equations. But in more complicated cases numerical computation comes into play. Although in principle iteration of the recurrence relationships starting from a point in the basin of attraction of a fixed point will generate a sequence of points approaching the fixed point, this is not usually a viable strategy for their determination, since those of greatest interest, associated with critical regions, have both irrelevant directions of attraction within the critical region (the basin of attraction) and relevant directions along which the trajectory is driven away from the critical region. Except in special cases it is difficult to start a trajectory in a critical region, but it is possible, and useful, to start at nearby points. Then the trajectory will hover near the critical fixed point before it moves away to the sink associated with the phase containing the trajectory. These ‘hover points’ can be spotted by inspection of the computer output and used as initial guesses for a numerical solution of the fixed point equations. These kinds of numerical techniques, used also to map the critical regions themselves, provide a good picture of the whole phase diagram. And linearization of the recurrence relationships about the fixed points allows the critical exponents to be obtained.
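The procedure just described can be illustrated with a minimal Python sketch. Since the recurrence relationships (75) and (76) are not tied to a particular model, the sketch substitutes the one-coupling Migdal–Kadanoff recurrence for the square-lattice Ising model with \(\lambda =2\), \(\tilde{K}=\tfrac{1}{2}\ln \cosh (4K)\) with \(K=J/T\); this recurrence is our assumption for the purposes of illustration, as are all the function names. With a single coupling there is one relevant direction and no irrelevant ones, so the fixed point is unstable and is located by solving the fixed point equation by bisection, with trajectories used only to supply a bracket.

```python
import math


def log_cosh(x):
    """ln(cosh x), written to avoid overflow for large |x|."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)


def recurrence(k):
    """Migdal-Kadanoff recurrence for lambda = 2 (illustrative stand-in only)."""
    return 0.5 * log_cosh(4.0 * k)


def trajectory(k, steps):
    """Iterate the recurrence and return the list of couplings visited."""
    points = [k]
    for _ in range(steps):
        k = recurrence(k)
        points.append(k)
    return points


def sink(k, steps=60):
    """Label the sink reached from k: coupling flowing to zero or growing."""
    return "zero" if trajectory(k, steps)[-1] < k else "infinite"


def solve_fixed_point(lo, hi, tol=1e-13):
    """Bisection on recurrence(k) - k = 0; the map is never iterated here."""
    def g(k):
        return recurrence(k) - k
    if not g(lo) < 0.0 < g(hi):
        raise ValueError("bracket does not enclose the fixed point")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two starting points whose trajectories end in different sinks bracket
# the critical fixed point and serve as initial guesses for the solver.
print(sink(0.30), sink(0.31))
k_star = solve_fixed_point(0.30, 0.31)

# Linearization: the derivative of the recurrence is 2 tanh(4k), and the
# thermal exponent is its logarithm to base lambda = 2 at the fixed point.
y_t = math.log(2.0 * math.tanh(4.0 * k_star), 2)
print("fixed point:", k_star, "thermal exponent:", y_t)

# A trajectory started close to the fixed point hovers before leaving.
hover = sum(1 for k in trajectory(k_star - 1e-4, 40) if abs(k - k_star) < 0.05)
print("points within 0.05 of the fixed point:", hover)
```

Nothing in the sketch refers to the size of a system; the fixed point and the exponent are obtained as properties of the transformation.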

3.6 Phase Transitions in Finite Systems: Mainwood’s Proposal

Given, as we have concluded in the previous section, that the thermodynamic limit is not necessary to enable renormalization group calculations to provide the PTCP structure, is it still useful in other statistical mechanical treatments of PTCP? An assessment of usefulness, as distinct from necessity, is obviously heavily influenced by the position adopted with respect to whether PTCP occur in nature as singularities (P–IB). If P–IB is false and real systems, by virtue of their size (\(\sim 10^{23}\) microsystems), exhibit behaviour approximating to singular behaviour, in the sense, say, that the maximum in the compressibility of a fluid is experimentally indistinguishable from a singularity, then we have the means to remove the contradiction in the set of statements at the beginning of Sect. 3.5.1. One way would be to deem it unnecessary for PTCP to be treated as singularities in thermodynamics (a denial of P–IIB). Although this would allow thermodynamics and statistical mechanics to model PTCP in the same way (for P–IIIB to be accepted) we would argue, for the reasons given in Sect. 4, that it is not a tenable possibility.

The alternative, which is the one discussed in this section, and which is favoured by ourselves, is to accept that thermodynamics must represent PTCP in terms of singularities (P–IIB) on the basis that this is an appropriate approximation to real systems. This means rejecting the assertion that thermodynamics and statistical mechanics must model PTCP in the same way (P–IIIB), since statistical mechanics models phase transitions in finite systems. Given that real systems are very large (in terms of the number of microsystems) and finite, with phase transitions giving the appearance, but not the exact reality, of singularities, can calculations avoid using the thermodynamic limit? Or, more generally, can recourse to a system where PTCP occur as singularities be avoided? Here we examine a proposal of Mainwood [80] which definitely answers the question in the negative and in the next section we propose an answer which is more nuanced.

The definition of a phase transition provided by Mainwood (ibid, p. 28) canFootnote 74 be described in the following way. For a statistical mechanical system \({\mathfrak {S}}_{N}\) of size N with partition function \(Z_2(\zeta _1,\zeta _2,N)\), the free energy \(\varPhi _2(\zeta _1,\zeta _2,N)\) is given by (40) and satisfies (5) and (6).Footnote 75 Suppose that the thermodynamic limit

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\varPhi _2(\zeta _1,\zeta _2,N)}{N}= \phi _2(\zeta _1,\zeta _2) \end{aligned}$$
(101)

exists, with \(\phi _2(\zeta _1,\zeta _2)\) the free-energy density of the system \({\mathfrak {S}}_\infty\). Then:

Definition 2

\((\zeta _1,\zeta _2)\) is a point with a particular criticality for \({\mathfrak {S}}_{N}\) iff \((\zeta _1,\zeta _2)\) is a point where \({\mathfrak {S}}_\infty\) has a singularity associated with this same criticality.
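The limit (101) can be made concrete with a model for which both sides are known in closed form. The Python sketch below uses the zero-field Ising ring of N sites, for which \(Z=(2\cosh K)^N+(2\sinh K)^N\) with \(K=J/T\), and takes the free energy to be \(-T\ln Z\) with Boltzmann’s constant set to one. Both the choice of model and this sign convention are assumptions made for illustration and stand in for (40), which is not reproduced here; the function names are ours.

```python
import math


def free_energy_per_site(coupling, temperature, n_sites):
    """Free energy divided by N for a zero-field Ising ring of n_sites."""
    k = coupling / temperature
    # ln Z for Z = (2 cosh k)^N + (2 sinh k)^N, evaluated without forming Z.
    log_z = n_sites * math.log(2.0 * math.cosh(k)) + math.log1p(math.tanh(k) ** n_sites)
    return -temperature * log_z / n_sites


def limiting_density(coupling, temperature):
    """The N -> infinity limit of the free energy per site for the same model."""
    return -temperature * math.log(2.0 * math.cosh(coupling / temperature))


# Difference between the finite-N value and the limit, for J = T = 1.
for n in (4, 16, 64, 256):
    gap = free_energy_per_site(1.0, 1.0, n) - limiting_density(1.0, 1.0)
    print(n, gap)
```

In this example the finite-N free energy is a regular function of the temperature for every N and the limit is approached exponentially fast; the model has no singularity at non-zero temperature, so it illustrates only the existence of the limit and not the criticality referred to in Definition 2.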

And Mainwood (ibid, p. 29) asserts that:Footnote 76

Rather surprisingly, using this definition it is possible to hold on to all of Callender’s four statements [(given above as CP–I to CP–IV)] without contradiction; though only in a Pickwickian sense—it is a “trick” possible only due to his choice of wording. Namely, the singularity referred to in [CP–III] is one not in the partition function [of \({\mathfrak {S}}_{N}\)] but in [the partition function of \({\mathfrak {S}}_\infty\)].

If this is regarded as a positive point in favour of Mainwood’s definition, the overall conclusion seems to be more mixed. Mainwood ‘worries’ that:Footnote 77

  (1) The definition means that a phase transition can be predicted in a finite system, however small it might be (ibid, p. 32).

  (2) “While there exist standard procedures for taking the thermodynamic limit, \(\ldots\) these procedures are human inventions, and choices could have been made differently. \(\ldots\) The definition of a phase transition thus seems arbitrary in a disastrous sense: we can choose whether one is occurring or not by modelling it differently, or taking the limit according to a different scheme” (ibid, p. 31).

  (3) “[T]he facts we need to decide whether or not [a physical system] is undergoing a phase transition should be physical facts, about actual states of affairs \(\ldots\) They should not exist only in an idealized model on a theoretician’s blackboard” (ibid, p. 29).

Although Mainwood adds (1) as a final difficulty it is probably the one which would first spring to mind, since the definition would imply a phase transition in an Ising model of four spins in a square at the critical temperature given by Onsager’s solution. Mainwood thinks that “this bullet can and should be bitten” (ibid, p. 32), but the consequences are not, we think, ones which would recommend themselves to any working physicists; not to put too fine a point on it, they would bring chaos to discussions of critical phenomena. The tractable alternative, also suggested by Mainwood, is to restrict the definition to large systems.Footnote 78 This would seem to us to be an inevitable step, but it also has consequences which we discuss in more detail below.

At one level both (2) and (3) are examples of the standard concern with respect to modelling, namely that we may have a model which is not very good and which gives results which do not agree with experiment. And Mainwood’s response to this is, as would be expected, that we should find a better model. But worry (3) also contains a second element, namely that his definition contains the use of a counterfactual, an idealized infinite model. His argument here is more complex and draws on a strong parallel with Lewis’s [72] analysis of counterfactuals. On this basis he argues that

it is the character of [the real finite system] that determines the nature of the infinite system that we then consider. When we draw conclusions about the nature of the phase transitions, they are conclusions about the character of [the real finite system], but by reference to the infinite model we can express them in a concise and illuminating form (ibid, p. 30).

However we have worries of our own which do not seem to concern Mainwood. These can best be described by considering the transfer matrix treatment in Sect. 3.3, where, if we restrict attention to the two-dimensional square-lattice spin-\(\frac{1}{2}\) Ising model in zero field, the exact critical temperature is known for the model on an infinite lattice (see Appendix 2). To apply the transfer matrix method (see Sect. 3.3), the square lattice is taken to have \(N_{{\tiny {\text{ H }}}}\) sites in the horizontal direction and \(N_{{\tiny {\text{ V }}}}\) sites in the vertical direction, so that \(N=N_{\tiny {\text{ H }}}N_{\tiny {\text{ V }}}\). Periodic boundary conditions are applied so that the lattice forms a torus with horizontal rings of \(N_{\tiny {\text{ H }}}\) sites and rings in a vertical plane of \(N_{\tiny {\text{ V }}}\) sites. It is assumed that the system is large in the horizontal direction, so that, parameterized by \(N_{\tiny {\text{ V }}}\), we have a sequence of one-dimensional models of increasing complexity. Each exhibits a maximum in the heat capacity, including the simplest case \(N_{\tiny {\text{ V }}}=1\) [27, p. 166].Footnote 79 These maxima (although they will differ slightly for all \(N_{\tiny {\text{ V }}}\) however large and finite) are taken as incipient singularitiesFootnote 80 and for increasing \(N_{\tiny {\text{ V }}}\) show good agreement with the Onsager result, which is the case \(N_{\tiny {\text{ V }}}=\infty\).
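The strip construction just described can be illustrated numerically. What follows is a minimal sketch, not the calculation of the cited references: it assumes units in which the coupling and Boltzmann's constant are both 1, takes the horizontal direction to be infinite so that the free energy per site is fixed by the largest transfer-matrix eigenvalue, and estimates the heat capacity by a finite-difference second derivative. The function names and the temperature grid are our own illustrative choices.

```python
import itertools
import numpy as np

# Onsager's exact critical temperature for the square lattice (J = k_B = 1).
T_ONSAGER = 2.0 / np.log(1.0 + np.sqrt(2.0))

def log_lambda_max(K, n_v):
    """ln of the largest transfer-matrix eigenvalue for a strip of
    vertical width n_v (periodic ring), at coupling K = J / (k_B T)."""
    # All 2**n_v spin configurations of one vertical ring.
    spins = np.array(list(itertools.product([-1, 1], repeat=n_v)))
    # Sum of nearest-neighbour products within a ring (periodic).
    e_ring = np.sum(spins * np.roll(spins, -1, axis=1), axis=1)
    # Sum of products between two horizontally adjacent rings.
    e_between = spins @ spins.T
    # Symmetric transfer matrix: half of each ring energy plus the links.
    t_matrix = np.exp(K * (0.5 * (e_ring[:, None] + e_ring[None, :]) + e_between))
    return np.log(np.linalg.eigvalsh(t_matrix)[-1])

def heat_capacity(temperature, n_v, h=1e-3):
    """Heat capacity per site, c = K^2 d^2(ln lambda_max / n_v) / dK^2,
    with the second derivative taken by central differences."""
    K = 1.0 / temperature
    d2 = (log_lambda_max(K + h, n_v) - 2.0 * log_lambda_max(K, n_v)
          + log_lambda_max(K - h, n_v)) / h ** 2
    return K ** 2 * d2 / n_v

def heat_capacity_peak(n_v, temps):
    """Location and height of the heat-capacity maximum on a temperature grid."""
    c = np.array([heat_capacity(t, n_v) for t in temps])
    i = int(np.argmax(c))
    return temps[i], c[i]
```

Calling `heat_capacity_peak(n_v, np.linspace(1.5, 3.5, 201))` for a few widths gives the location and height of each maximum, which can then be compared with `T_ONSAGER`. The grid limits are an assumption and may need widening for very narrow strips.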

However, the prescription to be applied by the Mainwood proposal is that their critical temperatures, for all \(N_{\tiny {\text{ V }}}\), are the Onsager value. This would seem to us to reverse the order in which physicists actually work. We think it is probably true to say that, with notable exceptions like Kadanoff [56,57,58], physicists involved in model calculations do not consider whether their interest is in very large systems or infinite systems. Their concern is whether a phase transition occurs. If they suppose that it does, one toolFootnote 81 for determining its location is the transfer matrix calculation [12, 65, 99, 100, 111, 112]. The method is to determine incipient singularities for as large a vertical width of system as possible and to use them as an estimate of the transition temperature for a very large or infinite width. Here one cannot use Mainwood’s prescription to assign the infinite-width result to the finite-width systems, since the former is not known.Footnote 82 When, as in the case of the zero-field spin-\(\frac{1}{2}\) Ising model, the infinite-width result is known exactly or has been determined to a good approximation by series methods, the motivation for determining finite-width results is to test the efficacy of the method, or to cross-check with other results.

In his discussion of Mainwood’s proposal Butterfield [21, p. 1130] states it in a more restricted form. Again using our notation this is:

Definition 3

A phase transition occurs in \({\mathfrak {S}}_{N}\) iff \({\mathfrak {S}}_\infty\) has non-analyticities.

This Mainwood–Butterfield proposal has the advantage that it doesn’t project a result from the infinite system onto finite systems of any size (or maybe onto just large-size systems). However, given that it asserts the existence of a phase transition in a finite system of any size N, where does this occur? At the maximum of one of the response functions (heat capacity or susceptibility/compressibility), or by extraction from the behaviour of the ratio of the two largest eigenvalues of the transfer matrix? These will all give different results, as will also the results of taking the limits in different ways and for differing numbers of dimensions, all of which in turn will differ with N. If all these values are taken to be estimates of some ‘true’ value, will this be N-dependent or the same for all N, including presumably \(N=\infty\), when we would be back with the problems of Mainwood’s original proposal?

4 Phase Transitions in Large Systems: Our Proposal

As we shall see, our discussion in previous sections of the structure of thermodynamics and of statistical mechanics in general, and of PTCP in particular, will allow us to paint a more nuanced and quantitative picture of their relationship than that provided by previous approaches. In particular we are concerned with the role in that relationship played by large finite systems. Mainwood suggests that we ‘bite the bullet’ by countenancing the possibility of phase transitions in small systems. However, we suggest that he is proposing to bite the wrong bullet. The one which should be bitten is the need for a criterion giving a demarcation in system size between small systems and large systems, and our proposal, which uses the discussion of finite-size scaling in Sect. 3.4.2, is intended to encompass this need.

Thermodynamics, on the one hand, characterises PTCP in terms of singularities of thermodynamic functions, which may occur at special values of externally controllable parameters. This characterisation appears, at first sight, to be warranted by the phenomenology of phase transitions as they are observed in nature: apparent discontinuities of thermodynamic functions at first-order phase transitions, and apparent algebraic singularities of thermodynamic functions including divergent response functions at second-order phase transitions. In statistical mechanics, on the other hand, singularities of thermodynamic functions can emerge only in the limit of infinite system size. As realistic systems are clearly of finite size, this creates an internal inconsistency in the list P–I to P–VI of propositions given above, if indeed the characterisation of PTCP as they occur in nature in terms of singularities (that is proposition P–IB) is accepted.

Our aim now is to present an argument, based on the account of finite-size scaling in Sect. 3.4.2, which shows that this inconsistency can be resolved within statistical mechanics and in a fully quantitative manner. In Sect. 3.4.2, and also here, discussion is restricted to a system with a thermal coupling \(\theta _{T}\) and a magnetic coupling \(\theta _{\tiny {\mathcal {H}}}\), in the cases where (i) it is fully-finite with thickness \(\aleph\) and (ii) it is fully-infinite with \(\aleph =\infty\). In case (ii) on the zero-field axis \(\mathcal {H}=0\), \(\theta _{\tiny {\mathcal {H}}}=0\) there is a critical temperature \(T=T_\text {c}\) with \(\theta _{T}=0\) where response functions are singular. There is no singularity in the finite system but maxima appear in the response functions. We now summarize the relevant conclusions of finite-size scaling:

FSS–I:

In the thermodynamic limit \(\aleph \rightarrow \infty\) when \(\theta _{T}\) is small, but not infinitesimal, the asymptotic form for the susceptibility at \(T=T_\text {c}\), given by (91), has a singular component with exponent \(\upgamma\), but amplitudes which, by virtue of the presence of an irrelevant field \(\theta _\star\), are dependent on \(\theta _{T}\).

FSS–II:

As \(\theta _{T}\rightarrow 0\), the influence of \(\theta _\star\) becomes negligible and the susceptibility exhibits a pure power-law singularity at \(T=T_\text {c}\) as described by (93).

FSS–III:

When \(\aleph\) is finite there is no singular behavior and two temperatures are defined:Footnote 83 the shift temperature \({\widetilde{T}}(\aleph )\) where the susceptibility has a maximum and the rounding temperature \({\mathring{T}}(\aleph )\) at which the profile of the susceptibility in the finite system begins to diverge from that in the infinite system.

FSS–IV:

Assuming, as in (85) and (86), that \(|T_\text {c}-{\widetilde{T}}(\aleph )|\sim {\mathcal {O}}(\aleph ^{-\upchi })\) and \(|{\mathring{T}}(\aleph )-{\widetilde{T}}(\aleph )|\sim {\mathcal {O}}(\aleph ^{-\uptau })\), it can be shown that the shift exponent \(\upchi =[\upnu (1-y_\star )]^{-1}\) and the rounding exponent \(\uptau =\upnu ^{-1}\); that is, the rates of convergence of both the incipient singularity and the range of influence of finite-size effects around the incipient singularity are determined by exponents present in the infinite system.
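The two exponent relations in FSS–IV can be written out as a short sketch. The formulas are those quoted above. The unit amplitudes and any numerical values passed for \(\upnu\) and \(y_\star\) are illustrative assumptions only, not values taken from the text, and the function names are hypothetical.

```python
def shift_exponent(nu, y_star):
    """Shift exponent chi = 1 / (nu * (1 - y_star)), as quoted in FSS-IV."""
    return 1.0 / (nu * (1.0 - y_star))

def rounding_exponent(nu):
    """Rounding exponent tau = 1 / nu, as quoted in FSS-IV."""
    return 1.0 / nu

def shift_and_rounding(aleph, nu, y_star, a_shift=1.0, a_round=1.0):
    """Leading-order sizes of |T_c - T_shift| and |T_round - T_shift|
    for thickness aleph. The amplitudes a_shift and a_round are
    hypothetical placeholders (set to 1 here), not derived quantities."""
    shift = a_shift * aleph ** -shift_exponent(nu, y_star)
    rounding = a_round * aleph ** -rounding_exponent(nu)
    return shift, rounding
```

For any fixed positive exponents, both quantities decrease as \(\aleph\) grows, so comparing `shift_and_rounding` at two thicknesses shows how the incipient singularity approaches the infinite-system critical point.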

This renormalization group scaling approach to the description of critical phenomena thus explains in a quantitative way how singularities that might occur in infinite systems are smoothed out by finite-size effects. This, being fully in line with the fundamental observation that statistical mechanical systems of finite size cannot exhibit any singularities, resolves the inconsistency in the list of propositions P–I to P–VI. In particular FSS–IV gives a quantitative measure of the deviations of critical phenomena, as observed in finite systems, from the behaviour expected for infinite system size. From (89), deviations from critical behaviour characteristic of the infinite system will be observable in a narrow region around the infinite system critical point. This, however, is precisely the region where one would stand a chance of observing asymptotic singular behaviour, as only in this region is the influence of irrelevant scaling fields on PTCP expected to be sufficiently small. In order to observe asymptotic critical singularities it is thus required that \(|\theta _{T}|\) be sufficiently small to keep corrections to asymptotic critical singularities due to irrelevant scaling fields under control, but also not too small, in order to prevent finite-size corrections from becoming significant. As the range of \(\theta _{T}\) within which finite-size corrections dominate critical behaviour shrinks with system size \(\aleph\) like \(\aleph ^{-1/\upnu }\), one has to choose systems sufficiently large in a quantitatively well-defined sense in order to be able to observe asymptotic critical singularities characteristic of the respective universality class of a system.

In the context of the list P–I to P–VI of propositions, it is important to realise that the characterisation of PTCP in terms of singularities of thermodynamic functions constitutes an extrapolation of empirical observations, as properly establishing the existence of a discontinuity of a thermodynamic function would require experimental control of infinite precision, while establishing a divergence of a response function would require an actual measurement of an infinite quantity. Neither requirement can conceivably be met in any realistic experiment. Given that realistic systems contain \({\mathcal {O}}(10^{23})\) constituents, the linear dimension \(\aleph\) of such systems, measured in terms of atomic distances, is very large and the temperature range over which finite-size corrections to singular behaviour would manifest themselves will be very small. It is thus understandable that such effects have been beyond experimental resolution.Footnote 84 On the other hand, in computer simulations of statistical mechanical systems, one can handle only relatively small systems, and finite-size roundings of critical singularities are therefore quite prominent. In such situations such roundings, as predicted (and captured) by finite-size scaling, are indeed observed and routinely used to extract asymptotic critical exponents from finite-size data [14]. The renormalization group and its formulation of finite-size scaling theory thus predict, in a quantitative way, both the emergence of critical singularities, described as pure power-law singularities sufficiently close to an infinite system critical point,Footnote 85 and their shifting and rounding in systems of finite size.

Fig. 9 Isothermal curves of magnetization density plotted against the field coupling. System size increases from the broken to the chain to the dotted curves, with the infinite system represented by the continuous line

According to our definition of an incipient singularity (Definition 1, above), such a singularity will occur in a finite system at certain values of its external parameters if, at those values, thermodynamic functions exhibit properties that have no finite limits as the system size \(\aleph\) is increased. This could be a steep increase in the slope of magnetization as a function of the external field across the zero-field axis at low temperatures, as shown in Fig. 9, which is indicative of the possibility of a first-order transition in the infinite system. Or it could be the size-dependent height of the maximum of a response function as shown in (90) with \(\upomega >0\), which is indicative of the possibility of a second-order transition in the infinite system. However, it is important to note that an assertion of the occurrence of an incipient singularity in a finite system can never be made with absolute certainty by looking at the behaviour of a single system of any fixed finite size, but only by comparing the behaviour of systems of different sizes. That said, our investigations have now provided us with a well-defined notion of a large system:

Definition 4

For a system to be counted as large it must be big enough to exhibit a range of values of a thermodynamic variable (for example, the temperature) within which the following two phenomena can both be avoided:

  1. (i)

    the corrections to scaling (due to the existence of non-zero irrelevant scaling fields) which require the system to be close to an incipient singularity,

  2. (ii)

    the noticeable finite-size corrections in a close neighbourhood of an incipient singularity (due to a finite value of \(\aleph\)), which requires the system to be sufficiently far away from an incipient singularity.
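The two competing requirements of Definition 4 can be turned into a toy criterion. The sketch below assumes, purely for illustration, that finite-size corrections matter for \(|\theta _{T}|\) below `a_fs * aleph**(-1/nu)` and that corrections to scaling matter for \(|\theta _{T}|\) above a fixed bound `theta_max`. The amplitude `a_fs`, the bound `theta_max` and the function names are hypothetical and are not quantities defined in the text.

```python
def scaling_window(aleph, nu, a_fs=1.0, theta_max=0.1):
    """Range of |theta_T| in which both kinds of correction are avoided.

    Returns (lower, upper) if the window is non-empty, otherwise None.
    a_fs and theta_max are illustrative placeholder values."""
    lower = a_fs * aleph ** (-1.0 / nu)  # finite-size rounding region
    if lower < theta_max:
        return (lower, theta_max)
    return None

def minimum_large_size(nu, a_fs=1.0, theta_max=0.1):
    """Smallest aleph for which the window opens: (a_fs / theta_max)**nu."""
    return (a_fs / theta_max) ** nu
```

On this reading a system counts as large once `scaling_window` returns a non-empty interval, and the interval widens as `aleph` increases, which is the sense in which the tension between (i) and (ii) becomes less acute for bigger systems.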

Although, as we saw above, these two conditions pull in opposite directions, this tension will become less acute as the system size increases. For such systems incipient singularities will be observable in a range of temperatures (or couplings), which are described by the asymptotic critical exponents of infinite systems. These exponents describe incipient singularities which will never fully materialize in a system of finite extent. They do, however, provide an economy of description, and lead to a classification of systems according to their universality class, as described earlier. Quite often the full complexity of the crossover between behaviour described by asymptotic critical exponents and finite-size rounding of thermodynamic functions is far beyond the capabilities of available analytic tools. Taking the thermodynamic limit in a statistical mechanical analysis of a system is also oftenFootnote 86 the only way to carry the calculation through to its end.

The renormalization group approach to PTCP actually plays a dual role in the analysis of critical phenomena.Footnote 87 On the one hand it provides micro-reductive methods, firmly embedded in the arsenal of techniques of statistical mechanics, to evaluate critical exponents for given statistical mechanical systems, albeit in most cases only approximately. On the other hand it embodies a new way of looking at such systems, by describing statistical properties of systems at different length scales. It is this radically new way of analysing systems which allows it to put systems with different microscopic properties into a common context, which in turn leads to the identification of fixed points and their basins of attraction as universality classes, thereby revolutionizing the analysis of critical phenomena.

It is perhaps appropriate to add a final twist. Asymptotic critical exponents characterising singularities at phase transitions as they would occur in infinite systems, including exponents that describe corrections to scaling due to irrelevant scaling fields, are obtained from the eigenvalues of a renormalisation group transformation that is linearized in the vicinity of (one of) its fixed points. They are thus obtainable without ever touching or contemplating systems of infinite size! As we have seen in our discussion above, these critical exponents also govern the way in which finite-size corrections to critical phenomena manifest themselves. In some sense, therefore, it would be fair to say that critical exponents are bona-fide properties of finite systems—rather than, as mostly discussed, simply properties of potentially infinite systems.

The aim of our analysis has been to eliminate some of the confusion that has characterised much of the discussion surrounding PTCP in the philosophical (and physics) literature. To summarize our position:

  • It cannot be denied that phase transitions occur in nature. (P–IA is accepted).

  • The assertion that they are characterized by singularities is an unwarranted extrapolation of empirical findings. (P–IB is rejected). (Asserting the existence of a singularity in an experimental result requires infinitely precise experimental control, or an actual ‘measurement of the infinite’, which is clearly infeasible.)

  • Within thermodynamics, there is no choice but to describe phase transitions in terms of singularities. (That is, P–IIA and P–IIB are valid statements about the structure of thermodynamics). Equations of state either have unique solutions – in which case there is no phase transition – or they may exhibit bifurcations in their solution manifolds, in which case singularities and discontinuities arise.

  • Phase transitions, as they occur in nature, are correctly described by statistical mechanics, the renormalization group and finite-size scaling. Thermodynamics, on the contrary, is fundamentally incapable of an adequate description as it is, from the outset, conceived as a theory of infinitely large systems. (P–IIIA is accepted but P–IIIB is rejected).

  • Investigating systems in the limit of infinite system size provides added value in that it allows one to (i) identify exact asymptotic power laws, which the incipient singularities would follow if system sizes could be taken arbitrarily large, and (ii) provide a classification of systems according to their universality class.

5 After-Thoughts on Reduction and Emergence

Figure 1 is a diagrammatic attempt to encapsulate the relationship between thermodynamics including scaling theory, and the Gibbsian version of statistical mechanics including various approaches to PTCP: the use of the thermodynamic limit, the renormalization group and phase transitions in finite systems. Apart from the formal links spelled out as messages FSM–1, \(\ldots\), FSM–4 from statistical mechanics to thermodynamics and the connecting relationships FTD–1, \(\ldots\), FTD–3, provided by thermodynamics to statistical mechanics, there is another element of collaborative interaction, as shown in Fig. 1, in the direction from statistical mechanics to thermodynamics; specifically from the renormalization group to scaling theory. This has two aspects, substantiation and enrichment:

  • The Kadanoff scaling relationship (25) is introduced as a hypothesis, which is substantiated as (82) in renormalization group theory.

  • Scaling about an arbitrary origin in a critical region with relevant and irrelevant directions is a consequence of scaling theory. This picture is enriched in renormalization group theory, where scaling origins are not arbitrary, but fixed points of the flow determined by the recurrence relationships, and corresponding to different universality classes. Relevant and irrelevant directions correspond to directions in which a fixed point is repulsive and attractive to the flow. Following a trajectory as it approaches one fixed point, but is finally repulsed towards another, is an example of crossover between different types of critical behaviour, that is between different universality classes.

Having spelled out a picture of the anatomy of thermodynamics and statistical mechanics, as well as the relationships between their different parts, we can now ask what consequences this has for our understanding of reduction and emergence as regards PTCP. The literature on reduction and emergence is vast, even when restricted to the specific area of PTCP. So reviewing and discussing this entire literature is beyond the scope of this work,Footnote 88 with a more extensive treatment being the subject of a future paper. Our aim in this section is simply to sketch the main contours of the lie of the land in the light of the picture we have developed, hoping that this will serve as a springboard for further discussions.

To aid our account, we introduce the following terminology. Let \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) and \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) be two theories, where ‘\({\small {\text{ C }}}\)’ stands for ‘coarse’, meaning less detailed, and ‘\({\small {\text{ F }}}\)’ stands for ‘finer’, meaning more detailed.Footnote 89 Intuitively, \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) is the theory that is supposed to be reduced to \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\). In the terminology that has become standard in the philosophical literature on the topic, \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) is supposed to be the reducing theory and \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) is supposed to be the reduced theory. We say ‘supposed to be’ because this is what reductionists would expect. The question is whether this expectation is borne out, and if so in what sense of reduction.

Accounts of reduction might be divided into two broad families, called limit reduction and deductive reduction.Footnote 90 We now have a look at each in turn and consider whether they can account for the relation between \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) and \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) that emerges from our account.

5.1 Limit Reduction

The core idea of limit reduction is that \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) reduces to \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) if the former turns out to be a regular limit of the latter. An example of such a reduction is letting the parameter c, the speed of light in the special theory of relativity, tend toward infinity and thereby recovering classical Newtonian mechanics [89].Footnote 91 In general, let us call the relevant parameter \(\alpha\); the limit, denoted as \(\lim _\alpha\), can be toward any value of \(\alpha\), the most frequent cases being \(\alpha \rightarrow 0\) and \(\alpha \rightarrow \infty\). Batterman [10] adds the further requirement that the limit be regular, which means that the relevant formulae in \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) approach the relevant formulae in \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) smoothly as the parameter approaches the relevant limit value.Footnote 92 Taking these elements together yields the following:

Definition 5

Limit Reduction

\({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) limit-reduces to \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) iff \(\lim _{\alpha } {\mathfrak {T}}_{{\tiny {\text{ F }}}} = {\mathfrak {T}}_{{\tiny {\text{ C }}}}\) and the limit is regular.
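The special-relativistic example of a regular limit mentioned above can be checked numerically for a single formula. The sketch below compares the relativistic kinetic energy \((\gamma -1)mc^{2}\) with the Newtonian \(\frac{1}{2}mv^{2}\) as c grows. The choice of kinetic energy as the ‘relevant formula’ and the function names are our own illustrative assumptions.

```python
import math

def relativistic_kinetic_energy(m, v, c):
    """Kinetic energy (gamma - 1) m c^2 of special relativity (requires |v| < c)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (gamma - 1.0) * m * c ** 2

def newtonian_kinetic_energy(m, v):
    """Classical kinetic energy m v^2 / 2."""
    return 0.5 * m * v ** 2

def relative_deviation(m, v, c):
    """Relative difference between the two formulae at a given c."""
    e_newton = newtonian_kinetic_energy(m, v)
    return abs(relativistic_kinetic_energy(m, v, c) - e_newton) / e_newton
```

Evaluating `relative_deviation(1.0, 1.0, c)` for increasing c shows the difference shrinking steadily toward zero, which is the smooth approach that a regular limit requires. The leading term of the deviation is \(\frac{3}{4}(v/c)^{2}\).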

This definition plays an important role in the discussion about the reduction of PTCP because \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) is commonly associated with thermodynamics and \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) with statistical mechanics. The failure of the limit to be regular as the number of microsystems tends to infinity is then seen as an indication that reduction fails.

How does this argument play out in our scheme? To answer this question we first need to identify certain elements in Fig. 1 with \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) and \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\). There are two possibilities:

  1. (a)

    Work within the renormalization group \(\textsf {SM4}\). In this case, as described in Sect. 3.4.1, the limiting process is implemented by the renormalization transformation which applies a succession of reductions in the number of lattice sites. This reduces the fluctuations (and correlation length) away from critical regions, but leaves the essential statistical mechanical structure intact.

  2. (b)

    Apply the infinite system limit \(\textsf {SM2}\rightarrow \textsf {SM3}\). Away from critical regions this removes fluctuations in the uncontrolled extensive variables, but leaves the microstructure and the probability distribution intact.

However, neither of these is a reduction to a version of thermodynamics. Both (a) and (b) are procedures lying entirely within statistical mechanics. That having been said, (b) is probably the closest to the above idea of reduction. However, while it uses the thermodynamic limit, that limit does not take the system to a thermodynamic system, but to an infinite statistical mechanical system (SM3). To arrive at thermodynamics it is necessary to conflate SM3 with TD3. While SM3, like TD3, contains the singular characteristics deemed necessary (by some) for the occurrence of phase transitions, it also has a microstructure which is lacking in TD3.

So there is no part of Fig. 1 which involves the kind of limit that would ground a limit reduction. However, far from being a problem, this is simply irrelevant to the issue of the reduction of PTCP. As we have indicated in Sect. 4 the role of the thermodynamic limit is, in the first instance, to provide a condition for maxima in response functions to be incipient singularities; some finite systems do not show PTCP no matter how large they become. In the second instance it provides the critical exponents that can be regarded as properties of the real system. Limits and renormalization group techniques are classification tools that enable us to separate phase transitions into different universality classes.

5.2 Deductive Reduction

This notion of reduction is closely associated with Nagel. The broad idea is that \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) is reduced to \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) if the laws of \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) are deducible from the laws of \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) and some auxiliary assumptions. A mature formulation of this idea, known as the Generalised Nagel-Schaffner Model of Reduction, is as follows:Footnote 93

Definition 6

Deductive Reduction

\({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) reduces to \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) iff there is a corrected version \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) of \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) such that:

  1. (i)

    Connectability: If \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) contains terms that do not appear in \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\), then for every such term there is a bridge law connecting it to a term in \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\).

  2. (ii)

    Derivability: Given the associations in (i), \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) is derivable from \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) plus bridge laws and, possibly, some auxiliary assumptions.

  3. (iii)

    Strong analogy: \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) and \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) are strongly analogous to one another.

As a simple example, consider the derivation of the perfect gas law \(PV=NT\) (given as the second of equations (8)) from the kinetic theory of gases. Here the perfect gas law is \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) and the kinetic theory is \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\). \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) contains the term ‘temperature’, which is not in \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\). The bridge law \(T:=\,2U\varepsilon /(3N)\) (which is the first of equations (8)) with the internal energy U identified as the expectation value of the kinetic energy of the gas, connects this term to \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\). \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) is the version of the perfect gas law in which, subject to the physical constraints on the system, P, V, and T are variables that can fluctuate (something they cannot do in \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\)). \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) and \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) are strongly analogous in that fluctuations are small (to the point of being negligible) in contexts in which \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) is applied.

The introduction of \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) is a concession to practice. Ideally one would be able to derive \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) from \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\), but that is usually not possible. So one rests content with deriving a theory \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) that is not identical with, but strongly analogous to, \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\). What does it mean to be strongly analogous? Schaffner [114] blocks the worry that an appeal to strong analogy is an entry ticket to ‘anything goes’ by imposing the following two conditions:

  1. (a)

    \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) corrects \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) in that \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) makes more accurate predictions than \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\).

  2. (b)

    \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) is explained by \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) through \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) being a deductive consequence of \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) and \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) being strongly analogous to \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\).

With this in place, we can now ask whether the above schema indicates that a deductive reduction is taking place. For this we first need to know which theories are in play: what is reduced to what? Since we are interested in a reduction of PTCP, we should focus on a version of thermodynamics with PTCP in it. So we set \({\mathfrak {T}}_{{\tiny {\text{ C }}}}:=\,\) TD3. Then it might seem tempting to choose SM3 as the reducing theory because, in Fig. 1, TD3 ‘communicates’ with SM3. This, however, would be the wrong choice. What we are interested in is the reduction of thermodynamics to a fundamental theory of large systems, and this is SM2. This is because SM2 contains the fundamental principles of statistical mechanics with the only added assumption being that systems are large; so \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) \(:=\,\) SM2 is appropriate. SM3, by contrast, contains a limit assumption which does not belong in the fundamental theory. So the task we set ourselves here is to check whether the reduction of \(\textsf {TD3}\) to \(\textsf {SM2}\) fits the mould of deductive reduction. We shall argue that it does and, to this end, we now consider this contention in relation to how the elements (i)–(iii) of Def. 6 play out in Fig. 1.

For (i):

connectability requires a number of bridge laws. We have avoided this designation for the relationships FTD–1, FTD–2 and FTD–3 in Fig. 1, preferring to call them ‘inter-theory connections’. However, now we shall consider the possibility that they can assume the role of bridge laws as required in the present context. The paradigmatic example of a bridge law in the philosophical literature is provided, as indicated above, by the perfect gas. There the bridge law identifies the temperature in statistical mechanics using the underlying identification of the expectation value of kinetic energy of the gas with its internal energy. But, on closer examination, this example glosses over two other identifications, of volume and pressure.Footnote 94 In a perfect gas contained in a cylinder closed by a movable piston, the piston position will fluctuate; that is to say, from a statistical mechanical point of view, the volume of the gas is a fluctuating quantity. So, just as the internal energy must be identified with the expectation value of the kinetic energy, the thermodynamic volume must be identified with the expectation value of the statistical mechanical volume. Other instances of the same kind are provided by other systems and they are all covered by FTD–3, which in the current context plays the role of a bridge law. In the case of the perfect fluid the identification of internal energy and the expectation value of the kinetic energy and of the thermodynamic volume with the expectation value of the statistical mechanical volume is sufficient to provide a bridge for temperature, pressure and for entropy via the Sackur-Tetrode formula and consequently for all other thermodynamic variables, as described by the connecting relationships FTD–1 and FTD–2. These could, therefore, be regarded as consequences of the underlying bridge law FTD–3, rather than as bridge laws in their own right. In more complicated situations, where there is a need to connect a larger set of thermodynamic and statistical mechanical variables, it is a reasonable economy to regard them, together with FTD–3, as comprising an exhaustive set of bridge laws.

For (ii):

by definition \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) is a corrected version of \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) that can be derived from \({\mathfrak {T}}_{{\tiny {\text{ F }}}}\) plus bridge laws. In the current context \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) is a version of TD3 in which the relevant quantities are allowed to fluctuate, and the fluctuations show roughly the pattern given in SM2 (but without \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) containing any of the microstructure of matter specified in statistical mechanics). It is obvious that \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) thus defined is a deductive consequence of SM2: it is obtained simply by applying the bridge laws to SM2.Footnote 95

For (iii):

we need to show that \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) and \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) stand in the proper strong analogy relationship. In effect, the derivation of SM3 from SM2 through the thermodynamic limit and the fact that SM3 corresponds to TD3 together amount to saying that there is a strong analogy between SM2 and TD3. However, a more detailed analysis is useful and for this we check whether Schaffner’s two criteria are satisfied:

For (a):

the messages FSM–1 and FSM–2 are relevant. FSM–1 asserts that uncontrolled thermodynamic variables fluctuate with variances of \({\mathcal {O}}(N)\) related to response functions. This means that the variances of the corresponding density variables are \({\mathcal {O}}(1/N)\). That these fluctuations are small for large systems is related to, but not exactly equivalent to, the fact, asserted in FSM–2, that extensivity is an approximate property of large systems. So \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) modifies \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) by replacing equality in the basic relationship with approximate equality, valid when the system is large. It also contains fluctuation–response function relationships between fluctuations, which are recognised in \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) but not in \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\), and response functions, which appear in both. Thus \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) makes more adequate predictions than \({\mathfrak {T}}_{{\tiny {\text{ C }}}}\) because real systems do show fluctuations.

For (b):

the way that \({\mathfrak {T}}_{{\tiny {\text{ C }}}}:=\,\) TD3 is explained by \({\mathfrak {T}}_{{\tiny {\text{ F }}}}:=\,\) SM2 follows straightforwardly from Sect. 4 once the bridge laws are accepted and we have in place the definition of an incipient singularity (Def. 1). Maxima in response functions are identified as incipient singularities if they map into real singularities in the thermodynamic limit, which is the step from SM2 to SM3. And, as we have already noted, TD3 communicates with SM3 in the sense that SM3 passes its understanding of these singularities on to TD3.
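The scaling of fluctuations invoked under criterion (a) can be made concrete with a small numerical sketch. We assume, purely for illustration, a toy system of \(N\) independent two-state spins in a field, which is not a model taken from the analysis above; the function names and the dimensionless field `beta_h` are likewise our own hypothetical choices. For this toy system the variance of the total magnetization is \({\mathcal {O}}(N)\), the variance of the magnetization density is \({\mathcal {O}}(1/N)\), and the variance coincides with the response function obtained by differentiating the mean with respect to the field.

```python
import math


def magnetization_stats(n_spins, beta_h):
    """Exact mean and variance of M = s_1 + ... + s_N for independent spins s_i = +1 or -1.

    Hypothetical toy model (an assumption, not taken from the paper): each
    spin points up with probability exp(beta_h) / (2 cosh(beta_h)).
    """
    p_up = math.exp(beta_h) / (2.0 * math.cosh(beta_h))
    mean = n_spins * (2.0 * p_up - 1.0)             # equals N tanh(beta_h)
    variance = 4.0 * n_spins * p_up * (1.0 - p_up)  # equals N / cosh(beta_h)**2
    return mean, variance


def response(n_spins, beta_h, eps=1e-5):
    """Central-difference estimate of d<M>/d(beta_h), the response function."""
    upper, _ = magnetization_stats(n_spins, beta_h + eps)
    lower, _ = magnetization_stats(n_spins, beta_h - eps)
    return (upper - lower) / (2.0 * eps)


def density_variance(n_spins, beta_h):
    """Variance of the density m = M / N, which falls off as 1/N."""
    _, variance = magnetization_stats(n_spins, beta_h)
    return variance / n_spins**2


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        _, var = magnetization_stats(n, 0.3)
        # Columns: N, Var(M), Var(M/N), numerical response function.
        print(n, var, density_variance(n, 0.3), response(n, 0.3))
```

In this sketch the second and fourth columns agree to within the accuracy of the numerical derivative, which is the fluctuation–response relationship in its simplest form, while the third column shrinks by a factor of one hundred each time \(N\) grows by the same factor.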

From the above we conclude that TD3 reduces to SM2 in the sense of deductive reduction. However, the structure of Fig. 1 prompts a consideration of the possibility of further reductive relationships higher in the figure. In particular, do \({\mathfrak {T}}_{{\tiny {\text{ C }}}}:=\,\) TD4 and \({\mathfrak {T}}_{{\tiny {\text{ F }}}}:=\,\) SM4 satisfy the required conditions?Footnote 96 It is straightforward to see that connectability and derivability, where \({\mathfrak {T}}_{{\tiny {\text{ C }}}}^\star\) is a version of TD4 that has certain of the features of SM4 built into it, are satisfied as before. Scaling in TD4 is a phenomenological means of capturing the structure of the way thermodynamic functions in critical regions depend on variables (in the form of homogeneous functions of controllable variables). It can in a sense be regarded as being built from renormalization group theory with the scaffolding removed. This is what we referred to above as the substantiation of scaling theory by the renormalization group. On the other hand, the values of critical exponents and the interpretation of the origin of scaling as the fixed point of a semi-group transformation are absent from TD4 but present in SM4. In that sense the latter provides an explanation or enrichment of the former.

5.3 Emergence

In the case of emergence things are even more difficult than with reduction. As Humphreys notes in a recent review of the field, not only is there no unified framework or account of emergence, there is not even a generally agreed set of core examples of emergent phenomena on which a discussion could build [48]. Our aim here is not, therefore, to comprehensively review the field; we rather discuss some senses of emergence that have played a role in the debate and assess whether, in the light of our analysis, PTCP are emergent in these senses.

For Butterfield, whose view of reduction is essentially Nagelian, there is no conflict between reduction and emergence. The view that reduction and emergence are compatible is based on an understanding of emergence as there being “properties or behaviour of a system which are novel and robust relative to some appropriate comparison class” ([20, p. 921], orig. emph.). He adds the comment that this is intended to cover the case where a system consists of parts, where the idea is that a composite system’s “properties and behaviour are novel and robust compared to those of its component systems, especially its microscopic or even atomic component” (op. cit.). We agree that, thus understood, there is emergence in the large but finite systems we are studying and PTCP can be regarded as both emergent and reduced. Illustrative of this is the transfer matrix approach where maxima in response functions and the correlation length (or critical properties if \({\mathfrak {d}}>d_{{\tiny {\text{ LC }}}}\)), calculated for a lattice which is infinite in \({\mathfrak {d}}\) dimensions, converge towards the critical properties of a \(({\mathfrak {d}}+1)\)-dimensional system as the size in that dimension is increased. This account affords an understanding of dimensional crossover between universality classes, with the ‘gradual emergence’ of critical behaviour.

Humphreys [48] introduces the triplet of conceptual emergence, ontological emergence and epistemological emergence, which we now consider:

  1. (1)

    We have conceptual emergence “when a reconceptualization of the objects and properties of some domain is required in order for effective representation, prediction, and explanation to take place” (op. cit. p. 762). This is close to Butterfield’s notion of emergence, and there is emergence in this sense because various notions that are not native to statistical mechanics have been introduced into the theory in order to deal with PTCP, both through inputs from thermodynamics (FTD–1, FTD–2 and FTD–3) and through the introduction of the notion of a large system at level SM2. As we have argued in Sect. 4 and in our discussion of transfer matrix methods, it is precisely in such large systems that PTCP are manifested in the form of incipient singularities.

  2. (2)

    Ontological emergence amounts to the following: “\(\textit{A}\) ontologically emerges from \(\textit{B}\) when the totality of objects, properties, and laws present in \(\textit{B}\) are insufficient to determine \(\textit{A}\)” (op. cit. p. 762). As we have seen in Sect. 4, the properties of a system’s micro-constituents together with the laws that govern them are sufficient to determine PTCP; in fact PTCP can be shown to occur in finite systems. So PTCP are not ontologically emergent.

  3. (3)

    Epistemological emergence is present when the limitations in our knowledge prevent us from predicting the relevant phenomenon. As Humphreys puts it, \(\textit{A}\) epistemically emerges from \(\textit{B}\) “when full knowledge of the domain to which \(\textit{B}\) belongs is insufficient to allow a prediction of \(\textit{A}\) at the time associated with \(\textit{B}\)” (op. cit. p. 762). This is also the notion of emergence that Morrison appeals to when she notes that “what is truly significant about emergent phenomena is that we cannot appeal to microstructures in explaining or predicting these phenomena, even though they are constituted by them” [84, p. 143].Footnote 97 We submit that PTCP are not epistemically emergent because, as we have seen in Sect. 4, they can in fact be deduced and predicted from the underlying micro-theory. What is important here is that PTCP appear in finite systems.

Batterman’s account of emergence [7] centres around the application of the renormalization group. As we have seen in Sect. 3.5.2, he (like Kadanoff) regards the use of the renormalization group as a wholly different type of approach to PTCP from which novel properties emerge, in particular the fixed points of the renormalization transformation, which allocate the universality classes. We agree with this except for two reservations:

  1. (i)

    Batterman takes the thermodynamic limit as an essential feature of this method. As we have indicated in Sect. 3.5.2 we do not regard this as being necessary.

  2. (ii)

    There is nothing automatic about setting up a renormalization group analysis of a system. It does not arise in a straightforward algorithmic way from the basic structure of statistical mechanics. Indeed, physical insight is required in the choice both of the lattice scaling \({\mathcal {N}}\rightarrow {\widetilde{{\mathcal {N}}}}\) and of the weight function. These must be compatible with the nature of the ordered state and the critical phenomena to be explored. The recurrence relationships are determined by these choices, and the fixed points ‘emerge’ as properties of the recurrence relationships. These in turn have exponents which give the universality classes of the various critical regions. As we have already indicated, most renormalization schemes involve some degree of approximation, with a consequent variation in fixed points and their exponents.Footnote 98 However, weight-function dependent variations can also occur even when no approximation is involved. An example of this is the one-dimensional Ising model with the scheme described in Sect. 3.4.3 with \(\lambda =\,2\), but with \(J<0\), that is, the antiferromagnetic case. In principle one expects a fixed point associated with antiferromagnetism but, although the free-energy density is correctly computed, the fixed point is missing. For this to appear, as is shown by Nelson and Fisher [87], one needs to take \(\lambda =\,3\); that is, blocks of three sites. That, in general, different fixed points and hence different universality classes emerge from different choices of lattice scaling and weight function for the same system means that this is a qualified type of emergence.
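The dependence of the fixed-point structure on the block size in the one-dimensional example can be sketched numerically. We assume here the standard zero-field decimation recursion for the nearest-neighbour Ising chain, \(\tanh K' = (\tanh K)^{\lambda}\) with \(K=J/k_{B}T\), which may differ in detail from the scheme of Sect. 3.4.3; the function names and the variable \(t=\tanh K\) are hypothetical choices made for this illustration. In terms of \(t\), the zero-temperature antiferromagnet sits at \(t=-1\), the zero-temperature ferromagnet at \(t=+1\) and infinite temperature at \(t=0\).

```python
import math


def decimate_t(t, block):
    """One decimation step in the variable t = tanh(K): t' = t**block.

    Assumes the standard zero-field recursion for the 1D Ising chain; the
    scheme referred to in the text may differ in detail.
    """
    return t ** block


def decimate_coupling(coupling, block):
    """The same step expressed for a finite coupling K = J / (k_B T)."""
    return math.atanh(decimate_t(math.tanh(coupling), block))


def fixed_points(block):
    """Candidate fixed points t = -1, 0, +1 that are left invariant by the map."""
    return [t for t in (-1.0, 0.0, 1.0) if decimate_t(t, block) == t]


if __name__ == "__main__":
    for block in (2, 3):
        # Fixed points in t, then the image of an antiferromagnetic coupling K = -1.
        print(block, fixed_points(block), decimate_coupling(-1.0, block))
```

Under these assumptions an even block size squares away the sign of the coupling, so an antiferromagnetic chain is mapped onto a ferromagnetic one and \(t=-1\) is not a fixed point, whereas an odd block size preserves the sign and retains it. This is consistent with the observation above that blocks of three sites are needed for the antiferromagnetic fixed point to appear.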

Finally, emergence is often characterised as the failure of reduction [59, p. 21]. That is, reduction and emergence are taken to be mutually exclusive and a property is emergent only if it fails to be reducible. PTCP are not emergent in this sense because, as we have seen above, they are reducible in the sense of a deductive reduction.

6 Conclusions

We have presented a picture of the way that thermodynamics and statistical mechanics coexist and collaborate within the envelope of thermal physics. We showed that the relationship between the two developments, represented by the columns in Fig. 1, depends, on the one hand, on inter-theory connecting relationships from thermodynamics to statistical mechanics, one of which, FTD–3, can, in the context of deductive reduction, be regarded as a bridge law, with the remaining two, FTD–1 and FTD–2, being consequences of FTD–3. On the other hand, from statistical mechanics to thermodynamics, there is also a sequence of ‘messages’ that are effectively warnings about the idealized nature of thermodynamics.

We address the problem that real systems are finite, and singular behaviour associated with PTCP can occur only in infinite systems, using finite-size scaling and a clear specification of a large system. This enables us to develop a picture of the way that PTCP in finite systems can be defined in terms of incipient singularities. Within this picture the role of the infinite system is threefold: (a) the existence of a critical region in the thermodynamic limit is a necessary condition for there to be a region of incipient singularity in the real finite system; (b) it provides one (but not the only) way to determine quantitative properties of the real system, like the values of critical exponents; and (c) it simplifies calculations. In these senses the infinite system is an indispensable, idealized approximation to the real finite system.

The usual arguments for limit reduction are based on an unwarranted conflation between a thermodynamic system with critical behaviour (TD3) and an infinite statistical mechanical system (SM3). On the other hand, the arguments for the deductive reduction of TD3 to the statistical mechanics of a large system (SM2) are valid. Next we argue that PTCP are neither ontologically nor epistemologically emergent, but that they are conceptually emergent. Rather less frequently remarked upon are the ways that statistical mechanics both substantiates and enriches the picture of PTCP in thermodynamics.