1 Introduction

The mathematical formalization of the continuum in its current form, which we shall call the arithmetical continuum or equivalently the set of real numbers \({{\mathbb {R}}}\) here, is widely used throughout mathematics, physics, the engineering sciences, mathematical biology, economics, sociology, etc. However, in spite of its success, the use of the arithmetical continuum leads to various difficulties both in pure mathematics [29] and in its applications (for an excellent survey of physics see [1]); all of these difficulties basically originate from the fact that the arithmetical continuum has (in many respects) an inherently infinite, transcendental structure. Perhaps the most relevant question related to the problems of the arithmetical continuum on the pure mathematical side is the (in our opinion) unsettled status of Cantor’s continuum hypothesis, while a simple but painful example from theoretical physics is the divergence of the total electric energy of an electrically charged point particle in Faraday–Maxwell electrodynamics (leading eventually to the complicated renormalization issues in classical and quantum field theories). Fortunately, by performing simple experiments, we know that this and the various other occurrences of divergences in classical and quantum electrodynamics are nothing but artifacts originating from the mathematical formulation of these theories (which uses the arithmetical continuum); hence we can identify and isolate these divergences quite easily. But what about the singularities or other phenomena (mathematically) predicted by general relativity, for instance? Lacking unambiguous experimental evidence, we cannot yet make a commitment about their ontological status.

The easiest way to get rid of the various divergence problems in physical theories that arise from mathematically modeling the physical spatio-temporal continuum with the arithmetical one is to declare that the mathematical description of space-time should simply be finite; that is, the mathematical structure modeling physical space-time should have non-zero finite cardinality \(0<N<+\infty \) as a set. Finiteness necessarily implies the geometric relation \(N\sim V/V_{\text{Planck}}\) where V is the volume of a spatio-temporal region. In this framework it is still reasonable to suppose that at macroscopic scales (i.e. as \(V\rightarrow +\infty \)) the cardinality of space-time is extremely large, practically infinite, hence the classical geometric description (provided by general relativity with its precise mathematics) applies as an approximation; nevertheless the macroscopic cardinality N of the observable Universe is determined by some microscopic cardinality \(0<N_{\text{Planck}}<+\infty \) and hence is finite. However, as one approaches microscopic scales (i.e. as \(V\rightarrow V_{\text{Planck}}\)) the finiteness becomes more and more relevant, and the supposed finite microscopic cardinality is attained at the Planck scale. If one indeed wants to describe not merely a geometric continuum but physical space-time itself, then one also has to take into account that at microscopic scales the spatio-temporal continuum more and more resembles the vacuum state of a relativistic quantum field with its known microscopic properties (described by some as yet mathematically problematic relativistic quantum field theory). Thus talking about the finiteness of space-time in fact means supposing that the physical vacuum has finitely many physical degrees of freedom. Consequently the cardinality N of a volume V containing vacuum is proportional to its energy content, i.e. it is expected to satisfy \(N\sim E/E_{\text{Planck}}\) too, where E is the vacuum energy within V.

The quantum vacuum is subject to Heisenberg’s various uncertainty principles. But if these are also involved in the description, the cardinality of the set modeling the physical space-time, if finite, gets problematic at the microscopic (hence the macroscopic) level. This is simply because as the size of a spatial volume approaches the Planck length or its time of existence gets very short, the fluctuation of the spatio-temporal cardinality gets comparable with the cardinality itself. More precisely we assumed that \(N\sim V/V_{\text{Planck}}\sim (L/\ell _{\text{Planck}})^3\). Then using \(\Delta L\Delta p\geqq \hbar \) the relative fluctuation of the vacuum cardinality within a volume V is estimated from below as

$$\begin{aligned} \frac{\Delta N}{N}\sim \left( \frac{\Delta L}{L}\right) ^3 \geqq \left( \frac{\hbar }{L\Delta p}\right) ^3 . \end{aligned}$$

On substituting the minimal length \(\ell _{\text{Planck}}\) and momentum uncertainty \(\Delta p_{\text{Planck}}\sim p_{\text{Planck}}= m_\text{Planck}c\) we find

$$\begin{aligned} \frac{\Delta N_{\text{Planck}}}{N_{\text{Planck}}}\geqq \left( \frac{\hbar }{\sqrt{\hbar G/c^3} \sqrt{\hbar c/G}\,c}\right) ^3=1 \end{aligned}$$

in a small volume comparable to the Planck length. Likewise, based on the finiteness assumption, we know that \(N\sim E/E_{\text{Planck}}\) too. Putting this together with \(\Delta E\Delta t\geqq \hbar \) gives

$$\begin{aligned} \frac{\Delta N}{N}\sim \frac{\Delta E}{E}\geqq \frac{\hbar }{E\Delta t} . \end{aligned}$$

The minimal energy of the vacuum is \(E_{\text{Planck}}=m_\text{Planck}c^2\) and the minimal observation time of it is \(\Delta t_\text{Planck}\sim t_{\text{Planck}}\). Hence we obtain again

$$\begin{aligned} \frac{\Delta N_{\text{Planck}}}{N_{\text{Planck}}}\geqq \frac{\hbar }{c^2\sqrt{\hbar c/G} \sqrt{\hbar G/c^5}}=1 \end{aligned}$$

shortly after the Big Bang for instance. Since the relative fluctuation is a dimensionless quantity, having unit magnitude means that it is meaningless to talk about a sharp (i.e. finite) cardinality \(N_{\text{Planck}}\) of the vacuum at short space or time scales, if quantum mechanics is also taken into account. This overall uncertainty might be the core physical reason why physical space-time is modeled by infinite mathematical structures: if one indeed wants to describe physical space and time as the vacuum of a relativistic quantum field (and not merely as a geometric continuum) then using infinite sets is an apparently unavoidable mathematical way to grasp the overall spatio-temporal uncertainty of the relativistic quantum field comprising the true physical vacuum.
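The two unit-magnitude bounds above can be checked numerically. The following is only a minimal sketch: the CODATA 2018 values of \(\hbar \), G and c are assumed inputs that do not appear in the text, and the helper names (`planck_units`, `position_momentum_bound`, `energy_time_bound`, `cardinality_estimate`) are hypothetical and chosen for illustration.

```python
import math

# CODATA 2018 values in SI units (assumed inputs, not quoted in the text).
HBAR = 1.054571817e-34  # reduced Planck constant, J s
G = 6.67430e-11         # Newtonian gravitational constant, m^3 kg^-1 s^-2
C = 299792458.0         # speed of light, m / s


def planck_units():
    """Return the Planck (length, mass, time) built from hbar, G and c."""
    length = math.sqrt(HBAR * G / C**3)
    mass = math.sqrt(HBAR * C / G)
    time = math.sqrt(HBAR * G / C**5)
    return length, mass, time


def position_momentum_bound():
    """Lower bound (hbar / (l_P * m_P * c))^3 on Delta N / N at the Planck scale."""
    length, mass, _ = planck_units()
    return (HBAR / (length * mass * C)) ** 3


def energy_time_bound():
    """Lower bound hbar / (m_P c^2 * t_P) on Delta N / N at the Planck scale."""
    _, mass, time = planck_units()
    return HBAR / (mass * C**2 * time)


def cardinality_estimate(side_m):
    """Rough N ~ (L / l_P)^3 for a purely spatial cube of side L in metres."""
    length, _, _ = planck_units()
    return (side_m / length) ** 3


if __name__ == "__main__":
    print(position_momentum_bound(), energy_time_bound())
    print(f"N for a cube of side 1 m: {cardinality_estimate(1.0):.2e}")
```

Both bounds come out equal to 1 up to floating-point rounding whatever numerical values are inserted for the constants, since the constants cancel algebraically. The last function only illustrates how large N becomes at everyday scales under the assumed relation \(N\sim (L/\ell _{\text{Planck}})^3\).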

Having seen that modeling physical space-time with infinite mathematical structures is not easy to exclude, we return to our former question concerning general relativity: the purpose of this paper is to examine whether the several conceptual, technical, common-sense, etc. controversies connected with black hole entropy [2, 11, 12, 23, 25, 27] emanate, at least in part, from the fact that general relativity mathematically rests on the arithmetical continuum, an infinite structure.

The paper is organized as follows. In Sect. 2 we recall Chaitin’s reformulation of Gödel’s first incompleteness theorem (see Theorem 2.1 here) and interpret its content, with hindsight dictated by Heisenberg’s uncertainty principle, as the presence of an inherent uncertainty or fuzziness within the arithmetical continuum (which might also be a consequence of the fact that its final constituents are extensionless). This strongly motivates us to introduce a statistico-physical analogy and to talk in this context about the pure “set-theoretic entropy” of the arithmetical continuum. Quite interestingly, this idea can be rigorously grasped with the aid of a key concept of current information theory, namely the Kullback–Leibler relative entropy or divergence [6, Chapter 8] adapted to a pair of Riemannian manifolds (see Theorem 2.2 here). Computing this quantity over a compact manifold-with-boundary and exploiting its non-negativity, an abstract Riemannian geometric analogue of Hawking’s area theorem [11] is obtained (Theorem 2.3 here). Then, in Sect. 3, basically following [9, Section 3], we recall and refine Noether’s theorem on symmetries and conserved quantities and prove that within general relativity a conserved quantity can be assigned to diffeomorphisms which is not zero if a stationary black hole is present and can be identified with its entropy (proportional to the area of the “instantaneous” event horizon [11]). Then we argue, based on their common diffeomorphism invariance, that the “set-theoretic” and the black hole entropies are strongly related. This suggests that the latter entropy might be a consequence of the former one as a result of modeling general relativity mathematically over the arithmetical continuum; therefore the long-sought physical degrees of freedom responsible for black hole entropy have, at least in part, a purely mathematical origin, corresponding to the artificial division of the space-time continuum into points (see the discussion at the end of Sect. 3).

In the “Appendix”, we make an attempt (certainly incomplete at this stage) to introduce a temporal structure in a covariant way into general relativity: recall that, although it is our most advanced theory dealing with the structure of space and time, general relativity is in fact a “timeless theory” due to its diffeomorphism invariance (cf. e.g. [3, Chapter 3]).

2 Chaitin Incompleteness and the Entropy of the Continuum

Accepting the structure of the arithmetical continuum or equivalently the set \({{\mathbb {R}}}\) of real numbers (see Footnote 1), it is the totality of, or more precisely the disjoint union of, its individual constituents called real numbers or, speaking geometrically, its points:

$$\begin{aligned} {{\mathbb {R}}}=\bigsqcup _{x\in {{\mathbb {R}}}}\{x\}\, . \end{aligned}$$

If indeed this is the optimal mathematical structure of the continuum (note that e.g. it is possible to construct it without points [13]), then one would expect that, using mathematical tools only, one is able to “locate” the individual constituents x within their totality \({{\mathbb {R}}}\) or equivalently, to make a mathematical distinction between them in an effective way. We know from our elementary university studies that upon fixing a notational convention every real number admits a well-defined (for instance decimal) expansion, which means that this expansion exists for all real numbers and is unique in the sense that two expansions coincide if and only if the corresponding two real numbers are equal. It is however a quite surprising observation that in general the existence of this well-defined and unique expansion is the only available property of a “truly generic” real number i.e. a typical element \(x\in {{\mathbb {R}}}\). Therefore our question about an effective mathematical way of “picking” a single element \(x\in {{\mathbb {R}}}\) is in fact a question about the effectiveness of making distinctions between generic real numbers in terms of their (decimal) expansions.

To approach this problem let us first recall the idea of Kolmogorov complexity or algorithmic compressibility or computability of a real number (cf. e.g. [6, Chapter 14]). Let \(x\in {{\mathbb {R}}}\) be given and denote by \(\textbf{T}_x\) the (possibly empty) set of those Turing machines which reproduce x in the following way: if \(\textbf{T}_x\not =\emptyset \) and some \(T_x\in \textbf{T}_x\) is given with an input \(n\in {{\mathbb {N}}}\) then the output \(T_x(n)\in {{\mathbb {N}}}\) consists of e.g. precisely the first n digits of the expansion of x. Denoting by \(\vert T_x\vert \in {{\mathbb {N}}}\) the length of the Turing machine (considered as an algorithm or program in some programming language) define

$$\begin{aligned} K(x):=\left\{ \begin{array}{ll} +\infty &{} \text{ if }\,\,\,\textbf{T}_x=\emptyset \\ &{} \\ \inf \limits _{T_x\in \textbf{T}_x} \vert T_x\vert <+\infty &{} \text{ if }\,\,\,\textbf{T}_x\not =\emptyset \end{array}\right. \end{aligned}$$

and call the resulting (extended) natural number \(K(x)\in {{\mathbb {N}}}\cup \{+\infty \}\) the Kolmogorov complexity of \(x\in {{\mathbb {R}}}\). Then \(K(x)=+\infty \) corresponds to the situation when no algorithms reproducing x in the above sense exist hence x is not algorithmically compressible or not computable (by simple cardinality arguments the vast majority of real numbers belongs to this class including, as a yet definable hence mild example, Chaitin’s famous \(\Omega \) number [6, Section 14.8]) while \(K(x)<+\infty \) corresponds to the opposite case (containing all familiar real numbers like 7, \(\frac{5}{8}\), \(\sqrt{2}\), \(\pi \), \(\text{e}\),...). It is clear that the only important question about x in this context is whether \(K(x)=+\infty \) or \(K(x)<+\infty \) and in the latter case only the magnitude of K(x) is relevant, for its particular value depends on the details of the sort of expansion of x, the programming language for \(T_x\), etc. hence does not carry essential information. A remarkable observation about Kolmogorov complexity is the following result:
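To make the case \(K(x)<+\infty \) concrete, here is a minimal sketch of a program playing the role of some \(T_x\) for \(x=\sqrt{2}\). It is an illustration only: Python stands in for the "programming language", the function name `first_digits_sqrt2` is hypothetical, and the length of this source text merely gives a crude upper bound on \(K(\sqrt{2})\) relative to that choice of language.

```python
import math


def first_digits_sqrt2(n):
    """Return the first n decimal digits of sqrt(2), packed into one integer.

    Exact integer arithmetic is used: floor(sqrt(2) * 10**(n - 1)) equals
    isqrt(2 * 10**(2 * (n - 1))), so no floating-point rounding is involved.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.isqrt(2 * 10 ** (2 * (n - 1)))


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, first_digits_sqrt2(n))
```

The point of the sketch is that one fixed, finite program answers every input n, which is exactly what membership in \(\textbf{T}_x\) requires. For a non-computable x no such program exists, whatever language is chosen.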

Theorem 2.1

(Chaitin’s version of Gödel’s first incompleteness theorem [4]) For any (sufficiently rich, consistent, recursively enumerable) axiomatic system S based on a first order language L there exists a natural number \(0<N_S<+\infty \) such that there exists no real number x for which the proposition

$$\begin{aligned} K(x)\geqq N_S \end{aligned}$$

is provable within S. \(\square \)

Motivated in various ways by [10, 20, 21] we interpret this quite surprising mathematical fact from our viewpoint as follows: taking into account that the only known property of a generic real number which fully characterizes it is its existing (decimal) expansion, but the Kolmogorov complexity of this expansion hence the expansion itself generally is not fully determinable (by proving theorems on it in an axiomatic system), there is in general no way, using standard mathematical tools in the broadest sense, to “sharply pick” any element from the arithmetical continuum. Consequently, from the viewpoint of an “effective mathematical activity”, the structure of the arithmetical continuum i.e. the set \({{\mathbb {R}}}\) of real numbers contains an inherent uncertainty or fuzziness in the sense that its individual disjoint constituents cannot be distinguished from each other in a universal and effective mathematical way.

The above interpretation of Theorem 2.1 serves as a motivation to introduce a statistico-physical analogy for the arithmetical continuum offering a fresh look into its structure. First, generalizing the above decomposition of the real line into its points, we accept as usual that every (finite dimensional, real) differentiable manifold M admits a decomposition into its disjoint constituent points:

$$\begin{aligned} M=\bigsqcup _{x\in M}\{x\}\, . \end{aligned}$$
(1)

Speaking intuitively, we can make three observations about this decomposition: all the points x of M (i) are homologous i.e. “look the same”, (ii) are terminal objects i.e. they do not possess any further internal structure, nevertheless their collection gives back precisely M, and (iii) are disjoint from each other i.e. they “do not interact”. Except for its cardinality, this decomposition of M therefore strongly resembles the structure of an ideal gas as usually defined in statistical physics. Take an abstract set X whose cardinality coincides with that of the continuum in ZFC set theory and regard it as an abstract ideal gas such that its elements correspond to the atoms of the gas. Extending this analogy further, the left hand side of (1) i.e. a differentiable manifold M with its global topological, smooth, etc. “macroscopic” properties can be regarded as one possible macrostate of X, while the right hand side of (1) i.e. the particular identification of M with its elements can be regarded as a particular microstate of X within the macrostate M. The equilibrium dynamics of X in its macrostate M is generated by diffeomorphisms; hence another microstate of X within the same macrostate M is achieved by picking any diffeomorphism \(f: M\rightarrow M\) and writing \(f(M)=M\) again, and then taking the corresponding new decomposition

$$\begin{aligned} M=\bigsqcup \limits _{x\in M}\{f(x)\} . \end{aligned}$$
(2)

Another differentiable manifold N not diffeomorphic to M (in the broadest sense i.e. possibly having different dimension, number of connected components, etc.) might be interpreted as a different macrostate (with its corresponding assembly of microstates created by diffeomorphisms) of the same abstractly given ideal gas X. However this abstract ideal gas can even appear in completely different i.e. non-geometric, discontinuous macrostates as well like e.g. in the form of the Cantor set \(C\subset {{\mathbb {R}}}\) or some other abstract topological space (with its homeomorphisms creating the corresponding assembly of microstates), or just simply in the form of some set (together with its bijections), etc., etc.

In accord with this analogy Theorem 2.1 is interpreted as a fundamental result about the indistinguishability of the individual microstates of X realizing the same macrostate M. The next standard step in statistical mechanics is to introduce a tool, a measure, capable of capturing the amount of information lost in the passage from individual microstates to their common macrostate. This measure is known as the entropy of the ideal gas in a given macrostate. How could we characterize this entropy within our analogy? Proceeding completely formally along the lines of Boltzmann’s classical approach to entropy we can argue as follows. Certainly all possible microstates of the abstract ideal gas X are parameterized by the elements of the group \(\text{Bij}(X)\) of all its set-theoretic bijections, while its possible microstates within the macrostate M are parameterized by the subgroup of diffeomorphisms \(\text{Diff}(M)\). Therefore restricting the dynamics of X from \(\text{Bij}(X)\) to \(\text{Diff}(M)\) by construction means that M is an equilibrium state of X. If we further assume that all microstates appear with equal probability i.e. a sort of ergodicity holds for X then as a first trial we formally put

$$\begin{aligned} \text{ Entropy } \text{ of } \text{ the } \text{ set } X \text{ in } \text{ its } \text{ manifold-macrostate } M\,\sim \, \log \Gamma (\text{Diff}(M)) \end{aligned}$$

with \(\Gamma \) being an, at this stage admittedly hypothetical, volume measure on \(\text{Bij}(X)\) depending on a particular choice of the axiomatic system S in Theorem 2.1. Taking into account that if M has positive dimension then \(1\subsetneqq \text{Diff}(M)\subsetneqq \text{Bij}(X)\), we expect \(1\lvertneqq \Gamma (\text{Diff}(M))\lvertneqq \Gamma (\text{Bij}(X))\) to hold such that the resulting entropy expression is a finite positive number which is, at least in its magnitude, independent of any choice for S as dictated by the universality of Theorem 2.1. Note that despite being formally ill-defined, by construction this entropy formula is invariant under diffeomorphisms of M since a diffeomorphism does not change the given macrostate.

Let us try to grasp this set-theoretic entropy more precisely from a mathematical viewpoint. To achieve this we follow [6, Chapter 8] and introduce the Kullback–Leibler relative entropy adapted to a pair of Riemannian manifolds. Let \((M,g)\) be an m-dimensional Riemannian manifold which we assume to be oriented and compact for technical reasons. Consider the associated volume measure \(\mu _g\in \Omega ^m(M)\) defined with the aid of the Hodge operator associated with the orientation and the metric as \(\mu _g:=*_g1\). Suppose that it is normalized i.e. \(\int _M\mu _g=1\). If \({{\mathscr {A}}}_g\) denotes the \(\sigma \)-algebra of \(\mu _g\)-measurable subsets of M then g promotes M to a Kolmogorov probability measure space \((M,{{\mathscr {A}}}_g,p_g)\). It is remarkable that these probability spaces in fact do not depend on which particular normalized-volume metrics g or h they come from. This is because by a theorem of Moser [17] the only diffeomorphism invariant of a smooth positive density over M is its volume. Therefore taking two different \((M,{{\mathscr {A}}}_g, \mu _g)\) and \((M,{{\mathscr {A}}}_h,\mu _h)\) there exists an orientation-preserving diffeomorphism \(f:M\rightarrow M\) such that for every \(A\in {{\mathscr {A}}}_g\) one finds \(f(A)\in {{\mathscr {A}}}_h\) and \(\int _A\mu _g=\int _{f(A)}\mu _h\). Thus switching to another probability space simply corresponds to identifying the manifold M not with its particular microstate as in (1) but with a different one as in (2), still belonging to the same manifold macrostate M of the abstract ideal gas X.

Keeping in mind this universality of the Riemannian probability spaces and fixing from now on a particular one \((M,{{\mathscr {A}}}_g,\mu _g)\) we can assign a meaning to the emerging probabilities \(p_g(A):=\int _{A}\mu _g\) for every \(A\in {{\mathscr {A}}}_g\) as follows. Motivated in a straightforward way by interpreting Theorem 2.1 above as the inherent indistinguishability of the points of the continuum we accept that the apparently simple task of identifying or localizing a point \(x_0\in M\) within M cannot be carried out. The best we can do is to introduce the following

Assumption–mathematical form. The number \(0\leqq p_g(A)\leqq 1\) is the probability that a distinguished point \(x_0\) satisfies \(x_0\in A\subseteqq M\).

Accepting this interpretation let \(\{A_i\}_{i=1,\dots ,n}\) be a finite covering of M by \(\mu _g\)-measurable almost-disjoint subsets i.e. \(M=\cup _iA_i\) such that \(A_i\in {{\mathscr {A}}}_g\) thus \(0\leqq p_g(A_i)\leqq 1\) exists for all \(i=1,\dots ,n\) and \(p_g(A_i\cap A_j)=0\) for all \(i\not =j\). This implies that \(\sum _ip_g(A_i)=1\). Associated with the metric g and the covering \(\{A_i\}_{i=1,\dots ,n}\) one introduces in a natural way the approximate Shannon entropy of M with respect to the covering:

$$\begin{aligned} S\big (M,g,\{A_i\}_{i=1,\dots ,n}\big ):=-\sum _{i=1}^np_g(A_i)\log p_g(A_i) . \end{aligned}$$
(3)

If the information that the point \(x_0\in M\) actually satisfies \(x_0\in A_i\) is interpreted as saying that “M is in its \(i^\text{th}\) state” then taking into account the interpretation of the probabilities involved we can say that (3) describes the entropy of a “state” of M which is the mixture of the “pure states” \(i=1,\dots ,n\) with corresponding probabilities \(p_g(A_i)\). Observe that this formally agrees with the general definition of the entropy of a system in statistical physics. It is clear that the “knowledge” about the “point distribution” of M is improved if the covering is refined. Therefore it is tempting to define the entropy of M by taking the limit of (3) over all coverings, if it exists. However we cannot expect to come up with any reasonable number in this way since taking for example an equipartition i.e. one for which \(p_g(A_i)=\frac{1}{n}\) for all \(i=1,\dots ,n\) with corresponding entropy (3) then

$$\begin{aligned} \lim \limits _{n\rightarrow +\infty } S\big (M,g,n\big )=\lim \limits _{n\rightarrow +\infty }\log n=+\infty \end{aligned}$$

demonstrating that the naive Shannon entropy of the continuum diverges at least logarithmically. Nevertheless, in physical language this equipartition corresponds to ergodicity of the equilibrium dynamics of the continuum in its manifold-macrostate M provided by its orientation-preserving diffeomorphisms; hence, comparing this formula with the formal entropy expression above, which also expresses entropy in an equilibrium state under ergodic dynamics, we find that \(\Gamma (\text{Diff}^+(M))\sim n\) as \(n\rightarrow +\infty \), hence regularization is indeed needed.
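The logarithmic growth of (3) under equipartition is easy to reproduce numerically. The following minimal sketch evaluates the approximate Shannon entropy for a list of cell probabilities \(p_g(A_i)\); the function names `shannon_entropy` and `equipartition_entropy` are hypothetical and serve only this illustration.

```python
import math


def shannon_entropy(probabilities):
    """Approximate Shannon entropy -sum_i p_i log p_i of a finite covering."""
    if abs(math.fsum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Cells of measure zero contribute nothing since p log p -> 0 as p -> 0.
    return -math.fsum(p * math.log(p) for p in probabilities if p > 0.0)


def equipartition_entropy(n):
    """Entropy (3) for a covering by n cells of equal probability 1/n."""
    return shannon_entropy([1.0 / n] * n)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, equipartition_entropy(n), math.log(n))
```

For every n the two printed numbers agree up to rounding, so refining an equipartition increases (3) without bound, in line with the divergence stated above.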

However it turns out that this is the only sort of divergence and one can renormalize the entropy of a compact Riemannian manifold essentially by removing a single logarithmically divergent universal term from (3). To this end we will follow [6, Section 8.3]. Assume that in \(\{A_i\}_{i=1,\dots ,n}\) every \(A_i\) has the form of a closed m-ball hence we can choose a local coordinate system \((A_i,x^1,\dots , x^m)\) in each of them such that there exists a uniform number

$$\begin{aligned} \Delta :=\int \limits _{A_i}\text{d}x^1\dots \text{d}x^m \end{aligned}$$

for all \(i=1,\dots , n\) satisfying \(0<\Delta <+\infty \) taking into account the orientability of M. Moreover \(p_g(A_i)=\int _{A_i}\mu _g=\int _{A_i}\sqrt{\det g(x^1,\dots ,x^m)} \text{d}x^1\dots \text{d}x^m\) therefore introducing the smooth strictly positive local function \(\rho _i:=\sqrt{\det (g\vert _{A_i})}\) by the mean value theorem there exists a point \(y_i\in A_i\) such that \(p_g(A_i)=\rho _i(y_i)\Delta \). Thus (3) takes the shape

$$\begin{aligned} S\big (M,g,\Delta \big )=-\sum \limits _{i=1}^n\rho _i(y_i)\Delta \, \log \big (\rho _i(y_i)\Delta \big ) \end{aligned}$$

which is however a highly coordinate-dependent expression. To overcome this difficulty introduce another Riemannian structure \((M,h)\) having normalized volume too i.e. \(\int _M\mu _h=1\) with corresponding strictly positive local density functions \(\sigma _j:=\sqrt{\det (h\vert _{A_j})}\) hence \(p_h(A_j)=\sigma _j(z_j)\Delta \) with some point \(z_j\in A_j\). Then the Shannon entropy can be expanded like

$$\begin{aligned} S\big (M,g,\Delta \big )= & {} -\sum \limits _{i=1}^n\rho _i(y_i)\Delta \,\log \left( \frac{\rho _i(y_i)}{\sigma _i(z_i)}\sigma _i(z_i)\Delta \right) \\= & {} -\sum \limits _{i=1}^n\rho _i(y_i)\Delta \left( \log \frac{\rho _i(y_i)}{\sigma _i(z_i)}+ \log \sigma _i(z_i)+\log \Delta \right) \\= & {} -\sum \limits _{i=1}^n \log \left( \frac{\rho _i(y_i)}{\sigma _i(z_i)}\right) \rho _i(y_i)\Delta -\sum \limits _{i=1}^n\log (\sigma _i(z_i))\rho _i(y_i)\Delta - \log \Delta \sum \limits _{i=1}^n\rho _i(y_i)\Delta . \end{aligned}$$

Let us say that the countable sequence \(\{A_1\}, \{A_1,A_2\},..., \{A_i\}_{i=1,\dots ,n},\dots \) is a refinement (of a finite covering of M by \(\mu _g\)-measurable almost-disjoint subsets as above) if for every pair of points \(x,y\in M\) with \(x\not =y\) there exists an \(n_{x,y}\) such that for every \(n>n_{x,y}\) the corresponding \(\{A_i\}_{i=1,\dots ,n}\) in this sequence contains no element \(A_j\) satisfying \(x,y\in A_j\). Applying this to our covering with uniform balls, refinement implies \(\Delta \rightarrow 0\) but not the other way round. We now make the following four observations. The first and most important is that the ratios of the local functions already extend globally: using the globally existing volume-forms \(\mu _g,\mu _h\in \Omega ^m(M)\) there exists a positive smooth function \(f:M\rightarrow {{\mathbb {R}}}\) satisfying \(\mu _g=f\mu _h\) and obviously \(f\vert _{A_i}=\frac{\rho _i}{\sigma _i}\). We can write this fact as \(\frac{\rho _i}{\sigma _i}=\frac{\text{d}\mu _g}{\text{d}\mu _h}\big \vert _{A_i}\) in terms of the globally well-defined Radon–Nikodym derivative of the involved measures. The second observation is that \(\sum _i\rho _i(y_i)\Delta =\int _M\mu _g=1\). These two observations make sure that the first term on the right hand side converges to a coordinate-free i.e. globally well-defined integral \(-\int _M\log \Big (\frac{\text{d}\mu _g}{\text{d}\mu _h}\Big )\mu _g\) during a refinement. Thirdly, the second term on the right hand side is, up to sign, a coordinate-dependent hence not well-defined number \(I(M,g,h,\Delta ):=\sum _i\log (\sigma _i(z_i))\rho _i(y_i)\Delta \) nevertheless satisfying \(\vert I(M,g,h,\Delta )\vert \leqq c(M,h)\int _M\mu _g=c(M,h)\) hence it remains bounded (possibly vanishes) during a refinement. Finally, by the second observation the third term is equal to \(-\log \Delta \), representing the already recognized logarithmic divergence in the Shannon entropy.

Putting all of these findings together we arrive at the following result which can be understood as the appropriate renormalization of the Shannon entropy of a compact Riemannian space; that is upon removing two ill-defined terms from it we come up with a well-defined i.e. diffeomorphism-invariant quantity:

Theorem 2.2

(cf. [6, Theorem 8.3.1]) Let \((M,g)\) be an m-dimensional compact oriented Riemannian manifold having unit volume and \(\{A_i\}_{i=1,\dots ,n}\) as above which is uniform in the sense that there exists a local coordinate system \((A_i,x^1,\dots ,x^m)\) such that \(\Delta =\int _{A_i}\text{d}x^1\dots \text{d}x^m\) is a positive number independent of \(i=1,\dots ,n\). Let \((M,h)\) be another Riemannian structure having unit volume too.

Then the approximate Shannon entropy \(S(M,g,\Delta )\) under the refinement of the corresponding covering behaves like

$$\begin{aligned} \lim \limits _{\Delta \rightarrow 0} \big (S(M,g,\Delta )+I(M,g,h,\Delta )+\log \Delta \big )= -\int \limits _M\log \left( \frac{\text{d}\mu _g}{\text{d}\mu _h}\right) \mu _g \end{aligned}$$

which means that the sum of three expressions, which include the approximate Shannon entropy and which are separately ill-defined in different ways, converges under refinements to a well-defined expression.

The quantity \(-\int _M\log \Big (\frac{\text{d}\mu _g}{\text{d}\mu _h}\Big )\mu _g\) is called the Kullback–Leibler relative entropy of \((M,g)\) with respect to \((M,h)\) regarded as an ambient fixed Riemannian structure. \(\square \)
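The convergence stated in Theorem 2.2 can be observed in the simplest possible setting. The sketch below is an illustration under loudly stated assumptions that are not taken from the text: M is the unit circle parameterized by \([0,1)\), the density of \(\mu _g\) is \(1+A\cos (2\pi x)\) with \(A=0.5\), the reference density of \(\mu _h\) is identically 1 (so the term I vanishes), and the covering consists of n arcs of equal coordinate length \(\Delta =1/n\). All function names are hypothetical.

```python
import math

A = 0.5  # amplitude of the assumed density; any |A| < 1 keeps rho positive


def rho(x):
    """Assumed unit-volume density of mu_g on the circle [0, 1)."""
    return 1.0 + A * math.cos(2.0 * math.pi * x)


def cell_probability(i, n):
    """Exact integral of rho over the i-th of n equal arcs [i/n, (i+1)/n]."""
    a, b = i / n, (i + 1) / n
    oscillation = math.sin(2.0 * math.pi * b) - math.sin(2.0 * math.pi * a)
    return (b - a) + A * oscillation / (2.0 * math.pi)


def renormalized_entropy(n):
    """S(M, g, Delta) + log(Delta) with Delta = 1/n; I = 0 because sigma = 1."""
    delta = 1.0 / n
    probabilities = [cell_probability(i, n) for i in range(n)]
    entropy = -math.fsum(p * math.log(p) for p in probabilities)
    return entropy + math.log(delta)


def relative_entropy_integral(samples=200_000):
    """Midpoint-rule value of -int_M log(d mu_g / d mu_h) mu_g for sigma = 1."""
    h = 1.0 / samples
    values = (rho((k + 0.5) * h) for k in range(samples))
    return -math.fsum(r * math.log(r) * h for r in values)


if __name__ == "__main__":
    target = relative_entropy_integral()
    for n in (10, 100, 1000):
        print(n, renormalized_entropy(n), target)
```

As n grows the first printed column approaches the second, while the bare entropy \(S(M,g,\Delta )\) alone keeps growing like \(\log n\). With a non-constant reference density the term \(I(M,g,h,\Delta )\) would have to be added as in the theorem; the flat choice is made here only to keep the sketch short.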

Remark

The Kullback–Leibler relative entropy is invariant under orientation-preserving diffeomorphisms of M however is not symmetric under \(g\leftrightarrow h\); in particular it follows from the Jensen inequality that it is always non-negative and is equal to zero if and only if \(\mu _g\)-almost everywhere \(\mu _g=\mu _h\) holds [6, Theorem 8.6.1]. This is the case for instance if \((M,g)\) and \((M,h)\) are isometric. Moreover recalling again Moser’s theorem [17] without loss of generality we can assume that for instance \(\mu _h\) is equal to a once and for all fixed density \(\mu _0\in \Omega ^m(M)\) having unit volume. Therefore the Kullback–Leibler relative entropy does not really depend on \((M,h)\) hence it can be understood as a quantity which measures the “knowledge” on various changing geometries like \((M,g)\) from the “viewpoint” of a once and for all fixed but otherwise arbitrary geometry like \((M,h)\).

The Kullback–Leibler relative entropy admits a hypersurface formulation too:

Theorem 2.3

(An analogue of Hawking’s area theorem [11]) Let M be an \(m>1\) dimensional compact oriented manifold with non-empty connected boundary \(\partial M\) and let \((M,g_i)\) with \(i=0,1\) be two smooth Riemannian structures on it having (non-normalized) volume-forms \(\mu _i\in \Omega ^m(M)\) and corresponding volumes \(0<V_i<+\infty \) respectively.

Then, upon modifying the metric \(g_0\) with an inessential homothety if necessary, an equality

$$\begin{aligned} -\frac{1}{V_1}\int \limits _M\log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0}\right) \mu _1+ \log \frac{V_1}{V_0}=\text{Area}_1(\partial M)-\text{Area}_0(\partial M) \end{aligned}$$

holds where \(\text{Area}_i(\partial M)=\int _{\partial M}\sigma _i\) is the area of the boundary with respect to its induced orientation and the surface-form \(\sigma _i\) provided by the volume-form \(\mu _i\).

Therefore taking into account the non-negativity of the left hand side as well, an inequality

$$\begin{aligned} \text{Area}_1(\partial M)\geqq \text{Area}_0(\partial M) \end{aligned}$$

holds. Referring to our interpretation of the Kullback–Leibler relative entropy above, the surface area of the boundary with respect to the “unknown” geometry \((M,g_1)\) is not smaller than the surface area of the boundary with respect to the “known” (fixed but modified with a homothety if necessary) geometry \((M,g_0)\).

Remark

  1. Before embarking upon the proof let us recall that a homothety is the scaling of a Riemannian metric with a positive constant i.e. \(g_0\mapsto c^2g_0\) with an arbitrary \(0\not =c\in {{\mathbb {R}}}\). Such a constant scaling is generally considered (by both mathematicians and physicists) as irrelevant. At this level of generality an application of a homothety on \((M,g_0)\) might be necessary in order for our statements to be valid, see the proof below. However it is possible that in more restricted situations (such as \((M,g_1)\) being a spatial section “preceded by” \((M,g_0)\) in a common ambient space-time satisfying the Einstein equation, etc.) performing homotheties turns out to be unnecessary.

  2.

    Moreover, with some technical effort the theorem could be stated in an appropriate form over non-compact manifolds as well; however, we skip that formulation here.

Proof

Expand the relative entropy expression like

$$\begin{aligned} -\int \limits _M\log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1} \right) \frac{\mu _1}{V_1}= & {} \text{Area}_1(\partial M) -\int \limits _M\log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0} \frac{V_0}{V_1}\right) \frac{\mu _1}{V_1}-\text{Area}_1(\partial M)\\= & {} \text{Area}_1(\partial M)- \left( \,\,\int \limits _M\log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0} \frac{V_0}{V_1}\right) \frac{\mu _1}{V_1}+ \int \limits _M\text{Area}_1(\partial M)\frac{\mu _1}{V_1}\right) \\= & {} \text{Area}_1(\partial M)- \int \limits _M\log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1} \text{e}^{\text{Area}_1(\partial M)} \right) \frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1}\frac{\mu _0}{V_0} \end{aligned}$$

and consider the following Dirichlet problem:

$$\begin{aligned} \left\{ \begin{array}{lll} \Delta _0u&{}=&{}\frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1} \log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1}\, \text{e}^{\text{Area}_1(\partial M)}\right) \\ u\vert _{\partial M}&{}=&{}0\end{array}\right. \end{aligned}$$

where \(\Delta _0\) is the scalar Laplacian on \((M,g_0)\). It is known (cf. e.g. [24, Section 5.1]) that this problem has a unique solution \(u\in C^\infty (M;{{\mathbb {R}}})\) whose smoothness follows from that of \(g_0\) and the inhomogeneous term on the right hand side. We proceed further by applying the divergence expression \((\text{div}X)\mu _0=L_X\mu _0\), where \(L_X\) is the Lie derivative along a vector field \(X\in C^\infty (M;TM)\), then Cartan’s magic formula \(L_X\omega =\text{d}\omega (X ,\,\cdot \,)+ \text{d}(\omega (X,\,\cdot \,))\), and finally Stokes’ theorem to get

$$\begin{aligned} \int \limits _M\frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1} \log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1} \text{e}^{\text{Area}_1(\partial M)}\right) \frac{\mu _0}{V_0}= & {} \frac{1}{V_0}\int \limits _M(\Delta _0u)\mu _0= \frac{1}{V_0}\int \limits _M\text{div}(\text{grad}\,u)\mu _0= \frac{1}{V_0}\int \limits _ML_{\text{grad}\,u}\,\mu _0\\= & {} \frac{1}{V_0}\int \limits _M\!\!\!\big (\text{d}\mu _0(\text{grad}\,u,\cdot )+ \text{d}(\mu _0(\text{grad}\,u,\cdot ))\big )= \frac{1}{V_0}\int \limits _{\partial M}\!\!\!\mu _0(\text{grad}\,u,\cdot )\\= & {} \frac{1}{V_0}\int \limits _{\partial M}g_0(N_0 ,\,\text{grad}\,u)\sigma _0 \end{aligned}$$

where \(N_0\) denotes the unit normal to \(\partial M\) with respect to the orientation of M and the metric \(g_0\).

Consider the function \(g_0(N_0 ,\,\text{grad}\,u):\partial M\rightarrow {{\mathbb {R}}}\). Because u is surely not constant over M but is surely constant along \(\partial M\) we know that \(\text{grad}\,u\not =0\) and is parallel with \(N_0\) hence \(g_0(N_0 ,\,\text{grad}\,u)\) is a not-identically-zero function. Let \(Y\in C^\infty (\partial M;T(\partial M))\) be a tangent field. If \(\nabla \) denotes the Levi–Civita connection of \(g_0\) over M then \(Yg_0(N_0 ,\,\text{grad}\,u)=g_0(\nabla _Y N_0 ,\,\text{grad}\,u)+g_0(N_0 , \,\nabla _Y\text{grad}\,u)\). However by the definitions of Y and \(N_0\) they are not only orthogonal but even \(\nabla _YN_0=0\) holds; moreover we also find that \(\nabla _Y\text{grad}\,u=\text{grad}(Yu)=0\) because \(u=0\) along the boundary. Therefore we conclude that \(Yg_0(N_0 ,\,\text{grad}\,u)=0\) for every tangent field hence by the connectivity of \(\partial M\) in fact \(g_0(N_0 ,\,\text{grad}\,u)=a\) is a non-zero constant (depending on \(g_1\) through u as well).
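To make the last two steps concrete, here is a minimal worked check under simplifying assumptions that are ours and not part of the theorem: take M to be the closed unit disk in \({{\mathbb {R}}}^2\) with the Euclidean metric \(g_0\) and put \(g_1=g_0\). Then \(\frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1}=1\), \(V_0=\pi \) and \(\text{Area}_1(\partial M)=2\pi \), so the inhomogeneous term of the Dirichlet problem above is the constant \(\log \text{e}^{2\pi }=2\pi \). In polar coordinates \((r,\vartheta )\) one finds

$$\begin{aligned} \Delta _0u=2\pi ,\quad u\vert _{\partial M}=0\quad &\Longrightarrow \quad u=\frac{\pi }{2}(r^2-1),\quad \text{grad}\,u=\pi r\,\partial _r,\quad a=g_0(N_0,\,\text{grad}\,u)\vert _{r=1}=\pi ,\\ \frac{1}{V_0}\int \limits _M(\Delta _0u)\mu _0 &= \frac{1}{\pi }\cdot 2\pi \cdot \pi =2\pi =\frac{1}{\pi }\cdot \pi \cdot 2\pi =\frac{1}{V_0}\int \limits _{\partial M}a\,\sigma _0 . \end{aligned}$$

In this special case the normal derivative is visibly constant along the boundary and \(a/V_0=1\).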

We want to eliminate this constant by applying a homothety with \(0\not =c\in {{\mathbb {R}}}\) to the metric \(g_0\). Beyond the scaling \(g_0\mapsto c^2g_0\) of the metric itself let us collect the induced scalings of the other things involved in the last integral too. These are \(N_0\mapsto c^{-1}N_0\) and \(\mu _0\mapsto c^m\mu _0\) hence \(V_0\mapsto c^mV_0\) however \(\sigma _0\mapsto c^{\frac{m-1}{m}}\sigma _0\). Next, the function comprising the inhomogeneous term in the Dirichlet problem above is insensitive to the homothety on \(g_0\), while \(\Delta _0\mapsto c^{-2}\Delta _0\); therefore \(u\mapsto c^2u\). In addition, by definition \(\text{grad}\,f=g_0(\text{d}f,\cdot )\) with \(g_0\) here being the inverse metric hence scaling as \(g_0\mapsto c^{-2}g_0\); thus \(\text{grad}\mapsto c^{-2}\text{grad}\) yielding eventually \(\text{grad}\,u\mapsto \text{grad}\,u\). Putting all of these together the last integral on the right hand side above scales as

$$\begin{aligned}{} & {} \frac{1}{V_0}\int \limits _{\partial M}g_0(N_0 ,\,\text{grad}\,u)\sigma _0 \longmapsto \frac{1}{c^mV_0} \int \limits _{\partial M}c^2g_0\big (c^{-1}N_0 ,\,\text{grad}\,u \big )c^{\frac{m-1}{m}}\sigma _0\\{} & {} \quad = \frac{c^{1-m}a}{V_0}\text{Area}_{c^2g_0}(\partial M) \end{aligned}$$

demonstrating that if \(\dim _{{\mathbb {R}}}M=m>1\) then we can adjust the homothety so that \(\frac{c^{1-m}a}{V_0}=1\) rendering the integral in question equal to \(\text{Area}_{c^2g_0}(\partial M)\). However the original integral on the left hand side is invariant under homotheties. Therefore, upon modifying the metric \(g_0\) with a homothety \(g_0\mapsto c^2g_0\) if necessary, we come up with the equality of the theorem.
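The adjustment of the homothety can be written out explicitly. The following sketch assumes \(a/V_0>0\), so that a positive real root exists; the sign of a is not addressed here:

$$\begin{aligned} \frac{c^{1-m}a}{V_0}=1\quad \Longleftrightarrow \quad c^{m-1}=\frac{a}{V_0}\quad \Longleftrightarrow \quad c=\left( \frac{a}{V_0}\right) ^{\frac{1}{m-1}} ,\qquad \text{e.g.}\quad c=\frac{a}{V_0}\ \ (m=2),\quad c=\sqrt{\frac{a}{V_0}}\ \ (m=3). \end{aligned}$$

For \(m=1\) the exponent \(1-m\) vanishes and no choice of c changes the prefactor, which is consistent with the assumption \(m>1\) of the theorem.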

The inequality then also follows taking into account the non-negativity of the left hand side provided by the aforementioned non-negativity of the Kullback–Leibler relative entropy (observe again the asymmetry of the relative entropy formula in its metric content!). \(\square \)

After these preparations we are in a position to offer a mathematically meaningful definition of the entropy of the continuum at least in a relative way by comparing its two particular microstates within a common compact-orientable-manifold-macrostate. Let M be a compact orientable m-manifold carrying two strictly positive measures \(\mu _0,\mu _1\in \Omega ^m(M)\) satisfying \(\int _M\mu _i=V_i\) for \(i=0,1\). Using \(\frac{\mu _0}{V_0}\) as a fixed reference measure as so far we can make \(\Omega ^0(M)\), the space of smooth functions over the compact M, a Banach space \(L^\infty (M,\frac{\mu _0}{V_0})\) by completing it with respect to the norm \(\Vert f\Vert _{L^\infty }:= {\frac{\mu _0}{V_0}-\mathrm{ess\,sup}_{x\in M}\vert f(x)\vert }\) for every \(f\in \Omega ^0(M)\). The dual space of \(L^\infty (M,\frac{\mu _0}{V_0})\) is \(L^1(M,\frac{\mu _0}{V_0})\) and contains \(\Omega ^m(M)\) because the formula \(F_\omega (f):=\int _Mf\omega \) by extension gives rise to a continuous linear functional on \(L^\infty (M,\frac{\mu _0}{V_0})\) for every \(\omega \in \Omega ^m(M)\). The norm on \(\Omega ^m(M)\subset L^1(M,\frac{\mu _0}{V_0})\) is \(\Vert \omega \Vert _{L^1}= \int _M\vert \omega \vert =\int _M\vert \frac{\text{d}\omega }{\text{d}\mu _0}\vert \mu _0\) and it follows that \(\frac{\mu _1}{V_1}\) belongs to the unit ball in the dual space \(L^1(M,\frac{\mu _0}{V_0})\). Introducing the \(\text{weak}^*\)-topology on \(L^1(M,\frac{\mu _0}{V_0})\) generated by the seminorms \(\Phi _f\) of the form \(\Phi _f(\omega ):=\vert \int _Mf\omega \vert \) it is easy to check that \(\frac{\mu _1}{V_1}\mapsto -\int _M\log \Big (\frac{\text{d}\mu _1}{\text{d}\mu _0} \frac{V_0}{V_1}\Big )\frac{\mu _1}{V_1}\) is continuous. But by Alaoglu’s theorem (e.g. [24, p. 484]) the unit ball in \(L^1(M,\frac{\mu _0}{V_0})\) is compact in the \(\text{weak}^*\)-topology consequently this map attains its maximum somewhere hence

$$\begin{aligned} S(M):=\sup \limits _{\mu _1\in \Omega ^m(M)}\,\,-\int \limits _M \log \left( \frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1}\right) \frac{\mu _1}{V_1} \end{aligned}$$

is a finite number. It is independent of the choice for the reference measure \(\frac{\mu _0}{V_0}\) too by the aid of its diffeomorphism invariance and Moser’s theorem [17]. Moreover it satisfies the (sub)additivity \(S(M\#N)\leqq S(M)+S(N)\) and \(S(M\times N)=S(M)+S(N)\) under taking connected sum or Descartes product, respectively. Thus collecting all of our (including interpretational, with special attention to Assumption–mathematical form) efforts so far we put

Definition 2.1

Let X be an abstract set whose cardinality coincides with that of the continuum in ZFC set theory. Then

$$\begin{aligned} \text{Entropy of the set } X \text{ in its compact-orientable-manifold-macrostate } M:=S(M) \end{aligned}$$
(4)

where \(0<S(M)<+\infty \) is the quantity introduced above.
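Two elementary properties of the functional under the supremum may help in reading the definition; both are immediate computations and are offered as a sketch only. First, the functional is unchanged under a constant rescaling \(\mu _1\mapsto \lambda \mu _1\) with \(\lambda >0\), so the supremum may be taken over normalized measures. Second, at \(\mu _1=\lambda \mu _0\) it vanishes:

$$\begin{aligned} \mu _1\mapsto \lambda \mu _1:\quad &\frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1}\mapsto \frac{\lambda \,\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{\lambda V_1}=\frac{\text{d}\mu _1}{\text{d}\mu _0}\frac{V_0}{V_1},\qquad \frac{\mu _1}{V_1}\mapsto \frac{\lambda \mu _1}{\lambda V_1}=\frac{\mu _1}{V_1};\\ \mu _1=\lambda \mu _0:\quad &-\int \limits _M\log \left( \lambda \,\frac{V_0}{\lambda V_0}\right) \frac{\mu _0}{V_0}=-\int \limits _M\log (1)\,\frac{\mu _0}{V_0}=0 . \end{aligned}$$

In particular the supremum is bounded from below by 0.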

In the case of a manifold-with-boundary we can compare (4) with Theorem 2.3 to obtain a two-sided inequality

$$\begin{aligned} 0\leqq \text{Area}_1(\partial M)-\text{Area}_0(\partial M)\leqq S(M) \end{aligned}$$
(5)

for an arbitrary pair \((M,g_0)\) and \((M,g_1)\) (upon applying a homothety to the former member).

Before closing this section let us also return to the problem of the volume of the diffeomorphism group here for a moment; comparing our entropy definitions so far we come up with

$$\begin{aligned} \Gamma (\text{Diff}^+(M)):=\text{e}^{S(M)}<+\infty \end{aligned}$$

as a reasonable choice at least in the compact orientable setting.

To summarize, we have sketched a framework in which the inherent uncertainty or fuzziness of the arithmetical continuum i.e. the set \({{\mathbb {R}}}\) of real numbers or more generally any differentiable manifold, can be interpreted as a non-zero entropy of the arithmetical continuum (cf. [20, 21]), quantitatively captured by (4) at least in the compact case.

3 Secondary Noether Theory and the Entropy of Black Holes

Keeping in mind the results of Sect. 2 and recalling now [9, Section 3] we would like to approach the formula (4) again but by passing from mathematics to physics. Namely, we shall consider classical physical theories over physical space-time such that in the mathematical description of these theories the physical space-time is modeled on a differentiable manifold possessing the property (1) or more generally (2). Then, we shall ask ourselves: does the inherent uncertainty or fuzziness of the arithmetical continuum, just recognized in the mathematical model of the physical theory, “pop up” somehow among the physical propositions of the physical theory? Put differently: does this fuzziness somehow “lift” from the mathematical level to the physical level of the physical theory? Since we have found some similarities between this purely mathematical uncertainty or fuzziness of the arithmetical continuum and the physical concept of entropy, we are going to seek entropy-like phenomena in those physical theories which are particularly sensitive to the physical structure of space-time. If these sought entropy-like phenomena happen to have a pure set-theoretical origin introduced by the mathematical description of the physical theory, then we expect them to have something to do with diffeomorphisms of the underlying differentiable manifold modeling physical space-time; for the expression (4) is diffeomorphism-invariant hence the entropy it describes is invariant under diffeomorphisms. Apart from this, if diffeomorphisms are in addition symmetries of the physical theory we are dealing with then we may as well try to identify these entropy-like quantities with Noether charges associated with diffeomorphism symmetry.

By Noether’s theorem in a broad physical sense one means that “to every continuous symmetry of a physical theory a quantity can be assigned which is conserved”. It may happen however that this conserved quantity, the Noether charge, vanishes. Our goal is to demonstrate that even in this trivial case certain non-trivial de Rham cohomology classes can still be interpreted as “secondary Noether charges” associated with this symmetry of the theory. There is an analogous situation in algebraic topology. Consider a complex vector bundle E over a topological space X. Recall that for all \(i=0,\dots ,\text{rk}\,E\) the \(i^{\text{th}}\) Chern class of E takes value in \(H^{2i}(X;{{\mathbb {Z}}})\). Therefore, if it happens that X has vanishing even dimensional singular cohomology then characteristic classes cannot be used to distinguish complex vector bundles over it (see Footnote 2). However if X is a manifold M then one can still introduce the so-called secondary or Chern–Simons characteristic classes taking values, as a cohomological shift, in odd dimensional cohomology [5]. Motivated by this consideration as well as those in [7, 27] we proceed as follows.

For completeness and convenience let us recall how standard Noether theory works in case of a classical relativistic field theory. We are going to skip all the technical details here but emphasize that in case of a closed, orientable Riemannian 4-manifold all of our considerations below are rigorous mathematical statements; therefore we have a reason to expect that with appropriate technical modifications everything remains valid in physically more realistic situations.

So let \((N,h)\) be a four dimensional (non-)closed oriented (pseudo-)Riemannian manifold representing space-time and let \(\Phi \) denote the full field content of a classical field theory over \((N,h)\) defined by a Lagrangian density \(L(\Phi ,h)\in \Omega ^4(N)\). Note that by definition the Lagrangian is not a function but a 4-form over N allowing one to talk about the corresponding action \(S(\Phi ,h)=\int _NL(\Phi ,h)\) defined by integration over N. Let \({{\mathscr {N}}}\) be the configuration space of all (but belonging to a nice function class) \((\Phi ,h)\)-field configurations over N i.e. its elements are not identified by diffeomorphisms, gauge, etc. transformations. Consider a differentiable curve \(C:{{\mathbb {R}}}\rightarrow {{\mathscr {N}}}\). We say that C is a symmetry of the theory \(L(\Phi ,h)\) if its action \(S(\Phi ,h)\) is constant along C that is, \(S(C(t))=S(\Phi (t),h(t))=\)const. for all \(t\in {{\mathbb {R}}}\). Writing \((\Phi ,h):=(\Phi (0),h(0))\) and using physicists’ usual notation define “the infinitesimal variation of the action at \((\Phi ,h)\) along C” by

$$\begin{aligned} \delta _CS(\Phi ,h):= & {} \lim \limits _{t\rightarrow 0} \frac{1}{t}(S(C(t))-S(C(0)))=\int \limits _N\lim \limits _{t\rightarrow 0} \frac{1}{t}(L(C(t))-L(C(0)))\\=: & {} \int \limits _N\delta _CL(\Phi ,h) \end{aligned}$$

where \(\delta _CL(\Phi ,h)\in \Omega ^4(N)\) is the “infinitesimal variation of the Lagrangian at \((\Phi ,h)\) along C” (see Footnote 3). Assume that \((N,h)\) is smooth and let \(\Delta _h:=\text{d}\text{d}^*+\text{d}^*\text{d}\) denote its Hodge Laplacian; recall [24, Chapter 5] that if \(\omega \in \Omega ^4(N)\) the partial differential equation \(\Delta _h\varphi =\omega \) has a smooth solution \(\varphi \) if and only if \(\int _N\omega =0\). By definition of a symmetry \(\int _N\delta _CL(\Phi ,h)=0\) hence there exists an element \(\varphi _C\in \Omega ^4(N)\) satisfying \(\Delta _h\varphi _C =\delta _CL(\Phi ,h)\). However \(\Delta _h\varphi _C=\text{d}\text{d}^*\varphi _C +\text{d}^*\text{d}\varphi _C = \text{d}\text{d}^*\varphi _C\) consequently picking any \(\eta _C\in \Omega ^2(N)\) and putting

$$\begin{aligned} \theta _C:=\text{d}^*\varphi _C+\text{d}\eta _C \end{aligned}$$
(6)

we have succeeded in finding an element \(\theta _C\in \Omega ^3(N)\) such that \(\text{d}\theta _C=\delta _CL(\Phi ,h)\). Note first that, although \(\varphi _C\) is well-defined only up to a harmonic 4-form i.e. an element \(\varphi \in \text{ker}\,\Delta _h\), the 3-form \(\theta _C\) is not sensitive to this ambiguity because taking into account its harmonicity, the general solution \(\varphi \) above is both closed (\(\text{d}\varphi =0\)) and co-closed (\(\text{d}^*\varphi = 0\)) hence \(\theta _C=\text{d}^*\varphi _C+\text{d}\eta _C= \text{d}^*(\varphi _C +\varphi )+\text{d}\eta _C\). Secondly, the “gauge freedom” i.e. the \(\eta _C\)-ambiguity can be fixed as well by imposing the “Coulomb gauge condition” \(\text{d}^*\theta _C=0\). Indeed, \(\text{d}^*\theta _C=(\text{d}^*)^2\varphi _C+\text{d}^*\text{d}\eta _C=\text{d}^*\text{d}\eta _C=0\) (together with the Hodge decomposition theorem) implies \(\text{d}\eta _C=0\). Therefore, given a symmetry C of the theory we come up with a \(\theta _C\in \Omega ^3(N)\) which satisfies

$$\begin{aligned} \left\{ \begin{array}{lll} \text{d}\theta _C&{}=&{}\delta _CL(\Phi ,h)\\ \text{d}^*\theta _C&{}=&{}0\\ \int _N\text{d}\theta _C&{}=&{}0 \end{array}\right. \end{aligned}$$

along N and this 3-form is well-defined in the sense that it is unique and depends only on the symmetry represented by the curve C as expected (see Footnote 4).
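For the reader’s convenience, the first equation of this system follows from (6) in one line. The only inputs are \(\text{d}^2=0\) and the fact that \(\text{d}\varphi _C\in \Omega ^5(N)=\{0\}\) on a 4-manifold, so that \(\text{d}^*\text{d}\varphi _C=0\):

$$\begin{aligned} \text{d}\theta _C=\text{d}\text{d}^*\varphi _C+\text{d}^2\eta _C=\text{d}\text{d}^*\varphi _C =(\text{d}\text{d}^*+\text{d}^*\text{d})\varphi _C=\Delta _h\varphi _C=\delta _CL(\Phi ,h) . \end{aligned}$$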

Proceeding further, we call the Hodge dual 1-form \(j_C:=*_h\theta _C\in \Omega ^1(N)\) the Noether current associated with the symmetry C; moreover, for a (spacelike) hypersurface-without-boundary \(S\subset N\) put \(q_{C,S}:=\int _S*_hj_C\) and call it the Noether charge associated with the symmetry C. The Noether charge satisfies

$$\begin{aligned} q_{C,S_1}-q_{C,S_2}=\pm \!\!\!\!\!\int \limits _{W(S_1,S_2)}\text{d}\theta _C =0 \end{aligned}$$

by applying Stokes’ theorem on a domain \(W(S_1,S_2)\subseteqq N\) with induced orientation and oriented boundary \(\partial W(S_1,S_2)= S_1\sqcup (-S_2)\). (Here we strictly speaking assume that the variation vanishes on the complement \(N\setminus W(S_1,S_2)\) hence \(\int _{W(S_1,S_2)}\text{d}\theta _C=\int _N\text{d}\theta _C=0\) indeed.) Consequently the real number \(q_C:=q_{C,S}\) is a well-defined conserved (i.e. independent of the spacelike surface S) quantity associated with the symmetry of the theory in this sense.
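Spelled out, the conservation law is the following chain. It uses \(*_hj_C=*_h*_h\theta _C=\pm \theta _C\), where the sign depends on the signature of h (for a Riemannian 4-manifold the double Hodge dual acts on 3-forms as \(-1\)); this is why only \(\pm \) is recorded:

$$\begin{aligned} q_{C,S_1}-q_{C,S_2}=\int \limits _{S_1}*_hj_C-\int \limits _{S_2}*_hj_C =\pm \left( \,\,\int \limits _{S_1}\theta _C-\int \limits _{S_2}\theta _C\right) =\pm \!\!\!\!\!\int \limits _{\partial W(S_1,S_2)}\theta _C =\pm \!\!\!\!\!\int \limits _{W(S_1,S_2)}\text{d}\theta _C . \end{aligned}$$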

Thanks to the gauge fixing condition \(\text{d}^*\theta _C=0\) the Noether current looks like \(j_C=*_h\theta _C=\pm \text{d}*_h\varphi _C\) consequently \(j_C\in [0]\in H^1(N)\) i.e. the current represents the trivial cohomology class in the first de Rham cohomology. We may then ask ourselves what about \(\theta _C\) from the de Rham theoretic viewpoint? Does it represent a cohomology class in \(H^3(N)\)? Still working in the gauge \(\text{d}^*\theta _C=0\), assume \(\text{d}\theta _C=0\) holds; then via (6) we get \(\Delta _h\varphi _C=0\) implying \(\varphi _C\) is harmonic hence \(\theta _C=\text{d}^*\varphi _C =0\). Therefore we find that \(\text{d}\theta _C=0\) if and only if \(\theta _C=0\). Consequently \(\theta _C\in \Omega ^3(N)\) represents a cohomology class \([\theta _C]\in H^3(N)\) if and only if \(\theta _C=0\) and the associated Noether charge \(q_C=\int _S*_hj_C=\pm \int _S\theta _C=0\) is trivial rendering the classical Noether theory useless in this situation.

Let us focus attention on this trivial case i.e. when for a symmetry C of the theory \(L(\Phi ,h)\) the associated total derivative satisfies \(0=[\theta _C]\in H^3(N)\) (by exploiting the gauge fixing condition \(\text{d}^*\theta _C=0\) too). The gauge fixing also implies \(\text{d}^*\varphi _C=0\) as we have seen hence the general expression (6) reduces to

$$\begin{aligned} 0=\text{d}\eta _C \end{aligned}$$

saying that \(\eta _C\) itself represents a cohomology class in \(H^2(N)\). Consequently in this situation—which is trivial from the variational viewpoint in the sense that it yields vanishing primary Noether theory, but not trivial from the topological viewpoint in the sense that \(H^2(N)\not \cong \{0\}\) may hold—we can still introduce a secondary or topological Noether current \(J_C \in \Omega ^2(N)\) by putting \(J_C:=*_h\eta _C\). Then taking any two dimensional submanifold-without-boundary \(\Sigma \subset N\) the corresponding secondary or topological Noether charge \(Q_{C,\Sigma }:=\int _\Sigma *_hJ_C\) is well-defined in the sense that it depends only on the chosen de Rham cohomology class \([*_hJ_C]\in H^2(N)\). Moreover

$$\begin{aligned} Q_{C,\Sigma _1}-Q_{C,\Sigma _2}=\pm \!\!\!\!\! \int \limits _{W(\Sigma _1,\Sigma _2)}\text{d}\eta _C=0 \end{aligned}$$

by Stokes’ theorem as before. (This time \(W(\Sigma _1,\Sigma _2)\subset N\) is a sub-3-manifold with induced orientation and oriented boundary \(\partial W(\Sigma _1,\Sigma _2)= \Sigma _1\sqcup (-\Sigma _2)\).) Consequently we have a conserved quantity in the sense that the number \(Q_{C,\Sigma }=:Q_{C,[\Sigma ]}\) depends only on \([*_hJ_C]\in H^2(N)\) and the singular homology class \([\Sigma ]\in H_2(N;{{\mathbb {Z}}})\). Although it is not necessary, just for aesthetic reasons we can suppose without loss of generality that \(\eta _C\) is the unique harmonic representative of \([\eta _C]\) hence both \(\eta _C\) and \(J_C=*_h\eta _C\) are closed that is, represent cohomology classes within \(H^2(N)\).

Note that, regardless of what the symmetry C actually is, in order for \(Q_{C,[\Sigma ]}\) not to be trivial i.e., \(Q_{C,[\Sigma ]}\not =0\), we need \([0]\not =[\Sigma ]\in H_2(N;{{\mathbb {Z}}})\) as well as \([0]\not =[*_hJ_C]\in H^2(N)\). Both conditions are met if we demand N to satisfy the topological condition that the free part of its second singular homology group \(H_2(N;{{\mathbb {Z}}})_{\text{free}}\cong {{\mathbb {Z}}}^{\text{rk}\,H_2(N;{{\mathbb {Z}}})}\) be non-zero (i.e., the rank of \(H_2(N;{{\mathbb {Z}}})\) be non-zero). Moreover, at this level of generality all cohomology classes in \(H^2(N)\) are permitted to play the role of \([*_hJ_C]\) consequently the number of linearly independent secondary Noether currents is equal to \(b^2(N)\). Therefore if \(\text{rk}\,H_2(N;{{\mathbb {Z}}})>1\) that is, \(b^2(N)>1\) then we have “too many” options to introduce non-trivial secondary Noether charges for a given symmetry. Consequently, the optimal situation for this secondary theory is when \(\text{rk}\,H_2(N;{{\mathbb {Z}}})=1\) (at this level of generality).

To summarize, we have proved:

Lemma 3.1

(cf. [9, Lemma 3.1]) Let \((N,h)\) be a (non-)closed oriented (pseudo-)Riemannian 4-manifold satisfying the topological condition \(H_2(N;{{\mathbb {Z}}})_\text{free}\not \cong \{0\}\). Let moreover a classical relativistic field theory be given over \((N,h)\) defined by its Lagrangian density \(L(\Phi ,h)\in \Omega ^4(N)\). Assume that \(C:{{\mathbb {R}}}\rightarrow {{\mathscr {N}}}\) is a symmetry of the theory such that the corresponding total derivative \(\theta _C\in \Omega ^3(N)\) satisfying the gauge fixing condition \(\text{d}^*\theta _C=0\) is closed i.e. \(\text{d}\theta _C=0\) (hence \(\theta _C=0\) therefore in fact \([\theta _C]=0\in H^3(N)\)).

Then there exists a 2-form \(0\not =J_C\in \Omega ^2(N)\) representing a non-trivial de Rham cohomology class \(0\not =[*_hJ_C]\in H^2(N)\) as well as a closed oriented surface \(\Sigma \subset N\) representing a non-trivial singular homology class \([0]\not =[\Sigma ]\in H_2(N;{{\mathbb {Z}}})_{\text{free}}\) such that the associated quantity \(Q_{C,\Sigma }:=\int _\Sigma *_hJ_C\in {{\mathbb {R}}}\) is not zero and depends only on \([*_hJ_C]\in H^2(N)\) and \([\Sigma ]\in H_2(N;{{\mathbb {Z}}})_{\text{free}}\). We denote this quantity by \(Q_{C,[\Sigma ]}\) and call it the secondary or topological Noether charge associated with the symmetry C and the classes \([*_hJ_C]\) and \([\Sigma ]\). For a given symmetry C the number of linearly independent cohomology classes \([*_hJ_C]\) is equal to \(\text{rk}\,H_2(N;{{\mathbb {Z}}})\). \(\square \)

Note that what we have done is in fact simple: We interpret the a priori existing cohomology classes of N as certain physical quantities whenever a theory \(L(\Phi ,h)\), possessing certain types of symmetries, has been formulated over N.
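A concrete illustration of the topological hypothesis may be useful. The example is ours and serves only as a sketch: take \(N=S^2\times {{\mathbb {R}}}^2\), which is also the underlying manifold of the Kruskal extension of the Schwarzschild solution. Then \(H_2(N;{{\mathbb {Z}}})\cong {{\mathbb {Z}}}\) and \(b^2(N)=1\). Let \(\omega \in \Omega ^2(N)\) be the pullback of the area form of the unit round \(S^2\) along the projection \(N\rightarrow S^2\) and let \(\Sigma :=S^2\times \{0\}\). Under the assumption that \(*_hJ_C\) is chosen in the class \([\omega ]\) one gets

$$\begin{aligned} \text{d}\omega =0,\qquad Q_{C,[\Sigma ]}=\int \limits _\Sigma *_hJ_C=\int \limits _{S^2\times \{0\}}\omega =4\pi \not =0 , \end{aligned}$$

and by Stokes’ theorem the same value is obtained on every surface \(S^2\times \{p\}\) with \(p\in {{\mathbb {R}}}^2\), since these surfaces are homologous to \(\Sigma \).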

Let us apply this theory to diffeomorphisms in pure gravity in four dimensions. Let \(\nu _h\in \Omega ^4(N)\) defined by \(\nu _h:=*_h1\) be the volume-form of \((N,h)\) and take the usual Einstein–Hilbert Lagrangian \(L_{EH}(h):=(\text{Scal}_h-2\Lambda )\nu _h\) with cosmological constant \(\Lambda \in {{\mathbb {R}}}\) and consider its variation with respect to a 1-parameter subgroup \(\{f_t\}_{t\in {{\mathbb {R}}}}\) of the orientation-preserving diffeomorphism group \(\text{Diff}^+(N)\) of the underlying space-time manifold N while the metric h is kept fixed. That is, we define our curve by \(C(t):=f^*_th\in {{\mathscr {N}}}\) for all \(t\in {{\mathbb {R}}}\). The infinitesimal generator of this 1-parameter subgroup is a compactly supported vector field \(X\in C^\infty _c(N;TN)\). Since a diffeomorphism acts on k-forms via pullback and the scalar curvature is invariant under diffeomorphisms, \(L_{EH}(C(t))=L_{EH}(f^*_th)=f^*_tL_{EH}(h)\) hence the corresponding infinitesimal variation takes the shape

$$\begin{aligned} \delta _CL_{EH}(h)=\lim \limits _{t\rightarrow 0}\frac{1}{t}(f^*_t-1) L_{EH}(h)=L_X(L_{EH}(h)) \end{aligned}$$

where \(L_X\) denotes the Lie derivative with respect to X. Substituting the Lagrangian and applying Cartan’s formula we thus get

$$\begin{aligned} \delta_CL_{EH}(h) &=\text{d}\big(({\rm Scal}_h-2\Lambda )\nu_h\big)(X,\:\cdot)+ \text{d}\big(({\rm Scal}_h-2\Lambda )\nu_h(X,\cdot)\big) \\ &= \text{d}\big(({\rm Scal}_h-2\Lambda)\nu_h(X,\cdot)\big).\end{aligned}$$
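The first term drops out purely for degree reasons, which can be made explicit as follows: the Lagrangian is a 4-form on the 4-manifold N, so its exterior derivative is a 5-form and therefore vanishes identically,

$$\begin{aligned} \text{d}\big ((\text{Scal}_h-2\Lambda )\nu _h\big )=\text{d}\,\text{Scal}_h\wedge \nu _h+(\text{Scal}_h-2\Lambda )\,\text{d}\nu _h\in \Omega ^5(N)=\{0\} . \end{aligned}$$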

While \(\delta _CL_{EH}(h)\not =0\) in general, nevertheless we find that \(\delta _CS_{EH}(h)=\int _N\text{d}\big ((\text{Scal}_h-2\Lambda )\nu _h(X,\cdot )\big )=0\) by Stokes’ theorem hence the Einstein–Hilbert action itself is invariant; consequently diffeomorphisms are both off-shell and on-shell symmetries of general relativity with possibly non-vanishing cosmological constant. The associated total derivative up to an exact term looks like \(\theta _C=(\text{Scal}_h-2\Lambda )\nu _h(X,\cdot )\) in some gauge probably not satisfying the condition \(\text{d}^*\theta _C=0\).

In order not to get lost in the gauge fixing problem assume instead that (i) we are on shell i.e., Einstein’s equation \(\text{Ric}_h=\Lambda h\) is valid hence \(\theta _C=(4\Lambda -2\Lambda )\nu _h(X,\cdot )= 2\Lambda \nu _h(X,\cdot )\) and (ii) the cosmological constant vanishes yielding \(\theta _C=0\). Consequently the diffeomorphism symmetry in on shell pure gravity with vanishing cosmological constant has vanishing associated (primary) Noether charge. However substituting \(\theta _C=0\) into (6) (and referring to the Hodge decomposition theorem) we get \(\text{d}^*\varphi _C=0\) and \(\text{d}\eta _C=0\) consequently in this physically important situation we can interpret the cohomology classes \([\eta _C]\in H^2(N)\) as Hodge duals of currents \(J_C\) in secondary Noether theory.
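The factor \(4\Lambda \) used in (i) comes from tracing Einstein’s equation with the inverse metric in four dimensions; in index notation (introduced here only for this check):

$$\begin{aligned} \text{Scal}_h=h^{ab}(\text{Ric}_h)_{ab}=\Lambda \,h^{ab}h_{ab}=4\Lambda \quad \Longrightarrow \quad \theta _C=(\text{Scal}_h-2\Lambda )\nu _h(X,\cdot )=2\Lambda \,\nu _h(X,\cdot )\ \overset{\Lambda =0}{=}\ 0 . \end{aligned}$$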

As we stressed even in the formulation of Lemma 3.1, interesting secondary Noether theory emerges only if the underlying manifold is topologically non-trivial in the sense formulated there. At this point, by taking e.g. a survey on known solutions [22], we make an observation which is completely independent of our considerations taken so far (hence, in our opinion, very interesting!), namely: apparently all explicitly known 4 dimensional black hole solutions in vacuum general relativity with vanishing cosmological constant satisfy the topological condition formulated in Lemma 3.1. This intuitively means that because of some general reason a black hole is even topologically recognizable as a two dimensional “hole” in space-time. In fact with an appropriate restriction this observation can be proved [8] and can be considered as a global topological counterpart of well-known black hole uniqueness theorems [14]. In accordance with this provable version [8] we suppose from now on that: \((N,h)\) is a 4 dimensional solution of Einstein’s equation \(\text{Ric}_h=0\) and describes a single stationary asymptotically flat black hole; hence \(\text{rk}\,H_2(N;{{\mathbb {Z}}})=1\) (which apparently corresponds to the case that a “single” black hole is present). In this case the homology class of the “instantaneous” event horizon of the black hole as an (immersed) surface \(i:\Sigma \looparrowright N\) represents a non-zero element \([\Sigma ]\in H_2(N;{{\mathbb {Z}}})_{\text{free}}\cong {{\mathbb {Z}}}\).

Then we proceed as follows: like the original volume-form \(\nu _h\in \Omega ^4(N)\), the induced 2 dimensional area-form \(\sigma _h\in \Omega ^2(\Sigma )\) of the “instantaneous” event horizon \(\Sigma \) is closed; consequently it represents a class \([\sigma _h]\in H^2(\Sigma )\) which is not zero since the event horizon has finite area. Then exploiting singleness and stationarity, the “instantaneous” event horizon \(\Sigma \) is connected and its area \(\text{Area}_h(\Sigma )=\int _\Sigma \sigma _h\) is constant in time; consequently we can suppose that the area form is proportional to the Hodge dual of the secondary Noether current with a time-independent constant. In other words, with any choice \(J_C\in \Omega ^2(N)\) for the Hodge dual of the secondary Noether current the area form \(\sigma _h\) satisfies \(\mathrm{const.}\,[i^*(*_hJ_C)]=[\sigma _h]\in H^2(\Sigma )\cong {{\mathbb {R}}}\) therefore

$$\begin{aligned} \text{Entropy of the black hole in } (N,h)= & {} \mathrm{const.}\,\text{Area}_h(\Sigma )=\mathrm{const.}\!\!\!\int \limits _\Sigma \sigma _h= \mathrm{const.}\!\!\!\int \limits _\Sigma *_hJ_C\\= & {} \mathrm{const.}\,Q_{C,[\Sigma ]} \end{aligned}$$

offering a natural way to normalize \(Q_{C,[\Sigma ]}\) to be equal to the entropy of the black hole i.e.

$$\begin{aligned} \text{Entropy of the black hole in } (N,h)=Q_{C,[\Sigma ]}=\mathrm{const.} \,\text{Area}_h(\Sigma ) . \end{aligned}$$
(7)

Accepting this choice of normalization, a natural physical interpretation of this abstract secondary or topological conserved quantity also emerges, namely: if a 4 dimensional space-time \((N,h)\) is a solution of the vacuum Einstein equation with vanishing cosmological constant and describes a single stationary asymptotically flat black hole then the secondary Noether charge associated with the orientation-preserving diffeomorphism invariance of general relativity is not zero and as a secondary conserved quantity is equal to the entropy of the black hole (cf. [7, 27]). Thus the point is that in this way black hole entropy is conceptually connected with the invariance of the underlying differentiable manifold against its own diffeomorphisms.

We are now in a position to make our crucial observation: both the introduced set-theoretic entropy (4) previously (as the entropy of an abstract set in its manifold-macrostate) and the black hole entropy (7) here (as the Noether charge for the invariance of a stationary black hole space-time under diffeomorphisms) give rise to conserved quantities assigned to one and the same process namely the permutation of the points of differentiable manifolds by diffeomorphisms. This manifests itself in their common diffeomorphism invariance. However the two quantities, namely (4) and (7), even more resemble each other if the former is computed over a 3-manifold-with-boundary M, cast in a form of differences between areas via Theorem 2.3 yielding the two-sided inequality (5); and then this manifold-with-boundary M is inserted as a spacelike submanifold \((M,g_1)\) into a space-time 4-manifold \((N,h)\) containing a (not necessarily stationary or single) black hole such that \(\partial M=\Sigma \) corresponds to the “instantaneous” event horizon whose area \(\text{Area}_h(\Sigma )\) is therefore equal to \(\text{Area}_1(\partial M)\). Thus (5) written as \(\text{Area}_0(\partial M)\leqq \text{Area}_1(\partial M)\leqq \text{Area}_0(\partial M)+S(M)\) qualitatively implies

$$\begin{aligned} & \text{Entropy of the black hole in } (N,h)\text{ with ``instantaneous'' event horizon }\partial M\\ & \quad \approx \mathrm{const.}\,+\,\text{Entropy of } X \text{ in its manifold-macrostate } M\subset N \end{aligned}$$
(8)

where \(\mathrm{const.}\approx \text{Area}_0(\partial M)\) with respect to some reference spacelike submanifold \((M,g_0)\subset (N,h)\) as in Sect. 2 throughout. Thus (8) can be regarded as a decomposition of the physical black hole entropy, as it mathematically appears in general relativity, into the sum of an already recognized pure geometric term proportional to \(\text{Area}_0(\partial M)\) and an unexpected other term proportional to S(M) describing the inherent set-theoretic fuzziness of the mathematical model underlying general relativity. Since the right hand side of (8) contains the positive quantity (4) it follows that its left hand side (7) i.e. the entropy of a black hole cannot decrease, in accord with Hawking’s area theorem [11].

The time has come to complete the circle of our arguments. On the mathematical side, the formal concept of the arithmetical continuum (or the set \({{\mathbb {R}}}\) of real numbers) contains a tacit uncertainty or fuzziness in the sense that the effective identification of the arithmetical continuum with its individual disjoint constituents, the points, cannot be carried out (our interpretation of Theorem 2.1). On the physical side, the nowadays accepted mathematical formalization of our intuitive concept of the spatial or temporal continuum in terms of the arithmetical continuum lifts the purely formal (and, concerning its Leibnizian monadistic origin, metaphysical) entity, the same point again, to an ontological level. We may then ask ourselves whether or not this sort of description of space-time in a mathematical model of a physical theory introduces a similar uncertainty or fuzziness into the physical theory. Let us formulate our question more carefully. In the modern understanding, by a physical theory one means a two-level description of a certain class of natural phenomena: the theory possesses a syntax provided by its mathematical core structure and a semantics which is the meaning, i.e. the interpretation, of the bare mathematical model in terms of physical concepts. Consider a physical theory, like general relativity, whose semantics contains a description of space and time and whose syntax uses the arithmetical continuum to mathematically model the thing which corresponds to the space-time continuum at the semantical level of the theory. Then we may ask whether or not the uncertainty or fuzziness recognized at the syntactical level of the physical theory (introduced by the utilization of the arithmetical continuum) shows up at the semantical level of the physical theory too.
To answer this for a given physical theory, one has to search among those physical concepts which describe uncertainty, fuzziness, or disorder at the semantical level and check their counterparts at the syntactical level. Of course the basic physical concept of this kind is the entropy.

Therefore, in this context, one may ask whether entropy within classical general relativity, appearing in its semantics in the form of black hole entropy [2, 11], simply comes from its syntax, i.e. has a purely mathematical origin only (hence probably not corresponding to any “objective” thing in the world). This suspicion is also supported to some extent by the several controversial (geometrical, thermodynamical, quantum and information theoretic, etc.) properties of one and the same thing, the black hole entropy. Our analysis of the black hole entropy formula in general relativity, culminating in its formal decomposition (8) into the sum of a geometric term and a purely set-theoretic term, points at least in part towards a set-theoretic origin. That is, even if the geometric term in (8) indeed corresponds to the “physical part” of black hole entropy, the other term could be a “pure mathematical” or more precisely a “pure set-theoretic” contribution only. This could be an example of how certain physical statements within the physical theory of general relativity get “contaminated” by the underlying mathematical model, akin to the situation in quantum field theory [1].

However, beyond the “balanced” interpretation above, one can, if one prefers, also read (8) in two extreme ways, going in exactly opposite directions, as follows. The first is that black hole entropy is of purely mathematical origin without any physical content; hence e.g. the long-sought as well as quite problematic physical degrees of freedom responsible for black hole entropy (far from being complete, cf. e.g. [23]) would in fact not be physical at all but would simply coincide with the purely “mathematical degrees of freedom” of the point constituents of the arithmetical continuum used to formulate general relativity mathematically. We have to acknowledge that as long as the physical origin of black hole entropy is not confirmed by the experimental discovery of e.g. black hole radiation [12, 25] or other thermal phenomena, this possibility cannot be refuted a priori. The second extreme reading is that black hole entropy is of purely physical origin without any mathematical content, which arises if one rather wishes to accept the physical origin of black hole entropy (and e.g. looks forward to its experimental discovery). Then (8) can be interpreted as an argument for the “physical origin” (cf. Heisenberg uncertainty) of what we have called the inherent fuzziness or uncertainty within the set of real numbers [20, 21]. This interpretation could then explain the expected independence of the set-theoretic entropy (4) from any axiomatic system S, which is in accordance with the universal character of Theorem 2.1.