Bridging Conceptual Gaps: The Kolmogorov-Sinai Entropy

Massimiliano Badino
Universitat Autònoma de Barcelona
massimiliano.badino@uab.cat

1. Introduction

Names tell many things. In the case I am going to explore in this paper, for example, the name Kolmogorov-Sinai entropy (KSE) molds right away the character of my investigation. For it contains "entropy", which immediately makes us think about thermodynamics, statistical mechanics, and gaseous disorder. But it also refers to Andrei N. Kolmogorov, a name tightly connected with the axiomatization of probability and with research on algorithmic complexity. Finally, it alludes to Yakov G. Sinai, a mathematician less well known to philosophers, who has worked extensively on dynamical systems theory and the notion of ergodicity. This composite name labels a concept that lies at the confluence of various research traditions. And it is precisely this historical feature that makes possible the feature I allude to in the title of this paper. It has been claimed that the KSE plays the role of a bridge-concept between dynamical systems theory, statistical mechanics, and communication theory (Frigg 2004). My goal in this paper is to show that the qualities of this polymorphic concept are rooted in its convoluted history, i.e. it is because it has such a history that it has such characteristics. I will therefore unfold the several threads that make it up and I will tell, alas cursorily, a long-term story covering approximately the period 1750-1960. Obviously, I will be sketchy and I will focus only upon the turning points of this story. KSE emerges from two distinctive lines of research. On the one side, there is the research on the stability of the three-body system developed in celestial mechanics. This line, from its origins to Poincare, is covered in the second section. On the other side, there is the problem of ergodic motion in statistical mechanics. This problem forms the subject of section 3. In the 1930s, George D.
Birkhoff showed surprising connections between these lines (section 4) and to his results Kolmogorov and Shannon added the dimension of information and algorithmic complexity (section 5). Finally, before concluding I will show how these traditions come together to form the multifaceted KSE.

2. Celestial Mechanics

The introduction of the universal law of gravitation brought about new opportunities as well as new puzzles. In the pre-Newtonian world the perfection of the heavenly motions was, so to say, built into the system itself. Planetary motions depended on their pre-conceived trajectories only. Newton introduced the idea that the complicated motions of the heavenly bodies were the effect of the mutual interactions of the bodies themselves. As a consequence, these interactions could even give rise to catastrophic events. For the first time in history, the intrinsic stability of the universe was no longer a given. Newton was the first one to raise the problem of the stability of the solar system: Is it possible to show analytically that gravitational interactions between the planets will never produce a collision, an expulsion or anything of the kind? However, only with the development of advanced methods of differential calculus did it become possible to define the issue in a mathematically tractable way. To fix ideas, mathematicians focused upon the somewhat artificial, but well-defined, three-body problem. In its most popular version, this problem consisted in calculating the behavior of two massive bodies in interaction with each other and with a third body of negligible mass. Physically, the situation corresponds roughly to systems such as the sun, the earth, and the moon. In the early 1770s, Joseph Louis Lagrange managed to write down the reduced equations of motion of three bodies by eliminating the degrees of freedom corresponding to the known mechanical integrals (conservation of energy, of linear and angular momentum).
This result was further improved in the mid-1800s by Carl Gustav Jacobi, who added the elimination of the nodes. In addition, between the late 1770s and the late 1780s, Pierre Simon Laplace proved a series of results which suggested the stability, at least within some approximations, of the three bodies. In spite of a great display of ingenuity, mathematicians were not able to proceed any further. For the practical purposes of astronomy, approximate solutions in the form of trigonometric series were developed, but the closed-form solution of the three-body problem remained out of reach.1 Arguably, this state of affairs spoke more for the intrinsic limitations of the analytical tools hitherto used (transformation of coordinates, trigonometric series, Hamilton-Jacobi theory, to mention only a few) than for the definite insolubility of the question. Henri Poincare (1854-1912) explored the issue at length and introduced two crucial turning points in the history of the stability of three-body systems. The first turning point concerned the use of new powerful mathematical techniques borrowed mainly from topology, geometry, and set theory. Poincare's decisive intuition was that, to solve the stability problem, one does not need to provide a full-fledged, closed-form solution of the equations of motion. We do not need to know the configuration of the system at each and every instant: we only want to know whether it will remain in the vicinity of a periodic motion. In other words, stability is largely a qualitative property of the system (or, put in mathematical jargon, a topological one) and can be better investigated by means of qualitative methods. Among the many new methods developed by Poincare, likely the most impressive and most widely deployed is the so-called cross-section method, also known as the Poincare map. The cross-section method is a nice illustration of the idea behind topological techniques. Since the very beginning, celestial mechanics focused upon closed periodic trajectories.
Periodicity was not only mathematically simple, it was also an observable feature of planetary motion. Now, in general a mechanical trajectory is studied in a 3-dimensional space. This space may be the usual geometrical one (in which case we know where the body is at each instant) or the more abstract phase space (in which case we know what momentum the body has at each point). To give an answer to the issue of stability, one has to perturb a periodic trajectory by a small quantity and see whether the ensuing motions stay indefinitely close to the original one or diverge. Poincare realized that this investigation could be simplified. If a surface is put transversally to the periodic trajectory, this trajectory will intersect the surface in one and only one point because of its periodicity. In this way any problem concerning the stability of trajectories in a 3-dimensional space can be reduced to problems concerning an equilibrium point (EP) on a 2-dimensional surface. For the neighboring trajectories will draw on the surface curves that can approach the EP or diverge from it, and their behavior can be investigated with the techniques used for the equilibrium around a point.

1 Despite its age, one of the best introductions to the history of the three-body problem is still Marcolongo (1919). On the notion of stability and its bearings on chaos see Diacu and Holmes (1996).

Important and consequential though they were, the topological techniques were only one of Poincare's crucial innovations. Another one was an entirely new notion of stability. The founding fathers of celestial mechanics had been working with an intuitive notion of stability and then had tried to cast it in proper mathematical language. The intuitive notion is that the solar system is stable because it "stays together": the relative configurations of the planets always repeat themselves and no catastrophic event takes place.
True, there are anomalies (delays or anticipations in the passage of planets), but they are all periodic in the sense that they depend on the mutual positions of the planets and they will present themselves again after a certain number of years. Central to this intuitive notion of stability is the priority of periodic motion. Everything happening in the sky must eventually be reduced to some (possibly very complicated) form of periodicity. Mathematically, this idea was implemented by the technique of solving differential equations by means of trigonometric series. As said, it was impossible to find a complete solution of the equations of motion for three bodies, i.e. an explicit function of the orbit parameters. Alternatively, mathematicians tried (1) to cast the problem in terms of small perturbations of an already solved question (e.g. the restricted three-body problem mentioned above, which is a perturbation of a solvable Keplerian two-body problem) and (2) to express the solving function as a (rapidly convergent) series of trigonometric functions. Now, time must obviously appear in the solution, because one is looking for orbits, that is, sequences of places passed through over time. As long as time appears as the argument of trigonometric functions, stability is assured. These functions are bounded (they never take arbitrarily large positive or negative values), so the three bodies will stay close to one another. But if time appears as a factor of a trigonometric function, then one is in trouble. Such a configuration is called a secular term and, being the product of a bounded quantity and an ever-growing one, it is doomed to diverge. Thus, the mathematical expression of stability regularly used in celestial mechanics was the following: no secular terms appear in the trigonometric series approximating the solution of the equations of motion. Poincare significantly extended this conception.
In his studies on the solution of differential equations, he introduced a fully new notion of stability with the following words:

It happens then that the trajectory cannot be a closed curve; but, nevertheless, it keeps a certain stability: one can even say that it is a periodicity of a particular nature. In fact, let M be a point in the trajectory that the moving point occupies in an instant t. We trace a circle around the point M with an arbitrarily small radius r. The moving point starting at M will obviously go beyond the circle, but it will cross again this small circle an infinite number of times, no matter how small r can be.2

In other words, the trajectory is stable according to this definition if (and only if) it returns arbitrarily close to the initial condition M, no matter how complicated and long the in-between path is. This notion of stability, which Poincare ascribes to Poisson with a considerable amount of historical inaccuracy, is discussed at length in the third volume of the Methodes Nouvelles (Poincare, 1899) and expresses the property of recurrence of some trajectories. This is the first case in which stability is not studied in the context of strictly periodic motion only. Recurrent motion will become extremely important later in our story. But the most spectacular of Poincare's discoveries is still to come. From the early 1880s, Poincare devoted his best efforts to the three-body problem. In 1889 a prize was offered in celebration of King Oscar II of Sweden's birthday to the mathematician able to show whether further integrals could be found to reduce the number of degrees of freedom of the equations of motion. Poincare won the prize although his original memoir contained a serious mistake that was discovered only during the proofreading. However, the details of this story interest us here less than the results of his work.3 Poincare indeed showed that it was impossible to find analytical integrals other than those already known.
This amounted to a mathematical proof that a closed-form solution of the equations of motion could not be discovered. But the most intriguing result came out of the application of the cross-section method to the behavior of trajectories in the neighborhood of a periodic orbit. To understand the full extent of Poincare's finding, we have to introduce some more technical notions.

2 Poincare (1885, 92).
3 This of course does not mean that they are unimportant. A lively account of this interesting story can be found in Barrow-Green (1997).

The equations of motion can be studied just as a system of differential equations, each solution of which corresponds to a possible trajectory. (Poincare was actually the first to call the solution of a system of differential equations a "trajectory".) On a Poincare map, a periodic solution is a point. This EP is surrounded by trajectories that start from far away and tend to it and by trajectories that start close to it and tend to go away from it. We call these trajectories "asymptotic" because they can both be considered as approaching the EP, each in one of the two temporal directions. (Mathematicians have a very cavalier attitude toward time.) For intuitive reasons, the set of trajectories approaching the EP in the positive direction of time is called the "stable manifold", whereas those approaching the EP in the negative direction belong to the "unstable manifold". In integrable Hamiltonian systems these manifolds coincide, that is, the distinction is reduced to the temporal direction. Poincare's initial mistake was to suppose that this situation holds without exceptions. Later he discovered that the manifolds can intersect transversally; a point of intersection is called a "homoclinic point" (HP). Around an HP a lot of weird things happen. Successive applications of the Poincare map to an HP generate trajectories that tend to equilibrium in both directions of time. In other words, the trajectory is recurrent along an extremely complicated path.
Furthermore, the manifolds intersect infinitely many times, thus originating infinitely many HPs and an intricate entanglement of trajectories. Poincare's description of the "homoclinic tangle" gives us a sense of the awe he was in:

When we try to represent the figure formed by these two curves and their infinitely many intersections, each corresponding to a doubly asymptotic solution, these intersections form a type of trellis, tissue or grid with infinitely fine mesh. Neither of the two curves must ever cut across itself again, but it must bend back upon itself in a very complex manner in order to cut across all of the meshes in the grid in an infinite number of times. The complexity of this figure is striking, and I shall not even try to draw it. Nothing is more suitable for providing us with an idea of the complex nature of the three-body problem, and of all the problems of dynamics in general.4

The homoclinic tangle is the first example of what in the 20th century will come to be called "deterministic chaos". Very diverse trajectories are so closely packed that the slightest change in the initial conditions will lead to a different trajectory and, potentially, to a completely different behavior of the system. Thus, the evolution is virtually unpredictable, because the required precision in the description of the initial conditions is impossibly high.

4 Poincare (1899, 1059).

3. Statistical Mechanics

As celestial mechanics was reaching its dramatic climax with Poincare's work, another newly born branch of mechanics was taking physicists and mathematicians through a terra incognita. The origins of this development lie in the attempts, about the mid-19th century, to explain thermodynamic phenomena by means of kinetic models. Roughly, the idea behind these models is that heat is due, and reducible, to the mechanical motion of the microscopic constituents of matter.
To substantiate this idea it was necessary to derive thermodynamic laws from the analysis of the motion and collisions of the particles and, since the number of particles is enormously large, to resort to statistical techniques. The Scottish physicist James Clerk Maxwell (1831-1879) was one of the first and most successful to explore this line of inquiry. He found, for instance, that the stability of the state of thermal equilibrium could be mimicked by a distribution of velocities among the particles which is not changed by mechanical collisions. However, there was a problem that Maxwell was not able to solve. It is a common experience that thermal systems go straight to their equilibrium state and there they stay until some external perturbation forces them out of equilibrium. This state of affairs is customarily referred to as the second law of thermodynamics. Now, it is difficult to represent the one-directionality of this behavior through mechanical motion, because the latter does not recognize any privileged temporal direction. In other words, if the reaching of thermal equilibrium boils down to a mechanical process, then it is unclear how it can also be irreversible, since mechanical processes in general are not. A possible solution to this puzzle was offered by Ludwig Boltzmann (1844-1906) in 1872. Boltzmann derived an integro-differential equation able to represent the time evolution of the distribution function f. Further, he showed that a functional $H = \int f \log f \, dv$ (integration is extended over all possible velocities v) can be defined, which decreases monotonically as time passes and reaches its minimum when f is precisely Maxwell's equilibrium distribution. Keep in mind the form of H: it will crop up again later. Boltzmann's miraculous result is not a purely mechanical one. He added a great deal of probabilistic assumptions and statistical arguments. How did he manage to combine mechanics and probability?
Well, he understood that this task demanded a step beyond the usual periodic motion and towards a new kind of mechanical trajectory. The first trace of this line of thought can be found in Boltzmann's very first paper. Let me briefly summarize his argument. Boltzmann is trying to show that the second law of thermodynamics can be formally reduced to the principle of least action. The details of this procedure do not interest us here. What is important is that Boltzmann has to calculate the action integral over the particles' trajectories. In general, this integral depends on the initial and final conditions of the gas, conditions that we cannot know because of the huge number of particles. Before Boltzmann, the usual technique was to assume that the particles' motion is periodic and closed, and to integrate over the whole period so that initial and final conditions are equal and cancel each other out. But Boltzmann moves a step forward. He realizes that only closure is essential for this argument, while we do not need to assume a fixed period. He states his assumption in the following way:5

We now assume that, after a certain time, each atom will come back [...] in the same position, with the same velocity and direction of motion, that is to say it describes a closed curve and after that time it repeats its motion albeit not exactly in the same way, at least in a way such that the average kinetic energy [on the given temporal interval] might be considered as the average kinetic energy on an arbitrarily long interval.6

The key passage has been italicized. Here Boltzmann is envisaging a trajectory that is not strictly periodic, but nevertheless closes at some point. In other words, a trajectory that does not pass through a fixed and immutable sequence of points, but can be very complicated provided that, sooner or later, it will pass again through the initial conditions.
He is clearly groping for something new in mechanics, something that he will try to make clearer in his following papers. In the early 1870s, Boltzmann drew on this novel concept of trajectory and introduced what is now known as the ergodic hypothesis: If a gas evolves freely with no other constraint than the conservation of energy (and possibly momentum), then it will sooner or later pass through all the physical states compatible with the constraints. The ergodic hypothesis became the key to merging mechanics and probability: it is a dynamical assumption because it concerns the trajectory, but it can also be used to support a probabilistic analysis of the long-term behavior of the system.7

5 Quotations and references to Boltzmann's papers are taken from the Wissenschaftliche Abhandlungen.
6 Boltzmann (1866, 24).
7 On these issues see especially Badino (2009) and Badino (2011). On the ergodic hypothesis see also Von Plato (1991) and Von Plato (1994).

4. Dynamical Systems

The two traditions I have been discussing so far proceeded almost independently for the whole second half of the 19th century. From the physical viewpoint this is unsurprising. Celestial mechanics deals with macroscopic deterministic systems consisting of few degrees of freedom (a handful of planets, a few more satellites). By contrast, statistical mechanics tackles microscopic systems with a huge number of constituents. The two contexts could not be more different. However, from a purely mathematical point of view, and with a bit of hindsight, this mutual indifference is bewildering. For a close analysis of these two fields of research shows that there was a great deal of common mathematics. Both fields were characterized by the impossibility of finding a complete solution to a mechanical question, so techniques to circumvent this hurdle popped up in both camps, more often than not without stimulating any further curiosity.
A striking and somewhat extreme example is the concept of integral invariant. The integral invariant is a function of the phase coordinates whose integral does not change during the motion. Boltzmann introduced it in his (rather spurious) attempt to prove Liouville's theorem,8 and made it one of the ingredients of his first probabilistic argument for irreversibility. Subsequently, Poincare rediscovered, formalized, and deployed it to prove the recurrence theorem, which states that, under some very general constraints, an isolated Hamiltonian system will come back, sooner or later, arbitrarily close to the initial conditions.9 Now, in 1896 Ernst Zermelo used the recurrence theorem to argue against Boltzmann's probabilistic view of irreversibility.10 Thus, Zermelo used a result derived from the integral invariant in celestial mechanics to argue against a result derived from the same integral invariant in statistical mechanics! Boltzmann saw how hopelessly paradoxical the situation was when he replied bitterly:

Although Herr Zermelo's paper shows that my works have not been understood at all, I have to rejoice in it anyway, for that is the proof that, in Germany, they have been paid any attention to, at least.11

8 In effect, Boltzmann proves only a particular case of Liouville's theorem. The phase volume is famously an example of an integral invariant; cf. Boltzmann (1868) and Badino (2009).
9 In other words, the recurrence theorem formalizes the condition for the Poisson stability discussed above.
10 See Zermelo (1896).
11 Boltzmann (1896, 773).

Even the most prominent scientists, Poincare and Boltzmann, did not consider this commonality worth further inquiry. But George D. Birkhoff (1884-1944) thought differently. Birkhoff's first research paper was published in 1912, the year of Poincare's death. And indeed, Birkhoff's entire research program was inspired by the work of the French mathematician.
His paper contains, in the first few lines, the two keywords of his grand project.12 First, he wanted to establish a new and more general branch of mathematics concerned with the deep formal structure underlying the analysis of mechanical systems. The theory of dynamical systems was born. Second, drawing on Poincare's intuition, he generalized the notion of periodicity into the idea of recurrent motion, that is, a motion that, sooner or later, comes back to the initial conditions.13 Some years later, he would explain the essence of this idea as follows:

In a very deep sense the periodic motions bear the same kind of relation to the totality of motions that repeating doubly infinite sequences of integers 1 to 9 such as ... 2323... do to the totality of such sequences. [...] The recurrent motions correspond to those double sequences specified above in which every finite sequence, which is present at all occurs at least once in every set of N successive integers of the sequence.14

This research program led, in the early 1930s, to a surprising result. To understand its importance, we have to go back some years. We said that Boltzmann introduced the ergodic hypothesis in statistical mechanics. Albeit problematic, this hypothesis was particularly useful because it made it possible to prove, among other things, the uniqueness of the equilibrium distribution function. More generally, from the ergodic hypothesis a remarkable property followed: the average value of a quantity (e.g. the energy) calculated over the trajectory of the system during a very long time (the time average of that quantity) is equal to the instantaneous average calculated over a large number of copies of the system in the most different initial conditions (the phase average of that quantity). Unfortunately, the hypothesis turned out to be false. In 1913, Arthur Rosenthal and Michel Plancherel proved, independently, that no mechanical trajectory could be ergodic in the original, Boltzmannian sense.
This result did not discourage physicists, who continued to assume an intuitive notion of ergodicity and to believe in its consequences, such as the uniqueness of equilibrium. But it certainly opened a breach in the formal structure of statistical mechanics.

12 Cf. Birkhoff (1912).
13 On the thread that leads from Poincare to Birkhoff see Roque (2011).
14 Birkhoff (1920, 54-55).

Birkhoff was not interested in statistical mechanics, but his research eventually, and serendipitously, repaired that breach. In 1928 he introduced a new concept, that of metric transitivity. Birkhoff was studying the properties of recurrent mechanical transformations (i.e. transformations from the phase space onto itself with the properties of mechanical trajectories) and he defined as metrically transitive a transformation that cannot be contained in any proper subset of the phase space with positive measure. By definition, a transformation takes place on its phase space, constructed from the general constraints. Many transformations, however, occupy only a small portion of this space. For instance, the motion of a planet passes through a sequence of points that make up a subset of its phase space: given its energy and momentum, the planet could theoretically perform many other motions. Now, if it is not possible to find a subset of the phase space with positive volume that completely contains the trajectory or, put differently, if the only such set is the phase space itself, then the transformation is metrically transitive. In 1931, Birkhoff was able to prove that, if a transformation is metrically transitive, then the time average of a quantity calculated on that transformation is equal to its phase average. In other words, this result, commonly known as the ergodic theorem, showed that at least some of the consequences of the ergodic hypothesis were obtainable from a new assumption, metric transitivity, that is not provably false.
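The equality of time and phase averages can be illustrated numerically. The following sketch is not drawn from Birkhoff's work: it assumes, as a standard textbook case, the rotation of a circle of unit length by an irrational angle, which is metrically transitive with respect to length, and a hypothetical observable g(x) = cos²(2πx), whose phase average is 1/2. All names are illustrative.

```python
import math

def time_average(g, x0, alpha, n_steps):
    """Average g along the orbit x0, x0 + alpha, x0 + 2*alpha, ... (mod 1)."""
    total = 0.0
    for n in range(n_steps):
        # Compute each orbit point directly to avoid accumulating rounding error.
        x = (x0 + n * alpha) % 1.0
        total += g(x)
    return total / n_steps

def phase_average(g, n_cells):
    """Midpoint-rule approximation of the integral of g over [0, 1)."""
    return sum(g((k + 0.5) / n_cells) for k in range(n_cells)) / n_cells

def observable(x):
    """Hypothetical observable; its exact phase average is 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2

# Assumed rotation angle: the golden-ratio conjugate, an irrational number.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0

t_avg = time_average(observable, x0=0.1, alpha=ALPHA, n_steps=200_000)
p_avg = phase_average(observable, n_cells=10_000)
print(f"time average  = {t_avg:.4f}")
print(f"phase average = {p_avg:.4f}")
```

With a rational angle such as 0.5 the orbit visits only two points, a set of zero length, so the rotation is not metrically transitive; the time average then depends on the starting point and need not equal the phase average.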
This feat of Birkhoff followed as a largely unintended outcome of his extension of Poincare's line of thought and gave rise to a new branch of mathematics nowadays known as ergodic theory.15 This is where the part of our story related to mechanics stops. A complex tradition of studies on the abstract properties of mechanical systems climaxed in a result that unified two apparently unrelated fields of research. Now the reader might wonder what happened to the "probabilistic" component of statistical mechanics. Probability is about to appear again in the next section.

15 On the ergodic theory see Sklar (1993), Badino (2006).

5. Probability, Information, Computability

Almost at the same time as Birkhoff's ergodic theorem, an ocean and almost a continent away, Andrei N. Kolmogorov (1903-1987) proposed what is still today the accepted axiomatization of probability theory. Although they had been in use for many years, probabilistic concepts had often been ill-understood at best, badly misunderstood at worst. As a consequence, both in mathematics and in physics, probability was used following intuition more than a rigorous mathematical procedure. Boltzmann, to cite just one example, deployed different implicit definitions of probability and committed to none. At the beginning of the 20th century Emile Borel and David Hilbert surmised that measure theory, a new and powerful mathematical resource introduced by Henri Lebesgue, could be particularly apt to illuminate the field. In 1928, these ideas were taken up by Kolmogorov, who at that time was groping for a logically clearer systematization of probability theory able «to distinguish those elements of probability (theory) that will determine its internal logical structure» (Kendall 1989, 884). The results of these efforts were published in 1933. Kolmogorov's axiomatic structure of probability theory considers events from a set-theoretical point of view and probability from a measure-theoretical one.
Let's assume that E is the set of elementary events and F a collection of subsets of E, whose members are called random events. The following axioms are laid down:

1. F is a field (that is, it is closed with regard to union, intersection, and complement).
2. F contains the set E.
3. To each set A of F, a non-negative real number P(A) is assigned, called the probability of A.
4. P(E) = 1.
5. If A and B are disjoint sets, then P(A ∪ B) = P(A) + P(B).

According to these axioms, the probability function is just a normalized measure of the size of a set. Kolmogorov's axioms allowed a consistent systematization of the known results of probability theory and were therefore broadly accepted by the mathematical community. Furthermore, the connection between probability and measure opened up an important network of relations with other branches of mathematics relying heavily on measure theory, such as the theory of dynamical systems and information theory. It's a feature of Kolmogorov's genius that he was able to see crucial and deep conceptual similarities between seemingly unrelated fields. Like Birkhoff, who managed to unify a research tradition on celestial mechanics with the ergodic problem by unfolding the deep-seated common mathematical structure, Kolmogorov perceived that a rigorously axiomatized probability theory could provide a universal language to handle a whole spectrum of questions. However, to understand Kolmogorov's treatment of these questions, and the way in which they wound up in the KSE, we have to make a small detour and come back to the United States. Famously, World War II stimulated a great deal of cutting-edge mathematical work. One example was the work carried out by Claude Shannon (1916-2001) at the Bell Laboratories. Driven by military purposes, Shannon elaborated a mathematical infrastructure to represent the general process of transmitting data and deriving information from them.
This work was classified during the war and published only in 1948.16 Shannon's stroke of genius was to define the information attached to a message as "removed uncertainty". This idea is at once deep and easy to understand.17 Let us assume that I pick up the newspaper to find out which horse won the race. The information conveyed by this piece of news depends on how uncertain I am about the result. If I think that there's a very high probability that a given horse won, I will give the newspaper a somewhat casual glance. But if I think that the chances were more or less even among the participants, I will be very eager to know the outcome. The uncertainty removed by the information is higher and so is its value. Shannon's second step was to translate this idea into a mathematically tractable quantity. To fix ideas, let's assume that our language consists of N symbols $x_1, \dots, x_N$, which may occur with different probabilities $P(x_1), \dots, P(x_N)$. By imposing some general and reasonable constraints, Shannon concluded that the amount of information associated with receiving one of those symbols is $H = -\sum_{i=1}^{N} P(x_i) \log P(x_i)$. The similarity between this expression and Boltzmann's H-function is patent. Consequently, Shannon called the amount of information, i.e. of removed uncertainty, entropy. We start now to see the path that leads to the KSE. In statistical mechanics, entropy is the measure of the disorder of a system. From a kinetic point of view, the system is at thermal equilibrium when it is spread over the allowed phase space and its energy is as equally divided among the particles as possible. That is the case in which we are most uncertain about where to find the particles and the corresponding information is most valuable. Thus, there's an intuitive relation between being in a disordered state and the amount of information concerning the specificities of this state.
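The horse-race example can be made concrete with a short computation. This is a minimal sketch, assuming base-2 logarithms (so that information is measured in bits) and two hypothetical probability assignments over eight horses; neither the numbers nor the function name come from Shannon's paper.

```python
import math

def shannon_entropy(probabilities):
    """H = -sum of p_i * log2(p_i), with the convention 0 * log 0 = 0."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Eight horses with even chances: maximal uncertainty about the winner.
even_race = [1 / 8] * 8
# A clear favourite (hypothetical odds): the result is almost known in advance.
favourite_race = [0.93] + [0.01] * 7

h_even = shannon_entropy(even_race)
h_favourite = shannon_entropy(favourite_race)
print(f"even race:      {h_even:.3f} bits")
print(f"favourite race: {h_favourite:.3f} bits")
```

The even race yields log2(8) = 3 bits, the largest value possible for eight outcomes, while the race with a favourite yields well under one bit: reading the result removes far less uncertainty.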
Shannon's surprising result was to show that this intuition could be pushed to the extent of being captured by the same mathematical function.

[16] Shannon's work is today available both in its original form, Shannon (1948), and in a more introductory arrangement, Shannon and Weaver (1949).
[17] On information theory see the classical Cover and Thomas (1991). For a discussion of the philosophical meaning of information see Badino (2004).

The notion of information entropy can be easily generalized from individual symbols to messages considered as arbitrarily long sequences of symbols. Shannon realized that sometimes the messages so generated are redundant, i.e. the same amount of information can be conveyed by means of fewer symbols. For example, it is often the case that a sentence in English is understandable even though the vowels are taken out. This process of reducing the length of the message is called "coding". When a message is encoded, its original appearance gets modified according to an algorithm. The encoded message carries the same amount of information, which can be fully retrieved by decoding it. However, transmission through a communication channel may alter the message. The problem that Shannon tackled in his first paper was: is there a way to code the message such that it is always possible to retrieve its original amount of information despite the channel noise? The answer was given by the Shannon theorem: if the entropy of the message does not exceed another quantity called the capacity of the channel, a coding can always be found that makes the probability of a decoding error arbitrarily small. Of course, the higher the noise, the more complex the coding and, moreover, the longer the time required for coding and decoding. Shannon's information theory has brought to us a new concept: the algorithm. And with this new concept we get back to Kolmogorov.
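The idea of redundancy and coding can be sketched with a general-purpose compressor. This is only an illustration under stated assumptions: zlib's DEFLATE algorithm is a modern stand-in, not Shannon's own coding scheme, and the two test messages, the fixed random seed, and the helper name `coded_length` are invented for the example.

```python
import random
import zlib

def coded_length(message: bytes) -> int:
    """Length in bytes of the message after zlib coding at the highest level."""
    return len(zlib.compress(message, 9))

# A highly redundant message: the same sentence repeated 200 times.
redundant = b"the quick brown fox jumps over the lazy dog. " * 200

# A patternless message of the same length (pseudo-random bytes, fixed seed).
rng = random.Random(0)
patternless = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

for name, message in (("redundant", redundant), ("patternless", patternless)):
    coded = zlib.compress(message, 9)
    # Decoding retrieves the original message in full.
    assert zlib.decompress(coded) == message
    print(f"{name}: {len(message)} bytes -> {len(coded)} bytes")
```

The redundant message shrinks to a small fraction of its length, while the patternless one does not shrink at all. In both cases decoding returns the original message, which is the sense in which the coded version carries the same information.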
Since his early papers, Kolmogorov was concerned with the notion of complexity. For instance, when he was 19 years old, he investigated the structure of Fourier series to understand to what extent they could approach a random behavior. From 1950 onwards, following the publication of Shannon's work, he increasingly focused upon the relation between complexity and information. Later, in 1987, he went as far as claiming that the usual order, which sees probability theory as a fundamental starting point, should be turned upside down:

"Information theory must precede probability theory, and not be based on it. By the very essence of this discipline, the foundations of information theory have a finite combinatorial character. The applications of probability theory can be put on a uniform basis. It is always a matter of consequences of hypotheses about the impossibility of reducing in one way or another the complexity of the description of the objects in question. Naturally, this approach to the matter does not prevent the development of probability theory as a branch of mathematics being a special case of general measure theory. The concepts of information theory as applied to infinite sequences give rise to very interesting investigations, which, without being indispensable as a basis of probability theory, can acquire a certain value in the investigation of the algorithmic side of mathematics as a whole."[18]

Information theory provided Kolmogorov with a concrete display of the potentialities of probability and the generality of the concept of entropy, but it was Alan Turing's work that led him to a manageable definition of complexity. That was formulated in 1965.[19] Let's think of a message as a sequence of symbols. Now, a Turing machine can "compute" it, i.e. the machine can reproduce it fully when given a suitable program, called an algorithm. Intuitively, the more uniform the sequence, the shorter the algorithm to be provided to the Turing machine for computation.
For example, a sequence like "11111..." is computable by means of the simple instruction "write a 1". A slightly more complicated sequence like "121212..." is reproduced by means of "write a 1 and then write a 2". You get the idea: as the sequence approaches a genuinely random one, the complexity of the algorithm increases. In this vein, Kolmogorov defines the intrinsic complexity of a sequence as the length of the shortest program that would allow a Turing machine to compute the sequence. Consequently, a sequence is truly random when the algorithm is as long as the sequence itself. In other words, the only way to have a Turing machine do the job is to feed it the sequence itself.

This idea reminds us of Shannon's notion of coding. The coding procedure enables us to encapsulate the information of a message into a shorter list of symbols in a way that the initial message can always be unequivocally retrieved. If the message is just a random bunch of symbols, no rule can be discerned and no coding is possible, other than the trivial one that codes the message into itself. Kolmogorov was therefore able to single out the essence of Shannon's idea and to translate it into a new field, the theory of complexity.

In addition, the common language of probability in terms of measure theory pointed to other territories. In the same years Kolmogorov worked extensively on dynamical systems and laid down the foundations of what is today known as the KAM theorem (after Kolmogorov, who first formulated the idea, the Soviet mathematician Vladimir Arnold, and the German mathematician Jürgen Moser, who provided a proof and a generalization). The KAM theorem challenges the intuition that if we have a stable trajectory and we perturb it, the result will tend to ergodicity. Kolmogorov, Arnold and Moser showed that there is indeed an entire class of trajectories (called "invariant tori"), which remain substantially unchanged by (small enough) perturbations. Thus, in Kolmogorov's flexible mind, notions of information, complexity, and dynamical system formed a conceptual cluster in which he saw more the formal similarities than the differences. It is this appreciation for the mathematical structures that made the KSE possible.

[18] Cover, Gacs and Gray (1989, 840-841).
[19] See Kolmogorov (1965).

6. Kolmogorov-Sinai Entropy

Looking back at the story I have been telling you so far, we can pinpoint three distinct notions of randomness. Celestial mechanics focused on periodic motion and led progressively to chaos theory. Chaotic trajectories are ultimately deterministic, but they appear random in the sense that they are unstable, very sensitive to the initial conditions (small initial perturbations originate hugely different final states), and unpredictable. Statistical mechanics, on the other hand, introduced a new kind of motion, the ergodic one, which was contorted and "random" from the start. In addition, statistical mechanics uses probability and statistical tools to supplement mechanics. Again, although deterministic in its essence, this branch of physics uses randomness in the sense of disorder (of the particles) and equiprobability of the microscopic states. Finally, information theory and algorithmic theory also assume deterministic messages, but recognize that they can be random when it becomes impossible to find a set of rules to compute them.

The common assumption of an ultimately deterministic world notwithstanding, it would be too quick to cash out these notions in terms of epistemic randomness. For instance, several authors have highlighted that coding and computation have an energy as well as a time cost: to compute a "random" chaotic trajectory or the exact motion of billions of particles would exceed the resources of the universe.[20] From this point of view, the impossibility of predicting chaotic behavior is more a physical than an epistemic hindrance.
Put in other words: it's not that our intelligence is too weak to comply with the epistemic standards of computability, it's that the latter are too high for the universe we live in. Moreover, from the story told in the previous sections, these different notions of randomness appear to be several sides of a multifaceted idea or, if you like a less essentialist metaphor, components of a conceptual cluster, tied together by deep mathematical relations. The KSE is a way to capture these relations and turn them into a workable mathematical tool.

[20] See for instance Ruelle (1991) and Kellert (1993).

Introduced by Kolmogorov in 1958 and, independently, by Sinai in 1959, the KSE has been defined in very close analogy with Shannon's entropy.[21] Let's consider a point in the phase space. This point represents the state of the system at a certain instant. The system evolves according to mechanical laws, which can be represented by a transformation of the phase space onto itself. We can use this fact to refine progressively our knowledge of the trajectory in the following way. If we apply the transformation back in time, the result will be a partition of the space into subsets, one of which contains the phase point. We apply the transformation another time and we get a partition of the partition, and the phase point at that time will be contained in one of those sub-subsets. As the procedure goes on, one can construct finer and finer subdivisions of the phase space, which allow for a more and more specific description of the state of the system at that time. The intersection between all partitions containing the images of the phase point gives a representation of the trajectory. The important aspect to grasp is that the removed uncertainty in determining where the point is placed changes at each step. Let's assume that the transformation divides the space in two subsets at each step.
At the first step the point is in one of two subsets, in the second in one of four, in the third in one of eight and so on. A trajectory is the sequence of subsets in which, step by step, we can find the point. This sequence works as a message associated to a mechanical trajectory. Thus, we can ascribe to it an amount of information. If $W_i$ is a sequence, then

$$H(W_n) = -\sum_i \mu(W_i) \log \mu(W_i),$$

where μ is a suitable measure function and the sum runs over the sequences $W_i$ obtained after $n$ steps. Since the procedure is discrete and goes through several steps, we can define the amount of information acquired at each step in the following way:

$$h = \lim_{n \to \infty} \frac{H(W_n)}{n}.$$

Finally, the KSE is defined as the supremum[22] of this information over all possible partitions, and hence over all the sequences they generate:

$$h_{KS} = \sup h.$$

[21] There are many ways to introduce the KSE. Here I follow closely Dorfman (1999).
[22] The supremum of a set of numbers is the least number that is greater than or equal to every element of the set; it need not belong to the set itself.

The KSE is defined in a very curious and composite fashion. It deploys techniques of statistical mechanics such as the partition of the phase space and the measure function. At the same time, it parallels a trajectory and a message, thus it also concerns algorithmic complexity. This connection was rigorously proved by Brudno in 1978 with a theorem stating that, for almost all possible trajectories, the KSE gives the algorithmic complexity of the corresponding sequence. On the other hand, it also relates to the theory of dynamical systems and specifically to the problem of instability. One of the many techniques to establish the instability of a trajectory is called Lyapunov exponents. This method was already known to Laplace, but it was systematized and generalized by Alexander Lyapunov at the end of the 19th century. One introduces small variations in the initial conditions of a periodic solution of the equations of motion and calculates the equations for these variations. The solutions, in general, have the form of exponential functions.
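Before reading off what these exponents mean, the two quantities introduced so far, the information rate h and the exponential growth rate of small variations, can be estimated numerically in a toy case. The sketch below rests on assumptions that are not part of the text: the system is the logistic map x → 4x(1−x), the phase space [0, 1] is split into two cells at 0.5, the limit in h is replaced by a finite word length n = 8, and the growth rate is taken as the orbit average of ln|f'(x)| along a chaotic orbit rather than around a periodic solution. For this particular map both quantities are known to equal ln 2. All function names are hypothetical.

```python
import math
from collections import Counter

def logistic_orbit(x0, steps):
    """Iterate x -> 4x(1-x) and return the list of visited points."""
    xs = []
    x = x0
    for _ in range(steps):
        xs.append(x)
        x = 4.0 * x * (1.0 - x)
        # Numerical safeguard (not part of the mathematics): keep the
        # floating-point orbit from landing exactly on the fixed point 0.
        x = min(max(x, 1e-12), 1.0 - 1e-12)
    return xs

def block_entropy(symbols, n):
    """Shannon entropy (in nats) of the empirical distribution of n-symbol words."""
    words = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    total = sum(words.values())
    return -sum((c / total) * math.log(c / total) for c in words.values())

def entropy_rate(xs, n):
    """Finite-n estimate of h = lim H(W_n)/n for the two-cell partition at 0.5."""
    symbols = [0 if x < 0.5 else 1 for x in xs]
    return block_entropy(symbols, n) / n

def growth_rate(xs):
    """Orbit average of ln|f'(x)|, with f'(x) = 4 - 8x for the logistic map."""
    return sum(math.log(max(abs(4.0 - 8.0 * x), 1e-300)) for x in xs) / len(xs)

orbit = logistic_orbit(0.1234, 100_000)  # arbitrary starting point and length
h_est = entropy_rate(orbit, 8)
lam_est = growth_rate(orbit)

print(f"information per step h ~ {h_est:.3f} nats")
print(f"exponential growth rate ~ {lam_est:.3f}")
print(f"ln 2 = {math.log(2):.3f}")
```

The trajectory is first turned into a message of 0s and 1s, one symbol per step, and the entropy of its n-symbol words is divided by n, which mirrors the construction of h. The second estimate measures how fast neighbouring points separate. That the two numbers come out close to each other in this example is a property of the chosen map, not something the sketch proves in general.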
Depending on whether these exponents are real or imaginary, the perturbed trajectory will tend to get away from the periodic one or to stay close to it. In other words, the Lyapunov exponents are a measure of the instability of the trajectory. In 1977 Pesin proved that the KSE could be interpreted as the sum of the positive Lyapunov exponents of the trajectory. This result not only spells out the connection with instability, but also reconfigures the KSE as a measure of chaos. In effect, the increase of the KSE represents a transition of the trajectory to a chaotic behavior.

Lastly, a close relation between this concept and information theory can also be defined. In 2004 Roman Frigg argued along this direction. He claimed that, by partitioning the phase space into cells, we could represent a trajectory on the phase space as the sequence of cells passed through. Now, the cells are tantamount to symbols of a previously defined language and therefore a trajectory is isomorphic to a message. Of this message we can calculate the Shannon entropy. In this way, the KSE can be applied to messages as well and gives a measure of the "chaoticity" of the message. Thus, several notions of randomness get captured and put into a conceptual network by the same formal tool.

7. Conclusions

Now, you see how the several strands that compose our story come together in a new mathematical concept. As a conclusion of this survey, I would like to stress some points. First, it should be clear that to appreciate the conceptual content of the KSE it is essential to look at the intricate story to which, albeit implicitly, the KSE refers. Mind, it is not merely a matter of contextualization. Instead, the unfolding of the historical threads packed in the concept allows for a qualification and even a reconfiguration of its epistemological status. On the one side, the connections implicit in the KSE appear less surprising.
On the other, when seen through its genealogy, the concept still looks in flux. True, the formal relations and the methods required are robust enough, but it would be too quick to claim that we understand the concept. There is a tendency in philosophy of science to see scientific concepts as being born in a historical vacuum and being anchored to nature only by mathematics. Their intrinsic motion, their potentiality, their internal life, so to speak, gets lost more often than not. The historical perspective adds, I think, this epistemological dimension, in that it shows how a scientific concept is a knot, a crossroads in a complex network of traditions and, consequently, has several, sometimes even mutually contradictory, potentialities built in. This partially explains why we are still far from understanding the connections between different notions of randomness and different branches of mathematics related to them.

A second point I wish to emphasize concerns concept construction. As we have seen, the concepts of randomness all had an intuitive appeal. Random are things that change abruptly, without rules, with all results equally possible. The strategy pursued by physicists and mathematicians to understand these notions was essentially to encapsulate the intuition into a web of mathematical techniques (each coming in turn with specific traditions) in order to convert a vague intuition into a set of methods that can be communicated, taught, disseminated, worked on, and expanded. This composite origin of scientific concepts is also historically situated and must be historically comprehended. Not for exhaustiveness', but for epistemology's sake.

Bibliographical References

Badino, M., 2004, «An Application of Information Theory to the Problem of Scientific Experiment», Synthese, CXL, n. 3, pp. 355-389.
Badino, M., 2006, «The Foundational Role of Ergodic Theory», Foundations of Science, XI, n. 4, pp. 323-347.
Badino, M., 2009, «The Odd Couple: Boltzmann, Planck and the Application of Statistics to Physics (1900-1913)», Annalen der Physik, XVIII, n. 2-3, pp. 81-101.
Badino, M., 2011, «Mechanistic Slumber vs. Statistical Insomnia: The Early Phase of Boltzmann's H-theorem (1868-1877)», European Physical Journal – H, XXXVI, n. 3, pp. 353-378.
Barrow-Green, J., 1997, Poincare and the Three-Body Problem, Providence, American Mathematical Society.
Batterman, R., 1993, «Defining Chaos», Philosophy of Science, LX, pp. 43-66.
Birkhoff, G. D., 1912, «Quelques théorèmes sur le mouvement des systèmes dynamiques», Bulletin de la Société Mathématique de France, XL, pp. 305-323; Collected Mathematical Papers, I, pp. 645-672.
Birkhoff, G. D., 1920, «Recent Advances in Dynamics», Science, LI, n. 1307, pp. 51-55; Collected Mathematical Papers, II, pp. 106-110.
Boltzmann, L., 1866, «Über die mechanische Bedeutung des zweiten Hauptsatzes der Wärmetheorie», Wiener Berichte, LIII, pp. 195-220; Wissenschaftliche Abhandlungen, I, pp. 9-33.
Boltzmann, L., 1868, «Studien über das Gleichgewicht der Lebendigen Kraft zwischen bewegten materiellen Punkten», Wiener Berichte, LVIII, pp. 517-560; Wissenschaftliche Abhandlungen, I, pp. 49-96.
Boltzmann, L., 1896, «Entgegnung auf die wärmetheoretischen Betrachtungen des Hrn. E. Zermelo», Annalen der Physik, LVII, pp. 773-784; Wissenschaftliche Abhandlungen, III, pp. 567-578.
Brudno, A. A., 1978, «The Complexity of the Trajectory of a Dynamical System», Russian Mathematical Surveys, XXXIII, pp. 197-198.
Cover, T. M., Gacs, P. and Gray, R. M., 1989, «Kolmogorov's Contributions to Information Theory and Algorithmic Complexity», The Annals of Probability, XVII, n. 3, pp. 840-865.
Cover, T. M. and Thomas, J. A., 2006, Elements of Information Theory, New York, Wiley.
Diacu, F. and Holmes, P., 1996, Celestial Encounters. The Origins of Chaos and Stability, Princeton, Princeton University Press.
Dorfman, J. R., 1999, An Introduction to Chaos in Nonequilibrium Statistical Mechanics, Cambridge, Cambridge University Press.
Frigg, R., 2004, «In What Sense is the Kolmogorov-Sinai Entropy a Measure for Chaotic Behaviour?–Bridging the Gap Between Dynamical Systems Theory and Communication Theory», British Journal for the Philosophy of Science, LV, pp. 411-434.
Kellert, S. H., 1993, In the Wake of Chaos, Chicago, University of Chicago Press.
Kendall, D. G., 1991, «Andrei Nikolaevich Kolmogorov. 25 April 1903 - 20 October 1987», Biographical Memoirs of Fellows of the Royal Society, XXXVII, pp. 300-319.
Kolmogorov, A. N., 1933, Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, Springer.
Kolmogorov, A. N., 1965, «Three Approaches to the Definition of Quantity of Information», Problems of Information Transmission, I, n. 1, pp. 1-7.
Marcolongo, R., 1919, Il problema dei tre corpi da Newton (1686) ai giorni nostri, Milano, Hoepli.
Poincare, H., 1885, «Sur les courbes définies par une équation différentielle», Journal de Mathématiques, I, pp. 167-244.
Poincare, H., 1899, New Methods of Celestial Mechanics 3. Integral Invariants and Asymptotic Properties of Certain Solutions, ed. by D. L. Goroff, College Park, American Institute of Physics.
Roque, T., 2011, «Stability of Trajectories from Poincare to Birkhoff: Approaching a Qualitative Definition», Archive for History of Exact Sciences, LXV, pp. 295-342.
Ruelle, D., 1991, Chance and Chaos, Princeton, Princeton University Press.
Shannon, C. E., 1948, «A Mathematical Theory of Communication», Bell System Technical Journal, XXVII, pp. 379-423, 623-656.
Shannon, C. E. and Weaver, W., 1949, The Mathematical Theory of Communication, Urbana, The University of Illinois Press.
Shiryaev, A. N., 1989, «Kolmogorov: Life and Creative Activities», The Annals of Probability, XVII, n. 3, pp. 866-944.
Sklar, L., 1993, Physics and Chance, Cambridge, Cambridge University Press.
Von Plato, J., 1991, «Boltzmann's Ergodic Hypothesis», Archive for History of Exact Sciences, XLII, pp. 71-89.
Von Plato, J., 1994, Creating Modern Probability, Cambridge, Cambridge University Press.
Zermelo, E., 1896, «Über einen Satz der Dynamik und die mechanische Wärmetheorie», Annalen der Physik, LVII, pp. 485-494.