Efficiency in Organism-Environment Information Exchanges: A Semantic Hierarchy of Logical Types Based on the Trial-and-Error Strategy Behind the Emergence of Knowledge

Research article · Biosemiotics

Abstract

Based on Kolchinsky and Wolpert’s work on the semantics of autonomous agents, I propose an application of Mathematical Logic and Probability to the modeling of cognitive processes. In this work, I follow Bateson’s insights on the hierarchy of learning in complex organisms and formalize his idea of applying Russell’s Type Theory to that hierarchy. Drawing on Weaver’s three levels of the communication problem, I link the Kolchinsky–Wolpert model to Bateson’s insights and arrive at a semantic and conceptual hierarchy in living systems that serves as an explanatory model of some adaptive constraints. Owing to the generality of Kolchinsky and Wolpert’s hypotheses, I highlight some fundamental gaps between the results of current Artificial Intelligence and the semantic structures of human beings. In light of the consequences of my model, I conclude the paper by proposing a general definition of knowledge in probabilistic terms, overturning de Finetti’s Subjectivist Definition of Probability.

Availability of data and materials

Not applicable.

References

  • Achella, S. (2022). Idealism and Science of Life: An Intersection Between Philosophy and Biology. In N. Rezaei, & A. Saghazadeh (Eds.), Thinking. Integrated Science, (vol. 7, pp. 111–131). Springer.

  • Barbieri, M. (2008). Biosemiotics: a new understanding of life. Naturwissenschaften, 95, 577–599.

  • Bateson, G. (1968). The Logical Categories of Learning and Communication, and the Acquisition of World Views. Extended in Steps to an Ecology of Mind. Jason Aronson Inc (1987), 284–314.

  • Bateson, G. (1969). Double Bind, 1969. In Steps to an Ecology of Mind. Jason Aronson Inc (1987), (pp. 276–283).

  • Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2, 1–55.

  • Bishop, R.C. (2022). Contextual Emergence: Constituents, Context and Meaning. In S. Wuppuluri, & I. Stewart (Eds.), From Electrons to Elephants and Elections. The Frontiers Collection, (pp. 243–256), Springer.

  • Brier, S. (1999). Biosemiotics and the foundation of cybersemiotics. Reconceptualizing the insights of ethology, second order cybernetics and Peirce’s semiotics in biosemiotics to create a non-Cartesian information science. Semiotica, 127(1/4), 169–198.

  • Brier, S. (2013). Cybersemiotics: A New Foundation for a Transdisciplinary Theory of Consciousness, Cognition, Meaning and Communication. In L. Swan (Ed.), Origins of Mind. Biosemiotics, (vol. 8, pp. 97–126). Springer.

  • Bueno, O. (2022). Content, Context, and Naturalism in Mathematics. In S. Wuppuluri, & I. Stewart (Eds.), From Electrons to Elephants and Elections. The Frontiers Collection, (pp. 287–306). Springer.

  • de Finetti, B. (1929). Probabilismo. Saggio critico sulla teoria della probabilità e sul valore della scienza. Biblioteca di Filosofia. Editrice F. Perrella, 1931, 1–57.

  • de Finetti, B. (1930). Fondamenti logici del ragionamento probabilistico. Bollettino dell’Unione matematica italiana, 9(1930), 1–3.

  • de Finetti, B. (1930). Funzione caratteristica di un fenomeno aleatorio. Memorie della Reale Accademia Nazionale dei Lincei, Serie 6, vol. 4, fasc. 5 (1930), 251–299.

  • de Finetti, B. (1969). Sulla proseguibilità di processi aleatori scambiabili. Rendiconti dell’Istituto di Matematica dell’Università di Trieste, 1, 53–67.

  • Frege, G. (1892). Über Begriff und Gegenstand. Translated in P.T. Geach, On Concept and Object, Mind (1951) 60(238), 168–180.

  • Frege, G. (1893). Grundgesetze der Arithmetik. Selected, translated, and edited in M. Furth, The Basic Laws of Arithmetic. University of California Press (1964).

  • Gabora, L., & Kitto, K. (2013). Concept combination and the origins of complex cognition. In L. Swan (Ed.), Origins of Mind. Biosemiotics, (vol. 8, pp. 361–381). Springer.

  • Harris, Z. S. (1991). A Theory of Language and Information: A Mathematical Approach. Oxford University Press.

  • Hatcher, W. S. (1982). The Logical Foundations of Mathematics. Pergamon Press.

  • Herrmann-Pillath, C. (2021). The Natural Philosophy of Economic Information: Autonomous Agents and Physiosemiosis. Entropy, 23(3), 277.

  • Jeffery, K., Pollack, R., & Rovelli, C. (2019). On the Statistical Mechanics of Life: Schrödinger Revisited. Entropy, 21(12), 1211.

  • Kiverstein, J., Kirchhoff, M. D., & Froese, T. (2022). The Problem of Meaning: The Free Energy Principle and Artificial Agency. Frontiers in Neurorobotics, 16.

  • Kolchinsky, A., & Wolpert, D. H. (2018). Semantic information, autonomous agency and non-equilibrium statistical physics. Interface Focus, 8(6), 20180041.

  • Lakatos, I. (1961). What does a mathematical proof prove? In J. Worrall, & G. Currie (Eds.), Mathematics, Science and Epistemology (1978, pp. 61–69). Cambridge University Press.

  • Laplace, P. S. (1814). Essai philosophique sur les probabilités. Translated in F. W. Truscott, & F. L. Emory (Eds.), A philosophical essay on Probabilities (from the 6th ed., 1902). John Wiley & Sons.

  • Morgenstern, O., & von Neumann, J. (1944). Theory of Games and Economic Behavior. Princeton University Press.

  • Muşat, B., & Andonie, R. (2020). Semiotic Aggregation in Deep Learning. Entropy, 22(12), 1365.

  • Niiniluoto, I. (2022). Concepts, Experts, and Deep Learning. In S. Wuppuluri, & I. Stewart (Eds.), From Electrons to Elephants and Elections. The Frontiers Collection, (pp. 577–586). Springer.

  • Ongstad, S. (2022). Perceptions of Context. Epistemological and Methodological Implications for Meta-Studying Zoo-Communication. Biosemiotics, 15, 497–518.

  • Polani, D., Martinetz, T., & Kim, J. (2001). An Information-Theoretic Approach for the Quantification of Relevance. In J. Kelemen, & P. Sosík (Eds.) Advances in Artificial Life. Lecture Notes in Computer Science, vol. 2159. Springer.

  • Roli, A., & Kauffman, S. A. (2022). Emergence of Organisms. Entropy, 22(10), 1163.

  • Rovelli, C. (1995). Relational Quantum Mechanics. International Journal of Theoretical Physics, 35(8), 1637–1678.

  • Rovelli, C. (2015). Relative information at the foundation of physics. In A. Aguirre, B. Foster, & Z. Merali (Eds.), It from Bit or Bit from It? On Physics and information (pp. 79–86). Springer.

  • Rovelli, C. (2018). Meaning and Intentionality = Information + Evolution. In A. Aguirre, B. Foster, & Z. Merali (Eds.), Wandering Towards a Goal (pp. 17–27). Springer.

  • Schrödinger, E. (1944). What is life? (1992). Cambridge University Press.

  • Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27(3), 379–423; 27(4), 623–656.

  • Sharov, A. A. (2010). Functional Information: Towards Synthesis of Biosemiotics and Cybernetics. Entropy, 12(5), 1050–1070.

  • Summers, R. L. (2023). Lyapunov Stability as a Metric for Meaning in Biological Systems. Biosemiotics, 16, 153–166.

  • Surov, I. A. (2022). Natural Code of Subjective Experience. Biosemiotics, 15, 109–139.

  • Velazquez, J. L. P. (2020). On the emergence of cognition: from catalytic closure to neuroglial closure. Journal of Biological Physics, 46, 95–119.

  • Weaver, W. (1948). Science and Complexity. American Scientist, 36(4), 536–544.

  • Weaver, W. (1949). Recent Contributions to the Mathematical Theory of Communication. ETC: A Review of General Semantics, 10(4), 261–281. Special issue on Information Theory (Summer 1953).

  • Whitehead, A. N., & Russell, B. (1963). Principia Mathematica (2nd ed.). Cambridge University Press.

  • Wiener, N. (1965). Cybernetics, or control and communication in the animal and the machine (2nd ed.), The MIT Press.

  • Wittgenstein, L. (1963). Philosophical Investigations (2nd ed.), Basil Blackwell Ltd.

Acknowledgements

I want to acknowledge the philosopher Abdullah Öcalan, the highest guide in the quest for truth, who contributed more than anyone else to inspire this research.

Funding

Not applicable.

Author information

Contributions

M.B. wrote the entire manuscript.

Corresponding author

Correspondence to Mattia Berera.

Ethics declarations

Ethical standard

Not applicable.

Competing interests

The author declares no competing interests.

Appendices

Appendix A The Kolchinsky–Wolpert Model

As I stated, the viability function of X at time \(\tau\) is the quantity \(\mathcal {V}(X_{\tau })=\sum _{x_{\tau }}p(x_{\tau })\log p(x_{\tau })\), i.e., the negative Shannon entropy of \(X_{\tau }\), where p is the marginal distribution of X and \(x_{t}\) is a particular outcome of the random variable \(X_{t}\) representing the system X at time t (see Kolchinsky and Wolpert, 2018, 6). Recall that if a joint system \((X_{1},X_{2})\) has joint distribution \(p_{X_{1},X_{2}}\), then the two marginal distributions are \(p_{X_{i}}(x_{i})=\sum \limits _{x_{j}}p_{X_{1},X_{2}}(x_{1},x_{2})\), for \(i,j=1,2\) and \(i\ne j\).
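A minimal numerical sketch of these two definitions, assuming finite state spaces and natural logarithms (the text does not fix the base). The joint distribution is stored as a NumPy array indexed by outcome; the function names `marginals` and `viability` are illustrative and are not taken from Kolchinsky and Wolpert.

```python
import numpy as np

def marginals(p_joint):
    """Return (p_X1, p_X2) for a joint distribution indexed as p_joint[x1, x2]."""
    p_joint = np.asarray(p_joint, dtype=float)
    # Summing out x2 (axis 1) gives p_X1; summing out x1 (axis 0) gives p_X2.
    return p_joint.sum(axis=1), p_joint.sum(axis=0)

def viability(p):
    """V = sum_x p(x) log p(x), the negative Shannon entropy in nats.
    Zero-probability outcomes contribute 0 (convention 0 log 0 = 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz])))

# Assumed toy joint distribution of (X_1, X_2), two outcomes each.
p12 = np.array([[0.4, 0.1],
                [0.1, 0.4]])
p1, p2 = marginals(p12)
print(p1, p2)                 # [0.5 0.5] [0.5 0.5]
print(viability(p1))          # -log 2, about -0.693
print(viability([1.0, 0.0]))  # 0.0: a deterministic state has maximal viability
```

Since \(\mathcal{V}\le 0\) with equality only for a point mass, higher viability corresponds to a more concentrated (less disordered) state of X.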

Kolchinsky and Wolpert call the joint distribution \(p_{X_{t},E_{t}}\) of trajectories of the joint system (X, E) from time \(t=0\) to \(t=\tau\) the actual distribution (2018, 7), and they intervene on the X part of the joint system with a counterfactual method to measure the effects of changes on the rest of the system. Namely, they study modifications of the joint distribution that scramble away the mutual information between X and E, in order to identify a threshold below which the viability of the scrambled system is lower than that of the actual system. The mutual information \(\mathcal {I}(X;E)\) between X and E attains its minimum value 0 if and only if X and E are independent random variables, that is, when \(p_{X,E}=p_{X}\cdot p_{E}\). So one can assume that an intervention such as

$$\begin{aligned} F:p_{X,E}\longmapsto p_{X}\cdot p_{E} \end{aligned}$$

is the intervention on the distribution of (X, E) that destroys the most mutual information, since it reduces \(\mathcal {I}(X;E)\) to 0. Therefore, the viability value \(\Delta \mathcal {V}(X_{\tau })\) of the initial mutual information between X and E, defined by Eq. 2, is the difference at time \(\tau\) between the viability of X when the distribution of (X, E) at time \(t=0\) is the actual distribution \(p_{X_{0},E_{0}}\) and the viability of X when the distribution of (X, E) at time \(t=0\) is \(F(p_{X_{0},E_{0}})=p_{X_{0}}\cdot p_{E_{0}}\) (Kolchinsky and Wolpert, 2018, 7).
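The scrambling intervention F and the viability value \(\Delta \mathcal {V}(X_{\tau })\) can be illustrated on a toy system. This is a sketch under loudly stated assumptions: the dynamics of (X, E), which this appendix leaves unspecified, are replaced by a hypothetical transition kernel `kernel[x0, e0, x_tau]` giving the probability of each state of X at time \(\tau\), and all numbers are invented for illustration.

```python
import numpy as np

def neg_entropy(p):
    """sum p log p over nonzero entries (the viability V of the text, in nats)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz])))

def mutual_information(p_xe):
    """I(X;E) = H(X) + H(E) - H(X,E) for a joint array p_xe[x, e]."""
    p_xe = np.asarray(p_xe, dtype=float)
    # Since neg_entropy = -H, the identity reads V(X,E) - V(X) - V(E).
    return neg_entropy(p_xe) - neg_entropy(p_xe.sum(axis=1)) - neg_entropy(p_xe.sum(axis=0))

def scramble(p_xe):
    """The intervention F: p_{X,E} -> p_X * p_E (outer product of the marginals)."""
    p_xe = np.asarray(p_xe, dtype=float)
    return np.outer(p_xe.sum(axis=1), p_xe.sum(axis=0))

def evolve(p_xe, kernel):
    """p(x_tau) = sum over (x0, e0) of p(x0, e0) * kernel[x0, e0, x_tau].
    HYPOTHETICAL stand-in for the unspecified dynamics of (X, E)."""
    return np.einsum("ij,ijk->k", p_xe, kernel)

# Assumed toy dynamics: if X_0 matches E_0, X ends in its viable state 0 with
# probability 0.9; otherwise it ends in either state with probability 0.5.
kernel = np.empty((2, 2, 2))
for x0 in range(2):
    for e0 in range(2):
        kernel[x0, e0] = [0.9, 0.1] if x0 == e0 else [0.5, 0.5]

p_actual = np.array([[0.45, 0.05],
                     [0.05, 0.45]])   # assumed actual p_{X_0,E_0}
p_scrambled = scramble(p_actual)

delta_V = neg_entropy(evolve(p_actual, kernel)) - neg_entropy(evolve(p_scrambled, kernel))
print(mutual_information(p_actual))     # about 0.368 nats
print(mutual_information(p_scrambled))  # 0 up to rounding: F removes all of it
print(delta_V)                          # about 0.206 > 0 in this toy model
```

A positive \(\Delta \mathcal {V}\) here means that, under these assumed dynamics, the correlation between \(X_0\) and \(E_0\) helps X stay in its viable state.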

To identify the meaningful part of the information responsible for \(\Delta \mathcal {V}(X_{\tau })\), the two authors introduce the set \(\Phi\) of deterministic endofunctions on the states of E. Their procedure is called coarse-graining of the conditional distribution \(p_{X_{0}|E_{0}}\); roughly speaking, it considers all functions of the possible outcomes of E (depending only on those outcomes and not, for example, on time) that act by exchanging or identifying their inputs. Given a deterministic function \(\varphi\colon E\rightarrow E\), one can define the intervened distribution induced by \(\varvec{\varphi }\) as the joint distribution

$$\begin{aligned} p_{X_{0},E_{0}}^{\varphi }:=p_{X_{0}|\varphi (E_{0})}\cdot p_{E_{0}}\,, \end{aligned}$$

where

$$\begin{aligned} p_{X_{0}|\varphi (E_{0})}\big (x_{0}|\varphi (e_{0})\big )=\dfrac{\sum _{e_{0}':\varphi (e_{0}')=\varphi (e_{0})}p_{X_{0},E_{0}}(x_{0},e_{0}')}{\sum _{e_{0}':\varphi (e_{0}')=\varphi (e_{0})}p_{E_{0}}(e_{0}')}\,. \end{aligned}$$

As above, denote by \(X^{\varphi }_{\tau }\) the system X at time \(t=\tau\) when (X, E) evolves from the initial joint distribution \(p_{X_{0},E_{0}}^{\varphi }\). Note that, under \(p_{X_{0},E_{0}}^{\varphi }\), \(X_{0}\) is conditionally independent of \(E_{0}\) given \(\varphi (E_{0})\): from the point of view of X, two states \(e_{0}\) and \(e_{0}'\) such that \(\varphi (e_{0})=\varphi (e_{0}')\) are indistinguishable. That is, \(X_{0}\) has information only about \(\varphi (E_{0})\) and not about \(E_{0}\) itself.
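The intervened distribution \(p_{X_{0},E_{0}}^{\varphi }\) can be computed directly from the two formulas above. In this sketch (finite X and E assumed), a map \(\varphi\) is encoded as a tuple whose e-th entry is \(\varphi(e)\); the function name `intervened` and the toy numbers are hypothetical. The usage lines check the property just noted: the conditional distribution of \(X_0\) coincides on states merged by \(\varphi\).

```python
import numpy as np

def intervened(p_xe, phi):
    """p^phi(x0, e0) = p(x0 | phi(e0)) * p(e0), with p_xe[x, e] the actual joint
    p_{X_0,E_0} and phi[e] the image of state e under a map E -> E."""
    p_xe = np.asarray(p_xe, dtype=float)
    p_e = p_xe.sum(axis=0)
    phi = np.asarray(phi)
    out = np.zeros_like(p_xe)
    for e in range(p_xe.shape[1]):
        block = phi == phi[e]              # every e' with phi(e') = phi(e)
        mass = p_e[block].sum()            # denominator of the formula
        if mass == 0:
            continue                       # then p(e) = 0 and the column stays 0
        cond = p_xe[:, block].sum(axis=1) / mass   # p(x0 | phi(e0))
        out[:, e] = cond * p_e[e]
    return out

# Assumed toy joint distribution: X_0 has 2 states, E_0 has 3.
p = np.array([[0.3, 0.1, 0.1],
              [0.0, 0.2, 0.3]])
q = intervened(p, (0, 1, 1))              # phi merges states 1 and 2 of E

print(np.allclose(q.sum(axis=0), p.sum(axis=0)))     # True: p_{E_0} is untouched
print(np.allclose(q[:, 1] / q[:, 1].sum(),
                  q[:, 2] / q[:, 2].sum()))          # True: X_0 cannot tell 1 from 2
print(np.allclose(intervened(p, (0, 1, 2)), p))      # True: identity phi changes nothing
```

Note that the intervention only depends on the partition of E induced by \(\varphi\), so maps with the same level sets yield the same distribution.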

Therefore, one can define the optimal intervention \(p_{X_{0},E_{0}}^{opt}\) as the intervened distribution that retains the least mutual information among those preserving the viability of X, i.e., the one that satisfies the following conditions:

  1. \(\displaystyle p_{X_{0},E_{0}}^{opt}\in \left\{ p_{X_{0},E_{0}}^{\varphi }\,\bigg |\,\mathcal {I}(X_{0}^{\varphi };E_{0})=\min _{\psi \in \Phi \,:\,\mathcal {V}(X_{\tau }^{\psi })=\mathcal {V}(X_{\tau })}\mathcal {I}(X_{0}^{\psi };E_{0})\right\}\),

  2. \(\mathcal {V}(X_{\tau }^{opt})=\mathcal {V}(X_{\tau })\),

where \(X^{opt}_{t}\) is the system X at time t when (X, E) evolves from the initial joint distribution \(p_{X_{0},E_{0}}^{opt}\) (see Kolchinsky and Wolpert, 2018, 8). By this definition, any further intervention that reduces the mutual information in \(p_{X_{0},E_{0}}^{opt}\) would change the output of \(\mathcal {V}\); i.e., all the mutual information contained in \(p_{X_{0},E_{0}}^{opt}\) causally contributes to X’s viability at time \(t=\tau\).
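For a small environment, the optimal intervention can be found by brute force: enumerate all \(|E|^{|E|}\) deterministic maps in \(\Phi\), keep those that preserve the viability of X at time \(\tau\), and select the one with the least mutual information. The toy dynamics below are hypothetical: \(X_0\) copies \(E_0\), but only whether a state is 0 matters for X’s survival, so under these assumptions the optimum should discard the distinction between states 1 and 2.

```python
import itertools
import numpy as np

def neg_entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz])))          # V = sum p log p

def mutual_information(p_xe):
    return neg_entropy(p_xe) - neg_entropy(p_xe.sum(axis=1)) - neg_entropy(p_xe.sum(axis=0))

def intervened(p_xe, phi):
    """Coarse-grained joint p^phi(x0, e0) = p(x0 | phi(e0)) * p(e0)."""
    p_e = p_xe.sum(axis=0)
    phi = np.asarray(phi)
    out = np.zeros_like(p_xe)
    for e in range(p_xe.shape[1]):
        block = phi == phi[e]
        mass = p_e[block].sum()
        if mass > 0:
            out[:, e] = p_xe[:, block].sum(axis=1) / mass * p_e[e]
    return out

def viability_at_tau(p_xe, kernel):
    """V(X_tau) under a HYPOTHETICAL kernel[x0, e0, x_tau]."""
    return neg_entropy(np.einsum("ij,ijk->k", p_xe, kernel))

def optimal_intervention(p_xe, kernel, tol=1e-9):
    """Brute force over all |E|**|E| deterministic maps phi: E -> E.
    The identity map always preserves viability, so a result always exists."""
    n_e = p_xe.shape[1]
    v_actual = viability_at_tau(p_xe, kernel)
    best = None
    for phi in itertools.product(range(n_e), repeat=n_e):
        q = intervened(p_xe, phi)
        if abs(viability_at_tau(q, kernel) - v_actual) > tol:
            continue                  # condition 2 fails: viability has changed
        mi = mutual_information(q)
        if best is None or mi < best[0] - tol:
            best = (mi, phi, q)       # keep the first map reaching a new minimum
    return best

def relevant(s):
    """Hypothetical: only whether a state is 0 matters for survival."""
    return 0 if s == 0 else 1

kernel = np.empty((3, 3, 2))
for x0, e0 in itertools.product(range(3), repeat=2):
    kernel[x0, e0] = [0.9, 0.1] if relevant(x0) == relevant(e0) else [0.5, 0.5]

p_actual = np.eye(3) / 3             # X_0 copies a uniformly distributed E_0
mi_opt, phi_opt, p_opt = optimal_intervention(p_actual, kernel)
print(mutual_information(p_actual))  # log 3, about 1.099 nats
print(phi_opt, mi_opt)               # (0, 1, 1) about 0.637 nats: only the relevant bit survives
```

Exhaustive enumeration is only feasible for very small E; it is used here for transparency, not efficiency.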

Appendix B Russell’s Type Theory

In this appendix, I present the fundamental theory with which I formalized Bateson’s insights. In particular, I refer directly to the exposition of this theory and its history given by the philosopher and mathematician William Hatcher (1982). Contemporary Mathematical Logic has somewhat neglected Russell’s Type Theory in favor of the set-theoretic approach inaugurated by Zermelo and Fraenkel. Both theories were developed historically to establish consistent logical foundations for mathematics, that is, to correct the approach of Frege’s “Grundgesetze der Arithmetik” (1893), which harbors the contradiction known as Russell’s antinomy. One can formulate the latter as follows: consider the set y defined by the property \(x\notin x\), i.e., the set of all sets that are not elements of themselves, and ask whether y belongs to itself. By the law of excluded middle, either it does or it does not. If it does, then \(y\in \{\,x\,|\,x\notin x\,\}\), so y satisfies the defining property of y; i.e., it does not belong to itself. If, on the other hand, it does not, then y satisfies the defining property of y and is thus an element of itself (see Whitehead and Russell, 1963, 60).
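The antinomy is a diagonal argument, and its core can be checked mechanically on finite models. In the sketch below (an illustration, not part of Hatcher’s or Russell’s exposition), a model is a dictionary mapping each hypothetical object to the set of its members; in every such model, no object has exactly the extension \(\{x\,|\,x\notin x\}\), so unrestricted comprehension fails in all of them.

```python
from itertools import product

def russell_collection(members):
    """Return {x | x not in x}, where members[y] is the set of objects in y."""
    return {x for x, elems in members.items() if x not in elems}

def realized_by(members, collection):
    """Objects of the model whose members are exactly `collection`."""
    return [y for y, elems in members.items() if elems == collection]

# A hypothetical toy model: a is in a, a is in b, and c is empty.
model = {"a": {"a"}, "b": {"a"}, "c": set()}
R = russell_collection(model)
print(sorted(R))              # ['b', 'c']
print(realized_by(model, R))  # []: no object of the model is the Russell set

# Exhaustive check over all 2**9 membership relations on three objects.
objects = ["a", "b", "c"]
pairs = list(product(objects, repeat=2))          # (x, y) means "x is in y"
for bits in product([False, True], repeat=len(pairs)):
    m = {o: set() for o in objects}
    for (x, y), is_member in zip(pairs, bits):
        if is_member:
            m[y].add(x)
    assert realized_by(m, russell_collection(m)) == []
```

The exhaustive loop never fails for the reason given in the text: an object y with extension R would satisfy y ∈ y exactly when y ∉ y.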

One of Frege’s fundamental insights is the recognition that, when we create concepts, or properties, we may want to predicate something of them. For instance, we construct the property of ‘being a chair,’ and we want to say that ‘there is something that has the property of being a chair,’ meaning ‘there is a chair.’ In such sentences, we are not predicating the property but objectifying it, i.e., nominalizing the predicate (see Frege 1892 and 1893). In overcoming Russell’s antinomy, the Type Theory approach retains Frege’s aim of formalizing that part of abstract reasoning which Imre Lakatos calls ‘quasi-experience’ (1978). According to Lakatos, mathematics, like the experimental sciences, grows by explicating phenomena of thought in a quasi-empirical way. He argues that behind the definition of a mathematical concept lies an accidental choice, due to unformalized thinking that refers to a set of non-mathematical objects. On the other hand,

Zermelo’s system is more directly concerned with mathematics and the needs of mathematical structures [...]: Mathematics is (we believe) consistent. Thus, if we give a precise account of the intuitive use of sets as mathematicians use them, we shall have an adequate and correct foundation. [...] we observe that mathematicians do not normally use such sets as ‘the set of all sets’ or the ‘set of all sets not elements of themselves’. We might contend that these contradictory notions are not really valid mathematical objects at all. (Hatcher, 1982, 135)

I am not concerned with the foundations of mathematics here; therefore, Russell’s approach, although less convenient and in some respects a failure for metamathematical purposes, better suits my needs. As mentioned, the idea behind Type Theory is to build an axiomatic theory that prohibits antinomies due to self-reference while maintaining Frege’s Law of Courses of Value (Frege, 1893). One can state the latter as follows: given any property P, there exists a set y such that, for all x, x is in y if and only if x satisfies P; i.e.,

$$\begin{aligned} \exists y\forall x \big (x\in y\iff P(x)\big )\,. \end{aligned}$$

Type Theory does so by imposing the following constraint:

‘Whatever involves all of a collection must not be one of the collection;’ or, conversely: ‘If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total.’ We shall call this the ‘vicious-circle principle,’ because it enables us to avoid the vicious circles involved in the assumption of illegitimate totalities. (Whitehead and Russell, 1963, 37)

Hereafter, I refer to the more recent formulation due to Wiener and Kuratowski, in the version reported by Hatcher (1982), which uses the notation of Set Theory and is closer to the contemporary reader’s taste.

Definition 9

I call Type Theory (TT) the formal system in which

  • the language is that of set theory plus the sets of symbols \(\{x^{n}_{i}|i,n\in \mathbb {N}\}\), \(\{a^{n}_{i}|i,n\in \mathbb {N}\}\) for variables and constants, respectively;

  • the well-formed formulas (wffs) and terms are defined as follows:

    1. \(\{x^{n}_{i}\}_{i\in \mathbb {N}}\), \(\{a^{n}_{i}\}_{i\in \mathbb {N}}\) are terms said to be of type \(\varvec{n}\);

    2. ‘\(x_{i}^{n}\in x_{j}^{n+1}\)’ is a wff;

    3. if \(P,P'\) are wffs, then \(\lnot P\) and \(P\vee P'\) are wffs;

    4. if P is any wff and x is any variable, then ‘\(\forall x P(x)\)’ and ‘\(\exists x P(x)\)’ are wffs;

    5. if \(P(x_{i}^{n})\) is a wff containing \(x_{i}^{n}\) free, then ‘\(\{x_{i}^{n}|P(x_{i}^{n})\}\)’ is a term of type \(n+1\);

  • the axioms are the following schemes:

    T1. \(\exists x_{i}^{n}\forall x_{j}^{n-1}\left( x_{j}^{n-1}\in x_{i}^{n}\iff P(x_{j}^{n-1})\right)\),

    where \(x_{i}^{n}\) does not occur in the wff \(P(x_{j}^{n-1})\), which contains the variable \(x_{j}^{n-1}\) free;

    T2. \(\forall x_{i}^{n}\big (x_{i}^{n}\in x_{j}^{n+1}\iff x_{i}^{n}\in x_{\ell }^{n+1}\big )\implies \big (x_{j}^{n+1}=x_{\ell }^{n+1}\big )\,;\)

  • the rules of inference are the natural deduction rules (see, for instance, Hatcher, 1982, 43–44), with the constraint that a variable can be replaced only by a term of the same type.
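The role of the type superscripts in rules 2 and 5 can be illustrated with a small well-formedness checker. This sketch covers only variables (constants would behave the same way), negation, disjunction, \(\forall\), and abstraction; the class names are hypothetical, and it assumes that rule 2 applies to any two terms of adjacent types, including terms built by rule 5. Under these rules, ‘y ∈ y’ is not a wff, so the Russell term \(\{y\,|\,y\notin y\}\) cannot even be written.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:          # the variable x_i^n
    i: int
    n: int

@dataclass(frozen=True)
class Member:       # rule 2: left ∈ right
    left: object
    right: object

@dataclass(frozen=True)
class Not:          # rule 3
    body: object

@dataclass(frozen=True)
class Or:           # rule 3
    left: object
    right: object

@dataclass(frozen=True)
class Forall:       # rule 4 (an Exists node would be checked identically)
    var: Var
    body: object

@dataclass(frozen=True)
class Abstract:     # rule 5: {x_i^n | P}, a term of type n + 1
    var: Var
    body: object

def type_of(term):
    """Return the type of a well-formed term, or None if it is ill-formed."""
    if isinstance(term, Var):
        return term.n
    if isinstance(term, Abstract) and is_wff(term.body):
        return term.var.n + 1
    return None

def is_wff(f):
    if isinstance(f, Member):
        s, t = type_of(f.left), type_of(f.right)
        # Typing constraint: membership only from type n into type n + 1.
        return s is not None and t is not None and t == s + 1
    if isinstance(f, Not):
        return is_wff(f.body)
    if isinstance(f, Or):
        return is_wff(f.left) and is_wff(f.right)
    if isinstance(f, Forall):
        return is_wff(f.body)
    return False

x = Var(0, 0)       # x_0^0, an individual
y = Var(1, 1)       # x_1^1, a class of individuals
print(is_wff(Member(x, y)))                      # True
print(is_wff(Member(y, y)))                      # False: 'y ∈ y' is not a wff
russell = Abstract(y, Not(Member(y, y)))
print(type_of(russell))                          # None: the term cannot be formed
print(type_of(Abstract(x, Not(Member(x, y)))))   # 1
```

The antinomy is thus blocked syntactically, before any axiom is applied.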

I emphasize that the theory as defined cannot develop arithmetic, because its axioms do not establish the existence of \(\mathbb {N}\); to do so, an axiom of infinity must be added to TT. As mentioned, the present exposition has no foundational purpose, and the theory stated here suffices for my argument.

Remark

I make some observations on the last definition.

  1. The scheme T2 restricts the Extensionality Principle, which states that two sets are identical if and only if they have the same elements, so that it holds only within each type.

  2. The operation described by rule 5 states a principle of abstraction; i.e., it is a method for constructing concepts.

  3. The wffs formalize Frege’s intuitive notion of ‘property;’ in particular, the terms defined by rule 5, whose existence is guaranteed by T1, model Frege’s operation of ‘objectivization’ of ‘concepts,’ which one needs in order to ‘speak about concepts.’

With Axiom T1, we thus have a new version of the Law of Courses of Value that holds only within each type.

Principle

(Abstraction) Given a type, for any property P, there exists a set y of this type such that for all x of the preceding type, x is in y if and only if x satisfies P.
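A finite model makes the principle concrete. Assuming a hypothetical three-element domain of individuals as type 0, and taking each type n + 1 to be the full power set of type n (so no axiom of infinity is involved), abstraction over a given type always yields an object of the next type, and extensionality (T2) holds within each type because sets with the same members compare equal.

```python
from itertools import combinations

def next_type(domain):
    """Type n + 1 in a full finite model: every subset of the type-n domain."""
    objs = list(domain)
    return {frozenset(c) for r in range(len(objs) + 1) for c in combinations(objs, r)}

def abstract(domain, prop):
    """{x | P(x)} over one type: a set belonging to the next type (as in T1)."""
    return frozenset(x for x in domain if prop(x))

type0 = {"a", "b", "c"}       # hypothetical individuals
type1 = next_type(type0)      # 2**3 = 8 classes of individuals
type2 = next_type(type1)      # 2**8 = 256 classes of classes

not_a = abstract(type0, lambda x: x != "a")
print(not_a in type1)         # True
pairs_up = abstract(type1, lambda s: len(s) >= 2)
print(pairs_up in type2)      # True

# T2 within a type: different defining properties, same members, same set.
full1 = abstract(type1, lambda s: len(s) == 3)
full2 = abstract(type1, lambda s: s == frozenset(type0))
print(full1 == full2)         # True
```

Each abstraction only ranges over a single type, mirroring the restriction of the Law of Courses of Value to adjacent types.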

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Berera, M. Efficiency in Organism-Environment Information Exchanges: A Semantic Hierarchy of Logical Types Based on the Trial-and-Error Strategy Behind the Emergence of Knowledge. Biosemiotics 17, 131–160 (2024). https://doi.org/10.1007/s12304-024-09554-1

  • DOI: https://doi.org/10.1007/s12304-024-09554-1
