Abstract
This article examines the notion of the genetic code and its metaphorical understanding as a “language”. In the traditional version of this metaphor, combinations of nucleotides are signs of amino acids (see the table of the genetic code), just as words composed of letters (speech sounds) represent certain meanings. The language metaphor of the genetic code (Markoš and Faltýnek, Biosemiotics 4(2), 171–200, 2011) assumes that nucleotides are analogous to letters, triplets to words, and genes to sentences (Jakobson 1971). We propose applying methods of mathematical linguistics to the notion of the genetic code. We provide a quantitative analysis (n-gram structure, Zipf’s law) of mRNA strings and natural language texts; this analysis is sensitive to the hierarchy of code (language) units. We also take into consideration a representative quantitative analysis of DNA, RNA and proteins. Our analysis of mRNA confirms the assumption that, given the design of the genetic code, DNA bases cannot be analogized to letters. The notion of the letter is much more appropriate when analogized with triplets or amino acids (see Lacková et al., Theory in Biosciences 136(3–4), 187–191, 2017).
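The kind of quantitative analysis named in the abstract (n-gram counts and a Zipf-style rank-frequency fit) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' actual procedure: the function names `ngram_counts`, `rank_frequency` and `zipf_slope` are hypothetical, the mRNA string is a made-up toy sequence, and the slope is estimated by a plain least-squares fit on log-log values, which is one common but rather crude way of checking Zipf's law.

```python
from collections import Counter
import math


def ngram_counts(seq, n, step=1):
    """Count n-grams of length n, moving the window by `step` symbols."""
    return Counter(seq[i:i + n] for i in range(0, len(seq) - n + 1, step))


def rank_frequency(counts):
    """Return frequencies sorted from most to least common (rank 1 first)."""
    return sorted(counts.values(), reverse=True)


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A value near -1 is what Zipf's law predicts for natural-language words.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy mRNA-like string (invented for illustration, not real data).
mrna = "AUGGCUGCUAAAGCUGAAGCUUAA"

bases = ngram_counts(mrna, 1)            # single bases treated as units
codons = ngram_counts(mrna, 3, step=3)   # non-overlapping triplets, frame 0

print("base frequencies:  ", rank_frequency(bases))
print("triplet frequencies:", rank_frequency(codons))
print("triplet log-log slope:", zipf_slope(rank_frequency(codons)))
```

Running the same functions with different values of `n` on a natural language text and on an mRNA string would allow the rank-frequency profiles of candidate units (bases, triplets, letters, words) to be compared side by side. A sequence as short as the toy one here is far too small to support any conclusion.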
References
Andres, J. (2010). On a conjecture about the fractal structure of language. Journal of Quantitative Linguistics, 17(2), 101–122.
Andres, J., Benešová, M., Kubáček, L., & Vrbková, J. (2011). Methodological note on the fractal analysis of texts. Journal of Quantitative Linguistics, 18(4), 337–367.
Baixeries, J., Hernández-Fernández, A., Forns, N., & Ferrer-i-Cancho, R. (2013). The parameters of Menzerath-Altmann law in genomes. Journal of Quantitative Linguistics, 20(2), 94–104.
Barbieri, M. (2002). The organic codes: An introduction to semantic biology. Cambridge: Cambridge University Press.
Bolshoy, A., Volkovich, Z., Kirzhner, V., & Barzily, Z. (2010). Genome clustering: From linguistic models to classification of genetic texts. Berlin, Heidelberg: Springer.
Cobb, M. (2013). 1953: When genes became “information”. Cell, 153(3), 503–506. https://doi.org/10.1016/j.cell.2013.04.012.
Collado-Vides, J. (1992). Grammatical model of the regulation of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 89(20), 9405–9409.
Collado-Vides, J. (1993). A linguistic representation of the regulation of transcription initiation. I. An ordered array of complex symbols with distinctive features. BioSystems, 29(2–3), 87–104.
De Beule, J. (2012). Von Neumann’s legacy for a scientific. Biosemiotics, 5(1), 1–4. https://doi.org/10.1007/s12304-011-9132-2.
DeFrancis, J. (1990). The Chinese language: fact and fantasy. Taipei: Wen-Jou Typing and Printing.
Emmeche, C. (2015). Semiotic scaffolding of the social self in reflexivity and friendship. Biosemiotics, 8, 275–289. https://doi.org/10.1007/s12304-014-9221-0.
Eroglu, S. (2014). Self-organization of genic and intergenic sequence lengths in genomes: Statistical properties and linguistic coherence. Complexity, 21(1), 268–282.
Faltýnek, D. (2012). Sémiotické primitivy v konstrukci gramatik: Testování gramatik jazyka a DNA. Olomouc: Univerzita Palackého v Olomouci.
Ferrer-i-Cancho, R. (2006). When language breaks into pieces: A conflict between communication through isolated signals and language. BioSystems, 84(3), 242–253.
Ferrer-i-Cancho, R., & Elvevåg, B. (2010). Random texts do not exhibit the real Zipf’s law-like rank distribution. PLoS One, 5(3). https://doi.org/10.1371/journal.pone.0009411.
Ferrer-i-Cancho, R., & McCowan, B. (2009). A law of word meaning in dolphin whistle types. Entropy, 11(4), 688–701. https://doi.org/10.3390/e11040688.
Ferrer-i-Cancho, R., Forns, N., Hernández-Fernández, A., Bel-Enguix, G., & Baixeries, J. (2013). The challenges of statistical patterns of language: The case of Menzerath's law in genomes. Complexity, 18(3), 11–17.
Gimona, M. (2008). Protein linguistics; a grammar for modular protein assembly? Nature Reviews Molecular Cell Biology, 7, 68–73.
Havlin, S., Buldyrev, S. V., Goldberger, A. L., Mantegna, R. N., Peng, C., Simons, M., & Stanley, H. E. (1995). Statistical and linguistic features of DNA sequences. Fractals, 3(2), 269–284.
Hernández-Fernández, A., Baixeries, J., Forns, N., & Ferrer-i-Cancho, R. (2011). Size of the whole versus number of parts in genomes. Entropy, 13(8), 1465–1480.
Hoffmeyer, J. (2007). Semiotic scaffolding of living systems. In M. Barbieri (Ed.), Introduction to biosemiotics (pp. 149–166). Dordrecht: Springer.
Jakobson, R. (1971). Linguistics in relation to other sciences. In Roman Jakobson, Selected Writings: Vol. 2: Word and Language, 655–696, The Hague — Paris: Mouton.
Ji, S. (1999). The linguistics of DNA: Words, sentences, grammar, phonetics, and semantics. Annals of the New York Academy of Sciences, 870, 411–417.
Katz, G. (2008). The hypothesis of a genetic protolanguage: An epistemological investigation. Biosemiotics, 1(1), 57–73.
Kister, A. (2015). Amino acid distribution rules predict protein fold: ProteinGrammar for Beta-Strand Sandwich-like structures. Biomolecules, 5, 41–59. https://doi.org/10.3390/biom5010041.
Kull, K. (2015). Evolution, choice, and scaffolding: Semiosis is changing its own building. Biosemiotics, 8, 223–234. https://doi.org/10.1007/s12304-015-9243-2.
Lacková, L., Faltýnek, D., & Matlach, V. (2017). Arbitrariness is not enough. Theory in Biosciences, 136(3–4), 187–191. https://doi.org/10.1007/s12064-017-0246-1.
Li, W. (2012). Menzerath’s law at the gene-exon level in the human genome. Complexity, 17(4), 49–53.
Mantegna, R. N., Buldyrev, S. V., Goldberger, A. L., Havlin, S., Peng, C., Simons, M., & Stanley, H. E. (1994). Linguistic features of noncoding sequences. Physical Review Letters, 73(23), 3169–3172.
Mantegna, R. N., Buldyrev, S. V., Goldberger, A. L., Havlin, S., Peng, C., Simons, M., & Stanley, H. E. (1995). Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics. Physical Review: E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 52(3), 2939–2950.
Markoš, A. (1997). Povstání živého tvaru. Praha: Vesmír.
Markoš, A. (2002). Readers of the book of life: Contextualizing developmental evolutionary biology. New York: Oxford University Press.
Markoš, A., & Faltýnek, D. (2011). Language metaphors of life. Biosemiotics, 4(2), 171–200.
Maturana, R. H. (1978). Biology of language: The epistemology of reality. In G. A. Miller & E. Lenneberg (Eds.), Psychology and Biology of Language and Thought: Essays in Honor of Eric Lenneberg (pp. 27–63). New York: Academic Press.
Maturana, R. H., & Varela, F. J. (1980). Autopoiesis and cognition: The realization of the living. D. Reidel Publishing Company.
Nikolaou, C. (2014). Menzerath-Altmann law in mammalian exons reflects the dynamics of gene structure evolution. Computational Biology and Chemistry, 53(Pt A), 134–143.
Niyogi, P., & Berwick, R. C. (1995). A note on Zipf's law, natural languages, and noncoding DNA regions [online]. A. I. Memo, (1530) / C.B.C.L. Paper, (118). Cited 8 January 2016.
Palazzo, A. F., & Gregory, T. R. (2014). The case for junk DNA. PLoS Genetics, 10(5). https://doi.org/10.1371/journal.pgen.1004351.
Pattee, H. H. (2001). The physics of symbols: Bridging the epistemic cut. BioSystems, 60, 5–21.
Pattee, H. H. (2008). Physical and functional conditions for symbols, codes, and languages. Biosemiotics, 1, 147–168. https://doi.org/10.1007/s12304-008-9012-6.
Pattee, H., & Kull, K. (2009). A biosemiotic conversation: Between physics and semiotics. Sign Systems Studies, 37(1/2), 311. https://doi.org/10.12697/SSS.2009.37.1-2.12.
Piantadosi, S. (2014). Zipf’s law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.
Raible, W. (2001). Linguistics and genetics: systematic parallels. In M. Haspelmath, E. König, W. Oesterreicher, & W. Raible (Eds.), Language Typology and Language Universals: An International Handbook (pp. 103–123). Berlin — New York: Walter De Gruyter.
Rosen, R. (1999). Essays on life itself. New York: Columbia University Press.
Rubin, S. S. (2017). From the cellular standpoint: Is DNA sequence genetic ‘information’? Biosemiotics, 10(2), 247–264. https://doi.org/10.1007/s12304-017-9303-x.
Scaiewicz, A., & Levitt, M. (2015). The language of the protein universe. Current Opinion in Genetics and Development, 35, 50–56. https://doi.org/10.1016/j.gde.2015.08.010.
Searls, D. B. (2002). The language of genes. Nature, 420, 211–217. https://doi.org/10.1038/nature01255.
Searls, D. B. (2003). Linguistics: Trees of life and of language. Nature, 426, 391–392. https://doi.org/10.1038/426391a.
Sebeok, T. (2001). Signs: An introduction to semiotics (2nd ed.). Toronto, Buffalo, London: University of Toronto Press.
Shahzad, K., Mittenthal, J. E., & Caetano-Anollés, G. (2015). The organization of domains in proteins obeys Menzerath-Altmann’s law of language. BMC Systems Biology, 9(44), 1–13.
Sharov, A. A. (2016). Evolution of natural agents: Preservation, advance, and emergence of functional information. Biosemiotics, 9(1), 103–120. https://doi.org/10.1007/S12304-015-9250-3.
The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
The European Bioinformatics Institute (EMBL-EBI). (2014).
Trifonov, E. N. (1988). Codes of nucleotide sequences. Mathematical Biosciences, 90(1–2), 507–517.
Trifonov, E. N., & Berezovsky, I. N. (2002). Proteomic code. Molecular Biology, 36(2), 239–243.
Tsonis, A. A., Elsner, J. B., & Panagiotis, A. T. (1997). Is DNA a language? Journal of Theoretical Biology, 184, 25–29.
Viewegh, M. (2006). Účastníci zájezdu. Brno: Druhé město.
Watson, J. D., & Berry, A. J. (2003). DNA: The secret of life. New York: Alfred A. Knopf.
Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Cambridge: Addison-Wesley Press.
Acknowledgements
This work was financed by the project Sinophone Borderlands: Interaction at the Edges, no. CZ.02.1.01/0.0/0.0/16_019/0000791.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Faltýnek, D., Matlach, V. & Lacková, Ľ. Bases are Not Letters: On the Analogy between the Genetic Code and Natural Language by Sequence Analysis. Biosemiotics 12, 289–304 (2019). https://doi.org/10.1007/s12304-019-09353-z
DOI: https://doi.org/10.1007/s12304-019-09353-z