Bases are Not Letters: On the Analogy between the Genetic Code and Natural Language by Sequence Analysis

Biosemiotics

Abstract

The article examines the notion of the genetic code and its metaphorical understanding as a “language”. In the traditional view of this metaphor, combinations of nucleotides are signs of amino acids (see the table of the genetic code), just as words combined from letters (speech sounds) represent certain meanings. The language metaphor of the genetic code (Markoš and Faltýnek, Biosemiotics 4(2), 171–200, 2011) assumes that nucleotides stand in analogy to letters, triplets to words, and genes to sentences (Jakobson 1971). We propose applying the methods of mathematical linguistics to the notion of the genetic code. We provide a quantitative analysis (n-gram structure, Zipf’s law) of mRNA strings and natural language texts; this analysis is sensitive to the hierarchy of code (language) units. We also take into consideration a representative quantitative analysis of DNA, RNA and proteins. Our analysis of mRNA confirms the assumption that, given the design of the genetic code, DNA bases cannot be analogized to letters. The notion of the letter is much more appropriate when analogized with triplets or amino acids (see Lacková et al., Theory in Biosciences 136(3–4), 187–191, 2017).
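The kind of quantitative analysis named in the abstract (n-gram counts and a Zipf-style rank-frequency profile) can be sketched roughly as follows. This is a minimal illustration and not the authors' actual procedure: the function names, the toy mRNA string, and the choice of a plain least-squares fit on the log-log rank-frequency pairs are all assumptions made for the example.

```python
from collections import Counter
import math


def ngram_counts(seq, n, step=1):
    """Count substrings of length n, moving `step` symbols between windows."""
    return Counter(seq[i:i + n] for i in range(0, len(seq) - n + 1, step))


def rank_frequency(counts):
    """Return (rank, frequency) pairs, most frequent unit first."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) on log(rank)."""
    pairs = rank_frequency(counts)
    if len(pairs) < 2:
        raise ValueError("need at least two distinct units")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy mRNA string, for illustration only (not data from the article).
mrna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"

bases = ngram_counts(mrna, 1)           # single nucleotides as candidate units
codons = ngram_counts(mrna, 3, step=3)  # non-overlapping triplets in reading frame

print("base types:", len(bases), "slope:", zipf_slope(bases))
print("codon types:", len(codons), "slope:", zipf_slope(codons))
```

Running the same functions on a natural language text, once over letters and once over words, would give the comparison the abstract refers to. With only four base types the rank-frequency profile has very few points, which is one simple way to see why the choice of unit matters; a realistic analysis would need far longer sequences than this toy string.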

References

  • Andres, J. (2010). On a conjecture about the fractal structure of language. Journal of Quantitative Linguistics, 17(2), 101–122.

  • Andres, J., Benešová, M., Kubáček, L., & Vrbková, J. (2011). Methodological note on the fractal analysis of texts. Journal of Quantitative Linguistics, 18(4), 337–367.

  • Baixeries, J., Hernández-Fernández, A., Forns, N., & Ferrer-i-Cancho, R. (2013). The parameters of Menzerath-Altmann law in genomes. Journal of Quantitative Linguistics, 20(2), 94–104.

  • Barbieri, M. (2002). The organic codes: An introduction to semantic biology. Cambridge: Cambridge University Press.

  • Bolshoy, A., Volkovich, Z., Kirzhner, V., & Barzily, Z. (2010). Genome clustering: From linguistic models to classification of genetic texts. Berlin, Heidelberg: Springer.

  • Cobb, M. (2013). 1953: When genes became “information”. Cell, 153(3), 503–506. https://doi.org/10.1016/j.cell.2013.04.012.

  • Collado-Vides, J. (1992). Grammatical model of the regulation of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 89(20), 9405–9409.

  • Collado-Vides, J. (1993). A linguistic representation of the regulation of transcription initiation. I. An ordered array of complex symbols with distinctive features. BioSystems, 29(2–3), 87–104.

  • De Beule, J. (2012). Von Neumann’s legacy for a scientific. Biosemiotics, 5(1), 1–4. https://doi.org/10.1007/s12304-011-9132-2.

  • DeFrancis, J. (1990). The Chinese language: fact and fantasy. Taipei: Wen-Jou Typing and Printing.

  • Emmeche, C. (2015). Semiotic scaffolding of the social self in reflexivity and friendship. Biosemiotics, 8, 275–289. https://doi.org/10.1007/s12304-014-9221-0.

  • Eroglu, S. (2014). Self-organization of genic and intergenic sequence lengths in genomes: Statistical properties and linguistic coherence. Complexity, 21(1), 268–282.

  • Faltýnek, D. (2012). Sémiotické primitivy v konstrukci gramatik: Testování gramatik jazyka a DNA. Olomouc: Univerzita Palackého v Olomouci.

  • Ferrer-i-Cancho, R. (2006). When language breaks into pieces: A conflict between communication through isolated signals and language. BioSystems, 84(3), 242–253.

  • Ferrer-i-Cancho, R., & Elvevåg, B. (2010). Random texts do not exhibit the real Zipf’s law-like rank distribution. PLoS One, 5(3). https://doi.org/10.1371/journal.pone.0009411.

  • Ferrer-i-Cancho, R., & McCowan, B. (2009). A law of word meaning in dolphin whistle types. Entropy, 11(4), 688–701. https://doi.org/10.3390/e11040688.

  • Ferrer-i-Cancho, R., Forns, N., Hernández-Fernández, A., Bel-Enguix, G., & Baixeries, J. (2013). The challenges of statistical patterns of language: The case of Menzerath's law in genomes. Complexity, 18(3), 11–17.

  • Gimona, M. (2008). Protein linguistics; a grammar for modular protein assembly? Nature Reviews Molecular Cell Biology, 7, 68–73.

  • Havlin, S., Buldyrev, S. V., Goldberger, A. L., Mantegna, R. N., Peng, C., Simons, M., & Stanley, H. E. (1995). Statistical and linguistic features of DNA sequences. Fractals, 3(2), 269–284.

  • Hernández-Fernández, A., Baixeries, J., Forns, N., & Ferrer-i-Cancho, R. (2011). Size of the whole versus number of parts in genomes. Entropy, 13(8), 1465–1480.

  • Hoffmeyer, J. (2007). Semiotic scaffolding of living systems. In M. Barbieri (Ed.), Introduction to biosemiotics (pp. 149–166). Dordrecht: Springer.

  • Jakobson, R. (1971). Linguistics in relation to other sciences. In Selected Writings, Vol. 2: Word and Language (pp. 655–696). The Hague, Paris: Mouton.

  • Ji, S. (1999). The linguistics of DNA: Words, sentences, grammar, phonetics, and semantics. Annals of the New York Academy of Sciences, 870, 411–417.

  • Katz, G. (2008). The hypothesis of a genetic protolanguage: An epistemological investigation. Biosemiotics, 1(1), 57–73.

  • Kister, A. (2015). Amino acid distribution rules predict protein fold: ProteinGrammar for Beta-Strand Sandwich-like structures. Biomolecules, 5, 41–59. https://doi.org/10.3390/biom5010041.

  • Kull, K. (2015). Evolution, choice, and scaffolding: Semiosis is changing its own building. Biosemiotics, 8, 223–234. https://doi.org/10.1007/s12304-015-9243-2.

  • Lacková, L., Faltýnek, D., & Matlach, V. (2017). Arbitrariness is not enough. Theory in Biosciences, 136(3–4), 187–191. https://doi.org/10.1007/s12064-017-0246-1.

  • Li, W. (2012). Menzerath’s law at the gene-exon level in the human genome. Complexity, 17(4), 49–53.

  • Mantegna, R. N., Buldyrev, S. V., Goldberger, A. L., Havlin, S., Peng, C., Simons, M., & Stanley, H. E. (1994). Linguistic features of noncoding sequences. Physical Review Letters, 73(23), 3169–3172.

  • Mantegna, R. N., Buldyrev, S. V., Goldberger, A. L., Havlin, S., Peng, C., Simons, M., & Stanley, H. E. (1995). Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics. Physical Review: E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 52(3), 2939–2950.

  • Markoš, A. (1997). Povstání živého tvaru. Praha: Vesmír.

  • Markoš, A. (2002). Readers of the book of life: Contextualizing developmental evolutionary biology. New York: Oxford University Press.

  • Markoš, A., & Faltýnek, D. (2011). Language metaphors of life. Biosemiotics, 4(2), 171–200.

  • Maturana, H. R. (1978). Biology of language: The epistemology of reality. In G. A. Miller & E. Lenneberg (Eds.), Psychology and Biology of Language and Thought: Essays in Honor of Eric Lenneberg (pp. 27–63). New York: Academic Press.

  • Maturana, H. R., & Varela, F. J. (1980). Autopoiesis and cognition: The realization of the living. Dordrecht: D. Reidel Publishing Company.

  • Nikolaou, C. (2014). Menzerath-Altmann law in mammalian exons reflects the dynamics of gene structure evolution. Computational Biology and Chemistry, 53(Pt A), 134–143.

  • Niyogi, P., & Berwick, R. C. (1995). A note on Zipf's law, natural languages, and noncoding DNA regions [online]. A.I. Memo 1530 / C.B.C.L. Paper 118. Retrieved 8 January 2016.

  • Palazzo, A. F., & Gregory, T. R. (2014). The case for junk DNA. PLoS Genetics, 10(5). https://doi.org/10.1371/journal.pgen.1004351.

  • Pattee, H. H. (2001). The physics of symbols: Bridging the epistemic cut. BioSystems, 60, 5–21.

  • Pattee, H. H. (2008). Physical and functional conditions for symbols, codes, and languages. Biosemiotics, 1, 147–168. https://doi.org/10.1007/s12304-008-9012-6.

  • Pattee, H., & Kull, K. (2009). A biosemiotic conversation: Between physics and semiotics. Sign Systems Studies, 37(1/2), 311. https://doi.org/10.12697/SSS.2009.37.1-2.12.

  • Piantadosi, S. (2014). Zipf’s law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.

  • Raible, W. (2001). Linguistics and genetics: systematic parallels. In M. Haspelmath, E. König, W. Oesterreicher, & W. Raible (Eds.), Language Typology and Language Universals: An International Handbook (pp. 103–123). Berlin — New York: Walter De Gruyter.

  • Rosen, R. (1999). Essays on life itself. New York: Columbia University Press.

  • Rubin, S. S. (2017). From the cellular standpoint: Is DNA sequence genetic ‘information’? Biosemiotics, 10(2), 247–264. https://doi.org/10.1007/s12304-017-9303-x.

  • Scaiewicz, A., & Levitt, M. (2015). The language of the protein universe. Current Opinion in Genetics and Development, 35, 50–56. https://doi.org/10.1016/j.gde.2015.08.010.

  • Searls, D. B. (2002). The language of genes. Nature, 420, 211–217. https://doi.org/10.1038/nature01255.

  • Searls, D. B. (2003). Linguistics: Trees of life and of language. Nature, 426, 391–392. https://doi.org/10.1038/426391a.

  • Sebeok, T. (2001). Signs: An introduction to semiotics (2nd ed.). Toronto, Buffalo, London: University of Toronto Press.

  • Shahzad, K., Mittenthal, J. E., & Caetano-Anollés, G. (2015). The organization of domains in proteins obeys Menzerath-Altmann’s law of language. BMC Systems Biology, 9(44), 1–13.

  • Sharov, A. A. (2016). Evolution of natural agents: Preservation, advance, and emergence of functional information. Biosemiotics, 9(1), 103–120. https://doi.org/10.1007/S12304-015-9250-3.

  • The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

  • The European Bioinformatics Institute (EMBL-EBI). (2014).

  • Trifonov, E. N. (1988). Codes of nucleotide sequences. Mathematical Biosciences, 90(1–2), 507–517.

  • Trifonov, E. N., & Berezovsky, I. N. (2002). Proteomic code. Molecular Biology, 36(2), 239–243.

  • Tsonis, A. A., Elsner, J. B., & Tsonis, P. A. (1997). Is DNA a language? Journal of Theoretical Biology, 184, 25–29.

  • Viewegh, M. (2006). Účastníci zájezdu. Brno: Druhé město.

  • Watson, J. D., & Berry, A. J. (2003). DNA: The secret of life. New York: Alfred A. Knopf.

  • Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Cambridge: Addison-Wesley Press.

Acknowledgements

This work was supported by the project Sinophone Borderlands: Interaction at the Edges, no. CZ.02.1.01/0.0/0.0/16_019/0000791.

Author information

Corresponding author

Correspondence to Ľudmila Lacková.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Faltýnek, D., Matlach, V. & Lacková, Ľ. Bases are Not Letters: On the Analogy between the Genetic Code and Natural Language by Sequence Analysis. Biosemiotics 12, 289–304 (2019). https://doi.org/10.1007/s12304-019-09353-z

  • DOI: https://doi.org/10.1007/s12304-019-09353-z
