Language Learnability in the Limit: A Generalization of Gold’s Theorem

Journal of Logic, Language and Information

Abstract

In his pioneering work in the field of inductive inference, Gold (Inf Control 10:447–474, 1967) proved that a set containing all finite languages and at least one infinite language over the same fixed alphabet is not identifiable in the limit (learnable in the exact sense) from complete texts. Gold’s work paved the way for computational learning theories of language and has implications for two linguistically relevant classes in the Chomsky hierarchy (cf. Chomsky in Inf Control 2:137–167, 1959, Chomsky in Knowledge of language: its nature, origin, and use, Praeger, 1986; Shieber in Linguist Philos 8:333–343, 1985; Heinz in Linguist Inq 41:623–661, 2010). Within that same framework, Angluin (Inf Control 45:117–135, 1980) provided a complete characterization for the learnability of language families. Mathematically, identification in the limit in this classical setting can be seen as learning in the limit with respect to a particular type of metric (Wharton in Inf Control 26:236–255, 1974). In this research note, I use Niyogi’s extended version of a theorem by Blum and Blum (Inf Control 28:125–155, 1975) on the existence of locking data sets to prove a necessary condition for the learnability in the limit of any family of languages with the Gold property in any given metric. This recovers the most general version of Gold’s Theorem as a particular case. Moreover, when the language family is further assumed to contain all finite languages, the same condition also becomes sufficient for learnability in the limit. Finally, I discuss open questions and outline a research program on language learnability in the limit from other input types and on cognitive feasibility.
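The classical setting behind Gold’s Theorem can be made concrete with a small Python sketch. It is not the paper’s proof (which goes through Niyogi’s locking-data-set theorem); it only illustrates the standard diagonal argument for the class of all finite languages together with the infinite language {a^n : n ≥ 0}. Assumptions, all hypothetical: languages are Python `frozenset`s of strings, a learner maps a finite text prefix directly to a language (not a grammar), and the names `finite_learner`, `conjectures`, `mind_changes`, and `gold_adversary` are illustrative only.

```python
from itertools import cycle, islice
from typing import Callable, FrozenSet, Iterable, List, Sequence

# Assumption: a language is represented extensionally as a finite frozenset.
Language = FrozenSet[str]
Learner = Callable[[Sequence[str]], Language]


def finite_learner(prefix: Sequence[str]) -> Language:
    """Conjecture exactly the set of strings observed so far.

    On any complete text for a finite language L, this conjecture equals L
    once every element of L has appeared and never changes afterward, so the
    learner identifies every finite language in the limit.
    """
    return frozenset(prefix)


def conjectures(learner: Learner, text: Iterable[str], steps: int) -> List[Language]:
    """The learner's hypotheses on the first 1, 2, ..., steps items of a text."""
    seen: List[str] = []
    hyps: List[Language] = []
    for w in islice(text, steps):
        seen.append(w)
        hyps.append(learner(seen))
    return hyps


def mind_changes(hyps: List[Language]) -> int:
    """Number of positions where consecutive hypotheses differ."""
    return sum(1 for h0, h1 in zip(hyps, hyps[1:]) if h0 != h1)


def gold_adversary(learner: Learner, stages: int, max_repeats: int = 1000) -> List[str]:
    """Build a prefix of a text for the infinite language {a^n : n >= 0}.

    Stage n repeats a^n until the learner conjectures L_n = {a^0, ..., a^n}.
    If the learner identifies every finite language, each stage ends (the
    current prefix followed by a^n forever is a text for L_n), so the full
    infinite text forces a new conjecture at every stage and never converges.
    """
    text: List[str] = []
    for n in range(stages):
        target = frozenset("a" * k for k in range(n + 1))
        for _ in range(max_repeats):
            text.append("a" * n)
            if learner(text) == target:
                break
        else:
            # Cap reached: this learner apparently fails to identify L_n.
            raise RuntimeError(f"no convergence on finite language L_{n}")
    return text


if __name__ == "__main__":
    # Finite target {ab, b}: the conjecture stabilizes after one mind change.
    hyps = conjectures(finite_learner, cycle(["ab", "b"]), 20)
    print(sorted(hyps[-1]), mind_changes(hyps))  # ['ab', 'b'] 1

    # Adversarial text for a*: every stage forces a new conjecture.
    adv = gold_adversary(finite_learner, stages=5)
    print(adv)  # ['', 'a', 'aa', 'aaa', 'aaaa']
    print(mind_changes(conjectures(finite_learner, adv, len(adv))))  # 4
```

Since code can only inspect a finite prefix, the run above exhibits the mind changes forced in the first few stages; the conclusion that they continue forever rests on the argument in the `gold_adversary` docstring, which applies to any learner that identifies all finite languages, not only `finite_learner`.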

Notes

  1. Of course, lexical entries and segments have their own internal structure, but we can think of them here as represented by primitive symbols.

  2. The term “complete text” is preferable for surjective sequences as defined below, but carrying the adjective throughout is cumbersome, so we use “complete text” and “text” interchangeably in this paper; we will not need the more general notion of a “text” as an arbitrary sequence of elements of the language.

  3. A notable exception, not directly tied to identification in the limit, is PAC learning.
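For orientation, the notions in note 2 and in the abstract can be written out in the usual textbook form. This is a sketch under stated assumptions, not the paper’s own formalization: L is a nonempty language over a fixed alphabet (the empty language needs a separate convention), a learner φ maps a finite prefix t[n] directly to a language rather than a grammar, and learning in a metric d is read as convergence of the conjectures to L in d. The symbols φ, t[n], and d₀ are introduced here for illustration.

```latex
% Complete text for L (note 2): a surjective sequence, with prefixes t[n]
t : \mathbb{N} \to L \ \text{surjective, i.e.}\ \{\, t(i) : i \in \mathbb{N} \,\} = L,
\qquad t[n] = \langle t(0), \dots, t(n-1) \rangle

% Identification in the limit of L by a learner \varphi
\forall t \ \text{text for } L \ \ \exists N \ \ \forall n \ge N : \ \varphi(t[n]) = L

% Assumed reading of learning in the limit in a metric d on languages
\lim_{n \to \infty} d\bigl(\varphi(t[n]), L\bigr) = 0,
\qquad d_0(L, L') = \begin{cases} 0 & \text{if } L = L' \\ 1 & \text{if } L \neq L' \end{cases}
```

Since the discrete metric d₀ takes only the values 0 and 1, the conjectures converge to L in d₀ exactly when they are eventually equal to L, which is one way to see classical identification in the limit as the special case of a particular metric.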

References

  • Angluin, D. (1980). Inductive inference of formal languages from positive data. Information and Control, 45, 117–135.

  • Blum, M., & Blum, L. (1975). Towards a mathematical theory of inductive inference. Information and Control, 28, 125–155.

  • Chiswell, I. (2009). A course in formal languages. Springer.

  • Chomsky, N. (1959). On certain formal properties of grammars. Information and Control, 2, 137–167.

  • Chomsky, N. (1986). Knowledge of language: its nature, origin, and use. Praeger.

  • Gold, M. (1967). Language identification in the limit. Information and Control, 10, 447–474.

  • Heinz, J. (2010). Learning long-distance phonotactics. Linguistic Inquiry, 41(4), 623–661.

  • Heinz, J. (2016). Computational theories of learning and developmental psycholinguistics. In J. Lidz, W. Snyder, & J. Pater (Eds.), The Oxford handbook of developmental linguistics (pp. 633–663). Oxford University Press.

  • Johnson, K. (2004). Gold’s theorem and cognitive science. Philosophy of Science, 71, 571–592.

  • Niyogi, P. (2006). The computational nature of language learning and evolution. MIT Press.

  • Shieber, S. (1985). Evidence against the context-freeness of natural language. Linguistics & Philosophy, 8, 333–343.

  • Wharton, R. (1974). Approximate language identification. Information and Control, 26, 236–255.

Acknowledgements

I greatly thank the anonymous reviewer for the invaluable feedback that helped improve the manuscript.

Author information

Corresponding author

Correspondence to Fernando C. Alves.

Ethics declarations

Conflict of interest

The author has no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Alves, F.C. Language Learnability in the Limit: A Generalization of Gold’s Theorem. J of Log Lang and Inf 32, 363–372 (2023). https://doi.org/10.1007/s10849-022-09391-w

  • DOI: https://doi.org/10.1007/s10849-022-09391-w
