Abstract
In his pioneering work in the field of inductive inference, Gold (Inf Control 10:447–474, 1967) proved that a set containing all finite languages and at least one infinite language over the same fixed alphabet is not identifiable in the limit (learnable in the exact sense) from complete texts. Gold’s work paved the way for computational learning theories of language and has implications for two linguistically relevant classes in the Chomsky hierarchy (cf. Chomsky in Inf Control 2:137–167, 1959, Chomsky in Knowledge of language: its nature, origin, and use, Praeger, 1986; Shieber in Linguist Philos 8:333–343, 1985; Heinz in Linguist Inq 41:623–661, 2010). Within that same framework, Angluin (Inf Control 45:117–135, 1980) provided a complete characterization of the learnability of language families. Mathematically, the concept of identification in the limit from that classical setting can be seen as the use of a particular type of metric for learning in the limit (Wharton in Inf Control 26:236–255, 1974). In this research note, I use Niyogi’s extended version of a theorem by Blum and Blum (Inf Control 28:125–155, 1975) on the existence of locking data sets to prove a necessary condition for learnability in the limit of any family of languages with the Gold property in any given metric. This recovers the most general version of Gold’s Theorem as a particular case. Moreover, when the language family is further assumed to contain all finite languages, the same condition also becomes sufficient for learnability in the limit. Finally, I discuss questions that are left open and outline a research program regarding language learnability in the limit from other input types and cognitive feasibility.
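As an informal illustration of identification in the limit, the sketch below runs a learner that always conjectures exactly the set of strings seen so far. The names `learner`, `conjectures`, `target`, and `text` are hypothetical and are not taken from the article, and the list `text` is only a finite prefix of a complete text. On a finite language the conjecture stops changing once every string has appeared; this particular learner can never settle on an infinite language, which gives some intuition for why a family containing all finite languages and an infinite one is problematic.

```python
from typing import FrozenSet, Iterator, List, Sequence


def learner(prefix: Sequence[str]) -> FrozenSet[str]:
    """Hypothetical learner: conjecture exactly the strings observed so far."""
    return frozenset(prefix)


def conjectures(text: List[str]) -> Iterator[FrozenSet[str]]:
    """Yield the learner's conjecture after each successive element of the text."""
    for n in range(1, len(text) + 1):
        yield learner(text[:n])


# Assumed toy target: a finite language over the alphabet {a, b}.
target = frozenset({"a", "ab", "abb"})

# Finite prefix of a complete text: every string of the target appears at least once.
text = ["a", "ab", "a", "abb", "ab", "a", "abb", "a"]

guesses = list(conjectures(text))

# Index of the first conjecture equal to the target language.
first_correct = next(i for i, g in enumerate(guesses) if g == target)

# From that point on, the conjecture never changes on this prefix.
stable = all(g == target for g in guesses[first_correct:])
```

Here `first_correct` marks the point at which the last unseen string of the target has appeared, and `stable` checks that the conjecture stays fixed afterwards on the given prefix.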
Notes
Of course, lexical entries and segments have their own internal structure, but we can think of them here as represented by primitive symbols.
The term “complete text” is preferable for surjective sequences as defined below, but it is cumbersome to carry the adjective, so we refer to complete texts/texts interchangeably in this paper since we will not need the more general notion of “text” as any sequence of elements from the language.
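For orientation, one common way to write these notions down is sketched here. The symbols t, \varphi, t_n, and d are assumptions chosen for illustration and need not match the notation of the article; L is taken to be a nonempty language, and the discrete metric is offered only as one example of a metric under which convergence amounts to exact identification.

```latex
% A complete text for L is a surjective sequence of elements of L.
t : \mathbb{N} \to L, \qquad t(\mathbb{N}) = L

% Finite prefix of the text presented to the learner at stage n.
t_n = \bigl(t(0), t(1), \dots, t(n)\bigr)

% Exact identification in the limit by a learner \varphi on the text t.
\exists\, n_0 \in \mathbb{N} \;\; \forall\, n \ge n_0 : \; \varphi(t_n) = L

% Learning in the limit with respect to a metric d on languages.
\lim_{n \to \infty} d\bigl(\varphi(t_n), L\bigr) = 0

% With the discrete metric, the limit condition reduces to exact identification.
d(L_1, L_2) = \begin{cases} 0 & \text{if } L_1 = L_2 \\ 1 & \text{otherwise} \end{cases}
```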
A notable exception not entirely related to identification in the limit is PAC-learning.
References
Angluin, D. (1980). Inductive inference of formal languages from positive data. Information and Control, 45, 117–135.
Blum, M., & Blum, L. (1975). Towards a mathematical theory of inductive inference. Information and Control, 28, 125–155.
Chiswell, I. (2009). A course in formal languages. Springer.
Chomsky, N. (1959). On certain formal properties of grammars. Information and Control, 2, 137–167.
Chomsky, N. (1986). Knowledge of language: its nature, origin, and use. Praeger.
Gold, M. (1967). Language identification in the limit. Information and Control, 10, 447–474.
Heinz, J. (2010). Learning long-distance phonotactics. Linguistic Inquiry, 41(4), 623–661.
Heinz, J. (2016). Computational theories of learning and developmental psycholinguistics. In J. Lidz, W. Snyder, & J. Pater (Eds.), The Oxford handbook of developmental linguistics (pp. 633–663). Oxford University Press.
Johnson, K. (2004). Gold’s theorem and cognitive science. Philosophy of Science, 71, 571–592.
Niyogi, P. (2006). The computational nature of language learning and evolution. MIT Press.
Shieber, S. (1985). Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8, 333–343.
Wharton, R. (1974). Approximate language identification. Information and Control, 26, 236–255.
Acknowledgements
I greatly thank the anonymous reviewer for the invaluable feedback that helped improve the manuscript.
Ethics declarations
Conflict of interest
The author has no competing interests.
About this article
Cite this article
Alves, F.C. Language Learnability in the Limit: A Generalization of Gold’s Theorem. J of Log Lang and Inf 32, 363–372 (2023). https://doi.org/10.1007/s10849-022-09391-w
DOI: https://doi.org/10.1007/s10849-022-09391-w