BY 4.0 license Open Access Published by De Gruyter Mouton April 30, 2024

The role of entrenchment and schematisation in the acquisition of rich verbal morphology

  • Gordana Hržica, Sara Košutar, Tomislava Bošnjak Botica and Petar Milin
From the journal Cognitive Linguistics

Abstract

Entrenchment and schematisation are the two most important cognitive processes in language acquisition. In this article, the role of the two processes, operationalised by token and type frequency, in the production of overgeneralised verb forms in Croatian preschool children is investigated using a parental questionnaire and computational simulation of language acquisition. The participants of the questionnaire were parents of children aged 3;0–5;11 years (n = 174). The results showed that parents of most children (93 %) reported the parallel use of both adult-like and overgeneralised verb forms, suggesting that Croatian-speaking preschool children have not yet fully acquired the verbal system. The likelihood of overgeneralised forms being reported decreases with the age of the children and verb type frequency. The results of the computational simulation show that patterns with a higher type frequency also show a greater preference for the correct form, while lexical items show both learning and unlearning tendencies during the process.

1 Introduction

1.1 Usage-based approach to language acquisition

A long-standing debate in the field of language acquisition is whether this process is driven by innate grammatical knowledge, rules, and parameters, or by general cognitive and interactive abilities, with input being the primary source of information about how language works (Lieven and Tomasello 2008; Tomasello 2005). From the perspective of a usage-based approach, language acquisition emerges from actual usage events; that is, interactions with adults from which children develop increasingly complex and abstract linguistic representations based on domain-general cognitive abilities (e.g., Bybee 2010; Tomasello 2005). Usage-based accounts therefore assume that children’s knowledge of language emerges from generalisations that they form in their language use, i.e. characteristics of the input are seen as the driving force behind language acquisition. This is in contrast with generativist approaches, according to which language acquisition is governed by formal rules that operate on abstract grammatical categories stored in the lexicon (see Ambridge and Lieven 2011). In the present study, we adopt a usage-based approach to investigate the role of input in language acquisition.

From the perspective of usage-based accounts, two main cognitive processes are involved in language acquisition: entrenchment and schematisation (Tomasello 2005). Entrenchment refers to the strengthening of the mental representation of a linguistic structure through repetition to the point that the use of the structure becomes automatic (Blumenthal-Dramé 2012; Langacker 2008). Schmid (2017) has proposed a comprehensive definition of entrenchment. It is a lifelong process of reorganisation and adaptation of individual communicative knowledge as a function of exposure to the language, as well as of domain-general cognitive processes and social context. Thus, the more often a child hears a particular linguistic structure, the more this structure becomes entrenched.

However, variable forms of entrenched routines appear in the input. Therefore, entrenchment alone is insufficient for language acquisition. Schematisation also comes into play. Schematisation is a particular type of abstraction, the process of discovering similarities and abstracting differences between structures (Langacker 2000; Tomasello 2005). The likelihood of a particular schema being extended depends on the number of items that are similar in form or meaning (Bybee 1995). This process is favoured by the repetition of various items that share commonalities. Thus, the more items with similar forms or meanings a child hears, the more likely they are to abstract a schema.

Entrenchment and schematisation are closely related (Blumenthal-Dramé 2012; Langacker 2017; Schmid 2017; Theakston 2017). According to Schmid (2017), entrenchment is not only related to the strength of the representations of different linguistic elements and structures but also to the emergence and reorganisation of variable schemas based on generalisations. All linguistic knowledge is available in the format of associations, and entrenchment processes operate over these associations so that they become entrenched. The greater the number of items displaying a similar pattern, the more entrenched the schema. From a purely theoretical perspective, such claims are considered indisputable, but they essentially lack empirical evidence (see Schmid 2017). Furthermore, the emergence of schemas is considered part of the entrenchment process, but the relationship between entrenchment and schematisation has not been studied in detail.

In language acquisition studies, entrenchment and schematisation have mainly been treated separately. They have been operationalised as token frequency and type frequency, respectively. However, not all studies investigating frequency effects in language acquisition link these factors to entrenchment and schematisation. The role of type and token frequency has been explored within different theoretical approaches (see Ambridge et al. 2015), but studies have mainly focused on syntactic constructions (e.g., Abbot-Smith and Tomasello 2006; Ambridge et al. 2012; Goldberg et al. 2004; Theakston 2004). There have been studies of the acquisition of inflectional morphology, but this language domain has not been examined in the same detail (e.g., Bybee 1995; Dąbrowska and Szczerbiński 2006; Marchman 1997; Maslen et al. 2004; Räsänen et al. 2016). These studies have shown that both entrenchment and schematisation play a role in language acquisition; however, the precise ways in which they affect a child’s language system as the system becomes more abstract are not yet fully understood.

1.2 Acquisition of inflectional morphology

Inflectional morphology provides a fertile ground for testing hypotheses about how entrenchment and schematisation operate in language acquisition (e.g., Bybee 1995; Bybee and Slobin 1982; Dąbrowska and Szczerbiński 2006; Marchman 1997; Maslen et al. 2004). Children acquire inflections by first extracting concrete items they hear in the input, for instance morphemes (-ed) or ready-inflected forms (e.g., He jumped), and storing them (Bittner et al. 2003; Dąbrowska 2004; Lieven and Tomasello 2008; Räsänen et al. 2016; Tomasello 2005). Initially, these rote-learnt forms are unproductive, and the children have no understanding of their internal morphological structure. Generativist and usage-based approaches differ in their explanation of subsequent morphological development. According to most generativist accounts, the production of morphological forms, for example tense- and agreement-marked forms, is governed by the abstract grammatical categories: tense (TNS) and agreement (AGR). Therefore, children use a regular rule or procedure that operates over every word stored in their vocabulary. Only a small number of words are acquired differently. They are considered irregular and are stored directly as associated pairs of root and past tense forms (e.g., Pinker 1999). The usage-based approach assumes that children gradually generalise over the stored items to construct more abstract morphological schemas in the schematisation process (e.g., He X-ed). As the repertoire of constructions increases, so does their complexity (i.e., the number of parts) and abstractness (i.e., the scope of the slots). It is important to note that children are capable of abstraction from the beginning of their language acquisition. Once a child has acquired a handful of stored forms, he or she can form at least a few productive schemas (Lieven and Tomasello 2008; Räsänen et al. 2016).

Arriving at the correct and automatic production of inflectional word forms in different languages can be challenging because of the complexity of the inflectional system. The most cited example of generalisation is the English simple past tense (see Marcus et al. 1992). The general pattern (e.g., jump – jump-ed) can be applied to many verbs. However, many irregular verbs are formed idiosyncratically (e.g., think – thought), giving rise to unacceptable overgeneralisations (e.g., *think – thinked). Overgeneralisation involves extending a particular pattern to instances where this pattern does not apply (see Bittner et al. 2003; Dressler 2011; Maslen et al. 2004; Xu and Pinker 1995). Therefore, more than one form per cell (adult-like and overgeneralised) is attested in children’s language. Overgeneralisation reflects the complexity of the morphological systems, and the strategies children use to master it. The use of two forms in children’s language is a temporary side effect of the earlier stages of language acquisition that should disappear as children grow older. Previous studies have shown that there are cross-linguistic differences in the acquisition of morphology due to language-specific factors such as morphological complexity or transparency (Ravid 2019; Xanthos et al. 2011). Languages other than English display different instances of overgeneralised forms. In Croatian, the target language of the present study, inflectional morphemes for marking tense, person and number are for many verbs regularised and, thus, early acquired. However, certain verb classes undergo stem change to form a morphological form. Overgeneralised forms in which a child attaches the correct inflectional morpheme but does not change the stem are frequent (e.g., plak-a-ti ‘to cry’ > *plak-a-m instead of plač-e-m ‘I cry’; see Hržica 2012).

However, it seems that entrenchment and schematisation may play crucial roles in enabling children to constrain overgeneralisations across different languages (e.g., Ambridge et al. 2008; Brooks et al. 1999; Bybee 1995; Marchman 1997; Maslen et al. 2004; Theakston 2004). One of the factors underlying the processes of entrenchment and schematisation is input frequency (see Divjak and Caldwell-Harris 2015; Schmid 2017). Researchers have distinguished between two types of frequencies (e.g., Bybee 1995; Bybee and Thompson 1997). Token frequency refers to the number of times a particular word form appears in the input (e.g., the number of instances of the past tense form of the verb jump). The greater the frequency with which the relevant form occurs, the greater the likelihood that it is available for retrieval. Type frequency has been referred to under different names in previous studies, such as class size or phonological neighbourhood density, with slight differences in operationalisation (see Dąbrowska and Szczerbiński 2006; Engelmann et al. 2019; Kirjavainen et al. 2012; Räsänen et al. 2016; Savičiūtė et al. 2018). Following usage-based accounts, the term type frequency is used in this study. This factor generally refers to the number of distinct word forms that occur in the same type or schema slot, such as the number of verbs with -ed as instances of the past tense schema ‘V-ed’.
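The distinction between the two frequency measures can be made concrete with a minimal sketch. The toy input below is hypothetical (it is not corpus data), and the function names are illustrative assumptions: token frequency counts occurrences of an inflected form, whereas type frequency counts the distinct verbs that fill a schema slot.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (verb lemma, inflected form, schema) triples.
toy_input = [
    ("jump", "jumped", "V-ed"),
    ("jump", "jumped", "V-ed"),
    ("jump", "jumped", "V-ed"),
    ("walk", "walked", "V-ed"),
    ("play", "played", "V-ed"),
    ("think", "thought", "irregular"),
    ("think", "thought", "irregular"),
]


def token_frequency(observations):
    """Count how often each inflected form occurs in the input."""
    return Counter(form for _, form, _ in observations)


def type_frequency(observations):
    """Count how many distinct lemmas fill the slot of each schema."""
    lemmas_per_schema = defaultdict(set)
    for lemma, _, schema in observations:
        lemmas_per_schema[schema].add(lemma)
    return {schema: len(lemmas) for schema, lemmas in lemmas_per_schema.items()}


tokens = token_frequency(toy_input)
types = type_frequency(toy_input)
print(tokens["jumped"])  # occurrences of one inflected form
print(types["V-ed"])     # distinct verbs instantiating the schema
```

In this toy input, the form jumped and the schema ‘V-ed’ both receive a count of three, but for different reasons: the former reflects repetition of one form, the latter the variety of verbs sharing the pattern.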

Different theoretical approaches consider input frequency in slightly different ways (Ambridge et al. 2015; Yang 2015). If at all explicitly mentioned, generativist accounts consider token frequency to be highly relevant for the acquisition of irregular language items. Only in some cases is type frequency regarded relevant for the acquisition of both general regular rules and irregular items. Returning to the debate about the past tense, Yang (2015) argues that the frequency of regular verbs leads to their early successful usage of the default -ed form. Irregular verbs are mastered later, as they are organised into classes that undergo identical or similar suffixation and readjustment processes. In the acquisition of such lexicalised rules, a child is guided by both token and type frequency (Yang 2002, 2015). Usage-based approaches consider both type and token frequency to be important. Token frequency promotes entrenchment through repetition of identical tokens in the input. Type frequency promotes schematisation through the repetition of various items that share commonalities in form or meaning. The more items a particular schema contains, the more likely it is to be extended to new items. However, regarding the acquisition of inflection, it is important to note that token frequency may play a different role than type frequency, as inflected forms with high token frequency represent unanalysed frozen phrases and thus do not contribute to analogical generalisation (Ambridge et al. 2015).

Numerous studies have found effects of token and/or type frequency on the acquisition of inflectional morphology (e.g., Aguado-Orea and Pine 2015; Bybee 1995; Dąbrowska and Szczerbiński 2006; Kirjavainen et al. 2012; Maslen et al. 2004). The basic finding is that the more frequently an inflected target form occurs in the input, the easier it is for a child to access and use it, leading to a lower production error rate. Moreover, a higher type frequency encourages generalisations; therefore, inflections that cover larger noun or verb classes also have lower error rates in production. However, studies in different languages have yielded contradictory results regarding the frequency effects that drive these processes. Regarding overgeneralisation, most studies have examined the effects of frequency on English verbal morphology (e.g., past tense). In morphology-rich languages, due to high inflectional variation (see Xanthos et al. 2011), certain forms are less predictable and therefore cannot always be easily generalised (see Dąbrowska and Szczerbiński 2006; Mirković et al. 2011; Savičiūtė et al. 2018). Among highly inflected languages, studies have mostly focused on the case marking of nouns (e.g., Dąbrowska and Szczerbiński 2006; Granlund et al. 2019; Krajewski et al. 2011; Savičiūtė et al. 2018). Studies examining verbal morphology, another inflectional domain, have found conflicting results. Some studies have confirmed a facilitative effect of both type and token frequency (e.g., Polish and Finnish: Engelmann et al. 2019; Finnish: Räsänen et al. 2016), while others found no facilitative effect of token frequency (e.g., Finnish: Kirjavainen et al. 2012), or reported that type frequency did not have the same effect on word processing in different languages (e.g., English and Spanish: Vitevitch and Stamer 2006).
From a usage-based perspective, it is important to investigate how the phenomena of entrenchment (operationalised by token frequency) and schematisation (operationalised by type frequency) function in languages with different morphological features. The Croatian language, with its rich inflectional system and its numerous patterns, which can have different acquisition paths, could offer deeper insights into these processes.

Another factor to consider in terms of overgeneralisation is age (see Schmid 2017; Theakston 2017). It has often been suggested that there is a U-shaped curve period in language acquisition, in which a child transitions from the adult-like inflectional form (as it is acquired as a rote-learnt form) to an overgeneralised form (when a child starts to generalise and construct morphological schemas) and then back to an adult-like form when a specific schema is acquired. Many researchers have investigated this period of multi-form production and noted a gradual decrease in overgeneralised forms (e.g., Frank et al. 2021; Maratsos 2000; Marchman et al. 1997; Maslen et al. 2004), but the recovery from overgeneralisation has rarely been researched. There is evidence of overgeneralisations in school-age speech (Marcus et al. 1992), and in small percentages even in adult speech (Marcus et al. 1992; McDonald and Roussel 2010). Furthermore, children’s morphological behaviour might also differ as a function of typological properties and system complexity (e.g., Bittner et al. 2003; Clark 2001). Children who speak morphology-rich languages respond earlier to specific structural features of their language. The most recent study of overgeneralisations in Croatian demonstrates that children do not retreat from the inflectional errors earlier than in less rich languages like English (Hržica et al. 2023).

1.3 Methodological aspects of entrenchment and schematisation studies

Entrenchment and schematisation in language acquisition (operationalised by token and type frequency respectively) have been studied mostly with corpus-based (e.g., Aguado-Orea and Pine 2015; Maslen et al. 2004) or experimental methods (e.g., Granlund et al. 2019; Kirjavainen et al. 2012). Computational modelling provides another means of investigating how the emergence of entrenchment and schematisation might take place in morphology-rich languages (e.g., Divjak et al. 2021; Divjak et al. 2023a; Divjak et al. 2023b; Engelmann et al. 2019; Milin et al. 2016; Rumelhart et al. 1986; Wonnacott et al. 2008). The aim of computational models of morphological acquisition is to simulate not only adult-like performance, but also phenomena related to patterns of errors (i.e., overgeneralisations) produced by children (e.g., Engelmann et al. 2019; Ramscar and Yarlett 2007; Ramscar et al. 2013). Although many computational models of morphological acquisition have been proposed (overview: Engelmann et al. 2019, also see Divjak and Milin 2023; Milin et al. 2023), they mostly focus on English past tense marking.

To this end, connectionist models that aim to produce rule-like behaviour without resorting to any formal symbolic rules have been tested. Connectionist models have been developed to model language data within usage-based linguistics (Bybee and McClelland 2005). The famous Parallel Distributed Processing (PDP) model by Rumelhart et al. (1986) successfully demonstrated typical developmental error patterns, namely overgeneralisations, showing how they increase with age and how they are eventually overcome. This has been achieved by confirming that rule-like behaviour could arise from interactions between abstract representations in the network of input-output mappings. Although the model succeeded in simulating developmental patterns by eschewing formal symbolic rules, it did not achieve adult-like performance (Engelmann et al. 2019).

A question that remains open is whether such models can succeed in simulating the acquisition of morphological systems in highly inflected languages. Only a few studies have attempted to answer this question. For example, Engelmann et al. (2019) tested whether a connectionist model could simulate the acquisition of inflectional morphology for marking person and number in highly inflected languages such as Polish and Finnish. The model showed the robustness of the effects of token frequency and phonological neighbourhood density (i.e., type frequency), namely both factors contributed to the choice of the correct form (as opposed to errors or overgeneralisations), and they did not decrease with age. The same results were obtained by behavioural elicited production tasks. Therefore, the study by Engelmann et al. (2019) showed that computational models developed for simpler systems such as English can scale up to the full complexities in highly inflected languages. Similarly, Mirković et al. (2011) have attempted to develop a model that simulates a highly complex system of noun marking in Serbian. The results showed that the model can successfully simulate the adult-like performance tested in the behavioural experiment that was part of the same study. Although the authors did not focus on language acquisition, they have pointed out that their model could be used in future research to simulate how children acquire such systems.

While connectionist models have focused on networks of representations that are sensitive to input, some researchers have gone a step further and explored models that implement basic cognitive principles of human capacity for learning. A usage-based approach assumes that knowledge of language emerges from the exposure to usage and that linguistic abilities are rooted in general cognitive abilities. This aligns with learning-based models, in which learning is conceived as a general explanatory principle for the organisation of language knowledge (e.g., Dąbrowska 2016; Divjak et al. 2021; Milin et al. 2016, 2023). These are error-correction, associative (also, discriminative, predictive) models that adapt to the environment by repeatedly correcting erroneous predictions for upcoming events. The most prominent is the Rescorla–Wagner model (Rescorla 2008; Rescorla and Wagner 1972), which is identical to one of the first Artificial Neural Network (ANN) models, well-known in engineering and machine learning as Widrow–Hoff or Delta rule (Widrow and Hoff 1960). The basic components of such a learning system are the input cues and their weights in predicting learning outcomes.
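The error-correction principle described above can be illustrated with a minimal sketch of a single Rescorla–Wagner (delta-rule) update. This is not the implementation used in any of the cited studies: the cue labels, the learning rate, and the function name are assumptions chosen purely for illustration.

```python
def rescorla_wagner_update(weights, cues, outcome_present, rate=0.1, lam=1.0):
    """Apply one Rescorla-Wagner (delta-rule) update for a single outcome.

    weights: dict mapping each cue to its association strength with the outcome
    cues: the cues present in this learning event
    outcome_present: whether the outcome occurred in this event
    rate: combined learning rate (an assumed, illustrative value)
    lam: maximum association strength supported when the outcome occurs
    """
    # The prediction is the summed strength of all cues that are present.
    prediction = sum(weights.get(cue, 0.0) for cue in cues)
    target = lam if outcome_present else 0.0
    error = target - prediction
    # Every present cue is adjusted in proportion to the shared prediction error.
    for cue in cues:
        weights[cue] = weights.get(cue, 0.0) + rate * error
    return weights


# Hypothetical cues repeatedly paired with one outcome (e.g., a target form).
weights = {}
for _ in range(50):
    rescorla_wagner_update(weights, ["cue:stem", "cue:person"], True)
print(weights)
```

Because both cues always occur together in this toy run, they share the available association strength, and their summed weight approaches the maximum as the prediction error shrinks.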

The error-driven learning model has been applied to account for a range of phenomena (see Milin et al. 2023). For example, Divjak et al. (2021) relied on the Naïve Discriminative Learning implementation of the Rescorla–Wagner learning rule to capture how language users arrive at making decisions between two forms, i.e. when dealing with the allomorphy exhibited by the Polish genitive singular, which can be marked by –a or –u. This study showed that the basic principle of error-driven learning allows language users to detect relevant patterns of any degree of systematicity. However, to the best of our knowledge, no study has yet investigated how learning-based models can simulate the processes of entrenchment and schematisation (in terms of token and type frequency) in the acquisition of morphology in highly inflected languages, especially their role in the use of overgeneralisations. Furthermore, triangulation of methods, for example, combining parental questionnaire data with computer simulations of language learning, could provide a better understanding of how these processes work in children’s developing language, or how they contribute to overgeneralisation. In recent studies of overgeneralisations in child language, researchers have used parental questionnaires (e.g., Frank et al. 2021; Hržica et al. 2023). This method allows us to investigate low-frequency phenomena such as overgeneralisations, which might be difficult to capture with corpus-based methods. On the other hand, there have been no attempts at computational modelling based on data from parental questionnaires, and this approach may be able to provide “the best of both worlds”, providing empirical rigour while obviating conjectures at the low-frequency end.

1.4 The Croatian verbal system

Like other Slavic languages, the Croatian verbal system displays great morphological richness. Person and number are fused and expressed by six verbal suffixes: 1st, 2nd, and 3rd person in singular and plural. In addition, in some verb classes infinitive and present tense stems differ; i.e., the formation of basic verb forms requires not only adding inflectional morphemes but also forming the proper stem, which often includes complex morphophonological alternations. These classes constitute a significant portion of linguistic structures. According to several sources (overview provided by Bošnjak Botica et al. 2022), stem-changing classes make up approximately 30 % of the verbs. Within each of these classes, the readjustment processes for stem changes are either identical or exhibit striking similarities across all verbs. The structure of a verb consists of several slots – for most verbs, the canonical form has three obligatory slots: root, thematic suffix (which can be additionally segmented), and inflectional morphemes: hod-a-ti ‘to walk’ > hod-a-m ‘I walk’. Two verb stems can differ in the second slot (e.g., trč-a-ti ‘to run’ > trč-i-m ‘I run’) or in the first and second slot (pis-a-ti ‘to write’ > piš-e-m ‘I write’). Conjugational classes (6–10 classes, depending on the approach, cf. Jelaska and Bošnjak Botica 2019) are based on schemas that govern the production of the first and second slots, whereas inflectional morphemes apply to verbs of all conjugational classes. Moreover, there is a small number of verbs consisting of two obligatory slots, root and inflectional morphemes, which lack an overt thematic suffix in the infinitive, where the first slot changes and a vowel is added in the second slot to form a present tense stem (e.g., jes-ø-ti ‘to eat’ > jed-e-m ‘I am eating’). 
The slots that undergo a stem change are susceptible to diverse deviations and overgeneralised forms have been attested in children’s early acquisition of verbal morphology (e.g., Hržica 2012). Also, classes without stem change are expected to serve as a model for overgeneralisation, based on some previous research (see Hržica 2012).

(1)
Examples of verb classes regarding the change in obligatory slots:
a.
Verbs without stem change: gled-a-ti ‘to watch’ – gled-a-m ‘I watch’
b.
Verbs with the change in the second slot: drž-a-ti ‘to hold’ – drž-i-m ‘I hold’
c.
Verbs with the change in the first and second slot: pis-a-ti ‘to write’ – piš-e-m ‘I write’
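The slot structure in (1) can be sketched in code to show where an overgeneralised form would arise if the schema without stem change were extended to every verb. The function below is a hypothetical illustration and not a model of acquisition; it assumes hyphen-segmented infinitives of the shape root-theme-ti, and the forms it produces for stem-changing verbs are illustrative rather than attested data.

```python
def overgeneralised_first_singular(segmented_infinitive):
    """Build a 1st person singular present form without changing the stem.

    Expects a hyphen-segmented infinitive of the shape 'root-theme-ti' and
    keeps the first and second slots as they are, replacing only the ending.
    """
    root, theme, ending = segmented_infinitive.split("-")
    if ending != "ti":
        raise ValueError("expected an infinitive ending in -ti")
    return f"{root}-{theme}-m"


# Adult-like forms taken from the examples in (1).
adult_forms = {
    "gled-a-ti": "gled-a-m",
    "drž-a-ti": "drž-i-m",
    "pis-a-ti": "piš-e-m",
}

for infinitive, adult_form in adult_forms.items():
    produced = overgeneralised_first_singular(infinitive)
    status = "adult-like" if produced == adult_form else "overgeneralised"
    print(f"{infinitive} -> {produced} ({status})")
```

Only the verb without stem change comes out adult-like under this single schema; for the other two classes, the second slot, or both the first and second slots, would have to change.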

The verb classes vary in their type frequency, with the most frequent classes being characterised by the absence of stem changes. The distribution of frequency among other classes cannot be directly related to the degree of morphophonological complexity. Certain classes with thematic vowel and consonant changes surpass the frequency of those with only thematic vowel changes in the stem (Jelaska and Bošnjak Botica 2019). The heterogeneity within class frequency is further accentuated by the wide spectrum of verb token frequencies, ranging from exceptionally rare to very frequent instances. In this study, additional classes were introduced (see Section 3.2). This extension allows us to examine the auditory input experienced by the child in more detail and to make fine distinctions between different phoneme alternations. By creating these additional classes, we aim to simulate how children perceive and categorise verbs in the context of morphophonological variation.

2 Aim and hypotheses

Previous research has shown that entrenchment and schematisation play a role in constraining the overgeneralisations in language acquisition. Although entrenchment and schematisation are considered universal, studies conducted in typologically different languages have yielded conflicting results regarding the frequency effects that drive these processes. From a usage-based approach, exploring how entrenchment (operationalised by token frequency) and schematisation (operationalised by type frequency) co-affect the production of overgeneralised verb forms in morphology-rich languages could provide a deeper understanding of the role of language input and language usage. Here, we seek to replicate the results of the effects of type and token frequency in acquisition of inflectional morphology in Croatian. This will then form the basis for subsequent computational modelling.

Therefore, the present study has two aims. The first aim is to test the predictions of input-based accounts of the acquisition of inflectional morphology and, specifically, the role of entrenchment and schematisation (operationalised by token and type frequency, respectively) in the production of overgeneralised verb forms in preschool children speaking Croatian. The second aim is to use a computer simulation to determine how the choice between the correct and the overgeneralised form is made given the interplay of token (entrenchment) and type (schematisation) frequency.

The Croatian system of verb classes provides an ideal ground for testing usage-based accounts due to its diverse classes with the intricate interplay of type and token frequencies. While the most common classes in Croatian have no stem change, there are several classes with stem-changing verbs. The extent of stem change varies across verb classes. Considering that a child must establish a rule for the acquisition of the verbal system, it could be assumed that morphologically simpler verb classes are acquired earlier. Conversely, if the characteristics of the input take precedence, we would assume that more frequent verb classes take precedence in acquisition, regardless of their morphological complexity. We can therefore observe the interaction between morphological features and frequency dynamics, which sheds light on the factors that influence the acquisition of inflection.

In line with the aims of the study, we addressed the following research questions:

  1. Are there age-related differences in the production of overgeneralised verb forms?

H1: The rate of overgeneralised verb forms will be significantly lower as children’s age increases.

  2. Do entrenchment (operationalised by token frequency) and schematisation (operationalised by type frequency) contribute to the production of overgeneralised verb forms by Croatian-speaking preschool children?

H2: Both token frequency and type frequency contribute to the probability of producing overgeneralised verb forms, i.e. the rate of overgeneralised forms will be significantly lower for verbs with higher token frequency and type frequency.

  3. How do entrenchment (operationalised by token frequency) and schematisation (operationalised by type frequency) contribute to the computational simulation of learning a phenomenon, i.e. the preference for selecting the correct vs. the overgeneralised form?

H3: Higher token frequency of individual items contributes to a higher probability of selecting the correct (assigned) form of the verb in earlier stages of the learning process.

H4: Higher type frequency of the paradigm contributes to a higher probability of selecting the correct (assigned) form of the paradigm in later stages of the learning process.

3 Parental questionnaire: method and results

3.1 Participants

Prior to commencing the study, ethical approval was obtained. Participants were parents of native Croatian-speaking children aged 3;0–5;11, recruited via social media (n = 82) and in three kindergartens (n = 92). The invitation included a link to the SurveyMonkey platform where participants could access a consent form with a clear and detailed description of the purpose of the study, the specific procedures related to the study and the possibility of withdrawal from the study. Participants were divided into three groups according to the age of their children: three-year-olds (n = 64), four-year-olds (n = 61), and five-year-olds (n = 49). All participants provided information on their socioeconomic status, including maternal education. The three age groups did not differ according to their socioeconomic status (H(2) = 2.56, p = 0.277). The data reported by the parents are presented in Table 1.

Table 1:

Demographic characteristics of participants (n = 174).

N | Age group | Age: Min | Age: Max | Age: M | Age: SD | Gender: Male | Gender: Female
64 | Three-year-olds | 3;0 | 3;11 | 3;6 | 3.5 | 33 | 31
61 | Four-year-olds | 4;0 | 4;11 | 4;6 | 3.4 | 32 | 29
49 | Five-year-olds | 5;0 | 5;11 | 5;5 | 3.5 | 24 | 25

The initial dataset encompassed 6 children whose parents completed the protocol twice. These cases were excluded, resulting in a final dataset of 167 participants.

3.2 Materials

For the purpose of this study, an online parental questionnaire was developed. Based on parents’ extensive knowledge of their child’s language skills in a wide variety of naturalistic settings, parental questionnaires have been shown to correlate with laboratory-based measures of children’s performance (see Dale 1991; Dale et al. 1989). Parental questionnaires have already been used in research on inflectional morphology (Frank et al. 2021). This method can accommodate many items, and if distributed online, can reach a more diverse population. Moreover, the phenomenon investigated in the present study, i.e. child-like forms such as overgeneralisations, can be easily identified by parents. In the parental questionnaire, we asked parents to assess how often their child uses a certain form of the verb, namely correct and overgeneralised.

When creating the questionnaire, we first selected the verbs that could be overgeneralised given the morphological characteristics of the Croatian verbal system. The verbs were taken from the Croatian Corpus of Child Language (Kovačević 2002), namely from child-directed speech, to ensure that children encounter all verbs regularly. The child language corpus included language samples of spontaneous interactions between three children and adults collected during early language acquisition until approximately 3 years of age.

Croatian verbs can be categorised into different classes based on their morphological properties, namely stem changes (see Jelaska and Bošnjak Botica 2019). In the present study, we included verb classes with stem changes that occur at least five times in child-directed speech. We hypothesised that the presence of a certain verb class in the input would stimulate the acquisition of a concrete verb form by promoting schematisation, which in turn would facilitate this process. To this end, type frequency was calculated for each of the verb classes. Type frequency is the frequency of all verbs belonging to a certain class in selected morphological forms. Based on the type frequency values, we have ordered the morphological verb classes (see Table 2) from 1 (the lowest type frequency) to 7 (the highest type frequency). Since overgeneralisation appears in different forms for different verb classes, type frequency was calculated differently for classes 1 to 6 than for class 7. For classes 1 to 6, type frequency is the number of all forms of all verbs of a given class in the present tense; for class 7, it is the number of all forms of all verbs of this class in the past participle.

Table 2:

Verb classes in the questionnaire.

Type-frequency classᵃ Type frequency in CDSᵇ Number of verbs in CDSᶜ Morphological changeᵈ Example
1 79 7 First and second slot mah-a-ti ‘to wave’ – maš-e-m ‘I wave’
2 118 13 First and second slot plak-a-ti ‘to cry’ – plač-e-m ‘I cry’
3 127 21 First and second slot pis-a-ti ‘to write’ – piš-e-m ‘I write’
4 153 17 Second slot penj-a-ti ‘to climb’ – penj-e-m ‘I climb’
5 259 23 Second slot bjež-a-ti ‘to run away’ – bjež-i-m ‘I run away’
6 297 24 First and second slot diz-a-ti ‘to lift’ – diž-e-m ‘I lift’
7 1,094 51 First and second slot do-ø-ći ‘to come’ – doš-a-o ‘came’
ᵃ Order of a particular verb class according to its type frequency.
ᵇ Number of all tokens of all verbs belonging to a particular class in the selected morphological forms (class 1 to 6: all verb forms in the present tense; class 7: all verb forms in the past participle) in child-directed speech.
ᶜ Selection criteria for a verb class: number of different verbs belonging to a certain verb class in child-directed speech (at least 5).
ᵈ Slot in which the stem change within the verb is observed.

To better capture the morphophonological changes relevant to the production of different stems, we used an additional subclassification of the traditional conjugation classes (as presented in Jelaska and Bošnjak Botica 2019), namely of class 5 (see Section 1.4 and Table 2). The verbs in classes 1, 2, 3 and 6 show a thematic vowel change along with a consonant change (e.g., plak-a-ti ‘to cry’ > plač-e-m ‘I cry’), i.e. there is a change in the first and the second slot. Verbs in classes 4 and 5 show a change of the thematic vowel but no consonant change in the present tense (e.g., penj-a-ti ‘to climb’ > penj-e-m ‘I climb’), i.e. there is a change in the second slot only. The verbs in class 7 undergo vowel and consonant insertion when forming the present tense and the past participle. In the past participle, the vowel a is added, as is a consonant in the stem (e.g., do-ø-ći ‘to come’ > dođ-e-m ‘I come’ > doš-a-o ‘came’), i.e. there is a change in the first and the second slot.

In each class, two verbs were selected: one with a lower and one with a higher frequency in the Croatian Corpus of Child Language (Kovačević 2002). Overgeneralised forms in the present tense were formed by retaining the infinitive stem and disregarding the phonological alternations within the stem that appear in the adult-like forms of these verbs (e.g., mahati ‘to wave’ > *maham ‘I wave’). These overgeneralised forms thus resembled the most frequent Croatian conjugational class, which has no stem change (e.g., hodati ‘to walk’ > hodam ‘I walk’; see Hržica 2012). The overgeneralised forms in the past participle were formed from the present tense stem of a verb (e.g., dođ- in dođem ‘I come’) by adding the thematic vowel of the past participle (e.g., *dođao ‘came’), of the present tense (e.g., *dođeo ‘came’), or of the imperative (e.g., *dođio ‘came’).

3.3 Procedure

After providing informed consent by signing the written document that explained the purpose and the conditions of the research, the participants were sent a link to the questionnaire via email. After confirming their consent online, they were given the instructions and an example of a test item. Participants were asked to rate the frequency with which their child used a particular verb form on a 5-point Likert scale ranging from 1 (never) to 5 (very often; see Appendix A1 and Appendix A2). Two items for each verb were included in the questionnaire. The first item tested whether the child used the adult-like form of the verb. The second tested whether the child used an overgeneralised form. The items were displayed in random order and the participants rated each form (adult-like and overgeneralised) independently. Participants were also instructed to provide any additional forms (e.g., erroneous or overgeneralised) that their child produces if the form their child uses was not offered in the questionnaire. The questionnaire took 10–15 min to complete.

3.4 Variables for the analysis

Three variables were used in the analysis: token frequency, type frequency and the age of the child in months. Below we discuss the details for token and type frequency, and then turn to explaining the chosen statistical approach.

Our first variable for the analysis was token frequency. Previous studies have shown that the token frequency of an individual item supports its entrenchment. The token frequencies of the 14 verbs were obtained from child-directed speech in the Croatian Corpus of Child Language (Kovačević 2002). To this end, we summed the frequencies of all forms of a given verb occurring in the morphological form we analysed (i.e., present tense forms for classes 1 to 6 and past participle forms for class 7).

Type frequency was the second variable for the analysis. Previous studies have also reported that type frequency supports schematisation. For the seven stem-change verb classes, type frequency was obtained from child-directed speech in the Croatian Corpus of Child Language (Kovačević 2002). This measure was calculated by extracting all verbs belonging to a particular class and summing their token frequencies. Previous studies have used different methods to determine the size of a particular morphological paradigm, either by (a) considering a list of verb forms that belong to that class according to reference grammars, (b) considering all tokens of all verb forms attested within that class in corpora, or (c) considering only individual selected (targeted) verb forms attested within that class in corpora (see Engelmann et al. 2019). We opted for the last of the three because it better reflects the children’s exposure to certain morphological schemes. Type frequency was calculated for each class. For classes 1 to 6, the type frequency is the number of all forms of all verbs of a particular class in the present tense; for class 7, it is the number of all forms of all verbs of that class in the past participle (see Table 2).
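The class-level measure described above amounts to a grouped sum of token counts. A minimal sketch follows, assuming a simple list of (verb, class, token count) records; the verb labels and counts are hypothetical placeholders, not corpus values, and the function names are introduced here for illustration only.

```python
from collections import defaultdict

# Hypothetical (verb, verb class, token count in the targeted forms) records.
# These labels and numbers are invented for illustration; they are NOT corpus values.
toy_counts = [
    ("verb_a", 1, 10),
    ("verb_b", 1, 30),
    ("verb_c", 2, 50),
    ("verb_d", 2, 20),
    ("verb_e", 2, 5),
]


def type_frequency_by_class(records):
    """Sum the token counts of all verbs that belong to the same class."""
    totals = defaultdict(int)
    for _verb, verb_class, tokens in records:
        totals[verb_class] += tokens
    return dict(totals)


def verbs_per_class(records):
    """Count how many distinct verbs are attested in each class."""
    verbs = defaultdict(set)
    for verb, verb_class, _tokens in records:
        verbs[verb_class].add(verb)
    return {verb_class: len(members) for verb_class, members in verbs.items()}


class_totals = type_frequency_by_class(toy_counts)
class_sizes = verbs_per_class(toy_counts)
```

In this toy input, `class_totals` plays the role of the "Type frequency in CDS" column of Table 2 and `class_sizes` the role of the "Number of verbs in CDS" column.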

Parents’ responses were collected for 14 selected Croatian verbs, resulting in a total of N = 2,338 data points (167 participants × 14 verbs). Prior to statistical modelling, several preparatory steps were taken. (1) Children’s age in months was scaled (i.e., standardized or z-transformed) to accommodate high values and their wide scatter. (2) Token frequency underwent a rank-to-normal transformation and then a small number of extreme, discontinuous values were removed (167 data points or 7 % of the full dataset) to address its asymmetric and uneven density distribution (see Baayen and Milin 2010). In practice, the data trimming amounted to the removal of one exceptionally frequent Croatian verb (doći ‘to come’). (3) The main response variable, i.e. the rate of overgeneralisation, featured ordered categories (1: never, 2: rarely, 3: sometimes, 4: often, 5: very often). To address the considerably lower frequency of the last two categories (4 and 5), they were combined, resulting in a modified response variable with four ordered categories (1: never, 2: rarely, 3: sometimes, 4: often and very often).
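The three preparatory steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the rank-to-normal step uses average ranks and the offset (rank − 0.5)/n, which is one common variant and is assumed here; the ages are hypothetical; the token frequencies are those listed for the 14 questionnaire verbs in Table 3.

```python
from statistics import NormalDist, mean, stdev


def z_scale(values):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def rank_to_normal(values):
    """Replace each value by the normal quantile of its average rank.

    Assumption: quantile position (rank - 0.5) / n; other offsets are in use.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]


def collapse_rating(rating):
    """Merge 'often' (4) and 'very often' (5) into a single top category."""
    return min(rating, 4)


# Token frequencies of the 14 questionnaire verbs, in the order of Table 3.
token_freq = [9, 37, 51, 17, 7, 64, 14, 26, 4, 54, 4, 29, 253, 228]
ages_in_months = [36, 41, 47, 52, 60, 71]  # hypothetical ages

token_freq_rtn = rank_to_normal(token_freq)
ages_z = z_scale(ages_in_months)
collapsed = [collapse_rating(r) for r in [1, 2, 3, 4, 5]]
```

The rank-based transformation keeps the ordering of the verbs but pulls the two very frequent verbs (253 and 228 tokens) much closer to the rest, which is the purpose of such a transformation for a skewed frequency distribution.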

The chosen statistical approach was Bayesian Linear Mixed-Effect (BLME) modelling. For the purpose of the present study, its advantage lies in how it accounts for the data size. In simple terms, the Bayesian likelihood depends on the size of the data, with convergence between Bayesian and Frequentist solutions increasing with larger datasets. Smaller datasets, conversely, yield wider Bayesian Credible Intervals (CrIs, akin to Frequentists’ Confidence Intervals, CIs; see Gill 2007), reflecting the “penalised” likelihood.

The analysis utilized the brms package (version 2.19.0; Bürkner 2017, 2021) and posterior package (version 1.4.1; Bürkner et al. 2023) for the R statistical programming environment (R Core Team 2023). The final and optimal model was specified as follows:

    brm([RateOfOvergeneralisation] ~
          AgeInMonths.z : VerbType +
          VerbClassSize +
          (1 | Item) +
          (1 | Participant),
        family = acat,
        chains = 4, iter = 2000,
        …)

The model employed a link function suited to ordered categories (adjacent categories, acat) for the 4-level ordered rates of verb overgeneralisation. The posterior checks confirmed that the choice of link function was adequate (see Figure B-1 in Appendix B).
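To make the adjacent-category idea concrete, the sketch below computes category probabilities for a four-level response under a standard adjacent-category logit formulation, in which the log-odds of category k + 1 versus category k equal the linear predictor minus a threshold. The threshold values and linear predictors are hypothetical, and the parametrisation is an assumption made for illustration; it is not taken from the fitted model.

```python
import math


def acat_probabilities(eta, thresholds):
    """Category probabilities under an adjacent-category logit model.

    Assumed formulation: log(P(k + 1) / P(k)) = eta - thresholds[k - 1].
    """
    # Unnormalised log-probabilities: running sums of (eta - tau_j).
    log_scores = [0.0]
    for tau in thresholds:
        log_scores.append(log_scores[-1] + (eta - tau))
    top = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


thresholds = [0.5, 1.0, 1.5]  # hypothetical thresholds for 4 categories

# A lower linear predictor shifts probability mass towards category 1 ('never').
p_high = acat_probabilities(0.0, thresholds)
p_low = acat_probabilities(-1.0, thresholds)
```

Under this formulation a negative coefficient lowers the linear predictor and therefore raises the probability of the lowest category, which is how a negative (facilitatory) effect translates into more never-answers.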

3.5 Results

First, we examined the differences between age groups in the ratios of overgeneralised forms in the seven verb classes. For most children, parents reported both overgeneralised and adult-like forms (N = 162). There were some participants (N = 12) for whom no overgeneralised forms were reported and no participants for whom only overgeneralised forms were reported per verb. The mean number of reported overgeneralised forms per participant was 5.11 (36 2), with a minimum of 0 and a maximum of 14.0 (SD = 3.95). For each verb, the minimum reported Likert scale score for frequency of overgeneralisation was 0 and the maximum reported score was 5. The mean score was 3.4 (SD = 1.35). For each verb, the majority of the responses for the use of overgeneralised forms were ‘never’, and for the majority of verbs (N = 11) every point on the scale was selected at least once. For three verbs, ‘very often’ was never reported (see Figure 1).

Figure 1: 
Overgeneralisations reported for each verb (by points on the Likert scale).

As the effect of token frequency did not reach significance, this factor was not considered in the final model. This allowed us to use the entire dataset (N = 2,338), as the variables used in the modelling showed no signs of influential values. The results revealed highly significant random effects of participants and items. On the fixed-effect side, both the class size of the verb and the interaction of verb type and the child’s age emerged as significant too. Initially, weakly informative priors were used. They were subsequently replaced by conservative informative priors with small standard deviations, assuming a narrow dispersion of posterior draws. Complete prior specifications (Table B-1) and the summary table of results (Table B-2) are available in Appendix B.

The main effect of type frequency (VerbClassSize) proved significant and facilitatory (CrI = [−0.29 0.12]). As illustrated in Figure 2, the probability of a parent/carer reporting that a child never uses an overgeneralised verb form increases with verb type frequency. At the same time, there is a proportional decrease in the probability of the other three categories occurring (2: rarely, 3: sometimes, 4: often and very often).

Figure 2: 
Conditional effect of type frequency on the ordered rates, with the respective 95 % CrIs of posterior draws.

The interaction effect of age by verb class is visually summarized in Figures 3 and 4. Figure 3 shows the magnitude of the age effect for each verb class individually. The age effect for Class 1 is not significant, as its Credible Interval crosses zero. For the remaining six classes, the age effect varies in strength, with Classes 2 and 3 being the most pronounced, and Classes 4 to 7 exhibiting roughly comparable effects. Figure 4 provides a detailed representation of this interaction.

Figure 3: 
Effect size summary for AgeInMonths by VerbClass. Thin lines represent 95 % CrIs, while thick lines represent the interquartile range (25 to 75 %), both in terms of posterior draws. The vertical dotted line represents the boundary at which an effect changes its direction from negative (facilitatory) to positive (inhibitory).

Figure 4: 
Effect size summary for AgeInMonths by VerbClass.

In a broad sense, the interaction patterns show strong similarity both among verb classes and when compared with the main effect of TypeFrequency (compare Figures 2 and 4). Notably, the probability of the never-answer increases with the child’s age, while the other response categories gradually converge towards zero probability. To summarise, the model underscores a significant decline in the use of overgeneralised verb forms, indicated by an increase in the probability of answering never, in relation to both verb type frequency and the child’s age. This age-related shift, furthermore, manifests differently across six verb classes (Classes 2 to 7).

Notably, the model did not establish a discernible effect of verb token frequency. This seeming contradiction to the established literature, where token frequency effects are often considered a benchmark, can be explained by considering the interplay between that covariate and the random effect of items. To explore this point further, we ran additional models including the critical predictor, token frequency, thus relying on the trimmed dataset with influential values removed (N = 2,171). Targeted comparisons of the models with and without the random effect of items and the token frequency helped shed light on the issue. Three key points are worth attention: (1) The effect of token frequency is indeed present and significant, in line with expectations, once the random effect of Item is omitted from the model (CrI = [−0.14 −0.02]). (2) When both Item adjustments and token frequency are incorporated into the model, the token frequency effect is no longer significant (CrI = [−0.27 0.16]). (3) Most verb items have distinct token frequencies, with only two pairs sharing the same frequency values. In conclusion, the absence of token frequency effects is attributed to a “tug of war” between the Item random effect and the token frequency covariate, where the former proves more parsimonious and prevails. In fact, a model considering only frequency effects exhibits a poorer fit than a model with only the Item random effect, by 46 WAIC units (elpd difference). This nuanced understanding emphasises the rich dynamics at play in the linguistic phenomena under investigation as well as the necessity for careful statistical treatment of the data.

4 Computational simulations: method and results

4.1 Materials

The aim of the computational modelling study was to explore entrenchment and schematisation in the hypothesised process of acquisition of a verbal system in which different active morphological patterns occur with different frequencies (Table 3). The input was modelled on the verbs we had included in the questionnaire, with verb classes that occur in the child-directed speech (CDS). The cues in the input were lexical items and morphological patterns. Eight morphological patterns were included (see Table 4). For seven of them the outcome, i.e. the correct form, was outcome −x, and for one it was outcome −y. The outcome −y for x-patterns would be considered overgeneralised, and vice versa for the y-pattern. The y-pattern is significantly more frequent than the x-patterns (10 times more frequent than the most frequent x-pattern). The x-patterns differ in their frequency in a way that corresponds to the numbers from the child-directed speech. Each pattern is associated with at least two and at most 16 lexical items. Most of the x-patterns (six) are associated with two lexical items, while one x-pattern is associated with four lexical items. The y-pattern was associated with the largest number of lexical items. For each pattern, half of the lexical items had a higher frequency (500), while the other half had a lower frequency (100).

Table 3:

Verbs in the questionnaire.

Type-frequency classᵃ Verbsᵇ Token frequencyᶜ Correct form Overgeneralised form
1 mah-a-ti 9 maš-e-m mah-a-m
wave.INF. wave.PRS.1.SG. wave.PRS.1.SG.
puh-a-ti 37 puš-e-m puh-a-m
blow.INF. blow.PRS.1.SG. blow.PRS.1.SG.
2 plak-a-ti 51 plač-e-m plak-a-m
cry.INF. cry.PRS.1.SG. cry.PRS.1.SG.
vik-a-ti 17 vič-e-m vik-a-m
yell.INF. yell.PRS.1.SG. yell.PRS.1.SG.
3 pis-a-ti 7 piš-e-m pis-a-m
write.INF. write.PRS.1.SG. write.PRS.1.SG.
bris-a-ti 64 briš-e-m bris-a-m
wipe.INF. wipe.PRS.1.SG. wipe.PRS.1.SG.
4 penj-a-ti 14 penj-e-m penj-a-m
climb.INF. climb.PRS.1.SG. climb.PRS.1.SG.
smij-a-ti se 26 smij-e-m se smij-a-m
laugh.INF. laugh.PRS.1.SG. laugh.PRS.1.SG.
5 bjež-a-ti 4 bjež-i-m bjež-a-m
run.INF. run.PRS.1.SG. run.PRS.1.SG.
drž-a-ti 54 drž-i-m drž-a-m
hold.INF. hold.PRS.1.SG. hold.PRS.1.SG.
6 diz-a-ti 4 diž-e-m diz-a-m
lift.INF. lift.PRS.1.SG. lift.PRS.1.SG.
pokaz-a-ti 29 pokaž-e-m pokaz-a-m
show.INF. show.PRS.1.SG. show.PRS.1.SG.
7 do-ø-ći 253 doš-a-o dođ-a-o/dođ-e-o/dođ-i-o
come.INF. come.PTCP.M.SG. come.PTCP.M.SG.
re-ø-ći 228 rek-a-o reč-a-o/reč-e-o/rec-i-o
say.INF. say.PTCP.M.SG. say.PTCP.M.SG.
ᵃ Order of a particular verb class according to its type frequency.
ᵇ Verbs from each class included in the questionnaire.
ᶜ Frequency of a verb in the selected morphological forms (class 1 to 6: all verb forms in the present tense; class 7: all verb forms in the past participle) in child-directed speech.

Table 4:

Input for computational modelling based on the properties of verbs in child-directed speech (CDS).

Verb class Type frequency in CDS Morphological patterns n of lexical items Outcome Frequency
1 79 mo1 2 x 100
2 118 mo2 2 x 200
3 127 mo3 2 x 300
4 153 mo4 2 x 400
5 259 mo5 2 x 500
6 297 mo6 2 x 600
7 1,094 mo7 4 x 1,000
No stem change 12,281 mo8 16 y 10,000

4.2 Variables for the analysis

The token frequency of a single word is predicted to support its entrenchment. In the computational simulations, each lexical item associated with a particular pattern has its own frequency of occurrence. We observed whether higher frequencies of individual lexical items in the simulation lead to a higher weight in predicting the outcome. Type frequency, in turn, is assumed to support schematisation. In the computational simulations, patterns differ in their own frequency of occurrence, and some also differ in the number of lexical items associated with them (see Table 4). These are two different conceptualisations of type frequency.

4.3 Procedure

We conducted a computational simulation study using an error-driven learning algorithm, the Widrow–Hoff rule (Widrow and Hoff 1960), which is effectively identical to the Rescorla–Wagner rule (Rescorla and Wagner 1972; see Rescorla 2008 on the equivalence of the two learning rules). The rule is presented with events from which it learns to associate the presence of a particular outcome with the cues that are informative of its occurrence. At each learning step (i.e., each event) it adjusts the strength, or weight, of the association between a particular cue and a particular outcome. Learning thus occurs as a continuous process of weight change.

The joint presence of a cue and an outcome strengthens the association, while the presence of a cue without an outcome reduces the association weight. The plenitude of cues increases the cue competition and makes each of them, individually, less prominent. At the same time, a large number of cues boosts the importance of error-correction while decreasing the importance of correct predictions (represented as weight strengthening).
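The verbal description above can be written compactly as a single update equation. The notation is a common textbook form and is introduced here for illustration; the symbols (c for cue presence, o for outcome presence, w for association weights, η for the learning rate) are assumptions, not the authors' notation.

```latex
% Widrow-Hoff (delta) update for the weight between cue i and outcome j.
% c_i = 1 if cue i is present in the event, 0 otherwise
% o_j = 1 if outcome j is present in the event, 0 otherwise
% \eta = learning rate (assumed symbol)
\Delta w_{ij} \;=\; \eta \, c_i \Bigl( o_j - \sum_{k} c_k \, w_{kj} \Bigr)
```

Weights change only for cues that are present (c_i = 1). Because the prediction sums over all present cues, an event with many cues reaches a given prediction with less weight on each individual cue, which is the cue competition mentioned in the text.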

Following Divjak et al. (2021), we have assumed a constant learning background in addition to the cues (items), meaning that everything else related to learning is considered equally informative (or uninformative). The results were calculated as associative weights showing what the network learnt about the relationships between cues and outcomes.
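For readers who want to experiment with this kind of simulation, the following is a minimal sketch of error-driven learning on input shaped like Table 4. It is not the authors' implementation. Each event is assumed to contain one background cue, one morphological pattern cue and one lexical item cue; items within a pattern are assumed to be sampled in a 5:1 ratio (mirroring the 500 vs. 100 item frequencies); the learning rate, the random event order and the item labels are likewise assumptions.

```python
import random

# Pattern-level input from Table 4: (pattern cue, n lexical items, outcome, frequency).
PATTERNS = [
    ("mo1", 2, "x", 100), ("mo2", 2, "x", 200), ("mo3", 2, "x", 300),
    ("mo4", 2, "x", 400), ("mo5", 2, "x", 500), ("mo6", 2, "x", 600),
    ("mo7", 4, "x", 1000), ("mo8", 16, "y", 10000),
]
OUTCOMES = ("x", "y")


def build_events(rng):
    """Create one learning event (cue set, outcome) per pattern occurrence."""
    events = []
    for pattern, n_items, outcome, freq in PATTERNS:
        items = [f"{pattern}_item{i}" for i in range(n_items)]  # hypothetical labels
        # Assumption: the first half of the items is five times as likely (500 vs. 100).
        sampling_weights = [5 if i < n_items // 2 else 1 for i in range(n_items)]
        for item in rng.choices(items, weights=sampling_weights, k=freq):
            # Assumption: a constant background cue is present in every event.
            events.append(({"background", pattern, item}, outcome))
    rng.shuffle(events)  # assumption: events arrive in random order
    return events


def widrow_hoff(events, learning_rate=0.01):
    """Update cue-outcome weights event by event with the delta rule."""
    weights = {}
    for cues, outcome in events:
        for o in OUTCOMES:
            target = 1.0 if o == outcome else 0.0
            # Prediction is computed from the weights before this event's update.
            prediction = sum(weights.get((c, o), 0.0) for c in cues)
            error = target - prediction
            for c in cues:
                weights[(c, o)] = weights.get((c, o), 0.0) + learning_rate * error
    return weights


events = build_events(random.Random(1))
weights = widrow_hoff(events)

# Final association weights of each morphological pattern with the two outcomes.
for pattern, *_ in PATTERNS:
    print(pattern, round(weights[(pattern, "x")], 3), round(weights[(pattern, "y")], 3))
```

The numerical values produced by this sketch depend on the assumed learning rate, cue structure and event order, so they should not be expected to match the weights reported in the figures.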

4.4 Results

The results of the computational simulations represent the acquisition process, i.e. the learning process itself. This process is summarised in Figures 5 and 6. Figure 5 shows the results for the morphological patterns, while Figure 6 shows the results for a number of the most characteristic lexical items.

Figure 5: 
Results of the computational simulation for the morphological patterns.

Figure 6: 
Results of the computational simulation for selected lexical items.

Figure 5 reveals that the associated weights for morphological patterns 1 to 7 are positive for the −x outcome and negative for the −y outcome, while the opposite is true for pattern 8. The rate at which the −y outcome occurs for pattern 8 is much higher (0.9) than the associated weights for patterns 1 to 7 (all around 0.2), indicating a stronger preference for selecting pattern 8 than for selecting patterns 1 to 7. The rate at which the −x outcome occurs is higher for patterns with higher frequency (for mo1 to mo7 it is 0.187, 0.206, 0.218, 0.221, 0.221, 0.229, 0.234, 0.233). When looking at how learning unfolds, as illustrated by the coloured curves in Figure 5 (the X-axis represents time steps, i.e. new learning events), two important trends can be observed. First, the curves in Figure 5 are quite steep, indicating a relatively quick acquisition process. Second, the individual developments of cue-outcome associations are rather similar, except for pattern 7, the most common of all x-positive patterns. That pattern too “merges” with the other patterns in the later stages of the learning curve, but in the early stages it has a much higher preference for −x than any of the other x-patterns because it has a higher frequency (probability) of occurrence, and it occurs with −x.

Figure 6 shows learned associations over trial events (X-axis) for a handful of the most characteristic or prototypical lexical items. In fact, there are two lexical items per morphological pattern, one with a higher frequency and the other with a lower frequency (e.g., 500 vs. 100). Both lexical items associated with the y-pattern (marked with the initial letter H) show a preference for −y, while lexical items associated with x-patterns (marked with the initial letters A, B, C, D, E, F, G) show a preference for −x. However, all final preferences are rather low, less than 0.06. When the associations of lexical items are considered with respect to their frequencies, the final associative preference for a particular outcome is not significantly higher for the more frequent lexical items (lexical items with B as the final letter) than for the less frequent lexical items (lexical items with A and C as final letters). However, looking at the learning curve, two interesting patterns can be observed. More frequent lexical items show a higher preference for −x or −y (up to 0.1) in the initial phase of learning, but this decreases towards the end of the learning process, and the process seems to be about unlearning the general relevance of these more frequent cues. In other words, we see that more frequent and less frequent lexical items are similar in ultimate association strength but reveal different learning trajectories over time (i.e., learning events). This is exactly what we can see in the patterns in Figure 6; compare, for example, lexical cues AA and AB. Similar to Divjak et al. (2024), we also observe layers of cues, each of which plays a different role in the complex dynamics of language: some morphological cues (e.g., Figure 5, mo8) give enough support to an outcome (Figure 5, −y ending), while other morphological cues (Figure 5, mo1–mo7) require additional support from lexemes (e.g., Figure 6, AA–BB), because the outcome (Figures 5 and 6, −x ending) is not strongly activated otherwise.

With respect to the type and token frequencies and their role in learning, the results of the simulations show that the type frequency of a pattern changes the learning outcomes in the way that the patterns with a higher type frequency also show a greater preference for the correct form. The lexical items and their token frequencies, more specifically, contribute to more nuanced language dynamics, with both learning and unlearning tendencies that, likely, allow expressive complexities to emerge.

5 Discussion

The present study had two main aims. The first was to test the predictions of usage-based accounts of the acquisition of inflectional morphology, in particular the role of entrenchment and schematisation (operationalised by token and type frequency, respectively) in the production of overgeneralised verb forms in preschool children acquiring Croatian, a language with rich verbal morphology. The second aim was to investigate language users’ preferences for one of the verb forms (correct and overgeneralised), considering the interplay of token frequency (entrenchment) and type frequency (schematisation). For the first aim, we analysed data from a parental questionnaire; for the second, we ran a computational simulation. We addressed three questions that were pertinent to the aims of the study.

The first question investigated whether the production of overgeneralised verb forms differs according to children’s age. The results showed that parents of most children (93 %) reported parallel use of both adult-like and overgeneralised verb forms, suggesting that Croatian-speaking preschool children have not yet fully acquired the complex verbal system. However, the likelihood of parents answering never increases with children’s age, underlining a significant decrease in the use of overgeneralised verb forms in terms of both verb type frequency and the children’s age. This result is consistent with usage-based approaches that assume gradual input-based learning (e.g., Frank et al. 2021; Marchman et al. 1997; Maslen et al. 2004; Räsänen et al. 2016). Moreover, this age-related shift manifests itself differently across six verb classes (Classes 2 to 7), implying that although the use of overgeneralised forms decreases with age, the characteristics of the input may play a crucial role in preschoolers’ use of these forms.

The absence of a single developmental trajectory across six verb types lends support to a usage-based and emergent principle, countering the notion of formal, all-at-once behaviour advocated by generativist approaches (see Ambridge and Lieven 2011). In Croatian, verb classes are distinguished by two criteria. The first involves stem alternation, where certain classes exhibit more intricate morphophonological changes in their stems. The second criterion relates to the type frequency of each class, with the distribution of frequency not aligning with the morphophonological complexity of the verbs. Adhering to the usage-based approach, our study refrained from delving into morphophonological differences between classes, concentrating solely on frequency effects that could shape the acquisition of diverse verb classes. The outcomes reveal that, despite variations in morphophonological complexity, type frequency – the number of verbs of a certain class in the input – exerts a distinct impact on acquisition. Children, following a usage-based approach, construct analogies based on type frequency, facilitating a more resilient recovery from overgeneralizations due to exposure to frequently occurring patterns. Thus, our findings imply that the predominant force driving inflectional morphology acquisition is the formation of patterns from input, rather than the establishment of rules operating across abstract grammatical categories.

In this study, our examination of the system of verb classes in Croatian takes a departure from the verb class divisions outlined in the literature. Commonly, classifications rely on linguistic criteria, such as the grouping of all palatalized verbs in the present tense into a singular class (Jelaska and Bošnjak Botica 2019). To circumvent potential pitfalls arising from linguistic generalizations inherent in existing classifications, we centred on the language input. This alternative perspective has led us to identify additional classes that genuinely mirror what a child is exposed to. Specifically, we have subdivided the palatalized verb class into three distinct subclasses, each representing stem-change patterns encountered by children during the language acquisition process. The aim was to provide a more accurate representation of how verb classes manifest in the context of language learning. The influence of type frequency is evident for these newly defined subclasses during the acquisition process and contributes to our comprehension of how verb classes are acquired in Croatian, steering clear of a theoretically driven categorisation of verb classes and capturing the children’s language learning experiences instead.

Our second research question aimed to investigate how entrenchment (operationalised by token frequency) and schematisation (operationalised by type frequency) contribute to the production of overgeneralised verb forms in Croatian-speaking preschool children. We expected both token and type frequency to contribute to the probability of producing overgeneralised verb forms, in that the rate of overgeneralised forms is significantly lower for verbs with higher type and token frequency. The results show that type frequency has a significant effect on the rate of overgeneralised verb forms. There is a significant decrease in the production of overgeneralised verb forms as a function of type frequency, implying the probability of parents answering never in the questionnaire increases with verb type frequency. The results of the present study are in line with studies that have confirmed the facilitative effect of type frequency on the acquisition of verbal morphology in languages with rich inflectional morphology (e.g., Finnish: Engelmann et al. 2019; Kirjavainen et al. 2012; Räsänen et al. 2016; Polish: Engelmann et al. 2019). By recognising similarities between many items, Croatian children form analogy patterns at an early age. Thus, schematisation (operationalised by type frequency) plays an important role in the retreat from overgeneralised verb forms in morphologically-rich languages.

The statistical analysis did not establish the effect of token frequency – the hypothesised entrenchment effect. As argued, that effect was concealed by the random effect of verb items. Specifically, verb items, being more parsimonious in a statistical sense, exerted a stronger effect than token frequency. Nonetheless, when the item random effect was removed from the model, the token frequency effect re-emerged in the expected facilitatory direction. Prior research has also yielded inconsistent findings concerning the token frequency effect. For instance, Kirjavainen et al. (2012) identified a significant inhibitory effect of token frequency in Finnish, where children’s performance was superior for verbs with low token frequency. Conversely, Granlund et al. (2019) detected a significant effect of token frequency on noun morphology in Estonian and Polish but not in Finnish.

Collectively, when considering both prior and present findings, we cannot conclusively refute the effect of token frequency or, for that matter, its validity as an entrenchment indicator. It is reasonable to anticipate, however, that with larger datasets, encompassing numerous word items and their respective frequencies, the alignment or one-to-one mapping between the two linguistic predictors – token frequency and word label (i.e. random effect of items) – may diminish, thereby enabling the former to attain statistical significance (Milin et al. 2009). In that more general sense, our results shed light on the intricate dynamics at play in languages with rich morphology.

The results of the present study on the role of entrenchment and schematisation are consistent with usage-based/constructivist approaches to language acquisition. Children create abstract linguistic schemas or constructions (morphological in the present study) from the concrete items they are exposed to in the input; hence the significant effect of type frequency on the production of overgeneralisations. Moreover, children constrain their abstractions depending on how strongly these items are entrenched in the input; hence the observed effect of token frequency. In contrast to generativist approaches, these results indicate that children’s linguistic knowledge may not consist of formal rules, but rather emerges from pattern-finding, i.e. from generalisations that children form when using language, whereas entrenchment strengthens the representation of items, making them more readily available for production. It has to be noted that there are generativist models that explicitly mention or indirectly take into account token frequency or both type and token frequency (e.g., Yang 2002). However, there is an important difference between these models and usage-based models. Although both types of approaches (may) take frequency into account, usage-based approaches consider it a built-in core mechanism of emergence. Within generativist approaches, frequency is not relevant for the core mechanism, but comes into the picture when peripheral, “unexplainable” parts of the system have to be explored.

Our third research question examined how entrenchment (operationalised by token frequency) and schematisation (operationalised by type frequency) can be modelled computationally, using a simple learning rule and analysing the selection preferences for the correct versus the overgeneralised form given the set of relevant cues. We predicted that both higher token frequency (operationalised as the frequency of lexical items) and higher type frequency (operationalised as the frequency of morphological patterns) would increase the likelihood of selecting the correct form of the given morphological paradigm. The results showed that the lexical items and their token frequencies exhibited both learning and unlearning tendencies, i.e. they formed layers of cues, each of which makes a particular contribution to the complex dynamics of language learning, with cues of different frequencies playing different roles that also change over time, i.e. with experience. Finally, many lexical cues become important when the support from morphological cues alone is not enough.

Preferences for endings in particular lexical items do not change during the process, only their strength, showing that they have a different role in language dynamics. The results also showed that the type frequency had an impact on the final preferences for the morphological patterns: preferences for the y-pattern (the most frequent of all patterns) were much higher than preferences for x-patterns (all less frequent than the y-pattern), and among x-patterns preferences were higher for the more frequent x-patterns and lower for the less frequent x-patterns. In comparison, preferences for endings of lexical items do not change during the process, and apart from the initial phase of uncertainty, the strength of these preferences stabilises quickly and remains similar in association strength. These results can be interpreted in terms of our original expectations. It appears that in the simulated learning process, schematisation (operationalised as type frequency) plays an important role throughout the process, while the role of entrenchment (operationalised as token frequency) changes during the process, showing initial learning and subsequent unlearning tendencies. The specific differences across cues relate to their respective token frequency.
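The kind of cue-outcome learning discussed above can be sketched in a few lines of Python. The sketch is an illustration only: it assumes a Rescorla-Wagner-style delta rule, which may differ from the learning rule and parameters used in the simulation reported here. The cue coding (a lexical cue plus an infinitive-ending cue), the four regular verbs, the learning rate, and the equal frequencies are all hypothetical; only the pair držim/*držam comes from the questionnaire example in Appendix A.

```python
from collections import defaultdict

# Hypothetical cue coding: each learning event pairs a lexical cue and an
# infinitive-ending cue with the present-tense ending heard in the input.
REGULAR_VERBS = ["gledati", "čitati", "pjevati", "igrati"]  # illustrative -ati/-am verbs
EPOCH = [([verb, "-ati"], "-am") for verb in REGULAR_VERBS] + [(["držati", "-ati"], "-im")]
OUTCOMES = ["-am", "-im"]


def activation(weights, cues, outcome):
    """Summed associative support that a set of cues gives to one outcome."""
    return sum(weights[cue][outcome] for cue in cues)


def train(epoch_events, epochs=100, learning_rate=0.1):
    """Apply a delta-rule update per event and track the preference for držati.

    Returns the weights and, for each epoch, the activation of the correct
    ending '-im' minus that of the overgeneralised ending '-am' for držati.
    """
    weights = defaultdict(lambda: defaultdict(float))
    preference = []
    for _ in range(epochs):
        for cues, observed in epoch_events:
            for outcome in OUTCOMES:
                target = 1.0 if outcome == observed else 0.0
                error = target - activation(weights, cues, outcome)
                for cue in cues:
                    weights[cue][outcome] += learning_rate * error
        probe = ["držati", "-ati"]
        preference.append(
            activation(weights, probe, "-im") - activation(weights, probe, "-am")
        )
    return weights, preference


weights, preference = train(EPOCH)
print(f"preference after epoch 1:   {preference[0]:+.3f}")
print(f"preference after epoch 100: {preference[-1]:+.3f}")
```

In this toy setting the shared "-ati" cue, supported by several verbs, initially favours the overgeneralised ending, so the preference is negative after the first epoch. With continued exposure the lexical cue compensates and the preference moves towards the correct form. The sketch is not a reproduction of the reported simulation.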

The results of both the parental questionnaire and the computational simulation are consistent with those of previous studies that have found significant effects of token and/or type frequency on the acquisition of verbal morphology (e.g., Ambridge 2010; Engelmann et al. 2019; Kirjavainen et al. 2012; Marchman 1997; Maslen et al. 2004; Räsänen et al. 2016). In this study, we aimed to investigate the relationship between frequency phenomena and language acquisition by observing Croatian conjugational classes and their type frequencies. The Croatian verbal system consists of classes which differ not only in their morphological patterns but also in their type frequencies. The effects of morphological classes have already been observed in studies on the acquisition of different languages. For English, there is ample evidence that the morphological class of a word contributes to its correct or overgeneralised production (e.g., Marchman 1997; Maslen et al. 2004). Certain verbs form the past tense in similar ways (e.g., keep/kept; creep/crept; weep/wept), and these groups differ in their type frequency. Larger classes are less prone to overgeneralisation. Similar tendencies have been observed for other morphologically-rich languages in the domain of noun case marking (e.g., Dąbrowska 2004; Savičiūtė et al. 2018), but less frequently when considering verbal inflection. Engelmann et al. (2019) found that in Polish and Finnish, the type frequency of conjugational classes contributes to the error rate in the inflection of verbs. The results of the present study show that in Croatian, which is also a morphologically-rich Slavic language, the type frequency of a verb class contributes to the occurrence of overgeneralisation. Our results suggest that type frequency, i.e. schematisation, influences the acquisition of a particular verb class or subclass, which could also be relevant for other highly inflected languages.

Usage-based accounts describe language acquisition as a process in which children initially store the completely inflected forms they have heard in the input without being aware of their internal morphological structure. Later in development, generalisation occurs, enabling children to grasp common schemas and form new word forms (see Tomasello 2005). The process of schematisation is evident at a very early age, as soon as a child is able to name a set of objects with the same label, but what changes over time is the scope of abstraction (see Lieven and Tomasello 2008). The presence of schemas is important to the extent that the number of items in the input increases and abstractions of greater scope become available. Our computational simulation showed similar tendencies: after the initial uncertainty and throughout the learning phase, morphological patterns with higher type frequency had a higher preference for the assigned form.

In summary, our results show that children do indeed use two or more forms per verb in language development, which means that it takes time to retreat from overgeneralisation. The present study, using a different methodology than earlier studies, showed that overgeneralisation and multiple-form usage are still present in five-year-old children. Age is important for recovering from overgeneralisation, but it is not fully understood how entrenchment and schematisation support language acquisition in relation to this factor (cf. Räsänen et al. 2016). Our study contributes to this understanding by showing that type frequency, which is thought to promote schematisation, is an important factor in the successful retreat from overgeneralisation. Many studies point to large individual differences in language acquisition (e.g., Frank et al. 2021) and attribute these differences to child-internal and child-external factors, such as the quality of input or cognitive ability (Paradis 2023; Rowe 2018). Vocabulary size has been shown to influence the rate of overgeneralisation (e.g., Frank et al. 2021). Previous studies have shown that additional language-specific factors may play a role in the acquisition of morphology, so the interaction between them and the general language factors should be further explored. In addition, there are semantic and syntactic differences between the verbs selected for this study that may have influenced the results. We used overgeneralised forms in the present and past tense to illustrate the morphological complexity a child faces in acquiring Croatian, guided by the idea that there is a similar frequency effect for all overgeneralisations.

The results of this study, which confirm previous research in many ways, demonstrate the strong effect of type frequency and a masked effect of token frequency in two independent studies relying on different methodologies. Using a parental questionnaire allowed us to cover a larger number of participants, avoid the frequency limitations of naturalistic corpus studies (see Maslen et al. 2004), and test the theoretical assumptions with a design commonly used in experimental research. With computational modelling we were able to simulate the learning process and, thus, the emergence of key cue-outcome relationships in the complex morphological system modelled on the features of the Croatian language.

The acquisition of inflectional morphology, especially the past tense in English, is a much-discussed topic, with theoretical perspectives suggesting either inherent systematicity or quasiregularity. Systematic views define language by rules, while quasiregularity posits an integrated system with both regular and irregular elements. Central questions in language acquisition focus on understanding this system and its influencing factors. However, inflectional systems vary across languages; for example, English is reported to have 4–7 % irregular verbs (Beedham 2005; Grabowski and Mindt 1995), while Croatian has around 30 % stem-changing verbs. They also differ in the number of irregular patterns and complexity. For instance, in some Croatian verbs, only one element of the verb’s stem needs to change to form different tenses, while in others, the change is more extensive. An open question remains whether the acquisition of such differing systems is equally influenced by entrenchment and schematisation. If distinctions exist, they might elucidate why certain frequency effects are not universally found in different languages or parts of inflectional systems. This study explores the acquisition of verbal morphology in Croatian, a language where verb classes showcase varying morphological complexities, revealing a quasiregular pattern. A substantial number of verbs fall into these classes, displaying degrees of regularity based on class. This system positions verbs on a quasiregularity spectrum, concentrating more verbs towards the middle compared to languages like English. Through the examination of Croatian acquisition via parental questionnaires and computer simulations, frequency factors, particularly type frequency, emerged as influential in the process. Whether this holds true for specific types of quasiregular systems or for such systems in general awaits further exploration in future studies.

The impact of entrenchment and schematisation, or more precisely, their operationalisations in terms of token and type frequency, has been convincingly demonstrated in the emergence of constructions (cf. Bidgood et al. 2021; Goldberg et al. 2004; Wonnacott et al. 2008). The current study extends this observation to the acquisition of verb inflections in a morphologically rich language. The findings align with Construction Morphology (CxM: Booij 2010, 2017), which favours a constructional approach to morphological phenomena. The vantage points of this framework assume holistic properties of inflectional forms and underscore the relevance of paradigmatic relations. Furthermore, as schematicity is graded, it allows children to gradually discover and productively use abstract, morphologically realised patterns (cf. Booij 2010: 258). Within the broader usage-based account, morphological constructions such as verb-inflected forms and paradigms exhibit variable degrees of systematicity as indicated by type frequency. As demonstrated in the present study, these inflectional schemas gradually emerge in a child’s language usage.

6 Conclusions

This study investigated language acquisition using two methods, a parental questionnaire and computational simulations. In both cases, a complex system of several morphological patterns that differ in type frequency was explored. This is in contrast to most previous studies of overgeneralisation, which focus on morphologically less complex languages. The results of this study support usage-based approaches in which the entrenchment of a particular form, operationalised by token frequency, in the early stages of acquisition of the verbal system is relevant, together with schematisation, operationalised by a large number of different word forms following the same pattern. The particularly strong effect of type frequency, which can be generalised to other verbs of the same classes and to participants other than those included in the present study, provides strong evidence for the role of schematisation in the acquisition of morphological systems consisting of morphological patterns with different type frequency.

Data availability statement

The data that support the findings of this study are openly available via the Open Science Framework (https://osf.io/) under the name “The role of entrenchment and schematization in the acquisition of rich verbal morphology”: https://doi.org/10.17605/osf.io/mnjfs.


Corresponding author: Gordana Hržica, University of Zagreb, Zagreb, Croatia, E-mail:

Award Identifier / Grant number: AH/T002859/1

Award Identifier / Grant number: HRZZ-UIP-2017-05-6603

Acknowledgments

We highly appreciate the feedback we received on drafts of this paper from two anonymous reviewers.

Research funding: This work was supported by the Arts and Humanities Research Council (AH/T002859/1) and Hrvatska Zaklada za Znanost (HRZZ-UIP-2017-05-6603).

Appendix A1

Example of a target item (correct form) from the parental questionnaire.

How often does your child use držim ‘I hold’ (including forms držiš ‘you hold’, drži ‘he/she holds’, držimo ‘we hold’…)?

1 – never   2 – rarely   3 – sometimes   4 – often   5 – very often

Appendix A2

Example of an overgeneralised item from the parental questionnaire.

How often does your child use držam ‘*I hold’ (including forms držaš ‘*you hold’, drža ‘*he/she holds’, držamo ‘*we hold’ …)?

1 – never   2 – rarely   3 – sometimes   4 – often   5 – very often

Appendix B

Table B-1: Specification of informed priors.

Class/coefficient                     Prior
Intercept                             Normal (mean = 0.5, sd = 0.5)
AgeInMonths.z : VerbType (Type1)      Normal (mean = 0, sd = 0.2)
AgeInMonths.z : VerbType (Type2)      Normal (mean = −0.4, sd = 0.2)
AgeInMonths.z : VerbType (Type3)      Normal (mean = −0.3, sd = 0.2)
AgeInMonths.z : VerbType (Type4)      Normal (mean = −0.2, sd = 0.2)
AgeInMonths.z : VerbType (Type5)      Normal (mean = −0.2, sd = 0.2)
AgeInMonths.z : VerbType (Type6)      Normal (mean = −0.2, sd = 0.2)
AgeInMonths.z : VerbType (Type7)      Normal (mean = −0.2, sd = 0.2)
VerbClassSize                         Normal (mean = −0.2, sd = 0.1)
Item                                  Normal (mean = 0.35, sd = 0.1)
Participant                           Normal (mean = 1.0, sd = 0.1)

Table B-2: Summary table of results for the final Bayesian Mixed Effect model. The last two columns represent boundaries of the 95 % Credible Intervals (CrI).

Group-level effects:            Estimate   Est. error   Lower-95 % CrI   Upper-95 % CrI
Participant intercept (sd)      0.99       0.06         0.88             1.12
Item intercept (sd)             0.34       0.06         0.23             0.48

Population-level effects:       Estimate   Est. error   Lower-95 % CrI   Upper-95 % CrI
Intercept[1]                    0.40       0.22         −0.05            0.81
Intercept[2]                    0.01       0.22         −0.44            0.43
Intercept[3]                    0.65       0.23         0.20             1.08
AgeInMonths.z : VerbType(1)     −0.03      0.08         −0.18            0.13
AgeInMonths.z : VerbType(2)     −0.39      0.09         −0.57            −0.23
AgeInMonths.z : VerbType(3)     −0.27      0.08         −0.43            −0.11
AgeInMonths.z : VerbType(4)     −0.21      0.08         −0.37            −0.04
AgeInMonths.z : VerbType(5)     −0.21      0.10         −0.40            −0.02
AgeInMonths.z : VerbType(6)     −0.18      0.09         −0.36            0.00
AgeInMonths.z : VerbType(7)     −0.21      0.10         −0.41            −0.01
VerbClassSize                   −0.20      0.04         −0.29            −0.12

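One common way to read a table of this kind is to check which 95 % credible intervals exclude zero. The short Python sketch below hard-codes the age and class-size rows of Table B-2 and lists those coefficients; the dictionary and helper names are hypothetical, and the check works on the rounded values as printed, so an interval bound printed as 0.00 is treated as including zero.

```python
# Population-level rows from Table B-2: (estimate, lower 95 % CrI, upper 95 % CrI).
TABLE_B2 = {
    "AgeInMonths.z : VerbType(1)": (-0.03, -0.18, 0.13),
    "AgeInMonths.z : VerbType(2)": (-0.39, -0.57, -0.23),
    "AgeInMonths.z : VerbType(3)": (-0.27, -0.43, -0.11),
    "AgeInMonths.z : VerbType(4)": (-0.21, -0.37, -0.04),
    "AgeInMonths.z : VerbType(5)": (-0.21, -0.40, -0.02),
    "AgeInMonths.z : VerbType(6)": (-0.18, -0.36, 0.00),
    "AgeInMonths.z : VerbType(7)": (-0.21, -0.41, -0.01),
    "VerbClassSize": (-0.20, -0.29, -0.12),
}


def interval_excludes_zero(lower, upper):
    """Hypothetical helper: True if zero lies strictly outside [lower, upper]."""
    return lower > 0 or upper < 0


excluding_zero = [
    name
    for name, (_, lower, upper) in TABLE_B2.items()
    if interval_excludes_zero(lower, upper)
]

for name in excluding_zero:
    estimate, lower, upper = TABLE_B2[name]
    print(f"{name}: {estimate:+.2f} [{lower:+.2f}, {upper:+.2f}]")
```

On the printed values, the interval for VerbClassSize lies entirely below zero, whereas the interval for VerbType(1) spans zero and the one for VerbType(6) touches it at the rounded upper bound.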

Figure B-1: Posterior draws density function.

References

Abbot-Smith, Kirsten & Michael Tomasello. 2006. Exemplar-learning and schematization in a usage-based account of syntactic acquisition. The Linguistic Review 23(3). 275–290. https://doi.org/10.1515/tlr.2006.011.

Aguado-Orea, Javier & Julian M. Pine. 2015. Comparing different models of the development of verb inflection in early child Spanish. PLoS One 10(3). e0119613. https://doi.org/10.1371/journal.pone.0119613.

Ambridge, Ben. 2010. Children’s judgments of regular and irregular novel past-tense forms: New data on the English past-tense debate. Developmental Psychology 46(6). 1497–1504. https://doi.org/10.1037/a0020668.

Ambridge, Ben & Elena V. M. Lieven. 2011. Child language acquisition: Contrasting theoretical approaches. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511975073.

Ambridge, Ben, Julian M. Pine, Caroline F. Rowland & Chris R. Young. 2008. The effect of verb semantic class and verb frequency (entrenchment) on children’s and adults’ graded judgements of argument-structure. Cognition 106(1). 87–129. https://doi.org/10.1016/j.cognition.2006.12.015.

Ambridge, Ben, Julian M. Pine, Caroline F. Rowland & Franklin Chang. 2012. The roles of verb semantics, entrenchment, and morphophonology in the retreat from dative argument-structure overgeneralization errors. Language 88(1). 45–81. https://doi.org/10.1353/lan.2012.0000.

Ambridge, Ben, Amy Bidgood, Katherine E. Twomey, Julian M. Pine, Caroline F. Rowland & Daniel Freudenthal. 2015. Preemption versus entrenchment: Towards a construction-general solution to the problem of the retreat from verb argument structure overgeneralization. PLoS One 10(4). e0123723. https://doi.org/10.1371/journal.pone.0123723.

Baayen, Harald R. & Petar Milin. 2010. Analyzing reaction times. International Journal of Psychological Research 3(2). 12–28. https://doi.org/10.21500/20112084.807.

Beedham, Christopher. 2005. Language and meaning: The structural creation of reality. Amsterdam & Philadelphia: John Benjamins Publishing. https://doi.org/10.1075/sfsl.55.

Bidgood, Amy, Julian Pine, Caroline Rowland, Giovanni Sala, Daniel Freudenthal & Ben Ambridge. 2021. Verb argument structure overgeneralisations for the English intransitive and transitive constructions: Grammaticality judgments and production priming. Language & Cognition 13(3). 397–437. https://doi.org/10.1017/langcog.2021.8.

Bittner, Dagmar, Wolfgang U. Dressler & Marianne Kilani-Schoch. 2003. Introduction. In Dagmar Bittner, Wolfgang U. Dressler & Marianne Kilani-Schoch (eds.), Development of verb inflection in first language acquisition: A cross-linguistic perspective, vii–xxxviii. Berlin & New York: De Gruyter Mouton. https://doi.org/10.1515/9783110899832.

Blumenthal-Dramé, Alice. 2012. Entrenchment in usage-based theories: What corpus data do and do not reveal about the mind. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110294002.

Booij, Geert. 2010. Construction morphology. Language & Linguistics Compass 4(7). 543–555. https://doi.org/10.1111/j.1749-818x.2010.00213.x.

Booij, Geert. 2017. The construction of words. In Barbara Dancygier (ed.), The Cambridge handbook of cognitive linguistics, 229–246. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316339732.016.

Bošnjak Botica, Tomislava, Gordana Hržica & Sara Košutar. 2022. Korpusna analiza brojnosti hrvatskih glagolskih vrsta. In Ivan Marković, Iva Nazalević Čučević & Igor Marko Gligorić (eds.), Riječ o riječi i Riječi. Zbornik u čast Zrinki Jelaska [Word about the word and the word, volume in honor of Zrinka Jelaska], 409–430. Zagreb: Disput.

Brooks, Patricia, Michael Tomasello, Kelly Dodson & Lawrence B. Lewis. 1999. Young children’s overgeneralizations with fixed transitivity verbs. Child Development 70. 1325–1337. https://doi.org/10.1111/1467-8624.00097.

Bürkner, Paul-Christian. 2017. brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software 80. 1–28. https://doi.org/10.18637/jss.v080.i01.

Bürkner, Paul-Christian. 2021. Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software 100. 1–54. https://doi.org/10.18637/jss.v100.i05.

Bürkner, Paul-Christian, Jonah Gabry, Matthew Kay & Aki Vehtari. 2023. Posterior: Tools for working with posterior distributions. R package version 1.4.1. Available at: https://mc-stan.org/posterior/.

Bybee, Joan L. 1995. Regular morphology and the lexicon. Language & Cognitive Processes 10. 425–455. https://doi.org/10.1080/01690969508407111.

Bybee, Joan. 2010. Language, usage and cognition. Cambridge, MA: Cambridge University Press. https://doi.org/10.1017/CBO9780511750526.

Bybee, Joan L. & Dan Slobin. 1982. Rules and schemas in the development and use of the English past tense. Language 58(2). 265–289. https://doi.org/10.1353/lan.1982.0021.

Bybee, Joan & James L. McClelland. 2005. Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review 22(2–4). 381–410. https://doi.org/10.1515/tlir.2005.22.2-4.381.

Bybee, Joan & Sandra Thompson. 1997. Three frequency effects in syntax. Berkeley Linguistic Society 23. 65–85. https://doi.org/10.3765/bls.v23i1.1293.

Clark, Eve V. 2001. Morphology in language acquisition. In Andrew Spencer & Arnold M. Zwicky (eds.), The handbook of morphology, 374–389. Oxford: Blackwell Publishing. https://doi.org/10.1111/b.9780631226949.2001.00022.x.

Dąbrowska, Ewa. 2004. Rules or schemas? Evidence from Polish. Language & Cognitive Processes 19. 225–271. https://doi.org/10.1080/01690960344000170.

Dąbrowska, Ewa. 2016. Cognitive linguistics’ seven deadly sins. Cognitive Linguistics 27(4). 479–491. https://doi.org/10.1515/cog-2016-0059.

Dąbrowska, Ewa & Marcin Szczerbiński. 2006. Polish children’s productivity with case marking: The role of regularity, type frequency, and phonological diversity. Journal of Child Language 33(3). 559–597. https://doi.org/10.1017/s0305000906007471.

Dale, Philip S. 1991. The validity of a parent report measure on vocabulary and syntax at 24 months. Journal of Speech & Hearing Research 34(3). 565–571. https://doi.org/10.1044/jshr.3403.565.

Dale, Philip S., Elizabeth Bates, Steven J. Reznick & Colleen Morisset. 1989. The validity of a parent report instrument of child language at twenty months. Journal of Child Language 16(2). 239–249. https://doi.org/10.1017/s0305000900010394.

Divjak, Dagmar & Catherine Caldwell-Harris. 2015. Frequency and entrenchment. In Ewa Dąbrowska & Dagmar Divjak (eds.), Handbook of cognitive linguistics, 53–75. Berlin, München & Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110292022-004.

Divjak, Dagmar & Petar Milin. 2023. Using computational cognitive modeling in usage-based linguistics. In Manuel Diaz-Campos & Sonia Balasch (eds.), The handbook of usage-based linguistics. Hoboken, New Jersey: Wiley. https://doi.org/10.1002/9781119839859.ch17.

Divjak, Dagmar, Petar Milin & Adnane Ez-zizi. 2023a. Error-correction mechanisms in language learning: Modeling individuals. Language Learning 74(1). 1–37. https://doi.org/10.1111/lang.12569.

Divjak, Dagmar, Laurence Romain & Petar Milin. 2023b. From their point of view: The article category as a hierarchically structured referent tracking system. Linguistics 61(4). 1027–1068. https://doi.org/10.1515/ling-2022-0186.

Divjak, Dagmar, Irene Testini & Petar Milin. 2024. On the nature and organisation of morphological categories: Verbal aspect through the lens of associative learning. Morphology. https://doi.org/10.1007/s11525-024-09423-0 [Epub ahead of print].

Divjak, Dagmar, Petar Milin, Adnane Ez-zizi, Jarosław Józefowski & Christian Adam. 2021. What is learned from exposure: An error-driven approach to productivity in language. Language, Cognition & Neuroscience 36(1). 60–83. https://doi.org/10.1080/23273798.2020.1815813.

Dressler, Wolfgang. 2011. The rise of complexity in inflectional morphology. Poznań Studies in Contemporary Linguistics 47(2). 159. https://doi.org/10.2478/psicl-2011-0013.

Engelmann, Felix, Sonia Granlund, Joanna Kolak, Magdalena Szreder, Ben Ambridge, Julian M. Pine, Anna L. Theakston & Elena Lieven. 2019. How the input shapes the acquisition of verb morphology: Elicited production and computational modelling in two highly inflected languages. Cognitive Psychology 110. 30–69. https://doi.org/10.1016/j.cogpsych.2019.02.001.

Frank, Michael C., Mika Braginsky, Daniel Yurovsky & Virginia A. Marchman. 2021. Variability and consistency in early language learning: The wordbank project. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/11577.001.0001.

Gill, Jeff. 2007. Bayesian methods: A social and behavioral sciences approach. New York: CRC Press. https://doi.org/10.1201/9781420010824.

Goldberg, Adele E., Devin M. Casenhiser & Nitya Sethuraman. 2004. Learning argument structure generalizations. Cognitive Linguistics 15(3). 289–316. https://doi.org/10.1515/cogl.2004.011.

Grabowski, Eva & Dieter Mindt. 1995. A corpus-based learning list of irregular verbs in English. ICAME Journal 19. 5–22.

Granlund, Sonia, Joanna Kolak, Virve Vihman, Felix Engelmann, Elena V. M. Lieven, Julian M. Pine, Anna L. Theakston & Ben Ambridge. 2019. Language-general and language-specific phenomena in the acquisition of inflectional noun morphology: A cross-linguistic elicited-production study of Polish, Finnish and Estonian. Journal of Memory & Language 107. 169–194. https://doi.org/10.1016/j.jml.2019.04.004.

Hržica, Gordana. 2012. Daj mi to napisaj: Preopćavanja glagolske osnove u usvajanju hrvatskog jezika [Overgeneralizations of the verb stem in the acquisition of Croatian]. Suvremena lingvistika 38(74). 189–208.

Hržica, Gordana, Tomislava Bošnjak Botica & Sara Košutar. 2023. Stem overgeneralizations in the acquisition of Croatian verbal morphology: Evidence from parental questionnaires. Word Structure 16(2–3). 176–205. https://doi.org/10.3366/word.2023.0228.

Jelaska, Zrinka & Tomislava Bošnjak Botica. 2019. Conjugational types in Croatian. Rasprave: Časopis Instituta za hrvatski jezik i jezikoslovlje 45(1). 47–74. https://doi.org/10.31724/rihjj.45.1.3.

Kirjavainen, Minna, Alexandre Nikolaev & Evan Kidd. 2012. The effect of frequency and phonological neighbourhood density on the acquisition of past tense verbs by Finnish children. Cognitive Linguistics 23(2). 273–315. https://doi.org/10.1515/cog-2012-0009.

Kovačević, Melita. 2002. Croatian corpus, CHILDES. Available at: https://childes.talkbank.org/access/Slavic/Croatian/Kovacevic.html.

Krajewski, Grzegorz, Anna Theakston, Elena Lieven & Michael Tomasello. 2011. How Polish children switch from one case to another when using novel nouns: Challenges for current models of inflectional morphology. Language & Cognitive Processes 26(4–6). 830–861. https://doi.org/10.1080/01690965.2010.506062.

Langacker, Ronald W. 2000. Grammar and conceptualization. Berlin & New York: De Gruyter Mouton. https://doi.org/10.1515/9783110800524.

Langacker, Ronald W. 2008. Cognitive grammar. A basic introduction. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195331967.001.0001.

Langacker, Ronald W. 2017. Entrenchment in cognitive grammar. In Hans-Jörg Schmid (ed.), Entrenchment and the psychology of language learning: How we reorganize and adapt linguistic knowledge, 39–56. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1037/15969-003.

Lieven, Elena & Michael Tomasello. 2008. Children’s first language acquisition from a usage-based perspective. In Peter Robinson & Nick C. Ellis (eds.), Handbook of cognitive linguistics and second language acquisition, 168–196. Abingdon: Routledge/Taylor & Francis Group.Search in Google Scholar

Maratsos, Michael. 2000. More overregularizations after all: New data and discussion on Marcus, Pinker, Ullman, Hollander, Rosen & Xu. Journal of Child Language 27(1). 183–212. https://doi.org/10.1017/s0305000999004067.Search in Google Scholar

Marchman, Virginia A. 1997. Children’s productivity in the English past tense: The role of frequency, phonology, and neighbourhood structures. Cognitive Science 21. 283–304. https://doi.org/10.1207/s15516709cog2103_2.Search in Google Scholar

Marchman, Virginia A., Kim, Plunkett & Judith Goodman. 1997. Overregularization in English plural and past tense inflectional morphology: A response to Marcus (1995). Journal of Child Language 24. 767–779. https://doi.org/10.1017/s0305000997003206.Search in Google Scholar

Marcus, Gary F., Steven Pinker, Michael Ullman, Michelle Hollander, John T. Rosen, Fei Xu & Harald Clahsen. 1992. Overregularization in language acquisition. Monographs of the Society for Research in Child Development 57(4). i-178. https://doi.org/10.2307/1166115.

Maslen, Robert J. C., Anna Theakston, Elena Lieven & Michael Tomasello. 2004. A dense corpus study of past tense and plural overregularization in English. Journal of Speech Language & Hearing Research 47(6). 1319–1333. https://doi.org/10.1044/1092-4388(2004/099).

McDonald, Janet & Cristine Roussel. 2010. Past tense grammaticality judgment and production in non-native and stressed native English speakers. Bilingualism: Language & Cognition 13(4). 429–448. https://doi.org/10.1017/s1366728909990599.

Milin, Petar, Benjamin V. Tucker & Dagmar Divjak. 2023. A learning perspective on the emergence of abstractions: The curious case of phone(me)s. Language & Cognition 15(4). 740–762. https://doi.org/10.1017/langcog.2023.11.

Milin, Petar, Victor Kuperman, Aleksandar Kostić & Harald Baayen. 2009. Paradigms bit by bit: An information theoretic approach to the processing of paradigmatic structure in inflection and derivation. Analogy in Grammar: Form & Acquisition 381. 214–252. https://doi.org/10.1093/acprof:oso/9780199547548.003.0010.

Milin, Petar, Dagmar Divjak, Strahinja Dimitrijević & Harald Baayen. 2016. Towards cognitively plausible data science in language research. Cognitive Linguistics 27(4). 507–526. https://doi.org/10.1515/cog-2016-0055.

Mirković, Jelena, Mark S. Seidenberg & Marc F. Joanisse. 2011. Rules versus statistics: Insights from a highly inflected language. Cognitive Science 35. 638–681. https://doi.org/10.1111/j.1551-6709.2011.01174.x.

Ramscar, Michael & Daniel Yarlett. 2007. Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition. Cognitive Science 31(6). 927–960. https://doi.org/10.1080/03640210701703576.

Ramscar, Michael, Melody Dye & Stewart M. McCauley. 2013. Error and expectation in language learning: The curious absence of “mouses” in adult speech. Language 89(4). 760–793. https://doi.org/10.1353/lan.2013.0068.

Räsänen, Sanna H. M., Ben Ambridge & Julian M. Pine. 2016. An elicited-production study of inflectional verb morphology in child Finnish. Cognitive Science 40(7). 1704–1738. https://doi.org/10.1111/cogs.12305.

Paradis, Johanne. 2023. Sources of individual differences in the dual language development of heritage bilinguals. Journal of Child Language 50(4). 793–817. https://doi.org/10.1017/s0305000922000708.

Pinker, Steven. 1999. Words and rules: The ingredients of language. New York: Basic Books.

R Core Team. 2023. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Ravid, Dorit. 2019. First-language acquisition of morphology. In Oxford research encyclopedia of linguistics. https://oxfordre.com/linguistics/view/10.1093/acrefore/9780199384655.001.0001/acrefore-9780199384655-e-603 (accessed 7 January 2023). https://doi.org/10.1093/acrefore/9780199384655.013.603.

Rescorla, Robert A. & Allan R. Wagner. 1972. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Abraham H. Black & William F. Prokasy (eds.), Classical conditioning II: Current research and theory, 64–99. New York: Appleton-Century-Crofts.

Rescorla, Robert A. 2008. Rescorla–Wagner model. Scholarpedia 3(3). 2237. https://doi.org/10.4249/scholarpedia.2237.

Rowe, Meredith L. 2018. Understanding socioeconomic differences in parents’ speech to children. Child Development Perspectives 12(2). 122–127. https://doi.org/10.1111/cdep.12271.

Rumelhart, David E., James L. McClelland & The PDP Research Group. 1986. On learning the past tenses of English verbs. In James L. McClelland & David E. Rumelhart (eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models, 216–271. Cambridge, MA: Bradford Books/MIT Press.

Savičiūtė, Eglė, Ben Ambridge & Julian M. Pine. 2018. The roles of word-form frequency and phonological neighbourhood density in the acquisition of Lithuanian noun morphology. Journal of Child Language 45(3). 641–672. https://doi.org/10.1017/s030500091700037x.

Schmid, Hans-Jörg. 2017. Entrenchment and the psychology of language learning: How we reorganize and adapt linguistic knowledge. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1037/15969-000.

Theakston, Anna L. 2004. The role of entrenchment in children’s and adults’ performance on grammaticality judgment tasks. Cognitive Development 19(1). 15–34. https://doi.org/10.1016/j.cogdev.2003.08.001.

Theakston, Anna L. 2017. Entrenchment in first language learning. In Hans-Jörg Schmid (ed.), Entrenchment and the psychology of language learning: How we reorganize and adapt linguistic knowledge, 315–341. Berlin & Boston: De Gruyter Mouton. https://doi.org/10.1037/15969-015.

Tomasello, Michael. 2005. Constructing a language. A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press. https://doi.org/10.2307/j.ctv26070v8.

Vitevitch, Michael S. & Melissa K. Stamer. 2006. The curious case of competition in Spanish speech production. Language, Cognition & Neuroscience 21(6). 760–770. https://doi.org/10.1080/01690960500287196.

Widrow, Bernard & Marcian E. Hoff. 1960. Adaptive switching circuits. IRE WESCON Convention Record 4. 96–104. https://doi.org/10.21236/AD0241531.

Wonnacott, Elizabeth, Elissa L. Newport & Michael K. Tanenhaus. 2008. Acquiring and processing verb argument structure: Distributional learning in a miniature language. Cognitive Psychology 56(3). 165–209. https://doi.org/10.1016/j.cogpsych.2007.04.002.

Xanthos, Aris, Sabine Laaha, Steven Gillis, Ursula Stephany, Ayhan Aksu-Koç, Anastasia Christofidou, Natalia Gagarina, Gordana Hržica, Nihan F. Ketrez, Marianne Kilani-Schoch, Katharina Korecky-Kröll, Melita Kovačević, Klaus Laalo, Marijan Palmović, Barbara Pfeiler, Maria D. Voeikova & Wolfgang U. Dressler. 2011. On the role of morphological richness in the early development of noun and verb inflection. First Language 31(4). 461–479. https://doi.org/10.1177/0142723711409976.

Xu, Fei & Steven Pinker. 1995. Weird past tense forms. Journal of Child Language 22(3). 531–556. https://doi.org/10.1017/s0305000900009946.

Yang, Charles. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.

Yang, Charles. 2015. For and against frequencies. Journal of Child Language 42. 287–293. https://doi.org/10.1017/s0305000914000683.

Received: 2023-02-20
Accepted: 2024-03-18
Published Online: 2024-04-30
Published in Print: 2024-05-27

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 25.5.2024 from https://www.degruyter.com/document/doi/10.1515/cog-2023-0022/html