Open Access. Published by De Gruyter Mouton, April 16, 2021 (CC BY 4.0 license).

Balancing information-structure and semantic constraints on construction choice: building a computational model of passive and passive-like constructions in Mandarin Chinese

  • Li Liu and Ben Ambridge
From the journal Cognitive Linguistics

Abstract

A central tenet of cognitive linguistics is that adults’ knowledge of language consists of a structured inventory of constructions, including various two-argument constructions such as the active (e.g., Lizzy rescued John), the passive (e.g., John was rescued by Lizzy) and “fronting” constructions (e.g., John was the one Lizzy rescued). But how do speakers choose which construction to use for a particular utterance, given constraints such as discourse/information structure and the semantic fit between verb and construction? The goal of the present study was to build a computational model of this phenomenon for two-argument constructions in Mandarin. First, we conducted a grammaticality judgment study with 60 native speakers which demonstrated that, across 57 verbs, semantic affectedness – as determined by a further 16 native speakers – predicted each verb’s relative acceptability in the bei-passive and ba-active constructions, but not the Notional Passive and SVO Active constructions. Second, in order to simulate acquisition of these competing constraints, we built a computational model that learns to map from corpus-derived input (information structure + verb semantics + lexical verb identity) to an output representation corresponding to these four constructions (+“other”). The model was able to predict judgments of the relative acceptability of the test verbs in the ba-active and bei-passive constructions obtained in Study 1, with model-human correlations in the region of r = 0.5 and r = 0.3, respectively. Surprisingly, these correlations increased (to r = 0.75 and r = 0.5 respectively) when lexical verb identity was removed; perhaps because this information leads to over-fitting of the training set. These findings suggest the intriguing possibility that acquiring constructions involves forgetting as a mechanism for abstracting across certain fine-grained lexical details and idiosyncrasies.

1 Introduction

Cognitive linguistic approaches assume, as a central tenet (e.g., Croft and Cruse 2004; Goldberg 1995; Hilpert 2014), that adults’ knowledge of language consists of a structured inventory of grammatical constructions: abstract patterns like [NP][VERB][NP] and [NP] BE [VERB] by [NP]. Consequently, for almost every utterance, the speaker faces a choice between a number of potential constructions. Often, this choice is between two or more constructions that are identical at the level of truth value (e.g., Lizzy rescued John; John was rescued by Lizzy; As for John, Lizzy rescued him etc.). How, then, do speakers make this choice?

Our overarching goal in the present study is to begin to sketch – in the form of a computational model – an answer to this question, focusing on two-argument constructions in Mandarin Chinese; a particularly good test case, since Mandarin has four such constructions in common use. In answering this overarching question, we address a number of issues that have proved controversial, even within cognitive-linguistic and constructionist circles[1]: First, what are the different information-structure properties associated with otherwise-similar constructions (sometimes called constructional alternations)? Second, do (all) constructions have meanings in and of themselves, above and beyond the meanings of the lexical items that appear in any one concrete instantiation? Third, for those constructions that do have particular meanings, are those meanings language specific or shared by equivalent constructions in different languages?

Having investigated these questions, at least as they pertain to two-argument constructions in Mandarin, we then turn to our overarching question of interest: How do speakers balance information-structure and construction-semantic considerations when choosing which construction to use for a given speech act?

1.1 Information structure

How do speakers choose which particular construction to use to convey a particular message when there exist two or more that would yield a grammatical utterance, and that are equivalent in terms of truth-conditional semantics (e.g., Davidson 1967), as in (1)–(2)?

(1)
English active transitive construction
Lizzy rescued John.
(2)
English passive construction
John was rescued by Lizzy.

One important factor is information structure. For example, the end-weight principle (Quirk et al. 1985: 1,361–1,362) states that speakers generally place long, structurally heavy constituents at the end of the sentence (cf. Examples [3] and [4]).

(3)
English ditransitive construction
John gave Lizzy a book.
(4)
English prepositional dative construction
John gave a book to Elizabeth the Second, by the Grace of God, of the United Kingdom of Great Britain and Northern Ireland and of Her other Realms and Territories Queen, Head of the Commonwealth, Defender of the Faith.

Similarly, the end-focus principle (Quirk et al. 1985: 1,356–1,357) states that speakers generally place the most important information at the end of the sentence (e.g., in [5]; from Dixon 2012: 227).

(5)
This bed was slept in by Winston Churchill.

Relatedly, the English passive is subject to what Pullum (2014: 64) calls the “new-information condition on by-phrases” (a condition, he laments, that is disregarded entirely by the advice of style guides to avoid the passive): “The denotation of the by-phrase NP in a passive clause must denote something at least as new in the discourse as the subject.” (Pullum 2014: 64). He goes on to illustrate this constraint with the examples shown in (6)–(7) (notes in original):

(6)
Have you heard the news about YouTube? It was bought by Google. [acceptable because the subject is old and by-phrase NP is new] (Pullum 2014: 64)
(7)
Have you heard the news about Google? YouTube was bought by it/them. [unacceptable because the subject (YouTube) is newer in the discourse than the by-phrase] (Pullum 2014: 64)

That is, the English passive serves a topicalization function, allowing for an NP that is the topic of conversation to appear at the start of a sentence or clause when, as a Patient, it would normally appear after the Verb. This constraint clearly contributes to speakers’ construction choice. For example, in an ongoing conversation about YouTube, passive, as in (8), is considerably more natural than active, as in (9).

(8)
It got bought by Google in 2006.
(9)
Google bought it in 2006.

Mandarin is similar to English in that its canonical passive, the bei-passive, also allows heavy, new, important elements to be placed after the NP topic. However, as we will see in more detail shortly, Mandarin also has a noncanonical construction, the Notional Passive, that shares this property.

1.2 Construction semantics

Not all construction-based approaches agree that all constructions have their own meanings (see Hilpert 2014: 50–57 for discussion). Fillmore et al. (2012: 326) argue for the existence of at least some “semantically null constructions”.[2] One example offered is the English Subject-Predicate construction, many different instantiations of which would seem to share little semantics, not even a Topic-Comment structure (e.g., Sue sings; There’s a problem). Another is the English Subject-Auxiliary Inversion construction, different instantiations of which have very different kinds of meanings (e.g., questions, conditionals, wishes and exclamatives). Goldberg (1995, 2006, 2019), on the other hand, argues that all constructions have independent meanings, even if only very general ones. For example, all instantiations of English Subject-Auxiliary Inversion share the meaning of being non-assertive.

This debate has also been played out for the passive construction (or, at least, the English passive construction). For example, in a Behavioral and Brain Sciences target article, Branigan and Pickering (2017: 8) summarize 30 years of syntactic priming research as demonstrating that “syntactic representations do not contain semantic information”, including as evidence for their position the absence of evidence of by-verb semantic differences in the passive priming study of Messenger et al. (2012). On the other hand, Pinker et al. (1987: 249; see also Pinker 1989) proposed that the English passive construction is associated with a particular semantics; specifically: “[B] (mapped onto the surface subject [of a passive]) is in a state or circumstance characterized by [A] (mapped onto the by-object or an understood argument) having acted upon it.” (Pinker et al. 1987: 249)

Herein, we refer to this semantic property of the passive construction (which Pullum 2014: 64, calls the “state-affecting condition”) as affectedness. The claim is that, all else being equal, a passive is as felicitous as the equivalent active only when the verb denotes an action in which a Patient (the Subject of the passive sentence) is highly affected.

Evidence for this claim comes from the English and Indonesian judgment studies of Ambridge et al. (2016), and Aryawibawa and Ambridge (2018).[3] In these studies, a positive correlation was observed between a verb’s semantic property of affectedness and its acceptability in passive sentences, relative to actives. In both studies, native adult speakers were asked to rate each verb for 10 semantic properties relating to affectedness (based on Pinker et al. 1987) on a 9-point scale, to yield a composite affectedness measure. Another group of adult speakers judged the acceptability of passive and active sentences with each of 72 verbs. Even when each verb’s overall frequency and passive frequency were controlled, the findings for both languages indicated that the acceptability of passives relative to actives increased with the composite measure of verbs’ semantic affectedness. That is, interaction effects revealed that the effect of affectedness was bigger for passives than for actives, and also – for Indonesian – than for so-called “notional passives” (a topicalization construction).

1.3 Crosslinguistic construction meanings?

The observed similarity between English and Indonesian passives raises the intriguing possibility of a universal crosslinguistic tendency for passive constructions to be associated with the semantics of subject affectedness. Of course, construction-based approaches see constructions themselves as learned from the input, and hence language-specific (e.g., Croft 2001), and the passive is no exception. But what could plausibly be considered a (near) universal is speakers’ need for a construction that, at the same time, topicalizes the undergoer and denotes that it was affected in some way.

1.4 Balancing information-structure and semantic constraints on construction choice in Mandarin Chinese

How, then, do speakers balance these information-structure and semantic constraints when choosing which construction to use to produce a particular utterance? In the present study, we investigate this question for Mandarin Chinese; a language that constitutes a particularly interesting test case: Unlike English, which, for two-argument constructions, conflates information-structure and semantic constraints into (on the whole) a binary choice between the active and passive constructions, Mandarin crosses them perfectly (see Table 1)[4].

Table 1:

Information structure and semantics of active and passive constructions in Mandarin (vs. English).

                   Patient affected              Patient not (necessarily) affected
Topic = patient    Mandarin O-bei-SV passive     Mandarin OSV Notional Passive
                   (English OVS passive)
Topic = agent      Mandarin S-ba-OV active       Mandarin SVO Active
                                                 (English SVO Active)

The “standard” and most common Mandarin passive is the bei-passive (McEnery and Xiao 2005). Similarly to English passives, the sentence-initial NP (at least for Agent-Patient verbs) is the Patient argument, which occupies Object position in the active counterpart sentence, and passivization is overtly marked on the verb by the morpheme bei. Hence the order of a canonical bei-passive: Object-bei-Subject-Verb. As shown in (10), the initial NP1 Lisi is the Patient and NP2 Zhangsan, immediately after bei, the Agent. Consistent with Pinker et al.’s (1987) and Pullum’s (2014) “affectedness” constraint, bei-passives are often characterized as profiling the undergoer (i.e. NP1) that is affected by the action of the verb (e.g., Li 1990). Again, similarly to English passives, the Agent (for Agent-Patient verbs) can be optionally present or absent, yielding a long or short passive respectively.

(10)
李四NP1 被 张三NP2 救了。
Lisi NP1 BEI Zhangsan NP2 jiu le
Lisi NP1 BEI Zhangsan NP2 save le
‘Lisi was saved by Zhangsan.’

Another Mandarin construction often claimed to denote affectedness is the ba construction, in which the Patient argument is moved from “the normal post-verbal position to the pre-verbal position and is preceded by the morpheme ba” (Thompson 1973); hence S-ba-O-V. For example, the use of the object marker ba in (11) indicates that NP1 Zhangsan is the Agent while the post-ba NP2 Lisi is the Patient. As the ba construction maintains the S-before-O order of its SVO active counterpart, we refer to it here as the “ba-active” construction.

(11)
张三NP1 把 李四NP2 救了。
Zhangsan NP1 BA Lisi NP2 jiu le
Zhangsan NP1 BA Lisi NP2 save le
‘Zhangsan saved Lisi.’

The ba-active is often called the ‘disposal’ construction and describes “how a person is handled, manipulated or dealt with; how something is disposed of” (Li 1974). Thus, semantically, both the bei-passive and the ba-active constructions would seem to mark affectedness (though in the bei-passive, it is the pre-bei NP that is affected; in the ba-active, the post-ba NP). Indeed, Mandarin linguists (e.g., Huang et al. 2009) have previously noted that only verbs that seem to imply a relatively high degree of affectedness are compatible with these constructions. For example, (12) and (13) are not acceptable, as the act of seeing does not normally imply any effect upon the blue sky.

(12)
*蓝天 被 我 看见了。
Lantian BEI wo kanjian le
Blue sky BEI me see le
‘The blue sky is seen by me.’
(13)
*我 把 蓝天 看见了。
wo BA lantian kanjian le
I BA blue sky see le
‘I saw the blue sky.’

However, the two constructions do not share the same information structure. In the bei-passive, the Agent, the post-bei NP, is generally the new information in the discourse; in the ba-active, it is the action denoted by the Verb that is often considered the newer information (Zhang 2001). In addition, if there is a semantic distinction between the two constructions, it is that the ba-active imposes an even more stringent affectedness requirement than the bei-passive, requiring “the post-ba NP to be directly affected by an action” (Zhang 2001). For example, while the bei sentence shown in (14) is acceptable, its ba counterpart (15) is not, as the action of knowing does not directly affect the news:

(14)
那个 消息 被 我 知道了。
Nage xiaoxi BEI wo zhidao le
That news BEI me know le
‘That news was known by me.’
(15)
*我 把 那个 消息 知道了。
wo BA nage xiaoxi zhidao le
I BA that news know le
‘I knew that news.’

Indeed, so strong is the apparent affectedness requirement of both the bei-passive and ba-active constructions that bare verb forms are generally ungrammatical in either (see Examples [16] and [17]); they instead require an additional complement, or the marker le, to indicate completedness and/or some kind of change in situation (Deng et al. 2018; Huang et al. 2009). The handful of verbs that are exempt from this requirement, such as da ‘hit’, jiu ‘save’ and pai ‘pat’, are those that already imply a certain result per se (Chu 1973; Huang et al. 2013; Li et al. 1993).

(16)
*张三 被 李四 看。
Zhangsan BEI Lisi kan
Zhangsan BEI Lisi look at
‘Zhangsan was looked at by Lisi.’
(17)
*李四 把 张三 看。
Lisi BA Zhangsan kan
Lisi BA Zhangsan look at
‘Lisi looked at Zhangsan.’

The final construction considered in the present study is the Notional Passive construction (18), in which – exactly as for bei-passives – the post-verbal Patient (for Agent-Patient verbs) pre-poses to sentence-initial position, but without any overt marker.

(18)
早饭 张三 吃了。
zaofan Zhangsan chi le
breakfast Zhangsan eat le
‘Zhangsan finished his breakfast.’

Such sentences have been referred to as Notional Passives or topic-comment sentences, where the initial NP serves as the topic and the following clause as the comment (Chao 1968; Shi 2000). The possibility of such sentences in Mandarin (unlike English, but like Indonesian) has led to it often being referred to as a “topic-prominent” language (e.g., Li and Thompson 1976). Although Notional Passives share the word order of bei-passives, and convey a very similar meaning, they are generally not considered to be “true” passives (e.g., McEnery and Xiao 2005; Tang 2004)[5]. Our working hypothesis is that, as for Indonesian (Aryawibawa and Ambridge 2018), the Notional Passive construction is not associated with the meaning of affectedness, in contrast to the “true” bei-passive (and the ba-active).

2 The present study

The overarching goal of the present study was to build a computational model of how Mandarin speakers choose between one of four truth-value-identical constructions when producing a two-argument utterance, given the competing information-structure and construction-semantic constraints set out above. Computational modelling is useful for making explicit accounts of learning phenomena that involve competing probabilistic constraints and are thus impossible to specify with any precision as purely verbal models. First, however, we report a grammaticality judgment study designed to verify the semantics of the constructions, and to determine the inputs and target outputs of the model.

2.1 Study 1: grammaticality judgments and semantic ratings

Study 1 had two aims. The first was to verify that our characterization of the semantic properties of the four constructions under investigation, as set out above, is broadly correct. We assume here that the semantic properties of a particular construction can be determined by investigating the extent to which verbs that differ along the relevant semantic dimension are deemed to be grammatically acceptable in that construction (that is, we adopt the assumption of “probabilistic grammars”; that verbs do not “select” or “project” particular argument structure constructions, but are compatible with particular constructions to a greater or lesser degree). We, therefore, set out to test the prediction that the relationship between verbs’ semantic affectedness (as determined by a separate group of raters) and acceptability of the resulting sentence will be

  1. Greater for bei-passives and ba-actives than for SVO Actives

  2. Greater than zero for bei-passives and ba-actives

  3. Smaller for Notional Passives than for bei-passives

The second aim of Study 1 was to obtain the training and test data for the subsequent computational model. In brief, the model learns mappings between a tripartite input representation (information structure + verb semantics + lexical verb identity, all combined into a single input vector) and an output representation corresponding to the four constructions set out above (Active S-ba-O-V, Active S-V-O, Notional Passive O-S-V, Passive O-bei-S-V) plus “Other” (representing all other constructions; e.g., intransitives). The relative frequency with which each of these mappings is presented to the model is determined on the basis of a representative input corpus. At test, the model is presented with input (information structure + verb semantics + lexical verb identity, again combined into a single input vector) and interrogated for its prediction of the acceptability of the relevant verb in each of the four target constructions. These predictions are then compared against judgment data obtained from native speaking adults (though it is important to note that these data are never provided to the model, but function solely as a benchmark against which to evaluate it). Thus, the second aim of Study 1 was to obtain

  1. Semantic affectedness ratings for each of our 57 test verbs;

  2. Corpus frequency of each of these verbs in each of the four target constructions (Active S-ba-O-V, Active S-V-O, Notional Passive O-S-V, Passive O-bei-S-V) plus “Other” (representing intransitives, ditransitives, single-word utterances etc.),

  3. Grammatical acceptability ratings of each of these verbs in each of the four target constructions.

2.1.1 Study 1: method

2.1.1.1 Participants

A total of 76 native Mandarin speakers participated in the study. Sixteen teaching staff at a Mainland university in China completed the Semantic Rating task. Sixty newly registered Chinese students at the University of Liverpool completed the Grammaticality Judgment test. The age of participants ranged from 18 to 43 years. Of course, this sample is not representative of Mandarin speakers in general, since all will have spent longer than average in education, and speak at least one second language (i.e., English) at a relatively high level. For this reason, it is likely that a more representative sample would yield noisier data, and hence reduced effect sizes. Nevertheless, we felt that this population and this sample size (based on Aryawibawa and Ambridge 2018) were sufficient for our purposes of investigating the semantics of the relevant constructions, and obtaining judgment data for simulation, since the theoretically important unit of measurement is the verb, not the participant (i.e., all measures are within-subjects).

2.1.1.2 Semantic Rating task

A total of 57 verbs, Mandarin translation equivalents of 57 of the 72 “core” verbs from Ambridge et al. (2016), were used (some were excluded due to overlap in translations). Each verb was rated, on a scale of 1–9, for a set of eight semantic properties linked to affectedness (listed below), chosen from the 10 listed in Pinker et al. (1987). The task was presented in a randomized Excel spreadsheet that participants completed in their own time (taking roughly 30 min). The semantic properties were as follows:

  1. A causes (or is responsible for) some effect/change involving B,

  2. A enables or allows the change/event,

  3. A is responsible,

  4. A makes physical contact with B,

  5. B changes state or circumstances,

  6. B is responsible,

  7. The event affects B in some way,

  8. The action adversely (negatively) affects B.

Care was taken not to mention passives in any of the study materials, in order to avoid the possibility that participants could approach the rating task by trying out each verb in passive sentences. For each rating, aggregate (mean) scores were taken across all 16 participants, and these means subjected to Principal Components Analysis (‘principal’ from the R package ‘psych’; Revelle 2018) to yield the final semantic affectedness predictor.
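For concreteness, the derivation of this composite predictor might be sketched as follows (in R; the file and column names are hypothetical, not those of the actual materials): mean ratings per verb across the 16 raters, followed by a single-component Principal Components Analysis using psych::principal.

library(psych)  # provides principal()

# Hypothetical input: one row per rater x verb, with columns q1-q8 holding the
# eight 9-point semantic-property ratings
ratings <- read.csv("semantic_ratings.csv")

# Mean rating per verb for each property, aggregated across the 16 raters
verb_means <- aggregate(ratings[, paste0("q", 1:8)],
                        by = list(verb = ratings$verb), FUN = mean)

# Principal Components Analysis; the first component serves as the composite
# "affectedness" predictor used in the analyses below
pca <- principal(verb_means[, paste0("q", 1:8)], nfactors = 1)
verb_means$affectedness <- as.numeric(pca$scores[, 1])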

2.1.1.3 Grammaticality judgment test

The Grammaticality Judgment test was run online using the Gorilla platform (https://gorilla.sc/), where participants registered and completed the test. Each verb was presented in four sentence types: SVO Active, bei-passive, Notional Passive and ba-active, as in the examples shown below for jiu (save).

SVO Active 那个 男人 救 了 那个 女人

nage nanren jiu le nage nvren
‘The man saved the woman.’
bei-passive 那个 女人 被 那个 男人 救 了

nage nvren BEI nage nanren jiu le
‘The woman was saved by the man.’
Notional Passive 那个 女人, 那个 男人 救 了

nage nvren, nage nanren jiu le
‘The woman, the man saved.’
ba-active 那个 男人 把 那个 女人 救 了

nage nanren BA nage nvren jiu le
‘The man had the woman saved.’

Participants were randomly assigned to one of two counterbalanced versions with reversed semantic roles. For example, half of the participants rated (in translation) The man saved the woman, and half The woman saved the man. Each sentence was presented alongside a short animated clip illustrating the intended meaning. Participants submitted their acceptability ratings using a 5-point smiley-face scale. All sentences were presented in random order, with the task taking about 30 min to complete. Due to a programming error, 8 additional datapoints were collected when a particular trial was presented twice to the same participant. Fortunately, in only 1 of these 8 cases did the participant give different ratings on the duplicated trials. We decided to retain only the first response both in this case and (though the decision is arbitrary) for the remaining 7 cases. Thus the final dataset contains 13,680 datapoints: 60 participants * 57 verbs * 4 sentence types.

2.1.1.4 Corpus counts

In order to assess the effect of semantic affectedness above and beyond input frequency, and to allow for frequency weighting of the input to the subsequent computational model, we obtained counts of each verb (a) overall and (b) in each of the four constructions from the CCL corpus developed by the Centre for Chinese Linguistics at Peking University. Tools supplied with the corpus were used to automatically generate overall counts and to extract candidate verb-in-construction uses. Subsequently, 100 example sentences for each verb + construction pairing (or, for counts of <500, all sentences) were hand coded and the counts pro-rated to yield the final estimates. It must be acknowledged that this method does not take into account verbal polysemy. However, there is evidence that, for at least some Mandarin verbs, polysemous senses share a semantic core (e.g., chi-niu-pai ‘to eat’; chi-wei-ya ‘to attend a year-end party’; chi-lao-ben ‘to live on one’s own fat’; Hsiao et al. 2016) rather than constituting completely separate entities. Indeed, construction grammar approaches would not seem to rule out the possibility that even “true” polysemous uses contribute to whatever learning process is sensitive to the frequency with which particular verbs appear in particular constructions.

In order to obtain a single predictor that captures frequency (a) overall and (b) in the target construction (i.e., the construction in which the verb is being rated on a particular trial), we followed the collostructional approach outlined in Stefanowitsch and Gries (2003); in particular, the method set out by Ambridge et al. (2018). This method yields a chi-square value which represents the extent to which each verb differs from all other verbs in the set with respect to its frequency in a target construction versus all other uses. For example, consider the frequency predictor for trials on which participants rate da ‘hit’ in the bei-passive construction. The chi-square calculation is as shown in Table 2.

$$\chi^2 = \frac{(AD - BC)^2 \,(A + B + C + D)}{(A + C)(B + D)(A + B)(C + D)}$$

This chi-square value[6] is then natural-log transformed and assigned a sign (±) corresponding to whether, by comparison with the other 56 verbs in the set, it is biased towards or away from the target construction in question. For this example, da ‘hit’ is assigned a large positive value, reflecting the fact that its ratio of bei-passive to Other uses (roughly 1:25) is considerably greater than for the other 56 verbs in the set (roughly 1:60). Of course, this method is based on the incorrect assumption that these 57 verbs are representative of the language in general. This assumption is almost certainly unwarranted, but is unavoidable given the impossibility of obtaining such counts for all verbs in Mandarin or of determining what constitutes a representative sample.

Table 2:

Example calculation of the chi-square frequency predictor for the verb da (hit).

                           Target construction      All other uses (total verb
                           (here bei-passive)       uses minus bei-passive uses)
da                         A (12,835)               B (331,611)
56 other verbs (summed)    C (21,160)               D (1,262,140)
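To make the calculation concrete, the following sketch (in R; the function name is ours, not part of any package) computes the predictor for da ‘hit’ in the bei-passive from the counts in Table 2, applying the natural-log transformation and sign described above.

# Signed, log-transformed chi-square bias of a verb towards a target construction
collostructional_bias <- function(A, B, C, D) {
  # A: verb in target construction;        B: verb in all other uses
  # C: other verbs in target construction; D: other verbs in all other uses
  chisq <- ((A * D - B * C)^2 * (A + B + C + D)) /
           ((A + C) * (B + D) * (A + B) * (C + D))
  # Positive if the verb is biased towards the target construction relative to
  # the other verbs in the set; negative if biased away from it
  direction <- if (A / (A + B) > C / (C + D)) 1 else -1
  direction * log(chisq)
}

# Counts for da 'hit' in the bei-passive (Table 2): yields a large positive value
collostructional_bias(A = 12835, B = 331611, C = 21160, D = 1262140)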

2.1.2 Study 1: results

For the main analyses, we used Bayesian mixed effects models (brms package; Bürkner 2018), in the R environment (R Core Team 2019). Because no directly comparable study exists, meaning that it is not possible to use a well-informed prior, we did not follow the approach of Bayesian hypothesis testing (e.g., using Bayes Factors), but rather that of “estimation with quantified uncertainty” (Kruschke and Liddell 2018). That is, we used pMCMC values (shown in the summary output tables as “B<>0”) and credible intervals to estimate the probability that each effect of interest is greater than zero (or, for negative effects, smaller than zero). Readers used to thinking in frequentist terms are free to interpret any effect with a pMCMC value of >0.95 as “statistically significant”, though one of the advantages of a Bayesian approach is that it avoids such arbitrary dichotomies. Because the distribution of the ratings was not even approximately Gaussian (the modal acceptability rating was 5/5), we used a wide normal(0, 10) prior (with all predictors scaled and centred), with the intention that this prior should be overwhelmed by the data (see Appendices 1 and 4 in the Supplementary Materials for details, including the full model outputs). Nevertheless, in order to check that this choice of prior did not overly influence the outcome, we ran additional models with normal(0, 2) and normal(0, 5) priors. These additional models are reported in Appendices 2–3 and 5–6, and yielded almost identical results.

Rather than adopting a maximal random effects structure (e.g., Barr et al. 2013), which can “lead to a significant loss of power” (Matuschek et al. 2017), we adopted a data-driven approach: Using the MixedModels.jl package (Bates et al. 2016) in the Julia environment (Bezanson et al. 2012), we started with maximal models and sequentially simplified the random-effects structure, stopping when doing so yielded worse fit (i.e., higher AICc and BIC values).

2.1.2.1 Main (semantics-only) model

The main model included the crucial interaction of Sentence Type (bei-passive, ba-active, Notional Passive, SVO Active) by Semantics (continuous predictor) as both a fixed effect, and a by-participant random slope. The only fixed effect that varies within verb is Sentence Type, which was included as a by-verb random slope. For this model, maximal random effects structure was justified according to the simplification procedure set out above (i.e., any simplification yielded higher AICc and BIC values). Thus, the Bayesian model was as follows (estimation of this model took 2 h, 19 min on a 4-core 4.2 GHz Intel i7 machine):

brm(formula = Response ~ S_Type * Semantics +
      (1 + S_Type * Semantics | Participant) + (1 + S_Type | verb),
    data = Data, family = gaussian(),
    prior = set_prior("normal(0, 10)", class = "b"),
    warmup = 2000, iter = 5000, chains = 4, cores = 4,
    save_all_pars = FALSE, control = list(adapt_delta = 0.99), silent = FALSE)

The model for the main analysis is summarized in Table 3, and shown in full in Appendix 1 in the Supplementary Materials. Posterior predictive checks using all posterior samples (obtained with the pp_check function of brms) revealed an almost perfect correlation between the observed data and simulated data from the posterior predictive distribution, collapsing across all predictors and participants (see Appendix 1). As well as main effects of Sentence Type and Semantics, the analysis revealed an interaction such that the effect of Semantics differed according to Sentence Type (see Figure 1): As predicted, the effect of semantics was greater for ba-actives and bei-passives than for SVO Actives (the reference category), with B<>0 (pMCMC) values of 1 in both cases, indicating that all posterior samples were in the predicted direction. The further predictions that the relationship between verbs’ semantic affectedness and acceptability of the resulting sentence will be (a) greater than zero for bei-passives and ba-actives and (b) smaller for Notional Passives than for bei-passives were tested using the “hypothesis” function of brms. Note that this function calculates estimates and posterior probabilities from the model already fitted, rather than fitting a new model, obviating the need to correct for multiple comparisons (e.g., Gelman et al. 2012). As shown in the final three rows of Table 3, all predictions were again supported, with all B<>0 (pMCMC) values at 1. Models with different priors are summarized in Appendices 2–3 (Supplementary Materials), and yielded almost identical results.
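By way of illustration, the posterior predictive check and the directional tests in the final rows of Table 3 can be run on the already-fitted model as follows (a sketch assuming the fitted brms object is named main_model):

library(brms)

# Posterior predictive check: observed ratings vs. draws from the posterior
# predictive distribution
pp_check(main_model)

# Posterior probability that the semantics slope is positive for bei-passives
# (reference-level slope plus the bei-passive interaction)
hypothesis(main_model, "Semantics + S_TypePassive_O_BEI_SV:Semantics > 0")

# ... and that the slope is smaller for Notional Passives than for bei-passives
hypothesis(main_model,
           paste("Semantics + S_TypePassive_Notional_OSV:Semantics <",
                 "Semantics + S_TypePassive_O_BEI_SV:Semantics"))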

Table 3:

Bayesian model for main analysis (comparisons of theoretical interest shown in bold).

Covariate                                             Estimate  Est. error  l-95% CI  u-95% CI  B<>0
Intercept                                                 4.74        0.05      4.64      4.84     1
S_TypeActive_S_BA_OV                                      −1.2         0.1     −1.39     −1.01     1
S_TypePassive_Notional_OSV                               −3.01        0.09     −3.18     −2.85     1
S_TypePassive_O_BEI_SV                                   −0.39        0.07     −0.52     −0.26     1
Semantics                                                −0.03        0.04     −0.12      0.05  0.78
S_TypeActive_S_BA_OV:Semantics                             0.8        0.09      0.62      0.98     1
S_TypePassive_Notional_OSV:Semantics                     −0.08        0.05     −0.16      0.02  0.95
S_TypePassive_O_BEI_SV:Semantics                          0.25        0.06      0.14      0.36     1

Semantics + S_TypePassive_O_BEI_SV:Semantics > 0          0.22        0.05      0.13      0.31     1
Semantics + S_TypeActive_S_BA_OV:Semantics > 0            0.76        0.09      0.61      0.91     1
Semantics + S_TypePassive_Notional_OSV:Semantics <
  Semantics + S_TypePassive_O_BEI_SV:Semantics           −0.33        0.06     −0.44     −0.22     1
Figure 1: Grammaticality judgment scores (y-axis) as a function of semantic affectedness (x-axis).

2.1.2.2 Semantics + frequency model

Finally, we fitted a new series of models including Verb Frequency (i.e., collostructional frequency, calculated according to the chi-square method set out above) and its interaction with Sentence Type as control predictors. The purpose of this model was to investigate whether the semantic effects reported above hold even after “controlling for” input frequency effects. That said, this analysis should be treated with considerable caution for three reasons. First, frequency effects – at least in part – are a consequence of semantic effects: For any given verb, speakers frequently use this verb in constructions with which it is highly semantically compatible, and rarely – if at all – in constructions with which it is semantically incompatible. Thus, it is debatable whether we should even expect to see semantic effects after “controlling for” frequency effects that – at least in part – arose from those semantic effects in the first place. Second, as noted by Westfall and Yarkoni (2016), regression analysis can “control for” only predictor variables that are measured perfectly (which, for frequency counts, would require a verbatim record of the entire linguistic input of our participants). Otherwise, it is impossible to know whether variance that is not explained by the “control” predictor (here, frequency) should be attributed to the remaining predictors in the model (here, semantics), or to measurement error in the “control” predictor. Third, the frequency and semantic predictors are collinear, particularly when looking at the crucial cases of bei-passives (r = 0.48) and ba-actives (r = 0.53), and also for SVO Actives (r = −0.53), though not for Notional Passives (r = 0.01)[7]. That said, a comparison between zero-order, semi-partial (part) and partial correlations (see Table 4) reveals that the raw (i.e., zero-order) semantics-DV correlation (a) does not decrease by a great deal when controlling for the correlation between (b) frequency and semantics and, additionally, (c) frequency and the DV. Thus, while, for all the reasons above, this analysis should be treated with considerable caution, collinearity does not seem to be especially problematic here.

Table 4:

Correlation with DV.

Sentence type           Zero order  Semi-partial  Partial
Active_SVO                   −0.05          0.00     0.00
Active_S_BA_OV                0.57          0.43     0.47
Passive_Notional_OSV         −0.12         −0.12    −0.12
Passive_O_BEI_SV              0.24          0.17     0.17

Table 4 shows correlations – for each sentence type – between the semantic predictor and the dependent variable (participants’ acceptability judgments) (a) without controlling for the frequency predictor (“Zero order”) and controlling for the effect of the frequency predictor on (b) the semantic predictor only (“Semi-partial”) and (c) both the semantic predictor and the dependent variable (“Partial”).
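A minimal sketch (base R, with a hypothetical per-verb data frame d for one sentence type containing columns rating, semantics and frequency) of how the three correlation types in Table 4 can be computed:

# Zero-order: raw semantics-acceptability correlation
cor(d$semantics, d$rating)

# Semi-partial: frequency partialled out of the semantic predictor only
sem_resid <- resid(lm(semantics ~ frequency, data = d))
cor(sem_resid, d$rating)

# Partial: frequency partialled out of both the semantic predictor and the ratings
rating_resid <- resid(lm(rating ~ frequency, data = d))
cor(sem_resid, rating_resid)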

The model for this analysis is summarized in Table 5 (see also Figure 2), and shown in full in Appendix 4 (estimation took 2 h 36 min on the same machine used previously). In brief, although the model yielded some evidence for frequency effects (particularly as an interaction with Sentence Type = ba-actives) the effects of semantics reported above held (with all pMCMC values unchanged at 1), even with the introduction of these controls (though see above for three reasons to be extremely cautious regarding this conclusion). Models with different priors are summarized in Appendices 5–6, and yielded almost identical results.

Table 5:

Bayesian model for additional analysis with frequency as control predictor (comparisons of theoretical interest shown in bold).

Covariate                                             Estimate  Est. error  l-95% CI  u-95% CI  B<>0
Intercept                                                 4.72        0.06      4.62      4.83     1
S_TypeActive_S_BA_OV                                     −1.11         0.1     −1.31      −0.9     1
S_TypePassive_Notional_OSV                               −2.98        0.09     −3.16     −2.81     1
S_TypePassive_O_BEI_SV                                   −0.37        0.07     −0.51     −0.23     1
Semantics                                                −0.01        0.05     −0.11      0.08  0.62
Frequency                                                 0.03        0.03     −0.04       0.1  0.81
S_TypeActive_S_BA_OV:Semantics                            0.66         0.1      0.47      0.85     1
S_TypePassive_Notional_OSV:Semantics                      −0.1        0.05      −0.2      0.01  0.97
S_TypePassive_O_BEI_SV:Semantics                          0.21        0.06      0.08      0.33     1
S_TypeActive_S_BA_OV:Frequency                            0.23         0.1      0.04      0.43  0.99
S_TypePassive_Notional_OSV:Frequency                      0.02        0.06      −0.1      0.14  0.65
S_TypePassive_O_BEI_SV:Frequency                          0.03        0.06     −0.09      0.14  0.69

Semantics + S_TypePassive_O_BEI_SV:Semantics > 0          0.19        0.06       0.1      0.29     1
Semantics + S_TypeActive_S_BA_OV:Semantics > 0            0.65         0.1      0.49      0.81     1
Semantics + S_TypePassive_Notional_OSV:Semantics <
  Semantics + S_TypePassive_O_BEI_SV:Semantics            −0.3        0.07     −0.42     −0.19     1
Figure 2: Grammaticality judgment scores (y-axis) as a function of frequency (chi-square bias towards the relevant construction; x-axis).

2.1.3 Study 1: discussion

The aim of Study 1 was to confirm empirically the claim (e.g., Chu 1973; Deng et al. 2018; Huang et al. 2009, 2013; Li 1974; Li et al. 1993; Thompson 1973; Zhang 2001) that the bei-passive and ba-active constructions are associated with the meaning of affectedness of the Patient (i.e., of the Object in the equivalent active sentence). As predicted, and replicating similar results previously observed for English and Indonesian, by-verb semantic affectedness ratings predicted the relative acceptability of 57 verbs in the bei-passive and ba-active constructions.

Before moving on, it is worth pausing to consider the implications of the present findings for another recent claim in the literature; specifically, that verbs can be conventionalized via frequent use into constructions with which they are semantically incompatible (Diessel 2019). Diessel gives the examples of forgive and envy, which can appear in the ditransitive construction despite lacking any meaning of transfer. Similarly, conventionalization into a competitor construction (e.g., of donate into the prepositional dative construction) can block uses that are semantically acceptable (e.g., of donate in the double object construction, both of which share the meaning of caused possession). Although such restrictions often have diachronic explanations (e.g., forgive and envy used to denote giving; Latinate verbs like donate are incompatible with the Germanic ditransitive construction), modern-day learners probably rely considerably more on collostructional frequency (though see Ambridge et al. [2014] for evidence that English speakers have implicit knowledge of the latter restriction).

It is interesting to ask, therefore, whether the present data show any such effects. Do we see Mandarin verbs that are judged to be considerably more (or less) acceptable in a particular construction than we would expect on the basis of their semantics? And if so, is it because they are particularly frequent in that construction? Inspection of Figure 1 reveals a handful of candidates: renchu ‘recognize’, jizhu ‘remember’, hushi ‘ignore’ and wangji ‘forget’ are more acceptable in the ba-active construction than we would expect on the basis of their semantics (since these are mental-state verbs that take Themes, rather than action verbs that take Patients). Similarly, hushi ‘ignore’ and faxian ‘spot’ are more acceptable in the bei-passive construction than we would expect on the basis of their semantics while renshi ‘know’ and xiangxin ‘believe’ are less acceptable. Inspection of Figure 2 reveals that, indeed, in most cases, a frequency-based bias towards the relevant construction – or, for renshi ‘know’ and xiangxin ‘believe’, towards the competing SVO Active construction – seems to be the deciding factor. Thus the present data provide some support for Diessel’s (2019) claim that frequency – via conventionalization – can override semantics.

Having established that the bei-passive and ba-active constructions are associated with the meaning of affectedness, we now turn to our main question: How do speakers balance this semantic constraint with (often competing) information-structure constraints when deciding which of these four constructions to use when producing a given two-argument utterance?

2.2 Study 2: building a computational model

The starting point for our simulation of how Mandarin speakers balance information-structure and semantic constraints when choosing between these four competing two-argument constructions is the observation that a surprisingly simple learning model (see Milin, Divjak, and Baayen 2017; Milin, Feldman, Ramscar, Hendrix, and Baayen 2017, for a review) has been shown to offer a close fit to the empirical data in a number of linguistic domains, including grammatical gender (Arnon and Ramscar 2012), word-learning (e.g., Baayen et al. 2019; Milin, Divjak, and Baayen 2017; Milin, Feldman, Ramscar, Hendrix, and Baayen 2017; Ramscar, Dye, and McCauley 2013; Ramscar, Dye, and Klein 2013), reading (e.g., Milin et al. 2017) and both inflectional and derivational morphology (e.g., Baayen and Smolka 2020; Durdevic and Milin 2019; Milin et al. 2016; Ramscar, Dye, and McCauley 2013; Ramscar, Dye, and Klein 2013; Ramscar and Yarlett 2007). Discriminative learning models map directly from input to output units. That is, unlike typical connectionist models, they do not incorporate a hidden layer that forms linguistic abstractions. This makes discriminative learning models ideal for simulating radically exemplar-based accounts of language acquisition (e.g., Ambridge 2019; Bybee 1985, 2010; Croft 2000, 2001) that also eschew stored linguistic abstractions above and beyond exemplars. Discriminative learning models are also well grounded in the domains of both human and animal learning generally (e.g., Gureckis and Love 2010; Rescorla 1998; Rescorla and Wagner 1972), and so enjoy psychological plausibility as models of learning (unlike, for example, models based on Bayesian clustering).

Discriminative learning, however, was only a starting point. Our goal in this modelling work was to investigate whether such a simple model can yield a good fit to the human data described above, or whether a better fit can be achieved by adding weight decay (a form of regularization) and/or hidden units (which allow the model to discover any nonlinearities present in the dataset). If it turns out that these models yield a better fit to the human judgment data, the implication is that accounts of human language learning and representation should not be based solely on discriminative learning, but should incorporate some role for (in the case of decay) “forgetting as the basis of abstraction and generalization” (Vlach and Kalish 2014: 1,021) or (in the case of hidden units) abstractions above the level of individual verbs, but below the level of verb-argument-structure constructions (e.g., Pinker’s, 1989, semantic verb classes).

2.2.1 Study 2: method

The general model architecture (for versions with no hidden units) is summarized in Figure 3. Each input-output pair presented to the model represents an utterance in the corpus described above. The input layer comprises 57 lexical units (0/1), an Object Focus unit (0/1), and a Semantics unit (continuous activation level 0–1). The orthogonal lexical units (0/1) represent the identity of the verb, and can be conceptualized as a pseudo phonological and/or lexical-semantic representation. The Object Focus unit (1/0) represents the information structure of the utterance presented to the model on that trial; specifically, whether or not the Object (or, strictly speaking, what would be the Object in an equivalent active transitive sentence) is in sentence-initial position, and hence likely to be discourse-old. That is, the Object Focus unit is set to 1 if the relevant corpus utterance uses either the O-bei-S-V Passive or the O-S-V Notional Passive construction, and to 0 if it uses the Active SVO, Active S-ba-O-V or Other construction. The Semantics unit is assigned a continuous activation level based on the mean rating – across all semantic raters – for the relevant verb on the Affectedness measure obtained in Study 1. The use of continuous activation for this unit, rather than cues that are simply present or absent, represents a departure from most discriminative learning models, in that we are using the Widrow-Hoff, rather than the Rescorla-Wagner, learning rule. All three types of input unit – the Object Focus unit, the Semantics unit and one of the one-hot lexical units – are activated on each learning trial. Thus, competition between input units arises when the model is presented with, for example, a verb whose semantics (i.e., low affectedness) predict the SVO Active, Notional Passive or Other construction, but with the Object Focus unit set to 1, which predicts the bei-passive or Notional Passive construction.
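As a concrete illustration of this coding scheme (a sketch under our own naming assumptions, not the scripts actually used), each corpus utterance becomes a 59-element input vector (57 one-hot verb units + Object Focus + Semantics) paired with a 5-element one-hot output vector:

constructions <- c("Active_SVO", "Active_S_BA_OV",
                   "Passive_Notional_OSV", "Passive_O_BEI_SV", "Other")

# verbs: character vector of the 57 verbs; affectedness: named numeric vector of
# mean affectedness ratings rescaled to 0-1 (both hypothetical objects)
encode_utterance <- function(verb, construction, verbs, affectedness) {
  lexical <- as.numeric(verbs == verb)                    # 57 one-hot verb units
  object_focus <- as.numeric(construction %in%
                             c("Passive_O_BEI_SV", "Passive_Notional_OSV"))
  input <- c(lexical, object_focus, affectedness[verb])   # 59 input units
  target <- as.numeric(constructions == construction)     # 5 output units
  list(input = input, target = target)
}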

Figure 3: Architecture of the basic model (version with no hidden units).

It is important at this point to be explicit about the assumptions that are inherent in the modelling setup. With regard to the outputs, we adopt the simplifying assumption that the four constructions (plus “Other”) are already known. In reality, of course, speakers will have to learn these constructions alongside the individual verbs. However, a model of how speakers acquire the basic argument constructions of their language – which would be well on the way to constituting a complete model of grammar acquisition – is well beyond the scope of the current paper.

With regard to the inputs, we adopt three simplifying assumptions. The first, inherent in the “Semantics” unit, is that speakers are somehow able to extract and home in on the single semantic feature (“Affectedness”) that is relevant for learning verbs’ argument structure preferences. In reality, of course, speakers will have to learn which aspects of a verb’s meaning are (ir)relevant for predicting their morphosyntactic behaviour. However, there is no way for us to simulate these additional complexities, since we do not have anything like a complete list of morphosyntactically (ir)relevant properties, let alone sets of ratings of the extent to which particular verbs exhibit them.

The second, inherent in the Object Focus unit (1/0) is that speakers are somehow able to boil down all of the discourse-pragmatic aspects of incoming speech into whether the (active) Object should be “moved” to the beginning of the utterance. Again, in reality, these subtleties of pragmatics will have to be effortfully learned. But, again, we cannot simulate them here, since we do not have a suitable input corpus that codes for discourse-pragmatic function.

The third simplifying assumption inherent in our input representations is that verb identity (at the phonological and lexical-semantic level) can be represented using a bank of orthogonal one-hot units. Again, the reality is much more complex, since learners must acquire a finely structured semantic “web” of lexical verb meanings, linked to phonological representations. However, this simplifying assumption would not seem to be particularly problematic, given that fine grained phonological and lexical-semantic similarities (i.e., those below the level of “morphosyntactically-relevant” properties such as “Affectedness”) seem to be largely irrelevant for verb + argument structure combinatorial possibilities.

Of course, as noted by an anonymous reviewer, the biggest simplifying assumption of all is that, together with high-level outputs, these inputs are presented together in a “complex and composite cue…giving high-level and sophisticated knowledge for free – pre-processed. Such knowledge appears “miraculously” rather than being gradually learned”. We accept this criticism entirely but, in the absence of corpora coded for fine-grained semantics and discourse function (and, to be frank, of the necessary modelling expertise) it is not possible to build a more fine-grained, more realistic model. It is vital, therefore, to emphasize that conclusions for human learning and representation drawn on the basis of this modelling work must be regarded as tentative, pending replication with more realistic models.

For each analysis, we ran a total of 600 models: 60 models with different random seeds, each run over 100 epochs. Each epoch consisted of 10,000 input-output pairs randomly selected (with replacement) from the corpus, and presented in random order. Models were implemented using the nnet R package (Venables and Ripley 2013), with the rang parameter (which determines the range of the initial random weights) always set to 0.5 (preliminary simulations, not reported here, revealed that different settings of this parameter yielded virtually identical results).
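A sketch of how one such run might look (in R, using nnet; corpus_x and corpus_y are hypothetical matrices of coded inputs and one-hot construction targets, as in the encoding sketch above). This is a reconstruction under stated assumptions rather than the actual modelling script: a no-hidden-unit network (size = 0 with skip-layer connections) with decay = 0.4, trained epoch by epoch on 10,000 utterances sampled with replacement, carrying the weights forward between epochs.

library(nnet)

set.seed(1)  # one of the 60 random seeds

fit <- NULL
for (epoch in 1:100) {
  idx <- sample(nrow(corpus_x), 10000, replace = TRUE)  # one 10,000-utterance epoch
  if (is.null(fit)) {
    fit <- nnet(corpus_x[idx, ], corpus_y[idx, ], size = 0, skip = TRUE,
                softmax = TRUE, decay = 0.4, rang = 0.5, trace = FALSE)
  } else {
    fit <- nnet(corpus_x[idx, ], corpus_y[idx, ], size = 0, skip = TRUE,
                softmax = TRUE, decay = 0.4, Wts = fit$wts, trace = FALSE)
  }
}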

At test, we simulated a discourse scenario that requires some form of passive or topicalization construction, by always setting the Object Focus unit to 1, then presented the model in turn with each verb (i.e., by setting the relevant “one hot” orthogonal unit to 1, and the Semantic unit to the relevant continuous activation level for that verb). The resulting activation levels of the Passive O-bei-S-V, Notional Passive O-S-V, Active S-ba-O-V and Active S-V-O units were taken as the model’s prediction of the acceptability of the relevant utterance in our human judgment task (Study 1). We assessed the model’s performance by correlating these predictions with the data from Study 1, in each case taking the mean across all 60 participants, and all 60 models at a given epoch. It is important to stress that the model itself is never shown participants’ grammaticality judgment data, which function solely as a benchmark.
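The test phase can then be sketched as follows (same naming assumptions; test_x is a hypothetical 57-row matrix with the relevant verb unit and the Object Focus unit set to 1 and the Semantics unit set to that verb’s affectedness value, and human_means holds the mean Study 1 ratings per verb and sentence type):

# Output activations for each verb under an Object-Focus = 1 discourse scenario;
# column names are assumed to carry over from the construction-labelled targets
preds <- predict(fit, test_x)

# Model-human correlations for the two constructions of main interest
cor(preds[, "Passive_O_BEI_SV"], human_means[, "Passive_O_BEI_SV"])
cor(preds[, "Active_S_BA_OV"], human_means[, "Active_S_BA_OV"])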

2.2.2 Experimenting with hyperparameters

2.2.2.1 Weight-decay

First, we investigated the effect of varying the model’s weight-decay parameter, while holding constant a maximally simple architecture with no hidden units. Eleven versions of this model were run, with the weight-decay parameter set to 0 (corresponding to “pure” discriminative learning) and increased in increments of 0.1, up to a maximum of 1.0. The results of this analysis are shown in Figure 4. Perhaps surprisingly, the decay = 0 (“pure” discriminative learning) model performed relatively poorly, achieving correlations of only r = 0.19 and r = 0.10 with human judgments of bei-passives and ba-actives respectively. Indeed, neither of these correlations is statistically significant (critical value for r [df = 56] = 0.22 at p < 0.05 and r = 0.31 at p < 0.01). With decay = 0.4, the model achieved statistically significant correlations of r = 0.31 and r = 0.46 with human judgments of bei-passives and ba-actives respectively (the former stabilized at around r = 0.3, while the latter climbed as high as r = 0.50 with larger decay values). Bearing in mind the above caveats regarding modelling assumptions, what this suggests is that pure discriminative learning may overfit the training set, and that some form of regularization (akin to human “forgetting”) is needed to better simulate human performance. As we will see shortly, this conclusion is also supported by modelling runs that investigated the effect of removing the orthogonal verb-identity units.
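The decay sweep itself then reduces to an outer loop over the eleven values (a sketch; train_model is a hypothetical wrapper around the training loop shown earlier, not a package function):

decay_values <- seq(0, 1, by = 0.1)
sweep_results <- sapply(decay_values, function(d) {
  fit <- train_model(decay = d)   # hypothetical wrapper around the epoch loop above
  preds <- predict(fit, test_x)
  c(bei = cor(preds[, "Passive_O_BEI_SV"], human_means[, "Passive_O_BEI_SV"]),
    ba  = cor(preds[, "Active_S_BA_OV"], human_means[, "Active_S_BA_OV"]))
})
colnames(sweep_results) <- decay_values  # one column of correlations per decay value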

Figure 4: Model-human grammaticality-judgment correlations (Pearson’s r, y-axis) for different values of the model’s weight-decay parameter (x-axis).

2.2.2.2 Hidden units

Next, we investigated the effect of introducing hidden units, while holding decay constant at 0.4. Versions of this model were run with the number of hidden units set to zero (as above), 1, 2, 3, 4, 5 and 10. The results of this analysis are shown in Figure 5. Perhaps surprisingly, given that three-layer networks are often used for this type of task, adding hidden units never helped the model and, for ba-actives, actively hindered it, presumably by forcing it to “squash” its representations into a small number of hidden units. Bearing in mind the above caveats regarding modelling assumptions, and in particular the fact that the model already possesses abstract representations in the forms of the output constructions, what this suggests is that an intermediate representational step between individual verbs and constructions (like Pinker’s, 1989, verb classes) is unnecessary (and possibly even harmful) for learning.

Figure 5: Model-human grammaticality-judgment correlations (Pearson’s r, y-axis) for different numbers of hidden units (x-axis).

2.2.3 Exploring the main model

As a result of the hyperparameter manipulation set out above, we designated a no-hidden-unit model with decay = 0.4 (and rang = 0.5) as the “main model”, and conducted a more detailed investigation of its learning and representations. Figure 6 shows this model’s verb-by-verb predictions. After just a handful of epochs predicting Active S-V-O and Other utterances (presumably because these are the most frequent in the input), the model rapidly learns Pullum’s (2014) “new-information condition on by-phrases”, learning that an Object Focus unit set to 1 predicts either a bei or Notional Passive. Active S-ba-O-V predictions are discarded even more rapidly, since this construction is neither frequent in the input, nor predicted by an Object Focus setting of 1 (recall that this is always the case in the test phase). Interestingly, though, the model still shows considerable by-verb variation. For most verbs, particularly high-affectedness verbs like jingdai ‘amaze’, douxiao ‘amuse’, rehuo ‘anger’, geshang ‘cut’, yao ‘bite’, la ‘pull’, yadao ‘squash’, and kongxia ‘terrify’, the model strongly prefers bei over Notional Passives. For a handful of verbs, particularly lower-affectedness verbs like genzhe ‘follow’, zhushi ‘look at’, xiangnian ‘miss’, jizhu ‘remember’, and kanjian ‘see’, it is ambivalent between bei and Notional Passives. For five verbs, the model prefers Notional over bei-passives. This is as expected for the low-affectedness verbs xiangxin ‘believe’, wangji ‘forget’, and xihuan ‘like’, if a little surprising for the high-affectedness verbs chi ‘eat’ and pai ‘pat’. The most likely explanation here is that (unlike in our stimuli) corpus uses of chi ‘eat’ and pai ‘pat’ mainly have nonhuman Patients, and so tend to be topicalized, but not necessarily highly affected (e.g., The pizza, I ate [it]; The dog, I patted [it]). Overall, however, the by-verb results shown in Figure 6 broadly follow the pattern that one would expect on the basis of Study 1: In accordance with the putative semantics of these constructions, verbs that score high for semantic affectedness prefer the bei over the Notional passive, while verbs that score low for semantic affectedness reverse this pattern.

Figure 6: Main model: By-verb output probabilities (0–1, y-axis) for each of the four sentence types (plus Other) by development (10,000-sentence epochs, x-axis).

In order to examine in more detail how the model learns to accurately simulate human judgments, we now consider the developmental trajectory of model-human correlations, as summarized in Figure 7. This plot shows, for each sentence type, the correlation across all 57 verbs between the human judgment data from Study 1 and the (main) model’s predictions, and how this correlation changes as the model learns. Considering first Passive O-bei-S-V sentences: by around 50 epochs, the model-human correlation asymptotes at (as we have already seen) around r = 0.30. Early in learning, its correlation with human judgments is essentially zero, since (as shown in Figure 6) it initially predicts the most-frequent SVO Active and (in particular) Other (presumably mostly intransitive) sentence types across the board. However, it then learns relatively rapidly to produce bei-passives, but only for semantically compatible verbs.

Figure 7: Main model: Model-human grammaticality-judgment correlations (Pearson’s r, y-axis) for each of the four sentence types by development (10,000 sentence epochs, x-axis).
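Again as a hypothetical sketch (reusing probs from the code above, and assuming that a verb’s model prediction is simply its output probability for the relevant construction), the asymptotic correlations in Figure 7 correspond to a Pearson correlation, across the 57 test verbs, between those probabilities and the mean human acceptability ratings from Study 1; the data frame judgments below is our own invented stand-in for those data, not the authors’ variable.

## Model-human correlations across the 57 test verbs (end state only; the paper
## tracks the same correlation at each point in training).
## 'probs' is the prediction matrix from the sketch above; 'judgments' is a
## hypothetical data frame with columns verb, bei_rating and ba_rating (mean
## human ratings), in the same verb order as the rows of 'probs'.
cor(probs[, "bei"], judgments$bei_rating)   # around r = 0.3 for the main model
cor(probs[, "ba"],  judgments$ba_rating)    # around r = 0.5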

The model’s even-better performance for Active S-ba-O-V sentences, which asymptotes at around r = 0.5 by 75 epochs, is surprising, given that (as shown in Figure 6), it essentially learns NOT to produce ba-actives when the Object Focus unit is set to 1 (as is always the case in the test phase). Apparently, then, the model achieves its correlation with human judgments by learning “how bad” ba-actives “sound” when the discourse context strongly pulls for some form of passive; specifically, “not that bad” when the semantics of the verb are highly consistent with the notion of affectedness, which strongly pulls for a ba-active.

Figure 8 plots the weights learned by this (main) model. The large number of input nodes makes interpretation of this figure difficult, but it is apparent that, as shown in Figure 6, verbs that score high for “Affectedness” build excitatory links to the ba-active and bei-passive constructions and inhibitory links to the SVO Active, Notional Passive and Other constructions, with verbs that score low for “Affectedness” showing the opposite pattern.

Figure 8: Main model: Learned weights at 100 epochs.

2.2.4 Semantics versus statistics

Finally, we ran a set of simulations designed to explore the relative contributions of semantics-based and purely distributional (statistical) lexical learning. One possibility (“statistics”) is that the semantics node is largely or completely superfluous, and that the model achieves its correlation with human judgments simply by learning which of the orthogonal “one hot” lexical units predicts which output construction under each setting (1/0) of the (“discourse pragmatic”) Object-focus node. The opposite possibility (“semantics”) is that the orthogonal lexical units are largely or completely superfluous, and that the model achieves its correlation with human judgments by learning the thresholds above which the Semantics (“Affectedness”) unit predicts a particular output construction, under each setting (1/0) of the (“discourse pragmatic”) Object-focus node. Perhaps the most likely possibility a priori is that semantics and statistics work together, and that removing either type of cue will damage the model’s performance, but not break it entirely.

2.2.4.1 Statistics-only

In order to investigate the first possibility, we ran a version of the “main” model (no hidden units, decay = 0.4, range = 0.5) with the settings of the Semantics unit permuted across verbs. That is, a given verb (as defined by the orthogonal “lexical” units) still has a consistent Semantics (in terms of “Affectedness”) across runs. However, this assigned Semantics is not a useful cue for learning the constructions in which the verb does (/not) appear. This “Statistics-only” model is summarized in Figures 9–11 (see the Supplementary Materials for Figures 9–14).
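In code terms, and again only as a hypothetical sketch building on the one above, this manipulation amounts to shuffling the verb-to-affectedness mapping before training, so that each verb keeps a consistent but uninformative semantic value:

## "Statistics-only" model: permute the affectedness values across verbs, so that
## each verb retains a consistent (but arbitrary) Semantics throughout training.
set.seed(1)                                         # arbitrary seed, illustration only
lex <- unique(train[, c("verb", "affectedness")])   # one row per verb
lex$shuffled <- sample(lex$affectedness)            # reassign values across verbs
train_perm <- train
train_perm$affectedness <- lex$shuffled[match(train_perm$verb, lex$verb)]

fit_stats_only <- fit_model(hidden = 0, data = train_perm)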

As expected, the model’s performance was negatively affected, but not dramatically so, with correlations of around r = 0.2 observed for both bei-passives and ba-actives (as compared to r = 0.3 and r = 0.5 for the main model). The implication is that the model (and, by extension, humans) is certainly capable of learning lexical restrictions that have no semantic motivation (as discussed in Study 1 above, with reference to Diessel [2019]).

2.2.4.2 Semantics-only

In order to investigate the second possibility, we ran a version of the “main” model (no hidden units, decay = 0.4, range = 0.5) with the orthogonal lexical units set to zero for all learning trials. That is, the model still has the opportunity to learn which levels of (continuous) affectedness predict which output construction under which setting (1/0) of the Object-focus node; and that information is still weighted in the same way as in the original corpus. What the model cannot do, however, is pure lexical learning: learning the extent to which a particular verb predicts a particular construction. Surprisingly, this Semantics-only model (see Figures 12–14 in the Supplementary Materials) showed the best performance of any we investigated, achieving correlations with human judgments of just over r = 0.5 and r = 0.75 for bei-passives and ba-actives, respectively.
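The corresponding sketch for the Semantics-only version (same caveats and hypothetical variable names as above) simply withholds the lexical predictor, which is equivalent to clamping the one-hot lexical units at zero:

## "Semantics-only" model: no lexical-identity units; the model can learn only
## from the object-focus cue and the continuous affectedness value.
library(nnet)
fit_sem_only <- nnet(construction ~ object_focus + affectedness,
                     data = train, size = 0, skip = TRUE,
                     decay = 0.4, rang = 0.5, maxit = 500, trace = FALSE)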

2.2.5 Study 2: discussion

Why does removing lexical information appear to help the model when, by all accounts, it should hinder it? Again, any conclusions that we draw here should be accompanied by a heavy caveat, since this result may well be specific to the particular, necessarily unrealistic, idealized modelling set-up used here. That said, one possibility is that removing lexical information helps the model for the same reason that building in a fairly hefty amount of weight decay also does so: by preventing overfitting. A zero-decay model with lexical information learns, very efficiently, which verbs happen to predict which constructions (under which settings of the Object-focus parameter) in this particular corpus. But a model that rapidly forgets these lexical-level predictions (i.e., a model with weight-decay) – or, better still, a model that cannot form them in the first place – is denied that option. Instead, it can learn only a much more general semantic constraint: what degree of affectedness predicts construction choice (under which settings of the Object-focus parameter), regardless of the particular verb used? On this story, this more general semantic constraint is more useful for predicting participants’ acceptability judgments, which reflect what a verb can do (recall that the sentences they rated were constructed, and most likely did not appear in this particular corpus), as opposed to simply what it does do most frequently.

Whether or not this speculative possibility is correct, in conclusion, both the statistical and computational modelling findings converge on the conclusion that the Mandarin Active S-ba-O-V and Passive O-bei-S-V constructions are associated with the meaning of affectedness, while the Active S-V-O and Notional Passive O-S-V sentences are not. More importantly, the findings of Study 2 demonstrate that a simple, psychologically plausible learning model (essentially a discriminative learning model augmented with weight-decay) can explain how Mandarin speakers learn to balance information-structure and semantic constraints when selecting between constructions that convey – at least in truth-value terms – similar messages.

3 General discussion

Given a particular message to convey (e.g., Predicate = rescue, Agent = Lizzy, Patient = John), how does a speaker decide which of a number of truth-value-identical constructions to use (e.g., Lizzy rescued John, John was rescued by Lizzy etc.), given constraints such as discourse/information structure and the semantic fit between verb and construction? The aim of the present study was to set out a preliminary answer to this question, at least as it pertains to two-argument constructions in Mandarin Chinese, in the form of a computational model. When selecting one of these constructions for use, Mandarin speakers must balance two (often competing) constraints: (1) an information structure constraint which specifies that “[t]he denotation of the by-phrase NP in a passive clause must denote something at least as new in the discourse as the subject” (Pullum 2014: 64), and (2) a construction-semantic constraint such that the bei-passive and ba-active constructions, but not the Notional Passive and SVO Active constructions, are associated with the meaning of affectedness of the Patient (i.e., of the Object of the active forms). The value of computational modelling here is that it allows us to build and test a quantitative, psychologically plausible account of how learners balance these constraints, in a way that is almost impossible for traditional verbal (i.e., non-computational) models. It also allows us – bearing in mind the caveats raised above regarding the models’ necessarily-unrealistic simplifications – to loop back and make inferences about human language learning mechanisms.

The aim of Study 1 (as well as obtaining data for subsequent computational modelling) was to test empirically the claim (e.g., Chu 1973; Deng et al. 2018; Huang et al. 2009, 2013; Li 1974; Li et al. 1993; Thompson 1973; Zhang 2001) that the bei-passive and ba-active constructions are associated with the meaning of affectedness, to the extent that verbs that do not exhibit this property to a sufficient degree are ungrammatical (or at least dispreferred) in these constructions. To this end, 60 native Mandarin-speaking adults rated the acceptability of 57 verbs in each of four constructions: O-bei-S-V Passive, O-S-V Notional Passive, S-ba-O-V Active and S-V-O Active. We found that, as expected, the relative acceptability of verbs in the O-bei-S-V Passive and S-ba-O-V Active constructions was predicted by verbs’ semantic affectedness, as judged by independent adult raters.

Before moving on to discuss the findings of the computational modelling work, we first consider the implications of the present judgment findings for the broader questions raised in the Introduction. First, these findings suggest that constructions – at least those considered in the present study – have independent meaning. It is not easy to see how one could explain the present findings except in terms of semantic compatibility between particular verbs and particular constructions with regard to the property of affectedness. Certainly, it does not seem to be the case that solely verb semantics – and not construction semantics – are relevant (cf. Messenger et al. 2012), given that the semantic effects observed for the Mandarin O-bei-S-V Passive and S-ba-O-V Active constructions were not echoed for the closely matched O-S-V Notional Passive or S-V-O Active (replicating findings observed for English and Indonesian; Ambridge et al. [2016]; Aryawibawa and Ambridge [2018]).

Second, these findings – when coupled with previous similar findings from English and Indonesian – suggest that we should begin to take seriously the possibility that, far from having no discernible semantics (e.g., Branigan and Pickering 2017), the passive construction has similar semantics – perhaps even something approaching universal semantics – across languages. But just why should unrelated languages share passive constructions with similar semantics? Certainly, given the historical timescales involved, crosslinguistic borrowing is not a viable explanation. Rather, as we noted in the Introduction, languages – by which we mean speakers – just seem to need a construction that, at the same time, topicalizes the undergoer and denotes that it was affected in some way. This is presumably why Mandarin has retained the O-bei-S-V Passive construction, even though it has other constructions – the O-S-V Notional Passive and the S-ba-O-V Active – that fulfil each of these functions individually. Of course, this possibility remains speculative, pending investigation in further languages, a project that we are currently undertaking (specifically for Balinese and Hebrew).

Indeed, the computational modelling work reported in Study 2 provides support for the view that Mandarin retains both the bei-passive and the Notional Passive precisely because this allows speakers to topicalize the (active) Object, but retain a high degree of choice over whether or not they additionally wish to imply that this entity is affected in some way. In this study, a simple two-layer discriminative-learning model (though one augmented with weight-decay in the optimal case) learned to map, in a frequency-sensitive way, from lexical verb identity + verb semantics (degree of affectedness) + information structure (Object topical or not) to each of four constructions: Active S-ba-O-V, Active S-V-O, Notional Passive O-S-V, Passive O-bei-S-V (plus “Other”; e.g., intransitives). When placed in a communicative situation that required it to produce an Object-topic construction, the model generally predicted a bei-passive when the verb denoted a high degree of Object affectedness, and a Notional Passive when it did not. Furthermore, when looking within construction type, the model was able to predict the relative acceptability of 57 verbs in the Active S-ba-O-V and Passive O-bei-S-V constructions, as judged by human raters (correlations in the region of r = 0.5 and r = 0.3, respectively). This study therefore adds to the growing body of work showing that simple discriminative learning models, originally developed to explain how animals learn to discriminate cues to particular outcomes (e.g., a buzzer versus a light predicting food), are plausible candidates for models of language learning.

That said, the present results suggest that “pure” discriminative learning is not quite optimal in this particular case. Recall that both adding decay and removing a cue that is almost certainly used by humans – the lexical identity of the verb – boosted the model’s performance, apparently by reducing overfitting. To reiterate, we must be extremely cautious in drawing real-world theoretical conclusions from these findings, since the present model simplifies the learning problem in a number of important ways. First, a verb’s semantic representation consists of a single (continuous) feature, chosen specifically for its relevance in discriminating between the four constructions between which the model must choose at test. In a more realistic scenario, the model would have to learn which of a very large number of semantic features along which verbs vary (presumably hundreds or even thousands) are discriminative. Second, and similarly, the model’s representation of information structure was extremely coarse: whether or not the Object (of the equivalent active construction) appeared in sentence-initial position (and so was presumably topical in the discourse). Again, in a more realistic scenario, the model would have to learn which of a very large number of information structure constraints are predictive. Third, and most seriously of all, the present modelling task takes the acquisition of the four verb argument structure constructions – or at least the corresponding formal patterns – as a fait accompli at the start of learning. All that the model must do is to learn which information-structure and semantic properties (in this case, degree of affectedness) are predictive of which construction. In real life, of course, learners must acquire the formal structure of the constructions hand-in-hand with the relevant information-structure and semantic constraints. In this respect, the present model represents a significant departure from radical exemplar accounts of language acquisition (e.g., Ambridge 2020a; Bybee 1985, 2010; Croft 2000, 2001), which, in fact, assume that constructions are not stored independently, but exist only as mnemonics for on-the-fly generalizations made across stored exemplar utterances (though see Ambridge 2020b). Bearing all of these caveats in mind, however, the present results at least raise the possibility that some degree of “forgetting” of fine-grained lexical information – Vlach and Kalish’s (2014) “forgetting as…abstraction” – is important for learning.

In conclusion, much work remains to be done to build a psychologically plausible account of how learners acquire verb argument structure constructions, and balance information-structure and semantic constraints when choosing between these constructions in particular discourse scenarios. In the meantime, the present study has (1) provided support for the position that (at least these) constructions exhibit their own semantic properties, and (2) demonstrated that (modified) discriminative learning models hold considerable promise as accounts of how learners learn to balance these semantic constraints with other constraints, including those imposed by considerations of information structure.


Corresponding author: Ben Ambridge, Department of Psychology, University of Liverpool, Eleanor Rathbone Building, Bedford St South, Liverpool L69 7ZA, UK; and ESRC International Center for Language and Communicative Development (LuCiD), Liverpool, UK, E-mail:

Funding source: the European Union’s Horizon 2020 research and innovation programme

Award Identifier / Grant number: no 681296: CLASS

Funding source: China Scholarship Council

Award Identifier / Grant number: 20180715003

Funding source: the Economic and Social Research Council

Award Identifier / Grant number: ES/L008955/1

Acknowledgements

Liu Li received funding through a 2018 scholarship from the China Scholarship Council (CSC). Ben Ambridge is Professor in the International Centre for Language and Communicative Development (LuCiD) at the University of Liverpool. The support of the Economic and Social Research Council [ES/L008955/1] is gratefully acknowledged. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no 681296: CLASS).

1. Data availability statement: All data and analysis scripts can be downloaded from https://osf.io/cbz8m, a permanent, frozen registration of the project (DOI: 10.17605/OSF.IO/CBZ8M). The project contains two compressed (zip) files. The file for Study 1 (“Mandarin_Study1.zip”) contains the R analysis code (“V5KristenPostRev.R”), the data file required by this analysis (“FinalData.csv”) and the R environment containing the Bayesian models (“CB_BAYES_Environment.RData”), as well as all outputs generated by the analysis (.pdf and .txt files). The file for Study 2 (“Mandarin_Study2.zip”) contains the R analysis code (“V15 ModelingPOSTREV.R”), the data files required for the computational modeling (in .csv format) and all outputs generated (.pdf files).

References

Abbot-Smith, Kirsten & Michael Tomasello. 2006. Exemplar-learning and schematization in a usage-based account of syntactic acquisition. The Linguistic Review 23(3). 275–290. https://doi.org/10.1515/tlr.2006.011.

Ambridge, Ben. 2020a. Against stored abstractions: A radical exemplar model of language acquisition. First Language 40(5–6). 509–559. https://doi.org/10.1177/0142723719869731.

Ambridge, Ben. 2020b. Abstractions made of exemplars or ‘You’re all right and I’ve changed my mind’: Response to commentators. First Language 40(5–6). 640–659. https://doi.org/10.1177/0142723720949723.

Ambridge, Ben, Libby Barak, Elizabeth Wonnacott, Colin Bannard & Giovanni Sala. 2018. Effects of both preemption and entrenchment in the retreat from verb overgeneralization errors: Four reanalyses, an extended replication, and a meta-analytic synthesis. Collabra: Psychology 4(1). 23. https://doi.org/10.1525/collabra.133.

Ambridge, Ben, Amy Bidgood, Julian Pine, Caroline Rowland & Daniel Freudenthal. 2016. Is passive syntax semantically constrained? Evidence from adult grammaticality judgment and comprehension studies. Cognitive Science 40(6). 1435–1459. https://doi.org/10.1111/cogs.12277.

Ambridge, Ben, Julian Pine, Caroline Rowland, Daniel Freudenthal & Franklin Chang. 2014. Avoiding dative overgeneralisation errors: Semantics, statistics or both? Language, Cognition and Neuroscience 29(2). 218–243. https://doi.org/10.1080/01690965.2012.738300.

Aryawibawa, I Nyoman & Ben Ambridge. 2018. Is syntax semantically constrained? Evidence from a grammaticality judgment study of Indonesian. Cognitive Science 42(8). 3135–3148. https://doi.org/10.1111/cogs.12697.

Arnon, Inbal & Michael Ramscar. 2012. Granularity and the acquisition of grammatical gender: How order-of-acquisition affects what gets learned. Cognition 122(3). 292–305. https://doi.org/10.1016/j.cognition.2011.10.009.

Baayen, Harald, Yu-Ying Chuang, Elnaz Shafaei-Bajestan & James Blevins. 2019. The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity 1–39. https://doi.org/10.1155/2019/4895891.

Baayen, Harald & Eva Smolka. 2020. Modeling morphological priming in German with naive discriminative learning. Frontiers in Communication 5(17). https://doi.org/10.3389/fcomm.2020.00017.

Barr, Dale, Roger Levy, Christoph Scheepers & Harry Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68(3). 255–278. https://doi.org/10.1016/j.jml.2012.11.001.

Bates, Douglas, Tony Kelman, A. Simon, Andreas Noack, Michael Hatherly & Milan Bouchet-Valat. 2016. dmbates/MixedModels.jl: Drop Julia v0.4.x and earlier support. Zenodo. Available at: https://doi.org/10.5281/ZENODO.162045.

Bezanson, Jeff, Stefan Karpinski, Viral Shah & Alan Edelman. 2012. Julia: A fast dynamic language for technical computing. Available at: http://julialang.org/images/julia-dynamic-2012-tr.pdf.

Boeckx, Cedric. 1998. A minimalist view on the passive. Cambridge, MA: MIT Press.

Branigan, Holly & Martin Pickering. 2017. Structural priming and the representation of language. Behavioral and Brain Sciences 40. 1–61. https://doi.org/10.1017/s0140525x17001212.

Bürkner, Paul-Christian. 2018. Advanced Bayesian multilevel modeling with the R package brms. The R Journal 10(1). 395–411. https://doi.org/10.32614/rj-2018-017.

Bybee, Joan. 1985. Morphology: A study of the relation between meaning and form, vol. 9. Amsterdam: John Benjamins. https://doi.org/10.1075/tsl.9.

Bybee, Joan. 2010. Language, usage and cognition. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511750526.

Carnie, Andrew. 2007. Syntax: A generative introduction, 3rd edn. Oxford: Wiley Blackwell.

Chandler, Steve. 2010. The English past tense: Analogy redux. Cognitive Linguistics 21(3). 371–417. https://doi.org/10.1515/cogl.2010.014.

Chandler, Steve. 2017. The analogical modeling of linguistic categories. Language and Cognition 1(1). 52–87. https://doi.org/10.1017/langcog.2015.24.

Chao, Yuan-Ren. 1968. A grammar of spoken Chinese. Berkeley: University of California Press.

Chomsky, Noam. 1993. A minimalist program for linguistic theory. In Ken Hale & Samuel Keyser (eds.), The view from building 20, 1–52. Cambridge, MA: MIT Press.

Chu, Chauncey. 1973. The passive construction: Chinese and English. Journal of Chinese Linguistics 1(3). 437–470.

Collins, Chris. 2005. A smuggling approach to the passive in English. Syntax 8(2). 81–120. https://doi.org/10.1111/j.1467-9612.2005.00076.x.

Croft, William. 2000. Explaining language change: An evolutionary approach (Longman Linguistics Library). London: Longman.

Croft, William. 2001. Radical construction grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198299554.001.0001.

Croft, William & Alan Cruse. 2004. Cognitive linguistics. Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/CBO9780511803864.

Culicover, Peter, Ray Jackendoff & Jenny Audring. 2017. Multiword constructions in the grammar. Topics in Cognitive Science 9(3). 552–568. https://doi.org/10.1111/tops.12255.

Davidson, Donald. 1967. Truth and meaning. Synthese 17(3). 304–323. https://doi.org/10.1007/bf00485035.

Deng, Xiangjun, Ziyin Mai & Virginia Yip. 2018. An aspectual account of ba and bei constructions in child Mandarin. First Language 38(3). 243–262. https://doi.org/10.1177/0142723717743363.

Diessel, Holger. 2019. The grammar network: How linguistic structure is shaped by language use. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108671040.

Dixon, Robert M. 2012. Basic linguistic theory, vol. 3: Further grammatical topics. Oxford: Oxford University Press.

Durdevic, Dusica & Petar Milin. 2019. Information and learning in processing of adjective inflection. Cortex 116. 209–227. https://doi.org/10.1016/j.cortex.2018.07.020.

Fillmore, Charles, Russell Lee-Goldman & Russell Rhomieux. 2012. The FrameNet Constructicon. In Hans C. Boas & Ivan A. Sag (eds.), Sign-based construction grammar, 283–299. Stanford: CSLI Publications.

Gelman, Andrew, Jennifer Hill & Masanao Yajima. 2012. Why we don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness 5(2). 189–211. https://doi.org/10.1080/19345747.2011.618213.

Gordon, Peter & Jill Chafetz. 1990. Verb-based versus class-based accounts of actionality effects in children’s comprehension of passives. Cognition 36(3). 227–254. https://doi.org/10.1016/0010-0277(90)90058-r.

Goldberg, Adele E. 1995. Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

Goldberg, Adele E. 2006. Constructions at work: The nature of generalization in language. New York: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199268511.001.0001.

Goldberg, Adele E. 2019. Explain me this: Creativity, competition and the partial productivity of constructions. Princeton: Princeton University Press. https://doi.org/10.1515/9780691183954.

Gureckis, Todd & Bradley Love. 2010. Direct associations or internal transformations? Exploring the mechanisms underlying sequential learning behavior. Cognitive Science 34(1). 10–50. https://doi.org/10.1111/j.1551-6709.2009.01076.x.

Hilpert, Martin. 2014. Construction grammar and its application to English. Edinburgh: Edinburgh University Press.

Hirsch, Christopher & Ken Wexler. 2006. Children’s passives and their resulting interpretation. In The proceedings of the inaugural conference on generative approaches to language acquisition–North America, University of Connecticut Occasional Papers in Linguistics, vol. 4, 125–136. Storrs, CT: University of Connecticut.

Huang, C. T. James, Audrey Li & Yafei Li. 2009. The syntax of Chinese. London: Cambridge University Press. https://doi.org/10.1017/CBO9781139166935.

Huang, Yiting, Xiaobei Zheng, Xiangzhi Meng & Jesse Snedeker. 2013. Children’s assignment of grammatical roles in the online processing of Mandarin passive sentences. Journal of Memory and Language 69(4). 1–34. https://doi.org/10.1016/j.jml.2013.08.002.

Hsiao, Hui-Chen, Yi-Chun Chen & Ying-Chen Wu. 2016. Representation of polysemy in Mandarin verbs: Chi, da and xi. Concentric: Studies in Linguistics 42(1). 1–30.

Kruschke, John & Torrin Liddell. 2018. Bayesian data analysis for newcomers. Psychonomic Bulletin & Review 25. 155–177. https://doi.org/10.3758/s13423-017-1272-1.

Langacker, Ronald W. 1988. A usage-based model. In Brygida Rudzka-Ostyn (ed.), Topics in cognitive linguistics, 127–161. Amsterdam: John Benjamins. https://doi.org/10.1075/cilt.50.06lan.

Li, Ying-Che. 1974. What does ‘disposal’ mean? Features of the verb and noun in Chinese. Journal of Chinese Linguistics 2(2). 200–218.

Li, Yen-Hui Audrey. 1990. Order and constituency in Mandarin Chinese. Dordrecht: Kluwer. https://doi.org/10.1007/978-94-009-1898-6.

Li, Charles N. & Sandra A. Thompson. 1976. Subject and topic: A new typology of language. In Charles N. Li (ed.), Subject and topic, 457–489. New York: Academic Press.

Li, Ping, Elizabeth Bates & Brian MacWhinney. 1993. Processing a language without inflections: A reaction time study of sentence interpretation in Chinese. Journal of Memory and Language 32(2). 169–192. https://doi.org/10.1006/jmla.1993.1010.

Maratsos, Michael, Judith Becker, Dana Fox & Mary Anne Chalkley. 1985. Semantic restrictions on children’s passives. Cognition 19(2). 167–191. https://doi.org/10.1016/0010-0277(85)90017-4.

Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen & Douglas Bates. 2017. Balancing Type I error and power in linear mixed models. Journal of Memory and Language 94. 305–315. https://doi.org/10.1016/j.jml.2017.01.001.

Messenger, Katherine, Holly Branigan, Janet McLean & Antonella Sorace. 2012. Is young children’s passive syntax semantically constrained? Evidence from syntactic priming. Journal of Memory and Language 66(4). 568–587. https://doi.org/10.1016/j.jml.2012.03.008.

McEnery, Tony & Richard Xiao. 2005. Passive constructions in English and Chinese: A corpus-based contrastive study. Proceedings from the Corpus Linguistics conference series 1(1).

Milin, Petar, Dagmar Divjak & Harald Baayen. 2017. A learning perspective on individual difference in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues. Journal of Experimental Psychology: Learning, Memory, and Cognition 43(11). 1730–1751. https://doi.org/10.1037/xlm0000410.

Milin, Petar, Dagmar Divjak, Strahinja Dimitrijević & Harald Baayen. 2016. Towards cognitively plausible data science in language research. Cognitive Linguistics 27. 507–526. https://doi.org/10.1515/cog-2016-0055.

Milin, Petar, Laurie Beth Feldman, Michael Ramscar, Peter Hendrix & Harald Baayen. 2017. Discrimination in lexical decision. PLoS One 12(2). https://doi.org/10.1371/journal.pone.0171935.

Milin, Petar, Harish Madabushi, Michael Croucher & Dagmar Divjak. 2020. Keeping it simple: Implementation and performance of the proto-principle of adaptation and learning in the language sciences. arXiv:2003.03813 [cs.CL].

Pinker, Steven. 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.

Pinker, Steven, David Lebeaux & Loren Ann Frost. 1987. Productivity and constraints in the acquisition of the passive. Cognition 26(3). 195–267. https://doi.org/10.1016/s0010-0277(87)80001-x.

Pullum, Geoffrey. 2014. Fear and loathing of the English passive. Language & Communication 37. 60–74. https://doi.org/10.1016/j.langcom.2013.08.009.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Pearson Longman.

Ramscar, Michael, Melody Dye & Joseph Klein. 2013. Children value informativity over logic in word learning. Psychological Science 24. 1017–1023. https://doi.org/10.1177/0956797612460691.

Ramscar, Michael, Melody Dye & Stewart McCauley. 2013. Error and expectation in language learning: The curious absence of “mouses” in adult speech. Language 89. 760–793. https://doi.org/10.1353/lan.2013.0068.

Ramscar, Michael & Daniel Yarlett. 2007. Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition. Cognitive Science 31(6). 927–960. https://doi.org/10.1080/03640210701703576.

Rescorla, Robert. 1998. Instrumental learning: Nature and persistence. In M. Sabourin (ed.), Advances in psychological science, vol. 2: Biological and cognitive aspects, 239–257. Hove: Psychology Press.

Rescorla, Robert & Allan Wagner. 1972. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Abraham H. Black & William F. Prokasy (eds.), Classical conditioning II, 64–99. New York: Appleton-Century-Crofts.

Revelle, William. 2018. psych: Procedures for personality and psychological research (version 1.8.3). Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych.

Shi, Dingxu. 2000. Topic and topic-comment constructions in Mandarin Chinese. Language 76(2). 383–408. https://doi.org/10.1353/lan.2000.0070.

Stefanowitsch, Anatol & Stefan Gries. 2003. Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics 8(2). 209–243. https://doi.org/10.1075/ijcl.8.2.03ste.

Tang, Sze-Wing. 2004. Three syntactic issues of Chinese passives. Available at: http://www.swtang.net/doc/paper_2006_passive.pdf.

Thompson, Sandra A. 1973. Transitivity and some problems with the BA construction in Mandarin Chinese. Journal of Chinese Linguistics 1(2). 208–221.

Venables, William & Brian Ripley. 2013. Modern applied statistics with S-PLUS. New York: Springer Science & Business Media.

Vlach, Haley & Chuck Kalish. 2014. Temporal dynamics of categorization: Forgetting as the basis of abstraction and generalization. Frontiers in Psychology 5. 1021. https://doi.org/10.3389/fpsyg.2014.01021.

Westfall, Jacob & Tal Yarkoni. 2016. Statistically controlling for confounding constructs is harder than you think. PLoS One 11(3). e0152719. https://doi.org/10.1371/journal.pone.0152719.

Zhang, Bojiang. 2001. The symmetry and asymmetry in bei and ba constructions. Zhongguo Yuwen 6. 519–524.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/cog-2019-0100).


Received: 2019-10-24
Accepted: 2021-03-26
Published Online: 2021-04-16
Published in Print: 2021-09-27

© 2021 Li Liu and Ben Ambridge, published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
