In Stefania Nuccorini (ed.), Phrases and Phraseology – Data and Descriptions. Peter Lang Verlag (2002)
In recent decades, the analysis of phraseology has drawn on large corpora as a source of quantitative information about language. This paper presents the main lines of work in progress based on this empirical approach to linguistic analysis, focusing on some problems in the morpho-syntactic annotation of corpora. The CORIS/CODIS corpus of contemporary written Italian, developed at CILTA – University of Bologna (Rossini Favretti 2000; Rossini Favretti, Tamburini, De Santis in press), is a synchronic 100-million-word corpus. It is being lemmatised and annotated with part-of-speech (POS) tags in order to increase the quantity of information available and to improve data retrieval procedures (Tamburini 2000).
The aim of POS tagging is to assign each lexical unit to the appropriate word class. Usually the set of tags is pre-established by the linguist, who uses his/her competence to identify the different word classes. Our very first experiments revealed that the traditional part-of-speech distinctions in Italian (generally based on morphological and semantic criteria) are often inadequate to represent the syntactic features of words in context. The uncertainties in categorisation found in Italian grammars and dictionaries grow as they move from fundamental linguistic classes, such as nouns and verbs, to more complex classes, such as adverbs, pronouns, prepositions and conjunctions. The class of conjunctions, which groups together elements traditionally used to express connections between sentences, appears inadequate for describing cohesive relations in Italian. Such relations also seem to involve elements traditionally assigned to other classes, such as adverbs, pronouns and interjections.
Recent studies have proposed the class of ‘connectives’, which groups all words that, regardless of their traditional word class, have the function of connecting phrases and contributing to textual cohesion. From this point of view, conjunctions can be considered part of the phrasal connectives, which can in turn be included in the wider category of textual connectives. The aim of this study is to identify, using quantitative methods, elements that can be included in the class of phrasal connectives. Following Shannon and Weaver’s (1949) observation that words are linked by dependent probabilities, corroborated by Halliday’s (1991) argument that the grammatical “system” (in Firth’s sense of the term) is essentially probabilistic, quantitative data are introduced in order to provide evidence of relative frequencies.
Section 2 describes word-class categorisation from the point of view of grammars and dictionaries, arguing that the traditional category of conjunctions is inadequate for capturing the notion of phrasal connective. Section 3 examines the notion of ‘connective’ and suggests a truth-functional interpretation of connective behaviour. Section 4 describes the quantitative methods proposed for analysing the distributional properties of lexical units, and Section 5 comments on the results obtained by applying these methods and draws some provisional conclusions.
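The two quantitative notions mentioned above, relative frequency and the dependent (conditional) probability of one word given the preceding one, can be illustrated with a minimal Python sketch. It is not the method of the paper, whose details are not given here: the function names and the toy token list are hypothetical, and the sample sentence is invented rather than drawn from CORIS/CODIS.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return each token's share of the total token count."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def conditional_probability(tokens, prev, word):
    """Estimate P(word | prev) from counts of adjacent token pairs.

    Returns 0.0 when `prev` never occurs in a position that has a successor.
    """
    # Occurrences of `prev` that are followed by some token.
    prev_count = sum(1 for t in tokens[:-1] if t == prev)
    if prev_count == 0:
        return 0.0
    # Occurrences of the exact pair (prev, word).
    pair_count = sum(
        1 for a, b in zip(tokens, tokens[1:]) if a == prev and b == word
    )
    return pair_count / prev_count


# Toy sample, invented for illustration only (not corpus data).
tokens = "piove quindi resto a casa ma poi esco e quindi torno".split()

freqs = relative_frequencies(tokens)
print(freqs["quindi"])
print(conditional_probability(tokens, "quindi", "resto"))
```

On a real corpus, counts of this kind would be computed over lemmatised and POS-tagged tokens, so that candidate connectives can be compared by the contexts in which they occur rather than by surface form alone.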
Similar books and articles
Fabio Del Prete (2008). A Non-Uniform Semantic Analysis of the Italian Temporal Connectives Prima and Dopo. Natural Language Semantics 16 (2):157-203.
Sandeep Prasada, Laura Hennefield & Daniel Otap (2012). Conceptual and Linguistic Representations of Kinds and Classes. Cognitive Science 36 (7):1224-1250.
Rui P. Chaves (2008). Linearization-Based Word-Part Ellipsis. Linguistics and Philosophy 31 (3):261-307.
Luciano Floridi (2004). Outline of a Theory of Strongly Semantic Information. Minds and Machines 14 (2):197-221.
Lloyd Humberstone (2006). Identical Twins, Deduction Theorems, and Pattern Functions: Exploring the Implicative BCsK Fragment of S. [REVIEW] Journal of Philosophical Logic 35 (5):435 - 487.
Francis Heylighen & Jean-Marc Dewaele (2002). Variation in the Contextuality of Language: An Empirical Measure. [REVIEW] Foundations of Science 7 (3):293-340.
Athanassios Tzouvaras (2003). An Axiomatization of 'Very' Within Systems of Set Theory. Studia Logica 73 (3):413 - 430.
Claudia Casadio (2007). Applying Pregroups to Italian Statements and Questions. Studia Logica 87 (2-3):253 - 268.
Anna Szabolcsi & Bill Haddican (2004). Conjunction Meets Negation: A Study in Cross-Linguistic Variation. Journal of Semantics 21 (3):219-249.
I. Hanzel (2011). Beyond Blumer and Symbolic Interactionism: The Qualitative-Quantitative Issue in Social Theory and Methodology. Philosophy of the Social Sciences 41 (3):303-326.
Patrícia Amaral & Fabio Del Prete (2010). Approximating the Limit: The Interaction Between Quasi 'Almost' and Some Temporal Connectives in Italian. [REVIEW] Linguistics and Philosophy 33 (2):51-115.