In Stefania Nuccorini (ed.), Phrases and Phraseology – Data and Descriptions. Peter Lang Verlag (2002)
In recent decades, the analysis of phraseology has drawn on large corpora as a source of quantitative information about language. This paper presents the main lines of work in progress based on this empirical approach to linguistic analysis, focusing on some problems in the morpho-syntactic annotation of corpora. The CORIS/CODIS corpus of contemporary written Italian, developed at CILTA – University of Bologna (Rossini Favretti 2000; Rossini Favretti, Tamburini, De Santis in press), is a synchronic 100-million-word corpus that is being lemmatised and annotated with part-of-speech (POS) tags in order to enrich the information it carries and improve data retrieval procedures (Tamburini 2000). The aim of POS tagging is to assign each lexical unit to the appropriate word class. Usually the set of tags is established in advance by the linguist, who relies on his or her competence to identify the different word classes.
Our very first experiments revealed that the traditional part-of-speech distinctions in Italian (generally based on morphological and semantic criteria) are often inadequate to represent the syntactic features of words in context. The uncertainties in categorisation found in Italian grammars and dictionaries grow as one moves from fundamental word classes, such as nouns and verbs, to more complex classes, such as adverbs, pronouns, prepositions and conjunctions. The class of conjunctions, which groups together elements traditionally used to express connections between sentences, appears inadequate for describing cohesive relations in Italian. Cohesion in fact seems to involve other elements traditionally assigned to different classes, such as adverbs, pronouns and interjections.
Recent studies have proposed the class of ‘connectives’, grouping all words that, whatever their traditional word class, have the function of connecting phrases and contributing to textual cohesion. From this point of view, conjunctions can be considered part of phrasal connectives, which can in turn be included in the wider category of textual connectives. The aim of this study is to identify, using quantitative methods, elements that can be included in the class of phrasal connectives. Following Shannon and Weaver’s (1949) observation that words are linked by dependent probabilities, corroborated by Halliday’s (1991) argument that the grammatical “system” (in Firth’s sense of the term) is essentially probabilistic, quantitative data are introduced to provide evidence of relative frequencies. Section 2 describes word-class categorisation from the point of view of grammars and dictionaries, arguing that the traditional category of conjunctions is inadequate for capturing the notion of phrasal connective. Section 3 examines the notion of ‘connective’ and suggests a truth-function interpretation of connective behaviour. Section 4 describes the quantitative methods proposed for analysing the distributional properties of lexical units, and Section 5 comments on the results obtained by applying these methods and draws some provisional conclusions.
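The notions of relative frequency and dependent probability mentioned above can be illustrated with a minimal sketch. It is not the method described in Section 4, which the abstract does not specify; the function names and the toy token sequence are invented for illustration and are not drawn from CORIS/CODIS.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Relative frequency of each token: its count divided by the total."""
    total = len(tokens)
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}


def conditional_probability(tokens, prev, word):
    """Estimate P(word | prev) from adjacent token pairs (bigrams)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Number of times `prev` occurs as the first element of a bigram.
    prev_count = sum(c for (p, _), c in bigrams.items() if p == prev)
    if prev_count == 0:
        return 0.0  # `prev` never occurs before another token
    return bigrams[(prev, word)] / prev_count


# Toy token sequence, invented for illustration (not corpus data).
tokens = "piove quindi resto a casa ma poi esco quindi resto fuori".split()

freqs = relative_frequencies(tokens)
print(freqs["quindi"])                                     # 2 occurrences out of 11 tokens
print(conditional_probability(tokens, "quindi", "resto"))  # "quindi" is always followed by "resto" here
print(conditional_probability(tokens, "resto", "a"))       # "resto" is followed by "a" once out of twice
```

On a real corpus the same counts would be taken over lemmatised, POS-tagged tokens, so that the distribution of a candidate connective could be compared across the word classes it is traditionally assigned to.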
Added to index: 2010-12-18