A Large‐Scale Analysis of Variance in Written Language

Cognitive Science 42 (4):1360-1374 (2018)

Abstract

The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language. These models treat knowledge as an interaction between processing mechanisms and the structure of language experience, yet language experience itself is often treated agnostically. We report a distributional semantic analysis showing that written language in fiction books varies appreciably between books from different genres, books from the same genre, and even books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing.
