Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large‐Scale Text Corpora
Cognitive Science 46 (2):e13085 (2022)
Abstract
Applying machine learning algorithms to automatically infer relationships between concepts from large-scale collections of documents presents a unique opportunity to investigate at scale how human semantic knowledge is organized, how people use it to make fundamental judgments (“How similar are cats and bears?”), and how these judgments depend on the features that describe concepts (e.g., size, furriness). However, efforts to date have exhibited a substantial discrepancy between algorithm predictions and human empirical judgments. Here, we introduce a novel approach to generating embeddings for this purpose motivated by the idea that semantic context plays a critical role in human judgment. We leverage this idea by constraining the topic or domain from which documents used for generating embeddings are drawn (e.g., referring to the natural world vs. transportation apparatus). Specifically, we trained state-of-the-art machine learning algorithms using contextually-constrained text corpora (domain-specific subsets of Wikipedia articles, 50+ million words each) and showed that this procedure greatly improved predictions of empirical similarity judgments and feature ratings of contextually relevant concepts. Furthermore, we describe a novel, computationally tractable method for improving predictions of contextually-unconstrained embedding models based on dimensionality reduction of their internal representation to a small number of contextually relevant semantic features. By improving the correspondence between predictions derived automatically by machine learning methods using vast amounts of data and more limited, but direct empirical measurements of human judgments, our approach may help leverage the availability of online corpora to better understand the structure of human semantic representations and how people make judgments based on those.
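As a rough illustration of the two quantities the abstract refers to, the sketch below computes a similarity judgment between two concepts as the cosine of their embedding vectors, and a feature rating (e.g., size) as a projection onto an axis between two anchor vectors. This is a minimal sketch under stated assumptions, not the authors' procedure: the three-dimensional vectors are invented toy values rather than trained embeddings, and the function names `cosine_similarity` and `project_onto_feature` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project_onto_feature(vec, low_anchor, high_anchor):
    """Scalar projection of vec onto the axis running from low_anchor
    to high_anchor (e.g., from a 'small' vector to a 'large' vector)."""
    axis = [h - l for h, l in zip(high_anchor, low_anchor)]
    axis_norm = math.sqrt(sum(a * a for a in axis))
    return sum(x * a for x, a in zip(vec, axis)) / axis_norm

# Toy embeddings (assumed values for illustration only, not model output).
embeddings = {
    "cat":  [0.9, 0.1, 0.3],
    "bear": [0.8, 0.2, 0.9],
    "car":  [0.1, 0.9, 0.5],
}
# Hypothetical anchor vectors defining a "size" feature axis.
small_anchor = [0.5, 0.5, 0.0]
large_anchor = [0.5, 0.5, 1.0]

sim_cat_bear = cosine_similarity(embeddings["cat"], embeddings["bear"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
size_cat = project_onto_feature(embeddings["cat"], small_anchor, large_anchor)
size_bear = project_onto_feature(embeddings["bear"], small_anchor, large_anchor)

print("similarity(cat, bear):", round(sim_cat_bear, 3))
print("similarity(cat, car): ", round(sim_cat_car, 3))
print("size(cat), size(bear):", round(size_cat, 3), round(size_bear, 3))
```

In the setting the abstract describes, the vectors would come from an embedding model trained on a domain-specific corpus, and the resulting similarities and feature projections would be compared against empirical human ratings.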
DOI
10.1111/cogs.13085
Similar books and articles
Neural Network Machine Translation Method Based on Unsupervised Domain Adaptation. Rui Wang - 2020 - Complexity 2020:1-11.
Comparing Methods for Single Paragraph Similarity Analysis. Benjamin Stone, Simon Dennis & Peter J. Kwantes - 2011 - Topics in Cognitive Science 3 (1):92-122.
Automatic Extraction of Property Norm‐Like Data From Large Text Corpora. Colin Kelly, Barry Devereux & Anna Korhonen - 2014 - Cognitive Science 38 (4):638-682.
Computational Methods to Extract Meaning From Text and Advance Theories of Human Cognition. Danielle S. McNamara - 2011 - Topics in Cognitive Science 3 (1):3-17.
Exploratory analysis of concept and document spaces with connectionist networks. Dieter Merkl, Erich Schweighoffer & Werner Winiwarter - 1999 - Artificial Intelligence and Law 7 (2-3):185-209.
The Large‐Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth. Mark Steyvers & Joshua B. Tenenbaum - 2005 - Cognitive Science 29 (1):41-78.
Deep learning in law: early adaptation and legal word embeddings trained on large corpora. Ilias Chalkidis & Dimitrios Kampas - 2019 - Artificial Intelligence and Law 27 (2):171-198.
Content-Enhanced Network Embedding for Academic Collaborator Recommendation. Jie Chen, Xin Wang, Shu Zhao & Yanping Zhang - 2021 - Complexity 2021:1-12.
Modeling the Structure and Dynamics of Semantic Processing. Armand S. Rotaru, Gabriella Vigliocco & Stefan L. Frank - 2018 - Cognitive Science 42 (8):2890-2917.
A Large‐Scale Analysis of Variance in Written Language. Brendan T. Johns & Randall K. Jamieson - 2018 - Cognitive Science 42 (4):1360-1374.
Aligning Semantic Graphs for Textual Inference and Machine Reading. Marie-Catherine de Marneffe, Trond Grenager, Bill MacCartney, Daniel Cer, Daniel Ramage, Chloe Kiddon & Christopher D. Manning - unknown
Deep learning approach to text analysis for human emotion detection from big data. Jia Guo - 2022 - Journal of Intelligent Systems 31 (1):113-126.
Analytics
Added to PP
2022-02-11
Downloads
8 (#989,686)
6 months
1 (#455,463)