Synthese 200 (3):1-33 (2022)

Christophe Malaterre
Université du Québec à Montréal (UQAM)
Topic modeling is a well-proven tool for investigating the semantic content of textual corpora. Yet corpora sometimes include texts in several languages, making it impossible to apply language-specific computational approaches over their entire content. This is the problem we encountered when setting out to analyze a philosophy of science corpus spanning over eight decades and including original articles in Dutch, German and French, on top of a large majority of articles in English. To circumvent this multilingual problem, we use machine-translation tools to bulk translate non-English documents into English. Though largely imperfect, especially syntactically, these translations nevertheless provide correctly translated terms and preserve the semantic proximity of documents with respect to one another. To assess the quality of this translation step, we develop a “semantic topology preservation test” that relies on estimating the extent to which document-to-document distances have been preserved during translation. We then conduct an LDA topic-model analysis over the entire corpus of translated and English original texts, and compare it to a topic model built over the English original texts only. We thereby identify the specific contribution of the translated texts. These studies reveal a more complete picture of the main topics that can be found in the philosophy of science literature, especially during the early days of the discipline when numerous articles were published in languages other than English.
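The idea behind a distance-preservation check of this kind can be illustrated with a small sketch. The abstract does not say how documents are vectorized, which distance is used, or how preservation is scored, so everything below is an assumption made for illustration: bag-of-words counts, cosine distance, and a Pearson correlation between the two sets of pairwise distances. The function names (`pairwise_distances`, `topology_preservation`) and the toy sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two bag-of-words Counters (assumed metric)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_distances(docs):
    """Distances for every unordered document pair, in a fixed pair order."""
    bags = [Counter(doc.lower().split()) for doc in docs]
    return [cosine_distance(bags[i], bags[j])
            for i, j in combinations(range(len(bags)), 2)]


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else float("nan")


def topology_preservation(originals, translations):
    """Correlate document-to-document distances before and after translation.

    originals[i] and translations[i] must be the same document. Distances are
    computed within each set, so the two vocabularies never need to match.
    """
    return pearson(pairwise_distances(originals),
                   pairwise_distances(translations))


# Toy data (hypothetical): three French sentences and English renderings.
originals = ["la théorie de la science",
             "la théorie de la physique",
             "le chat dort"]
translations = ["the theory of science",
                "the theory of physics",
                "the cat sleeps"]

score = topology_preservation(originals, translations)
# Misaligned pairs serve as a sanity check: the score should drop.
shuffled = topology_preservation(originals, translations[::-1])
print(score, shuffled)
```

A score close to 1 means that documents that were near each other before translation remain near each other afterwards. On a real corpus, a rank correlation or a topic-space representation might be a better fit than raw word counts; the sketch only shows the shape of the comparison.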
DOI 10.1007/s11229-022-03722-x