Half a century of bioethics and philosophy of medicine: A topic‐modeling study

Abstract Topic modeling, a text-mining technique often used to uncover thematic structures in large collections of texts, is increasingly used to analyze scholarly output. In this study, we construct a corpus of 19,448 texts published since 1971 in seven leading journals in the field of bioethics and philosophy of medicine, and we use a machine learning algorithm to identify almost 100 topics representing distinct themes of interest in the field. On the basis of intertopic correlations, we group the content-based topics into eight clusters, thus providing a novel, fine-grained intellectual map of bioethics and philosophy of medicine. Moreover, we conduct a number of diachronic analyses, examining how the “prominence” of different topics has changed across time. In this way, we are able to observe the distinct patterns in which bioethics and philosophy of medicine have evolved and changed their focus over the past half century.

philosophical foundations for many bioethical debates. The inseparability of these two is often highlighted by scholars writing about the history of the intersection of medicine and the humanities. For example, Robert M. Veatch noted "the gradual convergence of themes in philosophy of science and philosophy of medicine with more specific issues in medical and biomedical ethics." 2 It is also visible in the self-description of The Journal of Medicine and Philosophy, the second oldest bioethics journal in the United States, which defines itself as "the flagship scholarly journal in bioethics and the philosophy of medicine."
A standard way in which practitioners of an academic discipline reflect upon the history and development of their own discipline is through "close reading" of selected texts, which is often mediated by their personal experience and academic interests. This approach is present in the classic books about the history of bioethics 3 and in important articles that try to identify "the hottest topics" during the development of the field. 4 Here is a typical statement identifying trends in bioethics based on such an approach: "Over the course of the history of bioethics certain topics have moved in and out of fashion: in the 1970s it was euthanasia and abortion, in the 1980s genetics, in the 1990s stem cells and reproductive technologies, and in the 2000s, enhancement and data/tissue storage." 5 However, as a way to detect very general trends in the literature, "close reading" is sometimes criticized not only as nonreplicable and underdetermined by evidence (i.e., different interpretations may easily be drawn from the same material) but, above all, as nontransparent and reliant on arbitrary sampling when applied to large literatures. 6
In contrast, the approach that we adopt in this article takes seriously the epistemological question of how one can justify the belief, for example, that the issue of "enhancement and data/tissue storage" dominated the debates of the 2000s. We use a "distant reading" 7 approach based on topic modeling, a computational text-mining technique aimed at discovering hidden thematic compositions in large text corpora. We believe that this technique provides a rigorous tool for understanding the structure of bioethics and philosophy of medicine (as well as their development over the last half century) as represented by the content published by seven leading journals, as identified by experts in the field.
The latent Dirichlet allocation (LDA) algorithm, which we use in this study, identifies "topics," that is, sets of words that tend to be used together across documents in the corpus. 8 Those "topics" are chiefly characterized by the relatively small sets of words most strongly associated with them, and thus it is typically easy for the researcher to interpret them, that is, to associate "topics" with actual, discrete themes discussed in the analyzed collection of texts. For instance, if the model's output includes a topic characterized by the terms "gene," "therapy," "clone," "disease," and "germline," we can reasonably interpret such a topic as being connected to the classic debate on germline modification and gene therapy.
A topic model is able to provide the exact proportions in which different topics discovered by the model contribute to each document in the corpus. This makes a number of interesting analyses easy to conduct: Which topics are the most prominent in the corpus? Which topics tend to occur together in the same documents? How does the average prominence of a given topic change for documents from different periods? These are the kinds of analyses that make topic modeling so useful in analyzing large bodies of scholarly texts. 9 We assume that this method allows us to uncover the pattern of researchers' interests and the evolution of such interests over time.
Our aim is not to replace close reading, which is so typical of the humanities, but rather to present an instrument that may support human interpretive work "by providing evidence for interpretations in a manner that is not only much more scalable but also less subject to biases that derive from the interpreters' preconceptions." 10 In other words, we are able to analyze the themes that "have moved in and out of fashion" in bioethics and philosophy of medicine more precisely and rigorously than with the help of standard "close reading" methods. Assuming that the prominence of different topics in our corpus is a proxy for the popularity of different bioethical themes among researchers, the method used in this paper may be helpful in interpreting the thematic structure of the entire field, the relations between different themes, and its diachronic changes. Moreover, as we observe that some topics are correlated, in the sense of being more frequently present together in the same texts and thus creating interconnected clusters of related topics, we end up drawing a novel, fine-grained map of bioethics and philosophy of medicine that readers should inspect in full on their own. 11 Still, in Section 4, we comment in more detail on some of the observed patterns that we think are particularly unexpected or interesting. Before doing so, we present and analyze our main findings.
Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993-1022.
9 Areas in which scholarly output has been analyzed using topic-modeling techniques include, but are not limited to, anthropology (Marwick, B. (2013). Discovery of emergent issues and controversies in anthropology using text mining, topic modeling, and social network analysis

| Journal selection
Following similar analyses conducted in other areas of philosophy, 12 we aimed to fit a topic model on a corpus of full texts of all articles published in leading journals in the field of bioethics and philosophy of medicine. 13 However, given the number of outlets publishing research in this field, any selection of target journals based on our own judgment would risk representing our personal preferences rather than the actual role played by the given journals. Hence, we chose instead to base the selection on more objective criteria.
To establish the list of the most representative journals, we invited experts in bioethics or philosophy of medicine to complete a free-listing task. A request to provide a list of five key journals in philosophy of medicine and/or bioethics was distributed via the Philos-L mailing list (a large mailing list focused on philosophy-related news) and, after the initial round of data collection, posted on the "Bioethics International" Facebook page and tweeted out by our department's profile. The following criteria were provided to specify what we mean by "key journal in philosophy of medicine and/or bioethics":
a) The journal is focused on the general philosophy of medicine and/or general bioethics rather than a narrower and more specialized subfield.
b) The journal played an important role in shaping the field.
c) The journal publishes important work in the field.
d) The journal is recognized by the community as a key journal in the field.
We received responses from 27 individuals who indicated that they are "teachers and/or researchers in an academic institution." To analyze the free-listing data, we used FLARES. 14 Every expert provided a list of five journals and this resulted in a list of 135 items. Twenty-eight different journals were mentioned. Figure 1 presents the results of the free-listing analysis. The frequency of mentions (solid line) indicates the proportion of experts who mentioned a given journal, with the highest numbers for Bioethics (89%) and the Journal of Medical Ethics (85%). The Smith index (dotted line) is a measure of cultural saliency that combines frequency of mention and rank of citation of items on the lists, that is, how early in the lists a journal tends to be mentioned. 15 On the basis of the free-listing analysis, we included the following seven journals (frequency of mention in brackets): Bioethics (89%); the Journal of Medical Ethics (JME, 85%); the American Journal of Bioethics (AJOB, 52%); Medicine, Health Care and Philosophy (MHCP, 44%); Hastings Center Report (HCR, 33%); the Journal of Medicine and Philosophy (JMP, 30%); and Theoretical Medicine and Bioethics (TMB, 26%). 16 We decided to choose seven journals because of a clear drop in the frequency of mentions and the Smith index between the seventh journal and the eighth journal. 17
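The two measures plotted in Figure 1 can be illustrated with a short sketch. It uses the commonly cited form of Smith's index, in which an item at rank R in a list of length L contributes (L - R + 1) / L and contributions are averaged over all respondents; whether FLARES computes the index in exactly this way is an assumption, and the function names and toy lists below are hypothetical.

```python
from collections import defaultdict


def smith_index(free_lists):
    """Return {item: Smith's salience index} for a collection of ranked free lists."""
    n_lists = len(free_lists)
    totals = defaultdict(float)
    for items in free_lists:
        length = len(items)
        for rank, item in enumerate(items, start=1):
            # The first item in a list contributes 1, the last one 1 / length.
            totals[item] += (length - rank + 1) / length
    # Lists that do not mention an item contribute 0 to its average.
    return {item: total / n_lists for item, total in totals.items()}


def mention_frequency(free_lists):
    """Return {item: share of respondents who mentioned the item}."""
    n_lists = len(free_lists)
    counts = defaultdict(int)
    for items in free_lists:
        for item in set(items):
            counts[item] += 1
    return {item: count / n_lists for item, count in counts.items()}


# Hypothetical toy data: three respondents, three journals each.
toy_lists = [
    ["Bioethics", "JME", "AJOB"],
    ["JME", "Bioethics", "HCR"],
    ["Bioethics", "AJOB", "JMP"],
]
salience = smith_index(toy_lists)
frequency = mention_frequency(toy_lists)
```

An item mentioned often but late in the lists ends up with a high frequency and a comparatively low salience, which is why the two curves in Figure 1 can diverge.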

| Corpus acquisition and characteristics
Having identified the target journals, we built a complete corpus of texts published in all seven journals. We included regular-length articles but also many types of shorter pieces because of their relative importance in the field of bioethics: open-peer commentaries, replies, letters, book reviews, and so forth. We believe that these types of publications are particularly important in bioethics as it is a practically oriented discipline.
However, we excluded types of documents that would typically lack "substantive" content: tables of contents, issue introductions, corrigenda, lists of referees, book notes, calls for papers, obituaries, and so forth. 18 We also excluded extremely short documents (below 3000 characters) independently of their content. Only the main text of a document was to be included in the corpus. This meant that we aimed to exclude other elements of a document: title, abstract, authors list, reference list, footnotes, endnotes, acknowledgments, and so forth.
The resulting corpus consisted of 19,448 documents, with 64,326,072 tokens (words) distributed across them. The average length of the main text of a document in the corpus was 3308 words, reflecting the relative prevalence of shorter texts. The vast majority of documents came from three journals: AJOB (27%), JME (26%), and HCR (18%), with the four remaining journals contributing between 6% and 8% each. The time distribution of documents was also rather skewed, with few texts published in the 1970s (4%), the 1980s (9%), or the 1990s (11%) and the vast majority published in the 2000s (28%) and from 2010 onwards (41%). The number of articles in each journal and the average article word count are plotted in Figure 2.
11 We use the word "topic" to refer to statistical patterns (revealed by our topic model) of cooccurrence of words across documents in our corpus and the word "cluster" to refer to communities of topics determined by their tendency to be present together in the same documents in our corpus. We italicize the names of the topics and CAPITALIZE the names of the clusters. When we refer not to our corpus but to the discipline itself, we will instead use the terms "theme" and "area [of research]."
12 Malaterre, C., et al., op. cit. note 9; Weatherson, op. cit. note 9.
13 This approach implies that all and only texts that were published in the target journals were included in our analysis and each of them is given "equal weight." The impact of any specific text (as measured, e.g., by citation statistics) plays no role in this study. Critically, some of the most influential texts in bioethics are not included in our corpus, simply because they were published in different outlets. However, even if some influential text is not included in our corpus, the ideas and themes it introduced should have later resonated in the target journals, so they are in no way lost.
(1997). Salience counts-and so does accuracy: Correcting and updating a measure for free-list-item salience. Journal of Linguistic Anthropology, 7, 208-209. Sutrop, U. (2001). List task and a cognitive salience index. Field Methods, 13(3), 263-276.
16 For detailed characteristics of selected journals and the exact number of documents each of them contributed to the corpus, consult Online Supplement 1 (https://osf.io/5364e/).
17 The data were highly saturated, indicating a high level of cultural consensus. All the journals from the final top-7 were already mentioned in the lists provided by the first six experts (who, in combination, mentioned 68% of all the journals in the final list). It took 13 experts (<50%) to mention >90% of all the journals in the full list.
18 More detailed lists of inclusion/exclusion criteria for each journal can be found in Online Supplement 2 (https://osf.io/ey6sn/).
We built the corpus in April 2021 and fitted the topic model on all eligible texts published as of that date.

| Corpus cleaning and preprocessing
Whenever possible, we scraped text from the HTML version of a document on the publisher's website. For documents where this was not possible, we downloaded a PDF version of the article and extracted the text using GROBID software. 19 We used regular expressions to remove inline references, as well as any residual footnotes, lists of references, copyright notes, URL addresses, and so forth, from the corpus. We lower-cased tokens and removed punctuation and numerals. We removed stopwords 20 by using the list of 179 English-language stopwords from the Python NLTK library. 21 We used the function Phrases() from the Python GenSim library 22 to detect common multiword expressions, with the PMI-like scoring 23 of 100 or above, and transformed them into bigrams or trigrams (e.g., as the words male and circumcision tended to occur next to each other, each such occurrence was transformed into a single bigram: male_circumcision). Then, we used the spaCy tagger 24 to identify the part of speech of every token and discarded all tokens that were not nouns, verbs, adjectives, adverbs, or proper nouns (hence, we removed determiners, prepositions, pronouns, etc.). Finally, we conducted lemmatization 25 using the Python library spaCy. 26 The resulting dictionary consisted of 173,489 terms.
However, the vast majority of those appeared in only a couple of documents, which meant that they would not be very useful in fitting a topic model (or they would even be a source of noise). For this reason, we chose to keep only those token types that appeared in at least 25 documents from the corpus and in no more than 50% of all the documents, which allowed us to substantially reduce the size of the dictionary to 20,005 terms. The distribution of those terms across all 19,448 documents (i.e., the document-term matrix) was the main input to the topic modeling algorithm.
FIGURE 1 Frequency of mention and cultural saliency (Smith index) of key journals in philosophy of medicine and/or bioethics in our sample of experts (N = 27). Journals that were mentioned by fewer than three experts are not included in the graph.
20 Stopwords are words that are filtered out from a text before conducting a computational analysis on it. Most of the time, stopwords are words that are so common in the natural language that their presence in a given text does not provide any useful document-specific information. An English-language stopword list typically contains words such as the, you, do, and so forth.
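The document-frequency filter and the resulting document-term matrix can be sketched in plain Python. This is an illustration under stated assumptions rather than the pipeline actually used: the function names are hypothetical, the input is assumed to be already tokenized and lemmatized, and the toy thresholds are scaled down from the 25-document and 50% limits reported for the real corpus.

```python
from collections import Counter


def filter_vocabulary(tokenized_docs, min_docs=25, max_doc_share=0.5):
    """Keep terms found in at least min_docs documents and at most max_doc_share of them."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term
        for term, df in doc_freq.items()
        if df >= min_docs and df <= max_doc_share * n_docs
    }


def document_term_matrix(tokenized_docs, vocabulary):
    """Return (sorted terms, count matrix with one row per document)."""
    terms = sorted(vocabulary)
    index = {term: j for j, term in enumerate(terms)}
    matrix = []
    for tokens in tokenized_docs:
        row = [0] * len(terms)
        for token in tokens:
            j = index.get(token)
            if j is not None:  # tokens outside the vocabulary are dropped
                row[j] += 1
        matrix.append(row)
    return terms, matrix


# Hypothetical toy corpus of four already-preprocessed documents.
toy_docs = [
    ["consent", "patient", "risk"],
    ["consent", "patient", "trial"],
    ["patient", "gene", "therapy"],
    ["patient", "gene", "consent"],
]
vocab = filter_vocabulary(toy_docs, min_docs=2, max_doc_share=0.75)
terms, dtm = document_term_matrix(toy_docs, vocab)
```

In the toy corpus, "patient" is dropped for being too common and "risk", "trial", and "therapy" for being too rare, which mirrors the two-sided pruning described above.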

| Topic modeling
To fit the topic model, we used the standard LDA algorithm 27 with Gibbs sampling, as implemented in the Python library lda. 28 LDA is among the oldest and simplest topic modeling algorithms, but it remains the most established and widely used tool in this context. 29 LDA assumes that the analyzed corpus consists of a set of topics, where each topic is a probability distribution over the entire vocabulary used in the corpus. Terms (words and collocations) assigned a high probability within a single topic tend to co-occur in corpus documents more frequently than would happen by chance.
Furthermore, LDA associates each document in the corpus with a distribution over topics, thus showing the proportions of the topical composition of a given document. Crucially, the algorithm searches for distributions that would facilitate two (conflicting) goals: first, that each topic assigns a high probability to just a few terms (reflecting the intuition that distinct themes are characterized by a small set of key words) and, second, that each document assigns high probability to just a few topics (reflecting the intuition that each document engages with a small number of main themes).
LDA is an "unsupervised" algorithm, which means that resulting topics are not in any way guided by researchers' expectations but, instead, are "discovered" by the algorithm itself on the sole basis of the patterns of co-occurrence of terms across documents. Researchers, however, still have some control over the operation of the algorithm, as they have to decide on three hyper-parameters: alpha (which controls prior topic probability distributions over documents), beta (which controls prior term probability distributions over topics), 30 and, most importantly, K (the hypothesized number of topics in the corpus). As for alpha and beta, following some fine-tuning, we chose the rather standard values of 0.01 and 0.31. As for K, we followed a standard practice of manually comparing the output of models with different numbers of topics and picking the one that, according to our judgment, appeared optimal.
Typically, the optimal number of topics is neither too low (as this would result in large, heterogeneous topics that are hard to interpret) nor too high (as this would lead to topics that are overlapping and too numerous to make the entire model comprehensible for humans). In the present context, we fitted 10 models (for K = 30, 40, 50, 60, 70, 80, 90, 100, 110, 120) and decided that the model with 100 topics looked the most promising, with almost all of the resulting topics corresponding to what we find to be distinct themes that are present in the literature. 31
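For reference, the generative assumptions behind LDA described in this section can be written compactly. This is the textbook smoothed formulation with symmetric priors, not a description of the internals of the lda library; D (the number of documents), N_d (the number of tokens in document d), and the symbols θ, φ, z, and w are notation introduced here for illustration only.

```latex
\begin{align*}
\phi_k &\sim \operatorname{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \operatorname{Categorical}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \operatorname{Categorical}(\phi_{z_{d,n}})
\end{align*}
```

Here φ_k is the distribution of topic k over the vocabulary and θ_d is the distribution of document d over topics. Small values of α and β push each θ_d toward a few topics and each φ_k toward a few terms, which corresponds to the two sparsity goals mentioned above.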

| Interpreting and clustering the topics
A topic resulting from an LDA algorithm is nothing more than a probability distribution over terms (words and collocations), so it does not have any determinate meaning; it has yet to be interpreted by researchers. The interpretative task is made easier by the fact that LDA favors distributions whose probability mass is concentrated on a relatively small number of highly probable words, so that a small number of such words characterizes a given topic and should suffice for its interpretation.
For example, in the context of topic 87, a glimpse at the set of five most probable terms ("virtue," "action," "character," "pellegrino," "aristotle") was sufficient for us to interpret that topic as referring to virtue ethics. 32 Such interpretations can be guided or corroborated further by examining the documents most strongly associated with a given topic, as just a few documents for which a given topic is the most probable should characterize this topic quite well. 33
In almost every topic model, however, a fraction of the topics resist such an interpretation as they represent stylistic peculiarities or other possibly spurious patterns present across documents. We found three such topics in our model (3; 9; 66) and discarded them from further analysis. 34 Some further topics seemed either to refer to specific types of journal texts (60: Clinical stories; 71: Reviews) or to denote specific methodological approaches used in different contexts (2: Concepts; 12: Qualitative; 24: Quantitative; 49: Moral philosophy); following an earlier study, 35 we call those "framing topics" and treat them separately in most of the analyses to follow. 36 The list of all interpreted topics with a longer descriptive title and the corresponding 10 most likely terms for each topic can be found in Appendix A. 37
This still left us with 91 topics that we interpreted as denoting distinct areas of research present in the target journals. Such a large set of topics is not easily manageable; hence, we aimed to reduce its dimensionality by following the procedure used earlier by Malaterre and colleagues. 38 On the basis of document-topic probability distributions, we calculated pairwise intertopic correlations. 39 The resulting correlation matrix was the basis for constructing a graph in which nodes represent topics and edges represent intertopic Pearson's correlation coefficients above 0.05. 40 We ran a series of modularity analyses (that is, we used the community detection method proposed by Blondel and colleagues, as implemented by Bastian and colleagues 41) until we found a solution whose eight clusters seemed easily interpretable as distinct greater areas of research in bioethics and philosophy of medicine. On the basis of our expert judgment, we chose to manually correct the output of the clustering algorithm in cases where the assignment of a given topic to a given cluster seemed to be based on correlations that we interpreted as accidental. We therefore
30 Intuitively, a lower alpha results in documents assigning a nonnegligible probability to fewer topics, and a lower beta results in topics assigning nonnegligible probability to fewer terms.
31 The list of 10 terms most strongly associated with each topic can be found in Appendix A, while the list of 10 documents in which each topic is expressed most strongly can be found
32 Another way of analyzing the links between terms and topics is to identify topics with which a given term is most strongly associated. In our context, this can be done with the following R Shiny app: https://bioethics.incet.uj.edu.pl/lexicon
33 The topic distribution for each document in the corpus can be examined with the following

| Prominence of topic clusters
We also calculated the joint prominence of topic clusters by adding together the prominence of all individual topics that constitute a given cluster. Clusters range in prominence from 5.25% to 20.37%: PATIENTS AND RESEARCH PARTICIPANTS (20.37%), THEORETICAL BIOETHICS (11.44%), PHYSICIAN AND RESEARCHER (10.67%), INSTITUTIONS
Overall, the joint prominence of framing topics is 12.04%.
34 The presence of uninterpretable jargon topics can be seen as evidence that the K value selected by the modeler is too large. In our context, however, jargon topics would not disappear even for the lowest tested number of topics (K = 30). Furthermore, the proportion of jargon topics in our preferred model (3 topics out of 100, with the joint prominence of 10.5%) is moderate and appears more than acceptable.
35 Cohen Priva & Austerweil, op. cit. note 9.
36 Although the distinction between jargon and framing topics is not yet a standard one in the literature, we find it useful. Jargon topics, at best, represent some particularities of style in which a text is written, which is not something relevant in a study like this one. Framing topics, on the other hand, represent distinct ways (in the sense of distinct genres or used methods) in which more contentful topics can be "framed" in a given document. For these reasons, we exclude jargon topics from all the analyses to follow but include framing topics whenever we think it could result in some relevant observations.
37 Throughout this paper, we refer to topics using the short labels. Those, however, can be occasionally misleading. For this reason, we encourage readers interested in specific topics to always consult the long label and the list of most likely terms in Appendix A.
38 Malaterre, C., et al., op. cit. note 9.
39 Such a correlation measures the degree to which two given topics are likely to be associated with the same documents. In other words, topics that are correlated have often been expressed in the same articles.
40 We set this threshold around the minimal value that allowed connecting all the topics to the main graph. Setting it at a higher value would allow us to retain only the most notable correlations at the expense of leaving many topics disconnected from the graph. Setting it at a lower value would introduce many small and uninformative correlations and make the graph less readable.
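The additive definition of cluster prominence can be made concrete with a small sketch. It assumes that the prominence of a topic is its mean probability across all documents (the definition given for 5-year periods in the diachronic analysis, applied here to the whole corpus); the function names, cluster labels, and the toy document-topic matrix are hypothetical.

```python
def topic_prominence(doc_topic):
    """Mean probability of each topic across documents (one row per document)."""
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) / n_docs for k in range(n_topics)]


def cluster_prominence(prominence, clusters):
    """Sum the prominence of the topics assigned to each cluster."""
    return {
        name: sum(prominence[k] for k in topic_ids)
        for name, topic_ids in clusters.items()
    }


# Hypothetical toy data: four documents, four topics, two clusters.
toy_doc_topic = [
    [0.50, 0.25, 0.25, 0.00],
    [0.50, 0.50, 0.00, 0.00],
    [0.00, 0.25, 0.25, 0.50],
    [0.50, 0.00, 0.50, 0.00],
]
toy_clusters = {"CLUSTER_A": [0, 1], "CLUSTER_B": [2, 3]}

prominence = topic_prominence(toy_doc_topic)
by_cluster = cluster_prominence(prominence, toy_clusters)
```

Because every document's topic probabilities sum to 1, the prominences of all topics (content-based, framing, and jargon together) also sum to 1, so cluster prominences can be read directly as shares of the corpus.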

| Intertopic correlations
A table of the Pearson's correlations between topics (measuring the tendency of a pair of topics to co-occur in the same documents) is provided as a supplement. 43 Here is the list of the 10 most strongly correlated pairs of topics: Consent–Participation (r = 0.19), Phenomenology–Body ownership (0.17), Embryos: identity–Embryos: research [...] Disease: concept–Concepts (0.14). Because clusters were based on intertopic correlations, it is unsurprising that all these pairs connect topics within the same cluster (except Disease: concept–Concepts, which involves a framing topic, Concepts). Overall, positive correlations were more pronounced than negative ones (the strongest negative correlation was Moral philosophy–Clinical stories, r = −0.10).
FIGURE 3 Ninety-one content-based topics grouped into eight clusters. Node size reflects a topic's prominence in the corpus, and edge size reflects Pearson's correlation coefficient for a given pair of topics (only correlation coefficients above 0.05 are included in the graph). Gephi's Multigravity ForceAtlas 2 was used for layout rendering.
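How intertopic correlations turn into graph edges can be sketched as follows. The sketch computes Pearson's r between the columns of a document-topic matrix and keeps pairs above the 0.05 threshold; it does not reproduce the community detection step, which the study ran in Gephi. Function names and the toy matrix are hypothetical, and the code assumes that no topic has zero variance across documents.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_edges(doc_topic, threshold=0.05):
    """Return (topic_i, topic_j, r) for every topic pair with r above the threshold."""
    n_topics = len(doc_topic[0])
    columns = [[row[k] for row in doc_topic] for k in range(n_topics)]
    edges = []
    for i in range(n_topics):
        for j in range(i + 1, n_topics):
            r = pearson(columns[i], columns[j])
            if r > threshold:
                edges.append((i, j, r))
    return edges


# Hypothetical toy data: topics 0 and 1 tend to share documents, topic 2 does not.
toy_doc_topic = [
    [0.4, 0.4, 0.2],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
edges = correlation_edges(toy_doc_topic)
```

Since topic proportions within a document sum to 1, a high share of one topic leaves less room for the others, which is one reason why the threshold is applied to positive correlations only.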

| Diachronic analysis of topic prominence
We conducted our diachronic analyses for the period from 1976 (the year in which JMP began publication, after HCR and JME, which were already in production) to 2020 (the last year for which we had a complete set of articles published by all seven journals). To focus on long-term trends and avoid noise caused by factors such as the publication of special issues, we divided that 45-year period into nine 5-year periods and calculated each topic's prominence in a respective period (i.e., the mean probability with which an article published in a given 5-year period expressed a given topic). Figure 4 shows diachronic plots of topic prominence for each of the 91 content-based and 6 framing topics, grouped by clusters and, within each cluster, ordered by their overall prominence in the corpus. The area under the curve can be used to visually compare the overall prominence of different topics.
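The binning step can be sketched in a few lines. The sketch assumes that each document comes with a publication year and a row of topic probabilities; the function names and toy data are hypothetical, and documents published outside the 1976-2020 window are simply skipped.

```python
def period_index(year, start=1976, width=5):
    """Map a publication year to the index of its 5-year period (1976-1980 is 0)."""
    return (year - start) // width


def prominence_by_period(years, doc_topic, start=1976, end=2020, width=5):
    """Mean topic probabilities per period; periods without documents get zeros."""
    n_periods = (end - start + 1) // width
    n_topics = len(doc_topic[0])
    sums = [[0.0] * n_topics for _ in range(n_periods)]
    counts = [0] * n_periods
    for year, row in zip(years, doc_topic):
        if not start <= year <= end:
            continue  # outside the analyzed window
        p = period_index(year, start, width)
        counts[p] += 1
        for k, prob in enumerate(row):
            sums[p][k] += prob
    return [
        [s / counts[p] if counts[p] else 0.0 for s in sums[p]]
        for p in range(n_periods)
    ]


# Hypothetical toy data: five documents, two topics.
toy_years = [1976, 1980, 1981, 2020, 1975]
toy_doc_topic = [
    [0.50, 0.50],
    [1.00, 0.00],
    [0.25, 0.75],
    [0.00, 1.00],
    [1.00, 0.00],
]
by_period = prominence_by_period(toy_years, toy_doc_topic)
```

Averaging within periods gives every document equal weight regardless of length, in line with the equal-weight assumption stated for the corpus as a whole.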
As can be seen from the plots, the chronological development of topic prominence shows various patterns, from a gradual decrease in prominence through relative uniformity over the years to a gradual increase. Some topics are suggestive of more complex chronological patterns, involving one or more peaks in prominence. We further explore these patterns in Sections 3.5.1-3.5.3. Figure 5 plots the mean prevalence of each topic cluster across 5-year periods from 1976 to 2020. It also includes a separate plot for the joint prominence of the six framing topics. Some topic clusters show a relatively clear growing or contracting pattern. For instance, we can compare the mean prominence of a cluster in the first two periods (1976-1985) to its mean prominence in the last two periods (2011-2020). EMERGING TOPICS and PATIENTS AND RESEARCH PARTICIPANTS in general seem to be gaining in prominence over time, with the mean prominence in the last two periods being, respectively, 194.13% and 177.13% of that in the first two periods.
The most pronounced relative decline in prominence is that of PHILOSOPHY OF MEDICINE, followed by END OF LIFE, which have contracted by more than a third (to 56.10% and 63.36%, respectively).

| Largest overall increases and decreases
We looked at which topics demonstrated the largest increase and decrease in prominence overall. To do this, we compared the mean prominence of the topic in the first two periods (1976-1985) to its mean prominence in the last two periods (2011-2020). The 10 largest increases and decreases are presented in Table 1. 44

| Peaks
To identify and analyze any sudden increases in the prominence of topics, suggesting that a given topic had suddenly become notably more prominent, we developed a method of detecting peaks of topic prominence. The prominence of each topic in a given period was divided by the mean prominence of this topic for the two preceding periods (where a "period" is our 5-year bin). We also applied a threshold of 1% prominence in the target period to focus on the most notable increases. Given our definition of a peak and the fact that our diachronic analysis starts in 1976, the first period for which we could identify peaks is 1986-1990. Table 2 provides a list of the twenty most substantial peaks, each showing at least a 2.25-fold increase in the prominence of a topic.
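The peak measure defined above can be sketched as a ratio computed over a topic's series of per-period prominences. The function names and the toy series are hypothetical; the 1% threshold is expressed as 0.01 on the assumption that prominence is stored as a proportion, and periods whose two-period baseline is zero are skipped to avoid division by zero.

```python
def peak_ratios(series, min_prominence=0.01):
    """Return (period_index, ratio) pairs for periods that pass the prominence threshold.

    The ratio is the prominence in a period divided by the mean prominence
    in the two preceding periods, so the first two periods never qualify.
    """
    ratios = []
    for t in range(2, len(series)):
        baseline = (series[t - 1] + series[t - 2]) / 2
        if series[t] < min_prominence or baseline == 0:
            continue
        ratios.append((t, series[t] / baseline))
    return ratios


def top_peaks(series_by_topic, n=20, min_prominence=0.01):
    """Rank all (topic, period) pairs by ratio and return the n largest."""
    candidates = [
        (ratio, topic, t)
        for topic, series in series_by_topic.items()
        for t, ratio in peak_ratios(series, min_prominence)
    ]
    return sorted(candidates, reverse=True)[:n]


# Hypothetical toy series of one topic's prominence over five periods.
toy_series = [0.004, 0.004, 0.012, 0.012, 0.006]
toy_ratios = peak_ratios(toy_series)
toy_top = top_peaks({"Topic A": toy_series, "Topic B": [0.02, 0.02, 0.02, 0.02, 0.02]}, n=1)
```

With index 0 standing for 1976-1980, the first index that can carry a peak is 2, that is, 1986-1990, which matches the constraint noted in the text.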
Because our data set includes bundled texts, most prominently open peer commentaries appended to AJOB's target articles, we wanted to check whether peaks in the full corpus represent robust trends in the discussion rather than artifacts of multiple counting of such bundled short texts. We therefore conducted the same kind of peak analysis on an abridged corpus that contains only relatively long texts (at least 2300 words long). 45 Twelve topics (60%) made it into both top-20 lists (set in bold in Table 2

| Recent trends
The most recent trends are not always clearly visible in the above graphs based on 5-year periods, so we decided to check for them in a more fine-grained manner. To identify any topics that are currently enjoying an increase in prominence within the corpus (Table 3), we calculated the average probability for each topic for the last two complete years in our data set (2019-2020), and we divided those values by the analogous averages for the preceding 10-year period (2009-2018). 46
44 For the full table, see Online Supplement 6 (https://osf.io/9pu5h).
45 The threshold was based on our judgment on what would be the best point separating shorter texts from standard research articles. The threshold of 2300 words allowed us, for example, to exclude from the abridged corpus 98% of documents labeled as "Open Peer Commentaries," while retaining 90% of documents labeled as "Original Paper." The abridged corpus contains 52% of documents from the full corpus and 80% of the overall word count.
46 For the list of recent changes for all topics, see Online Supplement 7 (https://osf.io/jx9dq).
FIGURE 4 The mean prevalence of topics across 5-year periods from 1976 to 2020. Topics are grouped by clusters and, within each cluster, ordered by their overall prominence in the corpus.

| Contextualism and a document-grounded approach
Before we discuss how one can use our data to interpret the topical structure and diachronic trends in bioethics and philosophy of medicine, we briefly discuss our main theoretical assumptions.
First, the method that we use assumes a contextual approach to meaning because co-occurrences of terms in the same documents are crucial in the assignment of terms to topics.
Moreover, each term is assigned with positive probability to many different topics, which may be interpreted as an ability of the method to capture polysemy or distinguish different uses of the same term on the basis of the context. 47 The same is true about documents-their assignment to different topics may be interpreted as an ability of the model to capture the fact that documents are multithematic. This feature may be explained by means of the example of two topics that, respectively, we have termed (1) Abortion: regulatory (top-10 terms: "abortion," "fetus," "woman," "pregnancy," "fetal," "mother," "birth," "child," "pregnant," "prenatal") and (2) Abortion: philosophy ("kill," "status," "future," "abortion," "fetus," "wrong," "personhood," "morally," "property," "being"). These topics reveal somewhat different contexts in which the term "abortion" is placed.
The first use is more regulatory-oriented, that is, connected with an institutional perspective; the second one is more theory-oriented, that is, connected with a philosophical perspective on the moral (im)permissibility of abortion. This is clearly discernible if we compare the first uses of the word "abortion" in the two papers that are most representative of the two respective topics. In the case of [...]
Second, the natural consequence of our contextualism is a document-grounded approach. Our preferred way of interpreting the topics, patterns, trends, and peaks in the corpus involves looking at documents themselves at every stage of the decision process that requires human interpretation. Let us again refer to the example with the two "abortion" topics. Comparing the top-10 papers characteristic of these two topics is revealing because they provide evidence in favor of our interpretations of these two topics: in the first case, most documents refer to women's rights, Roe v. Wade, prenatal diagnosis, or the legal liability of physicians. In the second, most of the top papers discuss secular arguments against abortion, such as the "future like ours" argument proposed by Marquis and the substance view that concludes that a human fetus has the same intrinsic value as a typical adult human being.

| The cluster structure of the field
What can researchers specializing in this field learn from a "distant reading" of this large corpus?
First, this method provides a data-driven thematic partition of the field. Any attempt to do so on the basis of close reading of the texts is much more susceptible to the biases of individual scholars, both because individual researchers cannot realistically read such vast collections of texts and because they cannot be expected to abstain from their own subjective takes on the relative importance [...]
EMERGING TOPICS is a cluster that seems to be mostly unified by the novelty and felt urgency of certain challenges, be they technological or societal. It is the cluster that was characterized by the greatest relative increase in prominence over time. 51
Second, the method allows us to adopt a bird's-eye view of the development of the field. Diachronic changes in the relative prominence of both topics and their clusters can be a welcome addition to the more traditional tools used by historians of philosophy. We can learn which topics and topic clusters have gained or lost in relative prominence over time. Locating local peaks in relative prominence can allow for a more focused search for the precise factors that have shaped these discussions. This is particularly interesting to the extent that bioethics is said to develop in reaction to sudden shocks. For this reason, focusing on the most rapid changes in the topical composition can sometimes offer greater insight than studying long-term trends.
[...] States as a discipline, and our analysis confirms that the themes important from the U.S. perspective loom large in it.
TABLE 1 The biggest overall increases and decreases in topic prominence
Note: Overall increase/decrease is defined as the mean prominence of a topic in the last two periods in our data set (2011-2020) divided by the mean prominence of the topic in the first two periods (1976-1985).
Note: Increase is defined as the prominence of a topic in a given 5-year period divided by the mean prominence of the topic for the two preceding periods. A threshold of 1% prominence in a target period is applied. Topics highlighted in bold made it into the top 20 in both the full and the abridged (texts that contain more than 2300 words) corpora.
50 It is worth mentioning that the overall cluster structure does not obviously map onto either the applied/theoretical distinction or the bioethics/philosophy of medicine distinction.
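The period-level increase defined in the table note above (a topic's prominence in a 5-year period divided by its mean prominence in the two preceding periods, subject to a 1% threshold in the target period) can be sketched as follows. The function name `period_increases`, the inclusive reading of the threshold (`>=`), and the toy series are illustrative assumptions rather than the study's actual implementation.

```python
def period_increases(prominence_by_period, min_prominence=0.01):
    """Return (period_index, ratio) pairs for periods that pass the threshold.

    prominence_by_period: a topic's mean prominence per 5-year period,
    in chronological order, as proportions (0.01 means 1%).
    Assumption: a period qualifies when its prominence is >= min_prominence.
    """
    results = []
    # The first two periods have no pair of preceding periods to compare with.
    for i in range(2, len(prominence_by_period)):
        current = prominence_by_period[i]
        baseline = (prominence_by_period[i - 1] + prominence_by_period[i - 2]) / 2
        if current >= min_prominence and baseline > 0:
            results.append((i, current / baseline))
    return results


# Hypothetical topic with a single sharp peak in the fourth period.
toy_series = [0.002, 0.003, 0.004, 0.015, 0.006]
peaks = period_increases(toy_series)
print(peaks)
```

In this toy series only the fourth period (index 3) clears the 1% threshold, so it is the only candidate peak. Ranking such ratios across all topics would then give a list of the sharpest peaks.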

| The most prominent topics and the strongest correlations
As to the pairs of most strongly correlated topics, they serve more as a sanity check for the model than as a source of surprising insights. It is hardly surprising that the topic Consent most typically accompanies Participation and Biobanking or that the topic representing the metaphysical discussion on the beginning of life (Embryos: identity) is strongly associated with Embryos: research.
Perhaps more interesting is the list of the most negatively correlated pairs of topics, which turns out to be dominated by framing topics. Here, two patterns can be observed. First, some pairs of framing topics are negatively correlated, suggesting that such perspectives are very rarely used in tandem to analyze an issue in bioethics and philosophy of medicine (Moral philosophy appears to be rarely combined with Quantitative or Clinical stories, and the latter also tends to dissociate from Concepts). Second, some framing topics are negatively correlated with some content-based topics, suggesting that the latter are unlikely to be analyzed from a given perspective (and so, e.g., Moral philosophy seems to be rarely used in the context of Hospital, Biopolitics, Law: health, or Screening).
Note: Recent increase is defined as the mean prominence of a topic in the last two complete years in our data set (2019-2020) divided by the mean prominence of the topic for the preceding 10-year period (2009-2018).
51 The interpretation of that cluster as characterized by a sense of novelty is further corroborated by the observation that the lists of the 20 most likely terms for topics from EMERGING TOPICS are on average "younger" than for topics from any other cluster (where a term's "age" is determined by the year in which it first appeared in our corpus).
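To make the idea of positively and negatively correlated topic pairs concrete, here is a minimal sketch that ranks topic pairs by the Pearson correlation of their per-document proportions. This section does not specify how the study computed intertopic correlations, so the use of Pearson correlation, the placeholder topic names (`topic_a`, `topic_b`, `topic_c`), and the toy proportions are all assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical document-topic proportions: one list per topic,
# one entry per document (four toy documents).
doc_topic = {
    "topic_a": [0.40, 0.35, 0.05, 0.02],
    "topic_b": [0.30, 0.25, 0.02, 0.03],
    "topic_c": [0.02, 0.05, 0.50, 0.45],
}

# Correlation for every unordered pair of topics.
pairs = {
    (a, b): pearson(doc_topic[a], doc_topic[b])
    for a, b in combinations(doc_topic, 2)
}

# Sort ascending: most negative pair first, most positive pair last.
ranked = sorted(pairs.items(), key=lambda item: item[1])
most_negative, most_positive = ranked[0], ranked[-1]
print("most positive:", most_positive[0])
print("most negative:", most_negative[0])
```

In the toy data, `topic_a` and `topic_b` tend to appear in the same documents, while `topic_c` dominates documents in which the other two are nearly absent. This is the pattern the text describes for framing topics that are rarely used in tandem.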

| Diachronic trends
In the following pages, we briefly discuss some of the potential ways in which diachronic trends can be interpreted in two dimensions: (1) overall and recent trends and (2) [...] 1976-1985 in comparison with the last 10 years (2011-2020), "the biggest winners" in terms of relative growth are themes represented by the topics we called Enhancement (mean prominence from 0.03% to 0.97%), Public health emergencies (from 0.04% to 0.96%), and Circumcision (from 0.04% to 0.37%), whereas the greatest losers are History (from 3.26% to 0.46%), Confidentiality (from 4.37% to 0.62%), and Science: philosophy (from 3.11% to 0.63%). 52
The first two winners (in terms of relative growth) are easily interpreted. In particular, taking into account that Enhancement is correlated with Germline and Genetics (see Figure 3), one can observe a broader trend of interest in different ethical, regulatory, and theoretical questions about heritable genome editing. Germline is also among the top recent peaks, which is perfectly understandable if one takes into account the recent explosion of interest in the CRISPR/Cas9 method and the He Jiankui scandal. Both of these issues were discussed in an article by Cwik, which is the most characteristic of this trend. 53
The second topic among "the biggest winners" (Public health emergencies) is obviously related to COVID-19, but the recent pandemic alone does not account for the full scale of its growth. Even after excluding the "pandemic" year 2020 from the analyses (i.e., comparing 1976-1985 with 2011-2019), this topic would still be in third place, just below Circumcision. This suggests that discussions about healthcare emergencies were steadily growing even before the COVID-19 outbreak, which may be related either to earlier epidemiological crises, such as the 2013-2016 Ebola outbreak, or, more generally, to a growing interest in the bioethical aspects of large-scale catastrophes, such as natural disasters or terrorist attacks.
The third winner, Circumcision, is an interesting case because all top-10 articles characteristic of this topic are about two main issues: either neonatal male circumcision or female genital alteration.
However, five of them were published in the same issue of AJOB (3(2)) and revolve around the target article by Benatar and Benatar. 54 The paper reacted to discussions that were current at the time, including the statement issued in 1999 by the American Academy of Pediatrics and other guidelines published by medical societies, which highlighted the alleged health benefits of male circumcision, a view that has come to be seen as increasingly controversial. It seems that the relative growth of popularity of this topic in our corpus after 2000 may stem from the increased interest of bioethicists in the ethical issues around neonatal male circumcision, a practice that in the early days of bioethics and philosophy of medicine was not even considered to be ethically troubling (in contrast with the practices of female genital mutilation).
In the case of the biggest losers, a possible explanation for the relative decline of the topic that we called History may be found in the most characteristic papers for this topic. All of them refer to some classical texts (Gilgamesh) or authors (Socrates, Galen, Stoics, Boethius, Hume, Camus). One might speculate that the growing maturity of the field provides an explanation: in the early days of bioethics and philosophy of medicine, authors needed to refer to the classics, whereas now the main discussions concern one another's papers.
The relative decline of Confidentiality may be instructive because it may be interpreted as a sign that bioethics and philosophy of medicine were closer to physicians' professional ethics in their early days and much more focused on physician-patient relations than now (which is further corroborated by the relative decline of Clinical stories and Codes, topics with which Confidentiality is correlated and which are related to the professional role of physicians). In turn, the relative decline of Science: philosophy may reflect a growing separation between bioethics and philosophy of science (this is further corroborated by the relative decline of Diagnosis and Disease: concept, topics with which Science: philosophy is relatively strongly correlated and which are also thematically close to philosophy of science). It seems that most papers characteristic of this topic could also be published in philosophy of science journals. This may be a sign that mainstream bioethics and philosophy of medicine journals are increasingly leaning toward an understanding of their own field as one that is more practice-oriented than theory-based.
On the other hand, we can also adopt a bird's-eye perspective and search for "the biggest winners" in terms of trends larger than a particular topic. From this perspective, the main winners are a group of topics that we interpreted as the cluster EMERGING TOPICS (covering topics that we called Enhancement, Sport, and Genetic testing, among others) and those areas of bioethics that are centered around an individual who encounters the healthcare system (i.e., the cluster PATIENTS AND RESEARCH PARTICIPANTS), whereas "the [...]
The reasons why the cluster END OF LIFE as a whole has relatively shrunk are more difficult to explain.
The strongest relative decline is visible in the cases of Hazards (from 0.94% to 0.25%), Law: health (from 4.04% to 1.27%), Omission (from 1.33% to 0.69%), War and prisons (from 1.2% to 0.67%), and Death: life support (from 1.46% to 0.83%). However, taking into account absolute decreases, the relative decline of END OF LIFE is largely driven by Law: health, which shrank by 2.77 percentage points (the combined effect of the other four fastest shrinking topics was smaller than that of Law: health). Its most characteristic terms (e.g., "law", "rule", "judge", "supreme_court") are typical of papers published in the early days of bioethics, in particular in HCR, and now such issues may have moved to more legally oriented journals. However, the relative decline of the HCR's share in the corpus (in particular, after the launch of AJOB and the growth of JME in the 2000s) does not fully explain the decrease of this particular topic.
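The contrast between relative and absolute declines drawn above can be checked with a few lines of arithmetic. The sketch below uses only the rounded percentages reported in the text; the variable names are illustrative, and the results are therefore accurate only up to the rounding of the published figures.

```python
# (early mean prominence in %, late mean prominence in %), as reported in the text.
declines = {
    "Hazards": (0.94, 0.25),
    "Law: health": (4.04, 1.27),
    "Omission": (1.33, 0.69),
    "War and prisons": (1.20, 0.67),
    "Death: life support": (1.46, 0.83),
}

# Relative decline: late prominence as a fraction of early prominence.
relative = {topic: late / early for topic, (early, late) in declines.items()}

# Absolute decline: drop in percentage points.
absolute = {topic: early - late for topic, (early, late) in declines.items()}

law_drop = absolute["Law: health"]
other_drop = sum(drop for topic, drop in absolute.items() if topic != "Law: health")

print(f"Law: health drop: {law_drop:.2f} pp")      # prints "Law: health drop: 2.77 pp"
print(f"Other four combined: {other_drop:.2f} pp")  # prints "Other four combined: 2.49 pp"
```

On these figures, Hazards shows the steepest relative decline (it retains roughly a quarter of its early prominence), whereas Law: health accounts for the largest absolute drop, exceeding the other four topics combined.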
The relative decline of Death: life support is also surprising, particularly if one takes into account the fact that a similar topic, that is, Death: euthanasia (from 0.71% to 0.98%), is rather stable (with one important peak that we will discuss below). In any case, this trend may be a sign that the popularity of one particular theme in discussions about aid in dying may be decreasing: artificial nutrition and hydration were discussed in almost all top-10 papers in the topic Death: life support. It is also worth noting one particularly important growth in this cluster, namely, Dementia (from 0.51% to 1.1%).
[...] well-known philosophers such as Philippa Foot, Frances M. Kamm, and Warren Quinn. Given the involvement of such prominent philosophers, one might expect sustained influence; yet, it seems that these classic themes are losing their relative popularity in bioethics.

| Limitations
Our approach is mostly data-driven and automatic, but it also includes "manual interventions" at several important junctures.
Primarily, this concerns identifying the corpus and then assigning the number of topics and other parameters of the model, labeling topics, and, finally, clustering them. In particular, there are other possible approaches to constructing the corpus that could result in a different general picture of the field. Instead of delineating the most important bioethics journals, as we did, one could choose to collect the most important articles, defining "importance" as, for example, the most cited articles published in a larger set of journals. Therefore, the picture resulting from our modeling is not self-evident and should not be treated as a ready-made object created independently of any human intervention but rather as a useful tool or a piece of evidence that may help researchers in their own interpretive engagement with the original text materials. 60
We assume that the observed distribution of topics and diachronic trends mirror the changing patterns in research interests, which in turn are reactions to scientific discoveries and break- [...]
This type of dynamic is arguably shared with most other academic disciplines: scholarly interests typically resist rapid changes. However, what is more characteristic of the analyzed field is the second type of observed changes: sudden, rapid peaks in interest in some themes (in particular, Nudge [2011-2015], Embryos: research [2001-2005], Pharma ethics [2001-2005] [...]
[...] and philosophy of medicine, starting from the very high-level structure of the field in terms of topic clusters and then zooming into more fine-grained topic distribution and diachronic trends. By providing extensive online supplements, we encourage readers not only to engage in their own interpretations of the present corpus but also to utilize the model in a variety of ways, from more focused historical analyses to teaching. We also hope that this study will motivate further corpus-based research in philosophy in general and in bioethics and philosophy of medicine in particular, reaching into larger and more diverse corpora (in terms of types of texts, historical periods, and languages).
[...] Initiative at Jagiellonian University.

CONFLICT OF INTEREST
The authors declare no conflict of interest.