WebSci '21 Proceedings of the 13th Annual ACM Web Science Conference (Companion Volume) (2021)

Trystan S. Goetze
Harvard University
Darren Abramson
Dalhousie University
The use of language models in Web applications and other areas of computing and business has grown significantly over the last five years. One reason for this growth is the improvement in performance of language models on a number of benchmarks, but a side effect of these advances has been the adoption of a “bigger is always better” paradigm when it comes to the size of training, testing, and challenge datasets. Drawing on previous criticisms of this paradigm as applied to large training datasets crawled from pre-existing text on the Web, we extend the critique to challenge datasets custom-created by crowdworkers. We present several sets of criticisms, where ethical and scientific issues in language model research reinforce each other: labour injustices in crowdwork, dataset quality and inscrutability, inequities in the research community, and centralized corporate control of the technology. We also present a new type of tool for researchers to use in examining large datasets when evaluating them for quality.
Keywords: natural language processing; artificial intelligence; machine learning methodology; computer ethics; artificial intelligence ethics; crowdsourcing