Still no lie detector for language models: probing empirical and conceptual roadblocks

Philosophical Studies:1-27 (forthcoming)

Abstract

We consider whether large language models (LLMs) have beliefs and, if they do, how we might measure them. First, we ask whether we should expect LLMs to have something like beliefs in the first place. We examine some recent arguments aiming to show that LLMs cannot have beliefs and show that these arguments are misguided. We provide a more productive framing of questions surrounding the status of beliefs in LLMs and highlight the empirical nature of the problem. With this lesson in hand, we evaluate two existing approaches for measuring the beliefs of LLMs, one due to Azaria and Mitchell (The Internal State of an LLM Knows When It's Lying, 2023) and the other to Burns et al. (Discovering Latent Knowledge in Language Models Without Supervision, 2022). Moving from the armchair to the desk chair, we provide empirical results showing that these methods fail to generalize in very basic ways. We then argue that, even if LLMs have beliefs, these methods are unlikely to be successful for conceptual reasons. Thus, there is still no lie detector for LLMs. We conclude by suggesting some concrete paths for future work.

Analytics

Added to PP
2024-02-19


Author Profiles

Ben Levinstein
University of Illinois, Urbana-Champaign

