
Informational richness and its impact on algorithmic fairness

Marcello Di Bello & R. Gong

Philosophical Studies (2023) · https://doi.org/10.1007/s11098-023-02004-7

Abstract

The literature on algorithmic fairness has examined exogenous sources of biases such as shortcomings in the data and structural injustices in society. It has also examined internal sources of bias as evidenced by a number of impossibility theorems showing that no algorithm can concurrently satisfy multiple criteria of fairness. This paper contributes to the literature stemming from the impossibility theorems by examining how informational richness affects the accuracy and fairness of predictive algorithms. With the aid of a computer simulation, we show that informational richness is the engine that drives improvements in the performance of a predictive algorithm, in terms of both accuracy and fairness. The centrality of informational richness suggests that classification parity, a popular criterion of algorithmic fairness, should be given relatively little weight. But we caution that the centrality of informational richness should be taken with a grain of salt in light of practical limitations, in particular the so-called bias-variance trade-off.

Notes

  1. There is strong evidence of an association between race and differential treatment by health care providers (McKinlay, 1996; Schulman, Berlin, Harless, Kerner, Sistrunk, Gersh, Dubé, Taleghani, Burke, Williams, Eisenberg, Ayers, and Escarce, 1999; Chen, Rathore, Radford, Wang, and Krumholz, 2001; Petersen, Wright, Peterson, and Daley, 2002). Whether or not these differences are explained by implicit biases is unclear (Dehon, Weiss, Jones, Faulconer, Hinton, and Sterling, 2017). On lending practices, there is a growing body of literature documenting the impact of redlining on economic inequalities today (Aaronson, Faber, Hartley, Mazumder, and Sharkey, 2021; Ladd, 1998). The justice system is filled with racial disparities at different stages (Rehavi and Starr, 2014; Gross, Possley, Otterbourg, Stephens, Paredes, and O’Brien, 2022).

  2. For example, the American Civil Liberties Union of New Jersey argued that the deployment of predictive algorithms in criminal justice can end the unfair system of bail that disproportionately harms the poor; see https://www.aclu-nj.org/theissues/criminaljustice/pretrial-justice-reform. For a more detailed defense of this claim, see Slobogin (2021). The consulting firm McKinsey estimated that predictive algorithms can save $300 billion every year in U.S. healthcare costs (Manyika, Chui, Brown, Bughin, Dobbs, Roxburgh, and Byers, 2011). More generally, for the positive impact of big data in health care, see Raghupathi and Raghupathi (2014).

  3. Data can be defective because of their reliance on proxies, for example, when arrest data are used as proxies for actual criminal offending (Barabas, Bowers, Buolamwini, Benjamin, Broussard, Constanza-Chock, Crawford, Doyle, Harcourt, Hopkins, Minow, Ochigame, Priyadarshi, Schneier, Selbin, Dinakar, Gebru, Helreich, Ito, O’Neil, Paxson, Richardson, Schultz, and Southerland, 2019) or when healthcare costs are used as proxies for actual medical needs (Obermeyer, Powers, Vogeli, and Mullainathan, 2019). Besides the proxy problem (also known as the measurement problem), biases can arise during data collection, for example, when certain groups are under-sampled. For an overview of sources of bias in the data, see, among others, Suresh and Guttag (2021). For an analysis of the implications of biased data from the standpoint of US constitutional law, see Barocas and Selbst (2016).

  4. Along similar lines, Mitchell et al. (2021) draw a distinction between statistical bias (a mismatch between the world and the sample used to train the model) and societal bias (a mismatch between the world as it is and the world as it should be).

  5. Define “structural injustice” as any historically entrenched distribution of goods, benefits, powers and advantages (or their negative correlates) among social groups, where such a distribution negatively impacts the well-being of specific social groups and not others. See Powers and Faden (2019) and Young (2003).

  6. Deborah Hellman (2023) calls this phenomenon compounding injustice: facts grounded in past injustices are used as the basis for making punitive decisions in the present, thereby compounding the past injustice.

  7. Formally, the algorithm’s prediction should satisfy the following equality between conditional probability statements:

    $$\begin{aligned} P\left( Y=1\mid S \ge a,G=g\right) = P\left( Y=1\mid S \ge a,G = g'\right) \quad \forall g\ne g', \end{aligned}$$

    where Y is the binary outcome to be predicted (which can take values 1 or 0) and G is the group membership based on a protected classification. The expression \(S \ge a\) is the algorithm’s binary prediction of the positive outcome \(Y=1\). Predictive algorithms usually make a fine-grained prediction in terms of a risk score S. The greater the score, the greater the probability of the outcome. The algorithm’s binary prediction is obtained by thresholding the risk score at some value a that is considered sufficiently high. Another common criterion of predictive parity is calibration, a more fine-grained version of equal positive predictive value. This measure of fairness does not depend on a decision threshold. It compares the predictive accuracy of the algorithm across groups for each risk score, not just risk scores above the threshold. A predictive algorithm is relatively calibrated (Chouldechova, 2017; Corbett-Davies and Goel, 2018) if

    $$\begin{aligned} P\left( Y=1\mid S, G=g\right) =P\left( Y=1\mid S, G=g'\right) \qquad \forall g\ne g'. \end{aligned}$$

    If the risk score further satisfies \(P\left( Y=1\mid S,G\right) =S\), we say that it is absolutely calibrated (Kleinberg, Mullainathan, and Raghavan, 2017). A small computational sketch of both criteria is given below.
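
    As a minimal illustration (not the authors’ code), the R functions below check equal positive predictive value and relative calibration on a hypothetical data frame with columns score (the risk score S), outcome (the observed Y in {0, 1}) and group (G); the scores are assumed to lie in [0, 1].

    check_predictive_parity <- function(df, a = 0.5) {
      # P(Y = 1 | S >= a, G = g), i.e. positive predictive value, for each group g
      flagged <- df[df$score >= a, ]
      tapply(flagged$outcome, flagged$group, mean)
    }

    check_calibration <- function(df, n_bins = 10) {
      # Bin the scores and compare P(Y = 1 | S in bin, G = g) across groups
      df$bin <- cut(df$score, breaks = seq(0, 1, length.out = n_bins + 1),
                    include.lowest = TRUE)
      aggregate(outcome ~ bin + group, data = df, FUN = mean)
    }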

  8. Formally, a predictive algorithm satisfies equal classification accuracy if it has the same false positive rates across groups:

    $$\begin{aligned} P\left( S\ge a\mid Y=0,G=g\right) = P\left( S\ge a\mid Y=0,G=g'\right) \quad \forall g\ne g', \end{aligned}$$

    as well as the same false negative rates across groups:

    $$\begin{aligned} P\left( S\le a\mid Y=1,G=g\right) =P\left( S\le a\mid Y=1,G=g'\right) \quad \forall g\ne g'. \end{aligned}$$

    The satisfaction of these conditions depends on a specific risk threshold that is considered high enough to make a positive classification. Balance is another measure of classification parity which, however, does not depend on selecting a specific risk threshold. A predictive algorithm is said to be balanced if, on average, it assigns the same risk scores across groups to people who share the same positive outcome (\(Y=1\)) or the same negative outcome (\(Y=0\)). In terms of expectation, balance can be defined as follows:

    $$\begin{aligned} E\left( S \mid Y=y, G=g\right) = E\left( S \mid Y=y\right) \end{aligned}$$

    for any group g and outcome \(y = 0\) or 1. A computational sketch of these classification rates and of balance is given below.
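
    Again as a minimal sketch rather than the authors’ code, the R functions below compute group-specific false positive and false negative rates at a threshold a, together with the balance quantities \(E(S \mid Y=y, G=g)\), for the same hypothetical data frame with columns score, outcome and group.

    classification_rates <- function(df, a = 0.5) {
      # Treats score >= a as a positive prediction
      by(df, df$group, function(g) {
        fpr <- mean(g$score >= a & g$outcome == 0) / mean(g$outcome == 0)  # P(S >= a | Y = 0, G = g)
        fnr <- mean(g$score <  a & g$outcome == 1) / mean(g$outcome == 1)  # P(S < a | Y = 1, G = g)
        c(FPR = fpr, FNR = fnr)
      })
    }

    balance_by_group <- function(df) {
      # Mean risk score within each (outcome, group) cell: E(S | Y = y, G = g)
      aggregate(score ~ outcome + group, data = df, FUN = mean)
    }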

  9. In contrast, individual fairness is often understood as equal treatment of similarly situated individuals (Dwork, Hardt, Pitassi, Reingold, and Zemel, 2012; Sharifi-Malvajerdi, Kearns, and Roth, 2019). This conception of algorithmic fairness tracks how an individual is treated relative to others by constructing a counterfactual (Kusner, Loftus, Russell, and Silva, 2018). On the apparent conflict between individual and group fairness, see Binns (2020).

  10. The two most well-known impossibility results are due to Chouldechova (2017) and Kleinberg et al. (2017). An earlier result was proven by Borsboom et al. (2008). There is also a possibility result due to Reich and Vijaykumar (2021) who show that classification parity (specifically, equal false positive and false negative rates across groups) and predictive parity (specifically, calibration) can be concurrently satisfied.

  11. Some claim that different performance criteria of algorithmic fairness embody different moral commitments about what fairness requires. In this sense, the impossibility theorems underscore a conflict between different moral commitments about algorithmic fairness (Heidari, Loi, Gummadi, and Krause, 2019). This interpretation is compatible with our own. Our claim that the impossibility theorems constitute an inner critique underscores the fact that violations of fairness criteria can occur absent exogenous sources of bias in the data or in society. On a more technical level, a popular explanation for why these violations of fairness criteria occur even without exogenous biases appeals to the so-called problem of infra-marginality. As soon as two groups have differences in prevalence—say, differences in criminality, financial stability or health—the shape of the risk distributions of the two groups, as viewed by the predictive algorithm, will be different. This implies, inevitably, that the rate of correct predictions will differ across groups, thus giving rise to violations of one criterion of fairness or another (Corbett-Davies and Goel, 2018). A small numerical illustration is given below.
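
    The R snippet below, a minimal illustration with purely hypothetical numbers, makes the infra-marginality point concrete: two groups receive perfectly calibrated risk scores (the outcomes are drawn from the scores themselves), yet because their risk distributions differ, their false positive rates at a common threshold differ as well.

    set.seed(1)
    n <- 1e5
    risk_a <- rbeta(n, 2, 5)       # lower-prevalence group
    risk_b <- rbeta(n, 4, 3)       # higher-prevalence group
    y_a <- rbinom(n, 1, risk_a)    # outcomes generated from the risks, so calibration holds
    y_b <- rbinom(n, 1, risk_b)
    a <- 0.5                       # common decision threshold
    fpr <- function(s, y) mean(s >= a & y == 0) / mean(y == 0)
    c(group_a = fpr(risk_a, y_a), group_b = fpr(risk_b, y_b))  # unequal false positive rates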

  12. On trade-offs between different fairness criteria, see Berk et al. (2021) and Lee et al. (2021).

  13. On the contextuality of criteria of algorithmic fairness within a theory of justice that applies to predictions, as opposed to decisions, see Lazar and Stone (2023).

  14. On this more radical approach, see Green (2022).

  15. In philosophy, Brian Hedden (2021) and Robert Long (2021) have provided the most discussed examples.

  16. The idea of conscientiousness has been discussed—under different names—in both the philosophical and the computer science literature. In the philosophical literature, the idea of conscientiousness is closely related to what some call “the right to be treated as an individual.” This right can be understood in an informational sense, roughly as the right to be judged on as much relevant information as is reasonably available (Lippert-Rasmussen, 2011). Others have emphasized the imperative of avoiding doxastic negligence and collecting more information if appropriate (Zimmermann and Lee-Stronach, 2022). Other, non-informational conceptions of the right to be treated as an individual focus on the fair allocation of risks and burdens (Castro, 2019; Jorgensen, 2022). In the computer science literature, some have suggested that further screening or collecting more data about select groups can improve the fairness performance of predictive algorithms (Chen, Johansson, and Sontag, 2018; Cai, Gaebler, Garg, and Goel, 2020). We are sympathetic to these approaches, but our analysis differs in two ways. First, we are not advocating that only select groups be subject to further screening or data gathering, as this may increase surveillance of already marginalized communities. Second, we are interested in examining how conscientiousness impacts the different performance criteria of algorithmic fairness. As we will see, improvements in conscientiousness do not impact all performance criteria of fairness equally (Sect. 3). This observation will then be the basis for an argument against classification parity (Sect. 5).

  17. On the trade-off between accuracy and fairness, see Menon and Williamson (2018). Kearns and Roth (2019) discuss the concept of a Pareto frontier between accuracy and fairness.

  18. The literature on causal criteria of algorithmic fairness has begun to address these questions; see, e.g., Chiappa and Gillam (2018).

  19. For one thing, group differences in prevalence—which drive in part violations of predictive and classification parity—are not necessarily due to structural injustice. Several layers of inequality may exist in society. Some inequalities are certainly due to structural, historical injustices and discrimination, but others may be less pernicious and due to differences in preferences or priorities among groups (Lee, Singh, and Floridi, 2021). At the same time, violations of fairness performance criteria could still cause harm even without historical conditions of structural injustice. Consider two communities whose wealth happens to be different, but not for reasons of structural injustice. If a community experiences, say, a higher rate of false loan rejections, this difference may in the long run entrench its economic disadvantage. Or suppose the algorithm’s predictive accuracy is worse for one community than for another. This will have negative reputational costs, for example, if one community is viewed as less capable of repaying loans.

  20. The implied relationship between Y and \(S_\infty\) is also weaker than equality, but one reasonable requirement is that the idealized risk satisfy absolute calibration: \(P( Y = 1 \mid S_{\infty }) = S_{\infty }\), where the probability P reflects the untamed randomness inherent in the outcome Y.

  21. Hedden’s argument does not assume that the people in the two rooms have different base rates. The distribution of their risks is assumed to be different, however. This fact then triggers a violation of the performance criteria of fairness. This is a consequence of the problem of infra-marginality; see footnote 11.

  22. For a more extensive critique, see Viganò et al. (2022).

  23. Another scenario in the philosophical literature, due to Robert Long (2021), makes a similar assumption. Suppose you are an undergraduate student in a large course. For the purpose of grading your homework, you could be assigned to section I or section II. Homework is graded in exactly the same way in the two sections, but it just so happens that the base rate of true A papers is higher in section I than in section II. If the predictive accuracy of the grades is the same across sections, the rate at which true A papers are correctly graded will differ across the two sections. So there will be a disparity in classification errors across the two sections. But, Long argues, this disparity should not raise fairness concerns. Suppose, for concreteness, that true A papers in section II are incorrectly graded more often than in section I. It would be odd for a student in section II to complain that they were unfairly treated because true A papers were incorrectly graded more often in section II. Had the student been in section I, they would have been graded in the same way. They would have gotten the same grade, since being in one section or another is irrelevant to how students are graded. The counterfactual holds simply because group membership is causally irrelevant.

  24. For a similar point, see Lazar and Stone (2023).

  25. The R code of the simulation is available from the authors.

  26. Hedden in the coin example assumed that the relationship between idealized individual risk (the objective chance or bias of the coin) and outcome was stochastic. Each person was associated with a biased coin and the probability of the outcome was determined by the bias. But predictive algorithms need not be thought of as working that way. The relationship between idealized risk and outcome can also be deterministic. Here we assume that the relationship is deterministic but relax this assumption later in the paper.

  27. For the probit regression model that we examine in this paper, the objective function is simply defined as the data likelihood, so that maximum likelihood estimation is guaranteed to recover the true parameter values consistently and asymptotically efficiently in our setting. Other definitions of the objective function, such as those incorporating regularization, may be employed in practice. A brief sketch of such a probit fit is given below.
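
    As a minimal sketch rather than the authors’ code: a probit model can be fit by maximum likelihood in R with glm() and a probit link. Here train and test are hypothetical data frames with a binary outcome y and predictor columns x1, x2 and group.

    # glm() performs maximum likelihood estimation (via iteratively reweighted least squares)
    fit <- glm(y ~ x1 + x2 + group, family = binomial(link = "probit"), data = train)

    # Risk scores for new records are the fitted probabilities P(Y = 1 | X)
    risk <- predict(fit, newdata = test, type = "response")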

  28. To make the simulation more realistic, we vary the composition of group 0 versus group 1 records in the training and the test datasets. In the training set, \(60\%\) of the records are from group 0, whereas in the test set, \(40\%\) of the records are from group 0. This mimics the possibility that the training and the test sets may over-sample or under-sample some of the groups, so that their sample composition departs from that of the population. Since this variation merely perturbs the group proportions and keeps the ratio of positive to negative outcomes within each group identical across the training and the test sets, it does not reflect an outcome-biased sampling scheme and does not constitute an instance of distorted data. A sketch of such a sampling scheme is given below.
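
    The R snippet below is a minimal sketch, not the authors’ code, of a sampling scheme with the property just described: the share of group 0 records is set by hand, while sampling within each group leaves the within-group ratio of positive to negative outcomes undistorted in expectation. The data frame pop and its columns group and outcome are hypothetical.

    sample_with_composition <- function(pop, n, share_group0) {
      # Draw n records with a chosen share of group 0, sampling within groups
      n0 <- round(n * share_group0)
      idx0 <- sample(which(pop$group == 0), n0)
      idx1 <- sample(which(pop$group == 1), n - n0)
      pop[c(idx0, idx1), ]
    }

    train <- sample_with_composition(pop, n = 5000, share_group0 = 0.6)
    test  <- sample_with_composition(pop, n = 2000, share_group0 = 0.4)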

  29. Specifying a loss function is the standard way to measure accuracy. The loss function embodies the assessment of closeness between the risk model S and the true outcome Y it is intended to predict. As risk models are often probabilistic in nature, the loss function to examine is an expected predictive loss. Thus, the assessment of closeness is usually defined using the language of expectation:

    $$\begin{aligned} {\mathbb {E}}\left( {\mathcal {L}}\left( S_{p}\left( \textbf{X}_{p};{\hat{\theta }}_{p}\right) ,Y\right) \right) , \end{aligned}$$

    where the expectation may be taken over many sources of uncertainty. A common choice of loss function is the squared error loss, \({\mathcal {L}}\left( a, b\right) = \left( a-b\right) ^2\). The squared differences for each prediction are summed and divided by the total number of predictions (or the total number of individuals about whom a prediction is made). This computation gives the average squared error loss. The lower the loss, the more accurate the model. The average squared error loss is known as the Brier score. It is a strictly proper scoring rule, and it is the loss function employed in this paper. There are other choices of loss functions that may be particularly indicative of model performance in different contexts, such as the Area Under the Curve (AUC) or the Matthews correlation coefficient. A one-line computation of the Brier score is given below.
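
    As a minimal sketch, the Brier score is the mean squared difference between predicted risks and observed binary outcomes; the vectors below are hypothetical.

    brier_score <- function(risk, y) mean((risk - y)^2)

    brier_score(risk = c(0.9, 0.2, 0.6), y = c(1, 0, 1))  # 0.07; lower is more accurate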

  30. Recall that true attributes are those that, as a matter of fact, bring about the outcome of interest in the generative (oracle) model.

  31. On the definition of calibration, see footnote 7.

  32. A similar example was given by Corbett-Davies and Goel (2018) in a seminal paper on algorithmic fairness.

  33. Long (2021) makes the point that group differences in false positive rates do not track group differences in the risks (prospects) of error. To make this point, Long relies on a hypothetical case (see footnote 23) in which group membership is causally irrelevant to whatever features are used by the algorithm to make its predictions. The argument here does not make this assumption. In our simulation study (see Sect. 3), group membership is causally implicated in bringing about some of the predictors used by the predictive algorithm.

  34. For example, consider Northpointe’s answer in Dieterich, Mendoza, and Brennan (2016) to ProPublica’s accusation in Angwin, Larson, Mattu, and Kirchner (2016) that COMPAS is racially biased. COMPAS is an algorithm used in several jurisdictions in the United States to make predictions about recidivism. Northpointe argued that, since the prevalence of criminality among black people is higher, false positive rates will also be higher and false negative rates lower. Long (2021) makes the same claim with the qualification that it holds if (a) the algorithm meets predictive parity and (b) it applies the same decision threshold for different groups.

  35. For an argument about uneven prospects of mistaken convictions in criminal trial and its implications for fairness, see Di Bello and O’Neil (2020). The argument (roughly) is this. Suppose, for example, there is profile evidence that shows that low socioeconomic status is positively correlated with the crime of drug trafficking. If you are on trial for drug trafficking and are of low socioeconomic status, should this profile evidence be introduced as evidence against you? It would seem unfair to present this evidence against you. One way to make sense of this unfairness is to realize the following fact: if you were innocent, you would be mistakenly convicted with a greater probability than those of higher socioeconomic status against whom the same profile evidence could not be used as incriminating. After all, if the profile evidence were added to other evidence available against you at trial, this addition may tip the balance of the evidence in favor of a conviction. So, in this context, if you were an innocent facing trial, you would be more likely to be mistakenly convicted. The analogy with algorithmic predictions should be clear. They rest on a very sophisticated form of profile evidence that involves multiple correlations between certain features and an outcome of interest.

  36. Another source of uncertainty is modeling uncertainty. The model could be mis-specified, in the sense that it does not capture the structure of the true data-generating process. When this happens, even in the absence of informational or data uncertainty, the risk model will fail to approximate perfect accuracy. See Fig. 3 (orange line).

  37. This trade-off between informational and data uncertainty is also known as the bias-variance trade-off (Li and Meng, 2021).

  38. Incidentally, Hedden’s scenario, in which each individual is associated with a coin whose bias represents the objective chance of the individual bringing about the outcome, assumes a stochastic relationship between predictive attributes and outcome.

  39. Self-efficiency requires the model’s estimate to be more accurate when computed using the complete dataset. A model is not self-efficient if its estimate achieves a smaller mean squared error when applied to a subset of data selected from the complete data (Meng, 1994). Xie and Meng (2017) discuss cases in which the lack of self-efficiency arises in the context of multi-phase statistical inference.

  40. Using the same threshold across groups is often taken for granted as a criterion of fairness. For an examination of this criterion of algorithmic fairness, see Johnson King and Babic (2023). On the other hand, Aziz Huq (2019) argues that, in some cases, fairness requires that the same-threshold requirement be violated. Huq points out that people in minority groups may suffer greater harm as a result of a mistaken algorithmic classification, and when this is the case, the decision threshold should be more stringent for them.

References

  • Aaronson, D., Faber, J., Hartley, D., Mazumder, B., & Sharkey, P. (2021). The long-run effects of the 1930s HOLC “redlining” maps on place-based measures of economic opportunity and socioeconomic success. Regional Science and Urban Economics, 86, 103622.

  • Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There’s software used across the country to predict future criminals and it’s biased against blacks. ProPublica https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

  • Barabas, C., Bowers, J., Buolamwini, J., Benjamin, R., Broussard, M., Constanza-Chock, S., Crawford, K., Doyle, C., Harcourt, B.E., Hopkins, B., Minow, M., Ochigame, R., Priyadarshi, T., Schneier, B., Selbin, J., Dinakar, K., Gebru, T., Helreich, S., Ito, J., O’Neil, C., Paxson, H., Richardson, R., Schultz, J., & Southerland, V.M. (2019). Technical flaws of pretrial risk assessment raise grave concerns.

  • Barocas, S., & Selbst, A. D. (2016). Big data’s disparate impact. California Law Review, 104(3), 671–732.

  • Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2021). Fairness in criminal justice risk assessment: The state of the art. Sociological Methods and Research, 50(1), 3–44.

  • Binns, R. (2020). On the apparent conflict between individual and group fairness. FAT* ’20 (pp. 514–524). Association for Computing Machinery.

  • Borsboom, D., Romeijn, J. W., & Wicherts, J. M. (2008). Measurement invariance versus selection invariance: Is fair selection possible? Psychological Methods, 13(2), 75–98. https://doi.org/10.1037/1082-989X.13.2.75

  • Cai, W., Gaebler, J., Garg, N., & Goel, S. (2020). Fair allocation through selective information acquisition. In Proceedings of the AAAI/ACM conference on AI, ethics, and society.

  • Castro, C. (2019). What’s wrong with machine bias. Ergo, 6(15), 405–426.

  • Chen, I., Johansson, F. D., & Sontag, D. (2018). Why is my classifier discriminatory? Advances in Neural Information Processing Systems, 1, 3543–3554.

  • Chen, J., Rathore, S. S., Radford, M. J., Wang, Y., & Krumholz, H. M. (2001). Racial differences in the use of cardiac catheterization after acute myocardial infarction. New England Journal of Medicine, 344(19), 1443–1449.

  • Chiappa, S. & Gillam, T. P. S. (2018). Path-specific counterfactual fairness.

  • Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data. https://doi.org/10.1089/big.2016.0047

  • Corbett-Davies, S., & Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.

  • Dehon, E., Weiss, N., Jones, J., Faulconer, W., Hinton, E., & Sterling, S. (2017). A systematic review of the impact of physician implicit racial bias on clinical decision making. Academic Emergency Medicine, 24(8), 895–904.

  • Di Bello, M., & O’Neil, C. (2020). Profile evidence, fairness, and the risks of mistaken convictions. Ethics, 130(2), 147–178.

  • Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Performance of the COMPAS risk scales in Broward County.

  • Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. In ITCS ’12: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (pp. 214–226).

  • Green, B. (2022). Escaping the impossibility of fairness: From formal to substantive algorithmic fairness. Philosophy & Technology, 35(90), 1–32.

  • Gross, S. R., Possley, M., Otterbourg, K., Stephens, K., Paredes, J. W., & O’Brien, B. (2022). Race and wrongful convictions in the United States.

  • Hedden, B. (2021). On statistical criteria of algorithmic fairness. Philosophy and Public Affairs, 49(2), 209–231. https://doi.org/10.1111/papa.12189

  • Heidari, H., Loi, M., Gummadi, K.P., & Krause, A. (2019). A moral framework for understanding fair ML through economic models of equality of opportunity. FAT* ’19, New York, NY, USA (pp. 181–190). Association for Computing Machinery.

  • Hellman, D. (2023). Big data and compounding injustice. Journal of Moral Philosophy.

  • Huq, A. Z. (2019). Racial equity in algorithmic criminal justice. Duke Law Journal, 68(6), 1043–1134.

  • Johnson King, Z., & Babic, B. (2023). Algorithmic fairness and resentment. Philosophical Studies.

  • Jorgensen, R. (2022). Algorithms and the individual in criminal law. Canadian Journal of Philosophy, 52(1), 61–77.

  • Kearns, M., & Roth, A. (2019). The ethical algorithm: The science of socially aware algorithm design. Oxford University Press.

  • Kleinberg, J., Mullainathan, S., & Raghavan, M. (2017). Inherent trade-offs in the fair determination of risk scores. In C. H. Papadimitriou (Ed.), 8th Innovations in theoretical computer science conference (Vol. 43, pp. 43:1–43:23).

  • Kusner, M.J., Loftus, J.R., Russell, C., & Silva, R. (2018). Counterfactual fairness.

  • Ladd, H. F. (1998). Evidence on discrimination in mortgage lending. The Journal of Economic Perspectives, 12(2), 41–62.

  • Lazar, S., & Stone, J. (2023). On the site of predictive justice.

  • Lee, M. S. A., Singh, J., & Floridi, L. (2021). Formalising trade-offs beyond algorithmic fairness: Lessons from ethical philosophy and welfare economics. AI Ethics, 1, 529–544.

  • Li, X., & Meng, X. L. (2021). A multi-resolution theory for approximating infinite-p-zero-n: Transitional inference, individualized predictions, and a world without bias-variance tradeoff. Journal of the American Statistical Association, 116(533), 353–367.

  • Lippert-Rasmussen, K. (2011). We are all different: Statistical discrimination and the right to be treated as an individual. Journal of Ethics, 15(1–2), 47–59.

  • Long, R. (2021). Fairness in machine learning: Against false positive rate equality as a measure of fairness. Journal of Moral Philosophy, 19(1), 49–78.

  • Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C. & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

  • McKinlay, J. B. (1996). Some contributions from the social system to gender inequalities in heart disease. Journal of Health and Social Behavior, 37(1), 1–26.

  • Meng, X. L. (1994). Multiple-imputation inferences with uncongenial sources of input. Statistical Science, 1, 538–558.

  • Menon, A. K., & Williamson, R. C. (2018). The cost of fairness in binary classification. In S. A. Friedler & C. Wilson (Eds.). Proceedings of the 1st Conference on Fairness, Accountability and Transparency, Volume 81 of Proceedings of Machine Learning Research (pp. 107–118). PMLR.

  • Mitchell, S., Potash, E., Barocas, S., D’Amour, A., & Lum, K. (2021). Algorithmic fairness: Choices, assumptions, and definitions. Annual Review of Statistics and Its Application, 8(1), 141–163. https://doi.org/10.1146/annurev-statistics-042720-125902

  • Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342

  • Petersen, L. A., Wright, S. M., Peterson, E. D., & Daley, J. (2002). Impact of race on cardiac care and outcomes in veterans with acute myocardial infarction. Medical Care, 40(1), I-86–I-96.

  • Powers, M., & Faden, R. (2019). Structural Injustice: Power, Advantage, and Human Rights. Oxford University Press.

  • Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2(3), 1–10.

  • Rehavi, M. M., & Starr, S. B. (2014). Racial disparity in federal criminal sentences. Journal of Political Economy, 122(6), 1320–1354.

  • Reich, C.L., & Vijaykumar, S. (2021). A possibility in algorithmic fairness: Can calibration and equal error rates be reconciled? In K. Ligett and S. Gupta (Eds.), 2nd Symposium on Foundations of Responsible Computing, FORC 2021, Volume 192 of LIPIcs (pp. 4:1–4:21).

  • Schulman, K. A., Berlin, J. A., Harless, W., Kerner, J. F., Sistrunk, S., Gersh, B. J., Dubé, R., Taleghani, C. K., Burke, J. E., Williams, S., Eisenberg, J. M., Ayers, W., & Escarce, J. J. (1999). The effect of race and sex on physicians’ recommendations for cardiac catheterization. New England Journal of Medicine, 340(8), 618–626.

  • Sharifi-Malvajerdi, S., Kearns, M., & Roth, A. (2019). Average individual fairness: Algorithms, generalization and experiments. Advances in Neural Information Processing Systems, 32, 8242–8251.

  • Slobogin, C. (2021). Just algorithms: Using science to reduce incarceration and inform a jurisprudence of risk. Cambridge University Press.

  • Suresh, H. & Guttag, J. (2021). A framework for understanding sources of harm throughout the machine learning life cycle. In Equity and access in algorithms, mechanisms, and optimization, EAAMO ’21. Association for Computing Machinery.

  • Viganò, E., Hertweck, C., Heitz, C., & Loi, M. (2022). People are not coins: Morally distinct types of predictions necessitate different fairness constraints. In 2022 ACM Conference on fairness, accountability, and transparency (pp. 2293–2301).

  • Xie, X., & Meng, X. L. (2017). Dissecting multiple imputation from a multi-phase inference perspective: What happens when God’s, imputer’s and analyst’s models are uncongenial? Statistica Sinica, 1, 1485–1545.

  • Young, I. M. (2003). Political responsibility and structural injustice. The Lindley Lecture, University of Kansas.

  • Zimmermann, A., & Lee-Stronach, C. (2022). Proceed with caution. Canadian Journal of Philosophy, 52(1), 6–25.


Acknowledgements

For comments on earlier drafts, we would like to thank Brian Hedden, Jan-Willem Romeijn, Katie Steele, Colin O’Neil, Boris Babic, Zoë Johnson King, Fabrizio Cariani, three anonymous reviewers of this journal, and audiences at the University of Lausanne and the Australian National University. In addition, we would like to thank Claire Benn, Todd Karhur, Seth Lazar and Pamela Robinson for organizing this special issue on Normative Theory and Artificial Intelligence. Finally, we would like to thank Deborah Mayo for organizing a summer school on the philosophy of statistics at Virginia Tech in the summer of 2019. She made the collaboration between a statistician and a philosopher possible.
