
The Epistemology of Non-distributive Profiles


Abstract

The distinction between distributive and non-distributive profiles figures prominently in current evaluations of the ethical and epistemological risks that are associated with automated profiling practices. The diagnosis that non-distributive profiles may coincidentally situate an individual in the wrong category is often perceived as the central shortcoming of such profiles. According to this diagnosis, most risks can be retraced to the use of non-universal generalisations and various other statistical associations. This article develops a top-down analysis of non-distributive profiles in which this fallibility of non-distributive profiles is no longer central. Instead, it focuses on how profiling creates various asymmetries between an individual data-subject and a profiler. The emergence of informational, interest, and perspectival asymmetries between data-subject and profiler explains how non-distributive profiles weaken the epistemic position of a profiled individual. This alternative analysis provides a more balanced assessment of the epistemic risks associated with non-distributive profiles.


Notes

  1. Several imagined examples from the literature, such as the use of the ad hoc group of ‘dog owners living in Wales aged 38–40 that exercise regularly’ (Mittelstadt 2017, sec. 2), raise this type of worry.

  2. The view that algorithms pose epistemological as well as ethical challenges (Mittelstadt et al., 2016) applies here as well.

  3. See the first part of Hildebrandt and Gutwirth (2008).

  4. This view is not original (fn. 10). Schauer (2003), for instance, argues that our reluctance to base decisions on sound, yet non-universal generalizations is often misguided.

  5. We do not have all the relevant information, we do not even know which of the available information is actually relevant, and all the relevant information might not even be available in principle.

  6. This distinction is commonly related to the distinction between deductive and inductive inference. This view has been criticized by Harman and Kulkarni (2007). Here, we use the terms ampliative and defeasible to characterise the inferential processes that are used to come up with profiles, because this directly singles out the features of profiling that commentators tend to object to.

  7. The reply by Nabeth to Hildebrandt (2008b) hints at a similar point by linking non-distributive profiles to non-monotonic logic, but does not further elaborate on the issue and does not clearly distinguish between non-monotonicity, ampliativity and probabilistic inferences: ‘It is of particular importance to identify the nondistributive nature of the knowledge (also known as non-monotonic logic in artificial intelligence), since it can be at the origin of errors in segregating people due to the only probabilistic nature of some characteristics.’

  8. Here and below, I use ‘sceptical’ in its strict epistemological sense to refer to views that imply that we cannot claim to know or even claim to be justified in our beliefs if we are not absolutely certain.

  9. In the former case, the argument suggests that because existing conceptions of informational privacy are inadequate (privacy as a mechanism for hiding, Schermer (2011, 49)), we should look elsewhere for regulatory guidance. In the latter case, the argument suggests that existing conceptions of informational privacy need to be updated.

  10. Schauer (2003) has developed a detailed defence of the use of non-universal generalities (based on non-spurious statistical correlations, general rules or crisp decision boundaries such as age-limits). His argument is based on the identification of two flaws in how we usually dismiss decisions based on non-universal generalisations: we confuse moral objections with epistemic shortcomings, and we place too much trust in alternative forms of justification based on seemingly more direct or more specific evidence such as direct observation.

  11. Do note that even in such a simple example, patterns can be discerned: In the present dataset, 11 occurs more than 10, whereas 00 occurs more than 01.

  12. A partition of a set S is a set of subsets {S1,  … , Sn} of S that are mutually exclusive and jointly exhaustive. Every member of S will be a member of exactly one such Si. Most groupings we will consider are partitions.
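
In symbols, and merely restating the definition just given:

```latex
\{S_1, \dots, S_n\} \text{ is a partition of } S
\iff
\big(\forall i \neq j:\ S_i \cap S_j = \emptyset\big)
\ \text{and}\
\bigcup_{i=1}^{n} S_i = S .
```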

  13. Distances do not only depend on the available dimensions (and their relative scale—note that in the Figure the x and y axes use different scales), but also on the specific metric one uses. The earlier suggestion that we can just measure the distance with a ruler implies that we use a Euclidean measure, but despite its intuitiveness there is no a priori basis for treating this measure as more authoritative or more objective.
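
A minimal numerical sketch of this point, with invented coordinates rather than the data behind the figures: the same pair of candidate neighbours can switch places when we change the metric or rescale one dimension.

```python
import numpy as np

# Invented coordinates, purely for illustration.
query = np.array([0.0, 0.0])
a = np.array([3.0, 0.0])   # lies along the x-axis
b = np.array([2.0, 2.0])   # lies along the diagonal

def euclidean(p, q):
    return float(np.linalg.norm(p - q))   # L2 distance

def manhattan(p, q):
    return float(np.abs(p - q).sum())     # L1 distance

print(euclidean(query, a), euclidean(query, b))  # 3.0 vs ~2.83 -> b is 'nearest'
print(manhattan(query, a), manhattan(query, b))  # 3.0 vs 4.0  -> a is 'nearest'

# Rescaling one dimension (e.g. expressing y in other units) changes the
# verdict again, without any change in the underlying individuals.
scale = np.array([1.0, 10.0])
print(euclidean(query * scale, a * scale),       # 3.0
      euclidean(query * scale, b * scale))       # ~20.1 -> a is 'nearest'
```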

  14. The converse claim could be made as well: we should not think of coarser approaches as always superior, even if such approaches can have the virtue of being more equal or of allowing for profiles that are less invasive.

  15. Note that this does not prevent us from criticising the use of certain levels of abstraction. Many legal norms concerning privacy and non-discrimination can be read as prohibiting the use of certain levels of abstraction: some distinctions should remain hidden or should not have an influence.

  16. It is unclear to what extent this view is held. Many critical scholars do appear to object to decision-making processes that do not take all the available (potentially relevant) information into account (e.g. Bayamlioğlu and Leenes 2018, 14), but it is not always clear whether the criticism is meant to expose a tension between how Big Data analytics actually work and how they are claimed to work, or whether such views are indicative of a stronger belief in the value of particularism (a position opposed by Schauer 2003). Some authors show a clear awareness of how reliable prediction depends on generalisation and the reduction of complexity: ‘what counts as information at one point in time may be noise at another point in time and what counts as noise for one individual (organism) may be information for another. (…) To be able to act in an environment adequate generalization is necessary but the question of which generalization is adequate will depend on the context (and on the organism).’ (Hildebrandt 2008a, 25). Similarly, Zarsky (2016, 127–8) develops a balanced evaluation of the risks and concerns in automated decisions in which non-systematic or disproportionate mistakes can be accepted, but he continues to rely on an absolute notion of relevance and/or on the primacy of the perspective of the individual for the evaluation of what is really relevant.

  17. Actually, there are many measures, but this is irrelevant at this point. All we need here is that predictive success is used as a criterion, and that how well a method satisfies this criterion can be estimated and quantified in many different ways.
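
As a rough illustration of this plurality, the sketch below (with invented labels, assuming scikit-learn) scores one and the same set of predictions with three common measures; the point is only that the same output can be judged more or less successful depending on the measure chosen.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Invented ground truth and predictions for a binary profile (1 = 'in the group').
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))   # 0.8  -> most cases are classified correctly
print(precision_score(y_true, y_pred))  # 1.0  -> no false positives
print(recall_score(y_true, y_pred))     # 0.33 -> most members of the group are missed
```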

  18. In practice, this process of finding the right level of abstraction does not only depend on the value of k, but also on the dimensions we use, their scaling, possible transformations, metrics and ways of determining which nodes are closest.
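
A minimal sketch of this tuning process, assuming a k-nearest-neighbours classifier from scikit-learn and an invented toy dataset: varying k (and, via the pipeline, the scaling of the dimensions) changes which grouping is selected, and the level of abstraction that scores best on held-out data is the one retained.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented toy data standing in for profiling data.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

for k in (1, 5, 25, 75):
    model = make_pipeline(StandardScaler(),                 # rescale the dimensions
                          KNeighborsClassifier(n_neighbors=k,
                                               metric="euclidean"))
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:>2}: estimated accuracy {score:.2f}")
# Small k yields very fine-grained groupings; large k yields coarser, smoother ones.
```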

  19. See for instance: Hastie et al. (2009 Chapt. 7), Bishop (2006, sec. 1.3).

  20. Normally, this split is stratified: the prevalence of each label in the entire data-set should be preserved in the training and test set.
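
For instance, a hedged sketch using scikit-learn's train_test_split (with invented labels): passing the labels to the stratify argument preserves the prevalence of each label in both parts of the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented labels: 20% positives, 80% negatives.
y = np.array([1] * 20 + [0] * 80)
X = np.arange(100).reshape(-1, 1)   # dummy features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(y_train.mean(), y_test.mean())   # both 0.2: the prevalence is preserved
```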

  21. From this, we should not conclude that actual practices rigorously follow this procedure. In recent interviews, François Chollet (the creator of Keras at Google) has drawn attention to a lack of rigour in AI and ML-research. https://www.datacamp.com/community/blog/interview-francois-chollet, https://www.pyimagesearch.com/2018/07/02/an-interview-with-francois-chollet/

  22. The standard formulation of this problem goes back to Venn and Reichenbach; see Hájek (2007).

  23. Schauer (2003, 206–7) makes an analogous claim when he suggests that rules determine which cases ought to be treated alike in the sense that the rules force unlike cases together.

  24. The determination of which differences and similarities are relevant for a given task is not only a central concern to those who care about the accuracy of predictions. It is at least as much a cornerstone of our thinking about fairness. Dwork et al. (2012), for instance, develop an account of individual fairness that is based on a task-specific, externally imposed similarity-metric for classification tasks. In their proposal, we thus notice that the requirement to treat similar individuals similarly depends on a domain-specific rather than on a domain-agnostic approach to similarity.

  25. This extended use of the method of abstraction is implicitly allowed by its standard presentation (which focuses on observables: entities and their relevant properties), but is even more natural when approached from the perspective of how classifications are developed in Barwise and Seligman’s (1997) theory of information flow in distributed systems.

  26. In other contexts, we would point out that the new information was inconsistent with some of the background assumptions we relied on to conclude that Tweety can fly, but such considerations can be suppressed here.

  27. Compare this with the reference class problem we touched upon at the end of §2: If we are to decide whether Tweety can fly, should we place Tweety in the reference class of birds, or in the smaller reference class of penguins?

  28. If this were not the case, every succession of true/false or false/true would lead to absolute certainty, but this is not what we see: most splits are followed by further splits.
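
A small sketch (invented data, assuming scikit-learn) makes this visible: printing a fitted decision tree shows that the class counts at most internal nodes remain mixed after a split, which is precisely why further splits follow.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented data standing in for profiling data.
X, y = make_classification(n_samples=500, n_features=4, random_state=1)

tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

# The printed tree shows, for each true/false test, how the data is split further;
# the class weights at the internal nodes stay mixed rather than becoming certain.
print(export_text(tree, show_weights=True))
```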

  29. I will use this as an idealisation or upper-bound of the information a data-subject can rely on when she tries to make sense of the prediction and decisions of a profiler. This information can be the outcome of self-observation (‘Who am I?’) or result from knowledge of the personal data a profiler collected (‘What do you know about me?’).

  30. A mathematically more sophisticated analysis based on games with asymmetric information would provide stronger foundations for this analysis but is beyond the scope of the present contribution.

  31. Such considerations can play a role in at least two ways. They can be used to prefer one model above another, or they can be used to guide decisions based on the prior calculation of class probabilities (Bishop 2006, sec. 1.5).
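
A rough sketch of the second use, with invented numbers rather than anything taken from Bishop's text: prior class probabilities are combined with class-conditional likelihoods, and the resulting posterior is what a decision rule thresholds.

```python
# Invented numbers, for illustration only.
prior_pos = 0.05        # prior probability of the 'positive' class
prior_neg = 0.95

likelihood_pos = 0.60   # P(observed features | positive)
likelihood_neg = 0.10   # P(observed features | negative)

# Posterior probability via Bayes' theorem.
evidence = likelihood_pos * prior_pos + likelihood_neg * prior_neg
posterior_pos = likelihood_pos * prior_pos / evidence
print(round(posterior_pos, 3))   # 0.24: despite the strong likelihood,
                                 # the low prior keeps the posterior modest.

# A decision rule can then threshold this posterior,
# e.g. only act when posterior_pos > 0.5, which here it is not.
```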

  32. A comparison with standard accounts of justified belief supports this analysis. If we accept that justification is fallible, we must also accept that justified beliefs can be false. This is usually taken to imply that some false beliefs can be justified, but it does not imply that a justified belief need not be rejected when it turns out to be (probably) false. This is the mirror image of the so-called swamping problem: if we already know that a given belief is true, what do we gain from it being justified as well? (Kvanvig 2003 Chapt. 3).

  33. Indeed, it might be more prudent to place less trust in our ability to decide on a case-by-case basis (Dawes et al. 1989; Schauer 2003; Bishop and Trout 2004).

  34. A key requirement in cases where automated decisions are used to regulate people (Hildebrandt 2008c, 2016, 2018). See also Schermer (2011, 47) on how information asymmetries between individuals and government may lead to a loss of autonomy, Bayamlioğlu and Leenes (2018, 204) on how the law functions as a ‘procedural safeguard to discern, foresee, understand and contest decisions,’ and Kerr and Earle (2013), who highlight the role of predictability in contexts where algorithms are used to make pre-emptive predictions and decisions.

  35. Here, I use the terms ‘uncertainty’ and ‘ignorance’ as in Floridi (2015). If we are uncertain, we know the possibilities, but do not know which possibility obtains (questions, but no answers). If, however, we are ignorant, we do not even know the possibilities (no questions).

  36. Note that by decidable, we mean decidable by the relevant individual.

  37. But even then, we should keep in mind that decision trees are still a realistic example.

  38. The reasoning is the following: If one can navigate a single decision tree, one can navigate n decision trees. To do so, one needs more time and perseverance, but no additional skills. By contrast, understanding how the data contributes to a given prediction might require different skills depending on whether the decision was reached with a single tree or with many of them. Worries concerning the inscrutability of algorithms are related to the latter gap, but the decidability of profiles is not affected by this worry.
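
To give a concrete impression of what navigating a tree amounts to, the sketch below (invented data, assuming scikit-learn) extracts the path a single sample follows through each tree of a small forest; following ten such paths takes more time than following one, but it is the same kind of task.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Invented data standing in for profiling data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sample = X[:1]
for i, tree in enumerate(forest.estimators_):
    # decision_path reports which nodes this sample visits in this tree.
    visited = tree.decision_path(sample).indices
    print(f"tree {i}: visits nodes {list(visited)} "
          f"-> predicts {int(tree.predict(sample)[0])}")
```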

  39. The situation is quite different for profilers, for whom decidable profiles provide a particularly easy path to the creation and use of re-identifiable groups (Floridi 2014, 2): given a way of grouping individuals, one can immediately determine to which group any previously unseen individual belongs.

  40. There is no practical sense in which a data subject could survey all possible groupings. Humans, and also computers, are rarely in the position to consider all the possibilities, and few forms of supervised learning rely on strategies that require the consideration of all possible groupings.

  41. The role of (pre)processed data in profiling practices, and data science in general, is considerable. Lipton (2016), for instance, argues for the need to consider the impact of learning methods that rely on highly processed features within the discussion on interpretability, and notes that methods that are traditionally considered to be more intelligible like linear regression often rely on extensive pre-processing.

  42. The reader familiar with the method of abstraction might worry at this point. When we shift the range of samples, we seem to be moving our attention to a different system or to a sub-system of the original system. We do not, however, seem to be adopting a different level of abstraction to model the same system. This is a fair, but ultimately misguided objection. I will come back to this issue later in this section.

  43. For a more critical and historically more nuanced appraisal of the reliance on aggregation when it comes to humans, see Hacking (1990 Chapter 19) and Desrosières (2010 Chapter 3).

  44. For the formal counterpart of this idea, see Barwise and Seligman (1997, sec. 4.4) on the type/token-duality.

  45. And in fact, I should then even endorse the stronger position that only a large-enough, unbiased sample of individuals is admissible.

  46. Notably, this worry does not arise when the relevant generalisations are widely known and accepted (Binns, 2018), even if the generalisations themselves are not universal.

  47. Most researchers do, however, agree that supervised learning (as well as other forms of data mining, and indeed most of experimental science) is primarily concerned with separating the signal from the noise in the data (Hildebrandt (2008a, 17, 26), Woodward (2015), Illari and Russo (2016, sec. 4.1), Bayamlioğlu and Leenes (2018, 298)), and would probably also concede that it can be hard to agree on where this line ought to be drawn.

  48. But see Daston (2004) for a counterpoint.

  49. These considerations appear very natural and convincing, but they are all too easy to discredit. The language we use to characterise the value of properties, such as ‘relevance’ and ‘informativeness,’ has acquired a specific meaning within the sciences. Our most successful mathematical accounts of relevance and information are well-suited to deal with uncertainty, but do not appear to capture the kind of considerations we are after. Shannon information is, for instance, only a fragment of Floridi’s informational map (Floridi 2005 Figure 1). Similarly, Dretske’s attempts to co-opt Shannon’s ideas to develop an information-based semantics and epistemology (Dretske 1999) had to overcome the conceptual gap between Shannon’s focus on infinite series of messages and his own interest in individual messages. As we proceed, we should keep in mind that characterisations of reasonable decisions as decisions that take into account all the relevant considerations do not force us to take the mathematical accounts of relevance for granted.

  50. This is one of the arguments given by Schauer (2003 Chapter 10) in favour of general rules. He explains that while the errors caused by a general rule are often obvious to us, there are many errors of individual judgement we do not notice which are corrected by abiding by a general rule. We give up the very best to also avoid the worst. A similar point is made by Stigler: We cannot select the most reliable measurement, so we should average our measurements (Stigler 2016, 14).
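
A toy simulation of the averaging point (invented numbers, not Stigler's own example): the mean of several noisy measurements is, on average, closer to the true value than a single measurement picked arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 10.0

# 10,000 repetitions of taking 5 noisy measurements each.
measurements = true_value + rng.normal(0.0, 1.0, size=(10_000, 5))

single_error = np.abs(measurements[:, 0] - true_value).mean()
average_error = np.abs(measurements.mean(axis=1) - true_value).mean()

print(round(single_error, 2))   # ~0.80: error of one arbitrary measurement
print(round(average_error, 2))  # ~0.36: error of the average of five
```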

  51. The detailed reconstruction of these arguments on the basis of what we already know about the various ‘levels problems’ in causality and statistics (Russo (2009), Illari and Russo (2014, sec. 5.3)) remains to be written.

References

  • Ananny, M. (2016). Toward an ethics of algorithms: Convening, observation, probability, and timeliness. Science, Technology & Human Values, 41(1), 93–117. https://doi.org/10.1177/0162243915606523.

  • Barwise, J., & Seligman, J. (1997). Information flow: The logic of distributed systems. Cambridge Tracts in Theoretical Computer Science, Vol. 44. Cambridge: Cambridge University Press.

  • Bayamlioğlu, E., & Leenes, R. (2018). The ‘rule of law’ implications of data-driven decision-making: A techno-regulatory perspective. Law, Innovation and Technology, 1–19. https://doi.org/10.1080/17579961.2018.1527475.

  • Binns, R. (2018). Algorithmic accountability and public reason. Philosophy & Technology, 31(4), 543–556. https://doi.org/10.1007/s13347-017-0263-5.

  • Bishop, C. (2006). Pattern recognition and machine learning. New York: Springer.

  • Bishop, M. A., & Trout, J. D. (2004). Epistemology and the psychology of human judgment. Oxford University Press.

  • Clarke, B., Gillies, D., Illari, P., Russo, F., & Williamson, J. (2013). The evidence that evidence-based medicine omits. Preventive Medicine, 57, 745–747. https://doi.org/10.1016/j.ypmed.2012.10.020.

  • Crawford, K., Miltner, K., & Gray, M. L. (2014). Critiquing big data: Politics, ethics, epistemology | Special section introduction. International Journal of Communication, 8, 1663–1672.

  • Custers, B. (2003). Effects of unreliable group profiling by means of data mining. In International Conference on Discovery Science (pp. 291–296). Berlin: Springer. https://doi.org/10.1007/978-3-540-39644-4_25.

  • Daston, L. (2004). Whither critical inquiry? Critical Inquiry, 30(2), 361–364. https://doi.org/10.1086/421133.

  • Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674.

  • Desrosières, A. (2010). La politique des grands nombres: Histoire de la raison statistique (2nd ed.). La Découverte.

  • Dretske, F. (1999). Knowledge and the flow of information. Stanford: CSLI.

  • Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS ’12) (pp. 214–226). New York: ACM. https://doi.org/10.1145/2090236.2090255.

  • Floridi, L. (2005). Semantic conceptions of information. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy.

  • Floridi, L. (2008). The method of levels of abstraction. Minds and Machines, 18(3), 303–329.

  • Floridi, L. (2011). A defence of constructivism: Philosophy as conceptual engineering. Metaphilosophy, 42(3), 282–304. https://doi.org/10.1111/j.1467-9973.2011.01693.x.

  • Floridi, L. (2014). Open data, data protection, and group privacy. Philosophy & Technology, 27(1), 1–3. https://doi.org/10.1007/s13347-014-0157-8.

  • Floridi, L. (2015). The politics of uncertainty. Philosophy & Technology, 28(1), 1–4. https://doi.org/10.1007/s13347-015-0192-0.

  • Gutwirth, S., & Hildebrandt, M. (2010). Data protection in a profiled world. Dordrecht: Springer. https://doi.org/10.1007/978-90-481-8865-9_2.

  • Hacking, I. (1990). The taming of chance. Cambridge: Cambridge University Press.

  • Hájek, A. (2007). The reference class problem is your problem too. Synthese, 156(3), 563–585. https://doi.org/10.1007/s11229-006-9138-5.

  • Harman, G., & Kulkarni, S. (2007). Reliable reasoning: Induction and statistical learning theory. MIT Press.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer.

  • Hildebrandt, M. (2006). Profiling: From data to knowledge. Datenschutz und Datensicherheit – DuD, 30(9), 548–552. https://doi.org/10.1007/s11623-006-0140-3.

  • Hildebrandt, M. (2008a). Defining profiling: A new type of knowledge? In M. Hildebrandt & S. Gutwirth (Eds.), Profiling the European citizen (pp. 17–45). Dordrecht: Springer. https://doi.org/10.1007/978-1-4020-6914-7_2.

  • Hildebrandt, M. (2008b). Profiling the European citizen. Dordrecht: Springer. https://doi.org/10.1007/978-1-4020-6914-7_2.

  • Hildebrandt, M. (2008c). Profiling and the rule of law. Identity in the Information Society, 1(1), 55–70. https://doi.org/10.1007/s12394-008-0003-1.

  • Hildebrandt, M. (2016). Law as information in the era of data-driven agency. Modern Law Review, 79(1), 1–30.

  • Hildebrandt, M. (2018). Law as computation in the era of artificial legal intelligence: Speaking law to the power of statistics. University of Toronto Law Journal, 68(supplement 1), 12–35. https://doi.org/10.3138/utlj.2017-0044.

  • Hildebrandt, M., & Gutwirth, S. (Eds.). (2008). Profiling the European citizen: Cross-disciplinary perspectives. Springer.

  • Illari, P., & Russo, F. (2014). Causality: Philosophical theory meets scientific practice. Oxford: Oxford University Press.

  • Illari, P., & Russo, F. (2016). Information channels and biomarkers of disease. Topoi, 35(1), 175–190. https://doi.org/10.1007/s11245-013-9228-1.

  • Kerr, I., & Earle, J. (2013). Prediction, preemption, presumption: How big data threatens big picture privacy. Stanford Law Review, 66, 65–72.

  • Kraemer, F., van Overveld, K., & Peterson, M. (2010). Is there an ethics of algorithms? Ethics and Information Technology, 13(3), 251–260. https://doi.org/10.1007/s10676-010-9233-7.

  • Kvanvig, J. L. (2003). The value of knowledge and the pursuit of understanding. Cambridge: Cambridge University Press.

  • Lichman, M. (2013). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml.

  • Lipton, Z. C. (2016). The mythos of model interpretability. In 2016 ICML Workshop on Human Interpretability in Machine Learning. arXiv:1606.03490.

  • McCarthy, J. (1986). Applications of circumscription to formalizing common-sense knowledge. Artificial Intelligence, 28(1), 89–116. https://doi.org/10.1016/0004-3702(86)90032-9.

  • Mittelstadt, B. (2017). From individual to group privacy in big data analytics. Philosophy & Technology, 1–20. https://doi.org/10.1007/s13347-017-0253-7.

  • Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2). https://doi.org/10.1177/2053951716679679.

  • Müller, A. C., & Guido, S. (2017). Introduction to machine learning with Python. Sebastopol: O’Reilly Media.

  • Parkkinen, V.-P., Wallmann, C., Wilde, M., Clarke, B., Illari, P., Kelly, M. P., Norell, C., Russo, F., Shaw, B., & Williamson, J. (2018). Evaluating evidence of mechanisms in medicine. SpringerBriefs in Philosophy. Cham: Springer. https://doi.org/10.1007/978-3-319-94610-8.

  • Russo, F. (2009). Causality and causal modelling in the social sciences: Measuring variations. Springer.

  • Schauer, F. (2003). Profiles, probabilities, and stereotypes. Belknap Press of Harvard University Press.

  • Schermer, B. W. (2011). The limits of privacy in automated profiling and data mining. Computer Law & Security Review, 27(1), 45–52. https://doi.org/10.1016/j.clsr.2010.11.009.

  • Stigler, S. M. (2016). The seven pillars of statistical wisdom. Cambridge: Harvard University Press.

  • Taylor, L., Floridi, L., & van der Sloot, B. (Eds.). (2017). Group privacy. Cham: Springer. https://doi.org/10.1007/978-3-319-46608-8_2.

  • Vedder, A. (1999). KDD: The challenge to individualism. Ethics and Information Technology, 1(4), 275–281.

  • Wachter, S., & Mittelstadt, B. (2019). A right to reasonable inferences: Re-thinking data protection law in the age of big data and AI. Columbia Business Law Review, forthcoming.

  • Wallmann, C., & Williamson, J. (2017). Four approaches to the reference class problem. In G. Hofer-Szabó & L. Wroński (Eds.) (pp. 61–81). Cham: Springer. https://doi.org/10.1007/978-3-319-55486-0_4.

  • Woodward, J. (2015). Data, phenomena, signal, and noise. Philosophy of Science, 77(5), 792–803.

  • Zarsky, T. (2014). Understanding discrimination in the scored society. Washington Law Review, 89(4), 1375–1412.

  • Zarsky, T. (2016). The trouble with algorithmic decisions: An analytic road map to examine efficiency and fairness in automated and opaque decision making. Science, Technology & Human Values, 41(1), 118–132. https://doi.org/10.1177/0162243915605575.


Author information

Correspondence to Patrick Allo.



About this article


Cite this article

Allo, P. The Epistemology of Non-distributive Profiles. Philos. Technol. 33, 379–409 (2020). https://doi.org/10.1007/s13347-019-00360-z

