1 Introduction: The Ethics of Biomedical Data Analytics

In modern information societies, individuals generate streams of diverse and potentially valuable data. Digital technologies now easily and routinely record data about the behaviours and preferences of individuals at an unprecedented scale. Analytic techniques to make sense of this glut of data have grown in parallel, ushering in what some have called the age of ‘Big Data’.

Data analytics at scale provide huge opportunities to improve private and public life, especially in the health sector. In biomedical research and development, the analysis of large datasets (or ‘Biomedical Big Data’; henceforth BBD) has become a major driver of innovation and success, with partnerships between private data-intensive firms and public health bodies increasingly common (Powles and Hodson 2017). Potentially insightful health-related data can now be generated via social media applications and health platforms (Lupton 2014; Costa 2014), emerging ‘personal health monitoring’ technologies (Mittelstadt et al. 2014), home sensors (Niemeijer et al. 2010), smartphone applications, online forums, and search queries. These new data sources complement traditional repositories consisting of aggregated clinical trial data (Costa 2014), genetic and microbiomic sequencing data (McGuire et al. 2008), biological specimens, electronic health records, and administrative hospital data.

The value of Big Data analytics stems from the seemingly unlimited opportunities now available to link, aggregate, and search across diverse datasets to identify ‘small patterns’ or connections between phenomena and people (Floridi 2012). BBD provides new ways of understanding health and well-being at the level of the individual and society, for example by predicting behaviours, monitoring diseases and outbreaks, and providing risk stratification for individual patients. Possible applications include development of clinically useful predictive models (Choudhury et al. 2014), longitudinal and cross-sectional assessment of the effectiveness and efficiency of interventions and organisations (Tene and Polonetsky 2013), and longitudinal monitoring of chronic conditions and well-being (Boye 2012). Epidemiology (Mittelstadt et al. 2017), infectious disease research, and genomics and genetics (Heitmueller et al. 2014; Kaye et al. 2012) are already deeply affected.

While huge potential exists to advance the diagnosis, treatment, and prevention of diseases as well as to foster healthy habits and practices (Costa 2014), the inherent sensitivity of health-related data and the implicit vulnerability of patients (Pellegrino and Thomasma 1993) pose ethical risks which cannot be ignored. The unprecedented volume and variety of data now available to these sectors challenge accepted social, ethical, and professional norms. Further, the growing reliance on algorithms to analyse these data and to reach decisions, together with the gradual reduction of human oversight over many automated processes, raises pressing issues of fairness, responsibility, and respect for human rights. As is often the case with rapid scientific and technological progress, understanding of these challenges lags behind.

These issues can be addressed successfully. However, if they are overlooked, underestimated, or left unresolved, they risk hindering the innovation and progress that BBD can bring to society at large and to future generations. Furthermore, as recent events involving the NHS care.data programme show, BBD projects may face a double bottleneck: ethical mistakes or misunderstandings may lead to social rejection or distorted legislation and policies, which in turn may cripple the acceptance and advancement of data science. As in the public debate over genetically modified organisms (Devos et al. 2008), potentially beneficial projects may be put at risk through association with problematic applications.

Ethical foresight, incorporated at all stages of BBD initiatives, can help distinguish the good from the bad. Attention must be paid to known issues with Big Data analytics, covering topics such as informed consent, privacy, confidentiality, diversity, data ownership, digital divides, collective rights, and inclusive governance of research data (Mittelstadt and Floridi 2016; Mittelstadt 2017; Taylor, Floridi, and van der Sloot 2017). Proactive research and governance addressing these issues can help to understand impact, anticipate risks and unethical consequences, suggest early interventions to avoid or mitigate them, foster resilience, reinforce ethical goals and outcomes, and ensure that ethical best practices are developed, implemented, and appreciated.

To contribute to this critical step, this special issue of Philosophy and Technology aims to map new, under-researched but important issues, concepts, and cases that should inform proactive ethical assessment of emerging BBD applications and services. The papers contained within map and critically assess the current and potential ethical challenges facing Big Data in biomedicine.

To begin, two contributions examine the impact of personal health-monitoring devices on user autonomy and agency. John Owens and Alan Cribb argue that personal health devices such as the ‘FitBit’, which claim to help users live healthier lives by monitoring behaviour and feeding back information to promote healthy decisions, may instead expose users to risks of anxiety, stigma, and reinforcement of health inequalities. Although such devices provide potentially useful information, the authors doubt that they actually contribute to users’ autonomy in controlling or improving their health. To make this case, they distinguish between procedural (or deliberative) and relational (or action-oriented) notions of autonomy. On this view, wearable technologies seem only to provide information that could be useful for making decisions about one’s lifestyle, while doing little to enhance actual opportunities to act to improve health.

In contrast, Nils-Frederic Wagner introduces the notion of ‘patiency’ as a correlate to user agency. Health-monitoring devices are often thought to persuade or nudge users paternalistically towards health-promoting behaviours, which would seem to undermine the user’s agency and autonomy. However, by employing the lens of the extended mind and extended will framework, Wagner argues that this portrayal of mHealth is misleading. While mHealth may render the agent passive through the receipt of technological commands, patiency should be viewed not as a foil to agency, but rather as its correlate. From this perspective, mHealth can simultaneously promote patiency by nudging behaviour and serve as an effective technological tool to enhance user agency.

Concerns about autonomy are also reflected in four contributions looking at the impact of BBD on consent, trust, and data governance across different application areas. J. Patrick Woolley examines the role of trust and justice in biomedical Big Data analytics at a general level. Trust is frequently cited as a key value in data governance policy and oversight mechanisms, yet it is often poorly grounded philosophically. Woolley argues that this is a key gap in existing scholarship, as different approaches to trust align differently with policy and governance structures. He unpacks how different philosophical notions of trust relate to traditional bioethical concepts and related laws, and how they affect the striking of a fair balance between individual and group interests in the sharing and re-use of data in BBD.

Chiara Garattini and her co-authors similarly address the ethical governance of BBD in general through lessons learned from a particular application area: infectious disease. They argue that BBD in infectious disease research and management is marked by new models of data accumulation which introduce four areas of ethical concern: the impact of (1) automation on autonomy, (2) complexity in Big Data analytics on informed consent, (3) profiling on identity and justice, and (4) greater population-level surveillance and interventions on behavioural norms and practices. Given the importance of these types of impact, proactive ethical assessment is urgently needed in infectious disease research and management to ensure responsible development, deployment, and societal acceptance of BBD.

Elvira Perez Vallejos and her co-authors reflect on their experiences of accessing online data from a youth web-counselling service for research. Digital mental health services pose distinctive ethical challenges for BBD due to the inherent sensitivity of the data in question, and these challenges become especially acute when coupled with a vulnerable user population. The authors argue that close attention must be paid to users’ expectations of how their data will be re-used, specifically with regard to whether they perceive the data as public, private, or open. They propose concrete recommendations for conducting online research involving vulnerable populations, including a collaborative approach to data governance and access, and explicit opt-in and opt-out recruitment strategies.

Sebastian Schleidgen and his co-authors look at privacy risks in genomics research. Among the myriad data types utilised in BBD, genomics data poses unique risks of re-identification of participants. To help improve informed consent processes, the authors conducted a qualitative focus group study in which patients and physicians at the National Center for Tumor Diseases were asked to assess the informational risks of participating in genomics research. The authors concluded that truly informed consent in genomics research requires (1) comprehensive disclosure of informational risks to participants, (2) independent governance entities, and (3) data sharing policies that offer guidance for physicians and researchers.

The final two contributions to the special issue examine the impact of new sources of information on equality and privacy in the delivery of medical care. Kristin Voigt considers how ‘social determinants of health’, which can generate inequalities in health outcomes, should be taken up by primary care providers. Information regarding the health of populations in specific geographical areas can increasingly be built from the ground up in BBD. Applying this population-level information in the care of individual patients can provide greater insight into social determinants of health, but may also pose privacy and equality risks when applied to individuals. Voigt provides a nuanced critique of the relationship between individual care and population-level medical knowledge to ensure BBD is deployed equitably in primary care.

Finally, Michele Loi examines the emergence of the ‘digital phenotype’, an extended human phenotype consisting of digital data (e.g. tweets, Facebook posts, web search queries) from which medical conditions can be inferred and predicted. This phenomenon allows generalizable knowledge to be created from individual records and applied to others perceived to be similar. Loi suggests that ethical obligations are owed to all individuals affected by this knowledge, not only to those involved or identified in its creation. This philosophical critique of privacy and identity in the age of biomedical Big Data analytics thus stands in contrast to current privacy and data governance policy centred on the notion of ‘personal data’ linked to an identified or identifiable individual.