Introduction

We currently witness converging technological macrotrends—big data, advanced machine learning, and consumer-directed neurotechnological devices—that will likely lead to the collection, storage, and analysis of personal brain data on a large scale. In basic and applied neuroscience, this impending age of “Big Brain Data” may lead to important breakthroughs, particularly for our understanding of the brain’s structure and function, for identifying new biomarkers of brain pathology, as well as for improving the performance of neurotechnological devices (such as brain-computer interfaces, BCIs). But the same technology, when applied in consumer-directed neurotechnological devices, whether for entertainment, the interactive use of web services, or other purposes, may lead to the uncontrolled collection and commodification of neural data that may put vulnerable individuals at risk with respect to the privacy of their brain states.

Big data refers to collecting and storing vast amounts of data, for example from wearable devices (e.g. “fitness trackers”), electronic health records, or our online footprint from using web-based software services. This growing mountain of data, however, would not be of much use were it not for the advanced machine learning algorithms, specifically artificial neural networks (ANN) for “deep learning” and related methods, that are now available for analyzing it. Most of the personal information that web-based software companies gather today is based on our voluntarily submitting our data—mostly by yielding to convoluted and largely inscrutable “end-user license agreements” (EULAs). What we do not yet have on a large scale, but what many device and software companies are now actively developing, are consumer-directed wearable devices for recording and uploading our brain activity, mostly based on electroencephalography (EEG) [1, 2]. In combination with other wearable sensors for tracking biometric data, these devices will provide particularly rich multivariate data troves for the “personal sensing” of an individual “physiome”, for the (online) decoding of a person’s (neuro)physiological state and behavior [3], and for making predictions about future states or behavior, an application studied particularly intensively in the area of mental health [4,5,6,7]. Meanwhile, companies are using powerful algorithms for “deep learning” to create facts on the groundFootnote 1 and are investing heavily in leveraging these methods for consumer and health-care applications, especially in basic and clinical neuroscience [9, 10].

This “datafication” [11] across all areas of research and technological development—in which data not only refers to but enacts and guides social life [12]—puts established modes of normative reflection, deliberative value formation, and legislative and policy response under pressure. At the same time, finding sustainable political and legislative responses to this transformation and hedging the relentless stream of highly personalized data against misuse and exploitation is becoming more and more difficult. For one, in order to understand the benefits and risks and then formulate an adequate regulatory response, lawmakers and politicians (as well as the general public that elects these officials) need to have at least a basic understanding of the complexity of the technologies involved. This is important so that governments (or supranational bodies) neither succumb to indiscriminate techno-alarmism and pass “laws of fear” [13] that stifle important scientific and technological progress, nor display a blind techno-enthusiasm that ignores or downplays important risks.

Preferably, a democratic society should provide the necessary space and time for an inclusive and participatory bottom-up deliberative process that involves all stakeholders in the debate on how to regulate and govern the use of personal brain data. In the spirit of such a “reflexive modernization” [14]—that is, not taming (human) nature with technology (a defining feature of industrialization and the modern era), but shaping technology through user-centered and value-based design—I will discuss some important ethical and legal ramifications of this profound technological transformation.

Specifically, the aim of this paper is to give a comprehensive overview of (a) big data and machine learning as the driving technologies behind the “datafication” of basic and clinical neuroscience and consumer-directed neurotechnology, and (b) some pertinent ethical, legal, social and political challenges that arise from the collection, storage, and analysis of large amounts of brain data from clinical and consumer-directed neurotechnological devices.

Of the many threads and challenges that this emerging techno-social constellation offers, I will focus here on the normative implications, both from an ethical and legal perspective, of big brain data. To this end, I will examine ethical and legal implications in areas in which I believe emerging big data / machine learning applications will have a particularly profound influence. The selection of topics—such as the privacy of brain data, or the problem of bias in machine learning—is therefore motivated mostly by the likely impact of the technological transformation rather than inherent commonalities between these areas of concern (e.g. in terms of ethical theory or political philosophy).

Potential Benefits and Risks of Big Data Analytics in Basic and Clinical Neuroscience

A Brief Introduction to Big Data and Advanced Machine Learning

Before detailing the current use of big data and machine learning in basic and clinical neuroscience, let me first provide some brief definitions of recurring concepts and techniques from computer science:

Artificial intelligence (AI) is a term in computer science and robotics that refers to an embodied (machine/robot) or non-embodied (software program) system that can reason, learn, and plan, and that exhibits behavior we associate with biological intelligent systems (such as humans) [15].

Big data refers to the collection and/or systematic storage of large amounts of (labeled or unlabeled) data for the purpose of finding hitherto unknown patterns, relationships or other informative features by computational analysis, often involving advanced machine learning algorithms.

Machine learning refers to a programming approach in computer science in which a program’s behavior is not fully determined by its code but adapts (i.e. learns) based on the input data (“learning without being programmed”). The first such program, Arthur Samuel’s checkers player from 1959, foreshadowed the whirlwind successes of recent deep learning networks in beating humans at games.

Deep learning is a particular variant of machine learning that is typically based on artificial neural networks (ANN). A typical ANN architecture consists of interconnected nodes – representing artificial neurons – arranged in an input layer, several hidden layers and an output layer. In the hidden layers, the data from the input layer undergo linear or nonlinear transformations multiple times (hence “deep”). The power of ANNs for solving data-driven tasks like pattern recognition lies in their ability to learn representations of the data at successive levels of abstraction across the stacked hidden layers. Specific variants of such deep learning architectures, for example convolutional neural networks (ConvNet), have recently been particularly successful in applied machine learning across many research fields (such as neuroscience [16, 17]) and industrial sectors. Historically, many machine learning algorithms were developed to address pattern recognition and classification problems in computer vision and speech recognition. Accordingly, detecting features and classes in large collections of images is still one of the most widely used applications.
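To make the layered structure concrete, the following minimal sketch (my own illustration in Python/NumPy, not drawn from any of the cited studies) shows a single forward pass through a small network with one hidden layer; the layer sizes, the random weights and the ReLU nonlinearity are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 8 input features -> 16 hidden units -> 2 output classes.
# In practice the weights would be learned from data; here they are random.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)

def relu(x):
    # Elementwise nonlinear transformation applied in the hidden layer
    return np.maximum(0.0, x)

def softmax(z):
    # Turns the output-layer scores into class probabilities
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """One forward pass: input layer -> hidden layer -> output layer."""
    h = relu(x @ W1 + b1)         # hidden representation of the input
    return softmax(h @ W2 + b2)   # class probabilities

x = rng.normal(size=(1, 8))       # a single, made-up input sample
print(forward(x))                 # e.g. [[0.82, 0.18]]
```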

Factors Determining the Scope and Limits of Machine Learning Approaches to Data Analysis

Across all of the many different problems, or “use-cases” for which advanced machine learning methods are now employed, we find some commonalities that define the power and limits of these methods:

  • Deep learning works particularly well in data-rich (big data) environments for recognizing patterns and generating predictions, tasks that are generally difficult, very time-consuming or even impossible for humans. Imagine you were asked to differentiate between thousands of animal species by looking at millions of animal images in a short time or learn to play world-class Go by mining databases with millions of recorded games and moves.

  • Scalability, the ability to apply algorithms to very large amounts of data while retaining reasonable computation times and storage requirements, is another important feature of recent advances in applied machine learning. In data-rich environments, scalable machine learning algorithms become ever more accurate and more usable with increasing data size.Footnote 2

In spite of these impressive achievements, there are still significant challenges and limits for advanced machine learning:

As we have discussed, advanced machine learning is particularly powerful for analyzing large amounts of data. Consequently, in all scenarios in which only limited data are available, these methods are substantially less effective. For clinical applications, rare diseases or rare genotypes are examples of such data paucity.

One important challenge, therefore, is to devise the right computational model for any particular use-case given the data at hand. This manual tinkering, which also includes labeling and/or annotating the data for learning and setting so-called hyperparameters, takes considerable human resources, knowledge and time, and is error-prone.

The effectiveness of machine learning for data analysis and classification also relies on finding the optimal learning scenario for any given problem. For clinical use-cases, the most effective applications of advanced machine learning have so far relied on a so-called supervised (or semi-supervised) learning scenario and clinical questions related to digitized images. In supervised learning, an algorithm trains with labeled data, for example magnetic resonance images (MRI) of the brain that have been labeled as either normal or abnormal by a radiologist. After learning, the algorithm then analyzes a new data set and can identify abnormal images with high precision.Footnote 3
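As a minimal sketch of such a supervised learning workflow (assuming scikit-learn is available; the synthetic feature vectors below merely stand in for image-derived features, and the 0/1 labels play the role of “normal”/“abnormal” annotations by a radiologist):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled (e.g. image-derived) training data
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)        # supervised training on labeled examples
y_pred = clf.predict(X_test)     # classification of previously unseen data
print("accuracy:", accuracy_score(y_test, y_pred))
```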

Finally, most current machine learning programs still have difficulties with transfer learning, applying knowledge extracted from one set of problems to a new challenge [19]. An algorithm that is effective for classifying brain images may not perform particularly well on other types of data.

Benefits in Using Big Data and Advanced Machine Learning in Basic and Clinical Neuroscience

For an informed risk-benefit-analysis, it is important—in my view—to appreciate the actual and potential benefits for patients that big data and advanced machine learning may offer in basic and clinical neuroscience.Footnote 4

While my observations here focus on the area of neuroscience, we should acknowledge that advanced machine learning has revolutionized basic and clinical research across all areas of biomedicine and turbocharged the emerging field of “precision medicine” [20]. To provide just a few recent examples: such algorithms have been shown to achieve dermatologist-level accuracy in classifying skin lesions as cancerous [21], to predict the outcome of antiepileptic drug treatment [22], and to predict the prognosis of small-cell lung cancer from images of pathological tissue samples [23].

For basic and translational neuroscience, too, these methods, particularly deep learning, have yielded important advances. Researchers from the University of Freiburg (Germany), for example, used convolutional neural networks (ConvNets) for deep learning in 2017 to decode movement-related information from EEG data [16] and to operate an autonomous robot [24]. ConvNets for deep learning were also successfully used to predict signal processing in primate visual cortex [25] and to predict human brain responses (from functional magnetic resonance imaging, fMRI) to natural images [26].
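As a purely schematic illustration of such a decoding approach (this is not the architecture used in [16]; channel count, window length and layer sizes are invented for the example, and PyTorch is assumed to be available), a small ConvNet mapping a multichannel EEG window to movement-class scores might look like this:

```python
import torch
import torch.nn as nn

class TinyEEGConvNet(nn.Module):
    """Toy ConvNet: (batch, channels, time) EEG windows -> class scores."""
    def __init__(self, n_channels=32, n_samples=250, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=11, padding=5),  # temporal filters
            nn.ReLU(),
            nn.AvgPool1d(4),
            nn.Conv1d(16, 32, kernel_size=11, padding=5),
            nn.ReLU(),
            nn.AvgPool1d(4),
        )
        self.classifier = nn.Linear(32 * (n_samples // 16), n_classes)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

eeg = torch.randn(8, 32, 250)      # 8 simulated 1-second EEG windows at 250 Hz
logits = TinyEEGConvNet()(eeg)     # one score per movement class
print(logits.shape)                # torch.Size([8, 4])
```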

In clinical neuroscience and translational neurotechnology, we see similar advances in leveraging advanced machine learning for diagnostic classification and for predicting disease outcomes or therapeutic responses. The computing revolution mentioned above results in a surge of digital health-related data, from single data points (e.g. lab parameters), to continuous data from monitoring devices over days or weeks (such as continuous ECG monitoring from intensive care or EEG in epilepsy centers) to complete electronic health records (eHR), which can be used for data and text mining [27]. It is therefore not difficult to imagine how these streams of data may inform advanced computational analyses and even outperform human diagnosticians in many medical disciplines.

Most recent efforts in leveraging advanced machine learning in clinical neuroscience have been particularly fruitful for neuroimaging-based diagnosis (and/or prediction) in neurology and psychiatry. In neurology, such approaches have already been able to detect morphological brain changes typical of Alzheimer’s disease from neuroimaging [28, 29], to predict brain tumor response to chemotherapy from brain images [30], and to distinguish typical from atypical Parkinson’s syndromes [31]. In psychiatric research, examples of leveraging machine learning include the prediction of outcomes in psychosis [32], of the persistence and severity of depressive symptoms [33], and of suicidal behavior [7].

Risks of Big Data Analytics and Advanced Machine Learning in Basic and Clinical Neuroscience Research

While acknowledging the many actual and potential benefits of big data analytics with advanced machine learning, it is equally important to discuss some inherent risks of this approach. The sheer scale of the technological transformation discussed here across so many sectors of society naturally invites scrutiny and caution in risk assessment. With an eye on the scope of the paper, however, I will focus on identifiable and concrete risks to individuals, particularly patients and research subjects (rather than transformative effects on society as a whole).

For the time being, I see no immediate risks to the momentary well-being of research subjects during experiments in which big data and advanced machine learning are later used for “offline” analysis of brain data. In such typical research scenarios, data are collected from many individuals, collated on local servers or in cloud-based data repositories, and then analyzed, for example with advanced machine learning algorithms. The collection and storage of neural (and other) personal data does, however, carry certain risks with respect to data privacy, which I will discuss next. In subsequent sections, I will then examine how real-time interaction between users and a neurotechnological device, particularly in closed-loop systems, may affect the autonomy, sense of agency and other aspects of a user’s experience.

Some Ethical and Legal Implications of Big Brain Data

The development described above will quite likely have a transformative effect on research practices in neuroscience as well as on clinical neurology and psychiatry. Likewise, consumer-directed neurotechnological devices will create new ways in which users may interact with systems for entertainment, personal computing and mobile devices. Of the many ethical and legal challenges that emerge from this techno-social constellation, I will limit my analysis here to a few issues that I find particularly pressing—fully acknowledging that this selection of topics is neither comprehensive nor representative of anything other than my current personal interests.

On the Security of Neurotechnological Devices and the Privacy of Brain Data

First of all, storing an individual’s brain data on local or web-based servers / repositories makes these data vulnerable to unintentional data exposure, intentional data leaks and (cyber) attacks (“hacking”). Furthermore, cross-referencing biometric data with other types of data may allow for the de-anonymizationFootnote 5 of personalized data—i.e. exposing the identity of research subjects or patients. This de-anonymization may then leave individuals vulnerable to identity theft or other criminal acts by third persons (e.g. holding a person to ransom by threatening to release potentially damaging information, such as on brain pathology) [34].

While this is a general problem when data records from research participants (or patients) are stored electronically, the highly personalized nature of brain data (see e.g. the possibility for “brain fingerprinting” [35])—much like genomic data—may increase the identifiability of individuals.

Case Example: Deep Learning for Brain-Computer Interfacing in a Severely Paralyzed Patient

For the individual user of a clinical or consumer-directed device that processes large amounts of neural data, we may consider the following scenario as an example of the importance of the privacy of brain data.

In the case of a patient with locked-in syndrome—that is, severe paralysis caused by extensive damage to the brainstem—who uses a spelling system operated by a brain-computer interface that relies on deep learning for analyzing her neural data, the following concern might apply:

If a BCI spelling system were used consistently for some time, the BCI user would quite likely have conversed with different people at different times: with relatives, nurses, doctors, friends, visitors and others. These conversations will have differed in topic and in their level of intimacy. Perhaps the patient would not mind if a mundane conversation with her nurse about adjusting the bed were read by another person, but she might very well object if an intimate discussion with her husband, for example about her fear of death, were read by anyone else. Furthermore, as we have discussed above, combining the continuous neural recordings with the spelling content provides a powerful source for unmasking the user’s identity.

Therefore, limiting and securing access to the patient’s data is an important prerequisite for preserving privacy. For the log files, we might ask, for example, whether they should be preserved at all, deleted after a pre-specified time, or remain fleeting—as our spoken conversations usually are. Or should they be recorded permanently but be accessible only to the BCI user via a password? What happens when these records become relevant in a legal context? Imagine that the locked-in patient is no longer able to use the BCI and a medical emergency occurs, for example a life-threatening pneumonia requiring artificial ventilation in an induced coma. Now the husband of the patient—having become her legal representative via an advance directive—wishes no further treatment, claiming this to represent the wish of the patient. The doctor, however, remembers a conversation with the patient a couple of weeks before, when she was still able to use the BCI, in which she told him to treat any medical emergencies exhaustively. If the case is brought before a judge, will the judge have the right to subpoena the BCI spelling log files?

Neurohacking and the Emergence of “Neurocrimes”

Another important threat to data privacy and security for the individual patient / user is posed by “neurohacking”.Footnote 6 If the BCI system, in the case presented here, was connected to a web-based cloud server for storing and analyzing the brain recordings, the data could get exposed either intentionally (e.g. a rogue employee of the server company who sells the data), released accidentally or be accessed and/or stolen via hacking into the server.

The feasibility of hacking such active medical devices has already been demonstrated for implantable cardioverter-defibrillator (ICD) systems [36,37,38]—Halperin et al. [38] demonstrate how to use equipment from a general electronics store to remotely hack into a wireless ICD—and it seems likely that BCI systems could be equally vulnerable to electronic attacks.

Such unwarranted access to one’s neural recordings and other types of personalized information (e.g. the spelling logs of the BCI system) would be a valuable data trove for persons with malicious intent. For example, a hacker could use the highly personalized information for holding a person to ransom (by threatening to release the personal information) or could disable the BCI and demand a ransom for unlocking the device and/or its operating software (“ransomware”). In a BCI system that is used for controlling a robotic prosthesis, a hacker could similarly take control of the prosthesis and threaten or cause harm to the user or other persons. Therefore, safeguarding these highly personalized biometric (and other) data and ensuring device and system security are important areas of concern and merit an in-depth examination from a legal, forensic and technological perspective to prevent such potential “neurocrimes”.

Technological Barriers and Opportunities for Safeguarding Neurotechnological Devices and Brain Data

Given this importance of safeguarding neurotechnological devices as well as servers and software programs for processing brain data, let us first briefly look at the main technological barriers.

First, the collection, aggregation and (real-time) analysis of large amounts of data requires massive storage and processing units, which today are most often provided by services for server-based cloud computing.Footnote 7 While personal devices can be secured quite effectively against unwarranted access,Footnote 8 cloud-based software repositories are much more difficult to secure. Many of the technology giants that are moving into the consumer neurotechnology market—such as Facebook and Google—have traditionally been software rather than hardware-based enterprises [40]. Therefore, it remains to be seen whether these companies can develop strong safeguards at the hardware level to secure such devices from unwarranted access.

At the software level, Alphabet and other companies are very active in developing new paradigms for securing personal data. One particularly ingenious idea is the concept of federated learning. In federated learning, the algorithm for machine learning with a person’s neural data would operate locally on the neurotechnological device and only share certain, non-personalized, inferences on the data with a central server for further data processing [41]. Such a local encapsulation, when coupled with strong device-level hardware and software security and strong encryption of the transferred data, could make such a system much less vulnerable to device hacking and cyber-attacks. Similarly, other technologies, such as blockchain and “differential privacy”Footnote 9 [42] could be used for the granular auditing and tracking of brain data.
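To illustrate the basic idea (a conceptual sketch of federated averaging, not the actual implementation described in [41]; data, model and hyperparameters below are invented), each device fits a simple model on its own, never-shared data, and only the resulting model parameters are sent to the server, which averages them into a new global model:

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])   # "ground truth" used only to simulate the devices

def local_update(global_w, n=50, lr=0.1, steps=20):
    """Gradient steps on one device's private data; only weights leave the device."""
    X = rng.normal(size=(n, 2))                     # private local data, never uploaded
    y = X @ true_w + 0.1 * rng.normal(size=n)
    w = global_w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n            # least-squares gradient
        w -= lr * grad
    return w

global_w = np.zeros(2)
for round_ in range(5):                             # communication rounds
    client_weights = [local_update(global_w) for _ in range(10)]
    global_w = np.mean(client_weights, axis=0)      # server only ever sees model weights
    print(round_, global_w)
```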

Some observers have noted, however, that it might not be ideal if users in the future have no choice but to leave both their data and the responsibility for data safety in the hands of one company [43]. An alternative could be the creation of so-called data banks, companies that specialize in data security and act as intermediaries for using brain data for research, clinical, or consumer purposes. While this idea merits further investigation, I would, as others have, worry that such a system could also abet the privatization and commodification of health and biometric data [44].

With respect to legislative and regulatory implications, I would argue that legislators should therefore mandate strong security requirements, such as device-level encryption and hardware protection, end-to-end encryption for data transfer, as well as methods for auditing data trails for clinical and consumer-directed neurotechnological devices.

Privacy of Personal Brain Data: Psychological and Social Barriers and Opportunities

In addition to the technical challenges, there are also important psychological and social barriers for safeguarding brain (and other personal) data from unwarranted access.

From the formative, comparatively open years of the early internet to today’s cordoned-off web dominated by oligopoly and “data capitalism”, user attitudes towards data privacy as well as the political and legal frameworks for the security of personal data have changed substantially. The current fabric of the web is characterized by a triad of corporatization, commercialization and monopolization in the provision of content and services, in which personal data has become the most important commodity. Surveys in the U.S. suggest that this commodification of personal data and repeated instances of massive data leaks shape the privacy concerns of internet users. While the main topic of internet users’ concern—the disclosure and trading of personally identifiable information (PII)—has not changed over time (according to a study comparing 2002 and 2008 [45]), the level of concern has indeed risen substantially in this period.

One would think that this gradual “Snowdenization”, the rising awareness and concerns in society about the mass collection, dissemination and misuse of PII, would perhaps have created a fertile ground for a level-headed and evidence-based debate about the future handling of personal brain data. Yet, at the same time, don’t we often wonder why users of online software services seem, on average, to care little about their personal data trails? From an individual psychology point of view, it seems that on social media the actual (or perceived) psychological rewards for using the services often outweigh the possible threats to privacy for the users [46].

Furthermore, the near ubiquitous use of social media for communication may impel vulnerable individuals, such as teenagers or individuals with psychiatric disorders (e.g. social anxiety or depression), to use these services to avoid social exclusion or ostracism and thereby compromise on possible privacy concerns.

Another psychological barrier could be that, still today, many services enroll users in data sharing by default, requiring them to actively opt out rather than opt in to the use of their personal data. Furthermore, even where users must actively consent, the EULAs of web services are often difficult to understand and navigate [47]. Moreover, opting out of data sharing with the service providers may also worsen the usability and consumer experience (or even prevent the use of these services altogether).

Counteracting these social and psychological pressures would require restructuring many basic design and programming features of device- and web-based software services. For a start, we may need to consider moving from an opt-out to an opt-in environment in any context in which sensitive personal information, particularly biodata (and especially brain data), is being transferred. Furthermore, companies could be incentivized (if not obliged) to improve the EULAs of such services, allowing for a granular and transparent consent process for users. Moreover, in order to move from opaque, black-box neurotechnology to transparent systems, users and patients should have the right to know whenever they interact with an intelligent system, who trained the system, and which data were used for training it.

Transparent EULAs in and of themselves are a necessary but not sufficient step towards improving the consent process for ceding personal brain data, however. This needs to be complemented, in my view, by increasing the average level of basic understanding of the capabilities and limitations of big data and advanced machine learning—“data literacy”— in society. Many educational initiatives in different countries already work toward this goal but I would submit here that the “long view” in shaping future educational policies should include brain data as an emerging (and perhaps special) class of bio(medical) data (and commodity) [48].

Further questions such as whether and to what degree this juridification of the processing of brain data should remain solely in the hands of national governments and/or should also be codified in international treaties and international public law, as well as whether brain data are a different kind of biometric data that may require special “neurorights” [49] exceed the scope of my discussion here, but are certainly important issues for further (comparative) legal scholarship.

Decision-Making and Accountability in Intelligent Closed-Loop Neurotechnological Devices

When humans and intelligent medical devices work in concert—take a closed-loop brain-computer interface that uses deep learning for decoding a user’s EEG data—mutual adaptivity may greatly increase the effectiveness of the intended use, for example by increasing the decoding performance over time. Advanced machine learning, moreover, is also a powerful method for analyzing brain data, such as EEG or fMRI, in real time (“online”), for example to control a robot with brain activity [24]. As such closed-loop interaction unfolds in real time, there is the risk that the output of highly adaptive algorithms for deep learning—which are by their nature evolving and thus unpredictable—may harm participants or patients. In cases in which such intelligent closed-loop devices not only decode neural data for specific purposes but also actively interfere with brain states, for example by delivering electrical stimulation to the cortex, and in which the decision whether and at what intensity to stimulate is determined solely by the device, the system gains decision-making capacity. Elsewhere, together with colleagues, I have discussed the problem of an “accountability gap” that may arise in such cases, in which a (semi)autonomous intelligent system is granted decision-making capacity based on an evolving and adaptive algorithm [50]. In that paper, we argued that the regulatory process for approving such closed-loop neurotechnological devices should take these possible effects into account and that further research into these effects is crucial—whether the devices are intended for medical or non-medical use.
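To make the structure of such a closed-loop decision explicit, here is a deliberately simplistic sketch (entirely invented: the signal model, “decoder” and thresholds are placeholders, not any real device’s logic) in which the device alone decides whether and how strongly to stimulate, while logging its decisions for later auditing:

```python
import numpy as np

rng = np.random.default_rng(2)

def read_neural_sample():
    # Stand-in for a real-time EEG/ECoG measurement
    return rng.normal()

def decode(history, window=10):
    # Toy "decoder": running mean over the most recent samples
    return float(np.mean(history[-window:]))

history, decision_log = [], []
for t in range(100):
    history.append(read_neural_sample())
    state = decode(history)
    # The device itself decides if and with what intensity to "stimulate":
    intensity = min(1.0, state - 0.5) if state > 0.5 else 0.0
    decision_log.append((t, round(state, 3), round(intensity, 3)))  # audit trail

print(decision_log[-3:])
```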

I would add here the importance of developing guidelines and models for promoting a design and development process for neurotechnological devices that is centered on the needs, capabilities and preferences of the intended end-users. To date, most such devices as well as complementary assistive technology, such as robotic systems that may be controlled by a BCI, undergo a top-down design and development process with little input from the end-user perspective.

Possible Neurophenomenological Effects of Closed-Loop Neurotechnological Devices

While it is one thing to employ such closed-loop interaction to optimize the performance of a medical device, for example for regulating seizures in patients with otherwise treatment-resistant epilepsy, using the same capabilities in consumer-directed neurotechnological devices may result in many unintended adverse effects. Especially closed-loop interaction—i.e. changing the parameters of a device based on the real-time sensing of neurophysiological data—may adversely affect the phenomenological experience of individuals, for example by altering the sense of agency, a subject’s sense of authenticity and autonomy, or the self [50,51,52].

Take the simple, non-brain related, example of the now familiar algorithms for “optimizing” web searches or for making recommendations for buying items in online shops. If the algorithm recommends a certain item based on my previous purchases (and other users’ purchases), to what degree does a purchase based on this recommendation reflect a choice based on my preferences (momentary or long-term), and to what degree is the decision shaped by the algorithm?
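As a toy version of the second pattern (“people who bought item X also bought item Y”), one can simply count item co-occurrences across purchase histories and recommend the items that most often accompany what a user already owns; this is my own illustration, not any shop’s actual algorithm:

```python
from collections import Counter
from itertools import combinations

purchases = [                        # made-up purchase histories of other users
    {"novel", "headphones"},
    {"novel", "reading lamp"},
    {"headphones", "phone case"},
    {"novel", "reading lamp", "bookmark"},
]

co_occurrence = Counter()
for basket in purchases:
    for a, b in combinations(sorted(basket), 2):
        co_occurrence[(a, b)] += 1   # count both directions
        co_occurrence[(b, a)] += 1

def recommend(owned, k=2):
    """Score items by how often they co-occur with the user's items."""
    scores = Counter()
    for item in owned:
        for (a, b), n in co_occurrence.items():
            if a == item and b not in owned:
                scores[b] += n
    return [item for item, _ in scores.most_common(k)]

print(recommend({"novel"}))          # e.g. ['reading lamp', 'headphones']
```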

Similarly, we may examine such effects in (thus far hypothetical) closed-loop consumer-directed neurotechnological devices. Imagine, if you will, a wearable EEG system, connected to your PC and a web-based cloud server, that continuously analyzes your neural data with deep learning and adaptively modifies the content of your social media feed (or other software services) based on this analysis. If the choices (and biases) the system makes in classifying your neural data in certain ways to “optimize” your user experience remain unknown or opaque, to what degree can you trust that the resulting modification of what you experience reflects your true preferences rather than inherent biases in the algorithm (or in the data it uses for learning)? If learning occurs not only with a single user’s data but across many users in the cloud, how personalized are the choices the system recommends (or makes), really? Would the co-evolving adaptivity between user and algorithm over time blur the line between the user’s original and/or genuine mental landscape (preferences, attitudes, opinions, desires and so forth), perhaps even her cognitive abilities, on the one side, and the system’s biased inferences on the other?

One might ask,Footnote 10 of course, whether and how this ‘co-adaptation’ between a human and an intelligent (closed-loop) system substantially differs from the ‘standard’ adaptation of our preferences, attitudes and behavior as we engage with the world. Of course, we also see adaptation between humans and other (data-driven) systems, for example conventional advertising or standard treatment models for common diseases. I would submit, however, that there are indeed substantial, i.e. non-trivial, differences between such established modes of interaction and the co-adaptation between humans and a (closed-loop) system based on big data analytics and advanced machine learning, particularly: (a) The aim of traditional models of data-driven analytics is to derive common parameters / features from data from many individuals in order to build a system that responds well for the average user; in the case of advertising, for example, a product would be tested on many individuals and then modified according to the average preferences of consumers. In emerging systems based on big data / machine learning, the aim is often to achieve a maximally individualized response based on pattern classification and predictive analysis; again in the case of advertising, to develop highly personalized ‘targeted’ advertising that can adapt to a consumer’s changing preferences. For analytics based on brain data, for example a brain-computer interface for paralyzed patients, such a system—because it continuously analyzes brain responses and can thus adapt to neurophysiological changes—would be much more adaptable to (e.g. disease-related) changes in brain signals over time. (b) In the case of a closed-loop system involving brain data, this co-adaptation is taken even further: a brain-computer interface based on measurements of bioelectric brain activity from an implanted electrode on the brain surface (with the capability of delivering electrical stimulation to the cortex) could modify brain activity in real time by delivering stimulation based on the measured brain activity—a capability that would elude traditional open-loop BCI systems. (c) More mundanely perhaps, the now ubiquitous algorithms that provide recommendations for further purchases in online stores (either based on individual data: “Because you bought item X, you might also like item Y.”; or based on data analysis over many individuals: “People who bought item X also bought item Y.”) can produce eerie effects on our sense of autonomy (and directionality) and authenticity in making choices / decisions: Would I like item Y equally if it had not been flagged by the algorithm? How can I stay open-minded and/or change my preferences if chance encounters with unusual items are more or less eliminated by the ‘filter bubble’ of the algorithmic shopping assistant? Such unease about close co-adaptation between an algorithm and human interactors now emerges, of course, in many other domains, such as information (‘fake news’) or political opinion formation.

Such continued interactions between a user and an ‘intelligent’ neurotechnological device may thus have a profound and potentially transformative effect on the experience of authenticity, the sense of agency, the active self and other aspects of experience [53]. Therefore, I would recommend making the study of these “neurophenomenological” effects an integral part of user-centered research on, and development of, neurotechnological devices, particularly devices for closed-loop interaction.

The Problem of Bias in Applications Based on Big Data and Machine Learning

As mentioned in the previous section, the influence of bias on machine learning and (closed-loop) neurotechnology may be substantial. Bias in (data) science denotes systematic skews in the way data are collected (e.g. ‘selection bias’, in which particular sources of data are systematically, though mostly unconsciously, ignored), annotated, categorized and so forth. Importantly, however, bias also (and to particularly deleterious effect) operates at the level of human cognition. Convergent research in behavioral psychology and cognitive science has revealed the important and universal influence of cognitive biases on human decision-making and choices. Cognitive biases are mental shortcuts (or heuristics) that all humans are inclined to take in evaluating problems or making decisions. The availability heuristic, for example, refers to the overreliance on readily available information, and the recency bias describes the reliance on often highly memorable, recent events or information when making decisionsFootnote 11 [54]. Such human cognitive biases are particularly problematic for techniques, such as machine learning, that rely on large amounts of data that are annotated and categorized by humans. In other words, bias is a ubiquitous and almost inescapable phenomenon which may skew the basis for learning (and the subsequent ‘behavior’) of devices based on big data and machine learning at many levels: at the level of data collection, annotation and categorization; through the biases of the programmers; or through the biases of the users of such a system [55]. In the case of AI-based decision support systems for clinicians—systems that analyze a patient’s data and may give advice on further tests or recommend treatments—biases in the training data for the underlying artificial neural networks may lead to skewed decision-making [56].

If an ANN for skin cancer detection, for example, was mostly trained on images from light-skinned individuals, it might perform better in screening light-skinned than dark-skinned individuals, which would effectively introduce an ethnic bias into the diagnostic procedure [57].
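The effect can be reproduced in a few lines with synthetic data (a minimal sketch assuming scikit-learn is available; “group A” and “group B” are invented subpopulations whose feature distributions differ, with group B under-represented in the training data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)

def make_group(n, shift):
    """Synthetic subgroup: same labeling rule, but features offset by `shift`."""
    X = rng.normal(loc=shift, size=(n, 5))
    score = (X[:, 0] - shift) + 0.5 * (X[:, 1] - shift) + 0.3 * rng.normal(size=n)
    return X, (score > 0).astype(int)

Xa, ya = make_group(1000, shift=0.0)   # well-represented group A
Xb, yb = make_group(50, shift=2.0)     # under-represented group B

clf = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

Xa_test, ya_test = make_group(500, shift=0.0)
Xb_test, yb_test = make_group(500, shift=2.0)
print("group A accuracy:", accuracy_score(ya_test, clf.predict(Xa_test)))
print("group B accuracy:", accuracy_score(yb_test, clf.predict(Xb_test)))  # typically lower
```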

Distributive Justice: Impact on Research in Data-Poor Areas

As we have discussed, big data and machine learning are particularly effective methods for analyses that are easy for machines but difficult for humans. As researchers increasingly leverage the power of this approach for a variety of problems, we might see whole research programs in neuroscience move into data-rich environments, simply in order to be able to employ these methods to their maximal potential (much in the way that gene editing based on CRISPR/Cas9 is currently transforming research in cell biology and molecular biomedicine).

For basic and clinical research on issues that are not blessed with plentiful data, for example on rare (so called “orphan”) diseases or research in countries with no significant digital infrastructure and a lack of economic resources, these new methods will not be so readily applicable and/or available. It remains to be seen, whether the promise and success of the “datafication” of medicine will drain human and financial resources from such areas or whether the macro-level research policies, at the international, governmental and funding agency level, will find ways for commensurate funding schemes to allow different approaches to flourish.

Mens Mea: On the Legal Status of Brain States

Given the technological development outlined above, it seems realistic to anticipate an increase in the ability to correlate particular brain states more and more reliably with concurrent “mental states”Footnote 12 through advanced machine learning on brain data. This, in turn, could breach the hitherto closed off sanctum of one’s thoughts and feelings—particularly those mental states (often denoted as phenomenological consciousness) that are not accompanied by overt behavior or peripheral physiological state changes and were thus far unobservable. This scenario raises important questions regarding the privileged privacy of one’s mental states, the right to not disclose one’s thoughts or feelings, especially (but not only) in a legal context.

The main question in this regard may well be whether, with respect to brain states and the inferences drawn from them about corresponding mental states (through decoding), existing legal concepts and instruments are sufficient to govern the fair use of these data in the courtroom (as well as in the context of policing or criminal investigations). On the one hand, we might ask again whether individuals should have the (inalienable?) right not to have their mental states decoded. If so, would this amount to a (new) fundamental human right, or would existing legal frameworks be sufficient to deal with this question (see [49] for an excellent discussion)? On the other hand, what if such methods could also be used for exculpatory purposes in favor of the defendant? Today, neuroimaging, particularly for demonstrating structural anomalies in a defendant’s brain, is overwhelmingly used for establishing brain damage as a mitigating factor in criminal cases (see [58] for a recent overview). Should a defendant therefore not have the possibility (if not the right) to use decoding methods based on advanced machine learning for establishing mitigating factors? To this end, it should also be discussed whether the so-called Daubert standard for scientific admissibility (in the US legal system, at any rate) is applicable to decoding brain states with deep learning, given the concerns about the “black box” characteristics of many such machine-learning-based decoding architectures [59].

Jurisprudence and legal philosophy have long known the concept of mens rea—the guilty mind—for determining a defendant’s responsibility (and thus culpability) for his or her actions. Perhaps it is now time to intensify the discussion of mens mea—the concept of one’s mind as a protected sanctum of thoughts and feelings—with respect to the legal status of brain data and mental states. In terms of civil liberties, we encounter two main scholarly debates on the freedom of our mental states and capacities: (1) the mens mea question mentioned above (often framed in terms of ‘freedom of thought’), i.e. the freedom from unwanted interference with one’s mental states and/or cognitive capacities by others (i.e. ‘negative liberty’) [60,61,62,63,64,65]; and (2) the positive freedom (for some involving a fundamental right) to maximize and fully realize one’s cognitive capacities, involving the right to employ methods for cognitive / neural enhancement (also referred to as ‘cognitive liberty’) [66,67,68,69,70].

For comparative purposes, it might be interesting to look at the ongoing debate in forensic science and criminal law around the acceptability—both in terms of scientific standards and from a normative point of view—of using DNA analysis for identifying phenotypical traits (e.g. eye or skin color) for identifying suspects and as evidence in the courtroom [71].

Some Thoughts on Regulating and Governing Big Brain Data

In a recent policy paper [57], written together with colleagues, we pointed out the importance of updating existing guidelines and/or developing new ones for research and development of clinical and consumer-directed neurotechnology that acknowledge the challenges outlined above.

To this, I would like to add the following thoughts. First, I would like to voice concern regarding the usurpation of the ethical and legal discourse around brain data privacy and the safety of AI by the private sector [72]. While the participation and active engagement, preferably beyond the minimal standards of “business ethics”, of the industries that actively shape the development of this technology is highly commendable and important (in the spirit of an inclusive deliberation process), I think it is important to closely monitor the ways in which these companies may come to dominate these discourses by devoting substantial resources to them. If, for example, the Ethics Board of a company such as DeepMind remains shrouded in opacity with respect to its personnel and mission, it is difficult to see the raison d’être of such entities [72]. Arguably, there is a discernible difference between running a corporate policy of honest and transparent participation in public discourse on the one side, and engaging in lobbyism and opinion-mongering on the other, and I hope the companies in question will adhere to the former, rather than the latter, form of corporate social responsibility. To this end, the citizenry should actively participate in the public discussion on neurotechnology and AI and engage the companies in critical discourse on their corporate strategies and policies.

Another important and largely unresolved question concerns the adequate classification of different types and sources of highly personal data, particularly biomedical data, with respect to the appropriate (and proportional) legal and regulatory frameworks. For example, most would agree that results from blood tests or data from wearable fitness trackers constitute biomedical data. But what about movement data from a person’s phone GPS sensors, or a person’s text (or image or voice) entries in her social media account? In recent studies, researchers were able to infer suicidality from automated, machine-learning-based analyses of electronic health records [7] and even from user entries on Facebook [73].

Should the casual texts we disseminate via social media thus be considered as biomedical data if they turn out to be highly valuable for AI-based predictive analyses with implications for a person’s well-being? These questions do point to the fact that—perhaps counterintuitively— there is no generally accepted definition, let alone granular classification, of biomedical data as a particular class of data.

In the absence of such a generally accepted definition, the question of whether brain data should be treated just like any other type of data (with its ‘value’ determined solely by economic or other parameters), or whether it should be considered a special class of data, must remain unresolved for the time being.

Given the breathtaking speed with which new methods and devices for gathering massive amounts of highly personalized data enter our lives, however, I would suggest that it is important to develop a comprehensive classification system that precisely defines biomedical data. Better yet, this system should also enable an evidence-based risk stratification (e.g. in terms of the potential for misuse and other risks for the individual from which the data was collected).

Any coordinated effort at classifying biomedical data, of course, will not occur in a normative vacuum but will be motivated by underlying (ethical, legal and/or political) goals. For example, from a deontological perspective, the goal of such a classification could be to maximize each person’s individual rights, such as civil liberties (e.g. in terms of data ownership), whereas, from a utilitarian perspective, the focus could be to maximize the benefit of big data analytics for society and the average individual. In terms of ensuring a transparent and accountable process, the development of a biomedical data classification should be managed by institutions that are democratically legitimated, such as commissions in (or between) democratic states or supranational institutions (e.g. the European Commission).

Furthermore, when it comes to forging an international consensus process on how to shape and regulate research and development of neurotechnologies and AI (and Big Brain Data for that matter), we should acknowledge that there are important differences, mostly for historical and systemic political reasons, in the ways in which different nations and supranational bodies address the question of technology and risk assessment.

Without being able to map the full extent of the problem here, let us briefly look at differences between the US and European approaches to risk assessment and regulation: While the reality is of course much more nuanced, it seems fair to say that—from a historical perspective—the European take on risk regulation has relied more on the precautionary principleFootnote 13 than the US approach. Historically, we may understand the emergence of precaution as a regulatory strategy, in the modern era, against the globalization of large-scale technological hazards such as nuclear proliferation, climate change, genetic engineering and the like. The extensive and ongoing legal and political struggle between the US and the EU over how to regulate genetically modified organisms (GMO) perhaps provides a case in point [74]. In some grand sociological theories, this “risk society” is even considered to be the constitutive condition of modernity.Footnote 14 In this discussion, the distinction between hazards and risks is important for salvaging precaution from being dismissed as a “paralyzing” principle, as argued incisively in Cass Sunstein’s book “Laws of Fear: Beyond the Precautionary Principle” [13]. While I concur with Sunstein’s main criticism that precaution in and of itself – without considering feasibility, cost-effectiveness and other contingencies – is at best ineffectual and may even prevent important progress or be harmful [76], I nevertheless think that precaution is an important mechanism for hedging rapid technological developments against unintended (and unforeseen) adverse consequences of neurotechnology and AI, and that it merits further legal and sociological study.

Summary and Conclusions

To summarize, let me point out that my main concern with the Big Brain Data scenario sketched here is not the underlying technology—neurotechnological devices, big data and advanced machine learning—but rather the uncontrolled collection of brain data from vulnerable individuals and the unregulated commodification of such data. We have seen that the attitudes of technology users towards the privacy of PII and device security may vary substantially, from uncritical enthusiasm to broad skepticism and every stance in between. If we accept the basic premise of living in a techno-consumerist society predicated upon a growth model of (data) capitalism, we need to find some discursive space to accommodate both the “enthusiasts’” stance of cognitive liberty—the freedom to shape one’s self by means of new technologies—and the “critics’” stance of acknowledging the inherent risks of emerging clinical and consumer neurotechnology and proceeding with precaution in the development and application of these devices.

I would submit that those two stances or “conceptual lenses” [77] are not incommensurable in terms of how they might inform and guide our legislative, regulatory and political response (and preemptive strategies) for governing the use of brain data. Despite the differences between these stances, I hope that both sides could agree on some basic guiding principles that may hedge and facilitate this process of deliberation and, ultimately, decision-making:

  1. To maximize the knowledge on the technical and (neuro)scientific aspects as well as medical, social and psychological effects of such devices on the individual user and, at the macro level, on societal norms, legal and political processes. This entails making qualitative, participatory and user-centered research a central and indispensable part of the design, development and application of clinical and consumer-oriented neurotechnological devices.

  2. To avoid the instrumentalization of neurotechnology, machine learning and big data (in accordance with Kant’s “formula of humanity”), i.e. treating the users of such devices not merely as a means (e.g. to maximize profits through targeted advertising) but as an end, by measurably improving their social, psychological and medical well-being and thereby promoting human flourishing.

  3. To integrate the ethical, legal, philosophical and social aspects of (neuro)technological research and development, machine learning and big data into the curricula of disciplines that participate in / contribute to the development of such devices; i.e. computer science, engineering, neurobiology, neuroscience, medicine and others.

  4. To explore inclusive and participatory models that combine expert knowledge and opinions with a bottom-up process of public opinion formation to inform the political and legal deliberation and decision-making process. Such a model of indirect normativity, i.e. specifying processes rather than values, could perhaps enhance the acceptance and safety of these emerging technologies and also satisfy some stakeholders’ need for a precautionary approach.

Finally, the broader (and, again taking the “long view”, perhaps decisive) question that is highlighted by the ascent of big data and machine learning across all sectors in society is, in my view, how we as the public—a collective of responsible social and political beingsFootnote 15—can determine and shape the beneficial use of this powerful technology in society, how we can be the sculptors of this process rather than mere data sourcesFootnote 16 and spectators.