Commentary

Better understanding of data lifecycles can reduce digital harms

If we are ever to effectively address the harms and risks of digital technology, we first need the right language to describe the systems that collect, analyze, share, and store huge amounts of data about us as consumers, patients, and citizens—often with deleterious effects. Misinformation, attention extraction, discriminatory algorithmic profiling, and cybercrime: These digital harms all emerge from the data ecosystem in which we live, but not in ways we can fully see or explain.

Concepts and phrases inspired by ecology—like “information environment” and “social media ecosystem”—are beginning to reframe data and digital harms as parts of a greater whole and are inspiring a fuller understanding of how digital harms function. From the lifecycle of plastics, people have learned to form a holistic picture of consequences on a collective scale, and the concept of the “data lifecycle” can energize new ways of thinking about digital harms. It offers a way to break the complicated life of data into its component parts and to think of digital harms as we do externalities, such as air pollution, biodiversity loss, and chemical runoff. With a fresh metaphor, we can better understand the social costs imposed by goods and services in the data economy.

The sum of all data activities on planet Earth might be called its “data metabolism,” which in 2020 created or replicated 64.2 zettabytes of data (1 zettabyte = 1 trillion gigabytes). Though the volume of data produced and consumed around the world is awe-inspiring, numbers offer only a limited understanding of the system. A qualitative representation of the system’s interrelated parts is also needed.

‘Lifecycle’ as a bridge to understanding

The lifecycle of plastics is a well-known model for describing potential harms to the environment across time. It illustrates a series of environmental and social impacts in stages, from extraction of fossil fuels to refining, manufacture, distribution, consumption, disposal, and recycling. Plastics arguably harm the planet less as things than as processes. It is their coming-into-being, their release of toxic chemicals once made, their interaction with living species and ecosystems, and their changes during recycling that produce negative consequences, such as eco-toxicological risks, methane emissions, fossil fuel depletion, wastewater contamination, and loss of recyclability due to degradation. 

For those accustomed to thinking of data as a discrete object or thing, the shift to thinking of it as a process may seem strange. But by considering how we as consumers generate data and how that data is then used, we can begin to understand the dynamic nature of the data lifecycle. E-commerce, for example, attracts consumers who browse and buy online. Their activity produces a trace of personally identifiable data. Brokers harvest the information, aggregate different sources, and sell to buyers, who use the results to make inferences about individuals’ lifestyles, attributes, and preferences—which may reveal sensitive details, such as addictions, credit scores, and medical diagnoses. The resale value of this information feeds a black market for consumers’ data that is stolen or taken without their informed consent. Associated harms can arise from insecurity anywhere in browsers, platforms, networks, apps, operating systems, or devices.
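To make the aggregation step concrete, here is a minimal sketch, in Python, of how records bought from separate sources might be joined on a shared identifier and mined for a sensitive inference. All of it (the feed contents, field names, and keyword rule) is invented for illustration; real brokers use proprietary identity graphs and scoring models far richer than this.

```python
# Hypothetical sketch of broker-style aggregation. The feeds, fields, and
# inference rule are illustrative only, not drawn from any real broker.
from collections import defaultdict

# Each "feed" mimics data purchased from a different source, keyed by a
# persistent identifier such as a hashed email address.
retail_feed = [{"id": "u42", "purchases": ["glucose monitor", "test strips"]}]
ad_feed = [{"id": "u42", "clicked": ["diabetes-recipes.example.com"]}]

def aggregate(*feeds):
    """Merge records that share an identifier across feeds."""
    profiles = defaultdict(dict)
    for feed in feeds:
        for record in feed:
            profiles[record["id"]].update(
                {k: v for k, v in record.items() if k != "id"}
            )
    return profiles

def infer_condition(profile):
    """Crude keyword rule standing in for a proprietary scoring model."""
    signals = profile.get("purchases", []) + profile.get("clicked", [])
    return any("glucose" in s or "diabetes" in s for s in signals)

profiles = aggregate(retail_feed, ad_feed)
for uid, profile in profiles.items():
    if infer_condition(profile):
        # A sensitive medical inference now exists about this person,
        # even though no single feed contained a diagnosis.
        print(f"{uid}: likely diabetic (inferred, never disclosed)")
```

The point of the sketch is that the sensitive inference exists in no single feed; it emerges only once sources are joined, which helps explain why such harms are hard to trace to any one collector.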

As more businesses and services go digital, responsibility for preventing harm becomes more complex. It’s often the case that those afflicted by digital harms are distant from decisions that give rise to them. Some consumers experience setbacks like discriminatory profiling and privacy breaches as part of the e-commerce data lifecycle. Yet consumers are typically uninvolved in designing data management policies or default settings—nor do they control the economic pressures that frame such choices. They are left with very few levers for safeguarding privacy and security, often on an individual basis at the level of device or account. The data lifecycle is much bigger than any one consumer.  

The “data lifecycle” is a fluid metaphor and is already used in industry and civil society as a specialized concept. With more advocacy and intervention around this metaphor, a growing number of people can learn to communicate the impacts of data capture, maintenance, synthesis, use/analysis, publication/sharing, storage, and discard/reuse—and how this happens across time. 

Observing harm in data processes

Underlying debates on digital harms is a tangle of thorny issues about responsibility and causality. Polarization, inequitable access, and consequences of data breaches don’t fit neatly into current accountability frameworks like privacy protection laws, regulatory oversight, jurisprudence, or financial accounting. Because they arise from interactions rather than single actions, digital harms present hurdles to understanding who or what is at fault.

Consider how threats pulse through the internet in feedback loops. One person’s purchase from an online retailer like Walmart or Amazon, for example, helps reveal the likelihood that other people with similar online attributes and histories will buy the same product. What companies do with inferences about bystanders can be a tradeoff between business and well-being: The ability to send precisely targeted distractions such as personalized ads and notifications is both a source of revenue and a burden on people’s ability to focus, remember, and be present with each other. Although consumers could opt out of transactions in line with their personal risk tolerance, their ability to do so is waning.

When marketers or insurers acquire a person’s data and analyze it together with similar data from thousands of other internet users, this action feeds algorithms and places people into buckets—“unreliable party-goer with a drink habit” or “chronically sick and suffering medical debt,” perhaps. These categories may be used to extrapolate statistical guesses about third parties who have nothing to do with the dataset except that they have things in common with some people who contributed data. For these bystanders, in both their digital and physical lives, algorithms may contribute to behavioral manipulation, misinformation, disinformation, radicalization, undermining of mental health, reduction of consumer choice, and countless other harms. 
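A toy example can make this spillover concrete. In the sketch below, a bystander who never contributed data is bucketed purely by resemblance to people who did; the features, labels, and two-nearest-neighbor rule are all invented for illustration and are far cruder than production models.

```python
# Toy illustration of "lookalike" inference: labels contributed by some
# users are projected onto a bystander who shares surface attributes.
from math import dist

# (late-night activity score, liquor-store visits per month) -> label
contributed = [
    ((0.9, 8), "party-goer"),
    ((0.8, 6), "party-goer"),
    ((0.1, 0), "homebody"),
    ((0.2, 1), "homebody"),
]

def lookalike_label(features, dataset, k=2):
    """Guess a label from the k most similar contributed records."""
    neighbors = sorted(dataset, key=lambda row: dist(features, row[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

# The bystander never shared data, but their observable behavior resembles
# the contributors', so the system buckets them anyway.
bystander = (0.85, 7)
print(lookalike_label(bystander, contributed))  # -> party-goer
```

Even this crude rule shows why individually opting out of data collection does not shield a person from inferences built on other people's contributions.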

Despite its name, cyberspace is not otherworldly. It is an environment inhabited by people and things, a global communication and information system sometimes referred to as an “ecosystem.” Households, retail stores, factories, hospitals, and public spaces are settings for internet-connected devices generating data. Internet users—individual human beings—are endlessly generating data in their interactions with increasingly ubiquitous digital devices and services.

Similar to environmental harms, digital harms affect victims without their permission and often without their knowledge until it is too late. For example, image-based sexual abuse—defined as nonconsensual taking, sharing, or threats to share nude or sexual photos or videos of a person—can evade victims’ awareness until someone informs them. The abuse, particularly when amplified by algorithms that promote engagement and advertising revenue, can affect the victim perpetually, due to the difficulty of removing all instances of the content. And the scale of problems like this is probably underappreciated: In one 2019 survey, one in three people aged 16 to 64 reported having experienced image-based sexual abuse.

Digital harms afflict groups and society. It’s often inadequate to think of them as isolated incidents affecting one person at a time. Studies of cyberbullying, for instance, have found expansive impacts—not only on victims but also on perpetrators, kinship networks, and communities. Analogies in the environmental arena provide new directions for research and advocacy. Think of the setbacks to livelihoods, climate, biodiversity, and erosion control that arise from the decisions of a limited number of people to plunder forests. We can better understand impacts of digital harms on our social fabric—such as setbacks to privacy, dignity, autonomy, and civil discourse—if we turn our attention to our shared reliance on the internet for information and communication, similar to our dependence on natural habitat.

Addressing digital harms

The data lifecycle metaphor brings a measure of structure to our ongoing debates about how to address digital harms. Society is still working out the causes of digital harms, so governments have found them tricky to regulate. There’s a disconnect between laws and social license (what people agree is acceptable behavior). Harms can be individual, collective, or societal—and are still largely unregulated. To change the conversation about how we can agree on responsible data management, we need a model for visualizing and describing what data management is. The data lifecycle, by representing the inevitable relationship between data and process, offers a new model for designing a better information ecosystem.

Policymakers can think in terms of data lifecycles to tackle familiar processes in our information economy that enable second-order effects that are harmful and opaque to consumers. For example, data collection is not an isolated event but a practice that, without guardrails to prevent excess, can drive discriminatory analytics, fuel information sharing for purposes that undermine autonomy, and make data harder to protect from cybercriminals. By better understanding the relationships among data practices—the collecting, sharing, analyzing, and storing of data—policymakers can write better, more thoughtful regulations that recalibrate our information ecosystem toward well-being and security.

Jordan Famularo is a postdoctoral scholar at the University of California, Berkeley’s Center for Long-Term Cybersecurity.