Proposed language for data collection standards in privacy legislation

Editor's note:

A version of this post originally appeared on the Lawfare blog.

As a proponent of baseline federal privacy legislation, I am encouraged that proposals that would have been poison pills not long ago, such as individual rights to see, correct and delete data as well as new authority for the Federal Trade Commission, are drawing wide support now. But some crucial and difficult issues remain wide open.

In a recent Brookings paper looking at early draft privacy bills that include some of the proposals, I observed that “thinking about how to address standards for behavior in U.S. legislation—how data is collected, used, and shared—are less evolved than for the individual rights under discussion.” Yet standards for collection, use and sharing of personal information are at the crux of the debate ahead. Such standards will determine how far legislation succeeds in shifting the focus from consumer choice to business behavior, and thereby in enabling individuals to trust that personal information will be protected regardless of what notices they manage to read, pop-up windows they click through or privacy settings they adjust.

Collection is the linchpin, because it defines and can obviate the risks and obligations that come with using and storing data. This post proposes language for a standard on collection. This language is rooted in recognized principles on collection of personal information but adds some general guideposts to inform the application of the principles in the infinitely varying contexts of today’s data-driven world. At the same time, it seeks to allow flexibility for innovative data uses.

The OECD Fair Information Practice Principles, which underlie many privacy laws and frameworks in the U.S., Europe and elsewhere, articulate a “collection limitation principle.” This principle holds that “there should be limits to the collection of personal data” but, other than saying that “any such data should be obtained by lawful and fair means,” does not frame what these limits should be. Some constraint is implied in the “purpose specification principle”: The affirmation that “the purposes for which personal data are collected should be specified not later than at the time of data collection” suggests collection must have some definable purpose.

When the OECD principles were adopted in 1980, collection was limited by available technology. Databases had specified fields and analog information had to be entered in order to digitize it, and limits on computing power and data storage narrowed the choices of what data to collect and process. Now these limits have been obliterated. Instead of putting some constraint on collection, technology enables virtually limitless collection.

The OECD declaration that “there should be limits to the collection of personal data” seems self-evident. It should not be permissible to collect anything and everything available. Some notorious privacy failures have come from people collecting information simply because they could: Google Street View mappers collecting content streams from unprotected Wi-Fi hotspots along their way, the Brightest Flashlight smartphone app collecting location data, Uber employees using the company’s “God View” to identify users taking morning-after “rides of shame,” and Cambridge Analytica leveraging research access to scoop up data on millions more Facebook users.

Hence the need for some contours to the general principle. In addition, any limit on collection needs to be untethered from purpose specification. This principle has become conflated with notice and, in turn, with consumer choice and consent. As a result, privacy policies tend to specify every form of data a company might conceivably want to collect and expansive catalogs of its uses because this provides legal protection. The purpose specification contained in these notices, however, does not define in a considered way what data the entity actually needs. Collection limits need to be independent of whatever forms of notice are provided to individuals, and not self-defined.

The European Union’s General Data Protection Regulation (GDPR) deals with this by providing that processing of personal information is “lawful” only if it fits within enumerated grounds. These primarily include consent, performance of a contract, legal obligations and the “legitimate interest” of the entity responsible for the processing. All these grounds are carefully circumscribed, with legitimate interest balanced by the interests and rights of the “data subject,” the person to whom the data relates.

Although U.S. privacy legislation should encompass the GDPR’s enumerated grounds, it should not follow the EU’s approach in prescribing the exclusive grounds for collection. This approach is rooted in a civil law system that generally aims to govern new development in advance by laying out systematic and comprehensive rules. That differs from the way we think under Anglo-American common law.

An American privacy law should take an approach more consistent with the post hoc, iterative system of the common law. As I reminded Commerce Department staff during the drafting of legislation based on the Obama administration’s 2012 Consumer Privacy Bill of Rights, most tort law and much other law rests on judgments about what is reasonable under the circumstances, the entire body of U.S. competition law is founded on two sentences of Sections 1 and 2 of the Sherman Act, and most constitutional law on particular clauses. I believe it is not entirely coincidental that this iterative approach is also how today’s software-driven technology operates as it develops through versions, updates and patches. A broad standard can allow flexibility for unanticipated data uses and exploration without permitting unbounded collection.

The language that follows is drafted with these considerations in mind:

Collection and processing [defined terms][1] of personal data shall have a reasonable, articulated basis that takes into account reasonable business needs of the [covered entity/controller/etc.][2] engaged in the collection balanced with the intrusion on the privacy and the interests of persons whom the data relates to.[3]

This language is deliberately not detailed, but it is tethered to various strands of law and thought about privacy and other areas. It contains four elements that bound data collection: (1) a reasonable articulated basis, (2) reasonable business needs of the collector, (3) impact on privacy and (4) impact on other interests of individuals.


The practical effect of these elements is illustrated by the privacy failures mentioned above. The Street View collection was simply a case of engineers in the field happening on data streams from unencrypted Wi-Fi hotspots and scooping them up because they were available and might be interesting someday. In other words, it was done with no specific purpose in mind, and therefore with no reasonable, articulated basis. The Brightest Flashlight app collected and shared location data continuously, even though location is irrelevant to the functioning of a flashlight. In short, there was no business need for the location data (this posits that collecting data purely for the sake of exploiting it unilaterally is not a reasonable business purpose). And while Uber’s abuse of its “God View” fails on all elements (it was mindless and served no business purpose), it is most notorious for intruding on private behavior in malicious ways users could not expect. Cambridge Analytica is a more complex matter that I’ll come back to after discussing the antecedents of the proposed standard.

The proposed standard is an expression of the collection limitation principle as reframed in the White House Consumer Privacy Bill of Rights as “focused collection,” articulated as “a right to reasonable limits on the personal data that companies collect and retain.” The standard outlines contours for these reasonable limits and, by invoking the privacy and other interests of affected individuals, adds meaning to what the OECD called “fair means.” The reasonable, articulated basis element resembles the explanation of the 2012 focused collection principle as holding “that companies should engage in considered decisions about the kinds of data they need to collect to accomplish specific purposes.” The focused collection principle does not make data minimization an absolute; instead, it calls on companies to make considered judgments about what data they really need and why they need it. To put it in trending terms, and in contrast to the examples of mindless collection above, the reasonable, articulated basis calls for mindfulness about collection.

This element also harks back to the origins of the purpose specification principle. By demanding considered choices about what data to collect, it provides an accountability check that need not be connected to what is described in privacy policies or other notices to the public. The focus is on the collection decision itself, not on the disclosure.

The proposal uses the word “reasonable” twice. It is an elastic term but, as mentioned above, there are broad bodies of law that involve judgments of reasonableness—reasonable care, the reasonable person and reasonable fear of harm are commonplace examples—and lawyers and judges are trained to make these judgments. Privacy is not a green field in this respect. William Prosser long ago distilled 70 years of law applying the famous Warren and Brandeis law review article on the right to privacy into the Restatement (Second) of Torts, which defined intrusions on seclusion and private life in terms of what would be “highly offensive … to the reasonable person.” In significant respects what Daniel Solove and Woodrow Hartzog have characterized as the FTC’s “common law of privacy” has defined unfair and deceptive practices in terms of reasonable practices and industry standards; in the high-profile Wyndham Hotels decision, the Third Circuit Court of Appeals held that the FTC’s authority to regulate unfair cybersecurity practices encompasses “cases where a business[’s] … failure to employ reasonable security measures” causes injury to consumers. The Securities and Exchange Commission has followed a similar path in its monitoring of cybersecurity.

There is a growing body of professional standards and practice to inform these judgments. The American Bar Association has recognized a new legal specialty in privacy law, and the International Association of Privacy Professionals now comprises more than 40,000 members worldwide. Numerous other organizations have issued manuals and audit standards on privacy and security practices. But no matter how specific a privacy law, uncertainty will be inherent. The GDPR is more detailed than a great many people would advocate for the U.S., but it has required much explanatory guidance from the European Data Protection Board, the collective body of EU data protection regulators.

The specific language “reasonable, articulated basis” is adapted from existing law protecting privacy interests. The USA FREEDOM Act was adopted in 2015 to constrain federal bulk collection of telephone metadata within the U.S. It requires the government to apply to the Foreign Intelligence Surveillance Court for a warrant to analyze metadata; the warrant application must include a specific selection term and facts showing the term is relevant to an investigation and “a reasonable, articulable suspicion” that the term is associated with a foreign power engaged in international terrorism (which provides the lawful basis for government surveillance).

This standard codifies what the National Security Agency (NSA) used as a standard for the basis on which an analyst would be permitted to query the metadata that the agency collected in bulk from U.S. telephone carriers under Section 215 of the USA PATRIOT Act, which the USA FREEDOM Act replaced. The NSA’s standard was based on the formulation distilled from the Supreme Court’s 1968 decision in Terry v. Ohio, in which the court held that a stop-and-frisk was reasonable under the Fourth Amendment where the police officer was deemed to have a reasonable suspicion based on observation and experience that the suspects were armed and dangerous.

The “reasonable, articulated basis” language therefore has a connection to ways we have protected our conceptions of privacy as a constitutional right. Laws on government access focus on the reasons for obtaining information and the use of information in connection with those reasons. Why not do the same for access to information by other institutions, especially since it has been a long time since we had much choice about such access? The USA FREEDOM language also has the tactical value of having passed Congress before, and within recent memory at that.

The use of “reasonable business needs” resembles the “legitimate interest” ground in the GDPR, one of the more subtle provisions of that regulation. Like legitimate interest, it validates business interests as a reason for collection but judges these interests by an objective standard (especially in conjunction with the “reasonable, articulated basis” language), and it requires a balancing test. In turn, this balancing against individual interests seeks to promote the data stewardship many companies profess and obligates them to protect the interests of individual subjects of data. This gets at the “duty of loyalty” in the Data Care Act proposed by Democratic Sen. Brian Schatz of Hawaii, one of the “working group” of Senate Commerce Committee leaders drafting legislation, and the “golden rule” I proposed last July “that companies should put the interests of the people whom data is about ahead of their own.”

Looking at privacy intrusion for purposes of this balancing also brings in privacy risk assessment, something that is basic to data management. Privacy or information security analysis begins by looking at what data is collected and how it flows, and then assessing the risks associated with this data, its uses and its movement. Privacy risk assessments, which the GDPR calls for, are a made-in-America idea developed and deployed in federal government agencies. One of the most effective aspects of the bill debated in the Washington state legislature is a section requiring privacy risk assessments, including whether “potential risks to the rights of the consumer outweigh the interests of the controller, consumer, other stakeholders, and the public in processing the personal data of the consumer.” The proposed standard gets at the same thing in a less detailed way.

The language does not answer every question that can arise concerning collection of personal information. Neither collection limits nor any other single provision can carry the full weight of addressing business conduct on handling personal data.

A successful privacy statute will have to work holistically, with rules on use interpreted together with the collection standard to shape its boundaries. For example, it may take provisions or rulemaking that exclude certain sensitive data fields or targeting to establish boundaries for behavioral advertising. But, even if behavioral advertising in general is considered a reasonable business purpose, this collection language could be construed as barring Target’s processing of purchasing data to deliver ads for maternity products to a secretly pregnant teenager as an excessive intrusion on her privacy and interests.

This standard might also reach Cambridge Analytica’s “harvesting” of data from the contacts of its original paid research subjects, but I concede it is not clear-cut. As Danny Weitzner pointed out in Lawfare last year, that case is really about the secondary use of data for purposes outside the context of the research subjects’ original consents and expectations.

Nor should a particular standard attempt to answer every question. The greater volume, velocity and variety of data today mean a greater volume, velocity and variety of challenges in managing data responsibly. There are trade-offs between certainty and creativity, between precision and flexibility; the issues are too diverse for one size to fit all. So successful privacy legislation should avoid a checklist for compliance and instead be flexible, adaptable and focused on outcomes. This is what the standard above aims to do by putting a check on unthinking collection and putting a focus on the impact on privacy and the individuals who can be linked to the data collected.

[1] “Collection” probably needs to be defined to be clear that it is not limited to online collection.

[2] What to call entities covered by statute varies with proposals. I favor adopting the controller/processor distinction used in EU law and the Washington state bill, but that’s a subject for another day. So is what to call the individuals protected, because I consider using a possessive in this context to be confusing.

[3] A rule of construction may be helpful to guide the meaning of “risks to privacy” and “interests of persons that the data relates to.”