Last year Judge Amit P. Mehta ruled that Google monopolized the search market. In the current proceeding, he must decide what remedies to impose to restore competition. One of the remedies proposed by the Department of Justice (DOJ) is a data access requirement that would force Google to share its user search data with rivals and potential rivals. The goal is to overcome the scale advantages that give Google an insuperable lead in providing high-quality search results. To protect user privacy, the DOJ also proposed that Google “make this data available in a way that provides suitable security and privacy safeguards…”
This data access remedy and its accompanying privacy protections will have to accommodate the rapidly changing nature of the search business. Search is quickly becoming only a component of artificial intelligence (AI) services. Instead of a standalone service that returns links to user queries, search engines will scour the internet to find answers to questions posed to chatbots. AI agents will also pull up the appropriate websites to complete assigned tasks. One of the salient features of this advanced AI service world will be an extreme emphasis on personalization. This business model will set off a race among AI service providers for massive amounts of detailed user information, much of it highly sensitive information concerning religion, political affiliation, sexual orientation and proclivities, and medical conditions, as well as more traditional marketing information such as music tastes, brand preferences, and fashion favorites.
Google’s detailed search histories will be a treasure trove—of interest to all companies competing to provide advanced AI services. As a result, the privacy protections that the DOJ proposes to require for user data transferred from Google to Google’s AI competitors will have to be extraordinarily robust to overcome the natural drive of these AI rivals to fully exploit the data that has been put in their hands, not by user consent, but by judicial fiat.
In the face of these new AI privacy risks, the DOJ’s proposed privacy protections are woefully lacking. They must be expanded to include additional requirements for reasonable deidentification, a ban on attempted reidentification, and the inclusion of a privacy expert in the technical committee administering the antitrust remedies.
The DOJ’s proposed data access remedy would, among other things, require Google to make available “…all data that can be obtained from users in the United States, directly through a search engine’s interaction with the user’s Device, including software running on that Device, by automated means.” It also includes information Google collects “…when answering commercial, tail, and local queries.”
The data must be provided to “…any provider of, or potential entrant in the provision of, a General Search Engine (GSE) or of Search Text Ads in the United States.” These competitors must make a “showing” of a “plan to invest and compete in” these search markets and such showing must be “sufficient” as determined by the DOJ in consultation with a technical committee. A qualified competitor must also not pose a national security risk, a requirement that seems specifically designed to exclude Chinese AI companies, such as DeepSeek.
The point of the data access requirement is to overcome network effects in search engine efficiency. Google, according to the DOJ and the court, has by far the largest supply of queries and clicks and has used this lead to create and maintain an insuperable quality advantage over its few rivals. Data access gives these rivals and potential rivals the high-quality training data that Google currently has in its sole possession. Of all the proposed remedies, data access is the one most likely to produce a rival with a search engine that can match Google’s and thereby draw users away from its platform. It is an essential part of an effective remedy to restore search competition.
European legislators have reached the same judgment concerning the need for data access. The Digital Markets Act (DMA) requires Google to provide search competitors with access “to ranking, query, click and view data in relation to free and paid search generated by end users on its online search engines.”
As the Federal Trade Commission (FTC) notes in its amicus brief in the case, “Given the scope of this proposed data sharing, privacy considerations are critically important.” To protect the privacy of Google users, the DOJ proposes that before data sets are shared, Google must “use ordinary course techniques to remove any Personally Identifiable Information.” In addition, a qualified competitor must agree to regular privacy audits by the technical committee.
Google must communicate to the company receiving the information “any anonymization or privacy-enhancing technique that was applied.” Before the data sets can be released to competitors, the DOJ, in consultation with the technical committee, must determine that the privacy safeguards are “fully functional.”
In the face of emerging personalized AI services, these two privacy protections fall far short of what is needed. Removing personal information is not by itself an effective way to deidentify a data set. It might be necessary, but it is not sufficient. In addition, nothing in the proposed remedies prevents recipients of Google user search data from seeking to reidentify the deidentified data they receive. A requirement for regular privacy audits sounds reassuring, but it is an empty gesture unless the underlying privacy obligations are robust, something they lack in their current form.
As I discussed in a recent commentary for Tech Policy Press, AI companies have been clear about their interest in exploiting personal data to provide their users with a personalized AI experience. OpenAI’s Sam Altman captured the promise and the privacy threat of personalized chatbots when he described his ideal of a “very tiny reasoning model with a trillion tokens of context that you put your whole life into.”
AI agents raise even more privacy concerns than all-knowing chatbots. In a comprehensive report, the privacy think tank Future of Privacy Forum writes, “AI agents may be at their most valuable when they are able to assist with tasks that involve highly sensitive data (e.g., managing a person’s email, calendar, or financial portfolio, or assisting with healthcare decision-making).”
Meta head Mark Zuckerberg said a good AI assistant will be like a friend who has a “deep understanding of what’s going on in this person’s life and what’s going on with your friends…”
One Bluesky user captured the creepy feeling these business plans evoke in many, saying, “Make tech bros rewatch the specific episode of [Netflix’s] Black Mirror that they are trying to create—Clockwork Orange style—until they understand the point of the episode.”
In pursuit of these personalized AI services, AI firms would benefit from access to Google’s search data in personalized form. Google already allows its users to opt in to the use of their search history to provide them with more personalized AI chatbot services. It will almost certainly extend this option to users of its coming AI agent service. Sam Altman, with his vision of an all-knowing chatbot, would certainly want to mine the gold in Google’s search history. Anthropic, too, would have an interest in access to personalized Google search results.
The DOJ’s pro-competition remedies in the search case should not allow Google’s AI rivals to access Google search data in personalized form. Google’s users have not necessarily signed up as customers of these AI companies and might have no business relationship with them at all. Allowing personalized data of Google users to flow unhindered to all of Google’s rivals would make these challenging AI privacy issues even more challenging. Such privacy invasions are clearly not the intent of the DOJ in proposing its data access remedy.
The following recommendations for an amended data access remedy seek to implement the intent of the privacy protections already contained in the DOJ’s proposed remedies. They are designed to provide an effective spur to the development of competing search engines while preserving the privacy rights of Google users in an age of personalized AI.
Just removing personal information from search records, as the DOJ proposes, is plainly inadequate, as the famous AOL case from 2006 showed. When AOL released a treasure chest of search data with the names removed to benefit academic researchers, two New York Times reporters were able to track down one user in a matter of days. Other data has to be removed or disguised as well. As Harvard researcher Latanya Sweeney has shown, three quasi-identifiers—a five-digit zip code, birth date, and gender—are enough to identify 87% of the U.S. population.
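To make the linkage risk concrete, here is a minimal sketch of how such a reidentification works. All of the names, records, and field choices below are invented for illustration; the point is only that a record stripped of names can still be joined to an outside data set, such as a voter roll, on Sweeney’s three quasi-identifiers.

```python
# Illustrative linkage attack on "deidentified" search records.
# All data is invented; the quasi-identifiers survive name removal.
search_log = [
    {"zip": "20036", "birth_date": "1961-07-28", "gender": "F",
     "queries": ["allergy medication", "divorce lawyers near me"]},
    {"zip": "90210", "birth_date": "1985-01-02", "gender": "M",
     "queries": ["cheap flights to Denver"]},
]

# A hypothetical public data set (e.g., a voter roll) ties the same
# three fields to names.
voter_roll = [
    {"name": "Jane Doe", "zip": "20036", "birth_date": "1961-07-28", "gender": "F"},
]

# Join on the quasi-identifier triple to reattach identities.
index = {(v["zip"], v["birth_date"], v["gender"]): v["name"] for v in voter_roll}
for record in search_log:
    key = (record["zip"], record["birth_date"], record["gender"])
    if key in index:
        print(f'{index[key]} searched for: {record["queries"]}')
```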
More needs to be done. Google should be required to deidentify the data using reasonably available techniques. Inserting enough noise (random, meaningless perturbations) into the released data, for instance, might enable these data sets to satisfy a differential privacy guarantee. This could reduce the risk that a receiving company could identify an individual in a Google search data set to a negligible level.
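As a minimal illustration of the idea, and not a claim about how Google would implement it, the standard Laplace mechanism releases noisy aggregate statistics rather than raw records. The statistic, counts, and epsilon values below are invented for the example; epsilon is the privacy budget, with smaller values meaning stronger privacy and noisier answers.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Release a count with Laplace noise calibrated to sensitivity 1.

    Adding or removing a single user changes a count query by at most 1,
    so noise drawn from Laplace(scale = 1/epsilon) gives this one release
    epsilon-differential privacy.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(seed=0)
# Invented statistic: how many users in a data set searched a given term.
true_count = 1042
for epsilon in (0.1, 1.0):
    print(f"epsilon={epsilon}: noisy count = {dp_count(true_count, epsilon, rng):.1f}")
```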
Europe requires more than this. It demands full anonymization under the DMA, mandating that the transferred search data be “irreversibly altered” so that it cannot be related to an identified person. But this would require introducing so much noise into the data set as to destroy any usable pattern that could aid in the development of an alternative search engine. The rival search company DuckDuckGo has criticized Google’s proposed anonymization method in Europe as creating a data set that is unusable for search training purposes.
Instead of full anonymization, the privacy requirement in the data access remedy should be for reasonable deidentification, which is more than simply removing names but less than full, irreversible anonymization.
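One family of techniques that occupies this middle ground is k-anonymity style generalization: coarsen the quasi-identifiers until every record is indistinguishable from at least k-1 others. The sketch below is purely illustrative; the records, the threshold, and the choice of fields to coarsen are assumptions made for the example, not a proposal for the specific scheme Google should use.

```python
from collections import Counter

# Invented records: names already removed, quasi-identifiers remain.
records = [
    {"zip": "20036", "birth_year": 1961, "gender": "F"},
    {"zip": "20037", "birth_year": 1962, "gender": "F"},
    {"zip": "90210", "birth_year": 1985, "gender": "M"},
    {"zip": "90212", "birth_year": 1987, "gender": "M"},
]

def generalize(rec: dict) -> tuple:
    """Coarsen quasi-identifiers to a 3-digit zip prefix and birth decade."""
    return (rec["zip"][:3] + "**", rec["birth_year"] // 10 * 10, rec["gender"])

# Every generalized record now shares its quasi-identifiers with at least
# one other record, so a linkage attack cannot single out an individual.
groups = Counter(generalize(r) for r in records)
print(f"The generalized data set is {min(groups.values())}-anonymous")  # prints 2
```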
This has a precedent in FTC privacy policy. In a far-reaching privacy report released in 2012, the FTC recommended that data should not be considered “reasonably linkable” to a consumer, computer, or device to the extent that a company, among other things, “takes reasonable measures to ensure that the data is deidentified.” What qualifies as “reasonable measures” depends on the circumstances, including “the available methods and technologies.”
This requirement for reasonable deidentification should be added to the DOJ’s data access remedy and should be paired with a requirement that the data transferred must be usable for the intended purpose of developing an alternative search algorithm. This would counter Google’s natural incentive to use privacy as an excuse to unreasonably restrain the ability of competitors and potential rivals to develop a search engine that matches its own.
Requiring only a standard of reasonable deidentification combined with a usability mandate could leave the risk of reidentification of transferred Google user data non-negligible. So a second privacy safeguard is needed: Google search rivals should be permitted to receive deidentified Google user search data only on the condition that they agree to make no attempt to reidentify it. The regular privacy audit required in the DOJ’s proposed remedies should specifically examine the records of Google rivals to assess whether they have made such a reidentification effort and, if they have, penalties should include suspension of the data flows from Google and destruction of the records already received.
Oddly, the DOJ’s proposed data access requirement contains no such prohibition on reidentification, and no explanation is provided for this puzzling omission. In its absence, the enormous value of identified search data sets will push the AI firms that will be the primary recipients of Google user data to attempt to match Google users with their own user records. Appending a Google user’s entire search history to any records of the same individual held by a search rival would enormously increase the value of the combined data set to the AI firm, but at the expense of an extreme privacy violation.
A legal or contractual restriction on reidentification is a common approach to supplement technological measures to deidentify data sets. The FTC’s recommendations in its 2012 privacy report required that data should not be considered “reasonably linkable” to a consumer, computer, or device to the extent that the company deidentifies it and, in addition, to the extent that the company “publicly commits not to try to reidentify the data.”
Another precedent comes from a contractual deidentification proposal made in 2010 by privacy expert Robert Gellman. His key requirement prohibits recipients of deidentified data “from reidentifying or attempting to reidentify any potentially identifiable personal information under the threat of civil and criminal penalties.”
In its proposed remedies, the DOJ recommends that the court appoint a technical committee composed of five people who are experts “in some combination of software engineering, information retrieval, artificial intelligence, economics, and behavioral science.” Membership, apparently, is to be based “solely on the[se] requirements,” meaning others with different expertise—in privacy, for instance—are barred from the technical committee unless they are also experts in the approved areas. The committee may hire such staff or consultants as it deems necessary, but these hires must meet the same requirements as the members of the committee.
The technical committee has broad privacy responsibilities. It must conduct regular privacy audits of competitors. The DOJ must consult with it in connection with a determination that Google’s privacy safeguards are “fully functional” before data is shared.
The apparently deliberate exclusion of privacy expertise from the technical committee is therefore puzzling and completely unexplained. Given the enormous privacy issues at stake in the enforcement of the search case antitrust remedies, and the explicit privacy responsibilities assigned to the committee, the membership of the technical committee should always contain at least one privacy expert. The staff hired by the technical committee should also have the necessary privacy expertise. The legal academy, research institutions, and civil society organizations house many such experts. The DOJ needs to bring them in to provide needed privacy guidance to meet the many technical and policy challenges in implementing a remedy that implicates both antitrust and privacy issues.
In its initial proposed remedies from 2024, the DOJ adopted what seemed to be an attractive competitive equity principle. It said: “Google is prohibited from using and retaining data to which access cannot be provided to Qualified Competitors on the basis of privacy or security concerns.” The initial remedies proposal was vague about privacy safeguards, saying only that the transfer of Google data had to be “consistent with personal privacy.” But under the new proposed remedies, Google must remove personal information before it transfers its search data. Applying this equity principle in those circumstances would mean that Google would have to stop using its personal search data, thereby ending personalized search services for its own users, which many users seem to value highly.
The DOJ’s current remedies proposal, however, does not contain this equity principle. Under it, Google would be allowed to continue to access and exploit the personal search data of its own users, while its rivals can access only deidentified Google user data. Doesn’t this asymmetry allow Google to perpetuate its search monopoly? Shouldn’t Google be required to deidentify its own users’ data and refrain from attempts at reidentification to create a level competitive field?
No. The deidentification requirements should not be applied to Google itself. The point of the data access requirement is to overcome the scale advantage that Google enjoys in devising and improving its search algorithm. This can be done without destroying Google user privacy through nonconsensual transfer of their personal data to an indefinitely large range of third-party companies and without depriving users of Google’s personalized search services that many seem to value. The appropriate compromise, which the current DOJ proposal attempts to implement, is that competitors have access to Google’s treasure trove of search data in a form that is usable to help them train their own search algorithms, while still not revealing the identity of Google’s users. The technical committee will have the difficult task of setting balanced privacy constraints that do not prevent search rivals from effectively using deidentified Google data for search training purposes, but which are strong enough so that the personal search data Google has compiled does not flow seamlessly to any and all Google rivals.
Ending personalized search for Google’s users is not needed to overcome the scale advantage Google currently enjoys. It would simply degrade the range and quality of the search services available to Google users. This would create another form of competitive asymmetry, where Google could not provide personalized search services to its users, but its competitors could, by analyzing personal data from their own users. This might encourage Google users who value personalized service to move to another search platform, where such personalized service would be available. But it is an odd antitrust remedy that would encourage competition by diminishing the perceived quality of the monopoly service to such a degree that users leave it.
In its amicus brief, the FTC concludes that the DOJ’s remedies proposal contains “terms that are consistent” with the requirements in its privacy and security orders. But the DOJ does not require competitors to implement even the minimal pro-privacy program the FTC describes, which says nothing about deidentifying transferred data. The DOJ’s data access requirement is consistent with the FTC’s privacy requirements in its remedial programs only in the narrowest sense that its terms do not affirmatively forbid Google’s competitors from adopting them.
More needs to be done. The new chatbot and AI agent services are a privacy nightmare for many. The DOJ might make the problem substantially worse by allowing Google user search data to flow to frontier AI labs, be transformed into personally identifiable form, and be inserted on a real-time basis into the profiles these AI companies are constructing to provide personalized AI services to their users. This indeed might make people flee the Google search service, not because of a better alternative, but because of government-sanctioned privacy invasions that they are powerless to resist once they use the Google search engine.
The DOJ remedies should contain an explicit mandate for Google to provide search data to search rivals only in deidentified form. Given the enormous commercial value of identified search histories, rivals will be tempted to use all available resources to crack the code that protects this data, unless they are explicitly forbidden to make the attempt. To prevent this, the remedies should require that rivals may receive and continue to receive Google search data only on the condition that they make no attempt to reidentify it. To ensure these privacy requirements are properly implemented, the supervising technical committee must always contain at least one privacy expert.
Acknowledgements and disclosures
Google and Meta are general, unrestricted donors to the Brookings Institution. The findings, interpretations, and conclusions posted in this piece are solely those of the authors and are not influenced by any donation.