Commentary

Can journalism survive AI?

Courtney C. Radsch, Nonresident Fellow - Governance Studies, Center for Technology Innovation; Director, Center for Journalism and Liberty - Open Markets Institute

March 25, 2024


  • In the past 20 years, the U.S. lost two-thirds of its newspaper journalist jobs—jobs that AI cannot fill.
  • Despite that, AI advancements are continuing the “platformization” of journalism and enabling a handful of technology firms to maintain their control over our information channels.
  • Journalism can only survive if the news industry unites to double down on journalists and demand a framework in their deals with tech giants that benefits journalism in the public interest.
Printer Jose Lomeli holds newspaper printing plates for the first copies of the inaugural Los Angeles Register newspaper in Santa Ana, California April 16, 2014. The newspaper, owned by Freedom Communications, will have 50-60 pages on weekdays and 80-90 pages on Sundays. Credit: REUTERS/Lucy Nicholson

Can journalism survive artificial intelligence (AI)? The answer will depend on whether journalism can adapt its business models to the AI era. If policymakers intervene to correct market imbalances, they must enforce intellectual property rights and ensure that journalism has a fighting chance in the era of generative AI.

Over nearly two decades, as tech companies like Apple, Amazon, Google, Meta, and Microsoft grew to become some of the most valuable companies in the world, the United States lost a third of its newspapers and two-thirds of its newspaper journalists. Those journalists cannot be replaced with AI.

Last year alone, the U.S. journalism industry slashed 2,700 jobs, and 2.5 newspapers closed each week on average. Despite a 43% rise in traffic to the top 46 news sites over the past decade, their revenues declined 56%. The dominance of less than a handful of privately owned, Silicon Valley-based tech corporations over digital advertising, publishing, audience, data, cloud, and search decimated the business models of journalism worldwide. And now AI is doing it again.

But unlike journalists, AI cannot go into the courtroom or interview a defendant behind bars, meet with the grieving parents of the latest school shooting victim, cultivate the trust of a whistleblower, or brave the frontlines of the latest war. Furthermore, without access to human-created, high-quality content that portrays reality with relative accuracy, which is what journalism provides, the foundational models that fuel machine learning and generative AI applications of all types will malfunction, degrade, and potentially even collapse, putting the entire system at risk.

The rapid advances in artificial intelligence are becoming yet another way for a handful of powerful tech corporations to extend and entrench their already dominant market positions. This will make it difficult, if not impossible, for sectors like journalism or the creative industries to remain independent, much less to maintain a public interest orientation as should be the case for the news industry.

The AI revolution underway extends the “platformization” of journalism and the power that a handful of tech firms maintain over our information channels and our public discourse. This, in turn, will exacerbate the ways in which these corporations are already threatening and cheapening real journalism while exploiting the labor of millions of journalists and others to build their models and develop applications that alter our economies and societies.

Journalism is particularly valuable to generative AI search, where it provides real-time information, context, fact-checking, and human language. This is where journalism, including local journalism, could be particularly valuable, and publishers must therefore be able to monetize it. Searching for information about local businesses, community issues, or government is going to be a lot less useful if there is no local journalism informing the results. Similarly, journalism that focuses on niche topics, breaking news, and investigative reporting is also likely to be especially valuable to applications that want to provide up-to-date, relevant, and timely information to their users while fighting the scourge of misinformation and low-quality content online.

Publishers are deeply concerned about how AI will further exacerbate the trend toward zero-click searches, which display the information requested without sending a user to an actual news site. Such searches have been on a steady upward trend since 2019. A 2022 study found that half of all Google searches were zero-click, and just a tiny fraction of Facebook users click through on the content in their newsfeeds.

Equally distressing is the way that AI companies are building their systems on the wide-scale theft of intellectual property and uncompensated use of journalistic content, which is far more than just a collection of facts and is often collected at great cost to the journalists who report the news. Journalism is an essential part of many of the foundational data sets used to develop and train generative artificial intelligence systems. News makes up half of the top 10 sites in the training data of a Google dataset that is used to train some of the most popular large language models (LLMs), and accounts for nearly half of the top 25 most represented sites in the Colossal Clean Crawled Corpus, a snapshot of the open source Common Crawl dataset filtered to retain high-quality English sources and discard low-quality and problematic content like profanity and hate speech.

Even content that was put behind paywalls and intended to be restricted to paid users is present in LLMs and recycled in generated responses. Last year, ChatGPT and Bing had to stop a new product partnership because users were able to bypass publisher paywalls. More than half of 1,159 publishers surveyed have asked AI web crawlers to stop scanning their sites, though compliance is voluntary and such requests can be ignored with impunity.
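To make the opt-out mechanism concrete, here is a minimal sketch of how such a request is typically expressed and checked, assuming the publisher uses a robots.txt file. The site, URL, and file contents are hypothetical; GPTBot and CCBot are user-agent tokens published by OpenAI and Common Crawl, respectively. The check runs on the crawler's side, which is why compliance is voluntary: nothing in the file stops a crawler that never consults it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a publisher might serve to opt out of
# AI crawling while still allowing other crawlers. Illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up article URL on a made-up news site.
    url = "https://news.example.com/local/city-council-vote"
    for agent in ("GPTBot", "CCBot", "SomeSearchBot"):
        # A well-behaved crawler would skip the page when this is False.
        print(agent, allowed(agent, url))
```

A crawler that honors the file would call something like `allowed()` before each request. One that does not simply fetches the page anyway, and the publisher has no technical recourse through this mechanism alone.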

If AI companies are allowed to further cannibalize content and revenue from the journalism industry, as has been the case with search and social media, they will divert readers and potential subscribers away from publishers. This further reduces revenues that could be earned from subscription, advertising, licensing, and affiliates, undermining not just the ability to produce quality journalism but also the underlying business models for the entire sector.

Luckily, despite protestations that freely using journalism to develop foundation models and fuel generative AI applications like search and content generation is fair use, AI companies have already started to strike deals with news publishers for access to their content. OpenAI, which is substantially owned by Microsoft, has inked licensing agreements with some of the largest journalism organizations in the world including the Associated Press, Axel Springer, Le Monde, and Spanish media conglomerate Prisa, while several more are reportedly in discussion with Apple and Google. Although the terms are largely unknown, many of them appear to cover licensing content, including archives, for a defined period (two years seems to be the norm) as well as access to AI tools in the newsrooms.

But smaller, niche, minority, investigative, and local media are being left behind, in part because they don’t necessarily understand the value proposition of their journalism throughout the AI value chain, and in part because they lack the resources and sway to seek out deals or the power to negotiate effectively.

How we decide to allocate intellectual property rights and how we decide on whether fair use applies to developing and training artificial intelligence systems will have profound ramifications. This is where efforts to require tech platforms to negotiate with news publishers and allow publishers to collectively bargain for use of their content could be particularly helpful.

News media bargaining codes, which are already in place in Australia and Canada and under consideration in a dozen more jurisdictions, including the United States and several U.S. states, were initially seen as a way to require fair compensation for the value that news snippets provide to Google Search and Meta’s social media platforms. But they could, and should, be used to demand compensation for the scraping and crawling of content for AI systems as well, as I told the California Senate, Canadian Parliament, and South African Competition Commission in hearings held over the past few months.

The Center for Journalism and Liberty, which I direct, tracks adopted and proposed regulations around the world on the Technology and Media Fair Compensation Frameworks global tracker. Although none of them explicitly refer to the use of news content in large language models or generative AI products, they do cover scraping and crawling of news publisher websites. The former head of Australia’s competition commission and author of the country’s pioneering news media bargaining codes has similarly urged publishers to leverage the existing framework to negotiate deals.

Requiring tech companies to license the use of news publishers’ content through this type of legislation would help ensure that smaller, local, niche, and non-English language news publishers would also be able to negotiate for the use of their content and data. This type of journalism could be particularly useful for localizing generative search, summarization, content creation, and other applications that make use of journalism to provide more accurate, timely, and relevant results, particularly in languages other than English.

Journalism can also be an important source of data for improving the quality of foundation models, which suffer from bias, misinformation, and spam that make access to diverse sources of quality, factual information, especially in low-resourced digital languages, even more valuable. Furthermore, as the quality of data becomes as important as the quantity of data, journalism provides a constant source of new, timely, human-generated data.

We are in a moment when the news industry needs to unite. As giant media conglomerates and major publications strike deals with the tech giants, they need to demand a framework that will benefit journalism in the public interest, not just line the pockets of their corporate owners. Ultimately, the only way journalism will survive AI is to double down on its journalists. As important as it will be for journalism to adapt to and integrate AI, newsrooms that replace journalists will hasten its demise, with profound ramifications for democracy in the U.S. and around the world.

News outlets must consider how to optimize revenue streams and assert their pricing autonomy throughout the AI value chain. They will need to figure out how to unlock the value of journalism by adopting sophisticated and dynamic compensation frameworks and pricing strategies for news content in various parts of AI systems and AI applications. They will need access to information about the way their content is used in AI systems, including data sets and foundational model weights. And they will need government regulations that enable them to do so.

  • Acknowledgements and disclosures

    Amazon, Google, Microsoft, and Meta are general, unrestricted donors to the Brookings Institution. The findings, interpretations, and conclusions posted in this piece are solely those of the author and are not influenced by any donation.