Counting AI: A blueprint to integrate AI investment and use data into US national statistics

Output was easier to measure when the bulk of the economy consisted of agriculture and manufactured goods. The rise of the knowledge economy and service sector has made it much harder to construct accurate national income and product accounts (NIPA) that undergird gross domestic product (GDP) calculations. That is because intangible capital tends to drive the bulk of investment and growth in certain sectors, and increasingly, in all parts of the economy.

Research led by Ellen McGrattan and the late Edward Prescott has shown that failure to incorporate intangible capital leads to an underestimate of GDP in macroeconomic models of the business cycle. Other research has found that about $800 billion in annual U.S. business spending on intangibles (as of 2003) was uncounted, leaving over $3 trillion in intangible capital stock invisible in official data. Computer-related intangible organizational complements in particular have grown, which helps explain how computers affect the demand for skilled labor and productivity growth.

If these concerns about underestimating GDP were true with the rise of software and brand investments, they are even more pressing with the rise of AI, especially generative AI (genAI).

In this article, we first describe three reasons why AI’s impact on the economy may be underestimated: the treatment of AI as expensed intangible capital and the J-curve created by complementary organizational investments; the mismeasurement of quality change and service flows; and the scalability and spillovers of AI, including the missing value of free or bundled AI services. Then, we discuss the parallel problem of undercounted costs, including cyber risk, privacy loss, and intellectual property erosion, and argue that AI creates both intangible assets and “intangible liabilities” that fall outside current national accounts. We also lay out a two-horizon measurement agenda: a near-term Generative AI Intensity Index that combines statistical surveys with provider telemetry to track AI adoption by sector and region, and a medium-term reform of national and satellite accounts that separately identifies AI-related capital, services, labor reallocation, and household time use. We conclude with several policy recommendations, including the creation of an interagency AI measurement task force, the development of NIST-led standards for AI usage metrics, and the systematic integration of AI measures into Bureau of Economic Analysis (BEA) and Bureau of Labor Statistics (BLS) products, so that macroeconomic, labor market, and infrastructure policy rests on statistics that reflect the actual scale and distribution of AI adoption.

Three reasons AI’s impact on the economy is underestimated

First, genAI often takes the form of intangible capital that national accounts still record as ordinary operating costs rather than as investment. For example, firms build models, curate proprietary datasets, refine orchestration layers, develop prompt libraries, and create internal evaluation systems. They also invest in the organizational routines required to deploy these tools at scale. Under current accounting rules most of these outlays are expensed, so they raise measured costs in the short run rather than adding to the capital stock. That treatment masks the buildup of productive capacity and produces the familiar J-curve dynamic: Expenditures and organizational adjustment arrive first, depressing measured productivity, while the associated output gains materialize only after firms have redesigned workflows, retrained staff, and integrated AI into decision processes. Because the capital stock is understated throughout this adjustment period, growth accounting has an attenuated base from which to measure capital deepening. The share of output growth that should be attributed to the service flow of AI assets is either misassigned to total factor productivity or disappears entirely, and sectors that adopt early can look weak before adoption and strong once the intangible capital is in place.

Second, even for the AI-related spending that is recognized as investment or output, the price and quantity side is poorly measured. Firm-specific AI assets are heterogeneous and evolve quickly, so statistical agencies often rely on sum-of-costs valuation and generic software deflators. That approach inherits accounting conventions and can miss quality changes. If the effective service flow from a model doubles because of fine-tuning, retrieval, or tool use while recorded cost is flat, real output and capital services are understated. The same issue arises when genAI capabilities are folded into existing software products at an unchanged sticker price. In principle, quality-adjusted deflators should register these gains as faster price declines and, correspondingly, higher real output. In practice, deflators for software and other intangibles are often too coarse or slow-moving to capture them. Constructing accurate deflators for digital services is difficult: The products change frequently, quality is multidimensional, and there are few observable market transactions to anchor prices, particularly for diffuse technologies.
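
To see how much the deflator convention matters, consider a minimal numerical sketch with entirely made-up figures: A firm’s nominal spending on an AI service is flat while fine-tuning doubles the effective service flow. A sticker-price deflator records no real growth; a quality-adjusted deflator records a doubling.

```python
# Illustrative only: made-up numbers showing why sticker-price deflators
# understate real output when AI quality improves at an unchanged price.

def real_output(nominal_spend: float, price_index: float) -> float:
    """Deflate nominal spending by a price index (base period = 1.0)."""
    return nominal_spend / price_index

nominal_spend = 100.0   # firm spends $100 on an AI service in both periods
sticker_price = 1.0     # the listed price never changes
quality_gain = 2.0      # fine-tuning/retrieval doubles the service flow

# Conventional deflator: price is flat, so measured real output is flat.
conventional = real_output(nominal_spend, sticker_price)                     # 100.0

# Quality-adjusted deflator: the constant-quality price has halved, so the
# same $100 now buys twice the effective service flow.
quality_adjusted = real_output(nominal_spend, sticker_price / quality_gain)  # 200.0

print(f"Conventional real output:     {conventional:.0f}")
print(f"Quality-adjusted real output: {quality_adjusted:.0f}")
```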

Third, a large share of the value created by digital services never passes through priced market transactions. Models and datasets are scalable and partly non-rival: Once developed, they can be applied across tasks and users at near-zero marginal cost, and their benefits propagate through supply chains and user communities rather than remaining on a single firm’s balance sheet. Much of that value takes the form of consumer surplus and cross-firm spillovers, which conventional accounts do not capture. The diffusion of genAI also relies on services that are free or bundled into existing products, from text-generation tools available without subscription to AI features layered onto productivity suites at unchanged sticker prices. Users gain time savings and quality improvements that do not appear as higher measured output, investment, or labor income. As AI becomes embedded in nonmarket activities such as household production and informal problem-solving, the gap between measured GDP and the economy’s underlying productive capacity widens. This is not a marginal omission: It is a structural shift in how digital technologies create and distribute value, and current statistical systems are not built to detect it.

Unmeasured costs, too

A comprehensive approach must nonetheless also account for the costs and risks associated with genAI’s intangibles. Just as intangible benefits of AI go unmeasured, so do many intangible costs. One example is cybersecurity risk: Greater reliance on AI and data can expose firms to new vulnerabilities and attack vectors. These digital vulnerabilities impose real economic costs that do not neatly appear in GDP. Recent research finds that companies with high cybersecurity exposure (e.g., many unresolved vulnerabilities in their networks) significantly underperform in the stock market, with roughly a 0.33% lower return per month compared to more secure peers. GenAI can exacerbate offensive cyber risk by lowering the barriers to cyberattacks, whether by automating hacking techniques or by enlarging the attack surface of AI-integrated systems, but such unintended consequences are invisible in official output or investment data.

Another uncounted cost is intellectual property (IP) value erosion. Generative AI models are trained on vast amounts of data, often scraping copyrighted text, code, or images without clear permission. This has sparked a wave of lawsuits and a “tragedy of the commons” concern in creative industries. Content creators argue that AI firms are appropriating value (i.e., training on their work and potentially displacing their market) without compensation, effectively shifting value away from original producers. Courts are now grappling with whether using copyrighted material to train AI constitutes massive-scale infringement. AI’s rapid progress might be coming at the expense of creative labor and IP holders, a cost not reflected in GDP.

Acknowledging these hidden costs is important when updating our measurement frameworks. An ideal “AI satellite account” or improved GDP methodology would count not only the hidden benefits of AI but also net out these intangible liabilities—from data breaches and privacy loss to IP disputes—to fully understand AI’s net contribution to welfare and productivity.

A two-horizon measurement roadmap

To close these gaps, the U.S. should pursue a dual strategy: (a) a near-term Generative AI Intensity Index that leverages provider data for a timely view of AI use across the economy, and (b) medium-term reforms to national accounts aligned with the system of national accounts (SNA) and a holistic AI value-chain framework. This two-horizon approach delivers immediate situational awareness while laying the groundwork for robust long-run integration of AI into official statistics. In effect, we can gain a fast proxy for AI’s footprint now, without waiting years for perfect measures, all while steering those eventual measures in the right direction.

Near-term: An AI intensity index

Using existing tools, we could implement an AI Intensity Index—a real-time metric of AI usage that augments traditional surveys with the “digital exhaust” of AI systems. Such a real-time index would have broad appeal; policy groups like SeedAI have increasingly called for better measurement of AI as well.

Kristina McElheran and coauthors integrated questions on AI into the Census Bureau’s Annual Business Survey of 850,000 firms across the U.S. They found that fewer than 6% of firms regularly used AI for business operations in 2018. That average, however, masks substantial heterogeneity: Weighted by employment, the adoption rate was 18%. More recently, the Census Bureau’s monthly Business Trends and Outlook Survey (BTOS) reports that, as of December 2025, approximately 18% of firms used AI in the previous two weeks, with just under 22% reporting likely use in the next six months. Adoption also varies sharply by industry: The information sector exceeds 35%, while manufacturing and retail trade sit at roughly 10-15%.

While highly informative and integral to benchmarking, the BTOS is not designed to measure intensity of use, the share of workers or tasks affected, or household-level adoption, and its high-frequency modules are constrained by respondent burden. For those reasons, such surveys are essential inputs but not a full solution to the AI measurement problem. This aligns with priorities identified by the AEA Committee on Economic Statistics, which emphasized that sustained, high-quality, representative business survey measurement is necessary for credible, economy-wide inference, even as complementary data innovations are developed.

The Gallup Workplace Panel, however, is a large, nationally representative sample of the working population. Starting in the second quarter of 2023, Gallup began asking individuals annually about AI utilization—broadly defined as algorithms that are used to do what humans typically do—and, in particular, the frequency with which they use AI. In 2025, Gallup started asking the AI questions quarterly, providing even higher-frequency measurement of AI adoption across the labor market. Recent work by one of us has leveraged the longitudinal variation within the Gallup Panel to document trends in AI adoption and learn more about its drivers. As of the fourth quarter of 2025, nearly 26% of respondents are using AI daily or frequently during the week, and an additional 19% are using AI at least sometimes during the year.

While these surveys convey critical information, they cannot be fully dynamic or real-time. To augment these survey measures, a Generative AI Intensity Index could measure actual usage intensity by tracking the volume of AI model outputs. Major AI providers already meter how many tokens or queries their models process, so the Index could aggregate these usage metrics into a standardized unit called Normalized Token Equivalents (NTEs). NTEs could convert raw indicators into a common token-based scale, enabling consistent comparison across different model types and providers. For example, whether a firm is using OpenAI’s GPT-4, an image generation API, or another model, its usage can be translated into NTEs as a measure of generative AI activity. This provides a direct proxy for AI adoption intensity, capturing how deeply and frequently AI tools are used rather than just whether they are used.
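
As a minimal sketch of how such a conversion could work, the snippet below maps heterogeneous usage records into a single NTE total. The conversion weights, field names, and figures are hypothetical placeholders; actual factors would have to come from a standards process like the one discussed below.

```python
# Hypothetical sketch of Normalized Token Equivalents (NTEs).
# The conversion weights are illustrative placeholders only; real
# factors would come from a NIST-style standards process.

NTE_WEIGHTS = {
    "text_tokens": 1.0,        # 1 text token = 1 NTE by convention
    "images_generated": 500.0, # assume one image ~ 500 tokens of compute
    "chat_queries": 750.0,     # assume an average query ~ 750 tokens
}

def to_nte(usage: dict[str, float]) -> float:
    """Convert a provider's raw usage counts into a single NTE total."""
    return sum(NTE_WEIGHTS[kind] * count for kind, count in usage.items())

# Example: one firm's monthly usage across two different providers.
provider_a = {"text_tokens": 2_000_000, "images_generated": 1_200}
provider_b = {"chat_queries": 5_000}

total_nte = to_nte(provider_a) + to_nte(provider_b)
print(f"Total usage: {total_nte:,.0f} NTEs")  # comparable across providers
```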

Crucially, the Intensity Index would be mapped to industries and regions. By pairing usage data with the NAICS industry codes of the firms or clients using the AI and location meta-data among users, the Index can report which sectors and geographies are deploying the most AI and how intensively. Together, sector and regional breakdowns would give policymakers a granular, timely heat map of where generative AI is being used across the country.
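
Below is a minimal sketch of the aggregation step, assuming providers can attach a NAICS code and state to each already-anonymized usage record; the record fields and magnitudes are invented for illustration.

```python
# Illustrative aggregation of NTE usage into an industry-by-region heat map.
# Record fields (naics, state, nte) and values are invented for this sketch.
from collections import defaultdict

records = [
    {"naics": "51", "state": "CA", "nte": 4.2e9},  # information sector
    {"naics": "51", "state": "WA", "nte": 3.1e9},
    {"naics": "33", "state": "OH", "nte": 2.5e8},  # manufacturing
    {"naics": "44", "state": "TX", "nte": 9.0e7},  # retail trade
]

# Sum usage within each (industry, state) cell.
heat_map: dict[tuple[str, str], float] = defaultdict(float)
for r in records:
    heat_map[(r["naics"], r["state"])] += r["nte"]

for (naics, state), nte in sorted(heat_map.items()):
    print(f"NAICS {naics} | {state}: {nte:,.0f} NTEs")
```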

The availability of quarterly data from actual survey respondents using AI, coupled with high-frequency AI telemetry data, would allow for a dynamic index. In this sense, the index would share properties with the Atlanta Federal Reserve’s GDPNow forecast, which incorporates new data as they become available, growing more accurate in real time.
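
One illustrative way to blend the two sources, loosely in the spirit of GDPNow’s incremental updating, is to lean on the survey benchmark early in the quarter and shift weight toward telemetry as it accumulates. The linear weighting scheme and numbers below are placeholders, not a proposed methodology.

```python
# Toy nowcast: blend a quarterly survey benchmark with within-quarter
# telemetry. Weights and values are placeholders, not a methodology.

def nowcast(survey_rate: float, telemetry_rate: float, days_elapsed: int,
            days_in_quarter: int = 90) -> float:
    """Lean on the survey early in the quarter, on telemetry later."""
    w = days_elapsed / days_in_quarter  # telemetry weight grows over time
    return (1 - w) * survey_rate + w * telemetry_rate

survey_rate = 0.26     # last quarterly survey: 26% frequent AI use
telemetry_rate = 0.29  # telemetry-implied rate so far this quarter

for day in (10, 45, 85):
    print(f"Day {day}: index = {nowcast(survey_rate, telemetry_rate, day):.3f}")
```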

The design of the Index emphasizes privacy and low burden. Because it relies on automated metering by AI providers, firms themselves would not need to fill out new surveys or disclose proprietary data. All data can be aggregated and anonymized by the providers before sharing, ensuring no sensitive information about individual companies or users is exposed. The goal is to have privacy-preserving telemetry: Providers report counts and metrics that are stripped of personal identifiers and are aggregated at industry or regional levels. A collaborative approach with standards-setting can facilitate this. The National Institute of Standards and Technology (NIST) could establish technical standards for how tokens and related metadata are measured and reported, including protocols to preserve privacy and protect confidential business use. Using common standards will also ensure that data from different AI platforms is comparable.
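
One privacy-preserving pattern providers could adopt before sharing is sketched below: suppress small cells, then add calibrated noise in the style of differential privacy. The suppression threshold and noise scale are illustrative choices, not a standard.

```python
# Illustrative privacy-preserving release of aggregated usage totals:
# suppress small cells, then add Laplace noise (differential-privacy style).
import random

MIN_CELL_SIZE = 10  # suppress cells with too few firms (illustrative)
NOISE_SCALE = 50.0  # Laplace scale; larger = more privacy (illustrative)

def release(cell_firm_count: int, cell_nte_total: float) -> float | None:
    """Return a noisy aggregate, or None if the cell must be suppressed."""
    if cell_firm_count < MIN_CELL_SIZE:
        return None  # too few firms: releasing would risk re-identification
    # Difference of two iid exponentials is Laplace-distributed noise.
    noise = random.expovariate(1 / NOISE_SCALE) - random.expovariate(1 / NOISE_SCALE)
    return cell_nte_total + noise

print(release(3, 1_000.0))        # None: suppressed
print(release(250, 1_000_000.0))  # noisy total, safe to share
```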

With NIST and statistical agencies involved, the Index can be implemented in a way that adheres to strict privacy, security, and confidentiality norms—similar to how official statistics handle sensitive company data under Title 13 protections.

Medium-term: Aligning national counts with AI

Looking beyond the immediate index, the U.S. must modernize its national accounting system to properly record AI’s contributions as the technology matures. Much as early pioneers in the intangible capital literature worked hard to improve the measurement of intangibles in the national accounts, the emergence of generative AI offers new opportunities and challenges. This means tracking AI inputs, the services AI provides, changes in production processes and labor, and even effects on household non-market production. We outline four priority areas:

  1. Capturing AI-specific inputs: Many of the investments that currently enable AI are hidden in broad categories. We need to isolate and track the “hard” AI infrastructure and other inputs more directly. This includes identifying spending on specialized semiconductors (chips), data center equipment, and cloud infrastructure that is driven by AI development. These capital investments could be recorded as a distinct subcategory in national accounts (e.g., separating AI-related information and communications technology (ICT) investment). Similarly, R&D expenditures on AI algorithms and large-scale data preparation should be tagged and tracked. A key challenge is that AI infrastructure is globally distributed—a U.S. company’s AI system might run on servers in multiple countries. To address this cross-border complexity, BEA and international partners should develop extended, multi-regional input-output (MRIO) tables and related capital flow accounts. These would trace how AI-related investments and intermediate inputs flow across borders—for instance, linking U.S. AI services to the foreign energy, hardware, and data inputs that support them.
  2. Measuring AI services and quality changes: AI can be thought of as a new intermediate service that businesses consume to produce output. However, today these services are not well-accounted for: They may be lumped into software or simply not observed if provided for free. Statistical agencies should better capture AI-as-a-service in economic accounts. One approach is to treat AI outputs as an intermediate input in production statistics and input-output tables. For example, if an insurance company uses an AI model for claims processing, the value of that AI service (even if internal) should be recorded similarly to how we record the use of, say, legal or IT services. This requires collecting data on expenditures on AI (where they exist) or imputing values for internally developed AI solutions. BEA could create a category for AI service inputs in industry accounts, improving visibility into which sectors are actually driving AI demand. Additionally, price and quality measurement techniques must evolve. When AI services are embedded or free, traditional price indexes miss the consumer benefit. The BLS and BEA can incorporate new quality-adjustment metrics for AI software and services—for instance using performance benchmarks, user satisfaction ratings, or other telemetry as proxies for quality change. Experimental measures could leverage data like AI model benchmark scores or reliability metrics to adjust software price indices.
  3. Enhancing labor and productivity measurement: AI’s effect on work is at the level of tasks and skills, which our existing labor statistics are ill-equipped to monitor. Over the medium term, the statistical system should integrate task-level data collection to complement job-level metrics. For example, labor force or establishment surveys can add questions about the use of AI tools on the job and which tasks are being automated or augmented by AI. Another rich source is time-use data: Regular time-use surveys could be expanded to capture time spent on tasks with AI assistance versus without. By linking these data to productivity measurements, we can start to attribute productivity gains to task-level automation rather than treating it as an unexplained residual. Moreover, matched employer-employee datasets and administrative payroll records can be leveraged to see how AI adoption correlates with employment, wages, and skill demand within firms. New categories for occupations like machine learning engineers, prompt designers, or AI ethicists could also be added, and firm registries could tag AI-focused businesses. All these improvements align with Diane Coyle’s view that we need more granular, task-based statistics to understand AI’s impact on productivity.
  4. Accounting for AI in households and non-market production: AI is not only changing the market economy; it is also entering our homes and daily lives, performing tasks that never showed up in GDP because they were done as unpaid household work. That will be true to an even greater extent once genAI integrates with humanoid robots to complete tasks at home. To fully grasp welfare and productivity, the national accounts should extend to measure these household-level gains. These technologies effectively increase household productivity; a family can get chores done faster or better, freeing up time for leisure or other work. While traditional GDP will not capture a faster-cleaned house or an AI-scheduled calendar, we can develop satellite accounts for household production to quantify these benefits. The BLS already runs the American Time Use Survey; expanding it to measure time saved due to AI assistance would provide data to value those time savings. One approach is to assign an imputed economic value to the hours of housework or caregiving that AI tools take over, similar to how we sometimes value unpaid labor by replacement cost (see the sketch after this list). However, as Coyle points out, current household satellite accounts rely on valuing unpaid work by the hours of human labor—a method that breaks down if AI capital replaces human effort. We may need to treat home AI devices as a form of household capital and measure their services explicitly. Over time, improvements in these accounts will let us track how AI contributes to living standards beyond the market, capturing gains in free time and well-being that GDP misses.
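
To illustrate the replacement-cost imputation mentioned in the fourth item, here is a minimal arithmetic sketch; the hours saved, wage rate, and household count are invented placeholders, not estimates.

```python
# Illustrative replacement-cost imputation of household time saved by AI.
# All hours, wages, and household counts below are invented placeholders.

def imputed_value(hours_saved_per_week: float, replacement_wage: float,
                  households: int, weeks_per_year: int = 52) -> float:
    """Value time savings at what it would cost to hire the task out."""
    return hours_saved_per_week * weeks_per_year * replacement_wage * households

# Suppose AI scheduling/drafting tools save 1.5 hours per week per household,
# valued at a $20/hour replacement wage, across 10 million households.
value = imputed_value(1.5, 20.0, 10_000_000)
print(f"Imputed annual value: ${value / 1e9:.1f} billion")  # ~$15.6 billion
```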

Concrete policy applications

A well-implemented AI Intensity Index and improved AI accounts would directly inform a range of policy decisions. The Index could serve as an early warning system for labor transitions in different sectors. If a surge in generative AI usage is observed in, say, the legal services industry or graphic design sector, it may foreshadow significant changes in the demand for those professionals. Policymakers could use this insight to proactively ramp up worker retraining programs or education curricula in affected regions. For instance, local workforce boards seeing high AI intensity in marketing and writing services might launch initiatives to help content creators upskill in using AI tools or shift to tasks that AI cannot do.

In macroeconomic policy, the data would help distinguish productivity-driven growth from weak demand. One perennial challenge for economists is interpreting movements in productivity and output. If productivity (output per hour) is rising, is it because firms are innovating and becoming more efficient or because they cut labor due to slack demand? An AI Intensity Index provides additional context. For example, if productivity jumps in an industry at the same time that industry’s AI intensity is climbing, it is likely that the productivity gain is technology-driven rather than just a cyclical blip. With an AI usage metric, analysts could identify whether a slowdown in output is occurring alongside AI uptake, thus aiding monetary and fiscal policymakers in interpreting the data.

Infrastructure and regional planners would also benefit. The Intensity Index could show which areas are becoming AI hubs and might face infrastructure bottlenecks. A region with many AI-using firms will likely see growing needs for data center capacity, electricity, and high-bandwidth connectivity. If the Index shows a cluster of high AI activity in a particular state or metro, that could prompt coordination with utilities and local governments to ensure adequate electric power and water supply. It could also feed into decisions about where to incentivize new data center construction or fiber-optic network upgrades. Additionally, environmental regulators could use AI usage data to project carbon emissions hotspots, tying into environmental impact assessments and sustainability planning. In short, AI intensity data adds a forward-looking layer to infrastructure policy: rather than reacting to strain on the grid or water systems, planners could foresee it by tracking the growth trajectory of AI compute demand.

The Index and related data would inform productivity and innovation policies as well. By revealing which sectors are lagging in AI adoption, it could guide policymakers on where to promote diffusion. For example, if parts of manufacturing show low generative AI intensity relative to other sectors, commerce departments or NIST might target those areas for AI pilot programs or technical assistance to boost competitiveness.

Finally, an AI measurement system would aid in program evaluation after policy implementation. If the government enacts an AI adoption tax credit or funds AI research hubs, the Generative AI Intensity Index could help gauge the success or failure of these initiatives over time. It also helps avoid false narratives. For example, if productivity growth remains low in headline stats, skeptics might claim AI is overhyped. But if the Index shows heavy usage, the benefits might be manifesting in ways GDP does not fully capture (like quality improvements or consumer surplus). That evidence can justify modernizing output metrics and continuing supportive policies for AI diffusion, rather than assuming AI isn’t contributing to the economy. In sum, better measurement is not just an academic exercise—it directly enables smarter, more timely policy responses in workforce development, macroeconomic management, infrastructure, and innovation strategy.

From concrete uses to a national AI measurement roadmap

Better measurement of AI is both feasible and necessary if policymakers are to make informed choices about innovation, competition, and the labor market. The blueprint in this essay moves from abstract concern about AI to a concrete statistical agenda that treats AI use and investment as measurable features of the economy rather than as a residual. Without that visibility, debates over productivity, wages, and regional development will continue to unfold with AI largely off the books, even as the technology permeates everyday business practice.

Recent federal initiatives provide a natural home for such a roadmap. Most recently, President Trump’s Genesis Mission executive order on November 24, 2025, directs federal agencies to build a common AI platform that leverages national laboratory compute and high-value government datasets in service of scientific discovery, national security, and public health. America’s AI Action Plan similarly set priorities for research funding, safety, and deployment. In addition, the Department of Labor, with the BLS, Census, and BEA, plans to launch an AI Workforce Research Hub to generate recurring analyses and scenario planning on AI adoption, displacement, and wage effects, with the explicit goal of translating measurement into workforce and education policy. These are all valuable, but we need a parallel commitment to track how AI is actually diffusing through firms, sectors, and regions. A dedicated AI measurement program would supply that missing layer, augmenting and informing the federal AI statistical strategy.

There are precedents for this kind of measurement roadmap in other domains. BEA’s digital economy satellite account turned an amorphous concept, the digital economy, into a regular statistical product linked to the core national accounts. The Environmental Protection Agency’s greenhouse gas inventory created a standing infrastructure for quantifying emissions and evaluating climate policy. Real-time tools such as the Atlanta Fed’s GDPNow model illustrate how experimental indicators can complement, rather than replace, traditional statistics. In each case, the federal government chose to build durable, cross-agency measurement systems so that new technologies and risks could be treated as objects of policy rather than anecdotes.

AI now merits the same treatment. Just as the United States built a satellite account to track digitalization, it should adopt an explicit roadmap for AI measurement. In the near term, that roadmap would establish a Generative AI Intensity Index as a real-time indicator of AI use across sectors and regions, governed by an interagency AI measurement task force and implemented in partnership with major AI providers. Over a somewhat longer horizon, it would integrate AI into the national accounts through identifiable AI capital, associated service flows, labor reallocation, and household production. Paired with the Genesis Mission’s investments in AI for science and national priorities, such a measurement program would give policymakers the same kind of asset they now rely on for emissions and GDP: a shared empirical baseline on which to build workforce, macroeconomic, infrastructure, and innovation policy.

Acknowledgements and disclosures

Corresponding author: Christos Makridis, Arizona State University and Gallup.

The authors acknowledge the following support for this article:

  • Research: Aidan T. Kane
  • Editorial: Robert Seamans, Sanjay Patnaik, and Chris Miller
Footnotes

  1. Chad Syverson addressed this argument in a seminal article, “Challenges to Mismeasurement Explanations for the US Productivity Slowdown,” Journal of Economic Perspectives, 31(2): 2017. He argued that unmeasured quality improvements would have to be implausibly large to account for the missing productivity growth. However, especially with the advent of genAI as a general purpose technology, his exercise—which focused on three computer-related sub-sectors—would need to be expanded. Indeed, many of the sectors where genAI innovation is taking place are outside the computer sector. There is also a growing recognition that North American Industry Classification System (NAICS) codes are becoming too rigid (e.g., Amazon is more of a technology company than a retail one).
  2. This concept comes from the work by Joshua New, Marina Meyjes, and Austin Carson in SeedAI, “The U.S. Needs A Generative AI Intensity Index.”
