Estimating income tax liabilities with data from the Survey of Consumer Finances

Editor's note: This blog is based on a paper published January 11, 2022.

In a recent paper, we describe a new methodology for estimating income tax liabilities in public-use Survey of Consumer Finances (SCF) micro data files. Most recently conducted in 2019, the SCF is a triennial household survey with extensive demographic, income, and balance sheet information for the designated survey respondent, and if present, the respondent’s spouse/partner. The survey also collects basic demographic information, financial dependency indicators, and summary income measures for up to ten additional household members. The SCF is unique among public-use household surveys because it oversamples wealthy households and is thus suitable for studying trends in top wealth and income shares (Bhutta et al. 2020; Bricker et al. 2016). Like most household surveys, however, the SCF does not ask detailed questions about household tax filing or tax liabilities.

At the outset, a natural question is why constructing a way to address tax issues with SCF data is useful. There are both general and specific answers. At the most general level, tax data alone is insufficient to address tax issues because the income reported on tax forms is already affected by laws, avoidance strategies, and evasion practices. In terms of a specific application, we show in a companion paper that less than half of all income generated by closely held businesses in the United States actually shows up on tax forms. The massive “leakage” between the generation of economic income and the listing of income on tax forms is a ripe area for analysis and policy recommendation. A data set like the SCF, with information on household income as well as wealth, can be a valuable tool to undertake such analysis, if it contains tax information as well.

Our overall strategy is to divide SCF households into tax units, reconcile survey and taxable income measures, create the other inputs needed to estimate income tax liabilities, and then estimate those liabilities for SCF tax unit micro files using the most recent version of NBER’s online tax calculator, TAXSIM. TAXSIM replicates U.S. federal income tax rules over time, including the 1995 to 2019 period (tax years 1994 to 2018) spanned by the SCF micro data files that we use.

We proceed in several steps. First, we create tax units within SCF households. For most SCF households—such as a single person or married couple living alone or with dependent children—this process is simple. These households also account for the vast majority of income. Some households, however, contain multiple potential filing units—because they consist of either different generations or unrelated individuals. In these cases, we use data on demographic relationships, financial dependence measures, marital histories, and incomes to simulate tax filing units. We also benchmark our simulated outcomes against published tax filings in the Statistics of Income (SOI).

Second, we map SCF incomes into income concepts consistent with those reported on tax forms. SCF incomes are largely intended to be consistent with their taxable counterparts, but even after resolving conceptual differences, we show that the survey values are systematically higher than the published tax values. Although we do not exhaustively explore aggregate and distributional differences across income categories, the key observation that emerges is that the gap in business incomes (mathematically) accounts for most of the overall income gap.

Third, we model itemized deductions. Taxpayers can choose between itemized deductions and a standard deduction that varies with filing status. The SCF captures about half of itemizable expenses, and we impute the other half using published SOI deductions. Our two key benchmarks are how well we track the number of tax filers who choose to itemize and the total value of itemized deductions.
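The itemize-or-not choice and the two benchmarks can be sketched in a few lines of Python. This is a minimal illustration, not the paper's imputation procedure: the record fields (`status`, `observed`, `imputed`, `weight`) and both function names are hypothetical, and the sketch simply assumes each tax unit takes whichever deduction is larger. The standard deduction amounts are the basic statutory values for tax year 2018.

```python
# Basic standard deduction for tax year 2018, by filing status.
# The additional amounts for age 65+ or blindness are ignored here.
STANDARD_DEDUCTION_2018 = {
    "single": 12_000,
    "married_joint": 24_000,
    "head_of_household": 18_000,
}


def deduction_choice(filing_status, observed_itemizable, imputed_itemizable):
    """Return (itemizes, deduction) for one tax unit.

    observed_itemizable: expenses reported in the survey (hypothetical field).
    imputed_itemizable: expenses imputed from published data (hypothetical field).
    Assumes the unit claims whichever deduction is larger.
    """
    itemized = observed_itemizable + imputed_itemizable
    standard = STANDARD_DEDUCTION_2018[filing_status]
    itemizes = itemized > standard
    return itemizes, max(itemized, standard)


def itemizer_benchmarks(units):
    """Weighted number of itemizers and weighted total of itemized deductions."""
    n_itemizers = 0.0
    total_itemized = 0.0
    for unit in units:
        itemizes, deduction = deduction_choice(
            unit["status"], unit["observed"], unit["imputed"]
        )
        if itemizes:
            n_itemizers += unit["weight"]
            total_itemized += unit["weight"] * deduction
    return n_itemizers, total_itemized


# Two made-up tax units with survey weights.
example_units = [
    {"status": "single", "observed": 5_000, "imputed": 4_000, "weight": 100.0},
    {"status": "married_joint", "observed": 15_000, "imputed": 12_000, "weight": 50.0},
]
n_itemizers, total_itemized = itemizer_benchmarks(example_units)
```

In this made-up example only the married couple itemizes, so the two weighted totals are the quantities one would compare against the published SOI counts and amounts.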

Fourth, we present baseline tax liability estimates, before and after credits, using the NBER TAXSIM model, and benchmark those against published SOI values. Because incomes are systematically higher in the SCF than in the SOI, our estimated tax liabilities are also higher. And because the gap between SCF and SOI incomes is concentrated at the top of the income distribution and the tax system is progressive, the gap in tax liabilities is, not surprisingly, larger than the income gap.
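A small worked example shows why progressivity makes the tax gap exceed the income gap. The two-bracket schedule below is purely hypothetical (10 percent up to $50,000 and 30 percent above) and is not the schedule TAXSIM applies. The income figures are also invented for illustration.

```python
# Hypothetical schedule: (lower threshold, marginal rate in percent).
BRACKETS = [(0, 10), (50_000, 30)]


def tax(income):
    """Tax owed under the hypothetical marginal-rate schedule above."""
    total = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        # The top bracket has no upper limit.
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > lower:
            total += (min(income, upper) - lower) * rate / 100
    return total


def relative_gap(high, low):
    """Proportional amount by which `high` exceeds `low`."""
    return (high - low) / low


# Invented incomes for one high-income tax unit in two data sources.
income_low_source = 200_000
income_high_source = 240_000

income_gap = relative_gap(income_high_source, income_low_source)
tax_gap = relative_gap(tax(income_high_source), tax(income_low_source))
```

Here the higher income measure exceeds the lower one by 20 percent, but tax rises from $50,000 to $62,000, a gap of 24 percent, because all of the additional income is taxed at the top marginal rate while the average rate on the base income is lower.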

We conclude by noting that the results contained in this methodology paper, especially the differences in business income across data sources, have important implications for recent controversies regarding the distribution of income and wealth. We explore these topics in a companion paper that builds on the methodology developed here.

The Brookings Institution is financed through the support of a diverse array of foundations, corporations, governments, individuals, as well as an endowment. A list of donors can be found in our annual reports published online here. The findings, interpretations, and conclusions in this report are solely those of its author(s) and are not influenced by any donation.
