Determining systematic differences in human graders for machine learning-based automated hiring

Editor's note: This is a Brookings Center on Regulation and Markets working paper.


Firms routinely use natural language processing and other machine learning (ML) tools to assess prospective employees, for example by automatically classifying resumes against pre-codified skill databases. The rush to automation, however, can backfire by encoding unintended bias against groups of candidates. We run two experiments with human evaluators from two different countries to determine how cultural differences may affect hiring decisions. We use hiring materials provided by an international skill-testing firm that runs hiring assessments for Fortune 500 companies. The company conducts a video-based interview assessment using machine learning, which grades job applicants automatically based on verbal and visual cues. Our study has three objectives: to compare the automatic assessments of the video interviews with assessments of the same interviews by human graders and determine how they differ; to examine which characteristics of human graders may lead to systematic differences in their assessments; and to propose a method for correcting human evaluations using automation. We find that systematic differences can exist across human graders and that some of these differences can be accounted for by an ML tool if they are measured at the time of training.
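The idea of accounting for systematic grader differences can be illustrated with a minimal sketch. The paper's actual method uses an ML tool trained on the assessment data; the snippet below shows only the simplest version of the underlying intuition, estimating each grader's systematic offset (leniency or harshness) from the pooled mean and subtracting it. The data and grader names are invented for illustration.

```python
# Minimal sketch (not the paper's method): estimate each grader's systematic
# offset from the pooled mean and remove it. All data below are hypothetical.
from statistics import mean

# scores[grader] = scores that grader assigned to the same set of candidates
scores = {
    "grader_A": [3.0, 4.0, 2.0, 5.0],  # relatively lenient
    "grader_B": [2.0, 3.0, 1.0, 4.0],  # relatively harsh
}

# Pooled mean across all graders and candidates
overall = mean(s for vals in scores.values() for s in vals)

# Each grader's systematic deviation from the pooled mean
offsets = {g: mean(vals) - overall for g, vals in scores.items()}

# Corrected scores: subtract the grader-specific offset
corrected = {g: [s - offsets[g] for s in vals]
             for g, vals in scores.items()}
```

In this toy example the two graders rank candidates identically but differ by a constant severity gap, so after correction their scores coincide. Real grader differences are rarely this simple, which is why the paper relies on a trained model rather than mean-centering.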

Download the full working paper here.

Principal Investigator Mike Teodorescu was funded by a USAID grant through a Visiting Scholar position at the Massachusetts Institute of Technology D-Lab, which helped pay for research assistants and experiment expenses. Professors Mike Teodorescu, Nailya Ordabayeva, and Marios Kokkodis received summer research support from Boston College.

Aspiring Minds (acquired by SHL in 2020), the organization that provided data access, had the opportunity to review the content of this working paper. Varun Aggarwal is Chief AI Officer at SHL. Other than the aforementioned, the authors did not receive financial support from any firm or person for this article, or from any firm or person with a financial or political interest in it. Other than the aforementioned, no author is currently an officer, director, or board member of any organization with a financial or political interest in this article.