Commentary
Effective AI regulation requires understanding general-purpose AI
January 29, 2024

Over the past several years, researchers in our community, including ourselves, have developed methods for evaluating machine learning and artificial intelligence (AI) models for issues of bias, discrimination, and other adverse impacts, as well as methods for mitigating such problems as they arise. These advances have uncovered problematic model behavior and successfully informed the use of algorithms in high-stakes settings. However, recent advances in AI pose new challenges for evaluation and regulation. More traditional algorithmic models were largely designed for a specific context. When the context is known a priori, adverse impacts can more easily be foreseen. However, the latest generation of AI models is much more general-purpose in nature, and the full scope of potential uses is currently unknown to the public. Without better information about how generative AI models are used, researchers and policymakers are left to discuss the hypothetical risks of AI in the abstract, not the concrete risks we are currently facing. If we are to create guardrails that effectively address the real impacts of AI, we first need better information about how the models are used today.
Modern generative AI models, such as GPT-4, differ from previous generations of machine-learned models in that they are more general-purpose. Until now, most of the work to audit models for adverse outcomes, discrimination, fairness, and bias has been done in the context of models that have a specific predictive objective and/or whose range of intended use is self-evident from the design. For example, consider the canonical example of a “high-stakes model” that is used to help make lending decisions. Such a model would typically have structured, pre-determined inputs and a well-defined prediction target—for example, whether an applicant will successfully repay a loan. The context in which such a model would be used is clear. Indeed, it would be hard to imagine how such a model would be used for tasks well beyond the scope of its intended use, such as offering medical advice, ranking local coffee shops, or drafting emails. In these cases, where a model’s intended use is clear and structured, the last several years have seen rapid progress in techniques to evaluate models for issues related to use in that context and inform regulation where necessary.
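To make the contrast concrete, the minimal sketch below shows the kind of single-purpose, structured model described above. The scikit-learn setup, feature choices, and synthetic data are our own illustrative assumptions, not a description of any deployed lending system; the point is only that the inputs and the prediction target are fixed in advance.

```python
# Minimal sketch of a conventional, single-purpose lending model:
# structured, pre-determined inputs and one well-defined prediction
# target (will the applicant repay?). Features and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical structured features: e.g., income, debt-to-income ratio,
# length of credit history.
X = rng.normal(size=(1_000, 3))
# Hypothetical binary target: 1 if the loan was repaid, 0 otherwise.
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=1_000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Because the task and inputs are fixed, evaluation (accuracy, error rates
# across demographic groups, and so on) can be scoped to this single use.
print("held-out accuracy:", model.score(X_test, y_test))
```

Because such a model can only ever answer the one question it was trained to answer, both its audit and its regulation can be scoped to that question; this is exactly what breaks down for general-purpose models.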
For the newest generation of generative AI models, the range of tasks for which the models can and will be used is much greater. While “general-purpose” might overstate the models’ capabilities if it is taken to mean they are useful for any task, it is certainly fair to say that the models can be and are being used for a much wider variety of tasks than conventional machine learning models. Recent research has provided preliminary insights into the various tasks GPT models are used for, including tasks that have not been considered in natural language processing benchmarks. Researchers released a dataset of 570,000 real-world user-ChatGPT conversations, which offers a resource for identifying some of the tasks that users perform. Anyone can use these models, as they require no machine learning or programming experience. For example, it is now commonplace to see text-based generative AI models used to help draft computer code; summarize, edit, or evaluate written documents; generate creative content such as poetry and art; and much more. Each of these activities comes with its own attendant risks and harms, specific to the context in which the model is used.
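As a rough illustration of what identifying tasks from such conversation data could involve, the sketch below tags de-identified prompts with coarse task categories using simple keyword rules. The categories, keywords, and example prompts are assumptions made for illustration; any real analysis would need far stronger classification and careful privacy safeguards.

```python
# Rough sketch of taxonomizing de-identified usage logs into coarse task
# categories. Categories, keywords, and the example prompts are
# illustrative assumptions, not a validated taxonomy.
from collections import Counter

TASK_KEYWORDS = {
    "coding":       ("function", "python", "bug", "compile"),
    "writing_help": ("email", "essay", "rewrite", "summarize"),
    "hiring":       ("resume", "cover letter", "candidate", "interview"),
    "medical":      ("symptom", "diagnosis", "dosage"),
}

def categorize(prompt: str) -> str:
    """Return the first matching coarse task category, or 'other'."""
    text = prompt.lower()
    for category, keywords in TASK_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "other"

# Hypothetical de-identified prompts drawn from a usage log.
logs = [
    "Summarize this performance review for me.",
    "Why does my Python function return None?",
    "Draft a cover letter for a nursing position.",
]

usage_counts = Counter(categorize(p) for p in logs)
print(usage_counts)  # aggregate counts, not raw conversations, would be reported
```

Even a crude aggregation like this surfaces categories of use (coding help, hiring, medical questions) that carry very different risks and regulatory implications.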
Even in highly regulated contexts, like employment, it is not entirely clear how generative AI models will be or are being used. They could be used like traditional machine learning models (e.g., predicting employee success) simply by instructing the model to rate or screen candidates’ application materials. They could also be used to compile and summarize interviewer notes about candidates. Candidates themselves might use generative AI to help draft cover letters or resumes. Moreover, an applicant’s former colleagues and supervisors may use generative AI to draft letters of recommendation. There are likely other ways the models are used in this context that we are not currently aware of. Without comprehensive information on use cases, it is difficult to design evaluations that speak to the way bias, discrimination, or adverse outcomes could manifest with respect to each use.
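To show how little purpose-built infrastructure such a use requires, the hypothetical snippet below prompts a chat-style model to score application materials. The call follows the OpenAI Python client’s chat-completions interface, but the prompt, model choice, and scoring scheme are our own assumptions, offered only to illustrate how easily a general-purpose model slips into the role of a screening tool; it is not a recommended practice.

```python
# Hypothetical sketch: repurposing a general-purpose chat model as a
# resume screener with nothing more than an instruction. The prompt and
# scoring scheme are illustrative assumptions, not a recommended workflow.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def screen_resume(resume_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are assisting a hiring team. Rate the candidate "
                        "from 1 (weak) to 10 (strong) and briefly justify the score."},
            {"role": "user", "content": resume_text},
        ],
    )
    return response.choices[0].message.content

# A one-line instruction turns a general-purpose model into a de facto
# screening tool, with none of the scoping a purpose-built model implies.
print(screen_resume("Jane Doe. Five years of experience in ..."))
```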
Ideally, the work of creating AI regulations will incorporate the best available research on the models themselves. Principled evaluations of the risks that these models pose will also be critical to creating regulation that addresses the real impacts the models are having. By understanding those risks, policymakers can prioritize attention to the applications posing the most immediate or consequential risks. However, without better information about how the models are being used, researchers, regulators, and the public are left shooting in the dark. The task of creating useful evaluations and effective regulation becomes much harder when we must not only anticipate the attendant risks and adverse outcomes associated with a particular use case but also foresee the modes of use themselves.
There are many ways better information about use could be obtained. One is through government mandates for direct reporting. For example, S.3050, introduced in 2023, would require entities in the financial services industry to report “which tasks are most frequently being assisted or completed with artificial intelligence,” and the U.S. government recently released a report cataloging over 700 diverse ways AI is being used by agencies in the federal government. According to a recent White House memo and pursuant to the Advancing American AI Act, each federal agency “must annually submit an inventory of its AI use cases to Office of Management and Budget (OMB) and subsequently post a public version on the agency’s website,” so the public record of government uses of the technology will likely remain up to date. Mandates of this kind could be expanded to cover other highly regulated industries and high-stakes application areas.
Government mandates for reporting information on AI use can only go so far. While they offer a good starting point for thinking through risks and potential use and misuse, they will miss use cases employed by private individuals outside the context of highly regulated industries. This gap could be filled by independent researchers through surveys or user interviews, though we expect these methods would likely miss malicious uses and other uses that people would be reluctant to disclose, perhaps because they are embarrassing or reflect socially undesirable behavior.
This is where the tech companies that are building and selling access to the models come in. Undoubtedly, they have access to log data recording the various ways users are interacting with their products. It may not be straightforward to process this data to monitor and taxonomize different types of use, and reporting findings based on user interaction data would need to be done with care to preserve user privacy. However, the companies are best positioned to provide information on types of use, as they are the only ones with access to the data needed to paint a broad picture of the use taking place. As representatives from major AI companies implore lawmakers to create regulations that ensure responsible use of the technology they are creating, they need to do their part to ground that discussion and debate not in hypothetical risks but in the risks arising from real use. Although they may be reluctant to share such information because of the competitive advantage it confers, if we are to take their pleas for regulation in good faith, the tech companies building more general-purpose AI need to step up and provide regulators and the general public with the basic information necessary to actually do what they are asking. If they don’t, researchers and regulators alike are likely to overlook many of the real risks we face today.