This post summarizes “How open-source software shapes AI policy,” a recent report from Alex Engler and The Brookings Institution’s Artificial Intelligence and Emerging Technology (AIET) Initiative.
Open-source software (OSS), which is free to access, use, and change with few restrictions, plays a central role in the development and use of artificial intelligence (AI). An AI algorithm can be thought of as a set of instructions: what calculations must be done, and in what order. Developers then write software that encodes these conceptual instructions as actual code. If that software is subsequently published in an open-source manner—with the underlying code publicly available for anyone to use and modify—any data scientist can apply that algorithm with little effort. There are thousands of open-source implementations of AI algorithms that make using AI easier in this way, as well as a critical family of emerging tools that enable more ethical AI. At the same time, the number of OSS tools in the especially important subfield of deep learning is dwindling, enhancing the market influence of the companies that develop the remaining tools: Facebook and Google. Few AI governance documents focus sufficiently on the role of OSS—an unfortunate oversight, since OSS quietly affects nearly every issue in AI policy. From research to ethics, and from competition to innovation, open-source code plays a central role in AI and deserves more attention from policymakers.
1. OSS speeds AI adoption
OSS enables and increases AI adoption by reducing the level of mathematical and technical knowledge necessary to use AI. Writing the complex math of algorithms into code is difficult and time-consuming, which means any existing open-source alternative can be a huge benefit for data scientists. OSS benefits from both a collaborative and competitive environment, in that developers work together to find bugs just as often as they compete to write the best version of an algorithm. This frequently results in more accessible, robust, and high-quality code than what an average data scientist—often more of a data explorer and pragmatic problem-solver than pure mathematician—might develop. Well-written open-source AI code thus significantly expands the capacity of the average data scientist, letting them use more modern machine learning algorithms and functionality. So while much attention has been paid to training and retaining AI talent, making AI easier to use—as OSS code does—may have a similarly significant impact in enabling economic growth from AI.
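To make the point concrete, here is a minimal sketch of what "expanding the capacity of the average data scientist" looks like in practice. It uses scikit-learn, one widely used open-source machine learning library (our choice of example; the report does not single out this library): a gradient-boosted ensemble, whose math would be difficult to implement from scratch, becomes a few lines of code.

```python
# Illustrative sketch: the algorithm's mathematical complexity lives in
# the open-source library, not in the data scientist's own code.
# The toy dataset below is synthetic (made up for this example).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Generate a small synthetic classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fitting a gradient-boosted model takes two lines; writing one from
# scratch would take hundreds.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The same pattern holds across thousands of open-source implementations: the barrier to using an algorithm drops from "understand and code the math" to "call a documented function."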
2. OSS helps fight AI bias
Open-source AI tools can also enable the broader and better use of ethical AI. Open-source tools like IBM’s AI Fairness 360, Microsoft’s Fairlearn, and the University of Chicago’s Aequitas ease technical barriers to fighting AI bias. There are also open-source tools that make it easier for data scientists to interrogate their models, such as IBM’s AI Explainability 360 or Chris Molnar’s interpretable machine learning tool and book. These tools can help time-constrained data scientists who want to build more responsible AI systems but are under pressure to finish projects and deliver for clients. While more government oversight of AI is certainly necessary, policymakers should also more frequently consider investing in open-source ethical AI software as an alternative lever to improve AI’s role in society. The National Science Foundation is already funding research into AI fairness, but grant-making agencies and foundations should consider OSS an integral component of ethical AI, and further fund its development and adoption.
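To illustrate the kind of check these tools automate, here is a sketch of one basic bias metric—the gap in positive-prediction rates across demographic groups, often called demographic parity difference. This is a hand-rolled NumPy illustration of the idea, not the API of Fairlearn, AI Fairness 360, or Aequitas, which offer far richer and better-validated implementations; the predictions and group labels below are made up.

```python
import numpy as np

def parity_gap(y_pred, group):
    """Absolute gap between the highest and lowest positive-prediction
    rates across groups -- a sketch of one basic fairness metric that
    open-source tools like Fairlearn compute (not their actual API)."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Hypothetical model predictions for members of two groups (toy data).
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

# Group "a" receives positive predictions 80% of the time, group "b"
# only 20% -- a gap a data scientist would want to investigate.
gap = parity_gap(y_pred, group)
print(f"demographic parity gap: {gap:.1f}")
```

The value of the open-source tools is that they package dozens of such metrics—and mitigation algorithms—behind tested, documented interfaces, so a time-constrained data scientist does not have to derive or implement them.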
3. OSS AI tools advance science
In 2007, a group of researchers argued that “the lack of openly available algorithmic implementations is a major obstacle to scientific progress” in a paper entitled “The Need for Open Source Software in Machine Learning.” It’s hard to imagine this problem today, as there is now a plethora of OSS AI tools for scientific discovery. As just one example, the open-source AI software Keras is being used to identify subcomponents of mRNA molecules and to build neural interfaces to better help blind people see. OSS also makes research easier to reproduce, enabling scientists to check and confirm one another’s results. Even small changes in how an AI algorithm is implemented can lead to very different results; using shared OSS can mitigate this source of uncertainty. This makes it easier for scientists to critically evaluate the results of their colleagues’ research, a common challenge in the many disciplines facing an ongoing replication crisis.
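A tiny illustration of how implementation choices change results, using a statistic far simpler than any AI algorithm (an example of our choosing, not from the report): two perfectly reasonable implementations of "the variance" disagree, because one divides by n and the other by n − 1. When researchers share one open-source implementation, this kind of silent divergence disappears.

```python
import numpy as np

# Toy data (made up for this example).
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Two defensible implementations of the same statistic:
var_population = np.var(data, ddof=0)  # divide by n
var_sample = np.var(data, ddof=1)      # divide by n - 1

# The two "variances" differ; in a complex AI pipeline, many such small
# implementation choices compound, which is why a shared open-source
# implementation aids reproducibility.
print(var_population, var_sample)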
While OSS code is far more common today, there are still efforts to raise the share of academic papers that publicly release their code—currently around 50 to 70 percent at major machine learning conferences. Policymakers also have a role in supporting OSS code in the sciences, such as by encouraging federally funded AI research projects to publicly release the resulting code. Grant-making agencies might also consider funding the ongoing maintenance of OSS AI tools, which is often a challenge for critical software. The Chan Zuckerberg Initiative, which funds critical OSS projects, writes that OSS “is crucial to modern scientific research… yet even the most widely-used research software lacks dedicated funding.”
4. OSS can either help or hinder tech sector competition
OSS has significant ramifications for competition policy. On one hand, the public release of machine learning code broadens and better enables its use. In many industries, this will enable more AI adoption with less AI talent—likely a net good for competition. However, for Google and Facebook, the open-sourcing of their deep learning tools (TensorFlow and PyTorch, respectively) may further entrench them in their already fortified positions. Almost all the developers of TensorFlow and PyTorch are employed by Google and Facebook, suggesting that the companies are not relinquishing much control. While these tools are certainly more accessible to the public, the oft-stated goal of ‘democratizing’ technology through OSS is, in this case, euphemistic.
TensorFlow and PyTorch have become the most common deep learning tools in both industry and academia, leading to great benefits for their parent companies. Google and Facebook benefit more immediately from research conducted with their tools because there is no need to translate academic discoveries into a different language or framework. Further, their dominance creates a pipeline of data scientists and machine learning engineers trained in their systems and helps position them as the cutting-edge companies to work for. All told, the benefits to Google and Facebook of controlling OSS deep learning are significant and may persist far into the future. This should be accounted for in any discussions of technology sector competition.
5. OSS creates default AI standards
OSS AI also has important implications for standards bodies, such as IEEE, ISO/IEC JTC 1, and CEN-CENELEC, which seek to influence the industry and politics of AI. In other industries, standards bodies often add value by disseminating best practices and enabling interoperable technology. However, in AI, the diversified use of operating systems, programming languages, and tools means that interoperability challenges have already received substantial attention. Further, the AI practitioner community is somewhat informal, with many practices and standards disseminated through Twitter, blog posts, and OSS documentation. The dominance of TensorFlow and PyTorch in the deep learning subfield means that Google and Facebook have outsized influence, which they may be reluctant to cede to the consensus-driven standards bodies. So far, OSS developers have not been extensively engaged in the work of the international standards bodies, and this may significantly inhibit those bodies’ influence on the AI field.
AI policy is tied to open-source software
From research to ethics, and from competition to innovation, open-source code is playing a central role in the development and use of artificial intelligence. This makes the consistent absence of open-source developers from policy discussions quite notable, since they wield meaningful influence over, and highly specific knowledge of, the direction of AI. Involving more OSS AI developers can help policymakers more routinely consider the influence of OSS in the pursuit of the just and equitable development of AI.
The National Science Foundation, Facebook, Google, Microsoft, and IBM are donors to the Brookings Institution. The findings, interpretations, and conclusions posted in this piece are solely those of the authors and not influenced by any donation.