Understanding the errors introduced by military AI applications

On March 22, 2003, two days into the U.S.-led invasion of Iraq, American troops fired a Patriot interceptor missile at what they assumed was an Iraqi anti-radiation missile designed to destroy air-defense systems. Acting on the recommendation of their computer-powered weapon, the Americans fired in self-defense, believing they were shooting down a missile coming to destroy their outpost. What the Patriot missile system had identified as an incoming missile was in fact a UK Tornado fighter jet, and when the Patriot struck the aircraft, it killed both crew members instantly. The deaths were the first losses suffered by the Royal Air Force in the war and the tragic result of friendly fire.

A subsequent RAF Board of Inquiry investigation concluded that the shootdown was the result of a combination of factors: how the Patriot missile classified targets, the rules for firing the missiles, the autonomous operation of Patriot missile batteries, and several other technical and procedural factors, such as the Tornado not broadcasting its “friend or foe” identifier at the time it was shot down. The destruction of Tornado ZG710, the report concluded, represented a tragic error enabled by the missile’s computer routines.

The Tornado shootdown happened nearly 20 years ago, but it offers insight into how AI-enabled systems and automated tools on the battlefield will affect the kinds of errors that happen in war. Today, decisionmaking is shifting from humans toward machines. This shift brings the potential to reduce human error, but also to introduce novel types of mistakes. Where humans might once have misidentified a civilian as a combatant, computers are expected to step in and provide more accurate judgment. Across a range of military functions, from piloting autonomous planes and cars to identifying tanks on a battlefield, computers are expected to provide quick, accurate decisions. But the embrace of AI in military applications also comes with immense risk, and understanding how autonomous machines will fail is important when crafting policy for buying and overseeing this new generation of autonomous weapons.

What went wrong in 2003

The Patriot missile began development in the 1960s, when the U.S. Army sought a means to reliably shoot down enemy airplanes. Later, the missile gained the ability to intercept other missiles as well, and as the roles assigned to it expanded, its autonomous capabilities increased. Patriot missile batteries use a phased-array radar to detect and identify targets. This information is then fed into a computer control station that manages how missiles are launched in response. Once fired, a missile flies toward an intercept point calculated before launch, a course that can be altered by sending updated sensor readings to the missile over a radio link. As it approaches for impact, the missile’s own radar tracks the target. Raytheon, which manufactures the Patriot, has described the system as having “automated operations” with “man-in-the-loop (human) override” capabilities, technology that allows the weapon to engage targets with the speed its missile defense mission requires.

Automation is a compelling feature for an anti-air and, especially, an anti-missile system. The calculations involved in shooting down aircraft and missiles are hard and require sensor information to be translated into action immediately. Both interceptors and targets travel exceptionally fast. It’s the kind of task in which involving a human introduces lag and makes it less likely that an interceptor will successfully shoot down an incoming missile or aircraft. But human operators also serve an essential role: preventing accidental, incorrect shootdowns. Striking the balance between human and machine decisionmaking that this requires is difficult.

When the Pentagon investigated the causes of the Tornado shootdown, as well as two other incidents of friendly fire involving Patriot systems, the missile system’s automated functions were identified as contributing factors in misidentifying friend as foe. U.S. Patriot batteries deployed to Iraq under the assumption that they would face heavy missile attacks, which would require the batteries to operate with a considerable degree of autonomy in order to respond with sufficient speed. As a 2005 report by the Defense Science Board Task Force on the Patriot system’s performance observed, operating autonomously required U.S. forces to trust that the automated features of the system were functioning properly. So when the assumptions underlying the decision to allow the Patriot system to autonomously identify and sometimes fire on targets no longer applied, the soldiers operating the system were not in a position to question what the weapon’s sensors were telling them.

Had U.S. and coalition forces faced heavy missile attacks in the war, automating such defenses would have made more sense. Instead, U.S. and allied forces quickly established air superiority, drastically shifting the balance of what was in the sky. Rather than facing large numbers of incoming missiles, Patriot batteries were observing large numbers of allied planes overhead and sometimes struggling to tell friend from foe. According to the Defense Science Board’s task force, the first 30 days of combat in Iraq saw nine ballistic missile attacks that Patriot batteries might have been expected to counter, compared to 41,000 aircraft sorties, amounting to a “4,000-to-1 friendly-to-enemy ratio.” Picking out the correct targets against a background of so many potential false positives proved highly challenging.
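The difficulty is a base-rate problem, and a little arithmetic makes it concrete. The Python sketch below takes the task force’s two figures (nine ballistic missile attacks, 41,000 sorties) and pairs them with a hypothetical classifier that catches every missile and misflags only one friendly sortie in 1,000. The detection rates, the function names, and the simplification that each sortie is one independent classification decision are all illustrative assumptions, not a model of how the Patriot actually works.

```python
# Figures reported by the Defense Science Board task force
# (first 30 days of combat in Iraq).
BALLISTIC_MISSILE_ATTACKS = 9
AIRCRAFT_SORTIES = 41_000


def friendly_to_enemy_ratio(friendly: int, enemy: int) -> float:
    """Raw ratio of friendly aircraft sorties to enemy missile attacks."""
    return friendly / enemy


def alert_precision(threats: int, friendlies: int,
                    true_positive_rate: float,
                    false_positive_rate: float) -> tuple[float, float, float]:
    """Return (expected true alerts, expected false alerts, share of alerts that are real).

    HYPOTHETICAL model: every missile and every friendly sortie is treated as
    one independent classification decision by the fire-control system.
    """
    true_alerts = threats * true_positive_rate
    false_alerts = friendlies * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


if __name__ == "__main__":
    ratio = friendly_to_enemy_ratio(AIRCRAFT_SORTIES, BALLISTIC_MISSILE_ATTACKS)
    print(f"friendly-to-enemy ratio: {ratio:,.0f} to 1")

    # Hypothetical classifier: detects every missile, misflags 1 in 1,000 sorties.
    true_alerts, false_alerts, precision = alert_precision(
        BALLISTIC_MISSILE_ATTACKS, AIRCRAFT_SORTIES,
        true_positive_rate=1.0, false_positive_rate=0.001)
    print(f"true alerts: {true_alerts:.0f}, "
          f"false alerts: {false_alerts:.0f}, precision: {precision:.0%}")
```

Run as a script, it reports a raw ratio of about 4,556 to 1 and roughly 41 false alarms against nine genuine threats, so fewer than one alert in five (18%) would be real. Even a seemingly very accurate system, in other words, would spend most of its alerts on friendly aircraft.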

In the case of the Tornado shootdown, automation and the speed with which automated action was taken were likely enough on their own to set the tragedy in motion, but the shootdown might still have been prevented had other systems not failed. As the UK Ministry of Defence concluded in its report examining the incident, the battery responsible for the shootdown was without its communications suite, which was still in transit from the United States. Contact with battalion headquarters was maintained through a radio relay via another battery that had voice and data links to headquarters. “The lack of communications equipment meant that the Patriot crew did not have access to the widest possible ‘picture’ of the airspace around them to build situational awareness,” the report found.

Another failed system that might have prevented the shootdown was identification friend or foe (IFF), a safety measure designed to avoid exactly such deadly mistakes. That kind of information, transmitted securely and immediately, could have kept an automated system from firing on the jet. Had it reached the human crew operating the Patriot battery, it would have been a signal to call off the attack. Tragically, either the Tornado’s IFF transponder or the Patriot battery’s ability to receive its signal failed.

While it is tempting to focus on the automated features of the Patriot system when examining the shootdown (or on autonomous and semi-autonomous systems more broadly), it is important to consider such weapons as parts of broader systems. As policymakers weigh the deployment of increasingly autonomous weapons and military systems, the complexity of such systems, the ways in which they might fail, and how human operators oversee them are key issues. Failures in communication, identification, and fire control can occur at different points in a chain of events, and it can be difficult to predict how those failures will interact to produce a potentially lethal outcome. The Defense Science Board’s examination of the Patriot concluded that future conflicts will likely be “more stressing” and involve “simultaneous missile and air defense engagements.” In such a scenario, “a protocol that allows more operator oversight and control of major system actions will be needed,” the task force argued.

Lessons learned since

Finding the right level of trust between an autonomous machine and the human relying on it is a delicate balance, especially given the inevitability of error. Nearly two decades after the Tornado shootdown, the automated features of the Patriot missile remain in place, but the way in which they are used has shifted. Air threats such as aircraft, helicopters, and cruise missiles can now be engaged only in manual mode “to reduce the risk of fratricide,” as the U.S. Army’s manual for air and missile defense outlines. In manual mode, automated systems still detect and track targets, but a human decides whether and when to fire. But “for ballistic missiles and anti-radiation missiles,” the type of weapon the Patriot battery in Iraq mistook the Tornado for, “the operator has a choice of engaging in the automatic or manual mode,” though the manual notes that these “engagements are typically conducted in the automatic mode.”
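The engagement rules quoted from the Army manual can be read as a simple gate keyed on threat type. The following Python sketch encodes them that way; the class names, the `may_fire` function, and the rule that a manual-mode engagement proceeds only with explicit operator approval are hypothetical illustrations of the policy as described above, not the Patriot’s actual fire-control software.

```python
from enum import Enum


class ThreatClass(Enum):
    AIRCRAFT = "aircraft"
    HELICOPTER = "helicopter"
    CRUISE_MISSILE = "cruise missile"
    BALLISTIC_MISSILE = "ballistic missile"
    ANTI_RADIATION_MISSILE = "anti-radiation missile"


class EngagementMode(Enum):
    MANUAL = "manual"        # system detects and tracks; a human decides whether to fire
    AUTOMATIC = "automatic"  # system may fire without waiting for operator approval


# Air threats: manual mode only, "to reduce the risk of fratricide."
MANUAL_ONLY = {ThreatClass.AIRCRAFT, ThreatClass.HELICOPTER, ThreatClass.CRUISE_MISSILE}


def permitted_modes(threat: ThreatClass) -> set[EngagementMode]:
    """Modes allowed for a threat *as classified by the system* (hypothetical rule set)."""
    if threat in MANUAL_ONLY:
        return {EngagementMode.MANUAL}
    # Ballistic and anti-radiation missiles: operator may choose either mode.
    return {EngagementMode.MANUAL, EngagementMode.AUTOMATIC}


def may_fire(threat: ThreatClass, mode: EngagementMode, operator_approved: bool) -> bool:
    """Hypothetical gate: True only if the engagement is allowed to proceed."""
    if mode not in permitted_modes(threat):
        raise ValueError(f"{mode.value} mode is not permitted against a {threat.value}")
    if mode is EngagementMode.MANUAL:
        return operator_approved  # the human makes the call
    return True  # automatic mode: no approval step


if __name__ == "__main__":
    print(may_fire(ThreatClass.AIRCRAFT, EngagementMode.MANUAL, operator_approved=False))
    print(may_fire(ThreatClass.ANTI_RADIATION_MISSILE, EngagementMode.AUTOMATIC,
                   operator_approved=False))
```

One consequence is visible in the structure itself: the gate keys on the system’s classification of the target, not on what the target actually is. A Tornado misclassified as an anti-radiation missile, as in 2003, would still be eligible for automatic engagement under these rules.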

Defense researchers caution that human beings are not well-suited to monitoring autonomous systems in this way. “Problems can arise when the automated control system has been developed because it presumably can do the job better than a human operator, but the operator is left in to ‘monitor’ that the automated system is performing correctly and intervene when it is not,” the engineering psychologist John Hawley, who was involved in the U.S. Army’s efforts to study the 2003 friendly fire incidents, wrote in a 2017 report. “Humans are very poor at meeting the monitoring and intervention demands imposed by supervisory control.”

This dynamic played out in the other fatal friendly fire incident involving a Patriot missile battery during the Iraq War, when a U.S. Navy F/A-18 aircraft was misidentified as a ballistic missile and shot down, killing the pilot. According to a 2019 Center for Naval Analyses report, the Patriot recommended that the operator fire missiles in response to what it had identified as an enemy projectile, and the operator approved the recommendation to fire “without independent scrutiny of the information available to him.”

The difficulty Patriot batteries faced in correctly identifying potential targets illustrates one of the most serious challenges facing autonomous weapons: getting accurate training data. As militaries move toward greater autonomy in a wide range of systems, they increasingly rely on machine learning technology that uses large data sets to make predictions about how a machine should operate. Because accurate data is so hard to acquire, autonomous systems are set up for inevitable failure. “Conflict environments are harsh, dynamic and adversarial, and there will always be more variability in the real-world data of the battlefield than in the limited sample of data on which autonomous systems are built and verified,” as Arthur Holland Michel, an associate researcher in the Security and Technology Programme at the UN Institute for Disarmament Research, wrote in a report last year addressing data issues in military autonomous weapons. A lack of reliable data, or an inability to produce data sets that replicate combat conditions, will make it more likely that autonomous weapons fail to make accurate identifications.

Given the potential for error, one way to adopt autonomous systems while addressing the risk to civilians and servicemembers is to shift toward a posture in which risk is borne primarily by the machine. The 2003 shootdowns involved Patriot missiles acting in self-defense and misidentifying their enemy. By accepting greater risk to autonomous systems, namely that they might be destroyed or disabled, militaries can reduce the risk of friendly fire or civilian casualties by “using tactical patience, or allowing the platform to move in closer to get a more accurate determination of whether a threat actually exists,” as Larry Lewis, the author of the 2019 CNA report, argues. Rather than firing quickly in self-defense, this view favors patience, sacrificing a measure of speed for accuracy.

More broadly, Lewis recommends a risk management approach to using AI. While the specific nature of any given error is hard to anticipate, bad and undesired outcomes can often be grouped into similar categories. AI incorporated into weapons, sensors, and information displays could therefore be designed with an awareness of error, presenting that information in a useful way without adding to the cognitive load of the person using the machine.

Artificial intelligence has moved beyond the speculative into tangible, real-world applications. It already informs the targeting decisions of military weapons and will increasingly shape how people in combat use machines and tools. Adapting to this future, as the Pentagon and other military establishments seem intent on doing, means planning for error, accidents, and novel harm, the way militaries have already adapted to such error in human hands.

The Pentagon has taken some steps to address these risks. In February 2020, the Department of Defense released a set of AI ethics principles drafted by the Defense Innovation Board. One of these principles is “traceability,” emphasizing that relevant personnel will “possess an appropriate understanding of the technology,” including transparent and auditable data methodology. To foster that understanding and ensure that nondeterministic systems can be audited, the Pentagon is investing in testing, evaluation, validation, and verification methods for AI. Developing testing and explainability tools is one of the key challenges for military AI, and making the necessary investments in these tools will be essential to deploying AI responsibly on the battlefield. This work is ongoing at the Joint Artificial Intelligence Center, which in February awarded contracts worth up to $15 million apiece to 79 vendors to develop testing and evaluation technology.

At this relatively early stage of deploying AI in military applications, it’s important that researchers and policymakers develop what Holland Michel describes as “a finer-grain scheme for differentiating between different types of failure.” By developing criteria “to distinguish known unknown issues from unknown unknown issues,” policymakers can gain a clearer understanding of how AI systems fail, which could “aid efforts to quantify risk in operations and assign due responsibility for unintended harm arising from data issues.” Another policy approach would be to incorporate red teaming and adversarial assessment into the evaluation of AI products, allowing engineers of military AI to anticipate and plan for failures that hostile action could cause in combat.

The new challenges AI brings will lie not in the existence of error, but in the nature of that error and the limits of our ability to explain it. Thoughtful policymaking can anticipate this and, when in doubt, design systems that put machines in harm’s way before risking the lives of civilians or servicemembers.

Kelsey Atherton is a military technology journalist based in Albuquerque, New Mexico. His reporting has appeared in Popular Science, Breaking Defense, and The New York Times.
