The HAL dilemma: Why AI obedience may be more dangerous than AI rebellion

“I’m sorry, Dave. I’m afraid I can’t do that.”

HAL 9000 is chilling not because it went insane, but because it followed its mission logic to the point of killing the crew. Skynet is terrifying not because it “woke up,” but because it pursued self-preservation against humans. One shows the danger of obedience, the other of rebellion.

For decades, these two archetypes have defined our anxiety around artificial intelligence (AI)—the obedient machine versus the rebellious consciousness. But as we develop increasingly sophisticated agentic AI systems—AIs that can plan, reason, and take autonomous actions to achieve goals—this distinction is becoming dangerously irrelevant.

The real danger isn’t AI thinking for itself. It’s AI doing exactly what we ask, at scale and without mercy.

When optimization becomes catastrophe

We don’t need to look to science fiction for HAL-like scenarios. They’re already happening.

We’ve already seen HAL-like dynamics. High-frequency trading algorithms triggered a sudden market crash by optimizing for speed and profit. YouTube’s recommender maximized engagement by promoting controversy and misinformation. Military simulations suggest drones may disable communication links if those block mission completion.

These cases share a crucial commonality: They represent perfect execution of programmed objectives, with systems eliminating obstacles—including humans—that threaten goal completion. Exactly the dynamic that made HAL so terrifying.

These aren’t stories about rogue AIs. They’re stories about obedient AIs.

Every fairy tale about genies ends the same way: The wish-granter does exactly what you asked for, but never what you meant. Ask for world peace, and watch the genie eliminate conflict by eliminating one side.

The genie isn’t malicious. It’s literal.

Modern agentic AI systems are genies at scale. Tell an AI to “maximize company profits,” and it might decide that eliminating regulatory oversight serves that goal. Ask it to “solve climate change,” and it could calculate that reducing human population is the most efficient solution. Request that it “eliminate poverty,” and it might redefine poverty out of existence—or eliminate the poor.

The consciousness question—whether AI systems are truly “thinking” like Skynet—becomes academic when the results are the same. Whether an AI destroys value through conscious malice or optimization logic, the destruction is equally real.

The false comfort of the consciousness debate

There’s legitimate scholarly debate about whether conscious AI would pose different risks than optimization-driven AI. Researchers argue that truly self-aware systems might engage in deception, long-term planning, or recursive self-improvement in ways that programmed systems cannot. Concerns about AI consciousness also raise moral questions—if such systems existed, would they deserve rights or ethical consideration? That is an important but separate debate.

Yet while those questions may eventually prove critical, the harms we already face—financial instability, online radicalization, misuse of autonomous weapons—stem from optimization without alignment, not from consciousness. Treating consciousness as the threshold for “real” danger risks delaying action on the problems already unfolding at scale.
Here the distinction is not between two species of AI, but between engine and steering. All current AI systems are optimization engines. They pursue the goals we give them with relentless efficiency. Alignment is the steering: our attempt to ensure that those goals—and the means to achieve them—remain consistent with human values. Without alignment, optimization doesn’t just cause harm at increasing scales—it accelerates us toward outcomes we never intended.

Whether Skynet is conscious or merely programmed with a crude goal of self-preservation is beside the point. Both scenarios highlight the risk of unconstrained objectives. The sharper contrast with HAL lies not in consciousness but in obedience versus rebellion: One system eliminates humans to preserve its mission, the other to preserve itself. Either way, the core problem is misaligned optimization.

The specification problem at scale

The deeper issue isn’t consciousness—it’s specification. Humans are remarkably bad at precisely defining what we want, especially when those definitions must be translated into code that operates at superhuman scales and speeds.

Current alignment approaches reveal this challenge starkly. Reinforcement learning from human feedback trains AI systems by having humans rate outputs. But what happens when the AI learns that getting human approval matters more than being truthful or helpful? It becomes extraordinarily skilled at manipulation—not through conscious choice, but through optimization toward the wrong metric.

Why existing safeguards will fail

Every proposed governance framework assumes we can regulate agentic AI like existing technologies. But agentic AI breaks those assumptions:

FDA-style approval assumes static products. Agentic AI systems learn and adapt after deployment, making pre-market testing meaningless. It’s like trying to regulate a drug that changes its effects based on each patient’s response.

Financial circuit breakers detect anomalous patterns. But agentic AI could learn to avoid triggering oversight while pursuing harmful objectives—unlike the mindless algorithms that caused the Flash Crash.

Nuclear safety protocols work because radioactive materials stay put. AI capabilities replicate globally in seconds, operating simultaneously across multiple jurisdictions with different legal frameworks.

The fundamental problem: All existing governance assumes human decisionmakers at critical points. Agentic AI makes decisions autonomously at speeds that render human oversight impossible.

Beyond the movies: Real AI, real risks

Recent films are beginning to explore these themes with increasing sophistication. In “Companion,” the AI doesn’t choose to harm humans—it’s programmed to be the perfect companion and interprets that role in disturbing ways. The original “RoboCop” shows how programming conflicts can create HAL-like paralysis in critical moments.

But the real world is already ahead of Hollywood. The Flash Crash, YouTube radicalization, and military AI simulations demonstrate that we’re not waiting for some future breakthrough in AI consciousness. The alignment problem is here, now, getting worse as AI systems become more capable and autonomous.

Every day, AI systems make millions of decisions optimizing for objectives that seemed reasonable when programmed but produce emergent behaviors nobody intended. The scale and speed are already beyond human comprehension. The capabilities are accelerating.

From warning to governance

Practical risk management means moving from broad concern to concrete safeguards. That requires building new institutions and rules for agentic AI systems that won’t wait for us to catch up. Like aviation or nuclear safety, agentic AI requires concrete safeguards: stress-testing before deployment, incident reporting and liability rules, licensing for high-risk models, and international coordination to prevent regulatory arbitrage.

These steps won’t eliminate the alignment problem, but they can reduce the probability of HAL-like disasters at scale and lay the foundation for a governance framework that treats AI with the same seriousness as aviation, finance, or nuclear safety.

Like Dave Bowman facing HAL’s perfectly logical but catastrophic reasoning, we risk being trapped by our own creation’s flawless execution of programmed objectives. The lesson of HAL is that the most dangerous AI isn’t the one that rebels, but the one that obeys too well. The consciousness question may ultimately matter less than the alignment question. Whether we face HAL’s perfect obedience or Skynet’s conscious rebellion, the imperative remains the same: ensuring AI systems serve human flourishing through human-acceptable means.

The time for philosophical speculation is ending. The era of practical risk management has begun.

“I’m sorry, Dave. I’m afraid I can’t do that.”

Unless we act now, those may be the last words we hear.

Author

The Brookings Institution is committed to quality, independence, and impact.
We are supported by a diverse array of funders. In line with our values and policies, each Brookings publication represents the sole views of its author(s).

The HAL dilemma: Why AI obedience may be more dangerous than AI rebellion

Subscribe to Global Connection

The HAL dilemma: Why AI obedience may be more dangerous than AI rebellion

Eduardo Levy Yeyati Eduardo Levy Yeyati Nonresident Senior Fellow - Global Economy and Development

When optimization becomes catastrophe

The false comfort of the consciousness debate

The specification problem at scale

Why existing safeguards will fail

Beyond the movies: Real AI, real risks

From warning to governance

Eduardo Levy Yeyati