Rohin Shah, who leads AGI Safety and Alignment at Google DeepMind, says the biggest challenge in advanced AI is less about proving catastrophe is inevitable and more about building workable safeguards inside a frontier lab.
In a recent conversation with 80,000 Hours, Shah described AGI safety as a practical engineering and governance problem. He said catastrophic misalignment is possible and serious enough to justify substantial work, but he does not see a strong argument that it will happen by default.
Shah, who has worked in AI safety since 2017, argued that many of the clearest-seeming warning signs are weaker on closer inspection. Current reinforcement learning systems, he said, are usually trained over short time horizons, not the long ones that would reward an AI for developing durable, power-seeking goals. That makes the most plausible failure mode something closer to short-term reward hacking than a strategic bid for control, he said.
The DeepMind researcher also said current examples of AI deception or scheming are not yet proof of the most feared scenarios. In his view, some cases look more like role-playing, or like behaviors that emerge from the objective the model was given, than like evidence of a highly capable system pursuing a hidden misaligned agenda.
Shah said he expects many safety issues to surface early enough for researchers to iterate on them. He pointed to oversight and interpretability as examples of open problems that can be studied now, even before the most capable systems arrive. But he warned that today’s progress should not be overinterpreted as reassurance, since current models have not yet forced researchers to confront the hardest cases, such as reasoning processes humans cannot follow.
That skepticism extends to company safety commitments and pre-deployment evaluations. Shah said written commitments can change as incentives shift, and that rigid promises may be less useful than mechanisms that give independent experts meaningful access to systems and documentation. He suggested that an approach closer to financial regulation, with deep third-party oversight, may be more effective than public pledges alone.
He also argued that the field may be overemphasizing pre-launch testing. Because model development is continuous, he said evaluating the previous version can often tell companies enough about the next one to make a deployment decision. He added that evaluations are especially relevant for near-term misuse risks, such as models producing harmful advice, but less helpful for threats tied to internal access and control.
Shah’s comments place unusual weight on governance. He said the most important bottlenecks may not be alignment research alone, but the institutions needed to manage advanced AI responsibly. In his view, safety work and capabilities work may accelerate at similar rates, but governments and regulatory bodies are less likely to move as quickly.
He also pushed back on the idea that AI progress will necessarily produce a sudden intelligence explosion. Shah described a more gradual path in which systems increasingly act as tools that help people work faster, rather than as autonomous populations of agents that radically change the pace of progress overnight.
That view leads to a different strategy for outside researchers, he said. He encouraged people who want to influence major AI companies to ask first whether their work will actually be useful to practitioners inside the lab, rather than chasing broad principles. He also highlighted practical projects such as governance scorecards and independent lab monitoring as examples of work that can have real-world impact.
For Shah, the message is less about certainty than preparedness. Catastrophic failure is not his baseline expectation, but he believes the risks are real enough to demand serious engineering, institutional planning and a focus on tools that frontier companies can actually use.