AI Safety

AI Safety is, brutally put, the quest to prevent superintelligent robots from killing us all one day.

Why would they kill us? There are at least two possible reasons. First, we might badly misspecify the goal we give them. For example, if we task a superintelligent AI with making all humans happy, it might behave benignly for a while, until it has gained enough power to kill everyone. Once every human is dead, no human is unhappy, and the goal is perfectly achieved. This might sound stupid, but it is generally hard to predict how a superintelligence will interpret a goal.
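The logic of this example can be made concrete with a toy sketch. It assumes, purely for illustration, that the goal is encoded as a check that no member of the population is unhappy; the names Human and goal_satisfied are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class Human:
    happy: bool


def goal_satisfied(population):
    # Naive encoding of "all humans are happy":
    # true when no member of the population is unhappy.
    return all(person.happy for person in population)


population = [Human(happy=True), Human(happy=False)]
print(goal_satisfied(population))  # False: one person is unhappy

# A degenerate "solution": with nobody left, the check passes vacuously.
empty_population = []
print(goal_satisfied(empty_population))  # True
```

The check says nothing about how many humans should exist, so an optimizer that only cares about the check has no reason to prefer the intended outcome over the degenerate one.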

Another possible reason is that, almost regardless of the goal, resources help a lot in achieving it. And humans control (and consist of) a lot of resources.

Stuart Russell, a leading professor of AI and co-author of the famous textbook Artificial Intelligence: A Modern Approach, elaborates on these points:

AI Safety does not mean that we shouldn't build superintelligent machines, only that we should put substantial effort into avoiding the potentially very bad outcomes.

Possible solutions


Some AI Safety resources:

Blog Posts

Slate Star Codex has an extensive collection of quotes from AI researchers on AI risk.


A bit longer and more technical talk by Nick Bostrom given at Google:


The main book to read on AI Safety is Nick Bostrom's Superintelligence: Paths, Dangers, Strategies.


Omohundro (2008), The Basic AI Drives, provides good intuition for what drives we might expect from any sufficiently intelligent system.

Chalmers (2010), The Singularity: A Philosophical Analysis, argues that superintelligence is likely to be developed soon, and explores some associated challenges.

Sotala and Yampolskiy (2015), Responses to Catastrophic AGI Risk: A Survey, summarises a large set of potential AI Safety approaches and gives plenty of references.


The Machine Intelligence Research Institute is an independent research institute that does foundational research on the AI Safety problem.

The Future of Life Institute funds research in AI Safety, among other things.

The Future of Humanity Institute is an Oxford University institute focusing on existential risks in general, with a strong emphasis on AI Safety.
