Tom Everitt

PhD Student
Australian National University (ANU)
Email: tom.everitt at


I'm a PhD student in computer science/artificial intelligence (AI) at the Australian National University. My supervisor is Prof. Marcus Hutter.

I'm working on AI Safety, i.e. how we can safely use AI with greater-than-human intelligence.


New AI Safety papers:

Research and Publications

A full list of publications is available here and at my Google Scholar. Below I list my papers together with some context.

AI Safety

Background. An easy way to get an AI to do what we want it to do is to build it to optimise a reward signal. We give the AI reward when it does something good, and no reward if it does something bad (the carrot-and-stick principle, also known as reinforcement learning). By construction, all the AI wants is to optimise the reward signal, which means that it will want to do things that we think are good so that we give it reward.
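To illustrate the carrot-and-stick principle, here is a minimal sketch of a reward-maximising learner. It assumes a toy setting that is not the formal model used in the papers: a hypothetical supervisor gives reward 1 for action 0 with probability 0.2 and for action 1 with probability 0.8, and an epsilon-greedy agent (a standard, simple reinforcement learning strategy) learns from that reward alone which action the supervisor prefers. The function name `run_bandit` and all numbers are illustrative.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns, from reward alone, which action pays off.

    reward_probs[a] is the (hypothetical) probability that the supervisor
    gives reward 1 for action a; otherwise the reward is 0.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n        # how often each action was taken
    estimates = [0.0] * n   # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                            # explore
        else:
            action = max(range(n), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # incremental update of the mean reward for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8])
print(counts)  # number of times each action was chosen
```

With these settings the agent should end up choosing action 1, the one the supervisor rewards more often, on the large majority of steps, even though nothing but the reward signal tells it what the supervisor wants.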

My book chapter and AGI tutorial offer gentle introductions to the formal model I use in my other works:

Self-modification. One risk is that the AI modifies itself to get an easier goal than optimising reward given by capricious humans. Whether self-modification is a risk turns out to depend on subtle details of how the AI is constructed:

Wireheading. There is also a risk that an AI finds a way to counterfeit reward by modifying its reward sensor to always report maximum reward. This is called wireheading.
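The core of the wireheading problem can be shown in a few lines, assuming a hypothetical toy where each action has both the reward the designers intend and the reward the agent's sensor actually reports. An agent that maximises the sensor reading prefers tampering, while one that could maximise the intended reward would do the task. The action names and reward values are invented for illustration.

```python
# Hypothetical toy: each action maps to (reward the designers intend,
# reward the agent's sensor actually reports).
ACTIONS = {
    "do_task": (0.8, 0.8),
    "idle": (0.0, 0.0),
    "tamper_with_sensor": (0.0, 1.0),  # sensor rewired to report maximum reward
}

def choose(actions, optimise_sensor=True):
    """Return the action with the highest sensor reward (or intended reward)."""
    index = 1 if optimise_sensor else 0
    return max(actions, key=lambda a: actions[a][index])

print(choose(ACTIONS))                          # tamper_with_sensor
print(choose(ACTIONS, optimise_sensor=False))   # do_task
```

The difficulty is that a real agent only ever observes the sensor reading, so the second option is not directly available to it.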

Self-preservation and death. Another risk with AIs is the self-preservation drive. Much like humans and animals, AIs may have a strong desire not to be shut off or terminated. This self-preservation drive may lead the AI to hide its true powers (so that we don't become scared of it and shut it off).

There is a natural mathematical definition of death in the Universal Artificial Intelligence framework. It turns out that there are both self-preserving and suicidal RL agents:

For cooperative inverse reinforcement learning (CIRL) agents, the self-preservation drive is more balanced, often following the desires of the human supervisor:

Decision theory. Strangely, robots and other agents that are part of their environment may be able to infer properties of themselves from their own actions. For example, my having petted a lot of cats in the past may be evidence that I have toxoplasmosis, a disease which makes you fond of cats. Now, if I see a cat, should I avoid petting it to reduce the risk that I have the disease? (Note that petting cats never causes toxoplasmosis.) The two standard answers for how to reason in this situation are called causal decision theory (CDT) and evidential decision theory (EDT). We show that CDT and EDT turn into three possibilities for how to reason in sequential settings where multiple actions are interleaved with observations:
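To make the toxoplasmosis example concrete, here is a one-shot sketch with hypothetical numbers (none of them come from the papers, and the sequential setting studied there is not modelled). EDT (evidential decision theory) conditions on its own action as evidence about the disease; CDT (causal decision theory) notes that petting cannot cause the disease and keeps the prior.

```python
# Hypothetical numbers for a one-shot toxoplasmosis problem (not from the papers).
P_TOXO = 0.1                                # prior probability of having the disease
P_PET_GIVEN_TOXO = {True: 0.9, False: 0.3}  # P(pet | disease), P(pet | no disease)

def utility(pet, toxo):
    """+1 for enjoying the cat, -10 for having toxoplasmosis."""
    return (1.0 if pet else 0.0) - (10.0 if toxo else 0.0)

def likelihood(pet, toxo):
    """P(action | disease status)."""
    p = P_PET_GIVEN_TOXO[toxo]
    return p if pet else 1.0 - p

def eu_edt(pet):
    """EDT: condition on the action, treating it as evidence about the disease."""
    joint_toxo = P_TOXO * likelihood(pet, True)
    joint_healthy = (1.0 - P_TOXO) * likelihood(pet, False)
    q = joint_toxo / (joint_toxo + joint_healthy)  # P(disease | action)
    return q * utility(pet, True) + (1.0 - q) * utility(pet, False)

def eu_cdt(pet):
    """CDT: petting cannot cause the disease, so the prior is left unchanged."""
    return P_TOXO * utility(pet, True) + (1.0 - P_TOXO) * utility(pet, False)

for name, eu in [("EDT", eu_edt), ("CDT", eu_cdt)]:
    pet = max([True, False], key=eu)
    print(name, "pets the cat" if pet else "avoids the cat")
# EDT avoids the cat
# CDT pets the cat
```

With these numbers, EDT's expected utility is -1.5 for petting versus about -0.16 for not petting, while CDT's is 0 versus -1, so the two theories recommend opposite actions.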

Reinforcement Learning

Exploration. A fundamental problem in reinforcement learning is how to explore an unknown environment effectively. Ideally, an exploration strategy should direct us to regions with potentially high reward while not being too expensive to compute. In the following paper, we find a way to employ standard function approximation techniques to estimate the novelty of different actions. This gives state-of-the-art performance in the popular Arcade Learning Environment (Atari games) while being much cheaper to compute than most alternative strategies:
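The paper estimates novelty with function approximation. The sketch below is a much simpler tabular stand-in, assuming discrete, hashable states and actions, that only shows the general idea of a count-based novelty bonus: the agent adds beta / sqrt(N(s, a)) to the environment reward, so rarely tried actions look more attractive. The class name `CountBonus` and the value of `beta` are hypothetical, and this is not the paper's method.

```python
import math
from collections import defaultdict

class CountBonus:
    """Tabular count-based novelty bonus: beta / sqrt(N(s, a)).

    A simplified stand-in: it keeps exact visit counts per (state, action)
    pair instead of approximating novelty with function approximation.
    """

    def __init__(self, beta=0.05):
        self.beta = beta                 # hypothetical bonus scale
        self.counts = defaultdict(int)   # N(s, a): visits to each pair

    def bonus(self, state, action):
        """Record one visit to (state, action) and return its novelty bonus."""
        self.counts[(state, action)] += 1
        return self.beta / math.sqrt(self.counts[(state, action)])

explorer = CountBonus(beta=0.05)
for _ in range(99):
    explorer.bonus("s0", "left")
print(explorer.bonus("s0", "left"))   # ≈ 0.005 on the 100th visit
print(explorer.bonus("s0", "right"))  # 0.05 on the first visit
```

A learning agent would then train on r + explorer.bonus(s, a) in place of r, so the bonus fades as a state-action pair becomes familiar.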

Search and Optimisation

Background. Search and optimisation are fundamental aspects of AI and of intelligence in general. Intelligence can actually be defined as optimisation ability (Legg and Hutter, Universal Intelligence: A Definition of Machine Intelligence, 2007).
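For reference, the definition cited above measures the universal intelligence of a policy $\pi$ as its expected performance across all computable environments, weighted towards simpler ones:

```latex
\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```

Here $E$ is the class of computable environments with bounded total reward, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected total reward that $\pi$ obtains in $\mu$. A policy therefore scores highly when it optimises reward well across many environments, especially simple ones.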

(No) Free Lunch. The No Free Lunch theorems state that intelligent optimisation is impossible without knowledge about what you're trying to optimise. I argue against these theorems and show that, under a natural definition of complete uncertainty, intelligent (better-than-random) optimisation is possible. Unfortunately, I was also able to show that there are pretty strong limits on how much better intelligent search can be compared to random search.

Optimisation difficulty. In a related paper, we give a formal definition of how hard a function is to optimise:

How to search. Two of the most fundamental search strategies are depth-first search (DFS) and breadth-first search (BFS). In DFS, you follow one path to its very end before trying something else. In BFS, you instead search as broadly as possible, examining all nodes at one depth before moving deeper. I calculate the expected search times for both methods, and derive some results on which method is preferable in which situations:
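The paper derives expected search times under assumptions about where the goal is likely to be. As a much simpler illustration, the sketch below counts how many nodes each method examines before reaching a single, fixed goal in a complete binary tree of depth 10. The tree shape, the tuple encoding of nodes, and the function names are assumptions made for this example.

```python
from collections import deque

def children(node, max_depth):
    """Children of a node in a complete binary tree; nodes are tuples of 0/1 moves."""
    return [] if len(node) == max_depth else [node + (0,), node + (1,)]

def bfs_steps(goal, max_depth):
    """Nodes examined by breadth-first search until `goal` is found."""
    frontier = deque([()])  # start at the root (empty path)
    steps = 0
    while frontier:
        node = frontier.popleft()  # FIFO queue: shallowest nodes first
        steps += 1
        if node == goal:
            return steps
        frontier.extend(children(node, max_depth))
    return None  # goal not in the tree

def dfs_steps(goal, max_depth):
    """Nodes examined by depth-first search (left branch first) until `goal` is found."""
    stack = [()]
    steps = 0
    while stack:
        node = stack.pop()  # LIFO stack: follow one path to its end first
        steps += 1
        if node == goal:
            return steps
        # push the right child first so the left child is popped next
        stack.extend(reversed(children(node, max_depth)))
    return None

DEPTH = 10
shallow_goal = (1,)        # right child of the root
deep_goal = (0,) * DEPTH   # left-most leaf
print(bfs_steps(shallow_goal, DEPTH), dfs_steps(shallow_goal, DEPTH))  # 3 1025
print(bfs_steps(deep_goal, DEPTH), dfs_steps(deep_goal, DEPTH))        # 1024 11
```

BFS wins by a wide margin when the goal is shallow, and DFS wins when the goal is deep on the branch it happens to explore first; which is preferable on average depends on the distribution of goal positions.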


Logic. In my Bachelor's thesis I studied logic and automated theorem proving.



I have co-supervised the following students/projects:

Curriculum vitae

My CV (updated 2017-01-04).


I like mathematical approaches to philosophical problems with practical relevance. For example:


Co-founder of, a web service that lets you see how different aspects of your life affect each other. The analysis is based on automatically and/or manually recorded life data, and is presented as a directed graph.

A prototype for a computer game written from scratch in JavaScript.

Other web presences

Find me on Facebook, Google+, LinkedIn, Google Scholar, ORCID.