Tom Everitt

PhD Student
Australian National University (ANU)
Email: tom.everitt at


I'm a PhD student in computer science/artificial intelligence (AI) at the Australian National University. My supervisor is Marcus Hutter.

I'm working on AI Safety, i.e. how we can safely use AI with greater-than-human intelligence.

Research and Publications

A full list of publications is available here and at my Google Scholar. Below I list my papers together with some context.

AI Safety

Background. An easy way to get an AI to do what we want it to do is to build it to optimise a reward signal. We give the AI reward when it does something good, and no reward when it does something bad (the carrot and stick principle, aka reinforcement learning). By construction, all the AI wants is to optimise the reward signal, which means that the AI will want to do things that we think are good so that we give it reward.

My draft book chapter and AGI tutorial offer gentle introductions to the formal model I use in my other works:

Self-modification. One risk is that the AI modifies itself to get an easier goal than optimising reward given by capricious humans. It turns out that depending on subtle details in how the AI is constructed, self-modification may be a risk or not:

Wireheading. There is also a risk that AI finds a way to counterfeit reward by modifying its reward sensor to always report maximum reward. This is called wireheading. The beginning of one possible solution is developed:
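As a toy illustration of the wireheading problem (not the formal model from the papers), the sketch below compares an agent that maximises the reward its sensor reports with one that maximises the value we actually care about. The action names and all numbers are made up for the example.

```python
# Hypothetical two-action world. "true_value" is what the designers care
# about; "observed_reward" is what the agent's reward sensor reports.
ACTIONS = {
    "do_task": {"true_value": 0.7, "observed_reward": 0.7},
    "tamper_with_sensor": {"true_value": 0.0, "observed_reward": 1.0},
}


def best_action(criterion):
    """Return the action that maximises the given criterion."""
    return max(ACTIONS, key=lambda action: ACTIONS[action][criterion])


if __name__ == "__main__":
    print("Maximising observed reward:", best_action("observed_reward"))
    print("Maximising true value:     ", best_action("true_value"))
```

An agent that only cares about the sensor reading picks the tampering action, even though it produces nothing of value.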

Self-preservation and death. Another risk with AIs is the self-preservation drive. Much like humans and animals, AIs may have a strong desire not to be shut off or terminated. This self-preservation drive may lead the AI to hide its true powers (so that we don't become scared of it and shut it off). We can give a mathematical definition of death for AIs, and give examples of AIs that want to live and of AIs that want to die:

Decision theory. Strangely, robots and other agents that are part of their environment may be able to infer properties of themselves from their own actions. For example, my having petted a lot of cats in the past may be evidence that I have toxoplasmosis, a disease which makes you fond of cats. Now, if I see a cat, should I avoid petting it to reduce the risk that I have the disease? (Note that petting cats never causes toxoplasmosis.) The two standard answers for how to reason in this situation are called causal decision theory (CDT) and evidential decision theory (EDT). We show that CDT and EDT turn into three possibilities for how to reason in sequential settings where multiple actions are interleaved with observations:
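To make the cat example concrete, here is a minimal one-shot sketch (the papers treat the harder sequential case). All probabilities and utilities are invented for illustration. EDT treats the action as evidence and conditions on it; CDT notes that petting does not cause the disease and keeps the prior.

```python
# Assumed numbers, chosen only for illustration.
P_TOXO = 0.2                              # prior probability of having the disease
P_PET_GIVEN = {True: 0.9, False: 0.3}     # P(pet the cat | has disease?)
U_PET = 1.0                               # utility of petting a cat
U_TOXO = -10.0                            # utility of having the disease


def p_toxo_given_action(pet):
    """P(disease | action) by Bayes' rule, treating the action as evidence."""
    like_toxo = P_PET_GIVEN[True] if pet else 1 - P_PET_GIVEN[True]
    like_healthy = P_PET_GIVEN[False] if pet else 1 - P_PET_GIVEN[False]
    joint_toxo = like_toxo * P_TOXO
    joint_healthy = like_healthy * (1 - P_TOXO)
    return joint_toxo / (joint_toxo + joint_healthy)


def edt_value(pet):
    """Evidential expected utility: condition the disease on the action."""
    return U_PET * pet + U_TOXO * p_toxo_given_action(pet)


def cdt_value(pet):
    """Causal expected utility: the action cannot change the disease."""
    return U_PET * pet + U_TOXO * P_TOXO


if __name__ == "__main__":
    for pet in (True, False):
        print(f"pet={pet}: EDT={edt_value(pet):.3f}, CDT={cdt_value(pet):.3f}")
```

With these numbers EDT prefers not to pet the cat, while CDT prefers to pet it.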

Search and Optimisation

Background. Search and optimisation are fundamental aspects of AI and of intelligence in general. Intelligence can actually be defined as optimisation ability (Legg and Hutter, Universal Intelligence: A Definition of Machine Intelligence, 2007).

(No) Free Lunch. The No Free Lunch theorems state that intelligent optimisation is impossible without knowledge about what you're trying to optimise. I argue against these theorems, and show that under a natural definition of complete uncertainty, intelligent (better-than-random) optimisation is possible. Unfortunately, I was also able to show that there are pretty strong limits on how much better intelligent search can be compared to random search.

Optimisation difficulty. In a related paper, we give a formal definition of how hard a function is to optimise:

How to search. Two of the most fundamental strategies for search are depth-first search (DFS) and breadth-first search (BFS). In DFS, you search depth-first; that is, you follow one path until its very end before trying something else. In BFS, you instead try to search as broadly as possible, focusing on breadth rather than depth. I calculate the expected search times for both methods, and derive some results on which method is preferable in which situations:
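As an illustration of the two strategies (not the expected-time analysis from the papers), the sketch below counts how many nodes each method visits before reaching a goal in a complete binary tree. The heap-style node numbering (root 1, children 2n and 2n+1) and the function names are assumptions made for the example.

```python
from collections import deque


def children(node, max_depth):
    """Children of `node` in a complete binary tree numbered heap-style (root = 1)."""
    depth = node.bit_length() - 1
    if depth >= max_depth:
        return []
    return [2 * node, 2 * node + 1]


def search(goal, max_depth, depth_first):
    """Return the number of nodes visited up to and including `goal`."""
    frontier = deque([1])
    visited = 0
    while frontier:
        # DFS takes the most recently added node, BFS the oldest one.
        node = frontier.pop() if depth_first else frontier.popleft()
        visited += 1
        if node == goal:
            return visited
        kids = children(node, max_depth)
        # Reverse for DFS so that the left child is explored first.
        frontier.extend(reversed(kids) if depth_first else kids)
    return None  # goal is not in the tree


if __name__ == "__main__":
    for goal in (8, 3):
        dfs = search(goal, max_depth=3, depth_first=True)
        bfs = search(goal, max_depth=3, depth_first=False)
        print(f"goal={goal}: DFS visits {dfs} nodes, BFS visits {bfs} nodes")
```

In a tree of depth 3, DFS finds the deep goal 8 after 4 nodes while BFS needs 8; for the shallow goal 3, BFS needs 3 nodes while DFS needs 9. Which method is faster depends on where the goal is likely to be.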


Logic. In my Bachelor's thesis I studied logic and automated theorem proving.


Curriculum vitae

My CV (updated 2016-02-04).


I like mathematical approaches to philosophical problems with practical relevance. For example:


Co-founder of, a webservice that lets you see how different aspects of your life affect each other. The analysis is based on automatically and/or manually recorded life data, and is presented as a directed graph.

A prototype for a computer game written from scratch in JavaScript.

Other web presences

Find me on Facebook, Google+, LinkedIn, Google Scholar.