Tom Everitt
PhD Student
Australian National University (ANU)
Email: tom.everitt at anu.edu.au
I'm a PhD student in computer science/artificial intelligence (AI) at the Australian National University. My supervisor is Marcus Hutter.
I'm working on AI Safety, i.e. how we can safely use AI with greater-than-human intelligence.
Background. An easy way to get an AI to do what we want it to do is to build it to optimise a reward signal. We give the AI reward when it does something good, and no reward if it does something bad (carrot and stick principle, aka reinforcement learning). By construction, all the AI wants is to optimise the reward signal, which means that the AI will want to do things that we think are good so that we give it reward.
My draft book chapter and AGI tutorial offer gentle introductions to the formal model I use in my other works:
Self-modification. One risk is that the AI modifies itself to get an easier goal than optimising reward given by capricious humans. It turns out that, depending on subtle details of how the AI is constructed, self-modification may or may not be a risk:
Self-Modification of Policy and Utility Function in Rational Agents Tom Everitt, Daniel Filan, Mayank Daswani, and Marcus Hutter. In AGI-16 and arXiv, 2016. Slides, video. Winner of the Kurzweil prize for best AGI paper.
Wireheading. There is also a risk that the AI finds a way to counterfeit reward by modifying its reward sensor to always report maximum reward. This is called wireheading. We develop the beginnings of one possible solution:
Self-preservation and death. Another risk with AIs is the self-preservation drive. Much like humans and animals, AIs may have a strong desire not to be shut off or terminated. This self-preservation drive may lead the AI to hide its true powers (so that we don't become scared of it and shut it off). We can give a mathematical definition of death for AIs, and give examples of AIs that want to live and of AIs that want to die:
Decision theory. Strangely, robots and other agents that are part of their environment may be able to infer properties of themselves from their own actions. For example, my having petted a lot of cats in the past may be evidence that I have toxoplasmosis, a disease that makes you fond of cats. Now, if I see a cat, should I avoid petting it to reduce the risk that I have the disease? (Note that petting cats never causes toxoplasmosis.) The two standard answers for how to reason in this situation are called causal decision theory (CDT) and evidential decision theory (EDT). We show that CDT and EDT turn into three possibilities for how to reason in sequential settings where multiple actions are interleaved with observations:
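As a rough illustration of the one-shot version of the cat example, here is a minimal Python sketch. All probabilities and utilities in it are made up for illustration and are not taken from the paper: a prior of 0.5 for toxoplasmosis, people with the disease petting cats 80% of the time (others 20%), +1 for petting a cat and -10 for having the disease. EDT treats the action as evidence and updates the probability of toxoplasmosis with Bayes' rule; CDT treats the action as an intervention, which leaves that probability at its prior because petting never causes the disease. The sequential variants studied in the paper are not modelled here.

```python
from typing import Callable

# Hypothetical numbers, chosen only for illustration.
PRIOR_T = 0.5                          # P(toxoplasmosis)
P_PET_GIVEN_T = {True: 0.8, False: 0.2}  # P(pet | T), P(pet | not T)
ACTIONS = ("pet", "avoid")


def utility(action: str, has_t: bool) -> float:
    """+1 for petting the cat, -10 for having toxoplasmosis."""
    return (1.0 if action == "pet" else 0.0) - (10.0 if has_t else 0.0)


def p_t_given_action(action: str) -> float:
    """EDT belief: condition on the action as evidence (Bayes' rule)."""
    def likelihood(has_t: bool) -> float:
        p_pet = P_PET_GIVEN_T[has_t]
        return p_pet if action == "pet" else 1.0 - p_pet

    joint_t = PRIOR_T * likelihood(True)
    joint_not_t = (1.0 - PRIOR_T) * likelihood(False)
    return joint_t / (joint_t + joint_not_t)


def p_t_do_action(action: str) -> float:
    """CDT belief: intervening on the action does not change P(toxoplasmosis)."""
    return PRIOR_T


def expected_utility(action: str, p_t: float) -> float:
    return p_t * utility(action, True) + (1.0 - p_t) * utility(action, False)


def choose(belief: Callable[[str], float]) -> str:
    """Pick the action with the highest expected utility under the given belief."""
    return max(ACTIONS, key=lambda a: expected_utility(a, belief(a)))


edt_choice = choose(p_t_given_action)
cdt_choice = choose(p_t_do_action)
print("EDT:", edt_choice, "| CDT:", cdt_choice)  # prints: EDT: avoid | CDT: pet
```

With these numbers EDT avoids the cat (petting would be bad news about the disease), while CDT pets it (petting cannot change whether you have the disease).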
Background. Search and optimisation are fundamental aspects of AI and of intelligence in general. Intelligence can actually be defined as optimisation ability (Legg and Hutter, Universal Intelligence: A Definition of Machine Intelligence, 2007).
(No) Free Lunch. The No Free Lunch theorems state that intelligent optimisation is impossible without knowledge about what you're trying to optimise. I argue against these theorems, and show that under a natural definition of complete uncertainty, intelligent (better-than-random) optimisation is possible. Unfortunately, I was also able to show that there are pretty strong limits on how much better intelligent search can be compared to random search.
Universal Induction and Optimisation: No Free Lunch? Tom Everitt. Supervised by Tor Lattimore, Peter Sunehag, and Marcus Hutter at ANU. Master's thesis, Department of Mathematics, Stockholm University, 2013.
Optimisation difficulty. In a related paper, we give a formal definition of how hard a function is to optimise:
How to search. Two of the most fundamental search strategies are depth-first search (DFS) and breadth-first search (BFS). In DFS, you follow one path all the way to its end before trying something else. In BFS, you instead search as broadly as possible, examining all options at one depth before going deeper. I calculate the expected search times for both methods, and derive some results on which method is preferable in which situations:
Analytical Results on the BFS vs. DFS Algorithm Selection Problem. Part I, Tree Search. Tom Everitt and Marcus Hutter. In 28th Australasian Joint Conference on AI and arXiv, 2015. Slides, Source Code.
Analytical Results on the BFS vs. DFS Algorithm Selection Problem. Part II, Graph Search. Tom Everitt and Marcus Hutter. In 28th Australasian Joint Conference on AI and arXiv, 2015. Slides, Source Code.
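A minimal sketch of the two strategies, and not the analytical model from the papers above: it assumes a hypothetical complete binary tree of depth 10 with a single goal node, and simply counts how many nodes each method examines before reaching the goal. It shows the kind of situation-dependence the papers analyse: BFS wins when the goal is shallow, while DFS can win when the goal is deep and lies on the first path it tries.

```python
from collections import deque
from typing import List, Optional, Tuple

Node = Tuple[int, ...]  # path from the root: 0 = left child, 1 = right child


def children(node: Node, max_depth: int) -> List[Node]:
    """Children of a node in a complete binary tree of the given depth."""
    if len(node) == max_depth:
        return []
    return [node + (0,), node + (1,)]


def bfs_steps(goal: Node, max_depth: int) -> Optional[int]:
    """Number of nodes BFS examines up to and including the goal (None if absent)."""
    frontier = deque([()])
    steps = 0
    while frontier:
        node = frontier.popleft()  # FIFO: finish one level before the next
        steps += 1
        if node == goal:
            return steps
        frontier.extend(children(node, max_depth))
    return None


def dfs_steps(goal: Node, max_depth: int) -> Optional[int]:
    """Number of nodes DFS examines up to and including the goal (None if absent)."""
    stack = [()]
    steps = 0
    while stack:
        node = stack.pop()  # LIFO: keep following the current path
        steps += 1
        if node == goal:
            return steps
        # Push the right child first so that the left child is explored first.
        stack.extend(reversed(children(node, max_depth)))
    return None


D = 10
shallow_right = (1,)   # goal one level down, on the right
deep_left = (0,) * D   # goal at the bottom of the leftmost path
print(bfs_steps(shallow_right, D), dfs_steps(shallow_right, D))  # 3 1025
print(bfs_steps(deep_left, D), dfs_steps(deep_left, D))          # 1024 11
```

Here the goal position is fixed by hand; the expected search times in the papers instead come from averaging over where the goal might be.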
Automated Theorem Proving. Tom Everitt. Supervised by Rikard Bøgvad. Bachelor's thesis, Department of Mathematics, Stockholm University, 2010.
AI and the future -- Introduction to AI Safety. ANU Regnet, ANU Learning Communities, ANU XSA (XSAC talk), 2016. Slides
AI Safety -- Overview of recent models and results. Effective Altruism Sydney retreat, 2016. Slides
My CV (updated 2017-01-04).
Formal models of reasoning, decision-making, and (artificial) intelligence and their implications for the AI Safety problem. The AI Safety problem is well described in Bostrom's book Superintelligence. See also intelligence.org.
Universal induction (aka Solomonoff induction), which is a general, mathematical solution to the induction problem based on Algorithmic Information Theory and Bayes' rule (see Universal Artificial Intelligence and A Philosophical Treatise of Universal Induction).
Cooperation theory. Robert Axelrod's brilliant computer tournaments and subsequent book The Evolution of Cooperation sparked this field, which investigates when self-interested individuals end up cooperating. See also Nowak's Supercooperators.
Inference of causal relationships from observational data.
Co-founder of Kaus.se, a web service that lets you see how different aspects of your life affect each other. The analysis is based on automatically and/or manually recorded life data, and is presented as a directed graph.
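Universal induction, listed among the interests above, combines a simplicity prior with Bayes' rule. Solomonoff induction itself is not computable (it mixes over all programs, weighting each by 2^-length), so the sketch below is only a hedged, computable toy analogue: it replaces programs with a hypothetical class of repeating bit patterns and program length with pattern length, then uses the Bayes mixture to predict the next bit. Nothing in it is taken from the works cited above.

```python
from fractions import Fraction
from itertools import product
from typing import Iterator


def hypotheses(max_len: int) -> Iterator[str]:
    """Toy hypothesis class: 'the sequence repeats this bit pattern forever'."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def consistent(pattern: str, history: str) -> bool:
    """Deterministic hypothesis: likelihood 1 if it reproduces history, else 0."""
    return all(bit == pattern[i % len(pattern)] for i, bit in enumerate(history))


def prob_next_is_one(history: str, max_len: int = 8) -> Fraction:
    """Posterior predictive P(next bit = 1 | history) under a 2^-length prior."""
    total = Fraction(0)
    ones = Fraction(0)
    for pattern in hypotheses(max_len):
        if consistent(pattern, history):
            weight = Fraction(1, 2 ** len(pattern))  # simpler => more prior mass
            total += weight
            if pattern[len(history) % len(pattern)] == "1":
                ones += weight
    if total == 0:
        raise ValueError("no hypothesis in the toy class fits the history")
    return ones / total


print(prob_next_is_one("010101"))  # 1/23
```

After observing 010101, the short pattern 01 dominates the posterior, so the mixture assigns only 1/23 to a 1 coming next.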