About Me

Hi, I’m Joe! I’m currently working on scalable oversight as part of the Alignment Science team at Anthropic. For more about my research, see here. Before that, I was a PhD student in the Department of Statistics at the University of Oxford, where I was supervised by Arnaud Doucet and George Deligiannidis and worked on the theory of diffusion models.

I’m particularly interested in working to ensure that the development of powerful machine learning systems is beneficial for humanity. I previously worked at the UK Frontier AI Taskforce (now the UK AI Safety Institute), helping it get set up in its early days, and remain excited about building government capacity to understand and regulate frontier AI systems. I’ve also spent time at several other AI safety research organisations, including Redwood Research where I worked on mechanistic anomaly detection, the Alignment Research Center where I worked on formalizing heuristic arguments, and the Center for Human Compatible AI.

Outside my research, I consider myself a part of the Effective Altruism community, and helped to found Raise, a UK-wide student movement which has raised over £500,000 for the Against Malaria Foundation. In my spare time, you’ll typically find me running running through the hills in Berkeley, learning something new (currently skateboarding), or reading.

Joe Benton

About Me