About Me

Hi, I’m Joe! I’m currently a member of technical staff on the Alignment Science team at Anthropic, where I work on adversarial testing of large language models and think about how we might evaluate the alignment properties of very powerful ML systems. For more about my research, see here. Previously, I was a PhD student in the Department of Statistics at the University of Oxford, where I was supervised by Arnaud Doucet and George Deligiannidis and worked on the theory of diffusion models.

I’m particularly interested in working to ensure that the development of powerful machine learning systems is beneficial for humanity. To this end, I’ve worked at several AI safety research organisations, including Redwood Research where I worked on mechanistic anomaly detection, the Alignment Research Center where I worked on formalizing heuristic arguments, and the Center for Human Compatible AI. I’ve also spent time with the UK Frontier AI Taskforce (now the UK AI Safety Institute) and remain excited about building government capacity to understand and regulate frontier AI systems.

Outside my research, I am a trustee for Raise, a UK-wide student movement which has raised over £460,000 for the Against Malaria Foundation, and consider myself part of the Effective Altruism community. In my spare time, you’ll typically find me running through the hills in Berkeley, playing the guitar (badly!), or reading. To get in touch, you can email me at benton@stats.ox.ac.uk.