About Me

Hi, I’m Joe! I’m currently a member of technical staff on the Alignment Science team at Anthropic, where I work on adversarial testing of large language models and think about how we might evaluate the alignment properties of very powerful ML systems. For more about my research, see here. Previously, I was a PhD student in the Department of Statistics at the University of Oxford, where I was supervised by Arnaud Doucet and George Deligiannidis and worked on the theory of diffusion models.

I’m particularly interested in working to ensure that the development of powerful machine learning systems is beneficial for humanity. To this end, I’ve worked at several AI safety research organisations, including Redwood Research where I worked on mechanistic anomaly detection, the Alignment Research Center where I worked on formalizing heuristic arguments, and the Center for Human Compatible AI. I’ve also spent time with the UK Frontier AI Taskforce (now the UK AI Safety Institute) and remain excited about building government capacity to understand and regulate frontier AI systems.

Outside my research, I am a trustee for Raise, a UK-wide student movement which has raised over £460,000 for the Against Malaria Foundation, and consider myself part of the Effective Altruism community. In my spare time, you’ll typically find me running through the hills in Berkeley, playing the guitar (badly!), or reading. To get in touch, you can email me at benton@stats.ox.ac.uk.