I’m currently a member of technical staff on the Alignment Science team at Anthropic, where I work on adversarial testing of large language models and think about how we might evaluate the alignment properties of very powerful ML systems. Previously, I was a PhD student in the Department of Statistics at the University of Oxford, where I was supervised by Arnaud Doucet and George Deligiannidis and worked on the theory of diffusion models. I’ve also spent time with the UK Frontier AI Taskforce (now the UK AI Safety Institute) and remain excited about building government capacity to understand and regulate frontier AI systems.


From Denoising Diffusions to Denoising Markov Models. Joe Benton, Yuyang Shi, Valentin De Bortoli, George Deligiannidis, Arnaud Doucet. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2024.

Nearly d-Linear Convergence Bounds for Diffusion Models via Stochastic Localization. Joe Benton, Valentin De Bortoli, Arnaud Doucet, George Deligiannidis. International Conference on Learning Representations, 2024.

Error Bounds for Flow Matching Methods. Joe Benton, George Deligiannidis, Arnaud Doucet. Transactions on Machine Learning Research, February 2024.

Alpha-divergence Variational Inference Meets Importance Weighted Auto-Encoders: Methodology and Asymptotics. Kamélia Daudel, Joe Benton*, Yuyang Shi*, Arnaud Doucet. Journal of Machine Learning Research, 24(243):1–83, 2023. (* equal contribution)

Measuring Feature Sparsity in Language Models. Mingyang Deng, Lucas Tao, Joe Benton. NeurIPS 2023 Workshop on Socially Responsible Language Modelling Research, 2023.

A Continuous Time Framework for Discrete Denoising Models. Andrew Campbell, Joe Benton, Valentin De Bortoli, Tom Rainforth, George Deligiannidis, Arnaud Doucet. Advances in Neural Information Processing Systems, 2022.

Polysemanticity and Capacity in Neural Networks. Adam Scherlis, Kshitij Sachan, Adam S. Jermyn, Joe Benton, Buck Shlegeris. arXiv preprint, arXiv:2210.01892, 2022.