News
Even Superhuman Go AIs Have Surprising Failure Modes
Our adversarial testing algorithm uncovers a simple, human-interpretable strategy that consistently beats superhuman Go AIs. We explore the implications for the robustness and safety of AI systems.
Adam Gleave, Euan McLean, Kellin Pelrine, Tony Wang, Tom Tseng
Last updated on Nov 29, 2023 · 17 min read
News
AI Safety in a World of Vulnerable Machine Learning Systems
All contemporary machine learning systems are vulnerable to adversarial attack. This poses serious problems for existing alignment proposals. We explore these issues and propose several research directions that FAR is pursuing to overcome this challenge.
Adam Gleave
Last updated on Feb 9, 2024 · 43 min read
News, Agenda