News
Even Superhuman Go AIs Have Surprising Failure Modes
Our adversarial testing algorithm uncovers a simple, human-interpretable strategy that consistently beats superhuman Go AIs. We explore the implications for the robustness and safety of AI systems.
Adam Gleave, Euan McLean, Kellin Pelrine, Tony Wang, Tom Tseng
Last updated on Nov 29, 2023 · 17 min read
News
AI Safety in a World of Vulnerable Machine Learning Systems
All contemporary machine learning systems are vulnerable to adversarial attack. This poses serious problems for existing alignment proposals. We explore these issues and propose several research directions that FAR is pursuing to overcome this challenge.
Adam Gleave
Last updated on Feb 9, 2024 · 43 min read
News, Agenda