Anthropic: Responsible Scaling Policy
DOI:
https://doi.org/10.70777/si.v2i1.13657Keywords:
agi governance, agi risk, artificial general intelligence risk, agi safety, agi alignment, artificial general intelligence safety value alignmentAbstract
In September 2023, we released our Responsible Scaling Policy (RSP), a public commitment not to train or deploy models capable of causing catastrophic harm unless we have implemented safety and security measures that will keep risks below acceptable levels. We are now updating our RSP to account for the lessons we’ve learned over the last year. This updated policy reflects our view that risk governance in this rapidly evolving domain should be proportional, iterative, and exportable.
AI Safety Level Standards (ASL Standards) are a set of technical and operational measures for safely training and deploying frontier AI models. These currently fall into two categories: Deployment Standards and Security Standards. As model capabilities increase, so will the need for stronger safeguards, which are captured in successively higher ASL Standards. At present, all of our models must meet the ASL-2 Deployment and Security Standards. To determine when a model has become sufficiently advanced such that its deployment and security measures should be strengthened, we use the concepts of Capability Thresholds and Required Safeguards. A Capability Threshold tells us when we need to upgrade our protections, and the corresponding Required Safeguards tell us what standard should apply.
Downloads
Published
How to Cite
Issue
Section
Categories
License
Copyright (c) 2025 Evan Hubinger

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.