Safety Methods

41 Items

Methods to ensure AGI safety, as distinguished from methods to advance AI toward AGI.

All Items

  • Acceptable Use Policies for Foundation Models

    Kevin Klyman
    DOI: https://doi.org/10.70777/si.v1i1.10917
  • Against Purposeful Artificial Intelligence Failures

    Roman Yampolskiy
    DOI: https://doi.org/10.70777/si.v1i1.9943
  • AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges

    Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee
    DOI: https://doi.org/10.70777/si.v2i3.15161
  • Aligning Artificial Superintelligence via a Multi-Box Protocol

    Avraham Yair Negozio
    DOI: https://doi.org/10.70777/si.v2i5.15579
  • Anthropic: Responsible Scaling Policy

    Evan Hubinger
    DOI: https://doi.org/10.70777/si.v2i1.13657
  • Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models

    Anthony Barrett, Krystal Jackson, Evan R. Murphy, Nada Madkour, Jessica Newman
    DOI: https://doi.org/10.70777/si.v1i1.10601
  • Can a Bayesian Oracle Prevent Harm from an Agent?

    Yoshua Bengio, Matt McDermott, Michael K. Cohen, Nikolay Malkin, Damiano Fornasiere, Pietro Greiner, Younesse Kaddar
    DOI: https://doi.org/10.70777/si.v2i1.13799
  • Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, Jeff Clune
    DOI: https://doi.org/10.70777/si.v2i3.15063
  • Deliberative Alignment: Reasoning Enables Safer Language Models

    Melody Y. Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex, Amelia Glaese
    DOI: https://doi.org/10.70777/si.v2i3.15159
  • Evidence Integrity Before Capability: A Prerequisite for Safe Artificial Intelligence

    Jennifer Flygare Kinne
    DOI: https://doi.org/10.70777/si.v2i6.16393
  • From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training

    Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman, Alec Helyar, Alex Beutel, Andrea Vallone, Saachi Jain
    DOI: https://doi.org/10.70777/si.v2i6.15625
  • GDPVAL: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

    Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simón Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li, Michael Sharman, Alexandra Barr, Amelia Glaese, Jerry Tworek
    DOI: https://doi.org/10.70777/si.v2i4.17197
  • Hardware-Enabled Mechanisms for Verifying Responsible AI Development

    Aidan O’Gara, Gabriel, Will Hodgkins, James Petrie, Vincent Immler, Aydin Aysu, Kanad Basu, Shivam Bhasin, Stjepan Picek, Ankur Srivastava
    DOI: https://doi.org/10.70777/si.v2i3.15157
  • Highlights of the Issue: Singapore Consensus – Safety Technology In Progress

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i5.15525
  • HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions

    Mohammad Farhad, Sabbir Rahman, Shuvalaxmi Dass
    DOI: https://doi.org/10.70777/si.v3i2.18033
  • International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management

    Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Philip Fox, Nestor Maslej, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Stephen Casper, Jessica Newman, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Sarvapali D. (Gopal) Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang
    DOI: https://doi.org/10.70777/si.v2i4.16671
  • International AI Safety Report: First Key Update: Capabilities and Risk Implications

    Yoshua Bengio, Benjamin Bucknall, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Philip Fox, Tiancheng Hu, Cameron Jones, Sam Manning, Nestor Maslej, Vasilios Mavroudis, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Charlotte Stix, Lucia Velasco, Nicole Wheeler, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Sarvapali D. (Gopal) Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang
    DOI: https://doi.org/10.70777/si.v2i6.16253
  • LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures

    Francisco Aguilera-Martinez, Fernando Berzal
    DOI: https://doi.org/10.70777/si.v2i2.14441
  • Measuring AI Agent Autonomy: Towards a Scalable Approach with Code Inspection

    Peter Cihon, Merlin Stein, Gagan Bansal, Sam Manning, Kevin Xu
    DOI: https://doi.org/10.70777/si.v2i3.15295
  • Outline: Proposed Zero Draft for a Standard on AI Testing, Evaluation, Verification, and Validation

    NIST
    DOI: https://doi.org/10.70777/si.v2i5.15513
  • Pitfalls of Evidence-Based AI Policy

    Stephen Casper, David Krueger, Dylan Hadfield-Menell
    DOI: https://doi.org/10.70777/si.v2i2.14611
  • Precedents for the Unprecedented: Historical Analogies for Thirteen Artificial Superintelligence Risks

    James D. Miller
    DOI: https://doi.org/10.70777/si.v2i6.16999
  • Responsible Agentic Reasoning and AI Agents: A Critical Survey Proposal for Safe Agentic AI via Responsible Reasoning AI Agents (R2A2)

    Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis
    DOI: https://doi.org/10.70777/si.v2i6.16169
  • Review: AI Governance through Markets, by Philip Moreira Tomei, Rupal Jain, Matija Franklin

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14601
  • Review: On Regulating Downstream AI Developers, by Sophie Williams, Jonas Schuett, Markus Anderljung

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14587
1-25 of 41