Safety Methods

41 Items

Methods to ensure AGI safety, as distinguished from methods to advance AI toward AGI.

All Items

  • Review: Safety at Scale: Comprehensive Survey of Large Model Safety by Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, ... Yu-Gang Jiang

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14741
  • Review: Strategic Patience: Long-Horizon AI Dominance and the Erosion of Human Vigilance by Roman Yampolskiy

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14603
  • Standardizing Intelligence: Aligning Generative AI for Regulatory and Operational Compliance

    Joseph Marvin Imperial, Matthew D. Jones, Harish Tayyar Madabushi
    DOI: https://doi.org/10.70777/si.v2i5.16189
  • Strategic Patience: Long-Horizon AI Dominance and the Erosion of Human Vigilance

    Roman Yampolskiy
    DOI: https://doi.org/10.70777/si.v2i2.14435
  • Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

    Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Soren Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, Marc-Antoine Rondeau, Pierre-Luc St-Charles, David Williams-King
    DOI: https://doi.org/10.70777/si.v2i5.15569
  • The 2025 Foundation Model Transparency Index

    Alexander Wan, Kevin Klyman, Sayash Kapoor, Nestor Maslej, Shayne Longpre, Betty Xiong, Percy Liang, Rishi Bommasani
    DOI: https://doi.org/10.70777/si.v2i4.17165
  • The First International AI Safety Report: The International Scientific Report on the Safety of Advanced AI

    Yoshua Bengio
    DOI: https://doi.org/10.70777/si.v2i2.14755
  • The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

    HyunJin Kim, Xiaoyuan Yi, JinYeong Bak, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, Xing Xie
    DOI: https://doi.org/10.70777/si.v2i1.13963
  • The Singapore Consensus on Global AI Safety Research Priorities: Building a Trustworthy, Reliable and Secure AI Ecosystem

    Yoshua Bengio, Max Tegmark, Stuart Russell, Dawn Song, Sören Mindermann, Lan Xue, Stephen Casper, Luke Ong, Vanessa Wilfred, Tegan Maharaj, Wan Sie Lee, Ya-Qin Zhang
    DOI: https://doi.org/10.70777/si.v2i5.15503
  • Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation

    Tharindu Kumarage, Ninareh Mehrabi, Anil Ramakrishna, Xinyan Zhao, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, Charith Peris
    DOI: https://doi.org/10.70777/si.v2i3.15249
  • Trends in Frontier AI Model Count: A Forecast to 2028

    Iyngkarran Kumar, Sam Manning
    DOI: https://doi.org/10.70777/si.v2i3.15155
  • Unconditional Basic Meaning as Digital Public Good

    Soenke Ziesche, Roman V. Yampolskiy
    DOI: https://doi.org/10.70777/si.v2i4.16427
  • What AI evaluations for preventing catastrophic risk can and cannot do

    Peter Barnett, Lisa Thiergart
    DOI: https://doi.org/10.70777/si.v2i4.17167
  • Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock

    Didier Sornette, Sandro Claudio Lera, Ke Wu
    DOI: https://doi.org/10.70777/si.v2i4.17163
  • Why Today’s Humanoids Won’t Learn Dexterity

    Rodney Brooks
    DOI: https://doi.org/10.70777/si.v3i3.17351
  • Why We Might Need Advanced AI to Save Us from Doomers, Rather than the Other Way Around: A Review of If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All by Eliezer Yudkowsky and Nate Soares

    Preston Estep
    DOI: https://doi.org/10.70777/si.v2i6.16251