Safety, Alignment & Ethics

60 Items

Safety, alignment, ethics. Proofs of safe AGI. Epistemology of provably safe AGI. Provably Compliant Systems.

All Items

  • HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions

    Mohammad Farhad, Sabbir Rahman, Shuvalaxmi Dass
    DOI: https://doi.org/10.70777/si.v3i2.18033
  • Why Today’s Humanoids Won’t Learn Dexterity

    Rodney Brooks
    DOI: https://doi.org/10.70777/si.v3i3.17351
  • The Iceberg Index: Measuring Workforce Exposure in the AI Economy

    Ayush Chopra, Santanu Bhattacharya, DeAndrea Salvador, Ayan Paul, Teddy Wright, Aditi Garg, Feroz Ahmad, Alice C. Schwarze, Ramesh Raskar, Prasanna Balaprakash
    DOI: https://doi.org/10.70777/si.v2i4.17207
  • The AI Productivity Index (APEX)

    Bertie Vidgen, Abby Fennelly, Evan Pinnix, Chirag Mahapatra, Zach Richards, Austin Bridges, Calix Huang, Ben Hunsberger, Fez Zafar, Brendan Foody, Dominic Barton, Cass R. Sunstein, Eric Topol, Osvald Nitski
    DOI: https://doi.org/10.70777/si.v2i4.17205
  • Unconditional Basic Meaning as Digital Public Good

    Soenke Ziesche, Roman V. Yampolskiy
    DOI: https://doi.org/10.70777/si.v2i4.16427
  • GDPVAL: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

    Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simón Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li, Michael Sharman, Alexandra Barr, Amelia Glaese, Jerry Tworek
    DOI: https://doi.org/10.70777/si.v2i4.17197
  • What AI evaluations for preventing catastrophic risk can and cannot do

    Peter Barnett, Lisa Thiergart
    DOI: https://doi.org/10.70777/si.v2i4.17167
  • The 2025 Foundation Model Transparency Index

    Alexander Wan, Kevin Klyman, Sayash Kapoor, Nestor Maslej, Shayne Longpre, Betty Xiong, Percy Liang, Rishi Bommasani
    DOI: https://doi.org/10.70777/si.v2i4.17165
  • Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock

    Didier Sornette, Sandro Claudio Lera, Ke Wu
    DOI: https://doi.org/10.70777/si.v2i4.17163
  • On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis

    Hector Zenil
    DOI: https://doi.org/10.70777/si.v2i4.17159
  • Enabling Frontier Lab Collaboration to Mitigate AI Safety Risks

    Nicholas Felstead
    DOI: https://doi.org/10.70777/si.v2i6.16439
  • Precedents for the Unprecedented: Historical Analogies for Thirteen Artificial Superintelligence Risks

    James D. Miller
    DOI: https://doi.org/10.70777/si.v2i6.16999
  • International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management

    Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Philip Fox, Nestor Maslej, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Stephen Casper, Jessica Newman, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Sarvapali D. (Gopal) Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang
    DOI: https://doi.org/10.70777/si.v2i4.16671
  • Evidence Integrity Before Capability: A Prerequisite for Safe Artificial Intelligence

    Jennifer Flygare Kinne
    DOI: https://doi.org/10.70777/si.v2i6.16393
  • Aligning Artificial Superintelligence via a Multi-Box Protocol

    Avraham Yair Negozio
    DOI: https://doi.org/10.70777/si.v2i5.15579
  • Timeline to Artificial General Intelligence 2025 – 2030+: A prediction of how AI will progress, year by year. Updated Oct 30, 2025.

    Gil Syswerda
    DOI: https://doi.org/10.70777/si.v2i6.16375
  • The Asymptotic Intelligence Thesis: Rethinking the Ceiling of AGI Cognition

    Jeffrey E. Arle, MD, PhD, FAANS, FCNS
    DOI: https://doi.org/10.70777/si.v2i6.16255
  • International AI Safety Report: First Key Update: Capabilities and Risk Implications

    Yoshua Bengio, Benjamin Bucknall, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Philip Fox, Tiancheng Hu, Cameron Jones, Sam Manning, Nestor Maslej, Vasilios Mavroudis, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Charlotte Stix, Lucia Velasco, Nicole Wheeler, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Sarvapali D. (Gopal) Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang
    DOI: https://doi.org/10.70777/si.v2i6.16253
  • Why We Might Need Advanced AI to Save Us from Doomers, Rather than the Other Way Around: A Review of If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All by Eliezer Yudkowsky and Nate Soares

    Preston Estep
    DOI: https://doi.org/10.70777/si.v2i6.16251
  • Standardizing Intelligence: Aligning Generative AI for Regulatory and Operational Compliance

    Joseph Marvin Imperial, Matthew D. Jones, Harish Tayyar Madabushi
    DOI: https://doi.org/10.70777/si.v2i5.16189
  • Responsible Agentic Reasoning and AI Agents: A Critical Survey Proposal for Safe Agentic AI via Responsible Reasoning AI Agents (R2A2)

    Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis
    DOI: https://doi.org/10.70777/si.v2i6.16169
  • From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training

    Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman, Alec Helyar, Alex Beutel, Andrea Vallone, Saachi Jain
    DOI: https://doi.org/10.70777/si.v2i6.15625
  • Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations

    Zhao Song, Song Yue, Jiahao Zhang
    DOI: https://doi.org/10.70777/si.v2i6.15961
  • Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

    Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, Marc-Antoine Rondeau, Pierre-Luc St-Charles, David Williams-King
    DOI: https://doi.org/10.70777/si.v2i5.15569
  • The Singapore Consensus on Global AI Safety Research Priorities: Building a Trustworthy, Reliable and Secure AI Ecosystem

    Yoshua Bengio, Max Tegmark, Stuart Russell, Dawn Song, Sören Mindermann, Lan Xue, Stephen Casper, Luke Ong, Vanessa Wilfred, Tegan Maharaj, Wan Sie Lee, Ya-Qin Zhang
    DOI: https://doi.org/10.70777/si.v2i5.15503