Safety, Alignment & Ethics

60 Items

Safety, alignment, ethics. Proofs of safe AGI. Epistemology of provably safe AGI. Provably Compliant Systems.

All Items

HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions

Mohammad Farhad, Sabbir Rahman, Shuvalaxmi Dass

DOI: https://doi.org/10.70777/si.v3i2.18033
Why Today’s Humanoids Won’t Learn Dexterity

Rodney Brooks

DOI: https://doi.org/10.70777/si.v3i3.17351
The Iceberg Index: Measuring Workforce Exposure in the AI Economy

Ayush Chopra, Santanu Bhattacharya, DeAndrea Salvador, Ayan Paul, Teddy Wright, Aditi Garg, Feroz Ahmad, Alice C. Schwarze, Ramesh Raskar, Prasanna Balaprakash

DOI: https://doi.org/10.70777/si.v2i4.17207
The AI Productivity Index (APEX)

Bertie Vidgen, Abby Fennelly, Evan Pinnix, Chirag Mahapatra, Zach Richards, Austin Bridges, Calix Huang, Ben Hunsberger, Fez Zafar, Brendan Foody, Dominic Barton, Cass R. Sunstein, Eric Topol, Osvald Nitski

DOI: https://doi.org/10.70777/si.v2i4.17205
Unconditional Basic Meaning as Digital Public Good

Soenke Ziesche, Roman V. Yampolskiy

DOI: https://doi.org/10.70777/si.v2i4.16427
GDPVAL: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simón Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li, Michael Sharman, Alexandra Barr, Amelia Glaese, Jerry Tworek

DOI: https://doi.org/10.70777/si.v2i4.17197
What AI evaluations for preventing catastrophic risk can and cannot do

Peter Barnett, Lisa Thiergart

DOI: https://doi.org/10.70777/si.v2i4.17167
The 2025 Foundation Model Transparency Index

Alexander Wan, Kevin Klyman, Sayash Kapoor, Nestor Maslej, Shayne Longpre, Betty Xiong, Percy Liang, Rishi Bommasani

DOI: https://doi.org/10.70777/si.v2i4.17165
Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock

Didier Sornette, Sandro Claudio Lera, Ke Wu

DOI: https://doi.org/10.70777/si.v2i4.17163
On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis

Hector Zenil

DOI: https://doi.org/10.70777/si.v2i4.17159
Enabling Frontier Lab Collaboration to Mitigate AI Safety Risks

Nicholas Felstead

DOI: https://doi.org/10.70777/si.v2i6.16439
Precedents for the Unprecedented: Historical Analogies for Thirteen Artificial Superintelligence Risks

James D. Miller

DOI: https://doi.org/10.70777/si.v2i6.16999
International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management

Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Philip Fox, Nestor Maslej, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Stephen Casper, Jessica Newman, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Sarvapali D. (Gopal) Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang

DOI: https://doi.org/10.70777/si.v2i4.16671
Evidence Integrity Before Capability: A Prerequisite for Safe Artificial Intelligence

Jennifer Flygare Kinne

DOI: https://doi.org/10.70777/si.v2i6.16393
Aligning Artificial Superintelligence via a Multi-Box Protocol

Avraham Yair Negozio

DOI: https://doi.org/10.70777/si.v2i5.15579
Timeline to Artificial General Intelligence 2025 – 2030+: A prediction of how AI will progress, year by year. Updated Oct 30, 2025.

Gil Syswerda

DOI: https://doi.org/10.70777/si.v2i6.16375
The Asymptotic Intelligence Thesis: Rethinking the Ceiling of AGI Cognition

Jeffrey E. Arle, MD, PhD, FAANS, FCNS

DOI: https://doi.org/10.70777/si.v2i6.16255
International AI Safety Report: First Key Update: Capabilities and Risk Implications

Yoshua Bengio, Benjamin Bucknall, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Philip Fox, Tiancheng Hu, Cameron Jones, Sam Manning, Nestor Maslej, Vasilios Mavroudis, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Charlotte Stix, Lucia Velasco, Nicole Wheeler, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Sarvapali D. (Gopal) Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang

DOI: https://doi.org/10.70777/si.v2i6.16253
Why We Might Need Advanced AI to Save Us from Doomers, Rather than the Other Way Around: A Review of If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All by Eliezer Yudkowsky and Nate Soares

Preston Estep

DOI: https://doi.org/10.70777/si.v2i6.16251
Standardizing Intelligence: Aligning Generative AI for Regulatory and Operational Compliance

Joseph Marvin Imperial, Matthew D. Jones, Harish Tayyar Madabushi

DOI: https://doi.org/10.70777/si.v2i5.16189
Responsible Agentic Reasoning and AI Agents: A Critical Survey Proposal for Safe Agentic AI via Responsible Reasoning AI Agents (R2A2)

Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis

DOI: https://doi.org/10.70777/si.v2i6.16169
From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training

Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman, Alec Helyar, Alex Beutel, Andrea Vallone, Saachi Jain

DOI: https://doi.org/10.70777/si.v2i6.15625
Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations

Zhao Song, Song Yue, Jiahao Zhang

DOI: https://doi.org/10.70777/si.v2i6.15961
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, Marc-Antoine Rondeau, Pierre-Luc St-Charles, David Williams-King

DOI: https://doi.org/10.70777/si.v2i5.15569
The Singapore Consensus on Global AI Safety Research Priorities: Building a Trustworthy, Reliable and Secure AI Ecosystem

Yoshua Bengio, Max Tegmark, Stuart Russell, Dawn Song, Sören Mindermann, Lan Xue, Stephen Casper, Luke Ong, Vanessa Wilfred, Tegan Maharaj, Wan Sie Lee, Ya-Qin Zhang

DOI: https://doi.org/10.70777/si.v2i5.15503
