Safety, Alignment & Ethics

60 Items

Safety, alignment, ethics. Proofs of safe AGI. Epistemology of provably safe AGI. Provably Compliant Systems.

All Items

  • Outline: Proposed Zero Draft for a Standard on AI Testing, Evaluation, Verification, and Validation

    NIST
    DOI: https://doi.org/10.70777/si.v2i5.15513
  • America's AI Action Plan: Winning the Race

    Office of Science and Technology Policy (OSTP)
    DOI: https://doi.org/10.70777/si.v2i5.15507
  • The Singapore Consensus on Global AI Safety Research Priorities: Building a Trustworthy, Reliable and Secure AI Ecosystem

    Yoshua Bengio, Max Tegmark, Stuart Russell, Dawn Song, Sören Mindermann, Lan Xue, Stephen Casper, Luke Ong, Vanessa Wilfred, Tegan Maharaj, Wan Sie Lee, Ya-Qin Zhang
    DOI: https://doi.org/10.70777/si.v2i5.15503
  • Measuring AI Agent Autonomy: Towards a Scalable Approach with Code Inspection

    Peter Cihon, Merlin Stein, Gagan Bansal, Sam Manning, Kevin Xu
    DOI: https://doi.org/10.70777/si.v2i3.15295
  • Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation

    Tharindu Kumarage, Ninareh Mehrabi, Anil Ramakrishna, Xinyan Zhao, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta, Charith Peris
    DOI: https://doi.org/10.70777/si.v2i3.15249
  • AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges

    Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee
    DOI: https://doi.org/10.70777/si.v2i3.15161
  • Deliberative Alignment: Reasoning Enables Safer Language Models

    Melody Y. Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex Beutel, Amelia Glaese
    DOI: https://doi.org/10.70777/si.v2i3.15159
  • Hardware-Enabled Mechanisms for Verifying Responsible AI Development

    Aidan O’Gara, Gabriel Kulp, Will Hodgkins, James Petrie, Vincent Immler, Aydin Aysu, Kanad Basu, Shivam Bhasin, Stjepan Picek, Ankur Srivastava
    DOI: https://doi.org/10.70777/si.v2i3.15157
  • Trends in Frontier AI Model Count: A Forecast to 2028

    Iyngkarran Kumar, Sam Manning
    DOI: https://doi.org/10.70777/si.v2i3.15155
  • Comparing Apples to Oranges: A Taxonomy for Navigating the Global Landscape of AI Regulation

    Sacha Alanoca, Shira Gur-Arieh, Tom Zick, Kevin Klyman
    DOI: https://doi.org/10.70777/si.v2i3.15137
  • Timeline to Artificial General Intelligence 2025 – 2030+

    Gil Syswerda
    DOI: https://doi.org/10.70777/si.v2i3.15119
  • Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, Jeff Clune
    DOI: https://doi.org/10.70777/si.v2i3.15063
  • Review: Addressing the challenges of harmonizing law and artificial intelligence technology in modern society, by Lamprini Seremeti, Sofia Anastasiadou, Andreas Masouras, Stylianos Papalexandris

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14807
  • The Perilous State of AI Governance, June 2025

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14801
  • The First International AI Safety Report: The International Scientific Report on the Safety of Advanced AI

    Yoshua Bengio
    DOI: https://doi.org/10.70777/si.v2i2.14755
  • Review: Safety at Scale: Comprehensive Survey of Large Model Safety, by Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, ... Yu-Gang Jiang

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14741
  • Review: Large Language Models Pass the Turing Test, by Cameron R. Jones and Benjamin K. Bergen

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14697
  • Review: Strategic Patience: Long-Horizon AI Dominance and the Erosion of Human Vigilance, by Roman Yampolskiy

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14603
  • Strategic Patience: Long-Horizon AI Dominance and the Erosion of Human Vigilance

    Roman Yampolskiy
    DOI: https://doi.org/10.70777/si.v2i2.14435
  • LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures

    Francisco Aguilera-Martinez, Fernando Berzal
    DOI: https://doi.org/10.70777/si.v2i2.14441
  • A Framework for the Private Governance of Frontier Artificial Intelligence

    Dean Ball
    DOI: https://doi.org/10.70777/si.v2i2.14519
  • Review: On Regulating Downstream AI Developers, by Sophie Williams, Jonas Schuett, Markus Anderljung

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14587
  • Review: AI Governance through Markets, by Philip Moreira Tomei, Rupal Jain, Matija Franklin

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14601
  • Review: Large Language Model-Powered AI Systems Achieve Self-Replication with No Human Intervention, by Xudong Pan (潘旭东), Jiarun Dai (戴嘉润), Yihe Fan (范一禾), Minyuan Luo (罗铭源), Changyi Li (李长艺), Min Yang (杨珉)

    Kris Carlson
    DOI: https://doi.org/10.70777/si.v2i2.14607
  • Pitfalls of Evidence-Based AI Policy

    Stephen Casper, David Krueger, Dylan Hadfield-Menell
    DOI: https://doi.org/10.70777/si.v2i2.14611