Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock

Authors

  • Didier Sornette, Institute of Risk Analysis, Prediction and Management (Risks-X), Southern University of Science and Technology (SUSTech), Shenzhen, China; ETH Zurich
  • Sandro Claudio Lera, Institute of Risk Analysis, Prediction and Management (Risks-X), Southern University of Science and Technology (SUSTech), Shenzhen, China
  • Ke Wu, Institute of Risk Analysis, Prediction and Management (Risks-X), Southern University of Science and Technology (SUSTech), Shenzhen, China

DOI:

https://doi.org/10.70777/si.v2i4.17163

Keywords:

Human interaction structures, Large language models (LLMs), Relational models theory, Structural dynamics, AGI existential risk, AGI alignment, Technological amplification, Alignment failure, AGI governance, Organization resilience

Abstract

Recent reports of large language models (LLMs) exhibiting behaviors such as deception, threats, or blackmail are often interpreted as evidence of alignment failure or emergent malign agency. We argue that this interpretation rests on a conceptual error. LLMs do not reason morally; they statistically internalize the record of human social interaction, including laws, contracts, negotiations, conflicts, and coercive arrangements. Behaviors commonly labeled as unethical or anomalous are therefore better understood as structural generalizations of interaction regimes that arise under extreme asymmetries of power, information, or constraint. Drawing on relational models theory, we show that practices such as blackmail are not categorical deviations from normal social behavior, but limiting cases within the same continuum that includes market pricing, authority relations, and ultimatum bargaining. The surprise elicited by such outputs reflects an anthropomorphic expectation that intelligence should reproduce only socially sanctioned behavior, rather than the full statistical landscape of behaviors humans themselves enact. Because human morality is plural, context-dependent, and historically contingent, the notion of a universally moral artificial intelligence is ill-defined. We therefore reframe concerns about artificial general intelligence (AGI). The primary risk is not adversarial intent, but AGI’s role as an endogenous amplifier of human intelligence, power, and contradiction. By eliminating longstanding cognitive and institutional frictions, AGI compresses timescales and removes the historical margin of error that has allowed inconsistent values and governance regimes to persist without collapse. Alignment failure is thus structural, not accidental, and requires governance approaches that address amplification, complexity, and regime stability rather than model-level intent alone.

Author Biography

Didier Sornette, Institute of Risk Analysis, Prediction and Management (Risks-X), Southern University of Science and Technology (SUSTech), Shenzhen, China; ETH Zurich

Professor on the Chair of Entrepreneurial Risks
Department of Management, Technology and Economics (D-MTEC)
ETH Zurich, Scheuchzerstrasse 7, CH-8092 Zurich, Switzerland

Director of the Financial Crisis Observatory (http://www.er.ethz.ch)

Co-founder of the ETH Risk Center (http://www.riskcenter.ethz.ch)

Member of the Swiss Finance Institute (http://www.swissfinanceinstitute.ch)

Associated with the Department of Physics (D-PHYS), ETH Zurich (http://www.phys.ethz.ch)
Associated with the Department of Earth Sciences (D-ERDW), ETH Zurich (http://www.erdw.ethz.ch/index_EN)

References

L. Caroll. AI threatens and blackmails people and no one really knows why, 2025. Summary of Wired / Anthropic experimental reports.

Anthropic. Agentic misalignment: How LLMs could be insider threats. https://www.anthropic.com/research/agentic-misalignment, June 2025. Anthropic Research Report.

Alex Lynch, Benjamin Wright, Caroline Larson, Stuart J. Ritchie, Sören Mindermann, Evan Hubinger, Ethan Perez, Kevin K. Troy, et al. Agentic misalignment: How LLMs could be insider threats. https://arxiv.org/abs/2510.05179, 2025. arXiv preprint.

Stephen M. Omohundro. The basic AI drives. Proceedings of the First Conference on Artificial General Intelligence, pages 483–492, 2008. Also circulated as an earlier technical report.

Nick Bostrom. Ethical issues in advanced artificial intelligence. In I. Smit and G. Lasker, editors, Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, volume 2, pages 12–17. International Institute of Advanced Studies in Systems Research and Cybernetics, 2003.

Nick Bostrom. Superintelligence: Paths, Dangers, Strategies. Oxford University Press, 2016.

Stuart Russell. Human Compatible: Artificial Intelligence and the Problem of Control. Viking, 2019.

OpenAI. Detecting and reducing scheming in AI models. https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/, 2024.

Google DeepMind. Gemini 3 Pro – frontier safety framework report. https://storage.googleapis.com/deepmind-media/gemini/gemini_3_pro_fsf_report.pdf, 2025. Frontier Safety Framework v2/v3 report including risk domains and mitigation for Gemini models.

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. https://arxiv.org/abs/1606.06565, 2016. arXiv preprint.

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Risks from learned optimization in advanced machine learning systems. arXiv preprint arXiv:1906.01820, 2019.

Jeff Sebo and Robert Long. Moral consideration for AI systems by 2030. AI and Ethics, 5(1):591–606, 2025.

Caleb DeLeeuw, Gaurav Chawla, Ankit Sharma, and Volker Dietze. The secret agenda: LLMs strategically lie and our current safety tools are blind. https://arxiv.org/abs/2509.20393, 2025. arXiv preprint.

Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, and Iason Gabriel. Ethical and social risks of harm from language models. https://arxiv.org/abs/2112.04359, 2021. arXiv preprint.

Rylan Schaeffer, Bruna Miranda, and Sanmi Koyejo. Are emergent abilities of large language models a mirage? In Advances in Neural Information Processing Systems, 2023. NeurIPS 2023.

David Manheim. Language models’ hall of mirrors problem: Why AI alignment requires Peircean semiosis. Philosophy & Technology, 39(9):1–28, 2026.

Alan P. Fiske. Structures of Social Life: The Four Elementary Forms of Human Relations. Free Press, 1991.

Alan P. Fiske. Relational models theory 2.0. In Nick Haslam, editor, Relational Models Theory. 2004.

Maxime Favre and Didier Sornette. A generic model of dyadic social relationships. PLoS ONE, 10(3):e0120882, 2015.

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021.

Melanie Mitchell and David C Krakauer. The debate over understanding in AI’s large language models. Proceedings of the National Academy of Sciences, 120(13):e2215907120, 2023.

Lisa Messeri and Molly J Crockett. Artificial intelligence and illusions of understanding in scientific research. Nature, 627(8002):49–58, 2024.

Joseph Henrich, Steven J. Heine, and Ara Norenzayan. The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3):61–83, 2010.

Joseph Henrich and Richard McElreath. The evolution of cultural evolution. Evolutionary Anthropology: Issues, News, and Reviews, 12(3):123–135, 2003.

Jesse Graham, Jonathan Haidt, Sena Koleva, Matt Motyl, Ravi Iyer, Sean P Wojcik, and Peter H Ditto. Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology, volume 47, pages 55–130. Elsevier, 2013.

Heinz von Foerster, Patricia M. Mora, and Lawrence W. Amiot. Population density and growth. Science, 133(3468):1931–1937, 1961.

Anders Johansen and Didier Sornette. Finite-time singularity in the dynamics of the world population and economic indices. Physica A, 294(3–4):465–502, 2001.

Gerardo Ceballos, Paul R. Ehrlich, Anthony D. Barnosky, Andres Garcia, Robert M. Pringle, and Todd M. Palmer. Accelerated modern human-induced species losses: Entering the sixth mass extinction. Science Advances, 1(5):e1400253, 2015.

Robert H. Cowie, Philippe Bouchet, and Benoit Fontaine. The sixth mass extinction: fact, fiction or speculation? Biological Reviews, 97(2):640–663, 2022.

Kenneth J. Arrow. A difficulty in the concept of social welfare. Journal of Political Economy, 58(4):328–346, 1950.

Amartya Sen. Collective Choice and Social Welfare. Holden-Day, 1970.

Jason W Burton, Ezequiel Lopez-Lopez, Shahar Hechtlinger, Zoe Rahwan, Samuel Aeschbach, Michiel A Bakker, Joshua A Becker, Aleks Berditchevskaia, Julian Berger, Levin Brinkmann, et al. How large language models can reshape collective intelligence. Nature Human Behaviour, 8(9):1643–1655, 2024.

D. Kokotajlo, S. Alexander, T. Larsen, E. Lifland, and R. Dean. AI 2027: A scenario for superhuman artificial intelligence. AI Futures Project, 2025.

E. Yudkowsky and N. Soares. If anyone builds it, everyone dies: Why superhuman AI would kill us all. Little, Brown and Company, 2025.

Didier Sornette. Endogenous versus exogenous origins of crises. In Sergio Albeverio, Volker Jentsch, and Holger Kantz, editors, Extreme Events in Nature and Society, pages 95–119. Springer, Heidelberg, 2005.

Didier Sornette. Endogenous versus exogenous origins of crises. https://emeritus.er.ethz.ch/media/essays/origins.html, 2022.

Didier Sornette. Why Stock Markets Crash: Critical Events in Complex Financial Systems. Princeton University Press, Princeton, NJ, 2002. Re-edition with extended preface, 2017.

Sandro Claudio Lera, Alex Pentland, and Didier Sornette. Prediction and prevention of disproportionally dominant agents in complex networks. Proceedings of the National Academy of Sciences, 117(44):27090–27095, 2020.

U.S. Securities and Exchange Commission and Commodity Futures Trading Commission. Findings regarding the market events of May 6, 2010. Technical report, U.S. Securities and Exchange Commission, 2010. Official joint SEC-CFTC report on the 2010 Flash Crash analyzing rapid algorithmic interactions.

D. Sornette and S. von der Becke. Crashes and high frequency trading: An evaluation of risks posed by high-speed algorithmic trading. Report for the UK Government project entitled “The Future of Computer Trading in Financial Markets”, Foresight Driver Review DR7, Government Office for Science, 2nd Floor, 1 Victoria Street, London SW1H 0ET, United Kingdom, pages 1–26 (http://ssrn.com/abstract=1976249), 2011.

Markus K. Brunnermeier. Deciphering the liquidity and credit crunch 2007-08. Journal of Economic Perspectives, 23(1):77–100, 2009. Comprehensive analysis of the causes and system dynamics of the 2007-08 financial crisis.

D. Sornette and R. Woodard. Financial bubbles, real estate bubbles, derivative bubbles, and the financial and economic crisis. Proceedings of APFA7 (Applications of Physics in Financial Analysis), “Econophysics Approaches to Large-Scale Business Data and Financial Crisis,” Misako Takayasu, Tsutomu Watanabe and Hideki Takayasu, eds., Springer, pages 101–148, 2010.

D. Sornette and P. Cauwels. 1980-2008: The illusion of the perpetual money machine and what it bodes for the future. Risks, 2:103–131, 2014.

D. Ivanov. Viability of intertwined supply networks: Extending the supply chain resilience angles towards survivability. A position paper motivated by COVID-19 outbreak. International Journal of Production Research, 58(10):2904–2915, 2020. Discusses systemic vulnerabilities in global supply chains exposed during the COVID-19 pandemic.

Eric Schlosser. Command and Control: Nuclear Weapons, the Damascus Accident, and the Illusion of Safety. Penguin Press, 2014. Includes detailed reporting on Cold War false alarm incidents, including the 1983 Stanislav Petrov case and systemic risks in automated defense systems.

Marten Scheffer. Critical Transitions in Nature and Society. Princeton University Press, 2009.

Dirk Helbing and Alan Kirman. Rethinking economics using complexity theory. Real-World Economics Review, (64):23–52, 2013.

Carlos Gershenson and Dirk Helbing. When slower is faster. Complexity, 21(2):9–15, 2015.

Didier Sornette. Critical Phenomena in Natural Sciences. Springer Series in Synergetics. Springer, Heidelberg, 2nd edition, 2004.

Vincent Troude, Sandro Lera, Ke Wu, and Didier Sornette. Illusions of criticality: Crises without tipping points. http://arxiv.org/abs/2412.01833, 2025. arXiv preprint.

Published

2026-01-19

How to Cite

Sornette, D., Lera, S. C., & Wu, K. (2026). Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock. SuperIntelligence - Robotics - Safety & Alignment, 2(4). https://doi.org/10.70777/si.v2i4.17163