Real-World Gaps in AI Governance Research

Ilan Strauss; Isobel Moure; Tim O’Reilly; Sruly Rosenblat

doi:10.70777/si.v2i3.15163

Authors

Ilan Strauss AI Disclosures Project, Social Science Research Council
Isobel Moure AI Disclosures Project, Social Science Research Council
Tim O’Reilly O'Reilly Media; 1AI Disclosures Project, Social Science Research Council
Sruly Rosenblat AI Disclosures Project, Social Science Research Council

DOI:

https://doi.org/10.70777/si.v2i3.15163

Keywords:

AI alignment, AI interpretability, AI commercialization risks, ai cloud providers, ai foundation models, ai frontier models, Anthropic\, Google DeepMind, Meta AI, Microsoft AI, OpenAI, CMU AI, NYU AI, MIT AI, Stanford AI, UC Berkeley AI, University of Washington AI

Abstract

Drawing on 1,178 safety and reliability papers from 9,439 generative AI papers (January 2020 – March 2025), we compare research outputs of leading AI companies (Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI) and AI universities (CMU, MIT, NYU, Stanford, UC Berkeley, and University of Washington). We find that corporate AI research increasingly concentrates on pre-deployment areas—model alignment and testing & evaluation—while attention to deployment-stage issues such as model bias has waned. Significant research gaps exist in high-risk deployment domains, including healthcare, finance, misinformation, persuasive and addictive features, hallucinations, and copyright. Without improved observability into deployed AI, growing corporate concentration could deepen knowledge deficits. We recommend expanding external researcher access to deployment data and systematic observability of in-market AI behaviors.

Author Biographies

Ilan Strauss, AI Disclosures Project, Social Science Research Council

Dr. Ilan Strauss is Program Director of the AI Disclosures Project at the SSRC. He is an Honorary Senior Fellow at the UCL Institute for Innovation and Public Purpose (London), where he was head of digital economy research on a multi-year Omidyar Network funded research project. He is a Visiting Associate Professor at the University of Johannesburg. Ilan was the joint recipient of an Economic Security Project grant investigating Big Tech’s acquisitions of technological capabilities. He previously taught macroeconomics at New York University (Division of Applied Undergraduate Studies) and at Rice University (Jones Graduate School of Business), and has consulted widely for United Nations bodies.

Isobel Moure, AI Disclosures Project, Social Science Research Council

Isobel Moure is a Researcher at the AI Disclosures Project at the SSRC. She is a recent graduate of Barnard College of Columbia University, having studied Cognitive Science and Computer Science. She previously worked with Dr. Christos Papadimitriou for her senior thesis and was an undergraduate research assistant for Dr. Robert Remez. She is excited to make the shift from studying human brains to artificial ones.

Tim O’Reilly, O'Reilly Media; 1AI Disclosures Project, Social Science Research Council

Tim O'Reilly is the Principal Investigator and Co-Director of the AI Disclosures Project at the SSRC. He is the founder, CEO, and Chairman of O’Reilly Media, and a Visiting Professor of Practice at the UCL Institute for Innovation and Public Purpose (IIPP), where he helped establish (with Mariana Mazzucato) a multi-year research project sponsored by the Omidyar Network. This project investigated Big Tech’s use of algorithmic allocations to extract rents from their ecosystems. Known for coining terms such as “Open Source” and “Web 2.0,” Tim has pioneered technology publishing, conferences, digital media, and the concepts we use to understand technology trends.

Sruly Rosenblat, AI Disclosures Project, Social Science Research Council

I am currently a researcher for the AI Disclosures Project housed at the SSRC.

References

Sara Abdali, Richard Anarfi, CJ Barberan, and Jia He. Securing large language models: Threats, vulnerabilities and responsible practices. arXiv preprint arXiv:2403.12503, 2024.

Dario Amodei. The urgency of interpretability, 04 2025. URL https://www. darioamodei.com/post/the-urgency-of-interpretability. Accessed: 2025-04-28.

Dario Amodei and Jack Clark. Faulty reward functions in the wild, 12 2016. URL https://openai.com/blog/faulty-reward-functions/. OpenAI Blog.

Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O’Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, et al. Frontier ai regulation: Managing emerging risks to public safety. arXiv preprint arXiv:2307.03718, 2023.

Anthropic. Anthropic economic index: Insights from claude 3.7 sonnet, 03 2025. URL https://www.anthropic.com/news/ anthropic-economic-index-insights-from-claude-sonnet-3-7. Accessed: 2025-04-28.

Anthropic. Detecting and countering malicious uses of claude: March 2025. Anthropic News, 04 2025. URL https://www.anthropic.com/news/ detecting-and-countering-malicious-uses-of-claude-march-2025. Accessed: 2025-04-28.

Anthropic. Exploring model welfare, April 2025. URL https://www.anthropic.com/ research/exploring-model-welfare. Accessed: 2025-04-25.

Beatriz Botero Arcila. Ai liability along the value chain, 2025. URL https://blog. mozilla.org/netpolicy/files/2025/03/AI-Liability-Along-the-Value-Chain_ Beatriz-Arcila.pdf. Mozilla.

Abi Aryan. What is LLMOps?: large language models in production. O’Reilly Media, Inc., 2024.

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional ai: Harmlessness from ai feedback. arXiv preprint arXiv:2212.08073, 2022.

Farima Fatahi Bayat, Lechen Zhang, Sheza Munir, and Lu Wang. Factbench: A dynamic benchmark for in-the-wild language model factuality evaluation. arXiv preprint arXiv:2410.22257, 2025.

Nick Bostrom. Superintelligence: Paths, dangers, strategies, 2014.

Miles Brundage, Katie Mayer, Tyna Eloundou, Sandhini Agarwal, Steven Adler, Gretchen Krueger, Jan Leike, and Pamela Mishkin. Lessons learned on language model safety and misuse, 3 2022. URL https://openai.com/index/ language-model-safety-and-misuse/. Accessed: 2025-01-23.

Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, Whitney Maxwell, Joris de Gruyter, Katherine Pratt, Saphir Qi, Nina Chikanov, et al. Lessons from red teaming 100 generative ai products. arXiv preprint arXiv:2501.07238, 2025.

US CFTC. Cfr part 43; rin 3038-ad08: Real-time public reporting of swap transaction data. Federal Register, 77(5):1182–266, 2012.

Jennifer Tour Chayes, Mariano-Florentino Cuèllar, and Fei-Fei Li. Draft report of the joint california policy working group on ai frontier models. Technical report, Joint California Policy Working Group on AI Frontier Models, 3 2025. URL https://www.cafrontieraigov.org/wp-content/uploads/2025/03/Draft_Report_ of_the_Joint_California_Policy_Working_Group_on_AI_Frontier_Models.pdf. Draft report.

Guiming Hardy Chen, Shunian Chen, Ziche Liu, Feng Jiang, and BenyouWang. Humans or llms as the judge? a study on judgement biases. arXiv preprint arXiv:2402.10669, 2024.

Jingwen Cheng, Kshitish Ghate, Wenyue Hua, William Yang Wang, Hong Shen, and Fei Fang. Realm dataset dashboard, 03 2025. URL https://realm-e7682.web.app/. Accessed: 2025-04-28.

Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. Advances in neural information processing systems, 30, 2017.

Jack Clark and Gillian K Hadfield. Regulatory markets for ai safety. arXiv preprint arXiv:2001.00078, 2019.

Ben Cottier, Tamay Besiroglu, and David Owen. Who is leading in ai? an analysis of industry ai research. arXiv preprint arXiv:2312.00043, 2023.

Oscar Delaney, Oliver Guest, and Zoe Williams. Mapping technical safety research at ai companies: A literature review and incentives analysis. arXiv preprint arXiv:2409.07878, 2024.

Nathaniel Demchak, Xin Guan, Zekun Wu, Ziyi Xu, Adriano Koshiyama, and Emre Kazim. Assessing bias in metric models for llm open-ended generation bias benchmarks. arXiv preprint arXiv:2410.11059, 2024.

Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, et al. Sycophancy to subterfuge: Investigating reward-tampering in large language models. arXiv preprint arXiv:2406.10162, 2024.

Robin Dillon, Peter Madsen, Brian Holland, and Danniel Cao. How ai can help learn lessons from incident reporting systems. In 2024 IEEE Aerospace Conference, pages 1–15. IEEE, 2024.

Benj Edwards. Openai’s new ai image generator is potent and bound to provoke. Ars Technica, 03 2025. URL https://arstechnica.com/ai/2025/03/ openais-new-ai-image-generator-is-potent-and-bound-to-provoke/.

Michael Farber and Lazaros Tampakis. Analyzing the impact of companies on ai research based on publications. arXiv preprint, 10 2023. URL https://arxiv.org/pdf/2310. 20444. Accessed: 2025-01-10.

Stavros Gadinis and Colby Mangels. Collaborative gatekeepers. Wash. & Lee L. Rev., 73:797, 2016.

Deep Ganguli, Nicholas Schiefer, Favarom Marina, and Jack Clark. Challenges in evaluating ai systems, 2023. URL https://www.anthropic.com/news/ evaluating-ai-systems. Accessed: 2025-01-23.

Donald G Gifford. Technological triggers to tort revolutions: steam locomotives, autonomous vehicles, and accident compensation. Journal of tort law, 11(1):71–143, 2018.

Andreas Glaese, Natasha McAleese, Julian Aslanides, Andy Huang, Laura Rimell, Jonathan Uesato, Jack Rae, Long Ouyang, Joe Mellor, Isaac Caswell, et al. Im- proving alignment of dialogue agents via targeted human feedback. arXiv preprint arXiv:2209.14375, 2022. URL https://arxiv.org/abs/2209.14375.

Melody Y Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Heylar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, et al. Deliberative alignment: Reasoning enables safer language models. arXiv preprint arXiv:2412.16339, 2024.

Gillian K Hadfield and Jack Clark. Regulatory markets: The future of ai governance. arXiv preprint arXiv:2304.04914, 2023.

Melissa Heikkilä and Stephen Morris. Deepmind slows down research releases to keep competitive edge in ai race. Financial Times, 04 2025. URL https://www.ft.com/ content/2ee1ffde-008e-4ea4-861b-24f15b25cf54. Accessed: 2025-04-10.

Jeff Horwitz and Georgia Wells. Meta’s ‘digital companions’ will talk sex with users—even children. The Wall Street Journal, 04 2025. URL https://www.wsj.com/ tech/ai/meta-ai-chatbots-sex-a25311bf.

Jane Hsieh, Joselyn Kim, Laura Dabbish, and Haiyi Zhu. "nip it in the bud": Moderation strategies in open source software projects and the role of bots. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW2):1–29, 2023.

Saffron Huang, Esin Durmus, Miles McCain, Kunal Handa, Alex Tamkin, Jerry Hong, Michael Stern, Arushi Somani, Xiuruo Zhang, and Deep Ganguli. Values in the wild: Discovering and analyzing values in real-world language model interactions. arXiv preprint arXiv:2504.15236, 2025.

Chris Hughes. Can we govern ai without breaking it?, 02 2025. URL https: //chrishughes.substack.com/p/can-we-govern-ai-without-breaking. Accessed: 2025-04-28.

Lujain Ibrahim, Saffron Huang, Lama Ahmad, and Markus Anderljung. Beyond static ai evaluations: advancing human interaction evaluations for llm harms and risks. arXiv preprint arXiv:2405.10632, 2024.

Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024.

Ariba Khan, Stephen Casper, and Dylan Hadfield-Menell. Randomness, not representation: The unreliability of evaluating cultural alignment in llms. arXiv preprint arXiv:2503.08688, 2025.

Kevin Klyman, Caroline Meinhardt, Daniel Zhang, Elena Cryst, Russell Wald, and Aaron Bao. Expanding academia’s role in public sector ai. Issue brief, Stanford Institute for Human-Centered Artificial Intelligence, Stanford University, Stanford, CA, December 2024. URL https://hai.stanford.edu/policy/ expanding-academias-role-in-public-sector-ai. Accessed: 2025-04-21.

Kate Knibbs. Every ai copyright lawsuit in the us, visualized. WIRED, 03 2025. URL https://www.wired.com/story/ai-copyright-case-tracker/.

Anna Lenhart and Sarah Myers West. Lessons from the fda for ai, 08 2024. URL https: //ainowinstitute.org/publications/research/lessons-from-the-fda-for-ai.

Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, et al. A safe harbor for ai evaluation and red teaming. arXiv preprint arXiv:2403.04893, 2024.

Enming Luo, Wei Qiao, Katie Warren, Jingxiang Li, Eric Xiao, Krishna Viswanathan, Yuan Wang, Yintao Liu, Jimin Li, and Ariel Fuxman. Zero-shot image moderation in google ads with llm-assisted textual descriptions and cross-modal co-embeddings. In Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, pages 1092–1093, 2025.

Alexios Mantzarlis. Openai says "f**k it, we’re doing impersonation now". Faked Up, 04 2025. URL https://fakedup.org/ openai-says-fk-it-were-doing-impersonation-now/.

Nahema Marchal, Rachel Xu, Rasmi Elasmar, Iason Gabriel, Beth Goldberg, and William Isaac. Generative ai misuse: A taxonomy of tactics and insights from realworld data. arXiv preprint arXiv:2406.13843, 2024.

Michael Martinen, George Black, Ripple Bhullar, and Victor Marranca. Consolidated audit trail: Strategic planning and best practices. Journal of Securities Operations & Custody, 10(1):77–83, 2018.

Sara Merken. Ai ’hallucinations’ in court papers spell trouble for lawyers. Reuters, 02 2025. URL https://www.reuters.com/technology/artificial-intelligence/ ai-hallucinations-court-papers-spell-trouble-lawyers-2025-02-18/. Accessed: 2025-04-28.

Microsoft Corporation. Microsoft digital defense report 2024. Technical report, Microsoft Corporation, 10 2024. URL https://www. microsoft.com/en-us/security/security-insider/intelligence-reports/ microsoft-digital-defense-report-2024. Accessed: 2025-04-28.

Simon Mylius. Mit ai incident tracker, 2024. URL https://airisk.mit.edu/ ai-incident-tracker. Accessed: February 6, 2025.

Simon Mylius and Jamie Bernadi. Scalable ai incident classification, 2024. URL https: //simonmylius.com/blog/incident-classification. Blog post.

Richard Ngo, Lawrence Chan, and Sören Mindermann. The alignment problem from a deep learning perspective. arXiv preprint arXiv:2209.00626, 2022.

Parmy Olson. Supremacy: AI, ChatGPT, and the Race that Will Change the World. St. Martin’s Press, 2024.

OpenAI. How to implement llm guardrails, 2023. URL https://cookbook.openai. com/examples/how_to_use_guardrails. Accessed: 2025-04-29.

OpenAI. Influence and cyber operations: An update, 10 2024. URL https://openai.com/index/ disrupting-deceptive-uses-of-AI-by-covert-influence-operations.

OpenAI. Safety best practices. https://platform.openai.com/docs/guides/ safety-best-practices, 2024. Accessed: 2025-04-29.

OpenAI. Disrupting malicious uses of our models: February 2025 update. Technical report, OpenAI, 02 2025. URL https://cdn.openai.com/threat-intelligence-reports/ disrupting-malicious-uses-of-our-models-february-2025-update.pdf.

Tim O’Reilly. What auto safety teaches us about ai safety. Substack post, 11 2024. URL https://asimovaddendum.substack.com/p/what-auto-safety-teaches-us-about.

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35:27730–27744, 2022.

Ethan Perez, Sam Ringer, Kamil˙e Lukoši¯ut˙e, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251, 2022.

Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, et al. Evaluating frontier models for dangerous capabilities. arXiv preprint arXiv:2403.13793, 2024.

Wei Qiao, Tushar Dogra, Otilia Stretcu, Yu-Han Lyu, Tiantian Fang, Dongjin Kwon, Chun-Ta Lu, Enming Luo, Yuan Wang, Chih-Chun Chia, et al. Scaling up llm reviews for google ads content moderation. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pages 1174–1175, 2024.

Alec Radford, JeffreyWu, Dario Amodei, Daniella Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, Amanda Askell, David Lansky, Danny Hernandez, and David Luan. Better language models and their implications, 2019. URL https://openai.com/index/ better-language-models/. Accessed: 2025-01-23.

Kevin Roose. If a.i. systems become conscious, should they have rights? The New York Times. URL https://www.nytimes.com/2025/04/24/technology/ ai-welfare-anthropic-claude.html. Accessed: 2025-04-28.

ShareGPT. Sharegpt vicuna unfiltered. https://huggingface.co/datasets/ anon8231489123/ShareGPT_Vicuna_unfiltered, 2023. Apache 2.0 License.

Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R Johnston, et al. Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548, 2023.

Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, et al. Release strategies and the social impacts of language models. arXiv preprint, arXiv:1908.09203, 2019. URL https://arxiv.org/abs/1908.09203.

Katy Spicer, Julia Jacbson, Daniel Stephen, Naija Perry, and Aden Hochrun. Artificial intelligence and the rise of product liability tort litigation: Novel action alleges ai chatbot caused minor’s suicide, 2024. URL https://www.privacyworld.blog/2024/11/ artificial-intelligence-and-the-rise-of-product-liability-tort-litigation-novel-action-Accessed: 2025-01-05.

Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. Learning to summarize with human feedback. Advances in neural information processing systems, 33:3008–3021, 2020.

Ilan Strauss and Tim O’Reilly. Ai is entirely new, ai is exactly the same: Thoughts on the new white house ai memorandum, 10 2024. Asimov’s Addendum.

Ilan Strauss and Tim O’Reilly. Risk without uncertainty? openai would like us to think so... Asimov’s Addendum, 2024. URL https://asimovaddendum.substack.com/ p/can-we-have-ai-model-risk-evaluation. AI model evaluations, such as those conducted by OpenAI in its GPT system cards, aim to quantify model risks but often fail to account for uncertainty.

Faiz Surani and Daniel E. Ho. Ai on trial: Legal models hallucinate in 1 out of 6 (or more) benchmarking queries. Stanford HAI, 05 2024. URL https://hai.stanford.edu/news/ ai-trial-legal-models-hallucinate-1-out-6-or-more-benchmarking-queries. Accessed: 2025-04-28.

Alex Tamkin and et al. Clio: Privacy-preserving insights into real-world ai use. arXiv preprint arXiv:2412.13678, 2024.

Helen Toner and Ashwin Acharya. Exploring clusters of research in three areas of ai safety. Center for Security and Emerging Technology, 2022. URL https://cset.georgetown.edu/wp-content/uploads/ Exploring-Clusters-of-Research-in-Three-Areas-of-AI-Safety.pdf.

Sherry Turkle. Who do we become when we talk to machines?, 2024. URL https: //www.youtube.com/watch?v=yYlfGc0YR3Y.

Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, et al. Sociotechnical safety evaluation of generative ai systems. arXiv preprint arXiv:2310.11986, 2023.

Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, et al. Holistic safety and responsibility evaluations of advanced ai models. arXiv preprint arXiv:2404.14068, 2024.

Laura Weidinger, Deb Raji, Hanna Wallach, Margaret Mitchell, Angelina Wang, Olawale Salaudeen, Rishi Bommasani, Sayash Kapoor, Deep Ganguli, Sanmi Koyejo, et al. Toward an evaluation science for generative ai systems. arXiv preprint arXiv:2503.05336, 2025.

Laura Weidinger et al. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359, 2021.

Weights & Biases. Responsible ai: A guide to guardrails and scorers. https://wandb. ai/site/articles/ai-guardrails/, 2025. Accessed: 2025-04-29.

Georgia Wells, Jeff Horwitz, and Deepa Seetharaman. Meta’s ’digital companions’ will talk sex with users—even children. The Wall Street Journal, 04 2025. URL https: //www.wsj.com/tech/ai/meta-ai-chatbots-sex-a25311bf.

Kyle Wiggers. Google folds more ai teams into deepmind to ‘accelerate the researchto- developer pipeline’, 01 2025. URL https://techcrunch.com/2025/01/09/ google-folds-more-ai-teams-into-deepmind-to-accelerate-the-research-to-developer-pipeline/. Accessed: 2025-01-16.

Steve Willison. OWASP Top Ten. https://owasp.org/www-project-top-ten/, 2024. Accessed: 2025-04-21.

Steve Wilson. The Developer’s Playbook for Large Language Model Security. O’Reilly Media, Incorporated, 2024.

Zack Witten. Measuring models’ special interests. https://zswitten.github.io/ 2025/04/14/model-special-interests.html, 2025. Accessed: 2025-04-18.

Erin Woo. Google’s ai unit reorganizes product work, announces changes to gemini app team. The Information, 03 2025. URL https://www.theinformation.com/briefings/ googles-ai-unit-reorganizes-product-work-announces-changes-to-gemini-app-team? rc=7em78a. Accessed: 2025-04-18.

Eliezer Yudkowsky. The AI-box experiment, 2002. URL http://yudkowsky.net/ singularity/aibox. Accessed: 2025-02-03.

Eliezer Yudkowsky. The sequences (lesswrong). https://www.lesswrong.com/tag/ sequences, 2020. Accessed: 2025-02-03.

Maxwell Zeff. Openai’s new reasoning ai models hallucinate more, 04 2025. URL https://techcrunch.com/2025/04/18/ openais-new-reasoning-ai-models-hallucinate-more/. TechCrunch, accessed April 23, 2025.

Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, and Maarten Sap. Biasx:" thinking slow" in toxic content moderation with explanations of implied social biases. arXiv preprint arXiv:2305.13589, 2023.

Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, and Yuntian Deng. Wildchat: 1m chatGPT interaction logs in the wild. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id= Bl8u7ZRlbM.

Zhiying Zhu, Yiming Yang, and Zhiqing Sun. Halueval-wild: Evaluating hallucinations of language models in the wild. arXiv preprint arXiv:2403.04307, 2024.

Georg Zoeller. Comment on ethan mollick’s post about model preferences and claude’s behavior. https://www.linkedin.com, 04 2025. LinkedIn post, April 15, 2025. Accessed via Ethan Mollick’s public post.

Real-World Gaps in AI Governance Research

Authors

DOI:

Keywords:

Abstract

Author Biographies

Ilan Strauss, AI Disclosures Project, Social Science Research Council

Isobel Moure, AI Disclosures Project, Social Science Research Council

Tim O’Reilly, O'Reilly Media; 1AI Disclosures Project, Social Science Research Council

Sruly Rosenblat, AI Disclosures Project, Social Science Research Council

References

Downloads

Published

How to Cite

Issue

Section

License

Similar Articles

Current Issue

Announcements

Dario Amodei, The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful AI

Steve Omohundro: Regulating AGI: From Liability to Provable Contracts

Joe Rogan Experience #2345 - Roman Yampolskiy

Steve Omohundro Receives 2024 Future of Life Award

Steve Omohundro and Scientists Discuss the AI Alignment Problem with Neil deGrasse Tyson

Information