Pitfalls of Evidence-Based AI Policy

Stephen Casper; David Krueger; Dylan Hadfield-Menell

doi:10.70777/si.v2i2.14611

Authors

Stephen Casper MIT CSAIL
David Krueger Mila
Dylan Hadfield-Menell MIT CSAIL

Keywords:

AI governance, AI regulation, safety methods

Abstract

Nations across the world are working to govern AI. However, from a technical perspective, there is uncertainty and disagreement about the best way to do this. Meanwhile, recent debates over AI regulation have led to calls for “evidence-based AI policy” that emphasize holding regulatory action to a high evidentiary standard. Evidence is of irreplaceable value to policymaking. However, holding regulatory action to too high an evidentiary standard can lead to systematic neglect of certain risks. In historical policy debates (e.g., over tobacco ca. 1965 and fossil fuels ca. 1985), “evidence-based policy” rhetoric has also been a well-precedented strategy to downplay the urgency of action, delay regulation, and protect industry interests. Here, we argue that if the goal is evidence-based AI policy, the first regulatory objective must be to actively facilitate the process of identifying, studying, and deliberating about AI risks. We discuss a set of 15 regulatory goals to facilitate this and show that Brazil, Canada, China, the EU, South Korea, the UK, and the USA all have substantial opportunities to adopt further evidence-seeking policies.

Author Biography

Stephen Casper, MIT CSAIL

Hi, I’m Stephen Casper, but most people call me Cas. I work on technical AI governance. I’m a fourth-year PhD student at MIT in Computer Science (EECS) in the Algorithmic Alignment Group, advised by Dylan Hadfield-Menell. I’m also leading a research stream for MATS, and I was a writer for the International AI Safety Report and the Singapore Consensus. I’m supported by the Vitalik Buterin Fellowship from the Future of Life Institute. Previously, I worked with the Harvard Kreiman Lab and the Center for Human-Compatible AI.

Stalk me on Google Scholar, Twitter, and Bluesky. See also my core beliefs about AI risks and my thoughts on reframing AI safety as a never-ending institutional challenge. I also have a personal feedback form. Feel free to use it to send me anonymous, constructive feedback about how I can be better.

Papers 2025

Tegmark, M., Song, D., Xue, L., Ong, L., Russell, S., Maharaj, T., Zhang, Y.-Q., Bengio, Y., Mindermann, S., Casper, S., Lee, W. S., & Wilfred, V. (2025). The Singapore Consensus on Global AI Safety Research Priorities.

Staufer, L., Yang, M., Reuel, A., & Casper, S. (2025). Audit Cards: Contextualizing AI Evaluations. arXiv preprint arXiv:2504.13839.

Casper, S., Bailey, L., & Schreier, T. (2025). Practical Principles for AI Cost and Compute Accounting. arXiv preprint arXiv:2502.15873.

Schwinn, L., Scholten, Y., Wollschläger, T., Xhonneux, S., Casper, S., Günnemann, S., & Gidel, G. (2025). Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives. arXiv preprint arXiv:2502.11910.

Casper, S., Krueger, D., & Hadfield-Menell, D. (2025). Pitfalls of Evidence-Based AI Policy. ICLR 2025 Blog Post.

Khan, A., Casper, S., & Hadfield-Menell, D. (2025). Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency.

Che, Z.,* Casper, S.,* Kirk, R., Satheesh, A., Slocum, S., McKinney, L. E., … & Hadfield-Menell, D. (2025). Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities. arXiv preprint arXiv:2502.05209.

Casper, S., Bailey, L., Hunter, R., Ezell, C., Cabalé, E., Gerovitch, M., … & Kolt, N. (2025). The AI Agent Index. arXiv preprint arXiv:2502.01635.

Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., … & Zeng, Y. (2025). International AI Safety Report. arXiv preprint arXiv:2501.17805.

Sharkey, L., Chughtai, B., Batson, J., Lindsey, J., Wu, J., Bushnaq, L., … Casper, S., … & McGrath, T. (2025). Open Problems in Mechanistic Interpretability. arXiv preprint arXiv:2501.16496.

Barez, F., Fu, T., Prabhu, A., Casper, S., Sanyal, A., Bibi, A., … & Gal, Y. (2025). Open Problems in Machine Unlearning for AI Safety. arXiv preprint arXiv:2501.04952.

