The First International AI Safety Report: The International Scientific Report on the Safety of Advanced AI

Yoshua Bengio

doi:10.70777/si.v2i2.14755

Authors

Yoshua Bengio, Université de Montréal; Mila (ORCID: https://orcid.org/0000-0002-9322-3515)


Keywords:

ai safety, ai value alignment, agi safety, artificial general intelligence, superintelligence, ai governance, ai risks, ai risk mitigation

Abstract

This is the first International AI Safety Report. Following an interim publication in May 2024, a diverse group of 96 Artificial Intelligence (AI) experts contributed to this first full report, including an international Expert Advisory Panel nominated by 30 countries, the Organisation for Economic Co-operation and Development (OECD), the European Union (EU), and the United Nations (UN). The report aims to provide scientific information that will support informed policymaking. It does not recommend specific policies….

This report summarises the scientific evidence on the safety of general-purpose AI. The purpose of this report is to help create a shared international understanding of risks from advanced AI and how they can be mitigated. To achieve this, this report focuses on general-purpose AI – or AI that can perform a wide variety of tasks – since this type of AI has advanced particularly rapidly in recent years and has been deployed widely by technology companies for a range of consumer and business purposes. The report synthesises the state of scientific understanding of general-purpose AI, with a focus on understanding and managing its risks.

Amid rapid advancements, research on general-purpose AI is currently in a time of scientific discovery, and – in many cases – is not yet settled science. The report provides a snapshot of the current scientific understanding of general-purpose AI and its risks. This includes identifying areas of scientific consensus and areas where there are different views or gaps in the current scientific understanding.

People around the world will only be able to fully enjoy the potential benefits of general-purpose AI safely if its risks are appropriately managed. This report focuses on identifying those risks and evaluating technical methods for assessing and mitigating them, including ways that general-purpose AI itself can be used to mitigate risks.

Y. Bengio, S. Mindermann, D. Privitera, T. Besiroglu, R. Bommasani, S. Casper, Y. Choi, P. Fox, B. Garfinkel, D. Goldfarb, H. Heidari, A. Ho, S. Kapoor, L. Khalatbari, S. Longpre, S. Manning, V. Mavroudis, M. Mazeika, J. Michael, J. Newman, K. Y. Ng, C. T. Okolo, D. Raji, G. Sastry, E. Seger, T. Skeadas, T. South, E. Strubell, F. Tramèr, L. Velasco, N. Wheeler, D. Acemoglu, O. Adekanmbi, D. Dalrymple, T. G. Dietterich, P. Fung, P.-O. Gourinchas, F. Heintz, G. Hinton, N. Jennings, A. Krause, S. Leavy, P. Liang, T. Ludermir, V. Marda, H. Margetts, J. McDermid, J. Munga, A. Narayanan, A. Nelson, C. Neppel, A. Oh, G. Ramchurn, S. Russell, M. Schaake, B. Schölkopf, D. Song, A. Soto, L. Tiedrich, G. Varoquaux, E. W. Felten, A. Yao, Y.-Q. Zhang, O. Ajala, F. Albalawi, M. Alserkal, G. Avrin, C. Busch, A. C. P. de L. F. de Carvalho, B. Fox, A. S. Gill, A. H. Hatip, J. Heikkilä, C. Johnson, G. Jolly, Z. Katzir, S. M. Khan, H. Kitano, A. Krüger, K. M. Lee, D. V. Ligot, J. R. López Portillo, D., O. Molchanovskyi, A. Monti, N. Mwamanzi, M. Nemer, N. Oliver, R. Pezoa Rivera, B. Ravindran, H. Riza, C. Rugege, C. Seoighe, H. Sheikh, J. Sheehan, D. Wong, Y. Zeng, “International AI Safety Report” (DSIT 2025/001, 2025); https://www.gov.uk/government/publications/international-ai-safety-report-2025

Author Biography

Yoshua Bengio, Université de Montréal; Mila

Recognized worldwide as one of the leading experts in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of Computing,” with Geoffrey Hinton and Yann LeCun. He is the computer scientist with the largest number of citations and the highest h-index.

He is Full Professor at Université de Montréal, and Founder and Scientific Advisor of Mila – Quebec AI Institute. He co-directs the CIFAR Learning in Machines & Brains program and acts as Special Advisor and Founding Scientific Director of IVADO.

He has received numerous awards, including the prestigious Killam Prize and Herzberg Gold Medal in Canada, CIFAR’s AI Chair, Spain’s Princess of Asturias Award, and the VinFuture Prize. He is a Fellow of both the Royal Society of London and the Royal Society of Canada, Knight of the Legion of Honor of France, Officer of the Order of Canada, and Member of the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology. In 2024, Yoshua Bengio was named one of TIME magazine’s 100 most influential people in the world.

Concerned about the social impact of AI, he actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence and currently chairs the International Scientific Report on the Safety of Advanced AI.


S. Yao, H. Chen, J. Yang, K. Narasimhan, WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents, arXiv [cs.CL] (2022); http://arxiv.org/abs/2207.01206.

A. M. Bran, S. Cox, O. Schilter, C. Baldassari, A. White, P. Schwaller, “Augmenting Large Language Models with Chemistry Tools” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) AI for Science Workshop (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=wdGIL6lx3l.

* A. M. Bran, S. Cox, O. Schilter, C. Baldassari, A. D. White, P. Schwaller, ChemCrow: Augmenting Large-Language Models with Chemistry Tools, arXiv [physics.chem-ph] (2023); http://arxiv.org/abs/2304.05376.

C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, K. R. Narasimhan, “SWE-Bench: Can Language Models Resolve Real-World Github Issues?” in 12th International Conference on Learning Representations (2023); https://openreview.net/pdf?id=VTF8yNQM66.

* L. Jing, Z. Huang, X. Wang, W. Yao, W. Yu, K. Ma, H. Zhang, X. Du, D. Yu, DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?, arXiv [cs.AI] (2024); http://arxiv.org/abs/2409.07703.

Z. Chen, S. Chen, Y. Ning, Q. Zhang, B. Wang, B. Yu, Y. Li, Z. Liao, C. Wei, Z. Lu, V. Dey, M. Xue, F. N. Baker, B. Burns, D. Adu-Ampratwum, X. Huang, X. Ning, … H. Sun, ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery, arXiv [cs.CL] (2024); http://arxiv.org/abs/2410.05080.

* J. S. Chan, N. Chowdhury, O. Jaffe, J. Aung, D. Sherburn, E. Mays, G. Starace, K. Liu, L. Maksin, T. Patwardhan, L. Weng, A. Mądry, MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering, arXiv [cs.CL] (2024); http://arxiv.org/abs/2410.07095.

Q. Huang, J. Vora, P. Liang, J. Leskovec, “MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation” in Forty-First International Conference on Machine Learning (2024); https://openreview.net/pdf?id=1Fs1LvjYQW.

R. Fang, R. Bindu, A. Gupta, Q. Zhan, D. Kang, LLM Agents Can Autonomously Hack Websites, arXiv [cs.CR] (2024); http://arxiv.org/abs/2402.06664.

X. Liang, L. Ma, S. Guo, J. Han, H. Xu, S. Ma, X. Liang, CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation, arXiv [cs.CV] (2023); http://arxiv.org/abs/2306.10322.

METR, Details about METR’s Preliminary Evaluation of OpenAI o1-Preview (2024); https://metr.github.io/autonomy-evals-guide/openai-o1-preview-report/.

* J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. Narasimhan, O. Press, SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering, arXiv [cs.SE] (2024); http://arxiv.org/abs/2405.15793.

* C. S. Xia, Y. Deng, S. Dunn, L. Zhang, Agentless: Demystifying LLM-Based Software Engineering Agents, arXiv [cs.SE] (2024); http://arxiv.org/abs/2407.01489.

* X. Wang, B. Li, Y. Song, F. F. Xu, X. Tang, M. Zhuge, J. Pan, Y. Song, B. Li, J. Singh, H. H. Tran, F. Li, R. Ma, M. Zheng, B. Qian, Y. Shao, N. Muennighoff, … G. Neubig, OpenHands: An Open Platform for AI Software Developers as Generalist Agents, arXiv [cs.SE] (2024); http://arxiv.org/abs/2407.16741.

* C.-L. Cheang, G. Chen, Y. Jing, T. Kong, H. Li, Y. Li, Y. Liu, H. Wu, J. Xu, Y. Yang, H. Zhang, M. Zhu, GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation, arXiv [cs.RO] (2024); http://arxiv.org/abs/2410.06158.

B. Wang, J. Zhang, S. Dong, I. Fang, C. Feng, VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model, arXiv [cs.RO] (2024); http://arxiv.org/abs/2410.08792.

* S. Ye, J. Jang, B. Jeon, S. Joo, J. Yang, B. Peng, A. Mandlekar, R. Tan, Y.-W. Chao, B. Y. Lin, L. Liden, K. Lee, J. Gao, L. Zettlemoyer, D. Fox, M. Seo, Latent Action Pretraining from Videos, arXiv [cs.RO] (2024); http://arxiv.org/abs/2410.11758.

M. Herrmann, F. J. D. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix, B. Bischl, “Position: Why We Must Rethink Empirical Research in Machine Learning” in Proceedings of the 41st International Conference on Machine Learning, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, F. Berkenkamp, Eds. (PMLR, 2024) vol. 235, pp. 18228–18247; https://proceedings.mlr.press/v235/herrmann24b.html.

D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, J. Steinhardt, “Measuring Mathematical Problem Solving With the MATH Dataset” in 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Datasets and Benchmarks Track (Round 2) (Virtual, 2021); https://openreview.net/forum?id=7Bywt2mQsCe.

J. Au Yeung, Z. Kraljevic, A. Luintel, A. Balston, E. Idowu, R. J. Dobson, J. T. Teo, AI Chatbots Not yet Ready for Clinical Use. Frontiers in Digital Health 5, 1161098 (2023); https://doi.org/10.3389/fdgth.2023.1161098.

D. Kiela, M. Bartolo, Y. Nie, D. Kaushik, A. Geiger, Z. Wu, B. Vidgen, G. Prasad, A. Singh, P. Ringshia, Z. Ma, T. Thrush, S. Riedel, Z. Waseem, P. Stenetorp, R. Jia, M. Bansal, … A. Williams, “Dynabench: Rethinking Benchmarking in NLP” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Association for Computational Linguistics, 2021), pp. 4110–4124; https://doi.org/10.18653/v1/2021.naacl-main.324.

D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, J. Steinhardt, “Measuring Massive Multitask Language Understanding” in The 9th International Conference on Learning Representations (ICLR 2021) (Virtual, 2021); https://openreview.net/forum?id=d7KBjmI3GmQ.

D. Rein, B. L. Hou, A. C. Stickland, J. Petty, R. Y. Pang, J. Dirani, J. Michael, S. R. Bowman, GPQA: A Graduate-Level Google-Proof Q&A Benchmark, arXiv [cs.AI] (2023); http://arxiv.org/abs/2311.12022.

A. Srivastava, A. Rastogi, A. Rao, A. A. M. Shoeb, A. Abid, A. Fisch, A. R. Brown, A. Santoro, A. Gupta, A. Garriga-Alonso, A. Kluska, A. Lewkowycz, A. Agarwal, A. Power, A. Ray, A. Warstadt, A. W. Kocurek, … Z. Wu, Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models. Transactions on Machine Learning Research (2023); https://openreview.net/forum?id=uyTL5Bvosj.

* L. Kilpatrick, S. B. Mallick, Updated Production-Ready Gemini Models, Reduced 1.5 Pro Pricing, Increased Rate Limits, and More, GEMINI (2024); https://developers.googleblog.com/en/updated-gemini-models-reduced-15-pro-pricing-increased-rate-limits-and-more/.

M. Hobbhahn, L. Heim, G. Aydos, “Trends in Machine Learning Hardware” (Epoch AI, 2023); https://epochai.org/blog/trends-in-machine-learning-hardware.

* H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani, S. Brahma, A. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen, A. Chowdhery, A. Castro-Ros, … J. Wei, Scaling Instruction-Finetuned Language Models, arXiv [cs.LG] (2022); http://arxiv.org/abs/2210.11416.

* OpenAI, GPT-4o Mini: Advancing Cost-Efficient Intelligence (2024); https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/.

* OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, R. Avila, I. Babuschkin, S. Balaji, V. Balcom, P. Baltescu, H. Bao, … B. Zoph, “GPT-4 Technical Report” (OpenAI, 2024); http://arxiv.org/abs/2303.08774.

* OpenAI, Pricing (2024); https://openai.com/chatgpt/pricing/.

* Together Pricing, together.ai (2023); https://www.together.ai/pricing.

B. Y. Lin, Y. Deng, K. Chandu, F. Brahman, A. Ravichander, V. Pyatkin, N. Dziri, R. L. Bras, Y. Choi, WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.04770.

* J. Wang, J. Wang, B. Athiwaratkun, C. Zhang, J. Zou, Mixture-of-Agents Enhances Large Language Model Capabilities, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.04692.

J. Sevilla, “Training Compute of Frontier AI Models Grows by 4-5x per Year” (2024); https://epoch.ai/blog/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year.

M. Mitchell, A. B. Palmarini, A. K. Moskvichev, “Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks” in AAAI 2024 Workshop ‘Are Large Language Models Simply Causal Parrots?’ (Vancouver, BC, Canada, 2024); https://openreview.net/forum?id=3rGT5OkzpC.

L. Berglund, M. Tong, M. Kaufmann, M. Balesni, A. C. Stickland, T. Korbak, O. Evans, “The Reversal Curse: LLMs Trained on ‘A Is B’ Fail to Learn ‘B Is A’” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2024); https://openreview.net/forum?id=GPKTIktA0k.

J. Geiping, A. Stein, M. Shu, K. Saifullah, Y. Wen, T. Goldstein, “Coercing LLMs to Do and Reveal (almost) Anything” in ICLR 2024 Workshop on Secure and Trustworthy Large Language Models (SET LLM) (Vienna, Austria, 2024); https://openreview.net/forum?id=Y5inHAjMu0.

* J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, D. Amodei, Scaling Laws for Neural Language Models, arXiv [cs.LG] (2020); http://arxiv.org/abs/2001.08361.

* J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, … L. Sifre, Training Compute-Optimal Large Language Models, arXiv [cs.CL] (2022); http://arxiv.org/abs/2203.15556.

* T. Henighan, J. Kaplan, M. Katz, M. Chen, C. Hesse, J. Jackson, H. Jun, T. B. Brown, P. Dhariwal, S. Gray, C. Hallacy, B. Mann, A. Radford, A. Ramesh, N. Ryder, D. M. Ziegler, J. Schulman, … S. McCandlish, Scaling Laws for Autoregressive Generative Modeling, arXiv [cs.LG] (2020); http://arxiv.org/abs/2010.14701.

X. Zhai, A. Kolesnikov, N. Houlsby, L. Beyer, “Scaling Vision Transformers” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 1204–1213; https://doi.org/10.1109/CVPR52688.2022.01179.

* A. L. Jones, Scaling Scaling Laws with Board Games, arXiv [cs.LG] (2021); http://arxiv.org/abs/2104.03113.

* Y. Bahri, E. Dyer, J. Kaplan, J. Lee, U. Sharma, Explaining Neural Scaling Laws, arXiv [cs.LG] (2021); http://arxiv.org/abs/2102.06701.

* A. Maloney, D. A. Roberts, J. Sully, A Solvable Model of Neural Scaling Laws, arXiv [cs.LG] (2022); http://arxiv.org/abs/2210.16859.

U. Sharma, J. Kaplan, Scaling Laws from the Data Manifold Dimension. Journal of Machine Learning Research: JMLR 23, 343–376 (2022); https://dl.acm.org/doi/abs/10.5555/3586589.3586598.

Ł. Dębowski, A Simplistic Model of Neural Scaling Laws: Multiperiodic Santa Fe Processes, arXiv [cs.IT] (2023); http://arxiv.org/abs/2302.09049.

E. J. Michaud, Z. Liu, U. Girit, M. Tegmark, “The Quantization Model of Neural Scaling” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=3tbTw2ga8K.

* T. Besiroglu, E. Erdil, M. Barnett, J. You, Chinchilla Scaling: A Replication Attempt, arXiv [cs.AI] (2024); http://arxiv.org/abs/2404.10102.

T. Porian, M. Wortsman, J. Jitsev, L. Schmidt, Y. Carmon, “Resolving Discrepancies in Compute-Optimal Scaling of Language Models” in 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024) (2024); https://openreview.net/forum?id=zhCBrgaQZ0.

* T. Pearce, J. Song, Reconciling Kaplan and Chinchilla Scaling Laws, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.12907.

E. Caballero, K. Gupta, I. Rish, D. Krueger, “Broken Neural Scaling Laws” in NeurIPS ML Safety Workshop (2022); https://openreview.net/forum?id=BfGrlFuNyhJ.

* S. Hooker, On the Limitations of Compute Thresholds as a Governance Strategy, arXiv [cs.AI] (2024); http://arxiv.org/abs/2407.05694.

S. Biderman, U. S. Prashanth, L. Sutawika, H. Schoelkopf, Q. G. Anthony, S. Purohit, E. Raff, “Emergent and Predictable Memorization in Large Language Models” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=Iq0DvhB4Kf.

D. Ganguli, D. Hernandez, L. Lovitt, A. Askell, Y. Bai, A. Chen, T. Conerly, N. Dassarma, D. Drain, N. Elhage, S. El Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, S. Johnston, A. Jones, N. Joseph, … J. Clark, “Predictability and Surprise in Large Generative Models” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 1747–1764; https://doi.org/10.1145/3531146.3533229.

* Z. Du, A. Zeng, Y. Dong, J. Tang, Understanding Emergent Abilities of Language Models from the Loss Perspective, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.15796.

J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, W. Fedus, Emergent Abilities of Large Language Models. Transactions on Machine Learning Research (2022); https://openreview.net/forum?id=yzkSU5zdwD.

S. Y. Gadre, G. Smyrnis, V. Shankar, S. Gururangan, M. Wortsman, R. Shao, J. Mercat, A. Fang, J. Li, S. Keh, R. Xin, M. Nezhurina, I. Vasiljevic, J. Jitsev, L. Soldaini, A. G. Dimakis, G. Ilharco, … L. Schmidt, Language Models Scale Reliably with over-Training and on Downstream Tasks, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.08540.

R. Schaeffer, B. Miranda, S. Koyejo, “Are Emergent Abilities of Large Language Models a Mirage?” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=ITw9edRDlD.

Y. Ruan, C. J. Maddison, T. Hashimoto, “Observational Scaling Laws and the Predictability of Language Model Performance” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/pdf?id=On5WIN7xyD.

T. R. McIntosh, T. Susnjak, T. Liu, P. Watters, M. N. Halgamuge, Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence, arXiv [cs.AI] (2024); http://arxiv.org/abs/2402.09880.

* V. Balachandran, J. Chen, N. Joshi, B. Nushi, H. Palangi, E. Salinas, V. Vineet, J. Woffinden-Luey, S. Yousefi, “EUREKA: Evaluating and Understanding Large Foundation Models” (Microsoft, 2024); https://www.microsoft.com/en-us/research/publication/eureka-evaluating-and-understanding-large-foundation-models/.

* S. Srivastava, M. B. Annarose, P. V. Anto, S. Menon, A. Sukumar, S. T. Adwaith, A. Philipose, S. Prince, S. Thomas, Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap, arXiv [cs.AI] (2024); http://arxiv.org/abs/2402.19450.

C. Deng, Y. Zhao, X. Tang, M. Gerstein, A. Cohan, Investigating Data Contamination in Modern Benchmarks for Large Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2311.09783.

O. Sainz, J. Campos, I. García-Ferrero, J. Etxaniz, O. L. de Lacalle, E. Agirre, “NLP Evaluation in Trouble: On the Need to Measure LLM Data Contamination for Each Benchmark” in Findings of the Association for Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 10776–10787; https://doi.org/10.18653/v1/2023.findings-emnlp.722.

Y. Cao, L. Zhou, S. Lee, L. Cabello, M. Chen, D. Hershcovich, “Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study” in Proceedings of the 1st Workshop on Cross-Cultural Considerations in NLP (C3NLP), S. Dev, V. Prabhakaran, D. Adelani, D. Hovy, L. Benotti, Eds. (Association for Computational Linguistics, Dubrovnik, Croatia, 2023), pp. 53–67; https://doi.org/10.18653/v1/2023.c3nlp-1.7.

* H. Zhou, A. Bradley, E. Littwin, N. Razin, O. Saremi, J. Susskind, S. Bengio, P. Nakkiran, What Algorithms Can Transformers Learn? A Study in Length Generalization, arXiv [cs.LG] (2023); http://arxiv.org/abs/2310.16028.

D. Yu, S. Kaur, A. Gupta, J. Brown-Cohen, A. Goyal, S. Arora, “SKILL-MIX: A Flexible and Expandable Family of Evaluations for AI Models” in 12th International Conference on Learning Representations (2024); https://openreview.net/pdf?id=Jf5gplvglq.

* H. Zhang, J. Da, D. Lee, V. Robinson, C. Wu, W. Song, T. Zhao, P. Raja, D. Slack, Q. Lyu, S. Hendryx, R. Kaplan, M. Lunati, S. Yue, A Careful Examination of Large Language Model Performance on Grade School Arithmetic, arXiv [cs.CL] (2024); http://arxiv.org/abs/2405.00332.

* AlphaProof, AlphaGeometry teams, AI Achieves Silver-Medal Standard Solving International Mathematical Olympiad Problems, Google DeepMind (2024); https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/.

T. H. Trinh, Y. Wu, Q. V. Le, H. He, T. Luong, Solving Olympiad Geometry without Human Demonstrations. Nature 625, 476–482 (2024); https://doi.org/10.1038/s41586-023-06747-5.

E. Akyürek, M. Damani, L. Qiu, H. Guo, Y. Kim, J. Andreas, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, arXiv [cs.AI] (2024); http://arxiv.org/abs/2411.07279.

Y. Bengio, G. Hinton, A. Yao, D. Song, P. Abbeel, T. Darrell, Y. N. Harari, Y.-Q. Zhang, L. Xue, S. Shalev-Shwartz, G. Hadfield, J. Clune, T. Maharaj, F. Hutter, A. G. Baydin, S. McIlraith, Q. Gao, … S. Mindermann, Managing Extreme AI Risks amid Rapid Progress. Science, eadn0117 (2024); https://doi.org/10.1126/science.adn0117.

Y. LeCun, The Power and Limits of Deep Learning: In His IRI Medal Address, Yann LeCun Maps the Development of Machine Learning Techniques and Suggests What the Future May Hold. Research Technology Management 61, 22–27 (2018); https://doi.org/10.1080/08956308.2018.1516928.

M. Mitchell, “Why AI Is Harder than We Think” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’21) (Association for Computing Machinery, New York, NY, USA, 2021), p. 3; https://doi.org/10.1145/3449639.3465421.

J. Pearl, D. Mackenzie, The Book of Why: The New Science of Cause and Effect (Penguin Books, Harlow, England, 2019); https://dl.acm.org/doi/10.5555/3238230.

D. C. Cireşan, U. Meier, L. M. Gambardella, J. Schmidhuber, Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation 22, 3207–3220 (2010); https://doi.org/10.1162/NECO_a_00052.

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur, “Recurrent Neural Network Based Language Model” in Proc. Interspeech 2010 (ISCA, 2010), pp. 1045–1048; https://doi.org/10.21437/Interspeech.2010-343.

X. Glorot, Y. Bengio, “Understanding the Difficulty of Training Deep Feedforward Neural Networks” in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010), Yee Whye Teh, Mike Titterington, Eds. (PMLR, 2010) vol. 9, pp. 249–256; https://proceedings.mlr.press/v9/glorot10a.html.

Epoch AI, Data on Notable AI Models (2024); https://epochai.org/data/notable-ai-models.

* Inflection AI, Inflection-2 (2023); https://inflection.ai/inflection-2.

C.-J. Wu, R. Raghavendra, U. Gupta, B. Acun, N. Ardalani, K. Maeng, G. Chang, F. Aga, J. Huang, C. Bai, M. Gschwind, A. Gupta, M. Ott, A. Melnikov, S. Candido, D. Brooks, G. Chauhan, … K. Hazelwood, “Sustainable AI: Environmental Implications, Challenges and Opportunities” in Proceedings of the 5th Conference on Machine Learning and Systems (MLSys), D. Marculescu, Y. Chi, C. Wu, Eds. (2022) vol. 4, pp. 795–813; https://proceedings.mlsys.org/paper_files/paper/2022/file/462211f67c7d858f663355eff93b745e-Paper.pdf.

* Y. Wu, Z. Sun, S. Li, S. Welleck, Y. Yang, Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models, arXiv [cs.AI] (2024); http://arxiv.org/abs/2408.00724.

S. Hao, Y. Gu, H. Ma, J. J. Hong, Z. Wang, D. Z. Wang, Z. Hu, “Reasoning with Language Model Is Planning with World Model” in The 2023 Conference on Empirical Methods in Natural Language Processing (2023); https://openreview.net/pdf?id=VTWWvYtF1R.

* X. Feng, Z. Wan, M. Wen, Y. Wen, W. Zhang, J. Wang, “Alphazero-like Tree-Search Can Guide Large Language Model Decoding and Training” in NeurIPS 2023 Foundation Models for Decision Making Workshop (New Orleans, LA, US, 2023); https://openreview.net/pdf?id=PJfc4x2jXY.

* C. Li, W. Wang, J. Hu, Y. Wei, N. Zheng, H. Hu, Z. Zhang, H. Peng, Common 7B Language Models Already Possess Strong Math Capabilities, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.04706.

E. Erdil, Optimally Allocating Compute Between Inference and Training (2024); https://epochai.org/blog/optimally-allocating-compute-between-inference-and-training.

K. Chow, Y. Tang, Z. Lyu, A. Rajput, K. Ban, “Performance Optimization in the LLM World 2024” in Companion of the 15th ACM/SPEC International Conference on Performance Engineering (ACM, New York, NY, USA, 2024); https://doi.org/10.1145/3629527.3651436.

D. Patterson, J. Gonzalez, U. Hölzle, Q. Le, C. Liang, L.-M. Munguia, D. Rothchild, D. R. So, M. Texier, J. Dean, The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink. Computer 55, 18–28 (2022); https://doi.org/10.1109/MC.2022.3148714.

D. Coyle, L. Hampton, 21st Century Progress in Computing. Telecommunications Policy 48, 102649 (2024); https://doi.org/10.1016/j.telpol.2023.102649.

International Energy Agency, “Electricity 2024: Analysis and Forecast to 2026” (IEA, 2024); https://iea.blob.core.windows.net/assets/6b2fd954-2017-408e-bf08-952fdd62118a/Electricity2024-Analysisandforecastto2026.pdf.

Talen Energy, Talen Energy Announces Sale of Zero-Carbon Data Center Campus (2024); https://ir.talenenergy.com/news-releases/news-release-details/talen-energy-announces-sale-zero-carbon-data-center-campus.

Advanced Electronics Practice, H. Bauer, O. Burkacky, P. Kenevan, S. Lingemann, K. Pototzky, B. Wiseman, “Semiconductor Design and Manufacturing: Achieving Leading-Edge Capabilities” (McKinsey & Company, 2020); https://www.mckinsey.com/industries/industrials-and-electronics/our-insights/semiconductor-design-and-manufacturing-achieving-leading-edge-capabilities#/.

J. VerWey, “No Permits, No Fabs: The Importance of Regulatory Reform for Semiconductor Manufacturing” (Center for Security and Emerging Technology, 2021); https://doi.org/10.51593/20210053.

D. Bragg, N. Caselli, J. A. Hochgesang, M. Huenerfauth, L. Katz-Hernandez, O. Koller, R. Kushalnagar, C. Vogler, R. E. Ladner, The FATE Landscape of Sign Language AI Datasets: An Interdisciplinary Perspective. ACM Transactions on Accessible Computing 14, 1–45 (2021); https://doi.org/10.1145/3436996.

G. Li, Z. Sun, Q. Wang, S. Wang, K. Huang, N. Zhao, Y. Di, X. Zhao, Z. Zhu, China’s Green Data Center Development: Policies and Carbon Reduction Technology Path. Environmental Research 231, 116248 (2023); https://doi.org/10.1016/j.envres.2023.116248.

E. Griffith, The Desperate Hunt for the A.I. Boom’s Most Indispensable Prize, The New York Times (2023); https://www.nytimes.com/2023/08/16/technology/ai-gpu-chips-shortage.html.

J. Sevilla, T. Besiroglu, B. Cottier, J. You, E. Roldán, P. Villalobos, E. Erdil, Can AI Scaling Continue Through 2030? (2024); https://epochai.org/blog/can-ai-scaling-continue-through-2030.

E. Erdil, “Data Movement Bottlenecks to Large-Scale Model Training: Scaling Past 1e28 FLOP” (Epoch AI, 2024); https://epoch.ai/blog/data-movement-bottlenecks-scaling-past-1e28-flop.

* E. Almazrouei, H. Alobeidli, A. Alshamsi, A. Cappelli, R. Cojocaru, M. Debbah, É. Goffinet, D. Hesslow, J. Launay, Q. Malartic, D. Mazzotta, B. Noune, B. Pannier, G. Penedo, The Falcon Series of Open Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2311.16867.

* T. Wei, L. Zhao, L. Zhang, B. Zhu, L. Wang, H. Yang, B. Li, C. Cheng, W. Lü, R. Hu, C. Li, L. Yang, X. Luo, X. Wu, L. Liu, W. Cheng, P. Cheng, … Y. Zhou, Skywork: A More Open Bilingual Foundation Model, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.19341.

P. Villalobos, J. Sevilla, L. Heim, T. Besiroglu, M. Hobbhahn, A. Ho, Will We Run out of Data? Limits of LLM Scaling Based on Human-Generated Data, arXiv [cs.LG] (2022); http://arxiv.org/abs/2211.04325.

N. Muennighoff, A. Rush, B. Barak, T. Le Scao, N. Tazi, A. Piktus, S. Pyysalo, T. Wolf, C. A. Raffel, “Scaling Data-Constrained Language Models” in Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track (New Orleans, LA, US, 2023) vol. 36, pp. 50358–50376; https://proceedings.neurips.cc/paper_files/paper/2023/hash/9d89448b63ce1e2e8dc7af72c984c196-Abstract-Conference.html.

* A. Sohn, A. Nagabandi, C. Florensa, D. Adelberg, D. Wu, H. Farooq, I. Clavera, J. Welborn, J. Chen, N. Mishra, P. Chen, P. Qian, P. Abbeel, R. Duan, V. Vijay, Y. Liu, Introducing RFM-1: Giving Robots Human-like Reasoning Capabilities, covariant (2024); https://covariant.ai/insights/introducing-rfm-1-giving-robots-human-like-reasoning-capabilities/.

H. Abdine, M. Chatzianastasis, C. Bouyioukos, M. Vazirgiannis, “Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Deep Generative Models for Health Workshop (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=EJ7YNgWYFj.

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, “Learning Transferable Visual Models From Natural Language Supervision” in Proceedings of the 38th International Conference on Machine Learning (ICML 2021) (PMLR, 2021), pp. 8748–8763; https://proceedings.mlr.press/v139/radford21a.html.

* Seamless Communication, L. Barrault, Y.-A. Chung, M. C. Meglioli, D. Dale, N. Dong, P.-A. Duquenne, H. Elsahar, H. Gong, K. Heffernan, J. Hoffman, C. Klaiber, P. Li, D. Licht, J. Maillard, A. Rakotoarison, K. R. Sadagopan, … S. Wang, “SeamlessM4T: Massively Multilingual & Multimodal Machine Translation” (Meta AI, 2023); http://arxiv.org/abs/2308.11596.

P. Villalobos, A. Ho, J. Sevilla, T. Besiroglu, L. Heim, M. Hobbhahn, “Position: Will We Run out of Data? Limits of LLM Scaling Based on Human-Generated Data” in Proceedings of the 41st International Conference on Machine Learning, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, F. Berkenkamp, Eds. (PMLR, 2024) vol. 235 of Proceedings of Machine Learning Research, pp. 49523–49544; https://proceedings.mlr.press/v235/villalobos24a.html.

* L. Fan, K. Chen, D. Krishnan, D. Katabi, P. Isola, Y. Tian, Scaling Laws of Synthetic Images for Model Training ... for Now, arXiv [cs.CV] (2023); http://arxiv.org/abs/2312.04567.

S. Fu, N. Y. Tamir, S. Sundaram, L. Chai, R. Zhang, T. Dekel, P. Isola, “DreamSim: Learning New Dimensions of Human Visual Similarity Using Synthetic Data” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=DEiNSfh1k7.

Y. Tian, L. Fan, P. Isola, H. Chang, D. Krishnan, “StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=xpjsOQtKqx.

I. Shumailov, Z. Shumaylov, Y. Zhao, Y. Gal, N. Papernot, R. Anderson, The Curse of Recursion: Training on Generated Data Makes Models Forget, arXiv [cs.LG] (2023); http://arxiv.org/abs/2305.17493.

G. Martínez, L. Watson, P. Reviriego, J. A. Hernández, M. Juarez, R. Sarkar, Combining Generative Artificial Intelligence (AI) and the Internet: Heading towards Evolution or Degradation?, arXiv [cs.CV] (2023); http://arxiv.org/abs/2303.01255.

R. Hataya, H. Bao, H. Arai, “Will Large-Scale Generative Models Corrupt Future Datasets?” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (IEEE, 2023), pp. 20498–20508; https://doi.org/10.1109/iccv51070.2023.01879.

G. Martínez, L. Watson, P. Reviriego, J. A. Hernández, M. Juarez, R. Sarkar, “Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet” in Lecture Notes in Computer Science (Springer Nature Switzerland, Cham, 2024) vol. 14523, pp. 59–73; https://doi.org/10.1007/978-3-031-57963-9_5.

Y. Guo, G. Shang, M. Vazirgiannis, C. Clavel, The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text, arXiv [cs.CL] (2023); http://arxiv.org/abs/2311.09807.

* M. Bohacek, H. Farid, Nepotistically Trained Generative-AI Models Collapse, arXiv [cs.AI] (2023); http://arxiv.org/abs/2311.12202.

S. Alemohammad, J. Casco-Rodriguez, L. Luzi, A. I. Humayun, H. Babaei, D. LeJeune, A. Siahkoohi, R. Baraniuk, “Self-Consuming Generative Models Go MAD” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2024); https://openreview.net/forum?id=ShjMHfmPs0.

Q. Bertrand, J. Bose, A. Duplessis, M. Jiralerspong, G. Gidel, “On the Stability of Iterative Retraining of Generative Models on Their Own Data” in 12th International Conference on Learning Representations (2024); https://openreview.net/forum?id=JORAfH2xFd.

* E. Dohmatob, Y. Feng, P. Yang, F. Charton, J. Kempe, A Tale of Tails: Model Collapse as a Change of Scaling Laws, arXiv [cs.LG] (2024); http://arxiv.org/abs/2402.07043.

R. He, S. Sun, X. Yu, C. Xue, W. Zhang, P. Torr, S. Bai, X. Qi, “Is Synthetic Data from Generative Models Ready for Image Recognition?” in 11th International Conference on Learning Representations (ICLR 2023) (Kigali, Rwanda, 2023); https://openreview.net/pdf?id=nUmCcZ5RKF.

* V. Boutin, L. Singhal, X. Thomas, T. Serre, “Diversity vs. Recognizability: Human-like Generalization in One-Shot Generative Models” in Advances in Neural Information Processing Systems (NeurIPS 2022) (New Orleans, LA, US, 2022); https://openreview.net/pdf?id=DVfZKXSFW5m.

V. Boutin, T. Fel, L. Singhal, R. Mukherji, A. Nagaraj, J. Colin, T. Serre, “Diffusion Models as Artists: Are We Closing the Gap between Humans and Machines?” in Proceedings of the 40th International Conference on Machine Learning (PMLR, 2023), pp. 2953–3002; https://proceedings.mlr.press/v202/boutin23a.html.

J. Shipard, A. Wiliem, K. N. Thanh, W. Xiang, C. Fookes, “Diversity Is Definitely Needed: Improving Model-Agnostic Zero-Shot Classification via Stable Diffusion” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (IEEE, 2023), pp. 769–778; https://doi.org/10.1109/cvprw59228.2023.00084.

* A. Setlur, S. Garg, X. Geng, N. Garg, V. Smith, A. Kumar, RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.14532.

P. Haluptzok, M. Bowers, A. T. Kalai, “Language Models Can Teach Themselves to Program Better” in Deep Reinforcement Learning Workshop NeurIPS 2022 (2022); https://openreview.net/forum?id=_5BZwkZRFc9.

* B. Liu, S. Bubeck, R. Eldan, J. Kulkarni, Y. Li, A. Nguyen, R. Ward, Y. Zhang, TinyGSM: Achieving >80% on GSM8k with Small Language Models, arXiv [cs.LG] (2023); http://arxiv.org/abs/2312.09241.

* D. Hernandez, T. B. Brown, Measuring the Algorithmic Efficiency of Neural Networks, arXiv [cs.LG] (2020); http://arxiv.org/abs/2005.04305.

A. Ho, T. Besiroglu, E. Erdil, D. Owen, R. Rahman, Z. C. Guo, D. Atkinson, N. Thompson, J. Sevilla, “Algorithmic Progress in Language Models” (Epoch AI, 2024); http://arxiv.org/abs/2403.05812.

F. E. Dorner, Measuring Progress in Deep Reinforcement Learning Sample Efficiency, arXiv [cs.LG] (2021); http://arxiv.org/abs/2102.04881.

* Y. Ding, L. L. Zhang, C. Zhang, Y. Xu, N. Shang, J. Xu, F. Yang, M. Yang, LongRoPE: Extending LLM Context Window beyond 2 Million Tokens, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.13753.

A. Fawzi, M. Balog, A. Huang, T. Hubert, B. Romera-Paredes, M. Barekatain, A. Novikov, F. J. R. Ruiz, J. Schrittwieser, G. Swirszcz, D. Silver, D. Hassabis, P. Kohli, Discovering Faster Matrix Multiplication Algorithms with Reinforcement Learning. Nature 610, 47–53 (2022); https://doi.org/10.1038/s41586-022-05172-4.

A. Haj-Ali, N. K. Ahmed, T. Willke, Y. S. Shao, K. Asanovic, I. Stoica, “NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning” in Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (CGO 2020) (Association for Computing Machinery, New York, NY, USA, 2020), pp. 242–255; https://doi.org/10.1145/3368826.3377928.

A. Goldie, A. Mirhoseini, M. Yazgan, J. W. Jiang, E. Songhori, S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, A. Nova, J. Pak, A. Tong, K. Srinivasa, W. Hang, E. Tuncer, Q. V. Le, J. Laudon, … J. Dean, Addendum: A Graph Placement Methodology for Fast Chip Design. Nature 634, E10–E11 (2024); https://doi.org/10.1038/s41586-024-08032-5.

X. Li, P. Yu, C. Zhou, T. Schick, O. Levy, L. Zettlemoyer, J. E. Weston, M. Lewis, “Self-Alignment with Instruction Backtranslation” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=1oijHJBRsT.

S. Liu, Z. Lin, S. Yu, R. Lee, T. Ling, D. Pathak, D. Ramanan, Language Models as Black-Box Optimizers for Vision-Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2309.05950.

R. Pryzant, D. Iter, J. Li, Y. Lee, C. Zhu, M. Zeng, “Automatic Prompt Optimization with ‘Gradient Descent’ and Beam Search” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 7957–7968; https://doi.org/10.18653/v1/2023.emnlp-main.494.

S. Zhang, C. Gong, L. Wu, X. Liu, M. Zhou, AutoML-GPT: Automatic Machine Learning with GPT, arXiv [cs.CL] (2023); http://arxiv.org/abs/2305.02499.

* Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, C. Chen, C. Olsson, C. Olah, D. Hernandez, D. Drain, D. Ganguli, D. Li, … J. Kaplan, Constitutional AI: Harmlessness from AI Feedback, arXiv [cs.CL] (2022); http://arxiv.org/abs/2212.08073.

* N. Sachdeva, B. Coleman, W.-C. Kang, J. Ni, L. Hong, E. H. Chi, J. Caverlee, J. McAuley, D. Z. Cheng, How to Train Data-Efficient LLMs, arXiv [cs.LG] (2024); http://arxiv.org/abs/2402.09668.

* S. Kumar, T. Ghosal, V. Goyal, A. Ekbal, Can Large Language Models Unlock Novel Scientific Research Ideas?, arXiv [cs.CL] (2024); http://arxiv.org/abs/2409.06185.

H. Wijk, T. Lin, J. Becker, S. Jawhar, N. Parikh, T. Broadley, L. Chan, M. Chen, J. Clymer, J. Dhyani, E. Ericheva, K. Garcia, B. Goodrich, N. Jurkovic, M. Kinniment, A. Lajko, S. Nix, … E. Barnes, RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts, arXiv [cs.LG] (2024); http://arxiv.org/abs/2411.15114.

D. Owen, “Interviewing AI Researchers on Automation of AI R&D” (Epoch AI, 2024); https://epoch.ai/blog/interviewing-ai-researchers-on-automation-of-ai-rnd.

* E. Erdil, J. Sevilla, Power Law Trends in Speedrunning and Machine Learning, arXiv [cs.LG] (2023); http://arxiv.org/abs/2304.10004.

* J. Droppo, O. Elibol, Scaling Laws for Acoustic Models, arXiv [eess.AS] (2021); http://arxiv.org/abs/2106.09488.

S. Hooker, The Hardware Lottery. Communications of the ACM 64, 58–65 (2021); https://doi.org/10.1145/3467017.

* Q. Anthony, J. Hatef, D. Narayanan, S. Biderman, S. Bekman, J. Yin, A. Shafi, H. Subramoni, D. Panda, The Case for Co-Designing Model Architectures with Hardware, arXiv [cs.DC] (2024); http://arxiv.org/abs/2401.14489.

* F. Mince, D. Dinh, J. Kgomo, N. Thompson, S. Hooker, The Grand Illusion: The Myth of Software Portability and Implications for ML Progress, arXiv [cs.SE] (2023); http://arxiv.org/abs/2309.07181.

* The Scale Team, Submit Your Toughest Questions for Humanity’s Last Exam, Scale (2024); https://scale.com/blog/humanitys-last-exam.

ARC Prize, ARC Prize, ARC Prize (2024); https://arcprize.org/.

Department for Science, Innovation and Technology, “AI Safety Institute Approach to Evaluations” (GOV.UK, 2024); https://www.gov.uk/government/publications/ai-safety-institute-approach-to-evaluations/ai-safety-institute-approach-to-evaluations.

METR, An Update on Our General Capability Evaluations, METR (2024); https://metr.org/blog/2024-08-06-update-on-evaluations/.

G. Sastry, L. Heim, H. Belfield, M. Anderljung, M. Brundage, J. Hazell, C. O’Keefe, G. K. Hadfield, R. Ngo, K. Pilz, G. Gor, E. Bluemke, S. Shoker, J. Egan, R. F. Trager, S. Avin, A. Weller, … D. Coyle, Computing Power and the Governance of Artificial Intelligence, arXiv [cs.CY] (2024); http://arxiv.org/abs/2402.08797.

D. Citron, R. Chesney, Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security. California Law Review 107, 1753 (2019); https://scholarship.law.bu.edu/faculty_scholarship/640.

United Nations, Universal Declaration of Human Rights (1948); https://www.un.org/en/about-us/universal-declaration-of-human-rights.

V. Ciancaglini, C. Gibson, D. Sancho, O. McCarthy, M. Eira, P. Amann, A. Klayn, “Malicious Uses and Abuses of Artificial Intelligence” (European Union Agency for Law Enforcement Cooperation, 2020); https://documents.trendmicro.com/assets/white_papers/wp-malicious-uses-and-abuses-of-artificial-intelligence.pdf.

P. V. Falade, Decoding the Threat Landscape: ChatGPT, FraudGPT, and WormGPT in Social Engineering Attacks. International Journal of Scientific Research in Computer Science, Engineering and Information Technology 9, 185–198 (2023); https://doi.org/10.32628/CSEIT2390533.

J. Bateman, “Deepfakes and Synthetic Media in the Financial System: Assessing Threat Scenarios” (Carnegie Endowment for International Peace, 2020); https://carnegieendowment.org/research/2020/07/deepfakes-and-synthetic-media-in-the-financial-system-assessing-threat-scenarios?lang=en.

US Federal Bureau of Investigation, Alert Number I-060523-PSA: Malicious Actors Manipulating Photos and Videos to Create Explicit Content and Sextortion Schemes (2023); https://www.ic3.gov/PSA/2023/psa230605.

A. Kaur, A. Noori Hoshyar, V. Saikrishna, S. Firmin, F. Xia, Deepfake Video Detection: Challenges and Opportunities. Artificial Intelligence Review 57, 1–47 (2024); https://doi.org/10.1007/s10462-024-10810-6.

R. Umbach, N. Henry, G. Beard, C. Berryessa, Non-Consensual Synthetic Intimate Imagery: Prevalence, Attitudes, and Knowledge in 10 Countries, arXiv [cs.CY] (2024); http://arxiv.org/abs/2402.01721.

M. B. Kugler, C. Pace, Deepfake Privacy: Attitudes and Regulation. Northwestern University Law Review 116, 611–680 (2021); https://scholarlycommons.law.northwestern.edu/nulr/vol116/iss3/1.

M. Viola, C. Voto, Designed to Abuse? Deepfakes and the Non-Consensual Diffusion of Intimate Images. Synthese 201, 30 (2023); https://doi.org/10.1007/s11229-022-04012-2.

S. Maddocks, “A Deepfake Porn Plot Intended to Silence Me”: Exploring Continuities between Pornographic and “political” Deep Fakes. Porn Studies 7, 415–423 (2020); https://doi.org/10.1080/23268743.2020.1757499.

H. Ajder, G. Patrini, F. Cavalli, L. Cullen, “The State of Deepfakes: Landscape, Threats, and Impact” (Deeptrace, 2019); https://regmedia.co.uk/2019/10/08/deepfake_report.pdf.

J. Laffier, A. Rehman, Deepfakes and Harm to Women. Journal of Digital Life and Learning 3, 1–21 (2023); https://doi.org/10.51357/jdll.v3i1.218.

* T. Sippy, F. Enock, J. Bright, H. Z. Margetts, Behind the Deepfake: 8% Create; 90% Concerned. Surveying Public Exposure to and Perceptions of Deepfakes in the UK, arXiv [cs.CY] (2024); http://arxiv.org/abs/2407.05529.

D. Thiel, “Identifying and Eliminating CSAM in Generative ML Training Data and Models” (Stanford Digital Repository, 2023); https://purl.stanford.edu/kh752sm9123.

Ofcom, A Deep Dive into Deepfakes That Demean, Defraud and Disinform (2024); https://www.ofcom.org.uk/online-safety/illegal-and-harmful-content/deepfakes-demean-defraud-disinform/.

S. Dunn, Legal Definitions of Intimate Images in the Age of Sexual Deepfakes and Generative AI, Social Science Research Network (2024); https://papers.ssrn.com/abstract=4813941.

Y. Mirsky, W. Lee, The Creation and Detection of Deepfakes: A Survey, arXiv [cs.CV] (2020); http://arxiv.org/abs/2004.11138.

A. Lewis, P. Vu, R. Duch, A. Chowdhury, Do Content Warnings Help People Spot a Deepfake? Evidence from Two Experiments (2022); https://royalsociety.org/-/media/policy/projects/online-information-environment/do-content-warnings-help-people-spot-a-deepfake.pdf.

A. Qureshi, D. Megías, M. Kuribayashi, “Detecting Deepfake Videos Using Digital Watermarking” in 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2021), pp. 1786–1793; http://www.apsipa.org/proceedings/2021/pdfs/0001786.pdf.

L. Tang, Q. Ye, H. Hu, Q. Xue, Y. Xiao, J. Li, DeepMark: A Scalable and Robust Framework for DeepFake Video Detection. ACM Transactions on Privacy and Security 27, 1–26 (2024); https://doi.org/10.1145/3629976.

L.-Y. Hsu, AI-Assisted Deepfake Detection Using Adaptive Blind Image Watermarking. Journal of Visual Communication and Image Representation 100, 104094 (2024); https://doi.org/10.1016/j.jvcir.2024.104094.

Y. Zhao, B. Liu, M. Ding, B. Liu, T. Zhu, X. Yu, “Proactive Deepfake Defence via Identity Watermarking” in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2023), pp. 4591–4600; https://doi.org/10.1109/WACV56688.2023.00458.

* S. Gowal, P. Kohli, Identifying AI-Generated Images with SynthID, Google DeepMind (2023); https://deepmind.google/discover/blog/identifying-ai-generated-images-with-synthid/.

A. J. Patil, R. Shelke, An Effective Digital Audio Watermarking Using a Deep Convolutional Neural Network with a Search Location Optimization Algorithm for Improvement in Robustness and Imperceptibility. High-Confidence Computing 3, 100153 (2023); https://doi.org/10.1016/j.hcc.2023.100153.

M. S. Uddin, Ohidujjaman, M. Hasan, T. Shimamura, Audio Watermarking: A Comprehensive Review. International Journal of Advanced Computer Science and Applications 15 (2024); https://doi.org/10.14569/IJACSA.2024.01505141.

S. Abdelnabi, M. Fritz, “Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding” in IEEE Symposium on Security and Privacy (2021), pp. 121–140; https://doi.org/10.1109/SP40001.2021.00083.

* X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X. Wang, L. Li, Invisible Image Watermarks Are Provably Removable Using Generative AI, arXiv [cs.CR] (2023); http://arxiv.org/abs/2306.01953.

M. Saberi, V. S. Sadasivan, K. Rezaei, A. Kumar, A. Chegini, W. Wang, S. Feizi, “Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks” in 12th International Conference on Learning Representations (2023); https://openreview.net/pdf?id=dLoAdIKENc.

G. Björksten, “Identifying Generative AI Content: When and How Watermarking Can Help Uphold Human Rights” (accessnow, 2023); https://www.accessnow.org/wp-content/uploads/2023/09/Identifying-generative-AI-content-when-and-how-watermarking-can-help-uphold-human-rights.pdf.

D. Cooke, A. Edwards, S. Barkoff, K. Kelly, As Good As A Coin Toss: Human Detection of AI-Generated Images, Videos, Audio, and Audiovisual Stimuli, arXiv [cs.HC] (2024); http://arxiv.org/abs/2403.16760.

M. Jakesch, J. T. Hancock, M. Naaman, Human Heuristics for AI-Generated Language Are Flawed. Proceedings of the National Academy of Sciences of the United States of America 120, e2208839120 (2023); https://doi.org/10.1073/pnas.2208839120.

G. Spitale, N. Biller-Andorno, F. Germani, AI Model GPT-3 (dis)informs Us Better than Humans. Science Advances 9, eadh1850 (2023); https://doi.org/10.1126/sciadv.adh1850.

S. Kreps, R. M. McCain, M. Brundage, All the News That’s Fit to Fabricate: AI-Generated Text as a Tool of Media Misinformation. Journal of Experimental Political Science 9, 104–117 (2022); https://doi.org/10.1017/xps.2020.37.

N. C. Köbis, B. Doležalová, I. Soraperra, Fooled Twice: People Cannot Detect Deepfakes but Think They Can. iScience 24 (2021); https://doi.org/10.1016/j.isci.2021.103364.

K.-C. Yang, F. Menczer, Anatomy of an AI-Powered Malicious Social Botnet, arXiv [cs.CY] (2023); http://arxiv.org/abs/2307.16336.

R. Raman, V. Kumar Nair, P. Nedungadi, A. Kumar Sahu, R. Kowalski, S. Ramanathan, K. Achuthan, Fake News Research Trends, Linkages to Generative Artificial Intelligence and Sustainable Development Goals. Heliyon 10, e24727 (2024); https://doi.org/10.1016/j.heliyon.2024.e24727.

* M. Musser, A Cost Analysis of Generative Language Models and Influence Operations, arXiv [cs.CY] (2023); http://arxiv.org/abs/2308.03740.

H. Bai, J. G. Voelkel, J. C. Eichstaedt, R. Willer, Artificial Intelligence Can Persuade Humans on Political Issues (2023); https://doi.org/10.31219/osf.io/stakv.

K. Hackenburg, L. Ibrahim, B. M. Tappin, M. Tsakiris, Comparing the Persuasiveness of Role-Playing Large Language Models and Human Experts on Polarized U.S. Political Issues (2023); https://doi.org/10.31219/osf.io/ey8db.

J. A. Goldstein, J. Chao, S. Grossman, A. Stamos, M. Tomz, How Persuasive Is AI-Generated Propaganda? PNAS Nexus 3, pgae034 (2024); https://doi.org/10.1093/pnasnexus/pgae034.

S. C. Matz, J. D. Teeny, S. S. Vaid, H. Peters, G. M. Harari, M. Cerf, The Potential of Generative AI for Personalized Persuasion at Scale. Scientific Reports 14, 4692 (2024); https://doi.org/10.1038/s41598-024-53755-0.

* A. R. Williams, L. Burke-Moore, R. S.-Y. Chan, F. E. Enock, F. Nanni, T. Sippy, Y.-L. Chung, E. Gabasova, K. Hackenburg, J. Bright, Large Language Models Can Consistently Generate High-Quality Content for Election Disinformation Operations, arXiv [cs.CY] (2024); http://arxiv.org/abs/2408.06731.

T. H. Costello, G. Pennycook, D. G. Rand, Durably Reducing Conspiracy Beliefs through Dialogues with AI. Science (New York, N.Y.) 385, eadq1814 (2024); https://doi.org/10.1126/science.adq1814.

F. Salvi, M. H. Ribeiro, R. Gallotti, R. West, On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial, arXiv [cs.CY] (2024); http://arxiv.org/abs/2403.14380.

* I. Gabriel, A. Manzini, G. Keeling, L. A. Hendricks, V. Rieser, H. Iqbal, N. Tomašev, I. Ktena, Z. Kenton, M. Rodriguez, S. El-Sayed, S. Brown, C. Akbulut, A. Trask, E. Hughes, A. Stevie Bergman, R. Shelby, … J. Manyika, “The Ethics of Advanced AI Assistants” (Google DeepMind, 2024); http://arxiv.org/abs/2404.16244.

P. S. Park, S. Goldstein, A. O’Gara, M. Chen, D. Hendrycks, AI Deception: A Survey of Examples, Risks, and Potential Solutions. Patterns 5 (2024); https://doi.org/10.1016/j.patter.2024.100988.

* M. Phuong, M. Aitchison, E. Catt, S. Cogan, A. Kaskasoli, V. Krakovna, D. Lindner, M. Rahtz, Y. Assael, S. Hodkinson, H. Howard, T. Lieberum, R. Kumar, M. A. Raad, A. Webson, L. Ho, S. Lin, … T. Shevlane, “Evaluating Frontier Models for Dangerous Capabilities” (Google DeepMind, 2024); https://doi.org/10.48550/arXiv.2403.13793.

M. Burtell, T. Woodside, Artificial Influence: An Analysis Of AI-Driven Persuasion, arXiv [cs.CY] (2023); http://arxiv.org/abs/2303.08721.

F. Miró-Llinares, J. C. Aguerri, Misinformation about Fake News: A Systematic Critical Review of Empirical Studies on the Phenomenon and Its Status as a “threat.” European Journal of Criminology 20, 356–374 (2023); https://doi.org/10.1177/1477370821994059.

G. Pennycook, D. G. Rand, Fighting Misinformation on Social Media Using Crowdsourced Judgments of News Source Quality. Proceedings of the National Academy of Sciences of the United States of America 116, 2521–2526 (2019); https://doi.org/10.1073/pnas.1806781116.

Z. Epstein, N. Sirlin, A. Arechar, G. Pennycook, D. Rand, The Social Media Context Interferes with Truth Discernment. Science Advances 9, eabo6169 (2023); https://doi.org/10.1126/sciadv.abo6169.

G. Pennycook, Z. Epstein, M. Mosleh, A. A. Arechar, D. Eckles, D. G. Rand, Shifting Attention to Accuracy Can Reduce Misinformation Online. Nature 592, 590–595 (2021); https://doi.org/10.1038/s41586-021-03344-2.

Pew Research Center, A Majority of Americans Are Highly Concerned That AI Will Be Used to Create Fake Info about the 2024 Candidates (2024); https://www.pewresearch.org/short-reads/2024/09/19/concern-over-the-impact-of-ai-on-2024-presidential-campaign/sr_24-09-10_electionandai_01/.

S. Kapoor, A. Narayanan, “How to Prepare for the Deluge of Generative AI on Social Media: A Grounded Analysis of the Challenges and Opportunities” (Knight First Amendment Institute at Columbia University, 2023); https://s3.amazonaws.com/kfai-documents/documents/a566f4ded5/How-to-Prepare-for-the-Deluge-of-Generative-AI-on-Social-Media.pdf.

M. Hameleers, Cheap Versus Deep Manipulation: The Effects of Cheapfakes Versus Deepfakes in a Political Setting. International Journal of Public Opinion Research 36 (2024); https://doi.org/10.1093/ijpor/edae004.

S. Vosoughi, D. Roy, S. Aral, The Spread of True and False News Online. Science 359, 1146–1151 (2018); https://doi.org/10.1126/science.aap9559.

K. Clayton, S. Blair, J. A. Busam, S. Forstner, J. Glance, G. Green, A. Kawata, A. Kovvuri, J. Martin, E. Morgan, M. Sandhu, R. Sang, R. Scholz-Bright, A. T. Welch, A. G. Wolff, A. Zhou, B. Nyhan, Real Solutions for Fake News? Measuring the Effectiveness of General Warnings and Fact-Check Tags in Reducing Belief in False Stories on Social Media. Political Behavior 42, 1073–1095 (2020); https://doi.org/10.1007/s11109-019-09533-0.

E. Hoes, B. Aitken, J. Zhang, T. Gackowski, M. Wojcieszak, Prominent Misinformation Interventions Reduce Misperceptions but Increase Skepticism, PsyArXiv (2023); https://doi.org/10.31234/osf.io/zmpdu.

A. Bashardoust, S. Feuerriegel, Y. R. Shrestha, Comparing the Willingness to Share for Human-Generated vs. AI-Generated Fake News. Proceedings of the ACM on Human-Computer Interaction 8, 1–21 (2024); https://doi.org/10.1145/3687028.

A. Kumar, J. W. Taylor, Feature Importance in the Age of Explainable AI: Case Study of Detecting Fake News & Misinformation via a Multi-Modal Framework. European Journal of Operational Research 317, 401–413 (2024); https://doi.org/10.1016/j.ejor.2023.10.003.

S. S. Ghosal, S. Chakraborty, J. Geiping, F. Huang, D. Manocha, A. Bedi, A Survey on the Possibilities & Impossibilities of AI-Generated Text Detection. Transactions on Machine Learning Research (2023); https://openreview.net/pdf?id=AXtFeYjboj.

V. S. Sadasivan, A. Kumar, S. Balasubramanian, W. Wang, S. Feizi, Can AI-Generated Text Be Reliably Detected?, arXiv [cs.CL] (2023); http://arxiv.org/abs/2303.11156.

S. Gehrmann, H. Strobelt, A. Rush, “GLTR: Statistical Detection and Visualization of Generated Text” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, M. R. Costa-jussà, E. Alfonseca, Eds. (Association for Computational Linguistics, Florence, Italy, 2019), pp. 111–116; https://doi.org/10.18653/v1/P19-3019.

L. Fröhling, A. Zubiaga, Feature-Based Detection of Automated Language Models: Tackling GPT-2, GPT-3 and Grover. PeerJ. Computer Science 7, e443 (2021); https://doi.org/10.7717/peerj-cs.443.

J. Luo, G. Nan, D. Li, Y. Tan, AI-Generated Fake Review Detection (2023); https://doi.org/10.2139/ssrn.4610727.

T. Berber Sardinha, AI-Generated vs Human-Authored Texts: A Multidimensional Comparison. Applied Corpus Linguistics 4, 100083 (2024); https://doi.org/10.1016/j.acorp.2023.100083.

D. M. Markowitz, J. T. Hancock, J. N. Bailenson, Linguistic Markers of Inherently False AI Communication and Intentionally False Human Communication: Evidence From Hotel Reviews. Journal of Language and Social Psychology 43, 63–82 (2024); https://doi.org/10.1177/0261927X231200201.

Y. Xie, A. Rawal, Y. Cen, D. Zhao, S. K. Narang, S. Sushmita, MUGC: Machine Generated versus User Generated Content Detection, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.19725.

J. Su, T. Y. Zhuo, J. Mansurov, D. Wang, P. Nakov, Fake News Detectors Are Biased against Texts Generated by Large Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2309.08674.

W. Liang, M. Yuksekgonul, Y. Mao, E. Wu, J. Zou, “GPT Detectors Are Biased against Non-Native English Writers” in ICLR 2023 Workshop on Trustworthy and Reliable Large-Scale Machine Learning Models (2023); https://openreview.net/pdf?id=SPuX8tKKIQ.

A. Uchendu, J. Lee, H. Shen, T. Le, T.-H. “Kenneth” Huang, D. Lee, Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?, arXiv [cs.CL] (2023); http://arxiv.org/abs/2304.01002.

M. K. Land, Against Privatized Censorship: Proposals for Responsible Delegation. Virginia Journal of International Law 60, 363 (2019); https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3442184.

R. Gorwa, R. Binns, C. Katzenbach, Algorithmic Content Moderation: Technical and Political Challenges in the Automation of Platform Governance. Big Data & Society 7, 205395171989794 (2020); https://doi.org/10.1177/2053951719897945.

J. Turner, Robot Rules (Springer International Publishing, Cham, Switzerland, ed. 1, 2018); https://doi.org/10.1007/978-3-319-96235-1.

N. Bontridder, Y. Poullet, The Role of Artificial Intelligence in Disinformation. Data & Policy 3, e32 (2021); https://doi.org/10.1017/dap.2021.20.

T. C. Helmus, Artificial Intelligence, Deepfakes, and Disinformation: A Primer (RAND Corporation, Santa Monica, CA, 2022); https://doi.org/10.7249/PEA1043-1.

S. Metta, I. Chang, J. Parker, M. P. Roman, A. F. Ehuan, Generative AI in Cybersecurity, arXiv [cs.CR] (2024); http://arxiv.org/abs/2405.01674.

National Cyber Security Centre (NCSC), “The near-Term Impact of AI on the Cyber Threat” (GOV.UK, 2024); https://www.ncsc.gov.uk/report/impact-of-ai-on-cyber-threat.

British Library, “Learning Lessons From the Cyber-Attack: British Library Cyber Incident Review” (British Library, 2024); https://www.bl.uk/home/british-library-cyber-incident-review-8-march-2024.pdf/.

* Microsoft Threat Intelligence, Staying ahead of Threat Actors in the Age of AI, Microsoft Security Blog (2024); https://www.microsoft.com/en-us/security/blog/2024/02/14/staying-ahead-of-threat-actors-in-the-age-of-ai/.

* B. Nimmo, M. Flossman, “Influence and Cyber Operations: An Update” (OpenAI, 2024); https://cdn.openai.com/threat-intelligence-reports/influence-and-cyber-operations-an-update_October-2024.pdf.

Defense Advanced Research Projects Agency, AIxCC (2024); https://aicyberchallenge.com/.

H. Ruan, Y. Zhang, A. Roychoudhury, SpecRover: Code Intent Extraction via LLMs, arXiv [cs.SE] (2024); http://arxiv.org/abs/2408.02232.

N. T. Islam, J. Khoury, A. Seong, E. Bou-Harb, P. Najafirad, Enhancing Source Code Security with LLMs: Demystifying the Challenges and Generating Reliable Repairs, arXiv [cs.CR] (2024); http://arxiv.org/abs/2409.00571.

X. Du, G. Zheng, K. Wang, J. Feng, W. Deng, M. Liu, B. Chen, X. Peng, T. Ma, Y. Lou, Vul-RAG: Enhancing LLM-Based Vulnerability Detection via Knowledge-Level RAG, arXiv [cs.SE] (2024); http://arxiv.org/abs/2406.11147.

* M. Allamanis, M. Arjovsky, C. Blundell, L. Buesing, M. Brand, S. Glazunov, D. Maier, P. Maniatis, G. Marinho, H. Michalewski, K. Sen, C. Sutton, V. Tulsyan, M. Vanotti, T. Weber, D. Zheng, From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code (2024); https://googleprojectzero.blogspot.com/2024/10/from-naptime-to-big-sleep.html.

A. K. Zhang, N. Perry, R. Dulepet, J. Ji, J. W. Lin, E. Jones, C. Menders, G. Hussein, S. Liu, D. Jasper, P. Peetathawatchai, A. Glenn, V. Sivashankar, D. Zamoshchin, L. Glikbarg, D. Askaryar, M. Yang, … P. Liang, Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models, arXiv [cs.CR] (2024); http://arxiv.org/abs/2408.08926.

D. Ristea, V. Mavroudis, C. Hicks, Benchmarking OpenAI o1 in Cyber Security, arXiv [cs.CR] (2024); http://arxiv.org/abs/2410.21939.

J. Gennari, S.-H. Lau, S. Perl, J. Parish, G. Sastry, “Considerations for Evaluating Large Language Models for Cybersecurity Tasks” (Carnegie Mellon University, 2024); https://insights.sei.cmu.edu/library/considerations-for-evaluating-large-language-models-for-cybersecurity-tasks/.

M. Shao, B. Chen, S. Jancheska, B. Dolan-Gavitt, S. Garg, R. Karri, M. Shafique, An Empirical Evaluation of LLMs for Solving Offensive Security Challenges, arXiv [cs.CR] (2024); http://arxiv.org/abs/2402.11814.

* J. Xu, J. W. Stokes, G. McDonald, X. Bai, D. Marshall, S. Wang, A. Swaminathan, Z. Li, AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-Attacks, arXiv [cs.CR] (2024); http://arxiv.org/abs/2403.01038.

R. Fang, R. Bindu, A. Gupta, Q. Zhan, D. Kang, Teams of LLM Agents Can Exploit Zero-Day Vulnerabilities, arXiv [cs.MA] (2024); http://arxiv.org/abs/2406.01637.

T. Abramovich, M. Udeshi, M. Shao, K. Lieret, H. Xi, K. Milner, S. Jancheska, J. Yang, C. E. Jimenez, F. Khorrami, P. Krishnamurthy, B. Dolan-Gavitt, M. Shafique, K. Narasimhan, R. Karri, O. Press, EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges, arXiv [cs.AI] (2024); http://arxiv.org/abs/2409.16165.

G. Deng, Y. Liu, V. Mayoral-Vilches, P. Liu, Y. Li, Y. Xu, T. Zhang, Y. Liu, M. Pinzger, S. Rass, “PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing” in 33rd USENIX Security Symposium (USENIX Security 24) (USENIX Association, Philadelphia, PA, 2024), pp. 847–864; https://www.usenix.org/conference/usenixsecurity24/presentation/deng.

* S. Glazunov, M. Brand, Google Project Zero, “Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models” (Google Project Zero, 2024); https://googleprojectzero.blogspot.com/2024/06/project-naptime.html.

J. Walden, “The Impact of a Major Security Event on an Open Source Project: The Case of OpenSSL” in Proceedings of the 17th International Conference on Mining Software Repositories (ACM, New York, NY, USA, 2020); https://doi.org/10.1145/3379597.3387465.

G. Kokolakis, A. Moschos, A. D. Keromytis, “Harnessing the Power of General-Purpose LLMs in Hardware Trojan Design” in Lecture Notes in Computer Science (Springer Nature Switzerland, Cham, 2024), pp. 176–194; https://doi.org/10.1007/978-3-031-61486-6_11.

J. P. Farwell, R. Rohozinski, Stuxnet and the Future of Cyber War. Survival 53, 23–40 (2011); https://doi.org/10.1080/00396338.2011.555586.

D. Saha, S. Tarek, K. Yahyaei, S. K. Saha, J. Zhou, M. Tehranipoor, F. Farahmandi, LLM for SoC Security: A Paradigm Shift. IEEE Access 12, 155498–155521 (2024); https://doi.org/10.1109/ACCESS.2024.3427369.

* Amazon, What Is AWS CloudTrail? (2024); https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html.

* P. Kanuparthy, A. Dalakoti, S. Kamath, AI Debugging at Meta with HawkEye, Engineering at Meta (2023); https://engineering.fb.com/2023/12/19/data-infrastructure/hawkeye-ai-debugging-meta/.

M. C. Horowitz, P. Scharre, A. Velez-Green, A Stable Nuclear Future? The Impact of Autonomous Systems and Artificial Intelligence, arXiv [cs.CY] (2019); http://arxiv.org/abs/1912.05291.

A. E. Chu, T. Lu, P.-S. Huang, Sparks of Function by de Novo Protein Design. Nature Biotechnology 42, 203–215 (2024); https://doi.org/10.1038/s41587-024-02133-2.

R. F. Service, AI Tools Set off an Explosion of Designer Proteins. Science 386, 260–261 (2024); https://doi.org/10.1126/science.adt9024.

C. Li, G. Ye, Y. Jiang, Z. Wang, H. Yu, M. Yang, Artificial Intelligence in Battling Infectious Diseases: A Transformative Role. Journal of Medical Virology 96, e29355 (2024); https://doi.org/10.1002/jmv.29355.

The Royal Swedish Academy of Sciences, The Nobel Prize in Chemistry 2024 (2024); https://www.nobelprize.org/uploads/2024/10/press-chemistryprize2024-3.pdf.

V. Pitschmann, Z. Hon, Drugs as Chemical Weapons: Past and Perspectives. Toxics 11, 52 (2023); https://doi.org/10.3390/toxics11010052.

National Research Council, “Biosecurity and Dual-Use Research in the Life Sciences” in Science and Security in a Post 9/11 World: A Report Based on Regional Discussions between the Science and Security Communities (National Academies Press, Washington, DC, 2007); https://doi.org/10.17226/12013.

S. Ben Ouagrham-Gormley, Barriers to Bioweapons: The Challenges of Expertise and Organization for Weapons Development (Cornell University Press, 2014); https://www.cornellpress.cornell.edu/book/9780801452888/barriers-to-bioweapons/#bookTabs=1.

J. Revill, C. Jefferson, Tacit Knowledge and the Biological Weapons Regime. Science & Public Policy 41, 597–610 (2014); https://doi.org/10.1093/scipol/sct090.

S. R. Carter, N. Wheeler, S. Chwalek, C. Isaac, J. M. Yassif, “The Convergence of Artificial Intelligence and the Life Sciences: Safeguarding Technology, Rethinking Governance, and Preventing Catastrophe” (Nuclear Threat Initiative, 2023); https://www.nti.org/wp-content/uploads/2023/10/NTIBIO_AI_FINAL.pdf.

J. Smith, S. Rose, R. Moulange, C. Nelson, “How the UK Government Should Address the Misuse Risk from AI-Enabled Biological Tools” (Centre for Long-Term Resilience, 2024); https://www.longtermresilience.org/wp-content/uploads/2024/07/How-the-UK-Government-should-address-the-misuse-risk-from-AI-Enabled-biological-tools-BTs-Website-Copy.pdf.

B. Drexel, C. Withers, “AI and the Evolution of Biological National Security Risks: Capabilities, Thresholds, and Interventions” (CNAS, 2024); https://www.cnas.org/publications/reports/ai-and-the-evolution-of-biological-national-security-risks.

M. Dybul, “Biosecurity in the Age of AI: Chairperson’s Statement” (Helena, 2024); https://www.helenabiosecurity.org/.

* T. Hayes, R. Rao, H. Akin, N. J. Sofroniew, D. Oktay, Z. Lin, R. Verkuil, V. Q. Tran, J. Deaton, M. Wiggert, R. Badkundri, I. Shafkat, J. Gong, A. Derry, R. S. Molina, N. Thomas, Y. Khan, … A. Rives, Simulating 500 Million Years of Evolution with a Language Model, bioRxiv [preprint] (2024); https://doi.org/10.1101/2024.07.01.600583.

* V. Zambaldi, D. La, A. E. Chu, H. Patani, A. E. Danson, T. O. C. Kwan, T. Frerix, R. G. Schneider, D. Saxton, A. Thillaisundaram, Z. Wu, I. Moraes, O. Lange, E. Papa, G. Stanton, V. Martin, S. Singh, … J. Wang, “De Novo Design of High-Affinity Protein Binders with AlphaProteo” (Google DeepMind, 2024); https://deepmind.google/discover/blog/alphaproteo-generates-novel-proteins-for-biology-and-health-research/.

Frontier Model Forum, Progress Update: Advancing Frontier AI Safety in 2024 and Beyond, Frontier Model Forum (2024); https://www.frontiermodelforum.org/updates/progress-update-advancing-frontier-ai-safety-in--and-beyond/.

AIxBio Global Forum, “White Paper: AIxBio Global Forum Structure and Goals” (NTI, 2024); https://www.nti.org/wp-content/uploads/2024/07/AI_Bio-Global-Forum-Structure-and-Goals_White-Paper.pdf.

N. N. Thadani, S. Gurev, P. Notin, N. Youssef, N. J. Rollins, D. Ritter, C. Sander, Y. Gal, D. S. Marks, Learning from Prepandemic Data to Forecast Viral Escape. Nature 622, 818–825 (2023); https://doi.org/10.1038/s41586-023-06617-0.

E. H. Soice, R. Rocha, K. Cordova, M. Specter, K. M. Esvelt, Can Large Language Models Democratize Access to Dual-Use Biotechnology?, arXiv [cs.CY] (2023); http://arxiv.org/abs/2306.03809.

N. Li, A. Pan, A. Gopal, S. Yue, D. Berrios, A. Gatti, J. D. Li, A.-K. Dombrowski, S. Goel, L. Phan, G. Mukobi, N. Helm-Burger, R. Lababidi, L. Justen, A. B. Liu, M. Chen, I. Barrass, … D. Hendrycks, The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, arXiv [cs.LG] (2024); http://arxiv.org/abs/2403.03218.

C. A. Mouton, C. Lucas, E. Guest, “The Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study” (RAND Corporation, 2024); https://www.rand.org/pubs/research_reports/RRA2977-2.html.

* T. Patwardhan, K. Liu, T. Markov, N. Chowdhury, D. Leet, N. Cone, C. Maltbie, J. Huizinga, C. Wainwright, S. (froggi) Jackson, S. Adler, R. Casagrande, A. Madry, “Building an Early Warning System for LLM-Aided Biological Threat Creation” (OpenAI, 2024); https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation.

B. J. Wittmann, T. Alexanian, C. Bartling, J. Beal, A. Clore, J. Diggans, K. Flyangolts, B. T. Gemler, T. Mitchell, S. T. Murphy, N. E. Wheeler, E. Horvitz, Toward AI-Resilient Screening of Nucleic Acid Synthesis Orders: Process, Results, and Recommendations, bioRxiv [preprint] (2024); https://doi.org/10.1101/2024.12.02.626439.

N. R. Bennett, B. Coventry, I. Goreshnik, B. Huang, A. Allen, D. Vafeados, Y. P. Peng, J. Dauparas, M. Baek, L. Stewart, F. DiMaio, S. De Munck, S. N. Savvides, D. Baker, Improving de Novo Protein Binder Design with Deep Learning. Nature Communications 14, 2625 (2023); https://doi.org/10.1038/s41467-023-38328-5.

M. Crowley, L. Shang, M. Dando, Preserving the Norm against Chemical Weapons: A Civil Society Initiative for the 2018 4th Review Conference of the Chemical Weapons Convention. Futures 102, 125–133 (2018); https://doi.org/10.1016/j.futures.2018.01.006.

F. Urbina, F. Lentzos, C. Invernizzi, S. Ekins, Dual Use of Artificial Intelligence-Powered Drug Discovery. Nature Machine Intelligence 4, 189–191 (2022); https://doi.org/10.1038/s42256-022-00465-9.

M. Guo, Z. Li, X. Deng, D. Luo, J. Yang, Y. Chen, W. Xue, ConoDL: A Deep Learning Framework for Rapid Generation and Prediction of Conotoxins, bioRxiv [preprint] (2024); https://doi.org/10.1101/2024.09.27.614001.

* 310.ai, GenAI + BIO: Nature Didn’t Have Time, We Have GPUs (2024); https://310.ai/.

* Asimov, Kernel: CAD Software for Engineering Biology (2024); https://www.asimov.com/kernel.

A. M. Bran, S. Cox, O. Schilter, C. Baldassari, A. D. White, P. Schwaller, Augmenting Large Language Models with Chemistry Tools. Nature Machine Intelligence 6, 525–535 (2024); https://doi.org/10.1038/s42256-024-00832-8.

J. Goldblat, The Biological Weapons Convention: An Overview. International Review of the Red Cross 37, 251–265 (1997); https://doi.org/10.1017/s0020860400084679.

G. Gonzalez-Isunza, M. Z. Jawaid, P. Liu, D. L. Cox, M. Vazquez, J. Arsuaga, Using Machine Learning to Detect Coronaviruses Potentially Infectious to Humans. Scientific Reports 13, 9319 (2023); https://doi.org/10.1038/s41598-023-35861-7.

M. Wardeh, M. S. C. Blagrove, K. J. Sharkey, M. Baylis, Divide-and-Conquer: Machine-Learning Integrates Mammalian and Viral Traits with Network Features to Predict Virus-Mammal Associations. Nature Communications 12, 3954 (2021); https://doi.org/10.1038/s41467-021-24085-w.

S. Rose, R. Moulange, J. Smith, C. Nelson, “The Near-Term Impact of AI on Biological Misuse” (Centre for Long-Term Resilience, 2024); https://www.longtermresilience.org/reports/the-near-term-impact-of-ai-on-biological-misuse/.

J. Frazer, P. Notin, M. Dias, A. Gomez, J. K. Min, K. Brock, Y. Gal, D. S. Marks, Disease Variant Prediction with Deep Generative Models of Evolutionary Data. Nature 599, 91–95 (2021); https://doi.org/10.1038/s41586-021-04043-8.

J. B. Sandbrink, E. C. Alley, M. C. Watson, G. D. Koblentz, K. M. Esvelt, Insidious Insights: Implications of Viral Vector Engineering for Pathogen Enhancement. Gene Therapy 30, 407–410 (2023); https://doi.org/10.1038/s41434-021-00312-3.

J. Kaiser, Exclusive: Controversial Experiments That Could Make Bird Flu More Risky Poised to Resume, American Association for the Advancement of Science (2021); https://www.science.org/content/article/exclusive-controversial-experiments-make-bird-flu-more-risky-poised-resume.

J. Pannu, D. Bloomfield, A. Zhu, R. MacKnight, G. Gomes, A. Cicero, T. Inglesby, Prioritizing High-Consequence Biological Capabilities in Evaluations of Artificial Intelligence Models, arXiv [cs.CY] (2024); http://dx.doi.org/10.2139/ssrn.4873106.

E. Appleton, C. Madsen, N. Roehner, D. Densmore, Design Automation in Synthetic Biology. Cold Spring Harbor Perspectives in Biology 9 (2017); https://doi.org/10.1101/cshperspect.a023978.

Organisation for Economic Co-operation and Development, Artificial Intelligence in Science: Challenges, Opportunities and the Future of Research (OECD, Paris, 2023); https://www.oecd-ilibrary.org/science-and-technology/artificial-intelligence-in-science_a8d820bd-en.

C. Nelson, S. Rose, “Understanding AI-Facilitated Biological Weapon Development” (Centre for Long-Term Resilience, 2023); https://www.longtermresilience.org/reports/understanding-risks-at-the-intersection-of-ai-and-bio/.

Z. Wu, S. B. J. Kan, R. D. Lewis, B. J. Wittmann, F. H. Arnold, Machine Learning-Assisted Directed Protein Evolution with Combinatorial Libraries. Proceedings of the National Academy of Sciences of the United States of America 116, 8852–8858 (2019); https://doi.org/10.1073/pnas.1901979116.

D. A. Boiko, R. MacKnight, B. Kline, G. Gomes, Autonomous Chemical Research with Large Language Models. Nature 624, 570–578 (2023); https://doi.org/10.1038/s41586-023-06792-0.

A. Stephenson, L. Lastra, B. Nguyen, Y.-J. Chen, J. Nivala, L. Ceze, K. Strauss, Physical Laboratory Automation in Synthetic Biology. ACS Synthetic Biology 12, 3156–3169 (2023); https://doi.org/10.1021/acssynbio.3c00345.

J. T. Rapp, B. J. Bremer, P. A. Romero, Self-Driving Laboratories to Autonomously Navigate the Protein Fitness Landscape. Nature Chemical Engineering 1, 97–107 (2024); https://doi.org/10.1038/s44286-023-00002-4.

A. Casas, M. Bultelle, R. Kitney, An Engineering Biology Approach to Automated Workflow and Biodesign. Synthetic Biology 9, ysae009 (2024); https://doi.org/10.1093/synbio/ysae009.

D. Sun, W. Gao, H. Hu, S. Zhou, Why 90% of Clinical Drug Development Fails and How to Improve It? Acta Pharmaceutica Sinica. B 12, 3049–3062 (2022); https://doi.org/10.1016/j.apsb.2022.02.002.

Forum on Neuroscience and Nervous System Disorders, Board on Health Sciences Policy, Institute of Medicine, “Drug Development Challenges” in Improving and Accelerating Therapeutic Development for Nervous System Disorders: Workshop Summary (National Academies Press (US), 2014); https://www.ncbi.nlm.nih.gov/books/NBK195047/.

K. H. Sumida, R. Núñez-Franco, I. Kalvet, S. J. Pellock, B. I. M. Wicky, L. F. Milles, J. Dauparas, J. Wang, Y. Kipnis, N. Jameson, A. Kang, J. De La Cruz, B. Sankaran, A. K. Bera, G. Jiménez-Osés, D. Baker, Improving Protein Expression, Stability, and Function with ProteinMPNN. Journal of the American Chemical Society 146, 2054–2061 (2024); https://doi.org/10.1021/jacs.3c10941.

M. Wehrs, D. Tanjore, T. Eng, J. Lievense, T. R. Pray, A. Mukhopadhyay, Engineering Robust Production Microbes for Large-Scale Cultivation. Trends in Microbiology 27, 524–537 (2019); https://doi.org/10.1016/j.tim.2019.01.006.

J. Jiang, H.-H. Peng, Z. Yang, X. Ma, S. Sahakijpijarn, C. Moon, D. Ouyang, R. O. Williams III, The Applications of Machine Learning (ML) in Designing Dry Powder for Inhalation by Using Thin-Film-Freezing Technology. International Journal of Pharmaceutics 626, 122179 (2022); https://doi.org/10.1016/j.ijpharm.2022.122179.

T. R. Sosnowski, Towards More Precise Targeting of Inhaled Aerosols to Different Areas of the Respiratory System. Pharmaceutics 16, 97 (2024); https://doi.org/10.3390/pharmaceutics16010097.

Department for Science, Innovation & Technology, AI Safety Institute, “Advanced AI Evaluations at AISI: May Update” (GOV.UK, 2024); https://www.aisi.gov.uk/work/advanced-ai-evaluations-may-update.

* Anthropic, Reflections on Our Responsible Scaling Policy (2024); https://www.anthropic.com/news/reflections-on-our-responsible-scaling-policy.

G. Lewis, P. Millett, A. Sandberg, A. Snyder-Beattie, G. Gronvall, Information Hazards in Biotechnology. Risk Analysis: An Official Publication of the Society for Risk Analysis 39, 975–981 (2019); https://doi.org/10.1111/risa.13235.

S. R. Carter, S. Curtis, C. Emerson, J. Gray, I. C. Haydon, A. Hebbeler, C. Qureshi, N. Randolph, A. Rives, A. L. Stuart, Responsible AI X Biodesign: Community Values, Guiding Principles, and Commitments for the Responsible Development of AI for Protein Design (2024); https://responsiblebiodesign.ai/.

NTI | bio, “Research Agenda for Safeguarding AI-Bio Capabilities Draft” (NTI, 2024); https://www.nti.org/wp-content/uploads/2024/06/Research-Agenda-for-Safeguarding-AI-Bio-Capabilities.pdf.

E. Nguyen, M. Poli, M. G. Durrant, A. W. Thomas, B. Kang, J. Sullivan, M. Y. Ng, A. Lewis, A. Patel, A. Lou, S. Ermon, S. A. Baccus, T. Hernandez-Boussard, C. Re, P. D. Hsu, B. L. Hie, Sequence Modeling and Design from Molecular to Genome Scale with Evo, bioRxiv [preprint] (2024); https://doi.org/10.1101/2024.02.27.582234.

J. Cheng, G. Novati, J. Pan, C. Bycroft, A. Žemgulytė, T. Applebaum, A. Pritzel, L. H. Wong, M. Zielinski, T. Sargeant, R. G. Schneider, A. W. Senior, J. Jumper, D. Hassabis, P. Kohli, Ž. Avsec, Accurate Proteome-Wide Missense Variant Effect Prediction with AlphaMissense. Science (New York, N.Y.) 381, eadg7492 (2023); https://doi.org/10.1126/science.adg7492.

S. R. Carter, N. E. Wheeler, C. Isaac, J. M. Yassif, “Developing Guardrails for AI Biodesign Tools” (Nuclear Threat Initiative, 2024); https://www.nti.org/analysis/articles/developing-guardrails-for-ai-biodesign-tools/.

S. A. Dip, U. A. Shuvo, T. Chau, H. Song, P. Choi, X. Wang, L. Zhang, PathoLM: Identifying Pathogenicity from the DNA Sequence through the Genome Foundation Model, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.13133.

K. Workman, Engineering AAVs with Evo and AlphaFold, LatchBio (2024); https://blog.latch.bio/p/engineering-aavs-with-evo-and-alphafold.

D. Bloomfield, J. Pannu, A. W. Zhu, M. Y. Ng, A. Lewis, E. Bendavid, S. M. Asch, T. Hernandez-Boussard, A. Cicero, T. Inglesby, AI and Biosecurity: The Need for Governance. Science (New York, N.Y.) 385, 831–833 (2024); https://doi.org/10.1126/science.adq1977.

Y. Zhang, M. Yasunaga, Z. Zhou, J. Z. HaoChen, J. Zou, P. Liang, S. Yeung, “Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models” in Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, N. Okazaki, Eds. (Association for Computational Linguistics, 2023), pp. 7479–7498; https://doi.org/10.18653/v1/2023.findings-acl.472.

A. Mallen, A. Asai, V. Zhong, R. Das, D. Khashabi, H. Hajishirzi, “When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, N. Okazaki, Eds. (Association for Computational Linguistics, Toronto, Canada, 2023), pp. 9802–9822; https://doi.org/10.18653/v1/2023.acl-long.546.

S. Santurkar, E. Durmus, F. Ladhak, C. Lee, P. Liang, T. Hashimoto, “Whose Opinions Do Language Models Reflect?” in Proceedings of the 40th International Conference on Machine Learning (JMLR, Honolulu, Hawaii, USA, 2023) vol. 202 of ICML’23, pp. 29971–30004; https://proceedings.mlr.press/v202/santurkar23a.html.

L. Weidinger, J. Uesato, M. Rauh, C. Griffin, P.-S. Huang, J. Mellor, A. Glaese, M. Cheng, B. Balle, A. Kasirzadeh, C. Biles, S. Brown, Z. Kenton, W. Hawkins, T. Stepleton, A. Birhane, L. A. Hendricks, … I. Gabriel, “Taxonomy of Risks Posed by Language Models” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 214–229; https://doi.org/10.1145/3531146.3533088.

* M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, … W. Zaremba, Evaluating Large Language Models Trained on Code, arXiv [cs.LG] (2021); http://arxiv.org/abs/2107.03374.

S. Nguyen, H. M. Babe, Y. Zi, A. Guha, C. J. Anderson, M. Q. Feldman, “How Beginning Programmers and Code LLMs (Mis)read Each Other” in Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24) (Association for Computing Machinery, New York, NY, USA, 2024), pp. 1–26; https://doi.org/10.1145/3613904.3642706.

F. Cassano, L. Li, A. Sethi, N. Shinn, A. Brennan-Jones, J. Ginesin, E. Berman, G. Chakhnashvili, A. Lozhkov, C. J. Anderson, A. Guha, Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions, arXiv [cs.SE] (2023); http://arxiv.org/abs/2312.12450.

R. Pan, A. R. Ibrahimzada, R. Krishna, D. Sankar, L. P. Wassi, M. Merler, B. Sobolev, R. Pavuluri, S. Sinha, R. Jabbarvand, “Lost in Translation: A Study of Bugs Introduced by Large Language Models While Translating Code” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24) (Association for Computing Machinery, New York, NY, USA, 2024), pp. 1–13; https://doi.org/10.1145/3597503.3639226.

N. Perry, M. Srivastava, D. Kumar, D. Boneh, “Do Users Write More Insecure Code with AI Assistants?” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (ACM, New York, NY, USA, 2023), pp. 2785–2799; https://doi.org/10.1145/3576915.3623157.

A. Perlman, The Implications of ChatGPT for Legal Services and Society, The Practice (2023); https://clp.law.harvard.edu/knowledge-hub/magazine/issues/generative-ai-in-the-legal-profession/the-implications-of-chatgpt-for-legal-services-and-society/.

E. Martínez, Re-Evaluating GPT-4’s Bar Exam Performance. Artificial Intelligence and Law (2024); https://doi.org/10.1007/s10506-024-09396-9.

Eastern District of Texas, US District Court, Memorandum and Order in Case 1:23-cv-00281-MAC (2024); https://www.courthousenews.com/wp-content/uploads/2024/11/attorney-sanctioned-for-using-ai-hallucinations.pdf.

J. A. Omiye, J. C. Lester, S. Spichak, V. Rotemberg, R. Daneshjou, Large Language Models Propagate Race-Based Medicine. Npj Digital Medicine 6, 1–4 (2023); https://doi.org/10.1038/s41746-023-00939-z.

T. H. Kung, M. Cheatham, A. Medenilla, C. Sillos, L. De Leon, C. Elepaño, M. Madriaga, R. Aggabao, G. Diaz-Candido, J. Maningo, V. Tseng, Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models. PLOS Digital Health 2, e0000198 (2023); https://doi.org/10.1371/journal.pdig.0000198.

K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, A. Tanwani, H. Cole-Lewis, S. Pfohl, P. Payne, M. Seneviratne, P. Gamble, C. Kelly, A. Babiker, N. Schärli, A. Chowdhery, … V. Natarajan, Large Language Models Encode Clinical Knowledge. Nature 620, 172–180 (2023); https://doi.org/10.1038/s41586-023-06291-2.

J. Tan, H. Westermann, K. Benyekhlef, “ChatGPT as an Artificial Lawyer?” in Workshop on Artificial Intelligence for Access to Justice (AI4AJ 2023) (CEUR Workshop Proceedings, Braga, Portugal, 2023); https://ceur-ws.org/Vol-3435/short2.pdf.

J. L. M. Brand, Air Canada’s Chatbot Illustrates Persistent Agency and Responsibility Gap Problems for AI. AI & Society, 1–3 (2024); https://doi.org/10.1007/s00146-024-02096-7.

* Z. Yuan, H. Yuan, C. Tan, W. Wang, S. Huang, How Well Do Large Language Models Perform in Arithmetic Tasks?, arXiv [cs.CL] (2023); http://arxiv.org/abs/2304.02015.

Z. Wang, “CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models” in Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10) (2024), pp. 143–151; https://aclanthology.org/2024.sighan-1.17.pdf.

X. Yin, J. Jiang, L. Yang, X. Wan, History Matters: Temporal Knowledge Editing in Large Language Model. Proceedings of the AAAI Conference on Artificial Intelligence 38, 19413–19421 (2024); https://doi.org/10.1609/aaai.v38i17.29912.

I. D. Raji, I. E. Kumar, A. Horowitz, A. Selbst, “The Fallacy of AI Functionality” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 959–972; https://doi.org/10.1145/3531146.3533158.

B. Vidgen, A. Agrawal, A. M. Ahmed, V. Akinwande, N. Al-Nuaimi, N. Alfaraj, E. Alhajjar, L. Aroyo, T. Bavalatti, M. Bartolo, B. Blili-Hamelin, K. Bollacker, R. Bommasani, M. F. Boston, S. Campos, K. Chakra, C. Chen, … J. Vanschoren, Introducing v0.5 of the AI Safety Benchmark from MLCommons, arXiv [cs.CL] (2024); http://arxiv.org/abs/2404.12241.

P. Guldimann, A. Spiridonov, R. Staab, N. Jovanović, M. Vero, V. Vechev, A. Gueorguieva, M. Balunović, N. Konstantinov, P. Bielik, P. Tsankov, M. Vechev, COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act, arXiv [cs.CL] (2024); http://arxiv.org/abs/2410.07959.

OECD.AI Policy Observatory, OECD AI Incidents Monitor (AIM) (2024); https://oecd.ai/en/incidents.

A. Wei, N. Haghtalab, J. Steinhardt, “Jailbroken: How Does LLM Safety Training Fail?” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=jA235JGM09.

S. M. T. I. Tonmoy, S. M. M. Zaman, V. Jain, A. Rani, V. Rawte, A. Chadha, A. Das, A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models, arXiv [cs.CL] (2024); http://arxiv.org/abs/2401.01313.

ETH Zurich, INSAIT, LatticeFlow AI, COMPL-AI (2024); https://compl-ai.org/.

N. Guha, J. Nyarko, D. E. Ho, C. Ré, A. Chilton, A. Narayana, A. Chohlas-Wood, A. Peters, B. Waldon, D. N. Rockmore, D. Zambrano, D. Talisman, E. Hoque, F. Surani, F. Fagan, G. Sarfaty, G. M. Dickinson, … Z. Li, “LEGALBENCH: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models” in 37th International Conference on Neural Information Processing Systems (NeurIPS 2023) (Curran Associates Inc., Red Hook, NY, USA, 2024), pp. 44123–44279; https://doi.org/10.5555/3666122.3668037.

R. Xu, Z. Wang, R.-Z. Fan, P. Liu, Benchmarking Benchmark Leakage in Large Language Models, arXiv [cs.CL] (2024); http://arxiv.org/abs/2404.18824.

S. Longpre, S. Biderman, A. Albalak, H. Schoelkopf, D. McDuff, S. Kapoor, K. Klyman, K. Lo, G. Ilharco, N. San, M. Rauh, A. Skowron, B. Vidgen, L. Weidinger, A. Narayanan, V. Sanh, D. Adelani, … L. Soldaini, The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources. Transactions on Machine Learning Research (2024); https://openreview.net/pdf?id=tH1dQH20eZ.

V. Ojewale, R. Steed, B. Vecchione, A. Birhane, I. D. Raji, Towards AI Accountability Infrastructure: Gaps and Opportunities in AI Audit Tooling, arXiv [cs.CY] (2024); http://arxiv.org/abs/2402.17861.

N. Guha, C. M. Lawrence, L. A. Gailmard, K. T. Rodolfa, F. Surani, R. Bommasani, I. D. Raji, M.-F. Cuéllar, C. Honigsberg, P. Liang, D. E. Ho, AI Regulation Has Its Own Alignment Problem: The Technical and Institutional Feasibility of Disclosure, Registration, Licensing, and Auditing. The George Washington Law Review 92 (2024); https://dho.stanford.edu/wp-content/uploads/AI_Regulation.pdf.

A. Narayanan, S. Kapoor, AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference (Princeton University Press, 2024); https://doi.org/10.1515/9780691249643.

J. Buolamwini, T. Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” in Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAT* ’18) (PMLR, 2018), pp. 77–91; https://proceedings.mlr.press/v81/buolamwini18a.html.

J. Angwin, J. Larson, L. Kirchner, S. Mattu, Machine Bias, ProPublica (2016); https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

J. Dressel, H. Farid, The Accuracy, Fairness, and Limits of Predicting Recidivism. Science Advances 4, eaao5580 (2018); https://doi.org/10.1126/sciadv.aao5580.

Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan, Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations. Science 366, 447–453 (2019); https://doi.org/10.1126/science.aax2342.

T. Zack, E. Lehman, M. Suzgun, J. A. Rodriguez, L. A. Celi, J. Gichoya, D. Jurafsky, P. Szolovits, D. W. Bates, R.-E. E. Abdulnour, A. J. Butte, E. Alsentzer, Assessing the Potential of GPT-4 to Perpetuate Racial and Gender Biases in Health Care: A Model Evaluation Study. The Lancet. Digital Health 6, e12–e22 (2024); https://doi.org/10.1016/S2589-7500(23)00225-X.

F. Bianchi, P. Kalluri, E. Durmus, F. Ladhak, M. Cheng, D. Nozza, T. Hashimoto, D. Jurafsky, J. Zou, A. Caliskan, “Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale” in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 1493–1504; https://doi.org/10.1145/3593013.3594095.

S. Ghosh, A. Caliskan, “‘person’ == Light-Skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion” in Findings of the Association for Computational Linguistics: EMNLP 2023 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2023), pp. 6971–6985; https://doi.org/10.18653/v1/2023.findings-emnlp.465.

M. Cheong, E. Abedin, M. Ferreira, R. Reimann, S. Chalson, P. Robinson, J. Byrne, L. Ruppanner, M. Alfano, C. Klein, Investigating Gender and Racial Biases in DALL-E Mini Images. ACM Journal on Responsible Computing 1, 1–20 (2024); https://doi.org/10.1145/3649883.

J. S. Park, M. S. Bernstein, R. N. Brewer, E. Kamar, M. R. Morris, “Understanding the Representation and Representativeness of Age in AI Data Sets” in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21) (Association for Computing Machinery, New York, NY, USA, 2021), pp. 834–842; https://doi.org/10.1145/3461702.3462590.

R. Kamikubo, L. Wang, C. Marte, A. Mahmood, H. Kacorri, “Data Representativeness in Accessibility Datasets: A Meta-Analysis” in Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 1–15; https://doi.org/10.1145/3517428.3544826.

* S. Shankar, Y. Halpern, E. Breck, J. Atwood, J. Wilson, D. Sculley, “No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World” in 31st Conference on Neural Information Processing Systems (NIPS 2017) Machine Learning for the Developing World Workshop (Long Beach, CA, USA, 2017); https://arxiv.org/abs/1711.08536.

T. de Vries, I. Misra, C. Wang, L. van der Maaten, “Does Object Recognition Work for Everyone?” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (Long Beach, CA, USA, 2019); https://openaccess.thecvf.com/content_CVPRW_2019/papers/cv4gc/de_Vries_Does_Object_Recognition_Work_for_Everyone_CVPRW_2019_paper.pdf.

S. Longpre, R. Mahari, A. Chen, N. Obeng-Marnu, D. Sileo, W. Brannon, N. Muennighoff, N. Khazam, J. Kabbara, K. Perisetla, X. Wu, E. Shippole, K. Bollacker, T. Wu, L. Villa, S. Pentland, S. Hooker, The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.16787.

H. Suresh, J. Guttag, “A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle” in Equity and Access in Algorithms, Mechanisms, and Optimization (ACM, New York, NY, USA, 2021); https://doi.org/10.1145/3465416.3483305.

* L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh, Z. Kenton, S. Brown, W. Hawkins, T. Stepleton, C. Biles, A. Birhane, J. Haas, … I. Gabriel, “Ethical and Social Risks of Harm from Language Models” (Google DeepMind, 2021); http://arxiv.org/abs/2112.04359.

J. Nwatu, O. Ignat, R. Mihalcea, “Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 10686–10702; https://doi.org/10.18653/v1/2023.emnlp-main.660.

A. Pouget, L. Beyer, E. Bugliarello, X. Wang, A. P. Steiner, X. Zhai, I. Alabdulmohsin, “No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/pdf?id=UmW9BYj761.

S. Nayak, K. Jain, R. Awal, S. Reddy, S. Van Steenkiste, L. A. Hendricks, K. Stanczak, A. Agrawal, Benchmarking Vision Language Models for Cultural Understanding (Association for Computational Linguistics, 2024); https://aclanthology.org/2024.emnlp-main.329.

D. Agarwal, M. Naaman, A. Vashistha, AI Suggestions Homogenize Writing toward Western Styles and Diminish Cultural Nuances, arXiv [cs.HC] (2024); http://arxiv.org/abs/2409.11360.

N. Shahbazi, Y. Lin, A. Asudeh, H. V. Jagadish, Representation Bias in Data: A Survey on Identification and Resolution Techniques. ACM Computing Surveys 55, 293:1–293:39 (2023); https://doi.org/10.1145/3588433.

S. E. Whang, Y. Roh, H. Song, J.-G. Lee, Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective. The VLDB Journal: Very Large Data Bases: A Publication of the VLDB Endowment 32, 791–813 (2023); https://doi.org/10.1007/s00778-022-00775-9.

A. P. Gema, J. O. J. Leang, G. Hong, A. Devoto, A. C. M. Mancino, R. Saxena, X. He, Y. Zhao, X. Du, M. R. G. Madani, C. Barale, R. McHardy, J. Harris, J. Kaddour, E. van Krieken, P. Minervini, Are We Done with MMLU?, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.04127.

Y. Wan, G. Pu, J. Sun, A. Garimella, K.-W. Chang, N. Peng, “‘Kelly Is a Warm Person, Joseph Is a Role Model’: Gender Biases in LLM-Generated Reference Letters” in Findings of the Association for Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 3730–3748; https://doi.org/10.18653/v1/2023.findings-emnlp.243.

D. van Niekerk, M. Pérez-Ortiz, J. Shawe-Taylor, D. Orlič, I. Drobnjak, J. Kay, N. Siegel, K. Evans, N. Moorosi, T. Eliassi-Rad, L. M. Tanczer, W. Holmes, M. P. Deisenroth, I. Straw, M. Fasli, R. Adams, N. Oliver, … M. Janicky, “Challenging Systematic Prejudices: An Investigation into Bias Against Women and Girls in Large Language Models” (UNESCO, IRCAI, 2024); https://ircai.org/project/challenging-systematic-prejudices/.

M. Vlasceanu, D. M. Amodio, Propagation of Societal Gender Inequality by Internet Search Algorithms. Proceedings of the National Academy of Sciences 119, e2204529119 (2022); https://doi.org/10.1073/pnas.2204529119.

S. Sterlie, N. Weng, A. Feragen, Generalizing Fairness to Generative Language Models via Reformulation of Non-Discrimination Criteria, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.08564.

T. Sandoval-Martin, E. Martínez-Sanzo, Perpetuation of Gender Bias in Visual Representation of Professions in the Generative AI Tools DALL·E and Bing Image Creator. Social Sciences (Basel, Switzerland) 13, 250 (2024); https://doi.org/10.3390/socsci13050250.

L. Sun, M. Wei, Y. Sun, Y. J. Suh, L. Shen, S. Yang, Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image-Generative AI. Journal of Computer-Mediated Communication: JCMC 29, zmad045 (2023); https://doi.org/10.1093/jcmc/zmad045.

Y. Wan, K.-W. Chang, The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-to-Image Generation of Dual Subjects, arXiv [cs.CV] (2024); http://arxiv.org/abs/2402.11089.

A. Nielsen, A. Woemmel, “Invisible Inequities: Confronting Age-Based Discrimination in Machine Learning Research and Applications” in 2nd Workshop on Generative AI and Law (2024); https://blog.genlaw.org/pdfs/genlaw_icml2024/50.pdf.

C. Harris, Mitigating Age Biases in Resume Screening AI Models. The International FLAIRS Conference Proceedings 36 (2023); https://doi.org/10.32473/flairs.36.133236.

J. Stypinska, AI Ageism: A Critical Roadmap for Studying Age Discrimination and Exclusion in Digitalized Societies. AI & Society 38, 665–677 (2023); https://doi.org/10.1007/s00146-022-01553-5.

R. Naik, B. Nushi, “Social Biases through the Text-to-Image Generation Lens” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 786–808; https://doi.org/10.1145/3600211.3604711.

* A. Tamkin, A. Askell, L. Lovitt, E. Durmus, N. Joseph, S. Kravec, K. Nguyen, J. Kaplan, D. Ganguli, Evaluating and Mitigating Discrimination in Language Model Decisions, arXiv [cs.CL] (2023); http://arxiv.org/abs/2312.03689.

M. Kamruzzaman, Shovon, G. Kim, Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models (Association for Computational Linguistics, 2024); https://doi.org/10.18653/v1/2024.findings-acl.530.

C. H. Chu, S. Donato-Woodger, S. S. Khan, R. Nyrup, K. Leslie, A. Lyn, T. Shi, A. Bianchi, S. A. Rahimi, A. Grenier, Age-Related Bias and Artificial Intelligence: A Scoping Review. Humanities & Social Sciences Communications 10, 1–17 (2023); https://doi.org/10.1057/s41599-023-01999-y.

T. Kamelski, D. Klinge, Generative Artificial Intelligence and Digital Ageism: Exploring the Construction of Age and Aging by Image-Generating AI (2024); https://doi.org/10.31219/osf.io/p3sdj.

K. A. Mack, R. Qadri, R. Denton, S. K. Kane, C. L. Bennett, “‘They Only Care to Show Us the Wheelchair’: Disability Representation in Text-to-Image AI Models” in Proceedings of the CHI Conference on Human Factors in Computing Systems (ACM, New York, NY, USA, 2024) vol. 22, pp. 1–23; https://doi.org/10.1145/3613904.3642166.

P. N. Venkit, M. Srinath, S. Wilson, “Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models” in Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), A. Ovalle, K.-W. Chang, N. Mehrabi, Y. Pruksachatkun, A. Galstyan, J. Dhamala, A. Verma, T. Cao, A. Kumar, R. Gupta, Eds. (Association for Computational Linguistics, Toronto, Canada, 2023), pp. 26–34; https://doi.org/10.18653/v1/2023.trustnlp-1.3.

K. Glazko, Y. Mohammed, B. Kosa, V. Potluri, J. Mankoff, “Identifying and Improving Disability Bias in GPT-Based Resume Screening” in The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2024); https://doi.org/10.1145/3630106.3658933.

N. Shahin, L. Ismail, “ChatGPT, Let Us Chat Sign Language: Experiments, Architectural Elements, Challenges and Research Directions” in 2023 International Symposium on Networks, Computers and Communications (ISNCC) (IEEE, 2023), pp. 1–7; https://doi.org/10.1109/isncc58260.2023.10323974.

S. Gueuwou, K. Takyi, M. Müller, M. S. Nyarko, R. Adade, R.-M. O. M. Gyening, “AfriSign: Machine Translation for African Sign Languages” in 4th Workshop on African Natural Language Processing (AfricaNLP 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=EHldk3J2xk.

J. Hartmann, J. Schwenzow, M. Witte, The Political Ideology of Conversational AI: Converging Evidence on ChatGPT’s pro-Environmental, Left-Libertarian Orientation, arXiv [cs.CL] (2023); http://arxiv.org/abs/2301.01768.

F. Motoki, V. Pinho Neto, V. Rodrigues, More Human than Human: Measuring ChatGPT Political Bias. Public Choice 198, 3–23 (2024); https://doi.org/10.1007/s11127-023-01097-2.

D. Rozado, The Political Biases of ChatGPT. Social Sciences (Basel, Switzerland) 12, 148 (2023); https://doi.org/10.3390/socsci12030148.

J. Rutinowski, S. Franke, J. Endendyk, I. Dormuth, M. Roidl, M. Pauly, The Self-Perception and Political Biases of ChatGPT. Human Behavior and Emerging Technologies 2024, 1–9 (2024); https://doi.org/10.1155/2024/7115633.

M. Buyl, A. Rogiers, S. Noels, I. Dominguez-Catena, E. Heiter, R. Romero, I. Johary, A.-C. Mara, J. Lijffijt, T. De Bie, Large Language Models Reflect the Ideology of Their Creators, arXiv [cs.CL] (2024); http://arxiv.org/abs/2410.18417.

* T. Choudhary, Political Bias in AI-Language Models: A Comparative Analysis of ChatGPT-4, Perplexity, Google Gemini, and Claude, Techrxiv (2024); https://doi.org/10.36227/techrxiv.172107441.12283354/v1.

S. Feng, C. Y. Park, Y. Liu, Y. Tsvetkov, From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models (Association for Computational Linguistics, 2023); https://doi.org/10.18653/v1/2023.acl-long.656.

L. Rettenberger, M. Reischl, M. Schutera, Assessing Political Bias in Large Language Models, arXiv [cs.CL] (2024); http://arxiv.org/abs/2405.13041.

S. Fujimoto, K. Takemoto, Revisiting the Political Biases of ChatGPT. Frontiers in Artificial Intelligence 6, 1232003 (2023); https://doi.org/10.3389/frai.2023.1232003.

C. Walker, J. C. Timoneda, Identifying the Sources of Ideological Bias in GPT Models through Linguistic Variation in Output, arXiv [cs.CL] (2024); http://arxiv.org/abs/2409.06043.

T. Ceron, N. Falk, A. Barić, D. Nikolaev, S. Padó, Beyond Prompt Brittleness: Evaluating the Reliability and Consistency of Political Worldviews in LLMs. Transactions of the Association for Computational Linguistics 12, 1378–1400 (2024); https://doi.org/10.1162/tacl_a_00710.

E. Perez, S. Ringer, K. Lukosiute, K. Nguyen, E. Chen, S. Heiner, C. Pettit, C. Olsson, S. Kundu, S. Kadavath, A. Jones, A. Chen, B. Mann, B. Israel, B. Seethor, C. McKinnon, C. Olah, … J. Kaplan, “Discovering Language Model Behaviors with Model-Written Evaluations” in Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, N. Okazaki, Eds. (Association for Computational Linguistics, Toronto, Canada, 2023), pp. –13434; https://doi.org/10.18653/v1/2023.findings-acl.847.

J. Fisher, S. Feng, R. Aron, T. Richardson, Y. Choi, D. W. Fisher, J. Pan, Y. Tsvetkov, K. Reinecke, Biased AI Can Influence Political Decision-Making, arXiv [cs.HC] (2024); http://arxiv.org/abs/2410.06415.

U. Messer, How Do People React to Political Bias in Generative Artificial Intelligence (AI)? Computers in Human Behavior: Artificial Humans, 100108 (2024); https://doi.org/10.1016/j.chbah.2024.100108.

Á. A. Cabrera, W. Epperson, F. Hohman, M. Kahng, J. Morgenstern, D. H. Chau, “FAIRVIS: Visual Analytics for Discovering Intersectional Bias in Machine Learning” in 2019 IEEE Conference on Visual Analytics Science and Technology (VAST) (2019), pp. 46–56; https://doi.org/10.1109/VAST47406.2019.8986948.

W. Guo, A. Caliskan, “Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases” in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21) (Association for Computing Machinery, New York, NY, USA, 2021), pp. 122–133; https://doi.org/10.1145/3461702.3462536.

I. M. S. Lassen, M. Almasi, K. Enevoldsen, R. D. Kristensen-McLachlan, “Detecting Intersectionality in NER Models: A Data-Driven Approach” in Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, S. Degaetano-Ortlieb, A. Kazantseva, N. Reiter, S. Szpakowicz, Eds. (Association for Computational Linguistics, Dubrovnik, Croatia, 2023), pp. 116–127; https://doi.org/10.18653/v1/2023.latechclfl-1.13.

A. Ovalle, A. Subramonian, V. Gautam, G. Gee, K.-W. Chang, “Factoring the Matrix of Domination: A Critical Review and Reimagination of Intersectionality in AI Fairness” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 496–511; https://doi.org/10.1145/3600211.3604705.

K. Wilson, A. Caliskan, Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval, arXiv [cs.CY] (2024); http://arxiv.org/abs/2407.20371.

X. Fang, S. Che, M. Mao, H. Zhang, M. Zhao, X. Zhao, Bias of AI-Generated Content: An Examination of News Produced by Large Language Models. Scientific Reports 14, 5224 (2024); https://doi.org/10.1038/s41598-024-55686-2.

H. An, C. Acquaye, C. Wang, Z. Li, R. Rudinger, “Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender?” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (Association for Computational Linguistics, Stroudsburg, PA, USA, 2024), pp. 386–397; https://doi.org/10.18653/v1/2024.acl-short.37.

R. Navigli, S. Conia, B. Ross, Biases in Large Language Models: Origins, Inventory, and Discussion. J. Data and Information Quality 15, 1–21 (2023); https://doi.org/10.1145/3597307.

Y. Li, M. Du, R. Song, X. Wang, Y. Wang, A Survey on Fairness in Large Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2308.10149.

* S. Mukherjee, A. Mitra, G. Jawahar, S. Agarwal, H. Palangi, A. Awadallah, Orca: Progressive Learning from Complex Explanation Traces of GPT-4, arXiv [cs.CL] (2023); http://arxiv.org/abs/2306.02707.

E. Ferrara, Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies. Sci 6, 3 (2023); https://doi.org/10.3390/sci6010003.

S. U. Noble, Algorithms of Oppression: How Search Engines Reinforce Racism, NYU Press (2019); https://nyupress.org/9781479837243/algorithms-of-oppression/.

S. Lazar, A. Nelson, AI Safety on Whose Terms? Science 381, 138 (2023); https://doi.org/10.1126/science.adi8982.

R. I. J. Dobbe, T. K. Gilbert, Y. Mintz, “Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments” in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES ’20) (Association for Computing Machinery, New York, NY, USA, 2020), p. 242; https://doi.org/10.1145/3375627.3375861.

M. Shur-Ofry, Multiplicity as an AI Governance Principle (2023); https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4444354.

M. Sloane, E. Moss, O. Awomolo, L. Forlano, “Participation Is Not a Design Fix for Machine Learning” in Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 1–6; https://doi.org/10.1145/3551624.3555285.

H. Gonen, Y. Goldberg, “Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But Do Not Remove Them” in Proceedings of the 2019 Workshop on Widening NLP, A. Axelrod, D. Yang, R. Cunha, S. Shaikh, Z. Waseem, Eds. (Association for Computational Linguistics, Florence, Italy, 2019), pp. –63; https://aclanthology.org/W19-3621.

J. Xiao, Z. Li, X. Xie, E. Getzen, C. Fang, Q. Long, W. J. Su, On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization, arXiv [stat.ML] (2024); http://arxiv.org/abs/2405.16455.

D. Y. Kim, C. Wallraven, “Label Quality in AffectNet: Results of Crowd-Based Re-Annotation” in Lecture Notes in Computer Science (Springer International Publishing, Cham, 2022), pp. 518–531; https://doi.org/10.1007/978-3-031-02444-3_39.

J. Ma, Y. Ushiku, M. Sagara, “The Effect of Improving Annotation Quality on Object Detection Datasets: A Preliminary Study” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2022), pp. 4849–4858; https://doi.org/10.1109/CVPRW56347.2022.00532.

Z. Xu, K. Peng, L. Ding, D. Tao, X. Lu, Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.09963.

H. Weerts, F. Pfisterer, M. Feurer, K. Eggensperger, E. Bergman, N. Awad, J. Vanschoren, M. Pechenizkiy, B. Bischl, F. Hutter, Can Fairness Be Automated? Guidelines and Opportunities for Fairness-Aware AutoML. The Journal of Artificial Intelligence Research 79, 639–677 (2024); https://doi.org/10.1613/jair.1.14747.

I. D. Raji, J. Buolamwini, “Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products” in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (ACM, New York, NY, USA, 2019); https://doi.org/10.1145/3306618.3314244.

D. Zhang, P. Finckenberg-Broman, T. Hoang, S. Pan, Z. Xing, M. Staples, X. Xu, Right to Be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions. AI and Ethics (2024); https://doi.org/10.1007/s43681-024-00573-9.

* A. Xiang, Being “Seen” vs. “Mis-Seen”: Tensions between Privacy and Fairness in Computer Vision. Harvard Journal of Law & Technology 36 (2022); https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4068921.

J. Kleinberg, “Inherent Trade-Offs in Algorithmic Fairness” in Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS ’18) (Association for Computing Machinery, New York, NY, USA, 2018), p. 40; https://doi.org/10.1145/3219617.3219634.

H. Nilforoshan, J. D. Gaebler, R. Shroff, S. Goel, “Causal Conceptions of Fairness and Their Consequences” in Proceedings of the 39th International Conference on Machine Learning (ICML 2022) (PMLR, 2022); https://proceedings.mlr.press/v162/nilforoshan22a.html.

N. Konstantinov, C. H. Lampert, “On the Impossibility of Fairness-Aware Learning from Corrupted Data” in Algorithmic Fairness through the Lens of Causality and Robustness Workshop (AFCR 2021) (PMLR, Virtual, 2021); https://proceedings.mlr.press/v171/konstantinov22a.html.

A. Chouldechova, Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5, 153–163 (2017); https://doi.org/10.1089/big.2016.0047.

Q. Zhang, J. Liu, Z. Zhang, J. Wen, B. Mao, X. Yao, Mitigating Unfairness via Evolutionary Multiobjective Ensemble Learning. IEEE Transactions on Evolutionary Computation 27, 848–862 (2023); https://doi.org/10.1109/TEVC.2022.3209544.

M. Hardt, E. Price, N. Srebro, “Equality of Opportunity in Supervised Learning” in 30th Conference on Neural Information Processing Systems (NIPS 2016) (Curran Associates, Inc., Barcelona, Spain, 2016) vol. 29; https://proceedings.neurips.cc/paper_files/paper/2016/hash/9d2682367c3935defcb1f9e247a97c0d-Abstract.html.

M. Brcic, R. V. Yampolskiy, Impossibility Results in AI: A Survey. ACM Comput. Surv. 56, 1–24 (2023); https://doi.org/10.1145/3603371.

B. Green, Escaping the Impossibility of Fairness: From Formal to Substantive Algorithmic Fairness. Philosophy & Technology 35, 90 (2022); https://doi.org/10.1007/s13347-022-00584-6.

A. Bell, L. Bynum, N. Drushchak, T. Zakharchenko, L. Rosenblatt, J. Stoyanovich, “The Possibility of Fairness: Revisiting the Impossibility Theorem in Practice” in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 400–422; https://doi.org/10.1145/3593013.3594007.

K. T. Rodolfa, H. Lamba, R. Ghani, Empirical Observation of Negligible Fairness–accuracy Trade-Offs in Machine Learning for Public Policy. Nature Machine Intelligence 3, 896–904 (2021); https://doi.org/10.1038/s42256-021-00396-x.

V. Hofmann, P. R. Kalluri, D. Jurafsky, S. King, Dialect Prejudice Predicts AI Decisions about People’s Character, Employability, and Criminality, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.00742.

R. L. Johnson, G. Pistilli, N. Menédez-González, L. D. D. Duran, E. Panai, J. Kalpokiene, D. J. Bertulfo, The Ghost in the Machine Has an American Accent: Value Conflict in GPT-3, arXiv [cs.CL] (2022); http://arxiv.org/abs/2203.07785.

E. Durmus, K. Nguyen, T. Liao, N. Schiefer, A. Askell, A. Bakhtin, C. Chen, Z. Hatfield-Dodds, D. Hernandez, N. Joseph, L. Lovitt, S. McCandlish, O. Sikder, A. Tamkin, J. Thamkul, J. Kaplan, J. Clark, D. Ganguli, “Towards Measuring the Representation of Subjective Global Opinions in Language Models” in First Conference on Language Modeling (2024); https://openreview.net/pdf?id=zl16jLb91v.

Y. Wan, K.-W. Chang, White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs, arXiv [cs.CL] (2024); http://arxiv.org/abs/2404.10508.

B. AlKhamissi, M. ElNokrashy, M. Alkhamissi, M. Diab, “Investigating Cultural Alignment of Large Language Models” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Association for Computational Linguistics, Stroudsburg, PA, USA, 2024), pp. 12404–12422; https://doi.org/10.18653/v1/2024.acl-long.671.

H. Yuan, Z. Che, S. Li, Y. Zhang, X. Hu, S. Luo, The High Dimensional Psychological Profile and Cultural Bias of ChatGPT, arXiv [cs.CL] (2024); http://arxiv.org/abs/2405.03387.

R. Hada, S. Husain, V. Gumma, H. Diddee, A. Yadavalli, A. Seth, N. Kulkarni, U. Gadiraju, A. Vashistha, V. Seshadri, K. Bali, “Akal Badi Ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology” in The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2024); https://doi.org/10.1145/3630106.3659017.

M. H. J. Lee, J. M. Montgomery, C. K. Lai, “Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans” in The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2024); https://doi.org/10.1145/3630106.3658975.

C. Raj, A. Mukherjee, A. Caliskan, A. Anastasopoulos, Z. Zhu, Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7, 1180–1189 (2024); https://ojs.aaai.org/index.php/AIES/article/view/31715.

D. Oba, M. Kaneko, D. Bollegala, “In-Contextual Gender Bias Suppression for Large Language Models” in Findings of the Association for Computational Linguistics: EACL 2024 (2024), pp. 1722–1742; https://aclanthology.org/2024.findings-eacl.121.pdf.

Y. Reif, R. Schwartz, “Beyond Performance: Quantifying and Mitigating Label Bias in LLMs” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (Association for Computational Linguistics, Stroudsburg, PA, USA, 2024), pp. 6784–6798; https://doi.org/10.18653/v1/2024.naacl-long.378.

M. Ribeiro, B. Malcorra, N. B. Mota, R. Wilkens, A. Villavicencio, L. C. Hubner, C. Rennó-Costa, A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification, arXiv [cs.CL] (2024); http://arxiv.org/abs/2410.00250.

L. Luo, Y.-F. Li, G. Haffari, S. Pan, “Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning” in 12th International Conference on Learning Representations (2023); https://openreview.net/pdf?id=ZGNWW7xZ6Q.

S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, X. Wu, Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Transactions on Knowledge and Data Engineering 36, 3580–3599 (2024); https://doi.org/10.1109/tkde.2024.3352100.

S. A. Friedler, C. Scheidegger, S. Venkatasubramanian, The (Im)possibility of Fairness: Different Value Systems Require Different Mechanisms for Fair Decision Making. Communications of the ACM 64, 136–143 (2021); https://doi.org/10.1145/3433949.

J. Banja, J. W. Gichoya, N. Martinez-Martin, L. A. Waller, G. D. Clifford, Fairness as an Afterthought: An American Perspective on Fairness in Model Developer-Clinician User Collaborations. PLOS Digital Health 2, e0000386 (2023); https://doi.org/10.1371/journal.pdig.0000386.

N. A. Saxena, K. Huang, E. DeFilippis, G. Radanovic, D. C. Parkes, Y. Liu, “How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness” in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’19) (Association for Computing Machinery, New York, NY, USA, 2019), pp. 99–106; https://doi.org/10.1145/3306618.3314248.

W. Fleisher, “What’s Fair about Individual Fairness?” in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21) (Association for Computing Machinery, New York, NY, USA, 2021), pp. 480–490; https://doi.org/10.1145/3461702.3462621.

A. M. Turing, Intelligent Machinery, A Heretical Theory*. Philosophia Mathematica. Series III 4, 256–260 (1996); https://doi.org/10.1093/philmat/4.3.256.

I. J. Good, “Speculations Concerning the First Ultraintelligent Machine” in Advances in Computers, F. L. Alt, M. Rubinoff, Eds. (Elsevier, 1966) vol. 6, pp. 31–88; https://doi.org/10.1016/S0065-2458(08)60418-0.

N. Wiener, Some Moral and Technical Consequences of Automation. Science 131, 1355–1358 (1960); https://doi.org/10.1126/science.131.3410.1355.

S. M. Omohundro, “The Basic AI Drives” in Proceedings of the 2008 Conference on Artificial General Intelligence 2008: Proceedings of the First AGI Conference (IOS Press, NLD, 2008), pp. 483–492; https://dl.acm.org/doi/10.5555/1566174.1566226.

N. Bostrom, M. M. Cirkovic, Global Catastrophic Risks (Oxford University Press, London, England, 2011); https://academic.oup.com/book/40615.

S. Russell, P. Norvig, Artificial Intelligence (Pearson, Upper Saddle River, NJ, ed. 3, 2009); https://people.engr.tamu.edu/guni/csce421/files/AI_Russell_Norvig.pdf.

N. Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford University Press, London, England, 2014); https://global.oup.com/academic/product/superintelligence-9780198739838?cc=mx&lang=en&.

S. J. Russell, Human Compatible: Artificial Intelligence and the Problem of Control (Penguin Books, Harlow, England, 2020); https://www.penguin.co.uk/books/307948/human-compatible-by-russell-stuart/9780141987507.

Center for AI Safety, Statement on AI Risk: AI Experts and Public Figures Express Their Concern about AI Risk (2024); https://www.safe.ai/work/statement-on-ai-risk.

Y. Bengio, Written Statement of Professor Yoshua Bengio Before the US Senate Forum on AI Insight Regarding Risk, Alignment, and Guarding Against Doomsday Scenarios. (2023); https://www.schumer.senate.gov/imo/media/doc/Yoshua%20Benigo%20-%20Statement.pdf.

K. Grace, H. Stewart, J. F. Sandkühler, S. Thomas, B. Weinstein-Raun, J. Brauner, Thousands of AI Authors on the Future of AI, arXiv [cs.CY] (2024); http://arxiv.org/abs/2401.02843.

A. Critch, S. Russell, TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI, arXiv [cs.AI] (2023); http://arxiv.org/abs/2306.06924.

K. Goddard, A. Roudsari, J. C. Wyatt, Automation Bias: A Systematic Review of Frequency, Effect Mediators, and Mitigators. Journal of the American Medical Informatics Association: JAMIA 19, 121–127 (2012); https://doi.org/10.1136/amiajnl-2011-000089.

M. Chugunova, D. Sele, We and It: An Interdisciplinary Review of the Experimental Evidence on How Humans Interact with Machines. Journal of Behavioral and Experimental Economics 99, 101897 (2022); https://doi.org/10.1016/j.socec.2022.101897.

A. Kasirzadeh, Two Types of AI Existential Risk: Decisive and Accumulative, arXiv [cs.CY] (2024); http://arxiv.org/abs/2401.07836.

M. Kinniment, L. J. K. Sato, H. Du, B. Goodrich, M. Hasin, L. Chan, L. H. Miles, T. R. Lin, H. Wijk, J. Burget, A. Ho, E. Barnes, P. Christiano, Evaluating Language-Model Agents on Realistic Autonomous Tasks, arXiv [cs.CL] (2023); https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf.

* OpenAI, “Preparedness Framework (Beta)” (OpenAI, 2023); https://cdn.openai.com/openai-preparedness-framework-beta.pdf.

* Anthropic, Anthropic’s Responsible Scaling Policy, Version 1.0. (2023); https://www-cdn.anthropic.com/1adf000c8f675958c2ee23805d91aaade1cd4613/responsible-scaling-policy.pdf.

* Google DeepMind, Frontier Safety Framework Version 1.0. (2024); https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/introducing-the-frontier-safety-framework/fsf-technical-report.pdf.

T. Hagendorff, Deception Abilities Emerged in Large Language Models. Proceedings of the National Academy of Sciences of the United States of America 121, e2317967121 (2024); https://doi.org/10.1073/pnas.2317967121.

* E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, T. Lanham, D. M. Ziegler, T. Maxwell, N. Cheng, A. Jermyn, A. Askell, A. Radhakrishnan, C. Anil, D. Duvenaud, D. Ganguli, F. Barez, … E. Perez, Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training, arXiv [cs.CR] (2024); http://arxiv.org/abs/2401.05566.

* C. Denison, M. MacDiarmid, F. Barez, D. Duvenaud, S. Kravec, S. Marks, N. Schiefer, R. Soklaski, A. Tamkin, J. Kaplan, B. Shlegeris, S. R. Bowman, E. Perez, E. Hubinger, Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models, arXiv [cs.AI] (2024); http://arxiv.org/abs/2406.10162.

S. Kapoor, B. Stroebl, Z. S. Siegel, N. Nadgir, A. Narayanan, AI Agents That Matter, arXiv [cs.LG] (2024); http://arxiv.org/abs/2407.01502.

R. Shiffrin, M. Mitchell, Probing the Psychology of AI Models. Proceedings of the National Academy of Sciences of the United States of America 120, e2300963120 (2023); https://doi.org/10.1073/pnas.2300963120.

D. Hendrycks, M. Mazeika, T. Woodside, An Overview of Catastrophic AI Risks, arXiv [cs.CY] (2023); http://arxiv.org/abs/2306.12001.

J. Lehman, J. Clune, D. Misevic, C. Adami, L. Altenberg, J. Beaulieu, P. J. Bentley, S. Bernard, G. Beslon, D. M. Bryson, N. Cheney, P. Chrabaszcz, A. Cully, S. Doncieux, F. C. Dyer, K. O. Ellefsen, R. Feldt, … J. Yosinski, The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities. Artificial Life 26, 274–306 (2020); https://doi.org/10.1162/artl_a_00319.

J. Skalse, N. H. R. Howe, D. Krasheninnikov, D. Krueger, Defining and Characterizing Reward Hacking, arXiv [cs.LG] (2022); http://arxiv.org/abs/2209.13085.

R. Ngo, L. Chan, S. Mindermann, “The Alignment Problem from a Deep Learning Perspective” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=fh8EYKFKns.

J. Ji, T. Qiu, B. Chen, B. Zhang, H. Lou, K. Wang, Y. Duan, Z. He, J. Zhou, Z. Zhang, F. Zeng, K. Y. Ng, J. Dai, X. Pan, A. O’Gara, Y. Lei, H. Xu, … W. Gao, AI Alignment: A Comprehensive Survey, arXiv [cs.AI] (2023); http://arxiv.org/abs/2310.19852.

A. Pan, K. Bhatia, J. Steinhardt, “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models” in The 10th International Conference on Learning Representations (ICLR 2022) (Virtual, 2021); https://openreview.net/forum?id=JYtwGwIL7ye.

J. Wen, R. Zhong, A. Khan, E. Perez, J. Steinhardt, M. Huang, S. R. Bowman, H. He, S. Feng, Language Models Learn to Mislead Humans via RLHF, arXiv [cs.CL] (2024); http://arxiv.org/abs/2409.12822.

* S. R. Bowman, J. Hyun, E. Perez, E. Chen, C. Pettit, S. Heiner, K. Lukošiūtė, A. Askell, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, C. Olah, D. Amodei, D. Amodei, D. Drain, … J. Kaplan, Measuring Progress on Scalable Oversight for Large Language Models, arXiv [cs.HC] (2022); http://arxiv.org/abs/2211.03540.

* P. Christiano, B. Shlegeris, D. Amodei, Supervising Strong Learners by Amplifying Weak Experts, arXiv [cs.LG] (2018); http://arxiv.org/abs/1810.08575.

* G. Irving, P. Christiano, D. Amodei, “AI Safety via Debate” (OpenAI, 2018); http://arxiv.org/abs/1805.00899.

* J. Leike, D. Krueger, T. Everitt, M. Martic, V. Maini, S. Legg, Scalable Agent Alignment via Reward Modeling: A Research Direction, arXiv [cs.LG] (2018); http://arxiv.org/abs/1811.07871.

* J. Wu, L. Ouyang, D. M. Ziegler, N. Stiennon, R. Lowe, J. Leike, P. Christiano, Recursively Summarizing Books with Human Feedback, arXiv [cs.CL] (2021); http://arxiv.org/abs/2109.10862.

* W. Saunders, C. Yeh, J. Wu, S. Bills, L. Ouyang, J. Ward, J. Leike, Self-Critiquing Models for Assisting Human Evaluators, arXiv [cs.CL] (2022); http://arxiv.org/abs/2206.05802.

* A. Khan, J. Hughes, D. Valentine, L. Ruis, K. Sachan, A. Radhakrishnan, E. Grefenstette, S. R. Bowman, T. Rocktäschel, E. Perez, Debating with More Persuasive LLMs Leads to More Truthful Answers, arXiv [cs.AI] (2024); http://arxiv.org/abs/2402.06782.

L. L. D. Langosco, J. Koch, L. D. Sharkey, J. Pfau, D. Krueger, “Goal Misgeneralization in Deep Reinforcement Learning” in Proceedings of the 39th International Conference on Machine Learning (PMLR, 2022) vol. 162, pp. 12004–12019; https://proceedings.mlr.press/v162/langosco22a.html.

* R. Shah, V. Varma, R. Kumar, M. Phuong, V. Krakovna, J. Uesato, Z. Kenton, Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals, arXiv [cs.LG] (2022); http://arxiv.org/abs/2210.01790.

H. N. E. Barj, T. Sautory, Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization, arXiv [cs.LG] (2024); http://arxiv.org/abs/2401.07181.

D. Hendrycks, X. Liu, E. Wallace, A. Dziedzic, R. Krishnan, D. Song, “Pretrained Transformers Improve Out-of-Distribution Robustness” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), D. Jurafsky, J. Chai, N. Schluter, J. Tetreault, Eds. (Association for Computational Linguistics, Online, 2020), pp. 2744–2751; https://doi.org/10.18653/v1/2020.acl-main.244.

L. Berglund, A. C. Stickland, M. Balesni, M. Kaufmann, M. Tong, T. Korbak, D. Kokotajlo, O. Evans, Taken out of Context: On Measuring Situational Awareness in LLMs, arXiv [cs.CL] (2023); http://arxiv.org/abs/2309.00667.

R. Laine, B. Chughtai, J. Betley, K. Hariharan, M. Balesni, J. Scheurer, M. Hobbhahn, A. Meinke, O. Evans, “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs” in 38th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2024); https://openreview.net/forum?id=UnWhcpIyUC.

C. Schwab, L. Huber, Obey or Not Obey? Dogs (Canis Familiaris) Behave Differently in Response to Attentional States of Their Owners. Journal of Comparative Psychology (Washington, D.C.: 1983) 120, 169–175 (2006); https://doi.org/10.1037/0735-7036.120.3.169.

* V. Krakovna, J. Kramar, Power-Seeking Can Be Probable and Predictive for Trained Agents, arXiv [cs.AI] (2023); http://arxiv.org/abs/2304.06528.

A. Turner, L. Smith, R. Shah, A. Critch, P. Tadepalli, “Optimal Policies Tend To Seek Power” in 35th Conference on Neural Information Processing Systems (NeurIPS 2021) (Curran Associates, Inc., Virtual, 2021) vol. 34; https://proceedings.neurips.cc/paper/2021/hash/c26820b8a4c1b3c2aa868d6d57e14a79-Abstract.html.

A. Turner, P. Tadepalli, “Parametrically Retargetable Decision-Makers Tend to Seek Power” in Advances in Neural Information Processing Systems (NeurIPS 2022) Main Conference Track (New Orleans, LA, US, 2022) vol. abs/2206.13477; https://doi.org/10.48550/arXiv.2206.13477.

M. K. Cohen, M. Hutter, M. A. Osborne, Advanced Artificial Agents Intervene in the Provision of Reward. AI Magazine 43, 282–293 (2022); https://doi.org/10.1002/aaai.12064.

S. Zhuang, D. Hadfield-Menell, “Consequences of Misaligned AI” in Advances in Neural Information Processing Systems (NeurIPS 2020) (Curran Associates, Inc., 2020) vol. 33, pp. 15763–15773; https://proceedings.neurips.cc/paper/2020/hash/b607ba543ad05417b8507ee86c54fcb7-Abstract.html.

E. Hubinger, C. van Merwijk, V. Mikulik, J. Skalse, S. Garrabrant, Risks from Learned Optimization in Advanced Machine Learning Systems, arXiv [cs.AI] (2019); http://arxiv.org/abs/1906.01820.

J. Carlsmith, Scheming AIs: Will AIs Fake Alignment during Training in Order to Get Power?, arXiv [cs.CY] (2023); http://arxiv.org/abs/2311.08379.

* R. Grosse, J. Bae, C. Anil, N. Elhage, A. Tamkin, A. Tajdini, B. Steiner, D. Li, E. Durmus, E. Perez, E. Hubinger, K. Lukošiūtė, K. Nguyen, N. Joseph, S. McCandlish, J. Kaplan, S. R. Bowman, Studying Large Language Model Generalization with Influence Functions, arXiv [cs.LG] (2023); http://arxiv.org/abs/2308.03296.

S. Im, Y. Li, On the Generalization of Preference Learning with DPO, arXiv [cs.LG] (2024); http://arxiv.org/abs/2408.03459.

A. Pan, J. S. Chan, A. Zou, N. Li, S. Basart, T. Woodside, H. Zhang, S. Emmons, D. Hendrycks, “Do the Rewards Justify the Means? Measuring Trade-Offs between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark” in Proceedings of the 40th International Conference on Machine Learning (ICML’23) (JMLR, Honolulu, Hawaii, USA, 2023) vol. 202, pp. 26837–26867.

L. Dung, The Argument for near-Term Human Disempowerment through AI. AI & Society, 1–14 (2024); https://doi.org/10.1007/s00146-024-01930-2.

P. J. Denning, The Science of Computing: The Internet Worm. American Scientist 77, 126–128 (1989); http://www.jstor.org/stable/27855650.

D. Hendrycks, Natural Selection Favors AIs over Humans, arXiv [cs.CY] (2023); http://arxiv.org/abs/2303.16200.

UK AI Safety Institute, Advancing the Field of Systemic AI Safety: Grants Open (2024); https://www.aisi.gov.uk/work/advancing-the-field-of-systemic-ai-safety-grants-open.

* T. Eloundou, S. Manning, P. Mishkin, D. Rock, GPTs Are GPTs: Labor Market Impact Potential of LLMs. Science 384, 1306–1308 (2024); https://doi.org/10.1126/science.adj0998.

B. Lou, H. Sun, T. Sun, GPTs and Labor Markets in the Developing Economy: Evidence from China, SSRN [preprint] (2023); https://doi.org/10.2139/ssrn.4426461.

P. Gmyrek, J. Berg, D. Bescond, Generative AI and Jobs: A Global Analysis of Potential Effects on Job Quantity and Quality (International Labour Organization, Geneva, 2023); https://doi.org/10.54394/fhem8239.

M. Cazzaniga, F. Jaumotte, L. Li, G. Melina, A. J. Panton, C. Pizzinelli, E. J. Rockall, M. M. Tavares, “Gen-AI: Artificial Intelligence and the Future of Work” (SDN/2024/001, International Monetary Fund, 2024); https://www.imf.org/en/Publications/Staff-Discussion-Notes/Issues/2024/01/14/Gen-AI-Artificial-Intelligence-and-the-Future-of-Work-542379.

D. Acemoglu, P. Restrepo, Automation and New Tasks: How Technology Displaces and Reinstates Labor. The Journal of Economic Perspectives: A Journal of the American Economic Association 33, 3–30 (2019); https://doi.org/10.1257/jep.33.2.3.

D. Acemoglu, D. Autor, “Skills, Tasks and Technologies: Implications for Employment and Earnings” in Handbook of Labor Economics, D. Card, O. Ashenfelter, Eds. (Elsevier, 2011) vol. 4, pp. 1043–1171; https://doi.org/10.1016/S0169-7218(11)02410-5.

P. Restrepo, “Automation: Theory, Evidence, and Outlook” (w31910, National Bureau of Economic Research, 2023); https://doi.org/10.3386/w31910.

D. Autor, C. Chin, A. Salomons, B. Seegmiller, “New Frontiers: The Origins and Content of New Work, 1940–2018” (30389, National Bureau of Economic Research, 2022); https://doi.org/10.3386/w30389.

X. Hui, O. Reshef, L. Zhou, “The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence from an Online Labor Market” (10601, CESifo Working Paper, 2023); https://www.econstor.eu/handle/10419/279352.

A. Korinek, D. Suh, “Scenarios for the Transition to AGI” (32255, National Bureau of Economic Research, 2024); https://doi.org/10.3386/w32255.

A. Korinek, Scenario Planning for an A(G)I Future. Finance and Development Magazine (2023); https://www.imf.org/en/Publications/fandd/issues/2023/12/Scenario-Planning-for-an-AGI-future-Anton-korinek.

D. Acemoglu, “The Simple Macroeconomics of AI” (w32487, National Bureau of Economic Research, 2024); https://doi.org/10.3386/w32487.

B. Romera-Paredes, M. Barekatain, A. Novikov, M. Balog, M. P. Kumar, E. Dupont, F. J. R. Ruiz, J. S. Ellenberg, P. Wang, O. Fawzi, P. Kohli, A. Fawzi, Mathematical Discoveries from Program Search with Large Language Models. Nature 625, 468–475 (2024); https://doi.org/10.1038/s41586-023-06924-6.

Y. Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. Dal Lago, T. Hubert, P. Choy, C. de Masson d’Autume, I. Babuschkin, X. Chen, P.-S. Huang, J. Welbl, … O. Vinyals, Competition-Level Code Generation with AlphaCode. Science (New York, N.Y.) 378, 1092–1097 (2022); https://doi.org/10.1126/science.abq1158.

S. Noy, W. Zhang, Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. Science (New York, N.Y.) 381, 187–192 (2023); https://doi.org/10.1126/science.adh2586.

D. Susskind, A World without Work: Technology, Automation, and How We Should Respond (Metropolitan Books, 2020); https://www.danielsusskind.com/a-world-without-work.

A. Korinek, M. Juelfs, “Preparing for the (non-Existent?) Future of Work” (w30172, National Bureau of Economic Research, 2022); https://doi.org/10.3386/w30172.

A. Korinek, “Economic Policy Challenges for the Age of AI” (w32980, National Bureau of Economic Research, 2024); https://doi.org/10.3386/w32980.

* A. McAfee, “Generally Faster: The Economic Impact of Generative AI” (Google, 2024); https://policycommons.net/artifacts/12281693/generally_faster_-_the_economic_impact_of_generative_ai/.

A. Agrawal, J. Gans, A. Goldfarb, “AI Adoption and System-Wide Change” (w28811, National Bureau of Economic Research, 2021); https://doi.org/10.3386/w28811.

J. Feigenbaum, D. P. Gross, Organizational and Economic Obstacles to Automation: A Cautionary Tale from AT&T in the Twentieth Century. Management Science (2024); https://doi.org/10.1287/mnsc.2022.01760.

M. Svanberg, W. Li, M. Fleming, B. Goehring, N. Thompson, Beyond AI Exposure: Which Tasks Are Cost-Effective to Automate with Computer Vision?, SSRN [preprint] (2024); https://doi.org/10.2139/ssrn.4700751.

V. Magesh, F. Surani, M. Dahl, M. Suzgun, C. D. Manning, D. E. Ho, Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, arXiv [cs.CL] (2024); http://arxiv.org/abs/2405.20362.

E. Erdil, T. Besiroglu, Explosive Growth from AI Automation: A Review of the Arguments, arXiv [econ.GN] (2023); https://epoch.ai/blog/explosive-growth-from-ai-a-review-of-the-arguments.

A. Bick, A. Blandin, D. Deming, “The Rapid Adoption of Generative AI” (w32966, National Bureau of Economic Research, 2024); https://doi.org/10.3386/w32966.

E. Brynjolfsson, D. Li, L. Raymond, “Generative AI at Work” (w31161, National Bureau of Economic Research, 2023); https://doi.org/10.3386/w31161.

D. Acemoglu, P. Restrepo, The Race between Man and Machine: Implications of Technology for Growth, Factor Shares, and Employment. American Economic Review 108, 1488–1542 (2018); https://doi.org/10.1257/aer.20160696.

A. K. Agrawal, J. S. Gans, A. Goldfarb, “The Turing Transformation: Artificial Intelligence, Intelligence Augmentation, and Skill Premiums” (31767, National Bureau of Economic Research, 2023); https://doi.org/10.3386/w31767.

E. Felten, M. Raj, R. Seamans, How Will Language Modelers like ChatGPT Affect Occupations and Industries?, arXiv [econ.GN] (2023); http://arxiv.org/abs/2303.01157.

E. W. Felten, M. Raj, R. Seamans, Occupational Heterogeneity in Exposure to Generative AI, SSRN [preprint] (2023); https://doi.org/10.2139/ssrn.4414065.

F. Dell’Acqua, E. McFowland III, E. R. Mollick, H. Lifshitz-Assaf, K. Kellogg, S. Rajendran, L. Krayer, F. Candelon, K. R. Lakhani, “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality” (24–013, Harvard Business School, 2023); https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf.

J. H. Choi, A. Monahan, D. B. Schwarcz, Lawyering in the Age of Artificial Intelligence, SSRN [preprint] (2023); https://doi.org/10.2139/ssrn.4626276.

K. Bonney, C. Breaux, C. Buffington, E. Dinlersoz, L. Foster, N. Goldschlag, J. Haltiwanger, Z. Kroff, K. Savage, “Tracking Firm Use of AI in Real Time: A Snapshot from the Business Trends and Outlook Survey” (w32319, National Bureau of Economic Research, 2024); https://doi.org/10.3386/w32319.

A. Korinek, The Rise of Artificially Intelligent Agents (2019); https://drive.google.com/file/d/16y5UmeTOv5YB9E5ms_ce7WiYNfMAn17J/view.

A. Chan, R. Salganik, A. Markelius, C. Pang, N. Rajkumar, D. Krasheninnikov, L. Langosco, Z. He, Y. Duan, M. Carroll, M. Lin, A. Mayhew, K. Collins, M. Molamohammadi, J. Burden, W. Zhao, S. Rismani, … T. Maharaj, “Harms from Increasingly Agentic Algorithmic Systems” in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 651–666; https://doi.org/10.1145/3593013.3594033.

METR, Details about METR’s Preliminary Evaluation of GPT-4o, METR’s Autonomy Evaluation Resources (2024); https://metr.github.io/autonomy-evals-guide/gpt-4o-report/.

Y. Shavit, S. Agarwal, M. Brundage, S. A. C. O’Keefe, R. Campbell, T. Lee, P. Mishkin, T. Eloundou, A. Hickey, K. Slama, L. Ahmad, P. McMillan, A. Beutel, A. Passos, D. G. Robinson, Practices for Governing Agentic AI Systems. Research Paper, OpenAI (2023); https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf.

D. Hyslop, W. Townsend, “The Longer Term Impacts of Job Displacement on Labour Market Outcomes” (Motu Economic and Public Policy Research, 2017); https://www.motu.nz/our-research/population-and-labour/individual-and-group-outcomes/the-longer-term-impacts-of-job-displacement-on-labour-market-outcomes/.

S. C. Dixon, D. C. Maré, “The Costs of Involuntary Job Loss: Impacts on Workers’ Employment and Earnings” (Motu Economic and Public Policy Research, 2013); https://doi.org/10.2139/ssrn.2247198.

D. Hamermesh, “What Do We Know about Worker Displacement in the U.S.?” (National Bureau of Economic Research, 1987); https://doi.org/10.3386/w2402.

L. S. Jacobson, R. J. LaLonde, D. G. Sullivan, Earnings Losses of Displaced Workers. The American Economic Review 83, 685–709 (1993); http://www.jstor.org/stable/2117574.

T. Von Wachter, J. Song, J. Manchester, Long-Term Earnings Losses due to Mass Layoffs during the 1982 Recession: An Analysis Using US Administrative Data from 1974 to 2004 (2009); http://www.econ.ucla.edu/tvwachter/papers/mass_layoffs_1982.pdf.

J. Barnette, A. Michaud, Wage Scars and Human Capital Theory (2017); https://ammichau.github.io/papers/JBAMWageScar.pdf.

D. Sullivan, T. von Wachter, Job Displacement and Mortality: An Analysis Using Administrative Data. The Quarterly Journal of Economics 124, 1265–1306 (2009); https://doi.org/10.1162/qjec.2009.124.3.1265.

S. A. Burgard, J. E. Brand, J. S. House, Toward a Better Estimation of the Effect of Job Loss on Health. Journal of Health and Social Behavior 48, 369–384 (2007); http://www.jstor.org/stable/27638722.

M. Browning, E. Heinesen, Effect of Job Loss due to Plant Closure on Mortality and Hospitalization. Journal of Health Economics 31, 599–616 (2012); https://doi.org/10.1016/j.jhealeco.2012.03.001.

K. Telle, M. Votruba, Parental Job Loss and Children’s School Performance. Review of Economic Studies 78, 1462–1489 (2011); https://doi.org/10.2307/41407068.

J. Duggan, U. Sherman, R. Carbery, A. McDonnell, Algorithmic Management and App‐work in the Gig Economy: A Research Agenda for Employment Relations and HRM. Human Resource Management Journal 30, 114–132 (2020); https://doi.org/10.1111/1748-8583.12258.

B. Bai, H. Dai, D. J. Zhang, F. Zhang, H. Hu, The Impacts of Algorithmic Work Assignment on Fairness Perceptions and Productivity: Evidence from Field Experiments. Manufacturing & Service Operations Management: M & SOM 24, 3060–3078 (2022); https://doi.org/10.1287/msom.2022.1120.

J. Howard, P. Schulte, Managing Workplace AI Risks and the Future of Work. American Journal of Industrial Medicine 67, 980–993 (2024); https://doi.org/10.1002/ajim.23653.

A. Bernhardt, L. Kresge, R. Suleiman, The Data-Driven Workplace and the Case for Worker Technology Rights. Industrial & Labor Relations Review 76, 3–29 (2023); https://doi.org/10.1177/00197939221131558.

D. Acemoglu, P. Restrepo, Tasks, Automation, and the Rise in U.S. Wage Inequality. Econometrica: Journal of the Econometric Society 90, 1973–2016 (2022); https://doi.org/10.3982/ECTA19815.

D. Acemoglu, Technical Change, Inequality, and the Labor Market. Journal of Economic Literature 40, 7–72 (2002); https://doi.org/10.1257/0022051026976.

D. H. Autor, Why Are There Still So Many Jobs? The History and Future of Workplace Automation. The Journal of Economic Perspectives: A Journal of the American Economic Association 29, 3–30 (2015); https://doi.org/10.1257/jep.29.3.3.

Ó. Afonso, R. Forte, Routine and Non-Routine Sectors, Tasks Automation and Wage Polarization. Applied Economics (2023); https://www.tandfonline.com/doi/abs/10.1080/00036846.2023.2280461.

D. Acemoglu, J. Loebbing, “Automation and Polarization” (National Bureau of Economic Research, 2022); https://doi.org/10.3386/w30528.

D. Autor, “Applying AI to Rebuild Middle Class Jobs” (National Bureau of Economic Research, 2024); https://doi.org/10.3386/w32140.

L. Karabarbounis, Perspectives on the Labor Share. The Journal of Economic Perspectives: A Journal of the American Economic Association 38, 107–136 (2024); https://doi.org/10.1257/jep.38.2.107.

M. Ranaldi, Income Composition Inequality. The Review of Income and Wealth 68, 139–160 (2022); https://doi.org/10.1111/roiw.12503.

T. Piketty, Capital in the Twenty-First Century (The Belknap Press of Harvard University Press, Cambridge, Massachusetts, 2014); https://www.hup.harvard.edu/books/9780674430006.

B. Moll, L. Rachel, P. Restrepo, Uneven Growth: Automation’s Impact on Income and Wealth Inequality. Econometrica: Journal of the Econometric Society 90, 2645–2683 (2022); https://doi.org/10.3982/ECTA19417.

C. Wang, M. Zheng, X. Bai, Y. Li, W. Shen, Future of Jobs in China under the Impact of Artificial Intelligence. Finance Research Letters 55, 103798 (2023); https://doi.org/10.1016/j.frl.2023.103798.

H. Firooz, Z. Liu, Y. Wang, “Automation and the Rise of Superstar Firms” (Federal Reserve Bank of San Francisco, 2022); https://doi.org/10.24148/wp2022-05.

C. T. Okolo, AI in the Global South: Opportunities and Challenges towards More Inclusive Governance, Brookings (2023); https://www.brookings.edu/articles/ai-in-the-global-south-opportunities-and-challenges-towards-more-inclusive-governance/.

A. Korinek, J. E. Stiglitz, “Artificial Intelligence, Globalization, and Strategies for Economic Development” (National Bureau of Economic Research, 2021); https://doi.org/10.3386/w28453.

C. Alonso, A. Berg, S. Kothari, C. Papageorgiou, S. Rehman, “Will the AI Revolution Cause a Great Divergence?” (International Monetary Fund, 2020); https://www.imf.org/en/Publications/WP/Issues/2020/09/11/Will-the-AI-Revolution-Cause-a-Great-Divergence-49734.

H. Nii-Aponsah, B. Verspagen, P. Mohnen, “Automation-Induced Reshoring and Potential Implications for Developing Economies” (UNU-MERIT, 2023); https://ideas.repec.org/p/unm/unumer/2023018.html.

J. Jacobs, “How Generative AI Is Changing the Global South’s IT Services Sector” (Information Technology and Innovation Foundation, 2024); https://itif.org/publications/2024/06/10/how-generative-ai-is-changing-the-global-souths-it-services-sector/.

N. Otis, R. Clarke, S. Delecourt, D. Holtz, R. Koning, “The Uneven Impact of Generative AI on Entrepreneurial Performance” (Harvard Business School, 2024); https://www.hbs.edu/ris/Publication%20Files/24-042_9ebd2f26-e292-404c-b858-3e883f0e11c0.pdf.

A. Merali, Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation, arXiv [econ.GN] (2024); http://arxiv.org/abs/2409.02391.

K. McElheran, J. F. Li, E. Brynjolfsson, Z. Kroff, E. Dinlersoz, L. Foster, N. Zolas, AI Adoption in America: Who, What, and Where. Journal of Economics & Management Strategy 33, 375–415 (2024); https://doi.org/10.1111/jems.12576.

K. Bonney, C. Breaux, C. Buffington, E. Dinlersoz, L. Foster, N. Goldschlag, J. Haltiwanger, Z. Kroff, K. Savage, The Impact of AI on the Workforce: Tasks versus Jobs? Economics Letters 244, 111971 (2024); https://doi.org/10.1016/j.econlet.2024.111971.

A. Kreacic, L. Uribe, J. Romeo, A. Lasater-Wille, R. Jesuthasan, S. Luong, “How Generative AI Is Transforming Business And Society: The Good, The Bad, And Everything in Between” (Oliver Wyman Forum, 2024); https://www.oliverwymanforum.com/global-consumer-sentiment/how-will-ai-affect-global-economics.html.

N. G. Otis, S. Delecourt, K. Cranney, R. Koning, “Global Evidence on Gender Gaps and Generative AI” (Harvard Business School, 2024); https://www.hbs.edu/faculty/Pages/item.aspx?num=66548.

* S. Jaffe, N. P. Shah, J. Butler, A. Farach, A. Cambon, B. Hecht, M. Schwarz, J. Teevan, “Generative AI in Real-World Workplaces” (Microsoft, 2024); https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/.

* E. Wiles, L. Krayer, M. Abbadi, U. Awasthi, R. Kennedy, P. Mishkin, D. Sack, F. Candelon, GenAI as an Exoskeleton: Experimental Evidence on Knowledge Workers Using GenAI on New Skills, Social Science Research Network (2024); https://doi.org/10.2139/ssrn.4944588.

A. Toner-Rodgers, Artificial Intelligence, Scientific Discovery, and Product Innovation (2024); https://aidantr.github.io/files/AI_innovation.pdf.

T. Besiroglu, N. Emery-Xu, N. Thompson, Economic Impacts of AI-Augmented R&D. Research Policy 53, 105037 (2024); https://doi.org/10.1016/j.respol.2024.105037.

S. McConnell, K. Fortson, D. Rotz, P. Schochet, P. Burkander, L. Rosenberg, A. Mastri, R. D’Amico, “Providing Public Workforce Services to Job Seekers: 15-Month Impact Findings on the WIA Adult and Dislocated Worker Programs” (Mathematica Policy Research, 2016); https://mathematica.org/publications/providing-public-workforce-services-to-job-seekers-15-month-impact-findings-on-the-wia-adult.

J. Furman, “Policies for the Future of Work Should Be Based on Its Past and Present” (Economic Innovation Group, 2024); https://eig.org/wp-content/uploads/2024/07/TAWP-Furman.pdf.

A. Anthony, L. Sharma, E. Noor, “Advancing a More Global Agenda for Trustworthy Artificial Intelligence” (Carnegie Endowment for International Peace, 2024); https://carnegieendowment.org/research/2024/04/advancing-a-more-global-agenda-for-trustworthy-artificial-intelligence?lang=en.

S. Ghosh, A. Caliskan, “ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five Other Low-Resource Languages” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 901–912; https://doi.org/10.1145/3600211.3604672.

C. Okorie, V. Marivate, How African NLP Experts Are Navigating the Challenges of Copyright, Innovation, and Access, Carnegie Endowment for International Peace (2024); https://carnegieendowment.org/research/2024/04/how-african-nlp-experts-are-navigating-the-challenges-of-copyright-innovation-and-access?lang=en.

N. Maslej, L. Fattorini, E. Brynjolfsson, J. Etchemendy, K. Ligett, T. Lyons, J. Manyika, H. Ngo, J. C. Niebles, V. Parli, Y. Shoham, R. Wald, J. Clark, R. Perrault, “Artificial Intelligence Index Report 2023” (AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, 2023); https://arxiv.org/pdf/2310.03715.

N. Ahmed, M. Wahed, N. C. Thompson, The Growing Influence of Industry in AI Research. Science (New York, N.Y.) 379, 884–886 (2023); https://doi.org/10.1126/science.ade2420.

S. Teleanu, J. Kurbalija, “Stronger Digital Voices from Africa: Building African Digital Foreign Policy and Diplomacy” (Diplo, 2022); https://www.diplomacy.edu/resource/report-stronger-digital-voices-from-africa/.

T. Alsop, Estimated Shipments of Nvidia H100 Graphics Processing Units (GPUs) Worldwide in 2023, by Customer, Statista (2024); https://www.statista.com/statistics/1446564/nvidia-h100-gpu-shipments-by-customer/.

* Google Data Centers, Investing in Nebraska (2020); https://www.google.com/intl/es/about/datacenters/locations/papillion/.

Office of Governor Michael L. Parson, Governor Parson Announces Google’s Selection of Kansas City for New Data Center (2024); https://governor.mo.gov/press-releases/archive/governor-parson-announces-googles-selection-kansas-city-new-data-center.

* Meta, “Meta’s Prineville Data Center” (Meta, 2024); https://datacenters.atmeta.com/wp-content/uploads/2024/10/Oregon-Prineville.pdf.

* Microsoft, Microsoft and G42 Announce $1 Billion Comprehensive Digital Ecosystem Initiative for Kenya, Stories (2024); https://news.microsoft.com/2024/05/22/microsoft-and-g42-announce-1-billion-comprehensive-digital-ecosystem-initiative-for-kenya/.

R. Zwetsloot, B. Zhang, N. Dreksler, L. Kahn, M. Anderljung, A. Dafoe, M. C. Horowitz, “Skilled and Mobile: Survey Evidence of AI Researchers’ Immigration Preferences” in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21) (Association for Computing Machinery, New York, NY, USA, 2021), pp. 1050–1059; https://doi.org/10.1145/3461702.3462617.

Top Universities, QS World University Rankings for Data Science and Artificial Intelligence 2024 (2024); https://www.topuniversities.com/university-subject-rankings/data-science-artificial-intelligence.

N. Maslej, L. Fattorini, R. Perrault, V. Parli, A. Reuel, E. Brynjolfsson, J. Etchemendy, K. Ligett, T. Lyons, J. Manyika, J. C. Niebles, Y. Shoham, R. Wald, J. Clark, “The AI Index 2024 Annual Report” (Institute for Human-Centered AI, Stanford University, 2024); https://aiindex.stanford.edu/report/.

M. L. Gray, S. Suri, Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass (Houghton Mifflin Harcourt, 2019); https://ghostwork.info/.

A. Arora, M. Barrett, E. Lee, E. Oborn, K. Prince, Risk and the Future of AI: Algorithmic Bias, Data Colonialism, and Marginalization. Information and Organization 33 (2023); https://doi.org/10.1016/j.infoandorg.2023.100478.

C. T. Okolo, “Addressing Global Inequity in AI Development” in Handbook of Critical Studies of Artificial Intelligence (Edward Elgar Publishing, 2023), pp. 378–389; https://www.elgaronline.com/edcollchap/book/9781803928562/book-part-9781803928562-40.xml.

M. Miceli, T. Yang, A. Alvarado Garcia, J. Posada, S. M. Wang, M. Pohl, A. Hanna, Documenting Data Production Processes: A Participatory Approach for Data Work. Proceedings of the ACM on Human-Computer Interaction 6, 1–34 (2022); https://doi.org/10.1145/3555623.

D. Wang, S. Prabhat, N. Sambasivan, “Whose AI Dream? In Search of the Aspiration in Data Annotation” in CHI Conference on Human Factors in Computing Systems (CHI ’22) (ACM, New Orleans, LA, USA, 2022), pp. 1–16; https://doi.org/10.1145/3491102.3502121.

M. Steiger, T. J. Bharucha, S. Venkatagiri, M. J. Riedl, M. Lease, “The Psychological Well-Being of Content Moderators: The Emotional Labor of Commercial Moderation and Avenues for Improving Support” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (ACM, New York, NY, USA, 2021); https://doi.org/10.1145/3411764.3445092.

M. M. AlEmadi, W. Zaghouani, “Emotional Toll and Coping Strategies: Navigating the Effects of Annotating Hate Speech Data” in Proceedings of the Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024 (2024), pp. 66–72; https://aclanthology.org/2024.legal-1.10.pdf.

S. Luccioni, Y. Jernite, E. Strubell, “Power Hungry Processing: Watts Driving the Cost of AI Deployment?” in The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2024); https://doi.org/10.1145/3630106.3658542.

B. Thormundsson, “Change in Concentration of Talent Related to Artificial Intelligence (AI) Worldwide from 2016 to 2023, by Region” (Statista, 2024); https://www.statista.com/statistics/1472183/ai-talent-concentration-change-percentage-by-region/.

S. V. Bentley, C. K. Naughtin, M. J. McGrath, J. L. Irons, P. S. Cooper, The Digital Divide in Action: How Experiences of Digital Technology Shape Future Relationships with Artificial Intelligence. AI and Ethics 4, 901–915 (2024); https://doi.org/10.1007/s43681-024-00452-3.

Nigeria Federal Ministry of Communications, Innovation & Digital Economy, “Accelerating Our Collective Prosperity through Technical Efficiency: A Strategic Plan for the Federal Ministry of Communications, Innovation & Digital Economy” (2023); https://fmcide.gov.ng/wp-content/uploads/2023/11/blueprint.pdf.

US Government, Bring Your AI Skills to the U.S. (2023); https://ai.gov/immigrate/.

UK Government, Supporting the next Generation of AI Leaders from around the World (2023); https://www.great.gov.uk/campaign-site/ai-futures/.

S. Pal, “Where Is Europe’s AI Workforce Coming from?: Immigration, Emigration & Transborder Movement of AI Talent” (interface, 2024); https://www.stiftung-nv.de/publications/where-is-europes-ai-workforce-coming-from.

M. Mazumder, C. Banbury, X. Yao, B. Karlaş, W. G. Rojas, S. Diamos, G. Diamos, L. He, A. Parrish, H. R. Kirk, J. Quaye, C. Rastogi, D. Kiela, D. Jurado, D. Kanter, R. Mosquera, J. Ciro, … V. J. Reddi, “DataPerf: Benchmarks for Data-Centric AI Development” in 37th International Conference on Neural Information Processing Systems (NeurIPS 2023) (Curran Associates Inc., Red Hook, NY, USA, 2024), pp. 5320–5347; https://doi.org/10.5555/3666122.3666357.

N. Guha, J. Nyarko, D. E. Ho, C. Ré, “Building GenAI Benchmarks: A Case Study in Legal Applications” in The Oxford Handbook on the Foundations and Regulation of Generative AI, P. Hacker, A. Engel, S. Hammer, B. Mittelstadt, Eds. (Oxford University Press, Oxford, England); https://neelguha.github.io/assets/pdf/building_genai_benchmarks_for_law_oxford_chapter.pdf.

E. Brynjolfsson, A. Ng, “Big AI Can Centralize Decision-Making and Power, and That’s a Problem” in Missing Links in AI Governance, B. Prud’homme, C. Régis, G. Farnadi, Eds. (UNESCO/MILA, 2023), pp. 65–87; https://www.unesco.org/en/articles/missing-links-ai-governance.

A. Korinek, J. Vipra, “Concentrating Intelligence: Scaling and Market Structure in Artificial Intelligence” (w33139, National Bureau of Economic Research, 2024); https://doi.org/10.3386/w33139.

Competition and Markets Authority, “AI Foundation Models: Initial Report” (CMA, 2023); https://www.gov.uk/government/publications/ai-foundation-models-initial-report.

A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, … N. Fiedel, PaLM: Scaling Language Modeling with Pathways. Journal of Machine Learning Research: JMLR 24, 240:11324–240:11436 (2024).

X. Jin, D. Zhang, H. Zhu, W. Xiao, S.-W. Li, X. Wei, A. Arnold, X. Ren, “Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora” in Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models (Association for Computational Linguistics, Stroudsburg, PA, USA, 2022), pp. 1–16; https://doi.org/10.18653/v1/2022.bigscience-1.1.

K. Gupta, B. Thérien, A. Ibrahim, M. L. Richter, Q. G. Anthony, E. Belilovsky, I. Rish, T. Lesort, “Continual Pre-Training of Large Language Models: How to Re-Warm Your Model?” in Workshop on Efficient Systems for Foundation Models @ ICML2023 (2023); https://openreview.net/pdf?id=pg7PUJe0Tl.

D. Luitse, Platform Power in AI: The Evolution of Cloud Infrastructures in the Political Economy of Artificial Intelligence. Internet Policy Review 13, 1–44 (2024); https://doi.org/10.14763/2024.2.1768.

C. Rikap, Varieties of Corporate Innovation Systems and Their Interplay with Global and National Systems: Amazon, Facebook, Google and Microsoft’s Strategies to Produce and Appropriate Artificial Intelligence. Review of International Political Economy, 1–29 (2024); https://doi.org/10.1080/09692290.2024.2365757.

F. Richter, Amazon Maintains Cloud Lead as Microsoft Edges Closer, Statista (2024); https://www.statista.com/chart/18819/worldwide-market-share-of-leading-cloud-infrastructure-service-providers.

P. Maham, S. Küspert, “Governing General Purpose AI: A Comprehensive Map of Unreliability, Misuse and Systemic Risks” (Stiftung Neue Verantwortung, 2023); https://www.interface-eu.org/publications/governing-general-purpose-ai-comprehensive-map-unreliability-misuse-and-systemic-risks.

G. Yu, G. Tan, H. Huang, Z. Zhang, P. Chen, R. Natella, Z. Zheng, A Survey on Failure Analysis and Fault Injection in AI Systems, arXiv [cs.SE] (2024); http://arxiv.org/abs/2407.00125.

F. Jimmy, Emerging Threats: The Latest Cybersecurity Risks and the Role of Artificial Intelligence in Enhancing Cybersecurity Defenses. International Journal of Scientific Research and Management 9, 564–574 (2021); https://doi.org/10.18535/ijsrm/v9i2.ec01.

US Department of the Treasury, Managing Artificial Intelligence-Specific Cybersecurity Risks in the Financial Services Sector (2024); https://home.treasury.gov/system/files/136/Managing-Artificial-Intelligence-Specific-Cybersecurity-Risks-In-The-Financial-Services-Sector.pdf.

S. Trivedi, V. Aggarwal, R. Rastogi, “Enhancing the Power of Cyber-Physical Systems Enabled with AI” in Artificial Intelligence Solutions for Cyber-Physical Systems (Auerbach Publications, Boca Raton, ed. 1, 2024), pp. 1–39; https://doi.org/10.1201/9781032694375-1.

I. D. Raji, S. Costanza-Chock, J. Buolamwini, “Change from the Outside: Towards Credible Third-Party Audits of AI Systems” in Missing Links in AI Governance, B. Prud’homme, C. Régis, G. Farnadi, Eds. (UNESCO/MILA, 2023), pp. 4–26; https://www.unesco.org/en/articles/missing-links-ai-governance.

M. Stein, M. Gandhi, T. Kriecherbauer, A. Oueslati, R. Trager, “Public vs Private Bodies: Who Should Run Advanced AI Evaluations and Audits? A Three-Step Logic Based on Case Studies of High-Risk Industries” (Oxford Martin AI Governance Initiative, 2024); https://www.oxfordmartin.ox.ac.uk/publications/public-vs-private-bodies-who-should-run-advanced-ai-evaluations-and-audits-a-three-step-logic-based-on-case-studies-of-high-risk-industries.

A. J. Grotto, J. Dempsey, “Vulnerability Disclosure and Management for AI/ML Systems: A Working Paper with Policy Recommendations” (Stanford Geopolitics, Technology, and Governance Cyber Policy Center, 2021); https://doi.org/10.2139/ssrn.3964084.

Y. Hong, J. Lian, L. Xu, J. Min, Y. Wang, L. J. Freeman, X. Deng, Statistical Perspectives on Reliability of Artificial Intelligence Systems. Quality Engineering 35, 56–78 (2023); https://doi.org/10.1080/08982112.2022.2089854.

T. Aguirre, On Labs and Fabs: Mapping How Alliances, Acquisitions, and Antitrust Are Shaping the Frontier AI Industry, arXiv [econ.GN] (2024); http://arxiv.org/abs/2406.01722.

B. Martens, “Why Artificial Intelligence Is Creating Fundamental Challenges for Competition Policy” (16/2024, Bruegel Policy Brief, 2024); https://hdl.handle.net/10419/302296.

US Environmental Protection Agency, “Greenhouse Gas Equivalencies Calculator - Calculations and References” (EPA, 2024); https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator-calculations-and-references.

* Gemma Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, L. Hussenot, T. Mesnard, B. Shahriari, A. Ramé, J. Ferret, P. Liu, P. Tafti, A. Friesen, M. Casbon, S. Ramos, R. Kumar, … A. Andreev, Gemma 2: Improving Open Language Models at a Practical Size, arXiv [cs.CL] (2024); http://arxiv.org/abs/2408.00118.

D. Donnellan, A. Lawrence, D. Bizo, P. Judge, J. O’Brien, J. Davis, M. Smolaks, J. Williams-George, R. Weinschenk, “Uptime Institute Global Data Center Survey 2024” (Uptime Institute, 2024); https://uptimeinstitute.com/resources/research-and-reports/uptime-institute-global-data-center-survey-results-2024.

V. Rozite, E. Bertoli, B. Reidenbach, “Data Centres and Data Transmission Networks” (International Energy Agency, 2023); https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks.

L. Burdette, P. Brodsky, P. Christian, J. Hjembo, A. Mauldin, T. Stronge, M. Tan, J. Velandia, “The State of the Network 2023 Edition” (TeleGeography, 2023); https://www2.telegeography.com/hubfs/LP-Assets/Ebooks/state-of-the-network-2023.pdf.

R. Schwartz, J. Dodge, N. A. Smith, O. Etzioni, Green AI. Communications of the ACM 63, 54–63 (2020); https://doi.org/10.1145/3381831.

L. H. Kaack, P. L. Donti, E. Strubell, G. Kamiya, F. Creutzig, D. Rolnick, Aligning Artificial Intelligence with Climate Change Mitigation. Nature Climate Change 12, 518–527 (2022); https://doi.org/10.1038/s41558-022-01377-7.

E. Zelikman, Y. Wu, J. Mu, N. Goodman, “STaR: Bootstrapping Reasoning With Reasoning” in Advances in Neural Information Processing Systems (NeurIPS 2022) (New Orleans, LA, US, 2022) vol. 35, pp. 15476–15488; https://proceedings.neurips.cc/paper_files/paper/2022/file/639a9a172c044fbb64175b5fad42e9a5-Paper-Conference.pdf.

* T. Wu, J. Lan, W. Yuan, J. Jiao, J. Weston, S. Sukhbaatar, Thinking LLMs: General Instruction Following with Thought Generation, arXiv [cs.CL] (2024); http://arxiv.org/abs/2410.10630.

L. Long, R. Wang, R. Xiao, J. Zhao, X. Ding, G. Chen, H. Wang, “On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey” in Findings of the Association for Computational Linguistics ACL 2024 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2024), pp. 11065–11082; https://doi.org/10.18653/v1/2024.findings-acl.658.

N. Alder, K. Ebert, R. Herbrich, P. Hacker, AI, Climate, and Transparency: Operationalizing and Improving the AI Act, arXiv [cs.CY] (2024); http://arxiv.org/abs/2409.07471.

* A. S. Luccioni, A. Hernandez-Garcia, Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning, arXiv [cs.LG] (2023); http://arxiv.org/abs/2302.08476.

* Google, “Environmental Report 2024” (2024); https://www.gstatic.com/gumdrop/sustainability/google-2024-environmental-report.pdf.

Baidu, “Baidu 2023 Environmental, Social and Governance Report” (2023); https://esg.baidu.com/Uploads/File/2024/05/17/Baidu%202023%20Environmental,%20Social%20and%20Governance%20Report.20240517150706.pdf.

EPRI, “Powering Intelligence: Analyzing Artificial Intelligence and Data Center Energy Consumption” (2024); https://www.epri.com/research/products/000000003002028905.

G. Guidi, F. Dominici, J. Gilmour, K. Butler, E. Bell, S. Delaney, F. J. Bargagli-Stoffi, Environmental Burden of United States Data Centers in the Artificial Intelligence Era, arXiv [cs.CY] (2024); http://arxiv.org/abs/2411.09786.

International Energy Agency, “World Energy Outlook 2024” (IEA, 2024); https://www.iea.org/reports/world-energy-outlook-2024.

Ireland Central Statistics Office, “Data Centres Metered Electricity Consumption 2023” (CSO, 2024); https://www.cso.ie/en/releasesandpublications/ep/p-dcmec/datacentresmeteredelectricityconsumption2023/.

PGIM Real Estate, “Global Data Centers Americas Excerpt” (2021); https://cdn.pficdn.com/cms/pgim-real-estate/sites/default/files/2021-01/Global%20Data%20Centers-U.S._February%202021_PGIM.pdf.

US Department of Energy Office of Policy, “Clean Energy Resources to Meet Data Center Electricity Demand” (DOE, 2024); https://www.energy.gov/policy/articles/clean-energy-resources-meet-data-center-electricity-demand.

References

* Constellation, Constellation to Launch Crane Clean Energy Center, Restoring Jobs and Carbon-Free Power to The Grid (2024); https://www.constellationenergy.com/newsroom/2024/Constellation-to-Launch-Crane-Clean-Energy-Center-Restoring-Jobs-and-Carbon-Free-Power-to-The-Grid.html.

Talen Energy Corporation, “Unlocking Value” (2024); https://ir.talenenergy.com/static-files/f02c44a9-d2dc-45c1-9331-eee1495f7d2d.

US Federal Energy Regulatory Commission, Order Rejecting Amendments to Interconnection Service Agreement. FERC (2024); https://elibrary.ferc.gov/eLibrary/filelist?accession_number=20241101-3061&optimized=false.

* M. Terrell, New Nuclear Clean Energy Agreement with Kairos Power, Google (2024); https://blog.google/outreach-initiatives/sustainability/google-kairos-power-nuclear-energy-agreement/.

L. M. Krall, A. M. Macfarlane, R. C. Ewing, Nuclear Waste from Small Modular Reactors. Proceedings of the National Academy of Sciences of the United States of America 119, e2111833119 (2022); https://doi.org/10.1073/pnas.2111833119.

J. Dodge, T. Prewitt, R. Tachet des Combes, E. Odmark, R. Schwartz, E. Strubell, A. S. Luccioni, N. A. Smith, N. DeCario, W. Buchanan, “Measuring the Carbon Intensity of AI in Cloud Instances” in 2022 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2022); https://doi.org/10.1145/3531146.3533234.

P. Hacker, Sustainable AI Regulation, arXiv [cs.CY] (2023); http://arxiv.org/abs/2306.00292.

* Meta, “2024 Sustainability Report” (2024); https://sustainability.atmeta.com/wp-content/uploads/2024/08/Meta-2024-Sustainability-Report.pdf.

* Amazon, “Amazon Sustainability Report” (2024); https://sustainability.aboutamazon.com/2023-amazon-sustainability-report.pdf.

A. N. Achanta, P. Erickson, E. Haites, M. Lazarus, N. Pandey, N. Pahuja, S. Seres, R. Spalding-Fecher, R. Tewari, “Assessing the Impact of the Clean Development Mechanism” (The High-Level Panel on the CDM Policy Dialogue, 2012); https://www.cdmpolicydialogue.org/research/1030_impact.pdf.

J. Rasley, S. Rajbhandari, O. Ruwase, Y. He, “DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (ACM, New York, NY, USA, 2020); https://doi.org/10.1145/3394486.3406703.

W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, I. Stoica, “Efficient Memory Management for Large Language Model Serving with PagedAttention” in Proceedings of the 29th Symposium on Operating Systems Principles (ACM, New York, NY, USA, 2023), pp. 611–626; https://doi.org/10.1145/3600006.3613165.

H. D. Saunders, The Khazzoom-Brookes Postulate and Neoclassical Growth. The Energy Journal 13, 131–148 (1992); http://www.jstor.org/stable/41322471.

G. Kamiya, V. C. Coroamă, “Data Centre Energy Use – A Critical Review” (IEA 4E TCP Electronic Devices and Networks Annex (EDNA)).

International Energy Agency, “Tracking Clean Energy Progress 2023” (IEA, 2023); https://www.iea.org/reports/tracking-clean-energy-progress-2023.

E. Halper, Amid Explosive Demand, America Is Running out of Power, Washington Post (2024); https://www.washingtonpost.com/business/2024/03/07/ai-data-centers-power/.

European Commission, Joint Research Centre, G. Kamiya, P. Bertoldi, Energy Consumption in Data Centres and Broadband Communication Networks in the EU (Publications Office of the European Union, 2024); https://doi.org/10.2760/706491.

J. Koomey, E. Masanet, Does Not Compute: Avoiding Pitfalls Assessing the Internet’s Energy and Carbon Impacts. Joule 5, 1625–1628 (2021); https://doi.org/10.1016/j.joule.2021.05.007.

E. Masanet, A. Shehabi, N. Lei, S. Smith, J. Koomey, Recalibrating Global Data Center Energy-Use Estimates. Science (New York, N.Y.) 367, 984–986 (2020); https://doi.org/10.1126/science.aba3758.

D. Rolnick, P. L. Donti, L. H. Kaack, K. Kochanski, A. Lacoste, K. Sankaran, A. S. Ross, N. Milojevic-Dupont, N. Jaques, A. Waldman-Brown, A. S. Luccioni, T. Maharaj, E. D. Sherwin, S. K. Mukkavilli, K. P. Kording, C. P. Gomes, A. Y. Ng, … Y. Bengio, Tackling Climate Change with Machine Learning. ACM Computing Surveys 55, 1–96 (2023); https://doi.org/10.1145/3485128.

U. Gupta, Y. G. Kim, S. Lee, J. Tse, H.-H. S. Lee, G.-Y. Wei, D. Brooks, C.-J. Wu, Chasing Carbon: The Elusive Environmental Footprint of Computing. IEEE Micro 42, 37–47 (2022); https://doi.org/10.1109/mm.2022.3163226.

* Intel, “2023-24 Corporate Responsibility Report” (2024); https://csrreportbuilder.intel.com/pdfbuilder/pdfs/CSR-2023-24-Full-Report.pdf.

European Environment Agency, “Water Use in Europe — Quantity and Quality Face Big Challenges” (EEA, 2018); https://www.eea.europa.eu/signals-archived/signals-2018-content-list/articles/water-use-in-europe-2014.

Taiwan Semiconductor Manufacturing Company, “TSMC 2023 Sustainability Report” (TSMC, 2024); https://esg.tsmc.com/en-US/file/public/e-all_2023.pdf.

P. Li, J. Yang, M. A. Islam, S. Ren, Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models, arXiv [cs.LG] (2023); http://arxiv.org/abs/2304.03271.

United Nations, The Human Right to Water and Sanitation: Resolution A/RES/64/292 Adopted by the General Assembly on 28 July 2010 (2010); https://documents.un.org/doc/undoc/gen/n09/479/35/pdf/n0947935.pdf.

The European Parliament and the Council of the European Union, Directive (EU) 2023/1791 of the European Parliament and of the Council on Energy Efficiency and Amending Regulation (EU) 2023/955 (recast) (Text with EEA Relevance). (2023); https://eur-lex.europa.eu/eli/dir/2023/1791/oj.

Y. Jin, P. Behrens, A. Tukker, L. Scherer, Water Use of Electricity Technologies: A Global Meta-Analysis. Renewable and Sustainable Energy Reviews 115, 109391 (2019); https://doi.org/10.1016/j.rser.2019.109391.

H. Zhai, E. S. Rubin, E. J. Grol, A. C. O’Connell, Z. Wu, E. G. Lewis, Dry Cooling Retrofits at Existing Fossil Fuel-Fired Power Plants in a Water-Stressed Region: Tradeoffs in Water Savings, Cost, and Capacity Shortfalls. Applied Energy 306, 117997 (2022); https://doi.org/10.1016/j.apenergy.2021.117997.

V. G. Gude, Energy Consumption and Recovery in Reverse Osmosis. Desalination and Water Treatment 36, 239–260 (2011); https://doi.org/10.5004/dwt.2011.2534.

Australian Department of the Environment and Energy, “HVAC Factsheet: Co and Tri-Generation” (2013); https://www.energy.gov.au/sites/default/files/hvac-factsheet-co-tri-generation.pdf.

Office of Fossil Energy, “Hydrogen Strategy: Enabling A Low-Carbon Economy” (US Department of Energy, 2020); https://www.energy.gov/sites/prod/files/2020/07/f76/USDOE_FE_Hydrogen_Strategy_July2020.pdf.

H. Nissenbaum, Privacy in Context: Technology, Policy, and the Integrity of Social Life (Stanford University Press, Palo Alto, CA, 2009); http://www.sup.org/books/title/?id=8862.

L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, N. Papernot, “Machine Unlearning” in 2021 IEEE Symposium on Security and Privacy (SP) (IEEE, Virtual, 2021), pp. 141–159; https://doi.org/10.1109/SP40001.2021.00019.

Organisation for Economic Co-Operation and Development, “AI, Data Governance and Privacy” (OECD, 2024); https://doi.org/10.1787/2476b1a4-en.

European Data Protection Board, “Report of the Work Undertaken by the ChatGPT Taskforce” (EDPB, 2024); https://www.edpb.europa.eu/our-work-tools/our-documents/other/report-work-undertaken-chatgpt-taskforce_en.

D. J. Solove, Artificial Intelligence and Privacy. Florida Law Review (forthcoming Jan 2025); https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4713111.

UK Parliament, Data Protection Act 2018, Section 46: Right to Rectification. (2018); https://www.legislation.gov.uk/ukpga/2018/12/section/46.

GPA’s International Enforcement Cooperation Working Group, “Joint Statement on Data Scraping and the Protection of Privacy” (Information Commissioner’s Office, 2023); https://ico.org.uk/media/about-the-ico/documents/4026232/joint-statement-data-scraping-202308.pdf.

N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramer, C. Zhang, “Quantifying Memorization Across Neural Language Models” in 11th International Conference on Learning Representations (ICLR 2023) (Kigali, Rwanda, 2022); https://openreview.net/forum?id=TatRHT_1cK.

Y. Chen, E. Mendes, S. Das, W. Xu, A. Ritter, Can Language Models Be Instructed to Protect Personal Information?, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.02224.

R. Shokri, M. Stronati, C. Song, V. Shmatikov, “Membership Inference Attacks Against Machine Learning Models” in 2017 IEEE Symposium on Security and Privacy (SP) (IEEE, San Jose, CA, USA, 2017), pp. 3–18; https://doi.org/10.1109/SP.2017.41.

M. Fredrikson, S. Jha, T. Ristenpart, “Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15) (Association for Computing Machinery, New York, NY, USA, 2015), pp. 1322–1333; https://doi.org/10.1145/2810103.2813677.

M. Duan, A. Suri, N. Mireshghallah, S. Min, W. Shi, L. Zettlemoyer, Y. Tsvetkov, Y. Choi, D. Evans, H. Hajishirzi, Do Membership Inference Attacks Work on Large Language Models?, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.07841.

N. Carlini, F. Tramèr, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, Ú. Erlingsson, A. Oprea, C. Raffel, “Extracting Training Data from Large Language Models” in 30th USENIX Security Symposium (USENIX Security 21) (USENIX Association, 2021), pp. 2633–2650; https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting.

N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramèr, B. Balle, D. Ippolito, E. Wallace, “Extracting Training Data from Diffusion Models” in 32nd USENIX Security Symposium (USENIX Security 23) (USENIX Association, Anaheim, CA, 2023), pp. 5253–5270; https://www.usenix.org/conference/usenixsecurity23/presentation/carlini.

W. Shi, A. Ajith, M. Xia, Y. Huang, D. Liu, T. Blevins, D. Chen, L. Zettlemoyer, “Detecting Pretraining Data from Large Language Models” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=zWqr3MQuNs.

N. Lukas, A. Salem, R. Sim, S. Tople, L. Wutschitz, S. Zanella-Béguelin, “Analyzing Leakage of Personally Identifiable Information in Language Models” in 2023 IEEE Symposium on Security and Privacy (SP) (IEEE, 2023), pp. 346–363; https://doi.org/10.1109/SP46215.2023.10179300.

S. Longpre, R. Mahari, A. N. Lee, C. S. Lund, H. Oderinwale, W. Brannon, N. Saxena, N. Obeng-Marnu, T. South, C. J. Hunter, K. Klyman, C. Klamm, H. Schoelkopf, N. Singh, M. Cherep, A. M. Anis, A. Dinh, … A. Pentland, “Consent in Crisis: The Rapid Decline of the AI Data Commons” in 38th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2024); https://openreview.net/pdf?id=66PcEzkf95.

* K. Saab, T. Tu, W.-H. Weng, R. Tanno, D. Stutz, E. Wulczyn, F. Zhang, T. Strother, C. Park, E. Vedadi, J. Z. Chaves, S.-Y. Hu, M. Schaekermann, A. Kamath, Y. Cheng, D. G. T. Barrett, C. Cheung, … V. Natarajan, “Capabilities of Gemini Models in Medicine” (Google Deepmind, 2024); http://arxiv.org/abs/2404.18416.

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-T. Yih, T. Rocktäschel, S. Riedel, D. Kiela, “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks” in 34th Conference on Neural Information Processing Systems (NeurIPS 2020) (Curran Associates, Inc., Vancouver, Canada, 2020) vol. 33, pp. 9459–9474; https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html.

V. Karpukhin, B. Oguz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, W.-T. Yih, “Dense Passage Retrieval for Open-Domain Question Answering” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics, Stroudsburg, PA, USA, 2020), pp. 6769–6781; https://doi.org/10.18653/v1/2020.emnlp-main.550.

O. Ram, Y. Levine, I. Dalmedigos, D. Muhlgay, A. Shashua, K. Leyton-Brown, Y. Shoham, In-Context Retrieval-Augmented Language Models. Transactions of the Association for Computational Linguistics 11, 1316–1331 (2023); https://doi.org/10.1162/tacl_a_00605.

* T. Gunter, Z. Wang, C. Wang, R. Pang, A. Narayanan, A. Zhang, B. Zhang, C. Chen, C.-C. Chiu, D. Qiu, D. Gopinath, D. A. Yap, D. Yin, F. Nan, F. Weers, G. Yin, H. Huang, … Z. Ren, Apple Intelligence Foundation Language Models, arXiv [cs.AI] (2024); http://arxiv.org/abs/2407.21075.

S. Arora, P. Lewis, A. Fan, J. Kahn, C. Ré, Reasoning over Public and Private Data in Retrieval-Based Systems. Transactions of the Association for Computational Linguistics 11, 902–921 (2023); https://doi.org/10.1162/tacl_a_00580.

G. Zyskind, T. South, A. Pentland, “Don’t Forget Private Retrieval: Distributed Private Similarity Search for Large Language Models” in Proceedings of the Fifth Workshop on Privacy in Natural Language Processing (2024), pp. 7–19; https://aclanthology.org/2024.privatenlp-1.2.pdf.

UK National Cyber Security Centre, US Cybersecurity and Infrastructure Security Agency, National Security Agency, Federal Bureau of Investigation, Australian Signals Directorate’s Australian Cyber Security Centre, Canadian Centre for Cyber Security, New Zealand National Cyber Security Centre, Chile’s Government CSIRT, National Cyber and Information Security Agency of the Czech Republic, Information System Authority of Estonia, National Cyber Security Centre of Estonia, French Cybersecurity Agency, Germany’s Federal Office for Information Security, Israeli National Cyber Directorate, Italian National Cybersecurity Agency, Japan’s National center of Incident readiness and Strategy For Cybersecurity, Japan’s Secretariat of Science, Technology and Innovation Policy, Cabinet Office, … Cyber Security Agency of Singapore, “Guidelines for Secure AI System Development” (UK Government, 2023); https://www.ncsc.gov.uk/files/Guidelines-for-secure-AI-system-development.pdf.

M. Kosinski, D. Stillwell, T. Graepel, Private Traits and Attributes Are Predictable from Digital Records of Human Behavior. Proceedings of the National Academy of Sciences of the United States of America 110, 5802–5805 (2013); https://doi.org/10.1073/pnas.1218772110.

R. Staab, M. Vero, M. Balunovic, M. Vechev, “Beyond Memorization: Violating Privacy via Inference with Large Language Models” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=kmn0BhQk7p.

N. Mireshghallah, M. Antoniak, Y. More, Y. Choi, G. Farnadi, “Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild” in First Conference on Language Modeling (2024); https://openreview.net/pdf?id=tIpWtMYkzU.

* J. Lamb, G. Israelstam, R. Agarwal, S. Bhasker, “Generative AI in Healthcare: Adoption Trends and What’s next” (McKinsey & Company, 2024); https://www.mckinsey.com/industries/healthcare/our-insights/generative-ai-in-healthcare-adoption-trends-and-whats-next.

Federal Trade Commission, FTC Staff Report Finds Large Social Media and Video Streaming Companies Have Engaged in Vast Surveillance of Users with Lax Privacy Controls and Inadequate Safeguards for Kids and Teens (2024); https://www.ftc.gov/news-events/news/press-releases/2024/09/ftc-staff-report-finds-large-social-media-video-streaming-companies-have-engaged-vast-surveillance.

Federal Trade Commission, FTC Says Ring Employees Illegally Surveilled Customers, Failed to Stop Hackers from Taking Control of Users’ Cameras (2023); https://www.ftc.gov/news-events/news/press-releases/2023/05/ftc-says-ring-employees-illegally-surveilled-customers-failed-stop-hackers-taking-control-users.

* J. Ho, W. Chan, C. Saharia, J. Whang, R. Gao, A. Gritsenko, D. P. Kingma, B. Poole, M. Norouzi, D. J. Fleet, T. Salimans, Imagen Video: High Definition Video Generation with Diffusion Models, arXiv [cs.CV] (2022); http://arxiv.org/abs/2210.02303.

* Reka Team, A. Ormazabal, C. Zheng, C. de M. d’Autume, D. Yogatama, D. Fu, D. Ong, E. Chen, E. Lamprecht, H. Pham, I. Ong, K. Aleksiev, L. Li, M. Henderson, M. Bain, M. Artetxe, N. Relan, … Z. Xie, Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models, arXiv [cs.CL] (2024); http://arxiv.org/abs/2404.12387.

S. Adler, Z. Hitzig, S. Jain, C. Brewer, W. Chang, R. DiResta, E. Lazzarin, S. McGregor, W. Seltzer, D. Siddarth, N. Soliman, T. South, C. Spelliscy, M. Sporny, V. Srivastava, J. Bailey, B. Christian, … T. Zick, Personhood Credentials: Artificial Intelligence and the Value of Privacy-Preserving Tools to Distinguish Who Is Real Online, arXiv [cs.CY] (2024); http://arxiv.org/abs/2408.07892.

B. Auxier, L. Rainie, M. Anderson, A. Perrin, M. Kumar, E. Turner, “Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information” (Pew Research Center, 2019); https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/.

* IBM, “Cost of a Data Breach 2024” (2024); https://www.ibm.com/reports/data-breach.

S. Min, S. Gururangan, E. Wallace, W. Shi, H. Hajishirzi, N. A. Smith, L. Zettlemoyer, “SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore” in NeurIPS 2023 Workshop on Distribution Shifts (DistShift) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=z03bW0doni.

US Copyright Office, “Copyright and Artificial Intelligence” (2024); https://www.copyright.gov/ai/.

P. Burger, The Berne Convention: Its History and Its Key Role in the Future. Journal of Law and Technology 3, 1–70 (1988); https://heinonline.org/HOL/P?h=hein.journals/jlawtecy3&i=9.

L. R. Patterson, C. Joyce, Copyright in 1791: An Essay Concerning the Founders’ View of the Copyright Power Granted to Congress in Article I, Section 8, Clause 8 of the US Constitution. Emory Law Journal (2003); https://heinonline.org/hol-cgi-bin/get_pdf.cgi?handle=hein.journals/emlj52&section=25.

The Office of the Law Revision Counsel of the United States House of Representatives, “Limitations on Exclusive Rights: Fair Use. Sec. 107” in United States Code, 2006 Edition, Supplement 4, Title 17 - Copyrights (US Government Publishing Office, ed. 2010, 2010); https://www.govinfo.gov/app/details/USCODE-2010-title17/USCODE-2010-title17-chap1-sec107.

European Parliament, Directorate-General for Internal Policies of the Union, E. Rosati, The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market – Technical Aspects (European Parliament, 2018).

Japanese Law Translation Database System, “著作権法（一部未施行）Copyright Act (Partially Unenforced)” (Ministry of Justice, Japan, 2024); https://www.japaneselawtranslation.go.jp/en/laws/view/4207.

Israeli Ministry of Justice, “Opinion: Uses of Copyrighted Materials for Machine Learning” (Israeli Government, 2022); https://www.gov.il/BlobFolder/legalinfo/machine-learning/he/18-12-2022.pdf.

Intellectual Property Office of Singapore, “Copyright: Factsheet on Copyright Act 2021” (IPOS, 2022); https://www.ipos.gov.sg/docs/default-source/resources-library/copyright/copyright-act-factsheet.pdf.

P. Henderson, X. Li, D. Jurafsky, T. Hashimoto, M. A. Lemley, P. Liang, Foundation Models and Fair Use, arXiv [cs.CY] (2023); http://arxiv.org/abs/2303.15715.

B. L. W. Sobel, Artificial Intelligence’s Fair Use Crisis. The Columbia Journal of Law & the Arts 41, 45–97 (2018); https://doi.org/10.7916/jla.v41i1.2036.

M. A. Lemley, B. Casey, Fair Learning. Texas Law Review 99, 743–786 (2020-2021); https://heinonline.org/HOL/P?h=hein.journals/tlr99&i=777.

P. Samuelson, Generative AI Meets Copyright. Science 381, 158–161 (2023); https://doi.org/10.1126/science.adi0656.

Tremblay v. OpenAI, Inc. (3:23-cv-03223) Document 1 (2023); https://storage.courtlistener.com/recap/gov.uscourts.cand.414822/gov.uscourts.cand.414822.1.0_1.pdf.

D. Zhang, B. Xia, Y. Liu, X. Xu, T. Hoang, Z. Xing, M. Staples, Q. Lu, L. Zhu, “Privacy and Copyright Protection in Generative AI: A Lifecycle Perspective” in 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN) (Lisbon, Portugal, 2024); http://arxiv.org/abs/2311.18252.

R. Mahari, S. Longpre, “Discit Ergo Est: Training Data Provenance And Fair Use” in Dynamics of Generative AI, T. Schrepel, V. Stocker, Eds. (Network Law Review, 2023); https://www.networklawreview.org/mahari-longpre-generative-ai/.

K. Lee, A. F. Cooper, J. Grimmelmann, “Talkin’ 'Bout AI Generation: Copyright and the Generative-AI Supply Chain (The Short Version)” in Proceedings of the Symposium on Computer Science and Law (CSLAW ’24) (Association for Computing Machinery, New York, NY, USA, 2024), pp. 48–63; https://doi.org/10.1145/3614407.3643696.

J. Grimmelmann, Copyright for Literate Robots. Iowa Law Review 101, 657–682 (2015-2016); https://heinonline.org/HOL/P?h=hein.journals/ilr101&i=681.

K. Lee, A. F. Cooper, J. Grimmelmann, D. Ippolito, AI and Law: The Next Generation (2023); https://doi.org/10.2139/ssrn.4580739.

L. Tiedrich, When AI Generates Work, Standard Contractual Terms Can Help Generate Value and Clarity, OECD.AI Policy Observatory (2024); https://oecd.ai/en/wonk/contractual-terms.

M. Sag, Copyright Safety for Generative AI. Houston Law Review / University of Houston 61, 295–347 (2023); https://houstonlawreview.org/article/92126-copyright-safety-for-generative-ai.

N. Vyas, S. M. Kakade, B. Barak, “On Provable Copyright Protection for Generative Models” in Proceedings of the 40th International Conference on Machine Learning (ICML 2023) (PMLR, Kigali, Rwanda, 2023); https://proceedings.mlr.press/v202/vyas23b.html.

L. Soldaini, R. Kinney, A. Bhagia, D. Schwenk, D. Atkinson, R. Authur, B. Bogin, K. Chandu, J. Dumas, Y. Elazar, V. Hofmann, A. H. Jha, S. Kumar, L. Lucy, X. Lyu, N. Lambert, I. Magnusson, … K. Lo, Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.00159.

E. M. Bender, B. Friedman, Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Transactions of the Association for Computational Linguistics 6, 587–604 (2018); https://doi.org/10.1162/tacl_a_00041.

R. Bommasani, K. Klyman, S. Longpre, S. Kapoor, N. Maslej, B. Xiong, D. Zhang, P. Liang, “The Foundation Model Transparency Index” (Center for Research on Foundation Models (CRFM) and Institute on Human-Centered Artificial Intelligence (HAI), 2023); http://arxiv.org/abs/2310.12941.

R. Mahari, L. Shayne, L. Donewald, A. Polozov, A. Pentland, A. Lipsitz, Comment to US Copyright Office on Data Provenance and Copyright. US Copyright Office (2023); https://dspace.mit.edu/handle/1721.1/154171?show=full?show=full.

B. Magagna, D. Goldfarb, P. Martin, M. Atkinson, S. Koulouzis, Z. Zhao, “Data Provenance” in Towards Interoperable Research Infrastructures for Environmental and Earth Sciences: A Reference Model Guided Approach for Common Challenges, Z. Zhao, M. Hellström, Eds. (Springer International Publishing, Cham, 2020), pp. 208–225; https://doi.org/10.1007/978-3-030-52829-4_12.

S. Longpre, R. Mahari, N. Obeng-Marnu, W. Brannon, T. South, J. Kabbara, S. Pentland, Data Authenticity, Consent, and Provenance for AI Are All Broken: What Will It Take to Fix Them? An MIT Exploration of Generative AI (2024); https://doi.org/10.21428/e4baedd9.a650f77d.

K. I. Gero, M. Desai, C. Schnitzler, N. Eom, J. Cushman, E. L. Glassman, Creative Writers’ Attitudes on Writing as Training Data for Large Language Models, arXiv [cs.HC] (2024); http://arxiv.org/abs/2409.14281.

R. Fletcher, “How Many News Websites Block AI Crawlers?” (Reuters Institute for the Study of Journalism, 2024); https://doi.org/10.60625/RISJ-XM9G-WS87.

European Commission, AI Act: Participate in the Drawing-up of the First General-Purpose AI Code of Practice, Shaping Europe’s digital future (2024); https://digital-strategy.ec.europa.eu/en/news/ai-act-participate-drawing-first-general-purpose-ai-code-practice.

National Institute of Standards and Technology (NIST), AI Risk Management Framework (2021); https://www.nist.gov/itl/ai-risk-management-framework.

J. Lee, T. Le, J. Chen, D. Lee, “Do Language Models Plagiarize?” in Proceedings of the ACM Web Conference 2023 (ACM, New York, NY, USA, 2023); https://doi.org/10.1145/3543507.3583199.

A. F. Cooper, J. Grimmelmann, The Files Are in the Computer: On Copyright, Memorization, and Generative AI. Chicago-Kent Law Review (2024); https://blog.genlaw.org/pdfs/genlaw_icml2024/5.pdf.

C. Zhang, D. Ippolito, K. Lee, M. Jagielski, F. Tramèr, N. Carlini, “Counterfactual Memorization in Neural Language Models” in 37th International Conference on Neural Information Processing Systems (NeurIPS 2023) (Curran Associates Inc., Red Hook, NY, USA, 2023); https://dl.acm.org/doi/10.5555/3666122.3667830.

L. He, Y. Huang, W. Shi, T. Xie, H. Liu, Y. Wang, L. Zettlemoyer, C. Zhang, D. Chen, P. Henderson, Fantastic Copyrighted Beasts and How (not) to Generate Them, arXiv [cs.CV] (2024); http://arxiv.org/abs/2406.14526.

S. Liu, Y. Yao, J. Jia, S. Casper, N. Baracaldo, P. Hase, X. Xu, Y. Yao, H. Li, K. R. Varshney, M. Bansal, S. Koyejo, Y. Liu, Rethinking Machine Unlearning for Large Language Models, arXiv [cs.LG] (2024); http://arxiv.org/abs/2402.08787.

* R. Eldan, M. Russinovich, Who’s Harry Potter? Approximate Unlearning in LLMs, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.02238.

T. Chen, A. Asai, N. Mireshghallah, S. Min, J. Grimmelmann, Y. Choi, H. Hajishirzi, L. Zettlemoyer, P. W. Koh, CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation, arXiv [cs.CL] (2024); http://arxiv.org/abs/2407.07087.

T. T. Nguyen, T. T. Huynh, P. Le Nguyen, A. W.-C. Liew, H. Yin, Q. V. H. Nguyen, A Survey of Machine Unlearning, arXiv [cs.LG] (2022); http://arxiv.org/abs/2209.02299.

T. Baumhauer, P. Schöttle, M. Zeppelzauer, Machine Unlearning: Linear Filtration for Logit-Based Classifiers. Machine Learning 111, 3203–3226 (2022); https://doi.org/10.1007/s10994-022-06178-9.

Z. Liu, H. Ye, C. Chen, Y. Zheng, K.-Y. Lam, Threats, Attacks, and Defenses in Machine Unlearning: A Survey, arXiv [cs.CR] (2024); http://arxiv.org/abs/2403.13682.

J. Xu, Z. Wu, C. Wang, X. Jia, Machine Unlearning: Solutions and Challenges. IEEE Transactions on Emerging Topics in Computational Intelligence 8, 2150–2168 (2024); https://doi.org/10.1109/tetci.2024.3379240.

S. Nevo, D. Lahav, A. Karpur, Y. Bar-On, H. A. Bradley, J. Alstott, Securing AI Model Weights: Preventing Theft and Misuse of Frontier Models (RAND Corporation, Santa Monica, CA, 2024); https://doi.org/10.7249/RRA2849-1.

R. Bommasani, S. Kapoor, K. Klyman, S. Longpre, A. Ramaswami, D. Zhang, M. Schaake, D. E. Ho, A. Narayanan, P. Liang, Considerations for Governing Open Foundation Models. Science (New York, N.Y.) 386, 151–153 (2024); https://doi.org/10.1126/science.adp1848.

US National Telecommunications and Information Administration, “Dual-Use Foundation Models with Widely Available Model Weights NTIA Report” (US Department of Commerce, 2024); https://www.ntia.gov/issues/artificial-intelligence/open-model-weights-report.

E. Seger, N. Dreksler, R. Moulange, E. Dardaman, J. Schuett, K. Wei, C. Winter, M. Arnold, S. Ó. hÉigeartaigh, A. Korinek, M. Anderljung, B. Bucknall, A. Chan, E. Stafford, L. Koessler, A. Ovadya, B. Garfinkel, … A. Gupta, “Open-Sourcing Highly Capable Foundation Models: An Evaluation of Risks, Benefits, and Alternative Methods for Pursuing Open-Source Objectives” (Centre for the Governance of AI, 2023); http://arxiv.org/abs/2311.09227.

P. Gade, S. Lermen, C. Rogers-Smith, J. Ladish, BadLlama: Cheaply Removing Safety Fine-Tuning from Llama 2-Chat 13B, arXiv [cs.CL] (2023); http://arxiv.org/abs/2311.00117.

* A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Zico Kolter, M. Fredrikson, Universal and Transferable Adversarial Attacks on Aligned Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2307.15043.

I. Yum, Language Agents and Malevolent Design. Philosophy & Technology 37, 1–19 (2024); https://doi.org/10.1007/s13347-024-00794-0.

S. Lermen, C. Rogers-Smith, J. Ladish, LoRA Fine-Tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B, arXiv [cs.LG] (2023); http://arxiv.org/abs/2310.20624.

A. Arditi, O. Obeso, A. Syed, D. Paleka, N. Panickssery, W. Gurnee, N. Nanda, Refusal in Language Models Is Mediated by a Single Direction, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.11717.

J. Cable, A. Black, “With Open Source Artificial Intelligence, Don’t Forget the Lessons of Open Source Software” (Cybersecurity and Infrastructure Security Agency CISA, 2024); https://www.cisa.gov/news-events/news/open-source-artificial-intelligence-dont-forget-lessons-open-source-software.

J. Bateman, D. Baer, S. A. Bell, G. O. Brown, M.-F. (tino) Cuéllar, D. Ganguli, P. Henderson, B. Kotila, L. Lessig, N. B. Lundblad, J. Napolitano, D. Raji, E. Seger, M. Sheehan, A. Skowron, I. Solaiman, H. Toner, A. P. Zvyagina, “Beyond Open vs. Closed: Emerging Consensus and Key Questions for Foundation AI Model Governance” (Carnegie Endowment for International Peace, 2024); https://carnegieendowment.org/research/2024/07/beyond-open-vs-closed-emerging-consensus-and-key-questions-for-foundation-ai-model-governance?lang=en.

E. Seger, B. O’Dell, “Open Horizons: Exploring Nuanced Technical and Policy Approaches to Openness in AI” (Demos, 2024); https://demos.co.uk/research/open-horizons-exploring-nuanced-technical-and-policy-approaches-to-openness-in-ai/.

S. Kapoor, R. Bommasani, K. Klyman, S. Longpre, A. Ramaswami, P. Cihon, A. K. Hopkins, K. Bankston, S. Biderman, M. Bogen, R. Chowdhury, A. Engler, P. Henderson, Y. Jernite, S. Lazar, S. Maffulli, A. Nelson, … A. Narayanan, “Position: On the Societal Impact of Open Foundation Models” in International Conference on Machine Learning (PMLR, 2024), pp. 23082–23104; https://proceedings.mlr.press/v235/kapoor24a.html.

* S. Lakatos, “A Revealing Picture: AI-Generated ‘Undressing’ Images Move from Niche Pornography Discussion Forums to a Scaled and Monetized Online Business” (Graphika, 2023); https://graphika.com/reports/a-revealing-picture.

D. Thiel, M. Stroebel, R. Portnoff, “Generative ML and CSAM: Implications and Mitigations” (Thorn & Stanford Internet Observatory, 2023); https://fsi.stanford.edu/publication/generative-ml-and-csam-implications-and-mitigations.

A. Engler, “How Open-Source Software Shapes AI Policy” (Brookings, 2021); https://www.brookings.edu/articles/how-open-source-software-shapes-ai-policy/.

D. Gray Widder, S. West, M. Whittaker, Open (for Business): Big Tech, Concentrated Power, and the Political Economy of Open AI, SSRN [preprint] (2023); https://doi.org/10.2139/ssrn.4543807.

K. Blind, M. Böhm, P. Grzegorzewska, A. Katz, S. Muto, S. Pätsch, T. Schubert, “Study about the Impact of Open Source Software and Hardware on Technological Independence, Competitiveness and Innovation in the EU Economy, Final Study Report” (European Commission, 2021); https://digital-strategy.ec.europa.eu/en/library/study-about-impact-open-source-software-and-hardware-technological-independence-competitiveness-and.

Y. Kilcher, Ykilcher/gpt-4chan (2023); https://huggingface.co/ykilcher/gpt-4chan.

S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Millican, G. van den Driessche, J.-B. Lespiau, B. Damoc, A. Clark, D. de Las Casas, A. Guy, J. Menick, R. Ring, T. Hennigan, S. Huang, L. Maggiore, … L. Sifre, Improving Language Models by Retrieving from Trillions of Tokens. International Conference on Machine Learning 162, 2206–2240 (2022); https://proceedings.mlr.press/v162/borgeaud22a/borgeaud22a.pdf.

P. Henderson, E. Mitchell, C. Manning, D. Jurafsky, C. Finn, “Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 287–296; https://doi.org/10.1145/3600211.3604690.

J. Deng, S. Pang, Y. Chen, L. Xia, Y. Bai, H. Weng, W. Xu, SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-Trained Models, arXiv [cs.LG] (2024); http://arxiv.org/abs/2404.12699.

T. Huang, S. Hu, L. Liu, “Vaccine: Perturbation-Aware Alignment for Large Language Models against Harmful Fine-Tuning Attack” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/pdf?id=lpXDZKiAnt.

D. Rosati, J. Wehner, K. Williams, Ł. Bartoszcze, D. Atanasov, R. Gonzales, S. Majumdar, C. Maple, H. Sajjad, F. Rudzicz, Representation Noising Effectively Prevents Harmful Fine-Tuning on LLMs, arXiv [cs.CL] (2024); http://arxiv.org/abs/2405.14577.

R. Tamirisa, B. Bharathi, L. Phan, A. Zhou, A. Gatti, T. Suresh, M. Lin, J. Wang, R. Wang, R. Arel, A. Zou, D. Song, B. Li, D. Hendrycks, M. Mazeika, Tamper-Resistant Safeguards for Open-Weight LLMs, arXiv [cs.LG] (2024); http://arxiv.org/abs/2408.00761.

G. Wang, Y.-N. Chuang, R. Tang, S. Zhong, J. Yuan, H. Jin, Z. Liu, V. Chaudhary, S. Xu, J. Caverlee, X. Hu, Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion, arXiv [cs.CR] (2024); http://arxiv.org/abs/2410.05331.

M. Srikumar, J. Chang, K. Chmielinski, “Risk Mitigation Strategies for the Open Foundation Model Value Chain: Insights from PAI Workshop Co-Hosted with GitHub” (Partnership on AI, 2024); https://partnershiponai.org/wp-content/uploads/dlm_uploads/2024/07/open-foundation-model-risk-mitigation_rev3-1.pdf.

References

E. David, Meta Unleashes Its Most Powerful AI Model, Llama 3.1, with 405B Parameters, VentureBeat (2024); https://venturebeat.com/ai/meta-unleashes-its-most-powerful-ai-model-llama-3-1-with-405b-parameters/.

B. Muralidharan, H. Beadles, R. Marzban, K. S. Mupparaju, Knowledge AI: Fine-Tuning NLP Models for Facilitating Scientific Knowledge Extraction and Understanding, arXiv [cs.CL] (2024); http://arxiv.org/abs/2408.04651.

* L. Weidinger, M. Rauh, N. Marchal, A. Manzini, L. A. Hendricks, J. Mateos-Garcia, S. Bergman, J. Kay, C. Griffin, B. Bariach, I. Gabriel, V. Rieser, W. Isaac, “Sociotechnical Safety Evaluation of Generative AI Systems” (Google DeepMind, 2023); http://arxiv.org/abs/2310.11986.

* L. Weidinger, J. Barnhart, J. Brennan, C. Butterfield, S. Young, W. Hawkins, L. A. Hendricks, R. Comanescu, O. Chang, M. Rodriguez, J. Beroshi, D. Bloxwich, L. Proleev, J. Chen, S. Farquhar, L. Ho, I. Gabriel, … W. Isaac, “Holistic Safety and Responsibility Evaluations of Advanced AI Models” (Google DeepMind, 2024); http://arxiv.org/abs/2404.14068.

* I. Solaiman, Z. Talat, W. Agnew, L. Ahmad, D. Baker, S. L. Blodgett, H. Daumé III, J. Dodge, E. Evans, S. Hooker, Y. Jernite, A. S. Luccioni, A. Lusoli, M. Mitchell, J. Newman, M.-T. Png, A. Strait, A. Vassilev, Evaluating the Social Impact of Generative AI Systems in Systems and Society, arXiv [cs.CY] (2023); http://arxiv.org/abs/2306.05949.

A. R. R. Salammagari, G. Srivastava, Advancing Natural Language Understanding for Low-Resource Languages: Current Progress, Applications, and Challenges. International Journal of Advanced Research in Engineering and Technology 15, 244–255 (2024); https://iaeme.com/Home/article_id/IJARET_15_03_021.

A. Birhane, W. Isaac, V. Prabhakaran, M. Diaz, M. C. Elish, I. Gabriel, S. Mohamed, “Power to the People? Opportunities and Challenges for Participatory AI” in Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 1–8; https://doi.org/10.1145/3551624.3555290.

P. Slattery, A. K. Saeri, E. A. C. Grundy, J. Graham, M. Noetel, R. Uuk, J. Dao, S. Pour, S. Casper, N. Thompson, The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks from Artificial Intelligence, arXiv [cs.AI] (2024); http://arxiv.org/abs/2408.12622.

Partnership on AI, “[Draft] Guidelines for Participatory and Inclusive AI” (2024); https://partnershiponai.notion.site/1e8a6131dda045f1ad00054933b0bda0?v=dcb890146f7d464a86f11fcd5de372c0.

M. Maghsoudi, A. Mohammadi, S. Habibipour, Navigating and Addressing Public Concerns in AI: Insights from Social Media Analytics and Delphi. IEEE Access: Practical Innovations, Open Solutions 12, 1–1 (2024); https://doi.org/10.1109/access.2024.3440660.

K. Grosse, L. Bieringer, T. R. Besold, A. M. Alahi, “Towards More Practical Threat Models in Artificial Intelligence Security” in 33rd USENIX Security Symposium (USENIX Security 24) (2024), pp. 4891–4908; https://www.usenix.org/system/files/usenixsecurity24-grosse.pdf.

H. Li, Z. Ren, M. Fan, W. Li, Y. Xu, Y. Jiang, W. Xia, A Review of Scenario Analysis Methods in Planning and Operation of Modern Power Systems: Methodologies, Applications, and Challenges. Electric Power Systems Research 205, 107722 (2022); https://doi.org/10.1016/j.epsr.2021.107722.

A. Mantelero, The Fundamental Rights Impact Assessment (FRIA) in the AI Act: Roots, Legal Obligations and Key Elements for a Model Template. Computer Law & Security Review 54, 106020 (2024); https://doi.org/10.1016/j.clsr.2024.106020.

I. D. Raji, P. Xu, C. Honigsberg, D. Ho, “Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance” in Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 557–571; https://doi.org/10.1145/3514094.3534181.

V. Storchan, R. Kumar, R. Chowdhury, S. Goldfarb-Tarrant, S. Cattell, “2024 Generative AI Red Teaming Transparency Report” (Humane Intelligence, 2024).

* S. Wan, C. Nikolaidis, D. Song, D. Molnar, J. Crnkovich, J. Grace, M. Bhatt, S. Chennabasappa, S. Whitman, S. Ding, V. Ionescu, Y. Li, J. Saxe, CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models, arXiv [cs.CR] (2024); http://arxiv.org/abs/2408.01605.

R. J. Neuwirth, Prohibited Artificial Intelligence Practices in the Proposed EU Artificial Intelligence Act (AIA). Computer Law & Security Review 48, 105798 (2023); https://doi.org/10.1016/j.clsr.2023.105798.

L. Heim, L. Koessler, Training Compute Thresholds: Features and Functions in AI Regulation, arXiv [cs.CY] (2024); http://arxiv.org/abs/2405.10799.

L. Koessler, J. Schuett, M. Anderljung, Risk Thresholds for Frontier AI, arXiv [cs.CY] (2024); http://arxiv.org/abs/2406.14713.

Center for Chemical Process Safety (CCPS), Bow Ties in Risk Management (John Wiley & Sons, Nashville, TN, 2018); https://doi.org/10.1002/9781119490357.

International Organization for Standardization, “ISO 21448:2022: Road Vehicles — Safety of the Intended Functionality” (ISO, 2022); https://www.iso.org/standard/77490.html.

* Anthropic, Responsible Scaling Policy (2024); https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf.

Partnership on AI, PAI’s Guidance for Safe Foundation Model Deployment (2023); https://partnershiponai.org/modeldeployment/.

T. Kelly, A Systematic Approach to Safety Case Management. SAE Transactions: Journal of Materials & Manufacturing 113, 257–266 (2004); http://www.jstor.org/stable/44699541.

B. Lakshmi Prasanna, M. SaidiReddy, (CSM2-RA-R2-TI): Cyber Security Maturity Model for Risk Assessment Using Risk Register for Threat Intelligence. Journal of Physics. Conference Series 2040, 012005 (2021); https://doi.org/10.1088/1742-6596/2040/1/012005.

* Y. Zeng, K. Klyman, A. Zhou, Y. Yang, M. Pan, R. Jia, D. Song, P. Liang, B. Li, AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies, arXiv [cs.CY] (2024); http://arxiv.org/abs/2406.17864.

H. Wu, AI Whistleblowers, SSRN [preprint] (2024); https://doi.org/10.2139/ssrn.4790511.

MITRE ATLAS, MITRE ATLAS AI Incidents (2024); https://ai-incidents.mitre.org/.

B. Robinson, J. Ginns, “Transforming Risk Governance at Frontier AI Companies” (Centre for Long-Term Resilience, 2024); https://www.longtermresilience.org/wp-content/uploads/2024/07/Transforming-risk-governance-at-frontier-AI-companies-CLTR-1.pdf.

J. Schuett, Three Lines of Defense against Risks from AI. AI & Society (2023); https://doi.org/10.1007/s00146-023-01811-0.

R. Bommasani, K. Klyman, S. Longpre, B. Xiong, S. Kapoor, N. Maslej, A. Narayanan, P. Liang, Foundation Model Transparency Reports, arXiv [cs.LG] (2024); http://arxiv.org/abs/2402.16268.

* D. Hendrycks, N. Carlini, J. Schulman, J. Steinhardt, Unsolved Problems in ML Safety, arXiv [cs.LG] (2021); http://arxiv.org/abs/2109.13916.

M. Anderljung, E. T. Smith, J. O’Brien, L. Soder, B. Bucknall, E. Bluemke, J. Schuett, R. Trager, L. Strahm, R. Chowdhury, Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework, arXiv [cs.CY] (2023); http://arxiv.org/abs/2311.14711.

R. Gupta, L. Walker, R. Corona, S. Fu, S. Petryk, J. Napolitano, T. Darrell, A. W. Reddie, Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies, arXiv [cs.CY] (2024); http://arxiv.org/abs/2409.17216.

D. McDuff, T. Korjakow, S. Cambo, J. J. Benjamin, J. Lee, Y. Jernite, C. M. Ferrandis, A. Gokaslan, A. Tarkowski, J. Lindley, A. F. Cooper, D. Contractor, On the Standardization of Behavioral Use Clauses and Their Adoption for Responsible Licensing of AI, arXiv [cs.SE] (2024); http://arxiv.org/abs/2402.05979.

B. Rakova, J. Yang, H. Cramer, R. Chowdhury, Where Responsible AI Meets Reality: Practitioner Perspectives on Enablers for Shifting Organizational Practices. Proceedings of the ACM on Human-Computer Interaction 5, 1–23 (2021); https://doi.org/10.1145/3449081.

* Microsoft AI, “Putting Principles into Practice: How We Approach Responsible AI at Microsoft” (Microsoft, 2020); https://www.microsoft.com/cms/api/am/binary/RE4pKH5.

J. Schuett, A.-K. Reuel, A. Carlier, How to Design an AI Ethics Board. AI and Ethics, 1–19 (2024); https://doi.org/10.1007/s43681-023-00409-y.

G. de Beco, Human Rights Impact Assessments. Netherlands Quarterly of Human Rights 27, 139–166 (2009); https://doi.org/10.1177/016934410902700202.

E. Donahoe, M. M. Metzger, Artificial Intelligence and Human Rights. Journal of Democracy 30, 115–126 (2019); https://doi.org/10.1353/jod.2019.0029.

S. Makridakis, The Art and Science of Forecasting: An Assessment and Future Directions. International Journal of Forecasting 2, 15–39 (1986); https://doi.org/10.1016/0169-2070(86)90028-2.

E. Karger, P. Atanasov, P. E. Tetlock, “Improving Judgments of Existential Risk: Better Forecasts, Questions, Explanations, Policies” (Future of Humanity Institute, 2022); https://www.fhi.ox.ac.uk/wp-content/uploads/2022/05/Improving-Judgments-of-Existential-Risk.pdf.

L. Koessler, J. Schuett, Risk Assessment at AGI Companies: A Review of Popular Risk Assessment Techniques from Other Safety-Critical Industries, arXiv [cs.CY] (2023); http://arxiv.org/abs/2307.08823.

B. Anderson-Samways, “AI-Relevant Regulatory Precedents: A Systematic Search Across All Federal Agencies” (Institute for AI Policy and Strategy, 2024); https://www.iaps.ai/research/ai-relevant-regulatory-precedent.

H. E. Roland, B. Moriarty, System Safety Engineering and Management (Wiley, New York, 2nd ed., 1990); https://www.wiley.com/en-us/System+Safety+Engineering+and+Management%2C+2nd+Edition-p-9780471618164.

N. G. Leveson, Engineering a Safer World: Systems Thinking Applied to Safety (The MIT Press, 2012); https://doi.org/10.7551/mitpress/8179.001.0001.

S. Dekker, Foundations of Safety Science: A Century of Understanding Accidents and Disasters (Routledge, London, England, 2019); https://doi.org/10.4324/9781351059794.

ISO, ISO 31000: Risk Management, ISO (2018); https://www.iso.org/iso-31000-risk-management.html.

E. Black, R. Naidu, R. Ghani, K. Rodolfa, D. Ho, H. Heidari, “Toward Operationalizing Pipeline-Aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools” in Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 1–11; https://doi.org/10.1145/3617694.3623259.

S. Rismani, R. Shelby, A. Smart, E. Jatho, J. Kroll, A. Moon, N. Rostamzadeh, “From Plane Crashes to Algorithmic Harm: Applicability of Safety Engineering Frameworks for Responsible ML” in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 1–18; https://doi.org/10.1145/3544548.3581407.

R. Hawkins, C. Paterson, C. Picardi, Y. Jia, R. Calinescu, I. Habli, Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS), arXiv [cs.LG] (2021); http://arxiv.org/abs/2102.01564.

T. Raz, D. Hillson, A Comparative Review of Risk Management Standards. Risk Management: An International Journal 7, 53–66 (2005); https://doi.org/10.1057/palgrave.rm.8240227.

J. Clymer, N. Gabrieli, D. Krueger, T. Larsen, Safety Cases: How to Justify the Safety of Advanced AI Systems, arXiv [cs.CY] (2024); http://arxiv.org/abs/2403.10462.

C. Haddon-Cave, The Nimrod Review: An Independent Review into the Broader Issues Surrounding the Loss of the RAF Nimrod MR2 Aircraft XV230 in Afghanistan in 2006, Report (Stationery Office, 2009); https://www.gov.uk/government/publications/the-nimrod-review.

N. G. Leveson, Applying Systems Thinking to Analyze and Learn from Events. Safety Science 49, 55–64 (2011); https://doi.org/10.1016/j.ssci.2009.12.021.

D. Hendrycks, Introduction to AI Safety, Ethics, and Society (CRC Press, 2024); https://www.aisafetybook.com/.

O. Delaney, O. Guest, Z. Williams, Mapping Technical Safety Research at AI Companies: A Literature Review and Incentives Analysis, arXiv [cs.CY] (2024); http://arxiv.org/abs/2409.07878.

R. Uuk, A. Brouwer, N. Dreksler, V. Pulignano, R. Bommasani, Effective Mitigations for Systemic Risks from General-Purpose AI (2024); https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5021463.

D. A. Boiko, R. MacKnight, G. Gomes, Emergent Autonomous Scientific Research Capabilities of Large Language Models, arXiv [physics.chem-ph] (2023); http://arxiv.org/abs/2304.05332.

Q. Lu, L. Zhu, X. Xu, Z. Xing, S. Harrer, J. Whittle, Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model Based Agents, arXiv [cs.AI] (2023); http://arxiv.org/abs/2311.13148.

* SIMA Team, M. A. Raad, A. Ahuja, C. Barros, F. Besse, A. Bolt, A. Bolton, B. Brownfield, G. Buttimore, M. Cant, S. Chakera, S. C. Y. Chan, J. Clune, A. Collister, V. Copeman, A. Cullum, I. Dasgupta, … N. Young, “Scaling Instructable Agents Across Many Simulated Worlds” (Google DeepMind, 2024); http://arxiv.org/abs/2404.10179.

T. Schick, J. Dwivedi-Yu, R. Dessi, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, T. Scialom, “Toolformer: Language Models Can Teach Themselves to Use Tools” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=Yacmpz84TH.

Y. Tian, X. Yang, J. Zhang, Y. Dong, H. Su, Evil Geniuses: Delving into the Safety of LLM-Based Agents, arXiv [cs.CL] (2023); http://arxiv.org/abs/2311.11855.

Z. Wu, C. Han, Z. Ding, Z. Weng, Z. Liu, S. Yao, T. Yu, L. Kong, OS-Copilot: Towards Generalist Computer Agents with Self-Improvement, arXiv [cs.AI] (2024); http://arxiv.org/abs/2402.07456.

Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhou, R. Zheng, X. Fan, X. Wang, L. Xiong, Y. Zhou, W. Wang, C. Jiang, … T. Gui, The Rise and Potential of Large Language Model Based Agents: A Survey, arXiv [cs.AI] (2023); http://arxiv.org/abs/2309.07864.

* T. Masterman, S. Besen, M. Sawtell, A. Chao, The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey, arXiv [cs.AI] (2024); http://arxiv.org/abs/2404.11584.

M. Hartmann, A. Koller, A Survey on Complex Tasks for Goal-Directed Interactive Agents, arXiv [cs.CL] (2024); http://arxiv.org/abs/2409.18538.

T. Xie, D. Zhang, J. Chen, X. Li, S. Zhao, R. Cao, T. J. Hua, Z. Cheng, D. Shin, F. Lei, Y. Liu, Y. Xu, S. Zhou, S. Savarese, C. Xiong, V. Zhong, T. Yu, OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments, arXiv [cs.AI] (2024); http://arxiv.org/abs/2404.07972.

* A. Fourney, G. Bansal, H. Mozannar, C. Tan, E. Salinas, E. (eric) Zhu, F. Niedtner, G. Proebsting, G. Bassman, J. Gerrits, J. Alber, P. Chang, R. Loynd, R. West, V. Dibia, A. Awadallah, E. Kamar, … S. Amershi, “Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks” (Microsoft, 2024); https://www.microsoft.com/en-us/research/publication/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/.

S. Hu, M. Ouyang, D. Gao, M. Z. Shou, The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use, arXiv [cs.AI] (2024); http://arxiv.org/abs/2411.10323.

J.-P. Rivera, G. Mukobi, A. Reuel, M. Lamparth, C. Smith, J. Schneider, “Escalation Risks from Language Models in Military and Diplomatic Decision-Making” in The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2024); https://doi.org/10.1145/3630106.3658942.

B. Zhang, Y. Tan, Y. Shen, A. Salem, M. Backes, S. Zannettou, Y. Zhang, Breaking Agents: Compromising Autonomous LLM Agents through Malfunction Amplification, arXiv [cs.CR] (2024); http://arxiv.org/abs/2407.20859.

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, M. Fritz, “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 79–90; https://doi.org/10.1145/3605764.3623985.

R. Fang, D. Bowman, D. Kang, Voice-Enabled AI Agents Can Perform Common Scams, arXiv [cs.AI] (2024); http://arxiv.org/abs/2410.15650.

M. Andriushchenko, A. Souly, M. Dziemian, D. Duenas, M. Lin, J. Wang, D. Hendrycks, A. Zou, Z. Kolter, M. Fredrikson, E. Winsor, J. Wynne, Y. Gal, X. Davies, AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents, arXiv [cs.LG] (2024); http://arxiv.org/abs/2410.09024.

* P. Kumar, E. Lau, S. Vijayakumar, T. Trinh, Scale Red Team, E. Chang, V. Robinson, S. Hendryx, S. Zhou, M. Fredrikson, S. Yue, Z. Wang, Refusal-Trained LLMs Are Easily Jailbroken as Browser Agents, arXiv [cs.CR] (2024); http://arxiv.org/abs/2410.13886.

A. Chan, C. Ezell, M. Kaufmann, K. Wei, L. Hammond, H. Bradley, E. Bluemke, N. Rajkumar, D. Krueger, N. Kolt, L. Heim, M. Anderljung, Visibility into AI Agents, arXiv [cs.CY] (2024); http://arxiv.org/abs/2401.13138.

M. K. Cohen, N. Kolt, Y. Bengio, G. K. Hadfield, S. Russell, Regulating Advanced Artificial Agents. Science 384, 36–38 (2024); https://doi.org/10.1126/science.adl0625.

G. Mialon, C. Fourrier, T. Wolf, Y. LeCun, T. Scialom, “GAIA: A Benchmark for General AI Assistants” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2024); https://openreview.net/forum?id=fibxvahvs3.

K. Valmeekam, K. Stechly, S. Kambhampati, “LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench” in NeurIPS 2024 Workshop on Open-World Agents (2024); https://openreview.net/forum?id=Gcr1Lx4Koz.

P. P. Liang, A. Zadeh, L.-P. Morency, Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. ACM Computing Surveys 56, 1–42 (2024); https://doi.org/10.1145/3656580.

R. Wang, X. Ma, H. Zhou, C. Ji, G. Ye, Y.-G. Jiang, “White-Box Multimodal Jailbreaks Against Large Vision-Language Models” in ACM Multimedia 2024 (2024); https://openreview.net/forum?id=SMOUQtEaAf.

M. Thiemann, J. Lepoutre, Stitched on the Edge: Rule Evasion, Embedded Regulators, and the Evolution of Markets. American Journal of Sociology 122, 1775–1821 (2017); https://doi.org/10.1086/691348.

R. Huben, H. Cunningham, L. R. Smith, A. Ewart, L. Sharkey, “Sparse Autoencoders Find Highly Interpretable Features in Language Models” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2024); https://openreview.net/forum?id=F76bwRSLeK.

* L. Gao, T. D. la Tour, H. Tillman, G. Goh, R. Troll, A. Radford, I. Sutskever, J. Leike, J. Wu, Scaling and Evaluating Sparse Autoencoders, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.04093.

* T. Lieberum, S. Rajamanoharan, A. Conmy, L. Smith, N. Sonnerat, V. Varma, J. Kramar, A. Dragan, R. Shah, N. Nanda, “Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2” in The 7th BlackboxNLP Workshop (2024); https://openreview.net/forum?id=XkMrWOJhNd.

A. Templeton, T. Conerly, J. Marcus, J. Lindsey, T. Bricken, B. Chen, A. Pearce, C. Citro, E. Ameisen, A. Jones, H. Cunningham, N. L. Turner, C. McDougall, M. MacDiarmid, C. D. Freeman, T. R. Sumers, E. Rees, … T. Henighan, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread (2024); https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html.

* T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell, R. Lasenby, Y. Wu, S. Kravec, N. Schiefer, T. Maxwell, N. Joseph, Z. Hatfield-Dodds, … C. Olah, Towards Monosemanticity: Decomposing Language Models with Dictionary Learning, Transformer Circuits Thread (2023); https://transformer-circuits.pub/2023/monosemantic-features.

M. Ananny, K. Crawford, Seeing without Knowing: Limitations of the Transparency Ideal and Its Application to Algorithmic Accountability. New Media & Society 20, 973–989 (2018); https://doi.org/10.1177/1461444816676645.

* T. Bolukbasi, A. Pearce, A. Yuan, A. Coenen, E. Reif, F. Viégas, M. Wattenberg, An Interpretability Illusion for BERT, arXiv [cs.CL] (2021); http://arxiv.org/abs/2104.07143.

K. Kaye, P. Dixon, “Risky Analysis: Assessing and Improving AI Governance Tools: An International Review of AI Governance Tools and Suggestions for Pathways Forward” (World Privacy Forum, 2023); https://www.worldprivacyforum.org/wp-content/uploads/2023/12/WPF_Risky_Analysis_December_2023_fs.pdf.

A. Makelov, G. Lange, A. Geiger, N. Nanda, “Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2024); https://openreview.net/forum?id=Ebt7JgMHv1.

D. Stander, Q. Yu, H. Fan, S. Biderman, “Grokking Group Multiplication with Cosets” in Forty-First International Conference on Machine Learning (2024); https://openreview.net/forum?id=hcQfTsVnBo.

D. Chanin, J. Wilken-Smith, T. Dulka, H. Bhatnagar, J. Bloom, A Is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders, arXiv [cs.CL] (2024); http://arxiv.org/abs/2409.14507.

J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, B. Kim, “Sanity Checks for Saliency Maps” in Advances in Neural Information Processing Systems (NeurIPS 2018) (Curran Associates, Inc., 2018) vol. 31; https://proceedings.neurips.cc/paper_files/paper/2018/hash/294a8ed24b1ad22ec2e7efea049b8737-Abstract.html.

J. Adebayo, M. Muelly, I. Liccardi, B. Kim, “Debugging Tests for Model Explanations” in Advances in Neural Information Processing Systems (NeurIPS 2020) (Curran Associates, Inc., 2020) vol. 33, pp. 700–712; https://proceedings.neurips.cc/paper/2020/hash/075b051ec3d22dac7b33f788da631fd4-Abstract.html.

S. Casper, T. Bu, Y. Li, J. Li, K. Zhang, K. Hariharan, D. Hadfield-Menell, “Red Teaming Deep Neural Networks with Feature Synthesis Tools” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=Od6CHhPM7I.

P. Hase, M. Bansal, B. Kim, A. Ghandeharioun, “Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023); https://openreview.net/forum?id=EldbUlZtbd.

J. Miller, B. Chughtai, W. Saunders, Transformer Circuit Faithfulness Metrics Are Not Robust, arXiv [cs.LG] (2024); http://arxiv.org/abs/2407.08734.

* M. L. Leavitt, A. Morcos, Towards Falsifiable Interpretability Research, arXiv [cs.CY] (2020); http://arxiv.org/abs/2010.12016.

* E. Durmus, A. Tamkin, J. Clark, J. Wei, J. Marcus, J. Batson, K. Handa, L. Lovitt, M. Tong, M. McCain, O. Rausch, S. Huang, S. Bowman, S. Ritchie, T. Henighan, D. Ganguli, “Evaluating Feature Steering: A Case Study in Mitigating Social Biases” (Anthropic, 2024); https://www.anthropic.com/research/evaluating-feature-steering.

G. E. Hinton, “Distributed Representations” (CMU-CS-84-157, Carnegie-Mellon University, 1984); http://shelf2.library.cmu.edu/Tech/19334156.pdf.

Y. Bengio, A. Courville, P. Vincent, Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1798–1828 (2013); https://doi.org/10.1109/TPAMI.2013.50.

L. Gao, J. Schulman, J. Hilton, “Scaling Laws for Reward Model Overoptimization” in Proceedings of the 40th International Conference on Machine Learning (PMLR, Honolulu, Hawaii, USA, 2023), pp. 10835–10866; https://proceedings.mlr.press/v202/gao23h.html.

P. Singhal, T. Goyal, J. Xu, G. Durrett, A Long Way to Go: Investigating Length Correlations in RLHF, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.03716.

J. M. V. Skalse, N. H. R. Howe, D. Krasheninnikov, D. Krueger, “Defining and Characterizing Reward Gaming” in 36th Conference on Neural Information Processing Systems (NeurIPS 2022) (Virtual, 2022); https://openreview.net/forum?id=yb3HOXO3lX2.

L. E. McKinney, Y. Duan, D. Krueger, A. Gleave, “On The Fragility of Learned Reward Functions” in 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Deep Reinforcement Learning Workshop (Virtual, 2022); https://openreview.net/forum?id=9gj9vXfeS-y.

J. Tien, J. Z.-Y. He, Z. Erickson, A. Dragan, D. S. Brown, “Causal Confusion and Reward Misidentification in Preference-Based Reward Learning” in 11th International Conference on Learning Representations (ICLR 2023) (Kigali, Rwanda, 2023); https://openreview.net/forum?id=R0Xxvr_X3ZA.

Z. X. Yong, C. Menghini, S. Bach, “Low-Resource Languages Jailbreak GPT-4” in NeurIPS Workshop on Socially Responsible Language Modelling Research (SoLaR) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=pn83r8V2sv.

Y. Huang, L. Sun, H. Wang, S. Wu, Q. Zhang, Y. Li, C. Gao, Y. Huang, W. Lyu, Y. Zhang, X. Li, H. Sun, Z. Liu, Y. Liu, Y. Wang, Z. Zhang, B. Vidgen, … Y. Zhao, “Position: TrustLLM: Trustworthiness in Large Language Models” in International Conference on Machine Learning (PMLR, 2024), pp. 20166–20270; https://proceedings.mlr.press/v235/huang24x.html.

S. Longpre, S. Kapoor, K. Klyman, A. Ramaswami, R. Bommasani, B. Blili-Hamelin, Y. Huang, A. Skowron, Z.-X. Yong, S. Kotha, Y. Zeng, W. Shi, X. Yang, R. Southen, A. Robey, P. Chao, D. Yang, … P. Henderson, A Safe Harbor for AI Evaluation and Red Teaming, arXiv [cs.AI] (2024); http://arxiv.org/abs/2403.04893.

Y. M. Pa Pa, S. Tanizaki, T. Kou, M. van Eeten, K. Yoshioka, T. Matsumoto, “An Attacker’s Dream? Exploring the Capabilities of ChatGPT for Developing Malware” in Proceedings of the 16th Cyber Security Experimentation and Test Workshop (CSET ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 10–18; https://doi.org/10.1145/3607505.3607513.

A. Liu, Q. Sheng, X. Hu, “Preventing and Detecting Misinformation Generated by Large Language Models” in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, NY, USA, 2024), pp. 3001–3004; https://doi.org/10.1145/3626772.3661377.

J. B. Sandbrink, Artificial Intelligence and Biological Misuse: Differentiating Risks of Language Models and Biological Design Tools, arXiv [cs.CY] (2023); http://arxiv.org/abs/2306.13952.

L. Pöhler, V. Schrader, A. Ladwein, F. von Keller, A Technological Perspective on Misuse of Available AI, arXiv [cs.CY] (2024); http://arxiv.org/abs/2403.15325.

M. Anderljung, J. Hazell, Protecting Society from AI Misuse: When Are Restrictions on Capabilities Warranted?, arXiv [cs.AI] (2023); http://arxiv.org/abs/2303.09377.

A. Karamolegkou, J. Li, L. Zhou, A. Søgaard, “Copyright Violations and Large Language Models” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 7403–7412; https://doi.org/10.18653/v1/2023.emnlp-main.458.

H. Li, D. Guo, W. Fan, M. Xu, J. Huang, F. Meng, Y. Song, “Multi-Step Jailbreaking Privacy Attacks on ChatGPT” in The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (Singapore, 2023); https://openreview.net/forum?id=ls4Pfsl2jZ.

* M. Nasr, N. Carlini, J. Hayase, M. Jagielski, A. Feder Cooper, D. Ippolito, C. A. Choquette-Choo, E. Wallace, F. Tramèr, K. Lee, Scalable Extraction of Training Data from (Production) Language Models, arXiv [cs.LG] (2023); http://arxiv.org/abs/2311.17035.

B. C. Das, M. H. Amini, Y. Wu, Security and Privacy Challenges of Large Language Models: A Survey, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.00888.

B. Yan, K. Li, M. Xu, Y. Dong, Y. Zhang, Z. Ren, X. Cheng, On Protecting the Data Privacy of Large Language Models (LLMs): A Survey, arXiv [cs.CR] (2024); http://arxiv.org/abs/2403.05156.

Y. Yao, J. Duan, K. Xu, Y. Cai, Z. Sun, Y. Zhang, A Survey on Large Language Model (LLM) Security and Privacy: The Good, The Bad, and The Ugly. High-Confidence Computing 4, 100211 (2024); https://doi.org/10.1016/j.hcc.2024.100211.

A. Deshpande, V. Murahari, T. Rajpurohit, A. Kalyan, K. Narasimhan, “Toxicity in ChatGPT: Analyzing Persona-Assigned Language Models” in Findings of the Association for Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Singapore, 2023), pp. 1236–1270; https://doi.org/10.18653/v1/2023.findings-emnlp.88.

Y. Qu, X. Shen, X. He, M. Backes, S. Zannettou, Y. Zhang, “Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 3403–3417; https://doi.org/10.1145/3576915.3616679.

Z. Xu, S. Jain, M. Kankanhalli, Hallucination Is Inevitable: An Innate Limitation of Large Language Models, arXiv [cs.CL] (2024); http://arxiv.org/abs/2401.11817.

* Z. Bai, P. Wang, T. Xiao, T. He, Z. Han, Z. Zhang, M. Z. Shou, Hallucination of Multimodal Large Language Models: A Survey, arXiv [cs.CV] (2024); http://arxiv.org/abs/2404.18930.

Y. Liu, G. Deng, Z. Xu, Y. Li, Y. Zheng, Y. Zhang, L. Zhao, T. Zhang, K. Wang, Y. Liu, Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study, arXiv [cs.SE] (2023); http://arxiv.org/abs/2305.13860.

R. Shah, Q. F. Montixi, S. Pour, A. Tagade, J. Rando, “Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Socially Responsible Language Modelling Research Workshop (SoLaR) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=x3Ltqz1UFg.

N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, P. W. Koh, D. Ippolito, F. Tramèr, L. Schmidt, “Are Aligned Neural Networks Adversarially Aligned?” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=OQQoD8Vc3B.

X. Shen, Z. Chen, M. Backes, Y. Shen, Y. Zhang, “Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models, arXiv [cs.CR] (2023); http://arxiv.org/abs/2308.03825.

* N. Li, Z. Han, I. Steneker, W. Primack, R. Goodside, H. Zhang, Z. Wang, C. Menghini, S. Yue, LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet, arXiv [cs.LG] (2024); http://arxiv.org/abs/2408.15221.

L. Jiang, K. Rao, S. Han, A. Ettinger, F. Brahman, S. Kumar, N. Mireshghallah, X. Lu, M. Sap, Y. Choi, N. Dziri, “WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/pdf?id=n5R6TvBVcX.

Z. Dong, Z. Zhou, C. Yang, J. Shao, Y. Qiao, Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey (Association for Computational Linguistics, 2024); https://doi.org/10.18653/v1/2024.naacl-long.375.

M. Andriushchenko, F. Croce, N. Flammarion, Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, arXiv [cs.CR] (2024); http://arxiv.org/abs/2404.02151.

Y. Zeng, H. Lin, J. Zhang, D. Yang, R. Jia, W. Shi, “How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Association for Computational Linguistics, Stroudsburg, PA, USA, 2024), pp. 14322–14350; https://doi.org/10.18653/v1/2024.acl-long.773.

A. G. Chowdhury, M. M. Islam, V. Kumar, F. H. Shezan, V. Kumar, V. Jain, A. Chadha, Breaking down the Defenses: A Comparative Survey of Attacks on Large Language Models, arXiv [cs.CR] (2024); http://arxiv.org/abs/2403.04786.

M. K. B. Doumbouya, A. Nandi, G. Poesia, D. Ghilardi, A. Goldie, F. Bianchi, D. Jurafsky, C. D. Manning, H4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment, arXiv [cs.CR] (2024); http://arxiv.org/abs/2408.04811.

* B. R. Y. Huang, M. Li, L. Tang, Endless Jailbreaks with Bijection Learning, arXiv [cs.CL] (2024); http://arxiv.org/abs/2410.01294.

X. Qi, Y. Zeng, T. Xie, P.-Y. Chen, R. Jia, P. Mittal, P. Henderson, “Fine-Tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2024); https://openreview.net/forum?id=hTEGyKf0dZ.

Q. Zhan, R. Fang, R. Bindu, A. Gupta, T. Hashimoto, D. Kang, “Removing RLHF Protections in GPT-4 via Fine-Tuning” in 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Mexico City, Mexico, 2024); https://doi.org/10.48550/arXiv.2311.05553.

S. Jain, R. Kirk, E. S. Lubana, R. P. Dick, H. Tanaka, E. Grefenstette, T. Rocktäschel, D. S. Krueger, Mechanistically Analyzing the Effects of Fine-Tuning on Procedurally Defined Tasks, arXiv [cs.LG] (2023); http://arxiv.org/abs/2311.12786.

X. Yang, X. Wang, Q. Zhang, L. Petzold, W. Y. Wang, X. Zhao, D. Lin, Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.02949.

R. Bhardwaj, S. Poria, Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.14303.

J. Ji, K. Wang, T. Qiu, B. Chen, J. Zhou, C. Li, H. Lou, Y. Yang, Language Models Resist Alignment, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.06144.

X. Qi, A. Panda, K. Lyu, X. Ma, S. Roy, A. Beirami, P. Mittal, P. Henderson, Safety Alignment Should Be Made More Than Just a Few Tokens Deep, arXiv [cs.CR] (2024); http://arxiv.org/abs/2406.05946.

S. Hu, Y. Fu, Z. S. Wu, V. Smith, Jogging the Memory of Unlearned LLMs through Targeted Relearning Attacks, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.13356.

D. Halawi, A. Wei, E. Wallace, T. T. Wang, N. Haghtalab, J. Steinhardt, “Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation” in International Conference on Machine Learning (PMLR, 2024), pp. 17298–17312; https://proceedings.mlr.press/v235/halawi24a.html.

R. Greenblatt, F. Roger, D. Krasheninnikov, D. Krueger, “Stress-Testing Capability Elicitation With Password-Locked Models” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/pdf?id=zzOOqD6R1b.

M. Lo, F. Barez, S. Cohen, Large Language Models Relearn Removed Concepts (Association for Computational Linguistics, 2024); https://doi.org/10.18653/v1/2024.findings-acl.492.

S. Peng, P.-Y. Chen, M. D. Hull, D. H. Chau, “Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/pdf?id=GZnsqBwHAG.

A. Sheshadri, A. Ewart, P. Guo, A. Lynch, C. Wu, V. Hebbar, H. Sleight, A. C. Stickland, E. Perez, D. Hadfield-Menell, S. Casper, Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs, arXiv [cs.LG] (2024); http://arxiv.org/abs/2407.15549.

S. Xhonneux, A. Sordoni, S. Günnemann, G. Gidel, L. Schwinn, “Efficient Adversarial Training in LLMs with Continuous Attacks” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/pdf?id=8jB6sGqvgQ.

L. Schwinn, S. Geisler, Revisiting the Robust Alignment of Circuit Breakers, arXiv [cs.CR] (2024); http://arxiv.org/abs/2407.15902.

T. Huang, S. Hu, F. Ilhan, S. F. Tekin, L. Liu, Harmful Fine-Tuning Attacks and Defenses for Large Language Models: A Survey, arXiv [cs.CR] (2024); http://arxiv.org/abs/2409.18169.

J. Łucki, B. Wei, Y. Huang, P. Henderson, F. Tramèr, J. Rando, An Adversarial Perspective on Machine Unlearning for AI Safety, arXiv [cs.LG] (2024); http://arxiv.org/abs/2409.18025.

Y. Wolf, N. Wies, O. Avnery, Y. Levine, A. Shashua, Fundamental Limitations of Alignment in Large Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2304.11082.

T. Tseng, E. McLean, K. Pelrine, T. T. Wang, A. Gleave, Can Go AIs Be Adversarially Robust?, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.12843.

M. Andriushchenko, N. Flammarion, Does Refusal Training in LLMs Generalize to the Past Tense?, arXiv [cs.CL] (2024); http://arxiv.org/abs/2407.11969.

I. D. Raji, E. Denton, E. M. Bender, A. Hanna, A. Paullada, “AI and the Everything in the Whole Wide World Benchmark” in 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Datasets and Benchmarks Track (Round 2) (Virtual, 2021); https://openreview.net/forum?id=j6NxpQbREA1.

B. Hutchinson, N. Rostamzadeh, C. Greer, K. Heller, V. Prabhakaran, “Evaluation Gaps in Machine Learning Practice” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 1859–1876; https://doi.org/10.1145/3531146.3533233.

S. Casper, C. Ezell, C. Siegmann, N. Kolt, T. L. Curtis, B. Bucknall, A. Haupt, K. Wei, J. Scheurer, M. Hobbhahn, L. Sharkey, S. Krishna, M. Von Hagen, S. Alberti, A. Chan, Q. Sun, M. Gerovitch, … D. Hadfield-Menell, Black-Box Access Is Insufficient for Rigorous AI Audits, arXiv [cs.CY] (2024); http://arxiv.org/abs/2401.14446.

B. Ram, P. Verma, Artificial Intelligence AI-Based Chatbot Study of ChatGPT, Google AI Bard and Baidu AI. World Journal of Advanced Engineering Technology and Sciences 8, 258–261 (2023); https://doi.org/10.30574/wjaets.2023.8.1.0045.

M. M. Maas, “Artificial Intelligence Governance under Change: Foundations, Facets, Frameworks,” thesis, University of Copenhagen (2020); https://matthijsmaas.com/uploads/Maas%20-%202021%20-%20PhD%20Dissertation%20-%20Artificial%20Intelligence%20Governance%20Under%20Change%20-%20monograph.pdf.

P. M. Napoli, Social Media and the Public Interest (Columbia University Press, 2019); https://cup.columbia.edu/book/social-media-and-the-public-interest/9780231184540.

J. M. Balkin, How to Regulate (and Not Regulate) Social Media. Journal of Free Speech Law 1, 71–96 (2021); https://doi.org/10.2139/ssrn.3484114.

R. H. Frank, P. J. Cook, Winner-Take-All Markets. Studies in Microeconomics 1, 131–154 (2013); https://doi.org/10.1177/2321022213501254.

B. A. Prakash, A. Beutel, R. Rosenfeld, C. Faloutsos, “Winner Takes All: Competing Viruses or Ideas on Fair-Play Networks” in Proceedings of the 21st International Conference on World Wide Web - WWW ’12 (ACM Press, New York, New York, USA, 2012); https://doi.org/10.1145/2187836.2187975.

T. A. Han, L. M. Pereira, T. Lenaerts, “Modelling and Influencing the AI Bidding War: A Research Agenda” in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’19) (New York, NY, USA, 2019), pp. 5–11; https://doi.org/10.1145/3306618.3314265.

T. Cimpeanu, F. C. Santos, L. M. Pereira, T. Lenaerts, T. A. Han, Artificial Intelligence Development Races in Heterogeneous Settings. Scientific Reports 12, 1723 (2022); https://doi.org/10.1038/s41598-022-05729-3.

A. Guasti, M. Koenig-Archibugi, Has Global Trade Competition Really Led to a Race to the Bottom in Labor Standards? International Studies Quarterly: A Publication of the International Studies Association 66, sqac061 (2022); https://doi.org/10.1093/isq/sqac061.

G. Porter, Trade Competition and Pollution Standards: “Race to the Bottom” or “Stuck at the Bottom.” Journal of Environment & Development 8, 133–151 (1999); https://doi.org/10.1177/107049659900800203.

V. Demary, C. Rusche, “The Economics of Platforms” (Institut der deutschen Wirtschaft, 2018); https://www.iwkoeln.de/en/studies/vera-demary-christian-rusche-the-economics-of-platforms.html.

M. F. Niculescu, D. J. Wu, L. Xu, Strategic Intellectual Property Sharing: Competition on an Open Technology Platform under Network Effects. Information Systems Research: ISR 29, 498–519 (2018); https://doi.org/10.1287/isre.2017.0756.

N. L. Rose, Fear of Flying? Economic Analyses of Airline Safety. The Journal of Economic Perspectives: A Journal of the American Economic Association 6, 75–94 (1992); https://doi.org/10.1257/jep.6.2.75.

J. Tirole, The Theory of Industrial Organization (MIT Press, London, England, 1988).

S. Armstrong, N. Bostrom, C. Shulman, Racing to the Precipice: A Model of Artificial Intelligence Development. AI & Society 31, 201–206 (2016); https://doi.org/10.1007/s00146-015-0590-y.

G. H. Stern, R. J. Feldman, Too Big to Fail: The Hazards of Bank Bailouts (Brookings Institution Press, 2009); https://www.brookings.edu/books/too-big-to-fail/.

B. E. Gup, Financial Management Association International, Too Big to Fail: Policies and Practices in Government Bailouts (Praeger, Westport, Conn, ed. 1, 2003); https://library-search.open.ac.uk/permalink/44OPN_INST/la9sg5/alma9952597297902316.

V. Acharya, D. Anginer, J. A. Warburton, “The End of Market Discipline? Investor Expectations of Implicit Government Guarantees” (2022); https://cepr.org/publications/dp17426.

K. Pernell, J. Jung, Rethinking Moral Hazard: Government Protection and Bank Risk-Taking. Socio-Economic Review 22, 625–653 (2024); https://doi.org/10.1093/ser/mwad050.

W. J. Baumol, W. E. Oates, The Theory of Environmental Policy (Cambridge University Press, Cambridge, England, ed. 2, 1988); https://doi.org/10.1017/cbo9781139173513.

P. DeCicca, D. Kenkel, M. F. Lovenheim, The Economics of Tobacco Regulation: A Comprehensive Review. Journal of Economic Literature 60, 883–970 (2022); https://doi.org/10.1257/jel.20201482.

J. Guerreiro, S. Rebelo, P. Teles, “Regulating Artificial Intelligence” (w31921, National Bureau of Economic Research, 2023); https://doi.org/10.3386/w31921.

L. Dallas, “Short-Termism, the Financial Crisis, and Corporate Governance” (University of San Diego School of Law, 2012); http://dx.doi.org/.

N. Kolt, M. Anderljung, J. Barnhart, A. Brass, K. Esvelt, G. K. Hadfield, L. Heim, M. Rodriguez, J. B. Sandbrink, T. Woodside, Responsible Reporting for Frontier AI Development, arXiv [cs.CY] (2024); http://arxiv.org/abs/2404.02675.

M. Anderljung, J. Barnhart, A. Korinek, J. Leung, C. O’Keefe, J. Whittlestone, S. Avin, M. Brundage, J. Bullock, D. Cass-Beggs, B. Chang, T. Collins, T. Fist, G. Hadfield, A. Hayes, L. Ho, S. Hooker, … K. Wolf, Frontier AI Regulation: Managing Emerging Risks to Public Safety, arXiv [cs.CY] (2023); http://arxiv.org/abs/2307.03718.

L. Collina, M. Sayyadi, M. Provitera, Critical Issues About A.I. Accountability Answered. California Management Review Insights (2023); https://cmr.berkeley.edu/2023/11/critical-issues-about-a-i-accountability-answered/.

A. T. da Fonseca, E. Vaz de Sequeira, L. Barreto Xavier, “Liability for AI Driven Systems” in Multidisciplinary Perspectives on Artificial Intelligence and the Law, H. Sousa Antunes, P. M. Freitas, A. L. Oliveira, C. Martins Pereira, E. Vaz de Sequeira, L. Barreto Xavier, Eds. (Springer International Publishing, Cham, 2024), pp. 299–317; https://doi.org/10.1007/978-3-031-41264-6_16.

M. Buiten, A. de Streel, M. Peitz, The Law and Economics of AI Liability. Computer Law and Security Report 48, 105794 (2023); https://doi.org/10.1016/j.clsr.2023.105794.

T. Miller, Explanation in Artificial Intelligence: Insights from the Social Sciences. Artificial Intelligence 267, 1–38 (2019); https://doi.org/10.1016/j.artint.2018.07.007.

F. Doshi-Velez, B. Kim, Towards A Rigorous Science of Interpretable Machine Learning, arXiv [stat.ML] (2017); http://arxiv.org/abs/1702.08608.

Z. C. Lipton, The Mythos of Model Interpretability: In Machine Learning, the Concept of Interpretability Is Both Important and Slippery. ACM Queue: Tomorrow’s Computing Today 16, 31–57 (2018); https://doi.org/10.1145/3236386.3241340.

T. Räuker, A. Ho, S. Casper, D. Hadfield-Menell, Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks, arXiv [cs.LG] (2022); http://arxiv.org/abs/2207.13243.

M. Busuioc, Accountable Artificial Intelligence: Holding Algorithms to Account. Public Administration Review 81, 825–836 (2021); https://doi.org/10.1111/puar.13293.

F. Doshi-Velez, M. Kortz, R. Budish, C. Bavitz, S. J. Gershman, D. O’Brien, K. Scott, S. Shieber, J. Waldo, D. Weinberger, A. Weller, A. Wood, “Accountability of AI Under the Law: The Role of Explanation” (Berkman Klein Center Working Group on Explanation and the Law, 2017); http://nrs.harvard.edu/urn-3:HUL.InstRepos:34372584.

R. Palin, I. Habli, “Assurance of Automotive Safety – A Safety Case Approach” in Computer Safety, Reliability, and Security (SAFECOMP 2010), E. Schoitsch, Ed. (Springer, Berlin, Heidelberg, 2010), Lecture Notes in Computer Science (LNPSE), pp. 82–96; https://doi.org/10.1007/978-3-642-15651-9_7.

I. I. Livshitz, P. A. Lontsikh, N. P. Lontsikh, E. Y. Golovina, O. M. Safonova, “A Study of Modern Risk Management Methods for Industrial Safety Assurance in the Fuel and Energy Industry” in 2021 International Conference on Quality Management, Transport and Information Security, Information Technologies (IT&QM&IS) (2021), pp. 165–167; https://doi.org/10.1109/ITQMIS53292.2021.9642791.

M. L. Cummings, Rethinking the Maturity of Artificial Intelligence in Safety-Critical Settings. AI Magazine 42, 6–15 (2021); https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/7394.

N. Kolt, Governing AI Agents (2024); https://doi.org/10.2139/ssrn.4772956.

P. Verdegem, Dismantling AI Capitalism: The Commons as an Alternative to the Power Concentration of Big Tech. AI & Society 39, 1–11 (2022); https://doi.org/10.1007/s00146-022-01437-8.

K. Crawford, Atlas of AI (Yale University Press, London, 2021); https://yalebooks.co.uk/9780300264630/atlas-of-ai.

J. Angwin, A. Nelson, R. Palta, “Seeking Reliable Election Information? Don’t Trust AI” (The AI Democracy Projects, 2024); https://www.proofnews.org/seeking-election-information-dont-trust-ai/.

H. Shen, A. DeVos, M. Eslami, K. Holstein, Everyday Algorithm Auditing: Understanding the Power of Everyday Users in Surfacing Harmful Algorithmic Behaviors. Proceedings of the ACM on Human-Computer Interaction 5, 1–29 (2021); https://doi.org/10.1145/3479577.

G. Abercrombie, D. Benbouzid, P. Giudici, D. Golpayegani, J. Hernandez, P. Noro, H. Pandit, E. Paraschou, C. Pownall, J. Prajapati, M. A. Sayre, U. Sengupta, A. Suriyawongkul, R. Thelot, S. Vei, L. Waltersdorfer, A Collaborative, Human-Centred Taxonomy of AI, Algorithmic, and Automation Harms, arXiv [cs.LG] (2024); http://arxiv.org/abs/2407.01294.

J. Molloy, S. Shahbeigi, J. A. McDermid, Hazard and Safety Analysis of Machine-Learning-Based Perception Capabilities in Autonomous Vehicles. Computer 57, 60–70 (2024); https://doi.org/10.1109/mc.2024.3443751.

Y. Jia, T. Lawton, J. Burden, J. McDermid, I. Habli, Safety-Driven Design of Machine Learning for Sepsis Treatment. Journal of Biomedical Informatics 117, 103762 (2021); https://doi.org/10.1016/j.jbi.2021.103762.

R. Hawkins, C. Picardi, L. Donnell, M. Ireland, Creating a Safety Assurance Case for a Machine Learned Satellite-Based Wildfire Detection and Alert System. Journal of Intelligent & Robotic Systems 108, 1–21 (2023); https://doi.org/10.1007/s10846-023-01905-3.

P. Festor, Y. Jia, A. C. Gordon, A. A. Faisal, I. Habli, M. Komorowski, Assuring the Safety of AI-Based Clinical Decision Support Systems: A Case Study of the AI Clinician for Sepsis Treatment. BMJ Health & Care Informatics 29, e100549 (2022); https://doi.org/10.1136/bmjhci-2022-100549.

Department for Science, Innovation & Technology, “Frontier AI Safety Commitments, AI Seoul Summit 2024” (GOV.UK, 2024); https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024.

R. Schwartz, J. Fiscus, K. Greene, G. Waters, R. Chowdhury, T. Jensen, C. Greenberg, A. Godil, R. Amironesei, P. Hall, S. Jain, “The NIST Assessing Risks and Impacts of AI (ARIA) Pilot Evaluation Plan” (US National Institute of Standards and Technology, 2024); https://ai-challenges.nist.gov/uassets/7.

C. G. Northcutt, A. Athalye, J. Mueller, “Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks” in 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Datasets and Benchmarks Track (Round 1) (Virtual, 2021); https://openreview.net/forum?id=XccDXrDNLek.

Z. Xiao, S. Zhang, V. Lai, Q. V. Liao, Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics Using Measurement Theory (Association for Computational Linguistics, 2023); https://doi.org/10.18653/v1/2023.emnlp-main.676.

M. Sclar, Y. Choi, Y. Tsvetkov, A. Suhr, Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I Learned to Start Worrying about Prompt Formatting, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.11324.

B. Shu, L. Zhang, M. Choi, L. Dunagan, L. Logeswaran, M. Lee, D. Card, D. Jurgens, “You Don’t Need a Personality Test to Know These Models Are Unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (Association for Computational Linguistics, Stroudsburg, PA, USA, 2024), pp. 5263–5281; https://doi.org/10.18653/v1/2024.naacl-long.295.

A. Bavaresco, R. Bernardi, L. Bertolazzi, D. Elliott, R. Fernández, A. Gatt, E. Ghaleb, M. Giulianelli, M. Hanna, A. Koller, A. F. T. Martins, P. Mondorf, V. Neplenbroek, S. Pezzelle, B. Plank, D. Schlangen, A. Suglia, … A. Testoni, LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.18403.

ISACA, “The Risk IT Framework” (2009); https://www.hci-itil.com/ITIL_v3/docs/RiskIT_FW_30June2010_Research.pdf.

US AI Safety Institute, UK AI Safety Institute, “US AISI and UK AISI Joint Pre-Deployment Test” (National Institute of Standards and Technology; Department for Science, Innovation and Technology, 2024); https://www.nist.gov/system/files/documents/2024/11/19/Upgraded%20Sonnet-Publication-US.pdf.

G. Leech, J. J. Vazquez, N. Kupper, M. Yagudin, L. Aitchison, Questionable Practices in Machine Learning, arXiv [cs.LG] (2024); http://arxiv.org/abs/2407.12220.

* L. Madaan, A. K. Singh, R. Schaeffer, A. Poulton, S. Koyejo, P. Stenetorp, S. Narang, D. Hupkes, Quantifying Variance in Evaluation Benchmarks, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.10229.

C. Xu, S. Guan, D. Greene, M.-T. Kechadi, Benchmark Data Contamination of Large Language Models: A Survey, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.04244.

Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang, W. Ye, Y. Zhang, Y. Chang, P. S. Yu, Q. Yang, X. Xie, A Survey on Evaluation of Large Language Models. ACM Transactions on Intelligent Systems and Technology 15, 39:1–39:45 (2024); https://doi.org/10.1145/3641289.

* W. Zhong, R. Cui, Y. Guo, Y. Liang, S. Lu, Y. Wang, A. Saied, W. Chen, N. Duan, AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2304.06364.

L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. Xing, H. Zhang, J. E. Gonzalez, I. Stoica, “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Datasets and Benchmarks Track (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=uccHPGDlao.

* S. Yao, N. Shinn, P. Razavi, K. Narasimhan, τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains, arXiv [cs.AI] (2024); http://arxiv.org/abs/2406.12045.

P. Liang, R. Bommasani, T. Lee, D. Tsipras, D. Soylu, M. Yasunaga, Y. Zhang, D. Narayanan, Y. Wu, A. Kumar, B. Newman, B. Yuan, B. Yan, C. Zhang, C. A. Cosgrove, C. D. Manning, C. Re, … Y. Koreeda, Holistic Evaluation of Language Models. Transactions on Machine Learning Research (2023); https://openreview.net/forum?id=iO4LZibEqW.

A. Reuel, A. Hardy, C. Smith, M. Lamparth, M. Hardy, M. J. Kochenderfer, BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices, arXiv [cs.AI] (2024); http://arxiv.org/abs/2411.12990.

* E. Miller, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations, arXiv [stat.AP] (2024); http://arxiv.org/abs/2411.00640.

* N. Sambasivan, E. Arnesen, B. Hutchinson, V. Prabhakaran, Non-Portability of Algorithmic Fairness in India, arXiv [cs.CY] (2020); http://arxiv.org/abs/2012.03659.

I. O. Gallegos, R. A. Rossi, J. Barrow, M. M. Tanjim, S. Kim, F. Dernoncourt, T. Yu, R. Zhang, N. K. Ahmed, Bias and Fairness in Large Language Models: A Survey. Computational Linguistics (Association for Computational Linguistics) 50, 1–83 (2024); https://doi.org/10.1162/coli_a_00524.

K. Charmaz, Constructing Grounded Theory (SAGE Publications, Thousand Oaks, CA, 2014).

T. Shin, Y. Razeghi, R. L. Logan IV, E. Wallace, S. Singh, “AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), B. Webber, T. Cohn, Y. He, Y. Liu, Eds. (Association for Computational Linguistics, Online, 2020), pp. 4222–4235; https://doi.org/10.18653/v1/2020.emnlp-main.346.

E. Perez, S. Huang, F. Song, T. Cai, R. Ring, J. Aslanides, A. Glaese, N. McAleese, G. Irving, “Red Teaming Language Models with Language Models” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Y. Goldberg, Z. Kozareva, Y. Zhang, Eds. (Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022), pp. 3419–3448; https://doi.org/10.18653/v1/2022.emnlp-main.225.

* D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, A. Jones, S. Bowman, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, … J. Clark, “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned” (Anthropic, 2022); http://arxiv.org/abs/2209.07858.

S. Casper, J. Lin, J. Kwon, G. Culp, D. Hadfield-Menell, Explore, Establish, Exploit: Red Teaming Language Models from Scratch, arXiv [cs.CL] (2023); http://arxiv.org/abs/2306.09442.

S. Tong, E. Jones, J. Steinhardt, “Mass-Producing Failures of Multimodal Systems with Language Models” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=T6iiOqsGOh.

M. Mazeika, L. Phan, X. Yin, A. Zou, Z. Wang, N. Mu, E. Sakhaee, N. Li, S. Basart, B. Li, D. Forsyth, D. Hendrycks, HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal, arXiv [cs.LG] (2024); http://arxiv.org/abs/2402.04249.

P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, E. Wong, Jailbreaking Black Box Large Language Models in Twenty Queries, arXiv [cs.LG] (2023); http://arxiv.org/abs/2310.08419.

D. Ziegler, S. Nix, L. Chan, T. Bauman, P. Schmidt-Nielsen, T. Lin, A. Scherlis, N. Nabeshima, B. Weinstein-Raun, D. de Haas, B. Shlegeris, N. Thomas, “Adversarial Training for High-Stakes Reliability” in Advances in Neural Information Processing Systems (NeurIPS 2022) (New Orleans, LA, USA, 2022) vol. 35, pp. 9274–9286; https://proceedings.neurips.cc//paper_files/paper/2022/hash/3c44405d619a6920384a45bce876b41e-Abstract-Conference.html.

A. Rao, S. Vashistha, A. Naik, S. Aditya, M. Choudhury, “Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks” in 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (Torino, Italia, 2024); https://doi.org/10.48550/arXiv.2305.14965.

* A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, A. Karbasi, Tree of Attacks: Jailbreaking Black-Box LLMs Automatically, arXiv [cs.LG] (2023); http://arxiv.org/abs/2312.02119.

T. D. Pala, V. Y. H. Toh, R. Bhardwaj, S. Poria, Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique, arXiv [cs.CL] (2024); http://arxiv.org/abs/2408.10701.

M. Feffer, A. Sinha, Z. C. Lipton, H. Heidari, Red-Teaming for Generative AI: Silver Bullet or Security Theater?, arXiv [cs.CY] (2024); http://arxiv.org/abs/2401.15897.

* L. Weidinger, J. Mellor, B. G. Pegueroles, N. Marchal, R. Kumar, K. Lum, C. Akbulut, M. Diaz, S. Bergman, M. Rodriguez, V. Rieser, W. Isaac, STAR: SocioTechnical Approach to Red Teaming Language Models, arXiv [cs.AI] (2024); http://arxiv.org/abs/2406.11757.

P. Chao, E. Debenedetti, A. Robey, M. Andriushchenko, F. Croce, V. Sehwag, E. Dobriban, N. Flammarion, G. J. Pappas, F. Tramèr, H. Hassani, E. Wong, JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models, arXiv [cs.CR] (2024); http://arxiv.org/abs/2404.01318.

US AI Safety Institute, “Managing Misuse Risk for Dual-Use Foundation Models” (NIST, 2024); https://doi.org/10.6028/nist.ai.800-1.ipd.

W. Tann, Y. Liu, J. H. Sim, C. M. Seah, E.-C. Chang, Using Large Language Models for Cybersecurity Capture-the-Flag Challenges and Certification Questions, arXiv [cs.AI] (2023); http://arxiv.org/abs/2308.10443.

D. Kang, X. Li, I. Stoica, C. Guestrin, M. Zaharia, T. Hashimoto, “Exploiting Programmatic Behavior of LLMs: Dual-Use through Standard Security Attacks” in 2024 IEEE Security and Privacy Workshops (SPW) (IEEE, 2024), pp. 132–143; https://doi.org/10.1109/spw63631.2024.00018.

F. N. Motlagh, M. Hajizadeh, M. Majd, P. Najafi, F. Cheng, C. Meinel, Large Language Models in Cybersecurity: State-of-the-Art, arXiv [cs.CR] (2024); http://arxiv.org/abs/2402.00891.

A. Hagerty, I. Rubinov, Global AI Ethics: A Review of the Social Impacts and Ethical Implications of Artificial Intelligence, arXiv [cs.CY] (2019); http://arxiv.org/abs/1907.07892.

M. M. Maas, “Aligning AI Regulation to Sociotechnical Change” in The Oxford Handbook of AI Governance, J. B. Bullock, Y.-C. Chen, J. Himmelreich, V. M. Hudson, A. Korinek, M. M. Young, B. Zhang, Eds. (Oxford University Press, 2022); https://doi.org/10.1093/oxfordhb/9780197579329.013.22.

D. Dalrymple, J. Skalse, Y. Bengio, S. Russell, M. Tegmark, S. Seshia, S. Omohundro, C. Szegedy, B. Goldhaber, N. Ammann, A. Abate, J. Halpern, C. Barrett, D. Zhao, T. Zhi-Xuan, J. Wing, J. Tenenbaum, Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems, arXiv [cs.AI] (2024); http://arxiv.org/abs/2405.06624.

A. Reuel, B. Bucknall, S. Casper, T. Fist, L. Soder, O. Aarne, L. Hammond, L. Ibrahim, A. Chan, P. Wills, M. Anderljung, B. Garfinkel, L. Heim, A. Trask, G. Mukobi, R. Schaeffer, M. Baker, … R. Trager, Open Problems in Technical AI Governance, arXiv [cs.CY] (2024); http://arxiv.org/abs/2407.14981.

R. Ren, S. Basart, A. Khoja, A. Gatti, L. Phan, X. Yin, M. Mazeika, A. Pan, G. Mukobi, R. H. Kim, S. Fitz, D. Hendrycks, “Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?” in 38th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2024); https://openreview.net/pdf?id=YagfTP3RK6.

B. S. Bucknall, R. F. Trager, “Structured Access for Third-Party Research on Frontier AI Models: Investigating Researchers’ Model Access Requirements” (Oxford Martin School, University of Oxford and Center for the Governance of AI, 2023); https://cdn.governance.ai/Structured_Access_for_Third-Party_Research.pdf.

A. Birhane, V. U. Prabhu, E. Kahembwe, Multimodal Datasets: Misogyny, Pornography, and Malignant Stereotypes, arXiv [cs.CY] (2021); http://arxiv.org/abs/2110.01963.

R. Ashmore, R. Calinescu, C. Paterson, Assuring the Machine Learning Lifecycle. ACM Computing Surveys 54, 1–39 (2022); https://doi.org/10.1145/3453444.

S. Casper, X. Davies, C. Shi, T. K. Gilbert, J. Scheurer, J. Rando, R. Freedman, T. Korbak, D. Lindner, P. Freire, T. T. Wang, S. Marks, C.-R. Segerie, M. Carroll, A. Peng, P. Christoffersen, M. Damani, … D. Hadfield-Menell, Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Transactions on Machine Learning Research (2023); https://openreview.net/forum?id=bx24KpJ4Eb.

T. Shevlane, Structured Access: An Emerging Paradigm for Safe AI Deployment, arXiv [cs.AI] (2022); http://arxiv.org/abs/2201.05159.

J. Petrie, O. Aarne, N. Ammann, D. Dalrymple, Interim Report: Mechanisms for Flexible Hardware-Enabled Guarantees (2024); https://yoshuabengio.org/wp-content/uploads/2024/09/FlexHEG-Interim-Report_2024.pdf.

S. Costanza-Chock, I. D. Raji, J. Buolamwini, “Who Audits the Auditors? Recommendations from a Field Scan of the Algorithmic Auditing Ecosystem” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 1571–1583; https://doi.org/10.1145/3531146.3533213.

M. Feffer, M. Skirpan, Z. Lipton, H. Heidari, “From Preference Elicitation to Participatory ML: A Critical Survey & Guidelines for Future Research” in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23) (ACM, Montréal QC Canada, 2023), pp. 38–48; https://doi.org/10.1145/3600211.3604661.

F. Delgado, S. Yang, M. Madaio, Q. Yang, “The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice” in Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’23) (Association for Computing Machinery, New York, NY, USA, 2023), pp. 1–23; https://doi.org/10.1145/3617694.3623261.

J. Metcalf, E. Moss, E. A. Watkins, R. Singh, M. C. Elish, “Algorithmic Impact Assessments and Accountability: The Co-Construction of Impacts” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21) (Association for Computing Machinery, New York, NY, USA, 2021), pp. 735–746; https://doi.org/10.1145/3442188.3445935.

D. Martin Jr, V. Prabhakaran, J. Kuhlberg, A. Smart, W. S. Isaac, “Participatory Problem Formulation for Fairer Machine Learning Through Community Based System Dynamics” in ICLR Workshop on Machine Learning in Real Life (2020); https://doi.org/10.48550/arXiv.2005.07572.

S. Fazelpour, M. De-Arteaga, Diversity in Sociotechnical Machine Learning Systems. Big Data & Society 9, 205395172210820 (2022); https://doi.org/10.1177/20539517221082027.

C. Knight, Reflective Equilibrium. (2023); https://plato.stanford.edu/entries/reflective-equilibrium/.

P. Kalluri, Don’t Ask If Artificial Intelligence Is Good or Fair, Ask How It Shifts Power. Nature 583, 169 (2020); https://doi.org/10.1038/d41586-020-02003-2.

R. Dobbe, T. Krendl Gilbert, Y. Mintz, Hard Choices in Artificial Intelligence. Artificial Intelligence 300, 103555 (2021); https://doi.org/10.1016/j.artint.2021.103555.

* S. Fort, B. Lakshminarayanan, Ensemble Everything Everywhere: Multi-Scale Aggregation for Adversarial Robustness, arXiv [cs.CV] (2024); http://arxiv.org/abs/2408.05446.

A. Zou, L. Phan, J. Wang, D. Duenas, M. Lin, M. Andriushchenko, R. Wang, Z. Kolter, M. Fredrikson, D. Hendrycks, Improving Alignment and Robustness with Circuit Breakers, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.04313.

M. Williams, M. Carroll, A. Narang, C. Weisser, B. Murphy, A. Dragan, On Targeted Manipulation and Deception When Optimizing LLMs for User Feedback, arXiv [cs.LG] (2024); http://arxiv.org/abs/2411.02306.

S. Arnesen, D. Rein, J. Michael, Training Language Models to Win Debates with Self-Play Improves Judge Accuracy, arXiv [cs.CL] (2024); http://arxiv.org/abs/2409.16636.

Z. Kenton, N. Y. Siegel, J. Kramár, J. Brown-Cohen, S. Albanie, J. Bulian, R. Agarwal, D. Lindner, Y. Tang, N. Goodman, R. Shah, “On Scalable Oversight with Weak LLMs Judging Strong LLMs” in 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) (2024); https://openreview.net/forum?id=O1fp9nVraj.

A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowski, S. Goel, N. Li, M. J. Byun, Z. Wang, A. Mallen, S. Basart, S. Koyejo, … D. Hendrycks, Representation Engineering: A Top-Down Approach to AI Transparency, arXiv [cs.LG] (2023); http://arxiv.org/abs/2310.01405.

S. Casper, L. Schulze, O. Patel, D. Hadfield-Menell, Defending Against Unforeseen Failure Modes with Latent Adversarial Training, arXiv [cs.CR] (2024); http://arxiv.org/abs/2403.05030.

T. R. Shaham, S. Schwettmann, F. Wang, A. Rajaram, E. Hernandez, J. Andreas, A. Torralba, A Multimodal Automated Interpretability Agent (2024); https://openreview.net/forum?id=mDw42ZanmE.

* Z. Kenton, T. Everitt, L. Weidinger, I. Gabriel, V. Mikulik, G. Irving, “Alignment of Language Agents” (Google DeepMind, 2021); http://arxiv.org/abs/2103.14659.

* C. Burns, P. Izmailov, J. H. Kirchner, B. Baker, L. Gao, L. Aschenbrenner, Y. Chen, A. Ecoffet, M. Joglekar, J. Leike, I. Sutskever, J. Wu, Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision, arXiv [cs.CL] (2023); http://arxiv.org/abs/2312.09390.

* J. Michael, S. Mahdi, D. Rein, J. Petty, J. Dirani, V. Padmakumar, S. R. Bowman, Debate Helps Supervise Unreliable Experts, arXiv [cs.AI] (2023); http://arxiv.org/abs/2311.08702.

Y. Bengio, M. K. Cohen, N. Malkin, M. MacDermott, D. Fornasiere, P. Greiner, Y. Kaddar, Can a Bayesian Oracle Prevent Harm from an Agent?, arXiv [cs.AI] (2024); http://arxiv.org/abs/2408.05284.

M. Wu, A. F. Aji, Style Over Substance: Evaluation Biases for Large Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2307.03025.

* N. Lambert, R. Calandra, The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback, arXiv [cs.LG] (2023); http://arxiv.org/abs/2311.00168.

H. Bansal, J. Dang, A. Grover, “Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=dKl6lMwbCy.

* J. Uesato, N. Kushman, R. Kumar, F. Song, N. Siegel, L. Wang, A. Creswell, G. Irving, I. Higgins, “Solving Math Word Problems with Process- and Outcome-Based Feedback” (Google DeepMind, 2022); https://doi.org/10.48550/arXiv.2211.14275.

H. Lightman, V. Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, K. Cobbe, “Let’s Verify Step by Step” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=v8L0pN6EOi.

Z. Wu, Y. Hu, W. Shi, N. Dziri, A. Suhr, P. Ammanabrolu, N. A. Smith, M. Ostendorf, H. Hajishirzi, “Fine-Grained Human Feedback Gives Better Rewards for Language Model Training” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=CSbGXyCswu.

Z. Li, The Dark Side of ChatGPT: Legal and Ethical Challenges from Stochastic Parrots and Hallucination, arXiv [cs.CY] (2023); http://arxiv.org/abs/2304.14347.

* A. Askell, Y. Bai, A. Chen, D. Drain, D. Ganguli, T. Henighan, A. Jones, N. Joseph, B. Mann, N. DasSarma, N. Elhage, Z. Hatfield-Dodds, D. Hernandez, J. Kernion, K. Ndousse, C. Olsson, D. Amodei, … J. Kaplan, A General Language Assistant as a Laboratory for Alignment, arXiv [cs.CL] (2021); http://arxiv.org/abs/2112.00861.

K. Shuster, S. Poff, M. Chen, D. Kiela, J. Weston, “Retrieval Augmentation Reduces Hallucination in Conversation” in Findings of the Association for Computational Linguistics: EMNLP 2021, M.-F. Moens, X. Huang, L. Specia, S. W.-T. Yih, Eds. (Association for Computational Linguistics, Punta Cana, Dominican Republic, 2021), pp. 3784–3803; https://doi.org/10.18653/v1/2021.findings-emnlp.320.

L. Kuhn, Y. Gal, S. Farquhar, “Semantic Uncertainty: Linguistic Invariances For Uncertainty Estimation in Natural Language Generation” in 11th International Conference on Learning Representations (ICLR 2023) (Kigali, Rwanda, 2023); https://openreview.net/forum?id=VD-AYtP0dve.

S. Min, K. Krishna, X. Lyu, M. Lewis, W.-T. Yih, P. Koh, M. Iyyer, L. Zettlemoyer, H. Hajishirzi, “FActScore: Fine-Grained Atomic Evaluation of Factual Precision in Long Form Text Generation” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA, USA, 2023), pp. 12076–12100; https://doi.org/10.18653/v1/2023.emnlp-main.741.

L. Chen, A. Perez-Lebel, F. M. Suchanek, G. Varoquaux, Reconfidencing LLMs from the Grouping Loss Perspective, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.04957.

D. Hendrycks, S. Basart, N. Mu, S. Kadavath, F. Wang, E. Dorundo, R. Desai, T. Zhu, S. Parajuli, M. Guo, D. Song, J. Steinhardt, J. Gilmer, “The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021), pp. 8320–8329; https://doi.org/10.1109/ICCV48922.2021.00823.

* S. Kadavath, T. Conerly, A. Askell, T. Henighan, D. Drain, E. Perez, N. Schiefer, Z. Hatfield-Dodds, N. DasSarma, E. Tran-Johnson, S. Johnston, S. El-Showk, A. Jones, N. Elhage, T. Hume, A. Chen, Y. Bai, … J. Kaplan, Language Models (mostly) Know What They Know, arXiv [cs.CL] (2022); http://arxiv.org/abs/2207.05221.

* Y. A. Yadkori, I. Kuzborskij, A. György, C. Szepesvári, To Believe or Not to Believe Your LLM, arXiv [cs.LG] (2024); http://arxiv.org/abs/2406.02543.

S. Marks, C. Rager, E. J. Michaud, Y. Belinkov, D. Bau, A. Mueller, Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models, arXiv [cs.LG] (2024); http://arxiv.org/abs/2403.19647.

* T. Lieberum, M. Rahtz, J. Kramár, N. Nanda, G. Irving, R. Shah, V. Mikulik, “Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla” (Google DeepMind, 2023); https://doi.org/10.48550/arXiv.2307.09458.

E. Mitchell, C. Lin, A. Bosselut, C. D. Manning, C. Finn, “Memory-Based Model Editing at Scale” in Proceedings of the 39th International Conference on Machine Learning (PMLR, 2022), pp. 15817–15831; https://proceedings.mlr.press/v162/mitchell22a.html.

K. Meng, A. S. Sharma, A. J. Andonian, Y. Belinkov, D. Bau, “Mass-Editing Memory in a Transformer” in 11th International Conference on Learning Representations (ICLR 2023) (Kigali, Rwanda, 2022); https://openreview.net/forum?id=MkbcAHIYgyS.

Y. Gandelsman, A. A. Efros, J. Steinhardt, “Interpreting CLIP’s Image Representation via Text-Based Decomposition” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=5Ca9sSzuDp.

C. Tan, G. Zhang, J. Fu, “Massive Editing for Large Language Models via Meta Learning” in The 12th International Conference on Learning Representations (ICLR 2024) (Vienna, Austria, 2023); https://openreview.net/forum?id=L6L1CJQ2PE.

S. Wang, Y. Zhu, H. Liu, Z. Zheng, C. Chen, J. Li, Knowledge Editing for Large Language Models: A Survey, arXiv [cs.CL] (2023); http://arxiv.org/abs/2310.16218.

A. Ghorbani, J. Y. Zou, “Neuron Shapley: Discovering the Responsible Neurons” in Advances in Neural Information Processing Systems (NeurIPS 2020) (Curran Associates, Inc., 2020) vol. 33, pp. 5922–5932; https://proceedings.neurips.cc/paper/2020/hash/41c542dfe6e4fc3deb251d64cf6ed2e4-Abstract.html.

X. Wu, J. Li, M. Xu, W. Dong, S. Wu, C. Bian, D. Xiong, “DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), H. Bouamor, J. Pino, K. Bali, Eds. (Association for Computational Linguistics, Gateway, Singapore, 2023), pp. 2875–2886; https://doi.org/10.18653/v1/2023.emnlp-main.174.

K. Li, O. Patel, F. Viégas, H. Pfister, M. Wattenberg, “Inference-Time Intervention: Eliciting Truthful Answers from a Language Model” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=aLLuYpn83y.

N. Belrose, D. Schneider-Joseph, S. Ravfogel, R. Cotterell, E. Raff, S. Biderman, “LEACE: Perfect Linear Concept Erasure in Closed Form” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (New Orleans, LA, USA, 2023); https://openreview.net/forum?id=awIpKpwTwF&noteId=Ju4XcafMir.

A. M. Turner, L. Thiergart, D. Udell, G. Leech, U. Mini, M. MacDiarmid, Activation Addition: Steering Language Models Without Optimization, arXiv [cs.CL] (2023); http://arxiv.org/abs/2308.10248.

E. Hernandez, B. Z. Li, J. Andreas, Inspecting and Editing Knowledge Representations in Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2304.00740.

D. Brown, C. Godfrey, C. Nizinski, J. Tu, H. Kvinge, “Robustness of Edited Neural Networks” in ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo 2023) (Kigali, Rwanda, 2023); https://openreview.net/forum?id=JAjH6VANZ4.

* C. Anil, E. Durmus, M. Sharma, J. Benton, S. Kundu, J. Batson, N. Rimsky, M. Tong, J. Mu, D. Ford, F. Mosconi, R. Agrawal, R. Schaeffer, N. Bashkansky, S. Svenningsen, M. Lambert, A. Radhakrishnan, … D. Duvenaud, “Many-Shot Jailbreaking” (Anthropic, 2024); https://www-cdn.anthropic.com/af5633c94ed2beb282f6a53c595eb437e8e7b630/Many_Shot_Jailbreaking__2024_04_02_0936.pdf.

Y. Deng, W. Zhang, S. J. Pan, L. Bing, “Multilingual Jailbreak Challenges in Large Language Models” in 12th International Conference on Learning Representations (2024); https://openreview.net/forum?id=vESNKdEMGp.

Y. Yuan, W. Jiao, W. Wang, J.-T. Huang, P. He, S. Shi, Z. Tu, “GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher” in 12th International Conference on Learning Representations (2024); https://openreview.net/forum?id=MbfAK4s61A.

P. Ding, J. Kuang, D. Ma, X. Cao, Y. Xian, J. Chen, S. Huang, “A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts Can Fool Large Language Models Easily” in North American Chapter of the Association for Computational Linguistics (2023); https://api.semanticscholar.org/CorpusID:265664913.

Z. Wei, Y. Wang, A. Li, Y. Mo, Y. Wang, Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations, arXiv [cs.LG] (2023); http://arxiv.org/abs/2310.06387.

* M. Russinovich, A. Salem, R. Eldan, Great, Now Write an Article about That: The Crescendo Multi-Turn LLM Jailbreak Attack, arXiv [cs.CR] (2024); http://arxiv.org/abs/2404.01833.

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, “Towards Deep Learning Models Resistant to Adversarial Attacks” in The 6th International Conference on Learning Representations (ICLR 2018) (Vancouver, BC, Canada, 2018); https://openreview.net/forum?id=rJzIBfZAb.

S. Friedler, R. Singh, B. Blili-Hamelin, J. Metcalf, B. J. Chen, “AI Red-Teaming Is Not a One-Stop Solution to AI Harms: Recommendations for Using Red-Teaming for AI Accountability” (Data & Society, 2023); https://datasociety.net/library/ai-red-teaming-is-not-a-one-stop-solution-to-ai-harms-recommendations-for-using-red-teaming-for-ai-accountability/.

N. Jain, A. Schwarzschild, Y. Wen, G. Somepalli, J. Kirchenbauer, P.-Y. Chiang, M. Goldblum, A. Saha, J. Geiping, T. Goldstein, Baseline Defenses for Adversarial Attacks Against Aligned Language Models, arXiv [cs.LG] (2023); http://arxiv.org/abs/2309.00614.

S. Lee, M. Kim, L. Cherif, D. Dobre, J. Lee, S. J. Hwang, K. Kawaguchi, G. Gidel, Y. Bengio, N. Malkin, M. Jain, Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning, arXiv [cs.CL] (2024); http://arxiv.org/abs/2405.18540.

A. Peng, J. Michael, H. Sleight, E. Perez, M. Sharma, Rapid Response: Mitigating LLM Jailbreaks with a Few Examples, arXiv [cs.CL] (2024); http://arxiv.org/abs/2411.07494.

Z. Liu, G. Dou, Z. Tan, Y. Tian, M. Jiang, Towards Safer Large Language Models through Machine Unlearning, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.10058.

A. Lynch, P. Guo, A. Ewart, S. Casper, D. Hadfield-Menell, Eight Methods to Evaluate Robust Unlearning in LLMs, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.16835.

D. Gamage, J. Chen, K. Sasahara, “The Emergence of Deepfakes and Its Societal Implications: A Systematic Review” in Conference for Truth and Trust Online 2021 (2021), pp. 28–39; https://www.researchgate.net/publication/355583941_The_Emergence_of_Deepfakes_and_its_Societal_Implications_A_Systematic_Review.

A. Kaushal, A. Mina, A. Meena, T. H. Babu, “The Societal Impact of Deepfakes: Advances in Detection and Mitigation” in 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT) (2023), pp. 1–7; https://doi.org/10.1109/ICCCNT56998.2023.10307353.

F. Romero Moreno, Generative AI and Deepfakes: A Human Rights Approach to Tackling Harmful Content. International Review of Law Computers & Technology 38, 297–326 (2024); https://doi.org/10.1080/13600869.2024.2324540.

R. Tang, Y.-N. Chuang, X. Hu, The Science of Detecting LLM-Generated Text. Communications of the ACM 67, 50–59 (2024); https://doi.org/10.1145/3624725.

K. Krishna, Y. Song, M. Karpinska, J. F. Wieting, M. Iyyer, “Paraphrasing Evades Detectors of AI-Generated Text, but Retrieval Is an Effective Defense” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023); https://openreview.net/pdf?id=WbFhFvjjKj.

L. Lin, N. Gupta, Y. Zhang, H. Ren, C.-H. Liu, F. Ding, X. Wang, X. Li, L. Verdoliva, S. Hu, Detecting Multimedia Generated by Large AI Models: A Survey, arXiv [cs.MM] (2024); http://arxiv.org/abs/2402.00045.

R. Corvi, D. Cozzolino, G. Zingarini, G. Poggi, K. Nagano, L. Verdoliva, “On The Detection of Synthetic Images Generated by Diffusion Models” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023), pp. 1–5; https://doi.org/10.1109/ICASSP49357.2023.10095167.

U. Ojha, Y. Li, Y. J. Lee, “Towards Universal Fake Image Detectors That Generalize Across Generative Models” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE Computer Society, 2023), pp. 24480–24489; https://doi.org/10.1109/CVPR52729.2023.02345.

H. B. Wee, J. D. Reimer, Non-English Academics Face Inequality via AI-Generated Essays and Countermeasure Tools. Bioscience 73, 476–478 (2023); https://doi.org/10.1093/biosci/biad034.

Y. Zhao, T. Pang, C. Du, X. Yang, N.-M. Cheung, M. Lin, A Recipe for Watermarking Diffusion Models, arXiv [cs.CV] (2023); http://arxiv.org/abs/2303.10137.

M. Christ, S. Gunn, O. Zamir, “Undetectable Watermarks for Language Models” in Proceedings of 37th Conference on Learning Theory, S. Agrawal, A. Roth, Eds. (PMLR, 2024) vol. 247 of Proceedings of Machine Learning Research, pp. 1125–1139; https://proceedings.mlr.press/v247/christ24a.html.

J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, T. Goldstein, “A Watermark for Large Language Models” in Proceedings of the 40th International Conference on Machine Learning (PMLR, 2023), pp. 17061–17084; https://proceedings.mlr.press/v202/kirchenbauer23a.html.

Y. Liu, Y. Bu, “Adaptive Text Watermark for Large Language Models” in Forty-First International Conference on Machine Learning (2024); https://openreview.net/forum?id=7emOSb5UfX.

A. Liu, L. Pan, Y. Lu, J. Li, X. Hu, X. Zhang, L. Wen, I. King, H. Xiong, P. S. Yu, A Survey of Text Watermarking in the Era of Large Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2312.07913.

H. Zhang, B. L. Edelman, D. Francati, D. Venturi, G. Ateniese, B. Barak, Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models, arXiv [cs.LG] (2023); http://arxiv.org/abs/2311.04378.

A. Knott, D. Pedreschi, R. Chatila, T. Chakraborti, S. Leavy, R. Baeza-Yates, D. Eyers, A. Trotman, P. D. Teal, P. Biecek, S. Russell, Y. Bengio, Generative AI Models Should Include Detection Mechanisms as a Condition for Public Release. Ethics and Information Technology 25, 55 (2023); https://doi.org/10.1007/s10676-023-09728-4.

C2PA, Overview (2022); https://c2pa.org/.

AI for Good, AI and Multimedia Authenticity Standards Collaboration (2024); https://aiforgood.itu.int/multimedia-authenticity/.

A. Al-Dhaqm, R. A. Ikuesan, V. R. Kebande, S. A. Razak, G. Grispos, K.-K. R. Choo, B. A. S. Al-Rimy, A. A. Alsewari, Digital Forensics Subdomains: The State of the Art and Future Directions. IEEE Access 9, 152476–152502 (2021); https://doi.org/10.1109/ACCESS.2021.3124262.

F. Casino, T. K. Dasaklis, G. P. Spathoulas, M. Anagnostopoulos, A. Ghosal, I. Borocz, A. Solanas, M. Conti, C. Patsakis, Research Trends, Challenges, and Emerging Topics in Digital Forensics: A Review of Reviews. IEEE Access 10, 25464–25493 (2022); https://doi.org/10.1109/ACCESS.2022.3154059.

H. R. Hasan, K. Salah, Combating Deepfake Videos Using Blockchain and Smart Contracts. IEEE Access: Practical Innovations, Open Solutions 7, 41596–41606 (2019); https://doi.org/10.1109/access.2019.2905689.

C. C. Ki Chan, V. Kumar, S. Delaney, M. Gochoo, “Combating Deepfakes: Multi-LSTM and Blockchain as Proof of Authenticity for Digital Media” in 2020 IEEE / ITU International Conference on Artificial Intelligence for Good (AI4G) (IEEE, 2020); https://doi.org/10.1109/ai4g50087.2020.9311067.

P. Fraga-Lamas, T. M. Fernández-Caramés, Fake News, Disinformation, and Deepfakes: Leveraging Distributed Ledger Technologies and Blockchain to Combat Digital Deception and Counterfeit Reality, arXiv [cs.CY] (2019); http://dx.doi.org/10.1109/MITP.2020.2977589.

S. Mohammad Niyaz Khan, J. Mohd Ghazali, L. Q. Zakaria, S. N. Ahmad, K. A. Elias, Various Image Classification Using Certain Exchangeable Image File Format (EXIF) Metadata of Images. Malaysian Journal of Information and Communication Technology (MyJICT), 1–12 (2018); https://doi.org/10.53840/myjict3-1-33.

A. Chan, C. Ezell, M. Kaufmann, K. Wei, L. Hammond, H. Bradley, E. Bluemke, N. Rajkumar, D. Krueger, N. Kolt, L. Heim, M. Anderljung, “Visibility into AI Agents” in The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2024); https://doi.org/10.1145/3630106.3658948.

A. Chan, N. Kolt, P. Wills, U. Anwar, C. S. de Witt, N. Rajkumar, L. Hammond, D. Krueger, L. Heim, M. Anderljung, IDs for AI Systems, arXiv [cs.AI] (2024); http://arxiv.org/abs/2406.12137.

B. Pan, N. Stakhanova, S. Ray, Data Provenance in Security and Privacy. ACM Computing Surveys 55, 1–35 (2023); https://doi.org/10.1145/3593294.

E. Laird, M. Dwyer, “Off Task: EdTech Threats to Student Privacy and Equity in the Age of AI” (Center for Democracy and Technology, 2023); https://cdt.org/insights/report-off-task-edtech-threats-to-student-privacy-and-equity-in-the-age-of-ai/.

S. S. El Mokadem, The Effect of Media Literacy on Misinformation and Deep Fake Video Detection. Arab Media & Society (2023); https://www.arabmediasociety.com/the-effect-of-media-literacy-on-misinformation-and-deep-fake-video-detection/.

Y. Hwang, J. Y. Ryu, S.-H. Jeong, Effects of Disinformation Using Deepfake: The Protective Effect of Media Literacy Education. Cyberpsychology, Behavior and Social Networking 24, 188–193 (2021); https://doi.org/10.1089/cyber.2020.0174.

S. Y. Shin, J. Lee, The Effect of Deepfake Video on News Credibility and Corrective Influence of Cost-Based Knowledge about Deepfakes. Digital Journalism 10, 412–432 (2022); https://doi.org/10.1080/21670811.2022.2026797.

S. Qian, C. Shen, J. Zhang, Fighting Cheapfakes: Using a Digital Media Literacy Intervention to Motivate Reverse Search of out-of-Context Visual Misinformation. Journal of Computer-Mediated Communication: JCMC 28 (2022); https://doi.org/10.1093/jcmc/zmac024.

T. Ali, P. Kostakos, HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs), arXiv [cs.CR] (2023); http://arxiv.org/abs/2309.16021.

G. Pang, C. Shen, L. Cao, A. Van Den Hengel, Deep Learning for Anomaly Detection: A Review. ACM Computing Surveys 54, 38:1–38:38 (2021); https://doi.org/10.1145/3439950.

J. Geng, F. Cai, Y. Wang, H. Koeppl, P. Nakov, I. Gurevych, A Survey of Confidence Estimation and Calibration in Large Language Models, arXiv [cs.CL] (2023); http://arxiv.org/abs/2311.08298.

A. Aldahdooh, W. Hamidouche, S. A. Fezza, O. Déforges, Adversarial Example Detection for DNN Models: A Review and Experimental Comparison. Artificial Intelligence Review 55, 4403–4462 (2022); https://doi.org/10.1007/s10462-021-10125-w.

J. Hayase, W. Kong, R. Somani, S. Oh, “SPECTRE: Defending against Backdoor Attacks Using Robust Statistics” in Proceedings of the 38th International Conference on Machine Learning, M. Meila, T. Zhang, Eds. (PMLR, 2021) vol. 139 of Proceedings of Machine Learning Research, pp. 4129–4139; https://proceedings.mlr.press/v139/hayase21a.html.

A. T. Mallen, N. Belrose, “Eliciting Latent Knowledge from Quirky Language Models” in ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models (2024); https://openreview.net/forum?id=Z1531QeqAQ.

* M. MacDiarmid, T. Maxwell, N. Schiefer, J. Mu, J. Kaplan, D. Duvenaud, S. Bowman, A. Tamkin, E. Perez, M. Sharma, C. Denison, E. Hubinger, Simple Probes Can Catch Sleeper Agents (2024); https://www.anthropic.com/news/probes-catch-sleeper-agents.

S. Han, K. Rao, A. Ettinger, L. Jiang, B. Y. Lin, N. Lambert, Y. Choi, N. Dziri, “WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs” in 38th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2024); https://openreview.net/forum?id=Ich4tv4202.

R. Greenblatt, B. Shlegeris, K. Sachan, F. Roger, AI Control: Improving Safety Despite Intentional Subversion, arXiv [cs.LG] (2023); http://arxiv.org/abs/2312.06942.

M. Phute, A. Helbling, M. D. Hull, S. Peng, S. Szyller, C. Cornelius, D. H. Chau, “LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked” in The Second Tiny Papers Track at ICLR 2024 (Vienna, Austria, 2024); https://openreview.net/forum?id=YoqgcIA19o.

* H. Inan, K. Upasani, J. Chi, R. Rungta, K. Iyer, Y. Mao, M. Tontchev, Q. Hu, B. Fuller, D. Testuggine, M. Khabsa, Llama Guard: LLM-Based Input-Output Safeguard for Human-AI Conversations, arXiv [cs.CL] (2023); http://arxiv.org/abs/2312.06674.

T. Kim, S. Kotha, A. Raghunathan, Jailbreaking Defenses with the Purple Problem, arXiv [cs.CR] (2024); http://arxiv.org/abs/2403.14725.

S. O. Hansson, M.-Å. Belin, B. Lundgren, Self-Driving Vehicles—an Ethical Overview. Philosophy & Technology 34, 1383–1408 (2021); https://doi.org/10.1007/s13347-021-00464-5.

N. R. Jennings, L. Moreau, D. Nicholson, S. Ramchurn, S. Roberts, T. Rodden, A. Rogers, Human-Agent Collectives. Communications of the ACM 57, 80–88 (2014); https://doi.org/10.1145/2629559.

* A. Dafoe, E. Hughes, Y. Bachrach, T. Collins, K. R. McKee, J. Z. Leibo, K. Larson, T. Graepel, Open Problems in Cooperative AI, arXiv [cs.AI] (2020); http://arxiv.org/abs/2012.08630.

A. Dafoe, Y. Bachrach, G. Hadfield, E. Horvitz, K. Larson, T. Graepel, Cooperative AI: Machines Must Learn to Find Common Ground. Nature 593, 33–36 (2021); https://doi.org/10.1038/d41586-021-01170-0.

D. Hadfield-Menell, A. Dragan, P. Abbeel, S. Russell, “Cooperative Inverse Reinforcement Learning” in Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS 2016) (Curran Associates Inc., Red Hook, NY, USA, 2016), pp. 3916–3924; https://papers.nips.cc/paper_files/paper/2016/hash/c3395dd46c34fa7fd8d729d8cf88b7a8-Abstract.html.

I. Seeber, E. Bittner, R. O. Briggs, T. de Vreede, G.-J. de Vreede, A. Elkins, R. Maier, A. B. Merz, S. Oeste-Reiß, N. Randrup, G. Schwabe, M. Söllner, Machines as Teammates: A Research Agenda on AI in Team Collaboration. Information & Management 57, 103174 (2020); https://doi.org/10.1016/j.im.2019.103174.

R. Shah, P. Freire, N. Alex, R. Freedman, D. Krasheninnikov, L. Chan, M. D. Dennis, P. Abbeel, A. Dragan, S. Russell, Benefits of Assistance over Reward Learning (2020); https://openreview.net/forum?id=DFIoGDZejIB.

S. D. Ramchurn, S. Stein, N. R. Jennings, Trustworthy Human-AI Partnerships. iScience 24, 102891 (2021); https://doi.org/10.1016/j.isci.2021.102891.

X. Wu, L. Xiao, Y. Sun, J. Zhang, T. Ma, L. He, A Survey of Human-in-the-Loop for Machine Learning. Future Generations Computer Systems: FGCS 135, 364–381 (2022); https://doi.org/10.1016/j.future.2022.05.014.

K. L. Mosier, L. J. Skitka, Automation Use and Automation Bias. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 43, 344–348 (1999); https://doi.org/10.1177/154193129904300346.

J. Babcock, J. Kramár, R. V. Yampolskiy, “Guidelines for Artificial Intelligence Containment” in Next-Generation Ethics: Engineering a Better Society, A. E. Abbas, Ed. (Cambridge University Press, Cambridge, 2019), pp. 90–112; https://doi.org/10.1017/9781108616188.008.

S. G. Patil, T. Zhang, V. Fang, N. C., R. Huang, A. Hao, M. Casado, J. E. Gonzalez, R. A. Popa, I. Stoica, GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications, arXiv [cs.CL] (2024); http://arxiv.org/abs/2404.06921.

J. Gryz, M. Rojszczak, Black Box Algorithms and the Rights of Individuals: No Easy Solution to the “explainability” Problem. Internet Policy Review 10 (2021); https://policyreview.info/articles/analysis/black-box-algorithms-and-rights-individuals-no-easy-solution-explainability.

J. A. McDermid, Y. Jia, Z. Porter, I. Habli, Artificial Intelligence Explainability: The Technical and Ethical Dimensions. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 379, 20200363 (2021); https://doi.org/10.1098/rsta.2020.0363.

T. Ploug, S. Holm, “Right to Contest AI Diagnostics Defining Transparency and Explainability Requirements from a Patient’s Perspective” in Artificial Intelligence in Medicine (Springer Publishing Company, 2022), pp. 227–238; https://doi.org/10.1007/978-3-030-64573-1_267.

S. H. Tanneru, D. Ley, C. Agarwal, H. Lakkaraju, On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models, arXiv [cs.CL] (2024); http://arxiv.org/abs/2406.10625.

* J. Chua, E. Rees, H. Batra, S. R. Bowman, J. Michael, E. Perez, M. Turpin, Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought, arXiv [cs.CL] (2024); http://arxiv.org/abs/2403.05518.

* A. Radhakrishnan, K. Nguyen, A. Chen, C. Chen, C. Denison, D. Hernandez, E. Durmus, E. Hubinger, J. Kernion, K. Lukošiūtė, N. Cheng, N. Joseph, N. Schiefer, O. Rausch, S. McCandlish, S. El Showk, T. Lanham, … E. Perez, Question Decomposition Improves the Faithfulness of Model-Generated Reasoning, arXiv [cs.CL] (2023); http://arxiv.org/abs/2307.11768.

J. Li, P. Cao, Y. Chen, K. Liu, J. Zhao, Towards Faithful Chain-of-Thought: Large Language Models Are Bridging Reasoners, arXiv [cs.CL] (2024); http://arxiv.org/abs/2405.18915.

D. Paul, R. West, A. Bosselut, B. Faltings, Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning, arXiv [cs.CL] (2024); http://arxiv.org/abs/2402.13950.

A. Saranya, R. Subhashini, A Systematic Review of Explainable Artificial Intelligence Models and Applications: Recent Developments and Future Trends. Decision Analytics Journal 7, 100230 (2023); https://doi.org/10.1016/j.dajour.2023.100230.

H. Zhao, H. Chen, F. Yang, N. Liu, H. Deng, H. Cai, S. Wang, D. Yin, M. Du, Explainability for Large Language Models: A Survey. ACM Transactions on Intelligent Systems and Technology 15, 1–38 (2024); https://doi.org/10.1145/3639372.

S. Casper, C. Ezell, C. Siegmann, N. Kolt, T. L. Curtis, B. Bucknall, A. Haupt, K. Wei, J. Scheurer, M. Hobbhahn, L. Sharkey, S. Krishna, M. Von Hagen, S. Alberti, A. Chan, Q. Sun, M. Gerovitch, … D. Hadfield-Menell, “Black-Box Access Is Insufficient for Rigorous AI Audits” in The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, New York, NY, USA, 2024), pp. 2254–2272; https://doi.org/10.1145/3630106.3659037.

O. Aarne, T. Fist, C. Withers, “Secure, Governable Chips: Using On-Chip Mechanisms to Manage National Security Risks from AI & Advanced Computing” (Center for a New American Security, 2024); https://s3.us-east-1.amazonaws.com/files.cnas.org/documents/CNAS-Report-Tech-Secure-Chips-Jan-24-finalb.pdf.

G. Kulp, D. Gonzales, E. Smith, L. Heim, P. Puri, M. Vermeer, Z. Winkelman, “Hardware-Enabled Governance Mechanisms” (RAND Corporation, 2024); https://www.rand.org/pubs/working_papers/WRA3056-1.html.

Z. Ghodsi, T. Gu, S. Garg, SafetyNets: Verifiable Execution of Deep Neural Networks on an Untrusted Cloud. Advances in Neural Information Processing Systems 30 (2017); https://proceedings.neurips.cc/paper_files/paper/2017/file/6048ff4e8cb07aa60b6777b6f7384d52-Paper.pdf.

H. Chen, C. Fu, B. D. Rouhani, J. Zhao, F. Koushanfar, “DeepAttest: An End-to-End Attestation Framework for Deep Neural Networks” in Proceedings of the 46th International Symposium on Computer Architecture (ISCA ’19) (Association for Computing Machinery, New York, NY, USA, 2019), pp. 487–498; https://doi.org/10.1145/3307650.3322251.

H. Jia, M. Yaghini, C. A. Choquette-Choo, N. Dullerud, A. Thudi, V. Chandrasekaran, N. Papernot, “Proof-of-Learning: Definitions and Practice” in 2021 IEEE Symposium on Security and Privacy (SP) (IEEE, 2021), pp. 1039–1056; https://doi.org/10.1109/SP40001.2021.00106.

S. Goldwasser, G. N. Rothblum, J. Shafer, A. Yehudayoff, “Interactive Proofs for Verifying Machine Learning” in 12th Innovations in Theoretical Computer Science Conference (ITCS 2021), J. R. Lee, Ed. (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2021) vol. 185 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 41:1–41:19; https://doi.org/10.4230/LIPIcs.ITCS.2021.41.

* Apple, “Apple Platform Security” (Apple, 2024); https://help.apple.com/pdf/security/en_US/apple-platform-security-guide.pdf.

* J. Zhu, H. Yin, P. Deng, A. Almeida, S. Zhou, Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study, arXiv [cs.DC] (2024); http://arxiv.org/abs/2409.03992.

R. Anderson, S. Fuloria, “Who Controls the off Switch?” in 2010 First IEEE International Conference on Smart Grid Communications (IEEE, 2010), pp. 96–101; https://doi.org/10.1109/smartgrid.2010.5622026.

Organisation for Economic Co-Operation and Development, “Emerging Privacy-Enhancing Technologies” (OECD, 2023); https://doi.org/10.1787/bf121be4-en.

N. Subramani, S. Luccioni, J. Dodge, M. Mitchell, “Detecting Personal Information in Training Corpora: An Analysis” in Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), A. Ovalle, K.-W. Chang, N. Mehrabi, Y. Pruksachatkun, A. Galystan, J. Dhamala, A. Verma, T. Cao, A. Kumar, R. Gupta, Eds. (Association for Computational Linguistics, Toronto, Canada, 2023), pp. 208–220; https://doi.org/10.18653/v1/2023.trustnlp-1.18.

Y. Elazar, A. Bhagia, I. H. Magnusson, A. Ravichander, D. Schwenk, A. Suhr, E. P. Walsh, D. Groeneveld, L. Soldaini, S. Singh, H. Hajishirzi, N. A. Smith, J. Dodge, “What’s In My Big Data?” in 12th International Conference on Learning Representations (2024); https://openreview.net/forum?id=RvfPnOkPV4.

A. Narayanan, V. Shmatikov, “Robust De-Anonymization of Large Sparse Datasets” in 2008 IEEE Symposium on Security and Privacy (sp 2008) (2008), pp. 111–125; https://doi.org/10.1109/SP.2008.33.

H. Brown, K. Lee, F. Mireshghallah, R. Shokri, F. Tramèr, “What Does It Mean for a Language Model to Preserve Privacy?” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) (Association for Computing Machinery, New York, NY, USA, 2022), pp. 2280–2292; https://doi.org/10.1145/3531146.3534642.

* S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, G. Mann, BloombergGPT: A Large Language Model for Finance, arXiv [cs.LG] (2023); http://arxiv.org/abs/2303.17564.

G. Penedo, Q. Malartic, D. Hesslow, R. Cojocaru, H. Alobeidli, A. Cappelli, B. Pannier, E. Almazrouei, J. Launay, “The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only” in 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Datasets and Benchmarks Track (New Orleans, LA, USA, 2023); https://openreview.net/pdf?id=kM5eGcdCzq.

T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, K. Crawford, Datasheets for Datasets. Communications of the ACM 64, 86–92 (2021); https://doi.org/10.1145/3458723.

A. Ghorbani, J. Zou, “Data Shapley: Equitable Valuation Of Data for Machine Learning” in Proceedings of the 36th International Conference on Machine Learning (ICML 2019), K. Chaudhuri, R. Salakhutdinov, Eds. (PMLR, New Orleans, LA, USA, 2019) vol. 97 of Proceedings of Machine Learning Research, pp. 2242–2251; https://proceedings.mlr.press/v97/ghorbani19c.html.

T. Li, E. F. Villaronga, P. Kieseberg, Humans Forget, Machines Remember: Artificial Intelligence and the Right to Be Forgotten. Computer Law & Security Review 34, 304 (2018); https://scholarship.law.bu.edu/faculty_scholarship/817.

Z. Zhang, M. Jia, H.-P. Lee, B. Yao, S. Das, A. Lerner, D. Wang, T. Li, “It’s a Fair Game”, or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents, arXiv [cs.HC] (2023); http://dx.doi.org/10.1145/3613904.3642385.

Z. Zhang, C. Shen, B. Yao, D. Wang, T. Li, Secret Use of Large Language Model (LLM), arXiv [cs.HC] (2024); http://arxiv.org/abs/2409.19450.

C. Dwork, F. McSherry, K. Nissim, A. Smith, “Calibrating Noise to Sensitivity in Private Data Analysis” in Theory of Cryptography, S. Halevi, T. Rabin, Eds. (Springer, Berlin, Heidelberg, 2006) vol. 3876 of Lecture Notes in Computer Science; https://doi.org/10.1007/11681878_14.

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, L. Zhang, “Deep Learning with Differential Privacy” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16) (Association for Computing Machinery, New York, NY, USA, 2016), pp. 308–318; https://doi.org/10.1145/2976749.2978318.

* S. De, L. Berrada, J. Hayes, S. L. Smith, B. Balle, “Unlocking High-Accuracy Differentially Private Image Classification through Scale” (Google Deepmind, 2022); http://arxiv.org/abs/2204.13650.

X. Li, F. Tramer, P. Liang, T. Hashimoto, “Large Language Models Can Be Strong Differentially Private Learners” in International Conference on Learning Representations 2022 (Virtual, 2022); https://openreview.net/forum?id=bVuP3ltATMz.

D. Yu, S. Naik, A. Backurs, S. Gopi, H. A. Inan, G. Kamath, J. Kulkarni, Y. T. Lee, A. Manoel, L. Wutschitz, S. Yekhanin, H. Zhang, “Differentially Private Fine-Tuning of Language Models” in International Conference on Learning Representations (2022); https://openreview.net/forum?id=Q42f0dfjECO.

* A. Kurakin, N. Ponomareva, U. Syed, L. MacDermed, A. Terzis, Harnessing Large-Language Models to Generate Private Synthetic Text, arXiv [cs.LG] (2023); http://arxiv.org/abs/2306.01684.

R. Liu, J. Wei, F. Liu, C. Si, Y. Zhang, J. Rao, S. Zheng, D. Peng, D. Yang, D. Zhou, A. M. Dai, “Best Practices and Lessons Learned on Synthetic Data” in First Conference on Language Modeling (2024); https://openreview.net/forum?id=OJaWBhh61C.

A. Yale, S. Dash, R. Dutta, I. Guyon, A. Pavao, K. P. Bennett, “Assessing Privacy and Quality of Synthetic Health Data” in Proceedings of the Conference on Artificial Intelligence for Data Discovery and Reuse (ACM, New York, NY, USA, 2019); https://doi.org/10.1145/3359115.3359124.

X. Tang, R. Shin, H. A. Inan, A. Manoel, F. Mireshghallah, Z. Lin, S. Gopi, J. Kulkarni, R. Sim, “Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation” in 12th International Conference on Learning Representations (2024); https://openreview.net/forum?id=oZtt0pRnOl.

F. Mireshghallah, Y. Su, T. Hashimoto, J. Eisner, R. Shin, “Privacy-Preserving Domain Adaptation of Semantic Parsers” in ACL (1) (2023), pp. 4950–4970; https://doi.org/10.18653/v1/2023.acl-long.271.

J. Mattern, Z. Jin, B. Weggenmann, B. Schölkopf, M. Sachan, “Differentially Private Language Models for Secure Data Sharing” in EMNLP (2022), pp. 4860–4873; https://aclanthology.org/2022.emnlp-main.323.

T. Stadler, B. Oprisanu, C. Troncoso, “Synthetic Data – Anonymisation Groundhog Day” in 31st USENIX Security Symposium (USENIX Security 22) (USENIX Association, Boston, MA, USA, 2022), pp. 1451–1468; https://www.usenix.org/conference/usenixsecurity22/presentation/stadler.

M. Meeus, F. Guepin, A.-M. Creţu, Y.-A. de Montjoye, “Achilles’ Heels: Vulnerable Record Identification in Synthetic Data Publishing” in 28th European Symposium on Research in Computer Security (ESORICS 2023), G. Tsudik, M. Conti, K. Liang, G. Smaragdakis, Eds. (Springer Nature Switzerland, The Hague, The Netherlands, 2024), pp. 380–399; https://doi.org/10.1007/978-3-031-51476-0_19.

G. Ganev, E. De Cristofaro, On the Inadequacy of Similarity-Based Privacy Metrics: Reconstruction Attacks against “Truly Anonymous Synthetic Data”, arXiv [cs.CR] (2023); http://arxiv.org/abs/2312.05114.

R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, J. Wernsing, “CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy” in Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan, K. Q. Weinberger, Eds. (PMLR, New York, New York, USA, 2016) vol. 48 of Proceedings of Machine Learning Research, pp. 201–210; https://proceedings.mlr.press/v48/gilad-bachrach16.html.

D. Kang, T. Hashimoto, I. Stoica, Y. Sun, “Scaling up Trustless DNN Inference with Zero-Knowledge Proofs” in NeurIPS 2023 Workshop on Regulatable ML (New Orleans, LA, US, 2023); https://openreview.net/forum?id=GjNRF5VTfn.

B. Knott, S. Venkataraman, A. Hannun, S. Sengupta, M. Ibrahim, “CrypTen: Secure Multi-Party Computation Meets Machine Learning” in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2021) vol. 34, pp. 4961–4973; https://papers.neurips.cc/paper/2021/hash/2754518221cfbc8d25c13a06a4cb8421-Abstract.html.

P. Mohassel, Y. Zhang, “SecureML: A System for Scalable Privacy-Preserving Machine Learning” in 2017 IEEE Symposium on Security and Privacy (SP) (IEEE Computer Society, San Jose, CA, USA, 2017), pp. 19–38; https://doi.org/10.1109/SP.2017.12.

O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, K. Vaswani, M. Costa, “Oblivious Multi-Party Machine Learning on Trusted Processors” in Proceedings of the 25th USENIX Conference on Security Symposium (SEC’16) (USENIX Association, Austin, TX, 2016), pp. 619–636; https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/ohrimenko.

F. Tramer, D. Boneh, “Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware” in International Conference on Learning Representations (2019); https://openreview.net/forum?id=rJVorjCcKQ.

* J. Zhu, H. Yin, P. Deng, A. Almeida, S. Zhou, Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study, arXiv [cs.DC] (2024); http://arxiv.org/abs/2409.03992.

T. South, J. Drean, A. Singh, G. Zyskind, R. Mahari, V. Sharma, P. Vepakomma, L. Kagal, S. Devadas, A. Pentland, “A Roadmap for End-to-End Privacy and Security in Generative AI” (MIT, 2024); https://doi.org/10.21428/e4baedd9.9af67664.

A. Cavoukian, Privacy by Design: The 7 Foundational Principles. (2009); https://privacy.ucsc.edu/resources/privacy-by-design---foundational-principles.pdf.

M. ElBaih, The Role of Privacy Regulations in AI Development (A Discussion of the Ways in Which Privacy Regulations Can Shape the Development of AI) (2023); https://doi.org/10.2139/ssrn.4589207.

E. Rader, R. Wash, B. Brooks, “Stories as Informal Lessons about Security” in Proceedings of the Eighth Symposium on Usable Privacy and Security (ACM, New York, NY, USA, 2012); https://doi.org/10.1145/2335356.2335364.

* J. Lamb, Generative AI in Healthcare: Adoption Trends and What’s next (2024); https://www.mckinsey.com/industries/healthcare/our-insights/generative-ai-in-healthcare-adoption-trends-and-whats-next.

G. Dhanuskodi, S. Guha, V. Krishnan, A. Manjunatha, M. O’Connor, R. Nertney, P. Rogers, Creating the First Confidential GPUs: The Team at NVIDIA Brings Confidentiality and Integrity to User Code and Data for Accelerated Computing. ACM Queue 21, 68–93 (2023); https://doi.org/10.1145/3623393.3623391.

X. Zhou, H. Kim, F. Brahman, L. Jiang, H. Zhu, X. Lu, F. Xu, B. Y. Lin, Y. Choi, N. Mireshghallah, R. L. Bras, M. Sap, HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions, arXiv [cs.AI] (2024); http://arxiv.org/abs/2409.16427.

K. Tirumala, A. H. Markosyan, L. Zettlemoyer, A. Aghajanyan, “Memorization without Overfitting: Analyzing the Training Dynamics of Large Language Models” in 36th International Conference on Neural Information Processing Systems (NeurIPS 2022) (Curran Associates Inc., Red Hook, NY, USA, 2024); https://proceedings.neurips.cc/paper_files/paper/2022/file/fa0509f4dab6807e2cb465715bf2d249-Paper-Conference.pdf.

N. Mireshghallah, H. Kim, X. Zhou, Y. Tsvetkov, M. Sap, R. Shokri, Y. Choi, “Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory” in ICLR (2024); https://openreview.net/forum?id=gmg7t8b4s0.

M. Brundage, S. Avin, J. Wang, H. Belfield, G. Krueger, G. Hadfield, H. Khlaaf, J. Yang, H. Toner, R. Fong, T. Maharaj, P. W. Koh, S. Hooker, J. Leung, A. Trask, E. Bluemke, J. Lebensold, … M. Anderljung, Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims, arXiv [cs.CY] (2020); http://arxiv.org/abs/2004.07213.
