Measuring AI Agent Autonomy: Towards a Scalable Approach with Code Inspection

Authors

  • Peter Cihon Senior Advisor, U.S. Center for AI Standards and Innovation (CAISI); GitHub
  • Merlin Stein University of Oxford
  • Gagan Bansal Microsoft
  • Sam Manning Centre for the Governance of AI, Oxford, UK
  • Kevin Xu GitHub, San Francisco, CA USA

DOI:

https://doi.org/10.70777/si.v2i3.15295

Keywords:

AutoGen Framework, AI agents, AI autonomy, artificial general intelligence

Abstract

AI agents are AI systems that can achieve complex goals autonomously. Assessing the level of agent autonomy is crucial for understanding both their potential benefits and their risks. Current assessments of autonomy often focus on specific risks and rely on run-time evaluations: observations of agent actions during operation. We introduce a code-based assessment of autonomy that eliminates the need to run an AI agent to perform specific tasks, thereby reducing the costs and risks associated with run-time evaluations. Using this code-based framework, the orchestration code used to run an AI agent can be scored according to a taxonomy that assesses two attributes of autonomy: impact and oversight. We demonstrate this approach with the AutoGen framework and select applications.
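To make the idea of scoring orchestration code without running it concrete, here is a minimal sketch. It parses agent setup code with Python's `ast` module and reads two constructor arguments from AutoGen's 0.2-era agent API: `human_input_mode` (used here as a proxy for oversight) and `code_execution_config` (used here as a proxy for impact). The helpers `inspect_orchestration` and `score`, the `OVERSIGHT_BY_INPUT_MODE` map, and the 0 to 2 scale are hypothetical illustrations. They are not the authors' taxonomy or scoring rules. The sketch reads only explicit keyword arguments and ignores constructor defaults, which a real inspector would need to resolve.

```python
import ast

# Hypothetical scoring rules for illustration only; NOT the paper's taxonomy.
# Higher oversight = more human checkpoints configured in the loop.
OVERSIGHT_BY_INPUT_MODE = {"NEVER": 0, "TERMINATE": 1, "ALWAYS": 2}

_UNKNOWN = object()  # marks arguments whose values are not static literals


def _literal(node):
    """Return the literal value of an AST node, or _UNKNOWN if not a literal."""
    try:
        return ast.literal_eval(node)
    except (ValueError, TypeError, SyntaxError):
        return _UNKNOWN


def inspect_orchestration(source: str) -> list:
    """Find calls that set AutoGen-style agent options, without executing code.

    Only explicit keyword arguments are read; constructor defaults are ignored.
    """
    agents = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # kw.arg is None for **kwargs unpacking, so those entries are skipped.
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        if "human_input_mode" not in kwargs and "code_execution_config" not in kwargs:
            continue
        mode = _literal(kwargs["human_input_mode"]) if "human_input_mode" in kwargs else None
        exec_cfg = (_literal(kwargs["code_execution_config"])
                    if "code_execution_config" in kwargs else False)
        agents.append({
            "human_input_mode": mode if isinstance(mode, str) else None,
            # Anything other than a literal False/None (including a config
            # that is not a static literal) is conservatively treated as enabled.
            "executes_code": exec_cfg is not False and exec_cfg is not None,
        })
    return agents


def score(agents: list) -> dict:
    """Aggregate per-agent findings into hypothetical impact/oversight scores."""
    impact = 1 if any(a["executes_code"] for a in agents) else 0
    modes = [OVERSIGHT_BY_INPUT_MODE[a["human_input_mode"]]
             for a in agents if a["human_input_mode"] in OVERSIGHT_BY_INPUT_MODE]
    # Oversight is taken as the strongest human checkpoint found anywhere.
    return {"impact": impact, "oversight": max(modes) if modes else None}


# Example orchestration source. It is only parsed, never run, so AutoGen
# does not need to be installed and llm_config may remain undefined.
EXAMPLE = '''
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)
'''

print(score(inspect_orchestration(EXAMPLE)))  # {'impact': 1, 'oversight': 0}
```

Because the example never asks a human for input and enables local code execution, the sketch flags it as higher impact with no configured oversight. The design choice worth noting is that inspection happens entirely on source text, which is what lets this style of assessment avoid the cost and risk of run-time evaluation.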

Author Biography

Peter Cihon, Senior Advisor, U.S. Center for AI Standards and Innovation (CAISI); GitHub

Senior Advisor, U.S. Center for AI Standards and Innovation (CAISI), National Institute of Standards and Technology (NIST) · Full-time · Nov 2024 to present · San Francisco Bay Area
    • CAISI is industry's point of contact within the U.S. Government (USG) for evaluations of the national security risks posed by frontier AI models. https://www.nist.gov/caisi


Eight-component definition and metric of agent autonomy


Published

2025-07-25

How to Cite

Cihon, P., Stein, M., Bansal, G., Manning, S., & Xu, K. (2025). Measuring AI Agent Autonomy: Towards a Scalable Approach with Code Inspection. SuperIntelligence - Robotics - Safety & Alignment, 2(3). https://doi.org/10.70777/si.v2i3.15295
