Measuring AI Agent Autonomy: Towards a Scalable Approach with Code Inspection
DOI:
https://doi.org/10.70777/si.v2i3.15295
Keywords:
AutoGen Framework, AI agents, AI autonomy, artificial general intelligence
Abstract
AI agents are AI systems that can achieve complex goals autonomously. Assessing the level of agent autonomy is crucial for understanding both their potential benefits and risks. Current assessments of autonomy often focus on specific risks and rely on run-time evaluations, that is, observations of agent actions during operation. We introduce a code-based assessment of autonomy that eliminates the need to run an AI agent to perform specific tasks, thereby reducing the costs and risks associated with run-time evaluations. Using this code-based framework, the orchestration code used to run an AI agent can be scored according to a taxonomy that assesses two attributes of autonomy: impact and oversight. We demonstrate this approach with the AutoGen framework and select applications.
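To illustrate what inspecting orchestration code without running it might look like, here is a minimal Python sketch. It parses a snippet of AutoGen-style source with the standard `ast` module and reads the `human_input_mode` keyword passed to agent constructors. `human_input_mode` is an AutoGen agent setting; the label strings, the `OVERSIGHT_BY_MODE` mapping, and the function names are assumptions made for this example and are not the scoring taxonomy described in the abstract.

```python
import ast

# Hypothetical mapping from AutoGen's `human_input_mode` values to an
# illustrative oversight label. These labels are an assumption for this
# sketch, not the taxonomy from the paper.
OVERSIGHT_BY_MODE = {
    "ALWAYS": "human input requested on every turn",
    "TERMINATE": "human input requested only at termination",
    "NEVER": "no human input requested",
}


def find_human_input_modes(source: str) -> list[tuple[str, str]]:
    """Return (callee name, mode) for each call that sets
    `human_input_mode` to a string literal. The source is only parsed,
    never executed."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            is_literal = isinstance(kw.value, ast.Constant) and isinstance(
                kw.value.value, str
            )
            if kw.arg == "human_input_mode" and is_literal:
                func = node.func
                if isinstance(func, ast.Attribute):
                    name = func.attr
                else:
                    name = getattr(func, "id", "<unknown>")
                found.append((name, kw.value.value))
    return found


def label_oversight(source: str) -> list[tuple[str, str, str]]:
    """Attach the illustrative oversight label to each detected setting."""
    return [
        (name, mode, OVERSIGHT_BY_MODE.get(mode, "unrecognized mode"))
        for name, mode in find_human_input_modes(source)
    ]


# Example orchestration snippet, held as text so AutoGen need not be installed.
EXAMPLE = '''
import autogen
assistant = autogen.AssistantAgent(name="assistant")
user = autogen.UserProxyAgent(name="user", human_input_mode="NEVER")
'''

if __name__ == "__main__":
    for name, mode, label in label_oversight(EXAMPLE):
        print(f"{name}: human_input_mode={mode} -> {label}")
```

Because the snippet is parsed rather than executed, no agent runs and no task is performed. A fuller assessment would also need to handle defaults, non-literal arguments, and the impact attribute, which this sketch does not attempt.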
License
Copyright (c) 2025 Peter Cihon

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.