HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions
DOI: https://doi.org/10.70777/si.v3i2.18033
Keywords: Zero-Day, Code Analysis, Patched Function, Deep Representation Learning, GraphCodeBERT, Vulnerability Prediction, Software Security
Abstract
Software security testing, particularly when enhanced with deep learning models, has become a powerful approach for improving software quality, enabling faster detection of known flaws in source code. However, many approaches miss latent vulnerabilities that persist after a patch, typically because of incomplete fixes or overlooked issues, and that may later lead to zero-day exploits. In this paper, we propose HYDRA, a Hybrid heuristic-guided Deep Representation Architecture for predicting latent zero-day vulnerabilities in patched functions. HYDRA combines rule-based heuristics with deep representation learning to detect risky code patterns that may persist after patches. It integrates static vulnerability rules, GraphCodeBERT embeddings, and a Variational Autoencoder (VAE) to uncover anomalies often missed by symbolic or neural models alone. We evaluate HYDRA in an unsupervised setting on patched functions from three diverse real-world software projects: Chrome, Android, and ImageMagick. Our results show that HYDRA predicts 13.7%, 20.6%, and 24% of functions from Chrome, Android, and ImageMagick, respectively, as containing latent risks; these include both functions that match a heuristic and functions with no heuristic match (None) that may lead to zero-day vulnerabilities. It outperforms baseline models that rely solely on regex-derived features or on their combination with embeddings, uncovering truly risky code variants that largely align with known heuristic patterns. These results demonstrate HYDRA’s capability to surface hidden, previously undetected risks, advancing software security validation and supporting proactive discovery of zero-day vulnerabilities.
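To make the distinction between heuristic matches and the "None" category concrete, the following is a minimal sketch of how regex-derived heuristic features might be computed for a patched C/C++ function. The rule names and patterns are hypothetical illustrations, not the rules used by HYDRA, and the helper functions `heuristic_features` and `heuristic_label` are assumptions introduced only for this example.

```python
import re

# Hypothetical rule set: these names and patterns are illustrative
# assumptions, not the static vulnerability rules used by HYDRA.
HEURISTIC_RULES = {
    "unsafe_string_copy": re.compile(r"\b(?:strcpy|strcat|sprintf|gets)\s*\("),
    "heap_allocation": re.compile(r"\b(?:malloc|calloc|realloc)\s*\("),
    "raw_memory_copy": re.compile(r"\b(?:memcpy|memmove)\s*\("),
}


def heuristic_features(function_source: str) -> dict:
    """Count how many times each rule matches in one function's source."""
    return {
        name: len(rule.findall(function_source))
        for name, rule in HEURISTIC_RULES.items()
    }


def heuristic_label(function_source: str) -> str:
    """Return the names of the rules that fired, or "None" if no rule fired."""
    counts = heuristic_features(function_source)
    matched = sorted(name for name, count in counts.items() if count > 0)
    return ",".join(matched) if matched else "None"


# A made-up patched function: the length check was added by the "patch",
# but the raw memory copy is still present.
patched = """
void copy_name(char *dst, const char *src, size_t n) {
    if (n == 0) return;
    memcpy(dst, src, n - 1);
    dst[n - 1] = 0;
}
"""

print(heuristic_features(patched))
print(heuristic_label(patched))
```

In a pipeline of the kind the abstract describes, count vectors like these would presumably be combined with learned code embeddings before anomaly scoring. A function labeled "None" here could then only be flagged through the learned representation.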
License
Copyright (c) 2026 Mohammad Farhad, Sabbir Rahman, Shuvalaxmi Dass

This work is licensed under a Creative Commons Attribution 4.0 International License.