HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions

Mohammad Farhad; Sabbir Rahman; Shuvalaxmi Dass

doi:10.70777/si.v3i2.18033

Authors

Mohammad Farhad, University of Louisiana at Lafayette
Sabbir Rahman, University of Louisiana at Lafayette
Shuvalaxmi Dass, University of Louisiana at Lafayette

Keywords:

Zero-Day, Code Analysis, Patched Function, Deep Representation Learning, GraphCodeBERT, Vulnerability Prediction, Software Security

Abstract

Software security testing, particularly when enhanced with deep learning models, has become a powerful approach for improving software quality, enabling faster detection of known flaws in source code. However, many approaches miss latent vulnerabilities that remain after a patch is applied, typically because of incomplete fixes or overlooked issues, and that may later lead to zero-day exploits. In this paper, we propose HYDRA, a Hybrid heuristic-guided Deep Representation Architecture for predicting latent zero-day vulnerabilities in patched functions. HYDRA combines rule-based heuristics with deep representation learning to detect risky code patterns that may persist after patches. It integrates static vulnerability rules, GraphCodeBERT embeddings, and a Variational Autoencoder (VAE) to uncover anomalies often missed by symbolic or neural models alone. We evaluate HYDRA in an unsupervised setting on patched functions from three diverse real-world software projects: Chrome, Android, and ImageMagick. Our results show that HYDRA predicts 13.7%, 20.6%, and 24% of functions from Chrome, Android, and ImageMagick, respectively, as containing latent risks, including both heuristic matches and cases without heuristic matches (None) that may lead to zero-day vulnerabilities. It outperforms baseline models that rely solely on regex-derived features or their combination with embeddings, uncovering truly risky code variants that largely align with known heuristic patterns. These results demonstrate HYDRA’s capability to surface hidden, previously undetected risks, advancing software security validation and supporting proactive zero-day vulnerability discovery.
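
The hybrid idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the regex rules are hypothetical examples, `toy_embedding` is a hashed token count standing in for a GraphCodeBERT embedding, and `anomaly_scores` uses distance from the corpus mean as a simple stand-in for VAE reconstruction error. Only the overall shape (rule flags concatenated with a learned representation, then unsupervised anomaly ranking) follows the abstract.

```python
import math
import re

# Hypothetical heuristic rules for illustration; the paper's rule set is not listed here.
HEURISTICS = {
    "unsafe_copy": re.compile(r"\b(strcpy|strcat|sprintf|gets)\s*\("),
    "raw_memory": re.compile(r"\b(memcpy|memmove|malloc|free)\s*\("),
    "unchecked_index": re.compile(r"\w+\s*\[\s*\w+\s*\]"),
}


def heuristic_features(code):
    """Return one 0/1 flag per rule, in the fixed order of HEURISTICS."""
    return [1.0 if rx.search(code) else 0.0 for rx in HEURISTICS.values()]


def toy_embedding(code, dim=8):
    """Stand-in for a GraphCodeBERT embedding: hashed identifier counts, L2-normalised."""
    vec = [0.0] * dim
    for tok in re.findall(r"[A-Za-z_]\w*", code):
        # Deterministic bucket (Python's built-in hash() is salted per process).
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def hybrid_vector(code):
    """Concatenate symbolic rule flags with the dense representation."""
    return heuristic_features(code) + toy_embedding(code)


def anomaly_scores(vectors):
    """Stand-in for VAE reconstruction error: Euclidean distance from the corpus mean."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return [math.sqrt(sum((x - m) ** 2 for x, m in zip(v, mean))) for v in vectors]


def flag_latent_risks(functions, top_fraction=0.25):
    """Rank functions by anomaly score and report the top fraction with matched rules."""
    names = list(functions)
    scores = anomaly_scores([hybrid_vector(functions[n]) for n in names])
    k = max(1, round(top_fraction * len(names)))
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    report = []
    for name, score in ranked[:k]:
        matched = [r for r, rx in HEURISTICS.items() if rx.search(functions[name])]
        # "None" marks a flagged function with no heuristic match.
        report.append((name, score, matched or ["None"]))
    return report


if __name__ == "__main__":
    patched = {
        "copy_name": "void f(char *d, char *s) { strcpy(d, s); }",
        "sum_two": "int g(int a, int b) { return a + b; }",
        "read_at": "int h(int *p, int i) { return p[i]; }",
        "noop": "void k(void) { }",
    }
    for name, score, rules in flag_latent_risks(patched):
        print(name, round(score, 3), rules)
```

In a fuller version under the same assumptions, `toy_embedding` would be replaced by a pretrained code model and `anomaly_scores` by a trained VAE; the `top_fraction` threshold here is an arbitrary illustrative choice.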

Author Biography

Mohammad Farhad, University of Louisiana at Lafayette

Passionate about leveraging technology to drive innovation, I worked as a software engineer intern, where I built a strong foundation in creating efficient and scalable solutions. I have sharpened my problem-solving skills and thrive in dynamic environments. I was also a Jr. Lecturer in the Computer Science & Engineering department at the University of Science and Technology Chittagong (USTC), a renowned private university in Chattogram, Bangladesh. I am currently a PhD student in Computer Science at the University of Louisiana at Lafayette (ULL), working as a Graduate Teaching Assistant (GTA). I am particularly enthusiastic about cutting-edge technology and its potential to revolutionize industries. Let's connect and explore the possibilities of building the future together.

