Abstract
While deep learning-based models have achieved remarkable progress in vulnerability detection, our understanding of these models remains limited, which hinders further advances in model capability, mechanistic understanding of the detection process, and efficient, safe practical deployment. This paper presents a comprehensive investigation of state-of-the-art learning-based models, including sequence-based models, graph-based models, and Large Language Models (LLMs), through extensive experiments on MegaVul, a recently constructed large-scale vulnerability dataset. We systematically explore seven research questions across five critical dimensions: model capability, model interpretation, model robustness, ease of model deployment, and model economy. Our experimental findings reveal the superiority of sequence-based models over graph-based models and demonstrate the limited effectiveness of current LLMs (e.g., ChatGPT and CodeLlama) for vulnerability detection. We identify the vulnerability types that different learning-based models excel at detecting and reveal the instability of these models under subtle, semantically equivalent changes to the input. Through interpretability analysis, we provide empirical insights into what these models actually learn and focus on during detection. Additionally, we systematically summarize the pre-processing requirements and deployment considerations necessary for practical model usage. Finally, our study provides essential guidelines for the economical and safe practical application of learning-based models, offering valuable insights for both researchers and practitioners.
Data Availability
We release our reproduction package, including the datasets and source code, at https://github.com/vinci-grape/Learning-based-Models-for-VD to help other researchers and practitioners replicate our work and verify their own studies.
Funding
This work was supported by the Zhejiang Pioneer (Jianbing) Project (2025C01198(SD2)), the National Natural Science Foundation of China (Grant No. 62202419), the Fundamental Research Funds for the Central Universities (No. 226-2022-00064), the Zhejiang Provincial Natural Science Foundation of China (No. LY24F020008), the Ningbo Natural Science Foundation (No. 2022J184), the Key Research and Development Program of Zhejiang Province (No. 2021C01105), and the State Street Zhejiang University Technology Center.
Author information
Authors and Affiliations
Contributions
Chao Ni is the corresponding author. Xin Yin and Liyu Shen co-designed the experiment and wrote the paper. Shaohua Wang participated in the idea proposal stage of the paper.
Corresponding author
Ethics declarations
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee.
Informed Consent
Written informed consent was obtained from all authors for the publication of this paper.
Conflicts of Interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Clinical Trial Number
Not applicable.
Additional information
Communicated by: Fabio Palomba.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ni, C., Yin, X., Shen, L. et al. Learning-based models for vulnerability detection: an extensive study. Empir Software Eng 31, 18 (2026). https://doi.org/10.1007/s10664-025-10734-x

