Abstract
While deep learning-based models have achieved remarkable progress in vulnerability detection, our understanding of these models remains limited, which hinders further advances in model capability, mechanistic understanding of the detection process, and efficient, safe practical deployment. This paper presents a comprehensive investigation of state-of-the-art learning-based models, including sequence-based models, graph-based models, and Large Language Models (LLMs), through extensive experiments on MegaVul, a recently constructed large-scale vulnerability dataset. We systematically explore seven research questions across five critical dimensions: model capability, model interpretation, model robustness, ease of model deployment, and model economy. Our experimental findings reveal the superiority of sequence-based models over graph-based models and demonstrate the limited effectiveness of current LLMs (e.g., ChatGPT and CodeLlama) for vulnerability detection. We identify the vulnerability types that different learning-based models excel at detecting and reveal the instability of these models under subtle, semantically equivalent changes to the input. Through interpretability analysis, we provide empirical insights into what these models actually learn and focus on during detection. Additionally, we systematically summarize the pre-processing requirements and deployment considerations necessary for practical model usage. Finally, our study provides essential guidelines for the economical and safe practical application of learning-based models, offering valuable insights for both researchers and practitioners.
Data Availability
We release our reproduction package, including the datasets and source code, at https://github.com/vinci-grape/Learning-based-Models-for-VD to help other researchers and practitioners replicate our work and verify their own studies.
Funding
This work was supported by the Zhejiang Pioneer (Jianbing) Project (2025C01198(SD2)), the National Natural Science Foundation of China (Grant No. 62202419), the Fundamental Research Funds for the Central Universities (No. 226-2022-00064), the Zhejiang Provincial Natural Science Foundation of China (No. LY24F020008), the Ningbo Natural Science Foundation (No. 2022J184), the Key Research and Development Program of Zhejiang Province (No. 2021C01105), and the State Street Zhejiang University Technology Center.
Author information
Authors and Affiliations
Contributions
Chao Ni is the corresponding author. Xin Yin and Liyu Shen co-designed the experiment and wrote the paper. Shaohua Wang participated in the idea proposal stage of the paper.
Corresponding author
Ethics declarations
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee.
Informed Consent
Written informed consent was obtained from all authors for the publication of this paper.
Conflicts of Interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Clinical Trial Number
Not applicable.
Additional information
Communicated by: Fabio Palomba.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ni, C., Yin, X., Shen, L. et al. Learning-based models for vulnerability detection: an extensive study. Empir Software Eng 31, 18 (2026). https://doi.org/10.1007/s10664-025-10734-x

