Learning-based models for vulnerability detection: an extensive study

Empirical Software Engineering

Abstract

While deep learning-based models have achieved remarkable progress in vulnerability detection, our understanding of these models remains limited, which hinders further advances in model capability, mechanistic understanding of the detection process, and efficient and safe practical deployment. This paper presents a comprehensive investigation of state-of-the-art learning-based models, including sequence-based models, graph-based models, and Large Language Models (LLMs), through extensive experiments on MegaVul, a recently constructed large-scale vulnerability dataset. We systematically explore seven research questions across five critical dimensions: model capability, model interpretation, model robustness, ease of model deployment, and model economy. Our experimental findings reveal the superiority of sequence-based models over graph-based models and demonstrate the limited effectiveness of current LLMs (e.g., ChatGPT and CodeLlama) for vulnerability detection. We identify the specific vulnerability types that different learning-based models excel at detecting, and we reveal the models' instability under subtle, semantically equivalent changes to the input. Through interpretability analysis, we provide empirical insights into what these models actually learn and focus on during detection. Additionally, we systematically summarize the pre-processing requirements and deployment considerations necessary for practical model usage. Finally, our study provides essential guidelines for the economical and safe practical application of learning-based models, offering valuable insights for both researchers and practitioners.
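To make the robustness dimension concrete, the following is a minimal sketch of what a "semantically equivalent change" to the input can look like: whole-word identifier renaming, which leaves program behavior untouched. It is an illustration under stated assumptions, not the procedure used in the study. The helper names (`rename_identifiers`, `prediction_flips`), the toy detector, and the C snippet are all hypothetical, and the regex-based rewrite is a crude stand-in for a parser-based transformation (it would also rename matches inside string literals and comments).

```python
import re
from typing import Callable, Dict


def rename_identifiers(code: str, mapping: Dict[str, str]) -> str:
    """Rename whole-word identifiers according to `mapping`.

    Hypothetical helper: a regex approximation of a semantics-preserving
    rename. A real pipeline would rewrite identifiers via a parser.
    """
    if not mapping:
        return code
    # \b ensures we only match whole identifiers, e.g. "src" but not "strcpy".
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in mapping) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)


def prediction_flips(
    detector: Callable[[str], int], code: str, mapping: Dict[str, str]
) -> bool:
    """Return True if the detector's label changes after the rename."""
    return detector(code) != detector(rename_identifiers(code, mapping))


# Illustrative C function (not taken from MegaVul).
ORIGINAL = "void copy(char *dst, const char *src) { strcpy(dst, src); }"
VARIANT = rename_identifiers(ORIGINAL, {"dst": "out_buf", "src": "in_buf"})


def toy_detector(code: str) -> int:
    """Deliberately brittle stand-in for a model: keys on a surface token."""
    return int("dst" in code)


if __name__ == "__main__":
    print(VARIANT)
    print(prediction_flips(toy_detector, ORIGINAL, {"dst": "out_buf", "src": "in_buf"}))
```

A detector that has learned the underlying flaw (the unbounded `strcpy`) should give the same label to both versions. The toy detector flips because it depends on a variable name, which is the kind of instability the abstract refers to.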

Data Availability

We release our reproduction package, including the datasets and the source code, at https://github.com/vinci-grape/Learning-based-Models-for-VD so that other researchers and practitioners can replicate our work and verify their own studies.

Notes

  1. https://surveysystem.com/sscalc.htm

  2. https://cwe.mitre.org/top25/archive/2023/2023_top25_list.html

Funding

This work was supported by the Zhejiang Pioneer (Jianbing) Project (2025C01198(SD2)), the National Natural Science Foundation of China (Grant No. 62202419), the Fundamental Research Funds for the Central Universities (No. 226-2022-00064), the Zhejiang Provincial Natural Science Foundation of China (No. LY24F020008), the Ningbo Natural Science Foundation (No. 2022J184), the Key Research and Development Program of Zhejiang Province (No. 2021C01105), and the State Street Zhejiang University Technology Center.

Author information

Contributions

Chao Ni is the corresponding author. Xin Yin and Liyu Shen co-designed the experiment and wrote the paper. Shaohua Wang participated in the idea proposal stage of the paper.

Corresponding author

Correspondence to Chao Ni.

Ethics declarations

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee.

Informed Consent

Written informed consent was obtained from all authors for the publication of this paper.

Conflicts of Interest

Beyond this, the authors have no conflicts of interest to declare that are relevant to the content of this article.

Clinical Trial Number

Not applicable.

Additional information

Communicated by: Fabio Palomba.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ni, C., Yin, X., Shen, L. et al. Learning-based models for vulnerability detection: an extensive study. Empir Software Eng 31, 18 (2026). https://doi.org/10.1007/s10664-025-10734-x

  • DOI: https://doi.org/10.1007/s10664-025-10734-x
