Superviz25-SQL: High-Quality Dataset to Empower Unsupervised SQL Injection Detection Systems

The paper

Title: Superviz25-SQL: High-Quality Dataset to Empower Unsupervised SQL Injection Detection Systems
Authors: Grégor Quetel, Eric Alata, Pierre-François Gimenez, Thomas Robert, Laurent Pautet
Venue: Assessment with New methodologies, Unified Benchmarks, and environments, of Intrusion detection and response Systems (ANUBIS) at ESORICS 2025
Abstract: The digitalization of public and private services has led to more sophisticated and serious cybersecurity threats. Among them, SQL injection attacks leverage user inputs to remotely execute malicious actions on a database, such as data exfiltration and deletion, or privilege escalation. They are regularly classified as one of the most prominent threats to web services. Intrusion detection systems are widely used to detect such injection attacks and react to them, but it is difficult to assess their actual effectiveness and compare them because of a lack of high-quality datasets. Current SQL injection detection datasets lack diversity and are poorly documented, and their generated samples are not representative of real-world infrastructures. This article presents a new dataset, Superviz25-SQL, whose design is structured around four quality dimensions: realism, diversity, benchmarking capabilities, and the presence of good documentation. We examine the dataset's diversity using lexical, syntactic, and semantic metrics, and demonstrate that its size is sufficient to evaluate data-intensive detectors. Finally, we provide nine classical and state-of-the-art SQL injection detection pipelines as baselines for future work.
Pre-print: https://hal.science/hal-05314211
The dataset: https://zenodo.org/records/17086037
