
International Journal of Scientific Research in Science and Technology

Available online at : www.ijsrst.com

Print ISSN: 2395-6011 | Online ISSN: 2395-602X doi : https://doi.org/10.32628/IJSRST24114323

Detection of SQL Injection Attack Using Machine Learning Techniques
Bhanu Pratap Singh1, Prof. Manish Kumar Singhal2
1M.Tech Scholar, NRI Institute of Information Science and Technology, Bhopal, Madhya Pradesh, India
2Associate Professor & H.O.D, Department of Information Technology (IT), NRI Institute of Information Science and Technology, Bhopal, Madhya Pradesh, India

ARTICLE INFO
Article History: Accepted: 27 Nov 2024; Published: 27 Dec 2024
Publication Issue: Volume 11, Issue 6, November-December 2024
Page Number: 780-790

ABSTRACT
SQL injection attacks (SQLIAs) remain a prevalent threat to web applications, exploiting vulnerabilities in database interactions to compromise data security. Detecting such attacks effectively is crucial for ensuring robust application security. This study investigates the use of machine learning techniques to identify SQLIAs by analyzing patterns and features in SQL queries. A dataset comprising both legitimate and malicious SQL queries is utilized to train and evaluate various machine learning models, including decision trees, support vector machines, and neural networks. The proposed approach achieves high accuracy in distinguishing between benign and malicious queries, showcasing the potential of machine learning for proactive SQLIA detection. The findings highlight the importance of feature selection, algorithm choice, and real-time detection capabilities in mitigating the risk of SQL injection attacks. This research provides a foundation for developing intelligent, automated systems to enhance the security of database-driven applications.
Keywords: SQL Injection, Cross-Site Scripting, Denial of Service Attack, Naïve Bayes, Gradient Boosting, etc.

Copyright © 2024 The Author(s): This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY-NC 4.0)
Bhanu Pratap Singh et al Int J Sci Res Sci & Technol. November-December-2024, 11 (6) : 780-790

I. INTRODUCTION

The proliferation of web applications and databases has made data security a critical concern. One of the most prevalent and dangerous threats to database systems is the SQL Injection (SQLi) attack. These attacks exploit vulnerabilities in applications to manipulate database queries, potentially exposing sensitive information or compromising system integrity. Traditional defensive mechanisms, such as input validation and parameterized queries, often fall short in detecting and mitigating sophisticated SQLi attempts.

With the advent of advanced technologies, machine learning (ML) has emerged as a promising solution for enhancing cyber security. Machine learning algorithms can analyze vast amounts of data, identify patterns, and detect anomalous behaviors indicative of SQLi attacks. Unlike traditional methods, ML-based approaches adapt to evolving attack strategies, offering a proactive and dynamic defense mechanism. This study explores the application of machine learning techniques in detecting SQL Injection attacks, emphasizing their accuracy, efficiency, and adaptability. By leveraging supervised and unsupervised learning models, the proposed approach aims to strengthen database security and minimize the risks associated with SQLi threats.

The growing sophistication of SQL injection techniques, such as blind, error-based, and time-based injections, has made it increasingly difficult for conventional security measures to keep pace. Attackers continuously refine their methods to evade detection, exploiting even minor vulnerabilities in web applications. This has prompted a shift towards more intelligent and adaptive defense strategies, such as machine learning, which can learn from existing data and detect previously unseen attack vectors.

Machine learning models, especially classification algorithms like decision trees, support vector machines (SVM), and deep learning networks, can be trained on large datasets containing both normal and malicious queries. These models can then classify new, unseen SQL queries based on their features, such as syntax patterns, database operations, and user input characteristics. By training on labeled attack data, ML models can learn to distinguish between legitimate user inputs and potentially malicious ones, providing a dynamic defense against evolving attack techniques.

Furthermore, machine learning techniques offer the advantage of automation, enabling real-time detection and response to SQLi attempts. This reduces the burden on human analysts, allowing for faster mitigation of threats. The ability to continuously improve the accuracy of detection by feeding new attack data into the models ensures that the defense system remains robust against emerging SQL injection variants.

Fig-1 Securing web applications against SQLi attacks using a novel deep learning approach

II. LITERATURE SURVEY

Laila Aburashed et al. (2024) - This work observes that SQL Injection is one of the most common vulnerabilities exploited for both privacy breaches and financial damage. It remains the top vulnerability on the most recent OWASP Top 10 list, with the number of such attacks on the rise. The SQL injection detection challenge is addressed using machine learning algorithms: a classification method labels communications as either SQL Injection or plain text. The work proposes a machine learning framework to assess the feasibility of using a classifier to detect SQL Injection attacks. Classification algorithms such as Random Forest, Gradient Boosting, SVM, and ANN are utilized. ANN demonstrated superior performance and required less time to detect SQL Injection attacks [01].

Hakan Can Altunay et al. (2023) - This work notes that the SQL injection attack is one of the cyber attack types that puts individuals and institutions in a difficult situation in terms of data disclosure and material damage. This attack type, which is frequently preferred due to its ease of use, has emerged with different usage features in recent years. In this study,


various machine learning algorithms were tested to detect SQL Injection attacks. In the data pre-processing section, feature extraction was performed using Natural Language Processing techniques. While the relevance of expressions to each other was calculated with the word-level TF-IDF method, term search was also performed [02].

Maha Alghawazi et al. (2022) - This work notes that SQL injection attacks usually occur when attackers modify, delete, read, and copy data from database servers, and that they are among the most damaging of web application attacks. A successful SQL injection attack can affect all aspects of security, including confidentiality, integrity, and data availability. SQL (Structured Query Language) is used to represent queries to database management systems. Detection and deterrence of SQL injection attacks, for which techniques from different areas can be applied to improve the detectability of the attack, is not a new area of research, but it is still relevant. Artificial intelligence and machine learning techniques have been tested and used to control SQL injection attacks, showing promising results. The main contribution of the paper is to cover relevant work related to different machine learning and deep learning models used to detect SQL injection attacks. With this systematic review, the authors aim to keep researchers up to date and contribute to the understanding of the intersection between SQL injection attacks and the artificial intelligence field [03].

Ravi Raj Choudhary et al. (2021) - This work considers a web component (a web-based component, or web application) that is accessible to the general public over the Internet and is therefore vulnerable to attack by an adversary. It is not uncommon for web and mobile applications to have a lackadaisical flaw that adversely affects their security and privacy. Database vulnerability attacks are becoming more common and harmful. It is critical to understand software defects and, more importantly, to prevent these security issues. SQL injection and XSS scan the same security code, often employed in online applications developed in a weak and stable typed language. The authors therefore looked for points where an attacker may use a native notion or existing ways to access or hack data from the database, and examined what kind of issue arises if an attacker tries to make a database vulnerable by injecting SQL that affects the results [04].

Binh An Pham et al. (2020) - This work notes that SQL injection attacks (SQLi attacks) have proven their danger on several website types such as social media and e-shopping. In order to prevent such attacks from occurring, the research investigates efficient ways of detection and prevention, so as to preserve each cyberuser's right of privacy. It is aimed at investigating different ways to protect websites from SQL injection attacks. Machine learning algorithms were used to detect such SQLi attacks. Machine Learning (ML) algorithms are algorithms that can learn from the data provided and infer interesting results from the dataset. SQL code and user input were used as the data, and ML algorithms were used to detect malicious code [05].

Tareek Pattewar et al. (2019) - This work notes that the SQL injection attack is a very serious problem for web applications, and that finding an efficient solution to this problem is essential. Researchers have developed many techniques to detect and prevent this vulnerability, but there is no single solution that can prevent all types of SQL injection attacks. SQL Injection attacks remain one of the top concerns for cyber security researchers. Signature-based SQL Injection detection methods are no longer reliable, as attackers are using new types of SQL Injections each time. There is a need for SQL Injection detection mechanisms that are capable of identifying new, never-before-seen attacks. Applying machine learning to the field of cyber security is being considered by many researchers. Two machine learning classification algorithms are implemented on the problem: the Naïve Bayes classifier and the Gradient Boosting classifier. The Naïve Bayes classifier


machine learning model provides results with an accuracy of 92.8%. Ensemble learning methods are said to provide results with better accuracy, as they combine multiple simple classifiers to reduce error and improve accuracy [06].

III. PROPOSED METHOD

The goal of this project is to build a model that can detect SQL Injection (SQLi) attacks in SQL queries. SQLi attacks exploit vulnerabilities in database query processing, leading to unauthorized access and data breaches. This model will classify SQL queries as either malicious (SQLi) or benign (safe), helping to prevent security vulnerabilities in web applications.

A. Dataset Description
The dataset used for this project consists of SQL queries labeled as:
• Malicious (SQLi): Labeled as 1, representing a harmful SQL Injection attempt.
• Benign (Safe): Labeled as 0, representing a normal, safe SQL query.
Each row in the dataset represents a single SQL query, and the associated label indicates whether the query is benign or malicious. The data is likely to be text-based, and preprocessing is necessary to convert the raw SQL queries into a usable form for machine learning.

B. Data Preprocessing
• Text Cleaning: The raw SQL queries may contain noise such as special characters, extra whitespace, or null values. This will be cleaned to ensure uniformity and to eliminate unwanted elements that might interfere with model performance.
• Text Normalization: All text will be converted to lowercase to maintain consistency across all queries, as SQL queries may have different case conventions.
• Tokenization: The cleaned text will be split into smaller units, such as words or tokens, to better understand the structure of the query.
• Vectorization: The text data will be converted into numerical form. Techniques such as TF-IDF (Term Frequency-Inverse Document Frequency) will be used to represent the importance of each word or token in the context of the entire dataset. This allows the machine learning models to process text effectively.
Once the data is cleaned and vectorized, it will be divided into training, validation, and testing sets to evaluate the models' performance.

C. Feature Engineering
• TF-IDF Representation: The primary feature engineering technique involves transforming the SQL queries into numerical vectors using the TF-IDF method. This method assigns a weight to each word based on its frequency in the document and its rarity across the entire dataset.
• N-grams: In addition to individual words, sequences of words (n-grams) will also be considered as features. This helps capture contextual information in the queries, which may be crucial for identifying attack patterns in SQLi.
• Embeddings (Optional): If necessary, pre-trained word embeddings like Word2Vec or GloVe could be used to capture semantic meaning between words, improving model performance by considering word relationships.

D. Model Development
Random Forest:
Random Forest is an ensemble learning technique that builds multiple decision trees and aggregates their results. It works by learning patterns from the data, such as identifying which features (e.g., words or n-grams) contribute to the classification of a query as benign or malicious. This model is robust, less prone to overfitting, and can handle complex relationships in data.
How it Works:
Random Forest is an ensemble learning technique that combines multiple decision trees to make predictions. Each decision tree is trained on a subset


of the data and features, which introduces randomness to improve the model's generalization ability.
• Training: Random Forest creates multiple decision trees by using bootstrap sampling, where each tree is trained on a random subset of the training data. At each node of the tree, the algorithm selects a random subset of features to split the data, ensuring that the trees are diverse. This randomness helps reduce overfitting.
• Prediction: Once all the trees are trained, each tree makes a prediction. The final prediction is determined by taking a majority vote from all the trees. This aggregation improves the model's robustness and makes it less sensitive to fluctuations or noise in the data.
Strengths of Random Forest:
• Robustness: Random Forest reduces overfitting by aggregating predictions from multiple decision trees, leading to more stable and accurate results.
• Handles High-dimensional Data: It works well with data containing many features (e.g., a large vocabulary in SQL queries) and can effectively learn complex patterns and relationships.
• Feature Importance: Random Forest can rank features based on their importance in predicting the target variable, helping us identify which words or tokens are most indicative of SQLi attacks.

Naive Bayes:
Naive Bayes is a probabilistic classifier based on Bayes' theorem. It works well for text classification tasks, especially when features (in this case, words or tokens in a query) are conditionally independent. Despite its simplicity, Naive Bayes is often effective for detecting patterns in textual data. In this project, it will help in distinguishing between benign and malicious queries based on word frequencies and probabilities.
How it Works:
Naive Bayes calculates the probability of a class (benign or malicious) given the features (words or tokens in a SQL query). The model assumes that each feature is conditionally independent given the class label.
• Bayes' Theorem: The probability of a class C given the features X is calculated using Bayes' theorem:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

where
$P(C \mid X)$ is the probability of the class given the features,
$P(X \mid C)$ is the likelihood of the features given the class,
$P(C)$ is the prior probability of the class,
$P(X)$ is the probability of the features.
Training: During training, Naive Bayes calculates the likelihood of each word or token appearing in benign and malicious queries. The model then uses these probabilities to classify new queries.
Classification: For a new SQL query, Naive Bayes computes the probability of the query belonging to each class (benign or malicious) based on the probabilities of the individual words and selects the class with the highest probability.
Strengths of Naive Bayes:
Simplicity and Speed: Naive Bayes is computationally efficient, making it well-suited for large datasets where quick predictions are required.
Effective for Text Classification: The model performs well in text classification tasks, where the goal is to classify a document (in this case, an SQL query) based on word frequencies or patterns.

Convolutional Neural Networks (CNN):
CNNs, a type of deep learning model, are powerful at detecting local patterns in data. For text classification, CNNs can identify important sequences of words or n-grams in SQL queries. This is especially useful for detecting the structure of SQLi attacks, which often involve specific patterns or keywords in SQL queries. CNNs can automatically learn these features and make predictions based on learned patterns in the data.
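To make the idea of a filter responding to a local pattern concrete, the following minimal Python sketch slides one hand-written filter over a tokenized query, applies a ReLU, and then takes a global maximum (the pooling step). It is an illustration only and not the model used in this work: the tokenizer, the filter weights, the bias, and all function names are assumptions chosen for the example, whereas in a trained CNN the filter weights are learned from data.

```python
import re

def tokenize(query):
    """Lowercase the query and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", query.lower())

# Hypothetical hand-set filter of width 4: one weight table per position.
# This is equivalent to a 1D convolution kernel over one-hot encoded tokens.
FILTER = [{"or": 1.0}, {"1": 1.0}, {"=": 1.0}, {"1": 1.0}]
BIAS = -3.0  # assumed bias: the filter fires only when all four positions match

def conv1d(tokens, kernel, bias):
    """Slide the kernel over the tokens and return one ReLU activation per window."""
    width = len(kernel)
    activations = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        score = sum(kernel[k].get(tok, 0.0) for k, tok in enumerate(window)) + bias
        activations.append(max(0.0, score))  # ReLU
    return activations

def global_max_pool(activations):
    """Keep only the strongest response; 0.0 if the query is shorter than the filter."""
    return max(activations, default=0.0)

def pattern_feature(query):
    """Single pooled feature produced by the hand-set filter for one query."""
    return global_max_pool(conv1d(tokenize(query), FILTER, BIAS))

malicious = "SELECT * FROM users WHERE name = 'a' OR 1 = 1 --"
benign = "SELECT name FROM users WHERE id = 1"
print(pattern_feature(malicious), pattern_feature(benign))
```

With these assumed weights the pooled feature is 1.0 for the query containing "OR 1 = 1" and 0.0 for the benign query. A real model would learn many such filters and pass their pooled outputs to the fully connected layers.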


How it Works:
Convolutional Neural Networks (CNNs) are deep learning models originally designed for image classification but have proven effective in text classification tasks, such as detecting SQL Injection attacks.
Convolutional Layers: The CNN applies filters (also called kernels) to input sequences of words or characters. These filters slide over the text data to detect local patterns, such as specific phrases or n-grams indicative of SQLi attacks (e.g., "OR 1=1" or "DROP TABLE"). These patterns may represent SQL commands commonly used in injections.
Pooling Layer: After convolution, a pooling layer reduces the size of the data by taking the maximum (or average) value from a set of features, which helps in capturing the most important patterns while reducing computational complexity.
Fully Connected Layers: Once the features have been extracted by the convolutional and pooling layers, the data is passed through fully connected layers that perform the final classification, determining whether the SQL query is benign or malicious.
Training: CNNs are trained using backpropagation, where the weights of the filters are adjusted based on the errors made in predictions. This allows the model to learn which sequences of words are most indicative of SQLi attacks.
Strengths of CNN:
Automatic Feature Extraction: CNNs automatically learn relevant features from raw text data, eliminating the need for manual feature engineering. They are particularly good at detecting specific patterns in sequences of words.
Pattern Detection: CNNs excel at recognizing complex, local patterns in data. This is important for SQLi detection, where malicious queries often contain specific sequences of keywords.
Scalability: CNNs can handle large amounts of data and can improve their performance as more labeled data is provided, making them suitable for large-scale applications.

Fig-2 Simple CNN model

Each model has unique strengths:
Random Forest: An ensemble method that builds multiple decision trees to capture complex relationships in the data. It is robust and handles high-dimensional data well. It also provides insights into feature importance, which is valuable for understanding what makes a query malicious.
Naive Bayes: A probabilistic model that is fast and effective for text classification tasks. It works by calculating the probability of a query being benign or malicious based on the frequencies of words in the query. It is particularly efficient for large datasets.
Convolutional Neural Networks (CNN): A deep learning model that automatically learns patterns from raw data. CNNs are effective at detecting local, sequential patterns in SQL queries, making them powerful for detecting sophisticated SQLi attacks.

Hyperparameter Tuning and Optimization
Once the models are trained, hyperparameter tuning will be performed using methods such as Grid Search or Random Search to optimize the performance of the models. The goal is to find the best combination of hyperparameters (such as the number of trees in Random Forest or the smoothing parameter in Naive Bayes) that maximizes model accuracy.
Cross-validation will be used to evaluate the models and ensure that they generalize well to unseen data, minimizing overfitting.
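The tuning procedure can be illustrated with a small, self-contained Python sketch that combines a multinomial Naive Bayes classifier with a grid search over its smoothing value, scored by k-fold cross-validation. This is a sketch under stated assumptions, not the implementation used in this work: the toy queries in `DATA`, the candidate values in the grid, the fold assignment, and every function name are hypothetical, and a real experiment would use the labeled dataset described in Section III.

```python
import math
import re
from collections import Counter

def tokenize(query):
    """Lowercase the query and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", query.lower())

def train_nb(samples, alpha):
    """Count tokens per class. samples is a list of (query, label), label in {0, 1}."""
    token_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for query, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokenize(query))
    vocab = set(token_counts[0]) | set(token_counts[1])
    return {"tokens": token_counts, "classes": class_counts,
            "vocab_size": len(vocab), "alpha": alpha}

def predict_nb(model, query):
    """Return the class with the highest log posterior (independence assumption)."""
    total = sum(model["classes"].values())
    scores = {}
    for label in (0, 1):
        # Smoothed log prior, so a class absent from a fold never yields log(0).
        score = math.log((model["classes"][label] + 1) / (total + 2))
        n_tokens = sum(model["tokens"][label].values())
        # The extra +1 reserves probability mass for tokens never seen in training.
        denom = n_tokens + model["alpha"] * (model["vocab_size"] + 1)
        for tok in tokenize(query):
            score += math.log((model["tokens"][label][tok] + model["alpha"]) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def cross_val_accuracy(samples, alpha, k):
    """Mean accuracy over k folds; sample i is assigned to fold i % k."""
    accuracies = []
    for fold in range(k):
        train = [s for i, s in enumerate(samples) if i % k != fold]
        test = [s for i, s in enumerate(samples) if i % k == fold]
        model = train_nb(train, alpha)
        correct = sum(predict_nb(model, q) == y for q, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

def grid_search(samples, alphas, k):
    """Evaluate every alpha and return (best_alpha, best_mean_accuracy)."""
    results = [(cross_val_accuracy(samples, a, k), a) for a in alphas]
    best_acc, best_alpha = max(results)
    return best_alpha, best_acc

# Toy data for illustration only (label 1 = malicious, 0 = benign).
DATA = [
    ("select name from users where id = 5", 0),
    ("select * from orders where total > 100", 0),
    ("update users set name = 'bob' where id = 2", 0),
    ("select id from products where price < 20", 0),
    ("' or 1 = 1 --", 1),
    ("admin' --", 1),
    ("1 ; drop table users --", 1),
    ("' union select password from users --", 1),
]
ALPHAS = [0.1, 0.5, 1.0]
best_alpha, best_acc = grid_search(DATA, ALPHAS, k=4)
print(best_alpha, best_acc)
```

Scoring each candidate on held-out folds rather than on the training data is what ties the grid search to the cross-validation step: a value that only fits the training queries well does not score higher.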


IV. SIMULATION RESULT

A. Results and Discussion
The performance of three models (Random Forest, Naive Bayes, and Convolutional Neural Networks (CNN)) was evaluated on the task of SQL Injection (SQLi) detection using several key metrics: Precision, Recall, F1-Score, and Accuracy. The results of the evaluation are summarized in Table 1 below:

Table-1 Model Performance Comparison
Metric      Random Forest   Naive Bayes   CNN
F1-Score    0.845124        0.956710      0.956710
Precision   1.000000        1.000000      1.000000
Recall      0.731788        0.917012      0.917012
Accuracy    0.903571        0.976190      0.976190

Fig-3 Model Performance Comparison: Random Forest, Naive Bayes, and CNN

B. Metric Descriptions and Analysis
F1-Score:
The F1-Score balances the trade-off between precision and recall, offering a single measure of performance.
Random Forest achieved an F1-Score of 0.845124, which is lower than that of Naive Bayes and CNN (both scoring 0.956710). This indicates that Random Forest struggles to maintain a balance between precision and recall.
Precision:
Precision measures the proportion of true positives among all predicted positives.
All three models achieved perfect precision (1.000000), showing that they were able to avoid false positives entirely. This is crucial in minimizing the misclassification of benign queries as malicious.
Recall:
Recall measures the proportion of true positives identified among all actual positives.
Random Forest had the lowest recall at 0.731788, indicating it missed a significant number of malicious queries. In contrast, both Naive Bayes and CNN achieved a recall of 0.917012, demonstrating their ability to capture a larger proportion of malicious queries.
Accuracy:
Accuracy reflects the overall correctness of the model in classifying both benign and malicious queries.
Naive Bayes and CNN achieved the highest accuracy of 97.6%, significantly outperforming Random Forest, which achieved 90.4%. This highlights the effectiveness of Naive Bayes and CNN in providing reliable predictions.

C. Result
From the results, it is evident that Naive Bayes and CNN outperform Random Forest in all metrics except precision, where all models performed equally well. While Random Forest demonstrated acceptable performance with an accuracy of 90.4%, it was less effective in identifying all malicious queries, as shown by its lower recall and F1-Score.
Naive Bayes and CNN emerged as the best-performing models with:
• A high F1-Score of 0.956710, indicating a strong balance between precision and recall.
• A high recall of 0.917012, showing their ability to detect malicious queries effectively.
• The highest accuracy of 97.6%, highlighting their overall reliability in classifying queries.
Between Naive Bayes and CNN, the choice depends on the application's requirements. CNN is better suited for tasks requiring deep learning's ability to capture complex patterns, while Naive Bayes offers simplicity, speed, and comparable performance, making it ideal for resource-constrained environments.
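The metric definitions in Section B can be made concrete with a short worked example. The sketch below computes the four metrics from raw confusion-matrix counts; the counts are the ones reported for the confusion matrix in Section D (599 true negatives, 20 false positives, 0 false negatives, 221 true positives), while the function name and the zero-division handling are assumptions of this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Counts as reported for the confusion matrix in Section D.
metrics = classification_metrics(tp=221, fp=20, fn=0, tn=599)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With these counts the F1-score (0.956710) and accuracy (0.976190) agree with the Naive Bayes and CNN columns of Table 1. Precision evaluates to 221/241, about 0.917012, and recall to 1.000000, so the precision and recall values appear exchanged relative to Table 1; readers comparing the table with the confusion matrix may want to keep this in mind.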


Fig-4 Training and Validation Accuracy

Fig-5 Training and Validation Loss

Training and Validation Accuracy
This plot depicts the accuracy of the CNN model on the training and validation datasets over 10 epochs of training.
Training Accuracy (Blue Line):
The training accuracy improves consistently over the 10 epochs and reaches a high value of approximately 0.98 by the final epoch.
Validation Accuracy (Orange Line):
The validation accuracy follows a similar trend, reaching a comparably high value (~0.97), but starts to plateau or fluctuate slightly after 5-6 epochs.
Training and Validation Loss
This plot tracks the loss (error) for the training and validation datasets over 10 epochs of training. Loss measures how well (or poorly) the model's predictions align with the actual target values, with lower values indicating better performance.

Fig-6 Confusion Matrix Explanation

D. Confusion Matrix Explanation
The confusion matrix is a powerful tool to evaluate the performance of a classification model. It provides detailed insights into how well the model classifies both the positive and negative classes. Below is a breakdown of the confusion matrix for the given results:
Confusion Matrix Overview:

                   Predicted Negative   Predicted Positive
Actual Negative    599 (TN)             20 (FP)
Actual Positive    0 (FN)               221 (TP)

Detailed Analysis of the Confusion Matrix Components:
True Negatives (Top Left - 599):
• These are cases where the model correctly identified negative instances.
• Out of all negative instances in the dataset, 599 were correctly predicted as negative.
• This high count of true negatives indicates the model's ability to effectively distinguish benign SQL queries.
False Positives (Top Right - 20):
• These are cases where the model incorrectly classified a negative instance as positive.
• 20 benign queries were misclassified as malicious.
• While this number is low relative to the total dataset, minimizing false positives is critical in real-world applications to avoid unnecessary flags or interruptions in benign operations.


False Negatives (Bottom Left - 0): 1. HTML (Hypertext Markup Language):


 These are cases where the model failed to o Used to define the structure of the form.
identify a positive instance. o Includes fields for user input and buttons to
 0 malicious queries were misclassified as benign. submit the query.
 The absence of false negatives is a significant 2. CSS (Cascading Style Sheets):
strength of the model, as it ensures that no o Used to style the form, making it visually
malicious queries are missed, which is crucial in appealing and enhancing user experience.
security systems. 3. Bootstrap:
True Positives (Bottom Right - 221): o A responsive design framework that ensures
 These are cases where the model correctly the form looks good across devices.
identified positive instances.
• 221 malicious queries were accurately classified as malicious.
• A high count of true positives demonstrates the model's capability to accurately detect attacks.

E. Performance Interpretation:
• Exceptional Detection Rate: The model produces no false negatives (FN = 0), so every malicious SQL query in the evaluation data was detected. This is critical for security systems, where missing even a single malicious query could lead to a breach or compromise.
• Low False Positives: The model produces only 20 false positives, about 3.2% of the 619 benign queries. Although this is a small fraction, reducing it further would improve system efficiency and prevent benign queries from being flagged unnecessarily.
• High True Positives and True Negatives: The model correctly classified 599 negative (benign) queries and 221 positive (malicious) queries, showing that it is reliable both in identifying attacks and in recognizing safe queries.

F. Explanation of Frontend and Backend Implementation for SQL Query Validation Form
Frontend Design
The frontend of the SQL query validation system is built using HTML, CSS, and Bootstrap. These technologies are used to create a user-friendly and responsive interface where users can input SQL queries for validation.
• Bootstrap provides prebuilt components (e.g., buttons, alerts) and responsive grid layouts.

G. Implementation Overview
Frontend Implementation:
• Create an HTML form with a text field for users to input their SQL query and a submit button.
• Use Bootstrap classes to style the form (e.g., form-control for input fields and btn-primary for buttons).
• Include alerts to show validation results (e.g., a Bootstrap alert to warn about SQL injection).
Flask Backend:
• Define routes to handle the form submission (/validate endpoint).
Process the input query by:
• Checking for common SQL injection patterns (or, --, ;, etc.).
• Displaying appropriate warnings if malicious input is detected.
Render the results on the same page or redirect to a results page.

Fig 7: SQL Injection Detected

Steps in the Application Workflow
User Input:
• The user inputs a query, such as a' or 1 = 1; --,1.
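The pattern check listed under the Flask backend steps can be illustrated with a minimal Python sketch. This is not the authors' code: the names `RISKY_PATTERNS`, `find_risky_tokens`, and `validate`, as well as the exact regular expressions and messages, are assumptions made for this example. The source only states that the backend looks for `or`, `--`, and `;`. In the described application, a function like this would be called from the handler of the /validate route.

```python
import re
from typing import List

# Hypothetical pattern table. The labels follow the tokens named in the text
# (or, --, ;); the regular expressions themselves are assumptions.
RISKY_PATTERNS = {
    "or": re.compile(r"\bor\b", re.IGNORECASE),  # tautologies such as: or 1 = 1
    "--": re.compile(r"--"),                     # SQL line-comment marker
    ";": re.compile(r";"),                       # statement separator
}


def find_risky_tokens(query: str) -> List[str]:
    """Return the labels of all risky patterns found in the query."""
    return [label for label, pattern in RISKY_PATTERNS.items()
            if pattern.search(query)]


def validate(query: str) -> str:
    """Build the message a results page could show for the query."""
    hits = find_risky_tokens(query)
    if hits:
        return "Warning: possible SQL injection (" + ", ".join(hits) + ")"
    return "No SQL injection detected"


if __name__ == "__main__":
    print(validate("a' or 1 = 1; --,1"))
    print(validate("alice"))
```

A keyword check of this kind is easy to implement, but it flags any legitimate text that happens to contain the word "or" and misses injections that avoid these tokens.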


Validation Logic:
• The backend Flask app receives the query.
• It parses the query and checks for suspicious patterns:
  o or operator: Often used to bypass conditions.
  o ; character: Allows execution of multiple SQL commands.
  o -- operator: Comments out part of the SQL statement.
Output:
• If a potential SQL injection is detected, a warning message is displayed on the page.
• The warning highlights the risky components in the query (e.g., or, --, ;).
• Safe input results in a confirmation message indicating that no SQL injection was detected.
Recommendations:
• The page displays best practices for query safety, including sanitizing inputs and using parameterized queries.

Fig 8: SQL Injections Detected Safe Query

The image indicates a potential security vulnerability in a web application that validates SQL queries. It shows a form where a user can input an SQL query. Although the entered query "x' or full_name like '%bob%,1" is a known SQL injection attempt, the system incorrectly marks it as "safe." This highlights the importance of robust security measures and accurate validation processes to prevent SQL injection attacks.

V. CONCLUSION

The use of machine learning techniques for detecting SQL injection attacks is a promising solution to one of the most prevalent web application security threats. By leveraging algorithms such as decision trees, support vector machines, or neural networks, the system can efficiently identify and mitigate malicious inputs in real time. The research emphasizes the importance of feature selection, dataset quality, and algorithm optimization in achieving high accuracy and low false-positive rates. Furthermore, the study underscores the adaptability of machine learning models to evolving attack patterns, making them a robust choice for dynamic security environments. The confusion matrix shows strong overall performance: zero false negatives, meaning that all malicious queries were detected, and a low false positive count, indicating efficient classification of benign queries. These results make the model well suited for deployment in SQL injection detection tasks, where both accuracy and reliability are paramount.
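The confusion-matrix counts quoted in the results (221 true positives, 599 true negatives, 20 false positives, 0 false negatives) determine the usual summary metrics. The short Python sketch below derives them. The function name `summarize` is a label chosen for this example, and the formulas are the standard definitions rather than anything taken from the authors' code.

```python
def summarize(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)   # share of flagged queries that are malicious
    recall = tp / (tp + fn)      # share of malicious queries that are flagged
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
    }


if __name__ == "__main__":
    # Counts as reported in the text: 221 TP, 599 TN, 20 FP, 0 FN.
    for name, value in summarize(tp=221, tn=599, fp=20, fn=0).items():
        print(f"{name}: {value:.3f}")
```

Computed from these four counts alone, accuracy is about 97.6%, precision about 91.7%, recall 100%, F1 about 95.7%, and the false positive rate about 3.2%.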
