top of page

Predict if the SQL injection query can get the access to the database


Introduction

SQL injection is the extension or modification of a web application's SQL statement by an attacker in order to extract or update information in the database that they are not authorized to access. In this project we are going to study SQL injection queries to get access to the database. For that first we preprocess the data and then apply the machine learning model. The objective of this project is to predict if the SQL injection query can get access to the database.


Description of Data

We have a SQL injection dataset which consists of two columns i.e query and label. The dataset contains 30919 records. The main goal of this model is to build the accurate binary classification model and predict if the SQL injection query can get access to the database.

Data preprocessing


Data Cleaning

Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. There was no null value in this dataset.


Data Transformation

In this project apply the TfidfVectorizer on a query column, So that we can use that data to train the model. machine learning algorithm to improve the prediction. After that, split the dataset into a 70 % training set and 30% testing set.


Data visualization

Following Word cloud visualize the popular word of the query of label 1

Following Word cloud visualize the popular word of the query of label 0

Following bar plot represents the number of label counts in the data set


Algorithm

In this project applied Decision tree classifier machine learning algorithm.. Machine learning is the research that explores the development of algorithms that can learn from data and provide predictions based on it. Preprocessed data fed to the machine learning algorithm to learn the classifier used for predicting the result.


Decision tree classifier

Decision tree classifier is a supervised learning type of machine learning classification algorithm. The data is continuously separated according to various parameters in the decision tree algorithm. A decision tree algorithm work as flowchart-like structure in which each internal node represents a “test” on an attribute (for example : whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.


Result

Results of our experiment's decision tree classification machine learning algorithms are summarized in the following table. The model gives the best accuracy rate of 80 percent.





For Complete solution of this sample assignment mail us on contact@codersarts.com


Also, If you are looking for any kind of coding help in programming languages, Contact us for instant solution




Comments


bottom of page