Create a model that can classify the different species of the Iris flower.
Load the dataset.
Build the model.
Train the model.
Iris is a genus of flowering plants that contains several species, such as Iris setosa, Iris versicolor and Iris virginica.
1. Load the Dataset
scikit-learn comes with built-in datasets, including the Iris dataset used for this classification problem.
Scikit-learn estimators expect numeric feature data, whether the problem is regression or classification, and they work on NumPy arrays (pandas DataFrames are converted to arrays internally). Since this dataset is loaded from scikit-learn, it is already formatted appropriately.
Let us first understand the dataset.
The dataset consists of 150 samples (50 per species), with:
3 labels: the species of Iris (Iris setosa, Iris virginica and Iris versicolor)
4 features: sepal length, sepal width, petal length and petal width, all in cm
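A quick way to confirm this structure is to inspect the object that load_iris() returns. This is a minimal sketch that only uses its standard attributes (data, target, feature_names, target_names):

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples, 4 numeric features each
print(iris.data.shape)
print(iris.feature_names)
# The integer labels 0, 1, 2 map to these species names
print(iris.target_names)
# Number of samples per class
print(np.bincount(iris.target))
```

The shape should be (150, 4), and each of the three classes should contain 50 samples.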
Python code to load the Iris dataset:
from sklearn import datasets
iris = datasets.load_iris()
Create a pandas DataFrame from the Iris dataset:
import pandas as pd
# Column order must match iris["data"]: sepal length, sepal width, petal length, petal width (cm)
data = pd.DataFrame(iris["data"], columns=["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"])
data["Species"] = iris["target"]
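Hard-coding column names makes it easy to list them in the wrong order. A sketch that takes the names straight from iris.feature_names instead. It also maps the integer labels to species names in a separate Series (species_names is an illustrative name, not part of the original tutorial), kept out of data so that dropping "Species" later still leaves only numeric features:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Column names come from the dataset itself, so they always match the data order
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data["Species"] = iris.target

# Human-readable species name for each row (kept outside the DataFrame)
species_names = data["Species"].map(dict(enumerate(iris.target_names)))

print(data.head())
print(species_names.value_counts())
```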
2. Analyze the Iris Dataset
There are different types of plots, such as bar plots, box plots and scatter plots. A scatter plot is very useful for analyzing the relationship between two features placed on the x and y axes.
The seaborn library provides a pairplot function, which draws scatter plots for every pair of features at once instead of plotting them individually.
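import seaborn as sns
import matplotlib.pyplot as plt
# One scatter plot per pair of features, colored by species
sns.pairplot(data, hue="Species")
plt.show()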
We can also use histograms to look at the distribution of each feature:
import matplotlib.pyplot as plt
# Histogram of every column in the DataFrame
data.hist()
plt.show()
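The plots can be backed up with numbers. A sketch that computes the mean of each feature per species with pandas groupby; it rebuilds data with column names from iris.feature_names so that it runs on its own:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data["Species"] = iris.target

# Average of every feature within each species (index 0, 1, 2)
means = data.groupby("Species").mean()
print(means.round(2))
```

In the output, the petal columns differ much more between species than the sepal columns do.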
3. Split the Dataset
Since our process involves both training and testing, we should split the dataset into a training set and a test set. This can be done with the following code:
from sklearn.model_selection import train_test_split
x = data.drop("Species", axis=1)
y = data["Species"]
# 70% of the rows for training, 30% for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
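train_test_split shuffles randomly, so each run gives a different split. A sketch of two optional parameters that the original code does not use: random_state fixes the shuffle so results are reproducible, and stratify=y keeps the three species in equal proportions in both parts. The value 42 is arbitrary.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data["Species"] = iris.target

x = data.drop("Species", axis=1)
y = data["Species"]

# 30% of 150 samples -> 45 test samples, 105 training samples
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42, stratify=y
)

print(x_train.shape, x_test.shape)
# With stratify=y, each species contributes the same number of test samples
print(y_test.value_counts().sort_index())
```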
x_train contains the training features
x_test contains the testing features
y_train contains the training labels
y_test contains the testing labels
4. Build the Model
We can use any classification algorithm to solve this problem, but I will go with KNN.
K-Nearest Neighbors (KNN)
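The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems. To classify a new sample, it finds the k training samples closest to it (Euclidean distance by default in scikit-learn) and assigns the label that is most common among them.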
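To make the idea concrete, here is a from-scratch sketch in NumPy, for illustration only (the tutorial itself uses scikit-learn's KNeighborsClassifier). The function name knn_predict is hypothetical; it assumes integer labels starting at 0, as in the Iris data, and breaks vote ties in favor of the smallest label.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split


def knn_predict(x_train, y_train, x_query, k=3):
    """Classify each row of x_query by majority vote of its k nearest training rows.

    Assumes y_train holds non-negative integer labels.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    x_query = np.asarray(x_query, dtype=float)
    predictions = []
    for point in x_query:
        # Euclidean distance from this query point to every training point
        distances = np.sqrt(((x_train - point) ** 2).sum(axis=1))
        # Labels of the k closest training points
        nearest_labels = y_train[np.argsort(distances)[:k]]
        # Majority vote; argmax returns the smallest label on a tie
        predictions.append(np.bincount(nearest_labels).argmax())
    return np.array(predictions)


iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)
predictions = knn_predict(x_train, y_train, x_test, k=3)
print("accuracy:", (predictions == y_test).mean())
```

The loop is easy to read but computes every distance for every query; scikit-learn can use tree-based indexes to speed up the neighbor search.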
from sklearn import neighbors
# KNN classifier that votes among the 3 nearest training samples
classifier = neighbors.KNeighborsClassifier(n_neighbors=3)
5. Train the Model
We can train the model with the fit method, passing it the training features and labels.
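A self-contained sketch of the training step. The random_state=0 in the split is an addition for reproducibility and is not in the original code:

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

classifier = neighbors.KNeighborsClassifier(n_neighbors=3)
# fit() stores the training data (and may build a tree index for fast
# neighbor search); it returns the classifier itself
classifier.fit(x_train, y_train)

# The class labels the model learned from y_train
print(classifier.classes_)
```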
Now the model is ready to make predictions.
6. Make Predictions
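The model's predictions on the test set can be compared with the expected labels to measure its accuracy:
from sklearn.metrics import accuracy_score
# Train on the training set, then predict labels for the test set
classifier.fit(x_train, y_train)
predictions = classifier.predict(x_test)
print(accuracy_score(y_test, predictions))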
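Accuracy is simply the fraction of test samples whose predicted label matches the true one. A sketch that checks this by hand and also prints a confusion matrix, which shows which species get mixed up. The fixed random_state=0 is an assumption added for reproducibility:

```python
import numpy as np
from sklearn import datasets, neighbors
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)
classifier = neighbors.KNeighborsClassifier(n_neighbors=3)
classifier.fit(x_train, y_train)
predictions = classifier.predict(x_test)

# Fraction of correct predictions, computed by hand and by scikit-learn
manual_accuracy = np.mean(predictions == y_test)
print(manual_accuracy, accuracy_score(y_test, predictions))

# Rows: true species, columns: predicted species; off-diagonal cells are mistakes
print(confusion_matrix(y_test, predictions))
```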
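In this run, the accuracy of our model was 93.3%. Because train_test_split shuffles the data randomly, the exact value will change from run to run unless random_state is fixed.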
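With test_size=0.3 the test set holds 45 of the 150 samples, so 93.3% corresponds to 42 correct predictions (42/45 ≈ 0.933). Once trained, the classifier can also label individual flowers. A sketch that, for simplicity, trains on all 150 samples and predicts two illustrative measurement rows (the values are chosen for the example, not taken from the tutorial), then converts the integer predictions to species names:

```python
from sklearn import datasets, neighbors

iris = datasets.load_iris()
classifier = neighbors.KNeighborsClassifier(n_neighbors=3)
classifier.fit(iris.data, iris.target)

# Illustrative measurements in cm: sepal length, sepal width, petal length, petal width
new_flowers = [
    [5.0, 3.4, 1.5, 0.2],
    [6.7, 3.0, 5.2, 2.3],
]
predicted = classifier.predict(new_flowers)

# Map integer labels back to species names
print(iris.target_names[predicted])
```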