
Breast Cancer Wisconsin Dataset - Classification




Description:


This dataset contains features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. The features describe characteristics of the cell nuclei present in the image. The linear program originally used to obtain a separating plane between the two classes in a 3-dimensional feature space is described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].



Recommended Model:


Algorithms to be used: Random Forest, SVM, Logistic Regression, etc.
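
As a rough starting point, the three algorithms named above can be compared with 5-fold cross-validation. This is a minimal sketch, not a tuned solution: it assumes scikit-learn is installed and uses the copy of the Wisconsin diagnostic data bundled with scikit-learn (load_breast_cancer) in place of data.csv, and the hyperparameters (n_estimators, max_iter, the RBF kernel) are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy stands in for data.csv.
# In this copy the target is encoded as 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# SVM and logistic regression are sensitive to feature scale,
# so they are wrapped in a pipeline that standardizes the inputs first.
models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Accuracy is used here only for simplicity. For a diagnostic task, recall on the malignant class is usually worth checking as well.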


Recommended Project:

Predicting whether a breast mass is malignant or benign



Dataset link:


https://www.kaggle.com/uciml/breast-cancer-wisconsin-data



Overview of data


Detailed overview of dataset:


- Rows = 569

- Columns = 32 (ID, diagnosis, and 30 real-valued features)



1) ID number

2) Diagnosis (M = malignant, B = benign)

Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" value (mean of the three largest values) of each feature are computed per image, giving the 30 feature columns with the suffixes _mean, _se, and _worst.
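
To make the compactness formula (perimeter^2 / area - 1.0) more concrete, here is a small worked example on two ideal shapes. The helper name compactness and the shapes are illustrative and not part of the dataset. For a circle the value is 4π - 1 regardless of radius, and less circular outlines score higher.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as defined in the feature list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


# Circle of radius r: perimeter = 2*pi*r, area = pi*r^2.
r = 3.0
circle = compactness(2 * math.pi * r, math.pi * r ** 2)

# Square of side s: perimeter = 4*s, area = s^2.
s = 2.0
square = compactness(4 * s, s ** 2)

print(f"circle: {circle:.3f}")  # 4*pi - 1, independent of r
print(f"square: {square:.3f}")  # 16 - 1 = 15
```

Because the perimeter is squared and divided by the area, the measure does not depend on the size of the shape, only on its outline.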



EDA [CODE]


import pandas as pd  

# load data 
data = pd.read_csv('data.csv') 
data.head()

# check details of the dataframe 
data.info()














# check the number of missing values in each column
data.isna().sum()
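
If the missing-value check shows a column that is entirely empty, it can be dropped before modelling, along with the identifier. The sketch below assumes the Kaggle CSV loads with a trailing empty column named 'Unnamed: 32', which is commonly reported for this file but should be verified against your own output. The three-row frame is a made-up stand-in for data.csv so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv; the values are made up for illustration.
raw = pd.DataFrame({
    "id": [101, 102, 103],
    "diagnosis": ["M", "B", "B"],
    "radius_mean": [18.0, 13.5, 12.4],
    "Unnamed: 32": [np.nan, np.nan, np.nan],  # assumed empty trailing column
})

# Columns in which every value is missing.
missing = raw.isna().sum()
empty_columns = missing[missing == len(raw)].index.tolist()

# Drop the empty columns and the identifier, which carries no predictive signal.
clean = raw.drop(columns=empty_columns + ["id"])

# Encode the label numerically: malignant = 1, benign = 0.
clean["diagnosis"] = clean["diagnosis"].map({"M": 1, "B": 0})

print(empty_columns)
print(clean)
```

On the real file, the same steps would be applied to data instead of raw.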

















# statistical information about the dataset 
data.describe()


# data distribution
import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='diagnosis', data=data) 
plt.show()
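
The count plot can be backed up with exact numbers. This sketch assumes scikit-learn is installed and again uses its bundled copy of the data rather than data.csv. In that copy the target is encoded as 0 = malignant and 1 = benign, so it is mapped back to the M/B labels used here.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: the bundled scikit-learn copy stands in for data.csv.
bunch = load_breast_cancer(as_frame=True)

# Map the numeric target back to the labels used in the CSV.
diagnosis = bunch.target.map({0: "M", 1: "B"})

counts = diagnosis.value_counts()                         # absolute class counts
shares = diagnosis.value_counts(normalize=True).round(3)  # class proportions

print(counts)
print(shares)
```

With data.csv loaded, the equivalent call is data['diagnosis'].value_counts(). The classes are moderately imbalanced (357 benign, 212 malignant), which is worth keeping in mind when choosing an evaluation metric.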

# per-class histograms for every feature (mean, standard error, and worst value)
features = ['radius', 'perimeter', 'texture', 'area', 'smoothness',
            'compactness', 'concavity', 'symmetry', 'concave points',
            'fractal_dimension']

for feature in features:
    for stat in ('mean', 'se', 'worst'):
        grid = sns.FacetGrid(data, col='diagnosis', hue='diagnosis', height=4)
        grid.map(sns.histplot, f'{feature}_{stat}', bins=30, kde=False)
        plt.show()
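
Thirty pairs of histograms are hard to compare by eye, so a numeric summary can help decide which ones to look at first. The sketch below ranks features by a standardized mean difference between the two classes. This is one possible measure chosen for illustration, not something the dataset prescribes. It assumes scikit-learn is installed and uses its bundled copy of the data, whose column names differ slightly from the CSV (for example 'mean radius' instead of 'radius_mean').

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: the bundled scikit-learn copy stands in for data.csv.
bunch = load_breast_cancer(as_frame=True)
X, y = bunch.data, bunch.target  # y: 0 = malignant, 1 = benign

malignant, benign = X[y == 0], X[y == 1]

# Standardized mean difference: |mean_M - mean_B| / pooled standard deviation.
pooled_std = np.sqrt((malignant.var() + benign.var()) / 2)
effect = (malignant.mean() - benign.mean()).abs() / pooled_std
effect = effect.sort_values(ascending=False)

# Features at the top are those whose class distributions overlap the least.
print(effect.head(10))
```

A high value only indicates that a single feature separates the classes well on its own. Features lower in the list may still be useful in combination.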


Other datasets for classification:


- Spam Text Dataset

- Pima Indians Diabetes Dataset




