
Blood Transfusion Service Center Dataset - Classification




Description :


This dataset was taken from a blood transfusion service center in Taiwan. It contains information about each blood donor, e.g. the number of months since the last donation, the number of times the donor has given blood, the total volume of blood donated, and the number of months since the first donation. The dataset consists of 748 instances and 5 attributes. We can use it to predict whether a donor gave blood in March 2007.


Recommended Model :


Suitable algorithms include the TPOT classifier, logistic regression, etc.
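As a rough illustration of the logistic regression option, the sketch below fits a scaled logistic regression and reports hold-out accuracy. It assumes scikit-learn is installed; the helper `fit_baseline`, the `donated` column name, and the synthetic frame are hypothetical stand-ins so that the snippet runs without the real file. The printed score says nothing about the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_baseline(df, target_col, seed=0):
    """Fit a scaled logistic regression and return (model, hold-out accuracy)."""
    X = df.drop(columns=[target_col])
    y = df[target_col]
    # Stratify so both classes appear in the train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in data (NOT the real records), only to make the sketch runnable.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Recency (months)": rng.integers(0, 40, n),
    "Frequency (times)": rng.integers(1, 20, n),
    "Time (months)": rng.integers(2, 98, n),
})
demo["donated"] = (demo["Recency (months)"] < 10).astype(int)

model, acc = fit_baseline(demo, "donated")
print(f"Hold-out accuracy on synthetic data: {acc:.2f}")
```

With the real data, the same helper could be called on the loaded DataFrame with the actual target column name.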


Recommended Projects :


Predict whether a donor gave blood in March 2007.

Dataset link



Overview of data


Detailed overview of dataset

  • Records in the dataset = 748 rows

  • Columns in the dataset = 5 columns (4 features and 1 target)

  1. Recency (months) - The number of months since the donor's most recent donation

  2. Frequency (times) - Total number of blood donations made by the donor

  3. Monetary (c.c. blood) - Total amount of blood that the donor has donated, in c.c.

  4. Time (months) - Number of months since the donor's first donation

Target Variable

  1. whether he/she donated blood in March 2007 - A binary variable that indicates whether the donor gave blood in March 2007 (0 = did not donate, 1 = donated)
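The column names listed above are long and contain spaces, so short aliases can make later code easier to read. A minimal sketch, assuming the file's headers match the names above exactly; the aliases and the helper `split_features_target` are hypothetical, and the three rows are placeholder values rather than real records.

```python
import pandas as pd

# Hypothetical short aliases for the column names listed above.
SHORT_NAMES = {
    "Recency (months)": "recency",
    "Frequency (times)": "frequency",
    "Monetary (c.c. blood)": "monetary",
    "Time (months)": "time",
    "whether he/she donated blood in March 2007": "donated",
}


def split_features_target(df, target="donated"):
    """Rename columns to the short aliases and return (features, target)."""
    renamed = df.rename(columns=SHORT_NAMES)
    X = renamed.drop(columns=[target])
    y = renamed[target]
    return X, y


# Placeholder rows (not real records), only to show the call.
demo = pd.DataFrame({
    "Recency (months)": [2, 4, 16],
    "Frequency (times)": [10, 3, 1],
    "Monetary (c.c. blood)": [2500, 750, 250],
    "Time (months)": [40, 20, 16],
    "whether he/she donated blood in March 2007": [1, 0, 0],
})
X, y = split_features_target(demo)
print(list(X.columns))
print(y.tolist())
```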


EDA[Code]


Blood donation Dataset



import pandas as pd
from pathlib import Path
# Load data (Path keeps the separator portable across operating systems)
file_loc = Path("data") / "transfusion.DATA"
blood_transfusion_data = pd.read_csv(file_loc)
blood_transfusion_data.head()



Total number of rows and columns in the dataset.


# Number of Rows and columns 
rows_col = blood_transfusion_data.shape
print("Total number of Rows in the dataset : {}".format(rows_col[0]))
print("Total number of columns in the dataset : {}".format(rows_col[1]))



Dataset information

# Data information
blood_transfusion_data.info()


Check the number of missing values in the dataset.


# Check the number of missing values in each column
blood_transfusion_data.isna().sum()



Statistical information.


# Statistical information
blood_transfusion_data.describe()


Data Visualization


Correlation

import seaborn as sns
import matplotlib.pyplot as plt
# correlation
corr = blood_transfusion_data.corr()
corr.style.background_gradient(cmap='coolwarm')
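A colored correlation table is easy to scan by eye, but it can also help to list the strongly correlated pairs explicitly. The sketch below does that with a hypothetical helper `strong_pairs` and an arbitrary threshold of 0.9. The demo rows are placeholders built so that the volume is a fixed multiple of the donation count; whether the real columns behave that way is an assumption to check on the loaded data.

```python
import numpy as np
import pandas as pd


def strong_pairs(df, threshold=0.9):
    """Return column pairs whose absolute Pearson correlation is >= threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is reported once
    # and the diagonal (self-correlation) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs >= threshold].sort_values(ascending=False)


# Placeholder rows (not real records).
demo = pd.DataFrame({
    "Frequency (times)": [1, 2, 5, 8],
    "Monetary (c.c. blood)": [250, 500, 1250, 2000],
    "Recency (months)": [9, 2, 7, 3],
})
pairs = strong_pairs(demo)
print(pairs)
```

If two features turn out to be almost perfectly correlated, one of them adds little information for a linear model such as logistic regression and could be dropped.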



Plot the count plot of Target Variable

# 0 means no, 1 - yes
sns.set_style("whitegrid")
plt.figure(figsize=(8,5))
sns.countplot(x= "whether he/she donated blood in March 2007",data=blood_transfusion_data)
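The count plot shows class balance visually; the same information can be read as numbers. A minimal sketch with a hypothetical helper `class_balance`; the four-row frame and the `donated` column name are placeholders, not the real data.

```python
import pandas as pd


def class_balance(df, target_col):
    """Return the count and share of each class in the target column."""
    counts = df[target_col].value_counts().sort_index()
    share = (counts / counts.sum()).round(3)
    return pd.DataFrame({"count": counts, "share": share})


# Placeholder target values (not real records).
demo = pd.DataFrame({"donated": [0, 0, 0, 1]})
summary = class_balance(demo, "donated")
print(summary)
```

On the real DataFrame the target column would be "whether he/she donated blood in March 2007". If one class dominates, plain accuracy can be misleading and a stratified split is usually a sensible choice.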


Count plot of Recency (months)

plt.figure(figsize=(8,5))
sns.countplot(x= "Recency (months)",data=blood_transfusion_data)


Count plot of Frequency (times)



plt.figure(figsize=(8,5))
sns.countplot(x= "Frequency (times)",data=blood_transfusion_data)


Count plot of Monetary (c.c. blood)


plt.figure(figsize=(18,5))
sns.countplot(x= "Monetary (c.c. blood)",data=blood_transfusion_data)


Count plot of Time (months)


plt.figure(figsize=(20,5))
sns.countplot(x= "Time (months)",data=blood_transfusion_data)


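Box plots of the feature columns

# Every column except the last one (the target) is a numeric feature
num_cols = blood_transfusion_data.columns[:-1]
sns.set_theme(style="whitegrid")  # set the theme once, outside the loop
for col in num_cols:
    plt.figure(figsize=(10,5))
    ax = sns.boxplot(x=blood_transfusion_data[col])
    ax.set_title(col)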
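Box plots mark points beyond the whiskers as outliers, and seaborn's default whiskers extend 1.5 times the interquartile range past the quartiles. The sketch below counts such points per column using the same rule. The helper `iqr_outlier_counts` is hypothetical and the demo values are placeholders, not real records.

```python
import pandas as pd


def iqr_outlier_counts(df, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame columns.
    outside = (df < q1 - k * iqr) | (df > q3 + k * iqr)
    return outside.sum()


# Placeholder values (not real records) with one obvious extreme point.
demo = pd.DataFrame({"Frequency (times)": [1, 2, 2, 3, 3, 4, 50]})
counts = iqr_outlier_counts(demo)
print(counts)
```

On the real data this could be called on the feature columns only, for example `blood_transfusion_data.iloc[:, :-1]`.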

