Oct 28, 2021

Room Occupancy detection data set - classification

Updated: Nov 3, 2021

Description :

This dataset records the environmental conditions of an office room (temperature, humidity, light, CO2 and humidity ratio) together with an occupancy label, and can be used to predict whether the room is occupied. Three datasets are available: one for training and two for testing the models, covering periods with the office door open and closed during occupancy. The target variable, Occupancy, takes the values 0 and 1.

Recommended Model :

Suitable algorithms include random forest, SVM, Gaussian naive Bayes (GaussianNB), decision tree classifier, logistic regression, etc.
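As a rough illustration of how the algorithms listed above could be compared, the sketch below fits each of them with scikit-learn and reports test accuracy. It runs on randomly generated stand-in data, not on the real occupancy files. The helper name `compare_models`, the synthetic features and the chosen hyperparameters are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X_train, y_train, X_test, y_test):
    """Fit each candidate classifier and return {name: test accuracy}."""
    models = {
        "Random forest": RandomForestClassifier(n_estimators=50, random_state=0),
        # SVM and logistic regression are scale-sensitive, so standardize first
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "GaussianNB": GaussianNB(),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "Logistic regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Synthetic stand-in data (NOT the real dataset): the label depends on feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)

scores = compare_models(X[:300], y[:300], X[300:], y[300:])
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

With the real data, the same helper could be called with the feature columns of the training file and of one of the test files.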

Recommended Projects :

Dataset for Predicting room occupancy using environmental factors.

Dataset link

Data set Link : UCI MLR - https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+

Kaggle : - https://www.kaggle.com/robmarkcole/occupancy-detection-data-set-uci

Overview of data

Detailed overview of dataset

  • Records in the dataset = 8143 rows

  • Columns in the dataset = 7 columns

Data is provided with date-time information, five environmental measures taken each minute over multiple days, and the occupancy label, specifically:

  1. date : Timestamp of the measurement (year-month-day hour:minute:second)

  2. Temperature : Room Temperature, in Celsius.

  3. Humidity : Relative Humidity in percentage

  4. Light : Light level, in lux

  5. CO2 : Carbon dioxide measured in parts per million (ppm)

  6. HumidityRatio : Humidity ratio, a quantity derived from temperature and relative humidity, in kilograms of water vapor per kilogram of air

  • Target variable :

Occupancy : 0 for not occupied, 1 for occupied
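Since HumidityRatio is derived from temperature and relative humidity, a small sketch may help show how such a quantity can be computed. The post does not say which formula the dataset authors used. The version below assumes the Magnus approximation for saturation vapor pressure, standard atmospheric pressure, and a result expressed per kilogram of dry air. The function name `humidity_ratio` and its parameters are hypothetical.

```python
import math


def humidity_ratio(temp_c, rel_humidity_pct, pressure_pa=101325.0):
    """Approximate humidity ratio in kg of water vapor per kg of dry air."""
    # Saturation vapor pressure in Pa (Magnus approximation, an assumed choice)
    p_sat = 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))
    # Partial pressure of water vapor at the given relative humidity
    p_vapor = rel_humidity_pct / 100.0 * p_sat
    # 0.622 is approximately the ratio of the molar masses of water and dry air
    return 0.622 * p_vapor / (pressure_pa - p_vapor)


# Example call with illustrative values (23 degrees Celsius, 27 % relative humidity)
print(humidity_ratio(23.0, 27.0))
```

Values computed this way may differ slightly from the HumidityRatio column if the dataset was built with a different pressure or saturation formula.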

EDA [Code]

Dataset

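import pandas as pd

# Load the training data (path relative to the working directory;
# forward slashes work on Windows, Linux and macOS)
file_loc = "data/datatraining.txt"
occupancy_data = pd.read_csv(file_loc)

# Preview the first five rows
occupancy_data.head()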
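The loading step above leaves the date column as plain text. A minimal sketch of a loader that also parses the timestamps is shown below. The helper name `load_occupancy` is hypothetical, and the two sample rows are made up purely to exercise it; they are not taken from the dataset.

```python
import io

import pandas as pd


def load_occupancy(source):
    """Read an occupancy file (path or file-like object) and parse 'date'."""
    df = pd.read_csv(source)
    # Convert the text timestamps to datetime64 so time-based operations work
    df["date"] = pd.to_datetime(df["date"])
    return df


# Two made-up rows in the assumed column layout, used only for demonstration
sample = io.StringIO(
    "date,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy\n"
    "2021-01-01 09:00:00,23.0,27.0,420.0,720.0,0.0047,1\n"
    "2021-01-01 09:01:00,23.0,27.0,0.0,715.0,0.0047,0\n"
)
demo = load_occupancy(sample)
print(demo.dtypes)
```

The same helper could be pointed at the training file and at each of the two test files so that all three frames share the same column types.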

Total number of rows and columns in the dataset

# shape is a (rows, columns) tuple
r_c = occupancy_data.shape
print("Total number of records in the dataset : ", r_c[0])
print("Total number of columns in the dataset : ", r_c[1])

Check Details

# Data information
 
occupancy_data.info()

Check the number of missing values in the dataset

# check missing values in each column
 
occupancy_data.isna().sum()
 

Statistical information

# statistical information
 
occupancy_data.describe()
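For a classification target, the share of each class is often worth checking alongside the summary statistics. The sketch below shows one way to do that. The helper name `class_balance` is hypothetical and the toy frame is made up for the example.

```python
import pandas as pd


def class_balance(df, target="Occupancy"):
    """Return the count and share of each class in the target column."""
    counts = df[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Toy data (not the real dataset): three unoccupied rows, one occupied row
toy = pd.DataFrame({"Occupancy": [0, 0, 0, 1]})
balance = class_balance(toy)
print(balance)
```

On the real data this would be called as `class_balance(occupancy_data)`.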

Data Visualization

Correlation

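import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise correlation of the numeric columns
# (numeric_only=True skips the text 'date' column, which pandas >= 2.0 would otherwise reject)
corr = occupancy_data.corr(numeric_only=True)

# Color the correlation table (renders in a notebook)
corr.style.background_gradient(cmap='coolwarm')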
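The full correlation matrix can be narrowed down to the column that matters most for the task, the target. The sketch below ranks features by the absolute value of their correlation with Occupancy. The helper name `correlation_with_target` is hypothetical and the toy values are made up for the example.

```python
import pandas as pd


def correlation_with_target(df, target="Occupancy"):
    """Rank numeric features by absolute Pearson correlation with the target."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Sort by magnitude so strong negative correlations also rank high
    return corr.sort_values(key=abs, ascending=False)


# Toy data (not the real dataset)
toy = pd.DataFrame({
    "Light": [0, 10, 400, 450],
    "Humidity": [30, 27, 29, 28],
    "Occupancy": [0, 0, 1, 1],
})
ranked = correlation_with_target(toy)
print(ranked)
```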

Box plot

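# Select the numeric columns (this excludes the text 'date' column)
num_cols = occupancy_data.select_dtypes(include='number').columns

# Draw one box plot per numeric column
for col in num_cols:
    plt.boxplot(occupancy_data[col])
    plt.xlabel(col)
    plt.show()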
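Box plots show outliers visually. To get a number per column, the same rule that matplotlib uses by default for its whiskers (1.5 times the interquartile range) can be applied directly. The helper name `iqr_outlier_counts` is hypothetical and the toy values are made up for the example.

```python
import pandas as pd


def iqr_outlier_counts(df, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the frame's columns
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Toy data (not the real dataset): one extreme CO2 reading
toy = pd.DataFrame({
    "CO2": [400, 410, 420, 430, 2000],
    "Temperature": [20, 21, 22, 23, 24],
})
counts = iqr_outlier_counts(toy)
print(counts)
```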
 

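# Use the parsed timestamp as the index so each series is plotted against time
occupancy_data['date'] = pd.to_datetime(occupancy_data['date'])
occupancy_data.set_index('date', inplace=True)

# Draw one line plot per numeric column
for col in num_cols:
    occupancy_data[col].plot(figsize=(15, 5), title=col)
    plt.xticks(rotation=45, size=10)
    plt.yticks(size=10)
    plt.show()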

[Time-series plots, one per variable: Temperature, Humidity, Light, CO2, Humidity Ratio]
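Per-minute series can be noisy when plotted over several days. One option, sketched below, is to average the readings into hourly bins before plotting. The helper name `hourly_mean` is hypothetical, it assumes the frame is indexed by timestamp, and the toy values are made up for the example.

```python
import pandas as pd


def hourly_mean(df):
    """Average the numeric columns of a timestamp-indexed frame into 60-minute bins."""
    out = df.select_dtypes(include="number").copy()
    # Make sure the index holds datetimes, even if it was read as text
    out.index = pd.to_datetime(out.index)
    return out.resample("60min").mean()


# Toy data (not the real dataset): two readings in the first hour, one in the second
toy = pd.DataFrame(
    {"CO2": [400.0, 500.0, 600.0]},
    index=["2021-01-01 00:00:00", "2021-01-01 00:30:00", "2021-01-01 01:00:00"],
)
hourly = hourly_mean(toy)
print(hourly)
```

The resulting frame can be plotted in the same way as the per-minute data, for example with `hourly["CO2"].plot()`.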

Other related data

Student performance dataset - Classification and Regression

Census income Data Set - Classification

Wholesale customer - Classification and Clustering

Online retail dataset - classification, clustering and regression

Cervical Cancer Risk Factor Dataset - classification and clustering

Blood Transfusion service center dataset - Classification

Divorce Predictor Dataset - Classification

Fire Forest Dataset - Regression

Heart Disease dataset - Classification
