
Room Occupancy detection data set - classification

Updated: Nov 3, 2021


Description :


This dataset provides measurements of a room's environment (temperature, relative humidity, light, CO2 and humidity ratio) together with the room's occupancy status, and can be used to predict whether an office room is occupied. Three data sets are available: one for training and two for testing the models, covering periods with the office door open and closed during occupancy. The target variable, Occupancy, takes the values 0 and 1.
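Because the data comes as one training file and two test files, a quick sanity check after loading is that all three share the same columns, followed by a look at how many rows are labelled occupied in each. The sketch below assumes each file has already been read into a pandas DataFrame; the helpers `check_same_columns` and `occupied_share` are hypothetical names, and the toy frames hold made-up values rather than real readings.

```python
import pandas as pd


def check_same_columns(frames):
    """Raise if the splits (name -> DataFrame) do not share one column layout."""
    names = list(frames)
    reference = list(frames[names[0]].columns)
    for name in names[1:]:
        columns = list(frames[name].columns)
        if columns != reference:
            raise ValueError(f"split {name!r} has columns {columns}, expected {reference}")


def occupied_share(frames, target="Occupancy"):
    """Fraction of rows labelled 1 (occupied) in each split."""
    return {name: float(df[target].mean()) for name, df in frames.items()}


# Toy frames with made-up values, standing in for the real files.
train = pd.DataFrame({"Light": [0.0, 450.0, 0.0, 0.0], "Occupancy": [0, 1, 0, 0]})
test = pd.DataFrame({"Light": [430.0, 0.0], "Occupancy": [1, 0]})
frames = {"train": train, "test": test}

check_same_columns(frames)
print(occupied_share(frames))  # {'train': 0.25, 'test': 0.5}
```

With the real data, `frames` would map each split name to the result of `pd.read_csv` on the corresponding file from the downloaded archive.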


Recommended Model :


Algorithms that can be used include random forest, support vector machines (SVMs), the Gaussian naive Bayes classifier (GaussianNB), decision tree classifiers and logistic regression.
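A minimal sketch of comparing the suggested classifiers, assuming scikit-learn is installed and that the five environmental columns are used as features (the timestamp is left out). Each model is fit on the training split and its accuracy is reported on every test split. `compare_models` and `make_toy_split` are hypothetical helpers, and the toy data is randomly generated, not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the five sensor columns are the features; 'date' is not used.
FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]
TARGET = "Occupancy"


def compare_models(train, tests, random_state=0):
    """Fit each model on `train`; return accuracy on every split in `tests`."""
    models = {
        "random_forest": RandomForestClassifier(random_state=random_state),
        # SVMs and logistic regression are sensitive to feature scale,
        # so they get a StandardScaler in front.
        "svm": make_pipeline(StandardScaler(), SVC()),
        "gaussian_nb": GaussianNB(),
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(train[FEATURES], train[TARGET])
        results[name] = {
            split: float(accuracy_score(df[TARGET], model.predict(df[FEATURES])))
            for split, df in tests.items()
        }
    return results


def make_toy_split(n, seed):
    """Random toy data (not the real dataset): light and CO2 rise when occupied."""
    rng = np.random.default_rng(seed)
    occupied = rng.integers(0, 2, size=n)
    return pd.DataFrame({
        "Temperature": 20.5 + 0.8 * occupied + rng.normal(0, 0.3, n),
        "Humidity": 27 + rng.normal(0, 2, n),
        "Light": 450 * occupied + rng.normal(0, 30, n),
        "CO2": 450 + 300 * occupied + rng.normal(0, 60, n),
        "HumidityRatio": 0.0045 + rng.normal(0, 0.0002, n),
        TARGET: occupied,
    })


results = compare_models(
    make_toy_split(300, seed=1),
    {"test_1": make_toy_split(100, seed=2), "test_2": make_toy_split(100, seed=3)},
)
for name, scores in results.items():
    print(name, scores)
```

Scaling is only added in front of the scale-sensitive models; the tree-based models and GaussianNB are fit on the raw values.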


Recommended Projects :


Predicting room occupancy from environmental factors.


Dataset link



Overview of data


Detailed overview of dataset

  • Records in the dataset = 8,143 rows

  • Columns in the dataset = 7 columns


Data is provided with date-time information and five environmental measures taken each minute over multiple days, specifically:


  1. date : Timestamp of each reading (year-month-day hour:minute:second)

  2. Temperature : Room temperature, in degrees Celsius

  3. Humidity : Relative humidity, in percent

  4. Light : Light intensity, in lux

  5. CO2 : Carbon dioxide concentration, in parts per million (ppm)

  6. HumidityRatio : Humidity ratio, a quantity derived from temperature and relative humidity, in kilograms of water vapour per kilogram of dry air

  • Target variable :

Occupancy : 0 or 1 (0 = not occupied, 1 = occupied)
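The humidity ratio is described as derived from temperature and relative humidity, but the source does not say which formula was used. One common approximation, shown here purely as an assumption, computes the saturation vapour pressure with the Magnus formula, scales it by the relative humidity to get the vapour pressure p_w, and then applies W = 0.622 * p_w / (p - p_w), where 0.622 is the ratio of the molar masses of water and dry air and p is the total pressure (assumed here to be standard sea-level pressure, 1013.25 hPa). The function names are hypothetical.

```python
import math

STANDARD_PRESSURE_HPA = 1013.25  # assumption: sea-level pressure


def saturation_vapour_pressure_hpa(temp_c):
    """Magnus approximation over liquid water, in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def humidity_ratio(temp_c, rel_humidity_pct, pressure_hpa=STANDARD_PRESSURE_HPA):
    """Approximate kg of water vapour per kg of dry air."""
    vapour_pressure = rel_humidity_pct / 100.0 * saturation_vapour_pressure_hpa(temp_c)
    return 0.622 * vapour_pressure / (pressure_hpa - vapour_pressure)


# A reading of 23 degC and 27 % relative humidity gives roughly 0.0047 kg/kg.
print(humidity_ratio(23.0, 27.0))
```

Values in the HumidityRatio column should be of the same order (a few thousandths of a kg per kg), but small differences are expected if the data set used a different formula or pressure.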


EDA [Code]


Dataset


import pandas as pd
# Load the training data (a forward-slash path also works on Windows)

file_loc = "data/datatraining.txt"
occupancy_data = pd.read_csv(file_loc)
occupancy_data.head()
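By default the date column is read as plain text (dtype object). The sketch below asks pandas to parse it as datetimes via `parse_dates`, which the time-series plots and any resampling need. It uses two made-up rows (not real readings) in an assumed layout where the header names seven columns but each data row starts with an extra quoted row number; in that case `pd.read_csv` uses the extra first field as the row index, which is why the frame still has 7 columns. `load_occupancy` is a hypothetical helper name.

```python
import io

import pandas as pd

# Made-up sample in the assumed layout: 7 header names, 8 fields per data row.
SAMPLE = '''"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.2,27.3,426.0,721.3,0.0048,1
"2","2015-02-04 17:52:00",23.2,27.3,0.0,714.0,0.0048,0
'''


def load_occupancy(source):
    """Read an occupancy file, parsing the 'date' column as datetimes."""
    return pd.read_csv(source, parse_dates=["date"])


occupancy_sample = load_occupancy(io.StringIO(SAMPLE))
print(occupancy_sample.dtypes)  # 'date' is now a datetime64 column
```

With the real file, `load_occupancy(file_loc)` would replace the plain `pd.read_csv(file_loc)` call.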


Total number of rows and columns in the dataset


n_rows, n_cols = occupancy_data.shape
print("Total number of records in the dataset :", n_rows)
print("Total number of columns in the dataset :", n_cols)


Check Details


# Data information
occupancy_data.info()


Check the number of missing values in the dataset


# check missing values in each column
occupancy_data.isna().sum()

Statistical information


# statistical information 
occupancy_data.describe()
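Since the goal is to predict occupancy, it can also help to compute the statistics separately for occupied and unoccupied rows, which shows which sensors move with occupancy. A minimal sketch using `groupby`; `summary_by_class` is a hypothetical helper and the toy frame holds made-up values.

```python
import pandas as pd

FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def summary_by_class(df, target="Occupancy", features=FEATURES):
    """Mean and median of each feature, one row per target class."""
    return df.groupby(target)[features].agg(["mean", "median"])


# Made-up toy values, not taken from the dataset.
toy = pd.DataFrame({
    "Temperature": [20.0, 20.2, 21.0, 21.4],
    "Humidity": [27.0, 26.0, 27.5, 28.5],
    "Light": [0.0, 10.0, 420.0, 480.0],
    "CO2": [440.0, 460.0, 700.0, 900.0],
    "HumidityRatio": [0.0045, 0.0044, 0.0047, 0.0049],
    "Occupancy": [0, 0, 1, 1],
})
print(summary_by_class(toy))
```

The result has one row per class and a (feature, statistic) column pair, so `summary_by_class(occupancy_data)` gives a compact side-by-side view.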


Data Visualization


Correlation


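import matplotlib.pyplot as plt
# correlation between the numeric columns ('date' is text, so it is excluded;
# recent pandas versions raise an error on non-numeric columns otherwise)
corr = occupancy_data.corr(numeric_only=True)
corr.style.background_gradient(cmap='coolwarm')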
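The full matrix is easier to act on when reduced to each feature's correlation with the target. The sketch below, using a hypothetical `target_correlations` helper and made-up toy values, keeps the Occupancy column of `corr(numeric_only=True)` and sorts it by absolute value. This is Pearson correlation, so it only captures linear association.

```python
import pandas as pd


def target_correlations(df, target="Occupancy"):
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)


# Made-up toy values, not taken from the dataset.
toy = pd.DataFrame({
    "date": ["t1", "t2", "t3", "t4"],  # text column, skipped by numeric_only
    "Temperature": [21.0, 20.0, 21.0, 20.0],
    "Light": [0.0, 0.0, 500.0, 500.0],
    "Occupancy": [0, 0, 1, 1],
})
print(target_correlations(toy))
```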


Box plot

# one box plot per numeric column (the text 'date' column is skipped)
num_cols = occupancy_data.select_dtypes(include='number').columns
for col in num_cols:
    plt.boxplot(occupancy_data[col])
    plt.xlabel(col)
    plt.show()
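By default `plt.boxplot` draws whiskers out to the most extreme points within 1.5 times the interquartile range (IQR) of the box and plots anything beyond them as individual points. A minimal sketch, using a hypothetical `iqr_outlier_counts` helper, counts those points per column so the plots can be summarised as numbers. Quartiles come from pandas' default linear interpolation, which should match the percentiles matplotlib uses for the box.

```python
import pandas as pd


def iqr_outlier_counts(df, whis=1.5):
    """Number of values per numeric column lying beyond the box-plot whiskers."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align each column with its own bounds.
    outside = (numeric < q1 - whis * iqr) | (numeric > q3 + whis * iqr)
    return outside.sum()


# Made-up toy values, not taken from the dataset.
toy = pd.DataFrame({"date": ["a", "b", "c", "d", "e"], "Light": [1.0, 2.0, 3.0, 4.0, 100.0]})
print(iqr_outlier_counts(toy))  # Light: 1 (the value 100 lies above Q3 + 1.5 * IQR = 7)
```

A point flagged this way is not necessarily an error; bright daylight or a crowded room can produce genuine extreme readings.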






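# use the timestamps as a datetime index so the x axis is a real time axis
occupancy_data['date'] = pd.to_datetime(occupancy_data['date'])
occupancy_data.set_index('date', inplace=True)

# one time-series plot per numeric column
for col in num_cols:
    occupancy_data[col].plot(figsize=(15, 5))
    plt.xticks(rotation=45, size=10)
    plt.yticks(size=10)
    plt.title(col)
    plt.show()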

Time-series plots: Temperature, Humidity, Light, CO2, Humidity Ratio
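Minute-level series are dense, so averaging them over longer windows can make daily patterns easier to see; the hourly mean of Occupancy is then the fraction of occupied minutes in each hour. A sketch assuming the frame is indexed by its timestamps, as in the plotting code above; `hourly_profile` is a hypothetical helper and the toy readings are made up.

```python
import pandas as pd


def hourly_profile(df, freq="60min"):
    """Average every numeric column over fixed time windows (default: one hour)."""
    numeric = df.select_dtypes(include="number").copy()
    numeric.index = pd.to_datetime(numeric.index)  # also accepts string timestamps
    return numeric.resample(freq).mean()


# Made-up toy readings, not taken from the dataset.
toy = pd.DataFrame(
    {"Light": [0.0, 100.0, 200.0, 300.0], "Occupancy": [0, 0, 1, 0]},
    index=["2015-02-04 00:00:00", "2015-02-04 00:30:00",
           "2015-02-04 01:00:00", "2015-02-04 01:30:00"],
)
print(hourly_profile(toy))
```

Passing a different `freq`, such as "1D", gives daily averages instead.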



