Room Occupancy Detection Dataset - Classification

Description :

This dataset records a room's environmental conditions (temperature, humidity, light, CO2 and humidity ratio) together with its occupancy status, and can be used to predict whether an office room is occupied. Three data files are available: one for training and two for testing the models, covering periods when the office door was open and closed during occupancy. The target variable, Occupancy, takes the values 0 and 1.
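A minimal sketch for reading all three files at once, assuming they keep the names used in the UCI archive (`datatraining.txt`, `datatest.txt`, `datatest2.txt`) and sit together in a local `data` folder. The helper `load_splits` and the split keys `train`, `test` and `test2` are hypothetical names chosen for this example.

```python
from pathlib import Path

import pandas as pd

# File names as distributed in the UCI archive; the folder location is an assumption.
SPLIT_FILES = {
    "train": "datatraining.txt",
    "test": "datatest.txt",
    "test2": "datatest2.txt",
}


def load_splits(folder="data"):
    """Read the three files into a dict of DataFrames keyed by split name."""
    splits = {}
    for name, file_name in SPLIT_FILES.items():
        # parse_dates converts the 'date' strings to datetime64 values up front
        splits[name] = pd.read_csv(Path(folder) / file_name, parse_dates=["date"])
    return splits
```

For example, `splits = load_splits("data")` followed by `splits["train"].shape` gives the size of the training file. Parsing the dates at load time means the timestamps are ready for time-based plots without a separate conversion step.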

Recommended Models :

Suitable algorithms include random forest, support vector machines (SVMs), Gaussian naive Bayes (GaussianNB), decision tree classifier and logistic regression.
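The sketch below fits each of the recommended classifiers on the five sensor columns and reports test accuracy. It assumes scikit-learn is installed; `compare_models`, the model keys and the hyperparameters (`random_state=0`, `max_iter=1000`) are illustrative choices, not tuned settings. The SVM and logistic regression are wrapped with `StandardScaler` because both are sensitive to feature scale, and the features here range from thousandths (HumidityRatio) to hundreds or more (Light, CO2).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Sensor columns used as inputs; 'date' is left out on purpose.
FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def compare_models(train, test, features=FEATURES, target="Occupancy"):
    """Fit each classifier on `train` and return its accuracy on `test`."""
    models = {
        "random_forest": RandomForestClassifier(random_state=0),
        # scale-sensitive models get standardised inputs
        "svm": make_pipeline(StandardScaler(), SVC()),
        "gaussian_nb": GaussianNB(),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(train[features], train[target])
        scores[name] = accuracy_score(test[target], model.predict(test[features]))
    return scores
```

With the files loaded as DataFrames, a call such as `compare_models(train_df, test_df)` returns a dict mapping each model name to its test accuracy; calling it once per test file lets the two testing conditions be compared side by side.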

Dataset links

UCI Machine Learning Repository : https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+

Kaggle : https://www.kaggle.com/robmarkcole/occupancy-detection-data-set-uci

Overview of data

Detailed overview of the dataset

  • Records in the training file (datatraining.txt) : 8,143 rows

  • Columns in the dataset : 7

The data consist of a timestamp and five environmental measurements taken every minute over multiple days:

  1. date : Timestamp of the measurement (year-month-day hour:minute:second)

  2. Temperature : Room temperature, in degrees Celsius

  3. Humidity : Relative humidity, in percent

  4. Light : Illuminance, in lux

  5. CO2 : Carbon dioxide concentration, in parts per million (ppm)

  6. HumidityRatio : Humidity ratio, a quantity derived from temperature and relative humidity, in kilograms of water vapour per kilogram of air

  • Target variable :

Occupancy : 0 for not occupied, 1 for occupied
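HumidityRatio is derived from Temperature and Humidity. One standard psychrometric way to compute it is sketched below: the saturation vapour pressure comes from the Magnus approximation, relative humidity scales it to the actual vapour pressure p_w, and the ratio is W = 0.622 * p_w / (p - p_w), where 0.622 is the ratio of the molar masses of water and dry air. The dataset documentation does not state which formula or air pressure was used, so the sea-level pressure of 1013.25 hPa is an assumption and results may differ slightly from the supplied column. The function names are hypothetical.

```python
import math

# Assumed total air pressure (standard sea level); not given in the dataset.
STANDARD_PRESSURE_HPA = 1013.25


def saturation_vapour_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapour pressure over water, in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def humidity_ratio(temp_c, rel_humidity_pct, pressure_hpa=STANDARD_PRESSURE_HPA):
    """Mass of water vapour per mass of dry air, in kg/kg."""
    # actual partial pressure of water vapour from relative humidity
    p_w = rel_humidity_pct / 100.0 * saturation_vapour_pressure_hpa(temp_c)
    # 0.622 = molar mass of water / molar mass of dry air
    return 0.622 * p_w / (pressure_hpa - p_w)
```

At around 23 °C and 27 % relative humidity this gives a value near 0.0048 kg/kg, the same order as the HumidityRatio column.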



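import pandas as pd
from pathlib import Path

# Load the training file; Path keeps the path separator portable across operating systems
file_loc = Path("data") / "datatraining.txt"
occupancy_data = pd.read_csv(file_loc)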

Total number of rows and columns in the dataset

r_c = occupancy_data.shape  # (rows, columns)
print("Total number of records in the dataset : ", r_c[0])
print("Total number of columns in the dataset : ", r_c[1])

Check column details

# data information: column names, non-null counts and dtypes
occupancy_data.info()

Check the number of missing values in the dataset

# check missing values in each column
print(occupancy_data.isnull().sum())

Statistical information

# statistical information
print(occupancy_data.describe())
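Beyond the overall summary table, it can help to compare the features for occupied and unoccupied periods. A minimal sketch, where `summary_by_class` and the `FEATURES` list are names introduced for this example:

```python
# Sensor columns to summarise; 'date' is left out.
FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def summary_by_class(df, features=FEATURES, target="Occupancy"):
    """Mean and median of each feature, computed separately for each target class."""
    return df.groupby(target)[features].agg(["mean", "median"])
```

`summary_by_class(occupancy_data)` returns one row per Occupancy value (0 and 1) and a (feature, statistic) column pair for each mean and median, which makes differences such as higher Light or CO2 during occupied periods easy to spot.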

Data Visualization


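import seaborn as sns
import matplotlib.pyplot as plt

# correlation matrix of the numeric columns ('date' holds strings, so it is excluded)
corr = occupancy_data.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.show()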
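To read the correlation matrix with the prediction task in mind, the Occupancy column can be pulled out and ranked. A small sketch; `target_correlations` is a hypothetical helper, and `numeric_only=True` (available in pandas 1.5 and later) skips the string `date` column.

```python
def target_correlations(df, target="Occupancy"):
    """Pearson correlation of every numeric column with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # rank by absolute value so strong negative correlations stay near the top
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

Sorting by absolute value rather than by signed value keeps a feature that falls sharply when the room is occupied from being listed last.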

Box plot

num_cols = occupancy_data.select_dtypes(exclude='object').columns
for col in num_cols:
    sns.boxplot(x=occupancy_data[col]).set_title(col)  # one box plot per numeric column
    plt.show()
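Box plots draw points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles as individual outliers; 1.5 is the default whisker length in both seaborn and matplotlib. The same rule can count those points per column. `iqr_outlier_counts` is a hypothetical helper written for this example.

```python
def iqr_outlier_counts(df, whis=1.5):
    """Count values outside [Q1 - whis*IQR, Q3 + whis*IQR] in each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # the bounds are Series indexed by column name, so comparisons align per column
    outside = (numeric < q1 - whis * iqr) | (numeric > q3 + whis * iqr)
    return outside.sum()
```

`iqr_outlier_counts(occupancy_data)` returns one count per numeric column, which gives a quick numeric check on what the box plots show.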

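# parse the timestamps and index by them so each series plots against time
occupancy_data['date'] = pd.to_datetime(occupancy_data['date'])
occupancy_data.set_index('date', inplace=True)
for col in num_cols:
    occupancy_data[col].plot(figsize=(12, 3), title=col)
    plt.show()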
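Minute-level series over several days are dense. One optional way to summarise them, assuming the DataFrame is indexed by parsed datetime values, is to average each numeric column per hour; for the 0/1 Occupancy column, the hourly mean is the fraction of minutes in that hour during which the room was occupied. `hourly_profile` is an illustrative name, not part of any library.

```python
import pandas as pd


def hourly_profile(df):
    """Hourly mean of every numeric column; requires a DatetimeIndex."""
    numeric = df.select_dtypes(include="number")
    # fixed one-hour bins; a Timedelta avoids version-specific frequency aliases
    return numeric.resample(pd.Timedelta(hours=1)).mean()
```

For example, `hourly_profile(occupancy_data)["Occupancy"].plot()` draws the hourly occupancy rate, which tends to make working-hour patterns easier to see than the raw minute data.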






