Oct 28, 2021
Updated: Nov 3, 2021
This dataset provides information about a room's environmental factors, such as temperature, humidity, light, CO2 and humidity ratio, together with its occupancy. We can use it to predict occupancy in an office room. Three datasets are available: one for training and two for testing the models, covering periods with the office door opened and closed during occupancy. The target variable, occupancy, takes the values 0 and 1.
Algorithms to be used: Random Forest, SVMs, GaussianNB classifier, Decision Tree classifier, Logistic Regression, etc.
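As a rough illustration of how the algorithms listed above could be compared, here is a minimal sketch using scikit-learn. It is not the original post's implementation. The helper names `make_synthetic_occupancy` and `compare_classifiers` are hypothetical, the data is randomly generated stand-in data that only mimics the column names, and the hyperparameters are arbitrary defaults. With the real files, the training and test DataFrames would be passed in instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def make_synthetic_occupancy(n_rows=400, seed=0):
    """Build a small synthetic stand-in with the same column names (NOT real data)."""
    rng = np.random.default_rng(seed)
    occupied = rng.integers(0, 2, size=n_rows)
    return pd.DataFrame({
        "Temperature": 20 + 2 * occupied + rng.normal(0, 0.5, n_rows),
        "Humidity": 25 + rng.normal(0, 3, n_rows),
        "Light": 400 * occupied + rng.normal(50, 30, n_rows),
        "CO2": 450 + 300 * occupied + rng.normal(0, 60, n_rows),
        "HumidityRatio": 0.004 + rng.normal(0, 0.0003, n_rows),
        "Occupancy": occupied,
    })


def compare_classifiers(train_df, test_df):
    """Fit each model on train_df and return its accuracy on test_df."""
    models = {
        "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        # SVMs and logistic regression are scale-sensitive, so standardise first
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "GaussianNB": GaussianNB(),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "Logistic regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(train_df[FEATURES], train_df["Occupancy"])
        scores[name] = model.score(test_df[FEATURES], test_df["Occupancy"])
    return scores


train = make_synthetic_occupancy(seed=0)
test = make_synthetic_occupancy(seed=1)
for name, accuracy in compare_classifiers(train, test).items():
    print(f"{name}: {accuracy:.3f}")
```

The accuracies printed here only reflect the synthetic data and say nothing about performance on the real occupancy dataset.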
Dataset for Predicting room occupancy using environmental factors.
Dataset links:
UCI MLR : https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
Kaggle : https://www.kaggle.com/robmarkcole/occupancy-detection-data-set-uci
Detailed overview of dataset
Records in the dataset = 8143 rows
Columns in the dataset = 7 columns
Data is provided with date-time information and six environmental measures taken each minute over multiple days, specifically:
date : Timestamp of each reading (year-month-day hour:minute:second)
Temperature : Room temperature, in Celsius
Humidity : Relative humidity, in percent
Light : Light level, in lux
CO2 : Carbon dioxide, in parts per million (ppm)
HumidityRatio : Humidity ratio, a quantity derived from temperature and relative humidity, in kg of water vapour per kg of air
Target variable :
Occupancy, 0 or 1, 0 for not occupied, 1 for occupied status
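Since HumidityRatio is described above as derived from temperature and relative humidity, the following sketch shows one common way such a value can be computed. It is an assumption, not necessarily the formula used by the dataset's authors: it uses the Magnus approximation for saturation vapour pressure and assumes standard atmospheric pressure (101325 Pa). The function names are hypothetical.

```python
import math


def saturation_vapour_pressure_pa(temp_c):
    """Magnus approximation of saturation vapour pressure over water, in Pa."""
    return 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def humidity_ratio(temp_c, rel_humidity_pct, pressure_pa=101325.0):
    """Approximate kg of water vapour per kg of dry air.

    pressure_pa is an assumed ambient pressure (standard atmosphere by default).
    """
    # Partial pressure of water vapour from relative humidity
    p_w = rel_humidity_pct / 100.0 * saturation_vapour_pressure_pa(temp_c)
    # 0.622 is the approximate ratio of the molar masses of water and dry air
    return 0.622 * p_w / (pressure_pa - p_w)


# Illustrative inputs only, not taken from the dataset
print(humidity_ratio(23.0, 27.0))
```

For typical office conditions this gives values of a few grams of water per kilogram of air, which is the same order of magnitude as the HumidityRatio column.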
Dataset
import pandas as pd
#Load data
file_loc = "data\\datatraining.txt"
occupancy_data = pd.read_csv(file_loc)
occupancy_data.head()
Total number of rows and columns in the dataset
r_c = occupancy_data.shape
print("Total number of records in the dataset : ", r_c[0])
print("Total number of columns in the dataset : ", r_c[1])
Check Details
# Data information
occupancy_data.info()
Check the number of missing values in the dataset
# check missing values in each column
occupancy_data.isna().sum()
Statistical information
# statistical information
occupancy_data.describe()
Data Visualization
Correlation
import seaborn as sns
import matplotlib.pyplot as plt
# correlation
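# keep only numeric columns; the text 'date' column would make corr() fail in recent pandas versions
corr = occupancy_data.select_dtypes(include='number').corr()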
corr.style.background_gradient(cmap='coolwarm')
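To read the correlation matrix with the target in mind, it can help to pull out just the Occupancy column and sort it by strength. The sketch below is one possible way to do that. The helper `correlation_with_target` is hypothetical and the small table is made-up toy data, used only so the example runs on its own.

```python
import pandas as pd


def correlation_with_target(df, target="Occupancy"):
    """Return each numeric feature's correlation with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Order by absolute value so strong negative correlations are not buried
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy values for illustration only (not from the dataset)
toy = pd.DataFrame({
    "date": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "Light": [0, 10, 5, 420, 450, 430],
    "CO2": [440, 450, 445, 600, 700, 650],
    "Humidity": [27.0, 26.5, 27.2, 26.8, 27.1, 26.9],
    "Occupancy": [0, 0, 0, 1, 1, 1],
})
print(correlation_with_target(toy))
```

On the real data, the same call would be `correlation_with_target(occupancy_data)`.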
Box plot
# select the numeric columns and draw one box plot per column
num_cols = occupancy_data.select_dtypes(include='number').columns
for col in num_cols:
    plt.boxplot(occupancy_data[col])
    plt.xlabel(col)
    plt.show()
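The points drawn beyond the whiskers of a box plot can also be counted directly. The sketch below applies the 1.5 × IQR rule, which matches matplotlib's default whisker setting. The function name `count_iqr_outliers` is hypothetical and the sample series is made-up toy data.

```python
import pandas as pd


def count_iqr_outliers(series, whis=1.5):
    """Count values outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return int(((series < lower) | (series > upper)).sum())


# Toy light readings for illustration only (not from the dataset)
toy_light = pd.Series([0, 0, 0, 5, 10, 12, 15, 20, 1500])
print(count_iqr_outliers(toy_light))  # prints 1 (only the 1500 reading is flagged)
```

On the real data this could be applied per column, for example `count_iqr_outliers(occupancy_data["Light"])`.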
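# use the timestamp as the index so each measure is plotted against time
occupancy_data['date'] = pd.to_datetime(occupancy_data['date'])
occupancy_data.set_index('date', inplace=True)
for i in range(len(num_cols)):
    occupancy_data.iloc[:, [i]].plot(figsize=(15, 5))
    plt.xticks(rotation=45, size=10)
    plt.yticks(size=10)
    plt.show()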
Time series plots (figures): Temperature, Humidity, Light, CO2, Humidity Ratio
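Because the readings are taken every minute, the time series plots can look noisy. One option is to average the readings per hour before plotting. The sketch below is an optional extra under stated assumptions, not part of the original analysis. The helper `hourly_mean` is hypothetical and the toy frame uses made-up values.

```python
import pandas as pd


def hourly_mean(df, date_col="date"):
    """Parse the timestamp column and average every numeric column per hour."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    return out.set_index(date_col).resample("60min").mean()


# Two hours of made-up per-minute CO2 readings (illustration only)
toy = pd.DataFrame({
    "date": pd.date_range("2015-02-04 17:00:00", periods=120, freq="min").astype(str),
    "CO2": [500.0] * 60 + [700.0] * 60,
})
print(hourly_mean(toy))
```

The result has one row per hour and can be plotted with `.plot()` in the same way as the per-minute data.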
Student performance dataset - Classification and Regression
Census income Data Set - Classification
Wholesale customer - Classification and Clustering
Online retail dataset - Classification, Clustering and Regression
Cervical Cancer Risk Factor Dataset - Classification and Clustering
Blood Transfusion service center dataset - Classification
Divorce Predictor Dataset - Classification
If you need implementation for any of the topics mentioned above or assignment help on any of its variants, feel free to contact us.