Nov 1, 2021
Updated: Nov 3, 2021
This dataset contains information about forest fires and is used to predict forest fires from meteorological data. In [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function, and several data mining methods were then applied. After fitting the models, the outputs were post-processed with the inverse of the ln(x+1) transform. Four different input setups were used. The experiments used 10-fold cross-validation repeated over 30 runs. Two regression metrics were measured: MAD (mean absolute deviation) and RMSE (root mean squared error). A Gaussian support vector machine (SVM) fed with only the 4 direct weather conditions (temp, RH, wind and rain) obtained the best MAD value: 12.71 ± 0.01 (mean and 95% confidence interval, using a Student's t-distribution). The best RMSE was attained by the naive mean predictor. An analysis of the regression error characteristic (REC) curve shows that the SVM model predicts more examples within a lower admitted error. In effect, the SVM model better predicts small fires, which are the majority.
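A minimal NumPy sketch of the target transform and the two error metrics described above. It assumes MAD means the mean absolute deviation between observed and predicted areas, i.e. the average of |y - y_hat|. The function names and the toy `area` values are hypothetical and are not taken from the paper or the dataset. It also assumes the naive mean predictor's mean is taken in log space and then back-transformed.

```python
import numpy as np

def to_log_area(area):
    """Forward transform applied to the target: ln(x + 1)."""
    return np.log1p(np.asarray(area, dtype=float))

def from_log_area(log_area):
    """Inverse transform applied to model outputs: exp(x) - 1."""
    return np.expm1(np.asarray(log_area, dtype=float))

def mad(y_true, y_pred):
    """Mean absolute deviation: average of |y - y_hat|."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical burned areas in ha, mostly zero like the real target
area = np.array([0.0, 0.0, 0.0, 1.5, 3.0, 50.0])

# Constant "naive mean" prediction: the mean is taken in log space and
# back-transformed (an assumption that mirrors the transform pipeline)
naive_value = from_log_area(to_log_area(area).mean())
naive_pred = np.full_like(area, naive_value)
print(f"MAD = {mad(area, naive_pred):.3f}, RMSE = {rmse(area, naive_pred):.3f}")
```

`np.log1p` and `np.expm1` are used instead of `np.log(x + 1)` and `np.exp(x) - 1` because they stay accurate for values close to zero, which is where most burned areas lie.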
Algorithms to be used: regression, random forest, support vector machines, etc.
Goal: predict the burned area of forest fires using this dataset.
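As a simple baseline for the regression route, here is a sketch of ordinary least squares on the four direct weather features (temp, RH, wind, rain). The model is fitted on ln(area + 1), and its predictions are back-transformed before they are scored. Scoring uses a repeated k-fold loop in the spirit of the 10-fold x 30 runs protocol. This is plain NumPy least squares, not the paper's SVM or a random forest. The function names, the default of 3 repeats (kept small for speed) and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ b0 + X @ b (an intercept column is added)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Linear prediction from intercept coef[0] and slopes coef[1:]."""
    return coef[0] + X @ coef[1:]

def repeated_kfold_mad(X, area, n_splits=10, n_repeats=3, seed=0):
    """Mean MAD (in ha) of OLS fitted on ln(area + 1), over repeated k-fold CV."""
    rng = np.random.default_rng(seed)
    y_log = np.log1p(area)
    scores = []
    for _ in range(n_repeats):
        # Shuffle row indices, then cut them into n_splits roughly equal folds
        folds = np.array_split(rng.permutation(len(X)), n_splits)
        for test_idx in folds:
            train_mask = np.ones(len(X), dtype=bool)
            train_mask[test_idx] = False
            coef = fit_ols(X[train_mask], y_log[train_mask])
            # Back-transform predictions to hectares before scoring
            pred = np.expm1(predict_ols(coef, X[test_idx]))
            scores.append(np.mean(np.abs(area[test_idx] - pred)))
    return float(np.mean(scores))

# Synthetic stand-in for temp, RH, wind, rain (NOT the real dataset)
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(2.2, 33.3, 200),   # temp
    rng.uniform(15.0, 100.0, 200), # RH
    rng.uniform(0.4, 9.4, 200),    # wind
    rng.uniform(0.0, 6.4, 200),    # rain
])
area = np.expm1(0.05 * X[:, 0] + np.maximum(rng.normal(0.0, 0.5, 200), 0.0))
print(repeated_kfold_mad(X, area))
```

With the real data, `X` would be `forest_fire_data[["temp", "RH", "wind", "rain"]].to_numpy()` and `area` would be `forest_fire_data["area"].to_numpy()`. Swapping in a random forest or an SVM only changes the two fit/predict calls inside the loop.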
Dataset link: UCI Machine Learning Repository - https://archive.ics.uci.edu/ml/datasets/forest+fires
Kaggle: https://www.kaggle.com/elikplim/forest-fires-data-set
Detailed overview of the dataset
Records in the dataset: 517 rows
Columns in the dataset: 13 columns
X - x-axis spatial coordinate within the Montesinho park map: 1 to 9
Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9
month - month of the year: 'jan' to 'dec'
day - day of the week: 'mon' to 'sun'
FFMC - Fine Fuel Moisture Code index from the Fire Weather Index (FWI) system: 18.7 to 96.20
DMC - Duff Moisture Code index from the FWI system: 1.1 to 291.3
DC - Drought Code index from the FWI system: 7.9 to 860.6
ISI - Initial Spread Index from the FWI system: 0.0 to 56.10
temp - temperature in degrees Celsius: 2.2 to 33.30
RH - relative humidity in %: 15.0 to 100
wind - wind speed in km/h: 0.40 to 9.40
rain - outside rain in mm/m2: 0.0 to 6.4
area - the burned area of the forest (in ha): 0.00 to 1090.84
(this output variable is very skewed towards 0.0, so it may make sense to model it with a logarithm transform).
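The documented ranges can also serve as a sanity check after loading. Below is a pandas sketch that assumes the column names listed above. The `EXPECTED_RANGES` dict and the `out_of_range` helper are hypothetical names, and the bounds are copied directly from this list.

```python
import pandas as pd

# Inclusive (min, max) bounds copied from the column list above
EXPECTED_RANGES = {
    "X": (1, 9), "Y": (2, 9),
    "FFMC": (18.7, 96.2), "DMC": (1.1, 291.3), "DC": (7.9, 860.6),
    "ISI": (0.0, 56.1), "temp": (2.2, 33.3), "RH": (15.0, 100.0),
    "wind": (0.4, 9.4), "rain": (0.0, 6.4), "area": (0.0, 1090.84),
}
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def out_of_range(df):
    """Map each offending column to its count of rows outside the documented values.

    Missing values also count as out of range, because between() and isin()
    return False for NaN.
    """
    problems = {}
    for col, (lo, hi) in EXPECTED_RANGES.items():
        n_bad = int((~df[col].between(lo, hi)).sum())
        if n_bad:
            problems[col] = n_bad
    for col, allowed in (("month", MONTHS), ("day", DAYS)):
        n_bad = int((~df[col].isin(allowed)).sum())
        if n_bad:
            problems[col] = n_bad
    return problems
```

If the downloaded file matches the description, `out_of_range(forest_fire_data)` is expected to return an empty dict.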
Dataset
import os
import pandas as pd
# Load data; os.path.join keeps the path portable across Windows and Linux/macOS
file_loc = os.path.join("data", "forestfires.csv")
forest_fire_data = pd.read_csv(file_loc)
forest_fire_data.head()
Total number of rows and columns in the dataset
# Number of Rows and columns
rows_col = forest_fire_data.shape
print("Total number of Rows in the dataset : {}".format(rows_col[0]))
print("Total number of columns in the dataset : {}".format(rows_col[1]))
Check column details (data types and non-null counts)
# Data information
forest_fire_data.info()
Check for missing values in the dataset
# Missing Values
forest_fire_data.isna().sum()
Statistical information
# Statistical information
forest_fire_data.describe()
Data Visualization
import seaborn as sns
import matplotlib.pyplot as plt
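# Correlation between numeric columns; month and day are strings, so they are excluded
# (recent pandas versions raise an error on non-numeric columns without numeric_only=True)
corr = forest_fire_data.corr(numeric_only=True)
corr.style.background_gradient(cmap='coolwarm')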
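Because `area` is so skewed, its linear correlations can be driven by a handful of large fires. The small sketch below compares each numeric feature's Pearson correlation with the raw target and with ln(area + 1). The `target_correlations` helper is a hypothetical name, not part of pandas.

```python
import numpy as np
import pandas as pd

def target_correlations(df, target="area"):
    """Pearson correlation of each numeric feature with the raw and log1p-transformed target."""
    numeric = df.select_dtypes(include="number")  # drops string columns such as month and day
    features = numeric.drop(columns=[target])
    return pd.DataFrame({
        "raw": features.corrwith(numeric[target]),
        "log1p": features.corrwith(np.log1p(numeric[target])),
    })
```

Usage: `target_correlations(forest_fire_data)` returns one row per numeric feature (X, Y, FFMC, ..., rain) and two columns for side-by-side comparison.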
Count plot of records per month
sns.set_style("whitegrid")
plt.figure(figsize=(8,5))
sns.countplot(x= "month",data=forest_fire_data)
Count plot of records per day of the week
sns.set_style("whitegrid")
plt.figure(figsize=(8,5))
sns.countplot(x= "day",data=forest_fire_data)
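For string columns, the count plots above typically list months and days in order of first appearance in the file rather than in calendar order. Below is a pandas sketch that produces the same counts as tables in calendar order. The `MONTH_ORDER`/`DAY_ORDER` lists and the `calendar_counts` helper are hypothetical names, and the abbreviations follow the column description.

```python
import pandas as pd

# Calendar order for the abbreviations used in the CSV
MONTH_ORDER = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
DAY_ORDER = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def calendar_counts(df):
    """Records per month and per weekday in calendar order; absent levels count as 0."""
    months = df["month"].value_counts().reindex(MONTH_ORDER, fill_value=0)
    days = df["day"].value_counts().reindex(DAY_ORDER, fill_value=0)
    return months, days
```

The same lists can be passed to seaborn's `order` parameter to sort the plots, e.g. `sns.countplot(x="month", data=forest_fire_data, order=MONTH_ORDER)`.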
Histogram plot of rain
# Histogram
plt.figure(figsize=(8,5))
sns.histplot(x="rain",data=forest_fire_data)
Histogram plot of FFMC
# Histogram
plt.figure(figsize=(8,5))
sns.histplot(x="FFMC",data=forest_fire_data)
Histogram plot of DMC
# Histogram
plt.figure(figsize=(8,5))
sns.histplot(x="DMC",data=forest_fire_data)
Histogram plot of temperature
# Histogram
plt.figure(figsize=(8,5))
sns.histplot(x="temp",data=forest_fire_data)
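Histograms show skew only qualitatively. The pandas sketch below puts a number on it by comparing sample skewness before and after ln(x + 1), the transform suggested for `area`. The helper name is hypothetical, and the function assumes non-negative columns. Going by the documented ranges, that assumption holds for rain, FFMC, DMC, temp and area.

```python
import numpy as np
import pandas as pd

def skew_before_after_log(df, columns):
    """Sample skewness of each column before and after ln(x + 1); assumes values >= 0."""
    data = df[columns]
    return pd.DataFrame({"raw": data.skew(), "log1p": np.log1p(data).skew()})
```

Usage: `skew_before_after_log(forest_fire_data, ["rain", "FFMC", "DMC", "temp", "area"])`. A value near 0 means roughly symmetric, and large positive values indicate a long right tail.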
Other related data
Occupancy Detection Data Set - Classification
Census income Data Set - Classification
Wholesale customer - Classification and Clustering
Online Retail Dataset - Classification, Clustering and Regression
Cervical Cancer Risk Factor Dataset - Classification and Clustering
Blood Transfusion Service Center Dataset - Classification
Divorce Predictor Dataset - Classification