The data was extracted by Barry Becker from the 1994 Census database. The dataset contains 14 attributes, 8 categorical and 6 continuous, covering age, education, native country, marital status, relationship status, occupation, work class, gender, race, working hours per week, capital loss and capital gain. The target variable is income level, which indicates whether a person earns more than 50,000 dollars per year; the task is to predict it from the other attributes.
Recommended Models:
Algorithms to use: decision tree classifier, random forest, support vector machines (SVMs), logistic regression, etc.
Recommended Projects:
Determine whether a person makes over $50,000 a year (income prediction).
Dataset link: https://archive.ics.uci.edu/ml/datasets/census+income
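A minimal sketch of fitting the four recommended algorithms with scikit-learn, assuming scikit-learn is installed. The ten-row DataFrame is made-up toy data, not census records; the column subset, the helper name `make_models` and the hyperparameters are illustrative assumptions. Categorical columns are one-hot encoded and continuous columns are standardised before each classifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (NOT from the census file), using a few real column names.
X = pd.DataFrame({
    "age": [25, 38, 28, 44, 18, 34, 29, 63, 24, 55],
    "hours-per-week": [40, 50, 40, 40, 30, 30, 40, 32, 40, 40],
    "sex": ["Male", "Male", "Female", "Male", "Female",
            "Female", "Male", "Male", "Female", "Male"],
    "workclass": ["Private", "Private", "Local-gov", "Private", "Private",
                  "Private", "Private", "Self-emp-not-inc", "Private", "Private"],
})
y = [0, 0, 1, 1, 0, 0, 0, 1, 0, 1]  # 1 = income >50K, 0 = income <=50K

categorical = ["sex", "workclass"]
continuous = ["age", "hours-per-week"]


def make_models():
    """Hypothetical helper: one preprocessing + classifier pipeline per algorithm."""
    classifiers = {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "svm": SVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    models = {}
    for name, clf in classifiers.items():
        preprocess = ColumnTransformer([
            # One-hot encode text columns; ignore categories unseen during fit
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
            # Standardise numeric columns (matters for SVM and logistic regression)
            ("num", StandardScaler(), continuous),
        ])
        models[name] = Pipeline([("preprocess", preprocess), ("model", clf)])
    return models


# Pipeline.fit returns the fitted pipeline itself
fitted = {name: model.fit(X, y) for name, model in make_models().items()}
for name, model in fitted.items():
    print(name, model.predict(X.head(3)))
```

On the real data, replace the toy frame with the census features, hold out a test set (for example with `sklearn.model_selection.train_test_split`) and compare the models on that set rather than on the training rows.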
Overview of the data
Detailed overview of the dataset
Records in the dataset: 32,561 rows
Columns in the dataset: 15 columns (14 attributes plus the income target)
age: Age of the person (continuous).
workclass: Working sector of the person (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked).
fnlwgt: Final weight (continuous). The weights on the Current Population Survey (CPS) files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for the US by the Population Division at the Census Bureau.
education: Highest education level (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool).
education-num: Education level encoded as a number (continuous).
marital-status: Marital status (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse).
occupation: Occupation of the person (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspect, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces).
relationship: Relationship role of the person (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried).
race: Race of the person (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black).
sex: Gender of the person (Female, Male).
capital-gain: Capital gain of the person per year (continuous).
capital-loss: Capital loss of the person per year (continuous).
hours-per-week: Working hours per week (continuous).
native-country: Native country of the person (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands).
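The attribute list above splits into the 8 categorical and 6 continuous attributes mentioned in the introduction. Keeping these groups as explicit lists is convenient for preprocessing, and checking them against the loaded columns catches naming differences between copies of the dataset (some CSV versions may use dots, such as `education.num`, instead of hyphens). A minimal sketch; the constant names and the helper `missing_columns` are hypothetical:

```python
# Feature groups as listed in this section (hyphenated names assumed;
# adjust if your copy of the CSV uses dots instead).
CATEGORICAL_FEATURES = [
    "workclass", "education", "marital-status", "occupation",
    "relationship", "race", "sex", "native-country",
]
CONTINUOUS_FEATURES = [
    "age", "fnlwgt", "education-num", "capital-gain",
    "capital-loss", "hours-per-week",
]
TARGET = "income"

# Sanity checks against the description: 8 categorical + 6 continuous
# attributes, 15 columns once the target is included.
assert len(CATEGORICAL_FEATURES) == 8
assert len(CONTINUOUS_FEATURES) == 6
assert len(CATEGORICAL_FEATURES + CONTINUOUS_FEATURES + [TARGET]) == 15


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`, in order."""
    expected = CATEGORICAL_FEATURES + CONTINUOUS_FEATURES + [TARGET]
    present = set(columns)
    return [c for c in expected if c not in present]
```

After loading, `missing_columns(census_data.columns)` should return an empty list; any names it returns point to a naming mismatch.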
Target variable:
income: Whether the person earns >50K or <=50K per year.
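Most classifiers expect a numeric target, so the two income labels are usually mapped to 1 (>50K) and 0 (<=50K). A minimal sketch, assuming the labels may carry stray whitespace or a trailing period (the test file `adult.test` on the UCI page writes labels such as `>50K.`); `encode_income` is a hypothetical helper name. It raises an error instead of silently producing missing values when it meets an unexpected label.

```python
import pandas as pd


def encode_income(income):
    """Map income labels to 1 (>50K) and 0 (<=50K).

    Strips surrounding whitespace and a trailing period before mapping.
    """
    cleaned = income.astype(str).str.strip().str.rstrip(".")
    mapping = {">50K": 1, "<=50K": 0}
    encoded = cleaned.map(mapping)
    if encoded.isna().any():
        unknown = sorted(cleaned[encoded.isna()].unique())
        raise ValueError(f"Unexpected income labels: {unknown}")
    return encoded.astype(int)


print(encode_income(pd.Series(["<=50K", " >50K", ">50K.", "<=50K"])).tolist())
# [0, 1, 1, 0]
```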
# Load the data
import pandas as pd

# A forward-slash path works on Windows, macOS and Linux
file_loc = "data/adult.csv"
census_data = pd.read_csv(file_loc)
census_data.head()
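The code above expects `adult.csv` to include a header row. The `adult.data` file linked from the UCI page has no header and writes a space after each comma, so it needs explicit column names. A minimal sketch, assuming the column order of the attribute list above; `read_raw_adult` is a hypothetical helper and the two sample rows are illustrative:

```python
import io

import pandas as pd

COLUMN_NAMES = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]


def read_raw_adult(path_or_buffer):
    """Read the header-less UCI adult.data file into a DataFrame."""
    return pd.read_csv(
        path_or_buffer,
        header=None,            # the raw file has no header row
        names=COLUMN_NAMES,     # column order as in the attribute list
        skipinitialspace=True,  # drop the space that follows each comma
    )


# Two illustrative rows written in the raw file's format.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)
df = read_raw_adult(sample)
print(df.shape)  # (2, 15)
```

With the real file, call `read_raw_adult("adult.data")` with the path where you saved it.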
Total number of rows and columns in the dataset
# shape is a (rows, columns) tuple, so unpack it instead of printing it twice
rows, cols = census_data.shape
print("Total records in the dataset:", rows)
print("Total columns in the dataset:", cols)
Check the details of the dataset (column types and non-null counts)
# Data information
census_data.info()
Check for missing values in the dataset.
# Check the missing values in each column
census_data.isna().sum()
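In the UCI adult data, unknown values in `workclass`, `occupation` and `native-country` are recorded as the string `?` rather than left empty, so `isna()` can report zero missing values even though some are present. A minimal sketch that converts them to real missing values, assuming `?` (possibly padded with spaces) is the only placeholder; `replace_question_marks` is a hypothetical helper name and the toy frame is made up:

```python
import pandas as pd


def replace_question_marks(df):
    """Return a copy of df where text columns are whitespace-stripped
    and '?' placeholders become NaN. Numeric columns are left untouched."""
    out = df.copy()
    for col in out.select_dtypes(exclude="number").columns:
        # Strip only actual strings; leave any other values as they are
        stripped = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        # mask() replaces values with NaN where the condition is True
        out[col] = stripped.mask(stripped == "?")
    return out


# Hypothetical toy frame mimicking the placeholder convention.
toy = pd.DataFrame({
    "age": [25, 40, 31],
    "workclass": ["Private", "?", " ?"],
    "occupation": ["Sales", "?", "Tech-support"],
})
cleaned = replace_question_marks(toy)
print(cleaned.isna().sum())
```

On the real data, `replace_question_marks(census_data).isna().sum()` gives the missing-value counts per column.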
# Statistical information about the dataset
census_data.describe()
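By default `describe()` summarises only the numeric columns, which leaves out 8 of the 14 attributes here. Excluding numeric types instead gives, for each text column, the count, the number of unique values, the most frequent value (`top`) and its frequency (`freq`). A minimal sketch on a made-up toy frame:

```python
import pandas as pd

# Hypothetical toy frame; on the real data use census_data instead.
toy = pd.DataFrame({
    "age": [25, 38, 28, 44],
    "sex": ["Male", "Male", "Female", "Male"],
    "income": ["<=50K", "<=50K", ">50K", ">50K"],
})

# Summarise the non-numeric (categorical) columns only.
summary = toy.describe(exclude="number")
print(summary)
```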
Data Visualization:
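import seaborn as sns
import matplotlib.pyplot as plt

# Correlation between the numeric columns; text columns must be skipped
# explicitly, since recent pandas versions raise an error otherwise
corr = census_data.corr(numeric_only=True)
corr.style.background_gradient(cmap='coolwarm')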
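Because `income` is stored as text, `corr()` leaves it out, so the matrix shows only how the numeric attributes relate to one another. Encoding income as 0/1 first lets you rank the numeric attributes by their (Pearson) correlation with the target. A minimal sketch; `correlation_with_income` is a hypothetical helper and the toy values are made up:

```python
import pandas as pd


def correlation_with_income(df, target="income"):
    """Pearson correlation of each numeric column with a 0/1 income label."""
    # 1 for ">50K" (ignoring stray spaces or a trailing period), else 0
    label = (df[target].astype(str).str.strip().str.rstrip(".") == ">50K").astype(int)
    numeric = df.select_dtypes(include="number")
    return numeric.corrwith(label).sort_values(ascending=False)


# Hypothetical toy data, not taken from the census file.
toy = pd.DataFrame({
    "age": [22, 35, 47, 52, 29, 61],
    "hours-per-week": [20, 40, 45, 50, 38, 40],
    "income": ["<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
})
result = correlation_with_income(toy)
print(result)
```

Correlation only captures linear association with a binary label, so treat the ranking as a first look rather than a feature-selection rule.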
Count plot of income
sns.set_style("whitegrid")
plt.figure(figsize=(8, 5))
sns.countplot(x='income', data=census_data)
plt.show()
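The income classes in this dataset are imbalanced, with many more `<=50K` records than `>50K` ones. Proportions make the imbalance explicit and also give the accuracy of a trivial model that always predicts the majority class, a useful baseline for the recommended classifiers. A minimal sketch; the helper name and the toy labels are illustrative:

```python
import pandas as pd


def class_balance(income):
    """Share of records in each income class (fractions that sum to 1)."""
    return income.value_counts(normalize=True)


# Hypothetical toy labels; on the real data use census_data["income"].
labels = pd.Series(["<=50K", "<=50K", "<=50K", ">50K"])
balance = class_balance(labels)
# Accuracy of always predicting the most common class
majority_baseline = balance.max()
print(balance)
print("Majority-class baseline accuracy:", majority_baseline)
```

If the imbalance matters for your metric, options include `stratify=y` in `train_test_split` and `class_weight="balanced"` in scikit-learn classifiers such as `LogisticRegression`.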
Count plot of gender
sns.countplot(x='sex', data=census_data)
plt.show()
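The count plot of `sex` shows how many records fall in each group but not how income differs between them. A row-normalised cross-tabulation answers that directly: each row gives the share of each income class within one category. A minimal sketch; `income_share_by` is a hypothetical helper and the toy frame is made up:

```python
import pandas as pd


def income_share_by(df, column, target="income"):
    """Within each category of `column`, the share of each income class."""
    return pd.crosstab(df[column], df[target], normalize="index")


# Hypothetical toy data, not taken from the census file.
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "income": [">50K", ">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})
shares = income_share_by(toy, "sex")
print(shares)
```

The same call works for any categorical attribute, for example `income_share_by(census_data, "workclass")`.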
Count plot of workclass
plt.figure(figsize=(18, 10))
sns.countplot(x='workclass', data=census_data)
plt.show()
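Separate count plots show each attribute on its own. Splitting the bars by income makes the relationship with the target visible, and rotating the tick labels keeps long category names readable. A minimal sketch using pandas plotting on top of matplotlib; `plot_income_by` is a hypothetical helper name and the default figure size is an arbitrary choice:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_income_by(df, column, target="income", figsize=(12, 6)):
    """Grouped bar chart of record counts per category, split by income class."""
    counts = pd.crosstab(df[column], df[target])
    # Order categories by total count so the largest groups appear first
    counts = counts.loc[counts.sum(axis=1).sort_values(ascending=False).index]
    ax = counts.plot(kind="bar", figsize=figsize)
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    # Rotate long labels such as "Self-emp-not-inc"
    ax.tick_params(axis="x", labelrotation=45)
    plt.tight_layout()
    return ax
```

For example, `plot_income_by(census_data, "workclass")` followed by `plt.show()`. With seaborn, `sns.countplot(x='workclass', hue='income', data=census_data)` produces a similar grouped chart.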