
Data Analysis and Privacy Considerations in Health Survey


Introduction

Welcome to this new blog post. Here we discuss a new project requirement, "Data Analysis and Privacy Considerations in Health Survey". The project analyzes health survey data to answer research questions while accounting for data privacy. It draws on modern database models and on techniques for manipulating data, scaling up computation, and mitigating privacy risks.


We'll walk you through the project requirements, highlighting the tasks at hand. Then, in the solution approach section, we'll delve into what we've accomplished, discussing the techniques applied and the steps taken. Finally, in the output section, we'll showcase key screenshots of the results obtained from the project.


Let's get started!


Project Requirement 

1. Module Learning Outcomes

  1. Create a data set using modern database models and technology.

  2. Manipulate a data set to extract statistics and features.

  3. Critically evaluate and apply data mining techniques/tools to build a classifier or regression model, and predict values for new examples.

  4. Analyse and communicate issues with scaling up to large data sets, and use appropriate techniques to scale up the computation.

  5. Critically discuss the need for privacy, identify privacy risks in releasing information, and design techniques to mediate these risks.

This assessment will contribute to all the learning outcomes for this module.


2. Assessment Background/Scenario

Data

You will find a dataset called brfss_for_bda_2021.csv which describes a survey around health and associated behavioural factors carried out on a population in the US. It’s in CSV format.


3. Assessment Task(s)

Your task


Use that dataset, any information you can find about it elsewhere, and the techniques taught in the module to pose and answer three research questions of your choosing. You will then need to consider how you might store the (research-question-relevant) data in a database, how you might spread a very large version of that data over multiple computers, and what the privacy concerns are and how you might address them.


Produce a structured analysis report using the given template. The structured report consists of seven sections, each containing specific questions, which you must answer.


In sections 3 and 4 of the report, which require you to use data analysis tools, you may use WEKA or Python tools. In section 3 you may also use Excel for visualisation.

NOTE: Failure to submit all the data files will result in a zero grade.


4. Deliverables

General submission criteria

  • The submission is a report and final data files. The report must be provided in a format which Canvas can display (i.e. PDF or MS-Word native format), and the data files must contain the final version of the data that you used for analysis (ARFF or CSV format).

  • You are expected to research your answers and to cite appropriate academic and/or other sources in an appropriate format (IEEE) for the type of report you have been asked to write. It is probably not sufficient to use only the module notes.

  • Each part has an indicated maximum word count for your answer. Any cover page and reference lists or bibliographies do not count towards these limits.

  • Text exceeding the word count will not be marked.

  • Your assessment submission should not include your examination number or any other personal identification information.



Solution Approach 

In this project, we applied several methods and techniques to analyze a dataset on the health and behavioral factors of a US population. The dataset, named brfss_for_bda_2021.csv, was explored using modern database models and technology. The specific methods and techniques are described below:


Data Exploration and Visualization:


  • We began by importing the dataset using Pandas, a powerful data manipulation library in Python. This allowed us to efficiently handle and analyze the data.

  • Initial exploratory data analysis (EDA) involved examining the structure of the dataset, identifying columns, and understanding data types.

  • Visualization techniques, including bar plots and count plots, were utilized to gain insights into various attributes such as physical health, mental health, education levels, income, and dietary habits.
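The first exploration steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the column names EDUCATION and GENHLTH and the helper summarize are hypothetical, and a tiny stand-in frame replaces the real file so the snippet runs on its own.

```python
import pandas as pd


def summarize(df: pd.DataFrame, column: str) -> pd.Series:
    """Print the frame's shape and dtypes, then return category counts for one column."""
    print(df.shape)
    print(df.dtypes)
    # dropna=False keeps missing answers visible as their own category
    return df[column].value_counts(dropna=False).sort_index()


# Stand-in data with hypothetical column names.
# With the real file this would be: df = pd.read_csv("brfss_for_bda_2021.csv")
df = pd.DataFrame({
    "EDUCATION": ["College", "High school", "College",
                  "Some college", "High school", "College"],
    "GENHLTH": ["Good", "Fair", "Excellent", "Good", "Good", "Fair"],
})

education_counts = summarize(df, "EDUCATION")
print(education_counts)
```

The returned counts are the same numbers a count plot displays; with seaborn installed, `seaborn.countplot(data=df, x="EDUCATION")` would draw them directly.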


Research Question Formulation:

  • Three research questions were formulated to guide our analysis and investigation into the dataset:

  • Question 1: Are educated individuals more susceptible to mental health issues, and does regular exercise mitigate this risk?

  • Question 2: Does higher education correlate with higher income? How does education distribution vary between genders, and does it influence marital status and family size?

  • Question 3: What is the impact of dietary habits on physical health?

Data Analysis and Interpretation:


  • We applied statistical analysis and data mining techniques to answer the research questions effectively.

  • For instance, in addressing Question 1, we examined the relationship between education levels, physical activity, and mental health using visualization tools such as seaborn and matplotlib.

  • Similarly, for Question 2, we analyzed how income is distributed across education levels, how education differs between genders, and how education relates to marital status and number of children.

  • In Question 3, we investigated the association between food consumption patterns (e.g., fruits, vegetables) and physical health outcomes.
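For Question 1, one way to compare groups is to compute the share of respondents reporting poor mental health for each combination of education level and exercise habit. The sketch below is an assumption about how such a comparison could be set up, not the analysis actually run: the column names (EDUCATION, EXERCISE, MENTHLTH_DAYS), the helper name, the 14-day threshold, and the data are all illustrative.

```python
import pandas as pd


def poor_mental_health_rate(df: pd.DataFrame, threshold: int = 14) -> pd.DataFrame:
    """Share of respondents at or above `threshold` poor mental health days,
    broken down by education (rows) and exercise (columns)."""
    flagged = df.assign(POOR_MH=df["MENTHLTH_DAYS"] >= threshold)
    return (
        flagged.groupby(["EDUCATION", "EXERCISE"])["POOR_MH"]
        .mean()                 # mean of a boolean column is a proportion
        .unstack("EXERCISE")
    )


# Made-up rows with hypothetical column names, for illustration only.
survey = pd.DataFrame({
    "EDUCATION": ["College"] * 4 + ["High school"] * 4,
    "EXERCISE": ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"],
    "MENTHLTH_DAYS": [0, 20, 15, 30, 2, 3, 14, 5],
})

rates = poor_mental_health_rate(survey)
print(rates)
```

Reading across a row shows whether the rate is lower among people who exercise at a given education level; reading down a column compares education levels for the same exercise habit. The same table can be passed to a bar plot in seaborn or matplotlib.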


Output 







Through data exploration and visualization, we examined how socio-economic factors such as education, income, and gender relate to health outcomes. The three research questions guided this analysis toward correlations and trends in mental health, physical health, exercise, and diet.


Privacy considerations were part of the project throughout. Health survey data is sensitive, so safeguarding respondents' privacy while extracting meaningful insights is a responsibility we take seriously and build into our project methodology.
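This post does not name a specific privacy technique, so the following is only an illustration of one common check before releasing survey rows: k-anonymity, where every combination of quasi-identifier values must be shared by at least k records. The field names (AGE, SEX, STATE), the helper functions, and the records are hypothetical.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for all quasi-identifiers."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


def generalize_age(record, width=10):
    """Replace an exact age with a band such as '30-39' to make groups larger."""
    low = (record["AGE"] // width) * width
    return {**record, "AGE": f"{low}-{low + width - 1}"}


# Made-up records with hypothetical field names.
records = [
    {"AGE": 34, "SEX": "F", "STATE": "TX"},
    {"AGE": 37, "SEX": "F", "STATE": "TX"},
    {"AGE": 52, "SEX": "M", "STATE": "TX"},
    {"AGE": 58, "SEX": "M", "STATE": "TX"},
]
quasi_identifiers = ["AGE", "SEX", "STATE"]

k_before = smallest_group(records, quasi_identifiers)
k_after = smallest_group([generalize_age(r) for r in records], quasi_identifiers)
print(k_before, k_after)
```

With exact ages every record is unique, so a respondent could be singled out; after banding ages into decades, each record shares its quasi-identifiers with at least one other. Generalization trades detail for privacy, so the band width is a choice to weigh against the research questions.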


If you require any assistance with the project discussed in this blog, or if you find yourself in need of similar support for other projects, please don't hesitate to reach out to us. Our team can be contacted at any time via email at contact@codersarts.com.
