
Big Data Analytics Project Help

Updated: Feb 4, 2022

Module Learning Outcomes

  1. Create a data set using modern database models and technology

  2. Manipulate a data set to extract statistics and features

  3. Critically evaluate and apply data mining techniques/tools to build a classifier or regression model, and predict values for new examples

  4. Analyse and communicate issues with scaling up to large data sets, and use appropriate techniques to scale up the computation

  5. Critically discuss the need for privacy, identify privacy risks in releasing information, and design techniques to mitigate these risks

This assessment will contribute to all the learning outcomes for this module.


Assessment Background/Scenario

Data

You will find a dataset called brfss_for_bda_2021.csv, which contains the results of a survey of health and associated behavioural factors carried out on a population in the US. The file is in CSV format.


Assessment Task(s)

Your task

Use the dataset, any information about it that you can find elsewhere, and the techniques taught in the module to pose and answer three research questions of your choosing. You will then need to consider how you might store the data relevant to your research questions in a database, how you might distribute a very large version of that data across multiple computers, and what privacy concerns arise and how you might address them.

Produce a structured analysis report using the template provided. The report consists of seven sections, each containing specific questions that you must answer.

In sections 3 and 4 of the report, which require you to use data analysis tools, you may use WEKA or Python tools. In section 3 you may also use Excel for visualisation.


NOTE: Failure to submit all the data files will result in a zero grade.


Deliverables

General submission criteria

  • The submission consists of a report and the final data files. The report must be in a format that Canvas can display (i.e. PDF or native MS Word format). The data files must contain the final version of the data you used for analysis, in ARFF or CSV format.

  • You are expected to research your answers and to cite appropriate academic and/or other sources using the IEEE citation style. It is probably not sufficient to rely only on the module notes.

  • Each part has a stated maximum word count for your answer. Any cover page, reference list or bibliography does not count towards these limits.

  • Answers that exceed the word count will not be marked.

  • Your assessment submission should not include your examination number or any other personal identification information.
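If you prepare your final data in Python but also want an ARFF copy for WEKA, the conversion can be sketched as follows. WEKA can open CSV files directly and save them as ARFF, so this is only one option. The helper `csv_to_arff` is hypothetical (it is not part of any library) and assumes that column names and values contain no spaces, commas or quotes. It declares a column NUMERIC when all of its non-empty values parse as numbers, declares every other column as nominal, and writes empty cells as `?`, ARFF's missing-value marker.

```python
import csv
import io


def csv_to_arff(csv_text: str, relation: str) -> str:
    """Convert CSV text (header row first) into a minimal ARFF string.

    Hypothetical helper. Assumes column names and values contain no
    spaces, commas or quotes, and that the CSV has no blank lines.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value: str) -> bool:
        # True if the cell parses as a float (e.g. "34" or "2.5").
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@RELATION {relation}", ""]
    for i, name in enumerate(header):
        # Ignore empty (missing) cells when deciding the attribute type.
        values = [row[i] for row in data if row[i] != ""]
        if all(is_number(v) for v in values):
            lines.append(f"@ATTRIBUTE {name} NUMERIC")
        else:
            # Nominal attribute: list its distinct values in braces.
            distinct = sorted(set(values))
            lines.append(f"@ATTRIBUTE {name} {{{','.join(distinct)}}}")

    lines += ["", "@DATA"]
    for row in data:
        # ARFF uses '?' for missing values.
        lines.append(",".join(v if v != "" else "?" for v in row))
    return "\n".join(lines) + "\n"
```

For example, with two made-up columns, `csv_to_arff("age,smoker\n34,yes\n,no\n51,yes\n", "brfss_subset")` declares `age` as NUMERIC and `smoker` as `{no,yes}`, and writes the second data row as `?,no`. Whichever tool you use, check the resulting file loads in WEKA before submitting it.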

