
Visualizing COVID-19 data: data analytics sample assignment

The Product Development

The data analytics department of the NHS has asked you to help them analyse COVID-19 data. More specifically, they want to analyse the geographic distribution of cases and identify clusters of high and low incidence. This information will help them distribute available resources optimally.


The product to be developed is an interactive data clustering and visualisation system. The system should convert the raw data into a report that can be read by the NHS management and easily interpreted. The report should include visualisations of the data and/or summary statistics, which achieve the following objectives:

  • The report should include some statistics on the geographical distribution of the data.

  • There should be some visualisation of the data. The visualisation should include information on which areas (clusters) have a high and which areas have a low number of cases.

  • The system should use a clustering algorithm (more information below) to partition the data into k clusters. The number of clusters k is a parameter that the user can change. The report should update interactively when the value of k is changed.
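As a starting point for the statistics objectives above, here is a minimal sketch in Python. The function names `geographic_summary` and `rank_clusters` are hypothetical, as is the choice of statistics (mean, population standard deviation and bounding box); the brief does not prescribe which statistics to report. The sketch assumes the points are (x, y) tuples and that each point already has an integer cluster label.

```python
from collections import Counter
from statistics import mean, pstdev


def geographic_summary(points):
    """Summarise the spatial spread of a list of (x, y) tuples."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "n": len(points),
        "mean": (mean(xs), mean(ys)),
        "std": (pstdev(xs), pstdev(ys)),  # population standard deviation
        "bounds": ((min(xs), min(ys)), (max(xs), max(ys))),
    }


def rank_clusters(labels):
    """Return (cluster_id, case_count) pairs, highest case count first."""
    return Counter(labels).most_common()
```

Ranking clusters by case count is one simple way to separate high-incidence from low-incidence areas; dividing each count by the area the cluster covers would give a density instead, which may be more informative when clusters differ greatly in size.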

The final reports should be presented in a format that NHS staff can read easily, and should be displayed on screen. These are the rough requirements for the system, as given by the NHS. How you design your system to achieve these goals is up to you.


The clients would welcome any useful extras that you can think of in developing the system, provided they can feasibly be implemented to a high quality within the project's timescale. They will especially appreciate any features that aid them in their central objective, i.e. making an informed analysis of the geographical distribution of COVID cases, but other features that help the system operate smoothly, including good GUI design, will be appreciated too.


Data file


The data file is included in the ZIP file. Each line is an individual observation (patient) and contains x and y coordinates on a 2-D plane. You can think of these coordinates as GPS coordinates of the homes of people who have been infected with COVID.
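A minimal sketch of reading such a file in Python follows. The brief does not state the delimiter or whether there is a header row, so this sketch assumes the two coordinates are separated by commas and/or whitespace and silently skips any line that does not start with two numbers. The function name `parse_points` is hypothetical.

```python
import re


def parse_points(lines):
    """Parse an iterable of text lines into a list of (x, y) float tuples."""
    points = []
    for line in lines:
        # Assumption: fields are separated by commas and/or whitespace.
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        try:
            points.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # skip a header row or other non-numeric line
    return points
```

Because the function accepts any iterable of lines, it can be applied directly to an open file object, for example `with open(path) as f: points = parse_points(f)`, where `path` points to the data file extracted from the ZIP.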


Clustering


Different clustering algorithms exist. The NHS suggests you use k-means clustering. Please check the university library resources or the reference below for more information on the method and its implementation:


Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, Springer. Freely available at https://web.stanford.edu/~hastie/ElemStatLearn/. Check page 460.

Note that you can use third-party toolboxes for visualisation purposes. You must implement the clustering algorithm yourself.
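To illustrate what a self-written implementation might look like, here is a minimal sketch of the standard k-means iteration (often called Lloyd's algorithm) in plain Python, with no third-party libraries. It is one possible approach, not a prescribed solution. The function name `kmeans`, the `max_iter` and `seed` parameters, and the choice to initialise centroids by sampling k data points at random are all assumptions made for this sketch.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of (x, y) tuples into k groups.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid closest to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumption: initialise centroids with k randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels
```

The result depends on the random initialisation, so running the function with several seeds and keeping the solution with the smallest total within-cluster squared distance is a common safeguard. Since k is an ordinary argument, an interactive report can simply call `kmeans` again whenever the user changes its value.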



