  1. Overview

In this project, you are required to develop a complex, scalable cloud computing solution that is informed by best practice in the domain and documented in the form of a conference-style report. You will also be required to provide a complete archive of the code you developed and to prepare a video presentation demonstrating your working solution.

  1. The data set

BGL is an open data set of logs collected from a BlueGene/L supercomputer at Lawrence Livermore National Labs. The system is equipped with 131,072 processors and 32,768 GB of memory.

The log file can be downloaded from Zenodo1. A sample line from the log file is shown below.

- 1121707460 2005.07.18 R23-M1-N0-C:J05-U01 2005-07-18- R23-M1-N0-C:J05-U01 RAS KERNEL INFO generating core.7663

This can be parsed as shown in Table 1 below.

Note that the first column may contain values other than the alert message flag.
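
As a rough illustration of the parsing step, the sketch below splits one log line into ten whitespace-separated fields, keeping the free-text message intact as the last field. The field names (`label`, `timestamp`, `date`, `node`, `time`, `node_repeat`, `msg_type`, `component`, `level`, `content`) and the helper `parse_bgl_line` are assumptions made for this example and should be aligned with the column names in Table 1. The sample string is the line shown above, joined onto one line.

```python
from typing import NamedTuple, Optional


class BglRecord(NamedTuple):
    """One parsed log line. Field names are illustrative assumptions."""
    label: str        # "-" for non-alert lines, otherwise another flag value
    timestamp: int    # seconds since the Unix epoch
    date: str         # e.g. "2005.07.18"
    node: str
    time: str
    node_repeat: str
    msg_type: str
    component: str
    level: str
    content: str      # free-text message, may contain spaces


def parse_bgl_line(line: str) -> Optional[BglRecord]:
    """Return a BglRecord, or None if the line does not have ten fields."""
    # Split on whitespace at most 9 times so the message stays in one piece.
    parts = line.strip().split(None, 9)
    if len(parts) < 10:
        return None
    try:
        timestamp = int(parts[1])
    except ValueError:
        return None
    return BglRecord(parts[0], timestamp, *parts[2:])


sample = (
    "- 1121707460 2005.07.18 R23-M1-N0-C:J05-U01 2005-07-18- "
    "R23-M1-N0-C:J05-U01 RAS KERNEL INFO generating core.7663"
)
record = parse_bgl_line(sample)
print(record.label, record.node, record.level, record.content)
```

Because the function is plain Python and returns `None` for malformed lines, it could plausibly be applied to each line of a Spark RDD and followed by a filter that drops the `None` values. Whether to parse this way or with Spark's built-in string functions is a design choice left to you.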

  1. Tasks

For this project you are required to programmatically acquire, store, pre-process, and perform data computation tasks on the BGL data set using Spark frameworks and appropriate design patterns. The data computation tasks should provide answers to the questions listed below.


  1. How many fatal log entries in the month of September resulted from a "major internal error"?

  2. For each month, what is the average number of seconds during which EDRAM errors were detected and corrected?

  3. What are the top 5 most frequently occurring dates in the log?

  4. What are the top 5 most frequently occurring days of the week in the log?

  5. What are the top 5 most frequently occurring nodes in the log?

  6. What are the top 5 most frequently occurring hours in the log?

  7. Which node generated the largest number of APPSEV events?

  8. Which node generated the smallest number of KERNRTSP events?

  9. Which node generated the largest number of APPBUSY events?

  10. Which node generated the smallest number of APPUNAV events?

  11. On which date was the latest fatal kernel error resulting in an rts panic?
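
Several of the questions above reduce to the same pattern: derive a key from each record, count occurrences per key, then take the top 5, the maximum, or the minimum. The sketch below shows that pattern in plain Python on a handful of invented rows; it is not real BGL data. It assumes that each row has already been reduced to a `(label, date, node)` tuple and that event names such as APPSEV appear in the first column, which you should verify against the data set. The names `rows`, `top_n`, and `day_of_week` are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Invented rows for illustration only: (label, date, node).
rows = [
    ("-", "2005.07.18", "node-a"),
    ("APPSEV", "2005.07.18", "node-a"),
    ("APPSEV", "2005.07.19", "node-b"),
    ("APPSEV", "2005.07.19", "node-a"),
    ("-", "2005.07.19", "node-c"),
    ("KERNRTSP", "2005.07.20", "node-b"),
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def top_n(values, n=5):
    """Return the n most frequent values as (value, count) pairs."""
    return Counter(values).most_common(n)


def day_of_week(date_str):
    """Map a 'YYYY.MM.DD' date string to its English day name."""
    return DAY_NAMES[datetime.strptime(date_str, "%Y.%m.%d").weekday()]


# Most frequent dates and days of the week.
top_dates = top_n(date for _, date, _ in rows)
top_days = top_n(day_of_week(date) for _, date, _ in rows)

# Event counts per node for one event type, then largest and smallest.
appsev_by_node = Counter(node for label, _, node in rows if label == "APPSEV")
most_appsev = max(appsev_by_node, key=appsev_by_node.get)
least_appsev = min(appsev_by_node, key=appsev_by_node.get)

print(top_dates)
print(top_days)
print(most_appsev, least_appsev)
```

In Spark, the equivalent would typically be a group-and-count followed by an ordering and a limit. Note that counting this way only covers nodes that produced at least one matching event, so a "smallest number" answer excludes nodes with zero events; state in your report which interpretation you chose.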

Code Artefact

All code used on the project must be submitted as a single gz archive. The submitted code must be thoroughly commented.

The root directory of the archive should contain a plain text file named readme.txt that provides clear instructions for re-running your code to verify the results obtained.

Submission of the code artefact is mandatory. Non-submission will result in you being marked as not present (NP) for the entire project.

