Coursework Description:
The quakes.csv data file (available in the Earthquakes folder under the Files tab on Teams) provides the raw data for globally reported “earthquake activity” from midnight on September 1st, 2023 to 9:30am on November 15th, 2023 (the study timeframe). Note: the data lists the most recent seismic event first.
The goal of this assignment is to analyse this data, identify interesting insights, highlight those insights through visualisations, and publish the results to the cloud.
Deliverables:
Read the raw data into a pandas dataframe, then perform any necessary cleanups. Once the cleanup is complete, every column of the resulting dataframe should have an appropriate data type, unwanted data should be removed, and no column should contain missing data. No rows should be removed as a result of this cleanup process, and the cleaned dataframe needs to be as small as possible. Save the cleaned data as quakes-cleaned.csv.
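As a rough illustration of the cleanup requirements above, the following sketch works on a tiny inline sample rather than the real file. The column names (time, latitude, longitude, depth, mag, magType, place, type, nst), the choice of nst as the "unwanted" column, and the median fill for missing magnitudes are all assumptions made for the example; the actual quakes.csv may need different decisions.

```python
import io

import pandas as pd

# Hypothetical sample standing in for quakes.csv (column names are assumed).
RAW = """time,latitude,longitude,depth,mag,magType,place,type,nst
2023-11-15T09:12:01.000Z,38.82,-122.80,2.1,1.2,md,"7 km NW of The Geysers, CA",earthquake,12
2023-11-15T08:55:40.000Z,-6.10,130.45,120.0,4.6,mb,"Banda Sea",earthquake,
2023-11-15T08:30:12.000Z,61.50,-149.90,35.4,,ml,"10 km S of Anchorage, Alaska",explosion,8
"""

UNWANTED = ["nst"]  # assumption: columns judged irrelevant to the analysis


def clean_quakes(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: same rows, fewer columns, no missing values."""
    df = raw.drop(columns=UNWANTED)

    # Parse ISO timestamps into timezone-aware datetimes.
    df["time"] = pd.to_datetime(df["time"], utc=True)

    # Fill gaps instead of dropping rows (median fill is one possible choice).
    df["mag"] = df["mag"].fillna(df["mag"].median())

    # Smaller numeric types reduce memory use on large frames.
    for col in ["latitude", "longitude", "depth", "mag"]:
        df[col] = df[col].astype("float32")

    # Low-cardinality text columns are stored compactly as categories.
    for col in ["magType", "type"]:
        df[col] = df[col].fillna("unknown").astype("category")

    df["place"] = df["place"].fillna("unknown")
    return df


raw = pd.read_csv(io.StringIO(RAW))
cleaned = clean_quakes(raw)

# For the real data, pass the path "quakes-cleaned.csv" to to_csv instead.
csv_text = cleaned.to_csv(index=False)
```

Note that a CSV file does not store data types, so a notebook that reads quakes-cleaned.csv back in would have to reapply them (for example through the dtype and parse_dates arguments of pd.read_csv).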
In a second notebook, process the quakes-cleaned.csv data with pandas to identify five interesting insights from this data. One of your five insights needs to identify the geographical area with the most seismic activity during the study timeframe. Additionally, include Markdown to describe what your code is doing and what your insights are highlighting. Your submitted notebook should only contain the pandas code needed to show your insights, not all of your experimentations and workings (which you are to submit in another notebook).
In a fourth notebook, produce five individual plots from the data to visualise your five insights. Use Plotly to produce your visualisations (https://plotly.com/python/), and annotate your visualisations in your notebook with Markdown as appropriate.
Install Shiny for Python (https://shiny.posit.co/py/) onto your computer, then use this technology to produce an interactive web application to showcase your plots from Part 3. (Be sure to take the time to review the material in Shiny’s getting started guide.)
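One possible way to approach the "most active geographical area" insight is sketched below. It assumes a place column whose text ends with a region name after the last comma (for example "10 km S of Anchorage, Alaska") and a mag column; both are assumptions about the data, and the sample rows are invented for illustration. Grouping by a latitude/longitude grid would be an equally valid definition of "area".

```python
import pandas as pd

# Hypothetical rows; the real data comes from quakes-cleaned.csv.
quakes = pd.DataFrame(
    {
        "place": [
            "7 km NW of The Geysers, CA",
            "10 km S of Anchorage, Alaska",
            "Banda Sea",
            "5 km E of Willow, Alaska",
            "3 km W of Cobb, CA",
            "20 km N of Nikiski, Alaska",
        ],
        "mag": [1.2, 2.4, 4.6, 1.9, 0.8, 2.1],
    }
)


def region_of(place: pd.Series) -> pd.Series:
    """Take the text after the last comma, or the whole string if none."""
    return place.str.rsplit(",", n=1).str[-1].str.strip()


# Count events and average magnitude per region, busiest region first.
activity = (
    quakes.assign(region=region_of(quakes["place"]))
    .groupby("region")
    .agg(events=("mag", "size"), mean_mag=("mag", "mean"))
    .sort_values("events", ascending=False)
)

busiest = activity.index[0]
print(activity)
print("Most active region in this sample:", busiest)
```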
Sign-up for a free account on Shiny’s cloud platform, then deploy your completed Shiny webapp to the cloud (https://shiny.posit.co/py/docs/deploy-cloud.html).
Marks Allocation:
The marks for this coursework are allocated as follows:
Up to 5 marks for Part 1: dataframe cleanup.
Up to 15 marks for Part 2: pandas analysis/insights.
Up to 15 marks for Part 3: plots/visualisations.
Up to 10 marks for Part 4: Shiny webapp.
Up to 5 marks for Part 5: publishing/deployment.
Students in M.Sc. Data Science programs typically seek several types of assistance. Here's a breakdown of these categories and how universities can address them:
1. Project Support:
Hands-on Learning: Data Science thrives on practical application. M.Sc. programs can offer:
Capstone Projects: Culminating projects where students apply their learned skills to solve a complex data science problem from scratch, often under faculty guidance.
Industry Partnerships: Collaborating with companies on real-world data science projects provides students with valuable experience working with authentic datasets and industry-relevant problems.
Data Science Competitions: Encouraging participation in data science hackathons or competitions allows students to test their skills, learn from others, and potentially gain industry recognition.
Technical Support: Universities can provide dedicated resources for students encountering challenges during projects:
Office Hours: Faculty members holding designated times for students to consult on project ideas, methodology, or technical hurdles.
Teaching Assistants (TAs): Graduate students or experienced undergraduates familiar with the curriculum can offer project-specific guidance and troubleshooting.
Data Science Labs: Providing access to specialized software, computing resources, and datasets can empower students to work on complex data science projects.
2. Coursework Help:
Understanding Foundational Concepts: Data Science integrates various disciplines. Universities can offer:
Supplemental Instruction (SI): Peer-led study sessions facilitated by students who have excelled in the course material.
Review Sessions: Dedicated sessions led by professors or TAs to revisit challenging concepts and answer student questions.
Online Resources: Providing access to online tutorials, video lectures, and practice problems can help students solidify their understanding outside of class time.
Developing Programming Skills: Programming is a cornerstone of Data Science. M.Sc. programs can provide support, such as:
Bootcamps or Workshops: Introductory or advanced programming bootcamps or workshops tailored to data science applications can equip students with essential coding skills.
Coding Help Desks: Dedicated staff or TAs can offer support with specific programming challenges students encounter in coursework.
3. Live Training:
Interactive Learning: Live training offers a dynamic learning experience. Universities can provide:
Guest Lectures: Inviting industry professionals to share real-world data science applications and best practices.
Workshops & Seminars: Live workshops or seminars focusing on specific data science tools, techniques, or emerging trends in the field.
Data Science Clubs or Events: Student-run clubs or university-organized events can host live training sessions or invite data science practitioners for interactive discussions.
Flexibility: Balancing coursework with busy schedules is essential. Universities can offer:
Hybrid or Online Courses: Providing a mix of in-person and online lectures allows students to learn at their own pace and revisit recorded lectures for clarification.
Office Hours with Flexible Scheduling: Faculty or TA office hours can be offered in the evenings or online to cater to diverse student schedules.
By offering a comprehensive support system encompassing project guidance, coursework help, and engaging live training, M.Sc. in Data Science programs can empower students to navigate the challenges of the program and emerge as well-equipped data science professionals.