
Data Warehousing and Big Data Assignment 03: Big Data Analytics

BUS5WB - Data Warehousing and Big Data

Assignment 03: Big Data Analytics

Marks: 30%

Assignment Type: Individual

Release Date: Thursday 3rd May 2018

Due Date: Sunday 3rd June 2018


The third assignment focuses on Big Data analytics on unstructured text data using Microsoft Azure. You are required to derive insights by applying big data distributed processing and machine learning techniques.



What you are required to do


1. HDInsight to aggregate reviews

Develop an aggregate of these reviews using your knowledge of Hadoop and MapReduce in Microsoft HDInsight.

  • a) Follow the same approach as the Big Data analytics workshop (using the wordcount method in HDInsight) to determine the contributory words for each level of rating and each sentiment category.

  • b) Present the workflow of using HDInsight (you may use screen captures) along with a summary of findings for each level of rating and each sentiment category. MapReduce documentation for HDInsight is available here.
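The wordcount idea in part a) can be sketched locally before running anything on a cluster. The block below is a minimal illustration in plain Python, not the workshop's HDInsight job: the sample rows, the tuple layout (rating, sentiment, review) and the stopword list are assumptions made for the example, since the brief does not describe the dataset's schema.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample rows: (rating, sentiment, review).
# The real dataset's columns and values are not given in the brief.
SAMPLE_REVIEWS = [
    (5, "positive", "Great location and friendly staff"),
    (5, "positive", "Friendly staff, great breakfast"),
    (1, "negative", "Dirty room and rude staff"),
]

# Assumed stopword list, only for illustration.
STOPWORDS = {"and", "the", "a", "of", "to", "was", "is", "in"}


def map_phase(rows):
    """Emit ((group, word), 1) pairs, in the style of a wordcount mapper."""
    for rating, sentiment, review in rows:
        for word in re.findall(r"[a-z']+", review.lower()):
            if word not in STOPWORDS:
                yield (("rating", rating), word), 1
                yield (("sentiment", sentiment), word), 1


def reduce_phase(pairs):
    """Sum the counts per (group, word) key, in the style of a reducer."""
    totals = defaultdict(Counter)
    for (group, word), count in pairs:
        totals[group][word] += count
    return totals


def top_words(rows, n=3):
    """Return the n most frequent words for each rating and sentiment group."""
    totals = reduce_phase(map_phase(rows))
    return {group: counter.most_common(n) for group, counter in totals.items()}


if __name__ == "__main__":
    for group, words in top_words(SAMPLE_REVIEWS).items():
        print(group, words)
```

Keying each emitted pair by both the group and the word is what lets a single pass produce separate word frequencies for every rating level and sentiment category.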


IMPORTANT: STUDENT ACCOUNTS HAVE LIMITED AZURE CREDITS. YOU MUST CREATE AND DECOMMISSION (DELETE) THE HDINSIGHT CLUSTERS EACH TIME YOU ATTEMPT THE ASSIGNMENT. IF YOU ARE PLANNING TO WORK ON THE ASSIGNMENT ACROSS MULTIPLE DAYS, REMEMBER TO DELETE AND RECREATE EACH TIME.



2. Azure Machine Learning for sentiment analysis

Use Azure ML Studio to cluster user reviews based on sentiment score. For text clustering, use the ‘review’ field. In the Filter Based Feature Selection module, use the ‘sentiment’ field in order to cluster reviews based on sentiment score. Download the cluster outputs as a CSV file to interpret the results and derive insights. You will need to calibrate the algorithm parameters by trying different Number of Centroids and Distance Metric settings to derive meaningful clusters. Exclude sentiment, rating and postid from the columns selected to train the clustering model. Use only the preprocessed hashing features.
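To make the hashing and clustering steps concrete, here is a rough standard-library sketch of the general technique. It is not how the Azure ML Studio modules are implemented. The assumptions are: `zlib.crc32` stands in for the module's hash function, only a Euclidean distance is shown, the feature selection step is left out, and the function names and the `n_bits`, `k`, `iterations` and `seed` parameters are hypothetical.

```python
import random
import re
import zlib


def hash_features(text, n_bits=4):
    """Map a review to a fixed-length count vector via the hashing trick.

    crc32 is an assumed stand-in for the hash used by Azure ML Studio.
    """
    size = 2 ** n_bits
    vec = [0.0] * size
    for token in re.findall(r"[a-z']+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % size] += 1.0
    return vec


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_euclidean(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    reviews = ["great staff", "great staff", "dirty room", "dirty room"]
    vectors = [hash_features(r) for r in reviews]
    print(kmeans(vectors, k=2))
```

Changing `k` here plays the same role as the Number of Centroids setting, and swapping `sq_euclidean` for another distance function would correspond to changing the Distance Metric.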



Provide the following:

a) A screen capture of the completed model diagram.

b) Details of the parameters used for 1) the Feature Hashing module, 2) the Filter Based Feature Selection module and 3) the K-Means Clustering module.

c) Details of the approach you chose for clustering and interpretation of clusters.


3. Findings

Summarise your findings from 1) and 2), on user rating, hotel rating and sentiment towards accommodation options in Vietnam. Consider the challenges you faced in conducting Big Data analytics on a real-life text dataset.







 

🚀 Codersarts Support for Your Assignments!

Embarking on assignments involving HDInsight and Azure Machine Learning can be a thrilling yet challenging journey. Here's how Codersarts can guide and support you every step of the way:


1. HDInsight for Aggregated Reviews:

  • 🧠 Expert Guidance: Connect with our experienced tutors for personalized guidance on HDInsight, Hadoop, and MapReduce. Get insights tailored to your specific assignment requirements.

  • 📚 Workshop Access: Explore our Big Data analytics workshop resources to reinforce your understanding of the word count method and HDInsight workflow.

  • 💻 Code Assistance: Stuck on coding challenges? Our experts can provide code reviews, troubleshooting, and optimization to ensure your HDInsight clusters perform seamlessly.

  • 🌐 Documentation Support: If you need clarification on MapReduce documentation or encounter roadblocks, our team is here to provide concise explanations and tips.


2. Azure Machine Learning for Sentiment Analysis:

  • 🤖 Azure ML Studio Walkthrough: Confused about navigating Azure ML Studio? Receive step-by-step guidance and screen capture assistance to streamline your sentiment analysis clustering.

  • 📊 Parameter Calibration: Unsure about adjusting algorithmic parameters? Our experts can provide insights into choosing the optimal Number of Centroids and Distance Metric for meaningful clusters.

  • 🚀 Model Training Assistance: Need help training your clustering model effectively? Codersarts offers support in excluding specific columns and ensuring a robust model.


General Assistance:

  • 📧 24/7 Query Resolution: Have questions at any hour? Drop us a message, and our team will provide prompt assistance, ensuring you stay on track with your assignments.

  • 🌍 Community Support: Connect with peers facing similar challenges through our community forums. Share insights, seek advice, and collaborate for a richer learning experience.


Ready to elevate your assignment experience? Codersarts is your ally, offering the expertise and support you need to conquer HDInsight, Azure Machine Learning, and beyond. Let's turn your assignments into opportunities for growth! 🌐🚀



Contact us for solutions, assistance and live project training.
