
Applied Machine Learning - sample assignment

Updated: Sep 27, 2021

Are you looking for a Digit Recognizer assignment solution on the MNIST dataset? We have machine learning experts who provide help and complete solutions for machine learning projects, assignments, coursework, and self-learning projects at a reasonable price. For more details, check out our machine learning assignment help page.

Digit Recognizer Assignment Requirements

  1. Join the Digit Recognizer competition on Kaggle. Download the training and test data. The competition page describes how these files are formatted.

  2. Write a function to display an MNIST digit. Display one of each digit (a sketch appears after this list).

  3. Examine the prior probability of the classes in the training data: display a normalized histogram of the digit counts. Is the distribution uniform across the digits? (See the histogram sketch after this list.)

  4. Pick one example of each digit from your training data. Then, for each sample digit, compute and show the best match (nearest neighbor) between your chosen sample and the rest of the training data, using the L2 distance between the two images’ pixel values as the metric. This probably won’t be perfect, so add an asterisk next to any erroneous matches (sketched after this list).

  5. Consider the case of binary comparison between the digits 0 and 1. Ignoring all the other digits, compute the pairwise distances for all genuine matches and all impostor matches, again using the L2 norm. Plot histograms of the genuine and impostor distances on the same set of axes. (Steps 5 and 6 are sketched together after this list.)

  6. Generate an ROC curve from the above sets of distances. What is the equal error rate? What is the error rate of a classifier that simply guesses randomly?

  7. Implement a K-NN classifier. (You cannot use external libraries for this question; it should be your own implementation. A sketch covering steps 7–10 appears after this list.)

  8. Randomly split the training data into two halves. Train your k-NN classifier on the first half of the data, and test it on the second half, reporting your average accuracy.

  9. Generate a confusion matrix (of size 10 × 10) from your results. Which digits are particularly tricky to classify?

  10. Train your classifier with all of the training data, and test your classifier with the test data. Submit your results to Kaggle.
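
The sketches below walk through several of the steps above. They are minimal, illustrative examples, not complete solutions; file names and parameter choices are assumptions. First, step 2: loading the Kaggle train.csv (a label column followed by 784 pixel columns, one 28 × 28 image per row) and displaying one example of each digit.

```python
# A minimal sketch, assuming train.csv from the Kaggle Digit Recognizer page
# (a "label" column followed by 784 pixel columns for each 28x28 image).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train = pd.read_csv("train.csv")
labels = train["label"].to_numpy()
pixels = train.drop(columns="label").to_numpy()

def show_digit(row, ax=None):
    """Reshape one 784-pixel row into 28x28 and display it."""
    ax = ax or plt.gca()
    ax.imshow(row.reshape(28, 28), cmap="gray")
    ax.axis("off")

# Display one example of each digit 0-9.
fig, axes = plt.subplots(1, 10, figsize=(12, 2))
for digit, ax in enumerate(axes):
    idx = np.flatnonzero(labels == digit)[0]  # first occurrence of this digit
    show_digit(pixels[idx], ax)
    ax.set_title(str(digit))
plt.show()
```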
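
For step 3, a normalized histogram of the digit counts, reusing labels from the snippet above:

```python
# A minimal sketch for step 3: a normalized histogram (relative frequency)
# of the class counts, reusing `labels` from the previous snippet.
import numpy as np
import matplotlib.pyplot as plt

counts = np.bincount(labels, minlength=10)
plt.bar(range(10), counts / counts.sum())
plt.xticks(range(10))
plt.xlabel("digit")
plt.ylabel("relative frequency")
plt.title("Prior probability of each class in the training data")
plt.show()
```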
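
For step 4, a brute-force nearest-neighbor search under the L2 metric, reusing labels and pixels. The asterisk marks any best match whose label disagrees with the query digit. Note that this computes a distance from the query to every training row, which is slow but simple:

```python
# A minimal sketch for step 4, reusing `labels` and `pixels` from above.
# For one chosen sample of each digit, find its nearest neighbor (L2 distance)
# among all *other* training rows and flag mismatched labels with an asterisk.
import numpy as np

X = pixels.astype(np.float64)
for digit in range(10):
    i = np.flatnonzero(labels == digit)[0]      # chosen sample of this digit
    d = np.linalg.norm(X - X[i], axis=1)        # L2 distance to every row
    d[i] = np.inf                               # exclude the sample itself
    j = int(np.argmin(d))                       # index of the best match
    flag = "" if labels[j] == digit else " *"   # asterisk marks an error
    print(f"digit {digit}: nearest neighbor is row {j} (label {labels[j]}){flag}")
```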
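
Steps 5 and 6 can be sketched together: histograms of genuine and impostor L2 distances for the digits 0 and 1, then an ROC curve and an approximate equal error rate. To keep the pairwise computation manageable, this sketch subsamples each class; drop the [:500] slices to use every 0 and 1.

```python
# A minimal sketch covering steps 5 and 6, reusing `labels` and `pixels`.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, cdist

zeros = pixels[labels == 0][:500].astype(np.float64)
ones = pixels[labels == 1][:500].astype(np.float64)

# Genuine pairs share a class; impostor pairs cross classes.
genuine = np.concatenate([pdist(zeros), pdist(ones)])
impostor = cdist(zeros, ones).ravel()

plt.hist(genuine, bins=100, alpha=0.5, density=True, label="genuine")
plt.hist(impostor, bins=100, alpha=0.5, density=True, label="impostor")
plt.xlabel("L2 distance")
plt.legend()
plt.show()

# ROC: sweep a distance threshold; a pair is "accepted" (called genuine)
# when its distance falls below the threshold.
thresholds = np.linspace(0, max(genuine.max(), impostor.max()), 1000)
tpr = np.array([(genuine < t).mean() for t in thresholds])
fpr = np.array([(impostor < t).mean() for t in thresholds])
plt.plot(fpr, tpr)
plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.show()

# Equal error rate: the point where the false accept and false reject
# rates coincide, i.e. fpr == 1 - tpr. A random guesser sits at 50%.
eer_idx = np.argmin(np.abs(fpr - (1 - tpr)))
print("approximate EER:", (fpr[eer_idx] + 1 - tpr[eer_idx]) / 2)
```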
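
Finally, steps 7 through 10: a from-scratch k-NN classifier (only NumPy, no external ML libraries), a random half/half split with held-out accuracy, a 10 × 10 confusion matrix, and a Kaggle submission file. The choice k = 3 is purely illustrative.

```python
# A minimal sketch covering steps 7-10, reusing `labels` and `pixels`.
import numpy as np
import pandas as pd

def knn_predict(train_X, train_y, test_X, k=3, batch=500):
    """Predict labels by plurality vote over the k nearest (L2) training rows."""
    train_sq = (train_X ** 2).sum(axis=1)
    preds = np.empty(len(test_X), dtype=int)
    for start in range(0, len(test_X), batch):       # batch to bound memory
        chunk = test_X[start:start + batch]
        # squared L2 distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
        d2 = ((chunk ** 2).sum(axis=1)[:, None]
              - 2.0 * chunk @ train_X.T
              + train_sq[None, :])
        nearest = np.argpartition(d2, k, axis=1)[:, :k]
        for r, nbrs in enumerate(nearest):
            preds[start + r] = np.bincount(train_y[nbrs], minlength=10).argmax()
    return preds

# Step 8: random half/half split of the training data.
rng = np.random.default_rng(0)
perm = rng.permutation(len(pixels))
half = len(perm) // 2
tr, va = perm[:half], perm[half:]
X = pixels.astype(np.float64)
val_pred = knn_predict(X[tr], labels[tr], X[va], k=3)
print("held-out accuracy:", (val_pred == labels[va]).mean())

# Step 9: 10x10 confusion matrix (rows = true digit, columns = prediction).
conf = np.zeros((10, 10), dtype=int)
np.add.at(conf, (labels[va], val_pred), 1)
print(conf)

# Step 10: train on everything, predict the Kaggle test set, write a submission.
test = pd.read_csv("test.csv").to_numpy().astype(np.float64)
test_pred = knn_predict(X, labels, test, k=3)
pd.DataFrame({"ImageId": np.arange(1, len(test_pred) + 1),
              "Label": test_pred}).to_csv("submission.csv", index=False)
```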


The Titanic Disaster

  1. Join the Titanic: Machine Learning From Disaster competition on Kaggle. Download the training and test data.

  2. Using logistic regression, try to predict whether a passenger survived the disaster. You can choose the features (or combinations of features) you would like to use or ignore, provided you justify your reasoning. (A sketch appears after this list.)

  3. Train your classifier using all of the training data, and test it using the testing data. Submit your results to Kaggle.
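
A minimal sketch of one possible approach, assuming the Kaggle Titanic train.csv/test.csv layout. The features here (Sex, Pclass, and median-imputed Age) are illustrative choices; the assignment asks you to justify your own:

```python
# A minimal sketch, assuming the Kaggle Titanic train.csv/test.csv layout.
import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

def featurize(df):
    """Build a small illustrative feature frame; justify your own choices."""
    out = pd.DataFrame()
    out["is_female"] = (df["Sex"] == "female").astype(int)
    out["pclass"] = df["Pclass"]
    out["age"] = df["Age"].fillna(df["Age"].median())  # impute missing ages
    return out

model = LogisticRegression(max_iter=1000)
model.fit(featurize(train), train["Survived"])

pred = model.predict(featurize(test))
pd.DataFrame({"PassengerId": test["PassengerId"],
              "Survived": pred}).to_csv("submission.csv", index=False)
```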


WRITTEN EXERCISES

  1. Variance of a difference. Show that the variance of a difference is var[X − Y] = var[X] + var[Y] − 2 cov[X, Y], where cov[X, Y] is the covariance between random variables X and Y. (A worked sketch follows these exercises.)

  2. Bayes rule for quality control. You’re the foreman at a factory making ten million widgets per year. As a quality control step before shipment, you create a detector that tests for defective widgets before sending them to customers. The test is uniformly 95% accurate, meaning that the probability of testing positive given that the widget is defective is 0.95, as is the probability of testing negative given that the widget is not defective. Further, only one in 100,000 widgets is actually defective. (A numeric sketch follows these exercises.)

    1. Suppose the test shows that a widget is defective. What are the chances that it’s actually defective given the test result?

    2. If we throw out all widgets that test as defective, how many good widgets are thrown away per year? How many bad widgets are still shipped to customers each year?

  3. In k-nearest neighbors, classification is achieved by a plurality vote among the k nearest training points. Suppose our training data comprise n data points with two classes, each comprising exactly half of the training data, with some overlap between the two classes.

    1. Describe what happens to the average 0-1 prediction error on the training data as the neighbor count k varies from n to 1. (In this case, the prediction for a training point xi uses a training set that includes (xi, yi) itself.)

    2. We randomly choose half of the data to be removed from the training data, train on the remaining half, and test on the held-out half. Predict, with a sketch, how the average 0-1 prediction error on the held-out validation set might change as k varies. Explain your reasoning.

    3. In kNN, once k is determined, all of the k nearest neighbors are weighted equally in deciding the class label. This may be inappropriate when k is large. Suggest a modification to the algorithm that addresses this problem. (One possibility is sketched after these exercises.)

    4. Give two reasons why kNN may be undesirable when the input dimension is high.

    5. The plot below shows a binary classification task with two classes, red and blue. Training points are shown as triangles and test points as squares, and all points are colored according to their class. The line in the plot is the decision boundary from training a logistic regression classifier on the training data; points above the line are classified as red and points below the line as blue.

      1. Calculate confusion matrices for both the training and test data for the logistic regression classifier.

      2. Now, calculate confusion matrices for both the training and test data for the kNN algorithm with k = 1 on the same data, using the same train/test split (you should be able to do this by hand).

      3. Discuss the difference in accuracy on both the training and test data between the two algorithms. Explain why this difference occurs.
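
For written exercise 1, one standard route is to expand the definition of variance and regroup terms; a worked sketch in LaTeX:

```latex
\begin{align*}
\operatorname{var}[X - Y]
  &= \mathbb{E}\!\left[(X - Y)^2\right] - \left(\mathbb{E}[X - Y]\right)^2 \\
  &= \mathbb{E}[X^2] - 2\,\mathbb{E}[XY] + \mathbb{E}[Y^2]
     - \left(\mathbb{E}[X]\right)^2 + 2\,\mathbb{E}[X]\,\mathbb{E}[Y]
     - \left(\mathbb{E}[Y]\right)^2 \\
  &= \operatorname{var}[X] + \operatorname{var}[Y]
     - 2\left(\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]\right) \\
  &= \operatorname{var}[X] + \operatorname{var}[Y] - 2\operatorname{cov}[X, Y].
\end{align*}
```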
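
For written exercise 2, a numeric sketch of the Bayes' rule computation (the printed values follow directly from the stated rates):

```python
# A numeric sketch for written exercise 2 using Bayes' rule.
p_defective = 1 / 100_000          # prior: one in 100,000 widgets is defective
sens = spec = 0.95                 # P(+ | defective) = P(- | good) = 0.95
n = 10_000_000                     # widgets made per year

# Part 1: posterior probability of a defect given a positive test.
p_pos = sens * p_defective + (1 - spec) * (1 - p_defective)
p_def_given_pos = sens * p_defective / p_pos
print(f"P(defective | positive) = {p_def_given_pos:.5f}")   # about 0.00019

# Part 2: expected counts per year if every positive is discarded.
good_thrown_away = n * (1 - p_defective) * (1 - spec)       # false positives
bad_shipped = n * p_defective * (1 - sens)                  # false negatives
print(f"good widgets discarded: {good_thrown_away:,.0f}")   # about 500,000
print(f"defective widgets shipped: {bad_shipped:,.0f}")     # about 5
```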
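
For exercise 3.3, one common modification is distance-weighted voting, so that distant neighbors count less when k is large. A hypothetical sketch; the weighting 1/(d + eps) is just one choice among many:

```python
# Distance-weighted kNN voting: closer neighbors get larger vote weights.
import numpy as np

def weighted_knn_vote(distances, neighbor_labels, num_classes=2, eps=1e-8):
    """Weight each of the k neighbors by inverse distance, then take argmax."""
    weights = 1.0 / (distances + eps)
    scores = np.bincount(neighbor_labels, weights=weights, minlength=num_classes)
    return int(scores.argmax())

# Example: three neighbors at distances 0.5, 2.0, 2.5 with labels 1, 0, 0.
print(weighted_knn_vote(np.array([0.5, 2.0, 2.5]),
                        np.array([1, 0, 0])))   # the close label-1 point wins
```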


Codersarts is a top-rated website that offers Applied Machine Learning assignment help, project help, and coursework help. Hire us and get your project done by an Applied Machine Learning expert.
Contact Codersarts for any such project/assignment help
