
Master Thesis Project I Sample Project

You are given the attached CSV file for the purpose of quantifying the behavior of

Drosophila larvae using unsupervised machine learning. The data is taken from an

experiment in which multiple larvae are allowed to freely explore some space, guided by

an odor. Each row of the dataset captures some behavioral parameters of one individual.

The column frame gives the real (physical) time of the experiment, and the column id

gives the ID of the individual. These are followed by behavioral parameters for that

larva ID at that physical time step. Example:

frame id parameterA


Notable parameters that we want to focus on are the spinepoint_x_n and spinepoint_y_n

columns, which give (x,y) coordinates fitted to the spine of each larva. Plotting the

spine points creates the following picture:


Now carry out the following analysis of the dataset. Document everything in a Python

file or an IPython notebook that you can later share with us.


1. The spine point data contains some NaN (not a number) values in the spinepoint_x_n

and spinepoint_y_n columns, which are marked as nan in the CSV data. Carry

out linear interpolation for any NaN values, using the values before and after the

NaN rows. This can be done with the interp function from the numpy package.
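A minimal sketch of the interpolation step, assuming each spine point column is treated as a one-dimensional series in row order. The helper name `interpolate_nans` is hypothetical, and applying it separately per larva id (rather than across the whole column) is an assumption the task does not spell out.

```python
import numpy as np

def interpolate_nans(values):
    """Linearly interpolate NaN entries from their valid neighbours."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = ~np.isnan(values)
    if not valid.any():
        return values.copy()  # no valid samples to interpolate from
    # np.interp holds the first/last valid value constant beyond the ends
    return np.interp(idx, idx[valid], values[valid])

# Toy series standing in for one spinepoint_x_n column
filled = interpolate_nans([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
```

Note that leading or trailing NaN values have no neighbour on one side, so `np.interp` fills them with the nearest valid value instead of extrapolating.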


2. Now, let us fit a polynomial function of degree 4 to the (x,y) spine point

data. This can be done with the polyfit function from numpy. Let us then

compute the residual error e between the fit x̃ and the data points x, using the RMSE

formula

$$e = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\tilde{x}_i - x_i)^2}$$

where N is the number of data points.
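The fit and its residual could be sketched as follows, assuming the spine points of one row are collected into arrays `x` and `y` and that y is fitted as a polynomial in x. The helper name `fit_spine` and the toy data are hypothetical; the task does not say which coordinate is the independent variable.

```python
import numpy as np

def fit_spine(x, y, degree):
    """Fit a polynomial of the given degree and return (coefficients, RMSE)."""
    coeffs = np.polyfit(x, y, degree)   # highest power first
    y_fit = np.polyval(coeffs, x)       # fitted values at the data points
    rmse = np.sqrt(np.mean((y_fit - y) ** 2))
    return coeffs, rmse

# Toy spine: points that lie exactly on a 4th-degree polynomial
x = np.linspace(0.0, 1.0, 12)
y = 2.0 * x**4 - x**2 + 0.5 * x
coeffs4, rmse4 = fit_spine(x, y, degree=4)
```

A degree-d fit returns d + 1 coefficients, so it needs at least d + 1 spine points per row to be well determined. The same helper can be called with a different `degree` to compare fits.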


3. Now, compute a polynomial fit of degree 8. How does the RMSE change?


4. We now want to cluster the absolute values of the polynomial coefficients

obtained from the first fit (4th degree). For this purpose, let us use the

KMeans class from the sklearn.cluster package. Carry out the k-means

fits for the numbers of clusters k = [2, 4, 6, . . . , 30]. Using the inertia_ attribute of

each fitted clustering, create an Elbow plot that suggests the optimal k for

clustering the coefficients.
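One way to collect the inertia values for the Elbow plot is sketched below. The feature matrix `X` is random stand-in data (one row per dataset row, one column per absolute coefficient); `n_init=10` and `random_state=0` are illustrative choices, not values from the task.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the absolute values of the 5 coefficients of a 4th-degree fit
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(300, 5)))

ks = list(range(2, 31, 2))  # k = 2, 4, ..., 30
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances
```

Plotting `inertias` against `ks` (for example with matplotlib's `plt.plot(ks, inertias, marker="o")`) gives the Elbow plot; the k at which the curve flattens is the usual candidate.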


5. Now, repeat this using the 8th degree polynomial fit. What is the optimal k based

on the Elbow method for this feature set?


6. Pick one choice of k from the 4th degree polynomial fit, based on the Elbow

method. Create a visualization (histogram) showing how many total data rows

have been assigned to each cluster. Compare to the same k from the 8th degree

fit. Bonus: visualize the average spine configuration obtained for each cluster.
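The counts behind the histogram and the average spine per cluster could be computed as in this sketch. The array `spines` is assumed to hold one row of flattened spine coordinates per data row, `labels` the cluster index per row; both names and the helper `cluster_summary` are hypothetical.

```python
import numpy as np

def cluster_summary(labels, spines, k):
    """Return rows per cluster and the mean spine configuration per cluster."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=k)  # one count per cluster index
    mean_spines = np.full((k, spines.shape[1]), np.nan)
    for c in range(k):
        members = spines[labels == c]
        if len(members):  # empty clusters stay NaN
            mean_spines[c] = members.mean(axis=0)
    return counts, mean_spines

# Toy data: 5 rows, 2 coordinates each, 3 clusters
labels = np.array([0, 1, 1, 2, 1])
spines = np.arange(10, dtype=float).reshape(5, 2)
counts, mean_spines = cluster_summary(labels, spines, k=3)
```

The `counts` array can be passed directly to a bar plot, and each row of `mean_spines` can be reshaped back into (x, y) pairs for drawing.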


7. Now, compute the average spine_length variable for each cluster. What do

you observe?
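A per-cluster average can be sketched with a pandas group-by, assuming the cluster assignment has been stored in a column next to spine_length. The column name `cluster` and the toy values are hypothetical.

```python
import pandas as pd

# Toy frame: cluster assignment plus the spine_length variable
df = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1],
    "spine_length": [2.0, 4.0, 1.0, 2.0, 3.0],
})

# Mean spine_length per cluster, indexed by cluster id
mean_length = df.groupby("cluster")["spine_length"].mean()
```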


8. Divide the dataset into two labels according to the area variable (label A = area

below a value of 450, label B = area above a value of 450). Create a classifier using

your favorite method available in the scikit-learn package that classifies rows into labels

A and B. Create a random train and test data split with 10 and 90 percent of your

data, respectively. First, carry out the classification based only on your regression

coefficients (4th degree or 8th degree) and then, based on all the data available in

the data frame. What kind of 10-fold cross-validation accuracy do you get for your

method for both cases?
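The labelling, split, and cross-validation could look like the sketch below. The random forest is only one possible choice of classifier, and `area` and `X` are synthetic stand-ins for the real column and feature matrix.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
area = rng.uniform(300.0, 600.0, size=400)  # stand-in for the area column
# Stand-in features: one noisy proxy for area, one pure-noise column
X = np.column_stack([area / 100.0 + rng.normal(0.0, 0.3, 400),
                     rng.normal(size=400)])
y = np.where(area < 450, "A", "B")          # label A below 450, label B otherwise

# 10 percent training data, 90 percent test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.1, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
test_acc = clf.score(X_test, y_test)         # accuracy on the held-out 90 percent
cv_scores = cross_val_score(clf, X, y, cv=10)  # 10-fold CV accuracy per fold
```

When classifying on all available columns, leaving the area column itself in the features would let the classifier read the label directly, so it is worth deciding explicitly whether to exclude it.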


9. Imagine you want to design an artificial neural network (ANN) that needs

to classify the spine point data according to some training labels. You can imagine the training

labels to be the cluster assignments obtained from your k-means fits. What

kind of network architecture would you suggest for this kind of classification task?

Sketch a network diagram that you could come up with to do this classification.


10. Create a model function that initializes your designed ANN model in the

PyTorch or TensorFlow API. You can think of a class

NeuralNet(nn.Module) whose initialization creates the building blocks of the

ANN and assigns the connections between
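One possible shape for such a class in PyTorch is sketched below: a small fully connected network that maps flattened spine coordinates to cluster logits. The layer sizes, the two hidden layers, and the parameter names `n_inputs`, `n_hidden`, and `n_classes` are illustrative assumptions, not a prescribed architecture.

```python
import torch
from torch import nn

class NeuralNet(nn.Module):
    """Fully connected classifier: spine coordinates in, class logits out."""

    def __init__(self, n_inputs, n_hidden, n_classes):
        super().__init__()
        # Building blocks, connected in sequence
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),
        )

    def forward(self, x):
        return self.layers(x)  # raw logits; pair with nn.CrossEntropyLoss

# Hypothetical sizes: 12 spine points (x and y) and 6 clusters
model = NeuralNet(n_inputs=24, n_hidden=64, n_classes=6)
logits = model(torch.randn(8, 24))  # batch of 8 random rows
```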






