Binary Classification Using Linear_SVC and Implementing PCA | Sample Assignment

Part A: Linear Support Vector Machine (SVM)

Dataset: You will use the Iris dataset for binary classification. Use the petal length and petal width features of the Iris dataset, and determine whether a sample is Iris Virginica or not.

URL: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html

1. Implement a Linear_SVC model class for performing binary classification. The model should implement the batch Gradient Descent (GD) algorithm.


a) __init__(self, C=1, max_iter=100, tol=None, learning_rate='constant', learning_rate_init=0.001, t_0=1, t_1=1000, early_stopping=False, validation_fraction=0.1, **kwargs)

This method initializes the data members of the class when an object of the class is created. For example, self.C = C


C: float

It provides the regularization/penalty coefficient.

max_iter : int

Maximum number of iterations. The GD algorithm iterates until convergence (determined by 'tol') or this number of iterations.

tol : float

Tolerance for the optimization.

learning_rate : string (default 'constant')

Specifies how the learning rate is set: either a constant learning rate for all iterations, or a varying learning rate given by a learning rate schedule.

- 'constant': a constant learning rate given by 'learning_rate_init'.

- 'adaptive': gradually decreases the learning rate based on a schedule. Write a function that would be used if learning_rate is set to 'adaptive'. When 'adaptive' is used, the 'learning_rate_init' parameter has no effect, as the learning rate varies according to a learning rate schedule function. It uses the 't_0' and 't_1' parameters (see below).

Pseudocode for the "adaptive" learning rate function:

Write a function that decreases the learning rate gradually during each iteration:

where t_0 and t_1 are two constants that you need to determine empirically. Choose the constants t_0 and t_1 such that the learning rate is initially large enough.

learning_rate_init : double

The initial learning rate value if learning_rate is set to 'constant'. It controls the step size in updating the weights. It has no effect if 'learning_rate' is 'adaptive'.


early_stopping : Boolean, default=False

Whether to use early stopping to terminate training when the validation score is not improving. If set to True, it will automatically set aside a fraction of the training data as a validation set and terminate training when the validation score is not improving.

validation_fraction : float, default=0.1

The proportion of training data to set aside as a validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.
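The constructor described above could be sketched roughly as follows. The helper name _get_learning_rate is hypothetical, and the schedule t_0 / (t + t_1) is only an assumption about what an 'adaptive' schedule might look like; the assignment's own formula is not reproduced in this text.

```python
class Linear_SVC:
    def __init__(self, C=1, max_iter=100, tol=None, learning_rate='constant',
                 learning_rate_init=0.001, t_0=1, t_1=1000,
                 early_stopping=False, validation_fraction=0.1, **kwargs):
        # Store each argument as a data member of the object.
        self.C = C
        self.max_iter = max_iter
        self.tol = tol
        self.learning_rate = learning_rate
        self.learning_rate_init = learning_rate_init
        self.t_0 = t_0
        self.t_1 = t_1
        self.early_stopping = early_stopping
        self.validation_fraction = validation_fraction

    def _get_learning_rate(self, t):
        """Hypothetical helper: learning rate to use at iteration t."""
        if self.learning_rate == 'adaptive':
            # Assumed schedule (not taken from the assignment text):
            # starts at t_0 / t_1 and shrinks as t grows.
            return self.t_0 / (t + self.t_1)
        return self.learning_rate_init


model = Linear_SVC(C=10, learning_rate='adaptive')
rates = [model._get_learning_rate(t) for t in (0, 1000, 9000)]
```

Under this assumed schedule, a larger t_0 or a smaller t_1 gives a larger initial learning rate, which is one way to explore the constants empirically.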

b) fit(self, X, Y):

Implement the batch GD algorithm in the fit method. The weight vector and the intercept/bias should be denoted by w and b, respectively. Store the cost value for each iteration so that you can later use these values to create a learning curve.

Arguments: X: ndarray

A numpy array with rows representing data samples and columns representing features.

Note: the "fit" method should update the following parameters:

self.intercept_ = np.array([b])
self.coef_ = np.array([w])
self.support_vectors_ =

The "fit" method should display the total number of iterations using a print statement.
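As a rough illustration of the batch GD loop described above, the following standalone function minimizes the soft-margin cost 0.5*||w||^2 + C*sum(max(0, 1 - t(w.x + b))). The function name fit_linear_svc, the mapping of 0/1 labels to -1/+1, the tol test on the change in cost, and the choice to treat samples with margin below 1 as support vectors are all assumptions of this sketch, not requirements stated in the assignment.

```python
import numpy as np

def fit_linear_svc(X, y, C=1.0, eta=0.001, max_iter=1000, tol=None):
    """Batch (sub)gradient descent on the soft-margin linear SVM cost."""
    t = np.where(y == 1, 1.0, -1.0)       # assumed label mapping {0, 1} -> {-1, +1}
    w = np.zeros(X.shape[1])
    b = 0.0
    costs = []
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        margins = t * (X @ w + b)
        viol = margins < 1                # samples inside the margin or misclassified
        # Cost at the current parameters, stored for the learning curve.
        costs.append(0.5 * (w @ w) + C * np.sum(1 - margins[viol]))
        # Assumed convergence test: change in cost smaller than tol.
        if tol is not None and len(costs) > 1 and abs(costs[-2] - costs[-1]) < tol:
            break
        # Subgradient computed over the full training set (batch GD).
        grad_w = w - C * (t[viol] @ X[viol])
        grad_b = -C * np.sum(t[viol])
        w = w - eta * grad_w
        b = b - eta * grad_b
    print(f"Total number of iterations: {n_iter}")
    support_vectors = X[t * (X @ w + b) < 1]   # assumed definition of support vectors
    return w, b, costs, support_vectors


# Toy, linearly separable data used only to exercise the sketch.
X_toy = np.array([[-2.0, 0.0], [-1.5, 0.5], [2.0, 0.0], [1.5, -0.5]])
y_toy = np.array([0, 0, 1, 1])
w, b, costs, sv = fit_linear_svc(X_toy, y_toy, C=1.0, eta=0.01, max_iter=500)
```

Inside the class, the same loop would end by setting self.coef_ = np.array([w]) and self.intercept_ = np.array([b]), as the note above requires.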



c) predict(self, X)


X: ndarray

A numpy array containing samples to be used for prediction. Its rows represent data samples and columns represent features.


Returns: 1D array of predicted class labels for each row in X.

Note: the "predict" method uses self.coef_[0] and self.intercept_[0] to make predictions.
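A minimal sketch of the prediction rule, written as a standalone function for brevity. It assumes class 1 is predicted when the decision value w.x + b is non-negative and class 0 otherwise; the thresholding convention and the example weights are illustrative assumptions.

```python
import numpy as np

def predict(X, coef_, intercept_):
    """Return 1 where the decision function w.x + b is non-negative, else 0."""
    scores = X @ coef_[0] + intercept_[0]
    return (scores >= 0).astype(int)


# Made-up parameters in the same layout that fit stores them.
coef_ = np.array([[1.0, -1.0]])
intercept_ = np.array([-0.5])
X_new = np.array([[2.0, 0.0], [0.0, 2.0]])
labels = predict(X_new, coef_, intercept_)
```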

Binary Classification using Linear_SVC Classifier

2. Read the Iris data using the sklearn.datasets.load_iris method. Create the data matrix X by using two features: petal length and petal width.
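One way this step might look is shown below. In the array returned by load_iris, columns 2 and 3 hold petal length and petal width, and target value 2 corresponds to Iris virginica; encoding the binary label as 0/1 is a choice made for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 2:4]                 # petal length (cm), petal width (cm)
y = (iris.target == 2).astype(int)    # 1 if Iris virginica, 0 otherwise
```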

3. Partition the data into train and test sets (80% - 20%). Use the "Partition" function from your previous assignment.
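The "Partition" function from the previous assignment is not shown here, so the following is only a hypothetical stand-in with an assumed signature: it shuffles the row indices and holds out a fraction t of the samples for testing.

```python
import numpy as np

def partition(X, y, t=0.2, seed=0):
    """Hypothetical stand-in: shuffle rows, hold out a fraction t as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(t * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Small dummy arrays, used only to show the resulting shapes.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
X_train, X_test, y_train, y_test = partition(X_demo, y_demo, t=0.2)
```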

4. Model selection via hyperparameter tuning: Use the kFold function from the previous assignment to find the optimal values for the following hyperparameters.

learning_rate_init (when the 'constant' learning rate is used)
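Since the kFold function from the previous assignment is not reproduced here, the sketch below uses a hypothetical generator, kfold_indices, to show the general idea: the shuffled sample indices are split into k folds, and each fold serves once as the validation set.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Hypothetical helper: yield (train_idx, val_idx) pairs for k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


splits = list(kfold_indices(10, k=5))
```

For each candidate value of learning_rate_init, one would train on train_idx, score on val_idx, average the k scores, and keep the value with the best average.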

5. Train the model using optimal values for the hyperparameters and evaluate on the test data. Report the test accuracy and test confusion matrix.
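The two reported quantities can be computed directly with NumPy, as in this sketch. The function names are hypothetical, the label arrays are made up for illustration, and the confusion matrix layout (rows are true classes, columns are predicted classes) is an assumed convention.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return np.mean(y_true == y_pred)

def confusion_matrix_binary(y_true, y_pred):
    """Rows are true classes (0, 1); columns are predicted classes (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Made-up labels, used only to exercise the two functions.
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])
acc = accuracy(y_true, y_pred)
cm = confusion_matrix_binary(y_true, y_pred)
```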

6. Plot the learning curve.

7. Plot the decision boundary and show the support vectors using the "decision_boundary_support_vectors" function given in:


Note: if your test accuracy is less than 95%, you will lose 10% of the total obtained points. If your test accuracy is less than 90%, you will lose 30% of the total obtained points.

8. [Extra Credit for 478 and Mandatory for 878] Implement early stopping in the "fit" method of the Linear_SVC model. You will have to use the following two parameters of the model: early_stopping and validation_fraction. Also note that when training the model using early stopping, it should generate an early stopping curve.

[10 pts]
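One possible stopping rule is sketched below. The assignment does not say how "not improving" should be measured, so the patience parameter and the function name stop_iteration are assumptions: training stops once the validation score has gone patience consecutive iterations without beating its best value.

```python
def stop_iteration(val_scores, patience=5):
    """Return the index of the iteration at which training would stop."""
    best_score, best_it = float("-inf"), 0
    for it, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_it = score, it      # new best validation score
        elif it - best_it >= patience:
            return it                            # no improvement for `patience` iterations
    return len(val_scores) - 1                   # ran through every recorded score


# Made-up validation scores, one per iteration.
scores = [0.5, 0.6, 0.7, 0.7, 0.69, 0.68, 0.9]
stopped_at = stop_iteration(scores, patience=3)
```

Plotting the recorded validation scores against the iteration index, with the stopping point marked, is one way to produce the early stopping curve.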

Part B: Principal Component Analysis

You will perform dimensionality reduction on a grayscale image (posted on Canvas) using PCA. The PCA will be implemented using the eigendecomposition technique. More specifically, you will find the top k eigenvectors (i.e., principal components) of a pixel matrix (a grayscale image). Then, using the top k eigenvector matrix, you will project the pixel matrix onto its principal components. This will reduce the dimension of the pixel matrix without losing much variance.

9. Using the matplotlib.pyplot "imread" function, read the image as a 2D matrix. Denote it by "X". Show the image using the matplotlib.pyplot "imshow" function. If the image is RGB, convert it into a grayscale image as follows (use the matplotlib.pyplot "gray" function).

X = imread("image_path")[:,:,0]

10. Implement the steps of eigendecomposition-based PCA on X: (a) mean center the data matrix X, (b) compute the covariance matrix from it, (c) find the eigenvalues and eigenvectors of the covariance matrix (you may use the numpy.linalg.eig function).
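Steps (a) to (c) might be sketched as follows, with a small random matrix standing in for the image. Note one substitution: this sketch calls numpy.linalg.eigh rather than the numpy.linalg.eig function mentioned above, because eigh is intended for symmetric matrices such as a covariance matrix and returns real eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 4))                       # stand-in for the grayscale pixel matrix

X_centered = X - X.mean(axis=0)              # (a) subtract each column's mean
cov = np.cov(X_centered, rowvar=False)       # (b) columns are treated as variables
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # (c) eigenvectors are the columns
```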

11. Then, find the top k eigenvectors (sort eigenvalue-eigenvector pairs from high to low, and get the top k eigenvectors), and create an eigenvector matrix using top k eigenvectors (each eigenvector should be a column vector in the matrix, so there should be k columns).

12. Finally, project the mean centered data onto the top k eigenvectors (it should be a dot product between the mean centered X and the top k eigenvector matrix).

13. Reconstruct the data matrix by taking dot product between the projected data (from last step) and the transpose of the top k eigenvector matrix.
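Steps 11 to 13 can be combined into a single function, sketched here under the same assumptions as before: a random stand-in matrix, and numpy.linalg.eigh in place of numpy.linalg.eig. The function name pca_reconstruct is hypothetical.

```python
import numpy as np

def pca_reconstruct(X_centered, k):
    """Project mean-centered data on its top-k eigenvectors and reconstruct it."""
    cov = np.cov(X_centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]     # indices from largest to smallest eigenvalue
    W = eigenvectors[:, order[:k]]            # top-k eigenvectors as columns
    Z = X_centered @ W                        # projected data, one column per component
    X_rec = Z @ W.T                           # reconstruction in the original space
    return Z, X_rec


rng = np.random.default_rng(0)
X = rng.random((8, 5))                        # stand-in for the pixel matrix
Xc = X - X.mean(axis=0)
Z, X_rec = pca_reconstruct(Xc, k=2)
Z_full, X_full = pca_reconstruct(Xc, k=5)     # keeping every component loses nothing
```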

14. Compute the reconstruction error between the mean centered data matrix X and the reconstructed data matrix (you may use the sklearn.metrics.mean_squared_error function).

15. Perform steps 11 – 14 for the following values of k: 10, 30, 50, 100, 500. For each k, show the reconstructed image (use the matplotlib.pyplot imshow function with the reconstructed data matrix for each k). With each reconstructed image print the value of k and the reconstruction error.
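The loop over k might be organized as in the sketch below. A random matrix stands in for the image, so smaller k values are used than the ones listed above; the error is computed directly with NumPy as the mean squared difference, and numpy.linalg.eigh is again substituted for numpy.linalg.eig. The imshow calls are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((40, 30))                      # stand-in for the grayscale pixel matrix
Xc = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigenvalues)[::-1]         # largest eigenvalue first

errors = {}
for k in (5, 10, 20, 30):                     # smaller k values to suit the small stand-in
    W = eigenvectors[:, order[:k]]            # top-k eigenvector matrix
    X_rec = (Xc @ W) @ W.T                    # project, then reconstruct
    errors[k] = np.mean((Xc - X_rec) ** 2)    # mean squared reconstruction error
    print(f"k = {k}, reconstruction error = {errors[k]:.6f}")
```

Because the eigendecomposition does not depend on k, it is computed once outside the loop; only the slice of eigenvectors changes per iteration.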

