
The Power of GPU Acceleration for ML models with cuML

Updated: Oct 30, 2023




Introduction

In the world of machine learning, speed can be a game-changer. Faster training times mean more iterations, better models, and increased productivity. While libraries like scikit-learn have long been the go-to choice for traditional CPU-based machine learning, a lesser-known library called cuML promises significant speed improvements by running machine learning models on GPUs (Graphics Processing Units).


In this blog, we will explore cuML and its features and compare it with scikit-learn. We'll create an artificial dataset large enough to show the difference in training speed between the two libraries.


What is cuML?

A Brief Overview

cuML is an open-source machine learning library built by NVIDIA, designed specifically to harness GPU acceleration for various machine learning tasks. It's an integral part of the RAPIDS suite of open-source libraries, which focuses on accelerating data science pipelines end-to-end.

cuML provides GPU-accelerated implementations of a wide range of machine learning algorithms, including:

  • Linear and Logistic Regression

  • k-Nearest Neighbors (k-NN)

  • Principal Component Analysis (PCA)

  • Random Forests

  • Support Vector Machines (SVM)

  • And many more!

Why GPUs?

Unlike CPUs, GPUs are designed to handle parallel processing efficiently. This makes them exceptionally well-suited for machine learning tasks, where many operations can be parallelized. cuML taps into this parallelism, which can make computations significantly faster than in CPU-based libraries like scikit-learn. If you want to learn more about how to use the sklearn library, I would recommend the course Getting Started with Scikit-Learn: A Beginner's Guide to ML.


Setting Up the Experiment

Before diving into the comparison, let's set up our environment and create a suitable dataset for the task. For easy setup we are using Google Colab with a GPU runtime, which is free to use subject to availability. Colab comes with libraries such as sklearn preinstalled, so we only need to install cuML into the environment. To install it, run the commands below:

# This gets the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.
# Please read the output of this cell. If your Colab instance is not RAPIDS compatible, it will warn you and give you remediation steps.

!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py
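
If the installer complains about the runtime, it can help to confirm that a GPU is actually attached. The following is a minimal sketch rather than part of the original notebook: the helper name `gpu_visible` is made up for illustration, and it assumes that the NVIDIA driver's `nvidia-smi` tool is on the PATH whenever a GPU is present.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True when nvidia-smi is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools on this machine
    try:
        # "-L" asks nvidia-smi to list the GPUs it can see, one per line.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

In a Colab cell, running `!nvidia-smi` directly gives the same information interactively.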

Once installed we will import the required libraries.

First we import the cuML library and check the version

import cuml
cuml.__version__
## Output - 23.10.00

Then we import the other libraries

import sklearn
import numpy as np
import time
import warnings

# Filter warnings
warnings.filterwarnings('ignore')

Now we prepare a large artificial dataset so that we can compare the training time of each model.

# Create an artificial dataset with a million samples
n_samples = 10**6
n_features = 50

X = np.random.rand(n_samples, n_features)
y = np.random.randint(0, 2, n_samples)
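
The dataset above is unseeded and stored as float64, which takes about 400 MB for a million rows of 50 features. As a hedged variant, the sketch below wraps the same idea in a function with a fixed seed and 32-bit types, which halves the memory footprint. The function name `make_dataset` and the choice of float32 and int32 are assumptions for illustration, not something the original notebook uses.

```python
import numpy as np


def make_dataset(n_samples, n_features, seed=0):
    """Return random float32 features in [0, 1) and random binary int32 labels."""
    rng = np.random.default_rng(seed)  # seeded so reruns give the same data
    X = rng.random((n_samples, n_features), dtype=np.float32)
    y = rng.integers(0, 2, size=n_samples, dtype=np.int32)
    return X, y


# Small demo call; the post's experiment would use make_dataset(10**6, 50).
X_small, y_small = make_dataset(1_000, 50, seed=42)
print(X_small.shape, X_small.dtype, y_small.shape, y_small.dtype)
```

A fixed seed makes it easier to rerun the timings on identical data when comparing the two libraries.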

Now let us begin the comparison between the cuML and sklearn models. To keep it short, we compare only some of the most widely used models in ML: Linear Regression, k-Nearest Neighbors, PCA, Random Forest and Support Vector Machine.

Linear Regression

We start with cuML's Linear Regression. If you are familiar with the sklearn library, the cuML version is easy to write, because cuML's models and other features are designed to mirror the structure of sklearn.
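
Because the two libraries use matching module and class names in the examples of this post (`linear_model.LinearRegression`, `neighbors.KNeighborsClassifier`, and so on), the same script can pick whichever library is installed. The sketch below is one possible way to do that; `pick_backend` and `load_estimator` are hypothetical helper names, not part of either library.

```python
import importlib
import importlib.util


def pick_backend():
    """Return 'cuml' when it is installed, otherwise fall back to 'sklearn'."""
    return "cuml" if importlib.util.find_spec("cuml") is not None else "sklearn"


def load_estimator(submodule, class_name, backend=None):
    """Import `class_name` from `<backend>.<submodule>` and return the class."""
    backend = backend or pick_backend()
    module = importlib.import_module(f"{backend}.{submodule}")
    return getattr(module, class_name)


print("Selected backend:", pick_backend())

# Example usage (requires the selected library to be installed):
# LinearRegression = load_estimator("linear_model", "LinearRegression")
# model = LinearRegression().fit(X, y)
```

This only works for estimators that exist under the same path in both libraries, so it is worth checking each class before relying on the fallback.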

cuML:

from cuml.linear_model import LinearRegression

# Create a cuML Linear Regression model
cu_linear_reg = LinearRegression()

# Start measuring time
start_time = time.time()

# Fit the model
cu_linear_reg.fit(X, y)

# End measuring time
cu_time = time.time() - start_time

print(f"cuML Linear Regression Training Time: {cu_time} seconds")
## output - cuML Random Forest Training Time: 11.54811429977417 seconds

sklearn:

from sklearn.linear_model import LinearRegression

# Create an sklearn Linear Regression model
sk_linear_reg = LinearRegression()

# Start measuring time
start_time = time.time()

# Fit the model
sk_linear_reg.fit(X, y)

# End measuring time
sk_time = time.time() - start_time

print(f"scikit-learn Linear Regression Training Time: {sk_time} seconds")
## output - scikit-learn Linear Regression Training Time: 4.479902744293213 seconds

As you can see, for linear regression on this dataset the two libraries perform similarly. Linear regression is a relatively simple machine learning algorithm, and both cuML and scikit-learn are highly optimized for it. Its core operations, such as matrix multiplication and matrix inversion, parallelize well on GPUs, which benefits cuML. At this scale, however, the CPU implementation finishes those operations quickly too, so the extra parallelism of the GPU does not give a significant advantage in training time. The dataset is simply not large enough to really benefit from the parallelization.
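
Short timings like these are sensitive to one-time costs: the first GPU call in a session can include start-up work such as creating the CUDA context. One way to check for that is to fit several times and compare the first run with the best run. The sketch below assumes nothing about cuML itself; `time_fit` is a hypothetical helper, and `SleepyModel` is a stand-in estimator so the example runs without any ML library.

```python
import time


def time_fit(make_model, X, y=None, repeats=3):
    """Fit a fresh model `repeats` times and return each wall-clock duration."""
    durations = []
    for _ in range(repeats):
        model = make_model()  # fresh, unfitted estimator for every round
        start = time.perf_counter()
        if y is None:
            model.fit(X)  # unsupervised estimators such as PCA
        else:
            model.fit(X, y)
        durations.append(time.perf_counter() - start)
    return durations


class SleepyModel:
    """Stand-in estimator that just sleeps briefly inside fit()."""

    def fit(self, X, y=None):
        time.sleep(0.01)
        return self


runs = time_fit(SleepyModel, X=[[0.0]], y=[0])
print(f"first run: {runs[0]:.4f}s, best run: {min(runs):.4f}s")
```

With the models from this post, the call would look like `time_fit(LinearRegression, X, y)`. A large gap between the first and the best run suggests the single measurement was dominated by start-up cost rather than training.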


k-Nearest Neighbors (k-NN)

Now we implement k-NN with both cuML and sklearn.

cuML:

from cuml.neighbors import KNeighborsClassifier

# Create a cuML k-NN classifier
cu_knn = KNeighborsClassifier(n_neighbors=5)

# Start measuring time
start_time = time.time()

# Fit the model
cu_knn.fit(X, y)

# End measuring time
cu_time = time.time() - start_time

print(f"cuML k-NN Training Time: {cu_time} seconds")
## output - cuML k-NN Training Time: 0.3580195903778076 seconds

sklearn:

from sklearn.neighbors import KNeighborsClassifier

# Create an sklearn k-NN classifier
sk_knn = KNeighborsClassifier(n_neighbors=5)

# Start measuring time
start_time = time.time()

# Fit the model
sk_knn.fit(X, y)

# End measuring time
sk_time = time.time() - start_time

print(f"scikit-learn k-NN Training Time: {sk_time} seconds")
## output - scikit-learn k-NN Training Time: 0.10276365280151367 seconds

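As with the regression model, there is not much advantage to using the GPU for k-NN here. Fitting a k-NN model mostly stores or indexes the training data, so there is little computation to accelerate, and the dataset is not large enough to offset the overhead. The slightly higher time on the GPU comes from transferring the data from RAM to VRAM and back.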
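
To see how much of a short GPU timing is transfer rather than computation, the host-to-device copy can be timed on its own. The sketch below assumes CuPy is available, as it is in a RAPIDS environment, and falls back to NumPy so that it also runs on a machine without a GPU. The helper name `to_device` is made up for illustration.

```python
import time

import numpy as np

try:
    import cupy as cp

    if cp.cuda.runtime.getDeviceCount() < 1:  # also raises when no driver exists
        raise RuntimeError("no CUDA device found")
    xp = cp
except Exception:  # CuPy missing or no usable GPU: stay on the CPU
    cp = None
    xp = np


def to_device(array):
    """Copy `array` to the active backend and return (copy, seconds taken)."""
    start = time.perf_counter()
    device_array = xp.asarray(array)
    if cp is not None:
        cp.cuda.Stream.null.synchronize()  # wait until the copy has finished
    return device_array, time.perf_counter() - start


host = np.random.rand(10_000, 50).astype(np.float32)
X_dev, copy_seconds = to_device(host)
print(f"backend: {xp.__name__}, copy took {copy_seconds:.4f}s")
```

cuML estimators accept CuPy arrays as input, so passing an array that is already on the device to `fit` should keep the copy out of the timed region.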


Principal Component Analysis (PCA)

cuML:

from cuml.decomposition import PCA

# Create a cuML PCA model
cu_pca = PCA(n_components=10)

# Start measuring time
start_time = time.time()

# Fit the model
cu_pca.fit(X)

# End measuring time
cu_time = time.time() - start_time

print(f"cuML PCA Training Time: {cu_time} seconds")
## output - cuML PCA Training Time: 0.5399038791656494 seconds

sklearn:

from sklearn.decomposition import PCA

# Create an sklearn PCA model
sk_pca = PCA(n_components=10)

# Start measuring time
start_time = time.time()

# Fit the model
sk_pca.fit(X)

# End measuring time
sk_time = time.time() - start_time

print(f"scikit-learn PCA Training Time: {sk_time} seconds")
## output - scikit-learn PCA Training Time: 6.028165102005005 seconds

Now we see a real difference on a more compute-heavy model. The cuML PCA fits about 11 times faster than the sklearn model (roughly 0.54 seconds versus 6.03 seconds).


Random Forests

cuML:

from cuml.ensemble import RandomForestClassifier

# Create a cuML Random Forest classifier
cu_rf = RandomForestClassifier(n_estimators=100)

# Start measuring time
start_time = time.time()

# Fit the model
cu_rf.fit(X, y)

# End measuring time
cu_time = time.time() - start_time

print(f"cuML Random Forest Training Time: {cu_time} seconds")
## output - cuML Random Forest Training Time: 11.54811429977417 seconds

sklearn:

from sklearn.ensemble import RandomForestClassifier

# Create an sklearn Random Forest classifier
sk_rf = RandomForestClassifier(n_estimators=100)

# Start measuring time
start_time = time.time()

# Training on the entire dataset would take a long time, so we
# use only 10% of the data and extrapolate the result to 100%
# Fit the model
sk_rf.fit(X[:int(n_samples/10)], y[:int(n_samples/10)])

# End measuring time
sk_time = time.time() - start_time

print(f"scikit-learn Random Forest Training Time: {sk_time*10} seconds")
## output - scikit-learn Random Forest Training Time: 2124.7641587257385 seconds

As you can see, there is a significant difference in training time between the cuML and sklearn Random Forests. cuML is roughly 184 times faster (about 11.5 seconds versus an estimated 2,125 seconds, the latter extrapolated from a run on 10% of the data), which matters when we are dealing with large datasets. This could also prove useful when serving models in real time: under heavy load, requests can be combined and sent to a single model with GPU access without worrying about inference time, thereby saving cost and time.
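
The scikit-learn figure above comes from a 10% run multiplied by 10, which assumes that training time grows linearly with the number of samples. One way to check that assumption is to time a few subset sizes and estimate the growth exponent from a log-log fit. The sketch below uses made-up timings purely for illustration; `scaling_exponent` and `extrapolate` are hypothetical helper names.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def extrapolate(time_small, n_small, n_full, exponent):
    """Scale a measured time to a larger sample count using the exponent."""
    return time_small * (n_full / n_small) ** exponent


# Made-up timings for illustration only: doubling the data quadruples the time.
sizes = [10_000, 20_000, 40_000]
seconds = [1.0, 4.0, 16.0]
exponent = scaling_exponent(sizes, seconds)
print(f"estimated exponent: {exponent:.2f}")
print(f"projected time at 1,000,000 samples: "
      f"{extrapolate(seconds[-1], sizes[-1], 1_000_000, exponent):.0f}s")
```

An exponent close to 1 supports the multiply-by-10 shortcut; a clearly larger exponent means the shortcut underestimates the full training time.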


Support Vector Machines (SVM)

cuML:

from cuml.svm import SVC

# Create a cuML SVM classifier
cu_svm = SVC()

# Start measuring time
start_time = time.time()

# Fit the model
cu_svm.fit(X, y)

# End measuring time
cu_time = time.time() - start_time

print(f"cuML SVM Training Time: {cu_time} seconds")
## output - cuML SVM Training Time: 2024.2833387851715 seconds

sklearn:

from sklearn.svm import SVC

# Create an sklearn SVM classifier
sk_svm = SVC()

# Start measuring time
start_time = time.time()

# Training on the entire dataset would take a long time, so we
# use only 10% of the data and extrapolate the result to 100%
# Fit the model
sk_svm.fit(X[:int(n_samples/10)], y[:int(n_samples/10)])

# End measuring time
sk_time = time.time() - start_time
print(f"scikit-learn SVM Training Time: {sk_time*10} seconds")
## output - scikit-learn SVM Training Time: 11496.5017080307 seconds

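As you can see, the difference in training time is significant: cuML is almost 6 times faster than the sklearn model (about 2,024 seconds versus an estimated 11,497 seconds). Keep in mind that the sklearn figure is a linear extrapolation from 10% of the data. SVM training time typically grows faster than linearly with the number of samples, so the real gap is likely larger.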
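
The speedups quoted in this post can be recomputed directly from the printed training times. The sketch below copies those outputs into a dictionary and divides the sklearn time by the cuML time; a ratio below 1 means sklearn was faster. Linear regression is omitted because its cuML output line above is labeled as a Random Forest run. The names `timings` and `speedup` are chosen for illustration.

```python
# Training times in seconds, copied from the outputs printed in this post.
# The scikit-learn Random Forest and SVM values are 10x extrapolations
# from runs on 10% of the data.
timings = {
    "k-NN": {"cuml": 0.3580195903778076, "sklearn": 0.10276365280151367},
    "PCA": {"cuml": 0.5399038791656494, "sklearn": 6.028165102005005},
    "Random Forest": {"cuml": 11.54811429977417, "sklearn": 2124.7641587257385},
    "SVM": {"cuml": 2024.2833387851715, "sklearn": 11496.5017080307},
}


def speedup(entry):
    """How many times faster cuML was than scikit-learn for one model."""
    return entry["sklearn"] / entry["cuml"]


speedups = {name: speedup(entry) for name, entry in timings.items()}
for name, ratio in speedups.items():
    print(f"{name:>13}: {ratio:7.2f}x")
```

Keeping the raw numbers in one place like this makes it easy to update the ratios if the experiment is rerun on different hardware.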


Conclusion

In this comparison between cuML and scikit-learn, we've demonstrated how cuML, with its GPU acceleration, can significantly outperform scikit-learn in training speed on PCA, Random Forest, and SVM, while simpler cases such as linear regression and k-NN showed no benefit on this dataset. While scikit-learn remains a trusted choice for CPU-based machine learning, cuML shines when it comes to large datasets and complex models.

It's important to note that cuML's performance gains can be especially prominent with even larger datasets and more computationally intensive models. For data scientists and machine learning engineers looking to optimize their workflows and speed up model development, cuML is a library worth exploring.


As we move forward in the world of machine learning, GPU acceleration will likely become an increasingly valuable tool, and cuML is leading the way in harnessing this power. Whether you're training tree ensembles, conducting extensive hyperparameter tuning, or working with big data, cuML can be a game-changer in your toolkit.


So, the next time you need to train a machine learning model, consider cuML for a faster, more efficient experience.

If you need any help implementing a project with cuML or scikit-learn, feel free to contact us.
