Visual Search Using CNN

Updated: Jul 13, 2021



In this blog we will look at what visual search is and how it works, and then build a visual search model using a convolutional neural network (CNN).


What is Visual Search?


Visual search aims to find images by their visual features and present users with a list of relevant results. It is a way of searching for products on a website using images: instead of typing a text query, the user uploads an image. When a user takes a picture of an object and uploads it to the search bar, the software identifies the object within the picture and returns information and search results. This technology is especially useful for e-commerce stores, where customers can find products using images.


Sometimes we cannot describe a product well enough in words, which is why a text search engine fails to find the right item. In such cases visual search is a very useful way to find what we are looking for.


To build a visual search engine we need a model that works with images, and the CNN is the most popular deep learning architecture for image datasets.


What is transfer learning and why do we need it to build the model?


Transfer learning is a method of training machine learning models in which a model trained for one task is reused, along with the knowledge it has acquired, for another related task.


Why is transfer learning used? Suppose we have only a thousand images; that is not enough to train a convolutional neural network from scratch. Instead, to build the visual search model we use an existing pretrained model, ResNet-50, a convolutional neural network that is 50 layers deep. We load a pretrained version of the network that was trained on more than a million images from the ImageNet database. This pretrained network can classify images into 1000 object categories, such as car, lamp, pen, and many animals, and reusing it here is a good example of transfer learning.



How CNN works in visual search


First, the CNN is used to extract features from the images. CNN image classifiers convert an image into a low-dimensional feature vector representing the "features" learned by the network. A trained CNN model can be repurposed by removing the last high-level layers that were used to classify objects and using the truncated model to convert the input image into a feature vector.


Figure: converting the input image into feature vectors

These feature vectors contain redundant and noisy information. To make image retrieval efficient and keep only the most important information, the data is further compressed using principal component analysis (PCA), as sketched in the code after the figure below.


Figure: compressing feature vectors with principal component analysis
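
The code later in this post does not include the PCA step, so here is a minimal, self-contained sketch of what it could look like using scikit-learn; the 1000-image feature matrix and the 256-component target are assumptions for illustration only.

Code snippet :

import numpy as np
from sklearn.decomposition import PCA

# Placeholder data for illustration: 1000 images with 2048-d CNN features
feature_matrix = np.random.rand(1000, 2048)

pca = PCA(n_components=256)                   # compress 2048-d -> 256-d
compressed = pca.fit_transform(feature_matrix)
print(compressed.shape)                       # (1000, 256)
print(pca.explained_variance_ratio_.sum())    # fraction of variance kept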

Now we have a noise-free feature vector. In the second step, visual search uses a similarity measure to compare the images. One of the most popular choices is cosine similarity, which measures the angle between image vectors in high-dimensional feature space. People cannot visualize a 256-dimensional space, so a simple two-dimensional plane is shown here for demonstration purposes; it can be observed that for two similar images the angle ϕA is small. A small numeric example follows the figure below.



Figure: two-dimensional plane illustrating the angle between image vectors
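
To make this concrete, here is a small self-contained sketch (not from the original post) that computes the cosine of the angle between two-dimensional vectors; the vectors are made-up values for illustration.

Code snippet :

import numpy as np

def cosine_sim(a, b):
    # cos(phi) = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.9])   # hypothetical feature vector of image A
b = np.array([0.9, 1.0])   # similar image: small angle, cosine near 1
c = np.array([1.0, 0.0])   # dissimilar image: larger angle, smaller cosine
print(cosine_sim(a, b))    # ~0.99
print(cosine_sim(a, c))    # ~0.74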

Now let's see how to build the visual search model using CNN and transfer learning.


To build the visual search model we use the Flickr8k image dataset, which is available on Kaggle and contains 8,000 images.

Step 1. First, import all the libraries needed to build the model.


Code snippet :


from keras.preprocessing import image
import numpy as np
from keras.models import Model, load_model
from keras.applications.resnet50 import ResNet50, preprocess_input
import os
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import pickle

Step 2. In this step we download the pretrained ResNet-50 convolutional neural network, which is 50 layers deep, and load a version of the network pretrained on ImageNet.


Code snippet :


model = ResNet50(weights='imagenet',input_shape=(224,224,3))
model.summary()

Summary of the pretrained ResNet-50 model:




Step 3. Here we extract a feature vector for every image in the dataset using the pretrained model and store the results in a dictionary, with the image ID as key and the feature vector as value.


Code Snippet :



# Truncate the network at the penultimate layer to get a feature extractor
n_model = Model(model.input, model.layers[-2].output)

Folder_path = '/content/flickr_data/Flickr_Data/Images'  # image folder
images_path = os.listdir(Folder_path)                    # image file names

img_features = dict()
for img in images_path:
    # Load and preprocess each image the way ResNet-50 expects
    img1 = image.load_img(Folder_path + '/' + img, target_size=(224, 224, 3))
    y = image.img_to_array(img1)
    y = np.expand_dims(y, axis=0)
    y = preprocess_input(y)

    # Extract and flatten the feature vector for this image
    fea_x = n_model.predict(y)
    fea_x1 = np.reshape(fea_x, fea_x.shape[1])
    img_features[img] = fea_x1


Using the following code we can save the extracted features for future use.


Code snippet :



loc = '/content/drive/My Drive/Colab Notebooks/Codersarts_Project/data/img_feature_extract_rn50.pkl'
pickle.dump(img_features,open(loc,"wb"))
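
The saved features can be loaded back in a later session with pickle as well:

Code snippet :

img_features = pickle.load(open(loc, "rb"))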



Step 4. Now our model is ready to find similar images. We use the image feature vectors to measure how related two images are: cosine similarity scores a pair of vectors on a scale of 0 to 1. For that we import cosine_similarity from scikit-learn and compare two images by their cosine value.


Code snippet :


from sklearn.metrics.pairwise import cosine_similarity

# names is the list of image file names (the keys of the feature dictionary)
names = list(img_features.keys())
cosine_similarity(img_features[names[1]].reshape(1,-1), img_features[names[8]].reshape(1,-1))

We compared the two images at index 1 and index 8 and obtained a cosine value of 0.29, which indicates that these two images are not related to each other.


Output :


Now we provide one input image and display all images related to it. We set 0.65 as the minimum cosine value and see what images we get.


Code Snippet :



for i in range(len(names)):
    # Cosine similarity between the query image (index 1162) and image i
    cos = cosine_similarity(img_features[names[1162]].reshape(1,-1), img_features[names[i]].reshape(1,-1))
    if cos > 0.65:
        # Display every image whose similarity to the query exceeds 0.65
        img = mpimg.imread('/content/flickr_data/Flickr_Data/Images/'+names[i])
        img_plot = plt.imshow(img)
        plt.show()
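
A common variant, not used in this post, is to rank all images by similarity and show only the best matches instead of applying a fixed threshold; here is a brief sketch under that assumption.

Code snippet :

# Rank every image by cosine similarity to the query and print the top five
query = img_features[names[1162]].reshape(1, -1)
scores = [(cosine_similarity(query, img_features[n].reshape(1, -1))[0][0], n)
          for n in names]
scores.sort(reverse=True)        # highest similarity first
for score, name in scores[1:6]:  # skip the first entry: the query itself
    print(name, round(score, 3))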


Output :


This is the input image:



We get the related images:





Conclusion


Visual similarity search techniques are still evolving, but comparing the cosine similarity of two image feature vectors has proven effective in a variety of scenarios.



Thank You
