Problem Statement¶

This project replicates the recommendation engines used by almost all e-commerce websites, such as Flipkart and Amazon. The goal is a recommendation system that returns results matching a given user's preferences in fashion and clothing. The model must be easy to scale.

Image similarity for product recommendation.¶

Online shopping websites routinely suggest items similar to the ones you have browsed, and sometimes items that go well with them. This notebook is a basic replication of such product recommendations using cosine similarity between image features. The topic belongs mainly to the domain of recommendation systems, and I have a whole notebook dedicated to it. Check it out!
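As a quick illustration of the measure itself: cosine similarity is the dot product of two vectors divided by the product of their norms, so it depends on direction and not on magnitude. The sketch below uses made-up three-dimensional vectors, not real image features, and the helper name `cosine_sim` is only for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy feature vectors (made up for illustration, not real image features)
shirt_a = [1.0, 2.0, 0.0]
shirt_b = [2.0, 4.0, 0.0]   # same direction as shirt_a, twice the magnitude
shoe = [0.0, 0.0, 3.0]      # orthogonal to both shirts

print(cosine_sim(shirt_a, shirt_b))  # close to 1: same direction
print(cosine_sim(shirt_a, shoe))     # 0: nothing in common
```

Because only direction matters, two images whose features differ by an overall scale still count as highly similar.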

Dash User Interface¶

Dash is a framework from Plotly that many data scientists use to build user interfaces.
Dash is built on Flask and produces a web application in which data scientists and analysts can display visualizations, text, and other assets. Dash apps are written entirely in Python, so you do not have to learn JavaScript. It is the closest Python equivalent of R Shiny.
Here is the static application for product recommendations, which uses the VGG16 model to generate them.


The Model¶

The code cells below show the model behind the UI above.

The first step is to import all the necessary libraries.

In [2]:

from PIL import Image
import os
import matplotlib.pyplot as plt
import numpy as np

from keras.applications import vgg16
from keras.preprocessing.image import load_img,img_to_array
from keras.models import Model
from keras.applications.imagenet_utils import preprocess_input


from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

Using TensorFlow backend.

Next, set the image path and the input height and width. VGG16 is used here as a pretrained (transfer learning) model, and its input layer expects tensors of shape (None, 224, 224, 3). Therefore, every image is resized to 224 × 224.

In [3]:

imgs_path = "/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/"
imgs_model_width, imgs_model_height = 224, 224
nb_closest_images = 5

In [4]:

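# keep only files whose name ends in ".jpg" (a substring test would also match e.g. "jpg_notes.txt")
files = [os.path.join(imgs_path, x) for x in os.listdir(imgs_path) if x.lower().endswith(".jpg")]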

print("Total number of images:",len(files))

Total number of images: 44441

In [5]:

# To reduce computation time and memory use, keep only the first 5000 images; with the full set the kernel crashes.
files=files[0:5000]
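A rough back-of-the-envelope estimate shows why the full dataset is a problem. The sketch below assumes each decoded image is held as a 224 × 224 × 3 array of float32 values (4 bytes each) and ignores any overhead from Python or the framework, so treat the numbers as a lower bound.

```python
# Rough memory estimate for holding the decoded images in one float32 array.
# Assumes 4 bytes per value (float32) and ignores Python and framework overhead.
height, width, channels = 224, 224, 3
bytes_per_value = 4

bytes_per_image = height * width * channels * bytes_per_value

def array_size_gb(n_images):
    """Size in gigabytes (10**9 bytes) of an (n_images, 224, 224, 3) float32 array."""
    return n_images * bytes_per_image / 1e9

print(f"per image:    {bytes_per_image} bytes")          # 602112 bytes
print(f"5000 images:  {array_size_gb(5000):.2f} GB")     # 3.01 GB
print(f"44441 images: {array_size_gb(44441):.2f} GB")    # 26.76 GB
```

Any step that copies the array (for example a `.copy()` before preprocessing) roughly doubles the peak requirement.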

Testing feature extraction with one image¶

In [6]:

# target_size is (height, width); both are 224 here, so the result is unchanged
original = load_img(files[9], target_size=(imgs_model_height, imgs_model_width))
plt.imshow(original)
plt.show()
print("Image loaded successfully!")

Image loaded successfully!

In [7]:

# load the model
vgg_model = vgg16.VGG16(weights='imagenet')

# drop the final classification layer and use the fc2 activations as features instead of predictions
feat_extractor = Model(inputs=vgg_model.input, outputs=vgg_model.get_layer("fc2").output)

# print the layers of the CNN
feat_extractor.summary()

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
553467904/553467096 [==============================] - 12s 0us/step
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
=================================================================
Total params: 134,260,544
Trainable params: 134,260,544
Non-trainable params: 0
_________________________________________________________________

In [8]:

numpy_image = img_to_array(original)
# convert the image / images into batch format
# expand_dims will add an extra dimension to the data at a particular axis
# we want the input matrix to the network to be of the form (batchsize, height, width, channels)
# thus we add the extra dimension to the axis 0.
image_batch = np.expand_dims(numpy_image, axis=0)
print('Image Batch size', image_batch.shape)

# prepare the image for the VGG model
processed_image = preprocess_input(image_batch.copy())

Image Batch size (1, 224, 224, 3)
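For intuition about what the preprocessing step does, here is a NumPy approximation of the default ("caffe") mode of `preprocess_input`: the channels are reordered from RGB to BGR and the ImageNet per-channel means are subtracted, with no rescaling. This is a sketch for channels-last input only; the function name `caffe_style_preprocess` and the dummy batch are illustrative and not part of the notebook.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the "caffe" preprocessing mode
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(batch_rgb):
    """Approximate the default mode: RGB -> BGR, then subtract the channel means."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]            # reverse the channel axis
    return batch_bgr - IMAGENET_BGR_MEANS   # broadcasts over the last axis

# A dummy all-white batch holding one 224x224 RGB image (stand-in for image_batch)
dummy_batch = np.full((1, 224, 224, 3), 255.0, dtype=np.float32)
processed = caffe_style_preprocess(dummy_batch)

print(processed.shape)      # (1, 224, 224, 3)
print(processed[0, 0, 0])   # one pixel: 255 minus each BGR mean
```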

In [9]:

img_features = feat_extractor.predict(processed_image)

print("Features successfully extracted for one image!")
print("Number of image features:",img_features.size)
img_features

Features successfully extracted for one image!
Number of image features: 4096

Out[9]:

array([[0.        , 0.18001345, 0.        , ..., 0.        , 0.        ,
        0.        ]], dtype=float32)

Since feature extraction works for one image, it should also work for a batch of images. Let us try!¶

In [10]:

importedImages = []

for f in files:
    filename = f
    original = load_img(filename, target_size=(224, 224))
    numpy_image = img_to_array(original)
    image_batch = np.expand_dims(numpy_image, axis=0)
    
    importedImages.append(image_batch)
    
images = np.vstack(importedImages)

processed_imgs = preprocess_input(images.copy())

In [11]:

imgs_features = feat_extractor.predict(processed_imgs)

print("features successfully extracted!")
imgs_features.shape

features successfully extracted!

Out[11]:

(5000, 4096)
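The cells above load every image into a single array before extracting features. A possible alternative, sketched here under assumptions, is to load and featurize one batch at a time so that only the features accumulate in memory. The helper `extract_in_batches` and the stand-in functions `fake_load` and `fake_extract` are hypothetical; they exist only so the sketch runs without Keras or image files.

```python
import numpy as np

def extract_in_batches(paths, load_fn, extract_fn, batch_size=64):
    """Load and featurize images batch by batch so only one image batch is in memory.

    load_fn(path) is assumed to return one preprocessed image array;
    extract_fn(batch) is assumed to return a 2-D (batch, n_features) array.
    paths must be non-empty.
    """
    feature_chunks = []
    for start in range(0, len(paths), batch_size):
        batch_paths = paths[start:start + batch_size]
        batch = np.stack([load_fn(p) for p in batch_paths])
        feature_chunks.append(np.asarray(extract_fn(batch)))
    return np.concatenate(feature_chunks, axis=0)

# Stand-ins so the sketch runs without Keras or image files
def fake_load(path):
    return np.zeros((224, 224, 3), dtype=np.float32)

def fake_extract(batch):
    return np.ones((len(batch), 4096), dtype=np.float32)

fake_paths = [f"img_{i}.jpg" for i in range(150)]
features = extract_in_batches(fake_paths, fake_load, fake_extract, batch_size=64)
print(features.shape)  # (150, 4096)
```

In the notebook, `load_fn` could wrap the `load_img`, `img_to_array` and `preprocess_input` calls, and `extract_fn` could be `feat_extractor.predict`; that wiring is not tested here.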

In [12]:

cosSimilarities = cosine_similarity(imgs_features)

# store the results into a pandas dataframe

cos_similarities_df = pd.DataFrame(cosSimilarities, columns=files, index=files)
cos_similarities_df.head()

Out[12]:

	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/2345.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/5378.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/31619.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/34990.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/11242.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/3465.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/23865.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/19021.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/46940.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/3776.jpg	...	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/54871.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/19224.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/36468.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/2803.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/41877.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/32522.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/36467.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/17053.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/29253.jpg	/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/21641.jpg
/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/2345.jpg	1.000000	0.204977	0.585260	0.312118	0.408100	0.285802	0.274543	0.223055	0.204414	0.648110	...	0.151253	0.303807	0.261906	0.190400	0.230360	0.123544	0.210362	0.201421	0.260960	0.303139
/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/5378.jpg	0.204977	1.000000	0.228953	0.228382	0.275176	0.142490	0.670793	0.438715	0.159374	0.275107	...	0.202645	0.228097	0.730946	0.562728	0.480461	0.201947	0.710547	0.488952	0.757134	0.205612
/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/31619.jpg	0.585260	0.228953	1.000000	0.435020	0.632642	0.261308	0.261863	0.258934	0.194720	0.623769	...	0.170648	0.308728	0.277397	0.211464	0.267535	0.185306	0.232079	0.213372	0.272558	0.386056
/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/34990.jpg	0.312118	0.228382	0.435020	1.000000	0.483909	0.251764	0.338585	0.387454	0.220444	0.531069	...	0.219016	0.382719	0.333394	0.325201	0.350382	0.238369	0.298651	0.308487	0.326403	0.417100
/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/11242.jpg	0.408100	0.275176	0.632642	0.483909	1.000000	0.266306	0.351150	0.292643	0.187557	0.525309	...	0.209816	0.335407	0.317884	0.242726	0.297972	0.206684	0.280443	0.240455	0.305509	0.464387

5 rows × 5000 columns
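Retrieving the closest items does not require sorting a whole row of this matrix. The sketch below works on a plain NumPy array and uses `np.argpartition` to pick the top k entries, then sorts only those k. The function name `top_k_similar` and the 4 × 4 matrix are made up for illustration and are not notebook output.

```python
import numpy as np

def top_k_similar(sim_matrix, query_idx, k=5):
    """Return (indices, scores) of the k most similar items, best first.

    Requires k to be smaller than the number of items.
    """
    scores = np.array(sim_matrix[query_idx], dtype=float)  # copy of the query row
    scores[query_idx] = -np.inf                     # exclude the query itself
    candidates = np.argpartition(-scores, k)[:k]    # unordered top k
    order = np.argsort(-scores[candidates])         # sort just those k
    top = candidates[order]
    return top, scores[top]

# Small made-up symmetric similarity matrix (not real notebook output)
sim = np.array([
    [1.0, 0.2, 0.9, 0.4],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.5],
    [0.4, 0.8, 0.5, 1.0],
])
idx, vals = top_k_similar(sim, query_idx=0, k=2)
print(idx, vals)  # items 2 and 3, with scores 0.9 and 0.4
```

Excluding the query by index, rather than by dropping the first sorted entry, also stays correct when another item ties with the query at a score of 1.0.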

In [13]:

# function to retrieve the most similar products for a given one

def retrieve_most_similar_products(given_img):

    print("-----------------------------------------------------------------------")
    print("original product:")

    # target_size is (height, width)
    original = load_img(given_img, target_size=(imgs_model_height, imgs_model_width))
    plt.imshow(original)
    plt.show()

    print("-----------------------------------------------------------------------")
    print("most similar products:")

    # sort once, skip the first entry (the image itself), and keep the top matches
    closest = cos_similarities_df[given_img].sort_values(ascending=False)[1:nb_closest_images+1]

    # iterate over (path, score) pairs instead of indexing the Series by position
    for img_path, score in closest.items():
        original = load_img(img_path, target_size=(imgs_model_height, imgs_model_width))
        plt.imshow(original)
        plt.show()
        print("similarity score : ", score)

In [14]:

retrieve_most_similar_products(files[500])

-----------------------------------------------------------------------
original product:

-----------------------------------------------------------------------
most similar products:

similarity score :  0.776605

similarity score :  0.76376414

similarity score :  0.75407016

similarity score :  0.7436727

similarity score :  0.7436568

In [15]:

retrieve_most_similar_products(files[50])

-----------------------------------------------------------------------
original product:

-----------------------------------------------------------------------
most similar products:

similarity score :  0.75039256

similarity score :  0.74331385

similarity score :  0.72623146

similarity score :  0.7239178

similarity score :  0.7215332

Thus, the algorithm provides fairly good recommendations, even with a small dataset of 5000 images.

Conclusion and Takeaways:¶

The model generates good recommendations even when tested on a small dataset of only 5000 images. VGG16, used as a pretrained feature extractor, supplies the image features, and cosine similarity once again proves useful for comparing them. The main downside is computing time: loading the images and extracting their features is slow.

The quality of the recommendations could be improved further by adding a content-based filtering step before feature extraction, so that recommendations are not drawn from unrelated categories. This is motivated by the observation that objects from different categories sometimes received a high similarity score.
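One reading of the filtering idea above is to restrict the candidates to the query item's own category. The sketch below applies that filter at ranking time rather than before feature extraction, which is a simplification. The category labels, the similarity values and the function name `top_k_same_category` are all hypothetical.

```python
import numpy as np

def top_k_same_category(sim_matrix, categories, query_idx, k=5):
    """Rank only items sharing the query's category; return their indices, best first."""
    categories = np.asarray(categories)
    scores = np.array(sim_matrix[query_idx], dtype=float)  # copy of the query row
    allowed = categories == categories[query_idx]
    allowed[query_idx] = False        # never recommend the query itself
    scores[~allowed] = -np.inf        # rule out items from other categories
    ranked = np.argsort(-scores)
    return ranked[:min(k, int(allowed.sum()))]

# Made-up labels and similarities: item 0 (a shirt) looks most like two shoes
cats = ["shirt", "shoe", "shirt", "shoe", "shirt"]
sim = np.array([
    [1.00, 0.95, 0.60, 0.90, 0.70],
    [0.95, 1.00, 0.30, 0.80, 0.20],
    [0.60, 0.30, 1.00, 0.40, 0.50],
    [0.90, 0.80, 0.40, 1.00, 0.10],
    [0.70, 0.20, 0.50, 0.10, 1.00],
])
print(top_k_same_category(sim, cats, query_idx=0, k=2))  # shirts 4 and 2, not the shoes
```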

For the user interface, a cloud computing service could host the model so that users can interact with it directly, rather than relying on recommendations that are generated once and stored statically, an approach that was sufficient for this small-scale prototype.
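The static approach mentioned above can be sketched as a precomputed lookup table: each item maps to its top matches, and the table is saved once so the UI only has to read it. The function name `build_recommendation_table`, the file names and the similarity values are hypothetical, and JSON is just one possible storage format.

```python
import json
import numpy as np

def build_recommendation_table(sim_matrix, item_ids, k=5):
    """Map each item id to its k most similar other item ids, best first."""
    sim = np.asarray(sim_matrix, dtype=float)
    table = {}
    for i, item_id in enumerate(item_ids):
        scores = sim[i].copy()
        scores[i] = -np.inf                  # exclude the item itself
        best = np.argsort(-scores)[:k]
        table[item_id] = [item_ids[j] for j in best]
    return table

# Made-up ids and similarities for illustration
item_ids = ["a.jpg", "b.jpg", "c.jpg"]
sim = np.array([
    [1.0, 0.1, 0.7],
    [0.1, 1.0, 0.4],
    [0.7, 0.4, 1.0],
])
table = build_recommendation_table(sim, item_ids, k=1)

serialized = json.dumps(table)       # what would be written to disk
restored = json.loads(serialized)    # what a UI would read back
print(restored["a.jpg"])             # ['c.jpg']
```

A hosted model would compute these rankings on request instead, at the cost of running feature extraction and similarity search at query time.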
