By Niharika P, Mon 15 June 2020, in category Dash posts
This project replicates the recommendation engines used by almost all e-commerce websites, such as Flipkart and Amazon. The goal is a recommendation system that returns results matching a given user's preferences in fashion or clothing. The model must also be easy to scale.
Online shopping websites routinely suggest items similar to the ones you have browsed, and sometimes items that go well with them. This notebook is a basic replication of such product recommendations using cosine similarity between images. The topic falls largely within the domain of recommendation systems, and I have a whole notebook dedicated to it. Check it out!
The code blocks below show the model that was used for the UI above.
The first step is to import all the necessary libraries.
from PIL import Image
import os
import matplotlib.pyplot as plt
import numpy as np
from keras.applications import vgg16
from keras.preprocessing.image import load_img,img_to_array
from keras.models import Model
from keras.applications.imagenet_utils import preprocess_input
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
Next, get the images and set their height and width. The VGG16 transfer learning model is used here, and its architecture shows that the first input layer accepts input of shape (None, 224, 224, 3). The images are therefore resized to 224 x 224.
imgs_path = "/kaggle/input/fashion-product-images-dataset/fashion-dataset/fashion-dataset/images/"
imgs_model_width, imgs_model_height = 224, 224
nb_closest_images = 5
files = [imgs_path + x for x in os.listdir(imgs_path) if "jpg" in x]
print("Total number of images:",len(files))
# To reduce computation time (and keep the system from crashing), use only the first 5000 images
files = files[0:5000]
# target_size is (height, width); both are 224 here
original = load_img(files[9], target_size=(imgs_model_height, imgs_model_width))
plt.imshow(original)
plt.show()
print("Image loaded successfully!")
# load the model
vgg_model = vgg16.VGG16(weights='imagenet')
# remove the last layers in order to get features instead of predictions
feat_extractor = Model(inputs=vgg_model.input, outputs=vgg_model.get_layer("fc2").output)
# print the layers of the CNN
feat_extractor.summary()
numpy_image = img_to_array(original)
# convert the image / images into batch format
# expand_dims will add an extra dimension to the data at a particular axis
# we want the input matrix to the network to be of the form (batchsize, height, width, channels)
# thus we add the extra dimension to the axis 0.
image_batch = np.expand_dims(numpy_image, axis=0)
print('Image Batch size', image_batch.shape)
# prepare the image for the VGG model
processed_image = preprocess_input(image_batch.copy())
img_features = feat_extractor.predict(processed_image)
print("Features successfully extracted for one image!")
print("Number of image features:",img_features.size)
img_features
importedImages = []
for f in files:
    # load each image at the model's input size, then add a batch dimension
    original = load_img(f, target_size=(imgs_model_height, imgs_model_width))
    numpy_image = img_to_array(original)
    image_batch = np.expand_dims(numpy_image, axis=0)
    importedImages.append(image_batch)
images = np.vstack(importedImages)
processed_imgs = preprocess_input(images.copy())
imgs_features = feat_extractor.predict(processed_imgs)
print("features successfully extracted!")
imgs_features.shape
cosSimilarities = cosine_similarity(imgs_features)
# store the results into a pandas dataframe
cos_similarities_df = pd.DataFrame(cosSimilarities, columns=files, index=files)
cos_similarities_df.head()
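To make the similarity step more concrete, here is a minimal NumPy sketch of what a pairwise cosine similarity computes: each feature vector is scaled to unit length, and the dot products between the scaled vectors are the cosines of the angles between them. The helper name `cosine_similarity_matrix` and the toy vectors are made up for illustration; the notebook itself relies on `sklearn.metrics.pairwise.cosine_similarity`.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a 2-D feature array."""
    features = np.asarray(features, dtype=np.float64)
    # Length (L2 norm) of each row, kept as a column so it broadcasts
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Avoid dividing by zero for all-zero rows
    norms[norms == 0] = 1.0
    unit = features / norms
    # Dot products of unit vectors are the cosines of the angles between them
    return unit @ unit.T

# Three toy "feature vectors" (hypothetical, not real VGG16 features)
toy = np.array([[1.0, 0.0],
                [2.0, 0.0],
                [0.0, 3.0]])
sims = cosine_similarity_matrix(toy)
print(sims)
```

The first two rows point in the same direction, so their similarity is 1 even though their lengths differ, while the third row is orthogonal to both and scores 0. This is why cosine similarity compares the "direction" of image features rather than their magnitude.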
# function to retrieve the most similar products for a given one
def retrieve_most_similar_products(given_img):
    print("-----------------------------------------------------------------------")
    print("original product:")
    original = load_img(given_img, target_size=(imgs_model_height, imgs_model_width))
    plt.imshow(original)
    plt.show()
    print("-----------------------------------------------------------------------")
    print("most similar products:")
    # sort once; the first entry is normally the image itself (similarity 1.0), so skip it
    closest = cos_similarities_df[given_img].sort_values(ascending=False)[1:nb_closest_images+1]
    # iterate over (file path, score) pairs instead of indexing the Series by position
    for img_path, score in closest.items():
        original = load_img(img_path, target_size=(imgs_model_height, imgs_model_width))
        plt.imshow(original)
        plt.show()
        print("similarity score : ", score)
retrieve_most_similar_products(files[500])
retrieve_most_similar_products(files[50])
The model generates good recommendations even when tested on a small dataset of only 5000 images. VGG16, used as a transfer learning model, handles the feature extraction, and cosine similarity proves useful yet again. The only downside is the computing time: processing the images and generating predictions takes a long time.
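One possible way to ease the resource pressure is to extract features chunk by chunk instead of stacking every image into a single array first. The sketch below is an assumption, not what the notebook does: `iter_chunks`, `extract_in_chunks`, `load_batch` and `predict` are hypothetical names. In the notebook, `load_batch` could wrap `load_img`, `img_to_array` and `preprocess_input`, and `predict` could be `feat_extractor.predict`. Note that this mainly lowers peak memory use; the total amount of computation stays roughly the same.

```python
import numpy as np

def iter_chunks(items, chunk_size):
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

def extract_in_chunks(paths, load_batch, predict, chunk_size=256):
    """Extract features one chunk at a time and stack the results.

    `load_batch(paths)` is assumed to return a preprocessed image array of
    shape (n, height, width, channels); `predict(batch)` is assumed to
    return an array of shape (n, n_features). Both are supplied by the caller.
    """
    features = [predict(load_batch(chunk)) for chunk in iter_chunks(paths, chunk_size)]
    return np.vstack(features)

# Stand-in callables so the sketch runs without Keras or real images
fake_load = lambda chunk: np.zeros((len(chunk), 2, 2, 3))
fake_predict = lambda batch: batch.reshape(len(batch), -1)[:, :4]

demo_paths = [f"img_{i}.jpg" for i in range(10)]
demo = extract_in_chunks(demo_paths, fake_load, fake_predict, chunk_size=4)
print(demo.shape)
```

Only one chunk of decoded images is held in memory at a time, while the much smaller feature vectors are accumulated and stacked at the end.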
The accuracy of the model could be improved further by introducing a content-based filtering step before feature extraction that constrains recommendations by product category. This is motivated by the observation that items from different categories sometimes received a high similarity score.
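One reading of this idea is to keep recommendations within the query item's category. The sketch below illustrates that reading under stated assumptions: it masks the similarity matrix after it has been computed, rather than filtering before feature extraction as suggested above, and it assumes a category label is available for every image. The function name `mask_other_categories`, the labels and the scores are all made up for illustration.

```python
import numpy as np

def mask_other_categories(similarities, categories):
    """Return a copy of `similarities` with cross-category pairs set to -inf."""
    categories = np.asarray(categories)
    # same[i, j] is True when items i and j share a category label
    same = categories[:, None] == categories[None, :]
    return np.where(same, similarities, -np.inf)

# Toy similarity matrix and hypothetical category labels
sims = np.array([[1.0, 0.9, 0.8],
                 [0.9, 1.0, 0.2],
                 [0.8, 0.2, 1.0]])
cats = ["shirt", "shoe", "shirt"]

masked = mask_other_categories(sims, cats)

# Best match for item 0, ignoring the item itself
row = masked[0].copy()
row[0] = -np.inf
best_for_first = int(np.argmax(row))
print(best_for_first)
```

Without the mask, item 1 (a shoe) would be the top match for item 0 (a shirt) because of its higher raw score; with the mask, the other shirt is chosen instead.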
For the user interface, cloud computing services could be used to host the model and allow live interaction with it, rather than statically generating recommendations and storing them, which was sufficient for this small-scale prototype.
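The static approach mentioned above, where recommendations are generated once and stored, could look roughly like the following sketch. It is an illustration rather than the code behind the actual UI: `precompute_top_k`, the file names and the scores are hypothetical, and JSON is just one convenient storage format.

```python
import json
import numpy as np

def precompute_top_k(similarities, names, k=5):
    """Map each item name to its k most similar other items, with scores."""
    sims = np.array(similarities, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(sims, -np.inf)             # never recommend an item to itself
    k = min(k, len(names) - 1)
    table = {}
    for i, name in enumerate(names):
        top = np.argsort(sims[i])[::-1][:k]     # indices of the k highest scores
        table[name] = [(names[j], float(sims[i, j])) for j in top]
    return table

# Toy similarity matrix with hypothetical file names
sims = np.array([[1.0, 0.9, 0.8],
                 [0.9, 1.0, 0.2],
                 [0.8, 0.2, 1.0]])
table = precompute_top_k(sims, ["a.jpg", "b.jpg", "c.jpg"], k=2)

# Serialise once; a UI can then look recommendations up without running the model
payload = json.dumps(table)
print(payload)
```

The trade-off is that the stored table only covers images that were processed in advance, which is why a hosted model would be needed to handle new images interactively.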