User-based Collaborative Filtering Using Agglomerative Clustering On Recommendation System

Overview

Recommendation systems are automated systems that suggest relevant users or items to a user based on similarities in behavior. In this article, I will walk you through how a user-to-user collaborative filtering recommendation system works and how to implement one in Python. Let's get started.

What is User-Based Collaborative Filtering

User-based collaborative filtering is a widely used, memory-based technique in recommendation systems: it works directly on stored user data rather than on a trained model. Here, we use it to recommend people with similar interests to each other. For example, on a social media app, a user may list sports and movies as their interests. Other users with exactly the same interests (sports and movies) are more likely to be recommended to that user because they share the same taste.

How User-Based Collaborative Filtering Works

Let's look at how the user-to-user collaborative filtering algorithm works.

[Figure: three users and the topics they like]

As you can see, 

User 1 likes Marvel and DC

User 2 likes Star Wars

User 3 likes Marvel and Star Wars

So each user is recommended the users who share at least one of their interests:

User 1: User 3

User 2: User 3

User 3: User 1 and User 2
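The rule in this example, recommending users who share at least one interest, can be sketched in a few lines of plain Python. This is a minimal illustration on the three-user toy data above, not the article's pipeline (which compares users through word embeddings and clustering later on); `recommend_users` is a hypothetical helper name.

```python
# Toy data mirroring the example above: each user maps to the set of topics they like.
interests = {
    "User 1": {"Marvel", "DC"},
    "User 2": {"Star Wars"},
    "User 3": {"Marvel", "Star Wars"},
}


def recommend_users(interests):
    """Recommend to each user every other user who shares at least one interest."""
    recommendations = {}
    for user, likes in interests.items():
        recommendations[user] = [
            other
            for other, other_likes in interests.items()
            # Skip the user themself; a non-empty set intersection means shared taste.
            if other != user and likes & other_likes
        ]
    return recommendations


for user, others in recommend_users(interests).items():
    print(f"{user}: {', '.join(others)}")
```

The printed recommendations match the list above: User 1 and User 2 each get User 3, and User 3 gets both of them.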

Step-by-Step Implementation

Now let's implement the approach described above in Python.

Data Preprocessing

Firstly, explore the dataset.

[Figure: sample of the user-interest dataset]

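Next, apply NLP preprocessing techniques such as lowercasing, removing special characters, removing stop words, and lemmatization to clean the user interests. Note that stopword() and lemmatization() work on a list of tokens, so each interest string has to be split into words before they are applied.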

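import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
# Download the NLTK resources once, up front, instead of on every function call
nltk.download('stopwords')
nltk.download('wordnet')
STOP_WORDS = set(stopwords.words('english'))  # set lookup is much faster than scanning a list
lemma = WordNetLemmatizer()
def text_lower(data):
    # Lowercase the raw interest string
    return data.lower()
def stopword(data):
    # data is a list of tokens; drop the English stop words
    clean = []
    for i in data:
        if i not in STOP_WORDS:
            clean.append(i)
    return clean
def lemmatization(data):
    # data is a list of tokens; reduce each one to its lemma, treating it as a verb (pos='v')
    lemmas = []
    for i in data:
        lem = lemma.lemmatize(i, pos='v')
        lemmas.append(lem)
    return lemmas
def remove_characters(data):
    # Replace everything except letters, digits, commas, and spaces with a space
    data = re.sub(r'[^a-zA-Z0-9, ]', ' ', data)
    return data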

The results look like this:

[Figure: user interests after preprocessing]
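The preprocessing functions each handle one step, and the article does not show the order in which they are chained. The sketch below assumes one plausible order (lowercase, strip characters, split into tokens, drop stop words) and, as a labeled assumption, treats both commas and whitespace as token separators. So that it runs offline, it uses a small hypothetical stop-word set instead of NLTK's list and leaves out lemmatization; `preprocess` and `STOP_WORDS` here are illustrative names, not part of the original code.

```python
import re

# Hypothetical stand-in for nltk's English stop-word list so the sketch runs offline.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "my"}


def preprocess(interests):
    """Lowercase, strip unwanted characters, tokenize, and drop stop words."""
    text = interests.lower()
    # Same pattern as remove_characters(): keep letters, digits, commas, and spaces.
    text = re.sub(r"[^a-zA-Z0-9, ]", " ", text)
    # Assumption: commas and whitespace both separate tokens; split() drops empties.
    tokens = text.replace(",", " ").split()
    # stopword() expects a token list like this one; lemmatization would come next.
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("Sports, Movies & the Star-Wars saga!"))
# ['sports', 'movies', 'star', 'wars', 'saga']
```

Splitting on whitespace breaks a multi-word interest such as "star wars" into separate tokens, which suits the word-embedding lookup that follows, since it works one word at a time.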

Word Embeddings

In this step, we convert each user's preprocessed tokens into a numeric vector using pretrained GloVe word embeddings, so that users can be compared in the next step.

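from gensim.models import KeyedVectors
import numpy as np
# Pretrained GloVe vectors; other models are listed at https://github.com/RaRe-Technologies/gensim-data
model_path = "/files/glove-wiki-gigaword-50.gz"
model_wiki = KeyedVectors.load_word2vec_format(model_path)
def get_vector(data):
    # data is a list of preprocessed tokens; skip words missing from the vocabulary
    words = [word for word in data if word in model_wiki]
    if not words:
        return None  # no known words, so no vector for this user
    # model_wiki[words] has one row per word; summing the rows gives one 50-d user vector
    return np.sum(model_wiki[words], axis=0)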

The results look like this:

[Figure: user embedding vectors]
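To make the shape of this step concrete, here is a minimal sketch that builds one vector per user and stacks them into the 2-D array used for clustering. It assumes tiny hypothetical 3-dimensional embeddings in place of the 50-dimensional GloVe vectors in model_wiki, and `user_vector`, `embeddings`, and the user names are illustrative only.

```python
import numpy as np

# Hypothetical 3-d vectors standing in for the 50-d GloVe vectors that the
# article loads into model_wiki; real code would look words up in model_wiki.
embeddings = {
    "sports": np.array([0.9, 0.1, 0.0]),
    "movies": np.array([0.1, 0.8, 0.2]),
    "music": np.array([0.0, 0.3, 0.9]),
}


def user_vector(tokens, embeddings):
    """Sum the vectors of a user's in-vocabulary tokens into one user vector."""
    known = [embeddings[t] for t in tokens if t in embeddings]  # skip unknown words
    if not known:
        return None  # no usable words: cosine similarity is undefined for a zero vector
    return np.sum(known, axis=0)


users = {
    "User A": ["sports", "movies", "quidditch"],  # "quidditch" is out of vocabulary here
    "User B": ["music"],
    "User C": ["quidditch"],  # no known words, so this user is skipped
}
vectors = {user: user_vector(tokens, embeddings) for user, tokens in users.items()}
vectors = {user: v for user, v in vectors.items() if v is not None}

# Stack one row per user: the 2-D array that the clustering step calls result_array.
result_array = np.vstack(list(vectors.values()))
print(list(vectors), result_array.shape)
```

Because cosine similarity ignores vector length, summing the word vectors (as get_vector does) and averaging them lead to the same similarities downstream.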

Clustering

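Now we feed the user embeddings to an agglomerative clustering algorithm, which returns a cluster label for each user. With n_clusters=None and distance_threshold=0.8, the number of clusters is not fixed in advance: clusters stop merging once the complete-linkage cosine distance between them reaches 0.8.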

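from sklearn.cluster import AgglomerativeClustering
# result_array: one user vector per row (e.g. the stacked outputs of get_vector)
# metric= replaces affinity=, which was deprecated in scikit-learn 1.2 and removed in 1.4
aglo = AgglomerativeClustering(n_clusters=None, metric='cosine', linkage='complete', distance_threshold=0.8)
agg_cluster = aglo.fit_predict(result_array)
print(agg_cluster)  # one cluster label per user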
[Figure: cluster label assigned to each user]

Cosine Similarity

Cosine similarity can be used as an alternative to clustering: instead of grouping users, it measures how similar each pair of users is based on their feature vectors. The result is a user-by-user similarity matrix (which can also be stored as a sparse CSR matrix). Because every pair of users gets a score, it is easier to set priorities, for example by showing a user's most similar users first.
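For reference, this is the standard cosine similarity between two user vectors u and v of dimension d. scikit-learn's pairwise_distances(..., metric="cosine") returns the cosine distance, 1 minus the similarity, which is why the code below subtracts it from 1:

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2} \, \sqrt{\sum_{i=1}^{d} v_i^2}},
\qquad
d_{\cos}(\mathbf{u}, \mathbf{v}) = 1 - \cos(\mathbf{u}, \mathbf{v})
```

For example, u = (1, 1, 0) and v = (1, 0, 0) give cos = 1/√2 ≈ 0.707, a cosine distance of about 0.293.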

import pandas as pd
from sklearn.metrics import pairwise_distances
def get_cosine_similarity_score_with_prev_users(df):
    # df: one row per user (index = user id), one column per embedding dimension
    try:
        # Compute the cosine similarity of each user vector with the previous users
        values_array = df.values
        # pairwise_distances returns cosine distance; 1 - distance gives similarity
        dist_out = 1 - pairwise_distances(values_array, metric="cosine")
        similarity_with_user = pd.DataFrame(dist_out, index=df.index)
        similarity_with_user.columns = df.index
        similarity_with_user = similarity_with_user.round(decimals=4)
        return True, similarity_with_user
    except Exception as e:
        print(e)
        return False, 0

Result

[Figure: output of the cosine similarity step]
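A similarity matrix lends itself to ranking: for each user, sort the other users by score and keep the top k. This is a minimal sketch on hypothetical vectors; it computes the matrix the same way as get_cosine_similarity_score_with_prev_users (1 minus the cosine distance), and `top_k_similar` is an illustrative helper, not part of the original code.

```python
import pandas as pd
from sklearn.metrics import pairwise_distances

# Hypothetical user vectors, one row per user, indexed by user id.
df = pd.DataFrame(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    index=["User A", "User B", "User C", "User D"],
)
# Same computation as in the article: similarity = 1 - cosine distance.
similarity = pd.DataFrame(
    1 - pairwise_distances(df.values, metric="cosine"),
    index=df.index,
    columns=df.index,
)


def top_k_similar(similarity, user, k=2):
    """Return the k users most similar to `user`, best first, excluding the user itself."""
    scores = similarity.loc[user].drop(labels=user)
    return scores.sort_values(ascending=False).head(k)


print(top_k_similar(similarity, "User A"))
```

Unlike clustering, which only says whether two users belong together, the scores give an order, so the same matrix can serve a "top 5 people you may know" list or a single best match.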

With either of the two approaches above (clustering or cosine similarity), you can implement a user-to-user recommendation system from scratch.

Final Words

Recommendation is a valuable automation technique used in many high-traffic apps, whether online stores, media apps, social media, music apps, or games. Recommendation systems are used everywhere to increase an app's traffic and improve the user experience.

AI-based recommendation systems power personalized suggestions: in media apps, they suggest what to watch next; in social media apps, they suggest who might become your next best friend; in e-commerce stores, they suggest products that suit you; and in music apps, they pick your next favorite song. There are countless ways recommendation systems can be integrated into a business to help it grow.

In this article, you have learned why recommendation systems are useful and how straightforward it is to implement one and integrate it with your existing app using only the app's own data.

Fizahat Sheikh

AI Research Engineer