User-Based Collaborative Filtering Using Agglomerative Clustering for Recommendation Systems

User-Based Collaborative Filtering

Overview

Recommendation systems are automated systems that suggest relevant users or items to a user based on similarities in behavior. In this article, I will walk you through how a user-based collaborative filtering recommendation system works and how to implement one in Python. So let’s get started.

What is User-Based Collaborative Filtering

User-to-user collaborative filtering is a widely used, memory-based technique in recommendation systems for recommending people with similar interests to each other. For example, on a social media app, a user may list sports and movies as interests. Other users who list the same interests, sports and movies, are then more likely to be recommended to the first user because their interests are similar.

Working

Now let’s understand how the User-Based Collaborative Filtering Algorithm works.

User based collaborative filtering

As you can see, 

User 1 likes Marvel and DC

User 2 likes Star Wars

User 3 likes Marvel and Star Wars

So the recommendation would work like this,

User 1: User 3

User 2: User 3

User 3: User 1 and User 2
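
The matching above can be reproduced with plain set overlap. The following is a minimal sketch, not the implementation used later in this article: the `interests` dictionary and the `recommend_users` helper are hypothetical names, and "similar" is assumed to mean "shares at least one interest".

```python
# Toy data from the example above; the dictionary layout is an assumption.
interests = {
    "User 1": {"Marvel", "DC"},
    "User 2": {"Star Wars"},
    "User 3": {"Marvel", "Star Wars"},
}

def recommend_users(target, interests):
    """Return the other users who share at least one interest with target, most overlap first."""
    scores = {}
    for other, liked in interests.items():
        if other == target:
            continue
        overlap = len(interests[target] & liked)
        if overlap > 0:
            scores[other] = overlap
    # Sort by overlap (descending), then by name for a stable order.
    return sorted(scores, key=lambda user: (-scores[user], user))

for user in interests:
    print(user, "->", recommend_users(user, interests))
```

Counting shared interests is the simplest possible similarity measure; the rest of the article replaces it with embeddings so that related but non-identical interests can also match.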

Step by Step implementation

Now let’s begin the Python implementation of the approach discussed above.

Data Preprocessing

Firstly, explore the dataset.

data set before processing

Now apply NLP preprocessing techniques such as lowercasing, stop word removal, and lemmatization to the user interests.

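import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Download the required corpora once, not on every function call.
nltk.download('stopwords')
nltk.download('wordnet')

def text_lower(data):
    # Lowercase the raw interest string.
    return data.lower()

def stopword(data):
    # data is a list of tokens; drop English stop words.
    stop_words = set(stopwords.words('english'))
    return [token for token in data if token not in stop_words]

def lemmatization(data):
    # data is a list of tokens; lemmatize each one as a verb (pos='v').
    lemma = WordNetLemmatizer()
    return [lemma.lemmatize(token, pos='v') for token in data]

def remove_characters(data):
    # Replace everything except letters, digits, commas, and spaces with a space.
    return re.sub(r'[^a-zA-Z0-9, ]', ' ', data)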

The results look like this:

data set after processing
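
The helper functions are not chained together in the snippet above, so here is one way the steps might be combined. This is a dependency-free sketch under stated assumptions: the `preprocess_interests` name, the comma and whitespace tokenization, and the tiny `STOP_WORDS` set (standing in for NLTK's English list) are all hypothetical, and lemmatization is left out because it needs the WordNet corpus.

```python
import re

# Hypothetical stand-in for stopwords.words('english'); the real list is much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "i", "like"}

def preprocess_interests(text):
    """Lowercase, strip special characters, tokenize, and drop stop words."""
    text = text.lower()
    # Same character filter as remove_characters above.
    text = re.sub(r'[^a-zA-Z0-9, ]', ' ', text)
    # Assumed tokenization: split on commas and whitespace.
    tokens = [tok for tok in re.split(r'[,\s]+', text) if tok]
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(preprocess_interests("I like Sports, Movies & the Marvel-Universe!"))
```

The order matters: lowercasing comes first so that stop word matching is case-insensitive, and character removal comes before tokenization so that punctuation does not stick to tokens.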

Word Embeddings

In this step, we convert the preprocessed text into vectors (word embeddings) so it can be used in the clustering step.

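from gensim.models import KeyedVectors
import numpy as np

# Choose from multiple models: https://github.com/RaRe-Technologies/gensim-data
model_path = "/files/glove-wiki-gigaword-50.gz"
model_wiki = KeyedVectors.load_word2vec_format(model_path)

def get_vector(data):
    # Assumes data is a list of tokens; sums their word vectors into one user vector.
    # Tokens missing from the model's vocabulary are skipped to avoid a KeyError.
    vectors = [model_wiki[word] for word in data if word in model_wiki]
    if not vectors:
        return np.zeros(model_wiki.vector_size)
    return np.sum(vectors, axis=0)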

The results look like this:

embedded data set
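
To make the vectorization step concrete, the sketch below sums word vectors into a single user vector. It uses a toy three-dimensional lookup table instead of the GloVe model, so `toy_vectors`, `user_vector`, and the numbers themselves are made up for illustration. Skipping out-of-vocabulary tokens is also an assumption, not something this article specifies.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings; the GloVe model above uses 50 dimensions.
toy_vectors = {
    "sports": np.array([1.0, 0.0, 0.0]),
    "movies": np.array([0.0, 1.0, 0.0]),
    "music": np.array([0.0, 0.0, 1.0]),
}

def user_vector(tokens, vectors, dim=3):
    """Sum the vectors of all known tokens; return zeros if none are known."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return np.zeros(dim)
    return np.sum(known, axis=0)

print(user_vector(["sports", "movies", "unknownword"], toy_vectors))
```

Every user ends up with a vector of the same length regardless of how many interests they listed, which is what the clustering and similarity steps require.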

Clustering

Now we pass the user embeddings to the clustering algorithm, which returns a cluster label for each user.

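from sklearn.cluster import AgglomerativeClustering

# result_array: 2D array of user embeddings, one row per user.
# Note: scikit-learn 1.2 renamed `affinity` to `metric`; use affinity='cosine' on older versions.
aglo = AgglomerativeClustering(n_clusters=None, metric='cosine', linkage='complete', distance_threshold=0.8)
agg_cluster = aglo.fit_predict(result_array)
print(agg_cluster)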

data set with clusters
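
The cluster labels still have to be turned into recommendations. One simple reading, sketched below, is to recommend to each user everyone else in the same cluster. The `user_ids` and `labels` values are invented for illustration (in this article's flow, `labels` would be the `agg_cluster` array), and `recommend_from_clusters` is a hypothetical helper.

```python
from collections import defaultdict

def recommend_from_clusters(user_ids, labels):
    """Map each user to the other users that share its cluster label."""
    members = defaultdict(list)
    for user, label in zip(user_ids, labels):
        members[label].append(user)
    return {
        user: [other for other in members[label] if other != user]
        for user, label in zip(user_ids, labels)
    }

# Made-up example: five users and the labels a clustering step might return.
user_ids = ["u1", "u2", "u3", "u4", "u5"]
labels = [0, 1, 0, 1, 2]

for user, recs in recommend_from_clusters(user_ids, labels).items():
    print(user, "->", recs)
```

A user who lands alone in a cluster, like `u5` here, gets no recommendations, which is one reason a similarity ranking can be a useful complement to hard cluster assignments.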

Cosine Similarity

Cosine similarity can be used as an alternative to clustering: it measures how close two users are based on their feature vectors. The results can be viewed as a user-by-user similarity matrix, which can also be stored as a sparse CSR matrix. Because every pair of users gets a score, it is also easier to prioritize recommendations.

import pandas as pd
from sklearn.metrics import pairwise_distances


def get_cosine_similarity_score_with_prev_users(df):
    """Compute the cosine similarity of each user vector with all other users.

    df has one row per user; returns (True, similarity DataFrame) or (False, 0) on error.
    """
    try:
        values_array = df.values
        # pairwise_distances returns cosine distance, so 1 - distance is the similarity.
        dist_out = 1 - pairwise_distances(values_array, metric="cosine")
        similarity_with_user = pd.DataFrame(dist_out, index=df.index, columns=df.index)
        similarity_with_user = similarity_with_user.round(decimals=4)
        return True, similarity_with_user
    except Exception as e:
        print(e)
        return False, 0

  

Result

cosine similarities
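
A similarity matrix becomes a recommendation list once each row is sorted. The sketch below computes cosine similarity directly with NumPy, cos(a, b) = a·b / (|a||b|), and picks the most similar users for a given user. The one-hot interest vectors (columns: Marvel, DC, Star Wars) reuse the earlier three-user example; `cosine_similarity_matrix`, `top_k_similar`, and `k` are hypothetical names.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of a 2D array."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms  # assumes no all-zero rows
    return unit @ unit.T

def top_k_similar(sim, user_index, k=2):
    """Indices of the k most similar users, excluding the user itself."""
    scores = sim[user_index].copy()
    scores[user_index] = -np.inf  # never recommend a user to themselves
    order = np.argsort(-scores)
    return [int(i) for i in order[:k]]

# Columns: Marvel, DC, Star Wars. Rows: User 1, User 2, User 3.
users = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
])

sim = cosine_similarity_matrix(users)
print(np.round(sim, 4))
print(top_k_similar(sim, user_index=2))  # most similar users to User 3
```

Unlike cluster labels, the scores give an ordering: here User 3 is closer to User 2 (about 0.71) than to User 1 (0.5), because User 2's only interest is one that User 3 shares.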

Using one of the above two approaches, you will be able to implement a recommendation system from scratch.

Final Words

Recommendation is a valuable automation technique used in almost every high-traffic app, be it an online store, a media app, a social network, a music app, or a game. Recommendations improve both an app’s traffic and its user experience.

In media apps, recommendations personalize what to watch next. In social media apps, they suggest who you might want to connect with. In e-commerce stores, they suggest products that suit you, and in music apps, they pick your next favorite song. There are countless ways to integrate recommendation systems into a business to elevate it.

So, in this article, you have learned why recommendation systems are used and how to implement one and integrate it with your existing app using only the app’s own data.

Fizahat Sheikh

AI Research Engineer
