Collaborative Filtering: Building Recommender Systems from User Behavior
Netflix-style recommendations start with collaborative filtering: the idea that users who agreed in the past will agree in the future. This notebook covers user-based CF, item-based CF, matrix factorization with SVD, and ALS.
1. Setup: Synthetic User-Item Ratings Matrix
We create a sparse matrix of 100 users × 200 items with ratings 1–5, where 0 marks an unrated cell.
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)
N_USERS = 100
N_ITEMS = 200
DENSITY = 0.05 # 5% of ratings are observed
# Generate sparse ratings
mask = np.random.rand(N_USERS, N_ITEMS) < DENSITY
ratings_raw = np.random.randint(1, 6, size=(N_USERS, N_ITEMS)).astype(float)
ratings_raw[~mask] = 0.0
user_ids = [f'user_{i:03d}' for i in range(N_USERS)]
item_ids = [f'item_{j:03d}' for j in range(N_ITEMS)]
ratings_df = pd.DataFrame(ratings_raw, index=user_ids, columns=item_ids)
print(f'Matrix shape : {ratings_df.shape}')
print(f'Observed ratings: {mask.sum()} / {N_USERS * N_ITEMS} ({mask.mean()*100:.1f}%)')
ratings_df.iloc[:5, :8]
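The csr_matrix imported above goes unused in this cell; for completeness, here is how the same matrix would be stored sparsely. At 5% density, CSR keeps only the observed entries. A standalone sketch regenerating the same synthetic matrix (same seed and draw order):

```python
import numpy as np
from scipy.sparse import csr_matrix

np.random.seed(42)
mask = np.random.rand(100, 200) < 0.05
ratings = np.random.randint(1, 6, size=(100, 200)).astype(float)
ratings[~mask] = 0.0

R_sparse = csr_matrix(ratings)  # stores only the nonzero (observed) entries
dense_bytes = ratings.nbytes
sparse_bytes = R_sparse.data.nbytes + R_sparse.indices.nbytes + R_sparse.indptr.nbytes
print(f'Stored entries: {R_sparse.nnz} of {ratings.size}')
print(f'Dense memory  : {dense_bytes} bytes, sparse: {sparse_bytes} bytes')
```

At realistic scales (millions of users, 99.9%+ sparsity) the dense array would not fit in memory at all, so sparse storage is the default in production recommenders.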
2. User-Based Collaborative Filtering
Idea: Find users similar to the target user, then predict a rating as a weighted average of their ratings.
Cosine similarity between two user rating vectors (unrated entries are stored as zero, so they contribute nothing to the dot product or the norms): \(\text{sim}(u, v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\|\mathbf{r}_u\| \|\mathbf{r}_v\|}\)
from sklearn.metrics.pairwise import cosine_similarity
R = ratings_df.values.copy() # shape (N_USERS, N_ITEMS)
# User-user cosine similarity
user_sim = cosine_similarity(R) # (100, 100)
np.fill_diagonal(user_sim, 0) # exclude self-similarity
user_sim_df = pd.DataFrame(user_sim, index=user_ids, columns=user_ids)
def user_based_predict(ratings, user_sim_matrix, user_idx, item_idx, k=10):
    """Predict rating for user_idx on item_idx using top-k similar users."""
    sims = user_sim_matrix[user_idx]  # similarities to all other users
    # Only consider users who have rated the item
    rated_mask = ratings[:, item_idx] > 0
    if rated_mask.sum() == 0:
        return 0.0
    sims_rated = sims * rated_mask.astype(float)
    top_k_idx = np.argsort(sims_rated)[::-1][:k]
    top_k_sims = sims_rated[top_k_idx]
    top_k_rats = ratings[top_k_idx, item_idx]
    denom = np.abs(top_k_sims).sum()
    if denom == 0:
        return 0.0
    return (top_k_sims * top_k_rats).sum() / denom
# Example prediction
pred = user_based_predict(R, user_sim, user_idx=0, item_idx=5, k=10)
print(f'User-based predicted rating for user_000 on item_005: {pred:.3f}')
print(f'Actual rating (0=unrated): {R[0, 5]:.1f}')
3. Item-Based Collaborative Filtering
Idea: Instead of finding similar users, find items similar to what the user has already rated. This is more scalable: item similarities are stable over time and can be precomputed.
Amazon's original recommendation patent was item-based CF.
# Item-item cosine similarity (transpose: items are rows)
item_sim = cosine_similarity(R.T) # (200, 200)
np.fill_diagonal(item_sim, 0)
def item_based_predict(ratings, item_sim_matrix, user_idx, item_idx, k=10):
    """Predict rating for user_idx on item_idx using top-k similar items the user rated."""
    sims = item_sim_matrix[item_idx]  # similarity of item_idx to all items
    user_rated_mask = ratings[user_idx, :] > 0  # items the user has rated
    if user_rated_mask.sum() == 0:
        return 0.0
    sims_rated = sims * user_rated_mask.astype(float)
    top_k_idx = np.argsort(sims_rated)[::-1][:k]
    top_k_sims = sims_rated[top_k_idx]
    top_k_rats = ratings[user_idx, top_k_idx]
    denom = np.abs(top_k_sims).sum()
    if denom == 0:
        return 0.0
    return (top_k_sims * top_k_rats).sum() / denom
pred_item = item_based_predict(R, item_sim, user_idx=0, item_idx=5, k=10)
print(f'Item-based predicted rating for user_000 on item_005: {pred_item:.3f}')
# Top 5 items similar to item_005
top5 = np.argsort(item_sim[5])[::-1][:5]
print(f'Items most similar to item_005: {[item_ids[i] for i in top5]}')
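In practice, pointwise predictions are turned into a ranked top-N list. A standalone sketch that scores all of a user's unseen items from the item-similarity matrix:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

np.random.seed(42)
mask = np.random.rand(100, 200) < 0.05
R = np.random.randint(1, 6, size=(100, 200)).astype(float)
R[~mask] = 0.0

item_sim = cosine_similarity(R.T)
np.fill_diagonal(item_sim, 0)

def recommend_top_n(R, item_sim, user_idx, n=5):
    """Rank unrated items by similarity-weighted scores from the user's ratings."""
    user_ratings = R[user_idx]
    scores = item_sim @ user_ratings       # higher = more similar to items the user liked
    scores[user_ratings > 0] = -np.inf     # never re-recommend already-rated items
    return np.argsort(scores)[::-1][:n]

recs = recommend_top_n(R, item_sim, user_idx=0, n=5)
print(f'Top-5 recommendations for user_000: {[f"item_{j:03d}" for j in recs]}')
```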
4. Matrix Factorization with SVD
Idea: Decompose the ratings matrix \(R \approx U \Sigma V^T\) where \(U\) = user factors, \(V\) = item factors. Missing ratings are predicted by the reconstruction.
We use Truncated SVD (only keep top-\(k\) singular values) to get compact latent factors.
from sklearn.decomposition import TruncatedSVD
# Fill missing ratings with row mean before SVD
R_filled = R.copy()
counts = (R != 0).sum(1)
row_means = np.divide(R.sum(1), counts, out=np.zeros(N_USERS), where=counts > 0)
for i in range(N_USERS):
    R_filled[i, R[i] == 0] = row_means[i] if row_means[i] > 0 else 3.0
K = 20 # latent factors
svd = TruncatedSVD(n_components=K, random_state=42)
U_k = svd.fit_transform(R_filled) # (100, 20)
V_k = svd.components_ # (20, 200)
Sigma_k = np.diag(svd.singular_values_) # (20, 20), for inspection only: fit_transform already returns U @ Sigma
R_reconstructed = U_k @ V_k # (100, 200) predicted ratings
# Clip to valid rating range
R_pred_svd = np.clip(R_reconstructed, 1, 5)
print(f'Explained variance ratio (top {K} components): {svd.explained_variance_ratio_.sum():.3f}')
print(f'SVD predicted rating for user_000 on item_005: {R_pred_svd[0, 5]:.3f}')
# Show latent factor shape
print(f'User factor matrix U_k  : {U_k.shape} (each user is a {K}-dim vector)')
print(f'Item factor matrix V_k.T: {V_k.T.shape} (each item is a {K}-dim vector)')
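One detail worth verifying: fit_transform returns \(U_k \Sigma_k\), not \(U_k\) alone, which is why the reconstruction above is simply U_k @ V_k with no explicit \(\Sigma\). A quick sanity check on a throwaway matrix (using algorithm='arpack' for an exact truncated decomposition; the default randomized solver would only match approximately):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((100, 200))

svd = TruncatedSVD(n_components=20, algorithm='arpack', random_state=42)
U_k = svd.fit_transform(X)   # (100, 20): this is U * Sigma, not U alone
V_k = svd.components_        # (20, 200): rows are right singular vectors

# Since X v_i = sigma_i u_i for exact singular pairs, projecting X onto the
# item factors recovers U * Sigma directly
print(np.allclose(X @ V_k.T, U_k))  # prints True
```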
5. ALS: Alternating Least Squares
Idea: Factorize \(R \approx P Q^T\) by alternately fixing \(Q\) and solving for \(P\) in closed form, then fixing \(P\) and solving for \(Q\). Each step is a simple ridge regression.
ALS handles missing data better than vanilla SVD because it only tries to reconstruct observed ratings.
The closed-form update for user \(u\)'s factors is \(p_u = \left(Q_{I_u}^T Q_{I_u} + \lambda I\right)^{-1} Q_{I_u}^T r_{I_u}\), where \(I_u\) is the set of items rated by user \(u\), \(Q_{I_u}\) stacks the factors of those items, and \(I\) is the \(K \times K\) identity; the item update is symmetric.
def als(R, K=10, n_epochs=20, lam=0.1, verbose=True):
    """
    Minimal ALS implementation.
    R       : (n_users, n_items) rating matrix (0 = unobserved)
    K       : number of latent factors
    n_epochs: number of ALS iterations
    lam     : regularization strength
    """
    n_users, n_items = R.shape
    observed = R > 0
    # Random initialization
    rng = np.random.default_rng(42)
    P = rng.normal(0, 0.1, (n_users, K))  # user factors
    Q = rng.normal(0, 0.1, (n_items, K))  # item factors
    reg = lam * np.eye(K)
    for epoch in range(n_epochs):
        # Fix Q, solve for P
        for u in range(n_users):
            I_u = np.where(observed[u])[0]
            if len(I_u) == 0:
                continue
            Q_u = Q[I_u]     # (|I_u|, K)
            r_u = R[u, I_u]  # (|I_u|,)
            P[u] = np.linalg.solve(Q_u.T @ Q_u + reg, Q_u.T @ r_u)
        # Fix P, solve for Q
        for i in range(n_items):
            U_i = np.where(observed[:, i])[0]
            if len(U_i) == 0:
                continue
            P_i = P[U_i]     # (|U_i|, K)
            r_i = R[U_i, i]  # (|U_i|,)
            Q[i] = np.linalg.solve(P_i.T @ P_i + reg, P_i.T @ r_i)
        # RMSE on observed ratings
        R_hat = P @ Q.T
        rmse = np.sqrt(((R[observed] - R_hat[observed]) ** 2).mean())
        if verbose and (epoch + 1) % 5 == 0:
            print(f'  Epoch {epoch+1:3d}  RMSE (observed): {rmse:.4f}')
    return P, Q
print('Running ALS (K=10, 20 epochs)...')
P, Q = als(R, K=10, n_epochs=20, lam=0.1)
R_pred_als = np.clip(P @ Q.T, 1, 5)
print(f'ALS predicted rating for user_000 on item_005: {R_pred_als[0, 5]:.3f}')
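Each inner update in als is just a ridge regression solved in closed form. As a cross-check (a standalone sketch with random factors, not part of the notebook's pipeline), the single-user solve matches sklearn's Ridge when fit_intercept=False so the objectives line up:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
K = 10
Q_u = rng.normal(0, 0.1, (25, K))       # item factors for the 25 items user u rated
r_u = rng.integers(1, 6, 25).astype(float)
lam = 0.1

# Closed-form ALS step: (Q^T Q + lam*I)^-1 Q^T r
p_u = np.linalg.solve(Q_u.T @ Q_u + lam * np.eye(K), Q_u.T @ r_u)

# Same problem via sklearn; Ridge minimizes ||r - Q p||^2 + alpha * ||p||^2
ridge = Ridge(alpha=lam, fit_intercept=False).fit(Q_u, r_u)
print(np.allclose(p_u, ridge.coef_))  # prints True
```

This is why ALS parallelizes so well: every user's (and every item's) update is an independent small least-squares problem.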
6. Evaluation
Train/test split for recommenders: randomly hide some observed ratings, treat them as the test set.
Metrics:
RMSE: how accurate are the predicted ratings?
Precision@K: of the top-K recommendations, how many are relevant?
Recall@K: of all relevant items, how many appear in the top-K?
NDCG@K: normalized discounted cumulative gain; rewards ranking relevant items higher
# --- Train / Test Split ---
observed_indices = list(zip(*np.where(R > 0)))
rng = np.random.default_rng(0)
rng.shuffle(observed_indices)
n_test = int(0.2 * len(observed_indices))
test_idx = observed_indices[:n_test]
train_idx = observed_indices[n_test:]
R_train = np.zeros_like(R)
for u, i in train_idx:
    R_train[u, i] = R[u, i]
# Retrain ALS on train split
P_tr, Q_tr = als(R_train, K=10, n_epochs=20, lam=0.1, verbose=False)
R_hat_tr = np.clip(P_tr @ Q_tr.T, 1, 5)
# --- RMSE ---
test_u = [u for u, i in test_idx]
test_i = [i for u, i in test_idx]
rmse = np.sqrt(np.mean((R[test_u, test_i] - R_hat_tr[test_u, test_i]) ** 2))
print(f'Test RMSE: {rmse:.4f}')
# --- Precision@K, Recall@K, NDCG@K ---
def precision_recall_ndcg_at_k(R_true, R_pred, k=10, threshold=4.0):
    precisions, recalls, ndcgs = [], [], []
    for u in range(R_true.shape[0]):
        relevant = set(np.where(R_true[u] >= threshold)[0])
        if not relevant:
            continue
        top_k = np.argsort(R_pred[u])[::-1][:k]
        hits = [1 if i in relevant else 0 for i in top_k]
        precisions.append(sum(hits) / k)
        recalls.append(sum(hits) / len(relevant))
        # NDCG
        dcg = sum(h / np.log2(rank + 2) for rank, h in enumerate(hits))
        idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(relevant), k)))
        ndcgs.append(dcg / idcg if idcg > 0 else 0)
    return np.mean(precisions), np.mean(recalls), np.mean(ndcgs)
# Note: ground truth here is the full R (train + test), so these ranking metrics are optimistic
p, r, ndcg = precision_recall_ndcg_at_k(R, R_hat_tr, k=10, threshold=4.0)
print(f'Precision@10 : {p:.4f}')
print(f'Recall@10 : {r:.4f}')
print(f'NDCG@10 : {ndcg:.4f}')
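For calibration, it helps to compare against a trivial baseline. Because the synthetic ratings are uniformly random, predicting the global train mean is essentially the best achievable here: there is no structure for ALS to exploit, so expect its test RMSE to hover near this number. A standalone sketch that rebuilds the same split:

```python
import numpy as np

np.random.seed(42)
mask = np.random.rand(100, 200) < 0.05
R = np.random.randint(1, 6, size=(100, 200)).astype(float)
R[~mask] = 0.0

# Same 80/20 split over observed ratings as in the notebook
obs = list(zip(*np.where(R > 0)))
rng = np.random.default_rng(0)
rng.shuffle(obs)
n_test = int(0.2 * len(obs))
test_idx, train_idx = obs[:n_test], obs[n_test:]

train_vals = np.array([R[u, i] for u, i in train_idx])
test_vals = np.array([R[u, i] for u, i in test_idx])

# Predict the global train mean for every test rating
global_mean = train_vals.mean()
rmse_baseline = np.sqrt(((test_vals - global_mean) ** 2).mean())
print(f'Global-mean baseline test RMSE: {rmse_baseline:.4f}')
```

On real data, a good model should beat this baseline clearly; on this synthetic data it will not, which is a useful reminder that evaluation numbers only mean something relative to a baseline.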
7. The Cold Start Problem
Collaborative filtering breaks down in two scenarios:
| Scenario | Problem | Solutions |
|---|---|---|
| New user | No ratings yet, so similarities can't be computed | Ask for preferences during onboarding, use demographic/contextual features, default to popularity |
| New item | No ratings yet, so it is never recommended | Content-based fallback: use item metadata for initial recommendations |
| Long tail | Obscure items are rarely rated | Hybrid models, content features for rare items |
# Simulate cold start: new user with 0, 1, 3, 5 ratings
def cold_start_demo(R, item_sim, n_seed_ratings=3, k=10):
    """
    For a new user with only n_seed_ratings, show item-based CF recommendations.
    Content-based would be preferable at very low counts.
    """
    # Use user 0's ratings as ground truth; sample n_seed_ratings of them as the new user's history
    rng = np.random.default_rng(7)
    rated_items = np.where(R[0] > 0)[0]
    seed_items = rng.choice(rated_items, size=min(n_seed_ratings, len(rated_items)), replace=False)
    new_user_ratings = np.zeros(N_ITEMS)
    new_user_ratings[seed_items] = R[0, seed_items]
    # Item-based score: sum of similarities to rated items, weighted by rating
    scores = item_sim @ new_user_ratings
    scores[seed_items] = -np.inf  # don't recommend already-rated items
    top_k = np.argsort(scores)[::-1][:k]
    return top_k
for n_seed in [0, 1, 3, 5]:
    recs = cold_start_demo(R, item_sim, n_seed_ratings=n_seed)
    print(f'Seed ratings={n_seed:1d} -> top-3 recs: {[item_ids[i] for i in recs[:3]]}')
print('\nAt 0 seed ratings, fall back to global popularity ranking:')
popularity = (R > 0).sum(axis=0) # number of users who rated each item
popular_items = np.argsort(popularity)[::-1][:5]
print(f' Most popular items: {[item_ids[i] for i in popular_items]}')
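The table above also mentions a content-based fallback for new items. A minimal sketch, assuming hypothetical one-hot genre metadata (invented here; the synthetic dataset has no real metadata): a brand-new item with zero ratings still gets neighbors from its features alone.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(7)
N_ITEMS = 200
genres = rng.integers(0, 5, N_ITEMS)            # hypothetical genre label per item
item_features = np.eye(5)[genres]               # one-hot genre features, (200, 5)

content_sim = cosine_similarity(item_features)  # similarity from metadata only
np.fill_diagonal(content_sim, 0)

# No ratings needed: neighbors come entirely from item features
new_item = 0
neighbors = np.argsort(content_sim[new_item])[::-1][:5]
print(f'Content neighbors of item_000: {[f"item_{j:03d}" for j in neighbors]}')
```

A hybrid system would blend these content scores with collaborative scores once the item accumulates a few ratings.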
8. Cheat Sheet + Exercises
Collaborative Filtering Cheat Sheet
| Method | Similarity | Prediction | Scalability | Cold Start |
|---|---|---|---|---|
| User-based CF | cosine / Pearson on user vectors | weighted avg of similar users' ratings | O(U²I), slow | Bad |
| Item-based CF | cosine on item vectors | weighted avg of similar items the user rated | O(I²U), precomputable | Moderate |
| SVD | n/a | low-rank matrix reconstruction | O(UKI) at inference | Bad |
| ALS | n/a | latent factor dot product | scalable, parallelizable | Bad |
# Core patterns
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import TruncatedSVD
# User sim
user_sim = cosine_similarity(R) # R: (users, items)
# Item sim
item_sim = cosine_similarity(R.T) # R.T: (items, users)
# SVD
svd = TruncatedSVD(n_components=20)
U_k = svd.fit_transform(R_filled)
R_hat = U_k @ svd.components_
Exercises
Mean-centered CF: Subtract each user's mean rating before computing cosine similarity (this accounts for users who rate everything high or low). Compare RMSE with and without mean-centering.
Pearson correlation: Replace cosine similarity with Pearson correlation for user-based CF. When does Pearson outperform cosine?
ALS with bias terms: Extend the ALS implementation to include user bias \(b_u\) and item bias \(b_i\), so the prediction is \(\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i\). Retrain and compare RMSE.
Implicit feedback ALS: Modify ALS to handle binary implicit feedback (clicked=1, not clicked=0) using the confidence weighting \(c_{ui} = 1 + \alpha \cdot r_{ui}\).
Coverage and serendipity: Compute catalog coverage (% of items ever recommended in top-10 across all users) and serendipity (% of recommendations that are not obviously similar to a user's history). Compare user-based vs item-based CF.