Recommender Systems

📹 Related Topic (covered in various lectures)

Predicting User Preferences at Scale

What are Recommender Systems?

Goal: Predict items a user will like based on past behavior

Applications:

  • E-commerce: Amazon product recommendations

  • Streaming: Netflix movie suggestions, Spotify playlists

  • Social Media: Facebook friend suggestions, YouTube videos

  • News: Personalized article feeds

Key Challenge: Most users have rated/interacted with only a tiny fraction of items

Two Main Approaches

1. Content-Based Filtering

Idea: Recommend items similar to what user liked before

How it works:

  • Extract features from items (genre, director, keywords)

  • Build user profile based on features of liked items

  • Recommend items matching user profile

Example - Movies:

  • User liked action movies with Tom Cruise

  • Recommend other Tom Cruise action movies

Advantages:

  • No need for other users’ data

  • Can recommend new/unpopular items

  • Can provide explanations

Disadvantages:

  • Limited to features you can extract

  • Can’t learn from other users

  • Tends to recommend similar items (filter bubble)
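The content-based pipeline above (extract features, build a profile, match) can be sketched in a few lines. This is a toy illustration: the movie names and feature values are invented, and real systems use far richer features (keywords, embeddings, metadata).

```python
import numpy as np

# Hypothetical item features: [action, comedy, has_tom_cruise]
movies = {
    "Top Gun":          np.array([1.0, 0.0, 1.0]),
    "Mission Impossible": np.array([1.0, 0.0, 1.0]),
    "Superbad":         np.array([0.0, 1.0, 0.0]),
}

# User profile = average feature vector of liked items
liked = ["Top Gun"]
profile = np.mean([movies[m] for m in liked], axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Score unseen items by similarity to the user profile
scores = {m: cosine(profile, f) for m, f in movies.items() if m not in liked}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # "Mission Impossible" ranks above "Superbad"
```

Note how the top recommendation is another Tom Cruise action movie, exactly the behavior (and the filter-bubble risk) described above.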

2. Collaborative Filtering

Idea: Use the wisdom of the crowd - “Users who liked this also liked…”

User-User Collaborative Filtering:

  • Find users similar to you

  • Recommend items they liked

Item-Item Collaborative Filtering:

  • Find items similar to ones you liked

  • Based on what other users rated similarly
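Item-item collaborative filtering can be sketched directly on a small rating matrix: compute similarity between item columns, then predict a missing rating as a similarity-weighted average of the user's other ratings. The ratings here are made up for illustration.

```python
import numpy as np

# Rows = users, columns = items; 0 means "not rated"
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 2],
    [1, 2, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Item-item similarity from the rating columns
n_items = R.shape[1]
sim = np.array([[cosine(R[:, i], R[:, j]) for j in range(n_items)]
                for i in range(n_items)])

# Predict user 3's rating for item 0 as a similarity-weighted
# average of that user's observed ratings
u = 3
rated = np.where(R[u] > 0)[0]
pred = sim[0, rated] @ R[u, rated] / sim[0, rated].sum()
print(f"Predicted rating for user {u}, item 0: {pred:.2f}")
```

A real implementation would mean-center ratings and keep only the top-k most similar neighbors, but the weighted-average core is the same.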

Matrix Factorization:

The Problem:

User-Item Rating Matrix (sparse):
        Movie1  Movie2  Movie3  Movie4
User1     5       ?       ?       3
User2     ?       4       5       ?
User3     3       ?       ?       ?
User4     ?       5       ?       4

The Solution: Factor into two low-rank matrices:

R ≈ U × V^T


where:

  • U: User feature matrix (users × k factors)

  • V: Item feature matrix (items × k factors)

  • k: Number of latent factors (much smaller than # users or items)

Latent Factors:

  • Not explicitly defined

  • Learned automatically

  • Might capture: genre, era, style, etc.
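The role of latent factors can be seen in a tiny hand-built example. The factor values below are invented for illustration; in practice they are learned, not assigned.

```python
import numpy as np

# Two hypothetical latent factors: ["action-ness", "comedy-ness"]
# User factor vectors (rows of U)
alice = np.array([0.9, 0.1])   # loves action
bob   = np.array([0.1, 0.9])   # loves comedy

# Item factor vectors (rows of V)
die_hard = np.array([1.0, 0.0])
superbad = np.array([0.0, 1.0])

# Predicted affinity = dot product u_i^T v_j, i.e. one entry of U V^T
print(alice @ die_hard, alice @ superbad)  # 0.9 0.1
print(bob @ die_hard, bob @ superbad)      # 0.1 0.9
```

Users and items that align along the same latent dimensions get high predicted ratings; the full matrix of predictions is simply U @ V.T.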

Cost Function for Matrix Factorization

Minimize:

J(U,V) = Σ_{(i,j):r_ij exists} (u_i^T v_j - r_ij)² + λ(||U||² + ||V||²)

Optimization:

  • Alternating Least Squares (ALS)

  • Stochastic Gradient Descent
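Since the SGD variant is implemented in full later in these notes, here is a minimal ALS sketch. For clarity it uses a fully observed matrix, so each alternating step is a single ridge regression; production ALS instead solves a small least-squares problem per row over the observed entries only.

```python
import numpy as np

np.random.seed(0)
n_users, n_items, k, lam = 20, 15, 3, 0.1

# Fully observed rank-k ratings, just to demonstrate the updates
R = np.random.randn(n_users, k) @ np.random.randn(k, n_items)

U = np.random.randn(n_users, k)
V = np.random.randn(n_items, k)

for _ in range(20):
    # Fix V, solve the ridge problem for U: U = R V (V^T V + λI)^{-1}
    U = R @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))
    # Fix U, solve symmetrically for V
    V = R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))

rmse = np.sqrt(np.mean((R - U @ V.T) ** 2))
print(f"Reconstruction RMSE after ALS: {rmse:.4f}")
```

Each half-step has a closed-form solution and the per-row solves are independent, which is why ALS parallelizes well (e.g. in Spark MLlib).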

Advanced Techniques

1. Deep Learning for Recommendations

  • Neural Collaborative Filtering

  • Autoencoders for collaborative filtering

  • Two-tower models

2. Hybrid Systems

  • Combine content-based + collaborative filtering

  • The Netflix Prize-winning solution used an ensemble of 100+ models

3. Context-Aware Recommendations

  • Time of day

  • Device type

  • Current location

  • Social context

Evaluation Metrics

Explicit Feedback (ratings):

  • RMSE (Root Mean Squared Error)

  • MAE (Mean Absolute Error)

Implicit Feedback (clicks, views):

  • Precision@K

  • Recall@K

  • MAP (Mean Average Precision)

  • NDCG (Normalized Discounted Cumulative Gain)
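The ranking metrics above can be computed directly from a recommended list and the set of items the user actually engaged with. A minimal sketch (item IDs are arbitrary):

```python
import numpy as np

recommended = [3, 1, 7, 5, 9]   # top-5 ranked list from the model
relevant = {1, 5, 6}            # items the user actually engaged with
K = 5

hits = [1 if item in relevant else 0 for item in recommended[:K]]

precision_at_k = sum(hits) / K
recall_at_k = sum(hits) / len(relevant)

# DCG discounts each hit by log2 of its rank; IDCG is the best possible DCG
dcg = sum(h / np.log2(rank + 2) for rank, h in enumerate(hits))
idcg = sum(1 / np.log2(rank + 2) for rank in range(min(len(relevant), K)))
ndcg = dcg / idcg

print(f"Precision@{K}: {precision_at_k:.2f}")  # 0.40
print(f"Recall@{K}:    {recall_at_k:.2f}")     # 0.67
print(f"NDCG@{K}:      {ndcg:.2f}")
```

NDCG rewards placing relevant items near the top of the list, which is why it tracks user satisfaction better than position-blind metrics.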

Business Metrics:

  • Click-through rate

  • Conversion rate

  • User engagement time

Challenges

Cold Start Problem:

  • New users: No history to base recommendations on

  • New items: No ratings yet

  • Solutions: Hybrid approaches, ask for initial preferences

Scalability:

  • Millions of users × millions of items

  • Real-time recommendations

  • Solutions: Approximate methods, distributed computing

Filter Bubble:

  • Recommends only similar items

  • Reduces serendipity and discovery

  • Solutions: Exploration vs exploitation, diversity metrics

Real-World Considerations

A/B Testing:

  • Test recommendation algorithms live

  • Measure impact on business metrics

Diversity vs Accuracy:

  • Most accurate ≠ most useful

  • Need variety in recommendations

Explainability:

  • “Because you watched X”

  • Builds trust and helps users discover new content

Fairness:

  • Avoid amplifying biases

  • Equal exposure for all items

  • Fair treatment of all users

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error
import pandas as pd

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
np.random.seed(42)

print("Libraries loaded!")

13.1 Create Movie Rating Dataset

What: Generating a synthetic user-movie rating matrix

We create a simulated recommendation problem by generating a sparse rating matrix from known latent factors. Each user \(u\) has a latent preference vector \(p_u \in \mathbb{R}^k\) and each movie \(m\) has a latent feature vector \(q_m \in \mathbb{R}^k\); the true rating is \(r_{um} = p_u^T q_m\). We then observe only 30% of entries (simulating the sparsity of real rating data) while retaining the full matrix as ground truth for evaluation.

Why: Understanding the data structure that recommender systems exploit

In real systems like Netflix or Amazon, the user-item interaction matrix is extremely sparse – a typical user rates fewer than 1% of available items. The fundamental assumption of collaborative filtering is that this matrix is approximately low-rank: user preferences and item characteristics can be described by a small number of latent factors \(k\) (e.g., genre preference, quality, complexity). Generating data from a known low-rank model lets us verify that our algorithm can recover the true structure and predict missing entries.

# Create synthetic movie ratings
n_users = 50
n_movies = 30
n_factors = 5

# True latent factors
np.random.seed(42)
true_user_factors = np.random.randn(n_users, n_factors)
true_movie_factors = np.random.randn(n_movies, n_factors)

# Generate ratings
true_ratings = true_user_factors @ true_movie_factors.T
true_ratings = 1 + 4 * (true_ratings - true_ratings.min()) / (true_ratings.max() - true_ratings.min())

# Create sparse observed ratings (30% observed)
mask = np.random.rand(n_users, n_movies) < 0.3
observed_ratings = true_ratings.copy()
observed_ratings[~mask] = 0

print(f"Total possible ratings: {n_users * n_movies}")
print(f"Observed ratings: {mask.sum()} ({100*mask.sum()/(n_users*n_movies):.1f}%)")
print(f"Missing ratings: {(~mask).sum()}")

# Visualize rating matrix
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# True ratings
im1 = axes[0].imshow(true_ratings, cmap='YlOrRd', aspect='auto')
axes[0].set_xlabel('Movies', fontsize=12)
axes[0].set_ylabel('Users', fontsize=12)
axes[0].set_title('True Ratings (Complete)', fontsize=13, fontweight='bold')
plt.colorbar(im1, ax=axes[0], label='Rating (1-5)')

# Observed ratings (with missing)
masked_ratings = observed_ratings.copy()
masked_ratings[~mask] = np.nan
im2 = axes[1].imshow(masked_ratings, cmap='YlOrRd', aspect='auto')
axes[1].set_xlabel('Movies', fontsize=12)
axes[1].set_ylabel('Users', fontsize=12)
axes[1].set_title('Observed Ratings (Sparse)', fontsize=13, fontweight='bold')
plt.colorbar(im2, ax=axes[1], label='Rating (1-5)')

plt.tight_layout()
plt.show()

13.2 Matrix Factorization from Scratch

What: Implementing SGD-based matrix factorization with biases

We implement the core recommendation algorithm: decompose the sparse rating matrix \(R \approx PQ^T + b_u + b_m + \mu\), where \(P\) contains user latent factors, \(Q\) contains movie latent factors, \(b_u\) and \(b_m\) are user and movie biases, and \(\mu\) is the global mean rating. Parameters are optimized by stochastic gradient descent on the regularized squared error: \(J = \sum_{(u,m) \in \text{observed}} (r_{um} - \hat{r}_{um})^2 + \lambda(\|P\|^2 + \|Q\|^2)\).

Why: The algorithm behind Netflix-scale recommendations

Matrix factorization was the breakthrough technique in the Netflix Prize competition (2006-2009) and remains the backbone of most production recommendation systems. The biases capture systematic effects (some users rate generously, some movies are universally popular), while the latent factor dot product \(p_u^T q_m\) captures the personalized preference matching. Understanding this decomposition – and why regularization is essential to prevent overfitting on the sparse observed entries – is foundational for working on any recommendation system.

class MatrixFactorization:
    def __init__(self, n_factors=10, learning_rate=0.01, regularization=0.1, n_epochs=100):
        self.n_factors = n_factors
        self.lr = learning_rate
        self.reg = regularization
        self.n_epochs = n_epochs
        
    def fit(self, R, mask):
        """
        R: rating matrix (n_users x n_movies)
        mask: boolean matrix indicating observed ratings
        """
        n_users, n_movies = R.shape
        
        # Initialize factors
        self.P = np.random.randn(n_users, self.n_factors) * 0.01
        self.Q = np.random.randn(n_movies, self.n_factors) * 0.01
        self.user_bias = np.zeros(n_users)
        self.movie_bias = np.zeros(n_movies)
        self.global_mean = R[mask].mean()
        
        self.train_errors = []
        
        # Gradient descent
        for epoch in range(self.n_epochs):
            # Shuffle order
            users, movies = np.where(mask)
            indices = np.arange(len(users))
            np.random.shuffle(indices)
            
            for idx in indices:
                u, i = users[idx], movies[idx]
                r_ui = R[u, i]
                
                # Prediction
                pred = self.global_mean + self.user_bias[u] + self.movie_bias[i] + \
                       self.P[u] @ self.Q[i]
                
                # Error
                error = r_ui - pred
                
                # Update biases
                self.user_bias[u] += self.lr * (error - self.reg * self.user_bias[u])
                self.movie_bias[i] += self.lr * (error - self.reg * self.movie_bias[i])
                
                # Update factors
                P_u = self.P[u].copy()
                self.P[u] += self.lr * (error * self.Q[i] - self.reg * self.P[u])
                self.Q[i] += self.lr * (error * P_u - self.reg * self.Q[i])
            
            # Compute RMSE
            predictions = self.predict()
            train_rmse = np.sqrt(np.mean((R[mask] - predictions[mask])**2))
            self.train_errors.append(train_rmse)
            
            if (epoch + 1) % 20 == 0:
                print(f"Epoch {epoch+1}/{self.n_epochs}, RMSE: {train_rmse:.4f}")
    
    def predict(self):
        """Predict all ratings"""
        predictions = self.global_mean + \
                     self.user_bias[:, np.newaxis] + \
                     self.movie_bias[np.newaxis, :] + \
                     self.P @ self.Q.T
        return np.clip(predictions, 1, 5)

# Train model
mf = MatrixFactorization(n_factors=5, learning_rate=0.01, regularization=0.02, n_epochs=100)
mf.fit(observed_ratings, mask)

# Plot training curve
plt.figure(figsize=(10, 6))
plt.plot(mf.train_errors, linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('RMSE', fontsize=12)
plt.title('Matrix Factorization Training', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

13.3 Evaluate Predictions

What: Measuring recommendation quality with RMSE on observed and missing ratings

We evaluate the trained matrix factorization model on two sets: the observed ratings (which should have low error after training) and the held-out missing ratings (which test whether the model has learned meaningful latent structure). The gap between these two errors reveals whether the model is generalizing or merely memorizing the training entries.

Why: Proper evaluation of recommender systems is surprisingly tricky

In a real system, you never observe the missing ratings, so you cannot directly compute prediction error on them. Instead, practitioners use temporal splits (train on past, test on future) or held-out random splits of the observed ratings. Our simulated setup, where ground truth exists for all entries, lets us directly verify that the latent factor model is recovering the true underlying preferences – a luxury that motivates building and testing on synthetic data before deploying on real data.

# Get predictions
predicted_ratings = mf.predict()

# Evaluate on observed ratings
observed_true = observed_ratings[mask]
observed_pred = predicted_ratings[mask]
rmse_observed = np.sqrt(mean_squared_error(observed_true, observed_pred))

# Evaluate on missing ratings (we have ground truth from simulation)
missing_true = true_ratings[~mask]
missing_pred = predicted_ratings[~mask]
rmse_missing = np.sqrt(mean_squared_error(missing_true, missing_pred))

print(f"RMSE on observed ratings: {rmse_observed:.4f}")
print(f"RMSE on missing ratings: {rmse_missing:.4f}")
print(f"\nNote: In practice, we don't have ground truth for missing ratings!")

# Visualize predictions
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

im1 = axes[0].imshow(true_ratings, cmap='YlOrRd', aspect='auto')
axes[0].set_title('True Ratings', fontsize=13, fontweight='bold')
axes[0].set_xlabel('Movies')
axes[0].set_ylabel('Users')
plt.colorbar(im1, ax=axes[0])

im2 = axes[1].imshow(predicted_ratings, cmap='YlOrRd', aspect='auto')
axes[1].set_title('Predicted Ratings', fontsize=13, fontweight='bold')
axes[1].set_xlabel('Movies')
axes[1].set_ylabel('Users')
plt.colorbar(im2, ax=axes[1])

im3 = axes[2].imshow(np.abs(true_ratings - predicted_ratings), cmap='Reds', aspect='auto')
axes[2].set_title('Absolute Error', fontsize=13, fontweight='bold')
axes[2].set_xlabel('Movies')
axes[2].set_ylabel('Users')
plt.colorbar(im3, ax=axes[2])

plt.tight_layout()
plt.show()

13.4 Top-N Recommendations

What: Generating personalized movie recommendations for a specific user

For a target user, we predict ratings for all movies the user has not yet seen, rank them by predicted rating, and present the top \(N\) items as recommendations. This is the core output of any recommender system – converting a matrix of predicted scores into an actionable ranked list.

Why: The shift from rating prediction to ranking

While RMSE measures the accuracy of individual rating predictions, users ultimately experience recommendations as a ranked list. A model that accurately predicts the top few movies matters more than one with slightly lower RMSE across all predictions. In production systems, ranking metrics like Precision@K, NDCG, and MAP better capture user satisfaction than RMSE. This example bridges the gap between the optimization objective (minimize squared error) and the user-facing output (a personalized ranked list).

# Generate recommendations for a user
user_id = 0
user_predictions = predicted_ratings[user_id]
user_observed = mask[user_id]

# Get top-5 unseen movies
unseen_movies = np.where(~user_observed)[0]
unseen_predictions = user_predictions[unseen_movies]
top_indices = np.argsort(unseen_predictions)[::-1][:5]
top_movies = unseen_movies[top_indices]

print(f"Top 5 Recommendations for User {user_id}:\n")
for rank, movie_id in enumerate(top_movies, 1):
    pred_rating = user_predictions[movie_id]
    true_rating = true_ratings[user_id, movie_id]
    print(f"{rank}. Movie {movie_id}: Predicted={pred_rating:.2f}, True={true_rating:.2f}")

# Visualize user's ratings
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(n_movies)
width = 0.35

# Observed ratings
observed_vals = np.where(user_observed, observed_ratings[user_id], 0)
ax.bar(x - width/2, observed_vals, width, label='Observed Ratings', alpha=0.7)

# Predictions for unseen
unseen_vals = np.where(~user_observed, user_predictions, 0)
ax.bar(x + width/2, unseen_vals, width, label='Predicted Ratings (unseen)', alpha=0.7)

# Highlight recommendations
ax.scatter(top_movies, user_predictions[top_movies], s=200, c='red', 
          marker='*', zorder=5, label='Top 5 Recommendations')

ax.set_xlabel('Movie ID', fontsize=12)
ax.set_ylabel('Rating', fontsize=12)
ax.set_title(f'User {user_id} - Ratings and Recommendations', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

13.5 Latent Factor Interpretation

What: Visualizing the learned user and movie embeddings

We display the learned latent factor matrices \(P\) (users) and \(Q\) (movies) as heatmaps. Each row of \(P\) represents a user’s position in the latent space, and each row of \(Q\) represents a movie’s position. The predicted rating for user \(u\) and movie \(m\) is essentially the dot product of their respective rows (plus biases), so users and movies that align along the same latent dimensions will have high predicted ratings.

Why: From black-box predictions to interpretable factors

Although the latent factors are learned purely from rating patterns (no content information), they often correspond to interpretable concepts like genre, era, or quality. Visualizing these factors helps us understand why the model makes certain recommendations and builds trust in the system. In production recommender systems, interpretable factors enable features like “Because you watched action movies…” explanations, which research shows significantly increase user engagement and satisfaction.

# Visualize learned factors
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# User factors
im1 = axes[0].imshow(mf.P, cmap='RdBu_r', aspect='auto')
axes[0].set_xlabel('Latent Factor', fontsize=12)
axes[0].set_ylabel('User ID', fontsize=12)
axes[0].set_title('User Latent Factors', fontsize=13, fontweight='bold')
plt.colorbar(im1, ax=axes[0])

# Movie factors
im2 = axes[1].imshow(mf.Q, cmap='RdBu_r', aspect='auto')
axes[1].set_xlabel('Latent Factor', fontsize=12)
axes[1].set_ylabel('Movie ID', fontsize=12)
axes[1].set_title('Movie Latent Factors', fontsize=13, fontweight='bold')
plt.colorbar(im2, ax=axes[1])

plt.tight_layout()
plt.show()

print("\nLatent Factor Interpretation:")
print("- Each factor represents a hidden preference dimension")
print("- Examples: genre, era, director style, complexity, etc.")
print("- User factors: How much they like each dimension")
print("- Movie factors: How much each movie embodies each dimension")
print("- Rating prediction: Dot product of user and movie factors")

Key Takeaways

1. Matrix Factorization

  • Core idea: Decompose rating matrix R ≈ P @ Q^T

  • P: User latent factors (n_users × k)

  • Q: Movie latent factors (n_movies × k)

  • k: Number of latent dimensions (hyperparameter)

2. Collaborative Filtering

  • Uses only user-item interactions (no content features)

  • Learns from similar users and similar items

  • Cold start problem: Can’t recommend for new users/items

3. Optimization

  • SGD: Update after each rating (used above)

    • Fast, online learning

    • Can be unstable

  • ALS: Alternating Least Squares

    • Fix Q, solve for P (closed form)

    • Fix P, solve for Q

    • More stable, parallelizable

4. Regularization

  • Essential to prevent overfitting

  • L2 penalty on factors: λ(||P||² + ||Q||²)

  • Tune λ using validation set

5. Extensions

  • Biases: Add user and item bias terms

  • Implicit feedback: “Watched” vs “Not watched”

  • Temporal dynamics: Preferences change over time

  • Deep learning: Neural collaborative filtering

6. Evaluation

  • Rating prediction: RMSE, MAE

  • Ranking: Precision@k, NDCG, MAP

  • Online: A/B testing, click-through rate

  • Always use held-out test set

7. Practical Considerations

  • Sparsity: Most users rate few items

  • Scalability: Millions of users × items

  • Cold start: New users/items have no history

  • Solution: Hybrid systems (content + collaborative)

References

  1. “Matrix Factorization Techniques for Recommender Systems” - Koren et al. (2009)

  2. “Collaborative Filtering for Implicit Feedback Datasets” - Hu et al. (2008)

  3. CS229 Lecture Notes on Recommender Systems

Next: Lecture 14: Reinforcement Learning