Alternating Least Squares (ALS) for Collaborative Filtering and Matrix Factorisation

Recommendation systems often need to predict what a user will like next—movies, products, courses, or articles—based on historical behaviour. Collaborative filtering is one of the most widely used approaches because it learns from patterns across many users, not just item attributes. A practical way to implement collaborative filtering at scale is matrix factorisation, where we compress a large user–item interaction table into small latent representations. This is where Alternating Least Squares (ALS) becomes especially useful, and it is a core concept you’ll encounter when learning recommendation engines in a data science course in Ahmedabad.

Why Matrix Factorisation Works for Recommendations

Most recommendation problems can be expressed as a sparse matrix:

  • Rows represent users
  • Columns represent items
  • Each cell contains a rating or an interaction signal (view, click, purchase)

In real systems, the matrix is mostly empty because users interact with only a tiny fraction of items. Matrix factorisation addresses this sparsity by assuming that user preferences and item properties can be represented using a small number of hidden factors (also called latent features).

Instead of storing a huge matrix $R$, we approximate it as the product of two smaller matrices:

  • $X$: user-factor matrix (one vector per user)
  • $Y$: item-factor matrix (one vector per item)

The predicted preference becomes a dot product:

$$\hat{r}_{ui} = x_u^\top y_i$$

If the learned vectors are good, $\hat{r}_{ui}$ is high for items the user will likely prefer, even if the user has never interacted with them before.
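As a small illustration of the dot-product prediction, the sketch below scores every user–item pair from two toy factor matrices. The values in `X` and `Y` are made up for the example, and the helper name `predict` is hypothetical.

```python
import numpy as np

# Hypothetical toy factors: 3 users and 4 items, each with 2 latent factors.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])   # user-factor matrix, one row per user
Y = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0],
              [4.0, 2.0]])   # item-factor matrix, one row per item

def predict(u, i):
    """Predicted preference of user u for item i: the dot product of x_u and y_i."""
    return float(X[u] @ Y[i])

# All predictions at once: R_hat[u, i] is the dot product of x_u and y_i.
R_hat = X @ Y.T

# Rank items for user 2 from highest to lowest predicted score.
ranking_user2 = np.argsort(-R_hat[2], kind="stable")
```

Computing `X @ Y.T` in full is only sensible for toy sizes; a real system would typically score one user (or a batch of users) at a time.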

What ALS Optimises and Why It’s “Alternating”

ALS is an optimisation method that learns $X$ and $Y$ by minimising prediction error on known interactions while preventing overfitting through regularisation. For explicit ratings (like 1–5 stars), a common objective is:

$$\min_{X,Y} \sum_{(u,i)\in \Omega} (r_{ui} - x_u^\top y_i)^2 + \lambda\left(\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2\right)$$

Here, $\Omega$ is the set of observed user–item pairs, and $\lambda$ controls regularisation.

The key challenge is that the objective is not jointly convex in $X$ and $Y$ together. However, it is convex in one when the other is fixed. ALS uses this property:

  1. Fix $Y$, solve for all user vectors in $X$ using least squares
  2. Fix $X$, solve for all item vectors in $Y$ using least squares
  3. Repeat until convergence or a set number of iterations

This “alternating” approach breaks a hard problem into a sequence of easier least-squares problems—hence the name Alternating Least Squares. Understanding this optimisation flow is a practical milestone for anyone building recommender systems through a data science course in Ahmedabad.
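To make the alternating steps concrete: with the item factors fixed, setting the gradient of the explicit-rating objective with respect to one user vector to zero gives a ridge-regression solution over that user's observed items $I_u$:

$$x_u = \Big(\sum_{i \in I_u} y_i y_i^\top + \lambda I\Big)^{-1} \sum_{i \in I_u} r_{ui}\, y_i$$

The item update is symmetric. The following is a minimal NumPy sketch of that loop, not a production implementation. It assumes a small dense ratings array with a boolean `mask` marking observed entries; the function name `als_explicit`, its defaults, and the toy data are all hypothetical.

```python
import numpy as np

def als_explicit(R, mask, rank=2, lam=0.1, n_iters=20, seed=0):
    """Plain ALS on a dense array R; mask[u, i] is True where R[u, i] is observed."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, rank))
    Y = rng.normal(scale=0.1, size=(n_items, rank))
    reg = lam * np.eye(rank)

    def solve_side(fixed, ratings, observed):
        # One ridge-regression solve per row, using only that row's observed entries.
        out = np.zeros((ratings.shape[0], rank))
        for r in range(ratings.shape[0]):
            idx = observed[r]
            F = fixed[idx]                    # factors of the observed counterparts
            A = F.T @ F + reg
            b = F.T @ ratings[r, idx]
            out[r] = np.linalg.solve(A, b)
        return out

    history = []
    for _ in range(n_iters):
        X = solve_side(Y, R, mask)        # step 1: fix Y, solve for user vectors
        Y = solve_side(X, R.T, mask.T)    # step 2: fix X, solve for item vectors
        err = (R - X @ Y.T)[mask]
        loss = float(err @ err + lam * ((X ** 2).sum() + (Y ** 2).sum()))
        history.append(loss)              # regularised objective after each sweep
    return X, Y, history

# Toy data. Assumption for this example only: 0 marks a missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
X, Y, history = als_explicit(R, mask)
```

Because each half-step minimises the objective exactly over one block of variables, the values in `history` should never increase from one sweep to the next, which is a handy sanity check when debugging an ALS implementation.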

ALS for Implicit Feedback: A Real-World Fit

Many platforms don’t have star ratings. Instead, they have implicit feedback such as clicks, watch time, cart adds, or purchases. In these settings, “missing” entries do not mean dislike—they often mean “unknown.” ALS is popular here because it supports a weighted formulation that treats observed interactions with higher confidence than unobserved ones.

A common implicit ALS setup defines:

  • $p_{ui}$: preference (often 1 if interacted, 0 otherwise)
  • $c_{ui}$: confidence (higher when interaction strength is higher)

The objective becomes:

$$\min_{X,Y} \sum_{u,i} c_{ui}(p_{ui} - x_u^\top y_i)^2 + \lambda(\|X\|^2+\|Y\|^2)$$

The sum now runs over every user–item pair rather than only the observed set $\Omega$, with unobserved pairs contributing at low confidence. This makes ALS practical for large-scale recommendation engines where data is abundant but noisy.
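A sketch of the per-user solve for this weighted objective may help. It assumes one common choice of confidence, $c_{ui} = 1 + \alpha\, n_{ui}$, where $n_{ui}$ is a raw interaction count; that linear form, the value of `alpha`, and the function name `implicit_user_update` are assumptions for illustration, not something the objective above prescribes. With $C^u$ the diagonal matrix of confidences for user $u$, the solution is $x_u = (Y^\top C^u Y + \lambda I)^{-1} Y^\top C^u p_u$.

```python
import numpy as np

def implicit_user_update(Y, counts_u, lam=0.1, alpha=10.0):
    """Solve for one user's vector given item factors Y and raw interaction counts."""
    rank = Y.shape[1]
    p = (counts_u > 0).astype(float)      # preference p_ui: 1 if interacted, else 0
    c = 1.0 + alpha * counts_u            # confidence c_ui (assumed linear form)
    seen = np.flatnonzero(counts_u)       # indices of items the user interacted with
    Y_seen = Y[seen]
    # Y^T C Y = Y^T Y + Y_seen^T diag(c_seen - 1) Y_seen, since c - 1 is zero elsewhere.
    A = Y.T @ Y + (Y_seen.T * (c[seen] - 1.0)) @ Y_seen + lam * np.eye(rank)
    # Y^T C p only needs the seen items, because p is zero for unseen ones.
    b = Y_seen.T @ (c[seen] * p[seen])
    return np.linalg.solve(A, b)

# Hypothetical toy input: 6 items with 3 latent factors, one user's counts.
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 3))
counts_u = np.array([0, 2, 0, 0, 5, 1], dtype=float)
x_u = implicit_user_update(Y, counts_u)
```

The split of $Y^\top C^u Y$ into a shared term `Y.T @ Y` plus a correction over the seen items is what keeps the all-pairs sum affordable: in a full implementation the shared term would be computed once per sweep and reused for every user, rather than recomputed inside the function as it is here.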

Practical Implementation Notes That Matter

Choosing hyperparameters

Three parameters strongly influence outcomes:

  • Rank (number of latent factors): Higher rank captures more nuance but can overfit and slow training.
  • Regularisation ($\lambda$): Controls overfitting; too low a value lets the model memorise noise, while too high a value shrinks the factors and underfits.
  • Iterations: More iterations usually reduce training error, but improvements plateau.

A simple starting point is moderate rank (e.g., 20–100), modest regularisation, and 10–20 iterations, then tune using validation metrics.

Scaling and speed

ALS is often used in distributed frameworks because each update step decomposes into many independent least-squares solves—one per user or item. This parallel structure is why ALS appears in tools like Apache Spark’s MLlib. If you are working with millions of interactions, this scalability is a major advantage.
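The independence of the per-user solves can be shown on a single machine. In the sketch below a thread pool stands in for a cluster of workers; this is only an illustration of the parallel structure and not how Spark MLlib is implemented. The function names and toy data are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def solve_user(args):
    """Ridge solve for a single user; needs only that user's ratings and the fixed Y."""
    Y, ratings_u, observed_u, lam = args
    F = Y[observed_u]                         # factors of the items this user rated
    A = F.T @ F + lam * np.eye(Y.shape[1])
    return np.linalg.solve(A, F.T @ ratings_u[observed_u])

def update_users_parallel(Y, R, mask, lam=0.1, workers=4):
    """Run every user's solve as a separate task; no task depends on another."""
    tasks = [(Y, R[u], mask[u], lam) for u in range(R.shape[0])]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.vstack(list(pool.map(solve_user, tasks)))

# Hypothetical toy input: 4 users, 5 items, 2 latent factors.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))
R = rng.integers(1, 6, size=(4, 5)).astype(float)
mask = rng.random((4, 5)) < 0.7
X_par = update_users_parallel(Y, R, mask)
X_seq = np.vstack([solve_user((Y, R[u], mask[u], 0.1)) for u in range(4)])
```

The parallel and sequential results agree because each task reads the shared `Y` and writes only its own row, which is the property a distributed framework relies on when spreading the solves across machines.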

Evaluation

Offline evaluation typically uses ranking metrics such as Precision@K, Recall@K, MAP, or NDCG, along with careful train/test splits that reflect time order. In production, online A/B testing is important because the true goal is user impact, not just offline accuracy.
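For reference, three of these metrics can be sketched for a single user with binary relevance. The definitions below follow one common convention (Precision@K divides by K; NDCG uses a log2 discount); other conventions exist, and the function names and sample data are hypothetical.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Discounted gain of the top k, normalised by the best achievable ordering."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: model ranking versus held-out items for one user.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}
p3 = precision_at_k(ranked, relevant, 3)
r3 = recall_at_k(ranked, relevant, 3)
n3 = ndcg_at_k(ranked, relevant, 3)
```

In practice these per-user values are averaged over all test users, with `relevant` taken from interactions that occur after the training cutoff.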

Cold start limitations

ALS relies on interaction history, so it struggles with new users or new items. Practical systems combine ALS with business rules, popularity baselines, or content-based features to handle cold start.
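One simple way to combine ALS with a popularity baseline is a fallback at serving time, sketched below. The `recommend` function, the `user_index` mapping, and all values are hypothetical, and a real system would likely also filter out items the user has already seen.

```python
import numpy as np

def recommend(user_id, user_index, X, Y, item_popularity, k=3):
    """Top-k item indices: ALS scores for known users, popularity for new ones."""
    if user_id in user_index:
        scores = Y @ X[user_index[user_id]]   # dot product with every item vector
    else:
        scores = item_popularity              # cold start: no factor vector exists yet
    return np.argsort(-scores, kind="stable")[:k].tolist()

# Hypothetical toy model: 2 known users, 4 items, 2 latent factors.
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Y = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5],
              [0.1, 0.1]])
user_index = {"alice": 0, "bob": 1}
item_popularity = np.array([10.0, 50.0, 30.0, 5.0])   # e.g. interaction counts

known = recommend("alice", user_index, X, Y, item_popularity)
new = recommend("carol", user_index, X, Y, item_popularity)
```

Here `"carol"` has no learned vector, so she receives the most popular items, while `"alice"` gets a personalised ranking from her factors.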

Conclusion

Alternating Least Squares is a reliable and scalable optimisation method for collaborative filtering via matrix factorisation. By alternating between solving user vectors and item vectors through least squares, ALS turns a complex recommendation problem into a sequence of efficient subproblems. It is especially strong for implicit feedback scenarios and large datasets, making it a common choice in real-world recommenders. If you want to move from theory to building working recommendation pipelines—with tuning, evaluation, and deployment considerations—studying ALS in a data science course in Ahmedabad can give you a clear, hands-on pathway to mastering collaborative filtering.