Stability of Collaborative Filtering Recommendation Algorithms



Recommender systems are becoming a standard component of many e-commerce sites. While much prior research has focused on predictive accuracy as the main measure of recommender system performance, other metrics are becoming increasingly important. This work explores stability as a new measure of recommender system performance. Stability measures the extent to which a recommendation algorithm provides predictions that are consistent with each other. Specifically, for a stable algorithm, adding some of the algorithm’s own predictions to its training data (for example, if these predictions were confirmed as accurate by users) would not invalidate or change its other predictions. Stability is an interesting theoretical property that can deepen our understanding of recommendation algorithms and their performance, but it is also a desirable practical property: unstable recommendations can decrease users’ trust in a recommender system and, as a result, reduce their acceptance of its recommendations. We also provide an extensive empirical evaluation of stability for six popular recommendation algorithms on four real-world datasets. Our results suggest that the stability of individual recommendation algorithms is consistent across a variety of datasets and settings. In particular, we find that model-based recommendation algorithms consistently demonstrate higher stability than neighborhood-based collaborative filtering heuristics. In addition, we perform a comprehensive empirical analysis of many important factors (e.g., the sparsity of the original rating data, normalization of input data, the number of new incoming ratings, the distribution of incoming ratings, and the distribution of evaluation data) and report their impact on recommendation stability.
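To make the definition of stability concrete, here is a minimal Python sketch of one way it could be measured. The sketch trains a predictor on the known ratings and adds a random fraction of its own predictions for unknown cells back as if users had confirmed them. It then retrains and compares the two sets of predictions on the remaining unknown cells. Everything specific here is an illustrative assumption rather than a detail of this work. This includes the shift metric (root mean squared shift), the 50% feedback fraction, the toy rating data, and the names `item_mean_predict`, `user_knn_predict`, and `rmss`. The two predictors are simple stand-ins (an item-average baseline and a mean-centered, Pearson-weighted user-based kNN), not the six algorithms evaluated in the work.

```python
import math
import random

# Hypothetical toy data: RATINGS[(user, item)] = rating on a 1-5 scale.
USERS = ["u1", "u2", "u3", "u4", "u5"]
ITEMS = ["a", "b", "c", "d"]
RATINGS = {
    ("u1", "a"): 5, ("u1", "b"): 3, ("u1", "d"): 1,
    ("u2", "a"): 4, ("u2", "d"): 1,
    ("u3", "a"): 1, ("u3", "b"): 1, ("u3", "d"): 5,
    ("u4", "b"): 1, ("u4", "c"): 5, ("u4", "d"): 4,
    ("u5", "c"): 5, ("u5", "d"): 4,
}


def item_mean_predict(ratings, users, items):
    """Predict every cell as the item's mean rating (global mean for unrated items)."""
    global_mean = sum(ratings.values()) / len(ratings)
    sums, counts = {}, {}
    for (_, i), r in ratings.items():
        sums[i] = sums.get(i, 0.0) + r
        counts[i] = counts.get(i, 0) + 1
    means = {i: sums[i] / counts[i] if i in counts else global_mean for i in items}
    return {(u, i): means[i] for u in users for i in items}


def user_knn_predict(ratings, users, items, k=2):
    """Mean-centered user-based kNN with Pearson similarity over co-rated items.
    Assumes every user has at least one rating."""
    by_user = {u: {} for u in users}
    for (u, i), r in ratings.items():
        by_user[u][i] = r
    mean = {u: sum(v.values()) / len(v) for u, v in by_user.items()}

    def sim(a, b):
        common = sorted(by_user[a].keys() & by_user[b].keys())
        da = [by_user[a][i] - mean[a] for i in common]
        db = [by_user[b][i] - mean[b] for i in common]
        norm = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
        return sum(x * y for x, y in zip(da, db)) / norm if norm > 0 else 0.0

    preds = {}
    for u in users:
        for i in items:
            # Keep the k other users who rated item i with the largest |similarity|.
            cands = [(sim(u, v), v) for v in users if v != u and i in by_user[v]]
            neighbors = sorted(cands, key=lambda t: -abs(t[0]))[:k]
            denom = sum(abs(s) for s, _ in neighbors)
            offset = sum(s * (by_user[v][i] - mean[v]) for s, v in neighbors)
            preds[(u, i)] = mean[u] + (offset / denom if denom > 0 else 0.0)
    return preds


def rmss(ratings, users, items, predict, frac=0.5, seed=0):
    """Root mean squared shift of held-out predictions after feeding back
    a fraction `frac` of the algorithm's own predictions as new ratings."""
    p1 = predict(ratings, users, items)
    unknown = [(u, i) for u in users for i in items if (u, i) not in ratings]
    random.Random(seed).shuffle(unknown)
    n_add = int(len(unknown) * frac)
    added, held_out = unknown[:n_add], unknown[n_add:]
    extended = dict(ratings)
    extended.update({cell: p1[cell] for cell in added})  # pretend users confirmed these
    p2 = predict(extended, users, items)
    return math.sqrt(sum((p1[c] - p2[c]) ** 2 for c in held_out) / len(held_out))


if __name__ == "__main__":
    print("item mean:", rmss(RATINGS, USERS, ITEMS, item_mean_predict))
    print("user kNN: ", rmss(RATINGS, USERS, ITEMS, user_knn_predict))
```

On this toy data every item has at least one known rating. The item-average predictor therefore shows a shift of effectively zero, because adding ratings equal to an item's mean leaves that mean unchanged. The kNN stand-in has no such guarantee: fed-back ratings can move user means, similarities, and neighborhoods, so its held-out predictions may shift.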
Our analysis shows that some popular recommendation algorithms suffer from a high degree of instability. Therefore, we further propose two novel meta-algorithms that can be used in conjunction with different traditional recommendation techniques to improve their stability. Our experimental results on real-world movie rating data demonstrate that the proposed approaches can achieve substantially higher stability than the original recommendation algorithms while, perhaps as importantly, also improving their predictive accuracy.
Contact information:
Ingrid Waaijer