Social Tagging and Folksonomy in Recommendation Systems: Models, Algorithms, and Applications
This article surveys the role of social tagging (folksonomies) in modern recommendation systems. It describes how user‑generated tags form a three‑dimensional "tag cube" that can be combined with a rating matrix, reviews neighbor‑based, ranking (FolkRank/SocialRank), content‑based, linear‑regression, and matrix‑factorization algorithms, and discusses tag selection, noise handling, and scalability challenges.
Social tagging systems let users freely annotate online resources with keywords. The resulting folksonomy can be represented as a "tag cube": a three‑dimensional array F of size m × n × p, where m is the number of users, n the number of items, and p the number of tags. Each entry $f_{ijk}$ is 1 if user i assigned tag k to item j and 0 otherwise.
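The tag cube is usually extremely sparse, so it is rarely stored as a dense array. The following is a minimal sketch, assuming tagging actions arrive as (user, item, tag) triples; the toy data and the names `build_index`, `cube`, `item_tag_counts`, and `user_tag_counts` are illustrative and not taken from any particular system. It keeps only the nonzero positions of F and derives two 2D projections by summing out one axis.

```python
from collections import Counter

# Hypothetical toy data: one (user, item, tag) triple per tagging action.
triples = [
    ("alice", "matrix", "sci-fi"),
    ("alice", "matrix", "action"),
    ("bob", "matrix", "sci-fi"),
    ("bob", "amelie", "romance"),
    ("carol", "amelie", "romance"),
    ("carol", "amelie", "paris"),
]


def build_index(values):
    """Map each distinct value to an integer index (sorted for stability)."""
    return {v: idx for idx, v in enumerate(sorted(set(values)))}


users = build_index(u for u, _, _ in triples)
items = build_index(i for _, i, _ in triples)
tags = build_index(t for _, _, t in triples)

# Sparse tag cube: store only the (i, j, k) positions whose entry is 1.
cube = {(users[u], items[i], tags[t]) for u, i, t in triples}


def f(i, j, k):
    """Return the cube entry for (user i, item j, tag k): 1 or 0."""
    return 1 if (i, j, k) in cube else 0


# 2D projections obtained by summing out one axis of the cube.
item_tag_counts = Counter((j, k) for _, j, k in cube)  # users per (item, tag)
user_tag_counts = Counter((i, k) for i, _, k in cube)  # items per (user, tag)

# Logical dimensions m x n x p of the cube.
shape = (len(users), len(items), len(tags))
```

The item‑tag and user‑tag projections are the kind of aggregated views that tag‑aware methods commonly work with, since the full cube is too sparse to use directly.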
The tag cube can be used alone or combined with a traditional rating matrix R (size m × n) to enhance recommendation quality. Even when explicit ratings are unavailable, implicit feedback (e.g., clicks or purchases) can serve as a binary rating.
Various algorithmic families are applied to tag‑aware recommendation:
Neighbor‑based methods augment the rating matrix with pseudo‑users or pseudo‑items derived from tags, enabling standard user‑based or item‑based collaborative filtering.
Ranking methods such as FolkRank and SocialRank adapt personalized PageRank to the tripartite graph of users, items, and tags, balancing global popularity with personalized relevance.
Content‑based methods treat the tag frequencies of an item as a tf‑idf vector, allowing similarity computation between items or between a user’s tag profile and items.
Linear regression approaches learn tag‑specific weights $w_{jr}$ for each item j from known ratings, predicting a user’s preference for a tag or an item by aggregating weighted tag contributions.
Matrix‑factorization techniques (e.g., TagiCoFi) factorize the rating matrix while regularizing user factors with tag‑based similarity, typically minimizing a combined loss g(U, V, R) + βf(U) together with an additional regularization term weighted by λ.
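To make the combined matrix‑factorization loss more concrete, here is a rough sketch of how such an objective could be evaluated. The text above only names g(U, V, R), βf(U), and λ. The specific forms used here (squared error on observed ratings for g, a similarity‑weighted squared distance between user factors for f, and an L2 penalty for the λ term) are assumptions chosen for illustration, and the published TagiCoFi formulation may differ in its details. The function name `objective` and the similarity dictionary `S` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(U, V, ratings, S, beta, lam):
    """Evaluate g(U, V, R) + beta * f(U) + lam * (||U||^2 + ||V||^2).

    U: list of user factor vectors; V: list of item factor vectors.
    ratings: dict {(i, j): r_ij} with the observed entries of R only.
    S: dict {(i, l): s_il} of tag-based user-user similarities
       (assumed to be precomputed from the users' tagging behavior).
    """
    # g: squared reconstruction error over the observed ratings.
    g = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items())
    # f: pulls together the factors of users with similar tagging behavior.
    f = sum(s * sq_dist(U[i], U[l]) for (i, l), s in S.items())
    # L2 penalty on all factor vectors.
    reg = sum(dot(u, u) for u in U) + sum(dot(v, v) for v in V)
    return g + beta * f + lam * reg


# Toy example: two users, one item, two latent dimensions.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0]]
ratings = {(0, 0): 2.0, (1, 0): 1.0}
S = {(0, 1): 0.5}
loss = objective(U, V, ratings, S, beta=0.1, lam=0.01)
```

In practice this value would be minimized over U and V with gradient descent or a similar optimizer. β controls how strongly tag similarity shapes the user factors, and setting β = 0 recovers plain regularized matrix factorization.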
Effective tag selection is crucial because user‑generated tags are noisy and may contain misspellings. Low‑quality tags are filtered out using simple heuristics such as tag popularity (the number of items a tag has been applied to) or more sophisticated features (global vs. local relevance).
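The popularity heuristic can be sketched in a few lines. This is a minimal illustration, assuming tagging data is available as (user, item, tag) triples; the function name `popular_tags`, the `min_items` threshold, and the lowercase/trim normalization step are assumptions made for this example rather than part of any specific published method.

```python
from collections import defaultdict


def popular_tags(triples, min_items=2):
    """Return the tags applied to at least `min_items` distinct items."""
    items_per_tag = defaultdict(set)
    for _user, item, tag in triples:
        # Light normalization (an assumption): trim whitespace and lowercase.
        items_per_tag[tag.strip().lower()].add(item)
    return {t for t, tagged in items_per_tag.items() if len(tagged) >= min_items}


# Hypothetical toy data, including a misspelling and an idiosyncratic tag.
triples = [
    ("alice", "matrix", "Sci-Fi"),
    ("bob", "alien", "sci-fi"),
    ("bob", "matrix", "sci-fy"),       # misspelling, used on one item only
    ("carol", "amelie", "romance"),
    ("carol", "notebook", "romance "),
    ("dave", "amelie", "seen-it"),     # personal tag, used on one item only
]

kept = popular_tags(triples, min_items=2)
```

The threshold does not correct misspellings. It simply drops tags that are too rare to carry a reliable signal, which removes most one‑off typos and personal tags as a side effect.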
Overall, integrating social tags provides complementary information to ratings, improves recommendation accuracy—especially for cold‑start items—and introduces new challenges related to scalability, noise reduction, and the design of hybrid models that jointly exploit both data sources.
DataFunTalk