Design and Implementation of a Rule‑Based and Collaborative‑Filtering Recommendation System for an Educational App
This article covers the business background, the cold-start challenges, the rule-based recall pipeline, the Wilson-interval and time-decay scoring methods, and an item-based collaborative-filtering implementation (with code). It closes with experimental results showing click-through-rate improvements in the Xueersi Online School (学而思网校) educational app.
The Xueersi Online School app originally focused on course purchase and class attendance. It later added content-driven features to increase user stickiness, and these features required a recommendation system with four stages: recall, filtering, precise ranking, and re-ranking.
In the early stage, the cold-start problem was handled with static user information (gender, age, region, interests), category hot-lists, and expert-rule recommendations. Together these formed a simple rule-based pipeline that selected high-quality content, matched it to the user's grade and region, and protected exposure for new content.
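The three rules of the early pipeline (quality selection, grade/region matching, new-content protection) can be sketched as a chain of filters. This is a minimal sketch, not the team's implementation: the `Item`/`User` fields, the quality threshold, the 48-hour "new" window, and the number of reserved slots are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    quality: float                              # hypothetical quality score in [0, 1]
    grades: set = field(default_factory=set)    # grades the item targets
    regions: set = field(default_factory=set)   # empty set = targets all regions
    age_hours: float = 0.0                      # hours since publication


@dataclass
class User:
    grade: int
    region: str


def rule_based_recommend(user, items, k=10, min_quality=0.6,
                         new_hours=48.0, new_slots=2):
    # 1. Keep only high-quality content (the threshold is an assumption).
    pool = [it for it in items if it.quality >= min_quality]
    # 2. Match the user's static profile: the grade must match, and the region
    #    must match unless the item targets all regions.
    pool = [it for it in pool
            if user.grade in it.grades
            and (not it.regions or user.region in it.regions)]
    by_quality = sorted(pool, key=lambda it: it.quality, reverse=True)
    # 3. Protect new content: reserve the first few slots for recent items.
    fresh = [it for it in by_quality if it.age_hours <= new_hours][:new_slots]
    fresh_ids = {it.item_id for it in fresh}
    rest = [it for it in by_quality if it.item_id not in fresh_ids]
    return [it.item_id for it in fresh + rest][:k]
```

Putting reserved new items first is only one way to guarantee exposure; interleaving them at fixed positions would serve the same purpose.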
Recall strategies were later expanded to include category hot-lists computed with the Wilson interval method. The method estimates a confidence interval for the positive-rating proportion p under a binomial model and ranks items by the interval's lower bound, so items with only a handful of ratings are not ranked above items with many consistently positive ratings.
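The Wilson score lower bound for p̂ = positives/n is (p̂ + z²/2n − z·√(p̂(1−p̂)/n + z²/4n²)) / (1 + z²/n). The sketch below implements that formula and ranks a small hot-list with it; the item names, the counts, and treating "positive" as clicks over impressions are illustrative assumptions.

```python
import math


def wilson_lower_bound(positives, total, z=1.96):
    """Lower bound of the Wilson score interval for a binomial proportion.

    positives: number of positive events (e.g. clicks or likes)
    total: number of trials (e.g. impressions)
    z: standard-normal quantile; 1.96 corresponds to a 95% interval
    """
    if total == 0:
        return 0.0
    p = positives / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


# Hypothetical (positives, total) counts for three items in one category.
stats = {"item_a": (9, 10), "item_b": (90, 100), "item_c": (1, 1)}
hot_list = sorted(stats, key=lambda k: wilson_lower_bound(*stats[k]), reverse=True)
```

`item_c` has a perfect 1/1 ratio but the lowest lower bound (about 0.21), while `item_b` outranks `item_a` at the same 90% ratio because it has ten times more evidence.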
A time-decay scoring algorithm based on Newton's law of cooling was also introduced. It weights clicks, likes, shares, and comments while reducing the influence of older popular items; the formula combines the Wilson-based score with an exponential decay factor.
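The article does not give its exact formula, so the following is only a sketch of one way to combine the pieces. Newton's law of cooling gives T(t) = T0·e^(−αt), where t is taken here as hours since publication and α is a cooling coefficient. The interaction weights, the default α = 0.05 (a half-life of ln 2 / 0.05 ≈ 13.9 hours), and multiplying the Wilson lower bound by the weighted engagement are all assumptions.

```python
import math

# Hypothetical weights; the article does not publish the values it used.
WEIGHTS = {"click": 1.0, "like": 2.0, "share": 3.0, "comment": 2.5}


def weighted_engagement(counts):
    """Weighted sum of interaction counts, e.g. {"click": 10, "like": 2}."""
    return sum(w * counts.get(kind, 0) for kind, w in WEIGHTS.items())


def newton_cooling(initial, age_hours, alpha=0.05):
    """Newton's law of cooling: T(t) = T0 * exp(-alpha * t)."""
    return initial * math.exp(-alpha * age_hours)


def hot_score(counts, wilson_lb, age_hours, alpha=0.05):
    # Assumed combination: the Wilson lower bound scales the weighted
    # engagement, and the product then "cools" with the item's age.
    return newton_cooling(wilson_lb * weighted_engagement(counts), age_hours, alpha)
```

A larger α makes the list turn over faster; a smaller α lets long-standing popular items stay near the top.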
Item-based collaborative filtering was then added: it computes item-item similarity from the user-item interaction matrix and recommends items the user has not yet interacted with. The similarity computation is shown below:
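```python
import math


def item_similarity(train):
    """Cosine similarity between items, given train = {user: {item: rating}}."""
    N = dict()  # N[i]: sum of squared ratings of item i (squared vector norm)
    C = dict()  # C[i][j]: dot product of the rating vectors of items i and j
    for u, items in train.items():
        for i, ratei in items.items():
            if i not in N:
                N[i] = 0
            N[i] += float(ratei) * float(ratei)
            for j, ratej in items.items():
                if i == j:
                    continue
                if i not in C:
                    C[i] = dict()
                if j not in C[i]:
                    C[i][j] = 0
                C[i][j] += float(ratei) * float(ratej)
    # W[i][j] = C[i][j] / (|i| * |j|), i.e. the cosine similarity of items i and j
    W = dict()
    for i, related_items in C.items():
        for j, cij in related_items.items():
            if i not in W:
                W[i] = dict()
            denom = math.sqrt(N[i] * N[j])
            W[i][j] = round(cij / denom, 4) if denom else 0.0
    return W
```

Offline A/B experiments showed that hot-list recall increased content click-through rate by 6% and user click-through rate by 4%, while the collaborative-filtering pipeline raised content click-through rate by 9%.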
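The similarity code earlier in this section stops at the matrix W; the article does not show how W becomes a recommendation list. A common ItemCF scoring rule, sketched here as an assumption, gives each unseen item j the sum of w_ij × rating(i) over the items i the user has already interacted with. The function name `recommend` and its parameters are hypothetical, and this version omits the top-K neighbour truncation that many implementations add.

```python
def recommend(train, W, user, k=10):
    """Score unseen items for `user` by summing similarity-weighted ratings.

    train: {user: {item: rating}}; W: {item: {item: similarity}}
    Returns up to k (item, score) pairs, highest score first.
    """
    seen = train.get(user, {})
    scores = {}
    for i, rating in seen.items():
        for j, w_ij in W.get(i, {}).items():
            if j in seen:
                continue  # only recommend items the user has not interacted with
            scores[j] = scores.get(j, 0.0) + w_ij * float(rating)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```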
Key lessons include keeping early rule‑based systems simple, grounding optimizations in business data, maintaining detailed logs for each recall channel, and always validating ideas with data before large‑scale development.
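Per-channel logging pays off when every impression record carries the recall channel that produced it, because per-channel click-through rates can then be read straight from the logs. This is a minimal sketch; the record fields `channel` and `clicked` are hypothetical, not the team's log schema.

```python
from collections import defaultdict


def ctr_by_channel(events):
    """Click-through rate per recall channel.

    events: iterable of impression records such as
            {"channel": "hot_list", "clicked": True}
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for e in events:
        shown[e["channel"]] += 1
        clicked[e["channel"]] += int(e["clicked"])
    return {ch: clicked[ch] / shown[ch] for ch in shown}
```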
Future work aims to incorporate real‑time collaborative filtering, click‑feedback loops, and model‑based recall strategies to further improve user experience.
Xueersi Online School Tech Team
The Xueersi Online School Tech Team, dedicated to innovating and promoting internet education technology.