User Profiling: Concepts, Practices, and Data‑Driven E‑Commerce Case Study
This article introduces the fundamentals of user profiling, explains tag types and their business value, and demonstrates a data‑driven e‑commerce case study that analyzes gender, age, region, marital status, education, profession, product preferences, purchase timing, and price sensitivity to guide targeted promotion strategies.
Author: Mu Xiaoxiong, Huazhong Agricultural University Source: Datawhale
The article begins with a brief introduction to user profiling, emphasizing that profiling abstracts concrete user information into tags to create a concrete user image for personalized services.
1. User Profiling Basics
The core of profiling is labeling: each piece of concrete user information is converted into a tag, which makes targeted services possible.
Example: in a matchmaking scenario, a woman's ideal partner is described using tags such as age, height, income, location, and education.
2. Tag Types
Statistical tags: basic attributes like name, gender, age, city, activity duration, derived from registration or transaction data.
Rule‑based tags: created collaboratively by operations and data teams based on business rules and user behavior.
Learning‑derived tags: generated by machine‑learning models, e.g., inferring gender from purchase of feminine products.
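The three tag types can be illustrated with a small sketch. Everything in it is hypothetical and only meant to show the distinction: the field names, the spending threshold, and the keyword-based stand-in for a trained model are assumptions, not the article's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    city: str                    # taken directly from registration data
    order_amounts: list          # amounts of past orders
    purchased_categories: list   # categories of purchased items
    tags: dict = field(default_factory=dict)


def add_statistical_tags(user):
    # Statistical tags: read or aggregated directly from raw records.
    user.tags["city"] = user.city
    user.tags["order_count"] = len(user.order_amounts)
    user.tags["total_spend"] = sum(user.order_amounts)


def add_rule_tags(user, high_value_threshold=1000):
    # Rule-based tag: a business rule agreed on by operations and data teams.
    # The threshold of 1000 is an assumed value.
    user.tags["high_value"] = user.tags["total_spend"] >= high_value_threshold


def toy_gender_model(categories):
    # Stand-in for a trained classifier (assumption): a simple keyword vote.
    feminine = {"skirt", "lipstick"}
    hits = sum(1 for c in categories if c in feminine)
    return "female" if hits > len(categories) / 2 else "unknown"


def add_learned_tags(user, predict_gender):
    # Learning-derived tag: produced by a model passed in as a callable.
    user.tags["predicted_gender"] = predict_gender(user.purchased_categories)


user = User("u1", "Wuhan", [300, 800], ["lipstick", "skirt", "fan"])
add_statistical_tags(user)
add_rule_tags(user)
add_learned_tags(user, toy_gender_model)
print(user.tags)
```

Keeping the three steps as separate functions mirrors how the tags are usually owned: statistical tags come from the data pipeline, rule tags from business agreements, and learned tags from a model that can be swapped out.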
3. Value of User Profiling
Large‑scale businesses invest heavily in profiling to collect and analyze data across business lines, enabling precise services and diversified operation strategies.
Applications
User acquisition via DMP advertising targeting similar‑tag users.
Cold‑start for new users by inferring attributes from regional tag distributions.
Personalized or precise services based on rich profile analysis.
Multi‑scenario identification (e.g., linking accounts across phone numbers).
Reactivating dormant users by analyzing sensitivity and designing activation strategies.
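The cold-start idea in the list above can be sketched as follows: a new user with no history is given the most common tag value among known users in the same region. The record layout (`region`, `age_group`) and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_region_priors(users, tag_name):
    """Count how often each value of tag_name occurs in each region."""
    priors = defaultdict(Counter)
    for u in users:
        priors[u["region"]][u[tag_name]] += 1
    return priors


def infer_tag(priors, region, default=None):
    """Return the most common tag value in the region and its share."""
    counts = priors.get(region)
    if not counts:
        # No known users in this region: fall back to the default.
        return default, 0.0
    value, n = counts.most_common(1)[0]
    return value, n / sum(counts.values())


# Made-up sample of known users.
known_users = [
    {"region": "Hubei", "age_group": "25-35"},
    {"region": "Hubei", "age_group": "25-35"},
    {"region": "Hubei", "age_group": "25-35"},
    {"region": "Hubei", "age_group": "35-45"},
]
priors = build_region_priors(known_users, "age_group")
guess, share = infer_tag(priors, "Hubei")
print(guess, share)
```

Returning the share alongside the guess lets downstream logic ignore weak priors, for example when no single value dominates a region.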
4. Practical Project: E‑Commerce Promotion Case
Scenario: A data analyst is asked to help an e‑commerce platform improve declining orders for a home‑appliance category by designing a coupon promotion.
The analysis proceeds in seven steps, extracting data from a masked order dataset (2020‑08‑12 to 2020‑08‑19) and visualizing several user dimensions.
Step 1 – Data Extraction
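The extraction code itself is not shown in the article. As a rough sketch of what restricting orders to the study window might look like, the example below assumes each order record carries a hypothetical `order_time` string and that both boundary dates are included.

```python
from datetime import datetime

# Study window from the article; treating both end dates as inclusive
# is an assumption.
WINDOW_START = datetime(2020, 8, 12)
WINDOW_END = datetime(2020, 8, 20)  # exclusive, so all of 2020-08-19 is kept


def in_window(order):
    """True if the order's timestamp falls inside the study window."""
    ts = datetime.strptime(order["order_time"], "%Y-%m-%d %H:%M:%S")
    return WINDOW_START <= ts < WINDOW_END


# Made-up records; the field names are hypothetical.
raw_orders = [
    {"order_id": 1, "order_time": "2020-08-11 23:59:59"},
    {"order_id": 2, "order_time": "2020-08-12 00:00:00"},
    {"order_id": 3, "order_time": "2020-08-19 21:30:00"},
    {"order_id": 4, "order_time": "2020-08-20 00:00:00"},
]
orders = [o for o in raw_orders if in_window(o)]
print([o["order_id"] for o in orders])
```

Using a half-open interval avoids having to spell out the last second of the final day.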
data.head()
Step 2 – Gender & Age Distribution
import plotly.graph_objects as go

# Pie chart: gender distribution (male_user and female_user are user counts)
labels = ['Male', 'Female']
values = [male_user, female_user]
trace = [go.Pie(labels=labels, values=values)]
layout = go.Layout(title=dict(text='User gender distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig

# Bar chart: number of users in each age bracket
x = ['Under 18', '18-25', '25-35', '35-45', '45-55', 'Over 55']
y = user_age_df['user_age_count']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User age distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig
Findings: Slight male dominance; ages are concentrated in the 25-35 bracket; activity is low among users under 18 and over 45.
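The article does not show how the per-bracket counts are produced. One possible way to bucket raw ages into the six brackets is sketched below. Because adjacent brackets share an edge (for example 25 appears in both 18-25 and 25-35), the sketch assumes an age equal to an edge belongs to the higher bracket; the sample ages are made up.

```python
from bisect import bisect_right
from collections import Counter

AGE_EDGES = [18, 25, 35, 45, 55]
AGE_LABELS = ['Under 18', '18-25', '25-35', '35-45', '45-55', 'Over 55']


def age_bucket(age):
    # bisect_right counts how many edges are <= age, which is the
    # index of the bracket the age falls into.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


ages = [17, 18, 24, 25, 30, 34, 35, 46, 60]
counts = Counter(age_bucket(a) for a in ages)

# Keep the bracket order fixed so the list lines up with the x-axis labels.
user_age_count = [counts[label] for label in AGE_LABELS]
print(user_age_count)
```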
Step 3 – Regional Distribution
# Horizontal bar chart: users per province (reversed so the first row is drawn at the top)
y = user_region_df['province_name'][::-1]
x = user_region_df['region_count'][::-1]
trace = go.Bar(x=x, y=y, text=x, textposition='outside', orientation='h')
layout = go.Layout(title=dict(text='User distribution by region', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig

# Horizontal bar chart: users per city
y = user_city_df['ulp_addr_city'][::-1]
x = user_city_df['city_count'][::-1]
trace = go.Bar(x=x, y=y, text=x, textposition='outside', orientation='h')
layout = go.Layout(title=dict(text='User distribution by city', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig
Findings: Users are mainly in first-tier and new-first-tier cities, which is consistent with the age distribution.
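The `[::-1]` reversal in the charts above makes sense if the underlying tables are sorted from largest to smallest, which is an assumption here. A horizontal Plotly bar chart draws the first category at the bottom, so the ranking is reversed to put the largest bar on top. The sketch below builds such a ranking from made-up province values.

```python
from collections import Counter


def top_regions(provinces, n=10):
    """Top-n provinces by user count, largest first."""
    return Counter(provinces).most_common(n)


# Made-up sample: one province name per user.
provinces = ["Guangdong", "Guangdong", "Guangdong", "Beijing", "Beijing", "Hubei"]
ranked = top_regions(provinces, n=2)

# Reverse for plotting so the largest bar ends up at the top of the chart.
y = [name for name, _ in ranked][::-1]
x = [count for _, count in ranked][::-1]
print(y, x)
```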
Step 4 – Marital & Child Status
# Pie chart: marital status
labels = ['Married', 'Unmarried']
values = [married_user, unmarried_user]
trace = [go.Pie(labels=labels, values=values)]
layout = go.Layout(title=dict(text='User marital status distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig

# Pie chart: likelihood that a user has children, in four levels
labels = ['High', 'Fairly high', 'Fairly low', 'Low']
values = [very_high, high, low, very_low]
trace = [go.Pie(labels=labels, values=values)]
layout = go.Layout(title=dict(text='User child status', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig
Findings: About 70% of users are married, and more than 60% are likely to have children.
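Percentages like these can be read off the pie charts, but they are also easy to compute directly. The sketch below uses made-up counts, and it assumes that "likely to have children" means the two highest likelihood levels combined, which the article does not state explicitly.

```python
def shares(counts):
    """Convert raw counts into percentage shares, rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


# Made-up counts, not the article's data.
marital = shares({"Married": 70, "Unmarried": 30})
child = shares({"High": 40, "Fairly high": 25, "Fairly low": 20, "Low": 15})

# Assumed definition: top two levels count as "likely to have children".
likely_has_child = child["High"] + child["Fairly high"]
print(marital["Married"], likely_has_child)
```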
Step 5 – Education & Occupation
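# Bar chart: user count per education level
y = user_edu_df['edu']
x = ['Junior high school or below', 'High school (secondary vocational)', "College (associate or bachelor's)", "Graduate (master's or above)"]
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User education distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig

# Bar chart: user count per occupation
x = ['Finance professional', 'Medical staff', 'Civil servant / public institution', 'White-collar / general staff', 'Worker / service staff', 'Teacher', 'Internet industry professional', 'Student']
y = user_profession_df['profession']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User occupation distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig
Typical user: male, 28-30, married with children, lives in a first-tier city, holds a bachelor's degree, and works in the internet industry with a stable income.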
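A "typical user" like this is essentially the most common value of each profile dimension. The sketch below shows one way to compute it; the dimension names and sample users are hypothetical.

```python
from collections import Counter


def typical_profile(users, dimensions):
    """Most common value per dimension, ignoring missing values."""
    profile = {}
    for dim in dimensions:
        values = [u[dim] for u in users if u.get(dim) is not None]
        profile[dim] = Counter(values).most_common(1)[0][0] if values else None
    return profile


# Made-up users with hypothetical field names.
users = [
    {"gender": "male", "marital": "married", "profession": "internet"},
    {"gender": "male", "marital": "married", "profession": "teacher"},
    {"gender": "female", "marital": "married", "profession": "internet"},
    {"gender": "male", "marital": None, "profession": "internet"},
]
persona = typical_profile(users, ["gender", "marital", "profession"])
print(persona)
```

Each dimension is summarized independently, so the resulting persona is a composite: no single user is guaranteed to match every value at once.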
Step 6 – Purchase Behavior
# Horizontal bar chart: orders per third-level product category
y = user_order_cate_df['item_third_cate_name'][::-1]
x = user_order_cate_df['cate_count'][::-1]
trace = go.Bar(x=x, y=y, text=x, textposition='outside', orientation='h')
layout = go.Layout(title=dict(text='Distribution of purchased products', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig
Findings: The top product is the electric fan, which is seasonal. Recommendation: promote water purifiers and humidifiers for early autumn.
# Line chart: orders per day of the week
x = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
y = user_order_week_df_2['week_count']
trace = go.Scatter(x=x, y=y, mode='lines', line=dict(width=2))
layout = go.Layout(title=dict(text='Order distribution by day of week', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig

# Line chart: orders per hour of the day (0-23)
x = [str(i) for i in range(0, 24)]
y = user_order_hms_df['hms_count']
trace = go.Scatter(x=x, y=y, mode='lines', line=dict(width=2))
layout = go.Layout(title=dict(text='Order distribution by hour of day', x=0.5), xaxis=dict(tickmode='linear'))
fig = go.Figure(data=trace, layout=layout)
fig
Findings: Orders peak on Tuesday and Saturday, at 10-11 am and 8-10 pm.
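The weekday and hourly counts behind these two charts could be derived from order timestamps roughly as follows. The timestamp format and the sample values are assumptions; the output lists are ordered Monday to Sunday and hour 0 to 23 so that they line up with the chart axes.

```python
from collections import Counter
from datetime import datetime


def order_time_counts(timestamps):
    """Return (orders per weekday Mon..Sun, orders per hour 0..23)."""
    by_weekday = Counter()
    by_hour = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_weekday[dt.weekday()] += 1  # Monday is 0, Sunday is 6
        by_hour[dt.hour] += 1
    week_count = [by_weekday[d] for d in range(7)]
    hms_count = [by_hour[h] for h in range(24)]
    return week_count, hms_count


# Made-up timestamps inside the study window.
timestamps = [
    "2020-08-15 10:30:00",  # a Saturday
    "2020-08-15 21:05:00",  # a Saturday
    "2020-08-18 10:45:00",  # a Tuesday
]
week_count, hms_count = order_time_counts(timestamps)
print(week_count)
```

One caveat when reading the weekday chart: if the window from 2020-08-12 to 2020-08-19 is inclusive at both ends, it spans eight days, so Wednesday is counted twice while every other weekday is counted once.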
Step 7 – Price Sensitivity
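# Bar chart: user count per promotion (price) sensitivity level
x = ['Not sensitive', 'Slightly sensitive', 'Moderately sensitive', 'Highly sensitive', 'Extremely sensitive']
y = user_order_sens_promotion_df['sens_promotion_count']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User price sensitivity distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig

# Bar chart: user count per review (comment) sensitivity level
x = ['Not sensitive', 'Slightly sensitive', 'Moderately sensitive', 'Highly sensitive', 'Extremely sensitive']
y = user_order_sens_comment_df['sens_comment_count']
trace = go.Bar(x=x, y=y, text=y, textposition='outside')
layout = go.Layout(title=dict(text='User review sensitivity distribution', x=0.5))
fig = go.Figure(data=trace, layout=layout)
fig
Findings: Users are price-sensitive and highly sensitive to reviews, so well-reviewed products should be promoted.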
Recommendations for Promotion
Use neutral copy that highlights the quality and safety of home appliances for family use.
Focus on end‑of‑summer/early‑autumn items such as water purifiers, humidifiers, and drinking‑water machines.
Schedule ads on Tuesdays and Saturdays, especially around 10 am and 9‑10 pm.
Select products with strong positive reviews to match user sensitivity.
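The product-related recommendations above can be turned into a simple selection rule: keep items in the seasonal categories whose share of positive reviews clears a threshold, then rank by that share. The field names, the 0.95 threshold, and the sample products are all hypothetical.

```python
def pick_promo_products(products, seasonal_categories,
                        min_positive_rate=0.95, top_n=3):
    """Seasonal products with strong reviews, best reviewed first."""
    candidates = [
        p for p in products
        if p["category"] in seasonal_categories
        and p["positive_rate"] >= min_positive_rate
    ]
    ranked = sorted(candidates, key=lambda p: p["positive_rate"], reverse=True)
    return ranked[:top_n]


# Made-up catalogue entries.
products = [
    {"name": "Water purifier A", "category": "water purifier", "positive_rate": 0.98},
    {"name": "Humidifier B", "category": "humidifier", "positive_rate": 0.93},
    {"name": "Electric fan C", "category": "electric fan", "positive_rate": 0.99},
    {"name": "Humidifier D", "category": "humidifier", "positive_rate": 0.96},
]
seasonal = {"water purifier", "humidifier", "water dispenser"}
picked = pick_promo_products(products, seasonal)
print([p["name"] for p in picked])
```

The electric fan is excluded despite its high rating because it is outside the early-autumn categories, which reflects the seasonal reasoning from Step 6.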
Finally, the author thanks the audience and invites readers to join the DataFunTalk community for further big‑data and AI discussions.