
Game Industry User Data Analysis: Registration Distribution, Payment Metrics, and Consumption Patterns

This article presents a comprehensive Python-based analysis of a large game dataset (2.29 million records, 109 fields), covering user registration trends, payment rates, ARPU/ARPPU calculations, level‑based spending behavior, and consumption patterns of resources and acceleration items, with visualizations and actionable conclusions.

Python Programming Learning Circle

Field Description

The dataset contains 2,290,000 records and 109 fields. Key fields include user_id, register_time (registration timestamp), bd_stronghold_level (stronghold level, which serves as the player's level), resource consumption values (wood, stone, ivory, meat, magic), acceleration-item consumption values, battle counts, average online minutes, pay_price (amount paid), and pay_count (number of payments).
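With 109 fields and about 2.29 million rows, loading every column costs a lot of memory when only a handful are analyzed. A minimal sketch, assuming the data ships as a CSV file: `pd.read_csv` with `usecols` keeps only the columns this article works with. The in-memory `SAMPLE_CSV` (made-up values), the helper name `load_key_columns`, and the column name `avg_online_minutes` are assumptions for illustration.

```python
import io

import pandas as pd

# Columns used in this analysis; avg_online_minutes is an assumed column name.
KEY_COLUMNS = [
    'user_id', 'register_time', 'bd_stronghold_level',
    'avg_online_minutes', 'pay_price', 'pay_count',
]

# Hypothetical stand-in for the real CSV file: three made-up rows.
SAMPLE_CSV = """user_id,register_time,bd_stronghold_level,wood_reduce_value,avg_online_minutes,pay_price,pay_count
1,2024-01-01 19:16:00,1,0,0.33,0,0
2,2024-01-01 20:05:00,10,25000,58.5,0.99,1
3,2024-01-02 08:02:00,13,120000,310.0,500.0,12
"""


def load_key_columns(source) -> pd.DataFrame:
    """Read only KEY_COLUMNS from a path or file-like object."""
    return pd.read_csv(source, usecols=KEY_COLUMNS)


df_sample = load_key_columns(io.StringIO(SAMPLE_CSV))
print(df_sample.shape)  # (3, 6)
```

With the real file, `load_key_columns` would be called with its path instead of the `StringIO` object.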

Analysis Approach

User registration time distribution

Payment metrics (payment rate, ARPU, ARPPU)

Payment behavior by level

Consumption habits of different player groups
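The payment metrics computed in step 4 can be written out as follows, where $A$ is the number of active users, $P$ the number of paying users, and $R$ the total payment amount (pay_price summed over paying users). These mirror the definitions in the code comments; which users count as "active" is a choice the formulas leave open.

```latex
\text{payment rate} = \frac{P}{A}, \qquad
\text{ARPU} = \frac{R}{A}, \qquad
\text{ARPPU} = \frac{R}{P}

\text{ARPU} = \frac{P}{A} \cdot \frac{R}{P} = \text{payment rate} \times \text{ARPPU}
```

The second line follows directly from the first: the three metrics share $R$, $A$ and $P$, so any two of them determine the third.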

Analysis Process

1. Import data

import numpy as np
import pandas as pd
from pandas import read_csv
import matplotlib.pyplot as plt
import pylab as pl
from matplotlib.font_manager import FontProperties
pd.set_option('display.max_columns', None)
# Load the raw data (the file path is a placeholder; point it at your copy of the dataset)
df0 = read_csv('game_data.csv')
# Work on a copy so the raw data stays untouched
df = df0.copy()
# Check whether any value is missing
print(df.isnull().any().any())
# Preview data
print(df.head())

2. Clean data

# Remove duplicate user_id entries, keeping the first occurrence
df = df.drop_duplicates(subset='user_id')
print('Total users:', len(df['user_id']))
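Before dropping duplicates it can be worth checking how many rows share a `user_id`, because `drop_duplicates` silently keeps only the first occurrence by default. A small sketch on made-up data; `duplicate_report` is a hypothetical helper name.

```python
import pandas as pd


def duplicate_report(df: pd.DataFrame, key: str = 'user_id') -> dict:
    """Summarize duplicate keys before de-duplication."""
    dup_mask = df.duplicated(subset=key, keep='first')  # True for 2nd+ occurrences
    return {
        'rows': len(df),
        'duplicate_rows': int(dup_mask.sum()),
        'unique_keys': int(df[key].nunique()),
    }


# Made-up example: user 2 appears twice.
sample = pd.DataFrame({'user_id': [1, 2, 2, 3],
                       'pay_price': [0.0, 0.99, 0.99, 5.0]})
report = duplicate_report(sample)
print(report)  # {'rows': 4, 'duplicate_rows': 1, 'unique_keys': 3}
deduped = sample.drop_duplicates(subset='user_id')
```

If duplicated rows carry different values (for example different pay_price), `keep='first'` is an arbitrary choice and the duplicates deserve a closer look before dropping.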

3. Compute registration distribution

# Keep the "MM-DD" part of register_time (format "YYYY-MM-DD HH:MM:SS")
register_date = df['register_time'].str[5:10]
# Count registrations per day
df_register = df.groupby(register_date).size().rename_axis('date').rename('registrations')
print(df_register)
# Plot registration trend
plt.plot(df_register)
plt.grid(True)
plt.xticks(rotation=90)
plt.title('User Registration Distribution')
plt.show()
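Slicing characters 5 to 10 of the timestamp string only works if every value has the exact `YYYY-MM-DD HH:MM:SS` layout, and it drops the year, so the same calendar day in different years would be merged. A sketch of an alternative: parse the timestamps with `pd.to_datetime`, count registrations per calendar day (filling days without sign-ups with 0), and flag days that stand out from a rolling median, which is one way to look for the event- and update-driven spikes mentioned in the conclusion. The helper names, the 7-day window and the 2x threshold are arbitrary assumptions.

```python
import pandas as pd


def daily_registrations(register_time: pd.Series) -> pd.Series:
    """Registrations per calendar day; days with no sign-ups are filled with 0."""
    days = pd.to_datetime(register_time).dt.floor('D')
    return days.value_counts().sort_index().asfreq('D', fill_value=0)


def flag_spikes(daily: pd.Series, window: int = 7, factor: float = 2.0) -> pd.Series:
    """True on days whose count exceeds `factor` times the centered rolling median."""
    baseline = daily.rolling(window, center=True, min_periods=1).median()
    return daily > factor * baseline


# Made-up example: 5 sign-ups per day for ten days, plus a burst of 25 on day six.
days = pd.date_range('2024-01-01', periods=10, freq='D')
times = [d + pd.Timedelta(hours=h) for d in days for h in range(5)]
times += [days[5] + pd.Timedelta(minutes=m) for m in range(25)]
register_time = pd.Series([t.strftime('%Y-%m-%d %H:%M:%S') for t in times])

daily = daily_registrations(register_time)
spikes = flag_spikes(daily)
print(daily[spikes])
```

A rolling median is used as the baseline because, unlike a rolling mean, a single spike barely moves it.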

4. Payment analysis

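# Active users are not defined in the original code. As an assumption, users with
# at least 30 average online minutes count as active (column name avg_online_minutes
# assumed); adjust this to your own definition of an active user.
df_active_user = df[df['avg_online_minutes'] >= 30]
# Paying users: anyone with a positive payment amount
df_pay_user = df[df['pay_price'] > 0]
# Payment rate (paying users / active users)
pay_rate = df_pay_user['user_id'].count() / df_active_user['user_id'].count()
print('Payment rate: %.2f' % pay_rate)
# ARPU (total payment / active users)
arpu = df_pay_user['pay_price'].sum() / df_active_user['user_id'].count()
print('ARPU: %.2f' % arpu)
# ARPPU (total payment / paying users)
arppu = df_pay_user['pay_price'].sum() / df_pay_user['user_id'].count()
print('ARPPU: %.2f' % arppu)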
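A tiny worked example of the three formulas on made-up numbers, which also checks the identity ARPU = payment rate x ARPPU. Every user in the toy table is treated as active; that is an assumption made only for the example, and `payment_metrics` is a hypothetical helper.

```python
import pandas as pd


def payment_metrics(df_active: pd.DataFrame) -> dict:
    """Payment rate, ARPU and ARPPU over a table of active users."""
    paying = df_active[df_active['pay_price'] > 0]
    revenue = float(paying['pay_price'].sum())
    n_active = len(df_active)
    n_paying = len(paying)
    return {
        'pay_rate': n_paying / n_active,
        'arpu': revenue / n_active,
        'arppu': revenue / n_paying if n_paying else 0.0,
    }


# Made-up toy data: 5 active users, 2 of whom paid 10 and 30.
toy = pd.DataFrame({'user_id': [1, 2, 3, 4, 5],
                    'pay_price': [0.0, 10.0, 0.0, 30.0, 0.0]})
m = payment_metrics(toy)
print(m)  # {'pay_rate': 0.4, 'arpu': 8.0, 'arppu': 20.0}
assert abs(m['arpu'] - m['pay_rate'] * m['arppu']) < 1e-9
```

Here 2 of 5 users pay (40%), the 40 in revenue spread over 5 users gives an ARPU of 8, and over 2 payers an ARPPU of 20; 0.4 x 20 = 8.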

5. Payment behavior by level

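df_user = df[['user_id', 'bd_stronghold_level', 'pay_price', 'pay_count']]
# Per-level totals: number of users, total payment amount, total payment count
df_stronghold_pay = df_user.groupby('bd_stronghold_level').agg(
    user_num=('user_id', 'count'),
    pay_price=('pay_price', 'sum'),
    pay_count=('pay_count', 'sum'))
# Paying users per level (aligned on the level index; levels without payers get 0)
df_stronghold_pay['pay_num'] = (df_user[df_user['pay_price'] > 0]
                                .groupby('bd_stronghold_level')['user_id'].count()
                                .reindex(df_stronghold_pay.index, fill_value=0))
# Conversion rate per level
df_stronghold_pay['pay_rate'] = df_stronghold_pay['pay_num'] / df_stronghold_pay['user_num']
# Average payment per user at each level
df_stronghold_pay['avg_pay_price'] = df_stronghold_pay['pay_price'] / df_stronghold_pay['user_num']
# Average number of payments per user at each level
df_stronghold_pay['avg_pay_count'] = df_stronghold_pay['pay_count'] / df_stronghold_pay['user_num']
# Rename columns by name (not by position) and reorder them for display
df_stronghold_pay = df_stronghold_pay.reset_index().rename(columns={
    'bd_stronghold_level': 'Stronghold level', 'user_num': 'Total users',
    'pay_num': 'Paying users', 'pay_rate': 'Payment conversion rate',
    'pay_price': 'Total payment', 'avg_pay_price': 'Avg payment per user',
    'pay_count': 'Total payment count', 'avg_pay_count': 'Avg payment count per user'})
df_stronghold_pay = df_stronghold_pay[['Stronghold level', 'Total users', 'Paying users',
                                       'Payment conversion rate', 'Total payment',
                                       'Avg payment per user', 'Total payment count',
                                       'Avg payment count per user']]
df_stronghold_pay = df_stronghold_pay.round(2)
print(df_stronghold_pay)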
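The level-based conclusion rests on two per-level quantities: how users are spread across levels and how conversion changes with level. A sketch on made-up data that adds each level's share of users and the cumulative share, so the fraction of users below a given level (for example level 10) can be read off directly. The helpers `level_funnel` and `share_below` are hypothetical and illustrative, not the article's code.

```python
import pandas as pd


def level_funnel(df: pd.DataFrame, level_col: str = 'bd_stronghold_level') -> pd.DataFrame:
    """Per-level user count, paying users, share of all users, cumulative share, pay rate."""
    out = df.groupby(level_col).agg(
        users=('user_id', 'count'),
        payers=('pay_price', lambda s: int((s > 0).sum())),
    )
    out['share'] = out['users'] / out['users'].sum()
    out['cum_share'] = out['share'].cumsum()
    out['pay_rate'] = out['payers'] / out['users']
    return out


def share_below(funnel: pd.DataFrame, level: int) -> float:
    """Fraction of users whose level is strictly below `level`."""
    return float(funnel.loc[funnel.index < level, 'share'].sum())


# Made-up data: 6 users at level 1, 2 at level 9 (one paying), 2 at level 10 (both paying).
toy = pd.DataFrame({
    'user_id': range(10),
    'bd_stronghold_level': [1] * 6 + [9] * 2 + [10] * 2,
    'pay_price': [0.0] * 7 + [4.99] + [99.0, 250.0],
})
funnel = level_funnel(toy)
print(funnel)
print(share_below(funnel, 10))
```

On the real data, `share_below(funnel, 10)` quantifies "most users stay below level 10", and the `pay_rate` column shows where conversion jumps.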

6. Consumption habits of different player groups

# High-value players: level >= 10 and pay_price >= 500
# Normal players: level >= 10 and pay_price < 500 (same level cut-off for both groups)
df_eli_user = df[(df['pay_price'] >= 500) & (df['bd_stronghold_level'] >= 10)]
df_nor_user = df[(df['pay_price'] < 500) & (df['bd_stronghold_level'] >= 10)]
# Average resource consumption for each group
resource_cols = ['wood_reduce_value', 'stone_reduce_value', 'ivory_reduce_value',
                 'meat_reduce_value', 'magic_reduce_value']
props_data = {'high_value_player': df_eli_user[resource_cols].mean().values,
              'normal_player': df_nor_user[resource_cols].mean().values}

df_props = pd.DataFrame(props_data, index=['wood', 'stone', 'ivory', 'meat', 'magic']).round(2)
print(df_props)
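Instead of computing each group's means separately, the players can be labeled once and summarized with a single `groupby`. A sketch on made-up rows, assuming level >= 10 for both groups and a 500 payment cut-off for high-value players as described in the comment above; players below level 10 are labeled `'other'` and left out of the comparison. The helpers `label_players` and `consumption_by_group` and the label names are hypothetical.

```python
import numpy as np
import pandas as pd

RESOURCE_COLS = ['wood_reduce_value', 'stone_reduce_value', 'ivory_reduce_value',
                 'meat_reduce_value', 'magic_reduce_value']


def label_players(df: pd.DataFrame, min_level: int = 10, min_pay: float = 500) -> pd.Series:
    """'high_value_player', 'normal_player', or 'other' for each row."""
    eligible = df['bd_stronghold_level'] >= min_level
    conditions = [eligible & (df['pay_price'] >= min_pay),
                  eligible & (df['pay_price'] < min_pay)]
    labels = np.select(conditions, ['high_value_player', 'normal_player'], default='other')
    return pd.Series(labels, index=df.index, name='group')


def consumption_by_group(df: pd.DataFrame, cols=RESOURCE_COLS) -> pd.DataFrame:
    """Mean consumption per item (rows) for each player group (columns)."""
    means = df[cols].groupby(label_players(df)).mean()
    means = means.drop(index='other', errors='ignore')
    # Rows named by item, without the '_reduce_value' suffix
    return means.T.rename(index=lambda c: c.replace('_reduce_value', '')).round(2)


# Made-up rows: two high-value players, one normal player, one player below level 10.
toy = pd.DataFrame({
    'bd_stronghold_level': [12, 11, 10, 5],
    'pay_price': [800.0, 600.0, 20.0, 1000.0],
    'wood_reduce_value': [100.0, 300.0, 50.0, 9.0],
    'stone_reduce_value': [10.0, 30.0, 5.0, 9.0],
    'ivory_reduce_value': [40.0, 60.0, 1.0, 9.0],
    'meat_reduce_value': [70.0, 90.0, 20.0, 9.0],
    'magic_reduce_value': [2.0, 4.0, 0.0, 9.0],
})
result = consumption_by_group(toy)
print(result)
```

The same function can be pointed at the acceleration columns by passing them as `cols`.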
# Plot resource consumption
df_props.plot(kind='bar', title='Resource Consumption', grid=True, legend=True)
plt.show()
# Acceleration item consumption
# Note: 'reaserch' matches the column name used in the original code; keep it identical to the dataset header
acceleration_cols = ['general_acceleration_reduce_value', 'building_acceleration_reduce_value',
                     'reaserch_acceleration_reduce_value', 'training_acceleration_reduce_value',
                     'treatment_acceleration_reduce_value']
acceleration_data = {'high_value_player': df_eli_user[acceleration_cols].mean().values,
                     'normal_player': df_nor_user[acceleration_cols].mean().values}

df_acceleration = pd.DataFrame(acceleration_data,
                               index=['general', 'building', 'researching', 'training', 'treatment']).round(2)
print(df_acceleration)
# Plot acceleration consumption
df_acceleration.plot(kind='bar', title='Acceleration Item Consumption', grid=True, legend=True)
plt.show()
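Absolute averages are dominated by whichever item is consumed in the largest quantities. Dividing each column by its total shows how each group splits its acceleration consumption across categories, which compares the two groups' habits in shape rather than only in scale. A sketch on a made-up table laid out like `df_acceleration`; `consumption_mix` is a hypothetical name.

```python
import pandas as pd


def consumption_mix(table: pd.DataFrame) -> pd.DataFrame:
    """Each column divided by its own total, so every column sums to 1."""
    return table.div(table.sum(axis=0), axis=1)


# Made-up averages laid out like df_acceleration (rows: item type, columns: group).
example = pd.DataFrame(
    {'high_value_player': [600.0, 200.0, 100.0, 50.0, 50.0],
     'normal_player': [30.0, 40.0, 10.0, 10.0, 10.0]},
    index=['general', 'building', 'researching', 'training', 'treatment'])
mix = consumption_mix(example)
print(mix.round(2))
```

The result can be plotted the same way as the absolute values, for example with `mix.plot(kind='bar')`.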

Conclusion

1. The game has a large user base; new registrations are strongly influenced by events and version updates.

2. An ARPU of 8.55 points to strong monetization of active users.

3. Once users reach level 10, their propensity to pay rises sharply, approaching 100% at level 13. However, most users stay below level 10, so strategies that help players level up past it are critical.

4. High‑value players consume significantly more ivory and general acceleration items than normal players.
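The fourth conclusion says high-value players consume "significantly" more; comparing averages alone does not establish that statistically. One way to check, sketched here with NumPy only, is a percentile bootstrap confidence interval for the difference in means between two groups. The samples below are synthetic, the helper name is hypothetical, and 2,000 resamples at a 95% level are arbitrary choices; with the real data you would pass, for example, `df_eli_user['ivory_reduce_value']` and `df_nor_user['ivory_reduce_value']`.

```python
import numpy as np


def bootstrap_mean_diff_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, independently of the other
        diffs[i] = (rng.choice(a, size=a.size).mean()
                    - rng.choice(b, size=b.size).mean())
    alpha = (1 - level) / 2
    low, high = np.quantile(diffs, [alpha, 1 - alpha])
    return float(low), float(high)


# Synthetic example: group A averages about 10 units more than group B.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=300)
group_b = rng.normal(loc=40, scale=5, size=300)
low, high = bootstrap_mean_diff_ci(group_a, group_b)
print(f'95% CI for the difference in means: [{low:.2f}, {high:.2f}]')
```

An interval that excludes 0 supports the claim that the group means differ. Consumption data is often heavily skewed, so comparing medians in the same way may also be informative.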
