Practical Python Implementation of Credit Card User Profiling Using SQL and Data Visualization
This tutorial demonstrates a complete workflow for building credit‑card user profiles by first extracting and transforming the KDD99 dataset with SQL, then applying Python libraries such as pandas, matplotlib, and seaborn to perform descriptive statistics and produce visualizations of demographic, transaction, and financial characteristics.
The article presents a practical case study of constructing credit‑card user profiles with Python, divided into two major sections: (1) an introduction to descriptive statistics using Python as the foundation for profiling, and (2) a hands‑on demonstration using a real credit‑card dataset.
In the first part, the author explains the basic plotting capabilities of matplotlib (including the pandas wrapper pandas.plot ) and introduces the more advanced seaborn library for richer visual styles, emphasizing their use for exploratory data analysis.
The second part focuses on the KDD99 dataset, a public financial dataset from a 1999 data‑mining competition. The workflow shows how to categorize data into static, point‑in‑time, and period windows, then extracts the necessary fields with a series of SQL statements. The key SQL scripts are:
<code>create table kdd99.var1 as
select a.*,c.sex,datediff(a.issued,c.birth_date)/365 as age
from kdd99.kdd99_card a
left join kdd99.kdd99_disp b on a.disp_id=b.disp_id
left join kdd99.kdd99_clients c on b.client_id=c.client_id;</code> <code>create table kdd99.var2 as
select a.*,c.date as trans_date,datediff(a.issued,c.date) as last_days,c.balance
from kdd99.kdd99_card a
left join kdd99.kdd99_disp b on a.disp_id=b.disp_id
left join kdd99.kdd99_Trans c on b.account_id=c.account_id
join (select card_id,max(trans_date) as max_date
from kdd99.rfm_balance1
group by card_id) d on a.card_id=d.card_id and c.date=d.max_date
order by a.card_id;</code> <code>create table kdd99.var3 as
select a.card_id,
count(trans_id) as trans_count,
sum(if(c.type='借',amount,0)) as trans_in_amount,
sum(if(c.type='贷',amount,0)) as trans_out_amount,
sum(if(c.type='贷',amount,0))/avg(c.balance) as out_ratio,
sum(if(c.type='贷',amount,0))/sum(if(c.type='借',amount,0)) as out_in_ratio
from kdd99.kdd99_card a
left join kdd99.kdd99_disp b on a.disp_id=b.disp_id
left join kdd99.kdd99_Trans c on b.account_id=c.account_id and a.issued>c.date and c.date>a.issued-180
group by a.card_id;</code>Finally, the three intermediate tables are merged to produce a comprehensive Creditcard table:
<code>create table kdd99.Creditcard as
select a.*,b.last_days,b.balance,c.trans_count,c.trans_in_amount,c.trans_out_amount,c.out_ratio,c.out_in_ratio
from kdd99.var1 as a
left join kdd99.var2 as b on a.card_id=b.card_id
left join kdd99.var3 as c on a.card_id=c.card_id
order by a.card_id;</code>With the prepared data, Python visualizations are generated. Examples include a bar chart of credit‑card type counts, box‑plots of age distribution, transaction count, and account balance, and a gender‑vs‑card‑type cross‑tab bar chart. Relevant Python snippets are:
<code>from pylab import mpl
mpl.rcParams['font.sans-serif'] = ['Songti SC']
mpl.rcParams['axes.unicode_minus'] = False
Creditcard.type.value_counts().plot(kind='bar')
</code> <code>import seaborn as sns
sns.boxplot(x='type', y='age', data=Creditcard)
</code> <code>pd.crosstab(Creditcard.sex, Creditcard.type).plot(kind='bar')
</code>These visualizations reveal that ordinary cards dominate issuance, age distributions differ across card types, and gender patterns vary by card category, providing a clear, data‑driven user profile.
The tutorial demonstrates how to combine SQL data extraction with Python analytics and visualization to build comprehensive user profiles for credit‑card customers.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.