Automating Report Generation with Python: Data Extraction, Visualization, and Email Delivery
This tutorial explains how data analysts can automate the creation, processing, and distribution of statistical reports using Python by reading various data sources, performing DataFrame calculations, generating visualizations, and sending the results via email, thereby saving time and reducing errors.
Project Background
Data analysts often spend excessive time manually creating statistical reports, which detracts from actual data analysis. Automating report generation with Python allows analysts to focus on extracting insights rather than repetitive chart creation.
Purpose of Automation
Automation aims to save time, improve efficiency, and reduce human errors by programmatically handling repetitive reporting tasks.
Scope of Automation
Not all reports need automation; focus on frequently used reports such as customer lists, sales flow, churn, and period‑over‑period comparisons. Consider report frequency, development effort, and overall workflow when deciding which reports to automate.
Implementation Steps
Step 1: Reading Data Sources
Data is loaded into pandas DataFrames from Excel, JSON, SQL, or CSV files. Example code:
import pandas as pd
import json
import pymysql
from sqlalchemy import create_engine
# Open database connection
conn = pymysql.connect(host='localhost', port=3306, user='root', passwd='xxxx', charset='utf8')
engine = create_engine('mysql+pymysql://root:xxxx@localhost/mysql?charset=utf8')
def read_excel(file):
df_excel = pd.read_excel(file)
return df_excel
def read_json(file):
with open(file, 'r') as json_f:
df_json = pd.read_json(json_f)
return df_json
def read_sql(table):
sql_cmd = 'SELECT * FROM %s' % table
df_sql = pd.read_sql(sql_cmd, engine)
return df_sql
def read_csv(file):
df_csv = pd.read_csv(file)
return df_csvStep 2: DataFrame Calculations
Using a user information example, clean the data by removing rows with missing or erroneous city values, then compute value counts for the "city_num" column.
df = df[df['city_num'].notna()]
df = df.drop(df[df['city_num'] == 'error'].index)
city_counts = df['city_num'].value_counts()Generate a pie chart of the top 10 cities:
def pie_chart(df):
df = df[df['city_num'].notna()]
df = df.drop(df[df['city_num'] == 'error'].index)
df = df['city_num'].value_counts()
df.head(10).plot.pie(subplots=True, figsize=(5,6), autopct='%.2f%%', radius=1.2, startangle=250, legend=False)
pie_chart(read_csv('user_info.csv'))Alternatively, use pyecharts for a more attractive chart:
from pyecharts.charts import Pie
from pyecharts import options as opts
def echart_pie(user_df):
user_df = user_df[user_df['city_num'].notna()]
user_df = user_df.drop(user_df[user_df['city_num'] == 'error'].index)
user_df = user_df['city_num'].value_counts()
names = user_df.head(10).index.tolist()
values = user_df.head(10).values.tolist()
data = list(zip(names, values))
pie = Pie()
pie.add('', data)
pie.set_global_opts(title_opts=opts.TitleOpts(title='Top 10 Regions'))
return pie.render_notebook()
user_df = read_csv('user_info.csv')
echart_pie(user_df)Save the chart as an image:
plt.savefig('fig_cat.png')Step 3: Automatic Email Sending
Use smtplib and email modules to send the generated report and attachments.
import smtplib
from email import encoders
from email.header import Header
from email.utils import parseaddr, formataddr
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
asender = "[email protected]"
areceiver = "[email protected]"
acc = "[email protected]"
asubject = "谢谢关注"
from_addr = "[email protected]"
password = "####"
msg = MIMEMultipart()
msg['Subject'] = asubject
msg['to'] = areceiver
msg['Cc'] = acc
msg['from'] = "fanstuck"
body = "你好,欢迎关注fanstuck,您的关注就是我继续创作的动力!"
msg.attach(MIMEText(body, 'plain', 'utf-8'))
htmlFile = 'C:/Users/10799/problem.html'
html = MIMEApplication(open(htmlFile, 'rb').read())
html.add_header('Content-Disposition', 'attachment', filename='html')
msg.attach(html)
smtp_server = "smtp.163.com"
server = smtplib.SMTP(smtp_server, 25)
server.set_debuglevel(1)
server.login(from_addr, password)
server.sendmail(from_addr, areceiver.split(',') + acc.split(','), msg.as_string())
server.quit()Running the script sends the report, chart image, and HTML attachment to the specified recipients.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.