Fundamentals 7 min read

Merging Multiple DataFrames and Handling Missing Values with pandas

This tutorial demonstrates how to combine separate student tables using pandas' merge function, explains the merge parameters, shows how to fill missing values with fillna, validates data for anomalies, and splits a phone column into separate area code and number columns.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Merging Multiple DataFrames and Handling Missing Values with pandas

The article addresses the common problem of consolidating two student tables—one with basic information and another with scores—into a single DataFrame using pandas in Python.

It shows that the merge method can directly join the tables on the ID column:

<code>import pandas as pd
import matplotlib.pyplot as plt
# Read the two sheets from an Excel workbook
students = pd.read_excel('students.xlsx', sheet_name='student')
scores   = pd.read_excel('students.xlsx', sheet_name='score')
# Merge the tables on the ID column
table = students.merge(scores, on='ID')
print(table)</code>

An equivalent call using pd.merge with the how='left' option is also provided.

<code>table = pd.merge(students, scores, how='left', on='ID')</code>

The article then explains the key parameters of merge such as left , right , how , on , left_on , right_on , sort , suffixes , and copy , describing their roles in the join operation.

After merging, missing values (NaN) may appear; the tutorial demonstrates using fillna to replace them, e.g., with zero:

<code>table = students.merge(scores, how='left', on='ID').fillna(0)</code>

The fillna function’s signature and parameter meanings ( value , method , axis , inplace , limit , downcast ) are also described.

For data validation, a simple function score_validation is defined to flag scores outside the 0‑100 range, and it is applied row‑wise with students.apply(score_validation, axis=1) to identify erroneous records.

<code>def score_validation(row):
    if not 0 <= row['成绩'] <= 100:
        print(f"#%s\t student %s has an invalid score %s" % (row.ID, row['姓名'], row['成绩']))
students.apply(score_validation, axis=1)</code>

The article then presents another common task: splitting a telephone column into separate area‑code and phone‑number columns using the pandas str.split method with expand=True :

<code>import pandas as pd
# Read the Excel file
datas = pd.read_excel('Tel.xlsx')
# Split the Tel column
df = datas['Tel'].str.split('-', expand=True)
datas['区号'] = df[0]
datas['电话'] = df[1]
datas.to_excel('Tel.xlsx')
print(datas)</code>

The split function’s arguments ( sep , n , expand ) are briefly explained.

Overall, the guide provides practical code snippets and explanations for merging DataFrames, handling missing data, validating records, and column splitting, which are fundamental data‑processing techniques in Python.

mergedataframedata validationpandasfillnasplit
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.