Fundamentals 7 min read

Merging Multiple DataFrames and Handling Missing Values with pandas

This tutorial demonstrates how to combine separate student tables using pandas' merge function, explains the merge parameters, shows how to fill missing values with fillna, validates data for anomalies, and splits a phone column into separate area code and number columns.

Python Programming Learning Circle

May 26, 2020

Merging Multiple DataFrames and Handling Missing Values with pandas

The article addresses the common problem of consolidating two student tables—one with basic information and another with scores—into a single DataFrame using pandas in Python.

It shows that the merge method can directly join the tables on the ID column:

import pandas as pd
import matplotlib.pyplot as plt
# Read the two sheets from an Excel workbook
students = pd.read_excel('students.xlsx', sheet_name='student')
scores   = pd.read_excel('students.xlsx', sheet_name='score')
# Merge the tables on the ID column
table = students.merge(scores, on='ID')
print(table)

An equivalent call using pd.merge with the how='left' option is also provided.

table = pd.merge(students, scores, how='left', on='ID')

The article then explains the key parameters of merge such as left, right, how, on, left_on, right_on, sort, suffixes, and copy, describing their roles in the join operation.

After merging, missing values (NaN) may appear; the tutorial demonstrates using fillna to replace them, e.g., with zero:

table = students.merge(scores, how='left', on='ID').fillna(0)

The fillna function’s signature and parameter meanings ( value, method, axis, inplace, limit, downcast) are also described.

For data validation, a simple function score_validation is defined to flag scores outside the 0‑100 range, and it is applied row‑wise with students.apply(score_validation, axis=1) to identify erroneous records.

def score_validation(row):
    if not 0 <= row['成绩'] <= 100:
        print(f"#%s\t student %s has an invalid score %s" % (row.ID, row['姓名'], row['成绩']))
students.apply(score_validation, axis=1)

The article then presents another common task: splitting a telephone column into separate area‑code and phone‑number columns using the pandas str.split method with expand=True:

import pandas as pd
# Read the Excel file
datas = pd.read_excel('Tel.xlsx')
# Split the Tel column
df = datas['Tel'].str.split('-', expand=True)
datas['区号'] = df[0]
datas['电话'] = df[1]
datas.to_excel('Tel.xlsx')
print(datas)

The split function’s arguments ( sep, n, expand) are briefly explained.

Overall, the guide provides practical code snippets and explanations for merging DataFrames, handling missing data, validating records, and column splitting, which are fundamental data‑processing techniques in Python.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

merge dataframe data validation pandas fillna split

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.