Python Script for Merging and Deduplicating CSV Files

This article presents a Python script that merges multiple CSV files from a specified directory and removes duplicate rows using pandas, providing a practical solution for test case management.

Test Development Learning Exchange

The script uses two Python modules: pandas for data manipulation and glob for file pattern matching. Only pandas needs to be installed, via pip3 install pandas; glob is part of the Python standard library and requires no installation.
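
To illustrate the pattern matching that glob provides, here is a minimal sketch that lists the CSV files in a throwaway directory. The helper name find_csv_files and the file names are hypothetical, chosen only for this demonstration.

```python
import glob
import os
import tempfile


def find_csv_files(directory):
    """Return the sorted paths of all .csv files directly inside `directory`."""
    pattern = os.path.join(directory, '*.csv')
    return sorted(glob.glob(pattern))


# Create a temporary directory holding two CSV files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.csv', 'b.csv', 'notes.txt'):
        with open(os.path.join(tmp, name), 'w') as handle:
            handle.write('')
    found = [os.path.basename(p) for p in find_csv_files(tmp)]

print(found)  # prints ['a.csv', 'b.csv']
```

glob.glob makes no ordering guarantee, so the sketch sorts the result to keep the merge order predictable.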

The complete code is given below as a UTF-8 source file with inline comments. The script defines two functions, named in pinyin. hebing() ("merge") reads every CSV file found in the specified directory with GBK encoding and appends its rows to a new CSV file, writing neither a header row nor an index column; because each input's header is consumed on read and not written back, the merged file contains data rows only. quchong() ("deduplicate") reads the merged file with pandas, drops duplicate rows, and saves the cleaned data back to the same file.
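
The deduplication step can be tried in isolation. The sketch below is not part of the script: it uses made-up rows held in memory and assumes, as described above, that the merged data has no header row, which is why it passes header=None to pandas.

```python
import io

import pandas as pd

# Hypothetical headerless data, shaped like a merged file: data rows only
merged_text = "case_1,login,pass\ncase_2,logout,pass\ncase_1,login,pass\n"

# header=None tells pandas that the first line is data, not column names
df = pd.read_csv(io.StringIO(merged_text), header=None)

# drop_duplicates() keeps the first occurrence of each fully identical row
deduped = df.drop_duplicates()

print(len(df), len(deduped))  # prints 3 2
```

Rows count as duplicates only when every column matches, so two test cases that differ in a single field are both kept.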

The script is designed for test case management scenarios where multiple CSV files need to be consolidated and deduplicated. It prints status messages during execution to indicate progress and completion of each step.

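# _*_ coding:utf-8 _*_
'''
Merge and deduplicate CSV files.
Mainly intended for use when adding test cases.
'''
import glob
import os

import pandas as pd

# Output file
outputfile = '/XXX/new.csv'
# Folder containing the CSV files to merge; skip the output file itself
# so that the result of a previous run is not merged back in
csv_list = [f for f in glob.glob('/XXX/*.csv') if f != outputfile]
print('Found %s CSV files' % len(csv_list))
print('Processing............')


def hebing():
    '''Merge: append the data rows of every input CSV to outputfile.'''
    # Start from an empty output so reruns do not append to stale data
    if os.path.exists(outputfile):
        os.remove(outputfile)
    for inputfile in csv_list:
        # Input files are GBK-encoded; the first row of each is its header
        data = pd.read_csv(inputfile, encoding='gbk')
        # Append data rows only: no index column, no header row
        data.to_csv(outputfile, mode='a', index=False, header=False)
    print('Merge complete')


def quchong(file):
    '''Deduplicate: drop repeated rows from the merged file in place.'''
    # The merged file has no header row, so treat every line as data
    df = pd.read_csv(file, header=None)
    datalist = df.drop_duplicates()
    # Write back without adding an index column or a header row
    datalist.to_csv(file, index=False, header=False)
    print('Deduplication complete')


if __name__ == '__main__':
    hebing()
    quchong(outputfile)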
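
As an alternative to appending on disk and then re-reading the result, the same merge and deduplication can be done in memory. The following is only a sketch, not the author's method: the function name merge_and_dedupe and its parameters are hypothetical, and it assumes every input file has the same columns and is GBK-encoded.

```python
import glob
import os

import pandas as pd


def merge_and_dedupe(input_dir, output_path, encoding='gbk'):
    """Hypothetical helper: merge every CSV in input_dir, drop duplicate
    rows, and write the result (with one header row) to output_path."""
    out_abs = os.path.abspath(output_path)
    paths = sorted(
        p for p in glob.glob(os.path.join(input_dir, '*.csv'))
        if os.path.abspath(p) != out_abs  # never read our own output
    )
    if not paths:
        raise FileNotFoundError('no CSV files found in %s' % input_dir)

    # Read each file, stack the frames, then drop fully identical rows
    frames = [pd.read_csv(p, encoding=encoding) for p in paths]
    merged = pd.concat(frames, ignore_index=True).drop_duplicates()

    # Overwrite the output in one step; index=False omits the row index
    merged.to_csv(output_path, index=False, encoding=encoding)
    return merged
```

Unlike the script above, which writes no header, this variant keeps a single header row in the output. It also overwrites the output file rather than appending to it, so it can be rerun safely.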