Fundamentals 20 min read

Comprehensive Guide to Python File Operations, CSV Handling, and Data Serialization

This article provides a thorough tutorial on Python file I/O, covering opening and closing files, mode flags, absolute and relative paths, reading and writing techniques, file copying, CSV read/write, in‑memory streams, sys module redirection, and serialization with JSON and pickle, including code examples and best practices.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Comprehensive Guide to Python File Operations, CSV Handling, and Data Serialization

1. Opening and Closing Files

The built‑in open() function opens a file and returns a file handle (e.g., f1 ). You can specify the file path, mode (default 'r' for read), and encoding (default depends on OS).

<code>f1 = open(r'd:\test_file.txt', mode='r', encoding='utf-8')
content = f1.read()
print(content)
f1.close()</code>

Using the context manager automatically closes the file:

<code>with open(r'd:\test_file.txt', mode='r', encoding='utf-8') as f1:
    content = f1.read()
    print(content)</code>

open() is a built‑in function that ultimately calls the operating‑system API.

The file handle (e.g., f1 , fh , file_handler ) is used for all subsequent operations.

encoding defaults to the OS setting (Windows GBK, Linux/macOS UTF‑8).

mode defaults to 'r' (read‑only).

f1.close() explicitly closes the handle.

Using with open() avoids manual closing.

Absolute and Relative Paths

Absolute path: full location, e.g., C:/Users/chris/AppData/Local/Programs/Python/Python37/python.exe .

Relative path: starts from the current directory, e.g., test.txt , ./test.txt , ../test.txt , demo/test.txt .

Three ways to write Windows paths: \ – file = open('C:\Users\chris\Desktop\file.txt') Raw string r'\' – file = open(r'C:\Users\chris\Desktop\file.txt') Forward slash (recommended) – file = open('C:/Users/chris/Desktop/file.txt')

Common File Modes

Text modes:

<code>r  # read‑only (default, file must exist)
w  # write‑only (creates or truncates)
a  # append‑only (creates if missing)
</code>

Binary modes (use b ):

<code>rb  # read binary
wb  # write binary
ab  # append binary
</code>

Read‑write modes with + :

<code>r+  # read/write, pointer starts at beginning
w+  # write/read, creates/truncates then reads
a+  # append/read, creates if missing
</code>

2. Reading and Writing Files

Reading

Assume a file read.txt contains several lines.

<code>f1 = open('read.txt', encoding='utf-8')
content = f1.read()
print(content, type(content))
f1.close()
</code>

read(n) reads n characters (text mode) or bytes (binary mode):

<code>f1 = open('read.txt', encoding='utf-8')
content = f1.read(6)
print(content)  # prints first 6 characters
f1.close()
</code>

readline() reads a single line, readlines() returns a list of all lines (may consume much memory for large files).

<code>f1 = open('read.txt', encoding='utf-8')
print(f1.readline().strip())
print(f1.readline())
f1.close()
</code>

Iterating over the file object reads line by line, which is memory‑efficient:

<code>f1 = open('read.txt', encoding='utf-8')
for line in f1:
    print(line.strip())
f1.close()
</code>

Writing

Opening a file in 'w' creates it if missing or truncates it if it exists.

<code>f1 = open('write.txt', encoding='utf-8', mode='w')
f1.write('Lucy is awesome')
f1.close()
</code>

Binary write ( 'wb' ) is used for non‑text data such as images:

<code>f1 = open('image.png', mode='rb')
content = f1.read()
f1.close()

f2 = open('copy.png', mode='wb')
f2.write(content)
f2.close()
</code>

File Pointer Positioning

tell() returns the current offset; seek(offset, whence) moves the pointer ( whence : 0‑start, 1‑current, 2‑end).

<code>f = open('test.txt')
print(f.read(10))
print(f.tell())

f.seek(2, 0)   # from start, skip 2 bytes
print(f.read())

f.seek(1, 1)   # from current, skip 1 byte
print(f.read())

f.seek(-4, 2)  # from end, go back 4 bytes
print(f.read())

f.close()
</code>

3. Implementing a File‑Copy Function

<code>import os
file_name = input('Enter a file path:')
if os.path.isfile(file_name):
    old_file = open(file_name, 'rb')
    base, ext = os.path.splitext(file_name)
    new_file_name = base + '.bak' + ext
    new_file = open(new_file_name, 'wb')
    while True:
        content = old_file.read(1024)
        new_file.write(content)
        if not content:
            break
    new_file.close()
    old_file.close()
else:
    print('The file you entered does not exist')
</code>

4. CSV File Read/Write

CSV stores tabular data as plain text with commas separating fields and newlines separating rows.

<code>name,age,score
zhangsan,18,98
lisi,20,99
wangwu,17,90
jerry,19,95
</code>

Writing CSV with the csv module:

<code>import csv
file = open('test.csv', 'w')
writer = csv.writer(file)
writer.writerow(['name', 'age', 'score'])
writer.writerows([
    ['zhangsan', '18', '98'],
    ['lisi', '20', '99'],
    ['wangwu', '17', '90'],
    ['jerry', '19', '95']
])
file.close()
</code>

Reading CSV:

<code>import csv
file = open('test.csv', 'r')
reader = csv.reader(file)
for row in reader:
    print(row)
file.close()
</code>

5. Writing Data to Memory

StringIO works like a file object for text data stored in memory:

<code>from io import StringIO
f = StringIO()
f.write('hello\r\n')
f.write('good')
print(f.getvalue())
f.close()
</code>

BytesIO does the same for binary data:

<code>from io import BytesIO
f = BytesIO()
f.write('你好\r\n'.encode('utf-8'))
f.write('中国'.encode('utf-8'))
print(f.getvalue())
f.close()
</code>

6. Using the sys Module

sys.stdin reads user input; input() is a wrapper around it.

<code>import sys
s_in = sys.stdin
while True:
    content = s_in.readline().rstrip('\n')
    if content == '':
        break
    print(content)
</code>

Redirecting standard output to a file:

<code>import sys
m = open('stdout.txt', 'w', encoding='utf8')
sys.stdout = m
print('hello')
print('yes')
print('good')
m.close()
</code>

Redirecting standard error produces a traceback file:

<code>import sys
x = open('stderr.txt', 'w', encoding='utf8')
sys.stderr = x
print(1/0)
x.close()
</code>

7. Serialization and Deserialization

To store complex Python objects, they must be serialized. Two common formats are JSON (text) and pickle (binary).

JSON Module

Convert Python objects to JSON strings with json.dumps() or directly write with json.dump() :

<code>import json
names = ['zhangsan', 'lisi', 'wangwu', 'jerry', 'henry', 'merry', 'chris']
result = json.dumps(names)
file = open('names.txt', 'w')
file.write(result)
file.close()

# or using dump
file = open('names.txt', 'w')
json.dump(names, file)
file.close()
</code>

Load JSON strings back to Python objects with json.loads() or json.load() :

<code>result = json.loads('["zhangsan", "lisi", "wangwu"]')
print(type(result))  # <class 'list'>

file = open('names.txt', 'r')
result = json.load(file)
print(result)
file.close()
</code>

Custom objects require a custom encoder:

<code>import json
class MyEncode(json.JSONEncoder):
    def default(self, o):
        return o.__dict__

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person('zhangsan', 18)
result = json.dumps(p1, cls=MyEncode)
print(result)  # {"name": "zhangsan", "age": 18}
</code>

pickle Module

pickle serializes Python objects to binary data.

<code>import pickle
names = ['张三', '李四', '杰克', '亨利']
# dumps returns binary
b_names = pickle.dumps(names)
file = open('names.txt', 'wb')
file.write(b_names)
file.close()

# dump writes directly to a file
file2 = open('names.txt', 'wb')
pickle.dump(names, file2)
file2.close()

# loads restores from binary
file1 = open('names.txt', 'rb')
x = file1.read()
y = pickle.loads(x)
print(y)
file1.close()

# load reads and restores directly
file3 = open('names.txt', 'rb')
z = pickle.load(file3)
print(z)
</code>

JSON vs. pickle

JSON produces human‑readable text and is suitable for cross‑platform data exchange; only basic Python types are supported.

pickle produces binary data that preserves the exact Python object structure but is Python‑specific and not safe for untrusted sources.

Both modules provide dump/dumps for serialization and load/loads for deserialization.

SerializationJSONCSVPicklefile-iopathlib
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.