Object Persistence in Python Using Pickle and Related Techniques
This article explains Python object persistence, covering the concepts of serialization with pickle and cPickle, various storage mechanisms, handling of complex objects, reference cycles, class instance pickling, versioning strategies, and advanced techniques such as custom state methods and Pickler/Unpickler usage.
What is Persistence?
Persistence means keeping objects alive across multiple executions of a program, typically by storing them on disk for later retrieval. Various methods exist, each with pros and cons, such as text files (CSV), relational databases (MySQL, PostgreSQL), and object‑oriented stores.
Object Persistence
Python provides the pickle module (and its faster C implementation, cPickle) to serialize almost any Python object to a string, a file, or a file‑like object and to reconstruct it later. Pickle can be used directly or through higher‑level object databases such as ZODB or PyPerSyst.
Some Pickled Python Objects
The pickle and cPickle modules expose the functions dumps(), loads(), dump(), and load(). By default they produce a printable ASCII representation; passing True as the optional second argument to dumps() (the third argument to dump()) selects a more compact binary format. The loading functions detect the format automatically.
<code>>>> import cPickle as pickle
>>> t1 = ('this is a string', 42, [1, 2, 3], None)
>>> p1 = pickle.dumps(t1)
>>> t2 = pickle.loads(p1)
>>> p2 = pickle.dumps(t1, True)
>>> t3 = pickle.loads(p2)</code>
Using dump() and load() allows multiple objects to be stored sequentially in a single file.
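When the number of stored objects is not known in advance, one common approach is to keep calling load() until it raises EOFError. The following is a minimal sketch under a few assumptions: it targets Python 3 (where the C accelerator is used automatically by pickle and there is no separate cPickle module), it uses an in-memory io.BytesIO buffer in place of a real file, and the helper names dump_all and load_all are hypothetical.

```python
import io
import pickle


def dump_all(objects, stream):
    """Write each object as its own pickle, one after another (hypothetical helper)."""
    for obj in objects:
        pickle.dump(obj, stream, protocol=pickle.HIGHEST_PROTOCOL)


def load_all(stream):
    """Yield objects until the stream is exhausted (hypothetical helper)."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            # pickle.load raises EOFError when no more data is available.
            return


buffer = io.BytesIO()  # stands in for a file opened in binary mode
dump_all(['apple', {1: 'One', 2: 'Two'}, ['fee', 'fie', 'foe', 'fum']], buffer)

buffer.seek(0)
restored = list(load_all(buffer))
```

The objects come back in the order in which they were written, so the reader does not need to know how many were stored.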
<code>>>> a1 = 'apple'
>>> b1 = {1: 'One', 2: 'Two', 3: 'Three'}
>>> c1 = ['fee', 'fie', 'foe', 'fum']
>>> f1 = file('temp.pkl', 'wb')
>>> pickle.dump(a1, f1, True)
>>> pickle.dump(b1, f1, True)
>>> pickle.dump(c1, f1, True)
>>> f1.close()
>>> f2 = file('temp.pkl', 'rb')
>>> a2 = pickle.load(f2)
>>> b2 = pickle.load(f2)
>>> c2 = pickle.load(f2)</code>
Pickle Power
Pickle handles complex objects, reference cycles, and recursive structures, preserving object identity within a single pickled graph.
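As a small illustration of identity being preserved inside one pickled graph, the sketch below (written for Python 3; the variable names are illustrative only) stores the same list under two dictionary keys and checks that the restored dictionary still points at a single list.

```python
import pickle

shared = ['x', 'y']
graph = {'first': shared, 'second': shared}  # two references to one list

restored = pickle.loads(pickle.dumps(graph))

# Inside the restored graph both keys still refer to one and the same list...
same_inside_copy = restored['first'] is restored['second']
# ...but that list is a new object, not the original one.
same_as_original = restored['first'] is shared
```

Here same_inside_copy is True and same_as_original is False: pickle reproduces the shape of the graph, not the original objects themselves.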
<code>>>> l = [1, 2, 3]
>>> l.append(l)
>>> p = pickle.dumps(l)
>>> l2 = pickle.loads(p)
>>> l2
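[1, 2, 3, [...]]</code>
Pickling related objects in separate calls breaks the references they share, because each call starts with an empty memo of already‑seen objects. Reusing a single Pickler for writing and a single Unpickler for reading keeps that memo across calls, so shared references survive.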
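The difference can be seen side by side in the following sketch. It assumes Python 3 and uses io.BytesIO as a stand-in for a file on disk; the variable names are illustrative.

```python
import io
import pickle

b = [3, 4]
a = [1, 2, b]  # a[2] and b are the same object

# Two independent dumps() calls: each one starts with an empty memo,
# so the link between a[2] and b is lost.
a_copy = pickle.loads(pickle.dumps(a))
b_copy = pickle.loads(pickle.dumps(b))
broken = a_copy[2] is b_copy

# One Pickler and one Unpickler: the memo persists across dump()/load()
# calls, so the second pickle refers back to the list already written.
stream = io.BytesIO()
pickler = pickle.Pickler(stream)
pickler.dump(a)
pickler.dump(b)

stream.seek(0)
unpickler = pickle.Unpickler(stream)
c = unpickler.load()
d = unpickler.load()
preserved = c[2] is d
```

With separate calls the identity check fails (broken is False); with the shared Pickler and Unpickler it succeeds (preserved is True). The objects must be loaded in the order they were dumped, since later pickles depend on entries created by earlier ones.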
<code>>>> a = [1, 2]
>>> b = [3, 4]
>>> a.append(b)  # a[2] and b are now the same object
>>> f = file('temp.pkl', 'wb')
>>> pickler = pickle.Pickler(f)
>>> pickler.dump(a)
>>> pickler.dump(b)
>>> f.close()
>>> f = file('temp.pkl', 'rb')
>>> unpickler = pickle.Unpickler(f)
>>> c = unpickler.load()
>>> d = unpickler.load()
>>> c[2] is d
True</code>
Unpicklable Objects
Objects that wrap external resources, such as open files, cannot be pickled directly; attempting to do so raises a TypeError.
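The same restriction applies in Python 3, although the exact wording of the error message differs between versions. The sketch below assumes Python 3 and opens os.devnull only so that no real file is created; it simply records the exception raised by dumps().

```python
import os
import pickle

error = None
with open(os.devnull, 'w') as handle:
    try:
        pickle.dumps(handle)
    except TypeError as exc:
        # Open file objects are refused by pickle.
        error = exc
```

After this runs, error holds the TypeError instance, which can be inspected or logged.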
<code>>>> f = file('temp.pkl', 'w')
>>> pickle.dumps(f)
TypeError: can't pickle file objects</code>
Class Instances
When pickling class instances, only the instance data and the fully‑qualified class name are stored; the class code itself is not. Upon unpickling, Python imports the module containing the class, so that module must be importable at load time. Custom __getstate__() and __setstate__() methods control what gets serialized and how it is restored, which is useful for handling unpicklable attributes such as open files.
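A complete round trip might look like the following sketch. It is written for Python 3 and rests on several assumptions: the class name Logger, the temporary log path, and the choice to reopen the log in append mode are all illustrative, not part of the original example.

```python
import os
import pickle
import tempfile


class Logger:  # hypothetical class used only for this illustration
    def __init__(self, value, filename):
        self.value = value
        self.logfile = open(filename, 'a')

    def __getstate__(self):
        # Replace the open file with the information needed to reopen it.
        state = self.__dict__.copy()
        state['logfile'] = self.logfile.name
        return state

    def __setstate__(self, state):
        filename = state.pop('logfile')
        self.__dict__.update(state)
        # Append mode keeps whatever was logged before pickling.
        self.logfile = open(filename, 'a')


path = os.path.join(tempfile.mkdtemp(), 'demo.log')
original = Logger(42, path)
original.logfile.write('before pickling\n')
original.logfile.flush()

clone = pickle.loads(pickle.dumps(original))
clone.logfile.write('after unpickling\n')

original.logfile.close()
clone.logfile.close()

with open(path) as f:
    lines = f.read().splitlines()
```

The restored instance carries the same value and a freshly opened file handle, and both log lines end up in the same file.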
<code>class Foo(object):
    def __init__(self, value, filename):
        self.value = value
        self.logfile = file(filename, 'w')

    def __getstate__(self):
        # Store what is needed to reopen the file, not the file object itself.
        f = self.logfile
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        self.value, name, position = state
        # 'r+' reopens the existing log without truncating it ('w' would erase it).
        f = file(name, 'r+')
        f.seek(position)
        self.logfile = f</code>
Pattern Improvements
When class definitions evolve (renaming classes, adding or removing attributes, moving modules), custom __setstate__() logic can migrate old pickles to the new structure, preserving compatibility.
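To see such a migration run end to end, the sketch below simulates a pickle written by an older class layout and loads it with the current one. It assumes Python 3; the class name Person and the sample attribute values are hypothetical, and the old layout is simulated by filling the instance dictionary by hand rather than by keeping the old class around.

```python
import pickle


class Person:  # hypothetical class; current layout has a single 'fullname'
    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        state = dict(state)  # work on a copy of the stored attributes
        if 'fullname' not in state:
            # Old pickles stored the name in two separate attributes.
            first = state.pop('firstname', '')
            last = state.pop('lastname', '')
            state['fullname'] = ' '.join([first, last]).strip()
        self.__dict__.update(state)


# Simulate an instance saved under the old layout (firstname/lastname).
old = Person.__new__(Person)
old.__dict__.update({'firstname': 'Ada', 'lastname': 'Lovelace'})
old_pickle = pickle.dumps(old)

migrated = pickle.loads(old_pickle)
```

After loading, migrated has a fullname attribute and no longer carries firstname or lastname, while pickles written by the current layout pass through __setstate__ unchanged.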
<code>def __setstate__(self, state):
    if 'fullname' not in state:
        first = state.get('firstname', '')
        last = state.get('lastname', '')
        self.fullname = " ".join([first, last]).strip()
        state.pop('firstname', None)
        state.pop('lastname', None)
    self.__dict__.update(state)</code>
Conclusion
Object persistence in Python relies on the language’s serialization capabilities, with pickle providing a robust foundation for storing and retrieving Python objects across program executions.