Pickling Objects with Python

How to preserve Python objects when saving them to disk, cache or other storage.

pickle is a Python module for storing Python objects outside the current Python runtime and retrieving them later. For example, you might save an object to a file on disk or to a cache and load it back at a later time.

It does this by serializing a Python object into a byte stream and de-serializing that byte stream back into an object. The byte stream is binary data, not human-readable text. Let’s take a look at an example:

Example

Say we have a simple Person class in a module called person.py:

class Person():
    
    def __init__(self, name):
        self.name = name

    def intro(self):
        print("I'm pickled %s!" % self.name)

Now let’s import the pickle module and the Person class into a main.py module. In this module we’ll create an instance of the Person class and pickle that object into a file. Because pickle produces bytes, the file has to be opened in binary mode. Finally, we’ll open the file again and unpickle the object.

import pickle
from person import Person


# Create a new person object
person = Person('Rick')

# Pickle the object into a file (pickle writes bytes, so use binary mode)
with open('file.txt', 'wb') as file:
    pickle.dump(person, file)

# Open the file in binary mode and unpickle the data back into a Python object
with open('file.txt', 'rb') as new_file:
    p = pickle.load(new_file)

# Use the intact object's method
p.intro()

Now when we run main.py from the command line we’ll see that the object’s intro() method and name attribute can still be used after the object has been unpickled:

$ python3 main.py
I'm pickled Rick!
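
The same round trip also works without a file, which is handy when the destination is a cache rather than the disk. Here is a minimal sketch using pickle.dumps() and pickle.loads(). To keep the snippet runnable on its own, types.SimpleNamespace stands in for the Person class; that substitution is an assumption made for illustration and is not part of the example above.

```python
import pickle
from types import SimpleNamespace

# Stand-in for Person('Rick'): a simple object with a name attribute
person = SimpleNamespace(name="Rick")

# Serialize to an in-memory bytes object instead of writing to a file
data = pickle.dumps(person)
print(type(data))  # <class 'bytes'>

# Rebuild an equivalent object from those bytes
restored = pickle.loads(data)
print(restored.name)  # Rick
```

The bytes in data could then be handed to whatever cache or storage client you use, as long as it accepts binary values.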

pickle vs JSON

You may also have noticed that pickle’s functions look quite similar to those of Python’s json module. So you might be wondering how the two differ and when it is best to use each of them.

JSON is a very common, human-readable text format that is used in many other programming languages. pickle is not human-readable and is specific to Python. Therefore, if you are going to share data between systems built with different languages, JSON is usually the more convenient choice, as most programming languages have a library for handling it.

However, the json module can only handle a limited number of Python types (dicts, lists, strings, numbers, booleans and None), whereas pickle supports a much larger set of types, including custom classes.
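
To make the difference concrete, here is a small sketch that tries to serialize the same dictionary with both modules. The record and its values are made up for illustration. It contains a date and a set, neither of which the json module can encode by default, while pickle round-trips both.

```python
import json
import pickle
from datetime import date

# Illustrative data: a date and a set are not JSON-serializable by default
record = {"name": "Rick", "born": date(1970, 1, 1), "tags": {"pickled", "python"}}

json_error = None
try:
    json.dumps(record)
except TypeError as exc:
    # json refuses types it does not know how to encode
    json_error = exc
    print("json failed:", exc)

# pickle handles the same object and restores it with its types intact
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True
```

An instance of a custom class would fail with json.dumps() in the same way unless you supply your own encoder.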

Security

Finally, just remember that while de-serializing JSON data with the json module does not execute code, you should never unpickle data from an untrusted source: a maliciously crafted pickle can execute arbitrary code when it is unpickled.
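
To see why, here is a deliberately harmless sketch. The Noisy class uses __reduce__ to make unpickling call print(), and a restricted unpickler (modelled on the approach described in the official pickle documentation) refuses to resolve that call. The names Noisy, SAFE_BUILTINS, RestrictedUnpickler and restricted_loads are chosen for this illustration, and the allow-list is an assumption you would adapt to your own data.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the only globals this unpickler may resolve
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global (class or function)
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but refuses anything outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Noisy:
    # __reduce__ tells pickle which callable to invoke on load.
    # Here it is a harmless print(); an attacker would pick something worse.
    def __reduce__(self):
        return (print, ("side effect during unpickling",))


payload = pickle.dumps(Noisy())

pickle.loads(payload)  # prints: side effect during unpickling

try:
    restricted_loads(payload)
except pickle.UnpicklingError as exc:
    print("blocked:", exc)  # blocked: global 'builtins.print' is forbidden

# Plain containers contain no globals, so they still load fine
print(restricted_loads(pickle.dumps([1, 2, 3])))  # [1, 2, 3]
```

An allow-list like this narrows what a pickle can do but does not make untrusted pickles safe in general. For data that crosses a trust boundary, a format such as JSON remains the safer option.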


Soumyajit Pathak

Thanks. This is a great little quick tip.

Rajarshi Bandopadhyay

Come again? A pickle can execute code when it is unpickled? Can I get some decent references about this? I would like to read more about how this is done.

Daniel Fernsby

Yeah, there’s a warning in the official Python docs about it. Here’s an example of an exploit.

So while it may seem like a convenient module, in some cases, it should be used with caution.

Rajarshi Bandopadhyay

Thank you. That example was a fantastic read. I checked it myself, and yes, it appears that the __reduce__ method is a Jupiter-sized opening in the defences of any software that calls pickle.load on data it does not control. I am quite surprised at the sheer ease with which the module allows someone to run a function on someone else's computer. Even without the hack involving subprocess, this is quite risky.