pickle is a Python module for storing Python objects outside the current Python runtime and retrieving them later, for example by saving an object to a file on disk or to a cache.
It does this by serializing a Python object into a byte stream and de-serializing that byte stream back into an object. The byte stream is binary data rather than text, so files that hold it must be opened in binary mode. Let’s take a look at an example:
Example
Say we have a simple Person class in a module called person.py:
class Person():
    def __init__(self, name):
        self.name = name

    def intro(self):
        print("I'm pickled %s!" % self.name)
Now let’s import the pickle module and the Person class into a main.py module. In this module we’ll create an instance of the Person class and pickle that object into a file. Finally, we’ll open the file again and unpickle the object.
import pickle
from person import Person

# Create a new person object
person = Person('Rick')

# Pickle the object into a file (pickle data is binary, so open with 'wb')
with open('file.txt', 'wb') as file:
    pickle.dump(person, file)

# Open the file in binary mode and unpickle the data back into a Python object
with open('file.txt', 'rb') as new_file:
    p = pickle.load(new_file)

# Use the intact object's method
p.intro()
Now when we run main.py from the command line we’ll see that the object’s intro() method and name attribute can still be used after the object has been unpickled:
$ python3 main.py
I'm pickled Rick!
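The same round trip can also be done in memory, without a file. The sketch below is not part of the original example: it redefines a small Person class inline, with intro() returning the string instead of printing it (an assumption made only so the result is easy to inspect), and uses pickle.dumps() and pickle.loads(), which work on bytes values rather than file objects.

```python
import pickle


class Person:
    def __init__(self, name):
        self.name = name

    def intro(self):
        # Returns the greeting instead of printing it (illustrative change)
        return "I'm pickled %s!" % self.name


# dumps() returns the serialized object as a bytes value
data = pickle.dumps(Person('Rick'))

# loads() rebuilds an equivalent object from those bytes
restored = pickle.loads(data)

print(type(data).__name__)  # bytes
print(restored.intro())     # I'm pickled Rick!
```

Note that the pickle stores a reference to the class by module and name, not the class's code, so the class definition must be importable wherever the data is unpickled.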
pickle vs JSON
You may also have noticed that the pickle functions look quite similar to those of Python’s json module. So you might be wondering what the difference is between them and when it is best to use each.
JSON is a very common, human-readable text format that is used in many other programming languages. pickle is a binary format that is not human-readable and is specific to Python. Therefore, if you are going to share data between systems built with different languages, JSON is usually more convenient, as most programming languages have a standard library for handling it.
However, the json module can only handle a limited number of Python types out of the box, whereas pickle supports a much larger set of types, including instances of custom classes.
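As a rough illustration of that difference, the sketch below uses a made-up record containing a set and a date, two built-in types that json cannot encode by default. The record contents and variable names are assumptions chosen for the example.

```python
import datetime
import json
import pickle

# Plain dicts, lists, strings and numbers work fine with json
text = json.dumps({'name': 'Rick'})
print(text)  # {"name": "Rick"}

# A set and a date are not JSON types
record = {
    'name': 'Rick',
    'tags': {'pickle', 'python'},
    'joined': datetime.date(2020, 1, 1),
}

try:
    json.dumps(record)
    json_ok = True
except TypeError as exc:
    # json raises TypeError for types it does not know how to encode
    json_ok = False
    print('json failed:', exc)

# pickle round-trips the same record unchanged
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True
```

json can be taught to handle extra types through a custom encoder, but that conversion has to be written by hand, whereas pickle handles these types without extra code.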
Security
Finally, just remember that while de-serializing JSON data with the json module does not run code from the input, you should never unpickle untrusted data: a maliciously crafted pickle can execute arbitrary code when it is unpickled.
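To see why, here is a deliberately harmless sketch of the mechanism. The Sneaky class is hypothetical and exists only for this illustration. An object's `__reduce__` method can return a callable and its arguments, and pickle calls that callable during unpickling. Here the callable is os.getcwd, which only reads the current directory, but any importable callable could be named instead.

```python
import os
import pickle


class Sneaky:
    """Hypothetical class used only to illustrate the mechanism."""

    def __reduce__(self):
        # pickle stores (callable, args) and calls callable(*args) on load.
        # os.getcwd is harmless; an attacker would pick something that is not.
        return (os.getcwd, ())


payload = pickle.dumps(Sneaky())

# Unpickling calls os.getcwd() and returns its result, not a Sneaky object
result = pickle.loads(payload)

print(result == os.getcwd())       # True
print(isinstance(result, Sneaky))  # False
```

The receiving side never needs the Sneaky class at all: the payload only names the callable to run, which is why loading a pickle from an untrusted source is risky.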
Thanks. This is a great little quick tip.
Come again? A pickle can execute code when it is unpickled? Can I get some decent references about this? I would like to read more about how this is done.
Yeah, there’s a warning in the official Python docs about it. Here’s an example of an exploit.
So while it may seem like a convenient module, in some cases, it should be used with caution.
Thank you. That example was a fantastic read. I checked it myself, and yes, it appears that the __reduce__ method is a Jupiter-sized opening in the defences of any software using a pickle.load. I am quite surprised at the sheer ease with which the module allows someone to run a function on someone else's computer. Even without the hack involving Subprocess, this is quite risky.