13.1.5.3 Pickling and unpickling external objects

For the benefit of object persistence, the pickle module supports the notion of a reference to an object outside the pickled data stream. Such objects are referenced by a "persistent id", which is just an arbitrary string of printable ASCII characters. The resolution of such names is not defined by the pickle module; it will delegate this resolution to user-defined functions on the pickler and unpickler.[13.7]

To define external persistent id resolution, you need to set the persistent_id attribute of the pickler object and the persistent_load attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a custom persistent_id() method that takes an object as an argument and returns either None or the persistent id for that object. When None is returned, the pickler simply pickles the object as normal. When a persistent id string is returned, the pickler will pickle that string, along with a marker so that the unpickler will recognize the string as a persistent id.
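The following is a minimal sketch of the pickler side. It assumes Python 2.6 or later for io.BytesIO and b'' literals, and it also runs under Python 3. It overrides persistent_id() by subclassing pickle.Pickler, which the footnote mentions as an alternative for the pickle module. Record, RecordPickler, and the 'record:<key>' id format are hypothetical names chosen for illustration. Protocol 0 is requested explicitly so that the id appears in the stream as a plain string.

```python
import io
import pickle


class Record(object):
    """Hypothetical object living in an external store, keyed by `key`."""
    def __init__(self, key):
        self.key = key


class RecordPickler(pickle.Pickler):
    """Hypothetical pickler that stores Record instances by reference."""

    def persistent_id(self, obj):
        if isinstance(obj, Record):
            # A string return value makes the pickler write only this id,
            # preceded by a marker the unpickler will recognize.
            return 'record:%d' % obj.key
        # None means "pickle this object normally".
        return None


buf = io.BytesIO()
RecordPickler(buf, 0).dump({'owner': Record(42), 'count': 3})
data = buf.getvalue()
```

For the Record, the stream holds only the id string behind the 'P' marker. The surrounding dict and the integer 3 are pickled by value as usual.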

To unpickle external objects, the unpickler must have a custom persistent_load() function that takes a persistent id string and returns the referenced object.
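The unpickler side can be sketched in the same way. Here a hypothetical in-memory STORE dictionary stands in for a real database. The unpickler is fed a hand-written protocol 0 stream so the example does not depend on a pickler. That stream is the persistent id marker 'P', the id, a newline, and then the '.' STOP opcode. StoreUnpickler is also a hypothetical name.

```python
import io
import pickle

# Hypothetical external store mapping persistent ids to live objects.
STORE = {'record:42': {'name': 'widget', 'price': 10}}


class StoreUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that resolves ids by looking them up in STORE."""

    def persistent_load(self, pid):
        try:
            return STORE[pid]
        except KeyError:
            # Report unknown ids as an unpickling failure.
            raise pickle.UnpicklingError('unknown persistent id %r' % (pid,))


# Protocol 0 stream: 'P' + persistent id + newline, then '.' (STOP).
data = b'Precord:42\n.'
obj = StoreUnpickler(io.BytesIO(data)).load()
```

Raising pickle.UnpicklingError for an unrecognized id is one reasonable choice. An application could instead return a placeholder object.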

Here's a silly example that might shed more light:

import pickle
from cStringIO import StringIO

src = StringIO()
p = pickle.Pickler(src)

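# Pickler hook: any object with an 'x' attribute is written as a
# persistent id string; returning None pickles the object normally.
def persistent_id(obj):
    if hasattr(obj, 'x'):
        return 'the value %d' % obj.x
    else:
        return None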

p.persistent_id = persistent_id

class Integer:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return 'My name is integer %d' % self.x

i = Integer(7)
print i
p.dump(i)

datastream = src.getvalue()
print repr(datastream)
dst = StringIO(datastream)

up = pickle.Unpickler(dst)

class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x

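# Unpickler hook: rebuild the object from its persistent id. It comes
# back as a FancyInteger because this function, not the data stream,
# decides what the id refers to.
def persistent_load(persid):
    if persid.startswith('the value '):
        value = int(persid.split()[2])
        return FancyInteger(value)
    else:
        raise pickle.UnpicklingError, 'Invalid persistent id'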

up.persistent_load = persistent_load

j = up.load()
print j
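The pickle module's Pickler and Unpickler can be subclassed, so the same round trip can also be written by overriding persistent_id() and persistent_load(), as the footnote on this section notes. The sketch below assumes Python 2.6 or later. It uses io.BytesIO in place of cStringIO so that it also runs unchanged under Python 3. IntegerPickler and FancyUnpickler are hypothetical names.

```python
import io
import pickle


class Integer(object):
    def __init__(self, x):
        self.x = x

    def __str__(self):
        return 'My name is integer %d' % self.x


class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x


class IntegerPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Objects with an 'x' attribute are stored by reference only.
        if hasattr(obj, 'x'):
            return 'the value %d' % obj.x
        return None


class FancyUnpickler(pickle.Unpickler):
    def persistent_load(self, persid):
        # Rebuild the referenced object, here as a FancyInteger.
        if persid.startswith('the value '):
            return FancyInteger(int(persid.split()[2]))
        raise pickle.UnpicklingError('Invalid persistent id')


src = io.BytesIO()
IntegerPickler(src, 0).dump(Integer(7))
j = FancyUnpickler(io.BytesIO(src.getvalue())).load()
print(str(j))
```

The stream contains nothing but the persistent id. The object that comes back is therefore whatever persistent_load() builds, which in this case is a FancyInteger.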

In the cPickle module, the unpickler's persistent_load attribute can also be set to a Python list. In that case, when the unpickler reaches a persistent id, it simply appends the persistent id string to this list. This functionality exists so that a pickle data stream can be "sniffed" for object references without actually instantiating all the objects in a pickle.[13.8] Setting persistent_load to a list is usually done in conjunction with the noload() method on the Unpickler.
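The list-plus-noload() mechanism is specific to cPickle. For a portable alternative, the standard library's pickletools.genops() can walk a pickle's opcodes without constructing any objects. Note that this is a different technique from the one described above, not its implementation. The sketch assumes protocol 0, where each persistent id is stored inline as the argument of a PERSID opcode. Record, RecordPickler, and sniff_persistent_ids are hypothetical names.

```python
import io
import pickle
import pickletools


class Record(object):
    """Hypothetical externally stored object, identified by `key`."""
    def __init__(self, key):
        self.key = key


class RecordPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, Record):
            return 'record:%d' % obj.key
        return None


def sniff_persistent_ids(data):
    """List the persistent ids in a protocol 0 pickle without loading it.

    pickletools.genops() only decodes opcodes; no object is constructed.
    Binary protocols use BINPERSID, whose id is an object pushed earlier
    on the stack rather than an inline argument, so this sketch does not
    handle them.
    """
    return [arg for opcode, arg, pos in pickletools.genops(data)
            if opcode.name == 'PERSID']


buf = io.BytesIO()
RecordPickler(buf, 0).dump([Record(1), 'plain', Record(2)])
ids = sniff_persistent_ids(buf.getvalue())
```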



Footnotes

[13.7] The actual mechanism for associating these user-defined functions is slightly different for pickle and cPickle. The description given here works the same for both implementations. Users of the pickle module could also use subclassing to effect the same results, overriding the persistent_id() and persistent_load() methods in the derived classes.
[13.8] We'll leave you with the image of Guido and Jim sitting around sniffing pickles in their living rooms.