Pickling allows you to
store arbitrary objects on files and file-like objects, but
it’s still a fairly unstructured medium; it doesn’t directly support easy
access to members of collections of pickled objects. Higher-level
structures can be added to pickling, but they are not inherent:
You can sometimes craft your own higher-level pickle file
organizations with the underlying filesystem (e.g., you can store each
pickled object in a file whose name uniquely identifies the object),
but such an organization is not part of pickling itself and must be
manually managed.
You can also store arbitrarily large dictionaries in a pickled
file and index them by key after they are loaded back into memory, but
this will load and store the entire dictionary all at once when
unpickled and pickled, not just the entry you are interested
in.
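As a rough illustration of the second point, the sketch below pickles a whole dictionary to a flat file and then changes a single field. The file name table.pkl and the sample records are made up for the demo; the point is only that every load and store moves the entire dictionary.

```python
import os
import pickle
import tempfile

# Sample data, invented for this sketch
table = {'brian': {'age': 33}, 'knight': {'motto': 'Ni!'}}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'table.pkl')   # hypothetical file name

    with open(path, 'wb') as f:
        pickle.dump(table, f)                   # writes every entry

    with open(path, 'rb') as f:
        loaded = pickle.load(f)                 # reads every entry back

    loaded['knight']['motto'] = 'Ni! Ni!'       # change one field in memory

    with open(path, 'wb') as f:
        pickle.dump(loaded, f)                  # whole dictionary rewritten

    with open(path, 'rb') as f:
        final = pickle.load(f)
```

There is no way here to read or rewrite only the 'knight' entry; keyed access happens only after the full load.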
Shelves provide structure for collections of pickled objects that
removes some of these constraints. They are a type of file that stores
arbitrary Python objects by key for later retrieval, and they are a
standard part of the Python system. Really, they are not much of a new
topic—shelves are simply a combination of the DBM files and object
pickling we just met:
To store an in-memory object by key, the shelve module first serializes the object to a string with the pickle module, and then it stores that string in a DBM file by key with the dbm module.
To fetch an object back by key, the shelve module first loads the object’s serialized string by key from a DBM file with the dbm module, and then converts it back to the original in-memory object with the pickle module.
Because shelve uses pickle internally, it can store any object that pickle can: strings, numbers, lists, dictionaries, cyclic objects, class instances, and more. Because shelve uses dbm internally, it inherits all of that module’s capabilities, as well as its portability constraints.
In other words, shelve is just a go-between; it serializes and deserializes objects so that they can be placed in string-based DBM files. The net effect is that shelves let you store nearly arbitrary Python objects on a file by key and fetch them back later with the same key.
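To make the go-between role concrete, here is a minimal sketch that wraps a plain in-memory dictionary with the standard shelve.Shelf class. The dictionary is only a stand-in for a real DBM file (an assumption made so the raw stored data can be inspected); the record is sample data.

```python
import pickle
import shelve

backing = {}                      # stand-in for a DBM file (demo assumption)
db = shelve.Shelf(backing)        # Shelf accepts any dictionary-like store

record = {'name': 'Brian', 'age': 33}
db['brian'] = record              # store: pickle the object, save under the key

raw_key = list(backing)[0]        # what the underlying store actually holds
raw_value = backing[raw_key]

manual = pickle.loads(raw_value)  # unpickling by hand gives the object back
fetched = db['brian']             # fetch: the shelf does the same steps for us

db.close()
```

The underlying store sees only an encoded key and a pickled byte string; the object fetched through the shelf is an equal but separate copy of the one stored.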
Your scripts never see all of this interfacing, though. Like DBM
files, shelves provide an interface that looks like a dictionary that
must be opened. In fact, a shelve is simply a persistent dictionary of
persistent Python objects—the shelve dictionary’s content is
automatically mapped to a file on your computer so that it is retained
between program runs. This is quite a feat, but it’s simpler to your
code than it may sound. To gain access to a shelve, import the module
and open your file:
import shelve
dbase = shelve.open("mydbase")
Internally, Python opens a DBM file with the name mydbase, or creates it if it does not yet exist (it uses the DBM 'c' input/output open mode by default). Assigning to a shelve key stores an object:
dbase['key'] = object # store object
Internally, this assignment converts the object to a serialized
byte stream with pickling and stores it by key on a DBM file. Indexing a
shelve fetches a stored object:
value = dbase['key'] # fetch object
Internally, this index operation loads a string by key from a DBM
file and unpickles it into an in-memory object that is the same as the
object originally stored. Most dictionary operations are supported here,
too:
len(dbase) # number of items stored
dbase.keys() # stored item key index iterable
And except for a few fine points, that’s really all there is to
using a shelve. Shelves are processed with normal Python dictionary
syntax, so there is no new database API to learn. Moreover, objects
stored and fetched from shelves are normal Python objects; they do not
need to be instances of special classes or types to be stored away. That
is, Python’s persistence system is external to the persistent objects
themselves.
Table 17-2 summarizes these and other commonly used shelve operations.
Table 17-2. Shelve file operations
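Python | Action | Description |
---|---|---|
import shelve | Import | Get the shelve module |
dbase = shelve.open('filename') | Open | Create or open an existing shelve file |
dbase['key'] = anyvalue | Store | Create or change the entry for key |
value = dbase['key'] | Fetch | Load the value for the entry key |
count = len(dbase) | Size | Return the number of entries stored |
index = dbase.keys() | Index | Fetch the stored keys (an iterable view) |
found = 'key' in dbase | Query | See if there’s an entry for key |
del dbase['key'] | Delete | Remove the entry for key |
for key in dbase: | Iterate | Iterate over stored keys |
dbase.close() | Close | Manual close, not always needed |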
Because shelves export a dictionary-like interface, too, this table is almost identical to the DBM operation table. Here, though, the module name dbm is replaced by shelve, open calls do not require a second 'c' argument, and stored values can be nearly arbitrary kinds of objects, not just strings. Keys are still strings, though (technically, keys are always a str which is encoded to and from bytes automatically per UTF-8), and you still should close shelves explicitly after making changes to be safe: shelves use dbm internally, and some underlying DBMs require closes to avoid data loss or damage.
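The operations in the table can be exercised in a short script. The following is only a sketch: it works in a temporary directory, reuses the mydbase name and sample records from this section, and relies on the with statement (supported by shelves in Python 3.4 and later) to perform the close automatically.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'mydbase')

    with shelve.open(path) as dbase:          # open; closed on block exit
        dbase['brian'] = {'name': 'Brian', 'age': 33}       # store
        dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}

    with shelve.open(path) as dbase:          # reopen the same shelve
        count = len(dbase)                    # size
        keys = sorted(dbase.keys())           # index (view, sorted into a list)
        found = 'knight' in dbase             # query
        value = dbase['knight']               # fetch
        del dbase['brian']                    # delete
        remaining = [key for key in dbase]    # iterate over keys
```

Sorting the keys is a choice made for this sketch; the order in which a DBM file returns its keys is not something to rely on.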
Recent changes: The shelve module now has an optional writeback argument; if passed True, all entries fetched are cached in memory, and written back to disk automatically at close time. This obviates the need to manually reassign changed mutable entries to flush them to disk, but can perform poorly if many items are fetched: it may require a large amount of memory for the cache, and it can make the close operation slow since all fetched entries must be written back to disk (Python cannot tell which of the objects may have been changed).
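The difference is easiest to see side by side. This sketch (temporary directory and a made-up 'tags' field, both assumptions for the demo) changes a fetched record in place three ways: with the default mode, with a manual reassignment, and with writeback=True.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'mydbase')

    with shelve.open(path) as dbase:
        dbase['brian'] = {'name': 'Brian', 'tags': []}

    # Default mode: each fetch unpickles a fresh copy, so this change is lost
    with shelve.open(path) as dbase:
        dbase['brian']['tags'].append('lost')
        after_default = dbase['brian']['tags']

    # Manual fix: fetch, change, then reassign to the key to store it
    with shelve.open(path) as dbase:
        rec = dbase['brian']
        rec['tags'].append('reassigned')
        dbase['brian'] = rec

    # writeback=True: fetched entries are cached and written back at close
    with shelve.open(path, writeback=True) as dbase:
        dbase['brian']['tags'].append('cached')

    with shelve.open(path) as dbase:
        final_tags = dbase['brian']['tags']
```

With only one entry the cache cost is negligible; the memory and close-time overhead described above shows up when many entries are fetched in one session.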
Besides allowing values to be arbitrary objects instead of just strings, in Python 3.X the shelve interface differs from the DBM interface in two subtler ways. First, the keys method returns an iterable view object (not a physical list). Second, the values of keys are always str in your code, not bytes: on fetches, stores, deletes, and other contexts, the str keys you use are encoded to the bytes expected by DBM using the UTF-8 Unicode encoding. This means that unlike dbm, you cannot use bytes for shelve keys in your code to employ arbitrary encodings.
Shelve keys are also decoded from bytes to str per UTF-8 whenever they are returned from the shelve API (e.g., keys iteration). Stored values are always the bytes object produced by the pickler to represent a serialized object. We’ll see these behaviors in action in the examples of this section.
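The key handling can be observed directly by wrapping a plain dictionary with shelve.Shelf, so that the encoded keys are visible. The in-memory dictionary is again just a stand-in for a DBM file, and the non-ASCII key is sample data; how a bytes key fails is an implementation detail of current CPython, so the sketch only records that it is rejected.

```python
import shelve

backing = {}                        # stand-in for a DBM file (demo assumption)
db = shelve.Shelf(backing)

db['caf\u00e9'] = 1                 # str key with a non-ASCII character

stored_keys = list(backing)         # keys as the underlying store sees them
api_keys = list(db.keys())          # keys as the shelve API returns them
stored_value = backing[stored_keys[0]]

try:
    db[b'raw'] = 2                  # bytes keys are not accepted by the shelf
    bytes_key_accepted = True
except (AttributeError, TypeError):
    bytes_key_accepted = False

db.close()
```

The store holds the UTF-8 encoded form of the key, while code that goes through the shelve sees only str keys.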
Let’s run an interactive
session to experiment with shelve interfaces. As
mentioned, shelves are essentially just persistent dictionaries of
objects, which you open and close:
C:\...\PP4E\Dbase>python
>>> import shelve
>>> dbase = shelve.open("mydbase")
>>> object1 = ['The', 'bright', ('side', 'of'), ['life']]
>>> object2 = {'name': 'Brian', 'age': 33, 'motto': object1}
>>> dbase['brian'] = object2
>>> dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}
>>> dbase.close()
Here, we open a shelve and store two fairly complex dictionary and list data structures away permanently by simply assigning them to shelve keys. Because shelve uses pickle internally, almost anything goes here: the trees of nested objects are automatically serialized into strings for storage. To fetch them back, just reopen the shelve and index:
C:\...\PP4E\Dbase>python
>>> import shelve
>>> dbase = shelve.open("mydbase")
>>> len(dbase)                               # entries
2
>>> dbase.keys()                             # index
KeysView(<shelve.DbfilenameShelf object at 0x...>)
>>> list(dbase.keys())
['brian', 'knight']
>>> dbase['knight']                          # fetch
{'motto': 'Ni!', 'name': 'Knight'}
>>> for row in dbase.keys():                 # .keys() is optional
...     print(row, '=>')
...     for field in dbase[row].keys():
...         print(' ', field, '=', dbase[row][field])
...
brian =>
  motto = ['The', 'bright', ('side', 'of'), ['life']]
  age = 33
  name = Brian
knight =>
  motto = Ni!
  name = Knight
The nested loops at the end of this session step through nested dictionaries: the outer scans the shelve and the inner scans the objects stored in the shelve (both could use key iterators and omit their .keys() calls). The crucial point to notice is that we’re using normal Python syntax, both to store and to fetch these persistent objects, as well as to process them after loading. It’s persistent Python data on disk.
One of the more useful kinds of objects to store in a shelve is a class instance. Because its attributes record state and its inherited methods define behavior, persistent class objects effectively serve the roles of both database records and database-processing programs. We can also use the underlying pickle module to serialize instances to flat files and other file-like objects (e.g., network sockets), but the higher-level shelve module also gives us a convenient keyed-access storage medium. For instance, consider the simple class shown in Example 17-2, which is used to model people in a hypothetical work scenario.
Example 17-2. PP4E\Dbase\person.py (version 1)
"a person object: fields + behavior"
class Person:
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job = job
        self.pay = pay                  # real instance data
    def tax(self):
        return self.pay * 0.25          # computed on call
    def info(self):
        return self.name, self.job, self.pay, self.tax()
Nothing about this class suggests it will be used for database
records—it can be imported and used independent of external storage.
It’s easy to use it for a database’s records, though: we can make some
persistent objects from this class by simply creating instances as
usual, and then storing them by key on an opened shelve:
C:\...\PP4E\Dbase>python
>>> from person import Person
>>> bob = Person('bob', 'psychologist', 70000)
>>> emily = Person('emily', 'teacher', 40000)
>>>
>>> import shelve
>>> dbase = shelve.open('cast')              # make new shelve
>>> for obj in (bob, emily):                 # store objects
...     dbase[obj.name] = obj                # use name for key
...
>>> dbase.close()                            # need for bsddb
Here we used the instance objects’ name attribute as their key in the shelve database. When we come back and fetch these objects in a later Python session or script, they are re-created in memory exactly as they were when they were stored:
C:\...\PP4E\Dbase>python
>>> import shelve
>>> dbase = shelve.open('cast')              # reopen shelve
>>>
>>> list(dbase.keys())                       # both objects are here
['bob', 'emily']
>>> print(dbase['emily'])
<person.Person object at 0x...>
>>>
>>> print(dbase['bob'].tax())                # call: bob's tax
17500.0
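Notice that calling Bob’s tax method works even though we didn’t import the Person class in this last session. Python is smart enough to link this object back to its original class when unpickled, such that all the original methods are available through fetched objects. This works because the pickled data records the name of the class and of its module rather than the class’s code; the module is imported automatically when the object is fetched, so it must still be importable at that time.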
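The automatic relinking can be demonstrated in a single script. The sketch below is built on assumptions made for the demo: it writes a cut-down copy of the Person class to a hypothetical module named person_demo in a temporary directory, stores an instance, and then removes the module from sys.modules to imitate a fresh session before fetching.

```python
import os
import shelve
import sys
import tempfile

# Cut-down Person class, written to a module file so the demo is self-contained
source = '''\
class Person:
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def tax(self):
        return self.pay * 0.25
'''

tmp = tempfile.TemporaryDirectory()
workdir = tmp.name
with open(os.path.join(workdir, 'person_demo.py'), 'w') as f:   # hypothetical name
    f.write(source)

sys.path.insert(0, workdir)          # make the module importable
from person_demo import Person

path = os.path.join(workdir, 'cast')
with shelve.open(path) as dbase:
    dbase['bob'] = Person('bob', 'psychologist', 70000)

# Imitate a new session: forget both the class and its module
del Person
del sys.modules['person_demo']

with shelve.open(path) as dbase:
    bob = dbase['bob']               # unpickling re-imports person_demo
    bob_tax = bob.tax()

reimported = 'person_demo' in sys.modules
class_module = type(bob).__module__

sys.path.remove(workdir)             # tidy up the demo's side effects
tmp.cleanup()
```

If the module file were moved or removed from the import path before the fetch, the fetch would fail with an import error instead, which is the practical cost of storing only a reference to the class.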