Programming Python (173 page)

Shelve Files

Pickling allows you to
store arbitrary objects on files and file-like objects, but
it’s still a fairly unstructured medium; it doesn’t directly support easy
access to members of collections of pickled objects. Higher-level
structures can be added to pickling, but they are not inherent:

You can sometimes craft your own higher-level pickle file
organizations with the underlying filesystem (e.g., you can store each
pickled object in a file whose name uniquely identifies the object),
but such an organization is not part of pickling itself and must be
manually managed.
You can also store arbitrarily large dictionaries in a pickled
file and index them by key after they are loaded back into memory, but
this will load and store the entire dictionary all at once when
unpickled and pickled, not just the entry you are interested
in.

Shelves provide structure for collections of pickled objects that
removes some of these constraints. They are a type of file that stores
arbitrary Python objects by key for later retrieval, and they are a
standard part of the Python system. Really, they are not much of a new
topic—shelves are simply a combination of the DBM files and object
pickling we just met:

To
store
an in-memory object by key, the
shelvemodule
first serializes the object to a string with the
picklemodule, and then it stores
that string in a DBM file by key with the
dbmmodule.
To
fetch
an object back by key, the
shelvemodule first loads the
object’s serialized string by key from a DBM file with the
dbmmodule, and then converts it back to the
original in-memory object with the
picklemodule.

Because
shelveuses
pickleinternally, it can store any object that
picklecan: strings, numbers, lists,
dictionaries, cyclic objects, class instances, and more. Because shelve
uses
dbminternally, it
inherits all of that module’s capabilities, as well as its portability
constraints.

Using Shelves

In other words,
shelveis
just a go-between; it serializes and deserializes objects
so that they can be placed in string-based DBM files. The net effect is
that shelves let you store nearly arbitrary Python objects on a file by
key and fetch them back later with the
same
key
.

Your scripts never see all of this interfacing, though. Like DBM
files, shelves provide an interface that looks like a dictionary that
must be opened. In fact, a shelve is simply a persistent dictionary of
persistent Python objects—the shelve dictionary’s content is
automatically mapped to a file on your computer so that it is retained
between program runs. This is quite a feat, but it’s simpler to your
code than it may sound. To gain access to a shelve, import the module
and open your file:

import shelve
dbase = shelve.open("mydbase")

Internally, Python opens a DBM file with the name
mydbase
, or creates it if it does not yet exist (it
uses the DBM
'c'input/output open
mode by default). Assigning to a shelve key stores an object:

dbase['key'] = object      # store object

Internally, this assignment converts the object to a serialized
byte stream with pickling and stores it by key on a DBM file. Indexing a
shelve fetches a stored object:

value = dbase['key']       # fetch object

Internally, this index operation loads a string by key from a DBM
file and unpickles it into an in-memory object that is the same as the
object originally stored. Most dictionary operations are supported here,
too:

len(dbase)                 # number of items stored
dbase.keys()               # stored item key index iterable

And except for a few fine points, that’s really all there is to
using a shelve. Shelves are processed with normal Python dictionary
syntax, so there is no new database API to learn. Moreover, objects
stored and fetched from shelves are normal Python objects; they do not
need to be instances of special classes or types to be stored away. That
is, Python’s persistence system is external to the persistent objects
themselves.
Table 17-2
summarizes these and other
commonly used shelve
operations.

Table 17-2. Shelve file operations

Python code	Action	Description
`import shelve`	Import	Get `bsddb`, `gdbm`, and so on…whatever is installed
`file=shelve.open('filename')`	Open	Create or open an existing shelve’s DBM file
`file['key'] = anyvalue`	Store	Create or change the entry for `key`
`value = file['key']`	Fetch	Load the value for the entry `key`
`count = len(file)`	Size	Return the number of entries stored
`index = file.keys()`	Index	Fetch the stored keys list (an iterable view)
`found = 'key' in file`	Query	See if there’s an entry for `key`
`del file['key']`	Delete	Remove the entry for `key`
`for key in file:`	Iterate	Iterate over stored keys
`file.close()`	Close	Manual close, not always needed

Because shelves export a dictionary-like interface, too, this
table is almost identical to the DBM operation table. Here, though, the
module name
dbmis replaced by
shelve,
opencalls do not require a second
cargument, and stored values can be nearly
arbitrary kinds of objects, not just strings. Keys are still strings,
though (technically, keys are always a
strwhich is encoded to and from
bytesautomatically per UTF-8), and you still
should
closeshelves explicitly after
making changes to be safe: shelves use
dbminternally, and
some underlying DBMs require closes to avoid data loss or
damage.

Note

Recent changes
: The
shelvemodule now has an optional
writebackargument; if passed
True, all entries fetched are cached in
memory, and written back to disk automatically at close time. This
obviates the need to manually reassign changed mutable entries to
flush them to disk, but can perform poorly if many items are
fetched—it may require a large amount of memory for the cache, and it
can make the
closeoperation slow
since all fetched entries must be written back to disk (Python cannot
tell which of the objects may have been changed).

Besides allowing values to be arbitrary objects instead of just
strings, in Python 3.X the shelve interface differs from the DBM
interface in two subtler ways. First, the
keysmethod returns an iterable
view
object (not a physical list). Second, the
values of
keys
are always
strin your code, not
bytes—on fetches, stores, deletes, and other
contexts, the
strkeys you use are
encoded to the
bytesexpected by
DBM using the UTF-8 Unicode encoding. This means that unlike
dbm, you cannot use
bytesfor
shelvekeys in your code to employ arbitrary
encodings.

Shelve keys are also decoded from
bytesto
strper UTF-8 whenever they are returned
from the shelve API (e.g., keys iteration). Stored
values
are always the
bytesobject produced by the pickler to
represent a serialized object. We’ll see these behaviors in action in
the examples of this
section
.

Storing Built-in Object Types in Shelves

Let’s run an interactive
session to experiment with shelve interfaces. As
mentioned, shelves are essentially just persistent dictionaries of
objects, which you open and close:

C:\...\PP4E\Dbase>
python
>>>
import shelve
>>>
dbase = shelve.open("mydbase")
>>>
object1 = ['The', 'bright', ('side', 'of'), ['life']]
>>>
object2 = {'name': 'Brian', 'age': 33, 'motto': object1}
>>>
dbase['brian']  = object2
>>>
dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}
>>>
dbase.close()

Here, we open a shelve and store two fairly complex dictionary and
list data structures away permanently by simply assigning them to shelve
keys. Because
shelveuses
pickleinternally, almost anything goes
here—the trees of nested objects are automatically serialized into
strings for storage. To fetch them back, just reopen the shelve and
index:

C:\...\PP4E\Dbase>
python
>>>
import shelve
>>>
dbase = shelve.open("mydbase")
>>>
len(dbase)
# entries
2
>>>
dbase.keys()
# index
KeysView()
>>>
list(dbase.keys())
['brian', 'knight']
>>>
dbase['knight']
# fetch
{'motto': 'Ni!', 'name': 'Knight'}
>>>
for row in dbase.keys():
# .keys() is optional
...
print(row, '=>')
...
for field in dbase[row].keys():
...
print('  ', field, '=', dbase[row][field])
...
brian =>
motto = ['The', 'bright', ('side', 'of'), ['life']]
age = 33
name = Brian
knight =>
motto = Ni!
name = Knight

The nested loops at the end of this session step through nested
dictionaries—the outer scans the shelve and the inner scans the objects
stored in the shelve (both could use key iterators and omit their
.keys()calls). The crucial point to
notice is that we’re using normal Python syntax, both to store and to
fetch these persistent objects, as well as to process them after
loading. It’s persistent Python data on disk.

Storing Class Instances in Shelves

One of the more useful kinds of
objects to store in a shelve is a class instance. Because
its attributes record state and its inherited methods define behavior,
persistent class objects effectively serve the roles of both database
records
and database-processing
programs
. We can also use the underlying
picklemodule to serialize instances to flat
files and other file-like objects (e.g., network sockets), but the
higher-level
shelvemodule also gives
us a convenient keyed-access storage medium. For instance, consider the
simple class shown in
Example 17-2
, which is used to
model people in a hypothetical work scenario.

Example 17-2. PP4E\Dbase\person.py (version 1)

"a person object: fields + behavior"
class Person:
def __init__(self, name, job, pay=0):
self.name = name
self.job  = job
self.pay  = pay               # real instance data
def tax(self):
return self.pay * 0.25        # computed on call
def info(self):
return self.name, self.job, self.pay, self.tax()

Nothing about this class suggests it will be used for database
records—it can be imported and used independent of external storage.
It’s easy to use it for a database’s records, though: we can make some
persistent objects from this class by simply creating instances as
usual, and then storing them by key on an opened shelve:

C:\...\PP4E\Dbase>
python
>>>
from person import Person
>>>
bob   = Person('bob', 'psychologist', 70000)
>>>
emily = Person('emily', 'teacher', 40000)
>>>
>>>
import shelve
>>>
dbase = shelve.open('cast')
# make new shelve
>>>
for obj in (bob, emily):
# store objects
...
dbase[obj.name] = obj
# use name for key
...
>>>
dbase.close()
# need for bsddb

Here we used the instance objects’
nameattribute as their key in the shelve
database. When we come back and fetch these objects in a later Python
session or script, they are re-created in memory exactly as they were
when they were stored:

C:\...\PP4E\Dbase>
python
>>>
import shelve
>>>
dbase = shelve.open('cast')
# reopen shelve
>>>
>>>
list(dbase.keys())
# both objects are here
['bob', 'emily']
>>>
print(dbase['emily'])

>>>
>>>
print(dbase['bob'].tax())
# call: bob's tax
17500.0

Notice that calling Bob’s
taxmethod works even though we didn’t import the
Personclass in this last session. Python is
smart enough to link this object back to its original class when
unpickled, such that all the original methods are available through
fetched objects.

Other books

A Secret Proposal by Bowman, Valerie

The Pigman by Zindel, Paul

The Book by M. Clifford

Hell's Hollow by Stone, Summer

Second Son of a Duke by Gwen Hayes

Our Own Country: A Novel (The Midwife Series) by Jodi Daynard

A French Affair by Susan Lewis

Before Sunrise by Diana Palmer

Alien's Bride Book Three by Yamila Abraham

Siempre tuyo by Daniel Glattauer