The permutation
utilities of the prior section are useful in a variety of
applications, but there are even more fundamental operations that might
seem prime candidates for automation. Reversing and sorting collections of
values, for example, are core operations in a broad range of programs. To
demonstrate coding techniques, and to provide examples that yield closure
for a recurring theme of this chapter, let’s take a quick look at both of
these in turn.
Reversal of collections can be coded either recursively or
iteratively in Python, and as functions or class methods.
Example 18-23
is a first attempt
at two simple reversal functions.
Example 18-23. PP4E\Dstruct\Classics\rev1.py
def reverse(list): # recursive
if list == []:
return []
else:
return reverse(list[1:]) + list[:1]
def ireverse(list): # iterative
res = []
for x in list: res = [x] + res
return res
Both reversal functions work correctly on lists. But if we try
reversing nonlist sequences (strings, tuples, and so on), theireverse
function always returns a list for
the result regardless of the type of sequence passed:
>>>ireverse("spam")
['m', 'a', 'p', 's']
Much worse, the recursivereverse
version won’t work at all for
nonlists—it gets stuck in an infinite loop. The reason is subtle: whenreverse
reaches the empty string
(""
), it’s not equal to the empty
list ([]
), so theelse
clause is selected. But slicing an empty
sequence returns another empty sequence (indexes are scaled): theelse
clause recurs again with an
empty sequence, without raising an exception. The net effect is that
this function gets stuck in a loop, calling itself over and over again
until Python runs out of memory.
The versions in
Example 18-24
fix both problems by
using generic sequence handling techniques much like that of the prior
section’s permutation utilities:
reverse
uses thenot
operator to detect the end of the
sequence and returns the empty sequence itself, rather than an empty
list constant. Since the empty sequence is the type of the original
argument, the+
operation always
builds the correct type sequence as the recursion unfolds.
ireverse
makes use of the
fact that slicing a sequence returns a sequence of the same type. It
first initializes the result to the slice[:0]
, a new, empty slice of
the argument’s
type. Later, it uses
slicing to extract one-node sequences to add to the result’s front,
instead of a list constant.
Example 18-24. PP4E\Dstruct\Classics\rev2.py
def reverse(list):
if not list: # empty? (not always [])
return list # the same sequence type
else:
return reverse(list[1:]) + list[:1] # add front item on the end
def ireverse(list):
res = list[:0] # empty, of same type
for i in range(len(list)):
res = list[i:i+1] + res # add each item to front
return res
The combination of the changes allows the new functions to work on
any sequence, and return a new sequence of the same type as the sequence
passed in. If we pass in a string, we get a new string as the result. In
fact, they reverse any sequence object that responds to slicing,
concatenation, andlen
—even instances
of Python classes and C types. In other words, they can reverse any
object that has sequence interface protocols. Here they are working on
lists, strings, and tuples:
>>>from rev2 import *
>>>reverse([1, 2, 3]), ireverse([1, 2, 3])
([3, 2, 1], [3, 2, 1])
>>>reverse("spam"), ireverse("spam")
('maps', 'maps')
>>>reverse((1.2, 2.3, 3.4)), ireverse((1.2, 2.3, 3.4))
((3.4, 2.3, 1.2), (3.4, 2.3, 1.2))
Another staple of many systems is sorting: ordering items in a
collection according to some constraint. The script in
Example 18-25
defines a simple
sort routine in Python, which orders a list of objects on a field.
Because Python indexing is generic, the field can be an index or a
key—this function can sort lists of either sequences or mappings.
Example 18-25. PP4E\Dstruct\Classics\sort1.py
def sort(list, field):
res = [] # always returns a list
for x in list:
i = 0
for y in res:
if x[field] <= y[field]: break # list node goes here?
i += 1
res[i:i] = [x] # insert in result slot
return res
if __name__ == '__main__':
table = [ {'name':'john', 'age':25}, {'name':'doe', 'age':32} ]
print(sort(table, 'name'))
print(sort(table, 'age'))
table = [ ('john', 25), ('doe', 32) ]
print(sort(table, 0))
print(sort(table, 1))
Here is this module’s self-test code in action; the first tests
sort dictionaries, and the last sort tuples:
C:\...\PP4E\Dstruct\Classics>python sort1.py
[{'age': 32, 'name': 'doe'}, {'age': 25, 'name': 'john'}]
[{'age': 25, 'name': 'john'}, {'age': 32, 'name': 'doe'}]
[('doe', 32), ('john', 25)]
[('john', 25), ('doe', 32)]
Since functions can be passed in like any other object, we can
easily allow for an optional comparison function. In the next version,
Example 18-26
, the second
argument takes a function that should returntrue
if its first argument should be placed
before its
second
. Alambda
is used to provide an ascending-order
test by default. This sorter also returns a new sequence that is the
same type as the sequence passed in, by applying the slicing
techniques used in the reversal tools earlier—if you sort a tuple of
nodes, you get back a tuple.
Example 18-26. PP4E\Dstruct\Classics\sort2.py
def sort(seq, func=(lambda x,y: x <= y)): # default: ascending
res = seq[:0] # return seq's type
for j in range(len(seq)):
i = 0
for y in res:
if func(seq[j], y): break
i += 1
res = res[:i] + seq[j:j+1] + res[i:] # seq can be immutable
return res
if __name__ == '__main__':
table = ({'name':'doe'}, {'name':'john'})
print(sort(list(table), (lambda x, y: x['name'] > y['name'])))
print(sort(tuple(table), (lambda x, y: x['name'] <= y['name'])))
print(sort('axbyzc'))
This time, the table entries are ordered per a field comparison
function passed in:
C:\...\PP4E\Dstruct\Classics>python sort2.py
[{'name': 'john'}, {'name': 'doe'}]
({'name': 'doe'}, {'name': 'john'})
abcxyz
This version also dispenses with the notion of a field
altogether and lets the passed-in function handle indexing if needed.
That makes this version much more general; for instance, it’s also
useful for sorting strings.
But now that I’ve shown you these reversing and sorting
algorithms, I need to also tell you that they may not be an optimal
approach, especially in these specific cases.
Although
these examples serve an
educational role, built-in lists and functions generally accomplish what
we just did the hard way:
Python’s two built-in sorting tools are so fast that you
would be hard-pressed to beat them in most scenarios. To use the
list object’ssort
method for
arbitrary kinds of iterables, convert first if needed:
temp = list(sequence)
temp.sort()
...use items in temp...
Alternatively, thesorted
built-in function operates on any iterable so you don’t need to
convert, and returns a new sorted result list so you can use it
within a larger expression or context. Because it is not an
in-place change, you also don’t need to be concerned about the
possible side effects of changing the original list:
for item in sorted(iterable):
...use item...
For custom sorts, simply pass in thekey
keyword arguments to tailor the
built-in sort’s operation—it maps values to sort keys instead of
performing comparisons, but the effect is just as general (see the
earlier graph searchers’ length ordering for another
example):
>>>L = [{'n':3}, {'n':20}, {'n':0}, {'n':9}]
>>>L.sort(key=lambda x: x['n'])
>>>L
[{'n': 0}, {'n': 3}, {'n': 9}, {'n': 20}]
Both sorting tools also accept a Booleanreverse
flag to make the result order
descending instead of ascending; there is no need to manually
reverse after the sort. The underlying sort routine in Python is
so good that its documentation claims that it has “supernatural
performance”—not bad for a sort.
Our reversal routines are generally superfluous by the same
token—because Python provides for fast reversals in both in-place
and iterable forms, you’re likely better off using them whenever
possible:
>>>L = [2, 4, 1, 3, 5]
>>>L.reverse()
>>>L
[5, 3, 1, 4, 2]
>>>L = [2, 4, 1, 3, 5]
>>>list(reversed(L))
[5, 3, 1, 4, 2]
In fact, this has been a recurring theme of this chapter on
purpose, to underscore a key point in Python work: although there are
plenty of exceptions, you’re generally better off not reinventing the
data structure wheel unless you have to. Built-in functionality will
often prove a better choice in the end.
Make no mistake: sometimes you really do need objects that add
functionality to built-in types or do something more custom. The set
classes we met, for instance, can add custom tools not directly
supported by Python today, binary search trees may support some
algorithms better than dictionaries and sets can, and the tuple-tree
stack implementation was actually faster than one based on built-in
lists for common usage patterns. Moreover, graphs and permutations are
something you must code on your own.
As we’ve also seen, class encapsulations make it possible to
change and extend object internals without impacting the rest of your
system. Although subclassing built-in types can address much of the same
goals, the end result is still a custom data structure.
Yet because Python comes with a set of built-in, flexible, and
optimized datatypes, data structure implementations are often not as
important in Python as they are in lesser-equipped and lower-level
programming languages. Before you code that new datatype, be sure to ask
yourself whether a built-in type or call might be more in line with the
Python way of thinking.
For more on extended data structures for use in Python, see also
the relatively newcollections
module
in its standard library. As mentioned in the preceding chapter, this
module implements named tuples, ordered dictionaries, and more. It’s
described in Python’s library manual, but its source code, like much in
the standard library, can serve as a source of supplemental examples as
well.
This chapter has
been command line–oriented. To wrap up, I want to refer you
to a program that merges the GUI technology we studied earlier in the book
with some of the data structure ideas we’ve explored here.
This program is called PyTree—a generic tree data structure viewer
written in Python with the tkinter GUI library. PyTree sketches out the
nodes of a tree on-screen as boxes connected by arrows. It also knows how
to route mouse clicks on drawn tree nodes back to the tree, to trigger
tree-specific actions. Because PyTree lets you visualize the structure of
the tree generated by a set of parameters, it’s an arguably fun way to
explore tree-based algorithms.
PyTree supports arbitrary tree types by “wrapping” real trees in
interface objects. The interface objects implement a standard protocol by
communicating with the underlying tree object. For the purposes of this
chapter, PyTree is instrumented to display binary search trees; for the
next chapter, it’s also set up to render expression parse trees. New trees
can be viewed by coding wrapper classes to interface to new tree
types.
The GUI interfaces PyTree utilizes were covered in depth earlier in
this book. Because it is written with Python and tkinter, it should be
portable to Windows, Unix, and Macs. At the top, it’s used with code of
this form:
root = Tk() # build a single viewer GUI
bwrapper = BinaryTreeWrapper() # add extras: input line, test btns
pwrapper = ParseTreeWrapper() # make wrapper objects
viewer = TreeViewer(bwrapper, root) # start out in binary mode
def onRadio():
if var.get() == 'btree':
viewer.setTreeType(bwrapper) # change viewer's wrapper
elif var.get() == 'ptree':
viewer.setTreeType(pwrapper)
Figure 18-2
captures
the display produced by PyTree under Python 3.1 on Windows 7 by running
its top-level
treeview.py
file with
no arguments; PyTree can also be started from a button in the PyDemos
launcher we met in
Chapter 10
. As usual, you
can run it on your own computer to get a better feel for its behavior.
Although this screenshot captures one specific kind of tree, PyTree is
general enough to display arbitrary tree types, and can even switch them
while running.
Figure 18-2. PyTree viewing a binary search tree (test1 button)
Just like the PyForm example of the preceding chapter, the source
code and documentation for this example have been moved to the book’s
examples package in this edition to save real estate here. To study
PyTree, see the following directory:
C:\...\PP4E\Dstruct\TreeView
Also like PyForm, the
Documentation
directory there has the original
description of this example that appeared in the third edition of this
book; PyTree’s code is in Python 3.X form, but the third edition overview
may not be.
As mentioned, PyTree is set up to display both the binary search
trees of this chapter and the expression parse trees of the next chapter.
When viewing the latter, PyTree becomes a sort of visual calculator—you
can generate arbitrary expression trees and evaluate any part of them by
clicking on nodes displayed. But at this point, there is not much more I
can show and/or tell you about those kinds of trees until you move on
to
Chapter 19
.