Authors: Mark Lutz

Programming Python

System Scripting Overview

To begin our exploration of the systems domain, we will take a quick tour
through the standard library sys and os modules in this chapter, before
moving on to larger system programming concepts. As you can tell from the
length of their attribute lists, both of these are large modules; the
following reflects Python 3.1 running on Windows 7 outside IDLE:

C:\...\PP4E\System> python
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (...)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os
>>> len(dir(sys))                         # 65 attributes
65
>>> len(dir(os))                          # 122 on Windows, more on Unix
122
>>> len(dir(os.path))                     # a nested module within os
52

The content of these two modules may vary per Python version and
platform. For example, os is much larger under Cygwin after building
Python 3.1 from its source code there (Cygwin is a system that provides
Unix-like functionality on Windows; it is discussed further in
More on Cygwin Python for Windows):

$ ./python.exe
Python 3.1.1 (r311:74480, Feb 20 2010, 10:16:52)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os
>>> len(dir(sys))
64
>>> len(dir(os))
217
>>> len(dir(os.path))
51
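
Because these counts shift between versions and platforms, it can be more
useful to compute them on your own machine than to rely on the figures
above. The following is only a minimal sketch; the helper name attr_counts
is hypothetical and is not one of the book's examples.

```python
import os
import sys

def attr_counts(*modules):
    """Map each module's name to the number of names dir() reports for it."""
    return {mod.__name__: len(dir(mod)) for mod in modules}

if __name__ == '__main__':
    print(sys.platform, sys.version_info[:3])
    # os.path reports its real name here: 'ntpath' on Windows, 'posixpath' on Unix
    for name, count in attr_counts(sys, os, os.path).items():
        print(name, count)
```

The numbers printed will almost certainly differ from the ones shown in
the sessions above, which is the point.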

As I’m not going to demonstrate every item in every built-in module,
the first thing I want to do is show you how to get more details on your
own. Officially, this task also serves as an excuse for introducing a few
core system scripting concepts; along the way, we’ll code a first script
to format documentation.

Python System Modules

Most system-level interfaces in Python are shipped in just two modules:
sys and os. That’s somewhat oversimplified; other standard modules belong
to this domain too. Among them are the following:

glob: For filename expansion
socket: For network connections and Inter-Process Communication (IPC)
threading, _thread, queue: For running and synchronizing concurrent threads
time, timeit: For accessing system time details
subprocess, multiprocessing: For launching and controlling parallel processes
signal, select, shutil, tempfile, and others: For various other system-related tasks
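
To give a rough feel for a few of the modules in this list, here is a
small sketch that combines tempfile, glob, and shutil. The file names are
made up for illustration, and everything happens in a throwaway directory.

```python
import glob
import os
import shutil
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind
workdir = tempfile.mkdtemp()
try:
    for name in ('a.txt', 'b.txt', 'c.py'):
        with open(os.path.join(workdir, name), 'w') as f:
            f.write('spam\n')

    # glob expands a filename pattern much as a shell would
    matches = sorted(glob.glob(os.path.join(workdir, '*.txt')))
    names = [os.path.basename(path) for path in matches]
    print(names)
finally:
    shutil.rmtree(workdir)          # remove the whole directory tree
```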

Third-party extensions such as pySerial (a serial port interface),
Pexpect (an Expect work-alike for controlling cross-program dialogs), and
even Twisted (a networking framework) can arguably be lumped into the
systems domain as well. In addition, some built-in functions are actually
system interfaces as well; the open function, for example, interfaces with
the file system. But by and large, sys and os together form the core of
Python’s built-in system tools arsenal.

In principle at least, sys exports components related to the Python
interpreter itself (e.g., the module search path), and os contains
variables and functions that map to the operating system on which Python
is run. In practice, this distinction may not always seem clear-cut (e.g.,
the standard input and output streams show up in sys, but they are
arguably tied to operating system paradigms). The good news is that you’ll
soon use the tools in these modules so often that their locations will be
permanently stamped on your memory.[3]
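
The split between the two modules can be seen by printing a few
attributes from each. This is just an illustrative sampling chosen for
this sketch, not a complete tour, and the values will differ per machine.

```python
import os
import sys

# sys: details of the running Python interpreter
print(sys.platform)             # platform name string, e.g. 'win32' or 'linux'
print(sys.version_info[:2])     # interpreter version as a tuple
print(len(sys.path))            # number of entries on the module search path

# os: details of the underlying operating system and process
print(os.getcwd())              # current working directory
print(os.getpid())              # this process's ID
print(os.name)                  # 'nt' on Windows, 'posix' on Unix-like systems

# the blurry part: the standard streams live in sys, not os
sys.stdout.write('written through sys.stdout\n')
```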

The os module also attempts to provide a portable programming interface
to the underlying operating system; its functions may be implemented
differently on different platforms, but to Python scripts, they look the
same everywhere. And if that’s still not enough, the os module also
exports a nested submodule, os.path, which provides a portable interface
to file and directory processing tools.
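
As a quick sketch of what that portability looks like in practice, the
following builds and takes apart a path without hardcoding a directory
separator. The path itself is only an illustration and need not exist on
your machine.

```python
import os

path = os.path.join('PP4E', 'System', 'more.py')   # uses the platform's separator
print(path)

dirname, filename = os.path.split(path)            # directory part, file part
root, ext = os.path.splitext(filename)             # base name, extension
print(dirname, filename, root, ext)

print(os.path.exists(path))        # probably False unless you have this file
print(os.path.abspath(os.curdir))  # absolute path of the current directory
```

The same script yields backslash-separated paths on Windows and
slash-separated paths on Unix-like systems.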

Module Documentation Sources

As you can
probably deduce from the preceding paragraphs, learning to
write system scripts in Python is mostly a matter of learning about
Python’s system modules. Luckily, there are a variety of information
sources to make this task easier—from module attributes to published
references and books.

For instance, if you want to know everything that a built-in
module exports, you can read its library manual entry; study its source
code (Python is open source software, after all); or fetch its attribute
list and documentation string interactively. Let’s import sys in
Python 3.1 and see what it has to offer:

C:\...\PP4E\System> python
>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__package__',
'__stderr__', '__stdin__', '__stdout__', '_clear_type_cache', '_current_frames',
'_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder',
'call_tracing', 'callstats', 'copyright', 'displayhook', 'dllhandle',
'dont_write_bytecode', 'exc_info', 'excepthook', 'exec_prefix', 'executable',
'exit', 'flags', 'float_info', 'float_repr_style', 'getcheckinterval',
'getdefaultencoding', 'getfilesystemencoding', 'getprofile', 'getrecursionlimit',
'getrefcount', 'getsizeof', 'gettrace', 'getwindowsversion', 'hexversion',
'int_info', 'intern', 'maxsize', 'maxunicode', 'meta_path', 'modules', 'path',
'path_hooks', 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2',
'setcheckinterval', 'setfilesystemencoding', 'setprofile', 'setrecursionlimit',
'settrace', 'stderr', 'stdin', 'stdout', 'subversion', 'version', 'version_info',
'warnoptions', 'winver']

The dir function simply returns a list containing the string names of all
the attributes in any object with attributes; it’s a handy memory jogger
for modules at the interactive prompt. For example, we know there is
something called sys.version, because the name version came back in the
dir result. If that’s not enough, we can always consult the __doc__ string
of built-in modules:

>>> sys.__doc__
"This module provides access to some objects used or maintained by the\ninterpre
ter and to functions that interact strongly with the interpreter.\n\nDynamic obj
ects:\n\nargv -- command line arguments; argv[0] is the script pathname if known
\npath -- module search path; path[0] is the script directory, else ''\nmodules
-- dictionary of loaded modules\n\ndisplayhook -- called to show results in an i
...lots of text deleted here...
"

Paging Documentation Strings

The __doc__ built-in attribute just shown usually contains a string of
documentation, but it may look a bit weird when displayed this way: it’s
one long string with embedded end-line characters that print as \n, not
as a nice list of lines. To format these strings for a more humane
display, you can simply pass them to the print built-in function:

>>> print(sys.__doc__)
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.
Dynamic objects:
argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules
...lots of lines deleted here...
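
The dir list and __doc__ strings can also be combined in a script. The
sketch below is one possible approach, with a hypothetical helper named
summarize; it pairs each public name in a module with the first line of
its documentation string. Note that for plain data attributes such as
sys.platform, the text found is the docstring of the value's type (str,
in that case), not a description of the attribute itself.

```python
import sys

def summarize(module, limit=10):
    """Return (name, first docstring line) pairs for a module's public names."""
    rows = []
    for name in dir(module):
        if name.startswith('_'):
            continue                        # skip private and special names
        doc = getattr(getattr(module, name), '__doc__', None)
        if isinstance(doc, str) and doc.strip():
            first = doc.strip().splitlines()[0]
        else:
            first = ''                      # no usable docstring
        rows.append((name, first))
    return rows[:limit]

for name, summary in summarize(sys):
    print('%-20s %s' % (name, summary[:50]))
```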

The print built-in function, unlike interactive displays, interprets
end-line characters correctly. Unfortunately, print doesn’t, by itself, do
anything about scrolling or paging and so can still be unwieldy on some
platforms. Tools such as the built-in help function can do better:

>>> help(sys)
Help on built-in module sys:
NAME
sys
FILE
(built-in)
MODULE DOCS
http://docs.python.org/library/sys
DESCRIPTION
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.
Dynamic objects:
argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules
...lots of lines deleted here...

The help function is one interface provided by the PyDoc system, standard
library code that ships with Python and renders documentation
(documentation strings, as well as structural details) related to an
object in a formatted way. The format is either like a Unix manpage, which
we get for help, or an HTML page, which is more grandiose. It’s a handy
way to get basic information when working interactively, and it’s a last
resort before falling back on manuals and books.
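
PyDoc can also be called from a script when you want the help text as a
string rather than as a paged display. The following sketch uses the
pydoc module's render_doc and plain functions; treating the result as a
list of lines is just one way to work with it.

```python
import pydoc
import sys

# render_doc returns the text that help() would display; plain() strips
# the overstrike sequences pydoc uses to mark bold text
text = pydoc.plain(pydoc.render_doc(sys))
lines = text.splitlines()

print(len(lines), 'lines of help text')
print('\n'.join(lines[:5]))             # show just the first few lines
```

Once the text is an ordinary string, it can be sliced, searched, or paged
in whatever way suits you.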

A Custom Paging Script

The help function we just met is also fairly fixed in the way it displays
information; although it attempts to page the display in some contexts,
its page size isn’t quite right on some of the machines I use. Moreover,
it doesn’t page at all in the IDLE GUI, instead relying on manual use of
the scrollbar, which is potentially painful for large displays. When I
want more control over the way help text is printed, I usually use a
utility script of my own, like the one in Example 2-1.

Example 2-1. PP4E\System\more.py

"""
split and interactively page a string or file of text
"""
def more(text, numlines=15):
    lines = text.splitlines()                # like split('\n') but no '' at end
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print(line)
        if lines and input('More?') not in ['y', 'Y']: break

if __name__ == '__main__':
    import sys                               # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline

The meat of this file is its more function, and if you know enough Python
to be qualified to read this book, it should be fairly straightforward. It
simply splits up a string around end-line characters, and then slices off
and displays a few lines at a time (15 by default) to avoid scrolling off
the screen. A slice expression, lines[:15], gets the first 15 items in a
list, and lines[15:] gets the rest; to show a different number of lines
each time, pass a number to the numlines argument (e.g., the last line in
Example 2-1 passes 10 to the numlines argument of the more function).
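
The slicing logic can be tried without the interactive prompt. The sketch
below is a variation for illustration only, not the book's script: a
hypothetical generator named pages applies the same two slice expressions
and hands back each chunk instead of printing it.

```python
def pages(text, numlines=15):
    """Yield successive lists of at most numlines lines from text."""
    lines = text.splitlines()
    while lines:
        yield lines[:numlines]          # the first numlines items
        lines = lines[numlines:]        # everything after them

sample = '\n'.join('line %d' % i for i in range(1, 36))   # 35 lines of text
sizes = [len(chunk) for chunk in pages(sample, 15)]
print(sizes)                            # prints [15, 15, 5]
```

The final chunk is simply shorter; slicing past the end of a list is not
an error in Python.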

The splitlines string object method call that this script employs returns
a list of substrings split at line ends (e.g., ["line", "line",...]). The
alternative split('\n') call does similar work, but retains an empty
string at the end of the result if the last line is \n terminated:

>>> line = 'aaa\nbbb\nccc\n'
>>> line.split('\n')
['aaa', 'bbb', 'ccc', '']
>>> line.splitlines()
['aaa', 'bbb', 'ccc']

As we’ll see more formally in Chapter 4, the end-of-line character is
normally \n (which stands for a byte usually having a binary value of 10)
within a Python script, no matter what platform it is run upon. (If you
don’t already know why this matters, DOS \r characters in text are dropped
by default when read.)
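
To see the \r handling for yourself, the following sketch writes DOS-style
line ends as raw bytes and reads them back in both binary and text modes.
It relies on the default text-mode behavior of Python 3's open and uses a
temporary file so nothing is left behind.

```python
import os
import tempfile

# Write DOS-style line ends as raw bytes so nothing is translated on output
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(b'aaa\r\nbbb\r\n')

    with open(path, 'rb') as f:         # binary mode: bytes exactly as stored
        raw = f.read()
    with open(path, 'r') as f:          # text mode: line ends mapped to '\n'
        text = f.read()
finally:
    os.remove(path)

print(raw)                              # the \r bytes are still present
print(repr(text))                       # only \n remains
print(ord('\n'))                        # prints 10
```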

String Method Basics

Now, Example 2-1 is a simple Python program,
but it already brings up three important topics that merit
quick detours here: it uses string methods, reads from a file, and is
set up to be run or imported. Python string methods are not a
system-related tool per se, but they see action in most Python programs.
In fact, they are going to show up throughout this chapter as well as
those that follow, so here is a quick review of some of the more useful
tools in this set. String methods include calls for searching and
replacing:

>>> mystr = 'xxxSPAMxxx'
>>> mystr.find('SPAM')                    # return first offset
3
>>> mystr = 'xxaaxxaa'
>>> mystr.replace('aa', 'SPAM')           # global replacement
'xxSPAMxxSPAM'

The find call returns the offset of the first occurrence of a substring,
and replace does global search and replacement. Like all string
operations, replace returns a new string instead of changing its subject
in-place (recall that strings are immutable). With these methods,
substrings are just strings; in Chapter 19, we’ll also meet a module
called re that allows regular expression patterns to show up in searches
and replacements.
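
As a brief preview of the difference, here is a small sketch that sets
str.replace beside the re module's sub and search functions. The sample
string and pattern are made up for illustration.

```python
import re

mystr = 'xxaaxxaaaa'

# str.replace matches the literal substring 'aa' only
literal = mystr.replace('aa', 'SPAM')
print(literal)                          # prints xxSPAMxxSPAMSPAM

# re.sub matches a pattern: here, one or more 'a' characters in a row
pattern = re.sub('a+', 'SPAM', mystr)
print(pattern)                          # prints xxSPAMxxSPAM

# re.search returns a match object (or None); start() is like str.find
match = re.search('a+', mystr)
print(match.start(), match.group())     # prints 2 aa
```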

In more recent Pythons, the in membership operator can often be used as
an alternative to find if all we need is a yes/no answer (it tests for a
substring’s presence). There are also a handful of methods for removing
whitespace on the ends of strings, which is especially useful for lines of
text read from a file:

>>> mystr = 'xxxSPAMxxx'
>>> 'SPAM' in mystr                       # substring search/test
True
>>> 'Ni' in mystr                         # when not found
False
>>> mystr.find('Ni')
-1
>>> mystr = '\t  Ni\n'
>>> mystr.strip()                         # remove whitespace
'Ni'
>>> mystr.rstrip()                        # same, but just on right side
'\t  Ni'
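
Here is how the stripping methods typically show up when reading lines
from a file. In this sketch an io.StringIO object stands in for a real
file, and the sample text is invented.

```python
import io

# io.StringIO stands in for a real text file here
fake_file = io.StringIO('spam  \n\n  eggs\t\nham\n')

lines = [line.rstrip() for line in fake_file]       # drop trailing whitespace
nonblank = [line for line in lines if line]         # drop lines now empty

print(lines)        # prints ['spam', '', '  eggs', 'ham']
print(nonblank)     # prints ['spam', '  eggs', 'ham']
```

Because rstrip works only on the right side, the leading spaces before
eggs are kept; strip would remove them too.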

String objects also provide methods that are useful for things such as
case conversions, and a standard library module named string defines some
useful preset variables, among other things:

>>> mystr = 'SHRUBBERY'
>>> mystr.lower()                         # case converters
'shrubbery'
>>> mystr.isalpha()                       # content tests
True
>>> mystr.isdigit()
False
>>> import string                         # case presets: for 'in', etc.
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> string.whitespace                     # whitespace characters
' \t\n\r\x0b\x0c'

There are also methods for splitting up strings around a substring
delimiter and putting them back together with a substring in between.
We’ll explore these tools later in this book, but as an introduction,
here they are at work:

>>> mystr = 'aaa,bbb,ccc'
>>> mystr.split(',')                      # split into substrings list
['aaa', 'bbb', 'ccc']
>>> mystr = 'a  b\nc\nd'
>>> mystr.split()                         # default delimiter: whitespace
['a', 'b', 'c', 'd']
>>> delim = 'NI'
>>> delim.join(['aaa', 'bbb', 'ccc'])     # join substrings list
'aaaNIbbbNIccc'
>>> ' '.join(['A', 'dead', 'parrot'])     # add a space between
'A dead parrot'
>>> chars = list('Lorreta')               # convert to characters list
>>> chars
['L', 'o', 'r', 'r', 'e', 't', 'a']
>>> chars.append('!')
>>> ''.join(chars)                        # to string: empty delimiter
'Lorreta!'

These calls turn out to be surprisingly powerful. For example, a line of
data columns separated by tabs can be parsed into its columns with a
single split call; the more.py script uses the splitlines variant shown
earlier to split a string into a list of line strings. In fact, we can
emulate the replace call we saw earlier in this section with a split/join
combination:

>>> mystr = 'xxaaxxaa'
>>> 'SPAM'.join(mystr.split('aa'))        # str.replace, the hard way!
'xxSPAMxxSPAM'
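
The tab-separated case mentioned above might look like the following
sketch. The record contents are invented for illustration.

```python
record = 'Bob\t42\tdev\n'

# rstrip('\n') removes only the line end, so empty trailing columns survive
columns = record.rstrip('\n').split('\t')
print(columns)                  # prints ['Bob', '42', 'dev']

# join reverses the split
rebuilt = '\t'.join(columns) + '\n'
print(rebuilt == record)        # prints True
```

Note that every column comes back as a string, including the 42.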

For future reference, also keep in mind that Python doesn’t
automatically
convert strings to numbers, or vice versa; if you want to
use one as you would use the other, you must say so with manual
conversions:

>>> int("42"), eval("42")                 # string to int conversions
(42, 42)
>>> str(42), repr(42)                     # int to string conversions
('42', '42')
>>> ("%d" % 42), '{:d}'.format(42)        # via formatting expression, method
('42', '42')
>>> "42" + str(1), int("42") + 1          # concatenation, addition
('421', 43)

In the last command here, the first expression triggers string
concatenation (since both sides are strings), and the second invokes
integer addition (because both objects are numbers). Python doesn’t
assume you meant one or the other and convert automatically; as a rule
of thumb, Python tries to avoid magic—and the temptation to
guess—whenever possible. String tools will be covered in more detail
later in this book (in fact, they get a full chapter in Part V), but be
sure to also see the library manual for additional string method tools.
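
Because conversions are manual, a script that reads numbers from text
usually has to decide what to do when the text isn't a number. The sketch
below shows one way to handle that with a hypothetical helper named
to_int. It uses int rather than eval, since eval runs whatever expression
the string contains.

```python
def to_int(text, default=None):
    """Convert text to an int, or return default if it isn't a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int('42') + 1)         # prints 43
print(to_int('spam'))           # prints None
print(to_int(' 7\n'))           # prints 7: int() tolerates surrounding whitespace

try:
    '42' + 1                    # mixing types is an error, not a guess
except TypeError as exc:
    print('TypeError:', exc)
```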
