To begin our exploration of the systems domain, we will take a quick tour
through the standard library sys and os modules in this chapter, before
moving on to larger system programming concepts. As you can tell from the
length of their attribute lists, both of these are large modules. The
following reflects Python 3.1 running on Windows 7 outside IDLE:
C:\...\PP4E\System>python
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (...)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os
>>> len(dir(sys))              # 65 attributes
65
>>> len(dir(os))               # 122 on Windows, more on Unix
122
>>> len(dir(os.path))          # a nested module within os
52
The content of these two modules may vary per Python version and
platform. For example, os is much larger under Cygwin after building
Python 3.1 from its source code there (Cygwin is a system that provides
Unix-like functionality on Windows; it is discussed further in
“More on Cygwin Python for Windows”):
$ ./python.exe
Python 3.1.1 (r311:74480, Feb 20 2010, 10:16:52)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os
>>> len(dir(sys))
64
>>> len(dir(os))
217
>>> len(dir(os.path))
51
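Because these counts differ per version and platform, it can be handy to compute them in a script rather than at the prompt. The following is a minimal sketch of that idea; the attribute_counts helper is a name made up for this illustration, not part of the book's examples:

```python
import os
import sys

def attribute_counts(*modules):
    """Map each module's name to the number of names dir() reports for it."""
    return {module.__name__: len(dir(module)) for module in modules}

# os.path reports its platform-specific name here (e.g., ntpath or posixpath)
counts = attribute_counts(sys, os, os.path)
for name, count in counts.items():
    print(name, '=>', count)
```

The numbers printed will almost certainly not match those shown above, which is the point: they reflect whatever Python and platform run the script.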
As I’m not going to demonstrate every item in every built-in module,
the first thing I want to do is show you how to get more details on your
own. Officially, this task also serves as an excuse for introducing a few
core system scripting concepts; along the way, we’ll code a first script
to format documentation.
Most system-level interfaces in Python are shipped in just two modules:
sys and os. That’s somewhat oversimplified; other standard modules belong
to this domain too. Among them are the following:
glob
    For filename expansion
socket
    For network connections and Inter-Process Communication (IPC)
threading, _thread, queue
    For running and synchronizing concurrent threads
time, timeit
    For accessing system time details
subprocess, multiprocessing
    For launching and controlling parallel processes
signal, select, shutil, tempfile, and others
    For various other system-related tasks
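To give a feel for how a few of the modules in this list fit together, here is a small sketch that uses tempfile, glob, time, and shutil in one pass. The filenames are invented for the demonstration, and the script cleans up after itself:

```python
import glob
import os
import shutil
import tempfile
import time

workdir = tempfile.mkdtemp()                   # tempfile: make a scratch directory
start = time.time()                            # time: note when we began
try:
    for name in ('spam.txt', 'eggs.txt', 'ham.py'):
        with open(os.path.join(workdir, name), 'w') as file:
            file.write('sample\n')

    pattern = os.path.join(workdir, '*.txt')   # glob: filename expansion
    matches = sorted(os.path.basename(path) for path in glob.glob(pattern))
    print(matches)
finally:
    shutil.rmtree(workdir)                     # shutil: remove the whole tree
elapsed = time.time() - start
print('elapsed seconds:', elapsed)
```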
Third-party extensions such as pySerial (a serial port interface),
Pexpect (an Expect work-alike for controlling cross-program dialogs), and
even Twisted (a networking framework) can arguably be lumped into the
systems domain as well. In addition, some built-in functions are actually
system interfaces too; the open function, for example, interfaces with
the file system. But by and large, sys and os together form the core of
Python’s built-in system tools arsenal.
In principle at least, sys exports components related to the Python
interpreter itself (e.g., the module search path), and os contains
variables and functions that map to the operating system on which Python
is run. In practice, this distinction may not always seem clear-cut
(e.g., the standard input and output streams show up in sys, but they are
arguably tied to operating system paradigms). The good news is that
you’ll soon use the tools in these modules so often that their locations
will be permanently stamped on your memory.[3]
The os module also attempts to provide a portable programming interface
to the underlying operating system; its functions may be implemented
differently on different platforms, but to Python scripts, they look the
same everywhere. And if that’s still not enough, the os module also
exports a nested submodule, os.path, which provides a portable interface
to file and directory processing tools.
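As a brief illustration of that portability, the sketch below builds and takes apart a pathname with os.path calls only, so the same code works with whatever directory separator the host platform uses. The path components are arbitrary sample names:

```python
import os

# Build a path with whatever separator the host platform uses.
path = os.path.join('PP4E', 'System', 'more.py')

directory, filename = os.path.split(path)      # split off the last component
root, extension = os.path.splitext(filename)   # split off the file extension

print(path)
print(directory, filename, root, extension)
```

The first line printed uses backslashes on Windows and forward slashes on Unix-like systems, but the script itself never mentions either.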
As you can
probably deduce from the preceding paragraphs, learning to
write system scripts in Python is mostly a matter of learning about
Python’s system modules. Luckily, there are a variety of information
sources to make this task easier—from module attributes to published
references and books.
For instance, if you want to know everything that a built-in
module exports, you can read its library manual entry; study its source
code (Python is open source software, after all); or fetch its attribute
list and documentation string interactively. Let’s import sys in
Python 3.1 and see what it has to offer:
C:\...\PP4E\System>python
>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__package__',
'__stderr__', '__stdin__', '__stdout__', '_clear_type_cache', '_current_frames',
'_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder',
'call_tracing', 'callstats', 'copyright', 'displayhook', 'dllhandle',
'dont_write_bytecode', 'exc_info', 'excepthook', 'exec_prefix', 'executable',
'exit', 'flags', 'float_info', 'float_repr_style', 'getcheckinterval',
'getdefaultencoding', 'getfilesystemencoding', 'getprofile', 'getrecursionlimit',
'getrefcount', 'getsizeof', 'gettrace', 'getwindowsversion', 'hexversion',
'int_info', 'intern', 'maxsize', 'maxunicode', 'meta_path', 'modules', 'path',
'path_hooks', 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2',
'setcheckinterval', 'setfilesystemencoding', 'setprofile', 'setrecursionlimit',
'settrace', 'stderr', 'stdin', 'stdout', 'subversion', 'version', 'version_info',
'warnoptions', 'winver']
The dir function simply returns a list containing the string names of all
the attributes in any object with attributes; it’s a handy memory jogger
for modules at the interactive prompt. For example, we know there is
something called sys.version, because the name version came back in the
dir result. If that’s not enough, we can always consult the __doc__
string of built-in modules:
>>> sys.__doc__
"This module provides access to some objects used or maintained by the\ninterpre
ter and to functions that interact strongly with the interpreter.\n\nDynamic obj
ects:\n\nargv -- command line arguments; argv[0] is the script pathname if known
\npath -- module search path; path[0] is the script directory, else ''\nmodules
-- dictionary of loaded modules\n\ndisplayhook -- called to show results in an i
...lots of text deleted here...
"
The __doc__ built-in attribute just shown usually contains a string of
documentation, but it may look a bit weird when displayed this way: it’s
one long string with embedded end-line characters that print as \n, not
as a nice list of lines. To format these strings for a more humane
display, you can simply use a print function call:
>>> print(sys.__doc__)
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.

Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules
...lots of lines deleted here...
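The same two tools work just as well inside a script. As a rough sketch, the following filters the dir result down to names without a leading underscore and prints only the opening lines of the documentation string; the variable names are chosen for this illustration only:

```python
import sys

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(sys) if not name.startswith('_')]
print(len(public_names), 'public names; has version?', 'version' in public_names)

# Show just the first two lines of the module's documentation string.
summary = sys.__doc__.splitlines()[:2]
for line in summary:
    print(line)
```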
The print built-in function, unlike interactive displays, interprets
end-line characters correctly. Unfortunately, print doesn’t, by itself,
do anything about scrolling or paging and so can still be unwieldy on
some platforms. Tools such as the built-in help function can do better:
>>> help(sys)
Help on built-in module sys:

NAME
    sys

FILE
    (built-in)

MODULE DOCS
    http://docs.python.org/library/sys

DESCRIPTION
    This module provides access to some objects used or maintained by the
    interpreter and to functions that interact strongly with the interpreter.

    Dynamic objects:

    argv -- command line arguments; argv[0] is the script pathname if known
    path -- module search path; path[0] is the script directory, else ''
    modules -- dictionary of loaded modules
    ...lots of lines deleted here...
The help function is one interface provided by the PyDoc system, which is
standard library code that ships with Python and renders documentation
(documentation strings, as well as structural details) related to an
object in a formatted way. The format is either like a Unix manpage,
which we get for help, or an HTML page, which is more grandiose. It’s a
handy way to get basic information when working interactively, and it’s a
last resort before falling back on manuals and books.
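PyDoc can also be called from code. The sketch below goes a step beyond what this section shows: it assumes a reasonably recent Python 3, where the standard pydoc module provides render_doc and a plaintext renderer, and uses them to fetch the help text as an ordinary string instead of paging it to the screen:

```python
import pydoc
import sys

# Render the same report help(sys) displays, but as a plain string.
# renderer=pydoc.plaintext avoids the overstrike characters used for bold text.
text = pydoc.render_doc(sys, renderer=pydoc.plaintext)

for line in text.splitlines()[:5]:     # show only the first few lines
    print(line)
```

Once the report is a string, it can be searched, sliced, or handed to a pager of your own.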
The help function we just met is also fairly fixed in the way it displays
information; although it attempts to page the display in some contexts,
its page size isn’t quite right on some of the machines I use. Moreover,
it doesn’t page at all in the IDLE GUI, instead relying on manual use of
the scrollbar, which is potentially painful for large displays. When I
want more control over the way help text is printed, I usually use a
utility script of my own, like the one in Example 2-1.
Example 2-1. PP4E\System\more.py
"""
split and interactively page a string or file of text
"""

def more(text, numlines=15):
    lines = text.splitlines()                # like split('\n') but no '' at end
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print(line)
        if lines and input('More?') not in ['y', 'Y']: break

if __name__ == '__main__':
    import sys                               # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline
The meat of this file is its more function, and if you know enough Python
to be qualified to read this book, it should be fairly straightforward.
It simply splits up a string around end-line characters, and then slices
off and displays a few lines at a time (15 by default) to avoid scrolling
off the screen. A slice expression, lines[:15], gets the first 15 items
in a list, and lines[15:] gets the rest; to show a different number of
lines each time, pass a number to the numlines argument (e.g., the last
line in Example 2-1 passes 10 to the numlines argument of the more
function).
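To see the slicing step in isolation, here is a non-interactive variation that yields each page instead of printing it and prompting. The pages generator is a hypothetical helper written for this illustration, not part of Example 2-1:

```python
def pages(text, numlines=15):
    """Yield successive lists of at most numlines lines from text."""
    lines = text.splitlines()
    while lines:
        yield lines[:numlines]       # the next page
        lines = lines[numlines:]     # everything after it

# Seven sample lines, paged three at a time.
sample = '\n'.join('line %d' % i for i in range(1, 8)) + '\n'
for page in pages(sample, numlines=3):
    print(page)
```

With seven lines and a page size of 3, this produces pages of 3, 3, and 1 lines; the last slice simply comes up short instead of raising an error.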
The splitlines string object method call that this script employs returns
a list of substrings split at line ends (e.g., ["line", "line",...]). The
alternative split method, called with '\n', does similar work, but
retains an empty string at the end of the result if the last line is \n
terminated:
>>> line = 'aaa\nbbb\nccc\n'
>>> line.split('\n')
['aaa', 'bbb', 'ccc', '']
>>> line.splitlines()
['aaa', 'bbb', 'ccc']
As we’ll see more formally in Chapter 4, the end-of-line character is
normally \n (a single character with ordinal value 10) within a Python
script, no matter what platform it is run upon. (If you don’t already
know why this matters: DOS \r\n line ends in text files are translated to
\n by default when read in text mode, so the \r characters are dropped.)
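The end-of-line translation can be observed directly. In the sketch below, a temporary file is written in binary mode with DOS-style line ends, then read back twice in text mode: once with the default settings, and once with newline='' passed to open, which turns the translation off:

```python
import os
import tempfile

# Write DOS-style line ends as raw bytes so nothing is translated on output.
handle, path = tempfile.mkstemp()
os.close(handle)
try:
    with open(path, 'wb') as file:
        file.write(b'aaa\r\nbbb\r\n')

    with open(path) as file:               # text mode, default newline handling
        translated = file.read()
    with open(path, newline='') as file:   # newline='' turns translation off
        untranslated = file.read()
finally:
    os.remove(path)

print(repr(translated))      # 'aaa\nbbb\n'
print(repr(untranslated))    # 'aaa\r\nbbb\r\n'
print(translated.splitlines(), untranslated.splitlines())
```

Note that splitlines returns the same two-item list for both strings, because it treats \r\n as a single line end.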
Now,
Example 2-1
is a simple
Python program,
but it already brings up three important topics that merit
quick detours here: it uses string methods, reads from a file, and is
set up to be run or imported. Python string methods are not a
system-related tool per se, but they see action in most Python programs.
In fact, they are going to show up throughout this chapter as well as
those that follow, so here is a quick review of some of the more useful
tools in this set. String methods include calls for searching and
replacing:
>>> mystr = 'xxxSPAMxxx'
>>> mystr.find('SPAM')                    # return first offset
3
>>> mystr = 'xxaaxxaa'
>>> mystr.replace('aa', 'SPAM')           # global replacement
'xxSPAMxxSPAM'
The find call returns the offset of the first occurrence of a substring,
and replace does global search and replacement. Like all string
operations, replace returns a new string instead of changing its subject
in-place (recall that strings are immutable). With these methods,
substrings are just strings; in Chapter 19, we’ll also meet a module
called re that allows regular expression patterns to show up in searches
and replacements.
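As a small preview of that difference, the sketch below runs a literal replace and a pattern-based re.sub on the same made-up string. The pattern 'a+' matches one or more a characters in a row:

```python
import re

text = 'xxaaxxaaaa'

print(text.replace('aa', 'SPAM'))      # literal substring: 'xxSPAMxxSPAMSPAM'
print(re.sub('a+', 'SPAM', text))      # pattern, one or more a's: 'xxSPAMxxSPAM'

match = re.search('a+', text)          # first match of the pattern, or None
print(match.start(), match.group())    # 2 aa
```

The literal call replaces each pair of a characters separately, while the pattern treats each whole run as one match.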
In more recent Pythons, the in membership operator can often be used as
an alternative to find if all we need is a yes/no answer (it tests for a
substring’s presence). There are also a handful of methods for removing
whitespace on the ends of strings, which are especially useful for lines
of text read from a file:
>>> mystr = 'xxxSPAMxxx'
>>> 'SPAM' in mystr                       # substring search/test
True
>>> 'Ni' in mystr                         # when not found
False
>>> mystr.find('Ni')
-1
>>> mystr = '\t Ni\n'
>>> mystr.strip()                         # remove whitespace
'Ni'
>>> mystr.rstrip()                        # same, but just on right side
'\t Ni'
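For example, a common pattern when reading text line by line is to strip the right side of each line and skip the ones left empty. The sketch below uses io.StringIO with made-up content as a stand-in for a real file, on the assumption that iterating over it behaves like iterating over a file opened in text mode:

```python
import io

# io.StringIO stands in for a real text file opened with open().
file = io.StringIO('spam  \n\n  eggs\t\nham')

cleaned = []
for line in file:
    line = line.rstrip()          # drop the trailing '\n' and other right-side whitespace
    if line:                      # skip lines that were empty or all whitespace
        cleaned.append(line)

print(cleaned)
```

Because rstrip works only on the right side, the leading spaces in front of eggs are retained; strip would remove them too.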
String methods also provide functions that are useful for things such as
case conversions and content tests, and a standard library module named
string defines some useful preset variables, among other things:
>>> mystr = 'SHRUBBERY'
>>> mystr.lower()                         # case converters
'shrubbery'
>>> mystr.isalpha()                       # content tests
True
>>> mystr.isdigit()
False
>>> import string                         # case presets: for 'in', etc.
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> string.whitespace                     # whitespace characters
' \t\n\r\x0b\x0c'
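The presets in the string module are most often used with the in operator. As one possible illustration, the function below (a name invented here) tests whether every character of a string is a lowercase ASCII letter, and the final line filters whitespace characters out of a string:

```python
import string

def only_lowercase_ascii(text):
    """True if text is nonempty and every character is in a-z."""
    return bool(text) and all(char in string.ascii_lowercase for char in text)

print(only_lowercase_ascii('shrubbery'))    # True
print(only_lowercase_ascii('Shrubbery'))    # False

# Keep only the characters that are not whitespace.
print([char for char in 'a b\tc\n' if char not in string.whitespace])
```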
There are also methods for splitting up strings around a substring
delimiter and putting them back together with a substring in between.
We’ll explore these tools later in this book, but as an introduction,
here they are at work:
>>> mystr = 'aaa,bbb,ccc'
>>> mystr.split(',')                      # split into substrings list
['aaa', 'bbb', 'ccc']
>>> mystr = 'a b\nc\nd'
>>> mystr.split()                         # default delimiter: whitespace
['a', 'b', 'c', 'd']
>>> delim = 'NI'
>>> delim.join(['aaa', 'bbb', 'ccc'])     # join substrings list
'aaaNIbbbNIccc'
>>> ' '.join(['A', 'dead', 'parrot'])     # add a space between
'A dead parrot'
>>> chars = list('Lorreta')               # convert to characters list
>>> chars
['L', 'o', 'r', 'r', 'e', 't', 'a']
>>> chars.append('!')
>>> ''.join(chars)                        # to string: empty delimiter
'Lorreta!'
These calls turn out to be surprisingly powerful. For example, a line of
data columns separated by tabs can be parsed into its columns with a
single split call; the more.py script uses the splitlines variant shown
earlier to split a string into a list of line strings. In fact, we can
emulate the replace call we saw earlier in this section with a split/join
combination:
>>> mystr = 'xxaaxxaa'
>>> 'SPAM'.join(mystr.split('aa'))        # str.replace, the hard way!
'xxSPAMxxSPAM'
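The tab-separated case mentioned above might look like the following sketch. The record content is invented for the demonstration:

```python
record = 'Bob\t42\tdeveloper\n'                # one line of tab-separated data (made up)

columns = record.rstrip('\n').split('\t')      # strip the line end, then split on tabs
print(columns)                                 # ['Bob', '42', 'developer']

rebuilt = '\t'.join(columns) + '\n'            # join reverses the split
print(rebuilt == record)                       # True
```

Note that every column comes back as a string, including the 42; nothing is converted to a number along the way.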
For future reference, also keep in mind that Python doesn’t
automatically
convert strings to numbers, or vice versa; if you want to
use one as you would use the other, you must say so with manual
conversions:
>>> int("42"), eval("42")                 # string to int conversions
(42, 42)
>>> str(42), repr(42)                     # int to string conversions
('42', '42')
>>> ("%d" % 42), '{:d}'.format(42)        # via formatting expression, method
('42', '42')
>>> "42" + str(1), int("42") + 1          # concatenation, addition
('421', 43)
In the last command here, the first expression triggers string
concatenation (since both sides are strings), and the second invokes
integer addition (because both objects are numbers). Python doesn’t
assume you meant one or the other and convert automatically; as a rule
of thumb, Python tries to avoid magic—and the temptation to
guess—whenever possible. String tools will be covered in more detail
later in this book (in fact, they get a full chapter in
Part V
), but be sure to also see the library
manual for additional string method
tools.
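To tie the splitting and conversion tools together, here is one last sketch: it sums the numbers in a line of text, shows the error Python raises when a string and a number are mixed without a conversion, and then builds a label with an explicit str call. The data is made up for the example:

```python
text = '10 20 30'

total = sum(int(field) for field in text.split())   # convert each field, then add
print(total)                                         # 60

try:
    '42' + 1                                         # mixing str and int is an error
except TypeError as error:
    print('TypeError:', error)

label = 'total=' + str(total)                        # explicit conversion
print(label)                                         # total=60
```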