Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

Programming Python

The io.StringIO and io.BytesIO Utility Classes

The prior section’s
technique of redirecting streams to objects proved so
handy that now a standard library module automates the task for many use
cases (though some use cases, such as GUIs, may still require more
custom code). The standard library tool provides an object that maps a
file object interface to and from in-memory strings. For example:

>>> from io import StringIO
>>> buff = StringIO()                 # save written text to a string
>>> buff.write('spam\n')
5
>>> buff.write('eggs\n')
5
>>> buff.getvalue()
'spam\neggs\n'
>>> buff = StringIO('ham\nspam\n')    # provide input from a string
>>> buff.readline()
'ham\n'
>>> buff.readline()
'spam\n'
>>> buff.readline()
''
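Because a StringIO object implements the file interface, the usual file methods such as seek, read, and truncate work on it too. The following minimal sketch (variable names are illustrative) also shows that getvalue returns the whole buffer regardless of the current position, while read starts wherever the position happens to be:

```python
from io import StringIO

buff = StringIO()
buff.write('spam\n')
buff.write('eggs\n')
at_end = buff.read()      # '': the position is already past the written text
whole = buff.getvalue()   # 'spam\neggs\n', regardless of the position

buff.seek(0)              # rewind to the start, as with a real file
lines = buff.readlines()  # ['spam\n', 'eggs\n']

buff.seek(0)
buff.truncate()           # discard the contents so the object can be reused
empty = buff.getvalue()   # ''
```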

As in the prior section, instances of StringIO objects can be assigned to sys.stdin and sys.stdout to redirect streams for input and print calls and can be passed to any code that was written to expect a real file object. Again, in Python, the object interface, not the concrete datatype, is the name of the game:

>>> from io import StringIO
>>> import sys
>>> buff = StringIO()
>>> temp = sys.stdout
>>> sys.stdout = buff
>>> print(42, 'spam', 3.141)          # or print(..., file=buff)
>>> sys.stdout = temp                 # restore original stream
>>> buff.getvalue()
'42 spam 3.141\n'
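Since Python 3.4, the standard library's contextlib module also provides redirect_stdout, a context manager that performs the same save, assign, and restore steps automatically and restores sys.stdout even if the enclosed code raises an exception. This tool is not part of the discussion above; the sketch assumes Python 3.4 or later:

```python
import sys
from contextlib import redirect_stdout
from io import StringIO

buff = StringIO()
original = sys.stdout
with redirect_stdout(buff):            # sys.stdout is buff inside this block
    print(42, 'spam', 3.141)
restored = sys.stdout is original      # the old stream is back on exit
captured = buff.getvalue()
```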

Note that there is also an io.BytesIO class with similar behavior, but which maps file operations to an in-memory bytes buffer, instead of a str string:

>>> from io import BytesIO
>>> stream = BytesIO()
>>> stream.write(b'spam')
4
>>> stream.getvalue()
b'spam'
>>> stream = BytesIO(b'dpam')
>>> stream.read()
b'dpam'
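Because BytesIO deals strictly in bytes, writing a str to it raises a TypeError; text must be encoded first. One way to layer text on top of an in-memory bytes buffer is io.TextIOWrapper, which performs the encoding. A short sketch, assuming UTF-8 as the encoding:

```python
import io

raw = io.BytesIO()
try:
    raw.write('spam')                 # a str, not bytes
except TypeError:
    rejected = True
else:
    rejected = False

# Wrap the bytes buffer in a text layer that encodes to UTF-8;
# newline='' disables newline translation so '\n' stays b'\n'.
text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
text.write('café\n')
text.flush()                          # push the encoded bytes into raw
encoded = raw.getvalue()              # b'caf\xc3\xa9\n'
text.detach()                         # unwrap without closing raw
```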

Due to the sharp distinction that Python 3.X draws between text and
binary data, this alternative may be better suited for scripts that deal
with binary data. We’ll learn more about the text-versus-binary issue in
the next chapter when we explore files.

Capturing the stderr Stream

We’ve been focusing on stdin and stdout redirection, but stderr can be similarly reset to files, pipes, and objects. Although some shells support this, it’s also straightforward within a Python script. For instance, assigning sys.stderr to another instance of a class such as Output or a StringIO object in the preceding section’s example allows your script to intercept text written to standard error, too.
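A minimal sketch of that technique with a StringIO object, saving the original stream and restoring it in a try/finally so the real stderr is put back even if the code in between fails (the message text is illustrative):

```python
import sys
from io import StringIO

errbuff = StringIO()
saved = sys.stderr
sys.stderr = errbuff                   # intercept standard error
try:
    sys.stderr.write('warning: spam overflow\n')
    print('more trouble', file=sys.stderr)   # sys.stderr is errbuff here
finally:
    sys.stderr = saved                 # always restore the real stream

captured = errbuff.getvalue()
```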

Python itself uses standard error for error message text (and the IDLE GUI interface intercepts it and colors it red by default). However, no higher-level tools for standard error do what print and input do for the output and input streams. If you wish to print to the error stream, you’ll want to call sys.stderr.write() explicitly or read the next section for a print call trick that makes this easier.

Redirecting standard error from a shell command line is a bit more complex and less portable. On most Unix-like systems, we can usually capture stderr output by using shell-redirection syntax of the form command > output 2>&1. This may not work on some platforms, though, and can even vary per Unix shell; see your shell’s man pages for more details.
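Within a Python script, the subprocess module can request the same 2>&1-style merge of a spawned program’s error stream into its output stream portably, by passing stderr=subprocess.STDOUT. This is a sketch assuming Python 3.7+ (for text=True), using sys.executable so the child is the same Python interpreter; the child code is illustrative:

```python
import subprocess
import sys

# Child writes one line to stdout and one line to stderr.
child_code = "import sys; print('out'); print('err', file=sys.stderr)"

result = subprocess.run(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,   # like 2>&1: send stderr into the stdout pipe
    text=True,                  # decode bytes to str, normalize newlines
)
merged = result.stdout
```

The relative order of the two lines is not guaranteed: when stdout is a pipe, the child buffers it, so the stderr text may arrive first.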

Redirection Syntax in Print Calls

Because resetting the stream attributes to new objects was so popular, the Python print built-in is also extended to include an explicit file to which output is to be sent. A call of this form:

print(stuff, file=afile)            # afile is an object, not a string name

prints stuff to afile instead of to sys.stdout. The net effect is similar to simply assigning sys.stdout to an object, but there is no need to save and restore in order to return to the original output stream (as shown in the section on redirecting streams to objects). For example:

import sys
print('spam' * 2, file=sys.stderr)

will send text to the standard error stream object rather than sys.stdout for the duration of this single print call only. The next normal print call (without file) prints to standard output as usual. Similarly, we can use either our custom class or the standard library’s class as the output file with this hook:

>>> from io import StringIO
>>> buff = StringIO()
>>> print(42, file=buff)
>>> print('spam', file=buff)
>>> print(buff.getvalue())
42
spam
>>> from redirect import Output
>>> buff = Output()
>>> print(43, file=buff)
>>> print('eggs', file=buff)
>>> print(buff.text)
43
eggs
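The file argument accepts any object with a write method that takes a string, not only real files. A minimal sketch with a hypothetical ListWriter class (it is not the redirect module’s Output class) that collects printed text in a list:

```python
class ListWriter:
    """Hypothetical file-like object: print only needs .write(str)."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)        # print may call write several times
        return len(text)

    @property
    def text(self):
        return ''.join(self.parts)


sink = ListWriter()
print(44, 'ham', file=sink)
print('eggs', file=sink)
```

No flush method is needed here, because print calls flush only when passed flush=True.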

Other Redirection Options: os.popen and subprocess Revisited

Near the end of the preceding chapter, we took a first look at the os.popen function and its subprocess.Popen relative, which provide a way to redirect another command’s streams from within a Python program. As we saw, these tools can be used to run a shell command line (a string we would normally type at a DOS or csh prompt) but also provide a Python file-like object connected to the command’s output stream: reading the file object allows a script to read another program’s output. I suggested that these tools may be used to tap into input streams as well.

Because of that, the os.popen and subprocess tools are another way to redirect streams of spawned programs and are close cousins to some of the techniques we just met. Their effect is much like the shell | command-line pipe syntax for redirecting streams to programs (in fact, their names mean “pipe open”), but they are run within a script and provide a file-like interface to piped streams. They are similar in spirit to the redirect function, but are based on running programs (not calling functions), and the command’s streams are processed in the spawning script as files (not tied to class objects). These tools redirect the streams of a program that a script starts, instead of redirecting the streams of the script itself.

Redirecting input or output with os.popen

In fact, by passing in the desired mode flag, we redirect either a spawned program’s output or input streams to a file in the calling script, and we can obtain the spawned program’s exit status code from the close method (None means “no error” here). To illustrate, consider the following two scripts:

C:\...\PP4E\System\Streams> type hello-out.py
print('Hello shell world')
C:\...\PP4E\System\Streams> type hello-in.py
inp = input()
open('hello-in.txt', 'w').write('Hello ' + inp + '\n')

These scripts can be run from a system shell window as
usual:

C:\...\PP4E\System\Streams> python hello-out.py
Hello shell world
C:\...\PP4E\System\Streams> python hello-in.py
Brian
C:\...\PP4E\System\Streams> type hello-in.txt
Hello Brian

As we saw in the prior chapter, Python scripts can read output from other programs and scripts like these, too, using code like the following:

C:\...\PP4E\System\Streams> python
>>> import os
>>> pipe = os.popen('python hello-out.py')      # 'r' is default--read stdout
>>> pipe.read()
'Hello shell world\n'
>>> print(pipe.close())                         # exit status: None is good
None

But Python scripts can also provide input to spawned programs’ standard input streams: passing a “w” mode argument, instead of the default “r”, connects the returned object to the spawned program’s input stream. What we write on the spawning end shows up as input in the program started:

>>> pipe = os.popen('python hello-in.py', 'w')  # 'w'--write to program stdin
>>> pipe.write('Gumby\n')
6
>>> pipe.close()                                # \n at end is optional
>>> open('hello-in.txt').read()                 # output sent to a file
'Hello Gumby\n'

The popen call is also smart enough to run the command string as an independent process on platforms that support such a notion. It accepts an optional third argument that can be used to control buffering of written text, which we’ll finesse here.
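The status returned by close is None only when the command succeeds. For a failing command it is non-None: on POSIX systems the value is the exit code shifted left by 8 bits, while on Windows it is the exit code itself. This sketch runs a child Python that prints a line and exits with status 3; it assumes the interpreter path contains no double quotes:

```python
import os
import sys

# os.popen passes the string to the system shell, so quote the interpreter
# path in case it contains spaces.
cmd = '"%s" -c "import sys; print(\'bye\'); sys.exit(3)"' % sys.executable

pipe = os.popen(cmd)              # 'r' mode: read the child's stdout
output = pipe.read()
status = pipe.close()             # None would mean success

if status is not None and os.name != 'nt':
    exit_code = status >> 8       # POSIX: undo the shifted encoding
else:
    exit_code = status
```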

Redirecting input and output with subprocess

For even more control over the streams of spawned programs, we can employ the subprocess module we introduced in the preceding chapter. As we learned earlier, this module can emulate os.popen functionality, but it can also achieve feats such as bidirectional stream communication (accessing both a program’s input and output) and tying the output of one program to the input of another.

For instance, this module provides multiple ways to spawn a program and get both its standard output text and exit status. Here are three common ways to leverage this module to start a program and redirect its output stream (recall from Chapter 2 that you may need to pass a shell=True argument to Popen and call to make this section’s examples work on Unix-like platforms as they are coded here):

C:\...\PP4E\System\Streams> python
>>> from subprocess import Popen, PIPE, call
>>> X = call('python hello-out.py')                    # convenience
Hello shell world
>>> X
0
>>> pipe = Popen('python hello-out.py', stdout=PIPE)
>>> pipe.communicate()[0]                              # (stdout, stderr)
b'Hello shell world\r\n'
>>> pipe.returncode                                    # exit status
0
>>> pipe = Popen('python hello-out.py', stdout=PIPE)
>>> pipe.stdout.read()
b'Hello shell world\r\n'
>>> pipe.wait()                                        # exit status
0

The call in the first of these three techniques is just a convenience function (there are more of these which you can look up in the Python library manual), and the communicate in the second is roughly a convenience for the third (it sends data to stdin, reads data from stdout until end-of-file, and waits for the process to end).
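Newer Pythons also offer subprocess.run, which bundles the start, communicate, and wait steps into one call and returns a CompletedProcess object. This function is not part of the techniques shown above; the sketch assumes Python 3.7+ (for capture_output and text) and passes an argument list instead of a command string, so no shell=True is needed on Unix:

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', "print('Hello shell world')"],
    capture_output=True,   # same as stdout=PIPE, stderr=PIPE
    text=True,             # decode to str and normalize \r\n to \n
)
status = result.returncode
out = result.stdout
```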

Redirecting and connecting to the spawned program’s input stream is just as simple, though a bit more involved than the os.popen approach with 'w' file mode shown in the preceding section (as mentioned in the last chapter, os.popen is implemented with subprocess, and is thus itself just something of a convenience function today):

>>> pipe = Popen('python hello-in.py', stdin=PIPE)
>>> pipe.stdin.write(b'Pokey\n')
6
>>> pipe.stdin.close()
>>> pipe.wait()
0
>>> open('hello-in.txt').read()                        # output sent to a file
'Hello Pokey\n'

In fact, we can obtain both the input and output streams of a spawned program with this module. Let’s reuse the simple writer and reader scripts we wrote earlier to demonstrate:

C:\...\PP4E\System\Streams> type writer.py
print("Help! Help! I'm being repressed!")
print(42)
C:\...\PP4E\System\Streams> type reader.py
print('Got this: "%s"' % input())
import sys
data = sys.stdin.readline()[:-1]
print('The meaning of life is', data, int(data) * 2)

Code like the following can both read from and write to the reader script. The pipe object has two file-like objects available as attached attributes, one connecting to the input stream, and one to the output (Python 2.X users might recognize these as equivalent to the tuple returned by the now-defunct os.popen2):

>>> pipe = Popen('python reader.py', stdin=PIPE, stdout=PIPE)
>>> pipe.stdin.write(b'Lumberjack\n')
11
>>> pipe.stdin.write(b'12\n')
3
>>> pipe.stdin.close()
>>> output = pipe.stdout.read()
>>> pipe.wait()
0
>>> output
b'Got this: "Lumberjack"\r\nThe meaning of life is 12 24\r\n'

As we’ll learn in Chapter 5, we have to be cautious when talking back and forth to a program like this; buffered output streams can lead to deadlock if writes and reads are interleaved, and we may eventually need to consider tools like the Pexpect utility as a workaround (more on this later).
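For the common case where all input is known up front, the communicate method is the standard library’s usual guard against that risk: it writes the input, closes the child’s stdin, and reads the output until end-of-file without letting either pipe fill up and block. This is a sketch that drives a reader-like child in one call; the child code is an inline stand-in for reader.py, assumed to match the script shown above:

```python
import subprocess
import sys

# Inline stand-in for reader.py: reads two lines and reports on them.
child_code = (
    "import sys\n"
    "print('Got this: \"%s\"' % input())\n"
    "data = sys.stdin.readline()[:-1]\n"
    "print('The meaning of life is', data, int(data) * 2)\n"
)

pipe = subprocess.Popen([sys.executable, '-c', child_code],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output, _ = pipe.communicate(input=b'Lumberjack\n12\n')   # write, close, read
status = pipe.returncode
```

This does not cover true back-and-forth dialogs, where each write depends on a previous reply; those are the cases the deadlock warning is about.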

Finally, even more exotic stream control is possible. The following connects two programs by piping the output of one Python script into another, first with shell syntax, and then with the subprocess module:

C:\...\PP4E\System\Streams> python writer.py | python reader.py
Got this: "Help! Help! I'm being repressed!"
The meaning of life is 42 84
C:\...\PP4E\System\Streams> python
>>> from subprocess import Popen, PIPE
>>> p1 = Popen('python writer.py', stdout=PIPE)
>>> p2 = Popen('python reader.py', stdin=p1.stdout, stdout=PIPE)
>>> output = p2.communicate()[0]
>>> output
b'Got this: "Help! Help! I\'m being repressed!"\r\nThe meaning of life is 42 84\r\n'
>>> p2.returncode
0
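The pipeline recipe in the subprocess documentation adds one detail the session above omits: after handing p1.stdout to the second process, the parent closes its own copy, so that on POSIX systems the first program can receive SIGPIPE if the second exits early. This sketch uses inline stand-ins for writer.py and reader.py (assumed equivalents, not the files themselves) passed as argument lists, so no shell is involved:

```python
import subprocess
import sys

writer_code = "print(\"Help! Help! I'm being repressed!\"); print(42)"
reader_code = (
    "import sys\n"
    "print('Got this: \"%s\"' % input())\n"
    "data = sys.stdin.readline()[:-1]\n"
    "print('The meaning of life is', data, int(data) * 2)\n"
)

p1 = subprocess.Popen([sys.executable, '-c', writer_code],
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen([sys.executable, '-c', reader_code],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()            # drop the parent's copy of the connecting pipe
output = p2.communicate()[0]
p1.wait()                    # reap the first process as well
```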

We can get close to this with os.popen, but the fact that its pipes are read or write (and not both) prevents us from catching the second script’s output in our code:

>>> import os
>>> p1 = os.popen('python writer.py', 'r')
>>> p2 = os.popen('python reader.py', 'w')
>>> p2.write( p1.read() )
36
>>> X = p2.close()
Got this: "Help! Help! I'm being repressed!"
The meaning of life is 42 84
>>> print(X)
None

From the broader perspective, the os.popen call and subprocess module are Python’s portable equivalents of Unix-like shell syntax for redirecting the streams of spawned programs. The Python versions also work on Windows, though, and are the most platform-neutral way to launch another program from a Python script. The command-line strings you pass to them may vary per platform (e.g., a directory listing requires an ls on Unix but a dir on Windows), but the call itself works on all major Python platforms.

On Unix-like platforms, the combination of the calls os.fork, os.pipe, os.dup, and some os.exec variants can also be used to start a new independent program with streams connected to the parent program’s streams. As such, it’s yet another way to redirect streams and a low-level equivalent to tools such as os.popen (os.fork is available in Cygwin’s Python on Windows).
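Here is a POSIX-only sketch of that low-level route. It will not run on standard Windows Python, which lacks os.fork, and it uses os.dup2 (the two-argument relative of os.dup). The child connects its stdout to the write end of a pipe and then replaces itself with a new program via os.execvp, while the parent reads the other end. The function name run_capture is hypothetical:

```python
import os
import sys


def run_capture(argv):
    """Run argv as a new program; return (stdout_bytes, exit_code).

    POSIX only: relies on os.fork, which standard Windows Python lacks.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make the pipe's write end our stdout (fd 1), then exec.
        os.close(read_fd)
        os.dup2(write_fd, 1)
        os.close(write_fd)
        try:
            os.execvp(argv[0], argv)   # does not return on success
        finally:
            os._exit(127)              # reached only if exec failed
    # Parent: close our copy of the write end so read() sees end-of-file.
    os.close(write_fd)
    with os.fdopen(read_fd, 'rb') as pipe:
        data = pipe.read()
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return data, exit_code
```

For example, run_capture([sys.executable, '-c', "print('hi')"]) returns the child’s output as bytes along with its exit code.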

Since these are all more advanced parallel processing tools, though, we’ll defer further details on this front until Chapter 5, especially its coverage of pipes and exit status codes. And we’ll resurrect subprocess again in Chapter 6, to code a regression tester that intercepts all three standard streams of spawned test scripts: inputs, outputs, and errors.

But first, Chapter 4 continues our survey of Python system interfaces by exploring the tools available for processing files and directories. Although we’ll be shifting focus somewhat, we’ll find that some of what we’ve learned here will already begin to come in handy as general system-related tools. Spawning shell commands, for instance, provides ways to inspect directories, and the file interface we will expand on in the next chapter is at the heart of the stream processing techniques we have studied here.

Python Versus csh

If you are familiar with other common shell script languages, it might be useful to see how Python compares. Here is a simple script in a Unix shell language called csh that mails all the files in the current working directory with a suffix of .py (i.e., all Python source files) to a hopefully fictitious address:

#!/bin/csh
foreach x (*.py)
    echo $x
    mail [email protected] -s $x < $x
end

An equivalent Python script looks similar:

#!/usr/bin/python
import os, glob
for x in glob.glob('*.py'):
    print(x)
    os.system('mail [email protected] -s %s < %s' % (x, x))

but is slightly more verbose. Since Python, unlike csh, isn’t meant just for shell scripts, system interfaces must be imported and called explicitly. And since Python isn’t just a string-processing language, character strings must be enclosed in quotes, as in C.

Although this can add a few extra keystrokes in simple scripts
like this, being a general-purpose language makes Python a better
tool once we leave the realm of trivial programs. We could, for
example, extend the preceding script to do things like transfer
files by FTP, pop up a GUI message selector and status bar, fetch
messages from an SQL database, and employ COM objects on Windows,
all using standard Python tools.

Python scripts also tend to be more portable to other platforms than csh. For instance, if we used the Python SMTP interface module to send mail instead of relying on a Unix command-line mail tool, the script would run on any machine with Python and an Internet link (as we’ll see in Chapter 13, SMTP requires only sockets). And like C, we don’t need $ to evaluate variables; what else would you expect in a free language?
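A sketch of that SMTP-based variant using the standard smtplib and email.message modules (Python 3.6+ for EmailMessage). The function names, the default host 'localhost', and the sender and recipient addresses are hypothetical placeholders; sending assumes an SMTP server is reachable at that host. Building the messages is kept separate from sending them so that the first step can be checked without a network:

```python
import glob
import os
import smtplib
from email.message import EmailMessage


def build_messages(directory, sender, recipient):
    """Make one message per .py file in directory; body is the file's text."""
    messages = []
    for path in sorted(glob.glob(os.path.join(directory, '*.py'))):
        name = os.path.basename(path)
        msg = EmailMessage()
        msg['From'] = sender
        msg['To'] = recipient
        msg['Subject'] = name              # like csh's  -s $x
        with open(path, encoding='utf-8') as f:
            msg.set_content(f.read())      # like csh's  < $x
        messages.append(msg)
    return messages


def send_all(messages, host='localhost'):  # hypothetical SMTP server host
    with smtplib.SMTP(host) as server:
        for msg in messages:
            print(msg['Subject'])          # like csh's  echo $x
            server.send_message(msg)
```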
