[47] We’ll see three more getfile programs before we leave Internet
scripting. The next chapter’s getfile.py fetches a file with the
higher-level FTP interface instead of using raw socket calls, and its
http-getfile scripts fetch files over the HTTP protocol. Later,
Chapter 15 presents a server-side getfile.py CGI script that transfers
file contents over the HTTP port in response to a request made in a web
browser client (files are sent as the output of a CGI script). All
four of the download schemes presented in this text ultimately use
sockets, but only the version here makes that use explicit.
The preceding chapter introduced Internet fundamentals and explored
sockets—the underlying communications mechanism over which bytes flow on
the Net. In this chapter, we climb the encapsulation hierarchy one level
and shift our focus to Python tools that support the client-side
interfaces of common Internet protocols.
We talked about the Internet’s higher-level protocols in the
abstract at the start of the preceding chapter, and you should probably
review that material if you skipped over it the first time around. In
short, protocols define the structure of the conversations that take place
to accomplish most of the Internet tasks we’re all familiar with—reading
email, transferring files by FTP, fetching web pages, and so on.
At the most basic level, all of these protocol dialogs happen over
sockets using fixed and standard message structures and ports, so in some
sense this chapter builds upon the last. But as we’ll see, Python’s
protocol modules hide most of the underlying details: scripts generally
need to deal only
with simple objects and methods, and Python automates the socket and
messaging logic required by the protocol.
In this chapter, we’ll concentrate on the FTP and email protocol
modules in Python, and we’ll peek at a few others along the way (NNTP
news, HTTP web pages, and so on). Because it is so prevalent, we will
especially focus on email in much of this chapter, as well as in the two
to follow—we’ll use tools and techniques introduced here in the larger
PyMailGUI and PyMailCGI client and server-side programs of Chapters 14
and 16.
All of the tools employed in examples here are in the standard
Python library and come with the Python system. All of the examples here
are also designed to run on the client side of a network connection—these
scripts connect to an already running server to request interaction and
can be run from a basic PC or other client device (they require only a
server to converse with). And as usual, all the code here is also designed
to teach us something about Python programming in general—we’ll refactor
FTP examples and package email code to show object-oriented programming
(OOP) in action.
In the next chapter, we’ll look at a complete client-side program
example before moving on to explore scripts designed to be run on the
server side instead. Python programs can also produce pages on a web
server, and there is support in the Python world for implementing the
server side of things like HTTP, email, and FTP. For now,
let’s focus on the client. [48]
[48] There is also support in the Python world for other technologies
that some might classify as “client-side scripting,” too, such as
Jython/Java applets; XML-RPC and SOAP web services; and Rich Internet
Application tools like Flex, Silverlight, pyjamas, and AJAX. These
were all introduced early in Chapter 12. Such tools are generally bound
up with the notion of web-based interactions: they either extend the
functionality of a web browser running on a client machine, or simplify
web server access in clients. We’ll study browser-based techniques in
Chapters 15 and 16; here, client-side scripting means the client side of
common Internet protocols such as FTP and email, independent of the Web
or web browsers. At the bottom, web browsers are really just desktop GUI
applications that make use of client-side protocols, including those
we’ll study here, such as HTTP and FTP. See Chapter 12 as well as the
end of this chapter for more on other client-side techniques.
As we saw in the preceding chapter, sockets see plenty of action on the Net.
For instance, the last chapter’s getfile example allowed us to transfer entire
files between machines. In practice, though, higher-level protocols are
behind much of what happens on the Net. Protocols run on top of sockets,
but they hide much of the complexity of the network scripting examples of
the prior chapter.
FTP—the File Transfer Protocol—is one of the more commonly used
Internet protocols. It defines a higher-level conversation model that is
based on exchanging command strings and file contents over sockets. By
using FTP, we can accomplish the same task as the prior chapter’s getfile
script, but the interface is simpler, standard, and more general: FTP lets
us ask for files from any server machine that supports FTP, without
requiring that it run our custom getfile script. FTP also supports more
advanced operations such as uploading files to the server, getting remote
directory listings, and more.
Really, FTP runs on top of two sockets: one for passing control
commands between client and server (port 21), and another for transferring
bytes. By using a two-socket model, FTP avoids the possibility of
deadlocks (i.e., transfers on the data socket do not block dialogs on the
control socket). Ultimately, though, Python’s ftplib
support module allows us to upload and
download files at a remote server machine by FTP, without dealing in raw
socket calls or FTP protocol details.
Because the Python FTP interface is so easy to use, let’s jump right
into a realistic example. The script in Example 13-1 automatically fetches
(a.k.a. “downloads”) and opens a remote file with Python. More
specifically, this Python script does the following:
1. Downloads an image file (by default) from a remote FTP site
2. Opens the downloaded file with a utility we wrote in Example 6-23, in Chapter 6
The download portion will run on any machine with Python and an
Internet connection, though you’ll probably want to change the script’s
settings so it accesses a server and file of your own. The opening part
works if your playfile.py supports your platform; see Chapter 6 for
details, and change as needed.
Example 13-1. PP4E\Internet\Ftp\getone.py
#!/usr/local/bin/python
"""
A Python script to download and play a media file by FTP.  Uses ftplib, the ftp
protocol handler which uses sockets.  Ftp runs on 2 sockets (one for data, one
for control--on ports 20 and 21) and imposes message text formats, but Python's
ftplib module hides most of this protocol's details.  Change for your site/file.
"""

import os, sys
from getpass import getpass                   # hidden password input
from ftplib import FTP                        # socket-based FTP tools

nonpassive = False                            # force active mode FTP for server?
filename   = 'monkeys.jpg'                    # file to be downloaded
dirname    = '.'                              # remote directory to fetch from
sitename   = 'ftp.rmi.net'                    # FTP site to contact
userinfo   = ('lutz', getpass('Pswd?'))       # use () for anonymous
if len(sys.argv) > 1: filename = sys.argv[1]  # filename on command line?

print('Connecting...')
connection = FTP(sitename)                    # connect to FTP site
connection.login(*userinfo)                   # default is anonymous login
connection.cwd(dirname)                       # change to the remote directory
if nonpassive:                                # force active FTP if server requires
    connection.set_pasv(False)

print('Downloading...')
localfile = open(filename, 'wb')              # local file to store download
connection.retrbinary('RETR ' + filename, localfile.write, 1024)  # xfer 1k at a time
connection.quit()
localfile.close()

if input('Open file?') in ['Y', 'y']:
    from PP4E.System.Media.playfile import playfile
    playfile(filename)
Most of the FTP protocol details are encapsulated by the Python ftplib
module imported here. This script uses some of the simplest interfaces in
ftplib (we’ll see others later in this chapter), but they are
representative of the module in general.
To open a connection to a remote (or local) FTP server, create an
instance of the ftplib.FTP object, passing in the string name (domain or
IP style) of the machine you wish to connect to:
connection = FTP(sitename) # connect to ftp site
Assuming this call doesn’t raise an exception, the resulting FTP
object exports methods that correspond to the usual FTP operations. In
fact, Python scripts act much like typical FTP client programs—just
replace commands you would normally type or select with method
calls:
connection.login(*userinfo) # default is anonymous login
connection.cwd(dirname) # change to the remote directory
Once connected, we log in and change to the remote directory from
which we want to fetch a file. The login method allows us to pass in a
username and password as additional optional arguments to specify an
account login; by default, it performs anonymous FTP. Notice the use of
the nonpassive flag in this script:
if nonpassive: # force active FTP if server requires
    connection.set_pasv(False)
If this flag is set to True, the script will transfer the file in active
FTP mode rather than the default passive mode. We’ll finesse the details of
the difference here (it has to do with which end of the dialog chooses port
numbers for the transfer), but if you have trouble doing transfers with any
of the FTP scripts in this chapter, try using active mode as a first step.
In Python 2.1 and later, passive FTP mode is on by default. Now, open a
local file to receive the file’s content, and fetch the file:
localfile = open(filename, 'wb')
connection.retrbinary('RETR ' + filename, localfile.write, 1024)
Once we’re in the target remote directory, we simply call the retrbinary
method to download the target server file in binary mode. The retrbinary
call may take a while to complete, since it must download a big file. It
gets three arguments:
1. An FTP command string; here, the string RETR filename, which is the standard format for FTP retrievals.
2. A function or method to which Python passes each chunk of the downloaded file’s bytes; here, the write method of a newly created and opened local file.
3. A size for those chunks of bytes; here, 1,024 bytes are downloaded at a time, but the default of 8,192 bytes is reasonable if this argument is omitted.
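As a rough illustration of the callback argument, the following sketch wraps a local file's write method in a helper that also counts the bytes received. The names make_counting_writer and fetch are hypothetical helpers of ours, not part of ftplib, and the connection passed in is assumed to be an already logged-in ftplib.FTP object (or anything with a compatible retrbinary method).

```python
def make_counting_writer(localfile):
    """Return (callback, counter); callback stores each block and tallies its size."""
    counter = {'bytes': 0, 'blocks': 0}

    def callback(block):
        localfile.write(block)            # store this block of bytes
        counter['bytes'] += len(block)    # track how much has arrived so far
        counter['blocks'] += 1

    return callback, counter


def fetch(connection, filename, blocksize=1024):
    """Download filename in binary mode through an FTP-like connection object."""
    with open(filename, 'wb') as localfile:
        callback, counter = make_counting_writer(localfile)
        connection.retrbinary('RETR ' + filename, callback, blocksize)
    return counter
```

With a live, logged-in connection, a call such as fetch(connection, 'monkeys.jpg') would return the byte and block totals once the transfer finishes; the same callback slot could just as well update a progress display.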
Because this script opens a local file object, localfile, with the same
name as the remote file being fetched, and passes its write method to the
FTP retrieval method, the remote file’s contents will automatically appear
in a local, client-side file after the download is finished.
Observe how this file is opened in wb binary output mode. If this script
is run on Windows we want to avoid automatically expanding any \n bytes into
\r\n byte sequences; as we saw in Chapter 4, this happens automatically on
Windows when writing files opened in w text mode. We also want to avoid
Unicode issues in Python 3.X: as we also saw in Chapter 4, strings are
encoded when written in text mode, and this isn’t appropriate for binary
data such as images. A text-mode file would also not allow for the bytes
strings passed to write by the FTP library’s retrbinary in any event, so wb
is effectively required here (more on output file modes later).
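A quick way to see why binary mode is required is to try writing a bytes string to a text-mode file in Python 3.X. This is only a minimal sketch: the sample data and the demo.bin filename are made up, and the file lives in a temporary directory.

```python
import os
import tempfile

data = b'\xff\xd8\xff\xe0 not really a JPEG\n'    # bytes, as retrbinary would pass
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

try:
    with open(path, 'w') as textfile:     # text mode expects str, not bytes
        textfile.write(data)
except TypeError as exc:
    error = exc                           # keep a reference for later inspection
    print('text mode rejected bytes:', exc)

with open(path, 'wb') as binfile:         # binary mode stores the bytes unchanged
    binfile.write(data)

with open(path, 'rb') as binfile:
    stored = binfile.read()
print(stored == data)
```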
Finally, we call the FTP quit method to break the connection with the
server and manually close the local file to force it to be complete
before it is further processed (it’s not impossible that parts of the file
are still held in buffers before the close call):
connection.quit()
localfile.close()
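In Python 3.2 and later, FTP objects can also be used in a with statement, which closes the connection on exit much as the manual calls above do. The sketch below is one possible arrangement, not the book's script: the download function and its ftpclass parameter (a hook for substituting a stand-in class) are hypothetical names, and catching ftplib.all_errors is just one way to report failures.

```python
from ftplib import FTP, all_errors


def download(sitename, filename, user='anonymous', password='', ftpclass=FTP):
    """Fetch filename from sitename; return True on success, False on FTP errors."""
    try:
        # both the connection and the local file are closed when the block exits
        with ftpclass(sitename) as connection, open(filename, 'wb') as localfile:
            connection.login(user, password)
            connection.retrbinary('RETR ' + filename, localfile.write)
        return True
    except all_errors as exc:             # socket-level and FTP protocol errors
        print('Download failed:', exc)
        return False
```

Note that a failed transfer can still leave an empty or partial local file behind; a fuller version might remove it in the except clause.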
And that’s all there is to it: all the FTP, socket, and networking
details are hidden behind the ftplib interface module. Here is this script
in action on a Windows 7 machine; after the download, the image file pops up
in a Windows picture viewer on my laptop, as captured in Figure 13-1. Change
the server and file assignments in this script to test on your own, and be
sure your PYTHONPATH environment variable includes the PP4E root’s
container, as we’re importing across directories on the examples tree here:
C:\...\PP4E\Internet\Ftp>python getone.py
Pswd?
Connecting...
Downloading...
Open file?y
Figure 13-1. Image file downloaded by FTP and opened locally
Notice how the standard Python getpass.getpass is used to ask for an FTP
password. Like the input built-in function, this call prompts for and reads
a line of text from the console user; unlike input, getpass does not echo
typed characters on the screen at all (see the moreplus stream redirection
example of Chapter 3 for related tools). This is handy for protecting things
like passwords from potentially prying eyes. Be careful, though: after
issuing a warning, the IDLE GUI echoes the password anyhow!
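For unattended runs, where nobody is at the console to answer a prompt, one common arrangement is to look for the password somewhere else first and fall back on getpass only when needed. The sketch below assumes an environment variable named FTP_PASSWORD, which is our own invention rather than anything ftplib or getpass defines.

```python
import os
from getpass import getpass


def get_password(envvar='FTP_PASSWORD', prompt='Pswd?'):
    """Use the environment variable if it is set; otherwise prompt without echo."""
    password = os.environ.get(envvar)
    if password is None:
        password = getpass(prompt)        # interactive fallback, input not echoed
    return password
```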
The main thing to notice is that this otherwise typical Python
script fetches information from an arbitrarily remote FTP site and
machine. Given an Internet link, any information published by an FTP
server on the Net can be fetched by and incorporated into Python scripts
using interfaces such as these.
In fact, FTP is just one way to transfer information across the Net, and
there are more general tools in the Python library to accomplish the prior
script’s download. Perhaps the most straightforward is the Python
urllib.request module: given an Internet address string (a URL, or Uniform
Resource Locator), this module opens a connection to the specified server
and returns a file-like object ready to be read with normal file object
method calls (e.g., read, readline).
We can use such a higher-level interface to download anything with
an address on the Web: files published by FTP sites (using URLs that
start with ftp://); web pages and output of scripts that live on remote
servers (using http:// URLs); and even local files (using file:// URLs).
For instance, the script in Example 13-2 does the same as the one in
Example 13-1, but it uses the general urllib.request module to fetch the
file, instead of the protocol-specific ftplib.
Example 13-2. PP4E\Internet\Ftp\getone-urllib.py
#!/usr/local/bin/python
"""
A Python script to download a file by FTP by its URL string; use higher-level
urllib instead of ftplib to fetch file; urllib supports FTP, HTTP, client-side
HTTPS, and local files, and handles proxies, redirects, cookies, and more;
urllib also allows downloads of html pages, images, text, etc.; see also
Python html/xml parsers for web pages fetched by urllib in Chapter 19;
"""
import os, getpass
from urllib.request import urlopen # socket-based web tools
filename = 'monkeys.jpg' # remote/local filename
password = getpass.getpass('Pswd?')
remoteaddr = 'ftp://lutz:%s@ftp.rmi.net/%s;type=i' % (password, filename)
print('Downloading', remoteaddr)
# this works too:
# urllib.request.urlretrieve(remoteaddr, filename)
remotefile = urlopen(remoteaddr) # returns input file-like object
localfile = open(filename, 'wb') # where to store data locally
localfile.write(remotefile.read())
localfile.close()
remotefile.close()
Note how we use a binary mode output file again; urllib fetches return
byte strings, even for HTTP web pages. Don’t sweat the details of the URL
string used here; it is fairly complex, and we’ll explain its structure and
that of URLs in general in Chapter 15. We’ll also use urllib again in this
and later chapters to fetch web pages, format generated URL strings, and get
the output of remote scripts on the Web.
Technically speaking, urllib.request supports a variety of Internet
protocols (HTTP, FTP, and local files). Unlike ftplib, urllib.request is
generally used for reading remote objects, not for writing or uploading them
(though the HTTP and FTP protocols support file uploads too). As with
ftplib, retrievals must generally be run in threads if blocking is a
concern. But the basic interface shown in this script is straightforward.
The call:
remotefile = urllib.request.urlopen(remoteaddr) # returns input file-like object
contacts the server named in the remoteaddr URL string and returns a
file-like object connected to its download stream (here, an FTP-based
socket). Calling this file’s read method pulls down the file’s contents,
which are written to a local client-side file. An even simpler interface:
urllib.request.urlretrieve(remoteaddr, filename)
also does the work of opening a local file and writing the
downloaded bytes into it—things we do manually in the script as coded.
This comes in handy if we want to download a file, but it is less useful
if we want to process its data immediately.
Either way, the end result is the same: the desired server file
shows up on the client machine. The output is similar to the original
version, but we don’t try to automatically open this time (I’ve changed
the password in the URL here to protect the innocent):
C:\...\PP4E\Internet\Ftp>getone-urllib.py
Pswd?
Downloading ftp://lutz:xxxxxx@ftp.rmi.net/monkeys.jpg;type=i
C:\...\PP4E\Internet\Ftp>fc monkeys.jpg test\monkeys.jpg
FC: no differences encountered
C:\...\PP4E\Internet\Ftp>start monkeys.jpg
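Because urlopen also accepts file:// URLs, the same interface can be tried without any server at all. The sketch below is a variation on the script rather than a copy of it: fetch_url is a hypothetical helper, the filenames are made up, and it copies the data in blocks with shutil.copyfileobj instead of one large read call, which may matter for big files.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url, localname, blocksize=8192):
    """Copy the resource at url to localname in blocks, not one big read()."""
    with urlopen(url) as remotefile, open(localname, 'wb') as localfile:
        shutil.copyfileobj(remotefile, localfile, blocksize)


# demo with a local file:// URL, so no network connection is needed
workdir = Path(tempfile.mkdtemp())
source = workdir / 'source.bin'
source.write_bytes(bytes(range(256)) * 100)

target = workdir / 'copy.bin'
fetch_url(source.as_uri(), target)
print(target.read_bytes() == source.read_bytes())
```

Passing an ftp:// or http:// address to fetch_url instead should work the same way, given a reachable server.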
For more urllib download examples, see the section on HTTP later in this
chapter, and the server-side examples in Chapter 15. As we’ll see in
Chapter 15, in bigger terms, tools like the urllib.request urlopen function
allow scripts to both download remote files and invoke programs that are
located on a remote server machine, and so serve as useful tools for
testing and using web sites in Python scripts. In Chapter 15, we’ll also
see that urllib.parse includes tools for formatting (escaping) URL strings
for safe transmission.
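As a small preview of that escaping, consider a password that contains characters with special meaning in a URL, such as @, / or :. Inserted as-is into an FTP address, it would make the string ambiguous. The sketch below uses made-up credentials and urllib.parse.quote to escape the password first; it assumes the receiving side decodes the escapes again, as urllib's own FTP support is expected to do.

```python
from urllib.parse import quote, unquote

user, password = 'lutz', 'p@ss/w:rd'      # made-up credentials for illustration
filename = 'monkeys.jpg'

# safe='' asks quote to escape '/' too, which it leaves alone by default
safe_password = quote(password, safe='')
remoteaddr = 'ftp://%s:%s@ftp.rmi.net/%s;type=i' % (user, safe_password, filename)

print(remoteaddr)
print(unquote(safe_password) == password)  # the escaping is reversible
```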