Over the 15 years since this book was first published, the Internet
has virtually exploded onto the mainstream stage. It has rapidly grown
from a simple communication device used primarily by academics and
researchers into a medium that is now nearly as pervasive as the
television and telephone. Social observers have likened the Internet’s
cultural impact to that of the printing press, and technical observers
have suggested that all new software development of interest occurs only
on the Internet. Naturally, time will be the final arbiter for such
claims, but there is little doubt that the Internet is a major force in
society and one of the main application contexts for modern software
systems.
The Internet also happens to be one of the primary application
domains for the Python programming language. In the decade and a half
since the first edition of this book was written, the Internet’s growth
has strongly influenced Python’s tool set and roles. Given Python and a
computer with a socket-based Internet connection today, we can write
Python scripts to read and send email around the world, fetch web pages
from remote sites, transfer files by FTP, program interactive websites,
parse HTML and XML files, and much more, simply by using the Internet
modules that ship with Python as standard tools.
In fact, companies all over the world do: Google, YouTube, Walt
Disney, Hewlett-Packard, JPL, and many others rely on Python’s standard
tools to power their websites. For example, the Google search
engine—widely credited with making the Web
usable—
makes extensive use of Python code.
The YouTube video server site is largely implemented in Python. And the
BitTorrent peer-to-peer file transfer system—written in Python and
downloaded by tens of millions of users—leverages Python’s networking
skills to share files among clients and remove some server
bottlenecks.
Many also build and manage their sites with larger Python-based
toolkits. For instance, the Zope web application server was an early
entrant to the domain and is itself written and customizable in Python.
Others build sites with the Plone content management system, which is
built upon Zope and delegates site content to its users. Still others use
Python to script Java web applications with Jython (formerly known as
JPython)—a system that compiles Python programs to Java bytecode, exports
Java libraries for use in Python scripts, and allows Python code to serve
as web applets downloaded and run in a browser.
In more recent years, new techniques and systems have risen to
prominence in the Web sphere. For example, XML-RPC and SOAP interfaces for
Python have enabled web service programming; frameworks such as Google App
Engine, Django, and TurboGears have emerged as powerful tools for
constructing websites; the XML package in Python’s standard library, as
well as third-party extensions, provides a suite of XML processing tools;
and the IronPython implementation provides seamless .NET/Mono integration
for Python code in much the same way Jython leverages Java
libraries.
As the Internet has grown, so too has Python’s role as an Internet
tool. Python has proven to be well suited to Internet scripting for some
of the very same reasons that make it ideal in other domains. Its modular
design and rapid turnaround mix well with the intense demands of Internet
development. In this part of the book, we’ll find that Python does more
than simply support Internet scripting; it also fosters qualities such as
productivity and maintainability that are essential to Internet projects
of all shapes and sizes.
Internet programming entails many topics, so to make the
presentation easier to digest, I’ve split this subject over the next
five chapters of this book. Here’s this part’s chapter rundown:
This chapter introduces Internet fundamentals and explores
sockets
, the underlying communications
mechanism of the Internet. We met sockets briefly as IPC tools in
Chapter 5
and again in a GUI use case
in
Chapter 10
, but here we will study
them in the depth afforded by their broader networking roles.
Chapter 13
covers the
fundamentals of
client-side scripting
and
Internet protocols. Here, we’ll explore Python’s standard support
for FTP, email, HTTP, NNTP, and more.
Chapter 14
presents a larger
client-side case study: PyMailGUI, a full-featured email
client.
Chapter 15
discusses the
fundamentals of
server-side scripting
and
website construction. We’ll study basic CGI scripting techniques and
concepts that underlie most of what happens in the Web.
Chapter 16
presents a larger
server-side case study: PyMailCGI, a full-featured webmail
site.
Each chapter assumes you’ve read the previous one, but you can
generally skip around, especially if you have prior experience in the
Internet domain. Since these chapters represent a substantial portion of
this book at large, the following sections go into a few more details
about what we’ll be studying.
In conceptual terms, the Internet can roughly be thought of as
being composed of multiple functional layers:
Mechanisms such as the TCP/IP transport mechanism, which
deal with transferring bytes between machines, but don’t care
what they mean
The
programmer’s interface to the network, which runs
on top of physical networking layers like TCP/IP and supports
flexible client/server models in both IPC and networked
modes
Structured Internet communication schemes such as FTP and
email, which run on top of sockets and define message formats
and standard addresses
Application models such as CGI, which define the structure
of communication between web browsers and web servers, also run
on top of sockets, and support the notion of web-based
programs
Third-party systems such as Django, App Engine, Jython,
and pyjamas, which leverage sockets and communication protocols,
too, but address specific techniques or larger problem
domains
This book covers the
middle three tiers
in
this list—sockets, the Internet protocols that run on them, and the
CGI model of web-based conversations. What we learn here will also
apply to more specific toolkits in the last tier above, because they
are all ultimately based upon the same Internet and web
fundamentals.
More specifically, in this and the next chapter, our main focus
is on programming the second and third layers:
sockets
and higher-level Internet
protocols
. We’ll start this chapter at the
bottom, learning about the socket model of network programming.
Sockets aren’t strictly tied to Internet scripting, as we saw in
Chapter 5
’s IPC examples, but they are
presented in full here because this is one of their primary roles. As
we’ll see, most of what happens on the Internet happens through
sockets, whether you notice or not.
After introducing sockets, the next two chapters make their way
up to Python’s
client-side
interfaces to
higher-level protocols—things like email and FTP transfers, which run
on top of sockets. It turns out that a lot can be done with Python on
the client alone, and Chapters
13
and
14
will
sample the flavor of Python client-side scripting. Finally, the last
two chapters in this part of the book then move on to present
server-side
scripting—
programs that run on a server
computer and are usually invoked by a web browser.
Now that I’ve told you what we will cover in this book, I also
want to be clear about what we won’t cover. Like tkinter, the Internet
is a vast topic, and this part of the book is mostly an introduction
to its core concepts and an exploration of representative tasks.
Because there are so many Internet-related modules and extensions,
this book does not attempt to serve as an exhaustive survey of the
domain. Even in just Python’s own tool set, there are simply too many
Internet modules to include each in this text in any sort of useful
fashion.
Moreover, higher-level tools like Django, Jython, and App Engine
are very large systems in their own right, and they are best dealt
with in more focused documents. Because dedicated books on such topics
are now available, we’ll merely scratch their surfaces here with a
brief survey later in this chapter. This book also says almost nothing
about lower-level networking layers such as TCP/IP. If you’re curious
about what happens on the Internet at the bit-and-wire level, consult
a good networking text for more
details
.
In other words, this part is not meant to be an exhaustive
reference to Internet and web programming with Python—a topic which
has evolved between prior editions of this book, and will undoubtedly
continue to do so after this one is published. Instead, the goal of
this part of the book is to serve as a tutorial introduction to the
domain to help you get started, and to provide context and examples
which will help you understand the documentation for tools you may
wish to explore after mastering the fundamentals here.
Like the prior parts of the book, this one has other agendas,
too. Along the way, this part will also put to work many of the
operating-system and GUI interfaces we studied in Parts
II
and
III
(e.g., processes, threads, signals, and tkinter). We’ll also get to
see the Python language applied in realistically scaled programs, and
we’ll investigate some of the design choices and challenges that the
Internet presents.
That last statement merits a few more words. Internet scripting,
like GUIs, is one of the “sexier” application domains for Python. As
in GUI work, there is an intangible but instant gratification in
seeing a Python Internet program ship information all over the world.
On the other hand, by its very nature, network programming can impose
speed overheads and user interface limitations. Though it may not be a
fashionable stance these days, some applications are still better off
not being deployed on the Web.
A traditional “desktop” GUI like those of
Part III
, for example, can combine the
feature-richness and responsiveness of client-side libraries with the
power of network access. On the other hand, web-based applications
offer compelling benefits in portability and administration. In this
part of the book, we will take an honest look at the Net’s trade-offs
as they arise and explore examples which illustrate the advantages of
both web and nonweb architectures. In fact, the larger PyMailGUI and
PyMailCGI examples we’ll explore are intended in part to serve this
purpose.
The Internet is also considered by many to be something of an
ultimate proof of concept for open source tools. Indeed, much of the
Net runs on top of a large number of such tools, such as Python, Perl,
the Apache web server, the sendmail program, MySQL, and
Linux.
[
42
]
Moreover, new tools and technologies for programming the
Web sometimes seem to appear faster than developers can absorb
them.
The good news is that Python’s integration focus makes it a
natural in such a heterogeneous world. Today, Python programs can be
installed as client-side and server-side tools; used as applets and
servlets in Java applications; mixed into distributed object systems
like CORBA, SOAP, and XML-RPC; integrated into AJAX-based
applications; and much more. In more general terms, the rationale for
using Python in the Internet domain is exactly the same as in any
other—Python’s emphasis on quality, productivity, portability, and
integration makes it ideal for writing Internet programs that are
open, maintainable, and delivered according to the ever-shrinking
schedules in this field.
Internet scripts generally imply execution contexts that earlier
examples in this book have not. That is, it usually takes a bit more to
run programs that talk over networks. Here are a few pragmatic notes
about this part’s examples, up front:
You don’t need to download extra packages to run examples in
this part of the book. All of the examples we’ll see are based on
the standard set of Internet-related modules that come with Python
and are installed in Python’s library directory.
You don’t need a state-of-the-art network link or an account
on a web server to run the socket and client-side examples in this
part. Although some socket examples will be shown running remotely,
most can be run on a single local machine. Client-side examples that
demonstrate protocol like FTP require only basic Internet access,
and email examples expect just POP and SMTP capable servers.
You don’t need an account on a web server machine to run the
server-side scripts in later chapters; they can be run by any web
browser. You may need such an account to change these scripts if you
store them remotely, but not if you use a locally running web server
as we will in this book.
We’ll discuss configuration details as we move along, but in
short, when a Python script opens an Internet connection (with thesocket
module or one of the Internet
protocol modules), Python will happily use whatever Internet link exists
on your machine, be that a dedicated T1 line, a DSL line, or a simple
modem. For instance, opening a socket on a Windows PC automatically
initiates processing to create a connection to your Internet provider if
needed.
Moreover, as long as your platform supports sockets, you probably
can run many of the examples here even if you have no Internet
connection at all. As we’ll see, a machine namelocalhost
or""
(an empty string) usually means the local
computer itself. This allows you to test both the client and the server
sides of a dialog on the same computer without connecting to the Net.
For example, you can run both socket-based clients and servers locally
on a Windows PC without ever going out to the Net. In other words, you
can likely run the programs here whether you have a way to connect to
the Internet or not.
Some later examples assume that a particular kind of server is
running on a server machine (e.g., FTP, POP, SMTP), but client-side
scripts themselves work on any
Internet
-aware machine with Python
installed. Server-side examples in Chapters
15
and
16
require more: to develop CGI scripts, you’ll need to either have a web
server account or run a web server program locally on your own computer
(which is easier than you may think—we’ll learn how to code a simple one
in Python in
Chapter 15
). Advanced
third-party systems like Jython and Zope must be downloaded separately,
of course; we’ll peek at some of these briefly in this chapter but defer
to their own documentation for more details.
In the Beginning There Was Grail
Besides creating the Python language, Guido
van Rossum also wrote a World Wide Web browser in Python
years ago, named (appropriately enough) Grail.
Grail was partly developed as a demonstration of
Python’s capabilities. It allows users to browse the Web much like
Firefox or Internet Explorer, but it can also be programmed with Grail
applets—Python/tkinter programs downloaded from a server when accessed
and run on the client by the browser. Grail applets work much like
Java applets in more widespread browsers (more on applets in the next
section).
Though it was updated to run under recent Python releases as I
was finishing this edition, Grail is no longer under active
development today, and it is mostly used for research purposes
(indeed, the Netscape browser was counted among its contemporaries).
Nevertheless, Python still reaps the benefits of the Grail project, in
the form of a rich set of Internet tools. To write a full-featured web
browser, you need to support a wide variety of Internet protocols, and
Guido packaged support for all of these as standard library modules
that were eventually shipped with the Python language.
Because of this legacy, Python now includes standard support for
Usenet news (NNTP), email processing (POP, SMTP, IMAP), file transfers
(FTP), web pages and interactions (HTTP, URLs, HTML, CGI), and other
commonly used protocols such as Telnet.
Python
scripts can connect to all of
these Internet components by simply importing the associated library
module.
Since Grail, additional tools have been added to Python’s
library for parsing XML files, OpenSSL secure sockets, and more. But
much of Python’s Internet support can be traced back to the Grail
browser—another example of Python’s support for code reuse at work. At
this writing, you can still find the Grail by searching for “Grail web
browser” at your favorite web search engine.