So where are we in the Internet abstraction model now? With all
this email fetching and sending going on, it’s easy to lose the forest
for the trees. Keep in mind that because mail is transferred over
sockets (remember sockets?), they are at the root of all this activity.
All email read and written ultimately consists of formatted bytes
shipped over sockets between computers on the Net. As we’ve seen,
though, the POP and SMTP interfaces in Python hide all the details.
Moreover, the scripts we’ve begun writing even hide the Python
interfaces and provide higher-level interactive tools.
Both the popmail and smtpmail scripts provide portable email tools
but aren’t quite what we’d expect in terms of usability these days.
Later in this chapter, we’ll use what we’ve seen thus far to implement a
more interactive, console-based mail tool. In the next chapter, we’ll
also code a tkinter email GUI, and then we’ll go on to build a web-based
interface in a later chapter. All of these tools, though, vary primarily
in terms of user interface only; each ultimately employs the Python mail
transfer modules we’ve met here to transfer mail message text over the
Internet with sockets.
Before we move on, one more SMTP note: just as for reading mail,
we can use the Python interactive prompt as our email sending client,
too, if we type calls manually. The following, for example, sends a
message through my ISP’s SMTP server to two recipient addresses assumed
to be part of a mail list:
C:\...\PP4E\Internet\Email> python
>>> from smtplib import SMTP
>>> conn = SMTP('smtpout.secureserver.net')
>>> conn.sendmail(
... '[email protected]',                       # true sender
... ['[email protected]', '[email protected]'],   # true recipients
... """From: [email protected]
... To: maillist
... Subject: test interactive smtplib
...
... testing 1 2 3...
... """)
{}
>>> conn.quit()                               # quit() required, Date added
(221, b'Closing connection. Good bye.')
We’ll verify receipt of this message in a later email client
program; the “To” recipient shows up as “maillist” in email clients, a
completely valid use case for header manipulation. In fact, you can
achieve the same effect with the smtpmail-noTo script by separating recipient
addresses at the “To?” prompt with a semicolon (e.g., [email protected]; [email protected])
and typing the email list’s name in the “To:” header line. Mail clients
that support mailing lists automate such steps.
Sending mail interactively this way is a bit tricky to get right,
though, because header lines are governed by standards: the blank line after the
subject line is required and significant, for instance, and Date is
omitted altogether (one is added for us). Furthermore, mail formatting
gets much more complex as we start writing messages with attachments. In
practice, the email package in the
standard library is generally used to construct emails, before shipping
them off with smtplib. The package
lets us build mails by assigning headers and attaching and possibly
encoding parts, and creates a correctly formatted mail text. To learn
how, let’s move on to the next section.
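As a rough preview of that division of labor, the sketch below builds the message text with the email package and leaves delivery to smtplib. The helper names build_mail and send_text_mail, the server name, and the example.com addresses are illustrative placeholders rather than part of this book's examples, and the actual send is left commented out because it needs a reachable SMTP server.

```python
from email.message import Message
from email.utils import formatdate
from smtplib import SMTP

def build_mail(sender, recipients, subject, body):
    """Hypothetical helper: return a Message with basic headers and a text body."""
    msg = Message()
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg['Subject'] = subject
    msg['Date'] = formatdate()          # add Date ourselves instead of relying on the server
    msg.set_payload(body)
    return msg

def send_text_mail(server, sender, recipients, msg):
    """Hypothetical helper: ship an already-built Message through an SMTP server."""
    with SMTP(server) as conn:          # the with statement issues quit() on exit
        return conn.sendmail(sender, recipients, msg.as_string())

msg = build_mail('sender@example.com',
                 ['a@example.com', 'b@example.com'],
                 'test', 'testing 1 2 3...\n')
text = msg.as_string()                  # headers, a blank line, then the body
print(text)

# Assumed server name; uncomment only with a real server and real addresses:
# send_text_mail('smtp.example.com', 'sender@example.com',
#                ['a@example.com', 'b@example.com'], msg)
```

The required blank line between headers and body is produced by the Message object here, not typed by hand.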
[52] We all know by now that such junk mail is usually referred to
as spam, but not everyone knows that this name is a reference to a
Monty Python skit in which a restaurant’s customers find it
difficult to hear the reading of menu options over a group of
Vikings singing an increasingly loud chorus of “spam, spam, spam…”.
Hence the tie-in to junk email. Spam is used in Python program
examples as a sort of generic variable name, though it also pays
homage to the skit.
The second edition of this book used a handful of standard library
modules (rfc822, StringIO, and more) to parse the contents of
messages, and simple text processing to compose them. Additionally, that
edition included a section on extracting and decoding attached parts of a
message using modules such as mhlib, mimetools, and base64.
In the third edition, those tools were still available, but were,
frankly, a bit clumsy and error-prone. Parsing attachments from messages,
for example, was tricky, and composing even basic messages was tedious (in
fact, an early printing of the prior edition contained a potential bug,
because it omitted one \n character in
a string formatting operation). Adding attachments to sent messages wasn’t
even attempted, due to the complexity of the formatting involved. Most of
these tools are gone completely in Python 3.X as I write this fourth
edition, partly because of their complexity, and partly because they’ve
been made obsolete.
Luckily, things are much simpler today. After the second edition,
Python sprouted a new email package: a powerful
collection of tools that automate most of the work behind parsing and
composing email messages. This module gives us an object-based message
interface and handles all the textual message structure details, both
analyzing and creating it. Not only does this eliminate a whole class of
potential bugs, it also promotes more advanced mail processing.
Things like attachments, for instance, become accessible to mere
mortals (and authors with limited book real estate). In fact, an entire
original section on manual attachment parsing and decoding was deleted in
the third edition; it’s essentially automatic with email. The new package parses and constructs
headers and attachments; generates correct email text; decodes and encodes
Base64, quoted-printable, and uuencoded data; and much more.
We won’t cover the email package
in its entirety in this book; it is well documented in Python’s library
manual. Our goal here is to explore some example usage code, which you can
study in conjunction with the manuals. But to help get you started, let’s
begin with a quick overview. In a nutshell, the email
package is based around the Message object it provides:
A mail’s full text, fetched from poplib or imaplib, is parsed into a new Message
object, with an API for accessing
its components. In the object, mail headers become dictionary-like
keys, and components become a “payload” that can be walked with a
generator interface (more on payloads in a moment).
New mails are composed by creating a new Message object, using an API to attach
headers and parts, and asking the object for its print
representation: a correctly formatted mail message text, ready to be
passed to the smtplib module for
delivery. Headers are added by key assignment and attachments by
method calls.
In other words, the Message
object is used both for accessing existing messages and for creating new
ones from scratch. In both cases, email
can automatically handle details like content encodings (e.g., attached
binary images can be treated as text with Base64 encoding and decoding),
content types, and more.
Since the email module’s Message
object is at the heart of its API, you need a cursory
understanding of its form to get started. In short, it is designed to
reflect the structure of a formatted email message. Each Message
consists of three main pieces of information:
A content type (plain text, HTML text, JPEG image, and so
on), encoded as a MIME main type and a subtype. For instance,
“text/html” means the main type is text and the subtype is HTML (a
web page); “image/jpeg” means a JPEG photo. A “multipart/mixed”
type means there are nested parts within the message.
A dictionary-like mapping interface, with one key per mail
header (From, To, and so on). This interface supports almost all
of the usual dictionary operations, and headers may be fetched or
set by normal key indexing.
A “payload,” which represents the mail’s content. This can
be either a string (bytes or str) for simple messages, or a
list of additional Message objects for multipart
container messages with attached or alternative parts. For some
oddball types, the payload may be a Python None object.
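The three pieces listed above can be inspected directly on a Message. The following is a minimal sketch with made-up example.com addresses; it uses only a simple, single-part message, so the payload is a string.

```python
from email.message import Message

msg = Message()
msg['From'] = 'sender@example.com'       # illustrative addresses only
msg['To'] = 'receiver@example.com'
msg.set_payload('Hello there\n')

# 1) Content type: no Content-Type header was set, so the default applies
print(msg.get_content_type())
print(msg.get_content_maintype(), msg.get_content_subtype())

# 2) Headers: dictionary-like access, with names matched case-insensitively
print(msg.keys())
print(msg['from'], 'to' in msg, msg.get('Subject', '(no subject)'))

# 3) Payload: a string here, because this is not a multipart message
print(msg.is_multipart(), repr(msg.get_payload()))
```

Note that indexing a missing header returns None rather than raising KeyError, which is one of the places where the mapping interface departs from a real dictionary.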
The MIME type of a Message is key to understanding its content.
For example, mails with attached images may have a main top-level Message
(type multipart/mixed), with three more Message
objects in its payload: one for its
main text (type text/plain), followed
by two of type image for the photos (type image/jpeg). The photo parts may be encoded
for transmission as text with Base64 or another scheme; the encoding
type, as well as the original image filename, are specified in the
part’s headers.
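To make that structure concrete, here is a small sketch that builds such a three-part message. The bytes used for the photos are a stand-in rather than a valid JPEG, and the filenames are invented for illustration; the subtype is passed explicitly so that the image data is not inspected.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

fake_jpeg = b'\xff\xd8\xff\xe0 not really a photo'    # placeholder bytes, not a real image

top = MIMEMultipart()                                 # multipart/mixed by default
top['Subject'] = 'Photos'
top.attach(MIMEText('Two photos attached.\n'))        # main text part
for name in ('one.jpg', 'two.jpg'):                   # hypothetical filenames
    part = MIMEImage(fake_jpeg, _subtype='jpeg')      # Base64-encoded by default
    part.add_header('Content-Disposition', 'attachment', filename=name)
    top.attach(part)

# The root's payload is now a list of three nested Message objects
for part in top.get_payload():
    print(part.get_content_type(),
          part['Content-Transfer-Encoding'],
          part.get_filename())

photo = top.get_payload()[1]
original = photo.get_payload(decode=True)             # undo Base64, back to bytes
```

The encoding and filename live in each part's own headers, which is why they can be read back from the part without consulting the root.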
Similarly, mails that include both simple text and an HTML
alternative will have two nested Message
objects in their payload, of type
plain text (text/plain) and HTML text
(text/html), along with a main root Message
of type multipart/alternative. Your mail client
decides which part to display, often based on your preferences.
Simpler messages may have just a root Message
of type text/plain or text/html, representing the entire message
body. The payload for such mails is a simple string. They may also have
no explicitly given type at all, which generally defaults to text/plain.
Some single-part messages are text/html, with no text/plain
alternative; they require a web
browser or other HTML viewer (or a very keen-eyed user).
Other combinations are possible, including some types that are not
commonly seen in practice, such as message/delivery-status. Most messages have a
main text part, though it is not required, and may be nested in a
multipart or other construct.
In all cases, an email message is a simple, linear string, but
these message structures are automatically detected when mail text is
parsed and are created by your method calls when new messages are
composed. For instance, when creating messages, the message attach
method adds parts for multipart mails,
and set_payload sets the entire
payload to a string for simple mails.
Message objects also have
assorted properties (e.g., the filename of an attachment), and they
provide a convenient walk generator
method, which returns the next Message
in the payload each time through in a for
loop or other iteration context.
Because the walker yields the root Message
object first (i.e., self), single-part messages don’t have to be
handled as a special case; a nonmultipart message is effectively a Message
with a single item in its
payload: itself.
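Because walk yields the root first, one loop can serve both simple and multipart mails. The sketch below assumes a hypothetical helper, find_main_text, that returns the first text/plain part it meets; real clients may apply more elaborate rules for choosing a main text part.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def find_main_text(msg):
    """Hypothetical helper: return the first text/plain part found by walk(), or None."""
    for part in msg.walk():                  # root first, then nested parts
        if part.get_content_type() == 'text/plain':
            return part
    return None

simple = Message()                           # single-part: defaults to text/plain
simple.set_payload('just text\n')

multi = MIMEMultipart()                      # root container with two text parts
multi.attach(MIMEText('<p>markup</p>\n', 'html'))
multi.attach(MIMEText('plain body\n'))

print(list(simple.walk()) == [simple])       # the walker yields only the root itself
print([part.get_content_type() for part in multi.walk()])
print(find_main_text(simple) is simple)
print(find_main_text(multi).get_payload())
```

No is_multipart test is needed in the helper: the container's own type never matches text/plain, so it is simply skipped.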
Ultimately, the Message object
structure closely mirrors the way mails are formatted as text. Special
header lines in the mail’s text give its type (e.g., plain text or
multipart), as well as the separator used between the content of nested
parts. Since the underlying textual details are automated by the email
package, both when parsing and
when composing, we won’t go into further formatting details here.
If you are interested in seeing how this translates to real
emails, a great way to learn mail structure is by inspecting the full
raw text of messages displayed by email clients you already use, as
we’ll see with some we meet in this book. In fact, we’ve already seen a
few: see the raw text printed by our earlier POP email scripts for simple
mail text examples. For more on the Message
object, and email in general, consult the email
package’s entry in Python’s library
manual. We’re skipping details such as its available encoders and MIME
object classes here in the interest of space.
Beyond the email package, the
Python library includes other tools for mail-related processing. For
instance, mimetypes maps a filename
to and from a MIME type:
mimetypes.guess_type(filename)
    Maps a filename to a MIME type. Name spam.txt maps to text/plain.
mimetypes.guess_extension(contype)
    Maps a MIME type to a filename extension. Type text/html maps to .html.
We also used the mimetypes
module earlier in this chapter to guess FTP transfer modes from
filenames (see Example 13-10),
as well as in Chapter 6, where we used
it to guess a media player for a filename (see the examples there,
including playfile.py, Example 6-23). For email, these can
come in handy when attaching files to a new message (guess_type) and saving parsed attachments that
do not provide a filename (guess_extension). In fact, this module’s
source code is a fairly complete reference to MIME types. See the
library manual for more on these tools.
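The two email-related uses just mentioned might look roughly like this. Both helper functions are hypothetical names introduced for illustration, the fallback type and the '.bin' extension are assumptions, and the extension guessed for a given type can vary with the Python version and platform tables.

```python
import mimetypes
from email.mime.text import MIMEText

def guess_types(filename):
    """Hypothetical helper: return (maintype, subtype) for a file to be attached."""
    contype, encoding = mimetypes.guess_type(filename)
    if contype is None or encoding is not None:      # unknown, or compressed (e.g. .gz)
        contype = 'application/octet-stream'         # assumed generic fallback
    return tuple(contype.split('/', 1))

def filename_for(part, basename='part'):
    """Hypothetical helper: pick a save name for a parsed part that lacks one."""
    name = part.get_filename()
    if name is None:
        ext = mimetypes.guess_extension(part.get_content_type()) or '.bin'
        name = basename + ext
    return name

print(guess_types('spam.txt'))                       # used when attaching a file
print(guess_types('mystery.zzqx'))                   # extension with no known type
print(filename_for(MIMEText('no name given')))       # used when saving a nameless part
```

The main type returned by guess_types is what a mail composer would use to choose among classes such as MIMEText, MIMEImage, and MIMEAudio.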
Although we can’t
provide an exhaustive reference here, let’s step through a
simple interactive session to illustrate the fundamentals of email
processing. To compose the full text of a
message (to be delivered with smtplib,
for instance), make a Message, assign
headers to its keys, and set its payload to the message body. Converting
to a string yields the mail text. This process is substantially simpler
and less error-prone than the manual text operations we used earlier in
Example 13-19 to build mail as strings:
>>> from email.message import Message
>>> m = Message()
>>> m['from'] = 'Jane Doe'
>>> m['to'] = '[email protected]'
>>> m.set_payload('The owls are not what they seem...')
>>>
>>> s = str(m)
>>> print(s)
from: Jane Doe
to: [email protected]

The owls are not what they seem...
Parsing a message’s text, like the kind you
obtain with poplib, is similarly
simple, and essentially the inverse: we get back a Message
object from the text, with keys for
headers and a payload for the body:
>>> s                                         # same as in prior interaction
'from: Jane Doe\nto: [email protected]\n\nThe owls are not...'
>>> from email.parser import Parser
>>> x = Parser().parsestr(s)
>>> x
<email.message.Message object at 0x...>
>>>
>>> x['From']
'Jane Doe'
>>> x.get_payload()
'The owls are not what they seem...'
>>> x.items()
[('from', 'Jane Doe'), ('to', '[email protected]')]
So far this isn’t much different from the older and
now-defunct rfc822 module, but as
we’ll see in a moment, things get more interesting when there is more
than one part. For simple messages like this one, the message walk
generator treats it as a single-part
mail, of type plain text:
>>> for part in x.walk():
...     print(part.get_content_type())
...     print(part.get_payload())
...
text/plain
The owls are not what they seem...
Making a mail with attachments is a little
more work, but not much: we just make a root Message
and attach nested Message
objects created from the MIME type
object that corresponds to the type of data we’re attaching. The MIMEText
class, for instance, is a
subclass of Message, which is
tailored for text parts, and knows how to generate the right types of
header information when printed. MIMEImage
and MIMEAudio
similarly customize Message for
images and audio, and also know how to apply Base64 and other MIME
encodings to binary data. The root message is where we store the main
headers of the mail, and we attach parts here, instead of setting the
entire payload; the payload is a list now, not a string. MIMEMultipart
is a Message
that provides the extra header
protocol we need for the root:
>>> from email.mime.multipart import MIMEMultipart    # Message subclasses
>>> from email.mime.text import MIMEText              # with extra headers+logic
>>>
>>> top = MIMEMultipart()                             # root Message object
>>> top['from'] = 'Art'                               # subtype default=mixed
>>> top['to'] = '[email protected]'
>>>
>>> sub1 = MIMEText('nice red uniforms...\n')         # part Message attachments
>>> sub2 = MIMEText(open('data.txt').read())
>>> sub2.add_header('Content-Disposition', 'attachment', filename='data.txt')
>>> top.attach(sub1)
>>> top.attach(sub2)
When we ask for the text, a correctly formatted full mail text
is returned, separators and all, ready to be sent with smtplib.
That is quite a trick, if you’ve ever tried
this by hand:
>>> text = top.as_string()                    # or do: str(top) or print(top)
>>> print(text)
Content-Type: multipart/mixed; boundary="===============1574823535=="
MIME-Version: 1.0
from: Art
to: [email protected]

--===============1574823535==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

nice red uniforms...

--===============1574823535==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="data.txt"

line1
line2
line3

--===============1574823535==--
If we are sent this message and retrieve it via poplib, parsing its full text yields a Message
object
just like the one we built to send. The message walk
generator allows us to step through
each part, fetching their types and payloads:
>>> text                                      # same as in prior interaction
'Content-Type: multipart/mixed; boundary="===============1574823535=="\nMIME-Ver...'
>>> from email.parser import Parser
>>> msg = Parser().parsestr(text)
>>> msg['from']
'Art'
>>> for part in msg.walk():
...     print(part.get_content_type())
...     print(part.get_payload())
...     print()
...
multipart/mixed
[<email.message.Message object at 0x...>, <email.message.Message object at 0x...>]

text/plain
nice red uniforms...


text/plain
line1
line2
line3
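A mail client usually does more than print each part. As a minimal sketch, the code below walks a parsed message and collects any parts that carry a filename, decoding their transfer encoding along the way. The message text, its boundary string, and its address are made up for illustration, and the attachment is Base64-encoded here only to show the effect of decode=True.

```python
from email.parser import Parser

# Hand-written sample mail; boundary, address, and content are illustrative
text = """\
Content-Type: multipart/mixed; boundary="BOUNDARY"
MIME-Version: 1.0
from: Art
to: receiver@example.com

--BOUNDARY
Content-Type: text/plain; charset="us-ascii"

nice red uniforms...

--BOUNDARY
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="data.txt"

bGluZTEKbGluZTIK

--BOUNDARY--
"""

msg = Parser().parsestr(text)
attachments = {}
for part in msg.walk():
    if part.get_content_maintype() == 'multipart':
        continue                          # container only: its payload is the list of parts
    name = part.get_filename()            # None unless a filename was given
    if name is not None:
        attachments[name] = part.get_payload(decode=True)   # bytes, encoding undone

print(attachments)
```

Each value in attachments is a bytes object, so it could be written to a file opened in binary mode regardless of the part's original encoding.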
Multipart alternative messages (with text and HTML renditions of
the same message) can be composed and parsed in similar fashion.
Because email clients are able to
parse and compose messages with a simple object-based API, they are
freed to focus on user-interface instead of text
processing.
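For completeness, here is a short sketch of the multipart alternative case. It assumes the usual MIME ordering convention, with the plainest rendition attached first and the richest last; the subject and body text are invented, and the choice of which rendition to show is a stand-in for a real client's preference logic.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.parser import Parser

top = MIMEMultipart('alternative')            # subtype given instead of the default 'mixed'
top['Subject'] = 'Greetings'
top.attach(MIMEText('Hello, world\n', 'plain'))                 # plainest rendition first
top.attach(MIMEText('<p>Hello, <b>world</b></p>\n', 'html'))    # richest rendition last

# Round trip: generate the full mail text, then parse it back
parsed = Parser().parsestr(top.as_string())
renditions = {part.get_content_type(): part.get_payload()
              for part in parsed.walk() if not part.is_multipart()}

# A console client might prefer plain text and fall back to HTML
body = renditions.get('text/plain') or renditions.get('text/html')
print(parsed.get_content_type())
print(body)
```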