Programming Python

Mark Lutz

Loading Database Tables from Files

One of the nice
things about using Python in the database domain is that
you can combine the power of the SQL query language with the power of
the Python general-purpose programming language. They naturally
complement each other.

Loading with SQL and Python

Suppose, for example, that you want to load a database table
from a flat file, where each line in the file represents a database
row, with individual field values separated by commas. Examples 17-6
and 17-7 list two such datafiles we’re going to be using here.

Example 17-6. PP4E\Dbase\Sql\data.txt

bob,devel,50000
sue,music,60000
ann,devel,40000
tim,admin,30000
kim,devel,60000

Example 17-7. PP4E\Dbase\Sql\data2.txt

bob,developer,80000
sue,music,90000
ann,manager,80000

Now, some database systems like MySQL have a handy SQL statement
for loading such a table quickly. Its load data statement parses and
loads data from a text file, located on either the client or the
server machine. In the following, the first command deletes all
records in the table, and we’re using the fact that Python
automatically concatenates adjacent string literals to split the SQL
statement over multiple lines:

# Using MySQL (currently available for Python 2.X only)
...log into MySQL first...
>>> curs.execute('delete from people')            # all records
>>> curs.execute(
...     "load data local infile 'data.txt' "
...     "into table people fields terminated by ','")
>>> curs.execute('select * from people')
>>> for row in curs.fetchall(): print(row)
...
('bob', 'devel', 50000L)
('sue', 'music', 60000L)                          # 2.X long integers
('ann', 'devel', 40000L)
('tim', 'admin', 30000L)
('kim', 'devel', 60000L)
>>> conn.commit()

This works as expected. But what if you must use a system like
the SQLite database used in this book, which lacks this specific SQL
statement? Or, perhaps you just need to do something more custom than
this MySQL statement allows. Not to worry—a small amount of simple
Python code can easily accomplish the same result with SQLite and
Python 3.X (again, some irrelevant output lines are omitted
here):

C:\...\PP4E\Dbase\Sql> python
>>> from sqlite3 import connect
>>> conn = connect('dbase1')
>>> curs = conn.cursor()
>>> curs.execute('delete from people')            # empty the table
>>> curs.execute('select * from people')
>>> curs.fetchall()
[]
>>> file = open('data.txt')
>>> rows = [line.rstrip().split(',') for line in file]
>>> rows[0]
['bob', 'devel', '50000']
>>> for rec in rows:
...     curs.execute('insert into people values (?, ?, ?)', rec)
...
>>> curs.execute('select * from people')
>>> for rec in curs.fetchall(): print(rec)
...
('bob', 'devel', 50000)
('sue', 'music', 60000)
('ann', 'devel', 40000)
('tim', 'admin', 30000)
('kim', 'devel', 60000)

This code makes use of a list comprehension to collect string
split results for all lines in the file after removing any newline
characters, and file iterators to step through the file line by line.
Its Python loop does the same work as the MySQL load statement, and
it will work on more database types, including SQLite. We could get a
similar result from the executemany DB API call shown earlier as
well, but the Python for loop here has the potential to be more
general.
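For comparison, here is a minimal sketch of the executemany alternative just mentioned. It uses an in-memory SQLite database and a literal data string rather than the chapter’s dbase1 file and data.txt, purely so the demo is self-contained:

```python
import sqlite3

conn = sqlite3.connect(':memory:')            # throwaway database for the demo
curs = conn.cursor()
curs.execute('create table people (name char(30), job char(10), pay int(4))')

# stand-in for reading data.txt: same comma-delimited format
rows = [line.split(',') for line in
        'bob,devel,50000\nsue,music,60000\nann,devel,40000'.splitlines()]

# one call runs the insert once for every row in the list
curs.executemany('insert into people values (?, ?, ?)', rows)
conn.commit()

curs.execute('select * from people')
print(curs.fetchall())
```

The loop version and executemany are interchangeable here; the loop simply makes it easier to add per-row logic later.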

Python versus SQL

In fact, you have the entire Python language at your disposal
for processing database results, and a little Python can often
duplicate or go beyond SQL syntax. For instance, SQL has special
aggregate function syntax for computing things such as sums and
averages:

>>> curs.execute("select sum(pay), avg(pay) from people where job = 'devel'")
>>> curs.fetchall()
[(150000, 50000.0)]

By shifting the processing to Python, we can sometimes simplify
and do more than SQL’s syntax allows (albeit potentially sacrificing
any query performance optimizations the database may perform).
Computing pay sums and averages with Python can be accomplished with a
simple loop:

>>> curs.execute("select name, pay from people where job = 'devel'")
>>> result = curs.fetchall()
>>> result
(('bob', 50000L), ('ann', 40000L), ('kim', 60000L))
>>> tot = 0
>>> for (name, pay) in result: tot += pay
...
>>> print('total:', tot, 'average:', tot / len(result))   # use // to truncate
total: 150000 average: 50000.0

Or we can use more advanced tools such as comprehensions and
generator expressions to calculate sums, averages, maximums, and the
like:

>>> print(sum(rec[1] for rec in result))                  # generator expr
150000
>>> print(sum(rec[1] for rec in result) / len(result))
50000.0
>>> print(max(rec[1] for rec in result))
60000

The Python approach is more general, but it doesn’t buy us much
until things become more complex. For example, here are a few more
advanced comprehensions that collect the names of people whose pay is
above and below the average in the query result set:

>>> avg = sum(rec[1] for rec in result) / len(result)
>>> print([rec[0] for rec in result if rec[1] > avg])
['kim']
>>> print([rec[0] for rec in result if rec[1] < avg])
['ann']

We may be able to do some of these kinds of tasks with more
advanced SQL techniques such as nested queries, but we eventually
reach a complexity threshold where Python’s general-purpose nature
makes it attractive and potentially more portable. For comparison,
here is the equivalent SQL:

>>> query = ("select name from people where job = 'devel' and "
...          "pay > (select avg(pay) from people where job = 'devel')")
>>> curs.execute(query)
>>> curs.fetchall()
[('kim',)]
>>> query = ("select name from people where job = 'devel' and "
...          "pay < (select avg(pay) from people where job = 'devel')")
>>> curs.execute(query)
>>> curs.fetchall()
[('ann',)]

This isn’t the most complex SQL you’re likely to meet, but
beyond this point, SQL can become more involved. Moreover, unlike
Python, SQL is limited to database-specific tasks by design. Imagine a
query that compares a column’s values to data fetched off the Web or
from a user in a GUI—simple with Python’s Internet and GUI support,
but well beyond the scope of a special-purpose language such as SQL.
By combining Python and SQL, you get the best of both and can choose
which is best suited to your goals.
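As a hedged sketch of that idea: fetch_threshold below is a hypothetical stand-in for any code that pulls a value from the Web or a GUI (a real version might use urllib.request or tkinter); the point is that an externally obtained value can drive an ordinary parameterized query, something SQL alone cannot express:

```python
import sqlite3

def fetch_threshold():
    # hypothetical: stands in for a value fetched off the Web or
    # entered in a GUI; a real version might call urllib.request.urlopen
    return 45000

conn = sqlite3.connect(':memory:')                     # demo database
curs = conn.cursor()
curs.execute('create table people (name char(30), job char(10), pay int(4))')
curs.executemany('insert into people values (?, ?, ?)',
                 [('bob', 'devel', 50000), ('ann', 'devel', 40000)])

limit = fetch_threshold()                              # Python-side data source
curs.execute('select name from people where pay > ?', (limit,))
print(curs.fetchall())                                 # [('bob',)]
```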

With Python, you also have access to utilities you’ve already
coded: your database tool set is arbitrarily extensible with
functions, modules, and classes. To illustrate, here are some of the
same operations coded in a more mnemonic fashion with the
dictionary-record module we wrote earlier:

>>> from makedicts import makedicts
>>> recs = makedicts(curs, "select * from people where job = 'devel'")
>>> print(len(recs), recs[0])
3 {'pay': 50000, 'job': 'devel', 'name': 'bob'}
>>> print([rec['name'] for rec in recs])
['bob', 'ann', 'kim']
>>> print(sum(rec['pay'] for rec in recs))
150000
>>> avg = sum(rec['pay'] for rec in recs) / len(recs)
>>> print([rec['name'] for rec in recs if rec['pay'] > avg])
['kim']
>>> print([rec['name'] for rec in recs if rec['pay'] >= avg])
['bob', 'kim']

Similarly, Python’s set object type provides operations such as
intersection, union, and difference which can serve as alternatives
to other SQL database operations (in the interest of space, we’ll
leave their exploration as a suggested side trip). For more advanced
database extensions, see the SQL-related tools available for Python
in the third-party domain. For example, a variety of packages add an
OOP flavor to the DB API—the ORMs we’ll explore near the end of this
chapter.
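As a brief taste of that side trip, here is a small sketch of how set objects can mimic SQL’s set operators on query results; the two literal row sets below are made up for the demo and stand in for two fetchall results converted with set():

```python
# two query-result sets: names of developers, names of high earners;
# set(curs.fetchall()) would build these from real queries
devels  = {('bob',), ('ann',), ('kim',)}
highpay = {('sue',), ('kim',)}

print(devels & highpay)        # intersection: like SQL INTERSECT
print(devels - highpay)        # difference:   like SQL EXCEPT
print(devels | highpay)        # union:        like SQL UNION
```

Because row tuples are hashable, fetchall results convert to sets directly; the usual caveat is that sets discard duplicate rows and ordering.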

SQL Utility Scripts

At this point in our SQL DB API tour, we’ve started to stretch
the interactive prompt to its breaking point—we wind up retyping the
same boilerplate code every time we start a session and every time we
run a test. Moreover, the code we’re writing is substantial enough to
be reused in other programs. Let’s wrap up by transforming our code
into scripts that automate these tasks and support reuse.

To illustrate more of the power of the Python/SQL mix, this
section presents a handful of utility scripts that perform common
tasks—the sorts of things you’d otherwise have to recode often during
development. As an added bonus, most of these files are both
command-line utilities and modules of functions that can be imported and
called from other programs. Most of the scripts in this section also
allow a database file name to be passed in on the command line; this
allows us to use different databases for different purposes during
development—changes in one won’t impact others.

Table load scripts

Let’s take a quick look at code first, before seeing it in
action; feel free to skip ahead to correlate the code here with its
behavior. As a first (and less than ideal) step, Example 17-8 shows a
simple way to script-ify the table-loading logic of the prior
section.

Example 17-8. PP4E\Dbase\Sql\loaddb1.py

"""
load table from comma-delimited text file; equivalent to this nonportable SQL:
load data local infile 'data.txt' into table people fields terminated by ','"
"""
import sqlite3
conn = sqlite3.connect('dbase1')
curs = conn.cursor()
file = open('data.txt')
rows = [line.rstrip().split(',') for line in file]
for rec in rows:
curs.execute('insert into people values (?, ?, ?)', rec)
conn.commit() # commit changes now, if db supports transactions
conn.close() # close, __del__ call rollback if changes not committed yet

As is, Example 17-8 is a top-level script geared toward one
particular case. It’s hardly any extra work to generalize this into a
function that can be imported and used in a variety of scenarios, as
in Example 17-9—a much more widely useful module and command-line
script.

Example 17-9. PP4E\Dbase\Sql\loaddb.py

"""
load table from comma-delimited text file: reusable/generalized version
Importable functions; command-line usage: loaddb.py dbfile? datafile? table?
"""
def login(dbfile):
import sqlite3
conn = sqlite3.connect(dbfile) # create or open db file
curs = conn.cursor()
return conn, curs
def loaddb(curs, table, datafile, conn=None, verbose=True):
file = open(datafile) # x,x,x\nx,x,x\n
rows = [line.rstrip().split(',') for line in file] # [[x,x,x], [x,x,x]]
rows = [str(tuple(rec)) for rec in rows] # ["(x,x,x)", "(x,x,x)"]
for recstr in rows:
curs.execute('insert into ' + table + ' values ' + recstr)
if conn: conn.commit()
if verbose: print(len(rows), 'rows loaded')
if __name__ == '__main__':
import sys
dbfile, datafile, table = 'dbase1', 'data.txt', 'people'
if len(sys.argv) > 1: dbfile = sys.argv[1]
if len(sys.argv) > 2: datafile = sys.argv[2]
if len(sys.argv) > 3: table = sys.argv[3]
conn, curs = login(dbfile)
loaddb(curs, table, datafile, conn)

Notice the way this code uses two list comprehensions to build a
string of record values for the insert statement (see its comments
for the transforms applied). We could also use an executemany call as
we did earlier, but we want to be general and avoid hardcoding the
fields insertion template—this function might be used for tables with
any number of columns.
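One hedged alternative sketch, not from the book’s listing: rather than pasting literal value strings into the statement, we can stay just as general by generating one '?' placeholder per column from the first row and handing the raw rows to executemany, which sidesteps quoting problems in the data (the loaddb2 name and the in-memory demo table are made up here):

```python
import sqlite3

def loaddb2(curs, table, rows, conn=None):
    # build 'insert into T values (?, ?, ...)' with one ? per column,
    # so the function still works for tables of any width
    qmarks = ', '.join('?' * len(rows[0]))
    curs.executemany('insert into %s values (%s)' % (table, qmarks), rows)
    if conn: conn.commit()

conn = sqlite3.connect(':memory:')               # demo database
curs = conn.cursor()
curs.execute('create table people (name char(30), job char(10), pay int(4))')
loaddb2(curs, 'people',
        [['bob', 'devel', '50000'], ['sue', 'music', '60000']], conn)

curs.execute('select * from people')
print(curs.fetchall())
```

The table name is still pasted in directly, as in the book’s version, so it must come from a trusted source either way.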

This file also defines a login function to automate the initial
connection calls—after retyping this sequence enough times, it seemed
a prime candidate for a function. In addition, this reduces code
redundancy; in the future, such logic need only be changed in a
single location if we change database systems, as long as the login
function is used everywhere.

Table display script

Once we load data, we probably will want to display it. Example
17-10 allows us to display results as we go—it prints an entire table
with either a simple display (which could be parsed by other tools)
or a formatted display (generated with the dictionary-record utility
we wrote earlier). Notice how it computes the maximum field-name size
for alignment with a generator expression; the size is passed in to a
string formatting expression by specifying an asterisk (*) for the
field size in the format string.

Example 17-10. PP4E\Dbase\Sql\dumpdb.py

"""
display table contents as raw tuples, or formatted with field names
command-line usage: dumpdb.py dbname? table? [-] (dash=formatted display)
"""
def showformat(recs, sept=('-' * 40)):
print(len(recs), 'records')
print(sept)
for rec in recs:
maxkey = max(len(key) for key in rec) # max key len
for key in rec: # or: \t align
print('%-*s => %s' % (maxkey, key, rec[key])) # -ljust, *len
print(sept)
def dumpdb(cursor, table, format=True):
if not format:
cursor.execute('select * from ' + table)
while True:
rec = cursor.fetchone()
if not rec: break
print(rec)
else:
from makedicts import makedicts
recs = makedicts(cursor, 'select * from ' + table)
showformat(recs)
if __name__ == '__main__':
import sys
dbname, format, table = 'dbase1', False, 'people'
cmdargs = sys.argv[1:]
if '-' in cmdargs: # format if '-' in cmdline args
format = True # dbname if other cmdline arg
cmdargs.remove('-')
if cmdargs: dbname = cmdargs.pop(0)
if cmdargs: table = cmdargs[0]
from loaddb import login
conn, curs = login(dbname)
dumpdb(curs, table, format)
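The asterisk trick in showformat can be seen in isolation; this tiny standalone sketch (the sample record is made up) left-justifies each key to the longest key’s width:

```python
rec = {'pay': 80000, 'job': 'developer', 'name': 'bob'}

maxkey = max(len(key) for key in rec)            # longest field name: 4
for key in rec:
    # '-' left-justifies; '*' takes the field width from the argument list
    print('%-*s => %s' % (maxkey, key, rec[key]))
```

In 3.X string method terms, str.ljust or format's '{:<{}}' spell the same alignment.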

While we’re at it, let’s code some utility scripts to initialize
and erase the database, so we do not have to type these by hand at
the interactive prompt every time we want to start from scratch.
Example 17-11 completely deletes and re-creates the database, to
reset it to an initial state (we did this manually at the start of
the tutorial).

Example 17-11. PP4E\Dbase\Sql\makedb.py

"""
physically delete and re-create database files
usage: makedb.py dbname? tablename?
"""
import sys
if input('Are you sure?').lower() not in ('y', 'yes'):
sys.exit()
dbname = (len(sys.argv) > 1 and sys.argv[1]) or 'dbase1'
table = (len(sys.argv) > 2 and sys.argv[2]) or 'people'
from loaddb import login
conn, curs = login(dbname)
try:
curs.execute('drop table ' + table)
except:
print('database table did not exist')
command = 'create table %s (name char(30), job char(10), pay int(4))' % table
curs.execute(command)
conn.commit() # commit may be optional here
print('made', dbname, table)

Next, the clear script in Example 17-12 deletes all rows in the
table, instead of dropping and re-creating the table entirely. For
testing purposes, either approach is sufficient. Minor caveat: the
rowcount attribute doesn’t always reflect the number of rows deleted
in SQLite; see its library manual entry for details.
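A quick hedged sketch of that caveat, on an in-memory demo database; because rowcount behavior after an unqualified delete varies across sqlite3 versions, no particular count is claimed here, only that the table itself reliably ends up empty:

```python
import sqlite3

conn = sqlite3.connect(':memory:')               # demo database
curs = conn.cursor()
curs.execute('create table people (name char(30), job char(10), pay int(4))')
curs.executemany('insert into people values (?, ?, ?)',
                 [('bob', 'devel', 50000), ('sue', 'music', 60000)])

curs.execute('delete from people')
# rowcount may or may not reflect the 2 deleted rows, depending on
# the sqlite3 version; the delete itself always takes effect
print('rowcount:', curs.rowcount)

curs.execute('select * from people')
print(curs.fetchall())                           # []
```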

Example 17-12. PP4E\Dbase\Sql\cleardb.py

"""
delete all rows in table, but don't drop the table or database it is in
usage: cleardb.py dbname? tablename?
"""
import sys
if input('Are you sure?').lower() not in ('y', 'yes'):
sys.exit()
dbname = sys.argv[1] if len(sys.argv) > 1 else 'dbase1'
table = sys.argv[2] if len(sys.argv) > 2 else 'people'
from loaddb import login
conn, curs = login(dbname)
curs.execute('delete from ' + table)
#print(curs.rowcount, 'records deleted') # conn closed by its __del__
conn.commit() # else rows not really deleted

Finally, Example 17-13 provides a command-line tool that runs a
query and prints its result table in formatted style. There’s not
much to this script; because we’ve automated most of its tasks
already, this is largely just a combination of existing tools. Such
is the power of code reuse in Python.

Example 17-13. PP4E\Dbase\Sql\querydb.py

"""
run a query string, display formatted result table
example: querydb.py dbase1 "select name, job from people where pay > 50000"
"""
import sys
database, querystr = 'dbase1', 'select * from people'
if len(sys.argv) > 1: database = sys.argv[1]
if len(sys.argv) > 2: querystr = sys.argv[2]
from makedicts import makedicts
from dumpdb import showformat
from loaddb import login
conn, curs = login(database)
rows = makedicts(curs, querystr)
showformat(rows)
Using the scripts

Last but not least, here
is a log of a session that makes use of these scripts in
command-line mode, to illustrate their operation. Most of the files
also have functions that can be imported and called from a different
program; the scripts simply map command-line arguments to the
functions’ arguments when run standalone. The first thing we do is
initialize a testing database and load its table from a text
file:

...\PP4E\Dbase\Sql> makedb.py testdb
Are you sure?y
database table did not exist
made testdb people

...\PP4E\Dbase\Sql> loaddb.py testdb data2.txt
3 rows loaded

Next, let’s check our work with the dump utility (use a -
argument to force a formatted display):

...\PP4E\Dbase\Sql> dumpdb.py testdb
('bob', 'developer', 80000)
('sue', 'music', 90000)
('ann', 'manager', 80000)

...\PP4E\Dbase\Sql> dumpdb.py testdb -
3 records
----------------------------------------
pay => 80000
job => developer
name => bob
----------------------------------------
pay => 90000
job => music
name => sue
----------------------------------------
pay => 80000
job => manager
name => ann
----------------------------------------

The dump script is an exhaustive display; to be more specific
about which records to view, use the query script and pass in a query
string on the command line (the command lines are split here to fit in
this book):

...\PP4E\Dbase\Sql> querydb.py testdb
                        "select name, job from people where pay = 80000"
2 records
----------------------------------------
job => developer
name => bob
----------------------------------------
job => manager
name => ann
----------------------------------------

...\PP4E\Dbase\Sql> querydb.py testdb
                        "select * from people where name = 'sue'"
1 records
----------------------------------------
pay => 90000
job => music
name => sue
----------------------------------------

Now, let’s erase and start again with a new data set file. The
clear script erases all records but doesn’t reinitialize the database
completely:

...\PP4E\Dbase\Sql> cleardb.py testdb
Are you sure?y

...\PP4E\Dbase\Sql> dumpdb.py testdb -
0 records
----------------------------------------

...\PP4E\Dbase\Sql> loaddb.py testdb data.txt
5 rows loaded

...\PP4E\Dbase\Sql> dumpdb.py testdb
('bob', 'devel', 50000)
('sue', 'music', 60000)
('ann', 'devel', 40000)
('tim', 'admin', 30000)
('kim', 'devel', 60000)

In closing, here are three queries in action on this new data
set: they fetch names of developers, jobs that pay above an amount,
and records with a given pay level sorted by job. We could run these
at the Python interactive prompt, of course, but we’re getting a lot
of setup and boilerplate code for free here:

...\PP4E\Dbase\Sql> querydb.py testdb
                        "select name from people where job = 'devel'"
3 records
----------------------------------------
name => bob
----------------------------------------
name => ann
----------------------------------------
name => kim
----------------------------------------

...\PP4E\Dbase\Sql> querydb.py testdb
                        "select job from people where pay >= 60000"
2 records
----------------------------------------
job => music
----------------------------------------
job => devel
----------------------------------------

...\PP4E\Dbase\Sql> querydb.py testdb
                        "select * from people where pay >= 60000 order by job"
2 records
----------------------------------------
pay => 60000
job => devel
name => kim
----------------------------------------
pay => 60000
job => music
name => sue
----------------------------------------

Before we move on, some context: the scripts in this section
illustrate the benefits of code reuse, accomplish their purpose (which
was partly demonstrating the SQL API), and serve as a model for canned
database utilities. But they are still not as general as they could
be; support for sorting options in the dump script, for example, may
be a useful extension. Although we could generalize to support more
options, at some point we may need to revert to typing SQL commands in
a client—part of the reason SQL is a language is because it must
support so much generality. Further extensions to these scripts are
left as exercises. Change this code as you like; it’s Python,
after all.
