Authors: John MacCormick, Chris Bishop
Next, the database uses another important operation called select.
A select operation chooses some of the rows from a table, based on some criteria, and throws away the other rows, producing a new virtual table. In this case, we are looking for students who take courses from Professor Kirby, so we need to do a “select” operation that chooses only rows in which the instructor is “Prof Kirby.” That leaves us with this virtual table:
The query is nearly complete. All we need now is another projection operation, to throw away the “instructor” column, leaving us with a virtual table that answers the original query:
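The relational algebra operations used in this query can be sketched in a few lines of Python. This is a minimal illustration, not how a real database engine works. The tables below are hypothetical stand-ins (the student names, course codes, and the helper functions `join`, `select`, and `project` are all invented for this sketch), and each table is simply a list of dictionaries. The order of steps (join, project, select, project) is one plausible reading of the query to find Professor Kirby's students.

```python
# Hypothetical tables: each row is a dict mapping column name -> value.
enrollments = [
    {"student": "Alice", "course": "BIO101"},
    {"student": "Bob", "course": "MATH201"},
    {"student": "Chen", "course": "BIO101"},
]
courses = [
    {"course": "BIO101", "instructor": "Prof Kirby"},
    {"course": "MATH201", "instructor": "Prof Smith"},
]

def join(left, right, column):
    """Pair up rows from the two tables that agree on the shared column."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]

def select(table, predicate):
    """Keep only the rows satisfying the predicate; discard the rest."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """Keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in table]

# Each step builds a new "virtual table"; the stored tables are never modified.
joined = join(enrollments, courses, "course")
pairs = project(joined, ["student", "instructor"])
kirby = select(pairs, lambda row: row["instructor"] == "Prof Kirby")
answer = project(kirby, ["student"])
print(answer)  # [{'student': 'Alice'}, {'student': 'Chen'}]
```

One simplification worth noting: relational algebra treats tables as sets, so a stricter `project` would also remove duplicate rows; this sketch keeps them for brevity.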
It's worth adding a slightly more technical note here. If you happen to be familiar with the database query language SQL, you might find the above definition of the “select” operation rather strange, as the “select” command in SQL does much more than merely selecting some rows. The terminology here comes from a mathematical theory of database operations, known as relational algebra, in which “select” is used only for selecting rows. Relational algebra also includes the “join” and “project” operations that we used in our query to find Professor Kirby's students.
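To see the difference in terminology concretely, here is a sketch of the Kirby query written in SQL and run through Python's built-in `sqlite3` module against an in-memory database. The table names, column names, and sample rows are hypothetical. A single SQL `SELECT` statement expresses all three relational algebra operations at once: a join (`JOIN ... ON`), a relational-algebra select (the `WHERE` clause), and a projection (the column list right after `SELECT`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for illustration only.
conn.executescript("""
CREATE TABLE enrollments (student TEXT, course TEXT);
CREATE TABLE courses (course TEXT, instructor TEXT);
INSERT INTO enrollments VALUES ('Alice', 'BIO101'), ('Bob', 'MATH201'), ('Chen', 'BIO101');
INSERT INTO courses VALUES ('BIO101', 'Prof Kirby'), ('MATH201', 'Prof Smith');
""")

# SELECT e.student        -> project: keep only the student column
# JOIN courses ... ON ... -> join: combine matching rows of the two tables
# WHERE ...               -> select: keep only Prof Kirby's rows
rows = conn.execute("""
    SELECT e.student
    FROM enrollments AS e
    JOIN courses AS c ON e.course = c.course
    WHERE c.instructor = 'Prof Kirby'
    ORDER BY e.student
""").fetchall()
conn.close()
print(rows)  # [('Alice',), ('Chen',)]
```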
Relational Databases
A database that stores all of its data in interconnected tables such as the ones we have been using is called a relational database. Relational databases were advocated by the IBM researcher E. F. Codd in his extraordinarily influential 1970 paper, “A Relational Model of Data for Large Shared Data Banks.” Like many of the greatest ideas in science, relational databases seem simple in retrospect, but at the time, they represented a huge leap forward in the efficient storage and processing of information. It turns out that a mere handful of operations (such as the relational algebra operations “select,” “join,” and “project” we saw earlier) are sufficient to generate virtual tables that answer essentially any query to a relational database. So a relational database can store its data in tables that are structured for efficiency, and use the virtual table trick to answer queries that seemingly require the data to be in a different form.
That's why relational databases are used to support a large proportion of e-commerce activities. Whenever you buy something online, you are probably interacting with a slew of relational database tables storing information about products, customers, and individual purchases. In cyberspace, we are constantly surrounded by relational databases, often without even realizing it.
THE HUMAN SIDE OF DATABASES
To the casual observer, databases may well be the least exciting topic in this book. It's just hard to get excited about data storage. But under the covers, the ingenious ideas that make databases work tell a different story. Built out of hardware that can fail in the middle of any operation, databases nevertheless give us the efficiency and rock-solid dependability that we have come to expect from online banking and similar activities. The to-do list trick gives us atomic transactions, which enforce consistency even when thousands of customers are simultaneously interacting with a database. This immense level of concurrency, together with rapid query responses via the virtual table trick, makes large databases efficient. The to-do list trick also guarantees consistency in the face of failures. When combined with the prepare-then-commit trick for replicated databases, we are left with iron-clad consistency and durability for our data.
The heroic triumph of databases over unreliable components, known by computer scientists as “fault-tolerance,” is the work of many researchers over many decades. But among the most important contributors was Jim Gray, a superb computer scientist who literally wrote the book on transaction processing. (The book is Transaction Processing: Concepts and Techniques, first published in 1992.) Sadly, Gray's career ended early: one day in 2007, he sailed his yacht out of San Francisco Bay, under the Golden Gate Bridge, and into the open ocean on a planned day trip to some nearby islands. No sign of Gray, or his boat, was ever seen again. In a heart-warming twist to this tragic story, Gray's many friends in the database community used his own tools in an effort to save him: freshly generated satellite imagery of the ocean near San Francisco was uploaded to a database so that friends and colleagues could search for any trace of the missing database pioneer. Unfortunately, the search was not successful, and the world of computer science was left without one of its leading luminaries.
9
Digital Signatures: Who Really Wrote This Software?
To show you how mistaken you are, and what an unfounded assumption yours is, I will lay before you a certificate…look at it! You may take it in your hand; it's no forgery.
Charles Dickens, A Tale of Two Cities
Of all the ideas we'll encounter in this book, the concept of a “digital signature” is perhaps the most paradoxical. The word “digital,” interpreted literally, means “consisting of a string of digits.” So, by definition, anything that is digital can be copied: to do so, just copy the digits one at a time. If you can read it, you can copy it! On the other hand, the whole point of a “signature” is that it can be read, but can't be copied (that is, forged) by anyone other than its author. How could it be possible to create a signature that is digital, yet can't be copied? In this chapter, we will discover the resolution of this intriguing paradox.
WHAT ARE DIGITAL SIGNATURES REALLY USED FOR?
It might seem unnecessary to ask the question: what are digital signatures used for? Surely, you might think, we can use them for the same kinds of things that paper signatures are used for: signing checks and other legal documents, such as the lease on an apartment. But if you think about it for a moment, you will realize that this isn't true. Whenever you make an online payment for something, whether by credit card or through an online banking system, do you provide any kind of signature? The answer is no. Typically, online credit card payments require no signature whatsoever. Online banking systems are a little different, because they require you to log in with a password that helps to verify your identity. But if you later make a payment during your online banking session, no signature of any kind is required.
Your computer checks digital signatures automatically. Top: The message my web browser displays when I attempt to download and run a program that has a valid digital signature. Bottom: The result of an invalid or missing digital signature.
What, then, are digital signatures used for in practice? The answer is the reverse of what you might first think: instead of you signing material that is sent to others, it is typically others who sign material before sending it to you. The reason you are probably not aware of this is that the digital signatures are verified automatically by your computer. For example, whenever you attempt to download and run a program, your web browser probably checks to see if the program has a digital signature and whether or not the signature is valid. Then it can display an appropriate warning, like the ones above.
As you can see, there are two possibilities. If the software has a valid signature (as in the top panel of the figure), the computer can tell you with complete confidence the name of the company that wrote the software. Of course, this doesn't guarantee that the software is safe, but at least you can make an informed decision based on the amount of trust you have in the company. On the other hand, if the signature is invalid or missing (as in the bottom panel of the figure), you have absolutely no reassurance about where the software really came from. Even if you thought you were downloading software from a reputable company, it's possible that a hacker somehow substituted some malicious software for the real thing. Alternatively, maybe the software was produced by an amateur who did not have the time or motivation to create a valid digital signature. It is up to you, the user, to decide whether you trust the software under these circumstances.
Although software-signing is the most obvious application of digital signatures, it is by no means the only one. In fact, your computer receives and verifies digital signatures surprisingly often, because some frequently used internet protocols employ digital signatures to verify the identity of the computers you are interacting with. For example, secure servers whose web addresses begin with “https” typically send your computer a digitally signed certificate before establishing a secure session. Digital signatures are also used to verify the authenticity of many software components, such as browser plugins. You have probably seen warning messages about such things while surfing the web.
There is another type of online signature you may have encountered: some websites ask you to type your name as a signature in an online form. I sometimes have to do this when filling out an online recommendation letter for one of my students, for instance. This is not what a computer scientist means by a digital signature! Obviously, this kind of typed signature can be forged effortlessly, by anyone who knows your name. In this chapter, we will learn how to create a digital signature that cannot be forged.
PAPER SIGNATURES
Our explanation of digital signatures is going to be built up gradually, starting with the familiar situation of paper signatures and moving in small steps toward genuine digital signatures. So to start with, let's go back to a world with no computers at all. In this world, the only way to authenticate documents is with handwritten signatures on paper. Notice that in this scenario, a signed document can't be authenticated in isolation. For example, suppose you find a piece of paper that says “I promise to pay $100 to Francoise. Signed, Ravi”—just as shown above. How can you verify that Ravi really signed this document? The answer is that you need some trusted repository of signatures, where you can go and check that Ravi's signature is genuine. In the real world, institutions such as banks and government departments perform this role—they really do keep files storing the signatures of their customers, and these files can be physically checked if necessary. In our pretend scenario, let's imagine that a trusted institution called a “paper signature bank” keeps everyone's signature on file. A schematic example of a paper signature bank is shown above.
A paper document with a handwritten signature.
A bank that stores the identities of its customers together with handwritten signatures on file.
To verify Ravi's signature on the document promising to pay Francoise, we just need to go to the paper signature bank and ask to see Ravi's signature. Obviously, we are making two important assumptions here. First, we assume the bank can be trusted. In theory, it would be possible for the bank employees to switch Ravi's signature for an imposter's, but we are going to ignore this possibility here. Second, we assume it is impossible for an imposter to forge Ravi's signature. This assumption, as everyone knows, is plain wrong: a skilled forger can easily reproduce a signature, and even amateurs can do a reasonable approximation. Nevertheless, we need the assumption of unforgeability; without it, the paper signature is useless. Later on, we will see how digital signatures are essentially impossible to forge. This is one of the big advantages of digital signatures over paper ones.
SIGNING WITH A PADLOCK
Our first step toward digital signatures is to abandon paper signatures altogether and adopt a new method of authenticating documents that relies on padlocks, keys, and locked boxes. Every participant in the new scheme (in our running example, that means Ravi, Takeshi, and Francoise) acquires a large supply of padlocks. It is crucial that the padlocks belonging to each individual participant are identical: Ravi's padlocks, for instance, are all the same. Additionally, each participant's padlocks must be exclusive: no one else can make or obtain a padlock like Ravi's. And finally, all padlocks in this chapter have a rather unusual feature: they are equipped with biometric sensors which ensure they can only be locked by their owner. So if Francoise finds an open padlock belonging to Ravi, she can't use it to lock anything. Of course, Ravi also has a supply of keys that will open his padlocks. Because all of his padlocks are identical, all the keys are identical too. The situation so far is shown schematically on the following page. This is the initial setup for what we might call the “physical padlock trick.”
Now let's suppose that just as before, Ravi owes Francoise $100, and Francoise would like to record that fact in a verifiable way. In other words, Francoise wants the equivalent of the document on the previous page, but without relying on a handwritten signature. Here is how the trick is going to work. Ravi makes a document saying “Ravi promises to pay $100 to Francoise,” and doesn't bother to sign it. He makes a copy of the document and places this document in a lockbox. (A lockbox is just a strongly made box that can be locked with a padlock.) Finally, Ravi locks the box with one of his padlocks and gives the locked box to Francoise. The complete package is shown in the figure on the facing page. In a sense that will be made precise very soon, the locked box is the signature for the document. Note that it would be a good idea for Francoise, or some other trusted witness, to watch while the signature is created. Otherwise, Ravi could cheat by putting a different document into the box. (Arguably, this scheme would work even better if the lockboxes were transparent. After all, digital signatures provide authenticity, not secrecy. However, transparent lockboxes are a little counterintuitive, so we won't pursue this possibility.)
In the physical padlock trick, each participant has an exclusive supply of identical padlocks and keys.
Perhaps you can already see how Francoise can now authenticate Ravi's document. If anyone, perhaps even Ravi himself, tries to deny the authenticity of the document, Francoise can say “Okay Ravi, please lend me one of your keys for a minute. Now I'm going to open this lockbox using your key.” In the presence of Ravi and some other witnesses (maybe even a judge in a court of law), Francoise opens the padlock and displays the contents of the lockbox. Then Francoise can continue: “Ravi, as you are the only person with access to padlocks that work with this key, no one else can possibly be responsible for the contents of the lockbox. Therefore, you and only you wrote this note and put it in the lockbox. You do owe me $100!”
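The padlock trick has a direct digital counterpart. The sketch below uses textbook RSA, a standard signature technique swapped in here purely for illustration; it is not necessarily the construction this chapter goes on to develop. Under the analogy, the private exponent `D` plays the role of Ravi's padlock (only he can apply it), the public pair `N`, `E` plays the role of his key (anyone can borrow it), and `verify` corresponds to Francoise opening the box in front of witnesses. The choice of primes, the helper names `digest`, `sign`, and `verify`, and the use of SHA-256 to condense the document are all assumptions of this sketch. With no padding scheme and such small primes it is insecure, and it simply assumes the public key is available, sidestepping the cooperation problem raised in the next paragraph.

```python
import hashlib

# Toy RSA parameters (assumption: NOT secure; real keys use far larger
# primes plus a padding scheme). Requires Python 3.8+ for pow(x, -1, m).
P = 2**31 - 1                       # a Mersenne prime
Q = 2**61 - 1                       # another Mersenne prime
N = P * Q                           # public modulus
E = 65537                           # public exponent: part of Ravi's "key"
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent: Ravi's "padlock"

def digest(message: str) -> int:
    """Condense the message to a number below N (hypothetical helper)."""
    h = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N

def sign(message: str) -> int:
    """Locking step: only the holder of D can compute this value."""
    return pow(digest(message), D, N)

def verify(message: str, signature: int) -> bool:
    """Opening step: anyone holding (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)

note = "Ravi promises to pay $100 to Francoise"
sig = sign(note)
print(verify(note, sig))                                       # True
print(verify("Ravi promises to pay $1000 to Francoise", sig))  # False
```

Unlike the lockbox, the signature here is a separate number that travels alongside the document, which is why verification compares it against a digest of the document rather than revealing hidden contents.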
Although it sounds convoluted when you first encounter it, this method of authentication is both practical and powerful. It does have some drawbacks, however. The main problem is that it requires Ravi's cooperation: before Francoise can prove anything, she has to persuade Ravi to lend her one of his keys. But Ravi could refuse, or even worse, he could pretend to cooperate but in fact give her a different key, one that will not open his padlock. Then, when Francoise fails to open the lockbox, Ravi can say, “See, that's not one of