Side-channel leaks in web applications: a reality today, a challenge tomorrow

It's good to be slapped upside the head with your own misconceptions every once in a while, even when it occurs within your own specialty. Now, I deal with other people's misconceptions about cryptography all the time. If people have heard of cryptography at all, they generally are left with the impression that

cryptography = secure = cryptography = secure = ...

This is very forgivable, but wrong.

For example, cryptography does not prevent traffic analysis. You can encrypt your email all you want, but that doesn't hide who that email is to. Or how often you send it, or whether you get any mail back from them, and so on. There is an awful lot that can be deduced from that kind of information.

Also, that encryption may not prevent other people from changing your message in transit. This one can be a bit of a surprise, but encryption does not protect against modification. Very much the opposite, sometimes: there are several perfectly good encryption schemes which make it very easy to modify the encrypted message. (Consider the one-time pad, if you know what that is. In this encryption scheme, there is a one-to-one correspondence between bits of the ciphertext and bits of the plaintext. By changing a single bit of the ciphertext, you also change that same bit of the plaintext. So if you know what the plaintext is, you can make that ciphertext say anything you want. And the same goes for all stream ciphers (including a common AES mode) as stream ciphers are just close approximations of one-time pads anyway.)
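To make the one-time-pad point concrete, here is a minimal Python sketch. The messages and names are made up for illustration, and the attacker is assumed to know the original plaintext but never the key.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Sender: one-time pad, so the key is random and as long as the message.
plaintext = b"PAY ALICE $0100"
key = secrets.token_bytes(len(plaintext))
ciphertext = xor_bytes(plaintext, key)

# Attacker: knows the plaintext, not the key, and wants the receiver
# to see a different message of the same length.
target = b"PAY CAROL $9999"
tampered = xor_bytes(ciphertext, xor_bytes(plaintext, target))

# Receiver: decrypts exactly as usual and gets the attacker's message.
received = xor_bytes(tampered, key)
assert received == target
```

The attacker only ever XORs values they already hold, which is why schemes like this are normally paired with a separate integrity check.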

So, those are things encryption can't do. But I would have sworn up and down that encryption could at least keep your message secret. Right? Can't it at least do that? As you can probably tell from the question, the answer is actually 'not necessarily.' Encryption will keep people from reading the message, but that's very different from keeping the message secret.

That's the main lesson from Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow, by Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. They start with a very simple observation: even disregarding things like traffic analysis, encryption cannot hide all information about the plaintext. Specifically, no encryption scheme can hide the length of the plaintext.¹ But is that a problem? Yes-- but to explain why, I need to digress into how modern web pages work.

Most of the time, a web page is just 'content': the words you read, embedded in layout instructions. That's how this web-page works, for example.² But more advanced web pages, like Google or Amazon, are actually these entire programs that run in your browser. (I should admit that I'm not an expert in this, and so there may be mistakes in what follows.) Take Amazon, for example. As soon as you start typing in the 'Search' box, what happens? The page starts to display some 'completions': things you might mean given what you typed already. How does this work? Well, as you type in that box, the page (which, again, is really an entire program) sends your keystrokes back to the Amazon servers. The servers generate the list of completions, and send it back to the page. The page then displays this list for your convenience. There's a lot of magic there, and I confess again that I don't understand all of it, but the important thing to understand is this:

Every time you push a key, the page sends a message back to the mothership.
The mothership creates a response, and sends it back to the page.
The content of this response will depend on what you've typed so far.
This means that the length of the response will depend on what you've typed so far.
Encryption does not hide message length.
So even if the page and mothership encrypt their messages, an eavesdropper can still tell what you are typing.

In this case, 'message length' is what is called a 'side channel', and pages that act this way are called 'web applications.' Thus, this attack is using a side-channel leak in a web application-- which hopefully explains the title of the paper.
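The steps listed above can be played out in a toy Python simulation. This is only a sketch under loud assumptions: the catalogue, the JSON response format, and the function names are all hypothetical, the "cipher" is a length-preserving stand-in for a stream cipher, and none of it is taken from the paper or from any real site.

```python
import json
import secrets

# Hypothetical catalogue held by the server.
CATALOG = ["camera", "camping stove", "candle", "car charger", "cat food",
           "coffee grinder", "dog leash", "desk lamp", "drill"]


def suggest(prefix: str) -> bytes:
    """Server side: build the completion list for what was typed so far."""
    matches = [item for item in CATALOG if item.startswith(prefix)]
    return json.dumps(matches).encode("utf-8")


def encrypt(plaintext: bytes) -> bytes:
    """Stand-in for a stream cipher: XOR with a fresh random keystream.
    The output is unreadable but exactly as long as the input."""
    keystream = secrets.token_bytes(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


def observed_lengths(typed: str) -> list:
    """Eavesdropper's view: one encrypted response length per keystroke."""
    return [len(encrypt(suggest(typed[:i]))) for i in range(1, len(typed) + 1)]


def build_profile(candidates) -> dict:
    """Attacker preparation: type each candidate and record the lengths."""
    return {word: observed_lengths(word) for word in candidates}


def guess(trace: list, profile: dict) -> list:
    """Return every candidate whose recorded lengths match the trace."""
    return [word for word, lengths in profile.items() if lengths == trace]


profile = build_profile(["cam", "can", "car", "cat"])
victim_trace = observed_lengths("cat")  # the attacker sees only these lengths
print(guess(victim_trace, profile))     # prints ['cat']
```

The eavesdropper never decrypts anything. The sequence of response sizes is enough to pick out the typed prefix, as long as the candidates produce different sizes.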

Now, this side channel has been known for a while, and people have discussed/debated this kind of attack before. The main contribution of this paper is to show that it actually works! In fact, they show that it works in three distinct cases:

With regard to a 'personal health information site,'³ they can determine what doctor or medical condition you are entering.
With regard to a 'tax preparation service,' they can estimate your Adjusted Gross Income and figure out what deductions you are taking.
And with regard to an 'investment service,' they can tell where you've invested your money.

The first attack is much like the Amazon example from above, but the second two are much more complicated. This leads to my only criticism: these attacks are not simple or easy. They require huge amounts of preparation, need very detailed knowledge of how the web application works, and are very fragile in the face of changes to the web application. This is not an attack that should concern the average user. (I would argue that the average user does not really need encryption in the first place.)

But despite these criticisms, I still owe this paper a debt of gratitude. I've known the formal security definitions for encryption for years, and know perfectly well that none of them guarantees anything with regard to message length. I've even written an entire paper on the topic. But despite all this, I've still been walking around with the misconception

encryption = secrecy = encryption = secrecy = ...

¹ Though in many cases, the plaintext 'length' revealed by the ciphertext is actually approximate: rounded up to the next block-length multiple. So rounded up to the next multiple of 128 bits, for example, or the next multiple of 256 bits.
² Actually, that's not entirely true. This web page also contains an invisible piece of Javascript that Google uses to do analytics for me.
³ The paper refuses to identify the specific sites they attacked.