22nd Feb, 2007

Protecting Your Users’ Data with a Privacy Wall

Just Another Brick In The Wall? by Iain Cuthbertson

We deal with a lot of very private data at Wesabe, so security and privacy are our top concerns. In this post I will describe one of our primary means for assuring privacy, a technique that is general enough that any site can use it. Our creative name for this technique is the privacy wall. Later, I’ll go on to tell you ways to hack the wall, just so you don’t get too comfortable.

The Privacy Wall

The idea is simple: don’t have any direct links in your database between your users’ “public” data and their private data. Instead of linking tables directly via a foreign key, use a cryptographic hash [1] that is based on at least one piece of data that only the user knows—such as their password. The user’s private data can be looked up when the user logs in, but otherwise it is completely anonymous. Let’s go through a simple example.

Let’s say we’re designing an application that lets members keep a list of their deepest, darkest secrets. We need a database with at least two tables: ‘users’ and ‘secrets’. The first-pass database model looks like this:

Standard Model

The problem with this schema is that anyone with access to the database can easily find out all the secrets of a given user. With one small change, however, we can make this extremely difficult, if not impossible:

Privacy Wall

The special sauce is the ‘secret_key’, which is nothing more than a cryptographic hash of the user’s username and their password [2]. When the user logs in, we can generate the hash and store it in the session [3]. Whenever we need to query the user’s secrets, we use that key to look them up instead of the user id. Now, if some baddie gets ahold of the database, they will still be able to read everyone’s secrets, but they won’t know which secret belongs to which user, and there’s no way to look up the secrets of a given user.
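Here’s a minimal sketch of that lookup in Python, using SQLite only to keep it self-contained. The table layout, the function names, and the choice of SHA-256 are assumptions made for illustration; this is not Wesabe’s actual code. Note that the ‘secrets’ table has no user id column at all.

```python
import hashlib
import sqlite3


def make_secret_key(username: str, password: str, salt: str = "") -> str:
    """Hash the username and password (plus an optional salt) into a lookup key."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    material = "\x00".join([salt, username, password]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical schema: secrets carries a secret_key but no user_id."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE secrets (id INTEGER PRIMARY KEY, secret_key TEXT, body TEXT)"
    )
    conn.execute("CREATE INDEX secrets_by_key ON secrets (secret_key)")


def add_secret(conn: sqlite3.Connection, secret_key: str, body: str) -> None:
    """Store a secret under the key derived at login."""
    conn.execute(
        "INSERT INTO secrets (secret_key, body) VALUES (?, ?)", (secret_key, body)
    )


def secrets_for(conn: sqlite3.Connection, secret_key: str) -> list:
    """Return every secret stored under the given key, oldest first."""
    rows = conn.execute(
        "SELECT body FROM secrets WHERE secret_key = ? ORDER BY id", (secret_key,)
    )
    return [body for (body,) in rows]
```

At login you would call make_secret_key once with the submitted credentials, keep the result in the session, and pass it to secrets_for whenever the user’s secrets are needed.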

Update: A commenter on my shorter post on the Wesabe blog brought up the important point of what you do if the user forgets their password. The recovery method we came up with was to store a copy of their secret key, encrypted with the answers to their security questions (which aren’t stored anywhere in our database, of course). Assuming that the user hasn’t forgotten those as well, you can easily find their account data and “move it over” when they reset their password (don’t forget to update the encrypted secret key); if they do forget them, well, there’s a problem.
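A rough sketch of that recovery scheme, using only the Python standard library. The function names, the answer normalization, and the iteration count are all assumptions. The XOR-with-a-PBKDF2-keystream step is a stand-in chosen so the example has no dependencies; it is unauthenticated, and a real system would more likely use an authenticated cipher from a vetted cryptography library.

```python
import hashlib
import os


def _keystream(answers: list, salt: bytes, length: int) -> bytes:
    """Stretch the security answers into `length` bytes with PBKDF2."""
    # Normalizing case and whitespace is an assumption, so "Rex" matches " rex ".
    normalized = "\x00".join(a.strip().lower() for a in answers).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", normalized, salt, 200_000, dklen=length)


def wrap_secret_key(secret_key: bytes, answers: list) -> tuple:
    """Return (salt, wrapped_key); store both, never the answers themselves."""
    salt = os.urandom(16)  # fresh salt per wrap, so the keystream is never reused
    stream = _keystream(answers, salt, len(secret_key))
    wrapped = bytes(a ^ b for a, b in zip(secret_key, stream))
    return salt, wrapped


def unwrap_secret_key(salt: bytes, wrapped: bytes, answers: list) -> bytes:
    """Recover the secret key; wrong answers yield garbage rather than an error."""
    stream = _keystream(answers, salt, len(wrapped))
    return bytes(a ^ b for a, b in zip(wrapped, stream))
```

Because wrong answers silently produce a wrong key, the caller would confirm recovery by checking that the unwrapped key actually matches rows in the secrets table before moving them over to the new key.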

Attacking the Wall

I mentioned earlier that you store the secret key in the user’s session. If you’re storing your session data in the database and your db is hacked, any users that are logged in (or whose sessions haven’t yet been deleted) can be compromised. The same is true if sessions are stored on the filesystem. Keeping session data in memory is better, although it is still hackable (the swapfile is one obvious target). However you’re storing your session data, keeping your sessions reasonably short and deleting them when they expire is wise. You could also store the secret key separately in a cookie on the user’s computer, although then you’d better make damn sure you don’t have any cross-site scripting (XSS) vulnerabilities that would allow a hacker to harvest your users’ cookies.

Other holes can be found if your system is sufficiently complex and an attacker can find a path from User to Secret through other tables in the database, so it’s important to trace out those paths and make sure that the secret key is used somewhere in each chain.

A harder problem to solve is when the secrets themselves may contain enough information to identify the user, and with the above scheme, if one secret is traced back to a user, all of that user’s secrets are compromised. It might not be possible or practical to scrub or encrypt the data, but you can limit the damage of a secret being compromised. My colleague and security guru Sam Quiqley suggests an extra layer of security, which is to add a counter to the data being hashed to generate the secret key:


secret key 1 = Hash(salt + password + '1')
secret key 2 = Hash(salt + password + '2')
...
secret key n = Hash(salt + password + '<n>')

Getting a list of all the secrets for a given user when they log in is going to be a lot less efficient, of course; you have to keep generating hashes and doing queries until no secret with that hash is found, and deleting secrets may require special handling. But it may be a small price to pay for the extra privacy.
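Here’s a small sketch of the counter scheme, again in Python with SQLite. The table layout and function names are assumptions, and SHA-256 stands in for the generic Hash above. Each secret gets its own key, and listing a user’s secrets means walking the counter until a lookup comes back empty.

```python
import hashlib
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical schema: one row per secret, each under its own key."""
    conn.execute("CREATE TABLE secrets (secret_key TEXT PRIMARY KEY, body TEXT)")


def counter_key(salt: str, password: str, n: int) -> str:
    """secret key n = Hash(salt + password + '<n>')"""
    return hashlib.sha256(f"{salt}{password}{n}".encode("utf-8")).hexdigest()


def _lookup(conn: sqlite3.Connection, key: str):
    return conn.execute(
        "SELECT body FROM secrets WHERE secret_key = ?", (key,)
    ).fetchone()


def add_secret(conn: sqlite3.Connection, salt: str, password: str, body: str) -> int:
    """Store the secret under the first unused counter and return that counter."""
    n = 1
    while _lookup(conn, counter_key(salt, password, n)) is not None:
        n += 1
    conn.execute(
        "INSERT INTO secrets (secret_key, body) VALUES (?, ?)",
        (counter_key(salt, password, n), body),
    )
    return n


def all_secrets(conn: sqlite3.Connection, salt: str, password: str) -> list:
    """Generate keys 1, 2, 3, ... and stop at the first key with no secret."""
    found = []
    n = 1
    while True:
        row = _lookup(conn, counter_key(salt, password, n))
        if row is None:
            return found
        found.append(row[0])
        n += 1
```

This sketch doesn’t handle deletion. Removing the row for key 2 would leave a gap that stops the scan before it reaches key 3, which is the special handling mentioned above.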

Finally, log files can be a gold mine for attackers. There’s a very good chance you’re logging queries, debug statements, or exception reports that link users to their keys or directly to their secrets. You should scrub any identifying information before it gets written to the log file.
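One way to do that scrubbing, sketched with Python’s standard logging module. The regular expression and the class name are assumptions. It only redacts strings that look like hex digests (40 to 128 hex characters, which covers SHA-1 through SHA-512), so a real scrubber would need rules for whatever else your app logs.

```python
import logging
import re

# Anything that looks like a hex-encoded hash, from SHA-1 (40) to SHA-512 (128).
HEX_DIGEST = re.compile(r"\b[0-9a-fA-F]{40,128}\b")


def scrub(text: str) -> str:
    """Replace anything that looks like a secret key with a placeholder."""
    return HEX_DIGEST.sub("[REDACTED]", text)


class ScrubFilter(logging.Filter):
    """Scrub the fully formatted message before any handler writes it out."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, so keys passed as %-style arguments are caught too.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True  # keep the record, just with the key removed
```

Attach the filter to each handler (handler.addFilter(ScrubFilter())) rather than to a single logger, since a filter on a logger is not applied to records that propagate up from its child loggers.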

So That’s It, Right?

The privacy wall is far from a silver bullet. Privacy and security are hard, really hard, particularly so if your app is taking private data and extracting information out of it for public consumption, like we are at Wesabe. The privacy wall is one of a number of methods we’re using to ensure that our users’ private data stays that way. If you’re lucky enough to be going to ETech next month, definitely check out Marc’s session on Super Ninja Privacy Techniques for Web App Developers.

I hope you found this helpful. Let me know what you think; I appreciate any and all feedback. And if you’ve got any cool privacy techniques up your sleeve, share the knowledge!


[1] A cryptographic hash is a way of mapping any amount of plain text to a fixed-length “fingerprint” such that the same text always maps to the same hash, and given a hash, it is computationally infeasible to recover the text from which it was derived. Hashes are wonderful things with many uses. If you’re a developer, and you didn’t already know this, stop reading now and go here or here, and learn how to generate a SHA1/2 hash in your programming language of choice. Come back when you’re ready. I’ll wait.

[2] You can throw in a salt too, to be safe; just make sure that you’re not using the same hash that you’re using for checking the user’s password. You are smart enough not to store passwords in plaintext in the database, aren’t you?

[3] Danger, Will Robinson! Keep reading.

Responses

I like this idea, thanks for the article! One question.

When a user wants to change their password, do you then go through and recalculate that user’s hash to update all the private data with the new password-hash? Or do you use some other method to handle this?

Yes, you have to update the hashes. It’s not an expensive operation, though, and isn’t going to be done frequently anyway.

I enjoyed this. I hope you write more tidbits such as this. Very well explained.

Very nicely explained. I feel more comfortable now with Wesabe. I hope to see all four of your security processes explained here, along with their flaws and what you do to protect against them.

While this may expose your secrets, I believe you will end up having even better solutions.

Thanks!!

Very good article for a simple and effective system. Thanks

Huh? So the combination of username and password, along with knowledge of the hash function, becomes the foreign key, right?

Whenever someone hacks the database, he or she will have both the username and password. One can generate the foreign key just by running the hash function on the two. In fact, they could simply run a SQL query joining the two tables normally and then read off the values as if they were directly keyed to the usernames (assuming the hash is supported by the DBMS):

SELECT [whatever] FROM user INNER JOIN secret ON HASH(user.username, user.password) = secret.secret_key;

You can argue that knowledge of the hash function is difficult to come by, but an attacker can simply create 100 accounts with known usernames and passwords, and use those to deduce the algorithm.

mjo- the attacker will most certainly not have the passwords, as they aren’t stored in the database as plaintext. You should never store passwords in plaintext in the database; you store a salt and a hash of the password plus the salt.

So the attacker then runs a local attack on the hashed passwd+salt to get the password in two hours and the system is wide open.

It should also be made clear that the privacy you are talking about is privacy from people hacking your database - you could make the connection whenever you wanted. It might be obvious to some, but I think the average user might think that their data is private even from prying system administrators.

sean- yes, an attacker can run a dictionary attack, but each user has a different salt, so if they manage to break one password, they’ve only compromised that user’s accounts, not the whole system.

And I’m not just talking about keeping your data private from attackers–one of the initial motivations for us implementing this at Wesabe was so that we could assure our friends and family that they can upload their data without worrying about us peeking at their transactions. We can’t figure out whose accounts are whose based on just their username.

Brad- Does that mean that the user’s secretkey hash is calculated within their browser using javascript or something? That seems like the only way the user could ensure that you couldn’t make the connection between them and their data.

But even then if you give the user the ability to edit their account information as well as their data in the same session you could easily tie the two together by simply looking at the IP address of their connection since you obviously control the server. But promising you won’t look at that stuff doesn’t provide a provable level of privacy - it requires the users of the system to trust you, which I don’t see any way around.

I’d like to be convinced, but I guess I’m missing something. It would be a very impressive feat to be able to prove the level of privacy you are claiming!

Sean- the secret_key hash is generated on the server-side when they log in, and stored in the session (which in our case is kept in memory). When they log out or their session expires, their session data is deleted.

Yes, if we really wanted to track someone and figure out which accounts are theirs, we can do that. But it requires significant extra work, including some code changes, to do so.

I also did not say that this technique is the ultimate in privacy protection; this is just one technique out of many that we employ (including log scrubbing, to make sure our log files aren’t leaking data that could let an attacker connect accounts).

Privacy is really hard, and no system is ever going to offer 100% protection, especially against people who control the server. You’re absolutely right–at some point our users just have to trust that we’re not bad guys. There’s never going to be a way around that. We’ve found that being open about what we’re doing on the backend by writing articles like this, or giving presentations at conferences, goes a long way in establishing that trust, however.

Thank you a lot for this Article, yet a Brick more :D

This article is really detailed and nicely explained. If we are using a privacy wall, will we still be hacked?

Meteko - it’s absolutely no protection against being hacked. It just makes it harder for an intruder to steal identifying data from your users.

Curious how you deal with situations where data will be inserted and the user will not be logged in (thus you will not have the hash you need to insert the data)

Case in point is Wesabe’s new automatic upload feature. You’d have to have the user’s plaintext password somewhere to be able to associate data with them, correct?

I’m curious how this can be done, we’d like to use this concept, just curious how the tech heads deal with these types of issues once this methodology is in place.

Anthony - currently the user’s bank credentials are encrypted in part with their Wesabe password, so we don’t/can’t kick off the automatic upload until the user logs in.

You’re right, though–with the current scheme, we can’t insert data unless we have the user’s password at some point. We don’t need to–and would never–keep it around in plaintext, though.

You could, for example, use their initial login to unlock a random key, which would then be used to access their accounts and credentials. You could even develop the system so that the key expires after a certain period, and unless the user logs in again within that period, their accounts stop updating and their credentials are secured until they log in again.

We’re taking things slowly, though, making sure we have as secure a system as possible, which is why we’re only updating on login right now.

I’ll be following this closely. Very intriguing to brain storm how this could work. Thanks for the explanation!

Great article. In addition to a quality backup system, you might also consider encrypting your financial records in the same way you would store them in a locked file cabinet. I keep all of my statements and tax records in an encrypted volume on my computer with a free open source program called TrueCrypt. It is very seamless and works just like any other partition on your computer

Wonderful article. I’m interested in this subject and want to learn more. Privacy wall is so important. Thank you.

Security is always going to be a difficult thing because it has to stay within the tension of making things easier for the average joe user, while making it difficult for the super-IQ hacker guy. How guys manage to do this is really way over my head – but, I think, that many are doing a sterling job given what they have to work with (and, the amount of complaining Mr. CEO of whatever company who doesn’t know how to work his computer might do when he is wanting to get his daughter to use the company’s bandwidth from home…)
