Archive for the ‘business’ Category

Google Takes Steps Towards Greater Privacy

Wednesday, March 14th, 2007

Google recently announced that it will soon start anonymizing search logs older than 18-24 months. Full details can be found in their Log Retention Policy FAQ (PDF). This is a heartening step back towards their “Don’t Be Evil” corporate philosophy, which some think has been largely abandoned.

I’ve just recently started using Scroogle as a way of defeating Google’s tracking of my every search (Scroogle’s own site is awful; Wikipedia has more readable information about the project), although the motives of the man behind it, Daniel Brandt, who also runs the Google Watch site, may be questionable. Still, he doesn’t have much incentive to keep a log of queries and IP addresses, and even if he did, he isn’t giving me a cookie, so he can’t tie all my searches together.

Protecting Your Users’ Data with a Privacy Wall

Thursday, February 22nd, 2007

Just Another Brick In The Wall?
by Iain Cuthbertson

We deal with a lot of very private data at Wesabe, so security and privacy are our top concerns. In this post I will describe one of our primary means for assuring privacy, a technique that is general enough that any site can use it. Our creative name for this technique is the privacy wall. Later, I’ll go on to tell you ways to hack the wall, just so you don’t get too comfortable.

The Privacy Wall

The idea is simple: don’t have any direct links in your database between your users’ “public” data and their private data. Instead of linking tables directly via a foreign key, use a cryptographic hash [1] that is based on at least one piece of data that only the user knows—such as their password. The user’s private data can be looked up when the user logs in, but otherwise it is completely anonymous. Let’s go through a simple example.

Let’s say we’re designing an application that lets members keep a list of their deepest, darkest secrets. We need a database with at least two tables: ‘users’ and ‘secrets’. The first pass database model looks like this:

Standard Model

The problem with this schema is that anyone with access to the database can easily find out all the secrets of a given user. With one small change, however, we can make this extremely difficult, if not impossible:

Privacy Wall

The special sauce is the ‘secret_key’, which is nothing more than a cryptographic hash of the user’s username and their password [2]. When the user logs in, we can generate the hash and store it in the session [3]. Whenever we need to query the user’s secrets, we use that key to look them up instead of the user id. Now, if some baddie gets ahold of the database, they will still be able to read everyone’s secrets, but they won’t know which secret belongs to which user, and there’s no way to look up the secrets of a given user.

Update: A commenter on my shorter post on the Wesabe blog brought up the important point of what you do if the user forgets their password. The recovery method we came up with was to store a copy of their secret key, encrypted with the answers to their security questions (which aren’t stored anywhere in our database, of course). Assuming that the user hasn’t forgotten those as well, you can easily find their account data and “move it over” when they reset their password (don’t forget to update the encrypted secret key); if they do forget them, well, there’s a problem.
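A rough sketch of the recovery copy, using only the Python standard library. Everything here is an assumption made for illustration: the function names, the way answers are normalised, the iteration count, and above all the cipher. The standard library has no authenticated cipher, so the sketch masks the 32-byte key by XORing it with a PBKDF2-derived pad under a fresh random salt; a real system would use a vetted encryption library instead.

```python
import hashlib
import os

SALT_LEN = 16
KEY_LEN = 32  # length of a raw SHA-256 secret key

def _pad(answers, salt: bytes) -> bytes:
    # Normalise so "Rex" and " rex " derive the same pad (an assumption).
    joined = "\x00".join(a.strip().lower() for a in answers).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", joined, salt, 200_000, dklen=KEY_LEN)

def wrap_secret_key(secret_key: bytes, answers) -> bytes:
    """Return salt + masked key; this blob is what gets stored for recovery."""
    if len(secret_key) != KEY_LEN:
        raise ValueError("expected a 32-byte secret key")
    salt = os.urandom(SALT_LEN)  # fresh salt every time, so the pad is never reused
    pad = _pad(answers, salt)
    return salt + bytes(k ^ p for k, p in zip(secret_key, pad))

def unwrap_secret_key(blob: bytes, answers) -> bytes:
    """Recover the old secret key from the stored blob and the user's answers."""
    salt, masked = blob[:SALT_LEN], blob[SALT_LEN:]
    pad = _pad(answers, salt)
    return bytes(m ^ p for m, p in zip(masked, pad))

old_key = hashlib.sha256(b"alice\x00old password").digest()
blob = wrap_secret_key(old_key, ["Rex", "Springfield"])
recovered = unwrap_secret_key(blob, [" rex ", "SPRINGFIELD"])
print(recovered == old_key)  # True
```

Wrong answers do not raise an error in this sketch; they simply produce a key that matches no rows. After a reset, the rows stored under the recovered key are re-keyed to the new secret key and a new blob is wrapped with a new salt.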

Attacking the Wall

I mentioned earlier that you store the secret key in the user’s session. If you’re storing your session data in the database and your db is hacked, any users that are logged in (or whose sessions haven’t yet been deleted) can be compromised. The same is true if sessions are stored on the filesystem. Keeping session data in memory is better, although it is still hackable (the swapfile is one obvious target). However you’re storing your session data, keeping your sessions reasonably short and deleting them when they expire is wise. You could also store the secret key separately in a cookie on the user’s computer, although then you’d better make damn sure you don’t have any cross-site scripting (XSS) vulnerabilities that would allow a hacker to harvest your users’ cookies.
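As one way to picture short-lived, in-memory sessions, here is a small sketch. The SessionStore class, its method names, and the 15-minute default are hypothetical choices for illustration; a real web framework will have its own session machinery.

```python
import secrets
import time

class SessionStore:
    """Keeps secret keys in process memory only, and forgets them after a TTL."""

    def __init__(self, ttl_seconds: float = 900.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._sessions = {}          # session id -> (expires_at, secret_key)

    def create(self, secret_key: str) -> str:
        sid = secrets.token_urlsafe(32)
        self._sessions[sid] = (self._clock() + self._ttl, secret_key)
        return sid

    def get(self, sid: str):
        entry = self._sessions.get(sid)
        if entry is None:
            return None
        expires_at, secret_key = entry
        if self._clock() >= expires_at:
            del self._sessions[sid]  # delete on expiry rather than just ignoring it
            return None
        return secret_key

    def destroy(self, sid: str) -> None:
        self._sessions.pop(sid, None)  # call this on logout

    def purge_expired(self) -> int:
        now = self._clock()
        expired = [s for s, (exp, _) in self._sessions.items() if now >= exp]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)

store = SessionStore(ttl_seconds=900)
sid = store.create("example-secret-key")
print(store.get(sid) == "example-secret-key")  # True
store.destroy(sid)
print(store.get(sid))  # None
```

Calling purge_expired on a timer removes keys for users who never come back, so an abandoned session does not linger in memory until the process restarts.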

Other holes can be found if your system is sufficiently complex and an attacker can find a path from User to Secret through other tables in the database, so it’s important to trace out those paths and make sure that the secret key is used somewhere in each chain.
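Tracing those paths can be automated. The sketch below treats the schema as a graph and searches for any route from users to secrets that never crosses a hashed link. The find_leak function, the link tuples, and the comments table are hypothetical, made up for this example.

```python
from collections import deque

def find_leak(links, start="users", target="secrets"):
    """Return a path of plainly linked tables from start to target, or None.

    links is an iterable of (table_a, table_b, walled) tuples, where walled
    is True when the two tables are joined only through the secret key.
    """
    graph = {}
    for a, b, walled in links:
        if walled:
            continue  # a hashed link cannot be followed without the password
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

leaky = [
    ("users", "secrets", True),      # linked via secret_key: fine
    ("users", "comments", False),    # comments.user_id: plain foreign key
    ("comments", "secrets", False),  # comments.secret_id: plain foreign key
]
print(find_leak(leaky))  # ['users', 'comments', 'secrets']

fixed = [
    ("users", "secrets", True),
    ("users", "comments", False),
    ("comments", "secrets", True),   # now keyed by the hash as well
]
print(find_leak(fixed))  # None
```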

A harder problem to solve is when the secrets themselves may contain enough information to identify the user, and with the above scheme, if one secret is traced back to a user, all of that user’s secrets are compromised. It might not be possible or practical to scrub or encrypt the data, but you can limit the damage of a secret being compromised. My colleague and security guru Sam Quiqley suggests the following as an extra layer of security: add a counter to the data being hashed to generate the secret key:


secret key 1 = Hash(salt + password + '1')
secret key 2 = Hash(salt + password + '2')
...
secret key n = Hash(salt + password + '<n>')

Getting a list of all the secrets for a given user when they log in is going to be a lot less efficient, of course; you have to keep generating hashes and doing queries until no secret with that hash is found, and deleting secrets may require special handling. But it may be a small price to pay for the extra privacy.
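The counter scheme can be sketched as follows, assuming SHA-256 and using a plain dict as a stand-in for the secrets table. The function names are hypothetical; the concatenation follows the formula above.

```python
import hashlib

def counter_key(salt: bytes, password: str, n: int) -> str:
    # Hash(salt + password + '<n>'), as in the formula above.
    data = salt + password.encode("utf-8") + str(n).encode("ascii")
    return hashlib.sha256(data).hexdigest()

def add_secret(store: dict, salt: bytes, password: str, body: str) -> str:
    """Store body under the first unused counter value and return its key."""
    n = 1
    while counter_key(salt, password, n) in store:
        n += 1
    key = counter_key(salt, password, n)
    store[key] = body
    return key

def load_secrets(store: dict, salt: bytes, password: str) -> list:
    """Walk the counter upward until a key is missing."""
    found = []
    n = 1
    while True:
        key = counter_key(salt, password, n)
        if key not in store:
            return found
        found.append(store[key])
        n += 1

table = {}  # secret_key -> body; every row has a different key
for body in ["one", "two", "three"]:
    add_secret(table, b"salt", "pw", body)

print(load_secrets(table, b"salt", "pw"))  # ['one', 'two', 'three']
```

The walk stops at the first missing key, which is why deletion needs care: removing the second row outright would hide the third. One option is to blank the row and keep its key in place.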

Finally, log files can be a gold mine for attackers. There’s a very good chance you’re logging queries, debug statements, or exception reports that link users to their keys or directly to their secrets. You should scrub any identifying information before it gets written to the log file.
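One way to do that scrubbing in Python is a logging filter attached to the handler. This is a sketch under assumptions: the two patterns (long hex strings and email addresses) are examples only, and what counts as identifying will differ from app to app.

```python
import io
import logging
import re

# Assumed patterns: hex digests of 40+ characters (SHA-1 and longer) and emails.
HEX_HASH = re.compile(r"\b[0-9a-fA-F]{40,128}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def scrub(text: str) -> str:
    text = HEX_HASH.sub("[key]", text)
    return EMAIL.sub("[email]", text)

class ScrubFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as arguments are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = None
        return True  # keep the record, just with the cleaned message

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(ScrubFilter())

log = logging.getLogger("privacy_wall_demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("user %s loaded key %s", "alice@example.com", "a" * 64)
print(stream.getvalue(), end="")  # user [email] loaded key [key]
```

This filter only touches the message text. Exception tracebacks are formatted separately by the handler's formatter, so they need their own scrubbing.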

So That’s It, Right?

The privacy wall is far from a silver bullet. Privacy and security are hard, really hard, particularly so if your app is taking private data and extracting information out of it for public consumption, like we are at Wesabe. The privacy wall is one of a number of methods we’re using to ensure that our users’ private data stays that way. If you’re lucky enough to be going to ETech next month, definitely check out Marc’s session on Super Ninja Privacy Techniques for Web App Developers.

I hope you found this helpful. Let me know what you think; I appreciate any and all feedback. And if you’ve got any cool privacy techniques up your sleeve, share the knowledge!


[1] A cryptographic hash is a way of mapping any amount of plain text to a fixed-length “fingerprint” such that the same text always maps to the same hash and, given a hash, it is computationally infeasible to recover the text from which it was derived. Hashes are wonderful things with many uses. If you’re a developer, and you didn’t already know this, stop reading now and go here or here, and learn how to generate a SHA1/2 hash in your programming language of choice. Come back when you’re ready. I’ll wait.

[2] You can throw in a salt too, to be safe; just make sure that you’re not using the same hash that you’re using for checking the user’s password. You are smart enough not to store passwords in plaintext in the database, aren’t you?

[3] Danger, Will Robinson! Keep reading.

Must…stop…looking at…stats…

Friday, November 17th, 2006

I’ve been too tired today to do any actual work, so I’ve spent much of the day camping out on the Wesabe site stats. It’s terribly exciting having so many people hitting your site and so many signing up (almost 1/3 of our unique visitors have created accounts). The site has been humming along beautifully, too–major props go out to my colleague Coda Hale for his Apache/Mongrel/Pen prowess.

Anyway, I just wanted to share one of the more interesting stats from our analytics (Mint–very tasty):

That’s a lovely thing to see. Granted, these are largely very tech-savvy, early-adopter people at this stage, but it’s heartening to see IE getting the beatdown.

Amazon’s new S3 Storage Service

Wednesday, March 15th, 2006

Amazon just launched a new service, S3 – Simple Storage Service. It is a web service that allows you to store as much data as you like, with file sizes up to 5GB, and you just pay for the storage you use and the data transferred. Rates are very reasonable, too — $0.15/GB/month of storage, and $0.20/GB in data transferred.

This is pretty interesting. It gives developers the ability to create applications requiring significant storage space without having to make a huge upfront investment in equipment and expertise. Want to write your own Flickr? Go for it. Granted, it’s risky relying on a third party for a core part of your business, but you only need them until you get your million users and can get enough funding to build your own storage backend.

Google is apparently working on their own storage backend, Google Drive. It will be interesting to see how this plays out. Nothing but good news for aspiring entrepreneurs, though.

via TechCrunch

Your next ISP: Google

Wednesday, February 8th, 2006

John C. Dvorak has a good piece at pcmag.com about speculations that Google is going to be creating their own network (see the “Google is the Internet” scenario from the article linked in my previous post). I certainly hope they do, as telcos have been dragging their feet on broadband for a long time, and acting like the Mafia whenever someone encroaches on “their” territory.

More articles on the subject:

Imagining the Google Future

Thursday, February 2nd, 2006

Great article from Business 2.0 describing four future scenarios for Google:

Imagining the Google Future

RIP IM Smarter

Monday, January 30th, 2006

My imsmarter proxy stopped working last week, and I just got around to going to their site to see what was up. Looks like they’ve shut down. A bit of a pity; I thought it was a useful service. I use IM on four different machines and it made finding something from a past conversation a lot easier. Actually, though, what I used most often was its reminder feature. I could send it an IM saying “Remind me in 2 hours to check the car” and it would do just that, saving me many parking tickets. I imagine there are other services like that out there; I should check around. Actually, that would be a pretty trivial thing to implement myself. Hmmm.

Business Blogging

Sunday, January 1st, 2006

Chris Anderson of Wired / The Long Tail has started a wiki page to track public blogs by Fortune 500 companies. The list isn’t terribly long yet, but I’m sure it will be growing, both as more people discover existing company blogs and as more companies jump on the bandwagon.

Speaking of which, we’ve jumped on the bandwagon ourselves at Triporama. The Triporama Blog isn’t yet linked in from the main site (it will be soon), but Wendell has already posted a great piece about the origins of Triporama.

Happy New Year!

Triporama Launches

Thursday, December 15th, 2005

I’ve been pretty busy lately, if the infrequency of my posts is any indicator, but it’s paid off: Triporama officially launched yesterday. We sent out some 300 emails and then immediately left to go to the bar. Fortunately, the site held up, with only one serious bug so far, which I fixed last night.

Not that I can slack off now…we’ve got a mile-long list of features we’d like to implement. It’s great to finally get it out there, though.

