Let’s talk about your password model
This entire article is obviated by the password_hash
family of functions.
Please check out password_hash()
and friends for information on the up-to-date and correct way to handle passwords. Generally speaking, if you are using another method, it is wrong. More specifically, if you are using another method, and it is not based on crypt()
, or even if it is, it cannot automatically handle recreating hashes if you do something such as change algorithms or change cost factors, you’re doing it wrong. People who understand the math behind cryptography are, of course, exempt. The remainder of this article is preserved for historical reason, but please, switch to password_hash()
and friends.
Historical Content
First off, let me just say that I am by no means an expert cryptographer; there are all sorts of wonderful, terrible things about hashes and block ciphers that I just don’t understand (I’d like to believe that it’s because I’ve not been exposed to them, whoever’s fault that is, and that if given a chance I would get it), but that’s also why I’m writing this – to give the opinion of someone who recognizes his own weakness, and how that translates to another’s strength. Furthermore, this explanation gives a very simplistic view of web security that only examines one aspect of a secure system. For loads more information about securing your web application, take a look at “Dos and Don’ts of Client Authentication on the Web” [PDF] written by some very smart folks at M.I.T.
So, let’s start with a beginner’s introduction. In the beginning, there were users, and users wanted to be able to log in because otherwise being a user was rather pointless indeed. Thus, the password is born, and forevermore it becomes the goal of clever crackers and security experts alike. The first problem someone encounters with passwords is how to store them, and that depends very much on a few key factors: Audience, Exposure, and Uniqueness. If you are running a “homegrown” application (shout out to MecTracker) for use only inside the company, containing (in general) zero sensitive data, and you intend to pick user’s passwords for them (preventing the loss of a life password, itself a bad-yet-unavoidable practice), then why not just store them in plain text? Certainly makes it easy to retrieve a password for someone without having to reset it (useful for someone away from their work machine with saved password who needs to log in).
Conversely, if you’re a bank, and you’re storing any of this in plain text, you will be razed to the ground by angry tech-savvy customers and auditors alike, hopefully BEFORE you get grandma and grandpa Jones to type in the password they use for everything else, too. Hopefully, if you’re a bank, you’re using some crazy method I’m not about to describe here.
Then, there’s the middle ground. I, for example, am not a bank (who would’ve guessed? Can someone please notify my ex-girlfriend?), so my needs are much more middle-of-the-road, which is why I’ve settled for hashing. When I started using PHP, I generally stuck to simple MD5 hashes; it was 10 years ago, and breaking MD5 seemed reasonably difficult. Then I was told not to use MD5 because, at 128 bits, it was too weak, and I should be using SHA-1, which was 160 bits. Then came the recommendation for SHA-256 (guess how many bits that one is!), and then whirlpool, and so on. If you’re using a proper password strategy then you’ve been salting all along (I’ll admit I wasn’t in the old days, but you’ve got to be a beginner sometime), but if you haven’t, allow me to give you a word on salt.
“Salting” a password hash is the practice of taking a piece of input data, adding in an extra piece of information (called “salt”; see where this is going?), and hashing that, instead of just hashing the raw input. In fact, with sites that act like a search engine for MD5 and SHA-1 hashes, not salting your input is, for general purpose storage, only one-degree of separation away from just storing the data in plain text. Furthermore, good salt will be ever-changing (in this practice, the salt is also known as a ‘nonce’), and can safely be stored without obfuscation, as having included it means that a table not accounting for the nonce is useless, and a table that accounts for the nonce is only good against one of the passwords in your database. Now you’ve just made an attack much more expensive, but that may not be as useful in reality as we’d like to believe.
MD5 and SHA-1 hashes can be calculated very, very quickly. In fact, it’s generally more expensive to include some data about the current time (for use in salting/as a nonce) than it is to calculate the actual hash. Here is some experimental code to prove my point:
define('ITERATIONS',5); $tt = $th = 0; for ($j = 0; $j < ITERATIONS; ++$j) { $start = microtime(true); for ($i = 0; microtime(true) - $start < 1; ++$i) { $k = md5($i); } $tt += (microtime(true) - $start); $th += $i; } var_dump($tt / ITERATIONS, $th / ITERATIONS); |
Simply hashing the value of the counter averaged 320,000 hashes per second on my work machine, which is not very powerful, and is certainly not running this in a very optimized way. By changing what is being hashed to the current time to the microsecond, the number of hashes per second is reduced to an average of about 150,000 – in short, the hash is NOT the expensive part of what’s going on here. So, let’s say that, given a more optimized environment but a more expensive dictionary list to be hashed, that the average is 200,000 hashes per second, and the dictionary is about 50,000,000 common passwords. Simple math tells you that generating a hash list for this will take about 250 seconds, or less than 5 minutes. If it takes under 5 minutes to generate a table, and only a few seconds from there to query it, then even a database of 150,000 users can be fully cracked in just under a fortnight.
So how can this be combated? Well, strong password guidelines are a good start, but if you’re relying on users to implement password security for you, you’re probably doing it very, very wrong. I’d like to challenge one of the assumptions you’ve probably made that I’ve had to challenge recently, and that is the value of speed; speed is bad. Think about it: using a hash method that can generate a table of fifty million values in under 5 minutes sounds great from a performance perspective, but who are you really helping? Is your user going to notice that your hash method took under 1ms to calculate, or is this performance more likely to benefit someone trying to crack your passwords? Who would be more hurt if your passwords took closer to 12ms to generate and verify, your users or your would-be attacker?
If you haven’t heard of it yet, may I introduce you to Blowfish Encryption. Blowfish is designed to scale with Moore’s Law by allowing you, the programmer, to decide how long it takes to generate a hash. This is done by allowing you to specify a number which will be interpreted as a log-base-2 of how many iterations the hashing sequence should take; this metadata is then stored as part of the salt, prepended to the hash, and can be verified by the same function that created it since hashes are of fixed length and will be truncated or padded accordingly. By using a log-base-2 scale, every increment of that number (n) literally doubles the time required to calculate the hash, as it will have to undertake 2n iterations to generate the password. From what I can gather, a number like 7 or 8 is a fair industry standard at this time, and on my work machine limits the hashes-per-second to around 86.6 and 43.3, respectively.
Now, performance is a factor in real world applications, so let’s pick a number like 27, which as I said allows about 87 hashes per second. At that rate, a single dictionary table (useful for only one user, since we are salting these passwords) takes about six and a half days to generate. For that same database of 150,000 users, it would take over 2,733 years to crack. Of course, computational power will get less expensive as time goes on, and the same number of operations can and will get faster, but with the blowfish algorithm you need only increment the log to double the computational cost, keeping the cracking of your database safely outside the realm of technical feasibility.
So how does one use the blowfish algorithm in PHP? The crypt() function is your friend! However, the manual is not entirely clear on the implementation details of blowfish, as it does not include one key part (which caused me to tear my hear out a little bit, since, as a Windows user, I was unable to check the man pages for crypt(3)) in any great detail, and that is the log base. When you generate the salt, you will need to prepend it with an instruction string that tells it what kind of hash to generate, and what parameters to use. Furthermore, the salt is not sixteen characters, but sixteen BYTES, and the characters in your hash will be read as a BASE64 encoded string, which means that using characters not allowed in a base64 string will cause the function to revert back to whatever the default is on your system, probably STD_DES or MD5.
All of that information might have seemed a bit hazy, so I’ll include the timing example I used before modified to suit crypt/blowfish. Note also that I am storing the microtime result on every iteration of the for-loop, as in order to give you worst-case scenarios on the cracker’s timetable, I had to give best-case timings on the hashing, and that means as few calls to microtime as possible.
define('ITERATIONS',5); $tt = $th = 0; for ($j = 0; $j < ITERATIONS; ++$j) { $start = microtime(true); for ($i = 0; ($z = microtime(true)) - $start < 1; ++$i) { $k = crypt($i, '$2a$07$' . (string)$z); } $tt += ($z - $start); $th += $i; } var_dump($tt / ITERATIONS, $th / ITERATIONS); |
Of paramount importance is the literal string prepended to the stored value. The first four characters, $2a$, simply instruct crypt to use the blowfish algorithm. The next three, 07$, pass the number 7 as our log-base-2 argument, meaning the computation will run for 27 iterations. After that, we concatenate our salt (values shorter than 22 characters will be padded in a predictable fashion, and longer than 22 will be truncated) to the argument string and send it off on its merry, 12ms way.
Do I think I’ve defeated all the clever crackers out there? Certainly not. However, I’m definitely in a better boat for having stood on the shoulders of giants and listened to people smarter than I am about security. In fact, don’t listen to me, check out these links for more info:
(Victor) Xi Wang talks about salt, nonces and rainbow tables
Linked earlier, explains blowfish encryption – very math/pseudocode heavy.
Also linked earlier, the PHP Manual Entry for Crypt()
Happy Hashing!
February 9th, 2010 by Dereleased | 6 Comments »