Cocojunk

🚀 Dive deep with CocoJunk – your destination for detailed, well-researched articles across science, technology, culture, and more. Explore knowledge that matters, explained in plain English.

Navigation: Home

Rainbow table

Published: Sat May 03 2025 19:23:38 GMT+0000 (Coordinated Universal Time) Last Updated: 5/3/2025, 7:23:38 PM

Read the original article here.


The Forbidden Code: Underground Programming Techniques They Won’t Teach You in School

Module: Advanced Cryptanalysis - Cracking Password Hashes

Welcome back to "The Forbidden Code." In our journey through the digital underground, we've explored systems, networks, and hidden vulnerabilities. Now, we turn our attention to one of the most common targets: passwords. You might think passwords stored in a database as inscrutable hash values are safe, but today we're going to uncover a powerful technique used to crack them: Rainbow Tables.

Understanding rainbow tables isn't just for those on the offensive; it's crucial knowledge for anyone building secure systems. By seeing how these attacks work, you can better understand how to defend against them.

What Are Rainbow Tables?

At its core, a rainbow table is a powerful tool for reversing cryptographic hash functions, specifically when dealing with password hashes. Instead of trying to guess passwords one by one and hashing each guess (like a traditional brute-force attack), rainbow tables precompute vast numbers of potential password hashes and store them in a highly optimized way. This allows an attacker who obtains a list of stolen password hashes to quickly look up the corresponding original plaintext passwords.

Definition: Cryptographic Hash Function A mathematical algorithm that takes an input (like a password) and produces a fixed-size string of characters, called a hash value or digest. Key properties for security include:

  • Determinism: The same input always produces the same output.
  • One-way (Pre-image Resistance): It's computationally infeasible to reverse the function and find the original input from the hash output alone.
  • Collision Resistance: It's computationally infeasible to find two different inputs that produce the same hash output.

Definition: Plaintext In cryptography, this refers to the original, unencrypted data. In the context of passwords, it's the human-readable password the user actually types (e.g., "MySuperSecretP@ssw0rd").

Definition: Hash Value / Digest The output produced by a hash function after processing an input. It's a fixed-length string of characters, regardless of the input size.

The fundamental reason rainbow tables work is that while a good hash function is designed to be one-way (hard to reverse), if you already have the output hash, you just need to find any input that produces that specific hash. For passwords, this input is almost always the original password itself (or occasionally a different password that produces the same hash, which is rare but possible with hash collisions).

The Password Storage Problem

Why are passwords stored as hashes in the first place?

Definition: Database Compromise An incident where an unauthorized party gains access to a database, potentially viewing, copying, or modifying its contents.

Storing passwords in plaintext in a database is incredibly risky. If the database is compromised, all user passwords are immediately exposed. This is why standard practice dictates storing password hashes instead.

When a user logs in, the system doesn't compare the entered password to a stored plaintext password. Instead, it takes the entered password, computes its hash using the same function used to store the original password, and compares this newly computed hash to the stored hash. If they match, the user is authenticated. If they don't, authentication fails. Entering a hash value directly as a password won't work because the system would hash it again, resulting in a hash of the hash, which won't match the stored single hash.

So, the attacker's goal isn't to find the hash function's mathematical inverse (which is computationally impossible by design), but to find an input string (the password) that hashes to a given output hash. This is known as a pre-image attack.

Rainbow Tables and the Space-Time Tradeoff

To understand the power of rainbow tables, we need to grasp the concept of a space-time tradeoff.

Definition: Space-Time Tradeoff A computer science concept where an algorithm can be optimized by either using more computer memory (space) to reduce the time it takes to run, or by using less memory at the cost of taking longer to run.

Consider these basic password cracking approaches for a given hash:

  1. Pure Brute Force: For every possible password (e.g., "a", "b", ..., "aa", "ab", ...), compute its hash and compare it to the target hash.
    • Space: Very low (just storing the target hash and current guess).
    • Time: Very high (potentially millions or billions of hashes computed per target hash). For a large password space, this becomes infeasible quickly.
  2. Simple Lookup Table: Precompute the hash for every single possible password in the target password space and store the pair (password, hash) in a table.
    • Space: Extremely high (storing |P| * (password length + hash size) bits, where |P| is the size of the possible password space). This is usually prohibitively large for realistic password spaces.
    • Time: Very low (one table lookup per target hash).

Rainbow tables sit between these two extremes. They use significantly less storage than a full lookup table but require more computation during the lookup phase than a simple table (though far less than brute force). This is their core optimization.

The Genesis: Simple Hash Chains

Rainbow tables evolved from a simpler concept: hash chains. Let's explore this first to appreciate the improvement offered by rainbow tables.

The idea of a hash chain is to link together alternating plaintext passwords and hash values. To do this, we need:

  1. A hash function (H): This is the standard cryptographic hash we want to break (e.g., MD5, SHA-1).
  2. A reduction function (R): This function takes a hash value and transforms it back into a plausible plaintext password string from the target password space (e.g., converting a hash like 281DAF40 into a 6-character lowercase string like sgfnyd).

Definition: Reduction Function (R) A function used in hash chain techniques that maps a hash value (an output of the hash function) back into the domain of possible plaintext inputs (passwords). It is not an inverse of the hash function, and it doesn't guarantee finding the original input, just a valid input from the target space. Its design is crucial and often targets the expected format/charset of passwords.

A chain looks like this: P_0 --[H]--> H_0 --[R]--> P_1 --[H]--> H_1 --[R]--> P_2 --[H]--> H_2 ...

Where P_i is a potential password and H_i is a hash value. For example, using a 6-character lowercase alphabet space and a 32-bit hash:

aaaaaa --[H]--> 281DAF40 --[R]--> sgfnyd --[H]--> 920ECF10 --[R]--> kiebgt ...

To build a simple hash chain table:

  1. Choose a set of random starting points (P_0).
  2. For each starting point, compute a chain of a fixed length k.
  3. Store only the starting point (P_0) and the endpoint (P_k) of each chain. The intermediate values (H_i, P_i for i=1..k-1) are discarded.

Example Lookup with a Simple Hash Chain Table:

Suppose you have a target hash h = 920ECF10 and a simple hash chain table. To find the password:

  1. Apply the reduction function to the target hash: 920ECF10 --[R]--> kiebgt.
  2. Check if kiebgt is an endpoint in your table.
  3. If it is (like in our example chain aaaaaa -> ... -> kiebgt), retrieve the corresponding starting point (aaaaaa).
  4. Recompute the chain starting from aaaaaa: aaaaaa --[H]--> 281DAF40 --[R]--> sgfnyd --[H]--> 920ECF10.
  5. Scan this recomputed chain for your target hash 920ECF10.
  6. If found, the previous value in the chain is the password (sgfnyd).

Definition: Starting Point The initial plaintext password used to begin the computation of a hash chain.

Definition: Endpoint The final plaintext password value generated at the end of a hash chain after a fixed number of hash/reduce operations. Only this value is stored along with the starting point.

Definition: False Alarm In hash chain attacks, a false alarm occurs when a target hash's chain computation results in an endpoint that matches a stored endpoint in the table, but the target hash itself is not present in the full chain derived from the matched starting point. This happens when two different chains merge before their endpoints.

Simple hash chains face a critical problem: collisions. If at any point two chains produce the same intermediate password or hash value, they will merge and follow the exact same sequence of hash/reduce operations from that point onwards. Since the intermediate values aren't stored, you can't detect these merges efficiently during generation. This reduces the effective coverage of the table – you've done the computation for two chains, but they end up covering largely the same passwords. This is particularly problematic because the reduction function R is not collision-resistant by design (it maps a large hash space to a smaller password space), making collisions under R much more likely.

The Rainbow Improvement

Philippe Oechslin's rainbow tables tackle the collision problem of simple hash chains by using multiple different reduction functions within a single chain. Instead of just one R, a rainbow table chain uses R_1, R_2, ..., R_k sequentially.

A rainbow chain looks like this: P_0 --[H]--> H_0 --[R_1]--> P_1 --[H]--> H_1 --[R_2]--> P_2 --[H]--> H_2 ... --[H]--> H_{k-1} --[R_k]--> P_k

Each link in the chain uses a different reduction function R_i.

How this mitigates collisions: For two chains to merge completely and follow the exact same path, they must not only collide on a value (H_i or P_i) but this collision must happen at the same position (i) in both chains, because that position determines which reduction function (R_{i+1}) is applied next.

By using different reduction functions at each step, collisions that happen at different points in the chains are less likely to result in complete merges. Chains might briefly overlap, but they are less likely to become identical for the rest of their length compared to simple hash chains.

Additionally, after generating a large number of these multi-reduction chains, a postprocessing step sorts the chains by their endpoints and removes any chains that have identical endpoints. New chains are then generated to replace the removed duplicates, increasing the overall unique coverage of the table. This drastically reduces the number of effective collisions (chains that produce the same endpoint).

Example Lookup with a Rainbow Table:

The lookup process is slightly more complex because the target hash h could potentially be H_0, H_1, ..., H_{k-1} in a chain. So, for a target hash h, you must perform k different lookup attempts:

  1. Assume h is H_{k-1}. Apply R_k: h --[R_k]--> P_k. Check if P_k is an endpoint in the table. If yes, retrieve the starting point P_0, recompute the chain P_0 --[H]--> ... --[R_k]--> P_k, and scan for h.
  2. If not found, assume h is H_{k-2}. Apply R_{k-1} then H then R_k: h --[R_{k-1}]--> P_{k-1} --[H]--> H_{k-1} --[R_k]--> P_k. Check if this P_k is an endpoint. If yes, retrieve P_0, recompute P_0 --[H]--> ... --[R_k]--> P_k, and scan for h.
  3. ...and so on, until you assume h is H_0. Apply R_1, then H, then R_2, etc., all the way to R_k: h --[R_1]--> P_1 --[H]--> H_1 --[R_2]--> ... --[R_k]--> P_k. Check if this P_k is an endpoint. If yes, retrieve P_0, recompute, and scan.

This means a rainbow table lookup requires up to k times more computation (generating these intermediate chains) than a simple hash chain lookup. However, because rainbow tables suffer far fewer effective collisions, they can be built much larger for the same computational effort during generation. A larger, collision-resistant table covers a greater proportion of the password space.

The Tradeoff Quantified (Simplified):

For a hypothetical, ideal case without collisions, the relationship between the password space size |P|, the time T to build the table, the table size (number of chains) L, and the average time t to find a password for a given hash is roughly:

  • T ≈ |P| (Building the table takes roughly the equivalent computation of hashing every password once)
  • t ≈ |P| / (2 * L) (Finding a password takes time proportional to the password space size divided by the table size. More chains means faster lookup.)

This highlights the tradeoff: more time spent generating the table (T) allows for a larger table (L), which in turn reduces the time needed for individual lookups (t).

Rainbow tables significantly improve the effectiveness of this tradeoff compared to simple hash chains, making large-scale precomputed tables for password cracking practical for certain password spaces and hash types.

Real-World Examples & Vulnerabilities

Rainbow tables are specific to the hash function and the character set/length they were generated for. An MD5 rainbow table is useless against SHA-256 hashes, and a table for 8-character lowercase letters won't crack a 10-character password with symbols.

Historically, certain hash functions were particularly vulnerable:

  • LM Hash: An old algorithm used by Microsoft Windows (NT/2000/XP). It has several critical flaws:
    • Passwords longer than 7 characters are split into two 7-character chunks, hashed separately. This means a 14-character password is no stronger against this attack than two 7-character passwords.
    • Passwords are case-insensitive and padded with nulls.
    • It uses a fixed, known constant (the string "KGS!@#$%"). These weaknesses make LM hash tables relatively easy to generate and use, and they were notoriously vulnerable to rainbow table attacks. Using passwords 15 characters or longer would bypass LM hash generation entirely on older systems.
  • Unsalted Hashes (e.g., some old MD5 implementations): If a system simply stores hash(password), the same password always produces the same hash. This is the ideal scenario for a rainbow table – one table can crack that specific hash for any user who uses that password. This makes password reuse across vulnerable systems particularly dangerous.

While once a dominant technique, rainbow tables have seen reduced usage in recent years for reasons we'll discuss next, but they remain relevant against legacy systems or poorly implemented password storage. Tools like Ophcrack (focused on LM hash) and RainbowCrack (more general purpose) were prominent implementations.

Defense Against Rainbow Tables: The Forbidden Code Strikes Back

Knowing how rainbow tables work is the first step to defending against them. Fortunately, effective countermeasures exist and are standard practice in modern systems.

  1. Salting: This is the most critical defense against rainbow tables.

    Definition: Salt A unique, randomly generated string of data added to a password before it is hashed. The salt is usually stored in plaintext alongside the resulting salted hash.

    Instead of hashing just the password (hash(password)), modern systems hash the password concatenated with a salt: hash(password + salt) or hash(hash(password) + salt).

    Why is this effective?

    • Uniqueness: Even if two users have the same password, if they are assigned different salts, their resulting salted hashes will be completely different (hash("password" + "saltA") != hash("password" + "saltB")).
    • Invalidates Tables: A rainbow table is precomputed for a specific hash function without any salt (or for a single specific salt value). To use a rainbow table against a salted hash, an attacker would need a separate rainbow table generated for each unique salt value used in the database.
    • Salt Size Matters: If the salt is small (e.g., 12 bits, as in older Unix passwords, allowing only 4096 possible salts), generating a separate table for each salt is still feasible, although more expensive for the attacker (4096 tables instead of 1). With large salts (e.g., 128 bits in SHA2-crypt or bcrypt), the number of possible salt values is astronomically large, making it computationally infeasible for an attacker to precompute tables for every possible salt.

    Salting doesn't make the hash function any harder to reverse for a single password and its known salt. But by requiring a unique table for each salt, it shifts the infeasibility from finding a pre-image to finding all pre-images across a system with many users and many salts.

  2. Key Stretching (Password Strengthening): This technique increases the computational cost of hashing a single password, making both brute-force attacks and rainbow table generation much slower.

    Definition: Key Stretching A technique that involves repeatedly applying a cryptographic hash function (or a similar function) multiple times to the password (and salt) to generate the final hash. This intentionally increases the time and computational resources required to perform the hashing operation.

    Instead of hash(password + salt), stretching computes hash(hash(...hash(password + salt)...)), iterating hundreds or thousands of times.

    Why is this effective?

    • Increased Cost per Attempt: Each guess in a brute-force attack now takes significantly longer to hash. If a hash takes 1000 iterations, an attacker can perform 1000 times fewer guesses per second on the same hardware.
    • Increased Table Generation Cost: Similarly, building a rainbow table for a stretched hash takes vastly more time and computational power during the table generation phase, making it less practical to build large tables.
    • Minimal User Impact: The extra computation time (e.g., a few hundred milliseconds) is usually imperceptible to a user logging in, but it adds substantial overhead for an attacker trying to compute millions or billions of hashes.

    Modern, recommended password hashing schemes like bcrypt, scrypt, and Argon2 are designed to be slow and resistant to parallel processing (unlike fast hashes like MD5 or SHA-1), incorporating both salting and key stretching (or similar mechanisms). MD5-Crypt, an older standard, also incorporated iterations (a form of stretching) but is still based on MD5, which has other weaknesses.

  3. Password Complexity and Length: While not a direct cryptographic defense, encouraging users to choose longer and more complex passwords increases the size of the possible password space |P|. As seen in the formulas, a larger |P| requires a proportionally larger table L to maintain the same lookup speed t. Beyond a certain size/complexity threshold (which increases as computing power grows), generating comprehensive rainbow tables becomes computationally infeasible.

Rainbow tables remain a powerful tool against systems using older, unsalted, or fast hash functions. However, properly implemented salting and key stretching render them largely ineffective. Understanding the mechanics of rainbow tables reveals the importance of these modern password storage best practices. This knowledge empowers you, whether you're exploring offensive techniques or fortifying defenses, to appreciate the constant evolution of the digital security landscape.

Related Articles

See Also