Modern cryptography offers a wide range of tools and techniques to protect data in transit and at rest. Cryptographic hash functions are one of its core building blocks; many widely used designs, such as MD5, SHA-1, and SHA-2, build their internal compression functions from block-cipher-like constructions. Hash functions play a significant role in securing modern communication systems: they let systems detect tampering by ensuring data integrity and, when combined with keys or signatures, data authenticity. Securely storing and verifying passwords and anonymous cryptographic transactions are just a few of their many applications.
This article provides a broad overview of cryptographic hash functions and their uses. We explain hash properties, their applications in various domains, possible attacks and weaknesses, and, most importantly, ways to strengthen hashes and improve hash functions.
What is a Hash Function?
A hash function is a one-way function that takes a variable-length input and outputs a fixed-length hash digest, which acts as a practically unique identifier (fingerprint) of the input. The length of the output depends on the hashing algorithm; the most widely used algorithms produce digests between 128 bits (MD5) and 512 bits (SHA-512).
To produce a fixed-length output, a hash function processes its input in fixed-size pieces known as data blocks. The block size varies from one algorithm to another, but it is always the same for a given algorithm. For instance, SHA-1 processes 512-bit blocks, and its compression function performs 80 rounds on each block. The compression function runs once per block, so a 1024-bit padded message takes two passes. Because padding and a length field are always appended, even a message of exactly 512 bits ends up as two blocks.
In practice, the input size is rarely a multiple of 512 bits. A technique known as padding therefore extends the message so that it divides into equal-length data blocks. The function then processes one block at a time: the output of the first block is fed in together with the second block, and so on. The final hash is the output produced after the last block, so it depends on every block that came before it.
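The fixed-length property is easy to observe with Python's standard hashlib module. This sketch hashes an empty input, a short string, and a 1 MiB buffer with four common algorithms; the digest size depends only on the algorithm, so MD5, SHA-1, SHA-256, and SHA-512 always report 128, 160, 256, and 512 bits respectively.

```python
import hashlib

# Inputs of very different lengths: empty, short, and 1 MiB.
inputs = [b"", b"hello", b"x" * (1024 * 1024)]

for name in ("md5", "sha1", "sha256", "sha512"):
    for data in inputs:
        digest = hashlib.new(name, data).digest()
        # The digest size depends only on the algorithm, never on len(data).
        print(f"{name:7} input={len(data):>8} bytes -> {len(digest) * 8} bits")
```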
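The padding and block-by-block chaining described above can be sketched in Python. The `md_pad` function implements the padding rule that SHA-1 and SHA-256 use (a 1 bit, zeros, then the 64-bit message length), which shows why a 64-byte (512-bit) message needs two blocks. The `toy_chained_hash` function is a hypothetical illustration only: it uses SHA-256 as a stand-in compression function and an all-zero starting value to show how each block's output feeds into the next, and it is not how SHA-256 itself computes a digest.

```python
import hashlib
import struct

BLOCK_SIZE = 64  # bytes (512 bits), as used by SHA-1 and SHA-256

def md_pad(message: bytes) -> bytes:
    """SHA-1/SHA-256 style padding: one 1 bit (0x80), zero bytes,
    then the original length in bits as a 64-bit big-endian integer."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    # Add zeros until the length is 56 mod 64, leaving 8 bytes for the length.
    padded += b"\x00" * ((56 - len(padded) % BLOCK_SIZE) % BLOCK_SIZE)
    padded += struct.pack(">Q", bit_len)
    return padded

def toy_chained_hash(message: bytes) -> bytes:
    """Hypothetical toy: SHA-256 stands in for a compression function
    purely to show chaining. This is NOT the real SHA-256 algorithm."""
    state = b"\x00" * 32  # hypothetical initial chaining value
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        block = padded[i:i + BLOCK_SIZE]
        # Mix the previous output with the next block.
        state = hashlib.sha256(state + block).digest()
    return state  # the output after the last block is the final hash

for size in (3, 55, 56, 64, 128):
    blocks = len(md_pad(b"a" * size)) // BLOCK_SIZE
    print(f"{size:>3}-byte message -> {blocks} block(s)")
```

With this padding rule, messages of up to 55 bytes fit in one block, while a 56- or 64-byte message spills into a second block to make room for the length field.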
Hashing vs. Encryption
Hashing and encryption are unique and separate processes with their own set of features, properties, and procedures.
Encryption is a two-way (reversible) process because it uses a key: either a shared secret key or a pair of mathematically related but non-identical public and private keys. A plain hash function, by contrast, uses no key; a hash is easy to compute but computationally infeasible to reverse to the original plaintext.
Hashing provides data integrity. Encryption, on the other hand, provides data confidentiality.
The Good of Hashes
Even though hashing is not encryption, it is a form of cryptography that provides:
- password protection
- data integrity / file verification
- digital signatures, and
- virus signatures.
Password Protection
Whenever a user enters a password for authentication, the system hashes it and compares the result against the stored password hashes. Access is granted only if the hashes match. For instance, Windows stores password hashes in the Security Account Manager (SAM) file, whereas Linux stores them in the /etc/shadow file.
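The compare step can be sketched as follows. This is a simplified illustration, assuming the system keeps only an unsalted SHA-256 hex digest of the password (the stored value and password are hypothetical); the real SAM and /etc/shadow formats differ, and unsalted fast hashes like this one are exactly what the rainbow-table attacks described under The Bad of Hashes exploit.

```python
import hashlib
import hmac

# Hypothetical stored credential: only the hash is kept, never the password.
stored_hash = hashlib.sha256(b"correct horse").hexdigest()

def authenticate(entered_password: str) -> bool:
    # Hash what the user typed and compare it with the stored hash.
    entered_hash = hashlib.sha256(entered_password.encode("utf-8")).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels.
    return hmac.compare_digest(entered_hash, stored_hash)

print(authenticate("correct horse"))  # True
print(authenticate("wrong guess"))    # False
```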
File Verification
Similarly, some websites publish a hash value for downloadable software so that users can verify its integrity, that is, confirm that the file is not corrupted and was not tampered with during download.
For instance, the download page for the Linux Mint 20.2 “Uma” Cinnamon (64-bit) ISO image publishes SHA256 hashes in a sha256sum.txt file. To verify the integrity of the image, cd into the directory containing the downloaded image and generate its SHA256 sum with the sha256sum command.
The sha256sum.txt file contains four hashes, one for each desktop release. Compare the generated hash value with the entry for the Cinnamon ISO image; if they match, the image is intact and ready to use.
Before SHA256, the MD5 hashing algorithm was commonly used to verify the integrity of downloaded files, but it is no longer considered a secure cryptographic hash algorithm because it is not collision-resistant (more on this later).
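The same check can be scripted. The sketch below computes a file's SHA-256 digest in 1 MiB chunks, so a multi-gigabyte ISO never has to fit in memory, and compares it with an expected value. The function names `sha256_of_file` and `verify_file` are hypothetical; in practice the expected value would be the Cinnamon entry copied from sha256sum.txt.

```python
import hashlib
import hmac

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_file(path: str, expected_hex: str) -> bool:
    # Normalize case and whitespace: published checksums may vary in format.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, `verify_file(iso_path, expected)` returns True only when the downloaded image matches the published hash, where `iso_path` and `expected` are placeholders for your own values.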
Digital Signatures
A digital signature authenticates the sender: the sender computes the message digest, encrypts it with their private key, and appends the result to the original message. The private key ensures nonrepudiation, while the hash protects against data tampering and provides integrity, i.e., digital signature = sender’s private key(hash(message)).
The receiver decrypts the signature with the sender’s public key, computes the hash of the received message, and compares the two hashes; if they match, the signature is valid.
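The hash-then-sign flow above can be sketched with textbook RSA. This is a toy under loudly stated assumptions: the key (p = 61, q = 53, so N = 3233, e = 17, d = 2753) is tiny, there is no padding, and the SHA-256 digest is reduced mod N so that it fits. Real signatures use keys of 2048 bits or more with a padding scheme such as RSA-PSS, so this code only illustrates the structure and must not be used for actual signing.

```python
import hashlib

# Toy textbook-RSA key from p=61, q=53. Insecure: illustration only.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent

def digest_int(message: bytes) -> int:
    # Hash the message, then reduce it mod N so it fits this toy key.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Sender: signature = private_key(hash(message)).
    return pow(digest_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Receiver: apply the public key and compare with a freshly computed hash.
    return pow(signature, E, N) == digest_int(message)

msg = b"Transfer 100 to Alice"
sig = sign(msg)
print(verify(msg, sig))  # True
```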
Virus Signatures
Antivirus solutions use various approaches to identify malware; one of them is hash matching. They hash an executable, or a portion or block of it, and compare the result against the hashes of known malware stored in their databases.
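Hash matching reduces to a set lookup. In this sketch the signature database, the 4 KiB block size (`PREFIX_LEN`), and the sample bytes are all hypothetical; real products use their own signature formats.

```python
import hashlib

PREFIX_LEN = 4096  # hypothetical: signatures cover the first 4 KiB of a file

def block_hash(file_bytes: bytes) -> str:
    # Hash only a fixed block of the executable.
    return hashlib.sha256(file_bytes[:PREFIX_LEN]).hexdigest()

# Hypothetical signature database, seeded from a harmless placeholder sample.
KNOWN_MALWARE = {block_hash(b"MZ placeholder malicious sample")}

def is_known_malware(file_bytes: bytes) -> bool:
    # Set membership makes each lookup fast regardless of database size.
    return block_hash(file_bytes) in KNOWN_MALWARE
```

Because of the avalanche property discussed in the next section, changing even one byte inside the hashed block defeats an exact-hash match. This is one reason hash matching is only one of several detection approaches.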
Properties of Hashes
The following properties give hash functions their critical role in cryptography, including public-key cryptography:
- A good hash algorithm returns a hash value of fixed size/length irrespective of the input size.
- It offers pre-image resistance, which means it is computationally infeasible to recover the original input from its hash.
- A strong hashing algorithm ensures collision resistance; that is, it is computationally infeasible to find two different inputs that produce the same output.
- A minor change in the input produces a drastic change in the output (the avalanche effect). This property helps ensure file and data integrity.
- Computational speed is another desirable property. However, the ideal speed depends on the purpose and area of application: fast hashing suits file verification, while password hashing benefits from deliberately slow functions.
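The fixed-size and avalanche properties can be observed directly. This sketch hashes two inputs that differ by a single bit (the ASCII codes of "d" and "e" differ only in their lowest bit) and counts how many of the 256 output bits change. For a good hash function, roughly half of the output bits are expected to flip; the exact count varies with the input.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length digests."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

h1 = hashlib.sha256(b"hello world").digest()
h2 = hashlib.sha256(b"hello worle").digest()  # input differs by one bit
print(len(h1) * 8, len(h2) * 8)  # 256 256
print(bit_difference(h1, h2), "of 256 bits differ")
```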
Common Hash Functions
The best-known hash functions include the Message-Digest (MD) algorithms, the Secure Hash Algorithms (SHA), and NTLM.
- MD5: MD5, the fifth version of the Message-Digest algorithms, has a 128-bit output. It was the most widely known hashing algorithm until it proved vulnerable to collision attacks (more on this later). Before the Secure Hash Algorithms (SHA) took over, MD5 was the most commonly used method for file integrity verification.
- SHA: The Secure Hash Algorithms were designed by the NSA and published by NIST. The widely used SHA-2 suite includes four main variants, SHA-224, SHA-256, SHA-384, and SHA-512, whose names give their output size in bits. No practical attack against SHA-2 is publicly known, which makes it more secure than MD5 and its predecessor SHA-1, for which a practical collision was demonstrated in 2017.
- NTLM: The NT LAN Manager hash algorithm is used for hashing passwords. The NTLM protocol uses cyclic redundancy checks and message digests, but one of its drawbacks is its reliance on the RC4 cipher, which, unlike newer algorithms such as AES and SHA-256, has been the target of successful attacks. NTLMv2 addresses these issues by computing its responses with HMAC-MD5, which produces a 128-bit value.
The Bad of Hashes
Longer hashes can slow attackers down, and reversing a cryptographic hash is difficult, but it isn't impossible. Attackers mainly need time, which they can manage with fast hardware, and they can also create collisions or mount side-channel attacks. This section discusses some of the ways hashes are exploited.
Collision
A collision occurs when two different inputs return the same output value. Reliable hash functions are designed to be collision-resistant, but collisions cannot be avoided entirely because of the pigeonhole principle: a hash function with an n-bit output has only 2^n possible output values, while the number of possible inputs of arbitrary size is unlimited. Since there are always more inputs than outputs, some inputs must share an output.
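The pigeonhole principle can be demonstrated with a deliberately tiny hash. This sketch defines a hypothetical 8-bit hash (the first byte of SHA-256), which has only 2^8 = 256 possible outputs. Feeding it 257 distinct inputs therefore guarantees at least one collision. A full SHA-256 digest has 2^256 outputs, so collisions still exist but are computationally infeasible to find.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    # Hypothetical 8-bit hash: the first byte of SHA-256 (256 possible values).
    return hashlib.sha256(data).digest()[0]

def find_collision(num_inputs: int = 257):
    seen = {}  # output value -> first input that produced it
    for i in range(num_inputs):
        msg = f"message-{i}".encode()
        out = tiny_hash(msg)
        if out in seen:
            return seen[out], msg  # two different inputs, same output
        seen[out] = msg
    return None

a, b = find_collision()  # guaranteed to succeed: 257 inputs, 256 outputs
```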
Rainbow Tables
As mentioned earlier, operating systems do not store passwords in plaintext; they store password hashes. Rainbow tables are precomputed databases, or lookup tables, that map these hashes back to plaintext passwords. The CrackStation website, for instance, provides a massive database for cracking unsalted hashes into passwords. Rainbow tables trade the time needed to crack hashes for a large amount of storage space.
Compared with brute forcing, which performs automated trial-and-error attempts that each require computing a hash, a rainbow table attack is a simple search-and-compare operation. Moreover, it does not require an exact password match: if the stored hash matches the hash of any password or phrase, the system allows authentication.
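The search-and-compare idea can be sketched as a plain precomputed lookup table. Note that this is a simplification: a true rainbow table stores chains built with reduction functions to save space, whereas this sketch stores every hash outright. The candidate list is hypothetical and stands in for a large wordlist.

```python
import hashlib

def sha256_hex(word: str) -> str:
    return hashlib.sha256(word.encode("utf-8")).hexdigest()

# Precomputation (done once, offline): hash every candidate password.
candidates = ["123456", "password", "letmein", "qwerty", "dragon"]
lookup_table = {sha256_hex(w): w for w in candidates}

def crack(stolen_hash: str):
    # The attack itself is a dictionary lookup; no hashing is needed here.
    return lookup_table.get(stolen_hash)

print(crack(sha256_hex("letmein")))  # letmein
```

Storage grows with every candidate added, which is the time-for-space trade-off described above. A unique per-password salt makes such a table useless, because the attacker would need a separate table for every salt.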
John the Ripper
John the Ripper is a powerful and versatile password-cracking tool. It works like an automated dictionary attack: it takes each entry in a wordlist (dictionary), computes its hash, and compares the result with the target hash, and it supports an array of hash formats. A well-known wordlist is rockyou.txt, which contains passwords from a breach of the rockyou.com website. The wordlist is available from the SecLists repository on GitHub under /Passwords/Leaked-Databases.
The simplest way to crack a hash is to run the john command with the --format option to specify the hash type, the --wordlist option to give the path to the wordlist, and the file containing the hash value. In Kali Linux, the rockyou.txt file is located in /usr/share/wordlists.
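The hash-and-compare loop at the heart of a wordlist attack can be sketched in a few lines. This is not John the Ripper's code. It is a minimal illustration, assuming an unsalted raw SHA-256 target hash and a wordlist read as bytes, since leaked wordlists such as rockyou.txt contain entries that are not valid UTF-8. Unlike the lookup table, nothing is precomputed: every guess is hashed on the fly.

```python
import hashlib
from typing import Iterable, Optional

def dictionary_attack(target_hex: str, wordlist: Iterable[bytes]) -> Optional[bytes]:
    """Hash each candidate and compare it with the target.
    Returns the matching word, or None if the wordlist is exhausted."""
    target = target_hex.strip().lower()
    for line in wordlist:
        candidate = line.rstrip(b"\r\n")  # drop the trailing newline
        if hashlib.sha256(candidate).hexdigest() == target:
            return candidate
    return None
```

With a real file, you would open the wordlist in binary mode (`open(wordlist_path, "rb")`, where `wordlist_path` is a placeholder) and pass the file object directly, so lines are streamed rather than loaded into memory.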
Pass The Hash
Pass-the-hash is a credential-theft attack that can lead to horizontal privilege escalation. Even though the attack can occur on Linux/Unix systems, it is more prevalent on Windows. Windows authenticates a legitimate user by matching the hash of the entered password, and that hash is static: it changes only when the password changes. Moreover, password hashes are available in various locations in Windows, such as the SAM file and the memory of the Local Security Authority Subsystem Service (LSASS) process.
Attackers exploit the challenge-response model of the NTLM protocol, in which the response is computed from the password hash rather than the password itself, to authenticate as valid users. The attacker dumps the password hashes from a target system and uses a pass-the-hash tool to impersonate a legitimate user. As a result, the attacker does not need to enter or brute-force the password or reverse the hash value.
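The reason a stolen hash is enough can be seen in how an NTLMv2 response is computed. This is a simplified sketch based on the NTOWFv2 derivation in Microsoft's MS-NLMP specification: the key is an HMAC-MD5 of the uppercased user name and the domain, keyed with the NT hash, and the response is an HMAC-MD5 over the server challenge. The real client blob has a fixed structure (timestamp, client challenge, target information) that is replaced here by opaque bytes, and all values (`stolen_nt_hash`, user, domain, challenge) are hypothetical placeholders. The plaintext password never appears.

```python
import hashlib
import hmac

def ntowf_v2(nt_hash: bytes, user: str, domain: str) -> bytes:
    # Key derived from the NT hash, the uppercased user name, and the domain
    # (UTF-16LE encoded). The plaintext password is never needed.
    identity = (user.upper() + domain).encode("utf-16-le")
    return hmac.new(nt_hash, identity, hashlib.md5).digest()

def ntlmv2_proof(nt_hash: bytes, user: str, domain: str,
                 server_challenge: bytes, client_blob: bytes) -> bytes:
    # Simplified: client_blob is treated as opaque bytes in this sketch.
    key = ntowf_v2(nt_hash, user, domain)
    return hmac.new(key, server_challenge + client_blob, hashlib.md5).digest()

# An attacker holding only a dumped 16-byte NT hash can answer a challenge.
stolen_nt_hash = bytes(16)  # hypothetical placeholder value
challenge = bytes(8)        # hypothetical 8-byte server challenge
proof = ntlmv2_proof(stolen_nt_hash, "alice", "CORP", challenge, b"blob")
```

The authenticating server performs the same computation from its stored copy of the hash, so anyone holding the hash produces a response indistinguishable from the legitimate user's.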
Birthday Attack
The birthday attack is a brute-force attack based on the birthday paradox from probability theory. It exploits the birthday problem to find two different messages that produce the same hash value, i.e., a collision. For an n-bit hash, a collision is expected after roughly 2^(n/2) attempts rather than 2^n, which is why hash outputs must be long. The attack generally aims to manipulate communication.
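The birthday bound can be observed with a hypothetical 24-bit hash (the first 3 bytes of SHA-256). This sketch hashes numbered messages until two of them collide. Because of the birthday paradox, the number of attempts is expected to be on the order of 2^12 = 4096 rather than 2^24 (about 16.7 million). By the pigeonhole principle, the loop is guaranteed to finish within 2^24 + 1 attempts.

```python
import hashlib
import itertools

BITS = 24  # hypothetical truncated hash size; full SHA-256 uses 256 bits

def truncated_hash(data: bytes) -> bytes:
    # Keep the first 24 bits (3 bytes) so that collisions become findable.
    return hashlib.sha256(data).digest()[: BITS // 8]

def birthday_collision():
    seen = {}  # truncated hash -> message that produced it
    for attempts in itertools.count(1):
        msg = f"msg-{attempts}".encode()
        h = truncated_hash(msg)
        if h in seen:
            return seen[h], msg, attempts
        seen[h] = msg

m1, m2, tries = birthday_collision()
print(f"collision after {tries} attempts")
```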
Power Up Hashes
There are various ways to protect hashes against these attacks and limit their impact.
Salted Hashes
Salting is the process of adding randomly generated data (a salt) to the input of a hash function. It protects against rainbow table attacks. The system places the salt at the start or end of the password before hashing it, so identical passwords produce different hashes. The salt does not need to be kept secret; it is usually stored alongside the hash.
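The effect of salting can be shown with a short sketch. It assumes a random 16-byte salt prepended to the password and a single SHA-256 pass. The single pass isolates the salting idea; real password storage would combine the salt with an adaptive function such as those described in the next sections. The function names `salted_hash` and `check` are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    # A fresh random salt per password; it is stored next to the hash.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()  # salt prepended
    return salt, digest

def check(password: str, salt: bytes, digest: bytes) -> bool:
    # Recompute with the stored salt and compare in constant time.
    return hmac.compare_digest(salted_hash(password, salt)[1], digest)

salt_a, hash_a = salted_hash("hunter2")
salt_b, hash_b = salted_hash("hunter2")
print(hash_a != hash_b)  # True: same password, different salts
```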
Keyed Hashes
HMAC is an example of a keyed hash: it is a Message Authentication Code (MAC) construction that combines a secret cryptographic key with a hash function. It provides integrity and authenticity of a message at the same time, but not confidentiality, because the message itself is not encrypted.
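Python's standard hmac module implements HMAC directly. In this sketch, the sender and receiver are assumed to share a random 32-byte key exchanged out of band (a hypothetical setup). The message travels in the clear, so the tag detects tampering but hides nothing.

```python
import hashlib
import hmac
import os

key = os.urandom(32)  # hypothetical shared secret key

def tag(message: bytes) -> bytes:
    # HMAC-SHA256 over the message using the shared key.
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, received_tag: bytes) -> bool:
    # Recompute the tag and compare in constant time.
    return hmac.compare_digest(tag(message), received_tag)

msg = b"amount=100&to=alice"
t = tag(msg)
print(verify(msg, t))                     # True
print(verify(b"amount=900&to=alice", t))  # False: tampering detected
```

Without the key, an attacker who changes the message cannot compute a matching tag.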
Adaptive Hash Function
Adaptive hash functions are designed to repeat their inner workings many times; as the name suggests, the user can adjust the number of iterations to keep pace with faster hardware. Key stretching is one such technique: it takes a weak key, such as a password, as input and processes it over many iterations to output a stronger, longer key (for example, 128 bits), which is much harder to brute-force. PBKDF2 and bcrypt are examples of adaptive hash functions.
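PBKDF2 is available in Python's standard library as `hashlib.pbkdf2_hmac`; bcrypt requires a third-party package. This sketch stretches a password into a 256-bit key. The iteration count of 600,000 is an illustrative value chosen for this example, and the adjustable iteration count is what makes the function adaptive. The salt and iteration count are stored with the derived key so that verification can repeat the same work.

```python
import hashlib
import hmac
import os

# Illustrative work factor; raise it over time as hardware gets faster.
ITERATIONS = 600_000

def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # PBKDF2-HMAC-SHA256: iterates HMAC `iterations` times to produce a
    # 32-byte (256-bit) stretched key.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               iterations, dklen=32)

def verify_password(password: str, salt: bytes, iterations: int,
                    stored_key: bytes) -> bool:
    # Redo the same stretching with the stored salt and iteration count.
    return hmac.compare_digest(derive_key(password, salt, iterations), stored_key)

salt = os.urandom(16)
stored = derive_key("hunter2", salt)
print(verify_password("hunter2", salt, ITERATIONS, stored))  # True
```

Each guess now costs the attacker the full number of iterations, so a dictionary or brute-force attack slows down in proportion to the work factor.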
Conclusion
This article provides an extensive overview of cryptographic hash functions. It demonstrates how to verify file integrity and gives an overview of how password hashes can be cracked with the John the Ripper tool. We also discuss a range of attacks and measures that make hashes much harder to crack, such as salting, keyed hashes, and adaptive hash functions.