This write-up will present what the Python MD5 hash is and how to calculate the MD5 hash of strings and files.
What Is the Python MD5 Hash?
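Python’s “hashlib” module provides the MD5 cryptographic hash function through “hashlib.md5()”. It takes a sequence of bytes and produces a 128-bit (16-byte) hash value, which is usually displayed as a 32-character hexadecimal string. Hashes can be used to build caches for large datasets, create fingerprints of data, verify file integrity, and more. MD5 is no longer considered collision-resistant, so it is best suited to checksums and non-security fingerprints rather than password storage. Because hashing algorithms operate on binary data, text must be converted to bytes before hashing, and it is essential to select the appropriate character encoding for that conversion: the same text encoded differently produces a different hash.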
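A minimal sketch of why the encoding matters: the same text encoded as UTF-8 and as Latin-1 yields different bytes and therefore different MD5 hashes. The sample string “café” is an arbitrary choice for illustration.

```python
import hashlib

text = "café"  # arbitrary sample text with one non-ASCII character

# hashlib works on bytes, so the str must be encoded before hashing.
utf8_bytes = text.encode("utf-8")      # "é" becomes two bytes
latin1_bytes = text.encode("latin-1")  # "é" becomes one byte

utf8_hash = hashlib.md5(utf8_bytes).hexdigest()
latin1_hash = hashlib.md5(latin1_bytes).hexdigest()

print(len(utf8_bytes), len(latin1_bytes))  # 5 4
print(utf8_hash == latin1_hash)            # False

# Passing the str directly is rejected with a TypeError.
try:
    hashlib.md5(text)
except TypeError as exc:
    print("TypeError:", exc)
```

Whichever encoding you choose, use the same one everywhere the hash is computed, or identical text will not produce matching hashes.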
Calculating an MD5 hash typically involves the following methods:
- encode(): This string method converts text into bytes that the hash function can process.
- update(): It feeds additional bytes into an existing hash object.
- digest(): It returns the hash value as a bytes object (16 bytes for MD5).
- hexdigest(): It returns the hash value as a string of 32 hexadecimal digits.
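The sketch below shows how the methods listed above relate to each other: “digest()” and “hexdigest()” describe the same 128-bit value, once as raw bytes and once as text. The input string “Python” is an arbitrary example.

```python
import hashlib

# encode() is a str method (not part of hashlib); it turns text into bytes.
data = "Python".encode("utf-8")

h = hashlib.md5(data)
raw = h.digest()          # bytes object
hex_str = h.hexdigest()   # str of hexadecimal digits

print(type(raw).__name__, len(raw))          # bytes 16
print(type(hex_str).__name__, len(hex_str))  # str 32
print(raw.hex() == hex_str)                  # True: two hex digits per byte
print(h.digest_size * 8)                     # 128 (bits)
```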
Example 1: Calculating MD5 Hash of String Objects
The following code determines the MD5 hash of a byte string object:
import hashlib

str1 = b'Welcome to Python Guide!'
res = hashlib.md5(str1)
print(res.digest())
In the above code:
- The “hashlib” module is imported, and a specified byte string literal is initialized.
- After that, the “hashlib.md5()” function of the “hashlib” module is used to create a hash object. Here, the md5() function implements the MD5 hash algorithm.
- Lastly, the “digest()” method returns the hash value as a byte string.
Output
The MD5 hash algorithm has been applied to the input byte string.
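The “hashlib” module also offers a generic constructor, “hashlib.new()”, which takes the algorithm name as a string. This small sketch checks, for the byte string from Example 1, that both constructors produce the same digest.

```python
import hashlib

str1 = b'Welcome to Python Guide!'

# Named constructor vs. generic constructor with the algorithm name
direct = hashlib.md5(str1).digest()
generic = hashlib.new("md5", str1).digest()

print(direct == generic)                       # True
print("md5" in hashlib.algorithms_guaranteed)  # True
```

The generic form is handy when the algorithm name comes from configuration rather than being fixed in the code.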
We can also display the MD5 hash as a hexadecimal string using the “hexdigest()” method:
import hashlib

str1 = b'Welcome to Python Guide!'
res = hashlib.md5(str1)
print(res.hexdigest())
The output is the hexadecimal representation of the MD5 hash value.
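To confirm that “hexdigest()” output is what you expect, you can compare it against the published test vectors from the MD5 specification (RFC 1321). A minimal sketch using four of those vectors:

```python
import hashlib

# Input -> expected MD5 hex digest, from the RFC 1321 test suite
test_vectors = {
    b"": "d41d8cd98f00b204e9800998ecf8427e",
    b"a": "0cc175b9c0f1b6a831c399e269772661",
    b"abc": "900150983cd24fb0d6963f7d28e17f72",
    b"message digest": "f96b697d7cb7938d525a2f31aaf161d0",
}

results = {}
for data, expected in test_vectors.items():
    actual = hashlib.md5(data).hexdigest()
    results[data] = actual == expected
    print(data, actual, "OK" if actual == expected else "MISMATCH")
```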
Example 2: Calculating MD5 Hash of Files
We can also determine the MD5 hash of a file using the “hashlib” module by hashing the file’s contents. For larger files, the contents should be processed in chunks for memory efficiency.
The following example hashes a small file in a single step:
import hashlib
with open('newfile.txt', 'rb') as f:
    res = hashlib.md5(f.read())
print(res.hexdigest())
In this code, the file is opened in read-binary (“rb”) mode, so “f.read()” returns its entire contents as bytes. These bytes are passed to the “hashlib.md5()” function to create a hash object. Lastly, the “hexdigest()” method returns the hash value in hexadecimal representation.
Output
The MD5 hash value of the file is displayed in hexadecimal.
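Processing a file in chunks relies on the hash object’s “update()” method, which can be called repeatedly. This sketch uses an arbitrary in-memory byte string in place of a file to check that feeding the data in 4096-byte pieces gives the same hash as hashing it all at once.

```python
import hashlib

data = b"x" * 10_000  # arbitrary in-memory stand-in for file contents
chunk_size = 4096

# Hash everything at once
one_shot = hashlib.md5(data).hexdigest()

# Hash the same bytes chunk by chunk with update()
incremental = hashlib.md5()
updates = 0
for start in range(0, len(data), chunk_size):
    incremental.update(data[start:start + chunk_size])
    updates += 1

print(updates)                              # 3 chunks: 4096 + 4096 + 1808 bytes
print(one_shot == incremental.hexdigest())  # True
```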
We can also hash large files, such as video games or log files larger than 10 GB. To compute an MD5 hash without loading the entire file into memory, we split the file into smaller chunks of bytes. A suitable chunk size depends on factors such as the file size and the available memory. Each chunk is processed one at a time, and the hash is updated as we go. If there are 100 chunks, the hash object is updated 100 times:
import hashlib

md5 = hashlib.md5()
with open(r"Video.mp4", "rb") as f:
    # Read 4096 bytes at a time; read() returns b"" at end of file
    while chunk := f.read(4096):
        md5.update(chunk)
print(md5.hexdigest())
In the above code, we first create a hash object using the “hashlib.md5()” function. Then, the code opens the file in read-binary mode and reads its contents 4096 bytes at a time. Each chunk is passed to the “update()” method of the hash object to update the hash value. The loop uses the walrus operator (“:=”), which requires Python 3.8 or later, and stops when “read()” returns an empty bytes object at the end of the file.
Once the entire file has been read, the “hexdigest()” method of the hash object is called to get the hash value in hexadecimal format.
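Since file integrity checks are a common use of MD5, the chunked loop can be wrapped in a reusable helper and compared against a published checksum. A minimal sketch, assuming a 64 KiB default chunk size; “md5_file()” and “verify_md5()” are hypothetical helper names, not part of “hashlib”, and the demo file contents are arbitrary.

```python
import hashlib
import os
import tempfile


def md5_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading chunk_size bytes at a time."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
    return md5.hexdigest()


def verify_md5(path, expected_hex):
    """Return True if the file's MD5 matches expected_hex (case-insensitive)."""
    return md5_file(path) == expected_hex.strip().lower()


# Demo: a temporary file with arbitrary deterministic contents (256,000 bytes)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 1000)
    path = tmp.name

try:
    small = md5_file(path, chunk_size=1024)
    large = md5_file(path, chunk_size=1 << 20)
    same = small == large                 # chunk size does not affect the result
    ok = verify_md5(path, small.upper())  # matches a checksum given in uppercase
    bad = verify_md5(path, "0" * 32)      # does not match an unrelated checksum
finally:
    os.remove(path)

print(same, ok, bad)
```

The chunk size only trades memory for the number of “update()” calls; the resulting hash is identical. In Python 3.11 and later, the standard library also provides “hashlib.file_digest()” for hashing a binary file object.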
Conclusion
In Python, the “hashlib.md5()” function of the “hashlib” module creates an MD5 hash object from bytes, such as encoded strings or file contents. It produces a 128-bit hash value that is usually displayed as a 32-character hexadecimal string. It is used together with the “encode()” string method and the “update()”, “digest()”, and “hexdigest()” methods to calculate the MD5 hash of strings or files. This blog provided a detailed tutorial on Python “hashlib” MD5 using several examples.