MD5 Hashing in Python: hashlib Complete Guide
Generate MD5 hashes in Python using hashlib. Examples for strings, file checksums, hexdigest, streaming large files, and HMAC authentication.
How to Generate MD5 Hashes in Python
Python's built-in hashlib module provides MD5 hashing through hashlib.md5(). The constructor accepts bytes and returns a hash object whose digest() and hexdigest() methods give binary or hex output. MD5 is not collision-resistant and is unsuitable for security purposes, but it remains widely used for file integrity checks, cache keys, and legacy system compatibility.
Basic String Hashing
import hashlib
# Hash a bytes literal (hashlib.md5 requires bytes, not str)
hash_obj = hashlib.md5(b"Hello, World!")
print(hash_obj.hexdigest()) # 65a8e27d8879283831b664bd8b7f0ad4
# Using .encode() for string variables
text = "Hello, World!"
md5_hash = hashlib.md5(text.encode("utf-8")).hexdigest()
print(md5_hash) # 65a8e27d8879283831b664bd8b7f0ad4
# Binary digest (16 bytes) vs hex digest (32 chars)
print(len(hash_obj.digest())) # 16 bytes
print(len(hash_obj.hexdigest())) # 32 hex characters
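The relationship between the two output forms, and the reason the string must be encoded first, can be checked directly. The following is a minimal sketch; the sample string and variable names are chosen for illustration only.

```python
import hashlib

text = "café"

# hexdigest() is the hex encoding of the 16 raw bytes returned by digest()
h = hashlib.md5(text.encode("utf-8"))
assert h.digest().hex() == h.hexdigest()
assert bytes.fromhex(h.hexdigest()) == h.digest()

# The same string under a different encoding is a different byte sequence,
# so it produces a different hash
utf8_hash = hashlib.md5(text.encode("utf-8")).hexdigest()
latin1_hash = hashlib.md5(text.encode("latin-1")).hexdigest()
print(utf8_hash == latin1_hash)  # False

# Passing a str directly raises TypeError
try:
    hashlib.md5(text)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Because the hash depends on the exact bytes, both sides of any comparison need to agree on the encoding; UTF-8 is the usual choice.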
File Checksums
import hashlib

def md5_file(filepath):
    """Calculate the MD5 checksum of a file, reading it in 8 KiB chunks."""
    hash_md5 = hashlib.md5()
    with open(filepath, "rb") as f:
        # iter() with a sentinel calls f.read(8192) until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(8192), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

checksum = md5_file("ubuntu-24.04.iso")
print(f"MD5: {checksum}")

# Verify against a known checksum
# (placeholder value: this is the MD5 of empty input; substitute the published checksum)
expected = "d41d8cd98f00b204e9800998ecf8427e"
if md5_file("download.iso") != expected:
    raise ValueError("Checksum mismatch!")
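On Python 3.11 and later, hashlib.file_digest() can take over the chunked reading. The sketch below uses it when available and otherwise falls back to the same loop as md5_file above. The helper names md5_file_compat and checksum_matches are hypothetical, and the demo hashes a throwaway temporary file rather than a real download.

```python
import hashlib
import os
import tempfile

def md5_file_compat(filepath, chunk_size=8192):
    """Hypothetical helper: MD5 of a file, using hashlib.file_digest when available."""
    with open(filepath, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11+
            return hashlib.file_digest(f, "md5").hexdigest()
        # Fallback for older versions: read and hash fixed-size chunks
        h = hashlib.md5()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
        return h.hexdigest()

def checksum_matches(filepath, expected_hex):
    """Compare case-insensitively, since published checksums may be upper-case."""
    return md5_file_compat(filepath) == expected_hex.strip().lower()

# Demo on a temporary file containing b"Hello, World!"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello, World!")
    tmp_path = tmp.name

try:
    print(checksum_matches(tmp_path, "65A8E27D8879283831B664BD8B7F0AD4"))  # True
finally:
    os.remove(tmp_path)
```

Normalizing case and whitespace before comparing avoids false mismatches when the expected value is pasted from a checksum file.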
Incremental Hashing
The update() method lets you hash data incrementally, which is essential for large files or streaming data:
import hashlib
md5 = hashlib.md5()
md5.update(b"Hello, ")
md5.update(b"World!")
print(md5.hexdigest()) # Same as hashing "Hello, World!" at once
# This is how file chunking works internally
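Two related properties of the hash object may be useful here: calling hexdigest() does not finalize the object, and copy() snapshots its internal state. A minimal sketch, with illustrative variable names, of hashing several inputs that share a common prefix:

```python
import hashlib

chunks = [b"Hello, ", b"World!"]

# Feeding chunks one at a time gives the same digest as hashing the joined bytes
incremental = hashlib.md5()
for chunk in chunks:
    incremental.update(chunk)
assert incremental.hexdigest() == hashlib.md5(b"".join(chunks)).hexdigest()

# copy() duplicates the current state, so the shared prefix is hashed only once
prefix = hashlib.md5(b"Hello, ")
world = prefix.copy()
world.update(b"World!")
python = prefix.copy()
python.update(b"Python!")

print(world.hexdigest())                        # 65a8e27d8879283831b664bd8b7f0ad4
print(python.hexdigest() == world.hexdigest())  # False
```

The original prefix object is left unchanged by updates to its copies.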
HMAC with MD5
import hashlib
import hmac

# HMAC-MD5 for message authentication (legacy APIs)
key = b"secret_key"
message = b"important data"
signature = hmac.new(key, message, hashlib.md5).hexdigest()
print(signature) # keyed hash that verifies both integrity and authenticity
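Generating a signature is only half of the exchange; the receiver has to recompute it and compare. A minimal sketch of that check follows, using hmac.compare_digest() so the comparison does not leak timing information. The sign and verify helpers are hypothetical names, not part of the hmac module.

```python
import hashlib
import hmac

def sign(key, message):
    """Hypothetical helper: hex HMAC-MD5 signature of message under key."""
    return hmac.new(key, message, hashlib.md5).hexdigest()

def verify(key, message, signature):
    """Recompute the signature and compare it in constant time."""
    expected = sign(key, message)
    return hmac.compare_digest(expected, signature)

key = b"secret_key"
message = b"important data"
signature = sign(key, message)

print(verify(key, message, signature))           # True
print(verify(key, b"tampered data", signature))  # False
print(verify(b"wrong_key", message, signature))  # False
```

A plain == comparison would return the same results, but compare_digest is the safer habit whenever the signature comes from an untrusted party.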
When to Use MD5 in Python
- File verification — comparing downloads against published checksums
- Data deduplication — quick content comparison in ETL pipelines
- Cache keys — generating deterministic keys from query parameters or content
- Legacy compatibility — systems that still require MD5 signatures or checksums
Never use MD5 for: password hashing (use bcrypt or argon2), digital signatures, or any security-critical application.
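As an illustration of the cache-key use case listed above, the sketch below derives a deterministic key from a dictionary of query parameters. The cache_key helper, its prefix argument, and the key layout are assumptions made for this example, not an established convention.

```python
import hashlib
import json

def cache_key(params, prefix="query"):
    """Hypothetical helper: deterministic cache key from a dict of parameters."""
    # sort_keys makes the JSON text independent of dict insertion order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"

a = cache_key({"page": 2, "q": "md5"})
b = cache_key({"q": "md5", "page": 2})
print(a == b)                                    # True: same parameters, different order
print(a == cache_key({"q": "md5", "page": 3}))   # False: a parameter changed
```

Serializing to a canonical form before hashing is what makes the key stable. On Python 3.9 and later, hashlib.md5() also accepts usedforsecurity=False, which marks non-security uses like this one explicitly.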