
Merkle Trees and Hash Functions: How Git, Blockchain, and Certificate Transparency Work

Hash functions underpin Git commits, blockchain blocks, and certificate transparency logs through Merkle trees. Here's why MD5 and SHA-1 are "broken," how Merkle trees verify large datasets with O(log n) hashes, and how Bitcoin light clients verify transactions without the full blockchain.

By sadiqbd · June 9, 2026


Hash functions underpin Git, blockchain, BitTorrent, and certificate transparency, all using the same core idea

A hash function takes an input of any size and produces a fixed-size output deterministically. Change one bit of the input and roughly half the output bits change. Given the output, you cannot feasibly reconstruct the input. These properties (determinism, the avalanche effect, and one-way irreversibility) make hash functions the building block of data integrity systems across computing.

Understanding Merkle trees (the structure that Git and blockchains use) reveals how hash functions scale from verifying a single file to verifying millions of transactions.


Hash function properties and where each matters

Determinism: the same input always produces the same output. Required for any verification use case: you hash the file, record the hash, and later hash the file again to verify nothing changed.

Avalanche effect: a tiny change to the input produces a dramatically different output. SHA-256("hello") and SHA-256("Hello") are completely different hashes. This prevents attackers from making small, hard-to-detect modifications.
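The avalanche effect is easy to observe directly. The short sketch below hashes the two strings from the paragraph above and counts how many of the 256 output bits differ; the helper name bit_difference is made up for this illustration.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# The two inputs differ only in the case of the first letter
d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"Hello").digest()

print(d1.hex())
print(d2.hex())
print(f"{bit_difference(d1, d2)} of {len(d1) * 8} bits differ")
```

For a well-behaved hash the count should land near half of the 256 bits, which is what "roughly half the output bits change" means in practice.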

Preimage resistance (one-way function): given a hash output, it's computationally infeasible to find an input that produces it. Makes hashed passwords much harder to recover.

Collision resistance: it's computationally infeasible to find two different inputs that produce the same hash. Broken for MD5 and SHA-1; currently holds for SHA-256 and SHA-3.


What makes MD5 and SHA-1 broken

"Broken" in cryptography means there are faster attacks than brute force, not that the function is completely useless.

MD5 collision attacks (Wang et al., 2004): researchers demonstrated a method to produce two different inputs with the same MD5 hash, and follow-up work reduced the cost to seconds on an ordinary computer. This was exploited in practice: the Flame malware, discovered in 2012, used an MD5 collision to forge a code-signing certificate that appeared to come from Microsoft.

SHA-1 collision attack (SHAttered, 2017): researchers at CWI Amsterdam and Google demonstrated a practical SHA-1 collision: two different PDF files with the same SHA-1 hash. The computation required approximately 9 × 10¹⁸ (about 2⁶³) SHA-1 operations, roughly 100,000 times fewer than the 2⁸⁰ a brute-force collision search would need.

SHA-1 is deprecated: browsers no longer accept TLS certificates signed with SHA-1. Git is migrating from SHA-1 to SHA-256 for object addressing.

For data integrity, not security: MD5 and SHA-1 remain usable where nobody is trying to fool you, such as detecting accidental corruption in file downloads, data transfers, and database records. They're fast and widely available. Their collision weaknesses matter whenever an attacker can choose the data: MD5 collisions are cheap to produce, and SHA-1 collisions are within reach of a well-funded attacker. They're broken for security contexts (digital signatures, certificates) but acceptable for basic integrity checking against accidental changes.


Merkle trees: how Git and blockchain scale hash verification

A Merkle tree (named after Ralph Merkle, 1987) is a tree structure where every leaf node contains the hash of a data block, and every non-leaf node contains the hash of its children's hashes. The root hash summarises all the data in the tree.

Simple example with four files:

Root hash = H(H12 + H34)
              /          \
        H12 = H(H1+H2)   H34 = H(H3+H4)
        /     \           /     \
       H1     H2         H3     H4
       |       |          |      |
    file1   file2      file3   file4

Properties:

  • Changing any single file changes its leaf hash, which changes its parent's hash, which changes the root hash
  • The root hash verifies all data in the tree
  • To prove one file hasn't changed, you only need the root hash and the sibling hashes along the path from that file's leaf to the root (O(log n) hashes), not all the other file hashes
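The properties above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes SHA-256 as H, a non-empty list of blocks, and one common convention for odd-sized levels (pairing the last node with itself). The function names build_levels, merkle_proof, and verify are invented for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    """Return every level of the tree: leaf hashes first, root last."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):          # node was paired with itself
            sibling = index
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(block, proof, root):
    """Recompute the path to the root using only the block and its proof."""
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

files = [b"file1", b"file2", b"file3", b"file4"]
levels = build_levels(files)
root = levels[-1][0]
proof = merkle_proof(levels, 2)            # proof for file3

print(len(proof))                          # 2 sibling hashes for 4 leaves
print(verify(b"file3", proof, root))       # True
print(verify(b"tampered", proof, root))    # False
```

The verifier never sees file1, file2, or file4. It only needs H4 and H12 from the diagram, which is where the O(log n) cost comes from.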

How Git uses Merkle trees

Every Git object (blob, tree, commit) is identified by the SHA-1 (or SHA-256) hash of its content. A commit contains:

  • The hash of the top-level tree object
  • The hash of the parent commit(s)
  • Metadata (author, message, timestamp)

The commit's own hash is not stored inside it; it is computed from all of the above.

The tree object contains hashes of all file blobs and subtree objects. Blobs contain the file content.
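As a concrete illustration of content addressing, the sketch below reproduces the SHA-1 ID that Git assigns to a blob: the hash covers a short header ("blob", the content length, and a NUL byte) followed by the file content. The function name git_blob_hash is made up here, and the sketch covers SHA-1 repositories only.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in every SHA-1 Git repository
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_hash(b"hello world\n"))
```

You can compare the output with `git hash-object <file>` on a file with the same content.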

This structure means every commit is a cryptographic snapshot of the entire repository state. Any change to any file, in any commit, changes that file's blob hash, which changes the tree hash, which changes the commit hash, which changes every subsequent commit hash (since each commit includes its parent's hash).

This is why you can't quietly edit history in Git: any modification to past commits changes all subsequent commit hashes, making the tampering immediately visible to anyone with a copy.
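A toy model makes the chaining effect visible. The sketch below is not Git's real object format: commit_id and build_history are hypothetical helpers that hash a simplified "tree + parent + message" payload with SHA-256, which is enough to show how an edit ripples forward.

```python
import hashlib

def commit_id(tree_hash: str, parent_id: str, message: str) -> str:
    """Toy commit ID: hash of the tree hash, parent ID, and message."""
    payload = f"tree {tree_hash}\nparent {parent_id}\n\n{message}".encode()
    return hashlib.sha256(payload).hexdigest()

def build_history(snapshots):
    """Chain commits so that each ID depends on its parent's ID."""
    ids, parent = [], ""
    for content, message in snapshots:
        tree_hash = hashlib.sha256(content).hexdigest()
        parent = commit_id(tree_hash, parent, message)
        ids.append(parent)
    return ids

original = build_history([(b"v1", "first"), (b"v2", "second"), (b"v3", "third")])
tampered = build_history([(b"v1", "first"), (b"v2-edited", "second"), (b"v3", "third")])

# The first commit is untouched; the edited commit and everything after it get new IDs
print([a == b for a, b in zip(original, tampered)])  # [True, False, False]
```

The third commit's content never changed, yet its ID did, because its parent's ID is part of what gets hashed.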


How blockchain uses Merkle trees

Bitcoin blocks use a Merkle tree of transaction hashes:

  1. Hash every transaction in the block
  2. Hash pairs of transaction hashes together
  3. Repeat until one root hash remains (the Merkle root)
  4. Include the Merkle root in the block header
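The four steps above can be sketched as follows. Two details here are known Bitcoin conventions rather than something stated in the steps: hashes are double SHA-256, and a level with an odd number of hashes duplicates its last entry. The transactions are placeholders, and the sketch ignores the byte-order reversal that block explorers apply when displaying hashes.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def bitcoin_merkle_root(tx_hashes):
    """Merkle root over a non-empty list of 32-byte transaction hashes."""
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])        # odd level: duplicate the last hash
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Placeholder "transactions"; real ones are serialized transaction bytes
tx_hashes = [double_sha256(f"tx{i}".encode()) for i in range(5)]
print(bitcoin_merkle_root(tx_hashes).hex())
```

A block with a single transaction has a Merkle root equal to that transaction's hash, since the loop never runs.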

The benefit: to verify that a single transaction is in a block, you need only the transaction itself, the Merkle root from the block header, and about log₂(n) intermediate hashes, not all n transactions. In a Bitcoin block with 2,000 transactions, one transaction's inclusion can be verified with only ~11 hashes instead of all 2,000.
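The arithmetic behind that figure is a one-liner. The sketch below assumes one sibling hash per tree level and 32-byte SHA-256 hashes; proof_length is a name chosen for this example.

```python
import math

def proof_length(n_transactions: int) -> int:
    """Sibling hashes needed for an inclusion proof in a tree with n leaves."""
    return math.ceil(math.log2(n_transactions)) if n_transactions > 1 else 0

for n in (4, 2_000, 1_000_000):
    hashes = proof_length(n)
    print(f"{n:>9,} transactions -> {hashes} hashes ({hashes * 32} bytes)")
```

Growing the block from 2,000 to 1,000,000 transactions adds only nine hashes to the proof.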

This enables "light clients": mobile wallets that don't store the full blockchain but can still verify transactions using Merkle proofs.


Certificate Transparency and hash trees

When a Certificate Authority (CA) issues a TLS (SSL) certificate, it is recorded in Certificate Transparency (CT) logs: publicly auditable, append-only records. These logs are built on Merkle hash trees (specified in RFC 6962) to ensure:

  1. Certificates can't be added to the log without the addition being permanently recorded
  2. Log operators can prove a certificate was in the log at a specific time
  3. Users can verify certificates they receive are actually in CT logs

Because major browsers require publicly trusted certificates to appear in CT logs, a CA that issues a fraudulent certificate must either log it, where the domain owner can spot it, or see browsers reject it.
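For the curious, here is a minimal sketch of the tree hash defined in RFC 6962. It follows the RFC's rules (SHA-256, a 0x00 prefix for leaves, a 0x01 prefix for interior nodes, and a split at the largest power of two smaller than the entry count), but the entries are placeholder byte strings rather than real log entries, and the function names are chosen for this example.

```python
import hashlib

def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()          # 0x00 marks a leaf

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()   # 0x01 marks an interior node

def merkle_tree_hash(entries) -> bytes:
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < n:                 # largest power of two strictly less than n
        k *= 2
    return node_hash(merkle_tree_hash(entries[:k]), merkle_tree_hash(entries[k:]))

log = [b"cert-A", b"cert-B", b"cert-C"]        # placeholder entries
before = merkle_tree_hash(log)
after = merkle_tree_hash(log + [b"cert-D"])    # appending an entry changes the root
print(before.hex())
print(after.hex())
```

The distinct prefixes keep a leaf from being passed off as an interior node, and the published root changes with every append, which is what lets auditors check that a log only ever grows.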


Practical hash verification

Verifying a downloaded file:

# Download file and verify SHA-256 hash
curl -O https://example.com/release.tar.gz
sha256sum release.tar.gz
# Compare with the published hash on the software's website

Checking file integrity in Python:

import hashlib

def sha256_file(filepath):
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            sha256.update(chunk)
    return sha256.hexdigest()
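Computing the hash is only half of the check; the result still has to be compared with the published value. The sketch below adds that step. The helper name matches_published_hash is invented for this example, and a temporary file stands in for a real download.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_file(filepath, chunk_size=65536):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(filepath, published_hex):
    """True if the file's SHA-256 equals the published hex digest."""
    expected = published_hex.strip().lower()   # tolerate whitespace and uppercase hex
    return hmac.compare_digest(sha256_file(filepath), expected)

# Demo: a temporary file stands in for the downloaded release
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release contents")
published = hashlib.sha256(b"release contents").hexdigest()  # normally copied from the publisher

print(matches_published_hash(tmp.name, published))   # True
print(matches_published_hash(tmp.name, "0" * 64))    # False
os.remove(tmp.name)
```

The check is only as trustworthy as the channel the published hash came from, so fetch it over HTTPS from the project's own site.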

How to use the Hash Generator on sadiqbd.com

  1. Enter text or upload a file
  2. Select the algorithm: MD5, SHA-1, SHA-256, SHA-512, or SHA-3
  3. Generate: the hash output appears instantly
  4. Use for:
    • Verifying file downloads against published checksums
    • Generating unique identifiers from content
    • Understanding how hash output changes with small input changes

Frequently Asked Questions

Should I use SHA-256 or SHA-3? Both are secure. SHA-256 (SHA-2 family) is more widely supported, faster in software, and the current standard for most applications. SHA-3 uses a completely different internal structure (Keccak sponge construction), designed as a backup if SHA-2 is ever weakened. Either is appropriate for new applications; SHA-256 is the common choice.

Can hash functions be used for storing passwords? Not SHA-256, SHA-512, or any fast hash. Fast hashes are inappropriate for passwords because they can be computed billions of times per second, enabling rapid dictionary attacks. Use bcrypt, scrypt, or Argon2 for passwords.
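As a rough sketch of the alternative, the example below uses scrypt from Python's standard hashlib module (available when Python is built against a recent OpenSSL). The cost parameters are illustrative assumptions, not a tuned recommendation, and hash_password and check_password are names chosen for this example.

```python
import hashlib
import hmac
import os

# Illustrative cost settings; tune them for your own hardware
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)

def hash_password(password: str):
    """Return (salt, key); store both alongside the user record."""
    salt = os.urandom(16)                      # unique random salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def check_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)   # constant-time comparison

salt, key = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, key))  # True
print(check_password("wrong guess", salt, key))                   # False
```

Each guess now costs the attacker a deliberately slow, memory-hungry computation instead of a single fast hash.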

Is the Hash Generator free? Yes, completely free, no sign-up required.


Hash functions are the primitive that makes data integrity scalable, from verifying a single downloaded file to securing millions of blockchain transactions. Merkle trees take that primitive and extend it to verifying large datasets with logarithmic efficiency.

Try the Hash Generator free at sadiqbd.com: generate MD5, SHA-1, SHA-256, SHA-512, and SHA-3 hashes for any text or file instantly.
