How Diff Works: Git, Version Control, and the Myers Algorithm Explained

Git diff, pull request reviews, and three-way merges all rest on diff algorithms, and the classic one, the Myers algorithm, is Git's default. Here's how diff finds the shortest edit between two texts, how to read git diff output, why merge conflicts happen, and how diff applies beyond version control.

Diff is how software teams know exactly what changed — and the algorithm behind it is elegant

When you run git diff, git log -p, or open a pull request and see lines highlighted in green and red, you're looking at the output of a diff algorithm. The question "what changed between these two versions?" is deceptively simple. The naive answer is to compare every line of one version against every line of the other, which takes time proportional to the product of their lengths and quickly becomes expensive on large files. The real answer uses a clever algorithm that finds the shortest edit script between two sequences.

Understanding diff turns the output from a wall of coloured lines into readable, actionable information.

How diff algorithms work: the Myers algorithm

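The most widely used diff algorithm is the Myers algorithm (Eugene Myers, 1986). GNU diff is based on it, it is Git's default diff algorithm, and many other code comparison tools use it or a variant. It finds a shortest edit script: the fewest line insertions and deletions that transform text A into text B (an edit distance in which substitutions are not allowed, so a changed line counts as one deletion plus one insertion). It runs in O(ND) time, where N is the combined length of the two texts and D is the length of that edit script, so it is fast when the two versions are similar.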

The algorithm works on an edit graph:

Each position in text A is on one axis (x)
Each position in text B is on the other axis (y)
Moving diagonally means "this line matches" (no change, no cost); diagonal moves exist only where the line from A equals the line from B
Moving horizontally means "delete this line from A" (one edit)
Moving vertically means "insert this line from B" (one edit)
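The greedy core of Myers' algorithm can be sketched in a few lines of Python. This is a minimal, illustrative version (the function name is hypothetical, and this is not Git's or GNU diff's code) that only computes D, the length of a shortest edit script: for d = 0, 1, 2, ... it records, for each diagonal k = x - y, the furthest x reachable with d edits, sliding along runs of matching lines ("snakes") for free. Recovering the actual script requires keeping each round's array, or the paper's linear-space refinement, which is omitted here.

```python
def myers_distance(a: list[str], b: list[str]) -> int:
    """Return D, the fewest line deletions plus insertions that turn a into b.

    Greedy forward pass of Myers' algorithm; x indexes a, y indexes b,
    and diagonal k is the set of points where x - y == k.
    """
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d                    # shift k so negative diagonals fit in a list
    v = [0] * (2 * max_d + 2)         # v[offset + k] = furthest x reached on diagonal k
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Arrive on diagonal k either by moving down from k + 1 (an insertion)
            # or by moving right from k - 1 (a deletion), whichever got further.
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]
            else:
                x = v[offset + k - 1] + 1
            y = x - k
            # Follow the snake: matching lines are free diagonal moves.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d  # not reached: d == n + m always suffices


print(myers_distance(list("ABCABBA"), list("CBABAC")))  # 5
```

The sample pair is the one used in Myers' paper; reaching the far corner of the graph takes five edits.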

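Finding the shortest edit path through this graph (the one with the fewest horizontal and vertical moves) is equivalent to finding the longest common subsequence (LCS) of the two texts: the lines that appear in both, in order. Every diagonal taken replaces one deletion plus one insertion, so the number of edits equals (lines in A) + (lines in B) - 2 × (LCS length). Lines in the LCS are unchanged; everything else is either added or removed.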
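The LCS relationship can be checked with the textbook dynamic-programming method. Note that this is a different, simpler technique than Myers' algorithm: it always fills a (len(a) + 1) × (len(b) + 1) table, so it costs time and memory proportional to the product of the lengths, but it finds an LCS just as reliably. The function names are illustrative.

```python
def lcs_table(a: list[str], b: list[str]) -> list[list[int]]:
    """dp[i][j] = length of the LCS of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def lcs_diff(a: list[str], b: list[str]) -> list[str]:
    """Walk the table, emitting ' ' (kept), '-' (deleted) or '+' (inserted) lines."""
    dp = lcs_table(a, b)
    i = j = 0
    out: list[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(" " + a[i])    # part of the LCS: unchanged
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            out.append("-" + a[i])    # dropping a[i] keeps the LCS length intact
            i += 1
        else:
            out.append("+" + b[j])    # b[j] is not matched: insert it
            j += 1
    out.extend("-" + line for line in a[i:])   # leftovers in a were deleted
    out.extend("+" + line for line in b[j:])   # leftovers in b were inserted
    return out


print(lcs_diff(["a", "b", "c", "d"], ["a", "c", "d", "e"]))
# [' a', '-b', ' c', ' d', '+e']
```

Here the LCS is three lines long, so the script has 4 + 4 - 2 × 3 = 2 edits.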

Why this matters practically: the diff algorithm doesn't just mark things as "different" — it specifically finds which lines are shared between both versions and uses them as anchors. This is why git diff produces sensible, human-readable output even on heavily modified files.

Reading git diff output

diff --git a/calculator.py b/calculator.py
index 7f3c8a2..9b1e4d6 100644
--- a/calculator.py
+++ b/calculator.py
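@@ -12,3 +12,5 @@ def calculate_emi(principal, rate, tenure):
     monthly_rate = rate / 12 / 100
-    emi = principal * monthly_rate / (1 - (1 + monthly_rate) ** (-tenure))
+    if monthly_rate == 0:
+        return principal / tenure
+    emi = principal * monthly_rate / (1 - (1 + monthly_rate) ** -tenure)
     return round(emi, 2)

Header lines: diff --git names the two paths being compared; index gives the abbreviated object hashes of the file before and after the change, followed by its mode (100644 is a regular, non-executable file); --- a/calculator.py is the old file, and +++ b/calculator.py is the new file.

Hunk header: @@ -12,3 +12,5 @@ means:

-12,3: the hunk starts at line 12 of the old file and covers 3 lines
+12,5: it starts at line 12 of the new file and covers 5 lines (two more, because three lines were added and one was removed)
The text after the closing @@ (def calculate_emi(...)) is a section heading Git adds for orientation, usually the nearest preceding function definition; it is not part of the change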
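A small parser makes the header rules concrete. This is a hypothetical helper, not part of Git: it reads the start line and line count for each side (an omitted count means 1, which the unified format allows) and checks them against a toy hunk body, where context lines count toward both sides, '-' lines only toward the old side, and '+' lines only toward the new side.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@ optional section heading"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count)."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count, _heading = match.groups()
    # An omitted count means the hunk covers exactly one line.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


def body_counts(body: list[str]) -> tuple[int, int]:
    """Count how many old-file and new-file lines a hunk body contains."""
    old = sum(1 for line in body if line[:1] in (" ", "-"))
    new = sum(1 for line in body if line[:1] in (" ", "+"))
    return old, new


hunk = [
    "@@ -1,3 +1,4 @@",
    " alpha",
    "-beta",
    "+BETA",
    "+gamma",
    " delta",
]
_, old_count, _, new_count = parse_hunk_header(hunk[0])
assert body_counts(hunk[1:]) == (old_count, new_count)  # (3, 4)
```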

Change lines:

Lines starting with -: removed from old file
Lines starting with +: added in new file
Lines with no prefix: context lines (unchanged, shown for reference)

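Reading this diff: a zero-rate check was added (when rate is 0, the denominator 1 - (1 + monthly_rate) ** -tenure becomes 0 and the original formula raises ZeroDivisionError), and the parentheses around -tenure were removed. The second change is purely cosmetic: in Python, ** binds less tightly than a unary minus on its right, so x ** -n and x ** (-n) compute the same value.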
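Python's standard-library difflib module can emit the same unified format, which is a convenient way to experiment. Note that difflib matches lines with its own SequenceMatcher heuristic rather than Myers' algorithm, so on larger inputs its hunks can differ from Git's. Here only the lines from the excerpt are compared, so the line numbers start at 1 rather than 12.

```python
import difflib

old = [
    "    monthly_rate = rate / 12 / 100",
    "    emi = principal * monthly_rate / (1 - (1 + monthly_rate) ** (-tenure))",
    "    return round(emi, 2)",
]
new = [
    "    monthly_rate = rate / 12 / 100",
    "    if monthly_rate == 0:",
    "        return principal / tenure",
    "    emi = principal * monthly_rate / (1 - (1 + monthly_rate) ** -tenure)",
    "    return round(emi, 2)",
]

# lineterm="" because these input lines carry no trailing newlines.
patch = list(difflib.unified_diff(old, new,
                                  fromfile="a/calculator.py",
                                  tofile="b/calculator.py",
                                  lineterm=""))
print("\n".join(patch))
```

It prints the ---/+++ lines followed by a single hunk headed @@ -1,3 +1,5 @@, with one '-' line and three '+' lines between the unchanged first and last lines.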

Three-way merge: why conflicts happen

A regular diff compares two files. A three-way merge (what git merge uses) works with three versions: a common ancestor (the base), version A (one branch), and version B (another branch).

How it works:

Diff base vs. A — identify what A changed
Diff base vs. B — identify what B changed
Where only one side changed a region, or A and B changed separate regions: apply the changes (auto-merge)
Where A and B changed the same region in different ways: conflict

A conflict happens when the same region was modified in both branches since their common ancestor. Git can't automatically know which version is correct — it needs a human decision.
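The decision rule can be sketched for the simplest possible case. This toy version (the function name and the tuple used to represent a conflict are illustrative) assumes all three versions have the same number of lines and only edit lines in place, so line i of each version can be compared directly. Real implementations such as Git's first align each branch against the base with a diff, which is what lets them handle insertions and deletions, and they work on regions rather than single lines, so they may flag edits to neighbouring lines as a conflict where this sketch would not.

```python
from typing import Union

# A merged line is either resolved text or an (ours, theirs) conflict pair.
Merged = Union[str, tuple[str, str]]


def merge3_aligned(base: list[str], ours: list[str], theirs: list[str]) -> list[Merged]:
    """Toy three-way merge for versions that only edit lines in place."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch needs equal-length versions")
    merged: list[Merged] = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # same on both sides: untouched, or changed identically
            merged.append(o)
        elif o == b:      # only theirs changed this line
            merged.append(t)
        elif t == b:      # only ours changed this line
            merged.append(o)
        else:             # both changed it, differently: conflict
            merged.append((o, t))
    return merged


base = ["a = 1", "b = 2", "c = 3", "d = 4", "e = 5"]
ours = ["a = 10", "b = 2", "c = 3", "d = 4", "e = 50"]
theirs = ["a = 1", "b = 2", "c = 30", "d = 4", "e = 500"]
print(merge3_aligned(base, ours, theirs))
# ['a = 10', 'b = 2', 'c = 30', 'd = 4', ('e = 50', 'e = 500')]
```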

Conflict markers in files:

<<<<<<< HEAD
emi = principal * monthly_rate / (1 - (1 + monthly_rate) ** -tenure)
=======
emi = principal * monthly_rate / (1 - pow(1 + monthly_rate, -tenure))
>>>>>>> feature-branch

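Everything between <<<<<<< and ======= is the version from the current branch (HEAD)
Everything between ======= and >>>>>>> is the version from the incoming branch (here, feature-branch)
The human (or merge tool) keeps one side or writes a combined version, deletes the marker lines, and marks the file as resolved with git add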
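Resolving every block mechanically, for example by always keeping one side, amounts to walking the markers in order. The helper below is a hypothetical sketch for the default two-section marker style (it does not handle the extra base section that the diff3 conflict style adds), and it is not how Git itself resolves files.

```python
def resolve_conflicts(text: str, keep: str) -> str:
    """Replace every conflict block with one side: keep="ours" or "theirs".

    Assumes the default markers: <<<<<<<, =======, >>>>>>>.
    """
    if keep not in ("ours", "theirs"):
        raise ValueError("keep must be 'ours' or 'theirs'")
    out: list[str] = []
    section = None  # None outside a conflict, else "ours" or "theirs"
    for line in text.splitlines(keepends=True):
        if section is None and line.startswith("<<<<<<<"):
            section = "ours"
        elif section == "ours" and line.startswith("======="):
            section = "theirs"
        elif section == "theirs" and line.startswith(">>>>>>>"):
            section = None
        elif section is None or section == keep:
            out.append(line)
        # Lines from the side being discarded are skipped.
    if section is not None:
        raise ValueError("unterminated conflict block")
    return "".join(out)


conflicted = (
    "monthly_rate = rate / 12 / 100\n"
    "<<<<<<< HEAD\n"
    "emi = principal * monthly_rate / (1 - (1 + monthly_rate) ** -tenure)\n"
    "=======\n"
    "emi = principal * monthly_rate / (1 - pow(1 + monthly_rate, -tenure))\n"
    ">>>>>>> feature-branch\n"
)
print(resolve_conflicts(conflicted, "theirs"), end="")
```

Keeping "theirs" leaves the monthly_rate line followed by the pow() version; keeping "ours" leaves the ** version instead.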

Diff in code review workflows

GitHub, GitLab, and Bitbucket pull requests display a side-by-side or inline diff of every changed file. Best practices for reviewing diffs effectively:

Focus on the semantic change, not the visual change. A line that moved unchanged appears as a deletion in one place and an addition in another (git diff --color-moved can highlight such moves). A refactor that extracts a function may show many additions and deletions while being logically equivalent. Understand what the code is doing, not just which lines changed.

Check the hunk headers. The @@ -old_start,old_count +new_start,new_count @@ header shows exactly which part of the file you're looking at. Large diffs that span many hunks can be disorienting unless you track which section of the file each hunk changes.

Review edge case additions. The example diff above added a zero-rate check — the most important line in the diff from a correctness standpoint. In a large diff with many trivial changes, edge case handling can get missed.

Use git log -p for archaeology. Running git log -p -- filename lists every commit that touched the file, each with its patch, so you can follow how the code evolved alongside the commit messages that explain why it was written a certain way. Add --follow to keep tracing the file across renames.
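The moved-line effect from the first practice above is easy to reproduce. This small example uses Python's standard difflib module for convenience. Because the only longest common subsequence of the two versions is beta, gamma, a minimal line diff has no choice but to report the move as one deletion and one insertion.

```python
import difflib

before = ["alpha", "beta", "gamma"]
after = ["beta", "gamma", "alpha"]   # "alpha" moved to the end, text unchanged

patch = list(difflib.unified_diff(before, after, "before.txt", "after.txt", lineterm=""))
print("\n".join(patch[2:]))  # skip the ---/+++ file header lines
```

The hunk shows -alpha at the top and +alpha at the bottom, with beta and gamma as unchanged context in between.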

Diff in non-code contexts

Diff is a general-purpose comparison algorithm, not just for code:

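Document comparison: word processors such as Word and Google Docs can compare two versions of a document and display the differences in the style of tracked changes, which is a diff in all but name. Legal contracts, academic papers, and policy documents are often reviewed this way.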

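Configuration management: diffing the last-known-good config against the current one reveals unintended changes. Infrastructure-as-code tools show planned changes as diffs before applying them; for example, terraform plan marks resources to be created, updated, or destroyed, and Ansible's --check --diff mode previews changes to managed files.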

Database schema migration: some migration and schema-comparison tools diff the current schema against the target schema and generate the statements, such as ALTER TABLE, needed to move from one to the other.

Content management: many CMS platforms show edit history as diffs, allowing recovery of accidentally deleted content.
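For the configuration-management case, comparing parsed settings is often more useful than a line diff, because reordering lines then stops counting as a change. This is a key-level comparison rather than a line-based diff algorithm, and the sketch assumes a simple hypothetical key = value format with # comments (no sections or quoting); the function names are illustrative.

```python
def parse_kv(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def diff_config(known_good: str, current: str) -> dict[str, dict]:
    """Report keys added, removed, or changed between two configs."""
    old, new = parse_kv(known_good), parse_kv(current)
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {k: (old[k], new[k])
                    for k in old.keys() & new.keys() if old[k] != new[k]},
    }


known_good = "port = 8080\ndebug = false\ntimeout = 30\n"
current = "debug = true\nport = 8080\nmax_connections = 100\n"
print(diff_config(known_good, current))
# {'added': {'max_connections': '100'}, 'removed': {'timeout': '30'}, 'changed': {'debug': ('false', 'true')}}
```

The moved port line is not reported, while the flipped debug flag and the dropped timeout are.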

How to use the Text Diff tool on sadiqbd.com

Paste text A — the original version
Paste text B — the modified version
Compare — the tool shows additions (green), deletions (red), and unchanged lines
Review — identify exactly what changed between the two versions

Useful for:

Comparing two versions of a document to find what was changed
Spotting accidental modifications in configuration or data files
Verifying that a "no-change" edit truly didn't alter content
Comparing two pieces of similar content for plagiarism analysis

Frequently Asked Questions

Why does git diff sometimes show whitespace-only changes in unexpected ways? By default, git diff marks whitespace changes as modifications. Use git diff -w to ignore all whitespace differences, or git diff --ignore-space-change to ignore only changes to the amount of whitespace.
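A rough way to picture the two options is to normalize each line before comparing. The helpers below are hypothetical and only approximate Git's documented behaviour, not its implementation: -w ignores whitespace entirely, while --ignore-space-change (-b) ignores whitespace at line end and treats any other run of whitespace as equivalent to any other, so whitespace versus no whitespace still counts as a change.

```python
import re


def ignore_all_space(line: str) -> str:
    """Rough model of `git diff -w`: remove every whitespace character."""
    return re.sub(r"\s+", "", line)


def ignore_space_change(line: str) -> str:
    """Rough model of `git diff -b`: drop trailing whitespace and treat any
    remaining run of whitespace as a single space."""
    return re.sub(r"\s+", " ", line.rstrip())


def lines_equal(old: str, new: str, normalize=None) -> bool:
    """Compare two lines, optionally after normalizing both."""
    if normalize is None:
        return old == new
    return normalize(old) == normalize(new)


print(lines_equal("x = 1  ", "x = 1"))                        # False
print(lines_equal("x = 1  ", "x = 1", ignore_space_change))   # True
print(lines_equal("x=1", "x = 1", ignore_space_change))       # False
print(lines_equal("x=1", "x = 1", ignore_all_space))          # True
```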

What's the difference between a unified diff and a context diff? The format shown above (with ---, +++, and @@ headers) is unified diff format, which interleaves additions and deletions in a single view. Context diff (older format) shows deletions and additions separately in two sections. Unified diff is now nearly universal.
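Python's standard difflib module can write both formats from the same input, which makes the contrast easy to see; the file names are placeholders.

```python
import difflib

old = ["alpha", "beta", "delta"]
new = ["alpha", "BETA", "gamma", "delta"]

unified = list(difflib.unified_diff(old, new, "old.txt", "new.txt", lineterm=""))
context = list(difflib.context_diff(old, new, "old.txt", "new.txt", lineterm=""))

print("\n".join(unified))
print()
print("\n".join(context))
```

The unified version is a single hunk (@@ -1,3 +1,4 @@) with -beta followed by +BETA and +gamma. The context version prints the old lines under *** 1,3 **** and then the new lines under --- 1,4 ----, marking changed lines with ! on both sides.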

Is the Text Diff tool free? Yes — completely free, no sign-up required.

Diff is infrastructure — invisible when it works, indispensable when you need to understand exactly what changed and why. The text diff tool brings the same comparison capability to any two pieces of text.

Try the Text Diff tool free at sadiqbd.com — compare any two texts and see exactly what changed, highlighted clearly.