Character Frequency Counter

Analyze any text and see how often each character appears. Sort by frequency, alphabetically, or by category.

Frequently Asked Questions

The percentage is the count of that character divided by the total number of characters in the analyzed set. If "Case-insensitive" is on, a and A are merged; if "Letters only" is on, only alphabetic characters are counted in the total.
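The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual code: the function name `char_percentages` and its two flags are hypothetical stand-ins for the "Case-insensitive" and "Letters only" options.

```python
from collections import Counter

def char_percentages(text, case_insensitive=False, letters_only=False):
    """Return {char: (count, percent)} for the analyzed set (hypothetical helper)."""
    if case_insensitive:
        text = text.lower()  # merge "a" and "A"
    if letters_only:
        # Only alphabetic characters contribute to the total.
        text = "".join(ch for ch in text if ch.isalpha())
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: (n, 100 * n / total) for ch, n in counts.items()}

# "Aa b!" becomes "aab" once case is merged and non-letters are dropped.
result = char_percentages("Aa b!", case_insensitive=True, letters_only=True)
```

Here `a` has a count of 2 out of 3 analyzed characters, so its percentage is about 66.7%; with both options off, the same input would count five separate characters including the space and the exclamation mark.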

In a large corpus of English text, the letters from most to least frequent are approximately: E, T, A, O, I, N, S, H, R, D, L, C, U, M, W, F, G, Y, P, B, V, K, J, X, Q, Z. The letter E accounts for roughly 12–13% of all letters. This ordering is why certain letters are more valuable in word games and why optimised keyboard layouts like Dvorak place the most common letters on the home row.

Frequency analysis is a technique for breaking substitution ciphers by exploiting the fact that letters appear with predictable regularity in a language. If the symbol Q appears most often in a ciphertext (~12% of characters), it is likely standing in for E. By matching ciphertext symbols to expected English frequencies, analysts can reconstruct the cipher alphabet. The method was described by the Arab polymath Al-Kindi in the 9th century and remained the primary codebreaking technique until polyalphabetic ciphers were invented. Modern ciphers (AES, RSA) are immune to frequency analysis.

Yes, significantly. Each language has a distinct frequency profile. In German, E is even more dominant (~17%). In French, E, A, and S dominate. In Spanish, E and A are the top two. In Portuguese, A is the most common letter. These differences allow analysts to identify a text's language by comparing its observed letter frequencies against known language profiles — a technique used in language detection algorithms and natural language processing pipelines.

A bigram is a sequence of two adjacent characters (or words); a trigram is a sequence of three. In English, the most common letter bigrams are TH, HE, IN, ER, AN, and the most common trigrams are THE, AND, ING, ION. Bigram and trigram frequency analysis is used in spell-checkers, language models, input method editors (IMEs), and cryptanalysis. Pairs and triples are harder to disguise than single letters, which makes them powerful codebreaking tools against substitution ciphers; repeated sequences in a ciphertext can also help reveal the key length of a polyalphabetic cipher.
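As a rough illustration of letter n-gram counting, the sketch below counts bigrams and trigrams inside words. The helper name `ngram_counts` is hypothetical, and the choice not to count sequences across word boundaries is an assumption; other tools may treat the text as one continuous stream.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count letter n-grams within each word (hypothetical helper)."""
    counts = Counter()
    # Split on anything that is not an ASCII letter, so n-grams stay inside words.
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

bigrams = ngram_counts("the other then", 2)
trigrams = ngram_counts("the other then", 3)
```

For this tiny sample, `bigrams.most_common(2)` is led by TH and HE with three occurrences each, and THE is the most frequent trigram.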

Scrabble tile distribution is directly based on English letter frequency. Common letters receive more tiles and lower point values, while rare letters receive fewer tiles and higher values: there are 12 E tiles worth 1 point each, but only 1 Z tile worth 10 points. Alfred Mosher Butts determined the original distribution by counting letter frequencies in newspapers. Understanding frequency analysis helps Scrabble players appreciate why certain tiles are more useful and how the game's balance was designed.

In programming and data processing, unexpected character distributions often signal encoding problems. If a file contains an unusually high frequency of characters like Ã, â, or Â, it may be UTF-8 text that was decoded as Latin-1 or Windows-1252 (mojibake). A high proportion of null bytes (\0) suggests binary data read as text. Frequent carriage returns (code 13) point to Windows CRLF line endings where Unix LF was expected, and other invisible control characters (codes 0–31) suggest embedded control sequences. Character frequency analysis is a quick first diagnostic for text corruption in a data pipeline.
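The mojibake case is easy to reproduce, which helps when deciding whether a frequency table is showing it. The sketch below is an illustration under simple assumptions: the sample strings are made up, and `control_char_counts` is a hypothetical helper name.

```python
from collections import Counter

# UTF-8 bytes wrongly decoded as Latin-1: every "é" turns into "Ã©".
original = "café résumé"
garbled = original.encode("utf-8").decode("latin-1")
garbled_counts = Counter(garbled)

# Reversing the wrong decode recovers the text when nothing was lost in between.
repaired = garbled.encode("latin-1").decode("utf-8")

def control_char_counts(text):
    """Count characters with codes 0-31 (hypothetical helper)."""
    return Counter(ch for ch in text if ord(ch) < 32)

# Two CRLF line endings followed by one bare LF.
line_endings = control_char_counts("a\r\nb\r\nc\n")
```

In the garbled string, Ã appears three times even though it never occurs in the original, which is the kind of spike the frequency table makes visible. In the second sample, two carriage returns against three line feeds indicate mixed line endings.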

A Caesar cipher shifts every letter by a fixed number of positions (e.g., A→D, B→E with shift 3). Because the shift is uniform, the shape of the frequency distribution is preserved; only the letter identities change. To crack it, find the most frequent letter in the ciphertext (in English text of reasonable length it most likely stands for E), calculate the shift between that letter and E, then apply the reverse shift to decode the entire message. For example, if H is most common, the shift is 3 (H − E = 3), so you shift every letter back by 3. If the result is not readable, repeat with the next most likely plaintext letters (T, A, O). A Caesar cipher over any reasonable text length can be broken in seconds this way.
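The procedure can be sketched as follows. This is a minimal illustration that assumes English plaintext in which E really is the most frequent letter; the function names `shift_text` and `guess_caesar_shift` and the sample sentence are made up for the example.

```python
from collections import Counter

def shift_text(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def guess_caesar_shift(ciphertext):
    """Guess the shift by assuming the most frequent letter stands for E."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("ciphertext contains no letters")
    most_common = Counter(letters).most_common(1)[0][0]
    return (ord(most_common) - ord("E")) % 26

plaintext = "meet me here when the bells ring at seven"
ciphertext = shift_text(plaintext, 3)      # encrypt with shift 3
shift = guess_caesar_shift(ciphertext)     # H is most common, so the guess is 3
decoded = shift_text(ciphertext, -shift)   # apply the reverse shift
```

On short or unusual texts the most frequent letter may not be E, so a practical version would try a few candidate letters and keep the result that reads as English.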

About This Character Frequency Analyser

This free character frequency analyser counts how many times each character appears in any text and displays the results sorted by frequency. Letters, digits, spaces, punctuation, and Unicode characters are all counted separately.

Character frequency analysis has applications in cryptography (cracking substitution ciphers), linguistics (identifying languages), quality assurance (finding unexpected characters), and data cleaning.

When to use this tool

  • Analysing letter frequency in cipher text for cryptanalysis
  • Finding unexpected characters in imported or pasted data
  • Counting punctuation or whitespace occurrences in text
  • Identifying the most common characters in a block of content

How It Works

Paste Your Text

Paste any text into the input area. Choose whether to count case-insensitively, ignore spaces, or count letters only.

Count & Sort

Each character in the filtered text is counted. Results can be sorted by frequency (descending or ascending) or alphabetically.
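The three sort orders can be sketched with Python's `sorted`. How the tool breaks ties between characters with equal counts is not stated; ordering ties by character is an assumption made here so the output is deterministic.

```python
from collections import Counter

counts = Counter("banana bread")

# Frequency, descending; ties ordered by character (assumed tie-break).
by_freq_desc = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Frequency, ascending; same tie-break.
by_freq_asc = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

# Alphabetical, which for arbitrary characters means code point order.
alphabetical = sorted(counts.items())
```

In this sample, `a` leads the descending list with 4 occurrences, while the space character comes first in both the ascending and the alphabetical lists.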

Visual Bar Chart

A proportional bar and percentage show each character's relative frequency. The most frequent character's bar spans the full column width.
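A text version of this chart might look like the sketch below. It shows the two different denominators involved: bar length is scaled against the most frequent character, while the percentage is taken against the total. The function name `bar_rows`, the 20-column width, and the `#` bar character are all illustrative choices.

```python
def bar_rows(counts, width=20):
    """Return (char, count, bar, percent) rows, most frequent first (hypothetical helper)."""
    if not counts:
        return []
    total = sum(counts.values())
    top = max(counts.values())
    rows = []
    for ch, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # The top character fills the full width; rare characters get at least one cell.
        bar = "#" * max(1, round(width * n / top))
        rows.append((ch, n, bar, 100 * n / total))
    return rows

rows = bar_rows({"a": 4, "b": 2, "c": 1})
for ch, n, bar, pct in rows:
    print(f"{ch!r:5} {n:3}  {bar:<20} {pct:5.1f}%")
```

The character `a` gets the full 20-cell bar even though it makes up only about 57% of the sample.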

Common Use Cases

Cipher Cryptanalysis

Analyze ciphertext from substitution ciphers by comparing character frequencies to expected English letter distributions, helping decode unknown ciphers.

Language Identification

Different languages have distinct character frequency profiles. Frequency analysis can confirm whether unknown text is English, French, German, or another language.

Font Optimization

Identify the most-used characters in your content to prioritize which glyphs to include when subsetting a custom web font for performance.

Writing Style Analysis

Compare character frequency profiles of different authors or writing styles. An unusually high frequency of normally rare characters may indicate jargon-heavy or technical writing.

Keyboard Layout Design

Frequency analysis of a text corpus is a key input for designing optimized keyboard layouts (like Dvorak or Colemak) that place the most common letters on the home row.

Data Quality Inspection

Detect unexpected characters in imported data — stray non-ASCII symbols, invisible control characters, or encoding artifacts — by inspecting the frequency table.
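One way to automate this inspection is to flag every character that is either non-ASCII or an invisible control or format character, and label it using Python's `unicodedata` module. Which characters count as "unexpected" depends on the data; the rule used here (anything above code 127, plus Unicode categories Cc and Cf except newline and tab) is an assumption, and `unexpected_chars` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

def unexpected_chars(text):
    """Map each flagged character to (count, code point, Unicode name)."""
    report = {}
    for ch, n in Counter(text).items():
        # Cc = control characters, Cf = invisible format characters.
        invisible = unicodedata.category(ch) in ("Cc", "Cf") and ch not in "\n\t"
        if invisible or ord(ch) > 127:
            name = unicodedata.name(ch, "<unnamed>")  # control characters have no name
            report[ch] = (n, f"U+{ord(ch):04X}", name)
    return report

# Sample with a no-break space, a zero-width space, and a carriage return.
sample = "price\u00a0: 10\u200b EUR\r\n"
report = unexpected_chars(sample)
```

For this sample the report contains three entries: the no-break space, the zero-width space, and the carriage return. None of them is visible in most editors, which is why a frequency table is a convenient place to spot them.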

Related Articles

Huffman Coding: How "E Is the Most Common Letter" Becomes Smaller ZIP Files and PNG Images

Every ZIP file, PNG image, and gzip-compressed page relies on the same observation as the previous article's frequency analysis: characters aren't equally frequent, so assigning shorter codes to common symbols and longer codes to rare ones saves space. Here's how Huffman coding builds optimal variable-length codes from frequency data, why "prefix-free" codes need no separators, how this fits into larger compression pipelines like DEFLATE, and why already-compressed data resists further compression.

Jun 17, 2026
Frequency Analysis: How Counting Letters Breaks Caesar Ciphers, Substitution Ciphers, and Why Modern Encryption Is Immune

A Caesar cipher can be broken in seconds — not by trying all 25 shifts, but by counting which ciphertext letter appears most often and matching it against English's most common letter, "E." Here's how frequency analysis breaks substitution ciphers, why polyalphabetic ciphers like Vigenère were designed to defeat it, and why modern encryption (AES, RSA) is immune to this entire category of attack.

Jun 17, 2026
Character Frequency and NLP Foundations: How Zipf's Law Underlies Search Engines and Language Models

Word frequency analysis underlies search engines, compression algorithms, and how large language models learn. Here's Zipf's Law, TF-IDF for meaningful keyword extraction, how word embeddings come from co-occurrence statistics, and why the character frequency distribution you measure is the same foundation that GPT models learn from.

Jun 10, 2026
Shannon Entropy and Character Frequency: The Information Theory Behind Text Analysis

Character frequency analysis connects directly to Shannon entropy, data compression, and information theory. Here's what the distribution of characters in text reveals about compressibility, password strength, Zipf's Law, and stylometric authorship analysis.

Jun 8, 2026
Character Frequency — Count How Often Every Character Appears in Your Text

Learn how character frequency analysis works, what the classic English letter distribution looks like, and how it's used in cryptography, data cleaning, linguistics, and writing analysis — with a free character frequency tool.

Jun 6, 2026