How Long Should a Random String Be? Collision Probability Explained

How long does a random string need to be? The birthday problem shows that 6-character codes collide far sooner than most developers expect. Here's how to calculate the right length for tokens, IDs, and API keys.

A 6-character random code sounds unique — until you generate enough of them

You're building a referral system. Each user gets a unique referral code. You decide 6 random alphanumeric characters (62 possible characters per position) sounds like plenty. After all, that's 62⁶ ≈ 56.8 billion possible codes.

Then your platform reaches 100,000 users. What's the probability that at least two of them have the same code?

The answer is about 8.4%: roughly a 1-in-12 chance of a collision somewhere in your user base. At 200,000 users, it's about 30%. At 500,000 users, it's about 89%, and by a million it's all but certain.

This is the birthday problem, and it applies to every system that generates random strings. Understanding it is the difference between choosing a token length that works and choosing one that fails quietly at scale.

The birthday problem, briefly

The classic formulation: how many people need to be in a room before there's a 50% chance that two of them share a birthday? Most people guess 183 (half of 365). The actual answer is 23.

The reason it feels wrong is that we naturally think about matching one specific birthday. But the question is about any two people matching each other — which grows much faster as the group size increases.

The same logic applies to random strings. The question isn't "what's the chance this specific string was already generated?" but "what's the chance any two strings in the entire set are identical?"
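The 23-person figure is easy to check directly. The sketch below is in Python, chosen here only for illustration; the function names are hypothetical. It multiplies out the chance that everyone's birthday is distinct, assuming 365 equally likely days.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_group_for(target: float, days: int = 365) -> int:
    """Smallest group size whose shared-birthday probability reaches `target`."""
    people = 1
    while shared_birthday_probability(people, days) < target:
        people += 1
    return people


print(smallest_group_for(0.5))  # 23
```

Swapping `days` for the size of a token space turns the same loop into an exact collision calculation for random strings, although it becomes slow for large spaces.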

The collision probability formula

For a random string drawn from a space of N possible values, the probability of at least one collision after generating k strings is approximately:

P(collision) ≈ 1 − e^(−k² / 2N)

Solving for k at a 50% collision probability gives approximately:

k ≈ 1.177 × √N

Where N is the total number of possible strings.

For a 6-character alphanumeric string (62⁶ = 56.8 billion possible values):

50% collision probability at k ≈ 1.177 × √(56.8 × 10⁹) ≈ 280,000 strings

So a system generating 280,000 referral codes with 6-character strings has a coin-flip chance of a collision somewhere. That might be fine for a small blog but isn't acceptable for a payments platform.
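Both formulas above fit in a few lines of Python. This is a minimal sketch of the approximation, not an exact calculation; the function names are made up for illustration.

```python
import math


def collision_probability(k: int, n: int) -> float:
    """Approximate chance of at least one collision among k strings
    drawn uniformly from n possible values: 1 - e^(-k^2 / 2n)."""
    # -expm1(-x) equals 1 - e^(-x) and stays accurate when x is tiny.
    return -math.expm1(-(k * k) / (2 * n))


def count_for_probability(p: float, n: int) -> float:
    """Number of strings at which the collision probability reaches p
    (the formula above solved for k)."""
    return math.sqrt(2 * n * math.log(1 / (1 - p)))


N = 62 ** 6  # 6-character alphanumeric codes
for users in (100_000, 200_000, 500_000):
    print(f"{users:>7} codes: {collision_probability(users, N):.1%}")
print(f"50% at about {count_for_probability(0.5, N):,.0f} codes")
```

With N = 62⁶, `count_for_probability(0.5, N)` comes out near 280,000, matching the figure above.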

Collision probability at different lengths

Here's what the numbers look like for alphanumeric strings (62-character set) at different lengths:

Length	Possible values	1% collision at	50% collision at
6 chars	5.7 × 10¹⁰	~33,800	~280,000
8 chars	2.2 × 10¹⁴	~2.1M	~17M
12 chars	3.2 × 10²¹	~8.1B	~67B
16 chars	4.8 × 10²⁸	~31T	~257T
24 chars	1.0 × 10⁴³	~4.6 × 10²⁰	~3.8 × 10²¹
32 chars	2.3 × 10⁵⁷	~6.8 × 10²⁷	~5.6 × 10²⁸

A 16-character alphanumeric string doesn't reach even a 1% collision chance until you've generated roughly 31 trillion of them. For any realistic application, that's functionally infinite.
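The table can also be read in reverse: given an expected number of strings and a collision risk you can tolerate, find the shortest length that stays under it. The sketch below rearranges the approximation P ≈ 1 − e^(−k² / 2N) for N; `min_length` is a hypothetical helper, and the 1% target in the usage lines is just an example.

```python
import math


def min_length(count: int, max_probability: float, alphabet_size: int = 62) -> int:
    """Shortest string length that keeps the collision probability at or
    below max_probability after generating `count` strings."""
    # Rearranged approximation: N >= k^2 / (2 * ln(1 / (1 - p)))
    required_space = count ** 2 / (2 * math.log(1 / (1 - max_probability)))
    # Length L must satisfy alphabet_size ** L >= required_space.
    return math.ceil(math.log(required_space) / math.log(alphabet_size))


print(min_length(100_000, 0.01))    # 7
print(min_length(1_000_000, 0.01))  # 8
```

A stricter risk target (say one in a million instead of 1%) pushes the length up by a few characters, which is usually cheap compared with handling a collision in production.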

Choosing the right length for your use case

Single-use verification codes (email, SMS)

6-digit numeric codes (10⁶ = 1 million possibilities) work fine here because:

They expire in minutes
Rate limiting prevents systematic guessing
The attacker would need to try ~500,000 codes on average to get lucky

The short lifespan and brute-force protection compensate for the small keyspace. 6 alphanumeric characters would offer much better protection, but 6 digits is conventional and adequate with proper rate limiting.
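Generating such a code takes one call to a secure random source. A minimal Python sketch, assuming the `secrets` module from the standard library; `verification_code` is a hypothetical name, and expiry and rate limiting are left to the surrounding application.

```python
import secrets


def verification_code(digits: int = 6) -> str:
    """Uniformly random numeric code, zero-padded so that values
    below 100000 still come out as 6 characters."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


print(verification_code())
```

The zero-padding matters: formatting the number without it would quietly shorten about one code in ten.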

Referral and discount codes

These often have longer lifespans and may not have rate limiting. 8–10 alphanumeric characters is a reasonable floor: 8 characters reach a 1% collision chance at about 2.1 million codes. If you're running a promotion that might generate millions of redemptions, go to 12.

Session tokens

Session tokens need to be unpredictable (so attackers can't guess them) and collision-resistant (so two active sessions can't accidentally share a token). 32 random alphanumeric characters (about 190 bits) is standard. Don't go below 22 characters (about 128 bits, the minimum typically cited for session tokens).
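As a rough illustration, a 32-character base62 token can be built by picking each character from a secure source. This Python sketch uses `secrets.choice` from the standard library; `random_base62` is a made-up name, and most web frameworks already ship their own session token generator, which is usually the better choice.

```python
import math
import secrets
import string

BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_base62(length: int = 32) -> str:
    """Random base62 string; each character is drawn independently
    and uniformly from a cryptographically secure source."""
    return "".join(secrets.choice(BASE62) for _ in range(length))


token = random_base62()
print(token)
print(f"{len(token) * math.log2(len(BASE62)):.1f} bits of entropy")
```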

API keys

32–64 characters is typical. The key is long-lived and grants persistent access, so both unpredictability and collision resistance need to be strong. Many services use a prefix (sk_live_, pk_test_) followed by 32+ random characters — the prefix aids identification, the random suffix provides security.
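The prefix-plus-random-suffix layout can be sketched as follows. This is an assumption-laden illustration, not any particular service's scheme: `make_api_key` is a hypothetical helper, and it uses `secrets.token_urlsafe`, which produces URL-safe base64 rather than base62.

```python
import secrets


def make_api_key(prefix: str = "sk_live_", random_bytes: int = 24) -> str:
    """Readable prefix followed by a random URL-safe suffix."""
    # 24 random bytes (192 bits) encode to exactly 32 base64 characters.
    return prefix + secrets.token_urlsafe(random_bytes)


key = make_api_key()
print(key)
print(key.startswith("sk_live_"))  # True
```

Only the suffix carries security. The prefix is public information that helps with routing, log scanning, and spotting leaked keys.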

Database IDs (when UUIDs aren't enough)

UUID v4 has 122 random bits, so the collision probability reaches 50% only after generating approximately 2.7 × 10¹⁸ IDs, well beyond any practical system. For most use cases, UUID v4 is more than adequate. For shorter, URL-friendly IDs, a 16-character base62 string (about 95 bits) is more compact but weaker: it reaches 50% at roughly 2.6 × 10¹⁴ IDs, which is still ample for most systems. Matching UUID v4's 122 bits takes 21 base62 characters.

Cryptographic nonces and salts

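Lengths here are usually set by the algorithm rather than chosen freely: 16 bytes (128 bits) is a common salt size, AES-GCM takes a 12-byte (96-bit) nonce, and 32 bytes (256 bits) is a generous upper bound where the algorithm allows it. Nonces and salts are typically not stored in large collections (each is used once per operation), but uniqueness is still the core requirement: a salt should differ for every password, and a nonce must never repeat under the same key. Drawing them from a CSPRNG at these lengths keeps accidental repeats unlikely at typical volumes.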

The format question: hex, base62, or base64?

The character set affects both the length needed and practical usability:

Hex (0–9, a–f, 16 chars): 4 bits of entropy per character. Standard for cryptographic outputs (hashes, keys). Safe everywhere but verbose: you need about 1.5× the characters for the same entropy as base62.

Base62 (0–9, A–Z, a–z, 62 chars): ~5.95 bits per character. URL-safe, readable, compact. Good default for tokens, codes, and IDs.

Base64 (the 62 alphanumeric characters plus + and /, or - and _ in the URL-safe variant): 6 bits per character. Slightly more compact than base62, but + and / in standard base64 need URL-encoding; URL-safe base64 avoids this. Common for binary data encoding, less common for human-readable tokens.

Numeric only: 3.32 bits per character. Use when the recipient must type the code (SMS verification, phone entry). Compensate for the reduced entropy with rate limiting and expiry.

For most developer use cases — tokens, API keys, session IDs — base62 hits the best balance of compactness, readability, and compatibility.
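The bits-per-character figures above are just log₂ of the alphabet size, so the length needed for a given entropy target follows directly. A small Python sketch; the 128-bit target is an example and the names are illustrative.

```python
import math

ALPHABET_SIZES = {"numeric": 10, "hex": 16, "base62": 62, "base64": 64}


def bits_per_char(alphabet_size: int) -> float:
    """Entropy carried by one uniformly random character."""
    return math.log2(alphabet_size)


def chars_for_bits(target_bits: int, alphabet_size: int) -> int:
    """Shortest length that provides at least target_bits of entropy."""
    return math.ceil(target_bits / bits_per_char(alphabet_size))


for name, size in ALPHABET_SIZES.items():
    print(f"{name:8} {bits_per_char(size):.2f} bits/char, "
          f"{chars_for_bits(128, size)} chars for 128 bits")
```

For 128 bits this gives 39 numeric, 32 hex, 22 base62, and 22 base64 characters, which is why base62 and base64 are nearly interchangeable on length alone.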

How to use the Random String Generator on sadiqbd.com

Set the length — use the collision table above to choose a length appropriate for your scale and use case
Select the character set — alphanumeric (base62) for general use; hex for cryptographic contexts; custom for specific requirements
Set the quantity — generate multiple strings at once for batch use (seeding a database, creating test fixtures)
Copy — paste directly into your application, configuration, or test data

The generator uses a cryptographically secure source — the output is suitable for security-sensitive contexts like session tokens and API keys, not just test data.

A note on predictable randomness

The collision mathematics above assumes true randomness. If the random source is predictable — a poorly seeded PRNG, a timestamp-based generator, or anything using Math.random() in a security context — the effective keyspace is much smaller than the character math suggests, and the collision analysis is irrelevant.

Math.random() in JavaScript, for example, is a deterministic PRNG. Its output can theoretically be predicted from a small number of observed values. For test data and mock IDs in non-security contexts, it's fine. For session tokens, API keys, and anything an attacker might try to predict, use crypto.getRandomValues() (browser) or the crypto module (Node.js).

The sadiqbd generator uses a secure source, but if you're generating strings in application code, make sure you're pulling from the right randomness API for the context.
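The same split exists outside JavaScript. As an illustration in Python, the `random` module is a seeded, deterministic generator, while `secrets` draws from the operating system's secure source. The sketch below, with hypothetical function names and a timestamp-like seed, shows why a predictable seed makes the whole token predictable.

```python
import random
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def predictable_token(seed: int, length: int = 16) -> str:
    """Insecure: anyone who learns or guesses the seed gets the same token."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def secure_token(length: int = 16) -> str:
    """Drawn from the OS CSPRNG; there is no seed to guess."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Same seed, same "random" string, every time.
print(predictable_token(1_700_000_000) == predictable_token(1_700_000_000))  # True
print(secure_token())
```

If the seed is a timestamp, an attacker only has to try the handful of seconds around the time the token was issued, regardless of how long the token is.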

Frequently Asked Questions

What length random string do I need for a session token? 32 characters from a base62 set gives about 190 bits of entropy — more than enough. The industry minimum for session tokens is typically 128 bits, which is 22 base62 characters or 32 hex characters. Round up to 32 base62 for comfort.

Is UUID v4 better than a random string for database IDs? UUID v4 is a 128-bit value with 122 random bits and a standardised format. It's widely supported and has excellent collision resistance. The downside is its length (36 characters with hyphens) and that it's not sortable. For most use cases it's ideal; for compact, URL-friendly IDs a 16-character base62 string (about 95 bits) is a reasonable alternative, with weaker but still ample collision resistance for most systems. Use 21 base62 characters to match UUID v4's 122 bits.

What's the difference between a random string and a cryptographically secure random string? Both look random. The difference is predictability. A standard PRNG produces sequences that appear random but are deterministic from the seed. A CSPRNG (cryptographically secure PRNG) produces sequences that are computationally infeasible to predict even with knowledge of previous outputs. Use CSPRNG for anything an adversary might try to guess.

How often should I regenerate API keys? When compromised, or when access should be revoked. Unlike passwords, API keys don't benefit from periodic rotation unless there's a specific threat model that demands it. Rotating them needlessly creates operational overhead. Do rotate them immediately if a key appears in logs, is committed to a repository, or access needs to be terminated.

Is the Random String Generator free? Yes — completely free, no sign-up required.

The length of a random string isn't arbitrary — it's a decision with calculable consequences. Get it right once by working from the collision probability math, and you won't have to debug mysterious duplicate-key errors at scale.

Try the Random String Generator free at sadiqbd.com — generate cryptographically secure strings at any length and character set, instantly.