
Alphabetical Sort Order Isn't Universal: Locale Collation, Swedish Å, and Why Your Database Might Be Sorting Wrong

Alphabetical sort order isn't the same in every language: Swedish Å, Ä, Ö go at the end of the alphabet; German has two competing sort conventions for umlauts; Spanish ñ sits between n and o. Many sort tools and programming-language defaults compare by Unicode code point, which is only approximately right for English and wrong for most other languages. Here's what locale-sensitive collation actually is, how to configure it in SQL, JavaScript, and Python, and how case and accent sensitivity add further dimensions on top of letter ordering.

By sadiqbd · June 17, 2026



The previous articles on this site covered sort algorithm trade-offs, sort-order conventions (natural vs lexicographic), and sort behavior across languages and databases. This article addresses locale-sensitive sorting — the fact that alphabetical sort order isn't the same in every language — and why a "correctly sorted" list in one locale can be wrong in another.


The assumption most sort tools make: English ASCII order

When a sort tool (or a database ORDER BY, or a programming language's default sort) sorts text, it typically compares characters by their code point values — a number assigned to each character in Unicode. For basic ASCII characters (A-Z, a-z, 0-9), this produces an order that matches English alphabetical order (roughly — uppercase letters sort before lowercase in ASCII, which isn't how humans sort alphabetically).
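
A minimal Python sketch of the behavior described above. The word list is made up for illustration; `sorted()` and `str.casefold` are standard library features.

```python
words = ["banana", "Zebra", "apple", "Cherry"]

# Python compares strings by Unicode code point, so every uppercase
# ASCII letter (65-90) comes before every lowercase one (97-122).
by_code_point = sorted(words)

# Folding case before comparing gives the order an English reader expects.
by_folded_case = sorted(words, key=str.casefold)

print(by_code_point)   # ['Cherry', 'Zebra', 'apple', 'banana']
print(by_folded_case)  # ['apple', 'banana', 'Cherry', 'Zebra']
```

Case folding only repairs the English case; it does nothing for letters outside A-Z.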

This works acceptably for English text. It fails for virtually every other language.


What goes wrong in other languages

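Swedish: the letters Å, Ä, and Ö are the last three letters of the Swedish alphabet, in that order, after Z. A code-point sort does place them after Z, but in the order Ä (U+00C4), Å (U+00C5), Ö (U+00D6), so Å and Ä come out swapped. A generic collation that is not tailored for Swedish has the opposite problem: it treats them as accented variants of A and O and files them among those letters instead of at the end.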
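
To make the Swedish case concrete, here is a small sketch that compares code-point order with a hand-written alphabet table. `SWEDISH_ALPHABET` and `swedish_key` are hypothetical helpers written for this illustration, not a library API, and the table ignores everything a real collation handles beyond the 29 letters.

```python
import unicodedata

# Hypothetical helper for illustration: the 29 letters of the Swedish alphabet.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    """Map each letter to its position in the Swedish alphabet.

    Characters outside the table (digits, punctuation, other accents)
    sort after every listed letter, ordered by code point.
    """
    normalized = unicodedata.normalize("NFC", word).lower()
    return tuple(RANK.get(ch, len(RANK) + ord(ch)) for ch in normalized)

words = ["öl", "zebra", "äpple", "ål", "apa"]

print(sorted(words))                   # ['apa', 'zebra', 'äpple', 'ål', 'öl']
print(sorted(words, key=swedish_key))  # ['apa', 'zebra', 'ål', 'äpple', 'öl']
```

The two results differ only in the position of "ål" and "äpple", which is exactly the kind of error that is easy to miss in testing.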

Spanish: traditionally, "ch" and "ll" were treated as single letters in the Spanish alphabet (sorted after c and l respectively). Modern Spanish orthography eliminated this, though dictionaries and older materials may still reflect it. More relevantly, ñ sorts between n and o in Spanish, while a code-point sort places it (U+00F1) after z.

German: there are two competing conventions for ä, ö, ü, both described in DIN 5007. Dictionary ordering treats them as their base vowels a, o, u, while phone book ordering (used for lists of names) treats them as ae, oe, ue. Neither matches a code-point sort, which places all three after z.
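
The two German conventions can be sketched as two sort keys: one that compares an umlaut as its base vowel and one that expands it to vowel plus e. The function names and the name list are invented for this example, and the code is a rough approximation rather than a full DIN 5007 implementation.

```python
# Illustrative translation tables, not a complete DIN 5007 implementation.
BASE_LETTER = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
EXPANDED = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def key_base_letter(word):
    """Umlauts compare as a, o, u (DIN 5007 variant 1, used by dictionaries)."""
    # The original word is a tie-breaker so the result is deterministic.
    return (word.lower().translate(BASE_LETTER), word)

def key_expanded(word):
    """Umlauts compare as ae, oe, ue (DIN 5007 variant 2, used for name lists)."""
    return (word.lower().translate(EXPANDED), word)

names = ["Mutter", "Müller", "Muller", "Mueller"]

print(sorted(names))                       # ['Mueller', 'Muller', 'Mutter', 'Müller']
print(sorted(names, key=key_base_letter))  # ['Mueller', 'Muller', 'Müller', 'Mutter']
print(sorted(names, key=key_expanded))     # ['Mueller', 'Müller', 'Muller', 'Mutter']
```

All three orders differ, and only the first one (code-point order) would look wrong to every German reader.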

Danish/Norwegian: Æ, Ø, Å sort at the end of the alphabet, in that order in both languages. This differs from Swedish, where Å comes first (Å, Ä, Ö), so one Scandinavian collation cannot stand in for another.

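Arabic, Hebrew, and other RTL languages: have their own alphabetical ordering conventions. Code-point order roughly follows the basic alphabet in these scripts, but it does not treat optional vowel marks, or the extended letters used by other languages written in the same script, the way readers expect.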

Japanese: has multiple character systems (hiragana, katakana, kanji) that each have their own ordering conventions, and kanji can be ordered by reading or by stroke count — the "correct" order depends on context.


ICU and CLDR: the technical infrastructure for locale-aware sorting

The International Components for Unicode (ICU) library implements the Unicode Collation Algorithm, and the Unicode Common Locale Data Repository (CLDR) supplies the locale-specific tailorings of it for a large number of languages. Together they are the foundation that many modern programming languages and databases use for locale-sensitive string comparison:

  • JavaScript: Array.prototype.sort() with a custom localeCompare() comparator — strings.sort((a, b) => a.localeCompare(b, 'sv')) for Swedish, for example
  • Python: the locale module (locale.strxfrm as a sort key, using a locale installed on the system), or the PyICU library for more complete ICU support
  • SQL databases: collation settings, such as ORDER BY name COLLATE Finnish_Swedish_CI_AI in SQL Server (which groups Swedish with Finnish) or ORDER BY name COLLATE "sv-SE-x-icu" in PostgreSQL
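  • Java, .NET: java.text.Collator in Java (or the Collator in ICU4J for full ICU behavior); culture-aware comparison through CultureInfo.CompareInfo or StringComparer.Create in .NET, which uses ICU on .NET 5 and later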
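
For the Python entry in the list above, a hedged sketch using only the standard `locale` module. The helper name `sort_with_system_locale` is made up here, and the result depends on which locales the operating system has installed and on the quality of its C library collation, so no expected output is shown.

```python
import locale

def sort_with_system_locale(words, locale_name):
    """Sort with the C library's collation for locale_name, if available.

    Returns (sorted_words, used_locale). used_locale is False when the
    locale could not be set and plain code-point order was used instead.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return sorted(words), False
    try:
        return sorted(words, key=locale.strxfrm), True
    finally:
        # setlocale is process-wide, so restore the previous setting.
        locale.setlocale(locale.LC_COLLATE, previous)

words = ["öl", "zebra", "äpple", "ål", "apa"]
result, used_locale = sort_with_system_locale(words, "sv_SE.UTF-8")
print(used_locale, result)
```

Because `setlocale` changes state for the whole process, this pattern is a poor fit for multi-threaded servers. With PyICU installed, the usual alternative is a per-locale collator used as a key, along the lines of `sorted(words, key=icu.Collator.createInstance(icu.Locale("sv_SE")).getSortKey)`.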

The important distinction: most default sort behaviors in code don't use locale-sensitive collation unless explicitly configured. "Sort alphabetically" in code typically means "sort by Unicode code point" — which produces wrong results for most non-English languages.


Why this matters for your sort tool and for databases

For a text sort tool processing user input:

  • Input is sorted as-entered, which may or may not be locale-appropriate
  • For English text, the difference is usually minor
  • For Swedish, German, Spanish, or many other languages, the output may be demonstrably wrong by the conventions of that language

For databases with multilingual content:

  • Choosing the correct collation at table/column creation time matters — and is difficult to change later without reindexing
  • A collation mismatch means ORDER BY returns results in the wrong order, and index-accelerated string comparisons may give incorrect results
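  • Defaults vary and are rarely what a specific language needs: SQLite compares text byte by byte (BINARY) unless told otherwise, PostgreSQL inherits the locale the cluster was initialized with, and MySQL 8.0 defaults to a language-neutral Unicode collation (utf8mb4_0900_ai_ci)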
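
The effect of a database collation on ORDER BY can be tried out with Python's built-in `sqlite3` module, which allows registering a custom collation. The collation name `swedish_demo`, the `people` table, and the hand-written alphabet are assumptions made for this sketch; a production system would rely on the database's own ICU or locale collations instead.

```python
import sqlite3

# Hypothetical stand-in for a real Swedish collation.
ORDER = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzåäö")}

def swedish_demo(a, b):
    """Compare two strings; return -1, 0 or 1 as create_collation expects."""
    ka = [ORDER.get(ch, len(ORDER) + ord(ch)) for ch in a.lower()]
    kb = [ORDER.get(ch, len(ORDER) + ord(ch)) for ch in b.lower()]
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("swedish_demo", swedish_demo)
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Öberg",), ("Zetterlund",), ("Åberg",), ("Ärlig",), ("Andersson",)],
)

# Default BINARY collation: byte order of the UTF-8 encoding.
binary_order = [r[0] for r in conn.execute("SELECT name FROM people ORDER BY name")]
# Same query with the custom collation applied.
swedish_order = [
    r[0]
    for r in conn.execute("SELECT name FROM people ORDER BY name COLLATE swedish_demo")
]

print(binary_order)   # ['Andersson', 'Zetterlund', 'Ärlig', 'Åberg', 'Öberg']
print(swedish_order)  # ['Andersson', 'Zetterlund', 'Åberg', 'Ärlig', 'Öberg']
```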

For APIs returning sorted data to international clients:

  • The "correct" sort order may need to differ based on the user's locale
  • Server-side sorting with a fixed collation will be wrong for some users regardless of which collation is chosen
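
One way an API might vary the collation per request is to map the client's locale tag to an allowlisted collation name. Everything in this sketch is an assumption for illustration: the `customers` table, the function names, the `%s` placeholder style (as used by common PostgreSQL drivers), and the collation names, which assume a PostgreSQL build with ICU support.

```python
# Hypothetical allowlist: language subtag -> PostgreSQL ICU collation name.
COLLATIONS = {
    "sv": "sv-x-icu",
    "de": "de-x-icu",
    "es": "es-x-icu",
    "en": "en-x-icu",
}
DEFAULT_COLLATION = "und-x-icu"  # ICU root collation, not tied to one language

def collation_for(locale_tag):
    """Pick a collation from the allowlist based on the language subtag."""
    language = (locale_tag or "").replace("_", "-").split("-")[0].lower()
    return COLLATIONS.get(language, DEFAULT_COLLATION)

def build_page_query(locale_tag):
    """Build a paginated query whose sort order follows the client's locale.

    A collation name cannot be passed as a bound parameter, so it is
    interpolated into the SQL text. That is only safe because the value
    always comes from the allowlist above, never from the request itself.
    """
    collation = collation_for(locale_tag)
    return (
        f'SELECT name FROM customers ORDER BY name COLLATE "{collation}" '
        "LIMIT %s OFFSET %s"
    )

print(collation_for("sv-SE"))   # sv-x-icu
print(collation_for("de_AT"))   # de-x-icu
print(collation_for("tlh"))     # und-x-icu
print(build_page_query("es-MX"))
```

Falling back to the root collation for unknown languages is a design choice, not a requirement; an application could equally fall back to the site's primary language.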

Case-insensitive and accent-insensitive sorting: two more dimensions

Beyond letter order, sort collations typically have settings for:

Case sensitivity: does "Apple" sort before or after "apple"? Or are they treated identically for ordering purposes? Most human-facing sorts are case-insensitive; most binary sorts are case-sensitive.

Accent/diacritic sensitivity: does "café" sort identically to "cafe"? Accent-insensitive collations treat them as equivalent; accent-sensitive collations sort them differently. In SQL Server collation names, the suffix _CI_AS means Case Insensitive, Accent Sensitive, and _CI_AI means Case Insensitive, Accent Insensitive.

The two settings are independent, so a collation can be any of four combinations: case-sensitive and accent-sensitive, case-insensitive and accent-sensitive, case-sensitive and accent-insensitive, or case-insensitive and accent-insensitive. Each produces different sort results for the same input.
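
The four combinations can be approximated in Python with a key function that optionally folds case and strips accents. `strip_accents`, `make_key`, and `groups` are hypothetical helpers for this sketch. Real collations compare on several levels rather than discarding information, and blindly stripping accents would also merge letters such as Swedish å with a, so treat this as a way to see which strings each setting considers equal, not as a collation.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def make_key(case_sensitive, accent_sensitive):
    """Build a comparison key for one of the four sensitivity combinations."""
    def key(text):
        if not accent_sensitive:
            text = strip_accents(text)
        if not case_sensitive:
            text = text.casefold()
        return text
    return key

words = ["cafe", "Café", "café", "Cafe"]

def groups(key):
    """Collect the words that compare equal under the given key."""
    seen = {}
    for word in words:
        seen.setdefault(key(word), []).append(word)
    return list(seen.values())

print(groups(make_key(True, True)))    # [['cafe'], ['Café'], ['café'], ['Cafe']]
print(groups(make_key(False, True)))   # [['cafe', 'Cafe'], ['Café', 'café']]
print(groups(make_key(True, False)))   # [['cafe', 'café'], ['Café', 'Cafe']]
print(groups(make_key(False, False)))  # [['cafe', 'Café', 'café', 'Cafe']]
```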


How to use the Sort Lines tool on sadiqbd.com

  1. For English text: standard alphabetical sort produces correct results; case sensitivity preference (uppercase before lowercase, or case-insensitive) is the main setting to check
  2. For non-English text with diacritics: be aware that the sort may not match the conventions of the language in question — this is a limitation of tools using Unicode code-point ordering rather than locale-sensitive collation
  3. For database or application sorting: the tool is useful for verifying expected order — if you need a specific locale's collation for production use, implement it at the database or application layer with appropriate ICU/collation settings, not by sorting text in this tool and then hardcoding the result

Frequently Asked Questions

If I'm building a multilingual application, should I sort on the server or let the client sort? For small data sets that are already loaded on the client, client-side sorting with localeCompare() using the user's locale (navigator.language in browsers) is the simplest approach and correctly adapts to the individual user's locale. For large data sets where you're paginating (showing 50 of 10,000 results), server-side sorting is necessary — and the server needs to use the appropriate locale collation for the user, which requires either accepting the user's locale as a parameter or applying a sensible default based on context. Hardcoding any single locale's sort order is wrong for at least some of your users regardless of which locale you choose — the question is whether the wrongness matters for your specific use case.

Is the Sort Lines tool free? Yes — completely free, no sign-up required.

Try the Sort Lines tool free at sadiqbd.com — sort any list alphabetically, numerically, or by length instantly.
