← Back to docs

HTML Cleaning

JustHTML includes a built-in, policy-driven HTML sanitizer intended for rendering untrusted HTML safely.

This page focuses on HTML cleaning: tags, attributes, and inline styles. For URL validation and rewriting, see URL Cleaning.

On this page:

Important: DOM vs construction

The parsed DOM is sanitized by default at construction time (JustHTML(..., sanitize=True)), and serialization is a pure output step.

If you want to sanitize after other transforms or after direct DOM edits, apply the Sanitize(...) transform to sanitize the in-memory tree.

Safe-by-default construction

By default, construction removes all dangerous html:

from justhtml import JustHTML

user_html = '<p>Hello <b>world</b> <script>alert(1)</script> <a href="javascript:alert(1)">bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>'
doc = JustHTML(user_html, fragment=True)

print(doc.to_html())
print()
print(doc.to_markdown())

Output:

<p>Hello <b>world</b>  <a>bad</a> <a href="https://example.com/?a=1&amp;b=2">ok</a></p>

Hello **world** [bad] [ok](https://example.com/?a=1&b=2)

Sanitizing the in-memory DOM

If you will be working with the DOM and want a clean slate to work from, add Sanitize(...) to your transform pipeline.

If you want explicit pass boundaries (advanced use), you can group transforms using Stage([...]).

from justhtml import JustHTML, Sanitize

user_html = '<p>Hello <b>world</b> <script>alert(1)</script> <a href="javascript:alert(1)">bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>'
doc = JustHTML(user_html, fragment=True, transforms=[Sanitize()])

# The DOM is now sanitized in-memory.
print(doc.to_html(pretty=False))
# => <p>Hello <b>world</b>  <a>bad</a> <a href="https://example.com/?a=1&amp;b=2">ok</a></p>

Disable sanitization

If you want to (dangerously) disable sanitization, because you know that your trusted HTML can’t contain malicious code:

from justhtml import JustHTML

user_html = '<p>Hello <b>world</b> <script>init_page_view_tracker()</script> <a href="javascript:track_pageview()">ok</a></p>'
doc = JustHTML(user_html, fragment=True, sanitize=False)

print(doc.to_html())

Output:

<p>Hello <b>world</b> <script>init_page_view_tracker()</script> <a href="javascript:track_pageview()">ok</a></p>

Default sanitization policy

The built-in default is DEFAULT_POLICY (a conservative allowlist).

The default URL policy is conservative about remote loads: by default a[href] allows common link schemes, while img[src] only allows relative URLs (so images won’t load from remote hosts unless you opt in via a custom policy). For details, see URL Cleaning.

High-level behavior:

Disallowed tags

Disallowed tag handling is controlled by SanitizationPolicy(disallowed_tag_handling=...):

Default allowlists:

Inline styles (optional)

Inline styles are disabled by default. To allow them you must:

1) Allow the style attribute for the relevant tag via allowed_attributes, and 2) Provide a non-empty allowlist via allowed_css_properties.

Even then, JustHTML is conservative: it rejects declarations that look like they can load external resources (such as values containing url( or image-set(), as well as legacy constructs like expression(.

To avoid “footgun” policies, you can start from the built-in preset CSS_PRESET_TEXT.

from justhtml import CSS_PRESET_TEXT, JustHTML, SanitizationPolicy, UrlPolicy

policy = SanitizationPolicy(
    allowed_tags=["p"],
    allowed_attributes={"*": [], "p": ["style"]},
    url_policy=UrlPolicy(allow_rules={}),
    allowed_css_properties=CSS_PRESET_TEXT | {"width"},
)

html = '<p style="color: red; background-image: url(https://evil.test/x); width: expression(alert(1));">Hi</p>'
print(JustHTML(html, policy=policy).to_html())

Output:

<p style="color: red">Hi</p>

Use a custom sanitization policy

You are encouraged to write your own SanitizationPolicy, and not rely on the default one. This makes it easier for future developers to understand what’s being cleaned, without having to look it up in justhtml’s documentation.

When expanding the default policy, prefer adding small, explicit allowlists.

Treat these as a separate security review if you plan to allow them:

For URL-related risks and controls, see URL Cleaning.

from justhtml import JustHTML, SanitizationPolicy, UrlPolicy, UrlRule

user_html = '<p>Hello <b>world</b> <script>alert(1)</script> <a href="javascript:alert(1)">bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>'

policy = SanitizationPolicy(
    allowed_tags=["p", "b", "a"],
    allowed_attributes={"*": [], "a": ["href"]},
    url_policy=UrlPolicy(
        default_handling="strip",
        allow_rules={
            ("a", "href"): UrlRule(allowed_schemes=["https"]),
        }
    ),
)

doc = JustHTML(user_html, fragment=True)
doc = JustHTML(user_html, fragment=True, policy=policy)
print(doc.to_html())

Output:

<p>Hello <b>world</b>  <a>bad</a> <a href="https://example.com/?a=1&amp;b=2">ok</a></p>

Reporting issues

If you find a sanitizer bypass, please report it responsibly (see SECURITY.md).