JustHTML includes a built-in, policy-driven HTML sanitizer intended for rendering untrusted HTML safely.
This page focuses on HTML cleaning: tags, attributes, and inline styles. For URL validation and rewriting, see URL Cleaning.
The parsed DOM is sanitized by default at construction time (JustHTML(..., sanitize=True)), and serialization is a pure output step.
If you want to sanitize after other transforms or after direct DOM edits, apply the Sanitize(...) transform to sanitize the in-memory tree.
By default, construction removes dangerous HTML, such as script elements and javascript: URLs:
from justhtml import JustHTML
user_html = '<p>Hello <b>world</b> <script>alert(1)</script> <a href="javascript:alert(1)">bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>'
doc = JustHTML(user_html, fragment=True)
print(doc.to_html())
print()
print(doc.to_markdown())
Output:
<p>Hello <b>world</b> <a>bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>
Hello **world** [bad] [ok](https://example.com/?a=1&b=2)
If you will be working with the DOM and want a clean slate to work from, add Sanitize(...) to your transform pipeline.
If you want explicit pass boundaries (advanced use), you can group transforms using Stage([...]).
from justhtml import JustHTML, Sanitize
user_html = '<p>Hello <b>world</b> <script>alert(1)</script> <a href="javascript:alert(1)">bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>'
doc = JustHTML(user_html, fragment=True, transforms=[Sanitize()])
# The DOM is now sanitized in-memory.
print(doc.to_html(pretty=False))
# => <p>Hello <b>world</b> <a>bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>
If you want to (dangerously) disable sanitization, because you know that your trusted HTML can’t contain malicious code:
from justhtml import JustHTML
# Only do this for HTML you fully control; the name reflects that it is trusted input.
trusted_html = '<p>Hello <b>world</b> <script>init_page_view_tracker()</script> <a href="javascript:track_pageview()">ok</a></p>'
doc = JustHTML(trusted_html, fragment=True, sanitize=False)
print(doc.to_html())
Output:
<p>Hello <b>world</b> <script>init_page_view_tracker()</script> <a href="javascript:track_pageview()">ok</a></p>
The built-in default is DEFAULT_POLICY (a conservative allowlist).
The default URL policy is conservative about remote loads: by default a[href] allows common link schemes, while img[src] only allows relative URLs (so images won’t load from remote hosts unless you opt in via a custom policy). For details, see URL Cleaning.
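To make the idea of per-attribute URL rules concrete, here is a minimal standard-library sketch of a scheme allowlist keyed by (tag, attribute). It is not JustHTML's implementation: the ALLOW_RULES table, the scheme sets, and the is_url_allowed helper are all hypothetical, and a real sanitizer has to handle more obfuscation cases than this does.

```python
from urllib.parse import urlsplit

# Hypothetical rule table keyed by (tag, attribute). The scheme sets are
# assumptions for illustration, not JustHTML's actual defaults.
ALLOW_RULES = {
    ("a", "href"): {"http", "https", "mailto"},
    ("img", "src"): set(),  # no absolute schemes allowed: relative URLs only
}


def is_url_allowed(tag: str, attr: str, url: str) -> bool:
    """Return True if `url` is acceptable for the given tag/attribute pair."""
    allowed_schemes = ALLOW_RULES.get((tag, attr))
    if allowed_schemes is None:
        return False  # no rule for this attribute: reject
    parts = urlsplit(url.strip())
    if parts.scheme:
        return parts.scheme.lower() in allowed_schemes
    # Scheme-less URL: reject protocol-relative forms (//host/path), which
    # would still load from a remote host.
    return not parts.netloc


print(is_url_allowed("a", "href", "https://example.com/?a=1&b=2"))
print(is_url_allowed("a", "href", "javascript:alert(1)"))
print(is_url_allowed("img", "src", "/images/logo.png"))
print(is_url_allowed("img", "src", "https://evil.test/x.png"))
```

In this sketch, attributes without a rule are rejected outright, which mirrors the allowlist-first approach described above.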
High-level behavior:
- script/style have their content dropped.
- Event handler attributes (on*), srcdoc, and namespace-style attributes (anything with :) are removed.
Disallowed tag handling is controlled by SanitizationPolicy(disallowed_tag_handling=...):
- "unwrap" (default): remove the disallowed tag, keep/sanitize its children
- "escape": emit the disallowed tag’s start/end tags as escaped text, keep/sanitize its children
- "drop": drop the entire disallowed subtree
Default allowlists:
- Tags: a, img, common text/structure tags, headings, lists, and tables (table, thead, tbody, tfoot, tr, th, td).
- Global attributes: class, id, title, lang, dir
- a: href, title
- img: src, alt, title, width, height, loading, decoding
- th/td: colspan, rowspan
Inline styles are disabled by default. To allow them you must:
1) Allow the style attribute for the relevant tag via allowed_attributes, and
2) Provide a non-empty allowlist via allowed_css_properties.
Even then, JustHTML is conservative: it rejects declarations that look like they can load external resources, such as values containing "url(" or "image-set(", as well as legacy constructs like "expression(".
To avoid “footgun” policies, you can start from the built-in preset CSS_PRESET_TEXT.
from justhtml import CSS_PRESET_TEXT, JustHTML, SanitizationPolicy, UrlPolicy
policy = SanitizationPolicy(
allowed_tags=["p"],
allowed_attributes={"*": [], "p": ["style"]},
url_policy=UrlPolicy(allow_rules={}),
allowed_css_properties=CSS_PRESET_TEXT | {"width"},
)
html = '<p style="color: red; background-image: url(https://evil.test/x); width: expression(alert(1));">Hi</p>'
print(JustHTML(html, policy=policy).to_html())
Output:
<p style="color: red">Hi</p>
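The filtering shown above can be pictured as two checks per declaration: is the property allowlisted, and does the value look inert? The following standard-library sketch illustrates that idea only. It is not JustHTML's implementation; ALLOWED_PROPERTIES, SUSPICIOUS_TOKENS, and filter_style are hypothetical names, and splitting on ";" is a simplification that ignores quoted strings and CSS escapes.

```python
# Hypothetical property allowlist; it stands in for a real preset plus extras.
ALLOWED_PROPERTIES = {"color", "font-weight", "text-align", "width"}

# Substrings suggesting a value can load a resource or evaluate script.
SUSPICIOUS_TOKENS = ("url(", "image-set(", "expression(")


def filter_style(style: str) -> str:
    """Keep only allowlisted declarations whose values look inert."""
    kept = []
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        # Skip malformed declarations and properties outside the allowlist.
        if not sep or not value or name not in ALLOWED_PROPERTIES:
            continue
        # Skip values that contain any resource-loading or legacy token.
        if any(token in value.lower() for token in SUSPICIOUS_TOKENS):
            continue
        kept.append(f"{name}: {value}")
    return "; ".join(kept)


style = "color: red; background-image: url(https://evil.test/x); width: expression(alert(1));"
print(filter_style(style))  # prints: color: red
```

Note that width is allowlisted here but still dropped, because its value contains "expression(". Both checks must pass for a declaration to survive.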
You are encouraged to write your own SanitizationPolicy, and not rely on the default one. This makes it easier for future developers to understand what’s being cleaned, without having to look it up in justhtml’s documentation.
When expanding the default policy, prefer adding small, explicit allowlists.
Treat these as a separate security review if you plan to allow them:
- iframe, object, embed
- meta, link, base
For URL-related risks and controls, see URL Cleaning.
from justhtml import JustHTML, SanitizationPolicy, UrlPolicy, UrlRule
user_html = '<p>Hello <b>world</b> <script>alert(1)</script> <a href="javascript:alert(1)">bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>'
policy = SanitizationPolicy(
allowed_tags=["p", "b", "a"],
allowed_attributes={"*": [], "a": ["href"]},
url_policy=UrlPolicy(
default_handling="strip",
allow_rules={
("a", "href"): UrlRule(allowed_schemes=["https"]),
}
),
)
# Pass the custom policy so it replaces the default one.
doc = JustHTML(user_html, fragment=True, policy=policy)
print(doc.to_html())
Output:
<p>Hello <b>world</b> <a>bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>
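When you maintain a custom policy, it can help to pin its behavior with a small regression test on the serialized output. The sketch below uses only the standard library's html.parser to flag a few obviously active constructs. The ActiveContentAuditor class and audit function are hypothetical helpers, not part of JustHTML, and the check is a coarse safety net rather than a substitute for the sanitizer.

```python
from html.parser import HTMLParser


class ActiveContentAuditor(HTMLParser):
    """Collect findings that a sanitized fragment should never contain."""

    BLOCKED_TAGS = {"script", "style", "iframe", "object", "embed"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in self.BLOCKED_TAGS:
            self.findings.append(f"<{tag}> element")
        for name, value in attrs:
            if name.startswith("on"):
                self.findings.append(f"event handler {name} on <{tag}>")
            url = (value or "").strip().lower()
            if name in {"href", "src"} and url.startswith("javascript:"):
                self.findings.append(f"javascript: URL in {name} on <{tag}>")


def audit(html):
    """Return a list of human-readable findings; empty means nothing flagged."""
    auditor = ActiveContentAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.findings


# The sanitized output shown above, copied as a literal for the example.
sanitized = '<p>Hello <b>world</b> <a>bad</a> <a href="https://example.com/?a=1&b=2">ok</a></p>'
print(audit(sanitized))  # prints: []
```

In a real test you would pass doc.to_html() to audit(...) and assert that the result is empty.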
If you find a sanitizer bypass, please report it responsibly (see SECURITY.md).