JustHTML’s Linkify transform scans text nodes and wraps detected URLs/emails in <a> elements.
This is a DOM transform (it does not operate on raw HTML strings), so it never rewrites tag soup or breaks markup.
from justhtml import JustHTML, Linkify
doc = JustHTML("<p>See example.com</p>", fragment=True, transforms=[Linkify()])
print(doc.to_html(pretty=False))
# => <p>See <a href="http://example.com">example.com</a></p>
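The idea of working on text nodes rather than on the raw HTML string can be illustrated with a standard-library sketch. This is not JustHTML's implementation: the class name, the helper `linkify_text_nodes`, and the deliberately naive URL pattern (explicit `http://` and `https://` only) are assumptions made for illustration. The skip set mirrors the tag names used elsewhere on this page.

```python
import re
from html.parser import HTMLParser

# Hypothetical, simplified matcher: explicit http(s) URLs only, no fuzzy detection.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
SKIP_TAGS = {"a", "pre", "textarea", "code", "script", "style"}


class TextOnlyLinkifier(HTMLParser):
    """Re-emit markup as parsed and wrap URLs found in text nodes only."""

    def __init__(self):
        # convert_charrefs=False keeps entities separate from text chunks,
        # so text can be written back exactly as it appeared in the input.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # text inside skipped tags is left alone
            return
        pos = 0
        for m in URL_RE.finditer(data):
            url = m.group(0)
            self.out.append(data[pos:m.start()])
            self.out.append(f'<a href="{url}">{url}</a>')
            pos = m.end()
        self.out.append(data[pos:])


def linkify_text_nodes(html):
    parser = TextOnlyLinkifier()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(linkify_text_nodes("<p>See http://example.com and <code>http://x.test</code></p>"))
# => <p>See <a href="http://example.com">http://example.com</a> and <code>http://x.test</code></p>
```

Because tags are passed through untouched and only text chunks are rewritten, existing attributes and links are never matched by the URL pattern. The sketch ignores details a real implementation has to handle, such as trailing punctuation and URLs containing entities.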
Linkify inserts <a href="...">…</a> nodes around matches.
By default, it skips text inside a, pre, textarea, code, script, style.
<template> contents.

Linkify can detect domains containing Unicode characters.
When it generates a link, it normalizes the hostname portion of href using IDNA (punycode).
This keeps the visible link text readable while ensuring the href is ASCII-only.
Example:
from justhtml import JustHTML, Linkify
doc = JustHTML("<p>See bücher.de</p>", fragment=True, transforms=[Linkify()])
print(doc.to_html(pretty=False))
# => <p>See <a href="http://xn--bcher-kva.de">bücher.de</a></p>
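The split between readable link text and an ASCII-only href can be reproduced with Python's built-in `idna` codec. This is a minimal sketch, not JustHTML's code: the helper names `to_ascii_host` and `build_link` are hypothetical, and the standard-library codec implements IDNA 2003, which may differ from what JustHTML uses for some inputs.

```python
from html import escape


def to_ascii_host(host: str) -> str:
    """Return the IDNA (punycode) form of a hostname using the stdlib codec."""
    return host.encode("idna").decode("ascii")


def build_link(host: str, scheme: str = "http") -> str:
    """Build an <a> whose href is ASCII-only while the text stays as typed."""
    href = f"{scheme}://{to_ascii_host(host)}"
    return f'<a href="{escape(href)}">{escape(host)}</a>'


print(to_ascii_host("bücher.de"))
# => xn--bcher-kva.de
print(build_link("bücher.de"))
# => <a href="http://xn--bcher-kva.de">bücher.de</a>
```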
Notes:
http://, https://, ftp://, and protocol-relative //... URLs.

from justhtml import JustHTML, Linkify
doc = JustHTML(
    "<p>See 127.0.0.1 and example.dev</p>",
    transforms=[
        Linkify(
            fuzzy_ip=True,
            extra_tlds={"dev"},
            skip_tags={"a", "pre", "textarea", "code", "script", "style"},
        )
    ],
)
Options:
skip_tags: iterable of tag names to skip (matched case-insensitively).
fuzzy_ip: enable linkifying bare IPv4 addresses like 192.168.0.1.
extra_tlds: additional TLDs to accept for fuzzy domain/email detection.
enabled (default: True): if set to False, Linkify is skipped.

For protocol-less “fuzzy” detection (like example.com or test@example.com), Linkify uses a TLD allowlist to reduce false positives.
This allowlist is not used for links that already include an explicit scheme like http://... (those are accepted regardless of TLD).
Similarly, mailto: links are accepted even when the domain doesn’t have a recognized TLD.
By default, Linkify accepts:
Country-code TLDs (se, uk, de, …).
Punycode TLDs of the form xn--....
The following TLDs: biz, com, edu, gov, net, org, pro, web, xxx, aero, asia, coop, info, museum, name, shop, рф.

If you want fuzzy matching for newer gTLDs (like .dev, .app, .email, …), pass them via extra_tlds:
from justhtml import JustHTML, Linkify
doc = JustHTML(
    "<p>See example.dev and mail me@company.app</p>",
    transforms=[Linkify(extra_tlds={"dev", "app"})],
)
extra_tlds values are compared case-insensitively and should be provided without a leading dot.
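The allowlist rules described above can be approximated in a few lines. This is a sketch under stated assumptions, not JustHTML's matcher: the function name `fuzzy_tld_allowed` is hypothetical, and any two-letter ASCII alphabetic TLD is treated as a country code, which is a simplification.

```python
# Default list as given in this document.
DEFAULT_TLDS = frozenset(
    "biz com edu gov net org pro web xxx aero asia coop info museum name shop рф".split()
)


def fuzzy_tld_allowed(host, extra_tlds=()):
    """Return True if a scheme-less host would pass the TLD allowlist."""
    host = host.rstrip(".")
    if "." not in host:
        return False  # a bare word has no TLD to check
    tld = host.rpartition(".")[2].lower()
    extra = {t.lower() for t in extra_tlds}  # compared case-insensitively
    if tld in DEFAULT_TLDS or tld in extra:
        return True
    if tld.startswith("xn--"):
        return True  # punycode TLD
    # Assumption: every two-letter ASCII TLD counts as a country code.
    return len(tld) == 2 and tld.isascii() and tld.isalpha()


print(fuzzy_tld_allowed("example.com"))           # => True
print(fuzzy_tld_allowed("example.dev"))           # => False
print(fuzzy_tld_allowed("example.dev", {"DEV"}))  # => True
```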
To add attributes to generated links, compose with SetAttrs:
from justhtml import JustHTML, Linkify, SetAttrs
doc = JustHTML(
    "<p>See example.com</p>",
    transforms=[
        Linkify(),
        SetAttrs("a", rel="nofollow", target="_blank"),
    ],
)
Transforms mutate the in-memory DOM. JustHTML(..., sanitize=True) appends a final Sanitize(...) step only when your transform list does not already include Sanitize(). If you include Sanitize() explicitly, that explicit position becomes the sanitize point and later transforms can reintroduce unsafe content.
This matters for Linkify because sanitization policies can remove or rewrite attributes on the generated <a> when the final sanitizer runs:
URLs that the policy disallows in a[href] are stripped (the <a> remains, but href is removed).
A protocol-relative URL such as //example.com is resolved according to policy (default: https://example.com).

If you want Linkify output without any sanitization changes (trusted input only), use sanitize=False and avoid adding Sanitize(...) in transforms.
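The placement rule for the sanitize step can be modeled with stand-in objects. This is only a toy model of the rule as described on this page: `Step` and `effective_pipeline` are hypothetical names and do not exist in JustHTML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Stand-in for a transform; only its name matters in this model."""
    name: str


def effective_pipeline(transforms, *, sanitize):
    """Return the step names in the order they would run."""
    steps = list(transforms)
    has_explicit = any(s.name == "Sanitize" for s in steps)
    if sanitize and not has_explicit:
        steps.append(Step("Sanitize"))  # implicit final sanitize step
    return [s.name for s in steps]


# No explicit Sanitize: one is appended after Linkify and SetAttrs.
print(effective_pipeline([Step("Linkify"), Step("SetAttrs")], sanitize=True))
# => ['Linkify', 'SetAttrs', 'Sanitize']

# Explicit Sanitize first: Linkify output is no longer sanitized afterwards.
print(effective_pipeline([Step("Sanitize"), Step("Linkify")], sanitize=True))
# => ['Sanitize', 'Linkify']
```

In the second ordering, anything Linkify or a later transform adds is not seen by the sanitizer, which is why an explicit Sanitize() is usually placed last.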
JustHTML’s Linkify behavior is validated against the upstream linkify-it fixture suite (MIT licensed).
Fixtures: tests/linkify-it/fixtures/
License: tests/linkify-it/LICENSE.txt