
Linkify

JustHTML’s Linkify transform scans text nodes and wraps detected URLs and email addresses in <a> elements.

This is a DOM transform (it does not operate on raw HTML strings), so it never rewrites tag soup or breaks markup.
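The following toy, built on Python's standard html.parser, shows what "scanning text nodes" means in practice. Tags and attribute values are echoed exactly as written, and only character data outside skipped elements is rewritten. ToyLinkifier, toy_linkify, and the .com/.org/.net-only pattern are hypothetical simplifications. JustHTML builds and mutates a real DOM rather than re-serializing a token stream, so treat this only as an illustration of the principle. The default skip set mirrors the skip_tags value shown in the Configuration section.

```python
import re
from html.parser import HTMLParser

# Toy pattern (assumption): bare domains ending in .com, .org, or .net only.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|org|net)\b", re.IGNORECASE)


class ToyLinkifier(HTMLParser):
    """Hypothetical sketch: rewrite text content only, echo markup unchanged."""

    def __init__(self, skip_tags=("a", "pre", "textarea", "code", "script", "style")):
        # convert_charrefs=False keeps entity references out of handle_data,
        # so text can be re-emitted in its original (already escaped) form.
        super().__init__(convert_charrefs=False)
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # attributes are never touched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # inside a skipped element: leave text alone
            return
        # Outside skipped elements: wrap each detected domain in an <a>.
        self.out.append(DOMAIN_RE.sub(
            lambda m: f'<a href="http://{m.group()}">{m.group()}</a>', data))


def toy_linkify(markup, **kwargs):
    parser = ToyLinkifier(**kwargs)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(toy_linkify("<p>See example.com</p>"))
# <p>See <a href="http://example.com">example.com</a></p>
print(toy_linkify('<p><code>example.com</code> <img alt="example.org"></p>'))
# <p><code>example.com</code> <img alt="example.org"></p>
```

The second call shows why a text-node approach is safer than a regex over the raw HTML string: the domain inside the alt attribute is never seen as text, and the one inside <code> is skipped.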

Quickstart

from justhtml import JustHTML, Linkify

doc = JustHTML("<p>See example.com</p>", fragment=True, transforms=[Linkify()])
print(doc.to_html(pretty=False))
# => <p>See <a href="http://example.com">example.com</a></p>

Behavior

Unicode and punycode (IDNA)

Linkify can detect domains containing Unicode characters.

When it generates a link, it normalizes the hostname portion of the href using IDNA (punycode). This keeps the visible link text readable while ensuring that the href is ASCII-only.

Example:

from justhtml import JustHTML, Linkify

doc = JustHTML("<p>See bücher.de</p>", fragment=True, transforms=[Linkify()])
print(doc.to_html(pretty=False))
# => <p>See <a href="http://xn--bcher-kva.de">bücher.de</a></p>
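The hostname normalization described above can be sketched with Python's built-in "idna" codec. This is an illustration of the technique, not JustHTML's code: to_ascii_href is a hypothetical helper, and the standard-library codec implements IDNA 2003, which can differ from IDNA 2008 for a few characters. Only the hostname is encoded; the scheme, port, path, and query pass through unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_href(url: str) -> str:
    """Hypothetical helper: return url with its hostname in IDNA (punycode) form.

    Uses Python's built-in "idna" codec (IDNA 2003). In this sketch the
    hostname is lowercased by urlsplit() and any userinfo (user:pass@) is dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Encode each label; plain ASCII labels pass through unchanged.
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_ascii_href("http://bücher.de"))             # http://xn--bcher-kva.de
print(to_ascii_href("http://bücher.de:8080/a?q=1"))  # http://xn--bcher-kva.de:8080/a?q=1
```

The visible link text stays bücher.de; only the href is rewritten.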

Notes:

Configuration

from justhtml import JustHTML, Linkify

doc = JustHTML(
    "<p>See 127.0.0.1 and example.dev</p>",
    transforms=[
        Linkify(
            fuzzy_ip=True,
            extra_tlds={"dev"},
            skip_tags={"a", "pre", "textarea", "code", "script", "style"},
        )
    ],
)

Options:

Fuzzy domains and TLD allowlist

For protocol-less “fuzzy” detection (like example.com or test@example.com), Linkify uses a TLD allowlist to reduce false positives.

This allowlist is not used for links that already include an explicit scheme like http://... (those are accepted regardless of TLD). Similarly, mailto: links are accepted even when the domain doesn’t have a recognized TLD.
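The acceptance rules above can be modeled with a toy matcher. This is a deliberately simplified sketch, not Linkify's implementation: find_links and DEFAULT_TLDS are hypothetical, the four-entry allowlist is a placeholder rather than Linkify's real default list, and the ASCII-only patterns ignore Unicode domains, trailing punctuation, and IP addresses. Explicit http(s):// and mailto: links are accepted regardless of TLD, while protocol-less domains and emails must end in an allowed TLD; extra_tlds is lowercased to mirror the case-insensitive comparison.

```python
import re

# Placeholder allowlist for this sketch only; NOT Linkify's actual default set.
DEFAULT_TLDS = frozenset({"com", "org", "net", "de"})

# Alternatives are tried left to right at each position:
# 1) explicit scheme, 2) protocol-less email, 3) protocol-less ("fuzzy") domain.
LINK_RE = re.compile(
    r"(?P<url>(?:https?://|mailto:)[^\s<>]+)"
    r"|(?P<email>[\w.+-]+@(?:[a-z0-9-]+\.)+(?P<email_tld>[a-z]{2,})\b)"
    r"|(?P<fuzzy>\b(?:[a-z0-9-]+\.)+(?P<fuzzy_tld>[a-z]{2,})\b)",
    re.IGNORECASE,
)


def find_links(text, extra_tlds=()):
    """Hypothetical toy matcher returning (link_text, href) pairs."""
    # extra_tlds are given without a leading dot and compared case-insensitively.
    allowed = DEFAULT_TLDS | {tld.lower() for tld in extra_tlds}
    links = []
    for m in LINK_RE.finditer(text):
        if m.group("url"):
            # Explicit scheme: accepted regardless of TLD.
            links.append((m.group("url"), m.group("url")))
        elif m.group("email"):
            # Protocol-less email: the domain's TLD must be allowlisted.
            if m.group("email_tld").lower() in allowed:
                links.append((m.group("email"), "mailto:" + m.group("email")))
        elif m.group("fuzzy_tld").lower() in allowed:
            # Protocol-less domain: the TLD must be allowlisted.
            links.append((m.group("fuzzy"), "http://" + m.group("fuzzy")))
    return links


print(find_links("See example.com and example.dev"))
# [('example.com', 'http://example.com')]
print(find_links("Go to http://intranet.local"))
# [('http://intranet.local', 'http://intranet.local')]
```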

Default accepted TLDs

By default, Linkify accepts:

Adding extra TLDs

If you want fuzzy matching for newer gTLDs (like .dev, .app, .email, …), pass them via extra_tlds:

from justhtml import JustHTML, Linkify

doc = JustHTML(
    "<p>See example.dev and mail me@company.app</p>",
    transforms=[Linkify(extra_tlds={"dev", "app"})],
)

extra_tlds values are compared case-insensitively and should be provided without a leading dot.

Composing with other transforms

To add attributes to generated links, compose with SetAttrs:

from justhtml import JustHTML, Linkify, SetAttrs

doc = JustHTML(
    "<p>See example.com</p>",
    transforms=[
        Linkify(),
        SetAttrs("a", rel="nofollow", target="_blank"),
    ],
)
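Assuming transforms run in list order (as the sanitization notes imply), SetAttrs only sees the <a> elements that Linkify has already created. The sketch below mimics that effect using the standard library's xml.etree.ElementTree as a stand-in DOM and a hypothetical set_attrs helper; it is not how SetAttrs is implemented.

```python
import xml.etree.ElementTree as ET


def set_attrs(root, tag, **attrs):
    """Hypothetical stand-in for SetAttrs: set attrs on every `tag` element."""
    for el in root.iter(tag):  # iter() also yields root itself if it matches
        for name, value in attrs.items():
            el.set(name, value)
    return root


# Tree as it looks after Linkify has run (markup from the Quickstart output).
linked = ET.fromstring('<p>See <a href="http://example.com">example.com</a></p>')
set_attrs(linked, "a", rel="nofollow", target="_blank")
print(ET.tostring(linked, encoding="unicode"))
# <p>See <a href="http://example.com" rel="nofollow" target="_blank">example.com</a></p>

# The same step on a tree that has not been linkified yet finds no <a> to change.
plain = ET.fromstring("<p>See example.com</p>")
set_attrs(plain, "a", rel="nofollow", target="_blank")
print(ET.tostring(plain, encoding="unicode"))
# <p>See example.com</p>
```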

Interaction with sanitization

Transforms mutate the in-memory DOM. JustHTML(..., sanitize=True) appends a final Sanitize(...) step only when your transform list does not already include Sanitize(). If you include Sanitize() explicitly, that position becomes the sanitize point, and any transforms listed after it run on already-sanitized output and can reintroduce unsafe content.

This matters for Linkify because sanitization policies can remove or rewrite attributes on the generated <a> when the final sanitizer runs:

If you want Linkify output without any sanitization changes (trusted input only), use sanitize=False and avoid adding Sanitize(...) in transforms.
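The ordering rule for the sanitize step can be modeled in a few lines. Step, effective_pipeline, and runs_after_sanitize are hypothetical names that capture only the rule as stated in this section (an implicit Sanitize is appended only when none is listed, and anything after the sanitize point is unsanitized); they are not JustHTML's API and say nothing about what the sanitizer actually removes.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical stand-in for a transform; only its name matters here."""
    name: str


def effective_pipeline(transforms, sanitize):
    """Model the rule: append a final Sanitize only if the list lacks one."""
    steps = list(transforms)
    if sanitize and not any(step.name == "Sanitize" for step in steps):
        steps.append(Step("Sanitize"))
    return steps


def runs_after_sanitize(steps):
    """Names of steps whose output is never sanitized (after the first Sanitize)."""
    names = [step.name for step in steps]
    if "Sanitize" not in names:
        return names  # no sanitize point at all
    return names[names.index("Sanitize") + 1:]


implicit = effective_pipeline([Step("Linkify"), Step("SetAttrs")], sanitize=True)
print([s.name for s in implicit], runs_after_sanitize(implicit))
# ['Linkify', 'SetAttrs', 'Sanitize'] []

explicit = effective_pipeline([Step("Sanitize"), Step("Linkify")], sanitize=True)
print([s.name for s in explicit], runs_after_sanitize(explicit))
# ['Sanitize', 'Linkify'] ['Linkify']
```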

Provenance

JustHTML’s Linkify behavior is validated against the upstream linkify-it fixture suite (MIT licensed).