Fragment Parsing

Parse HTML fragments as if they were inserted into a specific context element. This is essential for WYSIWYG editors, sanitizers, and template systems.

Why Fragment Parsing?

HTML parsing rules depend on context. The same markup can produce different results depending on where it appears:

from justhtml import JustHTML

# "<tr>" at document level gets moved outside tables
doc = JustHTML("<tr><td>cell</td></tr>")
print(doc.to_html(indent_size=4))

Output:

<html>
    <head></head>
    <body>cell</body>
</html>

But if we tell the parser this HTML will be inserted into a <tbody>:

from justhtml import JustHTML
from justhtml.context import FragmentContext

html = "<tr><td>cell</td></tr>"
ctx = FragmentContext("tbody")
doc = JustHTML(html, fragment_context=ctx)
print(doc.to_html(indent_size=4))

Output:

<tr>
    <td>cell</td>
</tr>

Basic Usage

from justhtml import JustHTML

# Parse as if inside a <div> (common for WYSIWYG editors)
html = "<p>User's <b>content</b></p>"
doc = JustHTML(html, fragment=True)

# The root is #document-fragment, not #document
print(doc.root.name)  # "#document-fragment"

# No implicit <html>, <head>, or <body> are inserted
print(doc.to_html())

# Query and serialize work normally
paragraphs = doc.query("p")

Output:

#document-fragment
<p>User's <b>content</b></p>

Common Use Cases

WYSIWYG Editor Content

When users edit HTML in a rich text editor, the content will typically be inserted into a container like <div> or <article>:

# User's editor content
user_html = "<p>Hello</p><ul><li>Item 1</li><li>Item 2</li></ul>"

# Parse as fragment inside a div
doc = JustHTML(user_html, fragment=True)

# Sanitize, transform, or validate...
clean_html = doc.to_html()

Table Cell Content

Parsing content that will go inside a table cell:

cell_content = "Price: <b>$99</b>"
ctx = FragmentContext("td")
doc = JustHTML(cell_content, fragment_context=ctx)

List Item Content

item_html = "Buy <a href='/milk'>milk</a>"
ctx = FragmentContext("li")
doc = JustHTML(item_html, fragment_context=ctx)

Select Options

options_html = "<option>Red</option><option>Blue</option>"
ctx = FragmentContext("select")
doc = JustHTML(options_html, fragment_context=ctx)

Context Elements

The context element affects parsing rules:

Context	Use Case
`div`, `article`, `section`	General HTML content
`tbody`, `thead`, `tfoot`	Table rows
`tr`	Table cells
`td`, `th`	Cell content
`ul`, `ol`	List items
`select`	Option elements
`textarea`	Raw text (no HTML parsing)
`title`	Raw text (no HTML parsing)

Raw Text Contexts

Some elements treat their content as raw text:

from justhtml import JustHTML
from justhtml.context import FragmentContext

# Content in <textarea> is not parsed as HTML
ctx = FragmentContext("textarea")
doc = JustHTML("<b>not bold</b>", fragment_context=ctx)
print(doc.to_html())

Output:

&lt;b&gt;not bold&lt;/b&gt;

SVG and MathML Fragments

Parse fragments in foreign namespaces:

# SVG fragment
svg_content = "<circle cx='50' cy='50' r='40'/>"
ctx = FragmentContext("svg", namespace="svg")
doc = JustHTML(svg_content, fragment_context=ctx)

# MathML fragment
math_content = "<mi>x</mi><mo>=</mo><mn>5</mn>"
ctx = FragmentContext("math", namespace="math")
doc = JustHTML(math_content, fragment_context=ctx)

FragmentContext API

from justhtml.context import FragmentContext

# HTML namespace (default)
ctx = FragmentContext("div")
ctx = FragmentContext("div", namespace=None)

# SVG namespace
ctx = FragmentContext("svg", namespace="svg")

# MathML namespace
ctx = FragmentContext("math", namespace="math")

Properties

Property	Type	Description
`tag_name`	`str`	The context element’s tag name
`namespace`	`str \\| None`	`None` for HTML, `"svg"` for SVG, `"math"` for MathML

Document vs Fragment

Feature	`JustHTML(html)`	`JustHTML(html, fragment_context=ctx)`
Root node	`#document`	`#document-fragment`
Implicit elements	`<html>`, `<head>`, `<body>` auto-inserted	None - only your content
Children	Full document structure	Just the parsed content
Use case	Complete HTML documents	Partial HTML snippets

Example: HTML Sanitizer

from justhtml import JustHTML
from justhtml.context import FragmentContext

from justhtml import Sanitize, SanitizationPolicy, UrlPolicy, UrlRule

def sanitize_fragment(html: str) -> str:
    """Sanitize user HTML as if it was inserted into a <div>."""
    ctx = FragmentContext("div")
    policy = SanitizationPolicy(
        allowed_tags={"p", "b", "i", "a", "ul", "ol", "li"},
        allowed_attributes={"*": [], "a": ["href"]},
        url_policy=UrlPolicy(
            default_allow_relative=True,
            allow_rules={("a", "href"): UrlRule(allowed_schemes=["https"])},
        ),
    )

    # Sanitize the in-memory DOM by applying a Sanitize transform.
    # Put Sanitize at the end if you want a sanitized DOM.
    doc = JustHTML(html, fragment_context=ctx, transforms=[Sanitize(policy)])
    return doc.to_html(pretty=False)

# Usage
dirty = '<p>Hello</p><script>alert("xss")</script><b>world</b>'
print(sanitize_fragment(dirty))

Output:

<p>Hello</p><b>world</b>

This site is open source. Improve this page.