Comparison

Use JustHTML when you want browser-grade HTML parsing, safe-by-default sanitization, CSS selectors, transforms, text extraction, and serialization in one pure-Python package.

Use a different tool when one narrow requirement matters more than the whole pipeline: maximum throughput, a BeautifulSoup-specific API, XPath-heavy XML work, or integration with an existing lxml tree.

At a Glance

| Tool | HTML5 parsing [1] | Speed | Query | Build | Sanitize | Notes |
|------|-------------------|-------|-------|-------|----------|-------|
| JustHTML (Pure Python) | ✅ 100% | ⚡ Fast | ✅ CSS selectors | element() | ✅ Built-in | Correct, secure, easy to install, and fast enough. |
| selectolax (Python wrapper of C-based Lexbor) | ✅ 100% | 🚀 Very Fast | ✅ CSS selectors | create_node() | ❌ Needs sanitization | Very fast and spec-compliant. |
| Chromium (browser engine) | ✅ 99.5% | 🚀 Very Fast | | | | |
| WebKit (browser engine) | ✅ 98.4% | 🚀 Very Fast | | | | |
| Firefox (browser engine) | ✅ 97.6% | 🚀 Very Fast | | | | |
| markupever (Python wrapper of Rust-based html5ever) | 🟡 89% | 🚀 Very Fast | ✅ CSS selectors | TreeDom .create_*() | ❌ Needs sanitization | Fast and mostly correct, but missing benchmarked capabilities count against compliance. |
| html5lib (Pure Python) | 🟡 86% | 🐢 Slow | 🟡 XPath (lxml) | 🟡 Tree API | 🔴 Deprecated | Unmaintained reference implementation; incomplete coverage of the tree-construction fixtures. |
| html5_parser (Python wrapper of C-based Gumbo) | 🔴 49% | 🚀 Very Fast | 🟡 XPath (lxml) | 🟡 etree (lxml) | ❌ Needs sanitization | Fast, but its public tree API loses information needed by many fixtures. |
| BeautifulSoup (Pure Python) | 🔴 <1% (default) | 🐢 Slow | 🟡 Custom API | new_tag() API | ❌ Needs sanitization | Wraps html.parser (default). Can use lxml or html5lib. |
| html.parser (Python stdlib) | 🔴 <1% | ⚡ Fast | ❌ None | ❌ None | ❌ Needs sanitization | Standard library. Chokes on malformed HTML. |
| lxml (Python wrapper of C-based libxml2) | 🔴 <1% | 🚀 Very Fast | 🟡 XPath | etree / E-factory | ❌ Needs sanitization | Fast but not HTML5 compliant. Context-fragment cases are skipped; supported cases still perform poorly. Don’t use the old lxml.html.clean module! |

Why JustHTML

Most Python HTML projects start simple and then accumulate extra tools for parsing, sanitizing, querying, transforming, and serializing.

JustHTML keeps those operations on one DOM. That makes the behavior easier to reason about, especially when the input is untrusted.

from justhtml import JustHTML

doc = JustHTML("<p>Hello<script>alert(1)</script><a href='javascript:x'>link</a></p>", fragment=True)

print(doc.to_html(pretty=False))
# <p>Hello<a>link</a></p>

Sanitization happens before you query or serialize unless you explicitly disable it with sanitize=False.

When to Choose Another Tool

Choose selectolax when raw speed is the main requirement and the HTML is trusted or sanitized elsewhere.

Choose markupever or html5_parser when you specifically want their underlying parser engines or tree APIs and can accept their compatibility tradeoffs.

Choose BeautifulSoup when you want its forgiving, familiar scraping API and parser correctness is not the main risk.

Choose lxml when your project is already built around XPath, etree, or XML-style processing.

Choose nh3 when you only need fast sanitization and are happy with a Rust-backed dependency.

Choose html.parser when you need a tiny stdlib-only script for trusted input and HTML5 correctness does not matter.
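For the stdlib-only case, a minimal sketch might look like the following. `LinkCollector` and `extract_links` are illustrative names, not part of any library, and the input is assumed to be trusted and reasonably well-formed, since html.parser does not apply HTML5 error recovery.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return every href found on an <a> tag, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_links('<p><a href="/docs">Docs</a> and <a href="/api">API</a></p>'))
# ['/docs', '/api']
```

Nothing here sanitizes or repairs the markup, which is why this route only fits trusted input.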

Choose Bleach only for existing codebases that already depend on it. For new projects, prefer an actively maintained sanitizer. See Migrating from Bleach.

Tradeoffs

JustHTML is pure Python. That makes it easy to install, inspect, debug, and run in environments like Pyodide, but it will not beat C or Rust parsers on raw throughput.

JustHTML sanitizes HTML output by default. That is the right default for user-generated content, CMS snippets, comments, scraped fragments, and transform pipelines that eventually return to a browser. If all of your input is trusted, pass sanitize=False.

JustHTML’s sanitizer emits HTML-only output. SVG and MathML can still be parsed when sanitization is disabled, but sanitized output drops foreign-namespace content to keep the security model smaller and more reviewable.