Use JustHTML when you want browser-grade HTML parsing, safe-by-default sanitization, CSS selectors, transforms, text extraction, and serialization in one pure-Python package.
Use a different tool when one narrow requirement matters more than the whole pipeline: maximum throughput, a BeautifulSoup-specific API, XPath-heavy XML work, or integration with an existing lxml tree.
| Tool | HTML5 parsing¹ | Speed | Query | Build | Sanitize | Notes |
|---|---|---|---|---|---|---|
| JustHTML (pure Python) | ✅ 100% | ⚡ Fast | ✅ CSS selectors | ✅ `element()` | ✅ Built-in | Correct, secure, easy to install, and fast enough. |
| selectolax (Python wrapper of C-based Lexbor) | ✅ 100% | 🚀 Very Fast | ✅ CSS selectors | ✅ `create_node()` | ❌ Needs sanitization | Very fast and spec-compliant. |
| Chromium (browser engine) | ✅ 99.5% | 🚀 Very Fast | — | — | — | — |
| WebKit (browser engine) | ✅ 98.4% | 🚀 Very Fast | — | — | — | — |
| Firefox (browser engine) | ✅ 97.6% | 🚀 Very Fast | — | — | — | — |
| markupever (Python wrapper of Rust-based html5ever) | 🟡 89% | 🚀 Very Fast | ✅ CSS selectors | ✅ `TreeDom.create_*()` | ❌ Needs sanitization | Fast and mostly correct; capabilities it lacks are scored as failures in the compliance benchmark. |
| html5lib (pure Python) | 🟡 86% | 🐢 Slow | 🟡 XPath (lxml) | 🟡 Tree API | 🔴 Deprecated | Unmaintained reference implementation; does not pass all tree-construction fixtures. |
| html5_parser (Python wrapper of C-based Gumbo) | 🔴 49% | 🚀 Very Fast | 🟡 XPath (lxml) | 🟡 etree (lxml) | ❌ Needs sanitization | Fast, but its public tree API loses information needed by many fixtures. |
| BeautifulSoup (pure Python) | 🔴 <1% (default) | 🐢 Slow | 🟡 Custom API | ✅ `new_tag()` API | ❌ Needs sanitization | Wraps `html.parser` by default; can also use lxml or html5lib. |
| html.parser (Python stdlib) | 🔴 <1% | ⚡ Fast | ❌ None | ❌ None | ❌ Needs sanitization | Standard library. Chokes on malformed HTML. |
| lxml (Python wrapper of C-based libxml2) | 🔴 <1% | 🚀 Very Fast | 🟡 XPath | ✅ etree / E-factory | ❌ Needs sanitization | Fast but not HTML5 compliant. Context-fragment cases are skipped; supported cases still perform poorly. Don't use the old `lxml.html.clean` module! |
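One reason `html.parser` scores so low is that it is an event-based tokenizer: it reports tags in source order and never applies the HTML5 tree-construction rules that repair misnested or unclosed markup. The sketch below uses only the standard library; `EventRecorder` and `events` are hypothetical helper names chosen for illustration.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Collect the raw events html.parser reports, in source order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def events(markup):
    recorder = EventRecorder()
    recorder.feed(markup)
    recorder.close()  # flush any buffered text
    return recorder.events


# Misnested formatting tags: an HTML5 parser rebuilds this as
# <b><i>x</i></b><i>y</i>; html.parser reports the tags exactly as written.
print(events("<b><i>x</b>y</i>"))

# In HTML5 a second <p> implicitly closes the first one;
# html.parser emits no end-tag event for it.
print(events("<p>one<p>two"))
```

Any tree you build on top of these events has to reimplement those repair rules itself. That is the work an HTML5-compliant parser already does.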
Most Python HTML projects start simple and then accumulate extra tools: one library to parse, another to query, another to sanitize, and another to serialize.
JustHTML keeps those operations on one DOM. That makes the behavior easier to reason about, especially when the input is untrusted.
```python
from justhtml import JustHTML

doc = JustHTML("<p>Hello<script>alert(1)</script><a href='javascript:x'>link</a></p>", fragment=True)
print(doc.to_html(pretty=False))
# <p>Hello<a>link</a></p>
```
Sanitization happens before you query or serialize unless you explicitly disable it with `sanitize=False`.
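To make "sanitize the DOM before serializing" concrete, here is a toy allowlist sanitizer over a hand-built tree that reproduces the output of the example above. It is a hypothetical sketch, not JustHTML's implementation or API: `Node`, `sanitize`, `to_html`, `safe_url`, and the tiny policy sets are invented for illustration. A real sanitizer needs a spec-compliant parse and a far more careful policy, so do not use this on untrusted input.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Node:
    """Hypothetical stand-in for a parsed DOM node."""
    tag: Optional[str]                         # None marks a text node
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""


# Hypothetical allowlist policy, far smaller than any real sanitizer's.
ALLOWED_TAGS = {"p", "a", "b", "i"}
DROP_WITH_CONTENT = {"script", "style"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL


def safe_url(value):
    try:
        scheme = urlsplit(value.strip()).scheme.lower()
    except ValueError:
        return False  # unparseable URL: reject it
    return scheme in SAFE_SCHEMES


def sanitize(node):
    """Return the list of nodes that should replace `node` in the output."""
    if node.tag is None:
        return [node]  # text is escaped later, at serialization time
    if node.tag in DROP_WITH_CONTENT:
        return []      # drop the element and everything inside it
    kids = [out for child in node.children for out in sanitize(child)]
    if node.tag not in ALLOWED_TAGS:
        return kids    # unwrap unknown elements, keep their sanitized children
    attrs = {}
    for name, value in node.attrs.items():
        if name not in ALLOWED_ATTRS.get(node.tag, set()):
            continue   # attribute not on the allowlist
        if name == "href" and not safe_url(value):
            continue   # e.g. javascript: URLs
        attrs[name] = value
    return [Node(node.tag, attrs, kids)]


def to_html(node):
    """Serialize a node, escaping text and attribute values."""
    if node.tag is None:
        return escape(node.text, quote=False)
    attrs = "".join(f' {k}="{escape(v)}"' for k, v in node.attrs.items())
    inner = "".join(to_html(child) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


def text(s):
    return Node(None, text=s)


tree = Node("p", children=[
    text("Hello"),
    Node("script", children=[text("alert(1)")]),
    Node("a", {"href": "javascript:x"}, [text("link")]),
])
print("".join(to_html(n) for n in sanitize(tree)))
# <p>Hello<a>link</a></p>
```

Because the policy runs on the tree and not on the raw string, every later step (querying, text extraction, serialization) sees only what survived the allowlist.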
- Choose selectolax when raw speed is the main requirement and the HTML is trusted or sanitized elsewhere.
- Choose markupever or html5_parser when you specifically want their underlying parser engines or tree APIs and can accept their compatibility tradeoffs.
- Choose BeautifulSoup when you want its forgiving, familiar scraping API and parser correctness is not the main risk.
- Choose lxml when your project is already built around XPath, etree, or XML-style processing.
- Choose nh3 when you only need fast sanitization and are happy with a Rust-backed dependency.
- Choose `html.parser` when you need a tiny stdlib-only script for trusted input and HTML5 correctness does not matter.
- Choose Bleach only for existing codebases that already depend on it. For new projects, prefer an actively maintained sanitizer. See Migrating from Bleach.
JustHTML is pure Python. That makes it easy to install, inspect, debug, and run in environments like Pyodide, but it will not beat C or Rust parsers on raw throughput.
JustHTML sanitizes HTML output by default. That is the right default for user-generated content, CMS snippets, comments, scraped fragments, and transform pipelines that eventually return to a browser. If all of your input is trusted, pass `sanitize=False`.
JustHTML’s sanitizer emits HTML-only output. SVG and MathML can still be parsed when sanitization is disabled, but sanitized output drops foreign-namespace content to keep the security model smaller and more reviewable.