JustHTML reports a deliberately small set of high-value parser diagnostics. It does not attempt to reproduce every error emitted by the WHATWG reference parser. Tree recovery and output remain standards-compatible even when no diagnostic is reported.
Error collection is disabled by default because the additional diagnostic scan has a performance cost.
from justhtml import JustHTML
doc = JustHTML("<!doctype html><!--", collect_errors=True)
for error in doc.errors:
print(f"{error.line}:{error.column} - {error.category}:{error.code}")
doc.errors is ordered by source position (line, column), with unknown
positions appearing last.
Each error has a category field:
tokenizer: lexical/scanning errorstreebuilder: basic document-structure errorssecurity: sanitizer findings when unsafe_handling="collect"transform: errors explicitly emitted by a custom transformCustom transforms may emit application-specific codes in addition to the built-in codes listed below.
from justhtml import JustHTML, StrictModeError
try:
doc = JustHTML("<!doctype html><!--", strict=True)
except StrictModeError as error:
print(error)
Strict mode enables error collection and raises on the earliest supported parser diagnostic. It is not a complete HTML conformance validator: malformed input that the parser can recover from may not have a corresponding diagnostic.
The codes listed on this page are the supported built-in diagnostic set. Exact
locations are best-effort, and new codes may be added in minor releases. Code
that needs stable behavior should match on error.code, not the complete
message or the exact list produced for a complex malformed document.
| Code | Description |
|---|---|
eof-in-comment |
Input ended inside an HTML comment |
eof-in-tag |
Input ended inside a start or end tag |
unexpected-null-character |
Input contains a NULL character (U+0000) |
| Code | Description |
|---|---|
expected-doctype-but-got-chars |
A document started with text instead of a DOCTYPE |
expected-doctype-but-got-start-tag |
A document started with an element instead of a DOCTYPE |
unexpected-end-tag |
An end tag had no matching open element |
unknown-doctype |
The DOCTYPE is not the standard HTML DOCTYPE |
These diagnostics intentionally omit detailed recovery events such as foster-parenting and adoption-agency repairs. Those events are implementation details of successful tree construction and reproducing a legacy parser’s diagnostic stream would add substantial duplicate work.
Security errors are reported by the sanitizer when its unsafe handling mode is
collect.
| Code | Description |
|---|---|
unsafe-html |
Content violated the sanitization policy; details are in error.message |
unsafe-rawtext-child |
A non-text child inside <script> or <style> was removed |
unsafe-rawtext-end-tag |
A dangerous closing-tag sequence in raw text was neutralized |
unsafe-style-resource |
Resource-loading CSS in a <style> element was removed |
Node locations are separate from parse errors and disabled by default. Enable
them with track_node_locations=True:
from justhtml import JustHTML
doc = JustHTML("<p>hi</p>", track_node_locations=True)
p = doc.query("p")[0]
print(p.origin_location) # (1, 1)
print(p.origin_line) # 1
print(p.origin_col) # 1
print(p.origin_offset) # 0
Locations are best-effort. Nodes created or moved during tree recovery retain the location of the token that created them, or the closest available source position.