← Back to docs

Command Line Interface

JustHTML ships with a small CLI for parsing HTML and extracting HTML/text/Markdown from selected parts of a document.

Running

If you installed JustHTML (for example with pip install justhtml or pip install -e .), you can use the justhtml command.

If the justhtml command is not available on your PATH, use the equivalent python -m justhtml ... form.
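If you script against the CLI on machines where the entry point may or may not be installed, a small wrapper can pick between the two forms. This is a minimal sketch, not part of JustHTML: jh_cmd and jh are hypothetical helper names, and the only assumption is that python -m justhtml accepts the same arguments as the justhtml command.

```shell
# Print the command to use: the installed `justhtml` entry point if it is
# on PATH, otherwise the module form. (Hypothetical helper, not part of JustHTML.)
jh_cmd() {
    if command -v justhtml >/dev/null 2>&1; then
        echo "justhtml"
    else
        echo "python -m justhtml"
    fi
}

# Run JustHTML through whichever form is available, forwarding all arguments.
jh() {
    # Left unquoted on purpose so "python -m justhtml" splits into words.
    $(jh_cmd) "$@"
}
```

With these defined, jh page.html --selector "p" --format text behaves like the examples in this document regardless of which form is available.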

Basic usage

# Pretty-print an HTML file
justhtml page.html

# Read HTML from stdin
curl -s https://example.com | justhtml -

Selecting nodes

Use --selector to choose which nodes to extract.

# Extract text from all paragraphs
justhtml page.html --selector "p" --format text

# Only output the first match
justhtml page.html --selector "main p" --format text --first

Fragments

Use --fragment to parse the input as an HTML fragment (instead of a full document). This avoids implicit <html>, <head>, and <body> insertion.

echo '<li>Hi</li>' | justhtml - --fragment
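To see what --fragment changes, it can help to run the same snippet through both modes and compare the results. The sketch below only uses flags described in this document; compare_fragment is a hypothetical helper name, and no particular output is assumed.

```shell
# Print the same snippet parsed as a full document, then as a fragment.
# (Hypothetical helper for illustration; the snippet is passed as $1.)
compare_fragment() {
    echo "== document =="
    printf '%s\n' "$1" | justhtml -
    echo "== fragment =="
    printf '%s\n' "$1" | justhtml - --fragment
}
```

For example, compare_fragment '<li>Hi</li>' prints both renderings one after the other.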

Output formats

--format controls what is printed. The examples in this document use three values: html, text, and markdown.

Notes:

Sanitization

By default, the CLI sanitizes output (same safe-by-default behavior as JustHTML(..., sanitize=True)).

To disable sanitization for trusted input, pass --unsafe.
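A quick way to check what sanitization removes from a given input is to print the default output next to the --unsafe output. This is a sketch under stated assumptions: compare_sanitize is a hypothetical helper name, the sample input in the usage note is made up, and the exact output depends on the default policy, so none is shown here.

```shell
# Print a fragment with default sanitization, then with sanitization disabled.
# Use --unsafe only on input you trust. (Hypothetical helper for illustration.)
compare_sanitize() {
    echo "== sanitized (default) =="
    printf '%s\n' "$1" | justhtml - --fragment --format html
    echo "== unsanitized =="
    printf '%s\n' "$1" | justhtml - --fragment --format html --unsafe
}
```

For example, compare_sanitize '<p onclick="alert(1)">Hello</p>' shows how an inline event handler is treated in each mode.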

Allow extra tags

In safe mode, you can allow additional tags via --allow-tags (comma-separated). The listed tags are added to the default policy, which differs between document and fragment mode.

Example:

justhtml page.html --selector "article" --allow-tags article,section --format markdown

Cleanup

--cleanup removes common unhelpful output artifacts:

This is useful when sanitization has stripped attributes and left behind empty tags.

curl -s https://example.com | justhtml - --format html --cleanup

Text options

When using --format text, you can control whitespace handling with options such as --separator and --no-strip:

Example:

justhtml page.html --selector "main" --format text --separator "" --no-strip

Exit codes

Real-world example

curl -s https://github.com/EmilStenstrom/justhtml/ | justhtml - --selector '.markdown-body' --format markdown | head -n 15

Output:

# JustHTML

[](#justhtml)

A pure Python HTML5 parser that just works. No C extensions to compile. No system dependencies to install. No complex API to learn.

**[📖 Read the full documentation here](/EmilStenstrom/justhtml/blob/main/docs/index.md)**

## Why use JustHTML?

[](#why-use-justhtml)

### 1. Just... Correct ✅

[](#1-just-correct-)