clarity's wiki

Site Generator

A tool for generating static html and gemtext pages from markdown-like source documents.

Source code

By keeping the functionality simple and the code structure straightforward, adding and removing features becomes an inviting task.

> sitegen help

sitegen

usage:
  sitegen make [-psP] [<options>...] <out_dir> [<site_dir>]
  sitegen index [-plua] [<options>...]
  sitegen docs
  sitegen html_template
  sitegen gmi_template

commands:
  make           The main generation tool
  index          Outputs site indeces in various formats
  docs           Print the sitegen documentation in sitegen format
  html_template  Print the default html template
  gmi_template   Print the default gmi template

Building the site

> sitegen make --help

sitegen make

usage:
  sitegen make [-psP] [<options>...] <out_dir> [<site_dir>]

arguments:
  out_dir   the output folder for all generated content
  site_dir  the input folder for the site itself, defaults to cwd

options:
  --help                    show this text
  -p, --private             include private content in the build
  --html <template>         render html output with the template
  --gmi <template>          render gmi output with the template
  -s, --silent              don't output filenames as they are rendered
  -P <path>, --path <path>  appended to the PATH when running shell commands

The site generator will generate /html and /gmi directories inside of the given out directories. My publishing process looks something like:

rm -rf out
mkdir -p out/html out/gmi
cp -r public-html/* out/html/
cp -r public/* out/html/
cp -r public/* out/gmi
sitegen make out site && \
scp -r out/html/* [email protected]:/path/to/http/server &&\
scp -r out/gmi/* [email protected]:/path/to/gmi/server

Please note that these documents are full-fledged programs. Running the site generator on a document that you haven't vetted yourself is like running a shell script you got off the internet without making sure you know what it does. Use at your own risk.

Indexing

> sitegen index --help

sitegen index

usage:
  sitegen index [-plua] [<options>...]

options:
  -p, --private            include private content in the list
  -l <num>, --limit <num>  limit the number of entries to the top num
  -u, --updates            only include updates (and not additions)
  -a, --additions          only include additions (and not updates)

Indexing allows you to create lists of files inside of your site, which you can combine with the ability to run arbitrary commands inside of documents to produce in-page tables of contents.

> sitegen index --limit 5 *

Document format

Sitegen documents are UTF8-encoded .txt files. The first few lines of the file are the title and metadata, broken by an empty line, followed by its contents.

Info

The beginning of a document starts with the title and the date it was written, followed by optional metadata information.

Site Generator
Written 2021-02-04

You can keep track of the history of updates to the page:

Updated 2021-02-05 Added a bunch of features
Updated 2021-02-06 You can have multiple updates now

You can also mark a page as private or unlisted

Private
Unlisted

Private pages will only be generated if you include the "--private" option during generation. Unlisted pages will be generated, but won't show up during indexing.

The "using" setting declares what program the file should use as its shell (defaulting to $SHELL).

Using bash

Body

The remaining text comprises the document itself. Documents are raw text with special "blocks" marked by prefixing a line with special characters.

Raw text is copied literally in gemini, and separated into paragraphs by empty lines in html. Newlines are carried-over in the final content, so you'll need an editor that support text wrapping for large blocks of text.

There are two levels of heading available:

# Section heading
## Sub-section heading

Links are written gemini-style.

=> gemini://gemini.circumlunar.space/ About Gemini

Images must include a title (which will be used for the gemini link) and alt text (which will be used in html).

!> a_selfie.png A selfie ~~~ A selfie of me, wearing a red dress.

Links to internal documents need to have a different extension based on the output format. Using "*" as an extension will accomplish this.

=> computers.* Computers

Block quotes are formatted into paragraphs the same way raw text is:

> This line and...
> ... this one are all one paragraph within this quote block.
> 
> This is a separate paragraph in the same quote block.

There is only a single style of list, with only one line allowed per entry:

- list
- items

Preformatted text is preceded by two spaces:

Here is a code block:

  echo "hello world"

Sometimes you'll want certain blocks to only be included in certain output formats. You can start the line with the given extension and write raw text that will be literally copied into matching documents.

.html <h1><img src="logo.png" alt="The logo for My Website"/></h1>
.gmi # My Website

Lines prefixed with a semicolon are private. You can use them to keep track of personal notes that don't need to be shared, or to comment your documents. Private lines are only included in the final output if you pass the "--private" option during generation.

; This is a private line

Parsing for private lines happens during a pre-parsing step, which means your private lines can include formatting like any other.

; => somewhere_secret.* A private link

You can also run arbitrary commands with your shell and parse the output as formatted text!

: echo "# Hello world
: => hello.png Hello world!"

The commands use the value set by the "using" info header, or otherwise default to "$SHELL". This has pretty huge implications as it allows you to write "plugins" in any language that generate data without constraints. The shell is invoked with its working directory as the directory of the document file, and the "$FILE" environment variable is set to the name of the file (without the .txt extension.)

A pattern I find myself using fairly often (including in this document) is to pipe the result of commands into sed in order to format the output as a preformatted block.

: sitegen index --help | sed 's/.*/  &/'

Example

For a complete example of a document, here is the source of the page you are currently reading:

Site Generator
Written 2021-02-04
Using bash
Updated 2021-02-06 Added a bunch of features
Updated 2021-02-16 Blank lines are now handled literally in gemini
Updated 2021-02-21 Rewrote to a more conventional documation format.
Updated 2021-03-07 Added images
Updated 2021-03-09 Added templates
Updated 2021-08-06 Support for CRLF line endings, moved to sourcehut

A tool for generating static html and gemtext pages from markdown-like source documents.

=> https://git.sr.ht/~clarity/sitegen Source code

By keeping the functionality simple and the code structure straightforward, adding and removing features becomes an inviting task.

: echo '  > sitegen help'
: echo '  '
: sitegen help | sed 's/.*/  &/'

# Building the site

: echo '  > sitegen make --help'
: echo '  '
: sitegen make --help | sed 's/.*/  &/'

The site generator will generate /html and /gmi directories inside of the given out directories. My publishing process looks something like:

  rm -rf out
  mkdir -p out/html out/gmi
  cp -r public-html/* out/html/
  cp -r public/* out/html/
  cp -r public/* out/gmi
  sitegen make out site && \
  scp -r out/html/* [email protected]:/path/to/http/server &&\
  scp -r out/gmi/* [email protected]:/path/to/gmi/server

Please note that these documents are full-fledged programs. Running the site generator on a document that you haven't vetted yourself is like running a shell script you got off the internet without making sure you know what it does. Use at your own risk.


# Indexing

: echo '  > sitegen index --help'
: echo '  '
: sitegen index --help | sed 's/.*/  &/'

Indexing allows you to create lists of files inside of your site, which you can combine with the ability to run arbitrary commands inside of documents to produce in-page tables of contents.

: echo '  > sitegen index --limit 5 *'
: echo '  '
: sitegen index --limit 5 * | sed 's/.*/  &/'

# Document format

Sitegen documents are UTF8-encoded .txt files. The first few lines of the file are the title and metadata, broken by an empty line, followed by its contents.

## Info

The beginning of a document starts with the title and the date it was written, followed by optional metadata information.

  Site Generator
  Written 2021-02-04

You can keep track of the history of updates to the page:

  Updated 2021-02-05 Added a bunch of features
  Updated 2021-02-06 You can have multiple updates now

You can also mark a page as private or unlisted

  Private
  Unlisted

Private pages will only be generated if you include the "--private" option during generation. Unlisted pages will be generated, but won't show up during indexing.

The "using" setting declares what program the file should use as its shell (defaulting to $SHELL).

  Using bash

## Body

The remaining text comprises the document itself. Documents are raw text with special "blocks" marked by prefixing a line with special characters.

Raw text is copied literally in gemini, and separated into paragraphs by empty lines in html. Newlines are carried-over in the final content, so you'll need an editor that support text wrapping for large blocks of text.

There are two levels of heading available:

  # Section heading
  ## Sub-section heading

Links are written gemini-style.

  => gemini://gemini.circumlunar.space/ About Gemini

Images must include a title (which will be used for the gemini link) and alt text (which will be used in html).

  !> a_selfie.png A selfie ~~~ A selfie of me, wearing a red dress.

Links to internal documents need to have a different extension based on the output format. Using "*" as an extension will accomplish this.

  => computers.* Computers

Block quotes are formatted into paragraphs the same way raw text is:

  > This line and...
  > ... this one are all one paragraph within this quote block.
  > 
  > This is a separate paragraph in the same quote block.

There is only a single style of list, with only one line allowed per entry:

  - list
  - items

Preformatted text is preceded by two spaces:

  Here is a code block:
  
    echo "hello world"

Sometimes you'll want certain blocks to only be included in certain output formats. You can start the line with the given extension and write raw text that will be literally copied into matching documents.

  .html <h1><img src="logo.png" alt="The logo for My Website"/></h1>
  .gmi # My Website

Lines prefixed with a semicolon are private. You can use them to keep track of personal notes that don't need to be shared, or to comment your documents. Private lines are only included in the final output if you pass the "--private" option during generation.

  ; This is a private line

Parsing for private lines happens during a pre-parsing step, which means your private lines can include formatting like any other.

  ; => somewhere_secret.* A private link

You can also run arbitrary commands with your shell and parse the output as formatted text!

  : echo "# Hello world
  : => hello.png Hello world!"

The commands use the value set by the "using" info header, or otherwise default to "$SHELL". This has pretty huge implications as it allows you to write "plugins" in any language that generate data without constraints. The shell is invoked with its working directory as the directory of the document file, and the "$FILE" environment variable is set to the name of the file (without the .txt extension.)

A pattern I find myself using fairly often (including in this document) is to pipe the result of commands into sed in order to format the output as a preformatted block. 

  : sitegen index --help | sed 's/.*/  &/'

## Example

For a complete example of a document, here is the source of the page you are currently reading:

: sed 's/.*/  &/' ${FILE}.txt


# Templates

Documents are rendered into templates, which allow you to use variable escapes to insert your content and document metadata.

## Variables

Variables are surrounded in two curly braces.

  The title of the document is {{title}}

The available variables are:

- title: The title you put at the top of your document
- file: The name of the output file, without any extension
- dir: The name of the file's directory (empty for root-level files)
- written: The written date of the file
- updated: The most recent updated date of the file, if any
- parent: The path to the page one level "higher" than this one (".." for files named "index" and "." for all other files)
- parent_name: The title of the parent page
  either "return home" or "{{dir}} index"
- content: The rendered document content

## Date Formatting

Dates can be formatted:

  Written {{written|Weekday, Month D, YYYY}}
  Written Monday, June 1, 2020

The format options are:

- D - the day as one or two digits (1, 3, 15)
- DD - the day as two digits (01, 03, 15)
- Weekday - the day of the week (Monday, Friday)
- Day - the day of the week, short (Mon, Fri)
- M - the month as one or two digits (1, 10)
- MM - the month as two digits (01, 10)
- Month - the month, written out (January, October)
- Mon - the month, short (Jan, Oct)
- YY - the year as two digits (98, 20)
- YYYY - the year as four digits (1998, 2020)

Any non-format characters will be copied literally.

## Conditional Rendering

You can render content conditionally by including a question mark inside variable escapes. All content inside (including more escapes) will only be rendered if that variable "exists". Some variables (like the title) will always exist, but others (like the updated date, back url, or directory) may need special care.

  {{parent?<a href="{{parent}}">{{parent_name}}</a>}}

Note that

  {{parent?{{parent}}}}

is identical to

  {{parent}}

as empty variables will always be skipped over. 

## Content

If the location of the content isn't included, the content will be rendered after the template (treating the whole template as a header). You can't include the content more than once.

## Default templates

html:
: sitegen html_template | sed 's/.*/  &/'

gemini:
: sitegen gmi_template | sed 's/.*/  &/'

Templates

Documents are rendered into templates, which allow you to use variable escapes to insert your content and document metadata.

Variables

Variables are surrounded in two curly braces.

The title of the document is {{title}}

The available variables are:

either "return home" or "{{dir}} index"

Date Formatting

Dates can be formatted:

Written {{written|Weekday, Month D, YYYY}}
Written Monday, June 1, 2020

The format options are:

Any non-format characters will be copied literally.

Conditional Rendering

You can render content conditionally by including a question mark inside variable escapes. All content inside (including more escapes) will only be rendered if that variable "exists". Some variables (like the title) will always exist, but others (like the updated date, back url, or directory) may need special care.

{{parent?<a href="{{parent}}">{{parent_name}}</a>}}

Note that

{{parent?{{parent}}}}

is identical to

{{parent}}

as empty variables will always be skipped over.

Content

If the location of the content isn't included, the content will be rendered after the template (treating the whole template as a header). You can't include the content more than once.

Default templates

html:

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="/style.css" />
<title>{{title}}</title>
</head>
<body>
{{parent?<a href="{{parent}}">{{parent_name}}</a>}}
<main>
<header>
  <h1>{{title}}</h1>
  Written {{written|Month D, YYYY}}{{updated?, last changed {{updated|Month D, YYYY}}}}
</header>
{{content}}
</main>
</body>
</html>

gemini:

# {{title}}
Written {{written|Month D, YYYY}}{{updated?, last changed {{updated|Month D, YYYY}}}}