Building a DIY Static Site Generator

Clayton Carter • 2026-01-06

Background

Sometime back in 2017, I got a bad idea that it would be a good idea to start a blog. Being a PHP/Laravel developer at the time, I picked up Jigsaw and wrote a post. Great! The problem (as I saw it) was that I then walked away from my “blog” (aka, single post) for a few years. When I finally wanted to write a second post, I came back … and had no idea how to. Where are the posts? How do I tweak some of the styles? Where are the templates? No idea.

I went back to the docs and – after a while – determined that Jigsaw was more than I needed. What I wanted was a simple model like that used by WordPress: a site is made up of pages (for non-blog content; eg “About”, “Portfolio”, “Experience”) and posts (ie, your blog). I recall looking at Hugo, but was turned off because their “quick start” involved git submodule add ..., which I feared would just leave me in the same situation the next time I wanted to add a post or update something. (Also, I didn’t want a theme; I wanted to manage my own styles.)

I had recently seen shite, and loved the idea of a “simple” SSG built from composable command line tools, so I did the only logical thing and wrote my own static site generator.

What I Ended up With

My “final” version ended up supporting

Markdown w/ YAML front matter
pages
posts (the actual blog)
hot reloading
navigating posts by tag
RSS

There are 3 scripts:

setup.sh to install dependencies (primarily Tailwind)
build.sh to actually build the blog
serve.sh to serve the blog locally, including simplistic hot reloading

The entire thing works off of a handful of dependencies:

bash, to orchestrate everything
awk and sed, to do some file & text processing
pandoc, to convert Markdown into HTML, with templates
SQLite, to ensure unique slugs, and to handle tags on posts
Tailwind, for styling
Prettier, for formatting the generated HTML

prettier is optional, obviously, but it’s my homage to anyone who grew up learning HTML via “view page source”.

And the overall directory structure is just

_pages for pages
_posts for posts
_templates for templates (noticing a theme?)
_assets for non-static assets (ie my TW styles that need to be built into a final asset)
assets for static assets, ie that can be used as-is

This works fairly well, but is not without its own challenges (see below). There are fewer than 500 lines of Bash code, it will build the site in a couple of seconds, and it does everything I need it to do and not much else.

Summary of Build Process

You can read the code for the details, but the gist of the build process boils down to:

Render every post, storing the path and tags in a temporary database
For every tag in the database
1. Create an empty Markdown file that only contains YAML front matter, which itself only contains an array of the posts for that tag¹
2. Render that file through a “blog index page” template
Do the same thing, but for all blog posts, to build the main blog index
Then, use the same list of posts to build the RSS feed
Render every page (I don’t bother with a sitemap, so there’s no need to track these in the database.)
Build the CSS with Tailwind
Format all HTML with prettier
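
As a rough illustration of step 1 in the per-tag loop, here is a minimal sketch of how such a front-matter-only file could be generated. The function name write_index_front_matter and the tab-separated input format are assumptions made for this example, not the actual code from build.sh; in the real build the rows would come from the database.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not from the original build.sh): reads tab-separated
# rows of "title<TAB>path<TAB>published" on stdin and prints a Markdown file
# that consists only of YAML front matter.
# Assumes titles contain no double quotes and the page title no single quotes.
write_index_front_matter() {
  local page_title="$1" tab title path published
  tab=$(printf '\t')

  printf '%s\n' '---'
  printf "title: '%s'\n" "$page_title"
  printf '%s\n' 'posts:'
  while IFS="$tab" read -r title path published; do
    printf '  - title: "%s"\n' "$title"
    printf '    path: %s\n' "$path"
    printf '    published: %s\n' "$published"
  done
  printf '%s\n' '---'
}

# Example input: two rows, printed to stdout as one front matter block.
printf '%s\t%s\t%s\n' \
  'jx, a minimal reactive library for the frontend' 'blog/jx-minimal-reactive-library' '2026-01-03' \
  'Building a DIY Static Site Generator' 'blog/diy-ssg' '2026-01-03' \
  | write_index_front_matter 'Blog | Danan Consulting'
```

Redirecting that output to a file gives pandoc something to render through the blog index template, even though the file has no body at all.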

Again, this seems to work well, is interesting to work on, and it mostly all fits in my head.

Challenges

Data Handling in Bash

The initial challenge was how to handle the data processing in bash. Processing each post isn’t too hard, but later building an RSS feed or an index of posts by tag seemed either impossible or ridiculously cumbersome. In the end, I reached for sqlite3 and built a database of posts and tags which I could query later.
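
To make the idea concrete, here is a minimal sketch of what a file-based posts/tags database driven from the sqlite3 command line could look like. The schema, the file name build/site.db, and all of the function names are assumptions for illustration; the real script may be organized quite differently.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical schema and helpers; file, table and column names are
# illustrative, not taken from the real build.sh.
DB_FILE=${DB_FILE:-build/site.db}

# Wrap a value in single quotes for SQL, doubling any embedded quotes.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

db_init() {
  mkdir -p "$(dirname "$DB_FILE")"
  rm -f "$DB_FILE"   # the database is temporary: rebuilt on every run
  sqlite3 "$DB_FILE" <<'SQL'
CREATE TABLE posts (
  slug      TEXT PRIMARY KEY,  -- a duplicate slug makes the INSERT fail
  title     TEXT NOT NULL,
  published TEXT NOT NULL
);
CREATE TABLE post_tags (
  slug TEXT NOT NULL,
  tag  TEXT NOT NULL,
  PRIMARY KEY (slug, tag)
);
SQL
}

# Usage: db_add_post SLUG TITLE PUBLISHED [TAG...]
db_add_post() {
  local slug="$1" title="$2" published="$3" tag
  shift 3
  sqlite3 "$DB_FILE" "INSERT INTO posts (slug, title, published)
    VALUES ($(sql_quote "$slug"), $(sql_quote "$title"), $(sql_quote "$published"));"
  for tag in "$@"; do
    sqlite3 "$DB_FILE" "INSERT INTO post_tags (slug, tag)
      VALUES ($(sql_quote "$slug"), $(sql_quote "$tag"));"
  done
}

# List every tag once, alphabetically.
db_tags() {
  sqlite3 "$DB_FILE" "SELECT DISTINCT tag FROM post_tags ORDER BY tag;"
}

# Print "title<TAB>slug<TAB>published" for each post with the given tag.
db_posts_for_tag() {
  sqlite3 -separator "$(printf '\t')" "$DB_FILE" "
    SELECT p.title, p.slug, p.published
    FROM posts p JOIN post_tags t ON t.slug = p.slug
    WHERE t.tag = $(sql_quote "$1")
    ORDER BY p.published DESC, p.slug;"
}
```

With these in place, a build would call db_init once, db_add_post for each rendered post, and then loop over db_tags, feeding db_posts_for_tag into whatever generates the index pages. Each helper opens the database file afresh, which is exactly why a file works here where :memory: would not.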

I also couldn’t figure out how to open an in-memory database for SQLite (ie :memory:) and pass it around to multiple commands. Using a file-based database has been fine, but I wonder if an in-memory db wouldn’t be a little faster. I have my eye on sqlite-shell-lib, but haven’t played with it yet.

Templating with Pandoc

Templating in pandoc seems to be full featured and I’ve been able to do most of what I wanted to do fairly easily. One issue that I struggled with, though, is nesting or inheriting templates. In some templating engines that I’ve used (eg Twig and Blade), you can define a partial template (aka a view in Blade) and – in that partial – set the template from which it extends. For example, I would define a _post.blade.php partial and then say that it @extends('main'), which instructs the compiler to (hand waving) inline the contents of _post into the main template. With this, you can – for example – render a post into the _post template, and it will produce the correct, complete final markup. Pandoc doesn’t seem to support this, instead requiring you to use the “final” template and conditionally render sub-templates within it.

This felt complicated (ha) considering the number of different page types I wanted to use (posts, pages, blog/tag indices, main pages, etc), and I went around in circles about how to resolve this.

Templating, Take 1

As for nested templates², pandoc just doesn’t seem to support them. The workaround I came up with was to pipe pandoc output back onto itself, like

pandoc my-post.md --to=html --template=_post.html \
  | pandoc --from=html --template=main.html --output=my-post.html

So, note that the first pandoc has Markdown as an input, and is outputting HTML to stdout, using the _post.html template. Then, the second pandoc is reading HTML from stdin to generate an HTML file, using the main.html template. This works!

But it’s messy! In particular, none of the metadata defined in the original Markdown file is transferred to the second call to pandoc. So we also need to work around this by passing these into the second call explicitly, for example by adding --variable=title:"$_TITLE". This becomes tedious very quickly. I think that some of this could be mitigated by ensuring that every template has a <head>...</head> with relevant elements filled out, but that becomes boilerplate that needs to exist on each partial template. I’m not sure that my workaround is any better though!
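
For a sense of what that plumbing involves, here is a minimal sketch of pulling a value out of the front matter and handing it to the second pandoc call. The helper front_matter_value and the wrapper render_post are hypothetical names invented for this example, and the awk logic only understands simple top-level key: value lines; it is not a YAML parser.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the value of a top-level scalar key from the
# YAML front matter of a Markdown file. Handles only simple "key: value"
# lines and strips one pair of surrounding quotes.
front_matter_value() {
  local file="$1" key="$2"
  awk -v key="$key" -v sq="'" '
    NR == 1 && $0 == "---" { in_fm = 1; next }       # opening delimiter
    in_fm && ($0 == "---" || $0 == "...") { exit }   # closing delimiter
    in_fm && index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)
      sub(/^[ \t]+/, "", value)
      sub(/[ \t]+$/, "", value)
      first = substr(value, 1, 1)
      last = substr(value, length(value), 1)
      if (length(value) >= 2 && first == last && (first == "\"" || first == sq))
        value = substr(value, 2, length(value) - 2)
      print value
      exit
    }
  ' "$file"
}

# Hypothetical wrapper for the two-pass render: the title has to be
# re-supplied by hand because the second pandoc never sees the front matter.
render_post() {
  local src="$1" out="$2" title
  title=$(front_matter_value "$src" title)
  pandoc "$src" --to=html --template=_post.html \
    | pandoc --from=html --template=main.html \
        --variable=title:"$title" --output="$out"
}
```

Every additional piece of metadata (description, published date, tags) needs its own extraction and its own --variable flag, which is where the tedium comes from.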

Another issue that was related to this is that pandoc requires a <title> element for HTML documents. This makes sense for final documents, but what about my partials? It doesn’t make sense to add a <head><title>... to the body of a blog post, but – if <title> is missing – pandoc prints warning messages for every page:

[WARNING] This document format requires a nonempty <title> element.
  Defaulting to '-' as the title.
  To specify a title, use 'title' in metadata or --metadata title="..."

My unsatisfying workaround for this was to add dummy titles to every partial: they looked like <head><title>suppress</title></head>, suppressed the warning message, and were discarded in the final template. But … 👎

As you can see, this approach “worked”, but was rife with workarounds and kludges. I was very happy to get rid of it in favor of …

Templating, Take 2

While writing up this post and complaining about the limitations of pandoc, another approach dawned on me. I ended up shaving this yak using sed to simulate template inheritance. By inlining the sub-templates into the main template at build time, I’ve accomplished most of what I wanted:

# use sed to
#  1. read main template from _templates/main.html
#  2. insert contents of partial template in place of ${body}
#  3. delete ${body} (otherwise sed will print it, not replace it)
#  4. write the "composed" template to build/_post.template.html
# credit to https://stackoverflow.com/a/6790967
sed -e "/\${body}/{
        r _templates/_post.html
        d
    }" \
  _templates/main.html > build/_post.template.html
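
Since there are several partials (posts, pages, blog indices), the same sed call can be wrapped in a function and run once per partial. The sketch below is one way to do that; compose_template and compose_all are hypothetical names, and it assumes every partial is named _*.html and that ${body} sits on a line of its own in main.html (sed replaces the whole matching line).

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical wrapper around the sed call above: print the main template
# with the line containing ${body} replaced by the contents of the partial.
compose_template() {
  local main="$1" partial="$2"
  sed -e '/\${body}/{' -e "r $partial" -e 'd' -e '}' "$main"
}

# Compose every _*.html partial in a templates directory into
# <build_dir>/<partial name>.template.html
compose_all() {
  local templates_dir="$1" build_dir="$2" partial name
  mkdir -p "$build_dir"
  for partial in "$templates_dir"/_*.html; do
    [ -e "$partial" ] || continue   # glob matched nothing
    name=$(basename "$partial" .html)
    compose_template "$templates_dir/main.html" "$partial" \
      > "$build_dir/$name.template.html"
  done
}
```

Calling compose_all _templates build would then produce build/_post.template.html, build/_page.template.html, and so on. The ${body} inside each partial survives untouched, because sed does not rescan the lines it reads in with r.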

Now, instead of using pandoc --template=_post.html | pandoc --template=main.html, I use a single call to pandoc --template=build/_post.template.html and … everything seems to work.

This avoids the messy <title> stuff, as well as having to pass around the metadata that is already defined in the Markdown front matter: pandoc just reads it and uses it as is.

I was also going to write about how pandoc doesn’t support writing HTML within a Markdown document, but it turns out that my previous “pandoc piped to pandoc” approach was stripping it out. This updated approach fixes that issue, too!

Other Challenges

I won’t go too deep into other challenges, except to note that setting up a watch + hot reload mechanism (via polling of a local PHP script) took a while to figure out.

Conclusion

As with many of my projects, this was infuriating at times, but ultimately fun and it was certainly a lot of learning. In the end, I do love the strong but simple conventions, as well as being able to fit it all into my head and easily tinker with it as needed/desired. As (if?) this blog grows and build times start to bug me, I suspect I may end up ejecting in favor of Zola. Then again, maybe I’ll look at Hugo; at a second glance, it doesn’t look too bad…

¹ These “empty” Markdown files look like this:

---
title: 'Blog | Danan Consulting'
posts:
  - title: jx, a minimal reactive library for the frontend
    path: blog/jx-minimal-reactive-library
    published: 2026-01-03

  - title: Building a DIY Static Site Generator
    path: blog/diy-ssg
    published: 2026-01-03
---

And then the _blog_index template does something like this

${ for(posts) }
<a href="/${it.path}">${it.title}</a>
${ endfor }

In other words, we define an array of posts and links in the YAML, and then iterate over that list in the template.

² In my case, I wanted to define a main template for all pages on the site, and then define sub-templates that customize some pages. For example, to have posts show published dates and tags while regular pages don’t. In other words,
- /_posts/01-my-post.md → _post.html → main.html → /blog/my-post/index.html
- /_pages/experience.md → _page.html → main.html → /experience/index.html
In this case, 01-my-post.md and experience.md both have a YAML title: ..., but neither _post.html nor _page.html has a <head><title> element, because they’re just adding some extra markup to the content in the Markdown files. The actual <head><title> element is in main.html.