Building a DIY Static Site Generator
Background
Sometime back in 2017, I got a bad idea that it would be a good idea to start a blog. Being a PHP/Laravel developer at the time, I picked up Jigsaw and wrote a post. Great! The problem (as I saw it) was that I then walked away from my “blog” (aka, single post) for a few years. When I finally wanted to write a second post, I came back … and had no idea how to. Where are the posts? How do I tweak some of the styles? Where are the templates? No idea.
I went back to the docs and – after a while – determined that Jigsaw
was more than I needed. What I wanted was a simple model
like that used by WordPress: a site is made up of
pages (for non-blog content; eg “About”, “Portfolio”,
“Experience”) and posts (ie, your
blog). I recall looking at Hugo,
but was turned off because their
“quick start”
involved git submodule add ..., which I feared would
just leave me in the same situation the next time I wanted to add a
post or update something. (Also, I didn’t want a theme; I wanted to
manage my own styles.)
I had recently seen shite, and loved the idea of a “simple” SSG built from composable command line tools, so I did the only logical thing and wrote my own static site generator.
What I Ended up With
My “final” version ended up supporting
- Markdown w/ YAML front matter
- pages
- posts (the actual blog)
- hot reloading
- navigating posts by tag
- RSS
There are 3 scripts:
- setup.sh to install dependencies (primarily Tailwind)
- build.sh to actually build the blog
- serve.sh to serve the blog locally, including simplistic hot reloading
The entire thing works off of ~5 dependencies:
- bash, to orchestrate everything
- awk and sed, to do some file & text processing
- pandoc, to convert Markdown into HTML, with templates
- SQLite, to ensure unique slugs, and to handle tags on posts
- Tailwind, for styling
- Prettier, for formatting the generated HTML
prettier is optional, obviously, but it’s my homage to
anyone that grew up learning HTML via “view page source”.
And the overall directory structure is just
- _pages for pages
- _posts for posts
- _templates for templates (noticing a theme?)
- _assets for non-static assets (ie my TW styles that need to be built into a final asset)
- assets for static assets, ie that can be used as-is
This works fairly well, but is not without its own challenges (see below). There are fewer than 500 lines of Bash code, it builds the site in a couple of seconds, and it does everything I need it to do and not much else.
Summary of Build Process
You can read the code for the details, but the gist of the build process boils down to:
- Render every post, storing the path and tags in a temporary database
- For every tag in the database
  - Create an empty Markdown file that only contains YAML front matter, which itself only contains an array of the posts for that tag [1]
  - Render that file through a “blog index page” template
- Do the same thing, but for all blog posts, to build the main blog index
- Then, use the same list of posts to build the RSS feed
- Render every page (I don’t bother with a sitemap, so there’s no need to track these in the database.)
- Build the CSS with Tailwind
- Format all HTML with prettier
Again, this seems to work well, is interesting to work on, and it mostly all fits in my head.
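The “empty Markdown file” step can be sketched in plain Bash. This is a minimal sketch under stated assumptions, not the actual build script: the function name `write_index_md`, the tab-separated input rows, and the output file name are all made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the actual build script): write a Markdown file
# that contains only YAML front matter. Posts arrive on stdin as tab-separated
# rows: title<TAB>path<TAB>published. That row format is an assumption.
write_index_md() {
  local page_title="$1" out_file="$2" tab title path published
  tab=$(printf '\t')
  {
    printf '%s\n' '---'
    # The page title is assumed to contain no single quotes.
    printf "title: '%s'\n" "$page_title"
    printf '%s\n' 'posts:'
    while IFS="$tab" read -r title path published; do
      # YAML single-quoted strings escape a quote by doubling it.
      title=$(printf '%s' "$title" | sed "s/'/''/g")
      printf "  - title: '%s'\n" "$title"
      printf '    path: %s\n' "$path"
      printf '    published: %s\n' "$published"
    done
    printf '%s\n' '---'
  } > "$out_file"
}

# Example: two rows, shaped like the output of a database query.
index_md="$(mktemp -d)/blog-index.md"
printf '%s\t%s\t%s\n' \
  'jx, a minimal reactive library for the frontend' 'blog/jx-minimal-reactive-library' '2026-01-03' \
  'Building a DIY Static Site Generator' 'blog/diy-ssg' '2026-01-03' \
  | write_index_md 'Blog | Danan Consulting' "$index_md"
cat "$index_md"
```

The resulting file would then be handed to pandoc together with the blog index template, which loops over `posts`.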
Challenges
Data Handling in Bash
The initial challenge was how to handle the data processing in bash.
Processing each post isn’t too hard, but then later building an RSS
feed, or an index of posts by tag seemed either impossible, or
ridiculously cumbersome. In the end, I reached for
sqlite3 and built a database of pages and tags which I
could query later.
I also couldn’t figure out how to open an in-memory database for
SQLite (ie :memory:) and pass it around to multiple
commands. Using a file-based database has been fine, but I wonder if
an in-memory db wouldn’t be a little faster. I have my eye on
sqlite-shell-lib, but haven’t played with it yet.
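For illustration, here is roughly what a file-based “pages and tags” database driven from Bash could look like. The table names, column names, helper functions, and the example tags are assumptions for this sketch, not the schema the build script actually uses.

```bash
#!/usr/bin/env bash
# Sketch only: the table names, column names, and helper functions are
# assumptions, not the schema used by the actual build script.
db="$(mktemp -d)/build.db"   # file-based, so separate sqlite3 calls share it

sqlite3 "$db" <<'SQL'
CREATE TABLE posts (
  slug      TEXT PRIMARY KEY,  -- rejects a second post with the same slug
  title     TEXT NOT NULL,
  published TEXT NOT NULL
);
CREATE TABLE post_tags (
  slug TEXT NOT NULL REFERENCES posts (slug),
  tag  TEXT NOT NULL,
  UNIQUE (slug, tag)
);
SQL

# Wrap a value in single quotes for SQL, doubling any embedded quotes.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# add_post SLUG TITLE PUBLISHED [TAG...]; fails if the slug already exists.
add_post() {
  local slug="$1" title="$2" published="$3" tag
  shift 3
  sqlite3 "$db" "INSERT INTO posts VALUES ($(sql_quote "$slug"), $(sql_quote "$title"), $(sql_quote "$published"));" || return 1
  for tag in "$@"; do
    sqlite3 "$db" "INSERT INTO post_tags VALUES ($(sql_quote "$slug"), $(sql_quote "$tag"));"
  done
}

# Print title<TAB>slug<TAB>published for every post with the given tag.
posts_for_tag() {
  sqlite3 -separator "$(printf '\t')" "$db" \
    "SELECT p.title, p.slug, p.published
       FROM posts p JOIN post_tags t ON t.slug = p.slug
      WHERE t.tag = $(sql_quote "$1")
      ORDER BY p.published DESC, p.slug;"
}

# Example data; the tags are invented for the demo.
add_post 'diy-ssg' 'Building a DIY Static Site Generator' '2026-01-03' bash pandoc
add_post 'jx-minimal-reactive-library' 'jx, a minimal reactive library for the frontend' '2026-01-03' javascript
posts_for_tag bash
```

Because `slug` is the primary key, a second `add_post` with an existing slug fails with a non-zero exit status, which a build script could treat as a fatal error.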
Templating with Pandoc
Templating in pandoc seems to be
full featured
and I’ve been able to do most of what I wanted to do fairly easily.
One issue that I struggled with, though, is nesting or inheriting
templates. In some templating engines that I’ve used (eg Twig and
Blade), you can define a partial template (aka a view in Blade) and
– in that partial – set the template from which it extends. For
example, I would define a _post.blade.php partial and
then say that it @extends('main'), which instructs the
compiler to (hand waving) inline the contents of
_post into the main template. With this,
you can – for example – render a post into the
_post template, and it will produce the correct,
complete final markup. Pandoc doesn’t seem to support this, instead
requiring you to use the “final” template and conditionally render
sub-templates within it.
This felt complicated (ha) considering the number of different page types I wanted to use (posts, pages, blog/tag indices, main pages, etc), and I went around in circles about how to resolve this.
Templating, Take 1
As for nested templates [2], pandoc just doesn’t seem to support them. The workaround I came up with was to pipe pandoc output back onto itself, like
pandoc my-post.md --to=html --template=_post.html \
| pandoc --from=html --template=main.html --output=my-post.html
So, note that the first pandoc has Markdown as an
input, and is outputting HTML to stdout, using the
_post.html template. Then, the second
pandoc is reading HTML from stdin to generate an HTML
file, using the main.html template. This works!
But it’s messy! In particular, none of the metadata defined in the
original Markdown file is transferred to the second call to
pandoc. So we also need to work around this by passing
these into the second call explicitly, for example by adding
--variable=title:"$_TITLE". This becomes tedious very
quickly. I think that some of this could be mitigated by ensuring
that every template has a
<head>...</head> with relevant elements
filled out, but that becomes boilerplate that needs to exist on each
partial template. I’m not sure that my workaround is any better
though!
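To make the metadata problem concrete, here is a minimal sketch of the two-pass approach with the title carried across by hand. The helper names `front_matter_value` and `render_post_two_pass` are hypothetical, and the awk parsing only handles simple top-level `key: value` lines, so treat it as an illustration rather than the code that was actually used.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of a simple top-level `key: value` line
# from the YAML front matter of a Markdown file. Nested or multi-line values
# are not handled.
front_matter_value() {
  local file="$1" key="$2" value
  value=$(awk -v key="$key" '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
    in_fm && $0 == "---"   { exit }              # closing delimiter
    in_fm && index($0, key ":") == 1 {
      print substr($0, length(key) + 2)          # text after "key:"
      exit
    }
  ' "$file")
  value=${value#"${value%%[![:space:]]*}"}       # trim leading whitespace
  value=${value#[\"\']}                          # strip one leading quote
  value=${value%[\"\']}                          # strip one trailing quote
  printf '%s\n' "$value"
}

# Two-pass render: the second pandoc cannot see the front matter, so the
# title is passed explicitly. Requires pandoc; not called in this example.
render_post_two_pass() {
  local src="$1" out="$2" title
  title=$(front_matter_value "$src" title)
  pandoc "$src" --to=html --template=_post.html \
    | pandoc --from=html --to=html --template=main.html \
        --variable=title:"$title" --output="$out"
}

# Example: extract the title from a throwaway post.
post=$(mktemp)
printf '%s\n' '---' "title: 'Building a DIY Static Site Generator'" \
  'tags: [bash, pandoc]' '---' '' 'Body text.' > "$post"
front_matter_value "$post" title
```

Every additional field (published date, tags, description) would need its own extraction and its own `--variable`, which is where the tedium comes from.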
Another issue that was related to this is that pandoc requires a
<title> element for HTML documents. This makes
sense for final documents, but what about my partials? It doesn’t
make sense to add a <head><title>... to the
body of a blog post, but – if <title> is missing
– pandoc prints warning messages for every page:
[WARNING] This document format requires a nonempty <title> element.
Defaulting to '-' as the title.
To specify a title, use 'title' in metadata or --metadata title="..."
My unsatisfying workaround for this was to add dummy titles to every
partial: they looked like
<head><title>suppress</title></head>, suppressed the warning message, and were discarded in the final
template. But … 👎
As you can see, this approach “worked”, but was rife with workarounds and kludges. I was very happy to get rid of it in favor of …
Templating, Take 2
While writing up this post and complaining about the limitations of
pandoc, another approach dawned on me. In the end, I ended up
shaving this yak using sed to simulate template
inheritance. By inlining the sub-templates into the main template at
build time, I’ve accomplished most of what I wanted:
# use sed to
# 1. read main template from _templates/main.html
# 2. insert contents of partial template in place of ${body}
# 3. delete ${body} (otherwise sed will print it, not replace it)
# 4. write the "composed" template to build/_post.template.html
# credit to https://stackoverflow.com/a/6790967
sed -e "/\${body}/{
r _templates/_post.html
d
}" \
_templates/main.html > build/_post.template.html
Now, instead of using
pandoc --template=_post.html | pandoc --template=main.html, I use a single call to
pandoc --template=build/_post.template.html and …
everything seems to work.
This avoids the messy <title> stuff, as well as
having to pass around the metadata that is already defined in the
Markdown front matter: pandoc just reads it and uses it as is.
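The same sed trick extends naturally to every partial. The sketch below assumes partials are named `_<name>.html`, that `main.html` has `${body}` on a line of its own, and that composed templates are written as `<name>.template.html`; the function names `compose_template` and `compose_all` are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: compose every partial in a templates directory with the main
# template. File naming and function names are assumptions.

# Replace the line containing ${body} in MAIN with the contents of PARTIAL.
# The whole matching line is deleted, so ${body} should sit on its own line.
compose_template() {
  local main="$1" partial="$2" out="$3"
  sed -e "/\${body}/{
r $partial
d
}" "$main" > "$out"
}

# Compose each _*.html partial into BUILD_DIR/<name>.template.html.
compose_all() {
  local templates_dir="$1" build_dir="$2" partial name
  mkdir -p "$build_dir"
  for partial in "$templates_dir"/_*.html; do
    [ -e "$partial" ] || continue          # the glob matched nothing
    name=$(basename "$partial" .html)      # eg _post
    compose_template "$templates_dir/main.html" "$partial" \
      "$build_dir/$name.template.html"
  done
}

# Example with throwaway files.
work=$(mktemp -d)
mkdir -p "$work/_templates"
printf '%s\n' '<html>' '<body>' '${body}' '</body>' '</html>' > "$work/_templates/main.html"
printf '%s\n' '<article>' '${body}' '</article>' > "$work/_templates/_post.html"
printf '%s\n' '<section>' '${body}' '</section>' > "$work/_templates/_page.html"
compose_all "$work/_templates" "$work/build"
cat "$work/build/_post.template.html"
```

The `${body}` inside each partial survives untouched, since sed does not rescan the text it reads in with `r`; that is the placeholder pandoc later fills with the rendered Markdown.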
I was also going to write about how pandoc doesn’t support writing HTML within a Markdown document, but it turns out that my previous “pandoc piped to pandoc” approach was stripping it out. This updated approach fixes that issue, too!
Other Challenges
I won’t go too deep into other challenges, except to note that setting up a watch + hot reload mechanism (via polling of a local PHP script) took a while to figure out.
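For a flavor of the “watch” half, here is one way change detection could be done by polling from Bash. This is an assumption about a possible approach, not the mechanism in serve.sh (which polls a local PHP script); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the "watch" half only. This is an assumption about one way to do
# it, not the mechanism from serve.sh.

# Print a fingerprint that changes when any file under the given directories
# is added, removed, or modified.
tree_fingerprint() {
  find "$@" -type f -exec cksum {} + 2>/dev/null | sort | cksum
}

# Rebuild whenever the fingerprint changes, checking once per second.
# Runs forever, so it is defined here but not called.
watch_and_build() {
  local last="" current
  while true; do
    current=$(tree_fingerprint _pages _posts _templates _assets assets)
    if [ "$current" != "$last" ]; then
      ./build.sh
      last=$current
    fi
    sleep 1
  done
}

# Example: the fingerprint changes after an edit.
demo=$(mktemp -d)
printf 'first draft\n' > "$demo/post.md"
before=$(tree_fingerprint "$demo")
printf 'second draft\n' > "$demo/post.md"
after=$(tree_fingerprint "$demo")
[ "$before" != "$after" ] && echo "change detected"
```

The browser side still needs some way to learn that a rebuild happened, which is the part the polling script handles.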
Conclusion
As with many of my projects, this was infuriating at times, but ultimately fun, and I certainly learned a lot. In the end, I do love the strong but simple conventions, as well as being able to fit it all into my head and easily tinker with it as needed/desired. As (if?) this blog grows and build times start to bug me, I suspect I may end up ejecting in favor of Zola. Then again, maybe I’ll look at Hugo; at a second glance, it doesn’t look too bad…
[1] These “empty” Markdown files look like this:
---
title: 'Blog | Danan Consulting'
posts:
  - title: jx, a minimal reactive library for the frontend
    path: blog/jx-minimal-reactive-library
    published: 2026-01-03
  - title: Building a DIY Static Site Generator
    path: blog/diy-ssg
    published: 2026-01-03
---
And then the _blog_index template does something like this
${ for(posts) }
<a href="/${it.path}">${it.title}</a>
${ endfor }
In other words, we define an array of posts and links in the YAML, and then iterate over that list in the template.
[2] In my case, I wanted to define a main template for all pages on the site, and then define sub-templates that customize some pages. For example, to have posts show published dates and tags while regular pages don’t. In other words,
- /_posts/01-my-post.md → _post.html → main.html → /blog/my-post/index.html
- /_pages/experience.md → _page.html → main.html → /experience/index.html
In this case, 01-my-post.md and experience.md both have a YAML title: ..., but neither _post.html nor _page.html has a <head><title> element, because they’re just adding some extra markup to the content in the Markdown files. The actual <head><title> element is in main.html.