Pelican Site Generator for my Neocities

Notes on my occasional process of customising a Pelican Static Site Project for my own purposes on Neocities.

I like Pelican. It’s the first static site generator that I’ve bothered learning, but even if it weren’t, I would probably still have a good opinion of it, even though I seem to start my Pelican projects by disabling a lot of its features and changing many defaults. Sometimes, even customising something can be (mostly) pleasant.

Language & Webpage Filenames

My first change is usually to the multilingual features. I don’t like that Pelican generates translation counterparts of pages by taking a “main language” page and using its name for the other files, with their own language code stuck on the end. I prefer my page translations to have no favourites or hierarchies, and all the URLs and filenames to be treated equally. The only thing I need for page translations is an attribute for their shared ID.

In the same vein, I also don’t want my page URLs and saved filenames to be derived from the title I write in the text files. I prefer just keeping my own filenames.

With both of these requirements in mind, I configure my Pelican project like so (in pelicanconf.py, of course!):

ARTICLE_URL = ARTICLE_LANG_URL = \
    PAGE_URL = PAGE_LANG_URL = '/{path_no_ext}'

# EXT is the output file extension, defined elsewhere in the config (not shown).
ARTICLE_SAVE_AS = ARTICLE_LANG_SAVE_AS = \
    PAGE_SAVE_AS = PAGE_LANG_SAVE_AS = '{path_no_ext}' + EXT

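I set a bunch of the config’s variables at once by chaining them through a centipede of equal signs, and I make it a bit prettier by splitting it over multiple lines with backslashes. Make sure nothing follows the backslash, not even a space; the indentation on the line after is optional, but it keeps things readable.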
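
As a small illustration of that chaining, outside of Pelican: the names below (A_URL and friends) are made up for the example, and the sketch only shows that every name ends up bound to the same string.

```python
# Hypothetical setting names, only here to show the chained assignment.
A_URL = B_URL = \
    C_URL = D_URL = '/{path_no_ext}'

# Every name now refers to the same string object.
same = A_URL is B_URL is C_URL is D_URL
print(same)
```

This is safe here because strings are immutable; chaining a list or a dict this way would make all the names share one mutable object.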

It’s important not to forget the LANG variants either; otherwise Pelican uses its default file-naming style for pages with translation IDs.
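
To make the effect of those patterns concrete, here is a rough sketch of the substitution. It is not Pelican’s own code: the path_no_ext helper, the example source path and the value of EXT are all assumptions made for the illustration.

```python
from pathlib import PurePosixPath

EXT = ".html"  # assumption: the real value is set elsewhere in the config


def path_no_ext(source_path):
    """Hypothetical stand-in for the value Pelican substitutes:
    the source path, relative to the content folder, minus its extension."""
    return str(PurePosixPath(source_path).with_suffix(""))


source = "blog/mon-article.md"  # made-up source file

url = "/{path_no_ext}".format(path_no_ext=path_no_ext(source))
save_as = "{path_no_ext}".format(path_no_ext=path_no_ext(source)) + EXT

print(url)
print(save_as)
```

The output name follows the source file’s own name and folder, whatever its title or language is.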

I set the metadata attribute used as the translation ID to something short, like “tid”.

ARTICLE_TRANSLATION_ID = PAGE_TRANSLATION_ID = 'tid'
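
The idea of a shared ID with no “main” page can be sketched like this. The metadata below is invented, and the grouping is only an illustration of the concept, not how Pelican matches translations internally.

```python
from collections import defaultdict

# Hypothetical metadata, as it might be read from three source files.
pages = [
    {"path": "about.md", "lang": "en", "tid": "about"},
    {"path": "a-propos.md", "lang": "fr", "tid": "about"},
    {"path": "links.md", "lang": "en", "tid": "links"},
]

# Pages that share a tid are translations of each other;
# each one keeps its own filename.
groups = defaultdict(list)
for page in pages:
    groups[page["tid"]].append(page["path"])

print(dict(groups))
```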

That’s all the basic customisation that I can think of for now. I will probably add more to this article, or maybe even remove parts, as time goes on.

Custom Plugins

Img Utils Plugin


import logging
import os
import re
from os.path import exists

from bs4 import BeautifulSoup
from PIL import Image

from pelican import signals

log = logging.getLogger(__name__)


class ImgUtils():
    """ Image utilities for HTML pelican site contents.
    Modifies generated HTML <img> tags based on image data.
    TODO:
        instead of storing all detected images data itself,
        try looking for images in pelican's own list of transferred
        objects.
    """
    # Cache of image data shared by every instance, keyed by <img> src.
    images = {}

    def __init__(self, entry):
        self.entrypath = str(entry)
        self.settings = entry.settings
        self.config = self.settings["IMG_UTILS"]
        soup = BeautifulSoup(entry._content,'html.parser')
        tags = soup.find_all("img",src=True)
        for tag in tags:
            src = tag.get("src")
            data = self.get_data(src)
            self.edit_tag(tag, src, data)
            par = tag.parent
            # if <p><img></p>, turn p into div.
            if (par.name == 'p' and len(par.contents) == 1):
                par.name = 'div'
            # turn description-list image into figure image.
            if ( tag.has_attr("fig") and par.name == "dt" ):
                self.html_figure(tag,par)
            if re.match(r"^\/", tag['src']):
                tag['src'] = self.settings["SITEURL"] + tag['src']

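        # Root-relative links get the site URL in front; those whose path
        # starts with a letter or digit also get the ".html" extension.
        atags = soup.find_all("a", {'href': re.compile(r'^/')})

        for tag in atags:
            ext = ""
            if re.match(r"^/[a-zA-Z0-9]+", tag['href']):
                ext = ".html"
            tag['href'] = self.settings["SITEURL"] + tag['href'] + ext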

        entry._content = str(soup)

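    def html_figure(self, img, dt):
        """Turn <dl><dt><img fig></dt><dd>...</dd></dl> into a <figure>."""
        dl = dt.parent
        # the <dd>, if there is one, becomes the caption.
        if dl.dd is not None:
            dl.dd.name = "figcaption"
        dl.name = "figure"
        # move the image out of the <dt>, then drop the empty <dt>.
        dl.insert(0, img)
        del img["fig"]
        dt.decompose()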

    def get_data(self,src):
        d = None
        if src in self.images:
            d = self.images[src]
        else:
            filesrc = self.settings["PATH"] + src
            if exists(filesrc):
                d = self.get_im_data(filesrc)
                self.images[src] = d
            else:
                log.warning("image not found at %s \n for %s",
                filesrc,self.entrypath)
        return d

    def get_im_data(self, p):
        # Copy only what is needed, rather than the image's whole __dict__.
        with Image.open(p) as f:
            d = {
                'width': f.width,
                'height': f.height,
                'info': dict(f.info),
            }
        d['filesize'] = os.path.getsize(p)
        return d

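    def edit_tag(self, tag, src, data):
        conf = self.config
        if conf.get("img_regex"):
            if re.match(conf["img_regex"], src):
                self.set_size(tag, data)
        if conf.get('transparency_attribute'):
            self.set_alpha(tag, data)

    def set_size(self, t, d1):
        # no image data (file not found): leave the tag alone.
        if not d1:
            return
        for i in ['width', 'height']:
            # keep sizes that were written by hand.
            if t.has_attr(i):
                continue
            t[i] = str(d1[i])

    def set_alpha(self, t, d2):
        # transparency_attribute is read as an (attribute, value) pair.
        attr, value = self.config['transparency_attribute']
        if d2 and 'transparency' in d2.get('info', {}):
            current = t.get(attr, [])
            if isinstance(current, str):
                current = current.split()
            t[attr] = current + [value]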

def imgutils_entries(generator):
    for entries in ['articles','pages']:
        for e in getattr(generator, entries, []):
            ImgUtils(e)

def register():
    signals.article_generator_finalized.connect(imgutils_entries)
    signals.page_generator_finalized.connect(imgutils_entries)

Extra options and utilities for the <img> tags generated from text content. It can add width and height attributes automatically, read from the image file found at each tag’s source path.
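
The plugin reads its options from an IMG_UTILS dictionary in the settings, using the keys img_regex and transparency_attribute. A configuration could look something like the sketch below; the plugin folder, the module name img_utils and both values are assumptions for the sake of the example.

```python
# In pelicanconf.py. Folder and module names are hypothetical.
PLUGIN_PATHS = ['plugins']
PLUGINS = ['img_utils']

IMG_UTILS = {
    # only images whose src matches this pattern get width and height.
    'img_regex': r'^/images/',
    # (attribute, value) added to images that have transparency data.
    'transparency_attribute': ('class', 'has-alpha'),
}
```

Setting either key to None or an empty string switches that feature off, since the plugin only acts on truthy values.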

This is deliberately done at the HTML level, instead of in Markdown or reStructuredText, because it’s both more universal and easier to program for.
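
A stripped-down sketch of that HTML-level approach, separate from the plugin itself: the add_size function and the sizes lookup are made up for the example, and the image dimensions are passed in by hand instead of being read from a file.

```python
from bs4 import BeautifulSoup


def add_size(html, sizes):
    """Add width and height to <img> tags whose src is listed in `sizes`.

    `sizes` maps a src string to a (width, height) tuple.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("img", src=True):
        size = sizes.get(tag["src"])
        if size is None:
            continue
        for name, value in zip(("width", "height"), size):
            # keep any size that is already written in the HTML.
            if not tag.has_attr(name):
                tag[name] = str(value)
    return str(soup)


# The same function works whether the HTML came from Markdown or reST.
html = '<p><img alt="cat" src="/images/cat.png"/></p>'
result = add_size(html, {"/images/cat.png": (320, 200)})
print(result)
```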

TODO:

Generate custom attributes or values based on transparency channel.

Carré

2024-03-13