As part of my recent Move to Hugo I wrote a few small tools that may be useful (with tweaking) for someone else doing the same, or moving from a similar static hosting platform.
Basic configuration
I was extremely happy to find that I could keep my existing permalink structure just by editing config.toml
:
[permalinks]
post = "/:year/:month/:title/"
Importing content
The first part of content was easy, as Nikola and Hugo have very similar methods for storing static files. I just had to copy the contents of my serialized-nikola/files
tree to my serialized-hugo/static
tree.
The next part was a little trickier. Nikola:
- Supports Restructured Text, which I used for a few posts
- Has a different frontmatter format (Restructured Text style, wrapped in HTML comments for markdown)
- Has a different date syntax – actually, supports more flexible dates, so I had several syntaxes in play.
I had enough posts (137) that doing this by hand would have been not fun at all, so of course, I scripted it. It’s not a generically useful script – I even hard coded in some of my paths – but if you’re doing a similar migration it might be a good starting point. Nikola has a crazy-high degree of flexibility, this script specifically only handles the subset of what I was using.
Please also note, for these one-and-done scripts I tend to ignore my typically rigorous testing and error checking habits. 😁
Here’s the process:
Yet again, pandoc to the rescue, as it made converting from Restructured Text to Markdown a breeze.
args = ['pandoc', '--from=rst', '--to=markdown', '--output=-']
args.append(srcpath)
data['content'] = subprocess.check_output(args)
I went through the majority of the posts by hand, and there were only a few things that got left behind (that I noticed), like YouTube embed codes, that were easy to fix up by hand. It was really incredible to run the script and in a matter of seconds have the livereload refresh to reveal a fully functional site.
Spring Cleaning
While I was migrating, I realized there were a lot of images and random other files which were no longer used, many from posts which I had retired long ago. Almost all static site generators (including hugo) do struggle with image/post locality; there’s a good discussion in a github issue.. Because of this, I had about 500 files in my static/
directory, and I had no idea which were still being referenced or not.
Update 2019-09-24: Today I learned that Hugo fixed the above problem a few years ago. As of 0.32 (6 months after this was posted) you can do page bundles. This means instead of
content/post/mypost.md
, you can docontent/post/mypost/index.md
and also put images incontent/post/mypost/myimage.png
, where you can refer to it in markdown as. Super convenient and A+ lovely for organization. Thanks Hugo!
Thanks to all the posts being in markdown, I realized the paths would have to show up in those files, so built out a simple tool to
- find all the files in
static/
, and normalize the paths to match what they actually look like from the webserver - Open every file in the content tree, and keep track of any of the static file paths which appear in them
- Output all the static files which have zero references
This tool is in go: unused_images.go.
Go’s, unsuprisingly, very powerful when doing this kind of task. A snippet showing the gathering process:
var seen map[string]int
func findImages(path string, f os.FileInfo, err error) error {
imageRe := regexp.MustCompile("images/.*$")
seen[imageRe.FindString(path)] = 0
return nil
}
func main() {
...
seen = make(map[string]int)
filepath.Walk(images, findImages)
...
}
It ran seemingly instantly, and spit out a list of over 100 files that could be deleted, which is excellent (and 100 images fewer to have to check state on every time the deploy/sync process runs.)