A long time ago, I set up a WordPress blog for a family member. There are lots of options these days, but back then there were few decent choices if you needed a web-based CMS with a WYSIWYG editor. An unfortunate side effect of things working well is that the blog has generated a lot of content over time. It also meant I had to update WordPress regularly to protect against the exploits that are constantly popping up.
So I decided to convince the family member that switching to Hugo would be relatively easy, and the blog could then be hosted on GitLab. But trying to extract all that content and convert it to Markdown turned into a huge hassle. There were automated scripts that got me 95% there, but nothing worked perfectly. Manually updating all the posts was not something I wanted to do, so eventually, I gave up trying to move the blog.
Recently, I started thinking about this again and realized there was a solution I hadn't considered: I could continue maintaining the WordPress server but set it up to publish a static mirror and serve that with GitLab Pages (or GitHub Pages if you like). This would allow me to automate Let's Encrypt certificate renewals as well as eliminate the security concerns associated with hosting a WordPress site. The trade-off is that comments would stop working, but that feels like a minor loss in this case because the blog did not garner many comments.
Here's the solution I came up with, which so far seems to be working well:
- Host the WordPress site at a URL that is not linked to or from anywhere else, to reduce the odds of it being exploited. In this example, we'll use http://private.localconspiracy.com (even though this site is actually built with Pelican).
- Set up hosting on GitLab Pages for the public URL https://www.localconspiracy.com.
- Add a cron job that determines when the last-built date differs between the two URLs; if the build dates differ, mirror the WordPress version.
- After mirroring with wget, update all links from the "private" version to the "public" version.
- Do a git push to publish the new content.
These are the two scripts I use:
check-diff.sh (called by cron every 15 minutes):
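#!/bin/bash
# Grab the lastBuildDate line from each feed; a mismatch means the private site has new content.
ORIGINDATE="$(curl --silent http://private.localconspiracy.com/feed/ | grep lastBuildDate)"
PUBDATE="$(curl --silent https://www.localconspiracy.com/feed/ | grep lastBuildDate)"
# Skip the mirror when the private feed could not be read (empty result).
if [ -n "$ORIGINDATE" ] && [ "$ORIGINDATE" != "$PUBDATE" ]
then
    /home/doc/repos/localconspiracy/mirror.sh
fi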
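The comparison above works on the whole line that grep returns, including any surrounding whitespace. If you prefer to compare only the timestamp, here is a minimal sketch. The helper name extract_build_date is hypothetical, and the sample timestamp is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the text inside the first <lastBuildDate>
# element read from standard input, or nothing if the element is absent.
extract_build_date() {
    sed -n 's|.*<lastBuildDate>\(.*\)</lastBuildDate>.*|\1|p' | head -n 1
}

# Try it on an inline sample line (made-up timestamp, no network access needed).
printf '%s\n' '  <lastBuildDate>Mon, 01 Jan 2018 12:00:00 +0000</lastBuildDate>' | extract_build_date
```

In check-diff.sh, this could replace the grep stage, for example by piping the output of curl --silent for each feed URL into extract_build_date before comparing the two values.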
mirror.sh:
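#!/bin/sh
# Stop if the repo directory is missing, so nothing runs in the wrong place.
cd /home/doc/repos/localconspiracy || exit 1
# Fetch a complete static copy of the private site.
wget \
    --mirror \
    --convert-links \
    --adjust-extension \
    --page-requisites \
    --retry-connrefused \
    --exclude-directories=comments \
    --execute robots=off \
    http://private.localconspiracy.com
# Replace the previous snapshot with the fresh mirror.
# git rm can delete public/ once it is empty, so recreate it before moving files in.
git rm -rf public/*
mkdir -p public
mv private.localconspiracy.com/* public/.
rmdir private.localconspiracy.com
# Point every link at the public HTTPS URL.
find ./public/ -type f -exec sed -i -e 's|http://private.localconspiracy|https://www.localconspiracy|g' {} \;
find ./public/ -type f -exec sed -i -e 's|http://www.localconspiracy|https://www.localconspiracy|g' {} \;
# Commit and publish the new snapshot.
git add public/*
git commit -m "new snapshot"
git push origin master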
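To see what the two sed substitutions in mirror.sh do, here is a small sketch that applies them to a throwaway file instead of the real public/ directory. The sample HTML and the temporary directory are assumptions made for the demo, and sed -i is used in its GNU form, as in the script above.

```shell
#!/bin/sh
# Work in a temporary directory so the real repo is never touched.
tmpdir="$(mktemp -d)"
cat > "$tmpdir/index.html" <<'EOF'
<a href="http://private.localconspiracy.com/about/">About</a>
<a href="http://www.localconspiracy.com/feed/">Feed</a>
EOF

# The same two substitutions as mirror.sh: private host to public host,
# then plain-HTTP public links to HTTPS.
sed -i -e 's|http://private.localconspiracy|https://www.localconspiracy|g' \
       -e 's|http://www.localconspiracy|https://www.localconspiracy|g' \
       "$tmpdir/index.html"

# Keep the rewritten content, clean up, and show the result.
result="$(cat "$tmpdir/index.html")"
rm -rf "$tmpdir"
printf '%s\n' "$result"
```

Both links should come out pointing at https://www.localconspiracy.com. Running a check like this before the first real mirror is a cheap way to confirm the patterns match your own hostnames.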
That's it! Now, when the blog changes, the site is mirrored to a static version within 15 minutes and pushed to the repo, where it is published by GitLab Pages.
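The 15-minute window comes from the cron schedule. As a rough sketch, the entry could look like the line built below. It assumes check-diff.sh is stored next to mirror.sh in the repo directory (the article does not say where it lives) and that your cron implementation accepts the */15 step syntax, as Vixie cron and cronie do.

```shell
#!/bin/sh
# Assumption: check-diff.sh sits alongside mirror.sh in the repo directory.
CHECK_SCRIPT="/home/doc/repos/localconspiracy/check-diff.sh"

# Cron fields: minute, hour, day of month, month, day of week, command.
# "*/15" in the minute field means every 15 minutes.
CRON_LINE="*/15 * * * * $CHECK_SCRIPT"

# Print the entry so it can be pasted into the editor opened by `crontab -e`.
echo "$CRON_LINE"
```

Printing the line rather than installing it keeps the sketch free of side effects; add it to the crontab of the user who owns the repo and has push access.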
This concept could be extended a little further if you wanted to run WordPress locally. In that case, you would not need a server to host your WordPress blog; you could just run it on your local machine. In that scenario, there's no chance of your blog getting exploited. As long as you can run wget against it locally, you could use the approach outlined above to have a WordPress site hosted on GitLab Pages.
This article was originally posted at Local Conspiracy. Reposted with permission.