This is the first in a series of posts in which I’ll explain how I currently maintain my academic website (here). By “maintain”, I mean everything from editing the site locally on my PC, to pushing the changes to the remote McGill server that hosts the site, to version–controlling it all with git. The best thing about it: no browser (or any GUI at all) is required—everything happens in the terminal—and I still don’t have to deal with HTML. The method should work for any simple website with static content.
Also, I realize that my site is currently pretty small, and there’s not much to update very often, so my method may seem overly complex, but (i) it was a fun learning experience setting it all up, and (ii) as my site grows, I think my method will make site maintenance way easier than it otherwise would be, while also keeping the actual site simple (see next).
KISS disclaimer. Before I begin, I should mention that when it comes to
professional websites, I believe that the KISS principle should be
followed: Keep it simple, stupid! For me, that means no flashy
banners or animations, no crazy amount of fonts or colors, etc. For this post,
I’ll assume we’re dealing with a website with a single page called index.html
(but extending the method to multiple pages should be trivial) consisting of
little more than basic text (headers, links, lists) and basic formatting (bold,
italics) and maybe an image or two. Of course, feel free to go crazy in your
CSS stylesheet… but please, no neon.
Here is my general workflow for site maintenance.
- Edit the site locally on my PC, using markdown and pandoc.
- Push changes to the remote server hosting the site, using ssh and rsync.
- Track changes using git, and push changes to GitHub (or similar) for version control.
Each post in this series will cover one step in detail, including the various scripts I’ve hacked together to automate it all.
Editing the site
Editing your webpage could be as simple as opening
index.html in your
favorite text editor and hammering away, and in fact, that’s what I used to do.
But I really hate editing HTML. To me, the tags make everything ugly and
unreadable, and since I’m no web developer, I never know the proper way to do
things anyway. Is it
<br />? Do I close with
</p>, or is that
unnecessary? I dunno!
That’s why now I write exclusively in markdown. In fact, I write this blog in markdown, I write my notes in markdown, I write emails (mostly) in markdown. When I want HTML, I just use pandoc to automagically convert the markdown to HTML and add the necessary HTML header stuff.
The rest of this post will explain the merits of markdown and pandoc and how I use them together to write my webpage.
Markdown was designed as a way to write highly readable plain text that can be converted into HTML while also faithfully reproducing lists, textual emphasis, links, etc. Take, for example, the following simple HTML code:
    <h1>My supercool site</h1>

    Welcome to the <em>best</em> academic site in the world! Here are my research interests:

    <ul>
      <li>Stuff</li>
      <li>Junk</li>
      <li>More stuff</li>
    </ul>

    You can download my CV <a href="cv.pdf">here</a>!
Okay, okay, it’s not as ugly and illegible as I made it out to be (that is, as
long as there aren’t span tags everywhere),
but compare it with the totally equivalent markdown version:
    My supercool site
    =================

    Welcome to the *best* site in the world! Here are my research interests:

    - Stuff
    - Junk
    - More stuff

    You can download my CV [here](cv.pdf).
If you’ve never seen markdown before, you might not even realize that there’s
anything “special” about the text above. By “special”, I mean syntactically:
the fact that =, *, -, and [..](..) have special meanings when they
occur in markdown. The text reads very naturally, as if it were written just to
look good but without any well–defined syntactic meaning. But it’s much more
than just good–looking text.
In markdown, underlining text with = makes it a main header, surrounding text
with * makes it emphatic (usually rendered as italics), listing things with
- (or *) turns them into, well, lists, and so forth. Only the link syntax
is slightly unintuitive (maybe), but it’s easy to learn, and if you use
reddit or stackoverflow, you probably already know it.
The basic markdown syntax is pretty simple, and yet also quite comprehensive. As long as you’re maintaining a simple, KISS–type website, markdown should serve you well. That’s all I’ll say about markdown syntax; for more info, head to the markdown website, and definitely read the entire syntax page. (It’s not long, which is a testament to markdown’s simplicity.)
OK, so you’ve got a markdown file, like
index.markdown, with some headers,
paragraphs, lists, links. Now what? Enter pandoc, the Swiss army knife of
document conversion tools. You can convert your file to HTML5 and plenty of
other formats, but we’ll stick with HTML5.
Moreover, pandoc understands a superset of markdown, i.e., a sort of extended markdown. For example, you can add metadata to the top of your markdown file that pandoc can use when creating HTML header info (see below), and you can do other cool things like add footnotes. Following the KISS principle, though, we’ll stick with normal, non–extended markdown.
(Note that pandoc is not the only conversion tool you can use. Markdown comes
with its own perl script,
Markdown.pl, and there’s also kramdown,
maruku, etc., written in Ruby. But pandoc has some really useful options
that’ll make life easier, as we’ll see below. If you prefer Ruby over Haskell,
try one of the above.)
The basic command is:
$ pandoc -f markdown -t html5 -o index.html index.markdown
- $ is the terminal prompt; don’t type this.
- -f markdown means convert from markdown.
- -t html5 means convert to HTML5.
- -o index.html means make index.html the output filename.
- index.markdown is the main argument of pandoc; it’s the file we’re converting.
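If you want to try the basic command without touching a real site, here is a minimal sketch. It writes the sample page from earlier into a scratch directory and converts it. The scratch directory and the check for whether pandoc is installed are my own additions for illustration; only the pandoc line itself is the command described above.

```shell
#!/bin/bash
# Work in a throwaway directory so no real site files get overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Write the sample markdown page from the earlier example.
cat > index.markdown <<'EOF'
My supercool site
=================

Welcome to the *best* site in the world! Here are my research interests:

- Stuff
- Junk
- More stuff

You can download my CV [here](cv.pdf).
EOF

# Convert only if pandoc is installed; otherwise just say so.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -f markdown -t html5 -o index.html index.markdown
    # Print the result: the converted body only, with no <head> section.
    cat index.html
else
    echo "pandoc not found; install it to run the conversion" >&2
fi
```

Running it should print the HTML for the header, the list, and the link, which is a quick way to convince yourself that the markdown and HTML versions really do line up.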
Assuming you have a basic
index.markdown with no extra HTML in it (oh, I
forgot to mention that markdown can include arbitrary HTML, but remember,
KISS), then pandoc will produce an HTML file that has all the basic HTML tags
corresponding to your markdown syntax, but it’ll lack any metadata
(the stuff inside <head>), like the title of the website (very important,
since that’s what people will see in Google search results) and the name of
the author (also very important if you want people to find your site when
they Google you).
Luckily, pandoc provides a command–line option for automatically generating
header info, the
DOCTYPE, etc.: it’s
-s, meaning standalone, as in the
output can stand on its own as an HTML file.
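To see what -s changes, here is a small sketch that converts the same file twice, once without -s and once with it, and then counts the lines that mention a DOCTYPE in each result. The file names fragment.html and standalone.html and the stand-in page content are hypothetical, chosen just for this comparison. Newer pandoc versions may also print a warning about a missing title when -s is used this way.

```shell
#!/bin/bash
# Scratch directory, so nothing real gets overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# A tiny stand-in page (hypothetical content).
printf 'My supercool site\n=================\n\nHello.\n' > index.markdown

if command -v pandoc >/dev/null 2>&1; then
    # Without -s: just the converted body.
    pandoc -f markdown -t html5 -o fragment.html index.markdown
    # With -s: a complete document, including the DOCTYPE and <head>.
    pandoc -s -f markdown -t html5 -o standalone.html index.markdown

    # Count the lines mentioning a DOCTYPE in each file.
    grep -ci 'doctype' fragment.html
    grep -ci 'doctype' standalone.html
else
    echo "pandoc not found; skipping the comparison" >&2
fi
```

The two counts should differ, since only the standalone file carries the document wrapper.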
On its own, -s still won’t add a title or an author. There are two ways to
do that. First, as mentioned above, you can add pandoc–specific metadata to
your markdown file, like this:
    % My title
    % My name

    My supercool site
    =================
    ...
When pandoc parses the file, it’ll see the top two lines starting with % and
from them generate title and author info for the header. However, I don’t like
this method because it dirties up the markdown file, in the sense that
index.markdown is now written in pandoc–specific, extended markdown. (If you
later decided to convert your file with, say, kramdown, or if you viewed it in
your browser on GitHub, then you’d see those two
% lines in your HTML
output.) But if that doesn’t bother you, by all means use this method.
A second way to supply title and author info is by explicitly telling pandoc what values to use for its internal author and title variables:
$ pandoc -V pagetitle="My title" -V author-meta="My name" ...
Since I use a personalized script (see below) to run pandoc, I prefer this method because I can keep this metainfo inside my script and not inside the markdown file itself. Separation of main content and meta–content is important!
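As a quick sanity check of the -V approach, the sketch below runs the conversion with both variables set and then prints the lines of the output that carry the title and author. The scratch directory and stand-in page are assumptions for illustration, and I’ve added -s so that a head section is generated at all.

```shell
#!/bin/bash
# Scratch directory, so nothing real gets overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# A tiny stand-in page (hypothetical content).
printf 'My supercool site\n=================\n\nHello.\n' > index.markdown

# The meta-info lives here, not in the markdown file.
TITLE="My title"
AUTHOR="My name"

if command -v pandoc >/dev/null 2>&1; then
    pandoc -s -V pagetitle="$TITLE" -V author-meta="$AUTHOR" \
        -f markdown -t html5 -o index.html index.markdown
    # Show the generated title and author lines from the <head>.
    grep -i -E '<title>|name="author"' index.html
else
    echo "pandoc not found; skipping the conversion" >&2
fi
```

Note that index.markdown stays plain: nothing pandoc–specific had to be written into it.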
Now what if you want to add more header info for which pandoc doesn’t have
internal variables or command–line options? That’s easy: create header.html
(or whatever you want to call it), throw in whatever HTML you want in your
header (except title and author), and run
$ pandoc -H header.html ...
header.html is a great place to add optional stuff like
keyword metadata, the URL to your favicon (if you have one), and any
Google Analytics code.
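For a concrete picture, here is a sketch of what a small header.html might contain, written out from the shell. The keyword values and the favicon.ico file name are placeholders I made up, not anything from my actual site.

```shell
#!/bin/bash
# Scratch directory, so an existing header.html is not overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Hypothetical header extras: keywords and a favicon link.
# Title and author are left out on purpose, since they are set elsewhere.
cat > header.html <<'EOF'
<meta name="keywords" content="first keyword, second keyword">
<link rel="icon" href="favicon.ico">
EOF

# Show what will be handed to pandoc via -H header.html.
cat header.html
```

Whatever is in this file gets copied as-is into the head of the generated page, so it can stay as short or as long as you like.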
You can also have a
before-body.html file which, if
-B before-body.html is
used, will be inserted as the very first thing after
<body>. I use this to
hold the code that puts my picture in the top–right corner of my webpage. The
reason I do this is that markdown doesn’t deal with images very well, so I need
div and other ugly–looking HTML. Plus, I don’t feel that an image is
part of the main content anyway; if I wanted to give someone a text–only
version of my page, I’d like to be able to give them the markdown source, with
no image code.
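Here is a rough sketch of the kind of before-body.html I mean, plus a matching style rule. The class name portrait, the image file me.jpg, and the exact CSS are all hypothetical; treat them as one possible way to float a picture to the top right, not as the code from my site.

```shell
#!/bin/bash
# Scratch directory, so existing site files are not overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# The image markup lives here, outside the markdown source.
cat > before-body.html <<'EOF'
<div class="portrait">
  <img src="me.jpg" alt="Photo of My name">
</div>
EOF

# A matching rule for the stylesheet (appended, so existing rules are kept).
cat >> mystyle.css <<'EOF'
.portrait { float: right; margin: 0 0 1em 1em; }
EOF

cat before-body.html
```

With this split, the markdown source stays a clean text–only version of the page.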
Also, as you probably guessed, you can have an
after-body.html file which, if
-A after-body.html is used, will be inserted as the last thing before
</body>. This is useful if you want, say, a footer that’s not semantically
part of the main page, e.g., a “last updated: …” line.
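Since a “last updated” line goes stale if you maintain it by hand, one option is to generate after-body.html from the shell. This is only a sketch: the footer markup and the date format are my own choices for illustration.

```shell
#!/bin/bash
# Scratch directory, so an existing after-body.html is not overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Today's date in YYYY-MM-DD form.
today="$(date +%Y-%m-%d)"

# Write a small footer carrying the date.
printf '<footer>\n  <p>Last updated: %s</p>\n</footer>\n' "$today" > after-body.html

cat after-body.html
```

If these lines run right before the pandoc call, the date follows each rebuild automatically.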
(Note: the mnemonic is that
-B specifies what goes before the body, and
-A what goes after, but keep in mind that both contents do ultimately end
up inside of <body>.)
Putting it all together
All right, so you’ve got
index.markdown, and maybe also header.html, before-body.html, and
after-body.html. You probably also have a CSS stylesheet,
mystyle.css, which you can tell pandoc about with
-c mystyle.css (or you can refer to it yourself in header.html).
Here’s what your command will look like:
    pandoc \
        -c mystyle.css \
        -H header.html -B before-body.html -A after-body.html \
        -V pagetitle="My title" -V author-meta="My name" \
        -f markdown -t html5 -o index.html index.markdown
Wow, that’s a lot to type each time you want to convert a newly modified markdown file into HTML. Better put that inside a script. Let’s also put each of those things into a variable, so that we can easily modify the script command by changing variables rather than the command itself.
    #!/bin/bash

    TITLE="My title"
    AUTHOR="My name"

    IN_FILE="./index.markdown"
    OUT_FILE="./index.html"

    CSS="./mystyle.css"
    HEADER="./header.html"
    BEFORE="./before-body.html"
    AFTER="./after-body.html"

    # Convert markdown to html5.
    pandoc -c "$CSS" \
        -H "$HEADER" -B "$BEFORE" -A "$AFTER" \
        -V pagetitle="$TITLE" -V author-meta="$AUTHOR" \
        -f markdown -t html5 -o "$OUT_FILE" "$IN_FILE"
Save this as, say,
md2html, make it executable with
chmod +x md2html, plop
it inside the website directory containing
index.markdown, and simply run:
    $ ./md2html
You should now see
index.html in the same directory, which you can open in
your browser to inspect and make sure it looks good.
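One optional refinement, which is not part of my script as shown: if one of the input files is missing, it helps to get a clear message before pandoc runs. Below is a sketch of a small helper for that. The function name check_inputs is hypothetical.

```shell
#!/bin/bash
# Return 0 if every listed file exists and is readable.
# Otherwise name each missing file on stderr and return 1.
check_inputs() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "md2html: missing input file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use inside md2html, just before the pandoc call:
# check_inputs "$IN_FILE" "$CSS" "$HEADER" "$BEFORE" "$AFTER" || exit 1
```

The commented line shows where it would slot in; the script then stops early with a readable message when a file is missing.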
And there you have it. Now, whenever you need to edit your webpage, you can deal
with index.markdown using your favorite text editor, save the
changes, and run
md2html to (re–)generate index.html.
In the next post, I’ll explain how to push your website onto a remote server,
e.g., a university server, using ssh and rsync inside a script. The end result
is that, in the same way that
md2html does the whole conversion in one fell
swoop, so too will
push-website push your site in one fell swoop: no passwords
or GUI clicking required.