This is the second in a series of posts on how I currently maintain my academic website (here). In the first post, I explained how I write and edit my site using the simple and intuitive syntax of markdown, rather than pure HTML, and convert from markdown to HTML using pandoc. I also explained how I modularize my website into (i) main content, the stuff written in markdown, which evolves over time, and (ii) metacontent, which is kept in separate header, footer, etc. files, which is more static; and I showed how pandoc can combine all such content into one standalone HTML page. I gave a pretty basic script for automating all that each time any part of the site is edited.
In this post I’ll explain how to automate another important aspect of site maintenance: pushing the website from your local PC to the remote server hosting the website, e.g., a university server (McGill’s, in my case). For this task, we’ll be using ssh and rsync.
Pushing to server
Alright, so you’ve got a website all set up, and the directory structure looks something like this.
after-body.html
before-body.html
favicon.png
files
\--- handout-stuff.pdf
\--- handout-junk.pdf
header.html
images
\--- pic-of-me.jpg
index.html
index.markdown
md2html
mystyle.css
(Most of these files are optional; all you really need is
md2html. But for completeness, I’ll assume we’re dealing
with a CSS stylesheet, some images, downloadable files, etc.)
Essentially, we want to transfer all the necessary website components from a
local PC location to a remote server. The way we do this is with ssh (actually,
the suite of utilities provided by OpenSSH).
You know, instead of talking about “you” or “me”, it’ll be easier to talk about
a hypothetical third person. Meet Bob. Bob’s website is located on his PC in
/home/bob/website. Bob attends ABC University, which has been
kind enough to give Bob some server space for his website. They also tell Bob
he can access his server space remotely using “secure shell access”.
What this means is that, while Bob is sitting on his couch in his apartment on his own PC, he can access/log onto his university server. How so? With ssh, a secure access utility provided by OpenSSH.
Let’s assume that Bob’s university login name is
bob22, because he’s the 22nd
Bob, and so his full login is
bob22@abc.edu. Then he can access the server
with the following simple command (recall that
$ is the command-line prompt;
don’t type it):
$ ssh bob22@abc.edu
Pretty easy. After executing this command, Bob will be prompted for his
university password, which happens to be
iluvssh (but don’t tell anyone). He
enters the password and is greeted with something like:
Welcome to the ABC University server!
Blah blah blah, GNU/Linux license stuff, no warranty, yada yada.
bob22@abc:~$
Bob went from being in his home directory on his PC to being in his home
directory on his uni server, which is why
$ is the prompt in both cases. Note,
however, that Bob’s local home directory is
/home/bob, whereas his remote uni
one is (probably)
/home/bob22:
bob22@abc:~$ echo $HOME
/home/bob22
Bob looks around in his home directory, and he notices two folders:
bob22@abc:~$ ls
private public_html
private is for stuff that no other students/users of that server
have access to;
public_html is where Bob needs to put his website. But how
does he do that? Right now, he’s “inside” his uni home, with no way to look at his
PC home, except in another shell, but then in that shell he would have no way
to look at his uni home. That is, the two shells could not “communicate”, as it
were. The solution is scp, or secure copy. First Bob exits from his uni server with
exit, putting him back into his ordinary PC home. Now he can do this:
$ scp ~/website/index.html bob22@abc.edu:/home/bob22/public_html
This command (securely) copies the file
index.html from the website directory in Bob’s local home,
/home/bob, over to Bob’s university home directory,
and into the
public_html folder.
But there’s a snag: Bob has to enter his password again. How annoying. In fact,
each time Bob runs
scp, he has to enter his password. If only there
were a way for Bob’s uni server to recognize that it’s Bob (or Bob’s PC)
requesting access, so that Bob doesn’t always have to type
iluvssh.
Well, there is a way: ssh identity files (or keys). Basically, Bob generates a pair of keys—one private, which he keeps on his PC, and one public, which he sends over to the server. The server, since it has Bob’s public key, can recognize and grant access to anyone having Bob’s private key. Obviously, Bob should not share the private key (the public one doesn’t matter).
The command for all this is:
$ ssh-keygen -f abc -t rsa -C 'ABC University'
-f specifies the output filename.
-t specifies the key type. I use RSA. (DSA used to be an option too, but it is deprecated in current OpenSSH releases; Ed25519 is a common modern alternative.)
-C is an optional comment; use it to describe what the key is for.
(You’ll be asked to specify a passphrase, which is optional.)
After running this command, Bob has two files:
abc, his private identity, and
abc.pub, the public one. He should first put
abc into the directory
~/.ssh, where any other keys are located, too:
$ mkdir -p ~/.ssh # create this directory, if not already existing
$ mv abc ~/.ssh/
(Bob could also have simply run
ssh-keygen from inside
~/.ssh to begin with.)
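If you’d like to try the key generation step without touching your real ~/.ssh, here is a minimal sketch that does it in a throwaway directory. The KEYDIR variable and the empty passphrase (-N '') are my own assumptions for the sake of a non-interactive demo; for a real key you’d probably want to pick a passphrase.

```shell
#!/bin/bash
# Sketch: generate a key pair non-interactively in a scratch directory.
# KEYDIR is a hypothetical stand-in for ~/.ssh so nothing real is touched.
set -eu

KEYDIR="$(mktemp -d)"
chmod 700 "$KEYDIR"                 # key directories should be private

# -q: quiet; -N '': empty passphrase (fine for a demo only)
ssh-keygen -q -t rsa -f "$KEYDIR/abc" -C 'ABC University' -N ''

ls "$KEYDIR"                        # lists the private key and the .pub file
ssh-keygen -l -f "$KEYDIR/abc.pub"  # print the key's fingerprint and comment
```

The last command is a handy sanity check: the comment given with -C shows up next to the fingerprint, so you can tell your keys apart later.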
Now he needs to get
abc.pub onto the remote server. That’s easy:
$ scp ~/abc.pub bob22@abc.edu:/home/bob22
But that’s not quite enough. The way OpenSSH works is that the public key has
to be concatenated to a file
authorized_keys, located in the remote
~/.ssh directory,
which contains all public keys needed by Bob’s remote server. To do that, Bob
must ssh one more time onto the server, create
~/.ssh if necessary, append
abc.pub to
authorized_keys, change the permissions on
authorized_keys so
that only Bob can read and write to it, and finally delete
abc.pub:
$ ssh bob22@abc.edu
Welcome! ...
bob22@abc:~$ ls
abc.pub private public_html
bob22@abc:~$ mkdir -p ~/.ssh
bob22@abc:~$ cat ~/abc.pub >> ~/.ssh/authorized_keys
bob22@abc:~$ chmod 600 ~/.ssh/authorized_keys
bob22@abc:~$ rm abc.pub
bob22@abc:~$ exit
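If all went well, Bob should now be able to ssh onto the server without
typing iluvssh every time, as long as he points ssh at the new key (e.g., with
ssh -i ~/.ssh/abc bob22@abc.edu; the config file described next makes this automatic). Cool!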
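One thing to watch with the append step: run it twice and the key ends up in authorized_keys twice. Here is a rough sketch of a version that is safe to re-run. It works on a scratch directory (REMOTE_HOME, a made-up stand-in for /home/bob22) with a placeholder key, so you can try it anywhere; the install_key function name is also just mine.

```shell
#!/bin/bash
# Sketch: append a public key to authorized_keys only if it is not there yet.
# REMOTE_HOME is a hypothetical stand-in for /home/bob22 so this runs anywhere.
set -eu

REMOTE_HOME="$(mktemp -d)"
PUBKEY="$REMOTE_HOME/abc.pub"
# Placeholder text, not a real key.
echo 'ssh-rsa AAAAB3NzaC1yc2EXAMPLEONLY ABC University' > "$PUBKEY"

install_key() {
    mkdir -p "$REMOTE_HOME/.ssh"
    chmod 700 "$REMOTE_HOME/.ssh"                  # only the owner may enter
    touch "$REMOTE_HOME/.ssh/authorized_keys"
    chmod 600 "$REMOTE_HOME/.ssh/authorized_keys"  # only the owner may read/write
    # -x: match the whole line; -F: treat the key as a fixed string
    if ! grep -qxF -- "$(cat "$PUBKEY")" "$REMOTE_HOME/.ssh/authorized_keys"; then
        cat "$PUBKEY" >> "$REMOTE_HOME/.ssh/authorized_keys"
    fi
}

install_key
install_key   # the second call changes nothing
wc -l < "$REMOTE_HOME/.ssh/authorized_keys"
```

Many OpenSSH installations also ship a helper called ssh-copy-id that does the copy-and-append in one go (something like ssh-copy-id -i ~/.ssh/abc.pub bob22@abc.edu), if it happens to be available on your machine.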
ssh config file
But there’s another snag: What if Bob’s username were actually something much longer,
and/or what if his university’s domain name were actually long and unwieldy?
It’d be pretty annoying to type all that out every time Bob wanted to
ssh onto the server or
scp something over to it. Sure, Bob could create a shell
alias for it, but ssh offers an easy solution: an ssh config file. Bob can
simply create a file
~/.ssh/config that looks like this:
Host abc
    User bob22
    HostName abc.edu
    IdentityFile ~/.ssh/abc
The keywords are pretty straightforward. The only one worth discussing is
Host: this is the name that this particular entry goes by, and it’s that name
which, when used in a shell or script, is equivalent to
bob22@abc.edu. In
other words, typing
$ ssh abc
is equivalent to typing
$ ssh bob22@abc.edu
and typing
$ scp blah.txt abc:/home/bob22
is equivalent to typing
$ scp blah.txt bob22@abc.edu:/home/bob22
You can see how a config file drastically simplifies things.
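If you want to double-check what an alias like abc expands to before actually connecting, reasonably recent versions of ssh (6.8 and later, as far as I know) have a -G flag that prints the resolved settings and exits. The sketch below writes the config to a scratch file (CONF, my own name) and reads it with -F, so your real ~/.ssh/config is left alone.

```shell
#!/bin/bash
# Sketch: check how ssh resolves the alias "abc" without connecting.
# CONF is a scratch copy of the config; the real ~/.ssh/config is untouched.
set -eu

CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
Host abc
    User bob22
    HostName abc.edu
    IdentityFile ~/.ssh/abc
EOF

# -F: read this config file; -G: print the resolved options and exit
RESOLVED="$(ssh -F "$CONF" -G abc)"
printf '%s\n' "$RESOLVED" | grep -E '^(user|hostname|identityfile) '
```

With the entry in your actual config file, plain ssh -G abc does the same job.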
Now all Bob has to do is
scp over all the necessary website files. He could
do this manually, or write a script. If he wrote a script, then any time he
edited or added a file locally, he could then run the script to update the
remote website directory. However, if I’m not mistaken, all files, even those
untouched, would be copied over every time. There may be a smart way to use
scp to handle this problem, but in any case, I prefer rsync for all major
copying/backing up of anything.
Rsync is a great tool for copying or backing up data. Here are some advantages
that it has over
scp:
- It’s smart enough to skip transferring files that are “the same” on the local and remote machines: by default, files with the same name, size, and last-modified timestamp are skipped, and with the --checksum option it compares file checksums instead.
- When it copies over files that have been changed, it only transfers the changes, which speeds things up dramatically.
- It allows you to specify an “exclude” file that lists files it should exclude from transfer. (Conversely, you can specify an “include” file that lists the only files that should be transferred.)
- Importantly for our (or Bob’s) purposes, it seamlessly integrates ssh.
…and so forth.
Since this post is already pretty long, I’ll wrap up with a simple rsync script
called
push-website, stored in Bob’s
~/website directory, which transfers
Bob’s website from his local PC to his remote server’s
public_html directory.
It integrates an exclude file as well as a log file, both of which are stored in
a (hidden) directory
~/website/.push-website.
#!/bin/bash
# Push the local website to the university server (uses the "abc" ssh alias).
SRC="$HOME/website"
DEST="abc:/home/bob22/public_html"
EXCL="$SRC/.push-website/exclude-list"
LOG="$SRC/.push-website/log"
rsync \
    -avhhh \
    --exclude-from="$EXCL" \
    --log-file="$LOG" \
    "$SRC/" "$DEST/"
-a means archive mode: recurse into directories and preserve permissions, timestamps, symlinks, and the like.
-v means verbose (make rsync say what it’s doing while it runs).
-hhh means extra human readable, e.g., “2M” instead of “2000000”.
Important. The trailing forward slash in
$SRC/ is crucial. It tells rsync
to transfer the contents of the source directory into the destination
directory, rather than transferring
$SRC itself. See
man rsync for more
info. It’s useful to read about all the rsync options.
So now Bob can update his site very simply by editing
index.markdown, running
md2html to convert to HTML, and running
push-website to push the changes to
his university server.
~/website $ vim index.markdown # edit, edit, edit, save, quit
~/website $ ./md2html # convert to HTML
~/website $ ./push-website # push changes to remote server
Nice! By the way, here are some things that are good to keep in the exclude file:
the metacontent files header.html, before-body.html, and after-body.html.
All of that is already integrated into
index.html when
md2html is executed.
In fact, you also don’t need to transfer over
index.markdown or the scripts
either, or the directory
.push-website.
Really, you just need to transfer the main HTML file
index.html,
mystyle.css, and any downloadables, like stuff in
files.
Important. Make sure that the permissions of all files are properly set on the remote server. In particular, things that you want to be viewed (pages, images) or downloaded (files) must be readable by everyone, and directories must additionally be executable so that they can be entered. If an image fails to show up, or if clicking a link lands you on a “Forbidden” page, then the permissions are probably not set right.
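One way to set such permissions in bulk is with find and chmod. The sketch below runs on a scratch directory (SITE, a stand-in for the remote public_html); the modes 755 and 644 are a common convention rather than something your server necessarily requires, so check with your sysadmin if in doubt.

```shell
#!/bin/bash
# Sketch: make a site tree world-readable. SITE is a scratch stand-in for
# the remote public_html; on the server it would be ~/public_html.
set -eu

SITE="$(mktemp -d)"
mkdir -p "$SITE/images"
touch "$SITE/index.html" "$SITE/images/pic-of-me.jpg"
chmod 600 "$SITE/index.html"   # simulate a file the web server cannot read

# Directories need read + execute (to be entered); files need only read.
find "$SITE" -type d -exec chmod 755 {} +
find "$SITE" -type f -exec chmod 644 {} +

ls -lR "$SITE"
```

Alternatively, rsync has a --chmod option (e.g., --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r) that applies modes like these during the transfer itself, which would fit nicely into push-website.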
In the last part of this series, I’ll explain how to version control your
website, scripts, etc. using git and GitHub. The setup will be a lot like the
above, because sites like GitHub and BitBucket use ssh for remote access. We’ll
simply generate a new ssh key pair, plop the public one onto GitHub, and add a
github entry in
~/.ssh/config. Easy stuff.