Managing a Bunch of Servers

It is 2017 and you just acquired your first domain. You start out small, with a webhosting package from some hoster. After a while, you realize that there are limits. You can’t publish your Python webapp. So, you rent your first VPS and set it up. It resembles the webhosting from before, with the addition of some apps you built.

One day you notice that your registrar takes ages to update your DNS entries. So you go about setting up two authoritative DNS servers. Now you have three servers.

You subscribe to some mailing lists and create accounts on some websites. Your inbox gets fuller each day. Setting up a mail server allows you to split your mailboxes. Now there are four servers with overlapping configurations, and all of them have to be managed.

At least that is how it went for me. Here are some of the lessons I learned along the way:

State is important

Your webserver has some kind of state, made up of all the programs installed, the configuration options set, and the data stored. Knowing what that state is and being able to (re)set your server to a known-good state is important for a few reasons. You will change some settings: experimenting or making temporary adjustments changes the server’s state. If you make mistakes, you can just reset to your known-good state. And if you ever wonder why something is the way it is or how it should be, you can simply look it up.

Lesson #1: Keep a single source of truth for your servers’ state, and make sure you can roll back to it.
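
To make "known-good state" a bit more concrete, here is a minimal sketch that records checksums of a configuration directory and later reports any file that has drifted. The function names snapshot_state and check_state are made up for this example, and it assumes GNU coreutils (sha256sum) is available. It only notices changed or missing files, not newly added ones, and it is no substitute for keeping your configuration in version control.

```shell
#!/bin/sh
# Sketch under assumptions: GNU coreutils, file names without newlines.

# Record a checksum for every file below a directory.
# $1: directory to record, $2: manifest file to write (keep it outside $1)
snapshot_state() {
    find "$1" -type f -exec sha256sum {} + | sort -k 2 > "$2"
}

# Verify the files listed in a manifest written by snapshot_state.
# Run it from the same working directory that snapshot_state was run from.
# Prints the files that differ and returns non-zero if anything drifted.
check_state() {
    sha256sum --check --quiet "$1"
}
```

You would run snapshot_state once the server is in a state you trust, and check_state whenever you want to know whether something changed since then.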

Multiple servers get repetitive fast

Having multiple servers interact with each other is fun! The downside is that changing the same option on every server can get boring. You might have to solve some problems with the first server, but after that it is quite tedious. Having help here is great. That is what Infrastructure-as-Code is for. You describe your servers with code, and the IaC tool helps to set them up and keep them in the state you defined.

Lesson #2: Infrastructure-as-Code is your friend, especially for similar servers.
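
The core idea behind these tools is that you describe the end result and let the tool work out whether anything has to change. A tiny sketch of that idea in shell: ensure_line (a hypothetical helper, not part of any IaC tool) adds a line to a file only if it is missing, so running it once or ten times leaves the same result.

```shell
#!/bin/sh
# Sketch of an idempotent "make sure this line exists" step.
# $1: file to manage, $2: exact line that should be present
ensure_line() {
    if grep -qxF -- "$2" "$1" 2>/dev/null; then
        # The file already matches the description; do nothing.
        echo "unchanged"
    else
        # The line is missing (or the file does not exist yet); add it.
        printf '%s\n' "$2" >> "$1"
        echo "changed"
    fi
}
```

Reporting "changed" or "unchanged" is what lets you run the same description against every server and see at a glance which ones were out of line. Real IaC tools add inventories, templates, and ordering on top of this, but the principle is the same.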

Access is a necessary evil

Having your servers in a well-defined state is not everything. You also have to make sure that no one messes with them. Restrict access as much as possible and keep open only the doors you actually need. Move your access points away from standard ports. Use firewalls and tools that rate-limit connections.

Lesson #3: Make it harder to break into your server.

An easy way to greatly improve your access security is to use SSH keys instead of passwords. If you want to take it one step further, you can use SSH certificates: create your own certificate authority and sign your server and user keys. [1]

Lesson #4: Cryptography is better than passwords, especially if it authenticates both user and server.
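
As a rough illustration of lessons #3 and #4 for SSH, the following sketch writes a small sshd configuration fragment. The function name write_sshd_hardening, the port 2222, and the limit of three attempts are assumptions for this example; the options themselves are standard sshd_config keywords.

```shell
#!/bin/sh
# Sketch: write an sshd hardening fragment to the path given as $1.
# Port number and limits are illustrative, pick your own.
write_sshd_hardening() {
    cat > "$1" <<'EOF'
# Listen on a non-standard port (2222 is a hypothetical choice).
Port 2222
# Turn off password logins; public keys stay enabled.
PubkeyAuthentication yes
PasswordAuthentication no
# Do not allow direct root logins.
PermitRootLogin no
# Limit authentication attempts per connection.
MaxAuthTries 3
EOF
}
```

On a real server you might write this to a file such as /etc/ssh/sshd_config.d/hardening.conf (assuming your sshd_config includes that directory), check it with sshd -t, and keep your current session open until a fresh login has worked.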

Monitor them

Last but not least: pay attention to your servers and set up a monitoring and alerting tool. If you know when something is wrong, you can fix it. If you know something could go wrong soon, you can take preemptive measures. Pay attention to unusual patterns. You don’t have to start out with 10 Grafana dashboards; a simple Munin instance might suffice.

Lesson #5: Listen to your servers.
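
Even before you install a real monitoring tool, a few lines of shell can tell you that something is about to go wrong. This is a minimal sketch of a disk usage check; the function names and the threshold are made up for the example, and it assumes mount points and device names without spaces.

```shell
#!/bin/sh
# Print the usage of the filesystem containing $1 as a bare number (percent).
disk_usage_percent() {
    # df -P gives one header line, then one line per filesystem;
    # column 5 is the capacity, for example "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare the usage against a threshold.
# $1: path on the filesystem to check, $2: threshold in percent
# Returns non-zero when the threshold is reached, so cron or a wrapper
# script can turn that into a mail or another kind of alert.
check_disk() {
    usage=$(disk_usage_percent "$1")
    if [ "$usage" -ge "$2" ]; then
        echo "ALERT: $1 is ${usage}% full (threshold ${2}%)"
        return 1
    fi
    echo "OK: $1 is ${usage}% full"
}
```

Calling check_disk / 90 from a cron job would be enough for a first warning that a disk is filling up, which is exactly the "could go wrong soon" case.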

Have fun!

Servers are great. Play around with them. Start your own DevOps pipeline. You will learn a ton.

Footnotes

1. Have a look at the CERTIFICATES section in man 1 ssh-keygen. These are the basic commands:

create your certificate authorities:
ssh-keygen -f user_ca
ssh-keygen -f server_ca

sign a user’s key:
ssh-keygen -s <user_ca_key_file> -I "<username> user key" -n "<username>" -V -5m:+365d <keyfile>

sign a server’s key:
ssh-keygen -s <server_ca_key_file> -I "<servername> server key" -h -n <servername> -V -5m:+365d <keyfile>

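-I sets the key identity, which is the name that shows up in the server’s logs
-h signs a host key instead of a user key
-n sets the principal, meaning who the certificate is valid for: either the server’s hostname or the user’s name. This is the important part for authentication.
-V sets the certificate’s validity period, here from five minutes ago until 365 days from now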

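Then set TrustedUserCAKeys (and, if you want to map principals to accounts yourself, AuthorizedPrincipalsFile) in sshd_config so the server accepts user certificates. For the signed server key, point HostCertificate in sshd_config at the certificate and add a @cert-authority line with the server CA’s public key to your clients’ known_hosts.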