Who Decides DevOps Best Practices? | stack.io

You’ve probably heard the term “best practices” thrown around before, but what does it actually mean? If you asked three different people to define “best practices”, you’d get three different answers. Who’s making these “best practices”? Why are they “the best”? Are any of them actually worth following?

No single authority defines “best practices” for everything. Many engineers point to Google’s SRE book, but careful readers will notice how much is missing from it. Google includes no technical details on what you should or should not implement, so it’s very difficult to read the book and translate its contents into distinct action items for a team. Most of these “best practices” are general guidelines that only prove useful if you’re overhauling an existing system, starting a new project, or building a new ops team.

So, who is coming up with all of these “best practices” that you hear about all the time? And what makes them a best practice?

In Google’s case, much of the SRE book’s fame comes simply from the fact that it was written by Google. The rest comes from it being pretty much the only widely publicized attempt by anyone to sit down and write such a guide for operations work. That said, guides to “best practices” are not uncommon: most server-side software projects publish a set of best practices in their blog posts or documentation.

In almost every case, these “best practices” have a few things in common:

Someone, somewhere, has an idea they think you should adopt too.
Whoever wrote these “best practices” genuinely wants to help you. The author believes they will save you effort or pain, and in almost every case, they wrote things down so that you could benefit from their experience. (I use the word “almost” because there are definitely people on the internet trying to sell you things by telling you it’s a “best practice”.)
Despite the (typically) helpful nature of these “best practices”, no one is forcing you to implement them. Ultimately, the decision is yours.
Even when a “best practice” seems questionable, there is often something to be gained by considering it or by discussing why you’re not implementing it. Where relevant, we suggest writing down why you chose not to do something as a favour to the next person (or future you) who wonders why things were done a certain way. (Yes, we just snuck a best practice into an article about best practices.)

The Example: SSH

Here at stack.io, we believe there are several factors that separate the best “best practices” from the “not so best”.

There should be a clear and demonstrable benefit to following the advice of someone on the internet. If something is only a good idea in certain scenarios, there should be a set of criteria that tells you whether or not it’s the right choice for your environment.

Good “best practices” are actionable. There should be something you can immediately do, or a clear plan of action for your team if it takes more than a few days to implement.

As an example, here’s a set of “good” best practices for something (we assume) you’re already familiar with: SSH. All of these best practices are focused on improving SSH security or making it easier to use.

Disable password logins completely (“PasswordAuthentication no” in /etc/ssh/sshd_config). Passwords are frequently reused and are relatively easy to brute-force. Switching from passwords to SSH public key authentication is the single biggest change you can make to stop someone from breaking into a server through SSH.
Disable root logins (“PermitRootLogin no” in /etc/ssh/sshd_config). There is no valid reason for the root user to log in to a server directly over SSH. It is better to have users log in as themselves and then sudo to root: this leaves a trail of who used root privileges, and it gives you the option of writing specific sudo rules that limit which commands a user may run instead of letting them run everything.
Install fail2ban on any internet-facing servers and set up either permanent bans or bans of a reasonably long duration (>30 min). fail2ban is a tool that automatically bans all connections from an IP address after a certain number of failures, and it is spectacularly effective at preventing brute-force attacks against sshd (fail X times, and that IP is banned for the next Y minutes).
Use the SSH cryptographic settings suggested here under “common configuration settings for the enterprise”. These have no impact on normal usability, but we’ve found that with these settings in place, most automated attackers’ connections fail before they can even attempt to authenticate against sshd (many attackers try to force the use of weaker crypto algorithms, and these settings prevent that).
If you have a lot of users (>10) and servers (>50), consider using a tool like FreeIPA to manage your SSH keys and access rules. Managing who has access to each server (and which sudo commands they are authorized to run) quickly turns into a full-time job if you’re not careful. FreeIPA is a major time-saver because it lets you centrally manage all of your users and access rules from a single server (you can also define these users and rules via Ansible).
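The first two recommendations are easy to verify automatically. The following is a minimal Python sketch, not a stack.io tool; the names parse_sshd_config, audit and RECOMMENDED are hypothetical. It follows sshd’s rule that the first value obtained for a keyword wins, but it is deliberately simplified: it stops at the first Match block and does not follow Include directives.

```python
import re

# Minimal sketch (hypothetical helper, not a stack.io tool): audit sshd_config
# text for the two settings recommended above.
# Assumed simplifications: stops at the first Match block, ignores Include.

# Recommended values, keyed by each keyword's canonical spelling.
RECOMMENDED = {
    "PasswordAuthentication": "no",
    "PermitRootLogin": "no",
}

# sshd accepts "Keyword value", "Keyword=value" and "Keyword = value".
LINE_RE = re.compile(r"^(\w+)(?:\s*=\s*|\s+)(.*)$")


def parse_sshd_config(text: str) -> dict[str, str]:
    """Map lowercased keyword -> value, keeping the first value seen (as sshd does)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        match = LINE_RE.match(line)
        if not match:
            continue  # not a "keyword value" line; skip it
        keyword, value = match.group(1).lower(), match.group(2).strip()
        if keyword == "match":
            break  # settings after Match apply only conditionally; skip them
        settings.setdefault(keyword, value)  # first obtained value wins
    return settings


def audit(text: str) -> list[str]:
    """Return one human-readable problem per setting that is missing or wrong."""
    settings = parse_sshd_config(text)
    problems = []
    for keyword, wanted in RECOMMENDED.items():
        actual = settings.get(keyword.lower())
        if actual is None:
            problems.append(f"{keyword}: not set (expected '{wanted}')")
        elif actual.lower() != wanted:
            problems.append(f"{keyword}: '{actual}' (expected '{wanted}')")
    return problems


if __name__ == "__main__":
    sample = """
# Example config
PermitRootLogin prohibit-password
PasswordAuthentication no
"""
    for problem in audit(sample):
        print(problem)  # prints: PermitRootLogin: 'prohibit-password' (expected 'no')
```

To check a real server, you could feed it the contents of /etc/ssh/sshd_config. The output of sshd -T (which usually has to run as root) is probably a better input, because it prints the effective configuration with defaults and Include files already resolved.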

Curious about more best practices? Explore the DevOps Maturity checklists in our services for additional tips. Even if you choose not to implement all of them, we hope this article has helped you separate the good “best practices” from the bad. Don’t know how to implement something? Get in touch; we’d love to help you out.

Rob Newsome