DevOps is easy to get wrong, hard to get right, and even harder to know if you actually got it right. Modern technology is incredibly complex, and most companies struggle with the sheer number of technologies their staff are expected to master. DevOps engineers (or whatever you might call them at your company) are expected to be experts on:
A wide variety of operating systems including Linux, Windows, and sometimes BSD.
Multiple cloud providers like AWS, Azure, GCP, etc. This is a big one – “knowing a cloud provider” doesn’t mean just clicking around in a web dashboard, it means knowing how to build an entire set of cloud infrastructure from the ground up, including all required networking, databases, application instances, load balancers, monitoring, etc. This alone can take years to master, and each cloud provider has its own proprietary managed services (like AWS RDS), each with its own quirks and shortcomings.
Infrastructure-as-code/configuration management tools like Terraform, Ansible, Salt, Puppet, etc. that are used to build out infrastructure in a sane and reproducible manner.
Databases. Engineers all need to be proficient DBAs (database administrators). Twenty years ago, being a DBA was its own dedicated job – now DevOps engineers are expected to know how to operate, optimize, and back up a seemingly limitless range of database technologies like MySQL, Postgres, Elasticsearch, etc. in addition to everything else they do.
Logging and monitoring tools like New Relic, Elasticsearch, Prometheus, Grafana, and Datadog that allow visibility into an environment.
Application deployment and testing – your DevOps team is usually responsible for setting up how your company’s applications get deployed to your infrastructure (this complex process often gets abbreviated as “CI/CD pipelines”).
And much, much more (I stopped early here – I think you get the picture).
Implementing all of this is sometimes an insurmountable goal at a company – not only are these skills difficult to learn, but frequently time constraints, budget limitations, and opposing business priorities may mean infrastructure needs get sidelined or abandoned altogether. Many businesses are forced to pick and choose what gets implemented. Often it’s non-technical individuals making the final call on whether a project gets the “go ahead” without understanding the actual implications of what a decision may mean. Other times, a company’s technical staff might simply be out of their depth – an important project may be set up completely wrong, and fail catastrophically at a moment's notice. Over years of operation, companies can build up astounding amounts of technical debt that has very real financial and operational costs.
At the end of the day, how do you know if you got it all right? How do you know that everything is set up correctly? How do you know that there isn’t a better way?
You don’t.
There’s a famous saying that “you don’t know what you don’t know” - but there is a way to know: ask someone else. A second pair of eyes is the most valuable tool at your disposal.
Sometimes, you’re just uncertain about things and want to verify that what you’ve done is right. Perhaps you need to run “dd” on an important system (Linux’s “dd” command is sometimes jokingly referred to as “destroy disk”) or a Terraform plan gives an uncertain output. Or perhaps you’ve had your team build a new set of cloud infrastructure and want to be extra, extra sure that nothing was missed when setting things up. The best thing to do is to ask someone else. Get someone knowledgeable to take a look and verify that things look correct before proceeding.
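As a rough illustration of what a second reviewer might look for in a Terraform plan, here is a minimal Python sketch. It assumes the plan has been exported as JSON (for example with `terraform show -json`) and that the JSON contains a top-level `resource_changes` list whose entries carry an `address` and a `change.actions` list. The function name `destructive_changes` and the sample plan are made up for this example.

```python
from __future__ import annotations

import json


def destructive_changes(plan: dict) -> list[str]:
    """Return the addresses of resources the plan would delete or replace.

    Assumes the plan JSON layout described above: a "resource_changes"
    list where each entry has "address" and "change" -> "actions".
    """
    flagged = []
    for resource in plan.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        # A replacement shows up as a delete paired with a create,
        # so checking for "delete" covers both cases.
        if "delete" in actions:
            flagged.append(resource["address"])
    return flagged


# Hypothetical plan excerpt, trimmed to the fields this sketch reads.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_instance.web", "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}}
  ]
}
""")

if __name__ == "__main__":
    for address in destructive_changes(SAMPLE_PLAN):
        print(f"needs a second look: {address}")
```

A list like this doesn’t replace a human reviewer, but it gives that reviewer a short, concrete set of changes to look at first.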
Other times, you might have a system already running, but it has problems. Perhaps a slow database is constantly bringing down your application. Maybe you’ve been plagued by constant outages and it’s still unclear where the problem lies. You’ve tried fixing the issue, but it keeps coming back. Don’t be afraid to reach out and have another pair of eyes investigate the issue.
There are frequently major cost savings to be had by changing a small setting or switching to newer technology. Sometimes you’ll even already be aware of a way to save money or add a major feature, but implementing the change looks daunting (Kubernetes, anyone?). It may turn out that this project was easier to implement than you initially expected, or a knowledgeable third-party gets you past the difficult initial setup. Either way, you won’t know unless you ask someone else.
Beyond simply “asking someone else”, there’s a common theme with all of these situations: confidence. As technology gets more and more complex, it’s more important than ever to be confident that your team is doing things the right way. Are there any easy cost-savings you might not be aware of? Is there a fix for that performance issue in production? Are there any big security holes your team may have missed? Is there a better way to do things? The only way to be confident that your systems are the best they can possibly be is to ask.
Say you’re convinced. You’ve got an issue and you want a second opinion on it.
The easiest way to get a second pair of eyes is just to ask a coworker. Better still, you can require that your employees always check in with a teammate before making a change. Git providers like GitHub, GitLab, and Bitbucket have the ability to set up “branch protection rules” that require your engineers to get their changes reviewed before merging them into your master branch. We highly recommend requiring all merge requests to be reviewed and approved by at least one other team member before the change goes live. There’s a similar tool for infrastructure changes: Terraform Cloud. Though tools like Terraform or CloudFormation let you formalize infrastructure changes as code, Terraform Cloud takes it a step further and shows proposed infrastructure changes online every time a change is made in Git, and attaches this report to pull requests for team members to review and approve (we use Terraform Cloud extensively at Stack.io!).
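To make the “at least one reviewer” rule a bit more concrete, here is a small Python sketch that builds the kind of settings a branch protection rule might contain. The field names follow our understanding of GitHub’s REST API for branch protection (a PUT request to `/repos/{owner}/{repo}/branches/{branch}/protection`); treat them as an assumption and check the current API documentation before relying on them. The function name `branch_protection_payload` is made up for this example, and the sketch deliberately makes no network calls.

```python
import json


def branch_protection_payload(required_approvals: int = 1) -> dict:
    """Build a request body that requires reviews before merging.

    Field names are assumed from GitHub's branch protection API;
    other Git providers use different settings.
    """
    if required_approvals < 1:
        raise ValueError("at least one approving review is required")
    return {
        # No required CI checks in this sketch; a real rule often adds them.
        "required_status_checks": None,
        # Apply the rule to administrators too.
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "required_approving_review_count": required_approvals,
            # Re-request approval when new commits are pushed.
            "dismiss_stale_reviews": True,
        },
        # No restrictions on who may push once reviews are satisfied.
        "restrictions": None,
    }


if __name__ == "__main__":
    print(json.dumps(branch_protection_payload(), indent=2))
```

Keeping a rule like this in code (or in Terraform) rather than clicking it together in a dashboard means the rule itself can be reviewed like any other change.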
You can even automate this review process. Tools like CircleCI, Travis CI, and GitHub Actions let you run code checking tools automatically every time a change is made, and block bad changes from making it into production. You can configure these CI tools to check for code style, run tests and compute test coverage, and identify some types of bugs (the tools you run as part of CI will depend on what your tech stack looks like). Though automated testing tools have their limits, they’re great for enforcing coding standards on a project.
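Under the hood, most CI checks boil down to the same idea: run a command, and fail the build if it exits with a non-zero status. The Python sketch below shows that idea in miniature. The names `run_checks` and `EXAMPLE_CHECKS` are made up for this example, and the two stand-in commands simply succeed and fail on purpose; in a real pipeline you would swap in whatever linters and test runners fit your tech stack.

```python
import subprocess
import sys


def run_checks(checks: dict) -> list:
    """Run each named command and return the names of those that failed."""
    failed = []
    for name, command in checks.items():
        # A non-zero exit status is the usual signal that a check failed.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


# Stand-in commands for illustration only: one passes, one fails.
EXAMPLE_CHECKS = {
    "passing check": [sys.executable, "-c", "print('ok')"],
    "failing check": [sys.executable, "-c", "raise SystemExit(1)"],
}

if __name__ == "__main__":
    failures = run_checks(EXAMPLE_CHECKS)
    for name in failures:
        print(f"check failed: {name}")
    print("would block merge" if failures else "all checks passed")
```

In an actual CI job the script would finish by exiting with a non-zero status when `failures` is non-empty, which is what tells the CI tool to block the change.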
Both of these techniques (enforcing review by humans, and automated checks by CI pipelines) are great, but what happens if this isn’t within your team’s capabilities or you need something more? You’ve really got one option: ask someone else. The best choice is to hire someone who knows what they’re doing. I won’t even go into the interview process, but hiring is tough and expensive. It’s hard to find senior engineers who actually know what they’re doing, and if you manage to find one, these candidates come with a price tag to match. What if this isn’t an option, or you just want help with a specific project and don’t need a full-time set of hands? There are quite a few companies and freelancers out there who specialize in helping with these types of projects (including Stack.io!). Bringing in an outside opinion is typically the most cost-effective way of getting these types of projects completed.
Looking for a second pair of eyes to look over a project? Feeling a little overwhelmed and don’t know where to start? Let us know and we’d love to help you out. We want you to be confident that your infrastructure is the best that it can be.