What Are “DevOps” Skills?

Someone recently asked me about DevOps ‘courses’, which got me thinking about which skillsets are required for getting into roles that include the word “DevOps” in the title or description.

I’ve been working in various “DevOpsy” roles professionally for about 7 years now and I still don’t know exactly what it means. Each company seems to have a different definition, and the philosophy that it all started from is now only a distant memory :-D.

That said, every “DevOps” role I’ve had has drawn on some combination of the following skillsets:

  • Linux Skills
  • Networking and Web/HTTP skills. And DNS. Always DNS. And caching, oh god the caching problems.
  • Some cloud provider’s stack (AWS, GCP, Azure, whatever): knowing how different architectures are implemented, and using the tools they expose to infrastructure designers and operators.
  • Familiarity with a CI/CD process (specific tools are usually not important in interviews, as long as you’re comfortable with ONE of them).
  • Generalized troubleshooting and problem-solving skills. Almost every problem you face as a DevOps person will be 15% known, 85% unknown. The ability to quickly learn about the problem domain and start troubleshooting is invaluable.
  • Be comfortable with the software development process — how software gets written and deployed. Know the basics of software tooling — git, the basics of the language your devs are writing in, debugging tools for that language/environment, etc.
  • Be *really* comfortable with reading through (and puzzling over) large codebases.

It *really* helps to have some programming experience (developing software with a team of other devs), although it’s not a hard requirement.

I’m trying to stay away from specific tool recommendations in this post, but several important ones come to mind.

My question to you: what skills do you find yourself using at your DevOpsified job?

The Hardest (and most fun) Problems to Troubleshoot

I recently wrote a FAQ-style post about System Administration and technology careers in general. One of the best questions I was asked was about what kinds of really interesting troubleshooting problems I’ve had to deal with. Here’s that question, along with my answer:

What’s one of the most interesting things you’ve had to troubleshoot / do while maintaining a system?

I’m leaving out specific examples because they’re a mixture of non-public information and hyperspecific (uninteresting) technical stuff, but I can give some outlines for what generally makes for interesting problems to solve.

The really interesting problems I’ve seen tend to be related to performance, networking, and distributed systems. Usually they require a combination of different kinds of knowledge to solve:

  • Systems/OS: What is the operating system doing when everything slows down? What’s causing it to do that?
  • Networking/Distributed Systems: What’s actually happening when these machines communicate? How are they supposed to share and manage state, deal with network partitions, and ensure high availability? What are they *actually* doing when this problem happens?
  • Software Development: Which part of the code is causing this network/OS issue, and which code path leads there? Can I actually look at and modify this code? Is this code written by our developers, or an open-source project? What can I do to confirm the issue and test a fix? Can I contribute a fix back to the upstream project?
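
To make the "what's actually happening when these machines communicate?" question a bit more concrete, here's a minimal sketch of one first step I might take on a "the service is slow" report: splitting a connection into its name-resolution phase and its TCP-handshake phase, so I know which layer to dig into. The function name `time_connection_phases` and its parameters are made up for illustration. It only uses Python's standard `socket` and `time` modules and only tries the first resolved address, so treat it as a starting point rather than a real diagnostic tool.

```python
import socket
import time


def time_connection_phases(host, port, timeout=3.0):
    """Return seconds spent on name resolution and on the TCP handshake.

    Hypothetical helper for illustration: it only tries the first
    address returned by the resolver.
    """
    start = time.perf_counter()

    # Phase 1: name resolution (DNS, /etc/hosts, or a literal IP address).
    family, socktype, proto, _canonname, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    resolved = time.perf_counter()

    # Phase 2: TCP handshake against the first resolved address.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)
    connected = time.perf_counter()

    return {
        "address": sockaddr[0],
        "resolve_s": resolved - start,
        "connect_s": connected - resolved,
    }
```

Calling something like `time_connection_phases("example.com", 443)` returns a small dict of timings. If `resolve_s` dominates, I'd start with DNS (always DNS); if `connect_s` dominates, I'd look at the network path or the remote host instead.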