Over the past several years there has been a move to the Site Reliability Engineering (SRE) model. This was largely thanks to Google sharing its operational experience in the book Site Reliability Engineering: How Google Runs Production Systems. Now there is no shortage of SRE jobs wherever you look. For many organizations and teams, the ultimate goal of this move was greater accountability and closing the gap between development and operations. Hence the term DevOps, which is often used interchangeably with SRE.
In some ways this was a great move, but like every idea it has its own considerations and drawbacks that we’ve encountered. The most impactful one to me has been that the move to “DevOps” has primarily been one of developers moving into an operational role while lacking operational knowledge. It’s great that they can fix bugs faster, but can they secure the environment too? How about change control? It’s not very reassuring to your customers if you lose your SOC 2 attestation because your team fixes bugs quickly but doesn’t see the value of change control.
Part of the fault here is that the landscape isn’t as simple as many think it is. Now you not only have to write code and know how to run it; you likely also need to know the cloud technologies underneath it and how to secure them. Security was a mature discipline years (even decades) ago, yet it is still a struggle for many enterprises today. Fast-forward to a couple of years ago, and the emerging challenge for teams was building and operating products and services that complied with privacy regulations such as GDPR and CCPA. Some have met this challenge admirably, others struggle, and some have no idea where to even begin.
In today’s world, rocked by the successful attack on SolarWinds by the APT 29 group, the risks continue to rise. Vendors that you used to simply trust can no longer necessarily be trusted without oversight, and they are struggling to come up with ways to demonstrate their trustworthiness. In some areas the path forward is clear, having been laid out by the Executive Order on Improving the Nation’s Cybersecurity. The ability to share a Software Bill of Materials (SBOM) with your customers, along with the management of third-party vendors, is now front and center at the negotiating table.
Stepping back though, if we’re honest, the average individual perceived the impact of SolarWinds as really nothing. Nobody died, and your mobile phone, bank, and home utilities were not impacted. However, imagine for a minute that this attack hadn’t been perpetrated by APT 29 (currently attributed to Russia’s Foreign Intelligence Service, the SVR) and had instead been carried out by a less-disciplined entity focused on causing unrestrained havoc. I guarantee that ransomware delivered via the SolarWinds compromise, instead of information gathering, would have touched everyone’s lives, probably in surprising ways. I say this because the horizons are rapidly broadening. Now, with cloud providers, 5G telecom networks, and utilities as likely targets, the level of potential impact to the general population has never been higher.
Back to the DevOps/SRE conversation: the goal here is to avoid getting lost in the weeds while at the same time not oversimplifying the problem. From a management perspective, the answer is honest investment in being faithful stewards of the products and services that we provide to our users. From an architect’s perspective, it’s ensuring that proper design and management are in place, mindful of the threats (malicious or accidental) facing our users. For an individual contributor, the answer is faithfully executing work that will stand up to the environments in which it will exist.
He who is faithful in a very little thing is faithful also in much; and he who is unrighteous in a very little thing is unrighteous also in much.
Jesus Christ – Luke 16:10