3 Oct 2019

DevOps Needs Collaboration and a Safe Place for Success

The first thing you need to achieve DevOps success is a collaborative team environment. One of the central ideas of DevOps is that you don’t throw software over the wall between functions, such as development and testing, so everyone on the team needs to understand and accept that they are on the same side. The second is a safe place to fail, along with the realization that failure is just an iterative step on the path to implementing the best ideas.

A financial services company that Van Kalken works with was adopting GitOps but needed to convince the service management team this was the right move. “Initially, it was quite challenging because service management thought it [GitOps] was uncontrolled chaos, even though that isn’t the case,” he said. The team’s support was won after a series of workshops showed that the underlying processes were actually the same, even though the tools used to implement them were changing. Service management is still the final gatekeeper, but it now approves releases in the repository rather than on paper.

Practices such as GitOps provide a quick and easy way of reverting to a previous release if something goes wrong. So the “safe place to fail” idea can be implemented in a way that protects the organization in a practical sense as well as staff members in a psychological (and career-protecting) sense. DevOps also protects the organization in another way: frequent releases generally involve relatively minor changes, which in turn have a smaller blast radius if something goes wrong. Again, this is about having the maturity to recognize risk and deal with it, rather than trying to avoid it in the first place.

A sometimes overlooked aspect of increasing the release cadence is the effect on users, who are being asked to cope with frequent changes. Van Kalken pointed out that increasing the frequency does not necessarily mean multiple releases a day. It could mean going from two releases a year to three or four, if that’s what suits the organization.

Not all changes have a direct impact on users. Some are under the hood, improving performance or addressing rarely encountered bugs. But when users’ interaction with the system is changed, he suggested canary deployments as a way of checking the acceptability of the new approach among a larger pool of users than those brought into the development process, before it is released to the entire user population.
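To make the canary idea concrete, here is a minimal Python sketch of one common way to pick the users who see a new release first. It is an illustration only, not the approach Van Kalken or his clients use: the function name `in_canary`, the `salt` value and the percentage are all hypothetical. It hashes each user ID into one of 100 buckets, so the same user always gets the same answer and the canary pool can be widened gradually.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "release-salt") -> bool:
    """Return True if this user falls inside the canary pool.

    Hypothetical helper: the salt and bucket count are assumptions
    chosen for illustration, not part of any specific tool.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the salted user ID so assignment is stable across requests.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first four bytes of the hash onto buckets 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    # Users in buckets below the threshold see the new release.
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    canary_users = [u for u in users if in_canary(u, 10)]
    print(f"{len(canary_users)} of {len(users)} users would see the canary release")
```

Because a user's bucket never changes, raising the percentage from 10 to 50 only adds users to the pool; nobody who already has the new version is switched back. Changing the salt for each release would reshuffle the pool so the same people are not always first in line.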

Perhaps the biggest challenge to an organization’s culture is the adoption of chaos engineering, because if you’re going to kill a container or flood a network with data in order to check that the wider system can cope, you absolutely need to be in a safe place to fail. You also need to realize that everyone involved, including developers, security people and those on the business side, wants to achieve high availability, and that means designing systems that can handle (partial) failures. It isn’t enough to do this in pre-deployment testing. Ongoing testing provides confidence that production systems will continue to cope with such failures. “It’s pretty cool when you get it right,” said Van Kalken. However, “your culture has to be OK with this iterative approach.”

“DevOps is really just a methodology to achieve a better outcome,” stated Van Kalken. That involves collaboration, iteration and embedding security and other considerations up front. But the wider organization needs to understand and tolerate this, and the risk that goes with it.