Adapt to the Cloud Operating Model

There’s a lot of truth to the statement that all companies are technology companies. After all, the core focus of a technology company is to deliver software, whether internally to empower the workforce or externally to serve customers. Technology companies also maintain servers to create, collect, store, and access data—which is now the norm for organizations worldwide, whether public or private, commercial or enterprise.

  • Published: 05-05-2022

  • Related Category: Training

  • Type of Content: Articles

  • Owner: Okta

Yet the software and servers of today are different than they used to be. Software delivery has undergone a transformation with the rise of cloud computing: what were once multi-year waterfall projects to develop and package software for major releases are now continuous, agile streams of regular updates. Similarly, what were once fixed data centers with static servers are now dynamic cloud environments of elastic resources across IaaS providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.

The underlying theme behind software delivery and cloud infrastructure has always been increased speed, as both competitive pressures and growth demand it. Businesses must deliver continuous innovation to their customers, which means developers must deliver continuous innovation back to the business, all in the form of software.

In this virtuous cycle, velocity at scale emerges as the primary driver, and DevOps programs are the agent of change as well as the method of achievement. Every company is trying to do more with less, which means streamlining as much digital innovation and process improvement as possible. One way that IT and security teams are embracing this change is by adopting a Cloud Operating Model.

What is the Cloud Operating Model?

This new DevOps-driven paradigm differs significantly from the traditional operating model. Prior to the advent of the cloud, companies would spend large amounts of capital on data center space. These environments were fixed and the servers in them were meant to live for years, with one-time configurations and infrequent updates. And changing these environments required a long, manual, and painful process.

The cloud changed all of this. Companies can now consume a wide array of compute, storage, and networking resources on-demand as pay-as-you-go operating expenses. The environments are dynamic, with elastic resources spinning up and down, adapting to usage patterns in real-time. The resources are only meant to last for weeks, days, hours, or even minutes. Not only that, but these environments are configured via automation tools and workflows so any changes occur almost instantly across the fleet.

The demands for velocity at scale can only be realized by fully embracing the cloud, but with such a significant shift in environmental characteristics and behaviors, the underlying architecture must also be able to adapt. This is the crux of the Cloud Operating Model, a strategic framework for cloud adoption—driven by a DevOps culture of automation—that impacts people, process, and technology.

Fostering a culture of automation

Changing DevOps processes has to be preceded by a change in culture. That’s because any agent of change needs buy-in from across the organization to be successful, especially when it spans cross-functional teams. There needs to be alignment on shared goals, shared responsibilities, and shared accountability. Organizations that can achieve this alignment and empower their teams pave the way for this shift in culture, leading to mature DevOps programs.

The people, process, and technology of a mature DevOps program are wrapped in automation, with security embedded right from the start. Self-organized teams deploy self-service tools to streamline the delivery and deployment of software, as well as the provisioning and configuration of infrastructure. A key reason cloud adoption requires a shift in the operating model is to remove any barriers from this automation in a secure manner. The traditional operating model is both a blocker to automation and a risk to security, which is bad for the business outcome of velocity.

Measuring DevOps success

Since speed can be quantified, the business leaders who are behind this change will expect results—and naturally, they will expect them quickly. DevOps programs are measured on key metrics related to software development and delivery, while the underlying operations and security functions are evaluated based on how well they support these outcomes.

The DevOps Research & Assessment (DORA) team, now a part of Google, found the following metrics most important for businesses to measure:

  • Deployment frequency: How often does your organization deploy code to production or release it to end users? (High performers: between once per day and once per week.)
  • Lead time for changes: How long does it take to go from code committed to code successfully running in production? (High performers: between one day and one week.)
  • Time to restore service: How long does it generally take to restore service when a service incident or defect that impacts users occurs? (High performers: less than one day.)
  • Change failure rate: What percentage of changes to production or releases to users result in degraded service and subsequently require remediation? (High performers: 0–15%.)

Source: DevOps Research and Assessment (DORA) State of DevOps Report 2019
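The four DORA metrics are straightforward to compute once deployments are tracked as data. The sketch below uses made-up deployment records (commit time, deploy time, and whether the change caused an incident) to illustrate the arithmetic; the record format and observation window are assumptions for the example, not part of the DORA definition.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: (commit_time, deploy_time, caused_incident)
deployments = [
    (datetime(2022, 5, 2, 9, 0),  datetime(2022, 5, 3, 14, 0), False),
    (datetime(2022, 5, 4, 11, 0), datetime(2022, 5, 5, 10, 0), True),
    (datetime(2022, 5, 6, 8, 0),  datetime(2022, 5, 6, 16, 0), False),
    (datetime(2022, 5, 9, 13, 0), datetime(2022, 5, 10, 9, 0), False),
]

window_days = 14  # assumed observation window for deployment frequency

# Deployment frequency: deploys per day over the window
deploy_freq = len(deployments) / window_days

# Lead time for changes: commit-to-production duration, averaged
lead_times = [deploy - commit for commit, deploy, _ in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change failure rate: share of deployments that degraded service
change_failure_rate = sum(1 for *_, failed in deployments if failed) / len(deployments)

print(f"Deployment frequency: {deploy_freq:.2f} per day")
print(f"Average lead time:    {avg_lead_time}")
print(f"Change failure rate:  {change_failure_rate:.0%}")
```

With these sample records, the average lead time is 20 hours and the change failure rate is 25%, which would place the team near the high-performer bands for lead time but above the 0–15% failure-rate band.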

The DevOps function doesn’t end once software gets out the door; it takes a concerted effort to keep systems and applications healthy. If throughput and stability are the leading indicators for Dev performance, then availability, resilience, and security are the most important factors on the Ops side. While these attributes have traditionally been considered counter to velocity given their protective nature, organizations with mature DevOps programs understand that they go hand-in-hand. Highly available systems better enable throughput, and resilient systems better enable stability.

Improving system reliability

Popularized by Google, the Site Reliability Engineering (SRE) role complements DevOps programs by taking ownership of uptime, widely understood to be the most critical metric of all. Any company that delivers software as a service places its reputation on the line with its Service Level Agreement (SLA); Okta is no exception, with a 99.99% uptime SLA.
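An SLA percentage translates directly into a downtime "budget." The quick calculation below shows what common uptime targets allow per year; a 99.99% SLA leaves roughly 52 minutes of downtime annually.

```python
# Downtime budget implied by an uptime SLA, over a 365-day year
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

for sla in (0.999, 0.9999, 0.99999):
    budget_minutes = SECONDS_PER_YEAR * (1 - sla) / 60
    print(f"{sla:.3%} uptime -> {budget_minutes:.1f} minutes of downtime per year")
```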

As an engineering function, SREs are constantly monitoring and improving systems and processes to minimize two key metrics:

  • Mean Time to Acknowledge: How long does it take to begin working on an issue once an alert has been triggered? (High performers: less than five minutes.)
  • Mean Time to Resolve: How long does it take to resolve an outage and restore service to customers? (High performers: less than one hour.)
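Both metrics fall out of a timestamped incident log. The sketch below uses fabricated incidents (alert fired, acknowledged, resolved) to show the calculation; measuring MTTA from the alert timestamp and MTTR from alert to full resolution is one common convention, assumed here.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (alert_fired, acknowledged, resolved)
incidents = [
    (datetime(2022, 5, 1, 3, 0),   datetime(2022, 5, 1, 3, 4),   datetime(2022, 5, 1, 3, 40)),
    (datetime(2022, 5, 8, 14, 0),  datetime(2022, 5, 8, 14, 2),  datetime(2022, 5, 8, 14, 50)),
    (datetime(2022, 5, 20, 22, 0), datetime(2022, 5, 20, 22, 6), datetime(2022, 5, 20, 23, 10)),
]

# Mean Time to Acknowledge: alert fired -> someone starts working
mtta = sum((ack - fired for fired, ack, _ in incidents), timedelta()) / len(incidents)

# Mean Time to Resolve: alert fired -> service restored
mttr = sum((res - fired for fired, _, res in incidents), timedelta()) / len(incidents)

print(f"MTTA: {mtta}")  # target: under five minutes
print(f"MTTR: {mttr}")  # target: under one hour
```

These sample incidents average an MTTA of four minutes and an MTTR of about 53 minutes, inside both high-performer targets.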

The dynamic nature of the cloud, with the constant drive for velocity, can make this difficult to manage. Companies with mature DevOps programs tie SRE metrics back to the business, which makes sense because uptime is a competitive advantage. The core engineering work of an SRE team is to build observability throughout the entire tech stack, and throughout every automated process. The more proactive they can be with potential bottlenecks, barriers, and breakers, the better.

Where development and operations meet

A fully automated software development, delivery, and deployment program doesn’t just happen spontaneously. Most businesses have technical debt to modernize and processes to streamline. DevOps is the unity of software development and infrastructure operations, but each carries its own independent roles and responsibilities. Maturity is defined by how efficiently it all comes together.

Inefficiencies exist when these functions are disjointed, such as when developers write code in isolation and then “throw it over the wall” for the operations team to deploy. This siloed approach can lead to problems when flaws are discovered, or when the two sides have different standards for when software is considered finished. The famous refrain, “It worked on my machine,” originates from exactly this situation.

