Ten DevOps Metrics IT Departments Should Track

The evolution of DevOps engineering has led to the practice of tracking metrics that bring cost efficiency to the IT department of an enterprise. Good metrics share a few properties: they are measurable, relevant, incorruptible, actionable, and traceable. Metrics that comply with these properties also help IT maintain high uptime. This applies to any set of metrics tracked in an IT environment, not just DevOps metrics. Let us list some of the important DevOps metrics.

  1. Deployment Frequency: It is important to understand how frequently deployments are made, as frequency can reflect bug fixes, change velocity, or quick feature changes driven by business requirements. High frequency is a double-edged sword: it signals agility, but it also puts pressure on code stability and testing capacity.
  2. Deployment Volume: Deployment volume indicates how many new features and bug fixes are shipped, since it is difficult to comb through a Bugzilla-like tracker to see what is happening in the application. Frequency and volume together determine how meticulous the monitoring of the applications and the infrastructure needs to be.
  3. Lead Time to Deployment: This measures the time from when work on a change starts to when it is deployed in production, and it helps in planning deployments. Without a blue/green deployment strategy, a long lead time can mean more downtime for the applications and therefore for the business.
  4. Deployment Failures: Deployment failures indicate inadequate testing or unstable code. Frequent deployment failures warrant reviews to ensure that the stability and reliability of the applications are restored.
  5. Ticket Volume: The number of tickets raised against an application indicates its stability from both the code and the usability perspective. Depending on the type of tickets, a high ticket volume means the underlying issues with the application need to be addressed. Ticket volume is a good indicator of stability and of staleness in the application.
  6. Volume of Production Bugs: While many bugs are caught during QA, some surface in production. Production bugs are the ones that impact the customer and therefore result in loss of business, so they carry both urgency and importance and must be tracked.
  7. Mean Time between Failures: The mean time between failures of an application is another indicator of its stability. The ability to trace and track the reasons for failure is essential to keeping the mean time between failures high.
  8. Mean Time to Recover: The mean time to recover from a failure helps gauge the resiliency of the application. Failures can happen in production or during deployment, but the ability to recover or repair quickly after a failure ensures business continuity.
  9. Mean Time to Detect: Issue resolution in IT can be divided into two parts: the time to detect an issue and the time to repair it. Detection takes us to where the issue happened, and repair starts from there. Detection usually takes longer than repair, so tracking the time to detect helps reduce the overall time to recover an application and thereby minimizes the impact on the business.
  10. Ratio of Actionable Alerts to Total Alerts: Most IT environments have monitoring systems that raise alerts when system resources deviate from normal behaviour. The goal of a monitoring system should be to raise only actionable alerts and suppress noise. It is important to track this ratio; the industry norm puts actionable alerts at about 2% of the total. Effectively, if there are 100 alerts, only 2 are actionable.
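Several of the metrics above can be computed directly from deployment records. The sketch below is a minimal illustration in Python; the record fields and sample data are hypothetical, not taken from any specific tool.

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment records: when work started, when the change
# reached production, and whether the deployment failed.
deployments = [
    {"started": datetime(2024, 1, 1, 9), "deployed": datetime(2024, 1, 2, 9), "failed": False},
    {"started": datetime(2024, 1, 3, 9), "deployed": datetime(2024, 1, 5, 9), "failed": True},
    {"started": datetime(2024, 1, 8, 9), "deployed": datetime(2024, 1, 8, 17), "failed": False},
]

# Metric 1 -- deployment frequency: deployments per week over the window.
window_days = (deployments[-1]["deployed"] - deployments[0]["deployed"]).days or 1
frequency_per_week = len(deployments) / window_days * 7

# Metric 3 -- lead time to deployment: work start to production, averaged.
lead_time_hours = mean(
    (d["deployed"] - d["started"]).total_seconds() / 3600 for d in deployments
)

# Metric 4 -- deployment failure rate.
failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Metric 10 -- ratio of actionable alerts to total alerts.
total_alerts, actionable_alerts = 100, 2
actionable_ratio = actionable_alerts / total_alerts

print(f"{frequency_per_week:.1f} deploys/week, "
      f"lead time {lead_time_hours:.1f} h, "
      f"failure rate {failure_rate:.0%}, "
      f"actionable ratio {actionable_ratio:.0%}")
```

In practice these records would come from a CI/CD system or ticket tracker rather than being hard-coded, but the arithmetic stays the same.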

With the DevOps culture picking up in most organizations, it has become important to track DevOps metrics to make the IT operations of the enterprise more efficient. DevOps dashboards that surface these metrics should pave the way to better productivity.

Serverless – The New Option for Reducing IT Infrastructure Cost

The word serverless does not mean applications can run without a server. Every application requires CPU and memory to run, since a program in execution is a process. Rather, serverless lets applications share resources in an optimal manner: serverless applications run as stateless, ephemeral containers managed by a third party. Serverless was popularized by AWS in 2014 with the launch of AWS Lambda. There are three aspects to serverless, namely application/services, infrastructure, and architecture. Let us look at each of them.

Why Serverless?

There are three fundamental reasons to go serverless as listed below.

1. Lower Operational Cost: Fewer servers, fewer people needed to manage servers, and a clearer division of labour.

2. Faster Time to Value: Usually applications or services require servers to be provisioned. With serverless, there are zero servers to provision.

3. Focus on Core Value: Serverless means outsourcing the infrastructure concerns of the architecture and focusing on core value.

Perspectives of Serverless:

1. Application/Services Perspective: From this perspective, serverless means lightweight, event-based microservices such as Google Cloud Functions: small, single-purpose functions that respond to events without a server to manage at any given point in time. Effectively, any lightweight function that does not depend on a dedicated server can run on a serverless platform.

2. Infrastructure for Serverless: The infrastructure for serverless is fully managed by the vendor; for example, AWS Lambda provides the serverless infrastructure on AWS. Scaling is automatic and is triggered by events.

3. Architecture: The architecture is usually a stateless, event-driven function that uses an API gateway as the trigger input. An example of a stateless function on a website is adding an item to a cart.
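As a sketch of the stateless, event-driven style described above, here is a minimal AWS Lambda-style handler for the add-to-cart example. The event shape is a hypothetical simplification of an API gateway proxy request, and the persistence step is omitted.

```python
import json

def add_to_cart(event, context=None):
    """Add an item to a cart. The function keeps no state of its own:
    the cart identifier and item arrive with the event, and the result
    would be persisted by an external managed store (omitted here)."""
    body = json.loads(event["body"])
    cart_id = body["cart_id"]
    item = body["item"]
    # In a real deployment the item would be written to a managed
    # datastore (e.g. a key-value table); here we just echo the result.
    return {
        "statusCode": 200,
        "body": json.dumps({"cart_id": cart_id, "added": item}),
    }

# Simulated trigger: the API gateway would pass a request like this.
event = {"body": json.dumps({"cart_id": "c-42", "item": "book"})}
response = add_to_cart(event)
print(response["statusCode"], response["body"])
```

Because the function holds no state between invocations, the platform is free to spin instances up and down in response to events, which is what makes automatic scaling possible.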


Serverless Offerings

Serverless offerings come from both public and private cloud service providers. AWS offers the Lambda service for serverless compute. AWS Lambda is very popular, and workloads have shifted to it where it fits the purpose. Not every service can run serverless, but a workload that is focused on a single purpose and uses compute independently is a good candidate. Like AWS, Microsoft Azure offers serverless compute as well, and Google Cloud provides serverless offerings to develop and deploy APIs in the form of microservices. Serverless thus provides a new way of running an application as FaaS (Function as a Service).


Disadvantages of Serverless

1. Cold Starts: Cold starts can take a noticeable amount of time, anywhere from roughly 200 ms to 600 ms.

2. Parallel Requests: Parallel requests are not handled inside a single function invocation; parallelism within the code is an issue.

3. Coding Language: The platform must support the language the application is written in. AWS Lambda, for instance, launched with Node.js support only, with runtimes such as Python and Java added later. Serverless is best suited for background jobs, API calls, batch jobs, and the like.

4. Hidden Costs: The right job must be chosen for serverless, as some cloud service providers charge based on the number of requests and on API gateway usage, even though the CPU and RAM cost is lower because it is shared.

5. Code Maintenance: Maintenance overhead tends to be higher on a serverless architecture.

The transformation to serverless is worth doing, considering the large cost savings available from shared CPU and RAM costs. At the same time, the right application must be chosen to run serverless.
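The hidden-cost trade-off above can be made concrete with a back-of-the-envelope comparison of per-request FaaS pricing against an always-on server. All prices below are hypothetical placeholders for illustration, not any provider's actual rates.

```python
# Back-of-the-envelope cost comparison (all prices hypothetical --
# check your provider's current pricing before deciding).
requests_per_month = 2_000_000
avg_duration_s = 0.2          # 200 ms per invocation
memory_gb = 0.5

# FaaS: pay per request plus per GB-second of compute consumed.
price_per_million_requests = 0.20   # hypothetical
price_per_gb_second = 0.0000167     # hypothetical
faas_cost = (
    (requests_per_month / 1_000_000) * price_per_million_requests
    + requests_per_month * avg_duration_s * memory_gb * price_per_gb_second
)

# Always-on VM: flat monthly rate regardless of traffic.
vm_cost_per_month = 25.00           # hypothetical small instance

print(f"FaaS: ${faas_cost:.2f}/month vs VM: ${vm_cost_per_month:.2f}/month")
```

Under these assumed numbers the FaaS bill is a few dollars a month, far below the flat VM rate; but note that the FaaS cost scales linearly with requests, so a sufficiently busy service can cross over and become cheaper on dedicated servers.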
