The evolution of DevOps engineering has led to the practice of tracking useful metrics that improve cost efficiency for an enterprise's IT department. Good DevOps metrics share a few properties: they are measurable, relevant, incorruptible, actionable and traceable. Metrics that comply with these properties also help IT maintain high uptime. This applies to any set of metrics tracked in an IT environment, not only DevOps metrics. Let us list some of the important DevOps metrics.
- Deployment Frequency: It is important to understand how frequently deployments are made, as the frequency can reflect bug fixes, change velocity or quick feature changes driven by business requirements. High frequency is a double-edged sword: it indicates agility, but it can also point to problems with code stability, testing capability and so on.
- Deployment Volume: The deployment volume indicates the number of new features or bug fixes being shipped, since it is difficult to go through Bugzilla-like software to check what is going on in the application. Frequency and volume together are inputs for deciding how much more meticulous monitoring the applications and the infrastructure require.
- Lead Time to Deployment: This is the time from when work on a deployment starts to when it is deployed in production, and it helps in planning deployments. If one does not follow blue/green deployment, deployment planning could be affected. A high lead time can lead to more downtime for the applications and therefore for the business.
- Deployment Failures: Deployment failures indicate inadequate testing or unstable code. Frequent deployment failures call for reviews to ensure that the stability and reliability of the applications are restored.
- Ticket Volume: The number of tickets raised against an application indicates its stability from both the code perspective and the usability perspective. If ticket volumes are high, the underlying issues need to be addressed according to the type of tickets. Ticket volume is a good indicator of stability and tells a tale of staleness in the application.
- Volume of Production Bugs: While a lot of bugs are caught during QA, some occur in production. Production bugs are the ones that impact the customer and therefore result in loss of business. They should be tracked along with their urgency and importance.
- Mean Time between Failures: The mean time between failures of the application is another indicator of its stability. The ability to trace and track the reasons for failure is important for keeping the mean time between failures high.
- Mean Time to Recover: The mean time to recover from failure helps identify the resiliency of the application. Failures can happen in production or during deployment, but the ability to recover or repair after a failure ensures business continuity.
- Mean Time to Detect: Issue resolution in IT can be effectively divided into two parts: the time to detect an issue and the time to repair it. Detection takes us to the place where the issue happened, and the repair starts from that place. Usually, detection takes longer than repair. Tracking the time to detect helps in reducing the time to recover an application, thereby minimizing the impact on the business.
- Ratio of Actionable Alerts to Total Alerts: Most IT environments have monitoring systems that raise alerts when system resources deviate from normal behaviour. The goal of the monitoring system should be to raise only actionable alerts and suppress noise. It is important to track this ratio; a commonly cited industry figure puts the ratio of actionable to total alerts at about 2%, meaning that out of 100 alerts only 2 are actionable.
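Several of the metrics above reduce to simple arithmetic over deployment and incident records. The following is a minimal Python sketch of that arithmetic, not a definitive implementation. The `Deployment` and `Incident` record types, their field names and the sample data are all hypothetical. It also assumes that mean time to recover is measured from when a failure occurs to when service is restored (so it includes detection time), and that mean time between failures is the gap from one recovery to the next failure; other definitions are in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:  # hypothetical record shape
    started: datetime    # when work on the deployment began
    deployed: datetime   # when it reached production
    succeeded: bool


@dataclass
class Incident:  # hypothetical record shape
    occurred: datetime   # when the failure began
    detected: datetime   # when it was noticed
    recovered: datetime  # when service was restored


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def deployment_metrics(deployments, period_days):
    """Frequency, lead time and failure rate over a reporting period."""
    failures = sum(1 for d in deployments if not d.succeeded)
    return {
        "deployments_per_week": len(deployments) * 7 / period_days,
        "mean_lead_time_hours": mean(hours(d.deployed - d.started) for d in deployments),
        "failure_rate": failures / len(deployments),
    }


def incident_metrics(incidents):
    """MTTD, MTTR and MTBF in hours (definitions are assumptions, see above)."""
    ordered = sorted(incidents, key=lambda i: i.occurred)
    # Gap from each recovery to the start of the next failure.
    gaps = [hours(nxt.occurred - prev.recovered) for prev, nxt in zip(ordered, ordered[1:])]
    return {
        "mttd_hours": mean(hours(i.detected - i.occurred) for i in ordered),
        "mttr_hours": mean(hours(i.recovered - i.occurred) for i in ordered),
        "mtbf_hours": mean(gaps) if gaps else None,  # needs at least two incidents
    }


def actionable_alert_ratio(actionable: int, total: int) -> float:
    return actionable / total if total else 0.0


# Illustrative sample data for a 14-day reporting period.
deployments = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 9), True),
    Deployment(datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 21), False),
    Deployment(datetime(2024, 1, 8, 9), datetime(2024, 1, 9, 9), True),
    Deployment(datetime(2024, 1, 12, 9), datetime(2024, 1, 12, 21), True),
]
incidents = [
    Incident(datetime(2024, 1, 3, 10), datetime(2024, 1, 3, 11), datetime(2024, 1, 3, 12)),
    Incident(datetime(2024, 1, 7, 12), datetime(2024, 1, 7, 15), datetime(2024, 1, 7, 16)),
]

print(deployment_metrics(deployments, period_days=14))
print(incident_metrics(incidents))
print(actionable_alert_ratio(2, 100))
```

In practice the records would come from a deployment pipeline, a ticketing system and a monitoring tool rather than from hard-coded lists; the point is only that each metric becomes measurable and traceable once the underlying timestamps are recorded consistently.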
With DevOps culture picking up in most organizations, it has become important to track DevOps metrics to make the enterprise's IT operations more efficient. DevOps dashboards should pave the way for better productivity.