Thursday, February 18, 2016

[System Engineering] Availability Short Tutorial

For life-critical systems, we usually want to measure the system's quality of service. One of the most commonly used metrics is Availability.

What is Availability?

Availability is a probabilistic metric that measures the percentage of time a system is available over a given period. Conversely, Unavailability is the percentage of time the system is not available over that period.

How do we calculate Availability?

To calculate Availability, we need to know how much of the time the system is actually providing its service.

First, let’s define two terms: MTBF and MTTR.

MTBF: Mean Time Between Failures
This metric measures the average period of time between two consecutive system failures.

MTTR: Mean Time To Repair
This metric measures the average time needed to repair the system after a failure and bring it back into service.

MTBF and MTTR can be expressed in minutes, hours, days or any other unit of time, as long as both use the same unit.

Availability can then be calculated as:

                              Availability = (MTBF) / (MTBF+MTTR)

That is, Availability is the fraction of the total time (time in service plus time under repair) during which the system provides its service.
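The formula translates directly into code. The sketch below is illustrative only: the function name `availability` and the 999-hour/1-hour figures are hypothetical, chosen to make the arithmetic easy to follow, and both inputs are assumed to use the same time unit.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the system is in service: MTBF / (MTBF + MTTR).

    Hypothetical helper; mtbf and mttr must use the same time unit
    (minutes, hours, days, ...).
    """
    if mtbf < 0 or mttr < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    if mtbf + mttr == 0:
        raise ValueError("MTBF + MTTR must be positive")
    return mtbf / (mtbf + mttr)


# Hypothetical example: on average the system runs 999 hours between
# failures and takes 1 hour to repair.
a = availability(999.0, 1.0)
print(f"Availability = {a:.3%}")  # Availability = 99.900%
```

Out of every 1,000 hours, the system is in service for 999 of them, which gives 99.9%.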

Unavailability follows directly from the definition of Availability:
                              Unavailability = 1 - Availability = (MTTR) / (MTBF+MTTR)
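Because 1 - MTBF / (MTBF+MTTR) = MTTR / (MTBF+MTTR), Unavailability can be computed either way. The sketch below checks that numerically and converts the result into expected downtime. The function names, the 500-hour/2-hour values, and the 365-day (8,760-hour) year are all assumptions made for illustration.

```python
import math


def availability(mtbf: float, mttr: float) -> float:
    """Fraction of time in service: MTBF / (MTBF + MTTR) (hypothetical helper)."""
    return mtbf / (mtbf + mttr)


def unavailability(mtbf: float, mttr: float) -> float:
    """Fraction of time out of service: MTTR / (MTBF + MTTR) (hypothetical helper)."""
    return mttr / (mtbf + mttr)


HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year of 8,760 hours

mtbf, mttr = 500.0, 2.0  # hypothetical values, both in hours
u = unavailability(mtbf, mttr)

# The direct formula and 1 - Availability must agree (up to rounding).
assert math.isclose(u, 1 - availability(mtbf, mttr))

print(f"Unavailability = {u:.4%}")
print(f"Expected downtime per year = {u * HOURS_PER_YEAR:.1f} hours")
```

Expressing Unavailability as hours of downtime per year often makes small percentages easier to reason about.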

How do we increase Availability?

As you can see from the definition of Availability, we can either manipulate MTTR or MTBF to increase Availability.

Intuitively, if we reduce the time to repair, MTTR, all the way to 0, Availability becomes 100%:

                                   MTBF / (MTBF+0) = 100%

That's pretty awesome! However, it is not possible in the real world, because every repair takes some time.

On the other hand, if MTTR stays the same and we increase MTBF, MTTR makes up a smaller and smaller share of MTBF+MTTR, so Availability moves closer to 100%.

To increase Availability, you can either
     (1) Increase MTBF (make the system more reliable by testing or adding redundancy)
   
     (2) Decrease MTTR (shorten the time to repair)
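To compare the two levers side by side, the sketch below evaluates the Availability formula for a few hypothetical (MTBF, MTTR) pairs in hours. The scenario names and numbers are made up for illustration; they are not measurements of any real system.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR); both inputs in the same time unit."""
    return mtbf / (mtbf + mttr)


# Hypothetical scenarios: (MTBF, MTTR) in hours.
scenarios = {
    "baseline": (100.0, 10.0),
    "double MTBF": (200.0, 10.0),  # lever (1): a more reliable system
    "halve MTTR": (100.0, 5.0),    # lever (2): faster repair
    "MTTR = 0": (100.0, 0.0),      # the ideal, unreachable case
}

for name, (mtbf, mttr) in scenarios.items():
    a = availability(mtbf, mttr)
    print(f"{name:<12} MTBF={mtbf:>6.1f} h  MTTR={mttr:>5.1f} h  A={a:.4f}")
```

With these numbers the baseline gives about 0.9091, while doubling MTBF or halving MTTR both raise Availability to about 0.9524. They tie because Availability = 1 / (1 + MTTR/MTBF) depends only on the ratio MTTR/MTBF, so either lever helps as long as it shrinks that ratio.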

Summary:


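  • Availability is a probabilistic metric that measures the percentage of time a system is available over a given period.
                              Availability = (MTBF) / (MTBF+MTTR)

  • MTBF (Mean Time Between Failures) is the average time between consecutive failures; MTTR (Mean Time To Repair) is the average time needed to bring the system back into service.
  • Increase Availability by increasing MTBF and decreasing MTTR.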
