For any business-critical operation, whether the concern is a single point of failure or load sharing, the concept of high availability has been around for quite some time. It is also referred to as HA, redundancy, and so on, but the idea remains the same.
The uptime of the network and its components is becoming more and more crucial as technology grows both vertically and horizontally. For the workforce to stay productive, almost every business depends on its network and computing devices being up and available to perform their designated tasks. In real-world scenarios, even the sweetest network can have trouble in paradise at any time. There could be any number of reasons for a piece of hardware to malfunction or stop functioning. The cause could be logical or physical in nature. Some examples are a power outage, NIC failure, device failure, high memory or CPU utilization, a routing engine failure, a process (PID) crash, etc.
One approach could be to keep a spare device ready and swap it in as soon as an issue occurs, which is not a situation you would want to be in. It would also require an engineer to be available in times of crisis. Who would want to receive a call at 2 AM and rush to the office because some users cannot work while the gateway device is down? Another reason for having HA is to add a redundant device for load sharing, which is otherwise known as an active-active deployment.
The above diagram represents a single point of failure. If the node in the center fails for any reason, the nodes currently connected to it will be isolated. This is called the star topology. We have a detailed section on the different types of Network Topologies and the benefits and disadvantages of each.
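To make the single point of failure concrete, here is a minimal sketch that models a star topology as an adjacency map and checks which nodes a leaf can still reach once the center fails. The node names (`hub`, `pc1` to `pc4`) and the `reachable` helper are hypothetical and exist only for illustration.

```python
from collections import deque


def reachable(adjacency, start, failed=frozenset()):
    """Return the set of nodes reachable from `start`, skipping failed nodes."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical star topology: one central hub and four leaf nodes.
leaves = ["pc1", "pc2", "pc3", "pc4"]
star = {"hub": set(leaves)}
for leaf in leaves:
    star[leaf] = {"hub"}

# With the hub healthy, pc1 can reach every other node.
print(sorted(reachable(star, "pc1")))
# With the hub failed, pc1 can only reach itself.
print(sorted(reachable(star, "pc1", failed={"hub"})))
```

Adding a second hub connected to every leaf would keep the leaves reachable when one hub fails, which is the basic idea behind redundancy.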
In today’s world, two factors matter from an IT operational standpoint: THE downtime and THE availability. Any network or system administrator would prefer to keep downtime at 0% and availability at 100%. However, to set expectations correctly, let’s accept the fact that doing so consistently is next to impossible. Various factors affect the availability of a computing system, which we will discuss further.
In the field of IT, you will come across this term a lot. It is one of the fundamental building blocks of a successful IT operation. We are referring to the A in CIA, i.e. availability. Any system, no matter how efficient, how fast, how modular, how interactive or God knows what, is of no use unless it is available.
What is Availability
Availability is a fundamental topic in network and system functioning. It can be defined as the duration of time a service or piece of equipment was up (its uptime), usually expressed as a percentage of the total time it was monitored. From a business perspective, it is critical for the equipment and/or service to be available. Availability is one of the key metrics by which quality is measured in the computing world. The priority of a network or system admin should be to keep the uptime of equipment and services as high as possible. Availability can vary for any number of reasons. As a good administrator, you should constantly keep your focus on achieving good availability.
Most businesses want to be operational 24 hours a day, 365 days a year. It is technically not possible to achieve 100% availability all the time. Known or unforeseen issues can occur and impact the services. However, anything above 90% is considered world-class. In most robust and well-maintained networks, you would see above 98% uptime. A 100% availability for all equipment and services in an organization is a myth like having a happy woman in your life. Eww, where did that come from, I hope my wife doesn’t find this out. 😉
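To put these percentages in perspective, the rough sketch below converts an availability target into the downtime it allows over a year. The function name `allowed_downtime_hours` and the 365-day year are assumptions made for illustration.

```python
def allowed_downtime_hours(availability_percent, period_hours=365 * 24):
    """Hours of downtime permitted in a period for a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (100 - availability_percent) / 100


# Compare a few targets over an assumed 365-day year (8,760 hours).
for target in (90, 98, 99, 99.9):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

At 90% the yearly budget is 876 hours (about 36.5 days), while at 98% it drops to roughly 175 hours, so a few percentage points make a large difference in practice.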
How is Availability Calculated
Let’s say a network device, a router, is being monitored for 24 hours. Due to a NIC failure, the internet at the site goes down. Now you, being you, an instaelearner, troubleshoot the issue and trace it to a particular NIC. You move the cable to a newly configured spare port and voila, the internet is back up in the office. The internet was down for a total of 30 minutes. The uptime of the network can be calculated at the end of the day as shown below.
Total number of seconds the internet was monitored: 86,400 sec (24 hours)
Total number of seconds the internet was down: 1,800 sec (30 mins)
Now the total time the internet was up would be 86,400 – 1,800 = 84,600 sec
As a percentage, the downtime can be calculated by dividing the downtime by the total monitored time and multiplying by 100, i.e.
1,800 / 86,400 = 0.0208, and 0.0208 × 100 = 2.08% downtime
Subtracting the downtime percentage from 100 gives the uptime: 100 – 2.08 = 97.92
In other words, we can say that internet uptime or availability was 97.92%.
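The same arithmetic can be sketched in a few lines of Python. The function name `availability_percent` is hypothetical; the figures are the ones from the router example above.

```python
def availability_percent(monitored_seconds, downtime_seconds):
    """Availability as a percentage of the monitored time."""
    if monitored_seconds <= 0:
        raise ValueError("monitored_seconds must be positive")
    if not 0 <= downtime_seconds <= monitored_seconds:
        raise ValueError("downtime_seconds must be between 0 and monitored_seconds")
    uptime_seconds = monitored_seconds - downtime_seconds
    return uptime_seconds / monitored_seconds * 100


monitored = 24 * 60 * 60  # 86,400 seconds in 24 hours
downtime = 30 * 60        # 1,800 seconds in 30 minutes

print(round(availability_percent(monitored, downtime), 2))  # 97.92
```

Computing uptime directly (84,600 / 86,400) gives the same result as subtracting the downtime percentage from 100.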