Category: 12. Software Reliability


  • Software Fault Tolerance

    Software fault tolerance is the ability of software to detect and recover from a fault that is happening or has already happened, in either the software or the hardware of the system in which the software is running, so that the system continues to provide service in accordance with its specification.

    Software fault tolerance is a necessary ingredient for constructing the next generation of highly available and reliable computing systems, from embedded systems to data warehouse systems.

    To understand software fault tolerance adequately, it is important to understand the nature of the problem that software fault tolerance is supposed to solve.

    Software faults are all design faults. Software manufacturing, the reproduction of software, is considered to be perfect. Having design faults as the sole source of the problem makes software very different from almost any other system in which fault tolerance is the desired property.

    Software Fault Tolerance Techniques


    1. Recovery Block

    The recovery block method is a simple technique developed by Randell. The recovery block operates with an adjudicator, which checks the results of different implementations of the same algorithm. In a system with recovery blocks, the system is broken down into fault-recoverable blocks.

    The entire system is constructed from these fault-tolerant blocks. Each block contains at least a primary alternate, a secondary alternate, and exception-case code, along with an adjudicator. The adjudicator is the component that determines the correctness of the various alternates to try.

    The adjudicator should be kept fairly simple, to maintain execution speed and to aid in its own correctness. Upon entering a unit, the primary alternate is executed first. (There may be N alternates in a unit for the adjudicator to try.) If the adjudicator determines that the primary alternate failed, it rolls back the state of the system and tries the secondary alternate.

    If the adjudicator does not accept the result of any of the alternates, it invokes the exception handler, which signals that the software could not perform the requested operation.
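
    To make the control flow concrete, here is a minimal Python sketch of a recovery block. It is illustrative only; the alternate routines, the acceptance test, and the dictionary-based checkpointing are hypothetical stand-ins for whatever mechanisms a real system would use.

      import copy

      def recovery_block(state, alternates, acceptance_test, exception_handler):
          """Execute alternates in order; roll back state and try the next one
          whenever the adjudicator (acceptance_test) rejects a result."""
          checkpoint = copy.deepcopy(state)            # checkpoint on entry to the unit
          for alternate in alternates:                 # primary first, then secondaries
              try:
                  result = alternate(state)
                  if acceptance_test(result):          # adjudicator accepts -> done
                      return result
              except Exception:
                  pass                                 # a crash counts as a rejection
              state.clear()                            # roll back to the checkpoint
              state.update(copy.deepcopy(checkpoint))
          return exception_handler()                   # no alternate was acceptable

      # Illustrative use: two sort routines guarded by a simple sortedness test
      primary   = lambda s: sorted(s["data"])
      secondary = lambda s: sorted(s["data"], reverse=False)
      is_sorted = lambda r: all(a <= b for a, b in zip(r, r[1:]))
      print(recovery_block({"data": [3, 1, 2]}, [primary, secondary],
                           is_sorted, lambda: "operation failed"))   # -> [1, 2, 3]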

    The recovery block technique increases the pressure on the specification: it must be precise enough to allow the creation of multiple alternates that are functionally equivalent. This problem is discussed further in the context of the N-version software method.

    2. N-Version Software

    The N-version software method attempts to parallel the traditional hardware fault tolerance concept of N-way redundant hardware. In an N-version software system, each module is implemented in up to N different versions. Each version accomplishes the same function, but hopefully in a different way. Each version then submits its answer to a voter or decider, which determines the correct answer and returns it as the result of the module.

    Such a system can hopefully overcome the design faults present in most software by relying on the concept of design diversity. An essential distinction of N-version software is that the system can include multiple types of hardware running the various versions of the software.

    N-version software can succeed in tolerating faults only if the required design diversity is achieved. The dependence of N-version software (and of recovery blocks) on appropriate specifications cannot be stressed enough.
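
    The simplest decider is a majority voter. The Python sketch below is illustrative only: the three stand-in versions here are trivially written variants, whereas in a real N-version system the versions would be independently developed and typically run concurrently, possibly on separate hardware.

      from collections import Counter

      def n_version_execute(versions, x):
          """Run every version on the same input; a majority voter decides."""
          answers = [version(x) for version in versions]   # ideally run concurrently
          winner, votes = Counter(answers).most_common(1)[0]
          if votes > len(versions) // 2:                   # require a strict majority
              return winner
          raise RuntimeError("voter: no majority among the versions")

      # Three "independently written" versions of the same function (illustrative)
      v1 = lambda x: x * x
      v2 = lambda x: x ** 2
      v3 = lambda x: sum(x for _ in range(x))              # works only for x >= 0
      print(n_version_execute([v1, v2, v3], 7))            # -> 49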

    3. N-Version Software and Recovery Blocks

    The differences between the recovery block technique and the N-version technique are not numerous, but they are essential. In traditional recovery blocks, each alternate is executed serially until an acceptable solution is found, as determined by the adjudicator. The recovery block method has been extended to include concurrent execution of the various alternates.

    The N-version technique has always been designed to run concurrently on N-way hardware. In a serial retry system, the time cost of trying multiple alternates may be too expensive, especially for a real-time system. Conversely, concurrent systems incur the expense of N-way hardware and a communications network to connect them.

    The recovery block technique requires that a specific adjudicator be created for each module; in the N-version method, a single decider may be used. The recovery block technique, assuming that the programmer can create a sufficiently simple adjudicator, yields a system that is difficult to drive into an incorrect state.


  • Reliability Metrics

    Reliability metrics are used to quantitatively express the reliability of the software product. The choice of which metric to use depends upon the type of system to which it applies and the requirements of the application domain.

    Some reliability metrics which can be used to quantify the reliability of the software product are as follows:


    1. Mean Time to Failure (MTTF)

    MTTF is described as the mean time interval between two successive failures. An MTTF of 200 means that one failure can be expected every 200 time units. The time units are entirely system-dependent, and MTTF can even be stated in terms of the number of transactions. MTTF is relevant for systems with long transactions.

    For example, it is suitable for computer-aided design systems, where a designer will work on a design for several hours, as well as for word-processor systems.

    To measure MTTF, we can record the failure data for n failures. Let the failures occur at the time instants t1, t2, ..., tn.

    MTTF can be calculated as

                      MTTF = Σ (t(i+1) - t(i)) / (n - 1),   for i = 1, 2, ..., n - 1
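
    As a quick illustration, the formula can be computed directly from recorded failure instants; the Python below uses invented timestamps.

      def mttf(failure_times):
          """Mean of the n-1 inter-failure intervals, for t1 < t2 < ... < tn."""
          gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
          return sum(gaps) / len(gaps)

      print(mttf([120, 320, 540, 720]))   # (200 + 220 + 180) / 3 -> 200.0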

    2. Mean Time to Repair (MTTR)

    Once a failure occurs, some time is required to fix the error. MTTR measures the average time it takes to track down the errors causing the failure and then to fix them.

    3. Mean Time Between Failures (MTBF)

    We can combine the MTTF and MTTR metrics to get the MTBF metric.

                      MTBF = MTTF + MTTR

    Thus, an MTBF of 300 means that once a failure occurs, the next failure is expected to appear only after 300 hours. Here, the time measurements are real time, not execution time as in MTTF.

    4. Rate of occurrence of failure (ROCOF)

    ROCOF is the number of failures appearing in a unit time interval, i.e., the number of unexpected events over a specific period of operation. It is the frequency with which unexpected failures are likely to appear. A ROCOF of 0.02 means that two failures are likely to occur in every 100 operational time units. It is also called the failure intensity metric.

    5. Probability of Failure on Demand (POFOD)

    POFOD is described as the probability that the system will fail when a service is requested. It is the number of system failures for a given number of system inputs.

    A POFOD of 0.1 means that one out of ten service requests may fail. POFOD is an essential measure for safety-critical systems. POFOD is relevant for protection systems, where services are demanded only occasionally.

    6. Availability (AVAIL)

    Availability is the probability that the system is available for use at a given time. It takes into account the repair time and the restart time of the system. An availability of 0.995 means that in every 1000 time units, the system is likely to be available for 995 of them. Equivalently, it is the percentage of time that a system is available for use, taking into account planned and unplanned downtime. If a system is down an average of four hours out of every 100 hours of operation, its AVAIL is 96%.

    Software Metrics for Reliability

    These metrics are used to improve the reliability of the system by identifying the areas of the requirements, design and code, and testing that need improvement.

    The different types of software reliability metrics are:


    Requirements Reliability Metrics

    Requirements denote what features the software must include. They specify the functionality that must be contained in the software. The requirements must be written such that there is no misconception between the developer and the client. The requirements must have a valid structure to avoid the loss of valuable data.

    The requirements should be thorough and detailed, so that the design stage is straightforward. The requirements should not include inadequate information. Requirements reliability metrics evaluate the above-said quality factors of the requirements document.

    Design and Code Reliability Metrics

    The quality attributes that exist in design and coding are complexity, size, and modularity. Complex modules are tough to understand, and there is a high probability of bugs occurring in them. Reliability will decrease if modules have a combination of high complexity and large size, or of high complexity and small size. These metrics also apply to object-oriented code, although additional metrics are required there to evaluate quality.

    Testing Reliability Metrics

    These metrics use two methods to calculate reliability.

    First, they ensure that the system performs the tasks specified in the requirements; because of this, bugs due to a lack of functionality are reduced.

    The second method is analyzing the code, finding the bugs, and fixing them. To ensure that the system includes the specified functionality, test plans are written that cover multiple test cases. Each test case is based on one system state and tests some tasks based on an associated set of requirements. The goal of an effective verification program is to ensure that every element is tested, the implication being that, if the system passes the tests, the requirements' functionality is contained in the delivered system.

  • Software Reliability Measurement Techniques

    Reliability metrics are used to quantitatively express the reliability of the software product. The choice of which parameter to use depends upon the type of system to which it applies and the requirements of the application domain.

    Measuring software reliability is a difficult problem because we do not have a good understanding of the nature of software. It is difficult to find a suitable way to measure software reliability, or even most of the aspects connected to it. Even software reliability estimates have no uniform definition. If we cannot measure reliability directly, we can measure something that reflects the features related to reliability.

    The current methods of software reliability measurement can be divided into four categories:


    1. Product Metrics

    Product metrics are those associated with the artifacts produced, i.e., requirement specification documents, system design documents, etc. These metrics help in assessing whether the product is good enough, through records of attributes such as usability, reliability, maintainability, and portability. These measurements are taken from the actual body of the source code.

    1. Software size is thought to be reflective of complexity, development effort, and reliability. Lines of Code (LOC), or LOC in thousands (KLOC), is an intuitive initial approach to measuring software size. The basis of LOC is that program length can be used as a predictor of program characteristics such as effort and ease of maintenance; note that a LOC count does depend on the programming language.
    2. The function point metric is a technique to measure the functionality of proposed software development based on a count of inputs, outputs, master files, inquiries, and interfaces. It is a measure of the functional complexity of the program and is independent of the programming language.
    3. Test coverage metrics estimate fault content and reliability by performing tests on software products, assuming that software reliability is a function of the portion of software that has been successfully verified or tested.
    4. Complexity is directly linked to software reliability, so representing complexity is essential. Complexity-oriented metrics determine the complexity of a program's control structure by reducing the code to a graphical representation. The representative metric is McCabe's complexity metric.
    5. Quality metrics measure quality at various steps of software product development. A vital quality metric is Defect Removal Efficiency (DRE). DRE provides a measure of quality resulting from the various quality assurance and control activities applied throughout the development process; a sketch of two of these formulas follows this list.
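
    As a sketch of two formulas behind this list: McCabe's cyclomatic complexity of a control-flow graph with E edges, N nodes, and P connected components is V(G) = E - N + 2P, and DRE is the fraction of total defects removed before delivery. The Python below is illustrative, with invented counts.

      def cyclomatic_complexity(edges, nodes, components=1):
          """McCabe's metric for a control-flow graph: V(G) = E - N + 2P."""
          return edges - nodes + 2 * components

      def defect_removal_efficiency(defects_before_delivery, defects_after_delivery):
          """DRE = E / (E + D): the share of all defects removed before delivery."""
          return defects_before_delivery / (defects_before_delivery +
                                            defects_after_delivery)

      print(cyclomatic_complexity(edges=9, nodes=8))   # -> 3 independent paths
      print(defect_removal_efficiency(90, 10))         # -> 0.9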

    2. Project Management Metrics

    Project metrics describe project characteristics and execution. If the project is managed properly, this helps us achieve a better product. A relationship exists between the development process and the ability to complete projects on time and within the desired quality objectives. Costs increase when developers use inadequate processes. Higher reliability can be achieved by using a better development process, risk management process, and configuration management process.

    These metrics are:

    • Number of software developers
    • Staffing pattern over the life-cycle of the software
    • Cost and schedule
    • Productivity

    3. Process Metrics

    Process metrics quantify useful attributes of the software development process and its environment. They tell whether the process is functioning optimally, as they report on characteristics such as cycle time and rework time. The goal of process metrics is to do the right job the first time through the process. The quality of the product is a direct function of the process, so process metrics can be used to estimate, monitor, and improve the reliability and quality of software. Process metrics describe the effectiveness and quality of the processes that produce the software product.

    Examples are:

    • The effort required in the process
    • Time to produce the product
    • Effectiveness of defect removal during development
    • Number of defects found during testing
    • Maturity of the process

    4. Fault and Failure Metrics

    A fault is a defect in a program that appears when the programmer makes an error, and it causes a failure when executed under particular conditions. These metrics are used to assess the failure-free execution of the software.

    To achieve this objective, the faults found during testing, and the failures or other problems reported by users after delivery, are collected, summarized, and analyzed. Failure metrics are based upon customer information regarding failures found after release of the software. The failure data collected is then used to calculate failure density, Mean Time Between Failures (MTBF), or other parameters to measure or predict software reliability.

  • Software Failure Mechanisms

    Software failures can be classified as:

    Transient failure: These failures only occur with specific inputs.

    Permanent failure: This failure appears on all inputs.

    Recoverable failure: System can recover without operator help.

    Unrecoverable failure: System can recover with operator help only.

    Non-corruption failure: Failure does not corrupt system state or data.

    Corrupting failure: It damages the system state or data.
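
    These three pairs are best read as independent dimensions of a single failure, not as six mutually exclusive categories: every failure has a persistence, a recoverability, and a corruption attribute. The following Python sketch of such a classification record is purely illustrative; all names are hypothetical.

      from dataclasses import dataclass
      from enum import Enum

      class Persistence(Enum):
          TRANSIENT = "occurs only with specific inputs"
          PERMANENT = "occurs on all inputs"

      class Recoverability(Enum):
          RECOVERABLE = "system recovers without operator help"
          UNRECOVERABLE = "system recovers only with operator help"

      class Corruption(Enum):
          NON_CORRUPTING = "does not corrupt system state or data"
          CORRUPTING = "damages system state or data"

      @dataclass
      class Failure:
          """One observed failure, classified along the three dimensions above."""
          persistence: Persistence
          recoverability: Recoverability
          corruption: Corruption

      # A failure seen on one input only, needing an operator, but leaving data intact:
      f = Failure(Persistence.TRANSIENT, Recoverability.UNRECOVERABLE,
                  Corruption.NON_CORRUPTING)
      print(f.persistence.value)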

    Software failures may be due to bugs; ambiguities, oversights, or misinterpretation of the specification that the software is supposed to satisfy; carelessness or incompetence in writing code; inadequate testing; incorrect or unexpected usage of the software; or other unforeseen problems.

    Hardware vs. Software Reliability

    • Hardware faults are mostly physical faults. Software faults are design faults, which are tough to visualize, classify, detect, and correct.

    • Hardware components generally fail due to wear and tear. Software components fail due to bugs.

    • In hardware, design faults may also exist, but physical faults generally dominate. In software, we can hardly find a strict counterpart to the hardware manufacturing process, unless the simple action of uploading software modules into place counts. Therefore, the quality of software does not change once it is uploaded into storage and starts running.

    • Hardware exhibits the failure characteristics of the bathtub curve shown in the following figure, where periods A, B, and C stand for the burn-in phase, the useful-life phase, and the end-of-life phase, respectively.

    [Figure: hardware bathtub curve]

    • Software reliability does not show the same characteristics. A possible curve, obtained by projecting software reliability on the same axes, is shown in the following figure.

    [Figure: software failure rate curve over the life cycle]

    There are two significant differences between the hardware and software curves:

    One difference is that, in the last phase, software does not have an increasing failure rate as hardware does. In this phase, the software is approaching obsolescence, and there is no motivation for any upgrades or changes, so the failure rate does not change.

    The second difference is that, in the useful-life phase, software will experience a sharp increase in failure rate each time an upgrade is made. The failure rate then levels off gradually, partly because the defects introduced by the upgrade are found and fixed.

    The upgrades in the figure above signify feature upgrades, not reliability upgrades. For feature upgrades, the complexity of the software is likely to increase, since its functionality is enhanced. Even bug fixes may cause more software failures, if a fix introduces other defects into the software. Reliability upgrades, by contrast, are likely to produce a drop in the software failure rate, since the objective of such an upgrade is to enhance software reliability, for example by redesigning or reimplementing some modules using better engineering approaches, such as the clean-room method.

    A partial list of the distinct characteristics of software, compared to hardware, follows:


    Failure cause: Software defects are primarily design defects.

    Wear-out: Software does not have an energy-related wear-out phase. Bugs can arise without warning.

    Repairable system: Periodic restarts can help fix software problems.

    Time dependency and life cycle: Software reliability is not a function of operational time.

    Environmental factors: These do not affect software reliability, except insofar as they may affect program inputs.

    Reliability prediction: Software reliability cannot be predicted from any physical basis since it depends entirely on human factors in design.

    Redundancy: Redundancy cannot improve software reliability if identical software components are used.

    Interfaces: Software interfaces are purely conceptual, rather than visual.

    Failure rate motivators: The failure rate is generally not predictable from analyses of separate statements.

    Built with standard components: Well-understood and extensively tested standard components help improve maintainability and reliability, but this trend has not been observed in the software industry. Code reuse has been around for some time, though only to a minimal extent. There are no standard components for software, except for some standardized logic structures.

  • Software Reliability in Software Engineering

    Introduction

    Software reliability means operational reliability. It is described as the ability of a system or component to perform its required functions under stated conditions for a specified period of time.

    Software reliability is also defined as the probability that a software system fulfills its assigned task in a given environment for a predefined number of input cases, assuming that the hardware and the input are free of error.

    Software reliability is an important attribute of software quality, together with functionality, usability, performance, serviceability, capability, installability, maintainability, and documentation. Software reliability is hard to achieve because the complexity of software tends to be high. While any system with a high degree of complexity, including software, will be hard to bring to a certain level of reliability, system developers tend to push complexity into the software layer, given the rapid growth of system size and the ease of doing so by upgrading the software.

    For example, large next-generation aircraft will have over 1 million source lines of software on board; next-generation air traffic control systems will contain between one and two million lines; the upcoming International Space Station will have over two million lines on board and over 10 million lines of ground support software; and several significant life-critical defense systems will have over 5 million source lines of software. While the complexity of software is inversely related to software reliability, it is directly related to other vital factors in software quality, especially functionality and capability.

    Techniques of Software Reliability

    Two distinct models are used to calculate software reliability:

    1. Prediction Modeling
    2. Estimation Modeling

    Prediction Modeling

    As the name suggests, the prediction model is constructed using assumptions about the specifications needed to create the software program. Among these assumptions are information and materials from historical occurrences and the software's operational features. Because predicting during or after development is thought to be extremely unreliable, it is carried out during the design phase or before the development process begins. Forecasts are based not on the current situation but on the idea that the application will be used at some point in the future.

    Estimation Modeling

    The estimation model is constructed using current data from the development or testing processes and is based on several software features. It is applied later in the software development life cycle, once the required software components have been built. The software's reliability is estimated for the current or immediately following time periods. Software reliability analysts have also developed various models, such as the Basic Execution Time Model, the Shooman Model, the Bug Seeding Model, the Logarithmic Poisson Time Model, the Littlewood-Verrall Model, the Goel-Okumoto Model, the Musa-Okumoto Model, and the Jelinski-Moranda Model.
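
    As one concrete example, the Basic Execution Time Model mentioned above is commonly stated as a failure intensity that decays linearly in the number of failures experienced so far. The Python sketch below assumes that standard form, lambda(mu) = lambda0 * (1 - mu / nu0), with invented parameter values; it is an illustration, not a calibrated model.

      import math

      def failure_intensity(mu, lam0, nu0):
          """Basic execution time model: lambda(mu) = lam0 * (1 - mu / nu0),
          where mu is the expected number of failures already experienced."""
          return lam0 * (1.0 - mu / nu0)

      def expected_failures(tau, lam0, nu0):
          """mu(tau) = nu0 * (1 - exp(-lam0 * tau / nu0)) for execution time tau."""
          return nu0 * (1.0 - math.exp(-lam0 * tau / nu0))

      # Invented parameters: 100 total expected failures, 5 failures/CPU-hour at start
      mu = expected_failures(tau=10.0, lam0=5.0, nu0=100.0)
      print(round(mu, 1), round(failure_intensity(mu, lam0=5.0, nu0=100.0), 2))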

    Metrics for Software Reliability

    The reliability of software systems and applications is measured and derived using software reliability metrics, which can be expressed numerically or in other ways. The type of metric that application developers choose can be influenced by the system's behavior, the software's business objective, the anticipated recovery time, the likelihood of failure, the kinds of users who use the program, and so on. The following are the kinds of assessments that software development professionals frequently use in practice to gauge software reliability.

    Based on Requirements

    The client's actual needs can be found in the software development specification documentation. It usually outlines the requirements and expectations for developing the software, including its functional features, non-functional aspects, and dependencies on other related systems. It is used to identify the required functionality of the software.

    It is also used to address non-functional aspects such as the software's look and feel, compatibility, performance, validation, integration capabilities, the load passed through the program in real time, and so on. The outcome of the process should demonstrate that there are no differences between the client's needs and the software development team's understanding of them.

    Based on Design and Code

    This action plan assesses software reliability during the design and coding phases. The usability features of the software components and the software size are the domains where the estimation is applied. Keeping the system in smaller units is crucial in order to significantly lower the likelihood of faults. Once fault occurrences are contained, the reliability scale will operate as needed for the analysis. Multiple components with easily comprehensible software units are preferable to a single large, complex system.

    Testing Reliability Metrics

    During the testing process, the reliability metrics are divided into two parts. One is validation, ensuring that the functional behavior of the built application matches the requirements specified in the documentation. The other part evaluates the program's internal functions and performance. The first is referred to as the black-box testing method, and the latter is known as white-box testing, which is usually performed by the developer.

    The testing procedure is conducted against the previously prepared documentation of the client's requirement specifications. Any discrepancy found at this point is reported and fixed as part of the bug-fixing process and monitored using a defect life cycle. This accomplishes an efficient method of validating the entire system, ensuring that every aspect of the developed system is verified.

    The following are the approaches employed, based on the required type of metric analysis, during the above-mentioned software development phases:

    • Mean Time to Failure – (Total time) / (Number of units tested)
    • Mean Time to Repair – (Total time for maintenance) / (total repairs)
    • Mean Time Between Failure – MTTF + MTTR
    • Rate of Occurrence of Failure – 1 / (MTTF)
    • Probability of Failure – (Number of Failures) / (Total cases considered)
    • Availability – MTTF / MTBF

    Example to Implement Software Reliability:

    Let us consider the Mean Time to Failure computation, which requires both the total time and the number of units tested.

    For example, if the values are as follows, the MTTF is calculated as:

    MTTF = (total time) / (number of units tested)

    = 100 / 40

    = 2.5
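
    The formula list above translates almost directly into code. The following Python sketch reuses the worked MTTF numbers; all other inputs are invented for illustration.

      def reliability_metrics(total_time, units_tested, repair_time, repairs,
                              failures, total_cases):
          mttf  = total_time / units_tested    # Mean Time to Failure
          mttr  = repair_time / repairs        # Mean Time to Repair
          mtbf  = mttf + mttr                  # Mean Time Between Failures
          rocof = 1.0 / mttf                   # Rate of Occurrence of Failure
          pofod = failures / total_cases       # Probability of Failure on Demand
          avail = mttf / mtbf                  # Availability
          return {"MTTF": mttf, "MTTR": mttr, "MTBF": mtbf,
                  "ROCOF": rocof, "POFOD": pofod, "AVAIL": avail}

      # MTTF = 100 / 40 = 2.5, matching the worked example above; other inputs invented
      print(reliability_metrics(total_time=100, units_tested=40,
                                repair_time=20, repairs=10,
                                failures=5, total_cases=50))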

    Factors Affecting Software Reliability

    A user's assessment of a software program's dependability is based on two types of data:

    • The quantity of errors in the software
    • The way users interact with the system. This is referred to as the operational profile.

    The following factors affect the number of faults in a system:

    • The code’s size and complexity.
    • Characteristics of the development process used.
    • Training, education, and experience of the development staff.
    • Operating environment.

    Software Reliability Applications

    There are several uses for software reliability:

    1. Comparing software engineering technologies:
      • How much does it cost to adopt a technology?
      • In terms of cost and quality, what is the technology's yield?
    2. Monitoring the system testing process: The failure intensity metric provides information about the system's current quality; a high intensity indicates that additional testing is necessary.
    3. Controlling the system in use: The degree of software modification required for maintenance has an impact on the system's dependability.
    4. Improved understanding of software development processes: We can gain a better understanding of software development processes by quantifying quality.

    Software Reliability Benefits

    Including software reliability in the software development process has the following benefits:

    • Software reliability helps in preserving data.
    • It helps in avoiding software failures.
    • It makes the process of upgrading the system simple.
    • Improved system performance and efficiency lead to higher productivity.