Failure rate is the frequency with which an engineered system or component fails, expressed in failures per unit of time. It is usually denoted by the Greek letter λ (lambda) and is often used in reliability engineering.
The failure rate of a system usually depends on time, with the rate varying over the life cycle of the system. For example, an automobile’s failure rate in its fifth year of service may be many times greater than its failure rate during its first year of service. One does not expect to replace an exhaust pipe, overhaul the brakes, or have major transmission problems in a new vehicle.
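A time-dependent failure rate can be illustrated with a Weibull hazard function. The Weibull model is not named in the text above; it is used here only as an illustrative assumption, and the shape and scale values are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters: shape > 1 models wear-out (rate rises with age).
SHAPE = 3.0
SCALE_YEARS = 10.0

rate_year_1 = weibull_hazard(1.0, SHAPE, SCALE_YEARS)
rate_year_5 = weibull_hazard(5.0, SHAPE, SCALE_YEARS)
ratio = rate_year_5 / rate_year_1

print(f"year 1: {rate_year_1:.4f}/yr, year 5: {rate_year_5:.4f}/yr, ratio: {ratio:.1f}x")
```

With a shape of 3 the rate grows with the square of age, so under these assumed values the fifth-year rate is about 25 times the first-year rate. A shape of 1 gives a constant rate, and a shape below 1 gives a rate that falls with age.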
In practice, the mean time between failures (MTBF, 1/λ) is often reported instead of the failure rate. This is valid and useful when the failure rate can be assumed constant, an assumption often made for complex units and systems and for electronics, and one that is accepted by convention in some reliability standards (military and aerospace). In that case the MTBF describes only the flat region of the bathtub curve, also called the “useful life period”. It is therefore incorrect to extrapolate the MTBF to estimate the service lifetime of a component, which will typically be much shorter than the MTBF suggests because of the much higher failure rates in the “end-of-life wearout” part of the bathtub curve.
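One way to see that the MTBF is not a lifetime is the constant-failure-rate (exponential) model, assumed here for illustration. The symbol R(t), the probability that a unit survives to time t, is introduced for this sketch and does not appear in the text above.

```latex
R(t) = e^{-\lambda t}, \qquad \text{MTBF} = \frac{1}{\lambda}
\quad\Longrightarrow\quad
R(\text{MTBF}) = e^{-\lambda \cdot \frac{1}{\lambda}} = e^{-1} \approx 0.368
```

Under this assumption only about 37% of units are still working at t = MTBF, even before any wear-out is taken into account.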
MTBF figures are preferred because large positive numbers (such as 2000 hours) are more intuitive and easier to remember than very small ones (such as 0.0005 per hour).
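The two figures quoted above describe the same quantity. A minimal sketch of the conversion, assuming a constant failure rate (the function names are hypothetical):

```python
def failure_rate_from_mtbf(mtbf_hours):
    """Failure rate in failures per hour, assuming a constant rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 / mtbf_hours


def mtbf_from_failure_rate(rate_per_hour):
    """MTBF in hours, assuming a constant rate."""
    if rate_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / rate_per_hour


rate = failure_rate_from_mtbf(2000.0)   # the 2000-hour example from the text
mtbf = mtbf_from_failure_rate(0.0005)   # the 0.0005-per-hour example

print(f"MTBF 2000 h -> {rate:.4f} failures/h")
print(f"0.0005 failures/h -> MTBF {mtbf:.0f} h")
```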
The MTBF is an important system parameter in systems where the failure rate needs to be managed, in particular for safety systems. The MTBF appears frequently in engineering design requirements, and governs the frequency of required system maintenance and inspections. In special processes called renewal processes, where the time to recover from failure can be neglected and the likelihood of failure remains constant with respect to time, the failure rate is simply the multiplicative inverse of the MTBF (λ = 1/MTBF).
A similar measure used in the transport industries, especially railways and trucking, is “mean distance between failures”, a variation that attempts to correlate actual loaded distances with similar reliability needs and practices.
Failure rates are important factors in the insurance, finance, commerce and regulatory industries, and are fundamental to the design of safe systems in a wide variety of applications.
Failure Rate Data
Failure rate data can be obtained in several ways. The most common means are:
Statistical analysis techniques can be applied to field failure reports to estimate failure rates. To obtain accurate failure rates, the analyst must have a good understanding of equipment operation, the procedures for data collection, the key environmental variables that affect failure rates, how the equipment is used at the system level, and how the failure data will be used by system designers.
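A minimal sketch of one such estimate, assuming a constant failure rate: the point estimate is the number of observed failures divided by the total accumulated operating time. The record layout and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldRecord:
    unit_id: str
    operating_hours: float  # hours in service during the observation window
    failures: int           # failures reported for this unit in that window


def estimate_failure_rate(records):
    """Point estimate: total failures / total operating hours (constant-rate assumption)."""
    total_hours = sum(r.operating_hours for r in records)
    total_failures = sum(r.failures for r in records)
    if total_hours <= 0:
        raise ValueError("no operating time recorded")
    return total_failures / total_hours


# Hypothetical field data for three units.
records = [
    FieldRecord("A", 8000.0, 1),
    FieldRecord("B", 12000.0, 0),
    FieldRecord("C", 10000.0, 2),
]

estimated_rate = estimate_failure_rate(records)
print(f"estimated failure rate: {estimated_rate:.2e} failures/h")
```

Units that did not fail still contribute operating hours. Leaving them out would overstate the failure rate, which is one reason the quality of data collection matters.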
Historical data about the device or system under consideration
Many organizations maintain internal databases of failure information on the devices or systems that they produce, which can be used to calculate failure rates for those devices or systems. For new devices or systems, the historical data for similar devices or systems can serve as a useful estimate.
Government and commercial failure rate data
Handbooks of failure rate data for various components are available from government and commercial sources. MIL-HDBK-217F, Reliability Prediction of Electronic Equipment, is a military handbook that provides failure rate data for many military electronic components. Several commercially available failure rate data sources focus on commercial components, including some non-electronic components.
Time lag is one of the serious drawbacks of all failure rate estimations. Often, by the time the failure rate data are available, the devices under study have become obsolete. Because of this drawback, failure-rate prediction methods have been developed. These methods may be used on newly designed devices to predict the devices’ failure rates and failure modes. Two approaches have become well known: cycle testing and FMEDA.
The most accurate failure data come from testing samples of the actual devices or systems. This is often prohibitively expensive or impractical, so the other data sources are often used instead.
Mechanical movement is the predominant failure mechanism causing mechanical and electromechanical devices to wear out. For many devices, the wear-out failure point is measured by the number of cycles performed before the device fails, and can be discovered by cycle testing. In cycle testing, a device is cycled as rapidly as practical until it fails. When a collection of these devices is tested, the test runs until 10% of the units fail dangerously.
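The stopping rule can be sketched as follows, assuming the cycle count at which each unit failed is recorded and treating every recorded failure as a dangerous one. The function name and the sample data are hypothetical.

```python
import math


def cycles_at_fraction_failed(cycles_to_failure, fraction=0.10):
    """Cycle count at which `fraction` of the tested units have failed."""
    if not cycles_to_failure:
        raise ValueError("no test units")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(cycles_to_failure)
    # Number of failures needed to reach the fraction; the small offset
    # guards against floating-point round-up (e.g. 0.1 * 20 -> 2, not 3).
    needed = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return ordered[needed - 1]


# Hypothetical cycles-to-failure for 20 test units.
unit_cycles = [
    412_000, 188_000, 530_000, 275_000, 610_000,
    342_000, 495_000, 221_000, 705_000, 388_000,
    450_000, 298_000, 560_000, 640_000, 365_000,
    515_000, 480_000, 330_000, 590_000, 425_000,
]

stop_point = cycles_at_fraction_failed(unit_cycles)
print(f"10% of units had failed by {stop_point} cycles")
```

With 20 units, the 10% point is reached at the second failure, so the test would stop at the second-smallest cycle count in the sample.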
Failure modes, effects, and diagnostic analysis (FMEDA) is a systematic analysis technique for obtaining subsystem- or product-level failure rates, failure modes and design strength. The FMEDA technique considers:
- All components of a design,
- The functionality of each component,
- The failure modes of each component,
- The effect of each component failure mode on the product functionality,
- The ability of any automatic diagnostics to detect the failure,
- The design strength (de-rating, safety factors) and
- The operational profile (environmental stress factors).
Given a component database calibrated with reasonably accurate field failure data, the method can predict product-level failure rates and failure mode data for a given application. The predictions have been shown to be more accurate than field warranty return analysis or even typical field failure analysis, because those methods depend on reports whose failure records typically lack sufficiently detailed information.
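The bookkeeping behind such a prediction can be sketched as a sum over component failure modes. This is a simplified illustration, not the FMEDA procedure itself: the component names, failure rates, mode fractions and diagnostic coverage values are invented, and design strength and operational profile are collapsed into a single hypothetical stress multiplier.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    base_rate: float            # component failure rate, failures per hour (invented)
    mode_fraction: float        # share of the component's failures in this mode
    affects_product: bool       # whether this mode affects product functionality
    diagnostic_coverage: float  # fraction of these failures that diagnostics detect


def product_failure_rates(modes, stress_factor=1.0):
    """Sum per-mode rates into product-level detected / undetected / no-effect rates."""
    totals = {"detected": 0.0, "undetected": 0.0, "no_effect": 0.0}
    for m in modes:
        rate = m.base_rate * m.mode_fraction * stress_factor
        if not m.affects_product:
            totals["no_effect"] += rate
            continue
        totals["detected"] += rate * m.diagnostic_coverage
        totals["undetected"] += rate * (1.0 - m.diagnostic_coverage)
    return totals


# Invented example design with two components and two modes each.
modes = [
    FailureMode("R1", "open", 2e-9, 0.8, True, 0.90),
    FailureMode("R1", "short", 2e-9, 0.2, False, 0.0),
    FailureMode("K1", "stuck closed", 5e-8, 0.5, True, 0.60),
    FailureMode("K1", "fails open", 5e-8, 0.5, True, 0.99),
]

totals = product_failure_rates(modes, stress_factor=1.5)
for category, rate in totals.items():
    print(f"{category}: {rate:.3e} failures/h")
```

Splitting each component's rate by failure mode before summing is what lets the result separate failures that diagnostics catch from those that go undetected, rather than reporting a single product-level rate.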