Knowledge

High availability


administrator will claim 100% uptime. However, given the true definition of availability, the system will be approximately 99.9% available, or three nines (8751 hours of available time out of 8760 hours per non-leap year). Also, systems experiencing performance problems are often deemed partially or entirely unavailable by users, even when the systems are continuing to function. Similarly, unavailability of select application functions might go unnoticed by administrators yet be devastating to users – a true availability measure is holistic.
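The three-nines figure in this example can be checked with a short sketch. The function name `availability` and its parameters are illustrative only, not part of any library; the inputs are the 9-hour outage and the 8760-hour non-leap year quoted in the text.

```python
def availability(downtime_hours: float, period_hours: float = 8760.0) -> float:
    """Return the fraction of the period during which the service was usable."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must lie between 0 and period_hours")
    return (period_hours - downtime_hours) / period_hours


# A 9-hour network outage in a non-leap year leaves 8751 of 8760 hours available.
percent = availability(9) * 100
print(f"{percent:.2f}%")
```

Note that this measures only total outage time; it does not capture degraded performance or partial loss of application functions, which the text argues a holistic measure must also consider.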
the total number of components in the system. x is the number of components used to stress the system. N-1 means the model is stressed by evaluating performance with all possible combinations where one component is faulted. N-2 means the model is stressed by evaluating performance with all possible combinations where two components are faulted simultaneously.
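The N-x criterion amounts to enumerating every combination of x faulted components and evaluating the model under each one. The following is a minimal sketch under stated assumptions: components are identified by name, and a hypothetical caller-supplied predicate `system_ok` stands in for the real performance model. The example system (two power feeds and one switch) is invented for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def failing_scenarios(
    components: Iterable[str],
    x: int,
    system_ok: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Return every set of x simultaneously faulted components that breaks the system."""
    parts = list(components)
    return [
        frozenset(faulted)
        for faulted in combinations(parts, x)
        if not system_ok(frozenset(faulted))
    ]


# Hypothetical model: either power feed is sufficient, but the switch is not redundant.
def example_ok(faulted: FrozenSet[str]) -> bool:
    has_power = not {"feed_a", "feed_b"} <= faulted
    return has_power and "switch" not in faulted


parts = ["feed_a", "feed_b", "switch"]
n_minus_1 = failing_scenarios(parts, 1, example_ok)  # single faults
n_minus_2 = failing_scenarios(parts, 2, example_ok)  # double faults
print(n_minus_1, len(n_minus_2))
```

In this toy model the N-1 pass exposes the switch as a single point of failure, while every N-2 combination brings the system down. The number of combinations grows quickly with N and x, so exhaustive enumeration is practical only for modest system sizes.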
Active redundancy is used in complex systems to achieve high availability with no performance decline. Multiple items of the same kind are incorporated into a design that includes a method to detect failure and automatically reconfigure the system to bypass failed items using a voting scheme. This is
documents. The use of the "nines" has been called into question, since it does not appropriately reflect that the impact of unavailability varies with its time of occurrence. For large numbers of 9s, the "unavailability" index (a measure of downtime rather than uptime) is easier to handle. For example,
Similarly, percentages ending in a 5 have conventional names, traditionally the number of nines, then "five", so 99.95% is "three nines five", abbreviated 3N5. This is casually referred to as "three and a half nines", but this is incorrect: a 5 is only a factor of 2, while a 9 is a factor of 10, so a
or customer. The subject of the terms is thus important here: whether the focus of a discussion is the server hardware, server OS, functional service, software service/process, or similar, it is only if there is a single, consistent subject of the discussion that the words uptime and availability can
or system configuration changes that only take effect upon a reboot. In general, scheduled downtime is usually the result of some logical, management-initiated event. Unscheduled downtime events typically arise from some physical event, such as a hardware or software failure or environmental anomaly.
The importance of network resilience is continuously increasing, as communication networks are becoming a fundamental component in the operation of critical infrastructures. Consequently, recent efforts focus on interpreting and improving network and computing resilience with applications to critical
is used to evaluate the theoretical reliability for large systems. The outcome of this kind of model is used to evaluate different design options. A model of the entire system is created, and the model is stressed by removing components. Redundancy simulation involves the N-x criteria. N represents
Availability measurement is subject to some degree of interpretation. A system that has been up for 365 days in a non-leap year might have been eclipsed by a network failure that lasted for 9 hours during a peak usage period; the user community will see the system as unavailable, whereas the system
Passive redundancy is used to achieve high availability by including enough excess capacity in the design to accommodate a performance decline. The simplest example is a boat with two separate engines driving two separate propellers. The boat continues toward its destination despite failure of a
inherently have more potential failure points and are more difficult to implement correctly. While some analysts would put forth the theory that the most highly available systems adhere to a simple architecture (a single, high quality, multi-purpose physical system with comprehensive internal
and challenges to normal operation." Threats and challenges for services can range from simple misconfiguration through large-scale natural disasters to targeted attacks. As such, network resilience touches a very wide range of topics. In order to increase the resilience of a given communication
Availability must be measured to be determined, ideally with comprehensive monitoring tools ("instrumentation") that are themselves highly available. If there is a lack of instrumentation, systems supporting high volume transaction processing throughout the day and night, such as credit card
): 99.95% availability is 3.3 nines, not 3.5 nines. More simply, going from 99.9% availability to 99.95% availability is a factor of 2 (0.1% to 0.05% unavailability), but going from 99.95% to 99.99% availability is a factor of 5 (0.05% to 0.01% unavailability), over twice as much.
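The 3.3 figure follows from treating the number of nines as the negative base-10 logarithm of the unavailability. A minimal sketch, with the function name `nines` chosen here for illustration:

```python
import math


def nines(availability_percent: float) -> float:
    """Return the (possibly fractional) number of nines for an availability percentage."""
    unavailability = 1.0 - availability_percent / 100.0
    if not 0.0 < unavailability <= 1.0:
        raise ValueError("availability must be at least 0% and strictly below 100%")
    return -math.log10(unavailability)


for pct in (99.9, 99.95, 99.99):
    print(f"{pct}% -> {nines(pct):.2f} nines")
```

Because the subtraction is done in floating point, results for round percentages can land a hair below the integer (for example just under 3 for 99.9%), so taking the floor of the result directly to obtain a "class of nines" needs a small tolerance.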
infrastructures. As an example, one can consider as a resilience objective the provisioning of services over the network, instead of the services of the network itself. This may require coordinated response from both the network and from the services running on top of the network.
hardware redundancy), this architecture suffers from the requirement that the entire system must be brought down for patching and operating system upgrades. More advanced system designs allow for systems to be patched and upgraded without compromising service availability (see
is used to create systems with high levels of availability (e.g. aircraft flight computers). In this case it is required to have high levels of failure detectability and avoidance of common cause failures. Two kinds of redundancy are passive redundancy and active redundancy.
refers to the ability of the user community to obtain a service or good, access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is – from the user's point of view –
often refer to monthly downtime or availability in order to calculate service credits to match monthly billing cycles. The following table shows the translation from a given availability percentage to the corresponding amount of time a system would be unavailable.
Many computing sites exclude scheduled downtime from availability calculations, assuming that it has little or no impact upon the computing user community. By doing this, they can claim to have phenomenally high availability, which might give the illusion of
allows approximately 5 minutes of downtime per year. Variants can be derived by multiplying or dividing by 10: 4 nines is 50 minutes and 3 nines is 500 minutes. In the opposite direction, 6 nines is 0.5 minutes (30 sec) and 7 nines is 3 seconds.
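The mnemonic can be compared against the exact allowance with a short sketch. The function names are illustrative, and the year length of 365.25 days is an assumption (the average Julian year); a 365-day year gives slightly smaller values.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, assuming a 365.25-day year


def exact_downtime_minutes(n: int) -> float:
    """Allowed downtime per year, in minutes, for n nines of availability."""
    return MINUTES_PER_YEAR * 10.0 ** (-n)


def mnemonic_downtime_minutes(n: int) -> float:
    """Rule of thumb: five nines is about 5 minutes; scale by 10 per nine."""
    return 5.0 * 10.0 ** (5 - n)


for n in range(3, 8):
    print(n, round(exact_downtime_minutes(n), 4), mnemonic_downtime_minutes(n))
```

The rule of thumb understates the exact allowance by about 5% at every level (5 minutes versus roughly 5.26 minutes for five nines), which is usually close enough for mental estimates.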
Availability is usually expressed as a percentage of uptime in a given year. The following table shows the downtime that will be allowed for a particular percentage of availability, presuming that the system is required to operate continuously.
Sometimes the humorous term "nine fives" (55.5555555%) is used to contrast with "five nines" (99.999%), though this is not an actual goal, but rather a sarcastic reference to something totally failing to meet any reasonable target.
can be used in systems with limited redundancy to achieve high availability. Maintenance actions occur during brief periods of down-time only after a fault indicator activates. Failure is only significant if this occurs during a
(MTTR). Recovery time could be infinite with certain system designs and failures, i.e. full recovery is impossible. One such example is a fire or flood that destroys a data center and its systems when there is no secondary
, and application upgrades, patches, and replacements. For certain systems, scheduled downtime does not matter, for example system downtime at an office building after everybody has gone home for the night.
is derived from early work by Birman and Joseph in this area. Active redundancy may introduce more complex failure modes into a system, such as continuous system reconfiguration due to faulty voting logic.
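The voting scheme used in active redundancy can be illustrated with a simple strict-majority voter. This is a sketch under stated assumptions, not the method of any particular system: each redundant unit is assumed to report a comparable (hashable) output, and the name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Return the value agreed on by a strict majority and the indices of dissenting units."""
    if not outputs:
        raise ValueError("at least one output is required")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # Without a strict majority the voter cannot tell which units are faulty.
        raise RuntimeError("no strict majority; the fault cannot be masked")
    suspects = [i for i, out in enumerate(outputs) if out != value]
    return value, suspects


# Three redundant units; the third disagrees and would be bypassed.
value, suspects = majority_vote([42, 42, 41])
print(value, suspects)
```

The voter itself becomes a critical component: if its logic is faulty, it can repeatedly flag healthy units, which is the kind of continuous reconfiguration failure mode the text warns about.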
In general, the number of nines is not often used by a network engineer when modeling and measuring availability because it is hard to apply in formulas. More often, the unavailability expressed as a
If users can be warned away from scheduled downtimes, then the distinction is useful. But if the requirement is for true high availability, then downtime is downtime whether or not it is scheduled.
Modernization has resulted in an increased reliance on these systems. For example, hospitals and data centers require high availability of their systems to perform routine daily activities.
components (or possibly other failed hardware components), an over-temperature related shutdown, logically or physically severed network connections, security breaches, or various
(RTO) is closely related to availability, that is the total time required for a planned outage or the time required to fully recover from an unplanned outage. Another metric is
High availability requires less human intervention to restore operation in complex systems, because the most common cause of outages is human error.
processing systems or telephone switches, are often inherently better monitored, at least by the users themselves, than systems which experience periodic lulls in demand.
. Systems that exhibit truly continuous availability are comparatively rare and higher priced, and most have carefully implemented specialty designs that eliminate any
. Malfunction of single components is not considered to be a failure unless the resulting performance decline exceeds the specification limits for the entire system.
that is disruptive to system operation and usually cannot be avoided with a currently installed system design. Scheduled downtime events might include patches to
and other information storage systems faithfully record and report system transactions. Information management often focuses separately on data availability, or
are often used interchangeably but do not always refer to the same thing. For example, a system can be "up" with its services not "available" in the case of a
Smith, Paul; Hutchison, David; Sterbenz, James P.G.; Schöller, Marcus; Fessi, Ali; Karaliopoulos, Merkouris; Lac, Chidung; Plattner, Bernhard (July 3, 2011).
network, the probable challenges and risks have to be identified and appropriate resilience metrics have to be defined for the service to be protected.
Detection of failures as they occur. If the two principles above are observed, then a user may never see a failure – but the maintenance activity must.
Ulrik Franke, Pontus Johnson, Johan König, Liv Marcks von Würtemberg: Availability of enterprise IT systems – an expert-based Bayesian model,
Castet J., Saleh J. "Survivability and Resiliency of Spacecraft and Space-Based Networks: a Framework for Characterization and Analysis",
events, or system lifetime. Zero downtime involves massive redundancy, which is needed for some types of aircraft and for most kinds of
A survey among academic availability experts in 2010 ranked reasons for unavailability of enterprise IT systems. All reasons refer to
, unavailable systems were estimated to have cost American businesses $4.54 billion in 1996, due to lost productivity and revenues.
single engine or propeller. A more complex example is multiple redundant power generation facilities within a large system involving
. This means adding or building redundancy into the system so that failure of a component does not mean failure of the entire system.
American Institute of Aeronautics and Astronautics, AIAA Technical Report 2008-7707. Conference on Network Protocols (ICNP 2006)
, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover.
) 99.999% of the time would have 5 nines reliability, or class five. In particular, the term is used in connection with
with various failure events. Some users can tolerate application service interruptions but cannot tolerate data loss.
2151: 1993: 1898: 1788:"A Benders Decomposition Approach for Resilient Placement of Virtual Process Control Functions in Mobile Edge Clouds" 1643: 1538: 1110: 230: 212: 2240: 1557: 1477: 1851: 2225: 208: 1295:
Adding more components to an overall system design can undermine efforts to achieve high availability because
1890: 1019: 2235: 1612: 2080: 1977: 35:) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually 1761:"The CERCES project - Center for Resilient Critical Infrastructures at KTH Royal Institute of Technology" 1323: 1159: 796: 2191: 941: 1339: 1301: 1276: 1210: 870: 992:
or "class of nines" in the digits. For example, electricity that is delivered without interruptions (
but it seems to me we are moving closer to 9-5s (55.5555555%) in network reliability rather than 5-9s
1472: 1355: 2230: 1351: 1314: 1245: 193: 162: 155: 1185:
this is why an "unavailability" rather than availability metric is used in hard disk or data link
1839: 1396: 1373: 1256: 1222: 1009: 309: 292: 288: 197: 147: 137: 2171: 997: 89: 2055: 1280: 1226: 906: 2245: 1956: 1497: 838: 744: 269: 113: 1852:
Introduction to the new mainframe: Large scale commercial computing Chapter 5 Availability
8: 2204: 1450: 1343: 1807: 1716: 1420: 1087: 1064: 776: 244: 103: 73: 743:. Or a system undergoing software maintenance can be "available" to be worked on by a 2147: 1894: 1659: 1639: 1237: 1230: 108: 72:, the ability to "provide and maintain an acceptable level of service in the face of 1811: 1720: 2104: 1799: 1708: 1367: 1272: 277: 66: 2210: 2198: 2177: 2022: 1858: 1467: 1362: 1180:
per year is quoted. Availability specified as a number of nines is often seen in
252: 243:
A distinction can be made between scheduled and unscheduled downtime. Typically,
1824: 988:
Percentages of a particular order of magnitude are sometimes referred to by the
2108: 2027:"After 35 years of technology crusades, Bob Metcalfe rides off into the sunset" 1787: 1696: 1487: 1296: 1268: 1259:("SLA") formalizes an organization's availability objectives and requirements. 1186: 1082: 740: 143: 2010:
leading to crashes and uptime numbers closer to nine fives than to five nines.
1870: 1803: 1712: 2219: 2130: 1166: 1001: 983: 128:
are interchangeably used according to the specific context of a given study.
125: 95: 20: 1932:
Murphy, Niall Richard; Beyer, Betsy; Petoff, Jennifer; Jones, Chris (2016).
1915:
PVD for Microelectronics: Sputter Deposition to Semiconductor Manufacturing
2121:
Proc. Fourth International Workshop on Software Quality and Maintainability
1462: 993: 736: 43: 1677: 1492: 1338:
Zero downtime system design means that modeling and simulation indicates
1173: 639: 576: 248: 773:
Another memory trick to calculate the allowed downtime duration for an "
1617: 1005: 702: 296: 273: 260:
Examples of unscheduled downtime events include power outages, failed
1249: 1181: 1528:
days). For consistency, all times are rounded to two decimal digits.
182: 1520:
days per year; respectively, a quarter is ¼ of that value (i.e.,
1305: 1241: 1177: 748: 53: 1347: 1331: 256: 1934:
Site Reliability Engineering: How Google Runs Production Systems
1221:
Recovery time (or estimated time of repair (ETR), also known as
732: 36: 1694: 1330:
used with complex computing systems that are linked. Internet
19:"Always-on" redirects here. For the software restriction, see 1992:
Newman, David; Snyder, Joel; Thayer, Rodney (June 24, 2012).
1736:"operational resilience | telcos | accesstel | risk | crisis" 1631: 2146:(Second ed.). Indianapolis, IN: John Wiley & Sons. 1267:
High availability is one of the primary requirements of the
57:
is used to refer to periods when a system is unavailable.
1389:
in each of the following areas (in order of importance):
265: 261: 1441:
A book on the factors themselves was published in 2003.
1169:
is sometimes used to describe the purity of substances.
1931: 793:-nines" availability percentage is to use the formula 295:
and allow online hardware, network, operating system,
100:
maintaining service of communication services such as
1560: 1279:. If the controlling system becomes unavailable, the 1113: 1090: 1067: 1022: 944: 909: 873: 841: 799: 779: 172: 1635:
High Availability: Design, Techniques, and Processes
1550:"Twice as much" on a logarithmic scale, meaning two 1792:
IEEE Transactions on Network and Service Management
1991: 1584: 1147: 1096: 1073: 1047: 966: 930: 892: 859: 835:For example, 90% ("one nine") yields the exponent 824: 785: 1342:significantly exceeds the period of time between 1196: 768: 2217: 903:Also, 99.999% ("five nines") gives the exponent 1829:, Santa Barbara, California, USA, November 2006 1248:, in order to determine acceptable (or actual) 1148:{\displaystyle c:=\lfloor -\log _{10}x\rfloor } 2042:and five nines (not nine fives) of reliability 1786:Zhao, Peiyue; Dán, György (December 3, 2018). 2205:Lecture notes on Embedded Systems Engineering 1524:days), and a month is a twelfth of it (i.e., 1380: 747:, but its services do not appear "up" to the 16:Systems with high up-time, a.k.a. "always on" 1585:{\displaystyle \times 2\times 2<\times 5} 1483:Reliability, availability and serviceability 1142: 1120: 1008:or enterprise computing, often as part of a 2141: 1950: 1697:"Network resilience: a systematic approach" 1539:mathematical coincidences concerning base 2 1262: 1216: 211:. Unsourced material may be challenged and 2073: 1444: 1411:Avoidance of internal application failures 150:which can help achieve high availability. 1871:IBM zEnterprise EC12 Business Value Video 1733: 1358:is an example of a zero downtime system. 302: 231:Learn how and when to remove this message 120:access to applications and data as needed 2021: 1994:"Crying Wolf: False alarms hide attacks" 1414:Avoidance of external services that fail 938:, and therefore the allowed downtime is 867:, and therefore the allowed downtime is 755: 2053: 2000:. Vol. 19, no. 25. p. 60 1785: 2218: 1632:Floyd Piedad, Michael Hawkins (2001). 1610: 1048:{\displaystyle \log _{10}2\approx 0.3} 2192:Lecture Notes on Enterprise Computing 1393:Monitoring of the relevant components 1842:M. 
Nesterenko, Kent State University 1285:ASW Continuous Trail Unmanned Vessel 977: 209:adding citations to reliable sources 176: 1016:5 is 0.3 nines (per below formula: 825:{\displaystyle 8.64\times 10^{4-n}} 760:A simple mnemonic rule states that 65:High availability is a property of 39:, for a higher than normal period. 13: 1541:for details on this approximation. 967:{\displaystyle 8.64\times 10^{-1}} 173:Scheduled and unscheduled downtime 14: 2257: 2185: 2142:Marcus, Evan; Stern, Hal (2003). 2054:Pilgrim, Jim (October 20, 2010). 1678:"Webarchiv ETHZ / Webarchive ETH" 893:{\displaystyle 8.64\times 10^{3}} 2144:Blueprints for high availability 1889:. Pergamon Press. 1981. p.  1290: 709:99.9999999999% ("twelve nines") 181: 2160: 2135: 2113: 2098: 2047: 2015: 1985: 1971: 1944: 1925: 1907: 1879: 1864: 1845: 1833: 1544: 1531: 1510: 1478:Overall equipment effectiveness 1437:Storage architecture redundancy 686:99.999999999% ("eleven nines") 2168:Improving systems availability 1951:Josh Deprez (April 23, 2016). 1818: 1779: 1753: 1727: 1688: 1670: 1660:"Definitions - ResiliNetsWiki" 1652: 1625: 1611:Robert, Sheldon (April 2024). 1604: 1240:, that is the degree to which 1197:Measurement and interpretation 142:There are three principles of 1: 2170:, IBM Global Services, 1998, 1598: 315: 131: 60: 1701:IEEE Communications Magazine 1425:Technical solution of backup 540:99.995% ("four nines five") 500:99.95% ("three nines five") 335:Downtime per day (24 hours) 7: 2081:"What is network downtime?" 1456: 1387:not following best practice 1324:electric power transmission 1277:autonomous maritime vessels 1236:Another related concept is 1160:Floor and ceiling functions 666:99.99999999% ("ten nines") 646:99.9999999% ("nine nines") 623:99.999999% ("eight nines") 10: 2262: 2197:November 16, 2013, at the 1734:accesstel (June 9, 2022). 
1428:Process solution of backup 1381:Reasons for unavailability 1340:mean time between failures 1211:mean time between failures 981: 708: 685: 665: 645: 622: 603:99.99999% ("seven nines") 602: 582: 559: 539: 519: 499: 479: 460:99.8% ("two nines eight") 459: 439: 419: 399: 379: 359: 339: 135: 18: 1887:Precious metals, Volume 4 1804:10.1109/TNSM.2018.2873178 1713:10.1109/MCOM.2011.5936160 1473:High-availability cluster 1434:Infrastructure redundancy 1356:Global Positioning System 1352:communications satellites 1209:An alternative metric is 440:99.5% ("two nines five") 1814:– via IEEE Xplore. 1723:– via IEEE Xplore. 1613:"high availability (HA)" 1503: 1263:Military control systems 1246:Recovery Point Objective 1217:Closely related concepts 310:Service level agreements 156:single points of failure 84:These services include: 2241:Reliability engineering 2211:Uptime Calculator (SLA) 1445:Costs of unavailability 1374:Modeling and simulation 1287:(ACTUV) would be lost. 1257:service level agreement 1223:recovery time objective 1010:service-level agreement 583:99.9999% ("six nines") 560:99.999% ("five nines") 400:98% ("one nine eight") 380:97% ("one nine seven") 293:single point of failure 289:continuous availability 161:Reliable crossover. In 148:reliability engineering 138:Design for availability 2201:University of TĂĽbingen 2176:April 1, 2011, at the 1857:March 4, 2016, at the 1586: 1449:In a 1998 report from 1149: 1098: 1075: 1049: 968: 932: 931:{\displaystyle 4-5=-1} 894: 861: 826: 787: 752:be used synonymously. 520:99.99% ("four nines") 480:99.9% ("three nines") 360:95% ("one nine five") 303:Percentage calculation 90:distributed processing 51:. Generally, the term 2226:System administration 2207:by Prof. 
Phil Koopman 2166:IBM Global Services, 2123:(WSQM 2010), Madrid, 1980:The myth of the nines 1587: 1281:Ground Combat Vehicle 1227:mean time to recovery 1176:(like 0.00001), or a 1150: 1099: 1076: 1058:A formulation of the 1050: 969: 933: 895: 862: 860:{\displaystyle 4-1=3} 827: 788: 756:Five-by-five mnemonic 326:Downtime per quarter 136:Further information: 1959:on September 4, 2016 1558: 1498:Ubiquitous computing 1417:Physical environment 1111: 1088: 1081:based on a system's 1065: 1020: 942: 907: 871: 839: 797: 777: 769:"Powers of 10" trick 745:system administrator 689:315.58 microseconds 675:262.80 microseconds 672:788.40 microseconds 658:604.80 microseconds 626:315.58 milliseconds 612:262.98 milliseconds 595:604.80 milliseconds 320:Availability % 205:improve this section 114:online collaboration 2236:Applied probability 2129:August 4, 2012, at 1767:on October 19, 2018 1451:IBM Global Services 1344:planned maintenance 1167:similar measurement 721:604.81 nanoseconds 712:31.56 microseconds 695:26.28 microseconds 692:78.84 microseconds 678:60.48 microseconds 661:86.40 microseconds 649:31.56 milliseconds 632:26.30 milliseconds 629:78.89 milliseconds 615:60.48 milliseconds 598:86.40 milliseconds 329:Downtime per month 1582: 1421:Network redundancy 1145: 1094: 1071: 1045: 964: 928: 890: 857: 822: 783: 724:86.40 nanoseconds 718:2.63 microseconds 715:7.88 microseconds 698:6.05 microseconds 681:8.64 microseconds 669:3.16 milliseconds 655:2.63 milliseconds 652:7.89 milliseconds 635:6.05 milliseconds 618:8.64 milliseconds 420:99% ("two nines") 332:Downtime per week 323:Downtime per year 245:scheduled downtime 104:video conferencing 2058:. Clearfield, Inc 2056:"Goodbye Five 9s" 2025:(April 2, 2001). 1682:webarchiv.ethz.ch 1638:. Prentice Hall. 1431:Physical location 1273:unmanned vehicles 1238:data availability 1231:disaster recovery 1097:{\displaystyle x} 1074:{\displaystyle c} 974:seconds per day. 900:seconds per day. 832:seconds per day. 
786:{\displaystyle n} 728: 727: 340:90% ("one nine") 241: 240: 233: 163:redundant systems 109:instant messaging 29:High availability 2253: 2180: 2164: 2158: 2157: 2139: 2133: 2117: 2111: 2102: 2096: 2095: 2093: 2091: 2077: 2071: 2070: 2065: 2063: 2051: 2045: 2044: 2039: 2037: 2019: 2013: 2012: 2007: 2005: 1989: 1983: 1978:Evan L. Marcus, 1975: 1969: 1968: 1966: 1964: 1955:. Archived from 1953:"Nines of Nines" 1948: 1942: 1941: 1929: 1923: 1922: 1917:. 1998. p.  1911: 1905: 1904: 1883: 1877: 1868: 1862: 1849: 1843: 1837: 1831: 1822: 1816: 1815: 1798:(4): 1460–1472. 1783: 1777: 1776: 1774: 1772: 1763:. Archived from 1757: 1751: 1750: 1748: 1746: 1731: 1725: 1724: 1692: 1686: 1685: 1674: 1668: 1667: 1656: 1650: 1649: 1629: 1623: 1622: 1608: 1592: 1591: 1589: 1588: 1583: 1548: 1542: 1535: 1529: 1527: 1523: 1519: 1514: 1407:network failures 1368:mission critical 1154: 1152: 1151: 1146: 1135: 1134: 1103: 1101: 1100: 1095: 1080: 1078: 1077: 1072: 1054: 1052: 1051: 1046: 1032: 1031: 973: 971: 970: 965: 963: 962: 937: 935: 934: 929: 899: 897: 896: 891: 889: 888: 866: 864: 863: 858: 831: 829: 828: 823: 821: 820: 792: 790: 789: 784: 317: 316: 278:operating system 236: 229: 225: 222: 216: 185: 177: 2261: 2260: 2256: 2255: 2254: 2252: 2251: 2250: 2231:Quality control 2216: 2215: 2199:Wayback Machine 2188: 2183: 2178:Wayback Machine 2165: 2161: 2154: 2140: 2136: 2118: 2114: 2103: 2099: 2089: 2087: 2079: 2078: 2074: 2061: 2059: 2052: 2048: 2035: 2033: 2020: 2016: 2003: 2001: 1990: 1986: 1976: 1972: 1962: 1960: 1949: 1945: 1930: 1926: 1913: 1912: 1908: 1901: 1885: 1884: 1880: 1869: 1865: 1859:Wayback Machine 1850: 1846: 1838: 1834: 1823: 1819: 1784: 1780: 1770: 1768: 1759: 1758: 1754: 1744: 1742: 1732: 1728: 1693: 1689: 1676: 1675: 1671: 1658: 1657: 1653: 1646: 1630: 1626: 1609: 1605: 1601: 1596: 1595: 1559: 1556: 1555: 1549: 1545: 1536: 1532: 1525: 1521: 1517: 1515: 1511: 1506: 1468:Fault tolerance 1459: 1447: 1399:and procurement 1383: 1363:instrumentation 1297:complex systems 
1293: 1269:control systems 1265: 1219: 1199: 1187:bit error rates 1130: 1126: 1112: 1109: 1108: 1089: 1086: 1085: 1066: 1063: 1062: 1027: 1023: 1021: 1018: 1017: 990:number of nines 986: 980: 955: 951: 943: 940: 939: 908: 905: 904: 884: 880: 872: 869: 868: 840: 837: 836: 810: 806: 798: 795: 794: 778: 775: 774: 771: 758: 729: 305: 255:that require a 253:system software 247:is a result of 237: 226: 220: 217: 202: 186: 175: 154:Elimination of 140: 134: 124:Resilience and 96:network storage 63: 24: 17: 12: 11: 5: 2259: 2249: 2248: 2243: 2238: 2233: 2228: 2214: 2213: 2208: 2202: 2187: 2186:External links 2184: 2182: 2181: 2159: 2152: 2134: 2112: 2097: 2072: 2046: 2014: 1984: 1970: 1943: 1924: 1906: 1899: 1878: 1863: 1844: 1832: 1817: 1778: 1752: 1726: 1687: 1669: 1664:resilinets.org 1651: 1644: 1624: 1602: 1600: 1597: 1594: 1593: 1581: 1578: 1575: 1572: 1569: 1566: 1563: 1543: 1530: 1508: 1507: 1505: 1502: 1501: 1500: 1495: 1490: 1488:Responsiveness 1485: 1480: 1475: 1470: 1465: 1458: 1455: 1446: 1443: 1439: 1438: 1435: 1432: 1429: 1426: 1423: 1418: 1415: 1412: 1409: 1403: 1400: 1394: 1382: 1379: 1302:load balancing 1292: 1289: 1264: 1261: 1218: 1215: 1198: 1195: 1156: 1155: 1144: 1141: 1138: 1133: 1129: 1125: 1122: 1119: 1116: 1093: 1083:unavailability 1070: 1044: 1041: 1038: 1035: 1030: 1026: 982:Main article: 979: 976: 961: 958: 954: 950: 947: 927: 924: 921: 918: 915: 912: 887: 883: 879: 876: 856: 853: 850: 847: 844: 819: 816: 813: 809: 805: 802: 782: 770: 767: 757: 754: 741:network outage 726: 725: 722: 719: 716: 713: 710: 706: 705: 699: 696: 693: 690: 687: 683: 682: 679: 676: 673: 670: 667: 663: 662: 659: 656: 653: 650: 647: 643: 642: 636: 633: 630: 627: 624: 620: 619: 616: 613: 610: 607: 604: 600: 599: 596: 593: 590: 587: 586:31.56 seconds 584: 580: 579: 573: 570: 569:26.30 seconds 567: 564: 561: 557: 556: 553: 552:30.24 seconds 550: 547: 544: 543:26.30 minutes 541: 537: 536: 533: 530: 527: 526:13.15 minutes 524: 523:52.60 minutes 521: 517: 516: 515:43.20 seconds 

Index

Always-on DRM
uptime
Availability
downtime
network
faults
distributed processing
network storage
video conferencing
instant messaging
online collaboration
survivability
Design for availability
systems design
reliability engineering
single points of failure
redundant systems

scheduled downtime
maintenance
system software
reboot
CPU
RAM

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
