
Software fault tolerance

Threading allows a separate sequence of execution for each API call that can block. This can prevent the overall application from stalling while waiting for a resource. This has the benefit that none of the information about the state of the API call is lost while other activities take place.
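The following is a minimal Python sketch. It assumes a hypothetical `slow_api_call` that stands in for any blocking call. The call runs on its own thread, and the main thread keeps working instead of stalling. The call's state survives on its own thread until the result is collected.

```python
import threading
import time


def slow_api_call(results):
    """Hypothetical stand-in for an API call that blocks while waiting for a resource."""
    time.sleep(0.2)               # simulate waiting on a disk, network, or similar resource
    results["reply"] = "data"     # the call's own state is preserved on its thread


def run_without_stalling():
    results = {}
    worker = threading.Thread(target=slow_api_call, args=(results,))
    worker.start()                # the blocking call now has its own sequence of execution

    other_work = []
    while worker.is_alive():      # the main thread keeps doing useful work meanwhile
        other_work.append("tick")
        time.sleep(0.05)

    worker.join()                 # collect the completed call
    return other_work, results


if __name__ == "__main__":
    work, reply = run_without_stalling()
    print(len(work), reply)
```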
Timers allow a blocked call to be interrupted. A periodic timer allows the programmer to emulate threading. Interrupts typically destroy any information related to the state of a blocked API call or intensive calculation, so the programmer must keep track of this information separately.
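This sketch interrupts a blocked call with a timer. It assumes a POSIX system and uses Python's `signal.setitimer`. The names `long_calculation` and `CallInterrupted` are hypothetical. The interrupt discards the call's local variables, so the job writes its progress to a separate `progress` dictionary that outlives the interruption.

```python
import signal
import time


class CallInterrupted(Exception):
    """Raised from the timer's signal handler to break out of a blocked call."""


def _on_alarm(signum, frame):
    raise CallInterrupted()


def call_with_timer(func, seconds, progress):
    """Run func(progress); a SIGALRM timer interrupts it after `seconds` (POSIX, main thread only)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)      # one-shot timer
    try:
        return func(progress)
    except CallInterrupted:
        return None     # local variables inside func are lost; only `progress` survives
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)        # cancel the timer if it has not fired
        signal.signal(signal.SIGALRM, previous)        # restore the previous handler


def long_calculation(progress):
    """Hypothetical intensive job that records its progress outside its own stack frame."""
    for step in range(100):
        progress["last_step"] = step
        time.sleep(0.05)
    return "finished"
```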
Redundancy relies on replicating information on more than one computing device so that the recovery delay is brief. This can be achieved using continuous backup to a live system that remains inactive until needed (synchronized backup).
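This is a minimal in-memory sketch of synchronized backup. `Node` and `SynchronizedBackup` are hypothetical classes, not a real clustering API. Every write is also applied to a live standby that serves no reads. When the active node fails, the standby is promoted without any restore step.

```python
class Node:
    """Hypothetical computing device holding a copy of the application's information."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class SynchronizedBackup:
    """An active node plus a live standby that stays inactive but receives every write."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def write(self, key, value):
        self.active.data[key] = value
        if self.standby.alive:
            self.standby.data[key] = value     # continuous backup, applied on every write

    def read(self, key):
        if not self.active.alive:
            self.fail_over()
        return self.active.data[key]

    def fail_over(self):
        # The standby already holds current data, so promoting it needs no restore step.
        if not self.standby.alive:
            raise RuntimeError("no live standby to promote")
        self.active, self.standby = self.standby, self.active
```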
For example, a TCP call blocks until a response becomes available from a remote server, which happens every time you perform an action in a web browser. Intensive calculations cause lengthy delays with the same effect as a blocked API call.
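This sketch bounds a blocking TCP-style receive with Python's `socket.settimeout`. A local `socket.socketpair()` stands in for a real client and remote server, so the example needs no network. `receive_with_deadline` is a hypothetical helper name.

```python
import socket
from typing import Optional


def receive_with_deadline(sock, seconds) -> Optional[bytes]:
    """Wait for a reply, but stop blocking after `seconds` instead of waiting indefinitely."""
    sock.settimeout(seconds)      # turns the blocking recv() into a bounded wait
    try:
        return sock.recv(1024)
    except socket.timeout:        # nothing arrived in time (an alias of TimeoutError on 3.10+)
        return None


if __name__ == "__main__":
    # socketpair() yields two connected local sockets standing in for a client and a remote server.
    client, server = socket.socketpair()
    print(receive_with_deadline(client, 0.2))       # the server has not replied yet
    server.sendall(b"HTTP/1.1 200 OK\r\n")
    print(receive_with_deadline(client, 0.2))
    client.close()
    server.close()
```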
Backup requires an information-restore strategy to make backup information available on a replacement system. The restore process is usually time-consuming, and information will be unavailable until the restore process is complete.
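This sketch shows one possible information-restore strategy. It assumes the state fits in a JSON file and uses a SHA-256 digest to detect a corrupted backup before it is loaded onto the replacement system. The functions `back_up` and `restore` are illustrative, not a standard tool.

```python
import hashlib
import json
import os
import tempfile


def back_up(state, path):
    """Write a snapshot of `state` to `path` and return its SHA-256 digest."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    with open(path, "wb") as f:
        f.write(payload)
    return hashlib.sha256(payload).hexdigest()


def restore(path, expected_digest):
    """Load a snapshot onto a replacement system, refusing a corrupted backup."""
    with open(path, "rb") as f:
        payload = f.read()
    if hashlib.sha256(payload).hexdigest() != expected_digest:
        raise ValueError("backup is corrupted; restore aborted")
    return json.loads(payload)   # the information is unavailable until this returns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = os.path.join(tmp, "snapshot.json")
        digest = back_up({"accounts": 3, "last_txn": 1042}, snapshot)
        print(restore(snapshot, digest))
```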
Initialized handler functions are paired with each signal when the software starts. The handler function then runs whenever the corresponding signal arrives. This technique can be used with timers to emulate threading.
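This sketch shows an initialized handler in Python on a POSIX system. The handler is registered for `SIGUSR1` once at start-up and then runs each time that signal arrives. `signal.raise_signal` (Python 3.8+) stands in for the operating system or another application sending the signal. `on_user_signal` is a hypothetical name.

```python
import signal

received = []


def on_user_signal(signum, frame):
    """Handler paired with SIGUSR1; runs each time that signal arrives."""
    received.append(signum)


# Pair the handler with its signal once, when the program starts (POSIX only).
signal.signal(signal.SIGUSR1, on_user_signal)


def deliver_signal_to_self():
    # signal.raise_signal() sends SIGUSR1 to this process, as the OS or another
    # application might; Python runs the handler in the main thread.
    signal.raise_signal(signal.SIGUSR1)
    return list(received)
```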
The need to control software faults is one of the fastest-growing challenges facing the software industry today. Fault tolerance must be a key consideration in the early stages of software development.
To make a system more fault tolerant, measure its 99th-percentile latency and keep the slowest 1% of requests (the tail latency) in check through self-healing mechanisms.
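This sketch measures the 99th percentile using the nearest-rank definition. That is one of several percentile definitions, and choosing it is an assumption here. `tail_report` and its latency budget are hypothetical, and the self-healing step is not shown.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least pct% of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))   # 1-based rank
    return ordered[rank - 1]


def tail_report(latencies_ms, budget_ms):
    """Summarize the p99 latency and count the requests slower than it (the tail)."""
    p99 = percentile(latencies_ms, 99)
    tail = [x for x in latencies_ms if x > p99]
    return {"p99_ms": p99, "tail_count": len(tail), "p99_over_budget": p99 > budget_ms}
```

With 100 samples, the p99 is the 99th smallest value, and only the single slowest request lies in the tail.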
The only thing constant is change. This is certainly more true of software systems than of almost any other phenomenon, and not all software changes in the same way.
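In-line handler functions are associated with a call using specialized syntax. The most familiar is the try/catch construct used with C++ and Java. The call is placed in a try block, and the handler code is placed in the matching catch block.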
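Python's equivalent of the try/catch construct is try/except. Note that it handles exceptions rather than POSIX signals directly. In this sketch, the hypothetical `api_call` parses a service reply. The in-line handler sits right next to the call and degrades gracefully instead of letting the failure terminate the application.

```python
import json


def api_call(raw):
    """Hypothetical API call that can fail; here it parses a reply from some service."""
    return json.loads(raw)


def fetch_config(raw):
    # The handler is written in-line, next to the call it protects.
    try:
        return api_call(raw)
    except json.JSONDecodeError as err:
        # Handler code: return a safe default instead of terminating the application.
        return {"error": f"bad reply at position {err.pos}", "config": {}}
```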
Redundancy can also be achieved by replicating information on multiple identical systems as it is created, which can eliminate recovery delay.
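This sketch shows replication at creation time. Every new item is written to all live replicas, so losing one system requires neither promotion nor restore. `Replica` and `ReplicatedStore` are hypothetical in-memory classes.

```python
class Replica:
    """Hypothetical identical system holding a full copy of the information."""

    def __init__(self):
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Writes each new item to every live replica as it is created."""

    def __init__(self, count):
        self.replicas = [Replica() for _ in range(count)]

    def put(self, key, value):
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value      # replicate at creation time

    def get(self, key):
        for replica in self.replicas:
            if replica.alive:
                return replica.data[key]       # any live copy can answer immediately
        raise RuntimeError("all replicas have failed")
```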
The following design patterns should be combined to make a system more fault tolerant: retry, fallback, timeout, circuit breaker, and bulkhead.
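This compact sketch shows one way the five patterns can wrap a single dependency call. It uses hypothetical names (`resilient_call`, `CircuitBreaker`) rather than any particular library's API. A semaphore acts as the bulkhead, and a thread-pool future supplies the timeout. Failures are retried with exponential backoff, and the breaker stops further attempts once a dependency keeps failing. A caller-supplied fallback is returned when nothing else works.

```python
import concurrent.futures
import threading
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; rejects calls for `reset_after` seconds."""

    def __init__(self, threshold=3, reset_after=5.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None
        self.lock = threading.Lock()

    def call(self, func):
        with self.lock:
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_after:
                    raise CircuitOpenError()
                self.opened_at = None            # half-open: allow a trial call
        try:
            result = func()
        except Exception:
            with self.lock:
                self.failures += 1
                if self.failures >= self.threshold:
                    self.opened_at = time.monotonic()
            raise
        with self.lock:
            self.failures = 0
        return result


EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def resilient_call(func, fallback, breaker, bulkhead, retries=2, timeout=1.0):
    """Combine bulkhead, circuit breaker, timeout, retry, and fallback around one call."""
    if not bulkhead.acquire(blocking=False):     # bulkhead: cap concurrent calls to this dependency
        return fallback()
    try:
        for attempt in range(retries + 1):       # retry
            try:
                # timeout: stop waiting after `timeout` seconds (the worker thread is not killed)
                return breaker.call(lambda: EXECUTOR.submit(func).result(timeout=timeout))
            except CircuitOpenError:
                break                            # open breaker: stop hammering a failing dependency
            except Exception:
                time.sleep(0.01 * 2 ** attempt)  # exponential backoff before the next attempt
        return fallback()                        # fallback: degrade gracefully
    finally:
        bulkhead.release()
```

The timeout only stops the caller from waiting, and the worker thread still runs to completion. This is one reason a bulkhead is usually paired with it.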
Computer applications make calls through an application programming interface (API) to access shared resources, such as the keyboard, mouse, screen, disk drive, network, and printer. These calls can fail in two ways: as blocked calls or as faults.
Faults are induced by signals in POSIX-compliant systems, and these signals originate from API calls, from the operating system, and from other applications.
A blocked call is a request for services from the operating system that halts the computer program until results are available.
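Not every signal can be handled. On POSIX systems, the kill signal (SIGKILL) and the stop signal (SIGSTOP) cannot be caught or ignored. All other signals, including the termination signal (SIGTERM), can be directed to a handler function.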
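This small Python check, assuming Linux or another POSIX system, shows which signals accept a handler. Installing a handler for SIGKILL or SIGSTOP is rejected by the kernel, which Python reports as `OSError`. SIGTERM and SIGINT can be handled. `try_to_handle` is a hypothetical helper.

```python
import signal


def try_to_handle(sig):
    """Return True if a Python handler can be installed for `sig` on this POSIX system."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:                   # sigaction() rejects SIGKILL and SIGSTOP with EINVAL
        return False
    if previous is not None:
        signal.signal(sig, previous)  # put the original disposition back
    return True


if __name__ == "__main__":
    for s in (signal.SIGTERM, signal.SIGINT, signal.SIGKILL, signal.SIGSTOP):
        print(s.name, try_to_handle(s))
```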
Backup maintains information in the event that hardware must be replaced. This can be done in one of two ways: automatic scheduled backup using software, or manual backup on a regular schedule.
A handler is a function that is performed on demand when the application receives a signal. This is called exception handling.
Any signal that does not have handler code falls back to its default action. For most signals, this is a fault that causes premature application termination.
Software fault tolerance methods are designed to overcome execution errors by modifying variable values to create an acceptable program state.
Timers can leave a program in a corrupted state when an interrupt arrives in the middle of an update. This is avoided by tracking software state, by using semaphores, and by blocking.
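This sketch keeps timer-driven updates from corrupting shared state. Python runs signal handlers only in the main thread, so the example swaps in `threading.Timer` threads to play the role of the timer. That substitution is an assumption, not the only approach. A binary semaphore blocks other updaters until both fields of the hypothetical `Account` agree again.

```python
import threading


class Account:
    """Two fields that must stay consistent: balance always equals the sum of `history`."""

    def __init__(self):
        self.balance = 0
        self.history = []
        self.guard = threading.Semaphore(1)   # binary semaphore protecting the update

    def apply(self, amount):
        with self.guard:                     # block other updaters until both fields agree
            self.history.append(amount)
            self.balance += amount

    def consistent(self):
        with self.guard:
            return self.balance == sum(self.history)


def run_demo(ticks=50):
    acct = Account()
    # Timer-driven updates run on other threads, interleaving with the main flow.
    timers = [threading.Timer(0.001 * i, acct.apply, args=(1,)) for i in range(ticks)]
    for t in timers:
        t.start()
    for _ in range(ticks):
        acct.apply(-1)                       # main-flow updates
    for t in timers:
        t.join()
    return acct
```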
Mechanisms for software fault tolerance include recovery blocks, N-version software, and self-checking software.
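This sketch illustrates two well-known mechanisms. A recovery block tries alternates in order until one passes an acceptance test, and that test is a form of self-checking. N-version software runs independently written versions and takes a majority vote. The three integer-square-root versions are hypothetical, and the first is deliberately buggy.

```python
from collections import Counter


def recovery_block(inputs, alternates, acceptance_test):
    """Try each alternate in turn; return the first result that passes the acceptance test."""
    for alternate in alternates:
        try:
            result = alternate(inputs)
        except Exception:
            continue                      # an exception counts as a failed alternate
        if acceptance_test(inputs, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def n_version(inputs, versions):
    """Run independently written versions and return the strict-majority answer."""
    outputs = [v(inputs) for v in versions]
    answer, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no majority among versions")
    return answer


def isqrt_buggy(n):                       # hypothetical faulty version
    return n // 2


def isqrt_newton(n):                      # Newton's method on integers, starting from n
    x = n
    while x * x > n:
        x = (x + n // x) // 2
    return x


def isqrt_linear(n):                      # slow but simple independent version
    x = 0
    while (x + 1) * (x + 1) <= n:
        x += 1
    return x


def is_isqrt(n, r):                       # acceptance test (self-check)
    return r * r <= n < (r + 1) * (r + 1)
```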
Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults.
Hardware fault tolerance for software requires backup and redundancy.
Fault-tolerant software has the ability to satisfy requirements despite failures.
There are two methods used to handle blocking: threads and timers.
Threaded languages include Ada, Afnix, C++, C#, CILK, Eiffel, Erlang, Java, Lisp, Magenta, Modula 3, Napier 88, Oz, Presto, pSather, Perl 5.8.7+, PHP, Python, R, Ruby, Smalltalk, Tcl/Tk, V, Unicon, and Ballerina.
Un-threaded languages include Bash, JavaScript, SQL, and Visual Basic.
Handler functions come in two broad varieties: initialized and in-line. The in-line form used with C++ and Java takes the following shape, where API_call and signal_handler_code are placeholders:

    try {
        API_call();            // the call that may fail
    }
    catch (...) {              // C++ catch-all; Java names a type, e.g. catch (Exception e)
        signal_handler_code;   // placeholder for the handler's statements
    }


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
