2005
3 pages
Disaster tolerance is the characteristic attributed to a system that can withstand a catastrophic failure and still function with some degree of normality. This paper describes the disaster tolerance of computer and communication systems and methods for modeling this form of system robustness.
The purpose of this report is to outline the major concepts and developments in the area of fault-tolerant computing. Both hardware and software fault tolerance issues are addressed. The topics covered include module function and system-level fault detection methods, redundancy and reconfiguration strategies, valid fault models, and coding and checking in computer systems. Software fault tolerance methods such as recovery blocks, design diversity, and checkpointing and recovery are also discussed. Major issues in modeling and evaluation of fault-tolerant systems are outlined. The design of two successful commercial systems is discussed.
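The recovery block scheme named above can be sketched as follows. This is a minimal illustration of the general technique, not code from the report: the function name `recovery_block`, the alternates, and the acceptance test are all hypothetical. The state is checkpointed, each alternate runs on a fresh copy of that checkpoint, and the first result that passes the acceptance test is returned.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Run alternates in order until one passes the acceptance test.

    All names here are illustrative assumptions, not taken from the report.
    """
    checkpoint = copy.deepcopy(state)  # checkpoint taken before any alternate runs
    for alternate in alternates:
        candidate = copy.deepcopy(checkpoint)  # roll back to the checkpoint
        try:
            result = alternate(candidate)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(checkpoint, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def faulty_sort(items):
    # Hypothetical primary with a design fault: it returns the input unsorted.
    return items


def backup_sort(items):
    # Hypothetical independently written secondary alternate.
    return sorted(items)


def is_sorted_permutation(original, result):
    # Acceptance test: the output is ordered and holds the same elements.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and sorted(original) == sorted(result)


print(recovery_block([3, 1, 2], [faulty_sort, backup_sort], is_sorted_permutation))
```

Here the primary fails the acceptance test, so the block rolls back and the secondary alternate supplies the result. Design diversity enters through the alternates being written independently.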
Computer, 1990
Fault properties. A fault can be classified by its duration, nature, and extent. The duration of a fault can be transient, intermittent, or permanent. A transient fault, often the result of external disturbances, exists for a finite length of time and is nonrecurring. A system with an ...
Microelectronics Reliability, 1987
In many critical applications of digital systems, fault tolerance has been an essential architectural attribute for achieving high reliability. In recent years, the concept of the performability of such systems has drawn the attention of many researchers. In this paper, we develop a general Markov model for fault-tolerant computer systems. Various important performance measures, including the performability measures as well as some new performance measures, are treated in a unified manner. Furthermore, general and efficient computational procedures are developed for calculating these performance measures based on the uniformization technique of Keilson (1974, 1979). A numerical example is given to illustrate the computational procedures developed.
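As a rough illustration of uniformization, the sketch below computes the transient state distribution of a continuous-time Markov chain. It follows the standard textbook form of the technique and is not the paper's own procedure; the function name `transient_probabilities`, the two-state model, and the failure and repair rates are assumptions chosen for the example.

```python
import math


def transient_probabilities(Q, p0, t, tol=1e-12, max_terms=100000):
    """Approximate p(t) = p0 * exp(Q t) for a CTMC by uniformization.

    Q is the generator matrix as a list of rows; p0 is the initial distribution.
    Suitable for moderate values of (uniformization rate * t).
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0:
        return list(p0)  # no transitions at all
    # Embedded discrete-time chain: P = I + Q / lam
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)  # Poisson probability of k = 0 jumps
    v = list(p0)                 # p0 * P^k, starting at k = 0
    result = [weight * x for x in v]
    accumulated = weight
    k = 0
    # Add Poisson-weighted terms until the neglected tail mass is below tol.
    while 1.0 - accumulated > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        accumulated += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


# Hypothetical two-state repairable system: state 0 = up, state 1 = down.
failure_rate, repair_rate = 0.001, 0.1  # assumed rates per hour
Q = [[-failure_rate, failure_rate],
     [repair_rate, -repair_rate]]
p = transient_probabilities(Q, [1.0, 0.0], t=10.0)
print("P(up at t = 10 h) =", p[0])
```

For this two-state model the result can be checked against the closed form A(t) = μ/(λ+μ) + λ/(λ+μ)·exp(−(λ+μ)t), where λ is the failure rate and μ the repair rate.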
2007 1st Annual IEEE Systems Conference, 2007
The state of traditional disaster recovery approaches is outlined. The risks of IT application downtime attributable to the increasing dependence on critical information technology operating in interdependent, interacting complex infrastructure systems are reviewed. General disaster tolerance techniques are summarized. While content-specific approaches to understanding and avoiding cascading failures in systems exist, opportunities remain to extend this complex systems independence analysis to the private business sector in the form of disaster tolerance. The high level of complexity of relationships between IT application availability and the numerous secondary and tertiary effects of a disaster on systems that depend on other systems for availability has not yet been fully explored.
2007 1st Annual IEEE Systems Conference, 2007
, which, if disrupted, would seriously impact the health, safety, and security of citizens or the effective functioning of governments and industries. This infrastructure system includes telecommunications, energy, banking and financial, transportation, water, healthcare, government, and emergency systems. All of these systems are linked through vast physical and cyber networks which have become completely interdependent. These networks present with a multitude of distributed heterogeneous
Lecture Notes in Computer Science, 1994
An adaptive computing system is one that modifies its behavior based on changes in the environment. Since one common type of environment change in a distributed system is network or processor failure, fault-tolerant distributed systems can be viewed as an important subclass of adaptive systems. As such, use of adaptive methods for dealing with failures in this context has the same potential advantages of improved efficiency and structural simplicity as for adaptive systems in general. This paper describes a model for adaptive systems that can be applied in many failure scenarios arising in distributed systems. This model divides the adaptation process into three different phases (change detection, agreement, and action) that can be used as a common means for describing various fault-tolerance algorithms such as reliable transmission and membership protocols. This serves not only to clarify the logical structure and relationship of such algorithms, but also to provide a unifying implementation framework. Several adaptive fault-tolerant protocols are given as examples. A technique for implementing the model in a distributed system using an event-driven approach for composing protocols in parallel is also presented.
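The three phases can be illustrated with a toy membership protocol. The sketch below is a single-process simulation under assumed rules (heartbeat timeouts for detection, a simple vote quorum for agreement); the function names, node names, and timing values are hypothetical and do not reproduce the paper's protocols or its event-driven framework.

```python
from collections import Counter


def detect_changes(last_heartbeat, now, timeout):
    """Phase 1, change detection: suspect peers whose last heartbeat is stale."""
    return {peer for peer, seen in last_heartbeat.items() if now - seen > timeout}


def agree(suspicions, quorum):
    """Phase 2, agreement: a peer is declared failed once `quorum` nodes suspect it."""
    votes = Counter(peer for suspected in suspicions.values() for peer in suspected)
    return {peer for peer, count in votes.items() if count >= quorum}


def act(membership, failed):
    """Phase 3, action: install a new membership view without the failed peers."""
    return membership - failed


# Hypothetical cluster of four nodes; node "c" stopped sending heartbeats.
membership = {"a", "b", "c", "d"}
heartbeats = {  # per-observer record of when each peer was last heard from
    "a": {"b": 9.0, "c": 2.0, "d": 9.5},
    "b": {"a": 9.2, "c": 1.5, "d": 9.1},
    "d": {"a": 9.4, "b": 9.3, "c": 8.0},
}
suspicions = {node: detect_changes(seen, now=10.0, timeout=3.0)
              for node, seen in heartbeats.items()}
failed = agree(suspicions, quorum=2)
new_view = act(membership, failed)
print(sorted(new_view))
```

Nodes "a" and "b" both suspect "c", which meets the quorum of two, so "c" is dropped from the view even though "d" heard from it recently. Keeping the phases as separate functions mirrors the structural separation the abstract describes.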
Proceedings of the 2009 International Conference on Computer-Aided Design - ICCAD '09, 2009
The term resilience is used differently by different communities. In general engineering systems, fast recovery from a degraded system state is often termed resilience. The computer networking community defines it as the combination of trustworthiness (dependability, security, performability) and tolerance (survivability, disruption tolerance, and traffic tolerance). The dependable computing community defines resilience as the persistence of service delivery that can justifiably be trusted when facing changes. In this paper, resilience definitions of systems and networks will be presented. Metrics for resilience will be compared with dependability metrics such as availability, performance, and performability. Simple examples will be used to show quantification of resilience via probabilistic analytic models.
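Availability, one of the dependability metrics mentioned above, has a simple analytic form that can be sketched in a few lines. This is a generic textbook calculation under the assumption of independent components, not an example from the paper; the function names and the MTTF and MTTR figures are hypothetical.

```python
import math


def steady_state_availability(mttf_hours, mttr_hours):
    """Long-run fraction of time a repairable component is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series_availability(*availabilities):
    """All components are required: availabilities multiply."""
    return math.prod(availabilities)


def parallel_availability(*availabilities):
    """Any one component suffices: the system is down only if all are down."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


# Hypothetical server: mean time to failure 999 h, mean time to repair 1 h.
server = steady_state_availability(999.0, 1.0)
print("single server:", server)
print("two servers in parallel:", parallel_availability(server, server))
print("server in series with a 0.99 network:", series_availability(server, 0.99))
```

Steady-state availability says nothing about how quickly service returns after a change, which is one reason a resilience metric may need to go beyond it.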