Verification of Cooperative Transient Fault Diagnosis and Recovery in Critical Embedded Systems

Zibouda Aliouat and Makhlouf Aliouat
University of Ferhat Abbes, Faculty of Sciences, Computer Science Department, Sétif 19000, Algeria
 
Abstract: Faults caused by ambient cosmic radiation are a growing threat to the dependability of advanced embedded computer systems. Maintaining availability and consistency in distributed applications is a fundamental requirement in building complex critical systems. To achieve this, a key factor is the ability to detect a fault and handle it by means of recovery. Such systems can use membership protocols designed to provide this function. The objective of a membership protocol is to give all entities of every node in the cluster a consistent view of the system status within a predefined time. This paper describes a formal analysis of an extension of the group membership algorithm implemented in the time-triggered protocol. The proposed extension allows node reintegration after a transient fault. We provide a detailed analysis of the properties of a formal model of the algorithm. The paper aims to verify the safety and liveness properties that the protocol must satisfy. The correctness of the protocol is verified with the PVS theorem prover.


Keywords: Group membership protocol, formal verification, fault-tolerant distributed algorithm, node reintegration.



Received February 25, 2010; accepted August 10, 2010
