Adaptive Fault Diagnosis Algorithm For Controller Area Network

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 61, NO. 10, OCTOBER 2014

Supriya Kelkar, Member, IEEE, and Raj Kamal, Member, IEEE
Abstract: A controller area network (CAN)-based distributed
system may develop faults at run-time. These faults need to be
detected and diagnosed. This paper proposes a new algorithm
named adaptive fault diagnosis algorithm for CAN (AFDCAN). It
is designed for low-cost resource-constrained distributed embedded
systems. The proposed algorithm detects all faulty nodes on
the CAN. It allows new node entry and reentry of repaired faulty
nodes during a diagnostic cycle. AFDCAN is found to provide
high fault tolerance and to ensure reliable communication. It uses
single-channel communication deploying the bus-based standard
CAN protocol. A hardware implementation of the proposed algorithm
has been used to obtain the results. The results show that the
proposed algorithm diagnoses all faults in the system. Analysis of
the proposed algorithm proves that the algorithm uses a definite
and bounded number of testing rounds and messages to complete
one diagnostic cycle.
Index Terms: Adaptive algorithms, automotive applications,
controller area network (CAN) protocol, distributed networks,
distributed systems, fault diagnosis, real-time systems.
I. INTRODUCTION
Different techniques are used to handle faults in a
distributed network. Faults in the network may be due
to node failures or connection failures. Faults at nodes can
arise due to failures in the processor, in the memory, or in the
input-output hardware interfaces. The faults in the network can
be at the physical layer, data link layer, application layer, or
network management level [1].
Controller area network (CAN) protocol is widely used in
real-time industrial and automotive applications. There are
many other proprietary as well as standard serial protocols used
for automotive multiplexing [2], [3]. CAN handles the errors
efficiently at the node level. Error confinement depends upon
the behavior of the node when the node is in one of the three
states, namely, active state, passive state, or bus-off state [4].
The confinement of faults is a part of the CAN protocol [4].
However, along with capabilities of fault diagnosis and fault
confinement at node level, CAN-based distributed embedded
systems need a robust fault diagnosis algorithm at network
level. This is to assure a reliable communication to achieve
expected functioning of the system. Reliability can be achieved
by providing redundancy at node level or at channel level [5]–[7].
However, this additional use of hardware increases the
cost. In a distributed embedded system, every node needs to
have knowledge about other nodes in the system. In case of
master-slave configuration, the functions are distributed among
different nodes. Every node in the system in such a situation
should be able to detect the faulty nodes in the system and
should relay this information on the network. There will be a
considerable rise in the number of diagnostic messages if every
node tries to detect the health of all other nodes. This, in turn,
increases the bus-load and slows down the communication.
There is a high probability of denial of bus access to low-priority
messages due to the increase in bus-load.
Manuscript received April 6, 2013; revised July 22, 2013 and September 8,
2013; accepted November 14, 2013. Date of publication January 2, 2014; date
of current version May 2, 2014. This work was supported by the Cummins College
of Engineering for Women, Pune, India, and the Institute of Engineering
and Technology, Devi Ahilya University, Indore, India.
S. Kelkar is with the Computer Engineering Department, Cummins College
of Engineering for Women, Pune 411052, India (e-mail: supriya.kelkar@cumminscollege.in).
R. Kamal is with the School of Computer Science and Information Technology,
Devi Ahilya University, Indore 452017, India (e-mail: dr_rajkamal@hotmail.com).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIE.2013.2297296
The authors address these issues and propose the adaptive fault diagnosis
algorithm for CAN (AFDCAN). Every node in AFDCAN
tests other nodes until it detects the first fault-free node. Thus, a
node in AFDCAN does not test all other nodes in the system.
All of the fault-free nodes together detect the faulty nodes in the
system. This reduces the number of diagnostic messages on the
bus. Thus, the system diagnostic function is distributed among
the nodes. Also, redundancy in terms of extra nodes or channels
is not required [5]–[7].
II. EARLIER WORK AND MOTIVATION FOR THE
PRESENT WORK
CAN is a single-channel protocol [4]. It does not support
redundant bus for communication among different nodes in the
network [5]. Considerable literature is available for CAN to
achieve redundancy at channel or media level [5], [6] and at
node level [6]. Redundancy at media and node levels is a desired
feature for fault handling in safety-critical applications. Time
triggered-CAN (TT-CAN) supports channel redundancy with
the use of additional hardware [7], [8]. Time triggered protocol
supports node and channel duplication for real-time control
systems such as automotive applications [9]. An active star
topology called CANcentrate has been proposed for providing
a solution to communication failures in CAN-based systems
[10], [11]. CANcentrate uses an active hub to connect the CAN-
based nodes and prevents propagation of errors from one port
to others. However, CANcentrate requires additional hardware,
which itself may become faulty.
Automotive applications such as steer-by-wire are driven by
actuators [12]. The faulty actuators need to be switched off
within a short time span to avoid unwanted system behavior.
Time-constrained failure diagnosis uses a subset of the total
number of processors in the system to test an actuator indepen-
dently [12]. The individual test results are exchanged among
these processors, and the fault-free processors correctly identify
the actuator fault. Here, more processors are required for testing
[12]. Another approach for diagnosing faults in automotive
applications is based on condensed diagnosis instead of global
diagnosis [13]. Condensed diagnosis includes diagnostic in-
formation of components, inputs, and signals for a unit or an
agent along with a scalar variable representing removed com-
ponents. Using condensed local diagnosis of different agents,
the condensed global diagnosis of the system can be computed.
Arogeti et al. [14] have proposed a new fault detection and
isolation technique for an electrohydraulic steering system of
an electric vehicle. They have proposed a hybrid bond graph
modeling technique and have derived the global analytical
redundancy relations [14]. Vong et al. [15] have proposed a new
framework for simultaneous-fault diagnosis. They have used
pairwise probabilistic multilevel classification based on time-dependent
patterns and have applied the framework to diagnose
automotive engine-ignition systems. The effects of faults on
the estimation of schedulability and reliability are illustrated
in [16]. The authors have applied the outcome of the analysis to
a distributed antilock braking system of an automobile [16].
Networked control systems (NCSs) are used in applica-
tions such as automotive, manufacturing processes, etc. [17].
Jiang et al. have considered the Takagi-Sugeno fuzzy model
for the NCSs and have proposed an $H_\infty$ filter design with respect
to transfer delays and packet loss [17]. Yang et al. [18] have
modeled the NCSs as nonlinear switched systems and have
addressed the problem of system stabilization. Their results
have been applied to multiagent systems with leader-follower
structure, especially when the follower agent may not cooperate
with the leader and run away from a target or a goal [18].
A data-driven approach is used for fault diagnosis in the automotive
domain, in manufacturing systems, and in process control
applications. Faults are detected and isolated using a multivariate
statistical technique, namely, principal component analysis
(PCA), and pattern classifier techniques such as support vector
machines, probabilistic neural network, and K-nearest neighbor
[19]. PCA and independent component analysis (ICA) are used
together to detect the faults in operating parameter identifiers
which are collected through various sensors and diagnostic
routines situated in the electronic control units (ECUs) [20].
ICA is better suited than PCA for fault detection in automo-
tive applications. ICA is more effective for automotive subsys-
tem data of non-Gaussian nature [21]. Although PCA is used
for fault isolation, it is not well suited for task discrimination
[22], [23]. PCA shows limited discrimination or classication
when data for both normal operation and fault operation are
available [23]. Among correspondence analysis (CA) and PCA,
better results are observed with the former [22], [24].
Kelkar and Kamal conclude that these data-driven methods
are useful in cases where the system is multivariate and the data
set available for fault detection is very large. Techniques for
multivariate statistical analysis such as PCA tend to reduce the
number of variables in a data set, thus limiting the dimensions
of the data. Statistical techniques such as PCA, ICA, and CA
are applied on normal or fault diagnosis data acquired through
data loggers. Analyses using these techniques are carried out
outside the actual control units or systems.
TABLE I
COMPARISON OF DISTRIBUTED ALGORITHMS
A number of algorithms have been suggested in the area
of fault diagnosis in distributed networks. Table I shows the
comparison of some of these algorithms [25]–[27]. A distributed
system-level diagnosis algorithm for arbitrary networks such
as point-to-point, broadcast, or a combination of both has been
proposed [28]. Here, a fault-free node is made responsible to
get tested by another fault-free node. The tests are conducted
periodically, and the node transmits diagnostic information to
all of its neighbors about detection of a new node or a repaired
faulty node. This information is received simultaneously by all
of the neighbors. A hierarchical adaptive distributed system-
level diagnosis algorithm (Hi-ADSD) uses clusters containing
nodes, where the number of nodes in a cluster is always a
power of two [29]. Here, a node can get tested by more than
one node, and tests are conducted asynchronously. The number
of test rounds required for Hi-ADSD [29] is less than that for
adaptive DSD [27]. There is also an improvement in diagnosis
latency [29].
Comparison-based model is another approach for fault di-
agnosis in multiprocessor systems. In this model, redundant
tasks are executed at two processors, and the outcomes of the
tasks are sent to a central observer which, in turn, compares the
outcomes to find the faulty processor [30], [31]. The processors
themselves compare the outcomes of the tasks executed by
two other processors instead of the central observer [32]. In
generalized comparison model (GCM), the comparison is done
by one of the two processors being tested [33].
The outcomes are also sent to the central observer, even
though the processors themselves perform the comparison [32],
[33]. The outcomes of the tasks executed by two processors are
also sent to all other processors in the broadcast comparison
model [34]. The artificial neural network (ANN) based GCM
model employs backpropagation neural networks to diagnose
the faults in multiprocessor or multicomputer systems [35].
These ANN-based techniques require considerable memory
and processing power.
Fig. 1. System under consideration for fault diagnosis.
SAE J1939 protocol uses CAN bus at physical layer. This
protocol is widely used in automotive applications and also
supports extensive diagnostic features. SAE J1979 is the SAE
guideline for On-Board-Diagnostics-II (OBD-II) [36]. Specific
online and offline diagnostic features for different categories of
automotive vehicles are available in SAE documents.
While working with distributed embedded systems, the authors
felt a need to incorporate an effective fault detection technique
in these systems. For example, in process control applications,
some nodes or microcontroller-based systems may fail
during operation. Such information of failed nodes needs to be
known to all other nodes in the network. For this purpose, the
methodologies used in distributed computer systems [25]–[29]
have been found suitable. These methods have been used for
hybrid or star networks where fault detection algorithms are
mostly adaptive and parallel.
The adaptive algorithm does not use a fixed testing scheme
but adapts to the current fault situation in the system, and the
nodes are tested accordingly. In distributed computer systems,
diagnostic tests are carried out by different nodes simulta-
neously and hence are termed as parallel. A fault detection
method using parallel diagnostic tests would have been stressful
to the distributed embedded systems which use bus topology.
Hence, in this paper, the authors have proposed an adaptive
sequential algorithm for fault detection in bus-based distributed
embedded systems, named as AFDCAN. It is an adaptive fault
detection algorithm, which resides inside the nodes along with
the applications and detects faults in the system as and when
they occur.
AFDCAN uses a distributed network approach for detecting
faults. AFDCAN is designed for CAN-based embedded distributed
networks which specifically use bus topology. This
algorithm needs comparatively less memory, and the number
of messages required for a diagnostic cycle is optimized.
AFDCAN can be adapted for other automotive protocols such
as local interconnect network and SAE J1939 with a few
modifications. In the present work, the focus is on finding the faulty
nodes in the network and, in turn, informing the statuses of
faulty nodes to all of the healthy nodes. As mentioned earlier,
different approaches have been used for fault diagnosis in
CAN-based systems. These approaches include additional use
of hardware in case of CANcentrate [10], specic domains like
automotive using SAE J1979 [36], time-constrained fault diagnosis
[12], condensed diagnosis [13], generic distributed multicomputer
systems [25]–[29], etc. To the best of the authors'
knowledge, specifically in the CAN protocol, fault detection
techniques for a bus topology have not been proposed earlier.
The remainder of this paper is organized as follows.
Section III proposes the new AFDCAN algorithm. Section IV
provides its modeling and analysis. Section V proposes further
modifications to AFDCAN. Comparison of AFDCAN
with other distributed diagnosis algorithms is presented in
Section VI. Section VII presents important experimental results
for different fault cases when implemented using the hardware.
This is followed by Section VIII giving the conclusion.
The mathematical representation of the AFDCAN algorithm is
presented in the Appendix.
III. AFDCAN
A. System Details, Assumptions, and Fault Model
for AFDCAN
The AFDCAN algorithm uses a single bidirectional channel and
a bus-based CAN topology. Let $V$ be the set containing the $n$
nodes in the system, where $N_1$ is the first node, $N_2$ is the
second node, etc. Then
$$V = \{N_1, N_2, N_3, N_4, \ldots, N_n\}. \tag{1}$$
The following assumptions are considered for fault condi-
tions in AFDCAN.
1) The node is assumed faulty when it stops functioning.
2) There can be one or more faulty nodes in the system,
where faults are bounded.
3) The number of faulty nodes in the system is bounded
by $(n - 2)$, where $n$ is the total number of nodes in the
system.
4) A node can become faulty any time during the diagnostic
cycle. This fault condition of the node may be temporary
or permanent.
5) Fluctuations or changes in the status of a node are not
detected during the same diagnostic cycle if that node has
already been tested. However, the change in status of that
node will be detected in the next diagnostic cycle.
6) Every node is tested by only one other node in a diagnos-
tic cycle.
Fig. 1 shows a generic system utilizing the AFDCAN algo-
rithm for fault diagnosis. AFDCAN uses the following fault
model for the CAN network. The fault model denes the
outcome of the test, i.e., response from the fault-free node or
from the faulty node when the test message is sent by another
fault-free node.
Fig. 2. Format of the buffer at every node.
Fig. 3. Test/result/second result/broadcast frame.
Let $t(n_i, n_j)$ be the test performed by node $n_i$ on node $n_j$.
Then, the result $R_e$ of $t(n_i, n_j)$ will be
$$R_e = \begin{cases} 0, & \text{if } n_i \text{ is fault free and } n_j \text{ is faulty} \\ 1, & \text{if both } n_i \text{ and } n_j \text{ are fault free.} \end{cases}$$
Node $n_i$ cannot test $n_j$ and evaluate $n_j$ as fault free if $n_i$ itself is faulty.
In the present algorithm, the faulty node stops functioning,
and therefore, it is unable to communicate with the other nodes.
B. Details of the AFDCAN Algorithm
A node continues to diagnose until it finds a fault-free
node. This approach is similar to the adaptive DSD algorithm
proposed for distributed systems [27].
Each node is assigned a unique node identification (NID).
Each node maintains a buffer (NBUFF) containing n number of
fields, where n is the total number of nodes in the system. The
NBUFF of each node consists of individual fields containing
node ID and state for all of the nodes in the system. The state
indicates whether the node is faulty, is fault free, or is in an
undefined state. Fig. 2 shows NBUFF of a node in a system
containing five nodes. A fault-free node in the network initiates
the AFDCAN algorithm by sending the test frame.
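A minimal sketch of the NBUFF described above is given below. The numeric encoding of the three states and the names `NodeState` and `make_nbuff` are assumptions for illustration; the actual field layout is the one shown in Fig. 2.

```python
from enum import Enum
from typing import Dict


class NodeState(Enum):
    # The numeric values are an assumed encoding, not taken from the paper.
    UNDEFINED = 0
    FAULT_FREE = 1
    FAULTY = 2


def make_nbuff(n: int) -> Dict[int, NodeState]:
    """Create a buffer with one field (node ID -> state) per node, NIDs 1..n."""
    return {nid: NodeState.UNDEFINED for nid in range(1, n + 1)}


# Buffer of one node in a five-node system, before any test round.
nbuff = make_nbuff(5)
```

Keeping one field per node means the memory needed at each node grows linearly with n, which fits the resource-constrained targets the algorithm is designed for.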
The format for different types of frames used in AFDCAN is
shown in Fig. 3. There are 8 B in the CAN data frame allocated
to the data field. Fig. 3 shows these 8 B and their use in AFDCAN.
A node can be a tester which tests the other nodes or it can
be a testee which gets tested by any one fault-free node. While
sending the test frame, the NID of the tester becomes the source
node identification (SID), and the NID of the testee becomes the
destination node identification (DID).
Any node that does not receive a test frame as a testee within
a specific time becomes the tester and initiates the diagnosis
process. The testee receives the test frame if it is not faulty.
It marks the tester as fault free in its buffer. Also, it updates
its buffer with the diagnostic information of all of the nodes
preceding the tester. This status information is available in
the test frame sent by the tester. Then, the testee will send
the result frame to the tester. After receiving the result frame
from the testee within timeout, the tester reads the diagnostic
information fromthe result frame. As read fromthe result frame
sent by the testee, the tester updates its buffer, marking its own
status as fault free. Also, the tester marks the testee as fault free
in its buffer. This communication is called test round between
two nodes. This is shown in Fig. 4.
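The test round described above can be sketched as follows. The buffer representation, the string state labels, and the rule that "preceding" nodes are those with a smaller NID are assumptions of this sketch; the timeout is modeled by the `testee_alive` flag rather than by a real timer.

```python
from typing import Dict

FAULT_FREE, FAULTY, UNDEFINED = "fault-free", "faulty", "undefined"


def run_test_round(tester: int, testee: int, testee_alive: bool,
                   buffers: Dict[int, Dict[int, str]]) -> bool:
    """Run one test round; return True if a result frame came back in time."""
    tester_buf = buffers[tester]
    if not testee_alive:
        # No result frame before the timeout: the tester marks the testee faulty.
        tester_buf[testee] = FAULTY
        return False
    testee_buf = buffers[testee]
    # The test frame carries the tester's knowledge of the preceding nodes
    # (assumed here to be the nodes with a smaller NID than the tester).
    for nid, state in tester_buf.items():
        if nid < tester and state != UNDEFINED:
            testee_buf[nid] = state
    testee_buf[tester] = FAULT_FREE
    # The result frame lets the tester mark itself and the testee fault free.
    tester_buf[tester] = FAULT_FREE
    tester_buf[testee] = FAULT_FREE
    return True
```

A completed round therefore costs exactly two frames on the bus, the test frame and the result frame.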
The aforementioned test rounds continue until the last node
in the system is tested. The last fault-free node sends the second
result frame to the earlier fault-free node after all of the test
rounds are completed. After the reception of second result
frame, the $(m - 1)$th fault-free node transmits the broadcast
frame to all of the nodes, which contains the complete diagnosis
report of the network. Here, $m \le n$, with $n$ being the total
number of nodes in the system and the $m$th node being the last
fault-free node in the system. This is the end of one diagnostic
cycle as shown in Fig. 4. These diagnostic cycles are periodic
in nature.
In Fig. 5, as discussed earlier, the test round will be
completed between $N_1$ and $N_2$, $N_2$ and $N_3$, and $N_3$ and $N_4$.
However, when $N_4$ sends the test frame to $N_5$, node $N_5$, being
faulty, does not respond and thus does not send the result frame
to $N_4$. Therefore, $N_4$ marks $N_5$ as faulty in its buffer and then
sends the second result frame to $N_3$. Now, $N_3$ will update its
buffer for the final result and will send the broadcast frame to all
of the nodes so that all of the fault-free nodes will update their
buffers with the complete diagnostic information of the system.
This is the reason why the second result frame is transmitted in
the AFDCAN algorithm.
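The cycle walked through above can be imitated with a short simulation. It is a sketch under stated assumptions, not the authors' implementation: nodes are identified by the integers 1..n, the first fault-free node is taken as the initiator, any nodes ahead of it are simply recorded as faulty, and the broadcast is counted as (n - 1) messages as in the analysis of Section IV. The function name `simulate_cycle` is hypothetical.

```python
from typing import Dict, Set


def simulate_cycle(n: int, faulty: Set[int]) -> Dict[str, object]:
    """Count the frames of one diagnostic cycle (illustrative sketch only)."""
    healthy = [nid for nid in range(1, n + 1) if nid not in faulty]
    if len(healthy) < 2:
        raise ValueError("sketch assumes at least two fault-free nodes (f <= n - 2)")

    initiator = healthy[0]
    # Assumption: nodes ahead of the initiator never sent a test frame,
    # so they are recorded as faulty.
    report = {nid: "faulty" for nid in range(1, initiator)}
    report[initiator] = "fault-free"

    tests = 0
    results = 0
    for testee in range(initiator + 1, n + 1):
        tests += 1  # the current tester sends one test frame
        if testee in faulty:
            report[testee] = "faulty"  # timeout, no result frame
        else:
            results += 1  # result frame; the testee becomes the next tester
            report[testee] = "fault-free"

    results += 1  # second result frame from the last fault-free node
    broadcasts = n - 1  # the broadcast is counted as (n - 1) messages
    return {
        "report": report,
        "tests": tests,
        "results": results,
        "broadcaster": healthy[-2],  # the (m - 1)th fault-free node
        "messages": tests + results + broadcasts,
    }


cycle = simulate_cycle(5, {5})
print(cycle["broadcaster"], cycle["messages"])
```

For five nodes with $N_5$ faulty, the sketch names $N_3$ as the broadcasting node, in line with the description of Fig. 5.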
C. Entry of New Nodes and Reentry of Faulty Nodes
After Repairs
AFDCAN supports new node entry and its participation in
the fault diagnostic cycle.
When a new node is powered up, it needs to listen to the
bus. The new node sends the new node entry frame after the
detection of a result frame on the bus. All of the nodes which are
part of the diagnostic cycle receive the new node entry frame.
These nodes recognize the arrival of the new node and allocate
a field in their buffer for this node. The allocated field in the
buffer is then updated with the node ID and state of the new
node. Thus, the new node starts participating from the next
diagnostic cycle.
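The entry procedure above can be sketched in two small functions. The dictionary frame representation, the string frame types, the use of DID = 0 in the entry frame, and the choice of "fault-free" as the initial state of the new node are all assumptions of this sketch; the real frame layout is the one shown in Fig. 6.

```python
from typing import Dict, Optional


def new_node_listen(observed_frame_type: str, own_nid: int) -> Optional[dict]:
    """New node side: send an entry frame once a result frame is seen on the bus."""
    if observed_frame_type == "result":
        # DID = 0 is an assumed placeholder for "all nodes".
        return {"type": "entry", "sid": own_nid, "did": 0}
    return None  # keep listening


def handle_entry_frame(nbuff: Dict[int, str], frame: dict) -> None:
    """Existing node side: allocate a buffer field for the new node."""
    if frame["type"] == "entry" and frame["sid"] not in nbuff:
        # Assumed initial state; the new node is tested from the next cycle.
        nbuff[frame["sid"]] = "fault-free"
```

Waiting for a result frame before transmitting gives the new node evidence that a diagnostic cycle is in progress before it announces itself.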
Fig. 6 shows the format of the entry frame for a new node.
Faulty nodes can also reenter the network and participate in
the diagnostic cycle after they are repaired. A faulty node can
join the current or ongoing diagnostic cycle if its tester has not
yet sent the test frame to it. Otherwise, it starts participating
from the next diagnostic cycle. The number of faulty nodes is
bounded by $(n - 2)$.
In the Appendix, Section A presents the algorithm for a test
round between the tester and the testee, Section B presents the
algorithms for different timeout conditions, and the algorithm
for the new node entry is presented in Section C.
IV. MODELING AND ANALYSIS OF THE AFDCAN
ALGORITHM
The system considered by AFDCAN is represented by a
directed multigraph $G$, where $G = (V, E)$ consists of a set $V$
of vertices, a set $E$ of edges, and a function $f$ from $E$ to
$\{(u, v) \mid u, v \in V\}$. The edges $e_1$ and $e_2$ are multiple edges if
$f(e_1) = f(e_2)$ [37].
Fig. 4. Complete diagnostic cycle when all of the nodes are fault free.
Fig. 5. Complete diagnostic cycle when node $N_5$ is faulty.
Fig. 6. Entry frame for a new node.
In the graph, the vertices represent the nodes, and the
edges represent the communication paths. These communication
paths are unidirectional. Fig. 7 is the graphical representation
of the system under consideration for a case where there are
no faulty nodes. "T" represents the test frame sent by the node,
"R" the result frame sent by the node, "2R" the second result
frame sent by the node, and "B" the broadcast frame sent by
the node. It may be noted that these edges are directed outward
from the nodes.
Consider nodes $N_3$ and $N_4$ shown in Fig. 7. As there are
two edges, namely, result and broadcast, directed from $N_4$
to $N_3$, one can say that $(N_4, N_3)$ is an edge of $G(V, E)$,
with $f(e_1) = f(e_2) = (N_4, N_3)$. Similarly, for nodes $N_4$ and
$N_5$, it can be proved that $(N_5, N_4)$ is an edge of $G(V, E)$,
with $f(e_1) = f(e_2) = (N_5, N_4)$. Thus, it is proved that Fig. 7
represents a multigraph. Fig. 8 represents the multigraph of the
system with fault condition.
The system under consideration has been proved to be a
planar graph for all fault conditions with $(n - 2)$ as the bounded
condition for faulty nodes. That means that the system with five
nodes is a planar graph with the maximum allowed number of
faulty nodes being three.
Fig. 7. System with no faulty node.
Fig. 8. System with node $N_2$ faulty.
Fig. 9. Adjacency matrix for out-degree of nodes, when node $N_2$ is faulty.
The following discussion provides the proofs for a lemma
and two theorems.
Lemma: There is a directed path from any fault-free node to
any other fault-free node. This leads to the following theorems.
Theorem 1: The number of test rounds for one complete
diagnostic cycle is definite and bounded.
Theorem 2: The number of diagnostic messages required for
a complete diagnostic cycle is definite and bounded.
The proofs for the aforementioned lemma and theorems are
given in Sections IV-A to IV-C, respectively.
A. Proof for the Lemma
Fig. 9 shows the adjacency matrix $N$ for out-degree of nodes
$N_1$, $N_2$, $N_3$, $N_4$, and $N_5$. $N$ represents the case where node $N_2$
is faulty as shown in Fig. 8.
In an adjacency matrix $N = [a_{ij}]$,
$$a_{ij} = \begin{cases} 1, & \text{if there is an edge from } N_i \text{ to } N_j,\ i, j = 1, 2, \ldots, 5 \\ 0, & \text{otherwise [37].} \end{cases}$$
For Fig. 8, when $a_{ij} = 0$, no directed path exists from $i$ to $j$,
and when $a_{ij} = 1$, a directed path exists.
The rows of the adjacency matrix represent nodes $N_1$, $N_2$,
$N_3$, $N_4$, and $N_5$, respectively. In the adjacency matrix shown
in Fig. 9, elements having a value of 1 indicate that a
directed path exists, and elements having a value of 2 indicate that
more than one path exists between the nodes. When all of the
elements in the $i$th row are zero, node $N_i$ is faulty. The second
row (row $N_2$) of Fig. 9 contains all zeros, indicating that $N_2$ is
a faulty node. The following are
some of the observations regarding the fault conditions in the
system after analyzing the adjacency matrix (Fig. 9).
1) If a particular row contains all of the elements as zero, the
corresponding node is a faulty node.
2) The elements above the main diagonal of the adjacency
matrix provide the information about the directed path
between all of the fault-free nodes of the system.
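The first observation can be checked mechanically. The sketch below uses a small three-node matrix with made-up entries purely for illustration; it is not the matrix of Fig. 9, and the function name `faulty_nodes` is hypothetical.

```python
from typing import List


def faulty_nodes(adjacency: List[List[int]]) -> List[int]:
    """Return the 1-based indices of nodes whose out-degree row is all zeros."""
    return [i + 1 for i, row in enumerate(adjacency) if not any(row)]


# Illustrative values only: row 2 is all zeros, so node 2 sends nothing.
example = [
    [0, 1, 1],
    [0, 0, 0],
    [2, 0, 0],
]
print(faulty_nodes(example))
```

An all-zero row means the node has no outgoing communication edge, which is how a fail-stop node appears in the out-degree matrix.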
B. Proof for Theorem 1
One test round in AFDCAN consists of two messages between
any two fault-free nodes. They are the test message and
the result message.
With $n$ being the total number of nodes in the system, the
number of test rounds $t_R$ will be $(n - 1)$ for the fault-free
system.
If there are $f$ faulty nodes in the system, then
$$t_R = n - 1 - f. \tag{2}$$
There is an additional result message sent from the $m$th fault-free
node to the $(m - 1)$th fault-free node, where $m \le n$ and $m$
is the last fault-free node in the system with $f$ faulty nodes.
This constitutes half of a test round, i.e., a contribution of 1/2
to $t_R$. Also, the $(m - 1)$th fault-free node
broadcasts the result which contains the complete diagnosis
report of the network.
For the fault-free system, the number of broadcasting messages
will constitute $\{(n - 1)/2\}$ test rounds, where $(n - 1)$
is the total number of broadcasted messages from the $(m - 1)$th
fault-free node. Also, there are test messages sent to $f$ faulty
nodes, where these nodes do not respond and thus do not complete
the test round. This constitutes $\{(n - 2)/2\}$ test rounds,
where $(n - 2)$ is the maximum number of faulty nodes. Thus,
there is a need to consider all of the aforementioned additional
communication edges as part of the diagnostic cycle. These
communication edges will also contribute to (2).
Let $c_e$ be the total of the aforementioned additional communication
edges. Also, $t_R$ is called $t_{RPair}$, which represents
the total number of pairs of messages inclusive of all of the
communication edges for a single diagnostic cycle in a system.
The maximum number of pairs of messages in one complete
diagnostic cycle of the system is represented by $t_{RPair(max)}$.
Therefore
$$t_{RPair(max)} = n - 1 - f + c_e \tag{3}$$
where
$$c_e = \frac{n - 1}{2} + \frac{1}{2} + \frac{n - 2}{2}$$
and $c_e$ = total of all of the additional communication edges as
described previously.
Therefore
$$t_{RPair(max)} = 2n - f - 2. \tag{4}$$
Consider the system consisting of five nodes, with three
nodes detected as faulty. In this case, as per (4), there will be
a maximum of five pairs of messages ($t_{RPair(max)}$) in one complete
diagnostic cycle. Equation (4) satisfies all of the 26 fault
conditions for the system with five nodes.
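The count of 26 fault conditions and the worked value of (4) can be reproduced with a few lines. This is a sketch for checking the arithmetic; the function names are chosen here and are not from the paper.

```python
from itertools import combinations
from typing import List, Set


def fault_conditions(n: int) -> List[Set[int]]:
    """Enumerate every set of faulty nodes with 0 <= f <= n - 2."""
    nodes = range(1, n + 1)
    return [set(c) for f in range(0, n - 1) for c in combinations(nodes, f)]


def t_rpair_max(n: int, f: int) -> int:
    """Equation (4): maximum number of message pairs in one diagnostic cycle."""
    return 2 * n - f - 2


print(len(fault_conditions(5)))  # 1 + 5 + 10 + 10 fault conditions
print(t_rpair_max(5, 3))         # five nodes, three of them faulty
```

The 26 conditions split into one fault-free case, five single-fault cases, ten double-fault cases, and ten triple-fault cases.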
C. Proof for Theorem 2
Let $t_m$ be the total number of test messages sent, $r$, the total
number of result messages sent, and $(n - 1)$, the total number
of broadcasted messages.
Then, the total number of messages ($m_s$) in one complete
diagnostic cycle is
$$m_s = t_m + r + (n - 1). \tag{5}$$
If there are $f$ faulty nodes in the system, then
$$r = n - f. \tag{6}$$
Therefore
$$m_s = t_m + 2n - f - 1. \tag{7}$$
Equation (7) provides the exact number of messages required
for one complete diagnostic cycle for the fault-free or fault
condition of the system. The value of $m_s$ can also be obtained
from the arrow diagram. The arrow diagram shown in Fig. 10
represents the condition where nodes $N_1$ and $N_3$ are faulty.
The in-degree $d_{in}(G)$ gives the exact number of messages for a
complete diagnostic cycle. For the system with five nodes, the
values of $m_s$ can be obtained for all of the 26 fault conditions
using arrow diagrams similar to Fig. 10.
Fig. 10. Arrow diagram for the system with nodes $N_1$ and $N_3$ faulty.
Let $m_{smax}$ be the upper bound for the number of messages
in one complete diagnostic cycle. It is known from earlier
discussion that
$$t_m \le (n - 1). \tag{8}$$
Thus, by adding $(2n - f - 1)$ on both sides of (8), we get
$$m_{smax} = 3n - f - 2. \tag{9}$$
Thus, the total number of messages in one complete diagnostic
cycle is bounded and also definite.
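The step from (7) and (8) to (9) can be written out in one chain, followed by a numerical check for the five-node system; the numerical values are computed here from (9) and are not measurements.

```latex
\begin{align*}
m_s &= t_m + 2n - f - 1            && \text{by (7)} \\
    &\le (n - 1) + 2n - f - 1      && \text{by (8)} \\
    &= 3n - f - 2 = m_{smax}.      && \text{which is (9)}
\end{align*}
% Check for n = 5:
%   f = 0:  m_{smax} = 15 - 0 - 2 = 13
%   f = 3:  m_{smax} = 15 - 3 - 2 = 10
```

Since $n$ and $f$ are fixed for a given cycle, the bound is a constant, which is what Theorem 2 asserts.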
V. PROPOSED MODIFICATIONS IN AFDCAN
The diagnostic cycle in AFDCAN is periodic, and the algo-
rithm checks each and every node during the diagnostic cycle.
A node is tested again even if it is detected as faulty in the
earlier cycle. This technique increases the time required for
the completion of a diagnostic cycle, as the earlier node waits
for the response from the next faulty node until timeout. A
solution to this additional latency would be in avoiding the
testing of the faulty nodes in the next diagnostic cycles. A
fault-free node can check its buffer to identify faulty nodes, if
any, for the currently completed diagnostic cycle. During the
next diagnostic cycle, a fault-free node can avoid testing these
faulty nodes. Thus, the diagnostic time can be reduced. This
will improve the performance of AFDCAN. In such a case, the
faulty node after repair can enter into the diagnostic cycle by
sending the entry frame for the repaired faulty node.
The entry frame for the repaired faulty node will contain SID
specifying the node ID, DID as zero, and frame type equal to
0101. The entry frame for the repaired faulty node will be
similar to the entry frame for the new node (Fig. 6) but with
the frame type bits as 0101. Also, the buffer writing at every
node can be avoided if the current fault status of a tested node
is the same as that found in the buffer.
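The proposed modification can be sketched as follows. The function names and the set-based bookkeeping are assumptions of this sketch; only the frame type value 0101 for the repaired-node entry frame is taken from the text.

```python
from typing import Optional, Set

REPAIRED_ENTRY = 0b0101  # frame type of the entry frame for a repaired faulty node


def next_testee(tester: int, n: int, known_faulty: Set[int]) -> Optional[int]:
    """Pick the next node to test, skipping nodes found faulty in the last cycle."""
    for nid in range(tester + 1, n + 1):
        if nid not in known_faulty:
            return nid
    return None  # no candidate left after this tester


def on_entry_frame(frame_type: int, sid: int, known_faulty: Set[int]) -> None:
    """A repaired node announces itself; it is tested again from then on."""
    if frame_type == REPAIRED_ENTRY:
        known_faulty.discard(sid)
```

Skipping a known faulty node saves one timeout per skipped node in each cycle, at the cost of requiring the repaired node to announce its return explicitly.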
Presently, the number of faulty nodes in the system is
bounded by $(n - 2)$. This bound can be made equal to $(n - 1)$
with minor changes in the algorithm. Thus, AFDCAN can
operate even if there is only one fault-free node in the system.
TABLE II
COMPARISON OF DISTRIBUTED FAULT DIAGNOSIS ALGORITHMS
VI. COMPARISON OF AFDCAN WITH OTHER
FAULT DIAGNOSIS ALGORITHMS FOR
DISTRIBUTED NETWORKS
The AFDCAN algorithm is based on the CAN protocol.
In this section, the authors compare AFDCAN with other
algorithms [25]–[29] for fault diagnosis in distributed networks.
These algorithms [25]–[29] are discussed in Section II.
AFDCAN is a fault diagnosis algorithm for distributed embedded
systems, whereas the algorithms in [25]–[29] have been
implemented on Ethernet-based distributed computer systems.
AFDCAN is an adaptive fault diagnosis algorithm, and the algorithms
in [25]–[29] also are of adaptive type. The algorithms
in [25]–[29] are executed in parallel by the computer systems,
whereas the tests in AFDCAN are executed sequentially. However,
in AFDCAN, the transmission of the final result to all of
the nodes is done at the same time, which means in parallel.
The fault model used in AFDCAN is asymmetric as discussed
in Section III-A. Presently, in AFDCAN, the faulty node
does not respond to the test message sent by the other node and
does not understand the final result. This is because AFDCAN
considers the faulty node as ceased to function, as in the fail-stop
model discussed in [28]. The fault model is symmetric [27]
for algorithms described in [25]–[27] and [29]. A comparison
of AFDCAN and other algorithms is presented in Table II.
VII. RESULTS AND DISCUSSIONS
A. System With Five Nodes
The AFDCAN algorithm has been implemented using five
embedded hardware units based on the Renesas M16C/6N group
microcontroller [38]. The baud rate used for CAN data transfer
is 125 kb/s. All 26 conditions for fault diagnosis of the system
containing five nodes are considered and verified. Figs. 11
and 12 show two fault conditions detected by AFDCAN. The
marked area in each of these figures indicates one complete
diagnostic cycle. Lines 1 and 2 of the marked areas show one
test round between two nodes. The different cases which make
up the 26 conditions for fault diagnosis in the system with five
nodes are as follows: 1) no faulty node in the system; 2) one
faulty node in the system; 3) two faulty nodes in the system;
and 4) three faulty nodes in the system.
Fig. 11. All of the nodes are fault-free.
Fig. 12. Nodes N2, N3, and N4 are faulty.
The bus activity is captured using Vector CANalyzer, and the
results have been verified.
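The count of 26 conditions is consistent with counting every way of choosing zero, one, two, or three faulty nodes out of five. The short check below assumes this reading; the function name `fault_conditions` is hypothetical, and the limit of three faulty nodes corresponds to the bound of (n - 2) faulty nodes for n = 5.

```python
from math import comb


def fault_conditions(n: int, max_faulty: int) -> int:
    """Count the distinct sets of faulty nodes with at most max_faulty members."""
    return sum(comb(n, k) for k in range(max_faulty + 1))


# Cases 1) to 4) for five nodes: 0, 1, 2, or 3 faulty nodes.
per_case = [comb(5, k) for k in range(4)]
total = fault_conditions(5, 3)
```

Here `per_case` evaluates to `[1, 5, 10, 10]`, which sums to the 26 conditions reported for the five-node system.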
B. Entry of New Nodes and Reentry of Faulty Nodes
The entry of a new node and the reentry of repaired faulty nodes
are also verified experimentally for AFDCAN, as shown in
Figs. 13–16. A system with four nodes N1, N2, N3, and N4
is considered. Node N5 is used to test the entry of a new node.
Fig. 13 shows the entry of N5, and Fig. 14 shows N5 becoming
part of the diagnostic cycle. Node N5 is then treated as a faulty
node for testing the reentry of a repaired faulty node. N5 is
powered off to indicate the fault condition (Fig. 15), and later,
N5 is powered on again to indicate the reentry in the system
(Fig. 16).
Fig. 13. New node N5 entry.
Fig. 14. New node N5 is part of the diagnostic cycle.
C. Diagnostic Cycle Time of AFDCAN
AFDCAN is executed on the hardware described in
Section VII-A at a 125-kb/s baud rate, with all five nodes
being fault-free. The bus activity of the complete diagnostic
cycle captured on the CAN bus is shown in the marked area of Fig. 11.
The time taken for completion of one diagnostic cycle in Fig. 11
is 560 ms.
Fig. 15. Faulty node N5 is powered off.
Fig. 16. Faulty node N5 reenters the diagnostic cycle after repair.
The authors have further improved this diagnostic cycle time
at the same baud rate without affecting the AFDCAN algorithm.
The improved time for completion of one
diagnostic cycle is 526 ms, as shown in the marked area of
Fig. 17.
When the AFDCAN algorithm is executed at a 500-kb/s baud
rate, the diagnostic cycle time is again found to be 526 ms,
the same as when AFDCAN is executed at 125 kb/s.
As mentioned earlier, while implementing
AFDCAN, the authors have made use of the local timer module
of the microcontroller at every node. In order to achieve
synchronization between all of the nodes in the 26 fault conditions,
it was required to provide a time window for checking the
reception of different AFDCAN frames at every node. Therefore,
the diagnostic cycle time depends on the timer at every node
and is not affected by the baud rate of the CAN bus.
Fig. 17. Improved diagnostic cycle time when all five nodes are fault-free.
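A rough calculation illustrates why the baud rate has so little influence. The sketch below estimates the time one data frame occupies the bus, assuming a standard (11-bit identifier) CAN 2.0A data frame with 8 data bytes and ignoring stuff bits. The frame format used in the experiments and the length of the timer windows are not stated in this section, so these are assumptions, and `frame_time_ms` is a hypothetical helper.

```python
# Nominal bit count of a CAN 2.0A data frame with 8 data bytes (no stuff bits):
# SOF 1 + identifier 11 + RTR 1 + IDE 1 + r0 1 + DLC 4 + CRC 15 + CRC delimiter 1
# + ACK slot 1 + ACK delimiter 1 + EOF 7 + intermission 3 = 47 overhead bits.
OVERHEAD_BITS = 47
DATA_BITS = 64


def frame_time_ms(bit_rate_bps: int, stuff_bits: int = 0) -> float:
    """Time in milliseconds that one frame occupies the bus."""
    return (OVERHEAD_BITS + DATA_BITS + stuff_bits) * 1000 / bit_rate_bps


t_125k = frame_time_ms(125_000)  # a little under 1 ms per frame
t_500k = frame_time_ms(500_000)  # about a quarter of that

# Share of the 526-ms diagnostic cycle taken by a single frame at 125 kb/s.
single_frame_share = t_125k / 526
```

Under these assumptions, a frame takes well under a millisecond at either baud rate, so a cycle paced by timer windows at every node would not shorten noticeably when the bus runs faster.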
D. Support for Large Number of Nodes in AFDCAN
The CAN data frame has a 64-b (8-B) data field along
with other fields. The data field is used for sending information
on the CAN bus. Out of the 8 B of the data field, 2 B are required
for specifying SID and DID, respectively (Fig. 3). Also, 4 b
are required for specifying the frame type (Fig. 3). Therefore, the
remaining 44 b can be used for holding diagnostic data of the
nodes present in the system. Two bits per node are required to
represent the state of the node (Fig. 3) in AFDCAN. Hence,
AFDCAN supports 22 nodes in the system, including entries
of new nodes. Thus, a large number of nodes can be part
of the fault diagnosis system. This feature is useful for both
automotive and industrial applications.
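The arithmetic above can be checked with a short sketch that also packs a frame into the 8-B data field. The field widths come from the text; the field order, the bit positions, and the names `max_nodes`, `pack_frame`, and `unpack_states` are assumptions made for illustration, since the actual layout is defined in Fig. 3.

```python
DATA_FIELD_BITS = 64  # 8-B CAN data field
ID_BITS = 16          # SID and DID, 1 B each
FRAME_TYPE_BITS = 4
BITS_PER_NODE = 2
STATE_BITS = DATA_FIELD_BITS - ID_BITS - FRAME_TYPE_BITS  # 44 b


def max_nodes() -> int:
    """Number of 2-b node states that fit in the remaining data bits."""
    return STATE_BITS // BITS_PER_NODE


def pack_frame(sid: int, did: int, frame_type: int, states: list[int]) -> bytes:
    """Pack SID, DID, frame type, and node states into 8 bytes.

    Assumed order (most significant first): SID, DID, frame type, states.
    """
    if not (0 <= sid < 256 and 0 <= did < 256 and 0 <= frame_type < 16):
        raise ValueError("field out of range")
    if len(states) > max_nodes():
        raise ValueError("too many nodes for one frame")
    word = (sid << 56) | (did << 48) | (frame_type << STATE_BITS)
    shift = STATE_BITS
    for state in states:
        shift -= BITS_PER_NODE
        word |= (state & 0b11) << shift
    return word.to_bytes(8, "big")


def unpack_states(payload: bytes, count: int) -> list[int]:
    """Recover the first `count` node states from a packed data field."""
    word = int.from_bytes(payload, "big")
    return [
        (word >> (STATE_BITS - BITS_PER_NODE * (i + 1))) & 0b11
        for i in range(count)
    ]
```

For example, `pack_frame(1, 2, 0b0001, [1, 0, 1, 1, 2])` yields an 8-byte payload, and `max_nodes()` returns 22, matching the limit stated above.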
VIII. CONCLUSION
The AFDCAN algorithm uses a definite number of test
rounds and sends a definite number of messages to find the fault
conditions in a CAN-based distributed embedded system.
Therefore, AFDCAN uses a definite bandwidth, based on the total
number of nodes in the system. The number of test rounds
and the number of messages decrease as the
number of faulty nodes increases.
AFDCAN also supports the entry of new nodes and the reentry
of repaired faulty nodes, as demonstrated in Section VII-B. The
failure of a response is detected by the testee or tester with
the help of a timeout.
Thus, AFDCAN uses a definite time for fault diagnosis of the
system. The improved diagnostic cycle time of AFDCAN is
526 ms when all five nodes are fault-free.
Looking further, synchronization of timings at different
nodes in the system is required for better performance of
AFDCAN. Also, the time taken by the AFDCAN diagnostic cycle
can be reduced by implementing the primary message generator
[6]. The improvements proposed in Section V may also be
considered.
TT-CAN is a higher layer protocol above the standard CAN.
Standard CAN messages are sent in TT-CAN [7]. These messages
are transmitted in specific time slots, and thus, they
do not compete with other CAN messages for bus access.
TT-CAN synchronizes the communication schedules of all CAN nodes
in a network. By designing the schedules of different
CAN nodes for the different AFDCAN diagnostic messages,
AFDCAN may be adopted for TT-CAN.
AFDCAN may become part of CAN-based automotive networks.
In automotive applications, both periodic and aperiodic messages
may appear on the CAN bus. It is essential to meet the
transmission deadline for each message. The AFDCAN messages
need to be scheduled carefully so that they do not
interfere with the existing schedules of message transfers at every
node. This will ensure the real-time transfer and safety aspects
of CAN messages in automotive applications.
In automotive applications, the CAN messages are sent or received by
the nodes or ECUs of the CAN network. These messages are given
priority with the help of the identifier present in the CAN data
frame. For a CAN message, the lower the identifier, the higher
the priority assigned. As far as AFDCAN is concerned, the
priority of the diagnostic messages is decided based on the
priority of the existing periodic and aperiodic messages. While
allocating identifiers to the diagnostic messages for AFDCAN,
the transmission deadlines of the periodic and aperiodic
messages need to be considered.
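The rule that a lower identifier wins can be illustrated with a small model of bitwise CAN arbitration, in which a dominant bit (0) overrides a recessive bit (1) on the bus. The identifiers in the usage note are made up for illustration and are not taken from the paper; `arbitrate` is a hypothetical helper, and 11-bit identifiers are assumed.

```python
def arbitrate(pending_ids: list[int], id_bits: int = 11) -> int:
    """Return the identifier that wins arbitration among pending frames."""
    contenders = set(pending_ids)
    if not contenders:
        raise ValueError("no pending frames")
    if any(not 0 <= frame_id < (1 << id_bits) for frame_id in contenders):
        raise ValueError("identifier does not fit in id_bits")
    # Identifiers are sent most significant bit first.
    for bit in range(id_bits - 1, -1, -1):
        # The bus behaves as a wired-AND: it is dominant (0) if any node sends 0.
        bus_level = min((frame_id >> bit) & 1 for frame_id in contenders)
        # Nodes that sent recessive (1) but read dominant (0) back off.
        contenders = {f for f in contenders if (f >> bit) & 1 == bus_level}
    return contenders.pop()
```

With hypothetical pending identifiers `0x120`, `0x0A5`, and `0x3FF`, the call `arbitrate([0x120, 0x0A5, 0x3FF])` returns `0x0A5`, the numerically lowest one. A diagnostic message given a high identifier value would therefore yield to every periodic or aperiodic message with a lower one.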
The collective diagnostic information of all of the nodes in
AFDCAN is available with any fault-free node in the system.
Corrective action can be taken, or the faults can be reported,
depending on the severity of the problem. Thus, the use of
AFDCAN gives a new proposition for network diagnosis in
applications such as automotive and industrial automation.
APPENDIX
A. Algorithm to Find the First Fault-Free Node by the Tester
1) Initialization.
2) Make test frame, TFRM:
$\mathrm{TFRM} = \{\mathrm{NID}\} \cup \{\mathrm{DID}\} \cup \{\mathrm{Tbit}\} \cup \bigcup_{j=1}^{n} \{\mathrm{TBUFF}_{1,j}\}$
where $\bigcup_{j=1}^{n} \{\mathrm{TBUFF}_{1,j}\} = \bigcup_{j=2k}^{n} \{\mathrm{NBUFF}_{1,j}\}$, $k = 1, 2, \ldots, n$.
3) Send TFRM to testee.
4) Wait until timeout.
5) If a result frame is received within the timeout, then read the
received frame into RFRM.
a. If the frame type bits found in RFRM indicate R bits
(0001), then modify the node buffer as follows:
$\bigcup_{j=2k}^{n} \{\mathrm{NBUFF}_{1,j}\} = \bigcup_{j=l+m} \{\mathrm{RFRM}_{1,j}\}$
where $k = 1, 2, \ldots, n$, $m = 1, 2, \ldots, (n-1)$, and $l = \{|\mathrm{SID}| + |\mathrm{DID}| + 2\}$.
b. Mark testee as fault-free, $\mathrm{NBUFF}_{1,2\mathrm{DID}} = 1$.
c. Exit.
6) If the result frame is not received by the tester from the testee within
the timeout, increment the faulty node counter by one.
7) Check the destination ID field (DID) for the following conditions:
a. DID less than n;
b. DID equal to n.
8) If condition (7.a) is true, then
a. $\mathrm{NBUFF}_{1,2\mathrm{DID}} = 0$ and make DID = DID + 1.
b. $\mathrm{TFRM} = \{\mathrm{NID}\} \cup \{\mathrm{DID}\} \cup \{\mathrm{Tbit}\} \cup \bigcup_{j=1}^{n} \{\mathrm{TBUFF}_{1,j}\}$
where $\bigcup_{j=1}^{n} \{\mathrm{TBUFF}_{1,j}\} = \bigcup_{j=2k}^{n} \{\mathrm{NBUFF}_{1,j}\}$, $k = 1, 2, \ldots, n$.
c. Send TFRM to testee and exit.
9) If condition (7.b) is true, then
$\mathrm{RFRM} = \{\mathrm{NID}\} \cup \{\mathrm{DID}\} \cup \{\mathrm{2Rbit}\} \cup \bigcup_{j=1}^{n} \{\mathrm{TBUFF}_{1,j}\}$,
where DID is the NID of the earlier fault-free node and
$\bigcup_{j=1}^{n} \{\mathrm{TBUFF}_{1,j}\} = \bigcup_{j=2k}^{n} \{\mathrm{NBUFF}_{1,j}\}$, $k = 1, 2, \ldots, n$.
10) RFRM is the second result frame. Send it to the earlier fault-free
node.
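The steps of Appendix A can be rendered as a sequential loop. This is a simplified sketch, not the authors' implementation: the event-driven "send and exit" steps are folded into one loop, the node buffer is a dictionary keyed by node ID instead of a bit buffer, the tester is assumed to start with the node whose ID is NID + 1, and the last node is assumed to be marked faulty when it times out. The callback `send_test` is a hypothetical stand-in for sending TFRM and waiting for RFRM.

```python
from typing import Callable, Optional

FAULTY, FAULT_FREE = 0, 1

# send_test(did, tbuff) returns the statuses carried by the result frame,
# or None when no result frame arrives within the timeout (assumed interface).
SendTest = Callable[[int, dict[int, int]], Optional[dict[int, int]]]


def find_first_fault_free(
    nid: int, n: int, nbuff: dict[int, int], send_test: SendTest
) -> tuple[Optional[int], int]:
    """Return (ID of the first fault-free testee or None, faulty nodes found)."""
    faulty_count = 0
    did = nid + 1  # assumption: testing starts with the next higher node ID
    while did <= n:
        # Steps 2) to 4): the test frame carries a copy of the tester's buffer.
        result = send_test(did, dict(nbuff))
        if result is not None:
            # Step 5): take over the reported statuses, mark testee fault-free.
            nbuff.update(result)
            nbuff[did] = FAULT_FREE
            return did, faulty_count
        # Steps 6) to 8): timeout, so count the fault and move to the next node.
        faulty_count += 1
        nbuff[did] = FAULTY
        did += 1
    # Steps 9) and 10): no fault-free testee; the caller would now send the
    # second result frame to the earlier fault-free node.
    return None, faulty_count


# Example with a simulated bus on which only nodes 1 and 4 respond.
alive = {1, 4}


def simulated_bus(did: int, tbuff: dict[int, int]) -> Optional[dict[int, int]]:
    return dict(tbuff) if did in alive else None


nbuff = {1: FAULT_FREE}
found, faulty = find_first_fault_free(1, 5, nbuff, simulated_bus)
```

In this example, node 1 times out on nodes 2 and 3 and then receives a result frame from node 4, so `found` is 4 and `faulty` is 2.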
B. Algorithm for the Following Timeout Conditions in
AFDCAN
1) Steps to be taken when the test frame sent from
the tester is not received by the testee within the timeout:
a. Update the node buffer as
$\bigcup_{j=2k}^{2(\mathrm{NID}-1)} \{\mathrm{NBUFF}_{1,j}\} = 0$, where $k = 1, 2, \ldots, (\mathrm{NID}-1)$.
b. Increment the faulty node counter by one.
c. Send a test frame to the next node.
2) Second result frame not received by the tester, or broadcast
frame not received by the node:
a. For $j = 2k$, $\{\mathrm{NBUFF}_{1,j}\} = u$, where $u = 2_d$ indicates the
undefined state of the node and $k = 1, 2, \ldots, n$.
b. Start a new diagnostic cycle.
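The two timeout cases can be sketched as follows. As before, the node buffer is modeled as a dictionary keyed by node ID, so the index j = 2k of the bit buffer corresponds to key k. The function names are hypothetical, and "next node" is assumed to mean the node with ID NID + 1.

```python
FAULTY, FAULT_FREE, UNDEFINED = 0, 1, 2  # UNDEFINED is the state u = 2 (decimal)


def on_test_frame_timeout(
    nid: int, nbuff: dict[int, int], faulty_count: int
) -> tuple[int, int]:
    """Case 1): node `nid` received no test frame within its time window.

    Marks nodes 1 .. nid-1 as faulty, increments the faulty node counter by
    one, and returns (new counter, ID of the node to test next).
    """
    for k in range(1, nid):
        nbuff[k] = FAULTY
    return faulty_count + 1, nid + 1  # assumption: next node is nid + 1


def on_result_or_broadcast_timeout(n: int, nbuff: dict[int, int]) -> None:
    """Case 2): set every node state to undefined before a new diagnostic cycle."""
    for k in range(1, n + 1):
        nbuff[k] = UNDEFINED
```

For instance, if node 3 holds `{1: 1, 2: 1, 3: 1}` and sees no test frame, case 1) leaves it with `{1: 0, 2: 0, 3: 1}` and a next testee of 4.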
C. Algorithm for Sending the New Node Entry Frame to All of
the Nodes in the System
(Refer to Fig. 6)
1) $\mathrm{TFRM} = \{\mathrm{NID}\} \cup \{0\} \cup \{\mathrm{NERbit}\}$.
2) Send TFRM to all of the nodes in the system after sensing
a result frame on the bus.
REFERENCES
[1] L. Cauffriez, J. Ciccotelli, B. Conrard, and M. Bayart, "Design of intelligent distributed control systems: A dependability point of view," Reliab. Eng. Syst. Safety, vol. 84, pp. 19–32, 2004.
[2] S. Kelkar and R. Kamal, "Control area network based quotient remainder compression-algorithm for automotive applications," in Proc. 38th Annu. IEEE IECON, Montreal, QC, Canada, Oct. 2012, pp. 3030–3036.
[3] S. Kelkar and R. Kamal, "Comparison and analysis of quotient remainder compression algorithms for automotives," in Proc. IEEE INDICON, Kochi, India, Dec. 2012, pp. 802–807.
[4] Robert Bosch GmbH, "Controller Area Network (CAN) Protocol Specification," Ver. 2.0, 1991.
[5] M. J. Short and M. J. Pont, "Fault-tolerant time-triggered communication using CAN," IEEE Trans. Ind. Informat., vol. 3, no. 2, pp. 131–142, May 2007.
[6] J. R. Pimentel and J. A. Fonseca, "FlexCAN: A flexible architecture for highly dependable embedded applications," in Proc. 3rd Int. Workshop Real-Time Netw., Italy, 2004. [Online]. Available: http://paws.kettering.edu//~jpimente/flexcan/FlexCAN-architecture.pdf
[7] T. Fuhrer, B. Muller, W. Dieterie, F. Hartwich, R. Hugel, and H. Weiler, "Time triggered communication on CAN," in Proc. 7th Int. CAN Conf., Amsterdam, Netherlands, 2000. [Online]. Available: http://www.bosch-semiconductors.de/media/pdf_1/canliteratur/cia2000paper_1.pdf
[8] B. Muller, T. Fuhrer, F. Hartwich, R. Hugel, and H. Weiler, "Fault tolerant TTCAN networks," in Proc. 8th Int. CAN Conf., Las Vegas, NV, USA, 2002. [Online]. Available: http://www.bosch-semiconductors.de/media/pdf_1/canliteratur/fault_tolerant_ttcan.pdf
[9] H. Kopetz and G. Grünsteidl, "TTP-A protocol for fault-tolerant real-time systems," Computer, vol. 27, no. 1, pp. 14–23, Jan. 1994.
[10] B. Manuel, P. Julián, N. Guillermo, and A. Luís, "An active star topology for improving fault confinement in CAN networks," IEEE Trans. Ind. Informat., vol. 2, no. 2, pp. 78–85, May 2006.
[11] M. Barranco, J. Proenza, and L. Almeida, "Quantitative comparison of the error-containment capabilities of a bus and a star topology in CAN networks," IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 802–813, Mar. 2011.
[12] N. Kandasamy, J. P. Hayes, and B. T. Murray, "Time-constrained failure diagnosis in distributed embedded systems: Application to actuator diagnosis," IEEE Trans. Parallel Distrib. Syst., vol. 16, no. 3, pp. 258–270, Mar. 2005.
[13] J. Biteus, E. Frisk, and M. Nyberg, "Distributed diagnosis using a condensed representation of diagnoses with application to an automotive vehicle," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 41, no. 6, pp. 1262–1267, Nov. 2011.
[14] S. A. Arogeti, D. Wang, C. B. Low, and M. Yu, "Fault detection isolation and estimation in a vehicle steering systems," IEEE Trans. Ind. Electron., vol. 59, no. 12, pp. 4810–4820, Dec. 2012.
[15] C. M. Vong, P. K. Wong, and W. F. Ip, "A new framework of simultaneous-fault diagnosis using pairwise probabilistic multi-label classification for time-dependent patterns," IEEE Trans. Ind. Electron., vol. 60, no. 8, pp. 3372–3385, Aug. 2013.
[16] H. A. Hansson, T. Nolte, C. Norstrom, and S. Punnekkat, "Integrating reliability and timing analysis of CAN-based systems," IEEE Trans. Ind. Electron., vol. 49, no. 6, pp. 1240–1250, Dec. 2002.
[17] B. Jiang, Z. Mao, and P. Shi, "H∞-filter design for a class of networked control systems via T-S fuzzy-model approach," IEEE Trans. Fuzzy Syst., vol. 18, no. 1, pp. 201–208, Feb. 2010.
[18] H. Yang, B. Jiang, V. Cocquempot, and H. Zhang, "Stabilization of switched nonlinear systems with all unstable modes: Applications to multi-agent systems," IEEE Trans. Autom. Control, vol. 56, no. 9, pp. 2230–2235, Sep. 2011.
[19] K. Choi, J. Luo, K. Pattipati, S. M. Namburu, L. Qiao, and S. Chigusa, "Data reduction techniques for intelligent fault diagnosis in automotive systems," in Proc. IEEE Int. Conf. Autotestcon, Sep. 2006, pp. 66–72.
[20] A. Routray, A. Rajguru, and S. Singh, "Data reduction and clustering techniques for fault detection diagnosis in automotives," in Proc. 6th Annu. Conf. Autom. Sci. Eng., Toronto, ON, Canada, 2010, pp. 326–331.
[21] N. Das, A. Routray, and P. K. Dash, "ICA methods for blind source separation of instantaneous mixtures: A case study," Neural Inf. Process Lett. Rev., vol. 11, no. 11, pp. 225–246, Nov. 2007.
[22] K. P. Detroja, R. D. Gudi, S. C. Patwardhan, and K. Roy, "Fault detection and isolation using correspondence analysis," Ind. Eng. Chem. Res., vol. 45, no. 1, pp. 223–235, 2006.
[23] K. P. Detroja, R. D. Gudi, and S. C. Patwardhan, "Data reduction algorithm based on principle of distributional equivalence for fault diagnosis," Control Eng. Practice, vol. 20, no. 10, pp. 1033–1041, Oct. 2012.
[24] S. Pusha, R. D. Gudi, and S. Noronha, "Polar classification with correspondence analysis for fault isolation," J. Process Control, vol. 19, no. 4, pp. 656–663, Apr. 2009.
[25] S. H. Hosseini, J. G. Kuhl, and S. M. Reddy, "A diagnosis algorithm for distributed computing systems with dynamic failure and repair," IEEE Trans. Comput., vol. 33, no. 3, pp. 223–233, Mar. 1984.
[26] A. Bagchi and S. L. Hakimi, "An optimal algorithm for distributed system-level diagnosis," in Proc. 21st IEEE Int. Symp. Fault-Tolerant Comput., Montreal, QC, Canada, 1991, pp. 214–221.
[27] R. P. Bianchini and R. W. Buskens, "Implementation of on-line distributed system-level diagnosis theory," IEEE Trans. Comput., vol. 41, no. 5, pp. 616–626, May 1992.
[28] S. Rangarajan, A. T. Dahbura, and E. A. Ziegler, "A distributed system-level diagnosis algorithm for arbitrary network topologies," IEEE Trans. Comput., vol. 44, no. 2, pp. 312–334, Feb. 1995.
[29] E. P. Duarte and T. Nanya, "A hierarchical adaptive distributed system-level diagnosis algorithm," IEEE Trans. Comput., vol. 47, no. 1, pp. 34–45, Jan. 1998.
[30] M. Malek, "A comparison connection assignment for diagnosis of multiprocessor systems," in Proc. 7th Int. Symp. Comput. Architect., 1980, pp. 31–35.
[31] K. Chwa and S. Hakimi, "Schemes for fault tolerant computing: A comparison of modularly redundant and t-diagnosable systems," Inf. Control, vol. 49, no. 3, pp. 212–238, Jun. 1981.
[32] J. Maeng and M. Malek, "A comparison connection assignment for self-diagnosis of multiprocessor systems," in Proc. 11th Int. Symp. Fault-Tolerant Comput., 1981, pp. 173–175.
[33] A. Sengupta and A. Dahbura, "On self-diagnosable multiprocessor systems: Diagnosis by the comparison approach," IEEE Trans. Comput., vol. 41, no. 11, pp. 1386–1395, Nov. 1992.
[34] M. Blough and H. Brown, "The broadcast comparison model for on-line fault-diagnosis in multicomputer systems: Theory and implementation," IEEE Trans. Comput., vol. 48, no. 5, pp. 470–493, May 1999.
[35] E. Mourad and A. Nayak, "Comparison-based system-level fault diagnosis: A neural network approach," IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 6, pp. 1047–1059, Jun. 2012.
[36] C. A. Lupini, Vehicle Multiplex Communication-Serial Data Networking Applied to Vehicular Engineering. Warrendale, PA, USA: SAE, 2004.
[37] K. H. Rosen, Discrete Mathematics and its Application, 5th ed. New York, NY, USA: McGraw-Hill, 1988.
[38] Renesas M16C/6N Group Hardware manual, Oct. 2005.
Supriya Kelkar (M'02) received the Bachelor's
degree in electronics and communication engineering
from Karnataka University, Karnataka, India, in
1989, and the Master's degree in electronics and
telecommunication engineering from Pune University,
Pune, India, in 1999. She received the Ph.D.
degree in the area of automotive multiplex systems
from the Institute of Engineering and Technology,
Devi Ahilya University, Indore, India, in 2014.
She was a Research and Development Engineer
for electronics systems with Chromatography and
Instruments Company, Vadodara, India, for three years. She is currently an
Associate Professor with the Computer Engineering Department, Cummins
College of Engineering for Women, Pune. Her research interests include
fault diagnosis in distributed real-time systems, data compression algorithms
for automotive distributed systems, and real-time networks for industrial and
automotive applications, such as controller area networks.
Dr. Kelkar is a member of the IEEE Vehicular Technology Society.
Raj Kamal (M'07) received the Doctoral degree
in the area of physics from the Indian Institute of
Technology, New Delhi, India, in 1972.
He has over 40 years of experience in research,
has published over 125 research papers, and has
taught physics, electronics, computer science, and
information technology. He carried out postdoctoral
research at Uppsala University, Uppsala, Sweden, in
1978–1979 and 1984. He is currently a Professor
with the School of Computer Science and Information
Technology, Devi Ahilya University, Indore,
India. He is widely recognized for his research and engineering books:
Embedded Systems (McGraw-Hill), Computer Architecture (a Schaum Series
Adaptation by McGraw-Hill), Microcontrollers (Pearson Education), and
Mobile Computing (Oxford University Press).
Dr. Kamal is a member of the IEEE Computer Society.
