Adaptive Fault Diagnosis Algorithm For Controller Area Network
Adaptive Fault Diagnosis Algorithm For Controller Area Network
Adaptive Fault Diagnosis Algorithm For Controller Area Network
n 1
2
+
1
2
+
n 2
2
and c
e
= Total of all of the additional communication edges as
described previously.
Therefore
t
RPair(max)
= 2n f 2. (4)
Consider the system consisting of ve nodes, with three
nodes detected as faulty. In this case, as per (4), there will be
maximum ve pairs of messages (t
RPair(max)
) in one complete
diagnostic cycle. Equation (4) satises all of the 26 fault
conditions for the system with ve nodes.
C. Proof for Theorem 2
Let t
m
be the total number of test messages sent, r, the total
number of result messages sent, and (n 1), the total number
of broadcasted messages.
Then, the total number of messages (m
s
) in one complete
diagnostic cycle is
m
s
= t
m
+ r + (n 1). (5)
If there are f number of faulty nodes in the system, then
r = n f. (6)
Therefore
m
s
= t
m
+ 2n f 1. (7)
Equation (7) provides the exact number of messages required
for one complete diagnostic cycle for the fault-free or fault
KELKAR AND KAMAL: ADAPTIVE FAULT DIAGNOSIS ALGORITHM FOR CONTROLLER AREA NETWORK 5533
Fig. 10. Arrow diagram for the system with nodes N
1
and N
3
faulty.
condition of the system. The value of m
s
can also be obtained
from the arrow diagram. The arrow diagram shown in Fig. 10
represents the condition where nodes N
1
and N
3
are faulty.
The in-degree d
in
(G) gives the exact number of messages for a
complete diagnostic cycle. For the system with ve nodes, the
values of m
s
can be obtained for all of the 26 fault conditions
using arrow diagrams similar to Fig. 10.
Let m
smax
be the upper bound for the number of messages
in one complete diagnostic cycle. It is known from earlier
discussion that
t
m
(n 1). (8)
Thus, by adding (2n f 1) on both sides of (8), we get
m
smax
= 3n f 2. (9)
Thus, the total number of messages in one complete diagnos-
tic cycle is bounded and also denite.
V. PROPOSED MODIFICATIONS IN AFDCAN
The diagnostic cycle in AFDCAN is periodic, and the algo-
rithm checks each and every node during the diagnostic cycle.
A node is tested again even if it is detected as faulty in the
earlier cycle. This technique increases the time required for
the completion of a diagnostic cycle, as the earlier node waits
for the response from the next faulty node until timeout. A
solution to this additional latency would be in avoiding the
testing of the faulty nodes in the next diagnostic cycles. A
fault-free node can check its buffer to identify faulty nodes, if
any, for the currently completed diagnostic cycle. During the
next diagnostic cycle, a fault-free node can avoid testing these
faulty nodes. Thus, the diagnostic time can be reduced. This
will improve the performance of AFDCAN. In such a case, the
faulty node after repair can enter into the diagnostic cycle by
sending the entry frame for the repaired faulty node.
The entry frame for the repaired faulty node will contain SID
specifying the node ID, DID as zero, and frame type equal to
0101. The entry frame for the repaired faulty node will be
similar to the entry frame for the new node (Fig. 6) but with
the frame type bits as 0101. Also, the buffer writing at every
node can be avoided if the current fault status of a tested node
is the same as that found in the buffer.
Presently, the number of faulty nodes in the system is
bounded by (n 2). This bound can be made equal to (n 1)
with minor changes in the algorithm. Thus, AFDCAN can
operate even if there is only one fault-free node in the system.
TABLE II
COMPARISON OF DISTRIBUTED FAULT DIAGNOSIS ALGORITHMS
VI. COMPARISON OF AFDCAN WITH OTHER
FAULT DIAGNOSIS ALGORITHMS FOR
DISTRIBUTED NETWORKS
The AFDCAN algorithm is based on the CAN protocol.
In this section, the authors compare AFDCAN with other
algorithms [25][29] for fault diagnosis in distributed networks.
These algorithms [25][29] are discussed in Section II.
AFDCAN is a fault diagnosis algorithm for distributed em-
bedded systems, whereas the algorithms in [25][29] have been
implemented on Ethernet-based distributed computer systems.
AFDCAN is an adaptive fault diagnosis algorithm, and the al-
gorithms in [25][29] also are of adaptive type. The algorithms
in [25][29] are executed in parallel by the computer systems,
whereas the tests in AFDCAN are executed sequentially. How-
ever, in AFDCAN, the transmission of the nal result to all of
the nodes is done at the same time, which means in parallel.
The fault model used in AFDCAN is asymmetric as dis-
cussed in Section III-A. Presently, in AFDCAN, the faulty node
does not respond to the test message sent by the other node and
does not understand the nal result. This is because AFDCAN
considers the faulty node as ceased to function, as in the fail-
stop model discussed in [28]. The fault model is symmetric [27]
for algorithms described in [25][27], and [29]. A comparison
of AFDCAN and other algorithms is presented in Table II.
VII. RESULTS AND DISCUSSIONS
A. System With Five Nodes
The AFDCAN algorithm has been implemented using ve
embedded hardware units based on Renesas M16C/6N group
microcontroller [38]. The baud rate used for CAN data transfer
is 125 kb/s. All 26 conditions for fault diagnosis of the system
containing ve nodes are considered and veried. Figs. 11
and 12 show two fault conditions detected by AFDCAN. The
marked area in each of these gures indicates one complete
5534 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 61, NO. 10, OCTOBER 2014
Fig. 11. All of the nodes are fault-free.
Fig. 12. Nodes N
2
, N
3
, and N
4
are faulty.
diagnostic cycle. Lines 1 and 2 of the marked areas show one
test round between two nodes. The different cases which make
up 26 conditions for fault diagnosis in the system with ve
nodes are as follows: 1) no faulty node in the system; 2) one
faulty node in the system; 3) two faulty nodes in the system;
and 4) three faulty nodes in the system.
The bus activity is captured using Vector CANalyzer, and the
results have been veried.
B. Entry of New Nodes and Reentry of Faulty Nodes
Entry of new node and reentry of repaired faulty nodes
are also veried experimentally for AFDCAN as shown in
Figs. 1316. The system with four nodes N
1
, N
2
, N
3
, and N
4
is considered. Node N
5
is used to test the entry of a new node.
Fig. 13. New node N
5
entry.
Fig. 14. New node N
5
is part of the diagnostic cycle.
Fig. 13 shows the entry of N
5
, and Fig. 14 shows N
5
becoming
part of the diagnostic cycle. Node N
5
is considered as a faulty
node for testing the reentry of the repaired faulty node. N
5
is
powered off to indicate the fault condition (Fig. 15), and later,
N
5
is powered on again to indicate the reentry in the system
(Fig. 16).
C. Diagnostic Cycle Time of AFDCAN
AFDCAN is executed on the hardware as explained earlier in
Section VII-A at 125-kb/s baud rate, with all of the ve nodes
being fault free. The bus activity of the complete diagnostic
cycle captured on CAN bus is shown in Fig. 11 as marked area.
The time taken for completion of one diagnostic cycle of Fig. 11
is 560 ms.
KELKAR AND KAMAL: ADAPTIVE FAULT DIAGNOSIS ALGORITHM FOR CONTROLLER AREA NETWORK 5535
Fig. 15. Faulty node N
5
is powered off.
Fig. 16. Faulty node N
5
reenters the diagnostic cycle after repair.
Authors have further improved this diagnostic cycle time
without affecting the AFDCAN algorithm with the same baud
rate. The improved diagnostic cycle time for completion of one
diagnostic cycle is 526 ms, as shown in the marked area of
Fig. 17.
When the AFDCAN algorithm is executed at 500-kb/s baud
rate, the diagnostic cycle time is found to be 526 ms. This
diagnostic cycle time is the same as when AFDCAN is exe-
cuted at 125 kb/s. As mentioned earlier, while implementing
AFDCAN, authors have made use of the local timer module
of the microcontroller at every node. In order to achieve syn-
chronization between all of the nodes in the 26 fault conditions,
it was required to provide a time window for checking the re-
ception of different AFDCAN frames at every node. Therefore,
the diagnostic cycle time depends on the timer at every node
and is not affected by the baud rate of the CAN bus.
Fig. 17. Improved diagnostic cycle time when all of the ve nodes are fault
free.
D. Support for Large Number of Nodes in AFDCAN
The CAN data frame has 64 b (8 B) of data eld along
with other elds. Data eld is used for sending information
on the CAN bus. Out of 8 B of data eld, 2 B are required
for specifying SID and DID, respectively (Fig. 3). Also, 4 b
are required for specifying frame type (Fig. 3). Therefore, the
remaining 44 b can be used for holding diagnostic data of the
nodes present in the system. Two bits per node are required to
represent the state of the node (Fig. 3) in AFDCAN. Hence,
AFDCAN supports 22 nodes in the system, including entries
of new nodes. Thus, a large number of nodes can be part
of the fault diagnosis system. This feature is useful for both
automotive and industrial applications.
VIII. CONCLUSION
The AFDCAN algorithm uses a denite number of test
rounds and sends a denite number of messages to nd the fault
conditions in the CAN-based distributed embedded system.
Therefore, AFDCAN uses a denite bandwidth, based on total
number of nodes in the system. The number of test rounds
and the number of messages decrease with the increase in the
number of faulty nodes.
AFDCAN also supports the entry of new nodes and reentry
of repaired faulty nodes, as demonstrated in Section VII-B. The
failure of the response is detected by the testee or tester with
the help of timeout.
Thus, AFDCAN uses denite time for fault diagnosis of the
system. The improved diagnostic cycle time of AFDCAN is
526 ms, when all of the ve nodes are fault free.
Looking further, synchronization of timings at different
nodes in the system is required for better performance of the
AFDCAN. Also, the time taken by AFDCAN diagnostic cycle
can be reduced by implementing the primary message generator
[6]. The improvements proposed in Section V may also be
considered.
5536 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 61, NO. 10, OCTOBER 2014
TT-CAN is a higher layer protocol above the standard CAN.
Standard CAN messages are sent in TT-CAN [7]. These mes-
sages are transmitted at a specic time slot, and thus, they
do not compete with other CAN messages for bus access.
Communication schedules of all CAN nodes are synchronized
by TT-CAN for a network. By designing the schedules of dif-
ferent CAN nodes for different AFDCAN diagnostic messages,
AFDCAN may be adopted for TT-CAN.
AFDCAN may become part of CAN-based automotive net-
works. In automotive, both periodic and aperiodic messages
may appear on the CAN bus. It is essential to meet the trans-
mission deadline for each message. The AFDCAN messages
need to be scheduled carefully in such a way that they do not
interfere in the existing schedules of message transfers at every
node. This will ensure the real-time transfer and safety aspects
of CAN messages in automotive.
In automotive, the CAN messages are sent or received by
nodes or ECUs of CAN network. These messages are given
priority with the help of the identier present in the CAN data
frame. For a CAN message, the lower the identier, the higher
is the priority assigned. As far as AFDCAN is concerned, the
priority of the diagnostic messages is decided based on the
priority of the existing periodic and aperiodic messages. While
allocating identiers to the diagnostic messages for AFDCAN,
the transmission deadlines of the periodic and aperiodic mes-
sages need to be considered.
The collective diagnostic information of all of the nodes in
AFDCAN is available with any fault-free node in the system.
The corrective action can be taken, or the faults can be reported
depending on the severity of the problem. Thus, the use of
AFDCAN gives a new proposition for network diagnosis in
applications such as automotive and industrial automation.
APPENDIX
A. Algorithm to Find First the Fault-Free Node by the Tester
1) Initialization;
2) Make test frame, TFRM.
TFRM = {NID} {DID} {Tbit}
n
j=1
{TBUFF
1,j
}
where,
n
j=1
{TBUFF
1,j
} =
n
j=2k
{NBUFF
1,j
} k = 1, 2 . . . n.
3) Send TFRM to testee.
4) Wait until timeout.
5) If result frame is received within timeout, then, read the
received frame in RFRM.
a. If frame type bits found in RFRM indicate R bits
(0001) then, modify node buffer as follows:
n
j=2k
{NBUFF
1,j
} =
j=l+m
{RFRM
1,j
} where,
k=1, 2 . . . n, m=1, 2 . . .(n1), l ={|SID|+|DID|+2}.
b. Mark testee as fault-free, NBUFF
1,2DID
= 1.
c. Exit.
6) If result frame is not received by tester from testee within
timeout, increment faulty node counter by one.
7) Check for destination ID eld (DID) for following condi-
tions:
a. DID less than n;
b. DID equal to n.
8) If condition (7.a) is true, then,
a. NBUFF
1,2DID
= 0 and make DID = DID + 1.
b. TFRM={NID}{DID}{Tbit}
n
j=1
{TBUFF
1,j
}
where,
n
j=1
{TBUFF
1,j
} =
n
j=2k
{NBUFF
1,j
}
where, k = 1, 2 . . . n.
c. Send TFRM to testee and exit.
9) If condition (7.b) is true, then,
RFRM={NID}{DID}{2Rbit}
n
j=1
{TBUFF
1,j
},
where DID is the NID of the earlier fault-free node and
n
j=1
{TBUFF
1,j
} =
n
j=2k
{NBUFF
1,j
}
where, k = 1, 2 . . . n.
10) RFRM is 2nd result frame. Send it to the earlier fault-free
node.
B. Algorithm for the Following Timeout Conditions in
AFDCAN
1) Algorithm for steps to be taken when test frame sent from
the tester is not received by the testee, within timeout.
a. Update node buffer as:
2(NID1)
j=2k
{NBUFF
1,j
}=0, where, k=1, 2 . . .(NID1).
b. Increment faulty node counter by one.
c. Send test frame to next node.
2) Second result frame not received by tester or broadcast
frame not received by the node.
a. j = 2k, {NBUFF
1,j
} = u, where, u = 2
d
indicating
undened state of the node and k = 1, 2 . . . n.
b. Start new diagnostic cycle.
C. Algorithm for Sending the New Node Entry Frame to All of
the Nodes in the System
(Refer to Fig. 6)
1) TFRM = {NID} {0} {NERbit}.
2) Send TFRM to all of the nodes in the system after sensing
a result frame on the bus.
REFERENCES
[1] L. Cauffriez, J. Ciccotelli, B. Conrardc, and M. Bayartc, Design of
intelligent distributed control systems: A dependability point of view,
Reliab. Eng. Syst. Safety, vol. 84, pp. 1932, 2004.
[2] S. Kelkar and R. Kamal, Control area network based quotient remainder
compression-algorithm for automotive applications, in Proc. 38th Annu.
IEEE IECON, Montreal, QC, Canada, Oct. 2012, pp. 30303036.
[3] S. Kelkar and R. Kamal, Comparison and analysis of quotient remain-
der compression algorithms for automotives, in Proc. IEEE INDICON,
Kochi, India, Dec. 2012, pp. 802807.
[4] Robert Bosch GmbH, Ver. 2.0 Controller Area Network (CAN)Protocol
Specication 1991, Robert Bosch GmbH, Ver. 2.0.
[5] M. J. Short and M. J. Pont, Fault-tolerant time-triggered communication
using CAN, IEEE Trans. Ind. Informat., vol. 3, no. 2, pp. 131142,
May 2007.
[6] J. R. Pimentel and J. A. Fonseca, FlexCAN: A exible architecture for
highly dependable embedded applications, in Proc. 3rd Int. Workshop
Real-Time Netw., Italy, 2004. [Online]. Available: http://paws.kettering.
edu//~jpimente/excan/FlexCAN-architecture.pdf
[7] T. Fuhrer, B. Muller, W. Dieterie, F. Hartwich, R. Hugel, and H. Weiler,
Time triggered communication on CAN, in Proc. 7th Int. CAN
Conf., Amsterdam, Netherlands, 2000. [Online]. Available: http://www.
bosch-semiconductors.de/media/pdf_1/canliteratur/cia2000paper_1.pdf
[8] Muller, T. Fuhrer, F. Hartwich, R. Hugel, and H. Weiler, Fault tol-
erant TTCAN networks, in Proc. 8th Int. CAN Conf., Las Vegas,
NV, USA, 2002. [Online]. Available: http://www.bosch-semiconductors.
de/media/pdf_1/canliteratur/fault_tolerant_ttcan.pdf
KELKAR AND KAMAL: ADAPTIVE FAULT DIAGNOSIS ALGORITHM FOR CONTROLLER AREA NETWORK 5537
[9] H. Kopetz and G. Grnsteidl, TTP-A protocol for fault-tolerant real-time
systems, Computer, vol. 27, no. 1, pp. 1423, Jan. 1994.
[10] B. Manuel, P. Julin, N. Guillermo, and A. Lus, An active star topology
for improving fault connement in CAN networks, IEEE Trans. Ind.
Informat., vol. 2, no. 2, pp. 7885, May 2006.
[11] M. Barranco, J. Proenza, and L. Almeida, Quantitative comparison of
the error-containment capabilities of a bus and a star topology in CAN
networks, IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 802813,
Mar. 2011.
[12] N. Kandasamy, J. P. Hayes, and B. T. Murray, Time-constrained failure
diagnosis in distributed embedded systems: application to actuator diag-
nosis, IEEE Trans. Parallel Distrib. Syst., vol. 16, no. 3, pp. 258270,
Mar. 2005.
[13] J. Biteus, E. Frisk, and M. Nyberg, Distributed diagnosis using a con-
densed representation of diagnoses with application to an automotive
vehicle, IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 41, no. 6,
pp. 12621267, Nov. 2011.
[14] S. A. Arogeti, D. Wang, C. B. Low, and M. Yu, Fault detection isolation
and estimation in a vehicle steering systems, IEEE Trans. Ind. Electron.,
vol. 59, no. 12, pp. 48104820, Dec. 2012.
[15] C. M. Vong, P. K. Wong, and W. F. Ip, Anewframework of simultaneous-
fault diagnosis using pairwise probabilistic multi-label classication for
time-dependent patterns, IEEE Trans. Ind. Electron., vol. 60, no. 8,
pp. 33723385, Aug. 2013.
[16] H. A. Hansson, T. Nolte, C. Norstrom, and S. Punnekkat, Integrating
reliability and timing analysis of CAN-based systems, IEEE Trans. Ind.
Electron., vol. 49, no. 6, pp. 12401250, Dec. 2002.
[17] B. Jiang, Z. Mao, and P. Shi, H