An Intrusion Detection System For Heavy-Duty Truck Networks: Matthew Butler University of Tulsa, USA
Matthew Butler
University of Tulsa, USA
matthew-butler@utulsa.edu
Abstract: Controller Area Network (CAN) and its related protocol J1939 have gained widespread acceptance in the automobile
and heavy truck industry as low-cost, high-reliability embedded network technologies. Unfortunately, these protocols were
not engineered to support traditional security goals such as confidentiality, integrity, and availability, leaving vehicle
networks open to numerous exploits if a networked component or the network bus itself is compromised. This paper
presents some of the features of these technologies that distinguish them from more conventional computer networking
technologies and proposes an extendable anomaly-based intrusion detection system for J1939 networks to address the lack
of built-in security in these network technologies. This intrusion detection system is designed to be attached to an existing
J1939 network without requiring any additional modifications to the network, and incorporates machine learning
techniques, including support vector machines and random forests, using real and simulated J1939 data.
Keywords: controller area networks, J1939, anomaly-based intrusion detection system, machine learning, support vector
machine, random forest
1. Introduction
The inclusion of information systems in automobiles is among the most important innovations of the automotive
industry in recent decades. These information systems include control networks that replace traditional
mechanical control systems. Such networks support a wide range of safety and convenience applications, such
as electronic stability control and remote keyless entry. Technologies such as full drive-by-wire and vehicle-to-
vehicle and vehicle-to-infrastructure communication appear to be feasible in the not-so-distant future.
Meanwhile, automotive manufacturers are increasingly making use of both short- and long-range wireless
communication technologies such as Bluetooth and GSM cellular modems for driver convenience. Each of these
advances further integrates automobiles with the broader world of communication networks and in the process
further exposes them to attack. The motivations to attack automotive networks range from theft to espionage
to terrorism. An attacker with access to the network has virtually unlimited control over the vehicle's networked functions. Furthermore, exposure to
the major multi-purpose communication networks opens automotive networks up to becoming collateral
damage in attacks designed for other systems, such as personal computers or cell phones.
An Intrusion Detection System (IDS) is a security tool which attempts to detect unauthorized actions within a
system. There are two general strategies for doing so: searching for patterns indicative of previously discovered
attacks, and using various heuristics to identify anomalies, that is, behavior that departs from the norm. Since attacks
on heavy-duty vehicle networks "in the wild" are still thankfully rare, this work presents a design for an anomaly-based
network intrusion detection system.
The paper is organized as follows: Section 2 provides a brief overview of the existing work in automotive
intrusion detection systems. Section 3 provides background information on CAN and J1939, both widely used
automotive network protocols, as well as information on IDSs and the classifiers used by this project for anomaly
detection. Section 4 presents the proposed solution, including the threat model used to design it, the method
of feature selection, and the system design. Section 5 discusses the metrics and method to be used in evaluating
the proposed IDS. Section 6 contains closing thoughts and a roadmap of the continuing work on this project.
2. Related work
Hoppe, Kiltz, and Dittmann (2008) discuss the use of Intrusion Detection Systems and proactive forensics support
as a step towards developing long-term solutions to IT security threats to automobiles. Their work presents four
attacks that violate security principles such as Confidentiality, Integrity, Availability, and Non-Repudiation and
shows how it would be possible to detect such attacks with metrics such as increased message frequency,
obvious misuse of message identifiers, and physical layer implementation fingerprinting. Some of these metrics
could be observed by an independent ECU on the CAN bus, while others would require modification of individual
ECU firmware to support detection. In a similar paper, Müter, Groll, and Freiling (2010) outline a set of detection
metrics for use in automotive network intrusion detection systems.
Müter and Asaj (2011) proposed and implemented an intrusion detection system for CAN networks that
monitors the entropy of network traffic to detect intrusions. Entropy for this work is defined in the information
theory sense, as the expected value of the information contained within each message. Entropy and related
metrics such as self-information and relative entropy were measured across three different levels of abstraction
of network traffic: binary, signal, and protocol. The IDS was successful in detecting attacks that involved
introducing identical spoofed messages onto the bus, as this lowered the overall entropy. Measuring the relative
distance between normal and attack data-sets on the protocol level allowed the system to identify the message
type involved in the attack. The system had mixed results on attacks that altered the behavior of the vehicle, in
this case speed, since this did not always create a significant deviation from normal behavior.
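As a rough illustration of the entropy measure, the following Python sketch computes the Shannon entropy, in bits, of the message identifiers observed in one window of traffic. It is a simplification, not Müter and Asaj's implementation: it looks only at identifiers rather than at the binary, signal, and protocol levels, and the identifier values and window contents are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of `symbols` in one window."""
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical windows: four identifiers sent equally often, then the same
# window with 300 injected copies of one spoofed identifier.
normal = [0x0CF00400, 0x18FEF100, 0x18FEEE00, 0x0CF00300] * 25
attack = normal + [0x0CF00400] * 300

print(shannon_entropy(normal))                            # 2.0
print(shannon_entropy(attack) < shannon_entropy(normal))  # True
```

Repeating one spoofed identifier concentrates the distribution on a single symbol, so the entropy falls below the 2.0 bits of four equally frequent identifiers; a detector would compare each window's value with the range seen in attack-free traffic.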
Two solutions based specifically on detecting and preventing spoofing attacks have been proposed. Matsumoto
et al. (2012) describe a system that uses error frames to prevent unauthorized data transmission. In this system,
ECUs are assigned ownership over a set of CAN identifiers that only they are allowed to transmit. The ECUs are
then programmed to listen for CAN messages of those types. If the ECU starts to receive a CAN frame with an
identifier it owns but is not itself sending, it transmits an error frame. Otsuka et al. (2014) present a delayed-decision
approach to detecting spoofing attacks. Their system is located in gateway nodes connecting CAN buses
and focuses on the expected interval of time between frames, or cycle. If a frame arrives significantly before the
end of the cycle, then the frame is held at the gateway for a specified period of time. If another frame is received
during this time, the held frame is assumed to be fraudulent and is dropped. Otherwise, it is forwarded on to
the rest of the network. This delayed decision strategy provides both a means of mitigating an attack and
lessening the potential for false positives. Since the system is located in the gateway, legacy ECUs do not need
to be modified.
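The delayed-decision rule can be sketched for a single identifier as follows. This is a simplified, offline reconstruction from the description above rather than Otsuka et al.'s implementation: the function name, the 10% early-arrival tolerance, and the hold period are assumptions, and times are integer milliseconds.

```python
def delayed_decision(arrivals_ms, cycle_ms, hold_ms, tolerance=0.1):
    """Hypothetical gateway filter for ONE CAN identifier with period cycle_ms.

    Returns (forwarded, dropped) lists of arrival times in milliseconds.
    """
    forwarded, dropped = [], []
    last_forwarded = None
    held = None  # arrival time of an early frame awaiting a decision
    for t in sorted(arrivals_ms):
        if held is not None:
            if t - held <= hold_ms:
                dropped.append(held)      # another frame arrived: held frame assumed spoofed
            else:
                forwarded.append(held)    # hold period expired quietly: release it
                last_forwarded = held
            held = None
        # "Early" means well before the end of the expected cycle (assumed 10% tolerance).
        early = last_forwarded is not None and t - last_forwarded < (1 - tolerance) * cycle_ms
        if early:
            held = t
        else:
            forwarded.append(t)
            last_forwarded = t
    if held is not None:
        forwarded.append(held)            # nothing followed, so the held frame is released
    return forwarded, dropped
```

With a 100 ms cycle, a frame injected at 150 ms is held and then dropped when the genuine frame arrives at 200 ms: `delayed_decision([0, 100, 150, 200, 300], cycle_ms=100, hold_ms=100)` returns `([0, 100, 200, 300], [150])`. A lone early frame that nothing follows is eventually forwarded, which is how the delay limits false positives.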
Wasicek and Weimerskirch (2015) use an artificial neural network to model the behavior of an engine ECU in
order to detect chip tuning. Chip tuning involves reflashing an ECU with customized firmware and provides a
good example of an existing attack on the integrity of the system's design. Wasicek and Weimerskirch's system
captures telemetry data and creates feature vectors that contain the mean, standard deviation, variance,
skewness, and kurtosis for sequences of the speed, RPM, and torque parameters. These feature vectors are used
as inputs to a "bottleneck" neural network trained with normal data, and the resulting outputs are combined by a
Root Mean Square function to create an anomaly score that represents how much the feature vector deviates
from the norm. This anomaly score is compared to a threshold to determine when an anomaly has occurred.
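The feature extraction and scoring steps around the neural network can be sketched in Python as below. The bottleneck network itself is not shown; the sketch assumes, as is usual for such networks, that the RMS is taken over the difference between a feature vector and the network's reconstruction of it. The window contents and the threshold are hypothetical.

```python
import math
import statistics

def window_features(xs):
    """Mean, standard deviation, variance, skewness, and excess kurtosis of one window."""
    n = len(xs)
    mean = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu=mean)
    std = math.sqrt(var)
    if std == 0.0:
        return [mean, 0.0, 0.0, 0.0, 0.0]  # constant window: higher moments set to 0
    skew = sum((x - mean) ** 3 for x in xs) / (n * std ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * std ** 4) - 3.0
    return [mean, std, var, skew, kurt]

def rms_anomaly_score(vector, reconstruction):
    """RMS of the difference between a feature vector and its (assumed) reconstruction."""
    n = len(vector)
    return math.sqrt(sum((v - r) ** 2 for v, r in zip(vector, reconstruction)) / n)

def is_anomalous(vector, reconstruction, threshold):
    """Flag a window whose anomaly score exceeds a hypothetical tuned threshold."""
    return rms_anomaly_score(vector, reconstruction) > threshold

# One hypothetical window per signal; the three feature lists are concatenated.
speed = [50.0, 51.0, 52.0, 51.0, 50.0]
rpm = [1500.0, 1520.0, 1540.0, 1520.0, 1500.0]
torque = [300.0, 305.0, 310.0, 305.0, 300.0]
vector = window_features(speed) + window_features(rpm) + window_features(torque)
```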
Although not applied to intrusion detection, Theissler (2014) uses a single-class support vector machine to detect
anomalies in vehicle test drive data. A similar approach might be feasible for intrusion detection.
3. Background
While the version of the CAN protocol defined in ISO 11898 is largely agnostic with regard to the physical medium,
CAN is typically deployed over twisted-pair wiring, taking advantage of differential signaling to overcome
electromagnetic interference. Signals are represented on the twisted pair as a difference between the two wires,
typically labeled as CAN_H and CAN_L. When both CAN_H and CAN_L have the same, baseline voltage, the
voltage difference between the wires is zero, and the bus is said to be in the "recessive" state, which represents
a binary 1. Binary 0, represented by the "dominant" state, is transmitted by lowering the voltage of CAN_L and
raising the voltage of CAN_H, creating a voltage difference. With multiple nodes potentially writing to the bus
at the same time, the dominant state overrides the recessive state (Pfeiffer, Ayre, and Keydel 2008).
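Because the dominant state overrides the recessive state, the bus behaves like a wired-AND of everything being transmitted in a given bit time. A minimal Python model of one bit time, encoding dominant as 0 and recessive as 1 as in the text, might look like this (the names are illustrative):

```python
DOMINANT, RECESSIVE = 0, 1

def bus_level(transmitted_bits):
    """Level seen on the bus when several nodes drive it during the same bit time.

    Any dominant 0 wins; the bus is recessive only if every node sends 1
    (or if no node is transmitting at all).
    """
    return min(transmitted_bits, default=RECESSIVE)
```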
CAN's real-time property is a result of how it resolves collisions, a process called "arbitration." A collision occurs
when two nodes attempt to use the fieldbus simultaneously. To resolve the collision, the message identifiers of
the colliding frames are compared bit-by-bit by each of the sending nodes as they write their respective frames
to the bus. After writing each bit, the nodes read the value on the bus and compare it to the value they just
wrote to the bus. If the values are different, the node stops sending and waits for the other node(s) to finish
sending before trying again. This allows network designers to order message types by priority and gives some
assurance as to how long it will take for a frame to be delivered. This process is implemented in the CAN
controller hardware (Pfeiffer, Ayre, and Keydel 2008).
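Combining the wired-AND bus with the read-back rule gives a small simulation of arbitration. This Python sketch assumes all contenders start transmitting at the same moment and considers only the identifier bits, sent most significant bit first; other bits in the real arbitration field (such as RTR) are ignored.

```python
def arbitrate(identifiers, width=11):
    """Return the identifier that wins bitwise CAN arbitration.

    After each bit, a node that sent recessive (1) but reads dominant (0)
    on the bus stops transmitting; the remaining nodes continue.
    """
    contenders = list(identifiers)
    for bit in range(width - 1, -1, -1):          # most significant bit first
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        bus = min(sent.values())                  # wired-AND: dominant 0 wins
        contenders = [ident for ident in contenders if sent[ident] == bus]
    return contenders[0]
```

Because a dominant 0 always wins, the numerically lowest identifier survives every bit, so `arbitrate([0x65, 0x123, 0x7FF])` returns `0x65`; this is why lower identifiers correspond to higher priority. Passing `width=29` models J1939's extended identifiers.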
CAN has four frame types, Data, Remote, Error, and Overload, though Data and Error frames are the most common.
Figure 1 shows the basic frame format for CAN Data and Remote frames. Error and Overload frames are much
simpler, essentially consisting of a series of six or more dominant bits followed by eight recessive bits. CAN ensures
this pattern is unique through bit stuffing: in normal network traffic, the transmitter inserts a bit of opposite
polarity after every five consecutive bits of the same polarity, so a run of six dominant bits never appears in a valid
Data or Remote frame.
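Bit stuffing can be sketched as follows. In CAN it applies only from the start-of-frame bit through the CRC sequence, which this Python sketch does not model; it simply stuffs whatever bit list it is given. As in CAN, the inserted stuff bit counts toward the next run.

```python
def stuff(bits):
    """Insert a complementary bit after every five consecutive identical bits."""
    out, run_bit, run_len = [], None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuff_bit = 1 - b
            out.append(stuff_bit)
            run_bit, run_len = stuff_bit, 1   # the stuff bit starts a new run
    return out

def longest_run(bits):
    """Length of the longest run of identical bits."""
    best = cur = 0
    prev = None
    for b in bits:
        cur = cur + 1 if b == prev else 1
        prev = b
        best = max(best, cur)
    return best
```

For example, `stuff([0] * 6)` yields `[0, 0, 0, 0, 0, 1, 0]`, so no run longer than five identical bits survives, which keeps the six-dominant-bit error flag distinguishable from ordinary traffic.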
Audit information is the input an IDS uses to make its determination on whether an intrusion has occurred. The
most common sources of audit information are host logs and network traffic. Intrusion detection systems
typically focus on one or the other and are often qualified as "host-based" (HIDS) or "network-based" (NIDS),
respectively. HIDSs look at a wide variety of information from the host system, including program usage (Forrest
1996, Schonlau 2001), system calls (Ko 1994, Yeung 2003), and server logs (Abad 2003, Vigna 2003). NIDSs
perform their analysis by inspecting packet headers and payload data of network messages (Roesch 1999,
Kruegel 2002).
Detection methods can be broadly described as either signature-based detection or anomaly-based detection.
Signature-based intrusion detection systems recognize malicious intent by searching for data patterns used by
known exploits and attacks. Anomaly-based intrusion detection systems use heuristics to identify behavior that
deviates from a pre-established norm. Signature-based detection is typically very accurate in detecting known
attacks, but is unable to detect new or obfuscated attacks. Anomaly detection is capable of identifying new
attacks, but often suffers from false alarms.
3.3 Classifiers
Anomaly-based intrusion detection systems rely on some sort of heuristic to classify traffic as anomalous or normal. Two
types of classifier are being considered in this work: support vector machines (SVMs) and random forests.
The utility of SVMs can be further extended through the use of kernel functions. A kernel function is a function
that computes the equivalent of a dot product between two data points after they have been transformed into a
different space. Since SVMs, in their dual formulation, operate exclusively on dot products of data rather than on
the data themselves, this allows transformations of the data to be used without actually applying the
transformation to each datum, making these transformations relatively cheap computationally. There are a number
of kernel functions for SVMs that allow them to be applied to situations in which purely linear classification would
be inadequate (Rogers and Girolami 2012).
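The kernel trick can be checked numerically. In this Python sketch, for two-dimensional inputs the degree-2 polynomial kernel (x·y)^2 equals the dot product of an explicit three-dimensional feature map, so an SVM can work in that space without ever constructing it. The RBF kernel is included as a common example whose implicit feature space is infinite-dimensional; the kernels and the gamma value are illustrative choices, not necessarily those used in this work.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for 2-D input; dot(phi(x), phi(y)) == (x . y)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel, computed without calling phi."""
    return dot(x, y) ** 2

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = [1.0, 2.0], [3.0, 4.0]
print(poly2_kernel(x, y))   # 121.0
print(dot(phi(x), phi(y)))  # same value, up to floating-point rounding
```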
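In addition, each split considers only a randomly selected subset of the features. This is sometimes called "feature
bagging", and is used to reduce the correlation between trees in the ensemble that a few very strong predictor
features would otherwise create (Breiman 2001).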
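To make bagging and feature bagging concrete, here is a deliberately tiny Python sketch of a random forest whose trees are single splits (decision stumps). It is a toy, not the model used in this work: real random forests grow deep trees and draw a new feature subset at every split, whereas a stump has only one split per tree. All parameter values are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels, fallback):
    """Most common label, or `fallback` if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else fallback

def best_stump(X, y, features):
    """Best split (feature, threshold, left_label, right_label) among `features`."""
    fallback = majority(y, None)
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left, fallback), majority(right, fallback))
    return best[1:]

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Toy random forest of depth-one trees."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Feature bagging: the split may only use a random subset of the features.
        feats = rng.sample(range(len(X[0])), max_features)
        forest.append(best_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over the trees in the forest."""
    votes = Counter(left if row[f] <= thr else right for f, thr, left, right in forest)
    return votes.most_common(1)[0][0]
```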
4. Proposed solution
The proposed solution is an anomaly-based network intrusion detection system. Anomaly detection was chosen
because, as of this writing, attacks on vehicles running CAN and J1939 are mostly the purview of security
researchers rather than criminals, and thus there are few attacks "in the wild" against which to build signatures.
This section presents a threat model composed of attacks found in the literature, which can then be used for
the testing and evaluation of the prototype system. It also describes how features are to be selected for building
the classification models and the design of the prototype system.
More general denial of service attacks can be accomplished through a variety of other means. Fuzzing, for
example, is likely to cause a denial of service (Koscher 2010). Fuzzing is a technique commonly used by hackers
and reverse engineers to discover undocumented behavior by sending a selected subset of all possible
combinations of data to a component. Since the range of valid CAN and J1939 frames is relatively small, using
this technique can easily lead to a denial of service. It is also possible to introduce a denial of service by exploiting
CAN's arbitration scheme, flooding the network with meaningless high-priority messages. Both of these attacks
can be detected through either statistical analysis or signatures.
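One simple statistical check for flooding is sketched below in Python: estimate the mean and standard deviation of frames per time quantum from attack-free traffic, then flag quanta whose count exceeds the mean by more than k standard deviations. The threshold k = 3 and the counts are hypothetical, and fuzzing that does not raise the frame rate would need other features.

```python
import statistics

def fit_rate_baseline(normal_counts):
    """Mean and standard deviation of frames per quantum in attack-free traffic."""
    return statistics.fmean(normal_counts), statistics.pstdev(normal_counts)

def flag_flooding(counts, baseline, k=3.0):
    """Indices of quanta whose frame count exceeds mean + k * std."""
    mean, std = baseline
    limit = mean + k * std
    return [i for i, c in enumerate(counts) if c > limit]

normal = [98, 101, 100, 99, 102, 100]   # hypothetical attack-free counts
observed = [100, 99, 450, 460, 101]     # hypothetical counts during a flood
print(flag_flooding(observed, fit_rate_baseline(normal)))  # [2, 3]
```

Here the baseline mean is 100 frames per quantum with a standard deviation of about 1.3, so only the two flooded quanta are flagged.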
Finally, one of the most important attacks on automotive networks is the clandestine replacement of ECU
software. Replacing ECU software allows further attacks to be committed and potentially gives the attacker a
persistent presence on the network. Although reflashing ECUs is a legitimate action, it should only take place under
controlled conditions.
Many of these parameters are going to be irrelevant in detecting attacks. To produce a more accurate model for
classifying traffic as either indicative of an attack or not, only parameters that have some statistically significant
value in predicting an attack will be used. To identify these parameters, a separate logistic regression of attack
presence on each individual parameter will be performed. Those parameters that are shown to have some
predictive ability will then be used to create the multivariate classifiers discussed in Section 3.3.
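A univariate screen of this kind can be sketched in Python as below: fit a logistic regression of the attack indicator on one parameter at a time by Newton-Raphson, then keep the parameter if the Wald statistic of its slope exceeds about 1.96 in magnitude (two-sided, roughly the 5% level). The significance level, iteration count, and function names are assumptions; a statistics package would normally be used instead.

```python
import math

def _score_and_information(x, y, b0, b1):
    """Log-likelihood gradient and entries of the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def univariate_logit(x, y, iterations=25):
    """Fit P(attack | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, z) with z = b1 / SE(b1), the Wald statistic for the slope.
    Assumes x is not constant and does not perfectly separate the classes.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = _score_and_information(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, b1 / se_b1

def is_predictive(x, y, z_crit=1.96):
    """Two-sided Wald test of the slope at roughly the 5% level."""
    return abs(univariate_logit(x, y)[2]) > z_crit
```

A parameter whose values are unrelated to the attack indicator yields a slope near zero and is discarded, while one whose values tend to be higher during attacks yields a positive slope that becomes significant once enough quanta are observed.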
In order to perform this logistic regression and to then train the selected models, the attacks presented in the
threat model will be run against a simulated automotive network environment, the result of which will be a
matrix of feature values during consecutive, short periods of time, or time quanta, and a vector of indicator
variables representing whether or not an attack was being run during a given time quantum.
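The construction of this matrix and indicator vector might look like the following Python sketch. The two features (frame count and number of distinct PGNs per quantum), the one-second quantum, and the input format are hypothetical placeholders for whatever parameters are actually extracted; a quantum is labeled 1 if it overlaps any interval during which an attack was running.

```python
from collections import defaultdict

def build_dataset(frames, attack_intervals, quantum=1.0):
    """Per-quantum feature matrix X and attack indicator vector y.

    frames: iterable of (timestamp_seconds, pgn) pairs
    attack_intervals: list of (start, end) times when an attack was running
    """
    counts = defaultdict(int)
    pgns = defaultdict(set)
    for t, pgn in frames:
        q = int(t // quantum)
        counts[q] += 1
        pgns[q].add(pgn)
    n_quanta = max(counts) + 1 if counts else 0
    X = [[counts[q], len(pgns[q])] for q in range(n_quanta)]
    y = [int(any(start < (q + 1) * quantum and end > q * quantum
                 for start, end in attack_intervals))
         for q in range(n_quanta)]
    return X, y

# Hypothetical capture with an attack running from 1.0 s to 2.0 s.
frames = [(0.1, 61444), (0.5, 65265), (1.2, 61444), (1.3, 61444), (1.4, 61444), (2.7, 65262)]
X, y = build_dataset(frames, attack_intervals=[(1.0, 2.0)])
```

For this input the sketch yields three quanta with features [[2, 2], [3, 1], [1, 1]] and labels [0, 1, 0].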
The Traffic Capture component is responsible for capturing traffic from a specified source and placing records
of each frame in a queue for each registered analysis component. This record includes source, destination,
priority, PGN information from the frame identifier, and any parameters identified as relevant from the payload,
as well as an associated timestamp. Two traffic capture components were implemented for this prototype. One
captures data from the system’s CAN interface, the other reads a file containing previously captured traffic data.
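The fields taken from the frame identifier follow the standard J1939 layout of the 29-bit CAN identifier: a 3-bit priority, the extended data page and data page bits, an 8-bit PDU format (PF), an 8-bit PDU specific (PS), and an 8-bit source address. When PF is below 240 (PDU1), PS holds a destination address; otherwise (PDU2), PS is part of the PGN and the message is broadcast. The Python sketch below decodes these fields into a record; the `FrameRecord` type and function names are hypothetical, and payload parameter extraction is omitted.

```python
from dataclasses import dataclass

GLOBAL_ADDRESS = 0xFF  # J1939 broadcast destination

@dataclass(frozen=True)
class FrameRecord:
    timestamp: float
    priority: int
    pgn: int
    source: int
    destination: int
    data: bytes

def decode_j1939_id(can_id):
    """Split a 29-bit J1939 identifier into (priority, pgn, source, destination)."""
    priority = (can_id >> 26) & 0x7
    edp_dp = (can_id >> 24) & 0x3          # extended data page and data page bits
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source = can_id & 0xFF
    if pdu_format < 240:                   # PDU1: PS is a destination address
        pgn = (edp_dp << 16) | (pdu_format << 8)
        destination = pdu_specific
    else:                                  # PDU2: PS is a group extension, broadcast
        pgn = (edp_dp << 16) | (pdu_format << 8) | pdu_specific
        destination = GLOBAL_ADDRESS
    return priority, pgn, source, destination

def make_record(timestamp, can_id, data):
    return FrameRecord(timestamp, *decode_j1939_id(can_id), data)
```

For example, identifier 0x0CF00400 decodes to priority 3, PGN 61444 (0xF004), source address 0, broadcast; 0x18EA00F9 decodes to priority 6, PGN 59904 (0xEA00), sent from address 249 (0xF9) to address 0.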
Analysis components perform the analysis of network traffic. They take frame-records from their respective
queues and use them to identify security events. In this prototype, an analysis component was implemented for
each of the selected classifiers: one for the SVM model and another for the random forest model. When an
analysis component detects a security event, it generates a report and sends it to the Response
component.
The Response component is responsible for responding to anomalous traffic data identified by the analysis
components. It receives security event reports, decides whether they merit action, and, if so, performs that action.
These actions could be reporting- and recording-type actions, such as logging events to a file or sending a
command to illuminate an indicator on the instrument cluster. Eventually, the Response component might weigh
different types of analysis to create a more nuanced report. As currently implemented in the prototype, the
Response component simply aggregates concurrent alerts and logs them to a local file.
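The flow between the three kinds of component can be sketched in Python as follows. This is a single-threaded simplification, not the prototype's code: the records are hypothetical per-quantum summaries rather than per-frame records, the two analyzers are threshold stubs standing in for the SVM and random forest models, and the Response component merges reports from the same quantum before writing one log line for each.

```python
import queue
from collections import defaultdict

class TrafficCapture:
    """Places a copy of every record in one queue per registered analysis component."""

    def __init__(self):
        self.queues = []

    def register(self):
        inbox = queue.Queue()
        self.queues.append(inbox)
        return inbox

    def publish(self, record):
        for inbox in self.queues:
            inbox.put(record)

class ThresholdAnalyzer:
    """Hypothetical stand-in for a trained model: reports records satisfying `predicate`."""

    def __init__(self, name, inbox, predicate):
        self.name, self.inbox, self.predicate = name, inbox, predicate

    def drain(self):
        reports = []
        while not self.inbox.empty():
            record = self.inbox.get()
            if self.predicate(record):
                reports.append((record["quantum"], self.name))
        return reports

class Response:
    """Merges reports from the same quantum and writes one log line per quantum."""

    def __init__(self, write):
        self.write = write  # e.g. the write method of a log file opened in append mode

    def handle(self, reports):
        by_quantum = defaultdict(set)
        for quantum, analyzer in reports:
            by_quantum[quantum].add(analyzer)
        for quantum in sorted(by_quantum):
            self.write(f"quantum={quantum} alerts={sorted(by_quantum[quantum])}\n")
        return dict(by_quantum)

capture = TrafficCapture()
svm_stub = ThresholdAnalyzer("svm", capture.register(), lambda r: r["frames"] > 200)
rf_stub = ThresholdAnalyzer("rf", capture.register(), lambda r: r["frames"] > 150)
for record in [{"quantum": 0, "frames": 100}, {"quantum": 1, "frames": 180}, {"quantum": 2, "frames": 400}]:
    capture.publish(record)

log_lines = []
alerts = Response(log_lines.append).handle(svm_stub.drain() + rf_stub.drain())
```

Because each analysis component has its own queue, a slow model does not block the others; in the real system the components would run concurrently rather than being drained in turn as here.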
5. Evaluation
Testing IDS effectiveness is usually done by measuring the system's accuracy, completeness, and performance
(Porras and Valdes 1998). Accuracy is often measured by the false positive rate, that is, how often the system misidentifies
legitimate traffic as illegitimate. Completeness is a measure of how well the IDS covers the range of possible
attacks, and corresponds to the false negative rate. Performance is measured in terms of system resources, such
as CPU load, used to perform intrusion detection.
For our purposes, accuracy and performance are the most important aspects to evaluate. The prototype system
is designed to run on a small, Linux-capable hobby board such as the BeagleBone, together with an extension
board that provides the CAN hardware controller. This combination of hardware and software must be able to
keep up with high bus utilization, which will occur during many of the attacks contained in the threat model. The
primary metric for accuracy will be the false positive rate, that is fp/(fp+tn), where fp is the number of quanta in
which the IDS issues a report when there is no actual illegitimate traffic and tn is the number of quanta in which
the IDS correctly abstains from issuing a report.
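Computed over time quanta, the false positive rate above, and the false negative rate that measures completeness, reduce to a few counts. The following Python sketch assumes one boolean per quantum for whether the IDS reported and one for whether an attack was actually running; the function name is hypothetical.

```python
def rates(reported, attacked):
    """False positive rate fp/(fp+tn) and false negative rate fn/(fn+tp) over quanta."""
    pairs = list(zip(reported, attacked))
    fp = sum(r and not a for r, a in pairs)
    tn = sum(not r and not a for r, a in pairs)
    fn = sum(not r and a for r, a in pairs)
    tp = sum(r and a for r, a in pairs)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr
```

With reports `[False, True, True, False, False, True]` against attacks `[False, False, True, True, False, True]`, there is one false positive among three attack-free quanta and one missed quantum among three attack quanta, so both rates are 1/3.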
Evaluation of the system will be performed similarly to the process used to gather ground truth data for feature
selection and model training. The attacks outlined in the threat model will again be run against a simulated
automotive network while the IDS observes, analyzes, and responds to the resulting network traffic. Since these
simulations are not deterministic, the traffic observed during evaluation will differ from the traffic used to train the
detection models.
6. Conclusion
The increasing integration of automotive information systems with other networks, such as cellular networks
and the internet, exposes them to threats they were not designed to be resilient against. This creates the
potential for serious consequences in the event of compromise. Redesigning automotive information systems
to be more resilient to attack is unfortunately prohibitively expensive.
This work proposes a relatively low-cost security solution to detect attacks on the internal network of vehicles
using the J1939 protocol. This project is still a work in progress, however. At present, the prototype IDS is
functionally complete and works as expected on hardware similar to the target hardware. The current work is
focused on gathering network data on the simulated automotive network environment both under normal
conditions and under attack to gather ground truth data and perform feature selection and model training as
discussed in Section 4.2. After this is complete, the system can undergo its final evaluation as outlined in Section
5.
References
Abad, C., Taylor, J., Sengul, C., Yurcik, W., Zhou, Y., and Rowe, K. (2003). Log correlation for intrusion detection: A proof of
concept. In Computer Security Applications Conference, 2003. Proceedings. 19th Annual, pages 255–264. IEEE.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Cristianini, N. and Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning
methods. Cambridge University Press.
Debar, H., Dacier, M., and Wespi, A. (1999). Towards a taxonomy of intrusion-detection systems. Computer Networks,
31(8):805–822.
Forrest, S., Hofmeyr, S. A., Somayaji, A., and Longstaff, T. A. (1996). A sense of self for UNIX processes. In Proceedings of
1996 IEEE Symposium on Security and Privacy, pages 120–128. IEEE.
Hoppe, T., Kiltz, S., and Dittmann, J. (2008). Security threats to automotive CAN networks: Practical examples and selected
short-term countermeasures. In Proceedings of the 27th international conference on Computer Safety, Reliability, and
Security, SAFECOMP ’08, pages 235–248, Berlin, Heidelberg. Springer-Verlag.
Ko, C., Fink, G., and Levitt, K. (1994). Automated detection of vulnerabilities in privileged programs by execution
monitoring. In Computer Security Applications Conference, 1994. Proceedings., 10th Annual, pages 134–144. IEEE.
Koscher, K., Czeskis, A., Roesner, F., Patel, S., Kohno, T., Checkoway, S., McCoy, D., Kantor, B., Anderson, D., Shacham, H.,
and Savage, S. (2010). Experimental security analysis of a modern automobile. In 2010 IEEE Symposium on Security and
Privacy, pages 447–462. IEEE.
Kruegel, C., Valeur, F., Vigna, G., and Kemmerer, R. (2002). Stateful intrusion detection for high-speed networks. In Security
and Privacy, 2002. Proceedings. 2002 IEEE Symposium on, pages 285–293. IEEE.
Lunt, T. F. (1993). A survey of intrusion detection techniques. Computers & Security, 12(4):405–418.
Matsumoto, T., Hata, M., Tanabe, M., Yoshioka, K., and Oishi, K. (2012). A method of preventing unauthorized data
transmission in controller area network. In Vehicular Technology Conference (VTC Spring), 2012 IEEE 75th, pages 1–5.
IEEE.
Müter, M. and Asaj, N. (2011). Entropy-based anomaly detection for in-vehicle networks. In Intelligent Vehicles Symposium
(IV), 2011 IEEE, pages 1110–1115. IEEE.
Müter, M., Groll, A., and Freiling, F. C. (2010). A structured approach to anomaly detection for in-vehicle networks. In
Information Assurance and Security (IAS), 2010 Sixth International Conference on, pages 92–98. IEEE.
Otsuka, S., Ishigooka, T., Oishi, Y., and Sasazawa, K. (2014). CAN security: Cost-effective intrusion detection for real-time
control systems. Technical Paper 2014-01-0340, SAE International.
Pfeiffer, O., Ayre, A., and Keydel, C. (2008). Embedded Networking with CAN and CANopen. Copperhill Technologies
Corporation.
Porras, P. A. and Valdes, A. (1998). Live traffic analysis of TCP/IP gateways. In NDSS.
Python Software Foundation (2012). Python 3.3 changelog. https://docs.python.org/3.3/whatsnew/changelog.html.
Accessed: 2014-05-21.
Roesch, M. et al. (1999). Snort: Lightweight intrusion detection for networks. In Proceedings of LISA ’99: 13th Systems
Administration Conference, pages 229–238.
Rogers, S. and Girolami, M. (2012). A first course in machine learning. CRC Press.
SAE (2009). Recommended Practice for a Serial Control and Communications Vehicle Network. Standard J1939, SAE
International.
Schonlau, M., DuMouchel, W., Ju, W.-H., Karr, A. F., Theus, M., and Vardi, Y. (2001). Computer intrusion: Detecting
masquerades. Statistical Science, pages 58–74.
Theissler, A. (2014). Anomaly detection in recordings from in-vehicle networks. In Big Data Applications and Principles.
U.S. Government (2012). On-board Diagnostics. Code of Federal Regulations, title 40, sec. 86.005-17.
Vigna, G., Robertson, W., Kher, V., and Kemmerer, R. A. (2003). A stateful intrusion detection system for world-wide web
servers. In Computer Security Applications Conference, 2003. Proceedings. 19th Annual, pages 34–43. IEEE.
Wasicek, A. and Weimerskirch, A. (2015). Recognizing manipulated electronic control units. Technical Paper 2015-01-0202,
SAE International.
Yeung, D.-Y. and Ding, Y. (2003). Host-based intrusion detection using dynamic and static behavioral models. Pattern
Recognition, 36(1):229–243.
Biographies of Contributing Authors
Shada Alsalamah is an Assistant Professor at the College of Computer and Information Science (CCIS) at King
Saud University (KSU). She has a PhD in healthcare information systems’ security and privacy, and an MSc in
Strategic Information Systems with Information Assurance from Cardiff University, UK. Her research interests
focus on interdisciplinary approaches to information security and privacy in multi-agent collaborative
environments.
Matthew Aust is a 2nd Lieutenant in the United States Air Force. He is married to Alexis Aust. He graduated
from the United States Air Force Academy in 2015 and is currently attending the Air Force Institute of
Technology for his Master's degree in Computer Science.
Jeffrey Avery is a PhD Candidate in Computer Science at Purdue University studying under Prof. Eugene
Spafford. His research interests include information assurance and computer security. Jeffrey has completed
a number of internships and currently is a cyber security intern at Sypris Electronics, Inc. His research interests
include network security, anti-phishing and deception.
Zakariya Belkhamza is a senior lecturer in management information systems at the Faculty of Business,
Economics and Accountancy, Universiti Malaysia Sabah, where he teaches subjects, including management
information systems and human resource information systems. He holds a PhD in management information
systems and a master’s degree in business. Besides his teaching experience, he is also involved in research and
consultancy activities. His research interests include IT management, strategic information systems, IS
Security, IS implementation, assessment and evaluation.
Vijay Bhuse is an Assistant Professor of Computer Science at Grand Valley State University. He received
his Ph.D. from Western Michigan University in 2007 and completed a postdoctoral fellowship at
Dartmouth College. He worked in industry before returning to academia. His research interests are Network
Security, Wireless Sensor Networks and Secure Coding.
Jason M. Bindewald (Major) is an Assistant Professor of Computer Science in the Department of Electrical and
Computer Engineering at the Air Force Institute of Technology, Wright-Patterson AFB, OH. He previously
earned a Doctor of Philosophy degree in Computer Science (2015) and a Master’s degree in Cyber Operations
(2012), as a distinguished graduate of the same institution.
Johnny Botha is a project manager, software developer, and researcher at the Council for Scientific and Industrial
Research (CSIR). He is studying for a master's (MTech) degree in Information Technology at the University of South
Africa (UNISA), on the topic "Personal Identifiable Information Disclosure Since the Protection of Personal Information
Act Adoption in South Africa". He obtained NDip and BTech degrees in Computer Systems Engineering at the
Tshwane University of Technology (TUT).
Adam J. Brown is a Ph.D. student at the University of South Alabama’s School of Computing. He received his
J.D. and M.B.A. from Florida State University in 2012. He received his B.S. in Mathematics and Statistics from
the University of South Alabama in 2007. His research interests include cyber law, cyber security, formal
modelling, and embedded systems.
Matthew Butler is currently a Ph.D. candidate at the University of Tulsa. He graduated with his Bachelor of
Science in 2008 and Master of Science in 2011, both in Computer Science, from the University of Tulsa. His
research interests are security engineering, spatial access control, and embedded automotive communication
networks.
Eddie K. Caberto is a Captain in the United States Air Force working on his Master's degree in Computer
Engineering from the Air Force Institute of Technology. His background experience includes a Bachelors
Degree in Computer Engineering and Cyber Defence/Attack/Operations in the Air Force. His interests are in
the exploitation and security of cyber-physical embedded systems.
Emin Caliskan is a Cyber Security Researcher and a PhD candidate currently working at the Tallinn University of
Technology, Computer Science Department. Prior to his position at TTU, he worked as a scientist at NATO CCD