This thesis focuses on the development, implementation and optimization of pattern-matching algorithms in two distinct yet closely related research fields: malicious code detection in intrusion detection systems, and digital forensics (with a special focus on the data recovery process and the metadata collection stages it involves). The thesis first presents the motivation for the work, then reviews related work, continues with the main results obtained, and closes with conclusions and directions for future research.
The four main chapters of this thesis present the contributions of our work. First, we present an efficient storage mechanism for pattern-matching automata on hybrid CPU/GPU-based systems and compare it with other known approaches. Second, we propose an innovative, highly parallel approach to the fast construction of very large Aho-Corasick and Commentz-Walter pattern-matching automata on hybrid CPU/GPU-based systems and compare it with existing sequential approaches. Third, we propose a new heuristic for profiling malicious behavior based on system-call analysis using the Aho-Corasick algorithm, and discuss a new hybrid compression mechanism for this automaton, based on dynamic programming, that reduces the storage space it requires. Finally, we propose an efficient new method for collecting metadata to assist the human operator or the automated tools used in the data recovery process during computer forensic investigations.
The research and models obtained in this thesis extend the existing literature on intrusion detection systems (malicious code detection in particular) by presenting: an innovative heuristic for the behavioral analysis of code in executable files through system-call interception; a novel approach to efficiently storing pattern-matching automata in hybrid CPU/GPU-based systems, which serves as the basis for an innovative model for the fast, GPU-accelerated construction of very large automata (for both the Aho-Corasick and Commentz-Walter algorithms); and a new hybrid compression technique, applied to Aho-Corasick automata using a dynamic programming approach, that significantly reduces storage space.
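For readers unfamiliar with the Aho-Corasick automaton referred to throughout, the following is a minimal sketch of the textbook sequential construction and search, written in Python. It is not the GPU-accelerated construction, the storage layout, or the compressed representation developed in the thesis. The function names `build_automaton` and `search` and the system-call sequences in the usage example are assumptions chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a textbook Aho-Corasick automaton over sequences of hashable symbols."""
    goto = [{}]   # goto[state][symbol] -> next state (trie edges)
    fail = [0]    # failure link of each state
    out = [[]]    # indices of the patterns recognised on reaching each state

    # Phase 1: insert every pattern into the trie.
    for idx, pattern in enumerate(patterns):
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        out[state].append(idx)

    # Phase 2: breadth-first pass computing failure links.
    # Depth-1 states keep the root (state 0) as their failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for symbol, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and symbol not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(symbol, 0)
            # Inherit the matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(sequence, automaton):
    """Return (end_position, pattern_index) pairs for every match in sequence."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for pos, symbol in enumerate(sequence):
        while state and symbol not in goto[state]:
            state = fail[state]
        state = goto[state].get(symbol, 0)
        for idx in out[state]:
            matches.append((pos, idx))
    return matches


# Hypothetical usage: patterns are short system-call sequences, and the
# input is an intercepted trace. The call names are illustrative only.
signatures = [
    ("OpenProcess", "WriteProcessMemory", "CreateRemoteThread"),
    ("WriteProcessMemory", "CreateRemoteThread"),
]
trace = ["CreateFile", "OpenProcess", "WriteProcessMemory",
         "CreateRemoteThread", "CloseHandle"]
hits = search(trace, build_automaton(signatures))
```

Because the automaton only compares symbols for equality, the same sketch works on byte strings, characters, or system-call identifiers; in this example both signatures are reported as ending at position 3 of the trace.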