
Digital Forensic Reconstruction of A Program Actions

2015

Abstract: Forensic analysis of a suspect program is a daily challenge encountered by forensic analysts and law enforcement. It requires determining the behavior of a suspect program found in a computer system subject to investigation and attempting to reconstruct the actions that have been invoked in the system. In this research paper, a forensic analysis approach for suspect programs in executable binary form is introduced. The proposed approach aims to reconstruct high-level forensic actions and approximate action arguments from low-level machine instructions. The reconstructed actions will assist in forensic inference of the evidence and traces caused by an action invocation in a system subject to forensic investigation.

2013 IEEE Security and Privacy Workshops

Ahmed F. Shosha, Lee Tobin and Pavel Gladyshev
School of Computer Science and Informatics
University College Dublin
Dublin, Ireland
{ahmed.shosha, lee.tobin}@ucdconnect.ie, pavel.gladyshev@ucd.ie

© 2013, Ahmed F. Shosha. Under license to IEEE.

Keywords: Program Analysis, Data Flow Analysis, Digital Forensic Investigation, Action Reconstruction, Static Code Analysis.

I. INTRODUCTION

In digital investigation, investigators are required to analyze suspect executable binaries of programs found in a system subject to investigation. Program analysis can generally be accomplished in two ways: (i) dynamic program analysis [1-2] and (ii) static program analysis [3-4]. In dynamic program analysis, a suspect binary is executed in a virtual or emulated system, and the actions invoked in the concrete execution (e.g. file created, registry modified, process accessed) are monitored to determine the program behavior. Concrete execution of a program, however, has a set of limitations [5-6]. Programs are comprised of several execution paths and sets of configurations, and each path can invoke several subsequent actions. The analysis system in which a program is executed is typically a standard, preconfigured environment that barely resembles the investigated system. As a result, the actions a program invokes in a concrete execution on the analysis system may be different from the actions executed in the system subject to investigation. That is, forensic analysis based on concrete execution may lead to invalid conclusions.

To supplement forensic investigation based on dynamic analysis, static program analysis is introduced to the forensic investigation process. Fundamentally, static program analysis approaches aim to approximate the behavior of a program if executed on a computer system. Prevalent binary analysis frameworks (e.g. BAP [2], BitBlaze [7], or Jakstab [8]) propose different approaches that automate static analysis of a vector of dependent machine instructions decoded from an executable binary. Although these approaches allow analysis of many security problems, they are limited if used in forensic investigation of a program's binary. These approaches address problems related to the semantic analysis of low-level machine instructions and their side effects [9], and do not address the analysis of program actions that may change the final state of a system.

Forensic analysis of suspect programs is naturally concerned with the reconstruction of high-level program actions (e.g. file modifications or registry manipulation) that change the final state of a system and cause traces that assist the process of evidence inference. In the previously mentioned static analysis frameworks, instructions that handle action invocation and termination, such as call and ret, are treated as basic assignment and jump operations, and the arguments of an action in the procedure stack are not forensically considered in the analysis. Thus, forensic analysis based on these approaches may conclude that a certain action could be invoked, but the detailed specification of the action arguments remains unknown. For example, a human investigator may infer, in static code analysis, the possibility of file creation from the existence of a file-create action call instruction; however, the file specification cannot be determined, due to the lack of action argument analysis in the procedure block stack.

In this research, an action reconstruction approach is proposed to determine invoked actions and compute an approximation of the action arguments in a procedure block. In the proposed approach, an enhancement to the interprocedural analysis of a procedure block is proposed through modeling the local stack frame. The modeled stack frame is then augmented with a data flow analysis of action arguments to allow approximation of the arguments passed to an action. The determined actions and approximated argument values subsequently allow inference of program traces and evidence, and assist in forensic reconstruction of a program's behavior in the system subject to investigation.

The remainder of the paper is organized as follows. In Section II, an intermediate language describing the semantics of a program's machine code is proposed; then, an interprocedural analysis of program blocks and a data flow analysis of actions and arguments are presented. In Section III, the system implementation and preliminary experimental results are described. Finally, Section IV concludes the presented approach and proposes future research work.
DOI 10.1109/SPW.2013.17

II. FORENSIC ANALYSIS OF A PROGRAM

A. A Program Formalization

To allow analysis of a program's binary executable, a simplified Intermediate Language (IL) [10-11] is proposed to express the concrete semantics of the low-level instructions belonging to a program subject to investigation. In the proposed IL, a program is a set of statements that represent different operations over an expression or a variable, i.e. variable or memory assignment, conditional assessment of an expression, or a jump to a specific program point. The syntax of the proposed IL is presented in Figure 1.

    <Statement>  ::= <expression>
                   | <variable> ::= <expression>
                   | if <expression>: <statement> else <statement>
                   | <jump>: <statement>

    <Expression> ::= <variable>
                   | <assignment>
                   | <test> <expression>
                   | <valuation> <expression>
                   | <unary-expression>
                   | <binary-expression>
                   | <value>

Figure 1: A Syntax of IL to Abstract a Program Semantic

The semantics of a statement in a program is modeled as a program state at a program point. A transition function models the changes in a program state and updates the program counter. The operational semantics of the presented IL is shown in Figure 2. It defines unambiguously the concrete execution of an investigated program abstracted in IL. In the IL operational semantics, each statement is substituted with one or more of the production rules depicted in Figure 2.

Figure 2: Concrete Operational Semantic of IL (production rules labeled <Assignment>, <Conditional Expression> and <Jump>, together with the evaluation and valuation operator rules)

All production rules share a common form: each rule analyzes a statement's state before and after the concrete execution of the statement's semantics. The "before" analysis is denoted as "State Entry Analysis", in which an evaluation of the state of a variable or expression that may be affected by the statement's computation is performed in the context of the statement state. The "after" analysis, denoted as "State Post Computation", valuates a variable or an expression based on the statement's semantics in the context of the statement state. Evaluation and valuation of a variable or expression are accomplished through the proposed evaluation and valuation operators, respectively.

A program is also comprised of a set of procedure blocks, such that each procedure is comprised of a set of statements. A control flow graph (CFG) of a given procedure can be constructed using a standard CFG construction technique, such as the one presented in [3]. Note that a program, based on the notation illustrated above, can be viewed as a CFG of procedure blocks, where each block is a node in the graph and has an entry point and an exit point denoting the graph vertices.

The semantics of a state transition to a procedure, in addition to updating the program counter, allocates a memory region for the local stack frame that holds the local variables of the procedure, and performs other conventional operations such as saving caller and callee registers [12]. Thus, in order to reason about the behavior of the actions invoked in a procedure, a simplified model of the local stack frame is proposed. A local stack frame of a procedure at a given program point is formalized as a flat lattice, with one element denoting an empty stack frame and another denoting the top of the stack. The size of the stack frame is computed from the statements that semantically affect it. Consequently, a function ƒ is said to increment or decrement the stack frame if, at any program point in the procedure, there is a statement that semantically affects the stack frame. The definition of ƒ involves a set of data flow analysis computations over the stack frame, which will be explained later.

B. Procedure Block Analysis

As explained, a procedure block is comprised of a set of statements, from which the actions invoked in the procedure and their arguments in its local stack frame can be deduced.
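As a rough illustration of the stack-frame model from Section II.A, the following Python sketch tracks how statements that semantically affect the stack grow or shrink a modeled frame. The tuple encoding of statements, the 4-byte slot size, and the names `stack_effect` and `model_frame` are assumptions made for this illustration only; they are not the paper's notation or its prototype.

```python
from typing import List, Tuple

# Hypothetical statement encoding, e.g. ("push", "eax") or ("sub", "esp", "10").
Statement = Tuple[str, ...]
SLOT = 4  # assumed size in bytes of one 32-bit x86 stack slot


def stack_effect(stmt: Statement) -> int:
    """Return the change in modeled frame size (bytes) caused by one statement."""
    op = stmt[0]
    if op == "push":
        return SLOT
    if op == "pop":
        return -SLOT
    # Explicit adjustments of esp; the immediate is written in hexadecimal.
    if op == "sub" and stmt[1] == "esp":
        return int(stmt[2], 16)
    if op == "add" and stmt[1] == "esp":
        return -int(stmt[2], 16)
    return 0  # statements that do not semantically affect the stack


def model_frame(stmts: List[Statement]) -> List[Tuple[Statement, int]]:
    """Pair every statement with the modeled frame size after it executes."""
    size = 0
    result = []
    for stmt in stmts:
        size += stack_effect(stmt)
        if size < 0:
            raise ValueError("frame size became negative at %r" % (stmt,))
        result.append((stmt, size))
    return result


# A small made-up procedure body.
block: List[Statement] = [
    ("sub", "esp", "10"),   # reserve 0x10 bytes of locals
    ("push", "0"),
    ("push", "eax"),
    ("mov", "eax", "ebx"),  # no stack effect
    ("pop", "ecx"),
    ("add", "esp", "10"),
]
sizes = [size for _, size in model_frame(block)]
print(sizes)
```

The sketch keeps only a size per program point; a model closer to the paper would also record which value occupies each slot, so that slots can later be matched to action arguments.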
Since a single procedure may have several execution paths based on its CFG, analysis of the actions invoked in it requires, primarily, identifying the possible paths that hold an action. As a preliminary analysis, we formulate an execution path in the CFG of a procedure as a trace, that is, a set of transitive statements starting at the entry point and ending at the exit point. Every trace corresponds to a concrete execution of the procedure subject to analysis. Since several traces can be computed, the stack frame may have different layouts, each corresponding to a particular executed trace in the procedure. Thus, a stack frame is given per trace.

To decrease the complexity of action reconstruction from several traces found in several procedures, we restrict the traces subject to analysis to only those that hold actions (e.g. invoke OS system calls or services) and may change the final state of a system if executed. As a result, forensic analysis of a procedure is only performed on such traces and their stack frames.

    48:  lea  eax, [esp+1Ch]
    4C:  push 3E8h
    51:  push eax                    #lpFileName
    52:  push 0
    54:  call ds:GetModuleFileNameA
    …..
    62:  lea  ecx, [esp+414h+BinaryPathName]
    66:  push 0                      #lpLoadOrderGroup
    68:  push ecx                    #lpBinaryPathName
    69:  push 0                      #dwErrorControl
    6B:  push 2                      #dwStartType
    6D:  push 10h                    #dwServiceType
    6F:  push 2                      #dwDesiredAccess
    71:  push eax                    #ServiceName
    7B:  push esi                    #hSCManager
    7C:  call ds:CreateServiceA

Figure 3: A Portion of a Trace Code from a Malicious Program

C. Forensic Reconstruction of Actions

In digital forensics, an action is an external event to a system, and an action invocation may cause the creation or modification of system objects [13-15]. Inference and/or deduction of an action's effect on system objects in a forensic investigation requires reconstruction of the action and its associated specifications. To reconstruct an action from low-level instructions, the set of statements in a trace that invoke an action has to be determined, and the action arguments have to be computed or approximated.
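The restriction of the analysis to traces that hold actions (Section II.B) can be pictured with a small Python sketch. Everything in it is hypothetical: the CFG, the node names, and the per-node statements are made up, loops are ignored by enumerating acyclic paths only, and any `call` statement is treated as an action, whereas the paper checks calls against a set of known actions.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical CFG of one procedure: node -> successor nodes.
CFG: Dict[str, List[str]] = {
    "entry": ["a", "b"],
    "a": ["exit"],
    "b": ["c"],
    "c": ["exit"],
    "exit": [],
}

# Hypothetical statements attached to each CFG node.
STATEMENTS: Dict[str, List[str]] = {
    "entry": ["push ebp", "mov ebp, esp"],
    "a": ["xor eax, eax"],
    "b": ["push 0", "call ds:CreateServiceA"],
    "c": ["mov eax, 1"],
    "exit": ["ret"],
}


def traces(cfg: Dict[str, List[str]], node: str, goal: str,
           path: Tuple[str, ...] = ()) -> Iterator[List[str]]:
    """Enumerate acyclic paths from `node` to `goal` by depth-first search."""
    path = path + (node,)
    if node == goal:
        yield list(path)
        return
    for succ in cfg[node]:
        if succ not in path:  # skip back edges so the enumeration terminates
            yield from traces(cfg, succ, goal, path)


def holds_action(trace: List[str], statements: Dict[str, List[str]]) -> bool:
    """Simplification: a trace holds an action if any statement is a call."""
    return any(s.startswith("call ") for node in trace for s in statements[node])


all_traces = list(traces(CFG, "entry", "exit"))
action_traces = [t for t in all_traces if holds_action(t, STATEMENTS)]
print(action_traces)
```

Only the path through node `b` survives the filter, so the later argument analysis would run on one trace instead of two.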
To determine whether a trace holds a certain action, we define a set of possible actions that may be invoked by a program binary, together with their default arguments as stated in the operating system specification [16]. The set of possible actions is defined as a set of 2-tuples, each pairing an action with its arguments. For every trace, if a statement in the trace invokes an action from this set, the trace is labeled as holding that action, and the program point of the invocation is labeled for further analysis. As shown in Figure 3, a code portion of a trace from a procedure block of a malicious program invokes actions at program points 54 and 7C. The action at 54 accesses a file in the system, while the action at 7C creates a persistent service in the system, which leaves a trace that may assist a forensic investigation.

Determining that an action may be invoked at a certain procedure block in a concrete execution, however, may not completely assist the inference of forensic evidence unless the arguments to the action are specified as well. For example, the action at program point 7C can assist in inferring the creation of a system service; however, the service specification (e.g. name, desired access, path to the service binary image) is still unspecified. The service name of the action at 7C is added to the stack frame at 71; however, the value of the register variable eax (an action argument register) has been defined through several computations prior to the stack decrement at 71.

To compute or approximate action arguments, a data flow analysis of the arguments in the stack frame is proposed. A standard data flow analysis technique, denoted as variable assignment definition [3], is employed to determine the set of statements in the trace that previously assigned a value to an argument register used in a subsequent action invocation. Consider the set of register variables on which the statements of the trace operate, and a poset of 2-tuples, each pairing a register variable with a statement in the trace that has been found to valuate it.
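The idea of collecting the arguments pushed before an action and tracing register arguments back to their last assignment can be sketched on the second half of the Figure 3 excerpt. The Python below is a simplified, single-step lookup on a straight-line trace, not the paper's algorithm: the tuple encoding, the `DEFINING` set, and the function names are assumptions, argument names are taken from the annotations in Figure 3 rather than from any API table, and the elided statements are modeled as `None`, so anything defined there is reported as unresolved rather than guessed.

```python
from typing import List, Optional, Tuple

Stmt = Tuple[str, str, str, str]  # (address, mnemonic, operands, annotation)

# Second half of the Figure 3 excerpt; None marks the elided statements.
TRACE: List[Optional[Stmt]] = [
    None,
    ("62", "lea", "ecx, [esp+414h+BinaryPathName]", ""),
    ("66", "push", "0", "lpLoadOrderGroup"),
    ("68", "push", "ecx", "lpBinaryPathName"),
    ("69", "push", "0", "dwErrorControl"),
    ("6B", "push", "2", "dwStartType"),
    ("6D", "push", "10h", "dwServiceType"),
    ("6F", "push", "2", "dwDesiredAccess"),
    ("71", "push", "eax", "ServiceName"),
    ("7B", "push", "esi", "hSCManager"),
    ("7C", "call", "ds:CreateServiceA", ""),
]

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}
DEFINING = {"mov", "lea"}  # assumed mnemonics that assign their first operand


def find_definition(trace: List[Optional[Stmt]], start: int,
                    reg: str) -> Optional[Stmt]:
    """Walk backward from index `start` to the statement that last assigned `reg`.

    Returns None when the search reaches elided statements or the trace start.
    """
    for i in range(start - 1, -1, -1):
        stmt = trace[i]
        if stmt is None:
            return None  # definition lies in the elided region: do not guess
        _, mnemonic, operands, _ = stmt
        dest = operands.split(",")[0].strip()
        if mnemonic in DEFINING and dest == reg:
            return stmt
        if mnemonic == "call" and reg == "eax":
            return stmt  # a call leaves its return value in eax
    return None


def reconstruct_action(trace: List[Optional[Stmt]], call_index: int):
    """Collect the pushes directly preceding a call and resolve register operands."""
    args = []
    for i in range(call_index - 1, -1, -1):
        stmt = trace[i]
        if stmt is None or stmt[1] != "push":
            break
        addr, _, operand, name = stmt
        if operand in REGISTERS:
            d = find_definition(trace, i, operand)
            value = "%s: %s %s" % (d[0], d[1], d[2]) if d else "unresolved"
        else:
            value = operand  # immediate constant
        args.append((name, addr, value))
    return trace[call_index][2], args


action, args = reconstruct_action(TRACE, len(TRACE) - 1)
for name, addr, value in args:
    print(addr, name, value)
```

In this excerpt the binary path argument resolves to the `lea` at 62, while `eax` and `esi` remain unresolved because their assignments fall in the elided part of the trace. A fuller version would recurse on the source registers of each defining statement and then evaluate the collected statements to a concrete value.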
An assignment analysis of the variables used in the subsequent invocation of an action can be defined as a backward trace function ƒ- over the trace, which traces the values assigned to a variable of interest back from the program point at which it is used. The resulting set is then mapped to an argument of the action. Finally, for every mapped argument, a concrete evaluation using the evaluation and valuation operators is performed to compute a concrete value for action arguments that are computed through several statements in the trace.

As shown in Figure 3, an argument of the action specified at 7C is added to the stack frame. A backward trace function over the trace is recursively invoked, from the action argument backward, to determine the statements that assigned a value to the variables used as arguments to the action. The evaluation and valuation operators then recursively operate over the resulting set to compute and valuate an argument variable, i.e. eax, defined in the set.

III. IMPLEMENTATION AND PRELIMINARY RESULTS

A prototype program code disassembler and analyzer is implemented to realize the presented program forensics approach. The developed prototype automates forensic investigation of x86 executable binaries by decoding the machine code of a suspect binary and lifting it to our proposed IL. The lifted instructions are then automatically examined with the proposed action reconstruction and argument computation algorithms, as described. To evaluate the proposed approach, forensic analysis is performed on different samples of malicious programs used to commit cybercrimes, such as variants of Zeus (a malware family for banking cybercrimes) and a ransom malware program [17]. Every sample subject to investigation has been forensically examined in our developed prototype and concretely executed in a managed system, to evaluate the precision of the reconstructed actions relative to the concrete execution and to determine whether all action arguments have been successfully computed. A sample of the preliminary results of the presented approach is shown in Table 1.

    Sample Name           #IL    #Actions   Accuracy %
    Trojan.Zbot-1225      1031   34         89
    Trojan.Zbot-385       1319   53         87
    Trojan.Zbot-1023      1220   39         82
    Trojan.Zbot-1652      2240   103        71
    Trojan.Ransomwre-1    1632   89         79

Table 1: A Sample Forensic Analysis of Malicious Programs

As shown in Table 1, several forensic actions and their associated arguments have been successfully reconstructed from machine code, and action traces have been located in the system subject to investigation. The accuracy percentage describes the percentage of successfully reconstructed action argument values in comparison to the concrete execution. In the illustrated experimental results, the proposed approach successfully reconstructed a considerable set of the actions invoked in a concrete execution and computed between 71% and 89% of the action argument values. However, a set of action arguments has not been computed, since the samples subject to investigation are developed to execute based on runtime dynamic computation and configuration parameters in the compromised systems. In other words, several reconstructed actions in the samples operate on arguments that are dynamically populated at runtime from the compromised systems; hence, to approximate such values, a detailed model of the compromised system needs to be included in the forensic analysis process.

IV. CONCLUSION

In this research work, an automated approach is proposed to extract forensic actions from low-level machine code and to approximate action argument values based on a backward data flow analysis algorithm. The proposed approach allows inference and deduction of evidence, and extraction of traces related to a suspect program, in a system subject to forensic investigation. A prototype forensic analysis framework has been developed and evaluated using different malicious programs that are regularly used to commit cybercrime activities.

REFERENCES

[1] M. Egele, T. Scholte, E. Kirda, and C. Kruegel, "A survey on automated dynamic malware-analysis techniques and tools", ACM Computing Surveys, vol. 44, no. 2, pp. 1–42, Feb. 2012.
[2] E. J. Schwartz, T. Avgerinos, and D. Brumley, "All you ever wanted to know about dynamic taint analysis and forward symbolic execution", IEEE Symposium on Security and Privacy, pp. 317–331, 2010.
[3] F. Nielson, H. R. Nielson, and C. Hankin, "Principles of program analysis", Springer, p. 450, 1999.
[4] M. Christodorescu and S. Jha, "Static analysis of executables to detect malicious patterns", 12th Conf. on USENIX Security Symposium, vol. 12, p. 12, 2003.
[5] C. Malin, E. Casey, and J. Aquilina, "Malware forensics: investigating and analyzing malicious code", Syngress, 2008.
[6] M. Brand, C. Valli, and A. Woodward, "Malware forensics: discovery of the intent of deception", in Australian Digital Forensics Conference, 2010.
[7] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena, "BitBlaze: a new approach to computer security via binary analysis", in Intl. Conf. on Information Systems Security, vol. 5352, pp. 1–25, 2008.
[8] J. Kinder, F. Zuleger, and H. Veith, "An abstract interpretation-based framework for control flow reconstruction from binaries", in 10th Intl. Conf. on Verification, Model Checking, and Abstract Interpretation, vol. 5403, pp. 214–228, 2009.
[9] A. Flexeder, M. Petter, and H. Seidl, "Side-effect analysis of assembly code", in 18th Intl. Conf. on Static Analysis, pp. 77–94, 2011.
[10] J. E. Hopcroft, R. Motwani, and J. D. Ullman, "Introduction to automata theory, languages, and computation", Prentice Hall, p. 750, 2006.
[11] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, "Compilers: principles, techniques, and tools", 2nd Edition, Addison Wesley, p. 1000, 2006.
[12] Intel, "Intel IA-32 Architectures Software Developer Manuals." [Online]. www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html. [Accessed: 14-Feb-2013].
[13] J. James, P. Gladyshev, and Y. Zhu, "Analysis of evidence using formal event reconstruction", Digital Forensics and Cyber Crimes, vol. 31, no. 1, pp. 85–98, 2010.
[14] P. Gladyshev and A. Patel, "Finite state machine approach to digital event reconstruction", Digital Investigation, vol. 1, no. 2, 2004.
[15] A. F. Shosha, J. James, and P. Gladyshev, "Towards automated forensic event reconstruction of malicious code", 15th Intl. Symposium on Research in Attacks and Intrusion (RAID), p. 388, 2012.
[16] Microsoft, "MSDN: The Microsoft Developer Network." [Online]. Available: http://msdn.microsoft.com/en-US/.
[17] "Open Malware: community malicious code research and analysis". Available: http://www.offensivecomputing.net/.