Malware Detection in Android Applications
Volume 3 Issue 5, August 2019 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
@ IJTSRD | Unique Paper ID – IJTSRD26449 | Volume – 3 | Issue – 5 | July - August 2019 Page 2401
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
detect Android malware. Arshad Saba et al. [2] give details of different Android malware types and their penetration techniques. They categorize antimalware techniques into static and dynamic malware detection. Finally, they propose a hybrid antimalware concept to overcome the limitations of the static and dynamic approaches.

Feizollah et al. [3] provide details about feature selection from Android applications for malware detection. Based on their review, they categorize features into four groups: application metadata, hybrid, dynamic and static features. The paper gives an introduction to Android malware detection types and the related features. They list permissions, signatures, Java code and similar features as those used for static malware detection, while system calls, network traffic and user interactions form the feature set for dynamic malware detection.

In papers [4], [5], [6], [7] the authors suggest static malware analysis techniques with different approaches. Geoffroy Gueguen et al. [4] propose a static malware analysis tool named Androguard. Androguard is a Python-based static malware analysis tool used to disassemble and decompile Android apps by reverse engineering. Androguard calculates application similarities and differences by using NCD (Normalized Compression Distance), a fuzzy risk score and signatures of malicious applications. Faruki Parvez et al. [5] describe the tool AndroSimilar, a signature-based static malware analysis tool. AndroSimilar automatically generates the signature of the test application. The generated signature is compared against a malware signature database, and the application is then identified as normal or malicious. Daniel Arp et al. [6] propose the static malware analysis tool Drebin, which detects malicious applications directly on the Android phone. Drebin collects various features from the application code and manifest file. A machine learning approach is then used to distinguish normal and malicious applications. Sanz Borja et al. [7] propose a permission-based static malware detection tool called PUMA. PUMA extracts application permissions from the manifest file and then uses a machine learning algorithm to identify normal and malicious permissions.

In papers [8], [9], [10], [11] the authors suggest dynamic malware analysis techniques with different approaches. Suarez Tangil et al. [8] propose the dynamic analysis tool AlterDroid, a tool for dynamic analysis of hidden malware distributed over application components. AlterDroid analyses the behavioral difference between the original application and a fault-injected application. It creates behavior signatures for both applications and then analyzes the differential signature with the help of pattern matching. Tam Kimberly et al. [9] describe the tool CopperDroid, a virtual-machine-based automatic dynamic analysis system. It reconstructs the behaviour of Android malware by monitoring system calls. Shabtai Asaf et al. [10] suggest the tool Andromaly, a host-based malware detection tool. Andromaly continuously monitors various metrics of the device, such as battery usage, CPU usage, the number of active processes and the amount of data transferred through the network. It then applies a machine learning algorithm to classify the data as normal or malicious. Lok Kwong Yan et al. [11] propose the dynamic malware tool DroidScope, a dynamic malware analysis platform based on virtual machine introspection. DroidScope is built upon the QEMU emulator. It monitors the whole operating system to gather more information about the malware and also detects kernel-level attacks.

PROPOSED METHODOLOGY
Preprocessing: The first step of the proposed system is to collect real-world samples of benign and malware applications. After collecting the application samples, the system moves to the second step, recording system calls. Figure 1 shows the flow of system call recording. Initially, we installed every application on an Android emulator and ran it for 2-3 minutes. After that, we recorded the system calls of each application and copied them into an external file (.csv). To trace system calls we used […]. Each line in the training set represents the features of a single application, with multiple feature integers and feature values. We labeled each line, that is, each application, with 1 or 0, where 1 means a benign application and 0 means a malicious application in the training set. We used 70% of the applications in the data set for training and the remainder for testing. After all this data preprocessing, we applied the Naive Bayes classifier in the next step.

Algorithm 1: Naive Bayes Algorithm for Malware Detection
The duplicated files are mapped with a single copy of the file data by mapping with the existing file data in the cloud.
The comprehensive requirements in multi-user cloud storage systems and introduced the model of deduplicatable dynamic PoS.

Input: Android application system calls stored in a .csv file
Output: Class to which the given system calls belong.
1. Foreach line in file .csv do
2.     Remove all parameters except system call name;
3.     Store all system call names in another file called system call name;
4. End
5. Foreach system call name in file system call name do
6.     Assign unique integer number;
7.     Store all integers in file integer system call file;
8. End
9. Foreach integer system call do
10.     Calculate 3-gram and 5-gram length;
11. End
12. Foreach length of system call
13.     Compute frequency of each integer then;
14.     Foreach system call if frequency is less than 100;
15.         Remove from file;
16.     Compute density of each integer then;
17.     Store data into value pair format in data file;
18. End
19. After all this data processing apply Naive Bayes classifier.
20. Foreach class instance
21.     Calculate prior probability;
22.     $P(c) = N_c / N$
23. End
24. Foreach known value pair
25.     Calculate conditional probability;
26.     $P(w \mid c) = \dfrac{count(w, c) + 1}{count(c) + |V|}$
27. End
28. Foreach unknown value pair
29.     Calculate posterior probability;
30.     $c_{MAP} = \arg\max_{c} P(x_1, x_2, x_3, \ldots, x_n \mid c)\, P(c)$
    End
31. Compare the posterior probability for each class, then return the class with the highest probability as the result.

Algorithms
Let D be the whole system, which consists of
D = {I, P, O}

Where,
Q - Users' query {q1, q2, …, qN}
P - Procedure,
F - Files, the set {f1, f2, …, fn}
I - Input,
I - {F, Q},
O - Output.

Where:
F = represents the file,
m1, m2, m3, m4 = blocks of the file, with mi representing the ith block,
e = encryption key

Phase 1: Pre-process Phase
In the pre-processing phase,
$e \leftarrow H(F)$, $id \leftarrow H(e)$.

Then, the user announces that it has a certain file via id. If the file does not exist, the user goes into the upload phase. Otherwise, the user goes into the De-Duplication phase.

Phase 2: The Upload File
$(C, T) \leftarrow Encoding(e, F)$
Let the file F = (m1, ..., mn).
The user first invokes the encoding according

Phase 3: The De-Duplication Data (file)
$res \in \{0, 1\} \leftarrow De\text{-}Duplication\{U(e, F), S(T)\}$
If a file announced by a user in the pre-process phase exists in the cloud server, the user goes into the De-Duplication phase and runs the De-Duplication protocol.

Phase 4: The Update File
$res \in \{\langle e^{*}, (C^{*}, T^{*}) \rangle, \perp\} \leftarrow Updating\{U(e, i, m, OP), S(C, T)\}$
In this phase, a user can arbitrarily update the file by invoking the update protocol.

Phase 5: The Proof of Storage to Owner
$res \in \{0, 1\} \leftarrow Checking\{S(C, T), U(e)\}$
At any time, users can go into the proof of storage phase if they have ownership of the files. The users and the cloud server run the checking protocol.

RESULT AND DISCUSSIONS
Users can upload, download and update files on the cloud server, and the system provides data De-Duplication.

CONCLUSIONS
In this work, we developed a dynamic malware detection system to detect malware in Android applications. For dynamic detection, we used the system calls invoked by the application during execution. A Naive Bayes classifier is then used to classify the runtime behavior of applications. In addition, we used 3-gram and 5-gram sequences of system calls. Instead of using every system call, we filter system calls based on frequency. The filtered system calls are used to calculate density. Then, by applying the Naive Bayes classifier, we classified applications into two classes, benign and malware. For all system implementations, we used real-world malware and benign application samples. The proposed method gives more accurate results and performs better than previous work. With 3-grams the Naive Bayes classifier gives 85% accuracy, while with 5-grams it gives 89% accuracy. This suggests that the accuracy of the system improves with the n-gram length of the system call sequences.

REFERENCES
[1] Faruki Parvez, Ammar Bharmal, Vijay Laxmi, Vijay Ganmoor, Manoj Singh Gaur, Mauro Conti, and Muttukrishnan Rajarajan. “Android security: a survey of issues, malware penetration, and defenses.” IEEE Communications Surveys & Tutorials 17, no. 2 (2015): 998-1022.
[2] Arshad Saba, Munam Ali Shah, Abid Khan, and Mansoor Ahmed. “Android malware detection & protection: a survey.” Int. J. Adv. Comput. Sci. Appl 7, no. 2 (2016): 463-475.
[3] Feizollah Ali, Nor Badrul Anuar, Rosli Salleh, and Ainuddin Wahid Abdul Wahab. “A review on feature selection in mobile malware detection.” Digital Investigation 13 (2015): 22-37.
[4] Desnos Anthony. “Androguard: Reverse engineering, malware and goodware analysis of android applications.” URL code.google.com/p/androguard (2013).
[5] Faruki Parvez, Vijay Ganmoor, Vijay Laxmi, Manoj Singh Gaur, and Ammar Bharmal. “AndroSimilar: robust statistical feature signature for Android malware detection.” In Proceedings of the 6th International Conference on Security of Information and Networks, pp. 152-159. ACM, 2013.
[6] Arp Daniel, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, Konrad Rieck, and C. E. R. T. Siemens. “DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket.” In NDSS. 2014.
[7] Sanz Borja, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero, Pablo Garcia Bringas, and Gonzalo Álvarez. “Puma: Permission usage to detect malware in android.” In International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions, pp. 289-298. Springer Berlin Heidelberg, 2013.
[8] Suarez-Tangil, Guillermo, Juan E. Tapiador, Flavio Lombardi, and Roberto Di Pietro. “ALTERDROID: differential fault analysis of obfuscated smartphone malware.” IEEE Transactions on Mobile Computing 15, no. 4 (2016): 789-802.
[9] Tam Kimberly, Salahuddin J. Khan, Aristide Fattori, and Lorenzo Cavallaro. “CopperDroid: Automatic Reconstruction of Android Malware Behaviors.” In NDSS. 2015.
[10] Shabtai Asaf, Uri Kanonov, Yuval Elovici, Chanan Glezer, and Yael Weiss. “Andromaly: a behavioral malware detection framework for android devices.” Journal of Intelligent Information Systems 38, no. 1 (2012): 161-190.
[11] Yan, Lok-Kwong, and Heng Yin. “DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis.” In USENIX Security Symposium, pp. 569-584. 2012.
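
As an illustration of the preprocessing part of Algorithm 1 (integer encoding of system call names, n-gram extraction, frequency filtering and density), the following minimal Python sketch may help. It rests on several assumptions that are not stated in the paper: "density" is taken to mean the relative frequency of an n-gram among the n-grams that survive filtering, the function names and the toy trace are hypothetical, and the threshold of 100 is replaced by a small `min_freq` so that the toy example keeps some features.

```python
from collections import Counter


def encode_calls(call_names):
    """Map each distinct system call name to a unique integer, in order of first use."""
    vocab = {}
    encoded = []
    for name in call_names:
        if name not in vocab:
            vocab[name] = len(vocab) + 1
        encoded.append(vocab[name])
    return encoded, vocab


def ngram_counts(encoded, n):
    """Count every contiguous n-gram in the integer-coded trace."""
    grams = zip(*(encoded[i:] for i in range(n)))
    return Counter(grams)


def filter_and_density(counts, min_freq):
    """Drop n-grams seen fewer than min_freq times, then normalise the rest.

    Assumption: density = count / total count of the kept n-grams.
    """
    kept = {gram: c for gram, c in counts.items() if c >= min_freq}
    total = sum(kept.values())
    return {gram: c / total for gram, c in kept.items()} if total else {}


# Hypothetical trace of system call names for one application.
trace = ["open", "read", "read", "close", "open", "read", "read", "close"]
encoded, vocab = encode_calls(trace)
counts = ngram_counts(encoded, 3)
features = filter_and_density(counts, min_freq=2)
```

Calling `ngram_counts(encoded, 5)` instead would give the 5-gram variant. The resulting dictionary corresponds to the (feature integer, value) pairs that the algorithm stores for each application.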
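
The classification part of Algorithm 1 (prior probability, smoothed conditional probability and the maximum a posteriori decision) can be sketched in a similar way. This is only a sketch under stated assumptions, not the authors' implementation: it treats each application as a list of feature tokens (for example n-gram identifiers), uses add-one smoothing as in $P(w \mid c) = (count(w, c) + 1) / (count(c) + |V|)$, and works with log probabilities to avoid underflow. The function names and the toy data are hypothetical. Labels follow the paper's convention, 1 for benign and 0 for malicious.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Estimate priors P(c) = N_c / N and per-class token counts."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(samples, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    n = len(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    return priors, token_counts, vocab


def log_posterior(tokens, c, priors, token_counts, vocab):
    """Return log P(c) + sum over tokens of log P(w|c), with add-one smoothing."""
    total = sum(token_counts[c].values())
    score = math.log(priors[c])
    for w in tokens:
        score += math.log((token_counts[c][w] + 1) / (total + len(vocab)))
    return score


def classify(tokens, priors, token_counts, vocab):
    """Pick the class with the highest (log) posterior score."""
    return max(
        priors,
        key=lambda c: log_posterior(tokens, c, priors, token_counts, vocab),
    )


# Hypothetical training data: each inner list holds one application's tokens.
samples = [["a", "b", "a"], ["a", "a"], ["c", "d", "c"], ["c", "c", "b"]]
labels = [1, 1, 0, 0]  # 1 = benign, 0 = malicious
priors, token_counts, vocab = train_naive_bayes(samples, labels)

benign_guess = classify(["a", "b"], priors, token_counts, vocab)
malicious_guess = classify(["c", "d"], priors, token_counts, vocab)
```

Summing logarithms instead of multiplying probabilities is a design choice of this sketch. It selects the same class as the product form in the algorithm, because the logarithm is monotonic.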