Abstract
The rapid growth in the variety and quantity of apps makes it difficult for users to protect their privacy. Although regulations have been introduced and the Android ecosystem is constantly being improved, violations persist: privacy policies may not fully comply with regulations, and app behavior may not be fully consistent with privacy policies. To address these issues, this paper proposes VioDroid-Finder, an automated method for evaluating the compliance and consistency of Android apps. First, we study common existing regulations and summarize privacy policy content into 7 aspects (i.e., privacy categories); in each privacy category, a privacy policy must satisfy a different set of compliance rules. Second, we present a policy structure parser model based on a structure extraction/rebuilding method (which converts unstructured text into an XML tree) and a subtitle similarity calculation algorithm. Third, we propose a violation analyzer that uses the BERT model to classify each sentence in the privacy policy. We collect existing issues and combine them with manual observations to define 6 types of violations, which are detected from the classification results. Fourth, we propose an inconsistency analyzer that uses static analysis to convert permissions, APIs, and GUI into a set of personal information; inconsistencies are detected by comparing that set with the personal information declared in the privacy policy. Finally, we evaluate 600 Chinese apps using the proposed method and detect many violations and inconsistencies, reflecting how widespread privacy violations currently are.
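The comparison step of the inconsistency analyzer can be pictured as a set difference. The following is a minimal sketch under stated assumptions, not the paper's implementation: the permission names are real Android permissions, but the mapping table, the personal-information labels, and the function names `collected_info` and `find_inconsistencies` are hypothetical and cover permissions only (not APIs or GUI).

```python
# Hypothetical mapping from Android permissions to personal-information types.
# The labels on the right are illustrative and are not taken from the paper.
PERMISSION_TO_INFO = {
    "android.permission.ACCESS_FINE_LOCATION": "location",
    "android.permission.READ_CONTACTS": "contacts",
    "android.permission.READ_PHONE_STATE": "device identifier",
}


def collected_info(permissions):
    """Map requested permissions to the personal information they imply."""
    return {PERMISSION_TO_INFO[p] for p in permissions if p in PERMISSION_TO_INFO}


def find_inconsistencies(permissions, declared_info):
    """Return information implied by the app but not declared in its policy."""
    return collected_info(permissions) - set(declared_info)


# Example: the app requests location and contacts; the policy declares only location.
app_permissions = [
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",  # no personal information mapped to it here
]
declared = {"location"}
print(sorted(find_inconsistencies(app_permissions, declared)))  # ['contacts']
```

In this sketch, an empty result means that every piece of personal information implied by the requested permissions is also declared in the policy.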

















Notes
These include Huawei AppGallery, Wandoujia, and Xiaomi AppGallery, as visited in December 2022.
Although this privacy policy was soon updated and the disclaimer was deleted, we took a screenshot before the update.
https://appgallery.huawei.com/app/C101690401, accessed October 2023
https://appgallery.huawei.com/app/C10572603, accessed October 2023
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 61902265) and the National Key Research and Development Program of China (No. 2021YFB3100500).
Funding
This research was funded by the National Key Research and Development Program of China (No. 2021YFB3100500) and the National Natural Science Foundation of China (No. 61902265).
Contributions
Conceptualization: Junren Chen; Methodology: Junren Chen, Cheng Huang; Software: Junren Chen, Jiaxuan Han; Validation: Jiaxuan Han; Data curation: Junren Chen; Writing - Original Draft: Junren Chen; Writing - Review & Editing: Cheng Huang, Jiaxuan Han; Supervision: Cheng Huang
Ethics declarations
Conflicts of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Communicated by: Meiyappan Nagappan.
Appendices
Appendix A: Policy Structure Parser: Example
We present an example of what the Policy Structure Parser module does in Fig. 17.
Appendix B: Classification Model Evaluation
In Table 10, we present the evaluation of four classification models. Note that the precision and recall are all close to 1.000 for CR1. This is because the sentences that comply with CR1 have almost the same format, such as “Policy updated on xxxx-xx-xx”, which makes them easy for the models to classify.
About this article
Cite this article
Chen, J., Huang, C. & Han, J. VioDroid-Finder: automated evaluation of compliance and consistency for Android apps. Empir Software Eng 29, 64 (2024). https://doi.org/10.1007/s10664-024-10470-8
DOI: https://doi.org/10.1007/s10664-024-10470-8