SimAndro-Plus: On Computing Similarity of Android Applications

Masoud Reyhani Hamedani¹ and Sang-Wook Kim¹

  1. Department of Computer and Software, Hanyang University
    Seoul, Korea, 04763
    {masoud,wook}@hanyang.ac.kr

Abstract

In this paper, we propose SimAndro-Plus, an improved variant of the state-of-the-art method SimAndro, to compute the similarity of Android applications (apps) with respect to their functionality. SimAndro-Plus differs from SimAndro in two major ways: 1) it exploits two features that benefit similarity computation but are entirely disregarded by SimAndro; 2) to compute the similarity score of an app-pair based on the strings and package name features, SimAndro-Plus considers not only the terms that co-appear in both apps but also the terms that appear in one app and are missing from the other. The results of our extensive experiments with three real-world datasets and a dataset constructed by human experts demonstrate that 1) each of the two aforementioned differences is effective in improving accuracy and 2) SimAndro-Plus outperforms SimAndro in similarity computation by 14% on average.
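
The abstract does not state the scoring formulas, so the following is only a rough illustration of the second difference, not the method from the paper. It contrasts a score that counts only co-appearing terms with the Jaccard coefficient, which is used here as an assumed stand-in for any measure that also penalizes terms present in one app and missing from the other. The function names and the sample term sets are hypothetical.

```python
def shared_only_score(terms_a, terms_b):
    """Count the terms that co-appear in both apps; unshared terms are ignored."""
    return len(set(terms_a) & set(terms_b))


def jaccard_score(terms_a, terms_b):
    """Jaccard coefficient: shared terms divided by all distinct terms.

    Terms found in only one of the two apps enlarge the denominator,
    so they lower the score. This is an assumed stand-in measure,
    not the formula used by SimAndro-Plus.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0  # two empty term sets: define the similarity as 0
    return len(a & b) / len(union)


# Hypothetical term sets extracted from three apps.
app_x = {"camera", "photo", "filter"}
app_y = {"camera", "photo", "filter", "share"}
app_z = {"camera", "photo", "filter", "bank", "loan", "payment", "invoice"}

# Both pairs share exactly three terms, so the shared-only score ties.
print(shared_only_score(app_x, app_y), shared_only_score(app_x, app_z))

# The Jaccard score separates the pairs because app_z has many unshared terms.
print(jaccard_score(app_x, app_y), jaccard_score(app_x, app_z))
```

In this toy case the shared-only score cannot tell the two pairs apart, while the measure that accounts for unshared terms ranks the pair (app_x, app_y) as more similar than (app_x, app_z).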

Key words

android applications, apps data mining, feature extraction, API calls, manifest information, similarity computation

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS210208036H

Publication information

Volume 18, Issue 4 (September 2021)
Year of Publication: 2021
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Hamedani, M. R., Kim, S.: SimAndro-Plus: On Computing Similarity of Android Applications. Computer Science and Information Systems, Vol. 18, No. 4, 1219–1238. (2021), https://doi.org/10.2298/CSIS210208036H