Android Malware Detection: Building Useful Representations

Luiza Sayfullina, Emil Eirola, Dmitri Komashinskiy, Paolo Palumbo, Juha Karhunen

Research output: Article in book/conference proceedings › Conference contribution › Scientific › Peer-reviewed

5 Citations (Scopus)
122 Downloads (Pure)


The problem of proactively detecting Android malware has proven to be challenging. The challenges stem from a variety of issues, but recent literature has shown that the task is hard to solve with high accuracy when only a restricted, fixed set of features, such as permissions, is used. The opposite approach of including all available features is also problematic, as it causes the feature space to grow beyond a reasonable size. In this paper we focus on finding an efficient way to select a representative feature space that preserves its discriminative power on unseen data. We go beyond traditional approaches like Principal Component Analysis, which is too computationally heavy for large-scale problems with millions of features. In particular, we show that many feature groups that can be extracted from Android application packages, such as features extracted from the manifest file or strings extracted from the Dalvik Executable (DEX), should be filtered and used in classification separately. Our proposed dimensionality reduction scheme is applied to each group separately and consists of raw string preprocessing, feature selection via log-odds, and finally random projections. Although the size of the feature space grows exponentially as a function of the training set's size, our approach reduces it by several orders of magnitude, which in turn makes accurate classification possible in a real-world scenario. After reducing the dimensionality, we use the feature groups in a lightweight ensemble of logistic classifiers. We evaluated the proposed classification scheme on real malware data provided by an antivirus vendor and achieved a state-of-the-art 88.24% true positive rate and a reasonably low 0.04% false positive rate with a significantly compressed feature space on a balanced test set of 10,000 samples.
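
The per-group reduction described in the abstract (log-odds feature selection followed by random projection) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact log-odds formula, so the smoothed definition below, the Achlioptas-style sparse sign projection, the function names, and the toy permission strings are all assumptions made for illustration. The ensemble of logistic classifiers is not shown.

```python
import math
import random
from collections import Counter


def log_odds_scores(samples, labels, smoothing=1.0):
    """Score each feature by smoothed log-odds of presence in malware vs. clean.

    Assumption: additive smoothing keeps every probability strictly in (0, 1).
    """
    n_mal = sum(1 for y in labels if y == 1)
    n_clean = len(labels) - n_mal
    mal_counts, clean_counts = Counter(), Counter()
    for feats, y in zip(samples, labels):
        (mal_counts if y == 1 else clean_counts).update(set(feats))
    scores = {}
    for f in set(mal_counts) | set(clean_counts):
        p_mal = (mal_counts[f] + smoothing) / (n_mal + 2 * smoothing)
        p_clean = (clean_counts[f] + smoothing) / (n_clean + 2 * smoothing)
        scores[f] = (math.log(p_mal / (1 - p_mal))
                     - math.log(p_clean / (1 - p_clean)))
    return scores


def select_features(scores, k):
    """Keep the k features with the largest absolute log-odds (ties by name)."""
    return sorted(scores, key=lambda f: (-abs(scores[f]), f))[:k]


def random_projection_matrix(n_features, n_components, seed=0):
    """Sparse sign matrix: +1, 0, -1 with probabilities 1/6, 2/3, 1/6."""
    rng = random.Random(seed)
    scale = math.sqrt(3.0 / n_components)
    return [[scale * rng.choice((1, 0, 0, 0, 0, -1))
             for _ in range(n_components)]
            for _ in range(n_features)]


def project(sample, selected, matrix):
    """Binary-encode a sample over the selected features, then project it."""
    n_components = len(matrix[0])
    out = [0.0] * n_components
    present = set(sample)
    for i, f in enumerate(selected):
        if f in present:
            for j in range(n_components):
                out[j] += matrix[i][j]
    return out


# Toy feature group (hypothetical permission strings); label 1 = malware.
samples = [
    ["SEND_SMS", "INTERNET", "READ_CONTACTS"],
    ["SEND_SMS", "INTERNET"],
    ["INTERNET", "CAMERA"],
    ["INTERNET", "VIBRATE"],
]
labels = [1, 1, 0, 0]

scores = log_odds_scores(samples, labels)
selected = select_features(scores, k=3)
matrix = random_projection_matrix(len(selected), n_components=2, seed=0)
reduced = [project(s, selected, matrix) for s in samples]
```

In this toy group, a feature that appears equally often in both classes (here `INTERNET`) receives a log-odds score of zero and is filtered out before projection. In the scheme the abstract describes, each feature group would be reduced separately in this way, and the resulting low-dimensional vectors would feed one logistic classifier per group.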
Title of host publication: 2016 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016, Proceedings
Subtitle: Anaheim, California, USA, December 18-20, 2016.
ISBN (print): 978-1-5090-6166-2
Publication status: Published - 2017
MoE publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Machine Learning and Applications - Anaheim, United States
Duration: 18 December 2016 to 20 December 2016
Conference number: 15


Conference: IEEE International Conference on Machine Learning and Applications


  • Press clippings

    Machine Learning Methods for Classification of Unstructured Data

    Eric Malmi & Juha Karhunen


    1 item of Media coverage

    Press/Media: Media appearance

    Cite this

    Sayfullina, L., Eirola, E., Komashinskiy, D., Palumbo, P., & Karhunen, J. (2017). Android Malware Detection: Building Useful Representations. In 2016 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016, Proceedings: Anaheim, California, USA, December 18-20, 2016. (pp. 201-206). IEEE.