TextDroid: Semantics-based detection of mobile malware using network flows

Wang, S.; Yan, Q.; Chen, Z.; Yang, B.; Zhao, C.; Conti, M.

doi:10.1109/INFCOMW.2017.8116346

The wide-spreading mobile malware has become a dreadful issue in the increasingly popular mobile networks. Most of the mobile malware relies on network interface to coordinate operations, steal users' private information, and launch attack activities. In this paper, we propose TextDroid, an effective and automated malware detection method combining natural language processing and machine learning. TextDroid can extract distinguishable features (n-gram sequences) to characterize malware samples. A malware detection model is then developed to detect mobile malware using a Support Vector Machine (SVM) classifier. The trained SVM model presents a superior performance on two different data sets, with the malware detection rate reaching 96.36% in the test set and 76.99% in an app set captured in the wild, respectively. In addition, we also design a flow header visualization method to visualize the highlighted texts generated during the apps' network interactions, which assists security researchers in understanding the apps' complex network activities.