TY - JOUR
T1 - Clustering and predicting the data usage patterns of geographically diverse mobile users
AU - Walelgne, Ermias
AU - Asrese, Alemnew
AU - Manner, Jukka
AU - Bajpai, Vaibhav
AU - Ott, Jörg
PY - 2021/3/14
Y1 - 2021/3/14
N2 - Mobile users demand more and more data traffic, yet network resources are limited. This creates a challenge for network resource management. One way of addressing this challenge is by understanding the data usage patterns of mobile users so that resources can be optimally allocated based on user traffic demand and data usage behavior. However, understanding and characterizing the data usage patterns of mobile users is a complex task. In this work, we investigate and characterize users’ data usage patterns and behavior in mobile networks. We leverage a dataset (∼113 million records) collected through a crowd-based mobile network measurement platform – Netradar – across five countries. Data usage behavior of users over a cellular network is primarily driven by user mobility, the type of subscription plan marketed by Mobile Network Operators (MNOs), network congestion, and network coverage. We apply an unsupervised machine learning approach to cluster mobile user types by considering different factors such as data consumption, network access type, the number of sessions created per user, throughput, and mobility. By defining the data usage patterns of mobile users, we develop a user clustering model and identify three different mobile user groups (clusters). Our clustering model shows that the data usage patterns are unevenly distributed across the five countries studied, characterized by a small number of heavy users consuming the highest volume of data. We show how the types of applications installed by users correlate with data consumption patterns in some countries. Heavy users tend to install more traffic-demanding apps than users from the other two groups – regular and light users. Finally, we trained a classification model using the labeled dataset produced by our aforementioned user clustering method. The model helps classify mobile users according to their usage patterns (i.e., heavy, regular, and light) with an accuracy of ∼80% on the test dataset.
AB - Mobile users demand more and more data traffic, yet network resources are limited. This creates a challenge for network resource management. One way of addressing this challenge is by understanding the data usage patterns of mobile users so that resources can be optimally allocated based on user traffic demand and data usage behavior. However, understanding and characterizing the data usage patterns of mobile users is a complex task. In this work, we investigate and characterize users’ data usage patterns and behavior in mobile networks. We leverage a dataset (∼113 million records) collected through a crowd-based mobile network measurement platform – Netradar – across five countries. Data usage behavior of users over a cellular network is primarily driven by user mobility, the type of subscription plan marketed by Mobile Network Operators (MNOs), network congestion, and network coverage. We apply an unsupervised machine learning approach to cluster mobile user types by considering different factors such as data consumption, network access type, the number of sessions created per user, throughput, and mobility. By defining the data usage patterns of mobile users, we develop a user clustering model and identify three different mobile user groups (clusters). Our clustering model shows that the data usage patterns are unevenly distributed across the five countries studied, characterized by a small number of heavy users consuming the highest volume of data. We show how the types of applications installed by users correlate with data consumption patterns in some countries. Heavy users tend to install more traffic-demanding apps than users from the other two groups – regular and light users. Finally, we trained a classification model using the labeled dataset produced by our aforementioned user clustering method. The model helps classify mobile users according to their usage patterns (i.e., heavy, regular, and light) with an accuracy of ∼80% on the test dataset.
KW - Mobile networks
KW - Data usage patterns
KW - User behavior modeling
KW - Clustering data usage
UR - http://www.scopus.com/inward/record.url?scp=85099523404&partnerID=8YFLogxK
U2 - 10.1016/j.comnet.2020.107737
DO - 10.1016/j.comnet.2020.107737
M3 - Article
VL - 187
JO - Computer Networks
JF - Computer Networks
SN - 1389-1286
M1 - 107737
ER -