TY - JOUR
T1 - A bag-of-paths framework for network data analysis
AU - Françoisse, Kevin
AU - Kivimäki, Ilkka
AU - Mantrach, Amin
AU - Rossi, Fabrice
AU - Saerens, Marco
PY - 2017/6/1
Y1 - 2017/6/1
N2 - This work develops a generic framework, called the bag-of-paths (BoP), for link and network data analysis. The central idea is to assign a probability distribution on the set of all paths in a network. More precisely, a Gibbs–Boltzmann distribution is defined over a bag of paths in a network, that is, on a representation that considers all paths independently. We show that, under this distribution, the probability of drawing a path connecting two nodes can easily be computed in closed form by simple matrix inversion. This probability captures a notion of relatedness, or more precisely accessibility, between nodes of the graph: two nodes are considered as highly related when they are connected by many, preferably low-cost, paths. As an application, two families of distances between nodes are derived from the BoP probabilities. Interestingly, the second distance family interpolates between the shortest-path distance and the commute-cost distance. In addition, it extends the Bellman–Ford formula for computing the shortest-path distance in order to integrate sub-optimal paths (exploration) by simply replacing the minimum operator by the soft minimum operator. Experimental results on semi-supervised classification tasks show that both of the new distance families are competitive with other state-of-the-art approaches. In addition to the distance measures studied in this paper, the bag-of-paths framework enables straightforward computation of many other relevant network measures.
AB - This work develops a generic framework, called the bag-of-paths (BoP), for link and network data analysis. The central idea is to assign a probability distribution on the set of all paths in a network. More precisely, a Gibbs–Boltzmann distribution is defined over a bag of paths in a network, that is, on a representation that considers all paths independently. We show that, under this distribution, the probability of drawing a path connecting two nodes can easily be computed in closed form by simple matrix inversion. This probability captures a notion of relatedness, or more precisely accessibility, between nodes of the graph: two nodes are considered as highly related when they are connected by many, preferably low-cost, paths. As an application, two families of distances between nodes are derived from the BoP probabilities. Interestingly, the second distance family interpolates between the shortest-path distance and the commute-cost distance. In addition, it extends the Bellman–Ford formula for computing the shortest-path distance in order to integrate sub-optimal paths (exploration) by simply replacing the minimum operator by the soft minimum operator. Experimental results on semi-supervised classification tasks show that both of the new distance families are competitive with other state-of-the-art approaches. In addition to the distance measures studied in this paper, the bag-of-paths framework enables straightforward computation of many other relevant network measures.
KW - Commute-time distance
KW - Distance and similarity on a graph
KW - Link analysis
KW - Network science
KW - Resistance distance
KW - Semi-supervised classification
UR - https://www.scopus.com/pages/publications/85018251790
U2 - 10.1016/j.neunet.2017.03.010
DO - 10.1016/j.neunet.2017.03.010
M3 - Article
AN - SCOPUS:85018251790
SN - 0893-6080
VL - 90
SP - 90
EP - 111
JO - Neural Networks
JF - Neural Networks
ER -