High-dimensional structure learning of binary pairwise Markov networks: A comparative numerical study

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

High-dimensional structure learning of binary pairwise Markov networks: A comparative numerical study. / Pensar, Johan; Xu, Yingying; Puranen, Santeri; Pesonen, Maiju; Kabashima, Yoshiyuki; Corander, Jukka.

In: Computational Statistics and Data Analysis, Vol. 141, 01.01.2020, p. 62-76.

BibTeX

@article{0ed49042a031420aae3fd617e08d8115,
title = "High-dimensional structure learning of binary pairwise Markov networks: A comparative numerical study",
abstract = "Learning the undirected graph structure of a Markov network from data is a problem that has received a lot of attention during the last few decades. As a result of the general applicability of the model class, a myriad of methods have been developed in parallel in several research fields. Recently, as the size of the considered systems has increased, the focus of new methods has been shifted towards the high-dimensional domain. In particular, introduction of the pseudo-likelihood function has pushed the limits of score-based methods which were originally based on the likelihood function. At the same time, methods based on simple pairwise tests have been developed to meet the challenges arising from increasingly large data sets in computational biology. Apart from being applicable to high-dimensional problems, methods based on the pseudo-likelihood and pairwise tests are fundamentally very different. To compare the accuracy of the different types of methods, an extensive numerical study is performed on data generated by binary pairwise Markov networks. A parallelizable Gibbs sampler, based on restricted Boltzmann machines, is proposed as a tool to efficiently sample from sparse high-dimensional networks. The results of the study show that pairwise methods can be more accurate than pseudo-likelihood methods in settings often encountered in high-dimensional structure learning applications.",
keywords = "Gibbs sampler, Ising model, Markov network, Mutual information, Pseudo-likelihood, Structure learning",
author = "Johan Pensar and Yingying Xu and Santeri Puranen and Maiju Pesonen and Yoshiyuki Kabashima and Jukka Corander",
year = "2020",
month = "1",
day = "1",
doi = "10.1016/j.csda.2019.06.012",
language = "English",
volume = "141",
pages = "62--76",
journal = "Computational Statistics \& Data Analysis",
issn = "0167-9473",
publisher = "Elsevier Science B.V.",

}

RIS

TY - JOUR

T1 - High-dimensional structure learning of binary pairwise Markov networks

T2 - A comparative numerical study

AU - Pensar, Johan

AU - Xu, Yingying

AU - Puranen, Santeri

AU - Pesonen, Maiju

AU - Kabashima, Yoshiyuki

AU - Corander, Jukka

PY - 2020/1/1

Y1 - 2020/1/1

N2 - Learning the undirected graph structure of a Markov network from data is a problem that has received a lot of attention during the last few decades. As a result of the general applicability of the model class, a myriad of methods have been developed in parallel in several research fields. Recently, as the size of the considered systems has increased, the focus of new methods has been shifted towards the high-dimensional domain. In particular, introduction of the pseudo-likelihood function has pushed the limits of score-based methods which were originally based on the likelihood function. At the same time, methods based on simple pairwise tests have been developed to meet the challenges arising from increasingly large data sets in computational biology. Apart from being applicable to high-dimensional problems, methods based on the pseudo-likelihood and pairwise tests are fundamentally very different. To compare the accuracy of the different types of methods, an extensive numerical study is performed on data generated by binary pairwise Markov networks. A parallelizable Gibbs sampler, based on restricted Boltzmann machines, is proposed as a tool to efficiently sample from sparse high-dimensional networks. The results of the study show that pairwise methods can be more accurate than pseudo-likelihood methods in settings often encountered in high-dimensional structure learning applications.

KW - Gibbs sampler

KW - Ising model

KW - Markov network

KW - Mutual information

KW - Pseudo-likelihood

KW - Structure learning

UR - http://www.scopus.com/inward/record.url?scp=85068545371&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2019.06.012

DO - 10.1016/j.csda.2019.06.012

M3 - Article

VL - 141

SP - 62

EP - 76

JO - Computational Statistics & Data Analysis

JF - Computational Statistics & Data Analysis

SN - 0167-9473

ER -

ID: 35440622