DCA for genome-wide epistasis analysis: The statistical genetics perspective

Research output: Contribution to journal › Article › Scientific › peer-review


  • Chen Yi Gao
  • Fabio Cecconi
  • Angelo Vulpiani
  • Hai Jun Zhou
  • Erik Aurell

Research units

  • CAS - Institute of Theoretical Physics
  • University of Chinese Academy of Sciences
  • University of Rome La Sapienza
  • KTH Royal Institute of Technology
  • Université PSL
  • Accademia Nazionale dei Lincei


Direct coupling analysis (DCA) is now a widely used method for leveraging statistical information from many similar biological systems to draw meaningful conclusions about each system separately. DCA has been applied with great success to sequences of homologous proteins and, more recently, to whole-genome population-wide sequencing data. We argue here that the use of DCA on the genome scale is contingent on fundamental issues of population genetics. DCA can be expected to yield meaningful results when a population is in the quasi-linkage equilibrium (QLE) phase studied by Kimura and others, but not, for instance, in a phase of clonal competition. We discuss how the exponential (Potts model) distributions emerge in QLE, and compare couplings to correlations obtained in a study of about 3000 genomes of the human pathogen Streptococcus pneumoniae.
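
For orientation, the exponential (Potts model) family referred to in the abstract is commonly written in the form below. This is the standard textbook form rather than a formula quoted from the article, and the symbols (L loci, allele s_i at locus i, fields h_i, couplings J_ij, normalization Z) are notational assumptions introduced here.

```latex
P(s_1,\dots,s_L) \;=\; \frac{1}{Z}\,
\exp\!\Big( \sum_{i=1}^{L} h_i(s_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(s_i,s_j) \Big),
\qquad
Z \;=\; \sum_{s_1,\dots,s_L}
\exp\!\Big( \sum_{i} h_i(s_i) + \sum_{i<j} J_{ij}(s_i,s_j) \Big)
```

In this notation, DCA amounts to inferring the couplings J_ij from observed sequences. These are generally distinct from the pairwise correlations between loci i and j, which is the comparison the abstract describes.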


Original language: English
Article number: 026002
Journal: Physical Biology
Issue number: 2
Publication status: Published - 29 Jan 2019
MoE publication type: A1 Journal article-refereed

Research areas

  • direct coupling analysis, genome-scale, quasi-linkage equilibrium

ID: 39041153