TY - JOUR
T1 - Identifying 3D Genome Organization in Diploid Organisms via Euclidean Distance Geometry
AU - Belyaeva, Anastasiya
AU - Kubjas, Kaie
AU - Sun, Lawrence J.
AU - Uhler, Caroline
N1 - openaire: EC/H2020/748354/EU//NonnegativeRank
PY - 2022/2/28
Y1 - 2022/2/28
AB - The spatial organization of the genome in the cell nucleus plays an important role in gene regulation, replication of the deoxyribonucleic acid (DNA), and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, and Hi-C), it is now possible to obtain the contact frequencies of the DNA at the whole-genome level. In this paper, we study the problem of reconstructing the three-dimensional (3D) organization of the genome from such whole-genome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming formulations to obtain the 3D configuration. However, such reconstructions neglect the fact that most eukaryotes, including humans, are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from the distance measurements derived from contact frequencies in diploid organisms. In fact, there are infinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant and experimentally measurable constraints (including distances between neighboring genomic loci and higher-order interactions) and prove identifiability under these conditions. Furthermore, we provide semidefinite programming formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. Finally, we apply our algorithm to real pairwise and higher-order contact frequency data and show that we can recover known genome organization patterns.
KW - 3D genome organization
KW - diploid organisms
KW - semidefinite programming
KW - Hi-C
KW - Euclidean distance geometry
KW - systems of polynomial equations
U2 - 10.1137/21M1390372
DO - 10.1137/21M1390372
M3 - Article
SN - 2577-0187
VL - 4
SP - 204
EP - 228
JO - SIAM Journal on Mathematics of Data Science
JF - SIAM Journal on Mathematics of Data Science
IS - 1
ER -