Abstract
Identifying and illustrating patterns in DNA sequences are crucial tasks in various biological data analyses. In these tasks, patterns are often represented by sets of k-mers, the fundamental building blocks of DNA sequences. To visually unveil these patterns, one could project each k-mer onto a point in two-dimensional (2D) space. However, this projection poses challenges owing to the high-dimensional nature of k-mers and their unique mathematical properties. Here, we establish a mathematical system to address the peculiarities of the k-mer manifold. Leveraging this k-mer manifold theory, we develop a statistical method named KMAP for detecting k-mer patterns and visualizing them in 2D space. We applied KMAP to three distinct data sets to showcase its utility. KMAP achieves performance comparable to that of the classical method MEME, with ∼90% similarity in motif discovery from HT-SELEX data. In the analysis of H3K27ac ChIP-seq data from Ewing sarcoma (EWS), we find that BACH1, OTX2, and KCNH2 might affect EWS prognosis by binding to promoter and enhancer regions across the genome. We also observe potential colocalization of BACH1, OTX2, and the motif CCCAGGCTGGAGTGC in ∼70 bp windows in the enhancer regions. Furthermore, we find that FLI1 binds to the enhancer regions after ETV6 degradation, indicating competitive binding between ETV6 and FLI1. Moreover, KMAP identifies four prevalent patterns in gene editing data of the AAVS1 locus, aligning with findings reported in the literature. These applications underscore that KMAP can be a valuable tool across various biological contexts.
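
To make the notion of "patterns represented by sets of k-mers" concrete, the following is a minimal sketch of counting k-mers in a few sequences and comparing two k-mers position by position. It is not the KMAP algorithm: the function names `count_kmers` and `hamming`, the toy reads, and the use of Hamming distance as the comparison are all assumptions made for illustration, since the abstract does not specify how KMAP measures distance between k-mers.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows that contain ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length k-mers."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Toy reads, made up for illustration (not data from the paper).
reads = ["ACGTACGT", "TTACGTAA", "ACGTNACG"]
counts = count_kmers(reads, k=4)
top_kmer, top_count = counts.most_common(1)[0]
print(top_kmer, top_count)
print(hamming("ACGT", "ACGA"))
```

An over-represented k-mer and its close neighbors (those within a small number of mismatches) are the kind of group that a 2D projection would be expected to place near one another.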
| Original language | English |
|---|---|
| Pages (from-to) | 1234-1246 |
| Number of pages | 13 |
| Journal | Genome Research |
| Volume | 35 |
| Issue number | 5 |
| Publication status | Published - May 2025 |
| MoE publication type | A1 Journal article-refereed |
Fingerprint
Dive into the research topics of 'k-mer manifold approximation and projection for visualizing DNA sequences'. Together they form a unique fingerprint.
Projects
2 Finished

- Cheng Lu AoF costs part2: Systematic approach to study alternative splicing from scRNA-seq in cancer
  Cheng, L. (Principal investigator), Cheng, G. (Project Member), Fu, C. (Project Member) & Lampinen, J. (Principal investigator)
  01/09/2023 → 31/08/2025
  Project: RCF Academy Research Fellow: Research costs
- -: Cheng Lu/AoF costs
  Lampinen, J. (Principal investigator), Fu, C. (Project Member), Säilynoja, T. (Project Member), Huang, D. (Project Member) & Cheng, G. (Project Member)
  01/09/2020 → 22/05/2024
  Project: RCF Academy Research Fellow: Research costs