
k-mer manifold approximation and projection for visualizing DNA sequences

  • Chengbo Fu
  • Einari A. Niskanen
  • Gong Hong Wei
  • Zhirong Yang
  • Marta Sanvicente-García
  • Marc Güell
  • Lu Cheng*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

2 Citations (Scopus)
4 Downloads (Pure)

Abstract

Identifying and illustrating patterns in DNA sequences are crucial tasks in various biological data analyses. In these tasks, patterns are often represented by sets of k-mers, the fundamental building blocks of DNA sequences. To visually unveil these patterns, one could project each k-mer onto a point in two-dimensional (2D) space. However, this projection poses challenges owing to the high-dimensional nature of k-mers and their unique mathematical properties. Here, we establish a mathematical system to address the peculiarities of the k-mer manifold. Leveraging this k-mer manifold theory, we develop a statistical method named KMAP for detecting k-mer patterns and visualizing them in 2D space. We applied KMAP to three distinct data sets to showcase its utility. KMAP achieves performance comparable to that of the classical method MEME, with ∼90% similarity in motif discovery from HT-SELEX data. In the analysis of H3K27ac ChIP-seq data from Ewing sarcoma (EWS), we find that BACH1, OTX2, and KCNH2 might affect EWS prognosis by binding to promoter and enhancer regions across the genome. We also observe potential colocalization of BACH1, OTX2, and the motif CCCAGGCTGGAGTGC in ∼70 bp windows in the enhancer regions. Furthermore, we find that FLI1 binds to the enhancer regions after ETV6 degradation, indicating competitive binding between ETV6 and FLI1. Moreover, KMAP identifies four prevalent patterns in gene editing data of the AAVS1 locus, aligning with findings reported in the literature. These applications underscore that KMAP can be a valuable tool across various biological contexts.
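As a rough illustration of the objects the abstract refers to, the sketch below counts k-mers in a few sequences and measures the Hamming distance between two k-mers of equal length. It is not the KMAP method or its API: the function names `count_kmers` and `hamming_distance`, the toy reads, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across the input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: skip windows with ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def hamming_distance(a, b):
    """Number of mismatched positions between two equal-length k-mers."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy reads, not data from the paper.
reads = ["ACGTACGT", "TTACGTAA", "ACGNACGT"]
counts = count_kmers(reads, k=4)
```

With k = 4 there are 4^4 = 256 possible k-mers, and the number grows exponentially with k, which gives a sense of why projecting k-mers onto a 2D plane is a high-dimensional problem.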

Original language: English
Pages (from-to): 1234-1246
Number of pages: 13
Journal: Genome Research
Volume: 35
Issue number: 5
DOIs
Publication status: Published - May 2025
MoE publication type: A1 Journal article-refereed

  • Science-IT
    Hakala, M. (Manager)
    School of Science
    Facility/equipment: Facility
