Random projection based clustering for population genomics

Sotiris Tasoulis, Lu Cheng, Niko Valimaki, Nicholas J. Croucher, Simon R. Harris, William P. Hanage, Teemu Roos, Jukka Corander

Research output: Article in book/conference proceedings · Conference contribution · Scientific · Peer-reviewed

9 Citations (Scopus)

Abstract

The recent data revolution in bacterial population genomics has increased the size of aligned sequence data sets by two to three orders of magnitude. This trend is expected to continue in the near future, putting an emphasis on the applicability of big data techniques to extract biologically important insights. Moreover, with the increasing density of sampling, it may also become necessary to combine alignment-free sequence analysis techniques with clustering to gain sufficient insight into the data. This leads to ultra-high-dimensional data with tens of millions of variables, which can no longer be handled by existing population genomic methods. Using the largest bacterial sequence data sets published to date, we demonstrate that random projection based clustering provides a highly accurate approach, several orders of magnitude faster, to the analysis of both alignment-based and alignment-free genome data sets, compared with the Bayesian model-based analysis that is currently considered the state of the art. Hence, clustering methods for big data hold considerable potential for important applications in genomics and could pave the way for novel analysis pipelines, even in the online setting, when executed in a massively parallel computing environment.
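The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method: a Gaussian random projection reduces the dimensionality of binary, SNP-like data, and plain k-means is then run in the reduced space. The toy data, the function names (`random_projection`, `kmeans`), the target dimension of 50 and the 2% mutation rate are all assumptions chosen for illustration.

```python
import math
import random


def random_projection(rows, target_dim, rng):
    """Project each row to target_dim dimensions with a Gaussian random matrix."""
    source_dim = len(rows[0])
    scale = 1.0 / math.sqrt(target_dim)
    # Entries are drawn from N(0, 1/target_dim), so squared distances are
    # preserved in expectation (Johnson-Lindenstrauss style projection).
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(direction, row)) for direction in matrix]
            for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Farthest-first initialisation: start from a random point, then keep
    # adding the point that is farthest from all centers chosen so far.
    centers = [list(points[rng.randrange(len(points))])]
    while len(centers) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: two lineages, each a random binary profile over
# 500 sites, with 10 samples per lineage that differ by ~2% random flips.
rng = random.Random(0)
n_sites, n_per_group = 500, 10
genomes = []
for _lineage in range(2):
    profile = [rng.randint(0, 1) for _ in range(n_sites)]
    for _sample in range(n_per_group):
        genomes.append([b ^ 1 if rng.random() < 0.02 else b for b in profile])

projected = random_projection(genomes, 50, rng)  # 500 -> 50 dimensions
labels = kmeans(projected, 2, rng)
print(labels)
```

In this toy setting the projection shrinks the data tenfold while keeping the two lineages far enough apart for k-means to separate them. On real genome data the target dimension and the clustering step would need to be chosen and validated against the data at hand.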

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Publisher: IEEE
Pages: 675-682
Number of pages: 8
ISBN (electronic): 9781479956654
DOI - permanent links
Status: Published - 7 January 2015
MoE publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Big Data - Washington, United States
Duration: 27 October 2014 - 30 October 2014
Conference number: 2

Conference

Conference: IEEE International Conference on Big Data
Abbreviated title: Big Data
Country: United States
City: Washington
Period: 27/10/2014 - 30/10/2014
