FAST AND ROBUST BOOTSTRAP IN ANALYSING LARGE MULTIVARIATE DATASETS

Research output: Article in book/conference proceedings, peer-reviewed

Description

In this paper we address the problem of performing statistical inference for large-scale data sets. The volume and dimensionality of the data may be so high that it cannot be processed or stored on a single node. We propose a scalable, statistically robust and computationally efficient bootstrap method compatible with distributed processing and storage systems. Bootstrapping is performed on multiple smaller distinct subsets of the data, similarly to the bag of little bootstraps (BLB) method [1]. For each bootstrap replica drawn from the distinct data subsets, a computationally efficient fixed-point estimation equation is solved. The proposed bootstrap method facilitates the use of highly robust statistical methods in analyzing large-scale data sets. Significant savings in computation are achieved because the method does not recompute the estimator for each bootstrap sample; instead, the bootstrap replica is obtained analytically using an approximation. Simulation examples demonstrate the usefulness and validity of the method for bootstrap analysis of large data sets.
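The subset-then-resample scheme that the abstract borrows from BLB can be sketched as follows. This is only a minimal illustration under stated assumptions: the estimator is the plain sample mean, the quantity of interest is its standard error, and the function name `blb_standard_error` and all parameter names are hypothetical. The sketch recomputes the estimator for every replica, so it does not include the robust estimator or the analytical fixed-point approximation that the paper proposes.

```python
import numpy as np


def blb_standard_error(data, n_subsets=10, subset_size=None, n_boot=100, seed=0):
    """Bag-of-little-bootstraps style standard error of the sample mean.

    Illustrative sketch only: names and defaults are assumptions, and the
    estimator (the mean) is recomputed for each bootstrap replica.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    if subset_size is None:
        subset_size = int(n ** 0.6)  # a subset size much smaller than n
    if n_subsets * subset_size > n:
        raise ValueError("not enough data for the requested disjoint subsets")

    # Shuffle once, then slice, so the subsets are distinct (disjoint).
    perm = rng.permutation(n)
    per_subset_se = []
    for k in range(n_subsets):
        subset = data[perm[k * subset_size:(k + 1) * subset_size]]
        b = subset.shape[0]
        # Each replica has nominal size n but is stored as counts over
        # the b distinct points of the subset.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_boot)
        # Weighted mean for every replica: shape (n_boot,) or (n_boot, d).
        estimates = counts @ subset / n
        per_subset_se.append(estimates.std(axis=0, ddof=1))

    # Average the per-subset quality assessments.
    return np.mean(per_subset_se, axis=0)
```

Each subset could be handled by a separate node, since a replica only needs the subset itself plus a vector of counts; only the per-subset standard errors have to be gathered for the final average.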

Details

Original language: English
Title of host publication: CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS
Editors: Michael B. Matthews
Publication status: Published - 2014
OKM publication type: A4 Article in conference proceedings
Event: Asilomar Conference on Signals, Systems & Computers - Pacific Grove, United States
Duration: 2 November 2014 to 5 November 2014
Conference number: 48

Publication series

Name: Conference Record of the Asilomar Conference on Signals Systems and Computers
Publisher: IEEE COMPUTER SOC
ISSN (print): 1058-6393

Conference

Conference: Asilomar Conference on Signals, Systems & Computers
Abbreviated title: ASILOMAR
Country: United States
City: Pacific Grove
Period: 02/11/2014 to 05/11/2014

ID: 3306475