Scheduling Placement-Sensitive BSP Jobs with Inaccurate Execution Time Estimation

Zhenhua Han, Haisheng Tan, Shaofeng H.C. Jiang, Xiaoming Fu, Wanli Cao, Francis C.M. Lau

Research output: Article in book/conference proceedings, Conference contribution, Scientific, peer-reviewed

Abstract

The Bulk Synchronous Parallel (BSP) paradigm has recently gained tremendous importance because of the popularity of computations such as distributed machine learning and graph computation. In a typical BSP job, multiple workers concurrently conduct iterative computations that require frequent synchronization. The workers should therefore be scheduled simultaneously, and their placement on different computing devices can significantly affect performance. Because of these unique characteristics of BSP jobs, simply retrofitting a traditional scheduling discipline is unlikely to yield the desired performance. In this work, we derive SPIN, a novel scheduler designed for BSP jobs with placement-sensitive execution, which aims to minimize the makespan of all jobs. We first prove the approximation hardness of the problem and then present how SPIN solves it with a rounding-based randomized approximation approach. Our analysis indicates that SPIN achieves a good performance guarantee efficiently. Moreover, SPIN is robust to misestimation of job execution time: we theoretically bound the negative impact of such misestimation. We implement SPIN on a production-trace-driven testbed with 40 GPUs. Our extensive experiments show that SPIN can reduce the job makespan and the average job completion time by up to 3× and 4.68×, respectively. Our approach also demonstrates better robustness to execution time misestimation than heuristic baselines.

Original language: English
Title of host publication: INFOCOM 2020 - IEEE Conference on Computer Communications
Publisher: IEEE
Pages: 1053-1062
Number of pages: 10
ISBN (electronic): 9781728164120
DOI - permanent links
Status: Published - July 2020
MoE publication type: A4 Article in conference proceedings
Event: IEEE Conference on Computer Communications - Toronto, Canada
Duration: 6 July 2020 to 9 July 2020
Conference number: 38

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 2020-July
ISSN (print): 0743-166X

Conference

Conference: IEEE Conference on Computer Communications
Abbreviated title: INFOCOM
Country: Canada
City: Toronto
Period: 06/07/2020 to 09/07/2020
