Abstract
For scientific simulations, reaching exascale via GPU acceleration brings two new problems: efficient utilization of the heterogeneous hardware and the increasing data volumes of the simulations.
In this work, we tackle these problems using two approaches. First, we present an approach that enables a GPU-accelerated simulation code with existing in-situ data analysis to perform that analysis asynchronously on all of the CPU cores alongside the time integration. This requires moving from a purely message-passing programming model to a hybrid model that includes shared memory. We show how the complications arising from this move can be resolved using static code analysis tools.
Second, we show how task-based execution platforms can be used to perform independent data analysis, and even to run independent simulations, opportunistically with the resources left over from the main simulation. This enables better utilization of the hardware and a larger range of data analyses.
We present benchmarks demonstrating that performing the in-situ data analysis asynchronously on all of the cores speeds up the time to solution by up to a factor of three. Furthermore, our benchmarks demonstrate that opportunistic execution enables better utilization of the CPU resources.
Original language | English |
---|---|
Number of pages | 6 |
Publication | Computer Physics Communications |
Status | In preparation - 7 May 2025 |
Ministry of Education and Culture (OKM) publication type | B1 Article in a scientific journal |