One of the most efficient non-perturbative methods for the calculation of thermal properties of quantum systems is the Hybrid Monte Carlo algorithm, as evidenced by its use in large-scale lattice quantum chromodynamics calculations. The performance of this algorithm is determined by the speed at which the fermion operator is applied to a given vector, as it is the central operation in the preconditioned conjugate gradient iteration. We study a simple implementation of these operations for the fermion matrix of the Hubbard model in d+1 spacetime dimensions, and report a performance comparison between a 2.66 GHz Intel Xeon E5430 CPU and an NVIDIA Tesla C1060 GPU using double-precision arithmetic. We find speedup factors ranging between 30 and 350 for d=1, and in excess of 40 for d=3. We argue that such speedups are of considerable impact for large-scale simulational studies of quantum many-body systems.