We develop optimal sleeping and harvesting policies for radio frequency (RF) energy harvesting devices, formalizing the following intuition: when the ambient RF energy is low, devices consume more energy staying awake than they can harvest and should enter sleep mode; when the ambient RF energy is high, by contrast, it is essential to wake up and harvest. To this end, we consider a scenario with intermittent energy arrivals described by a two-state Gilbert-Elliott Markov chain model. The challenge is that the state of the Markov chain can be observed only while harvesting, not while in sleep mode. Two scenarios are studied under this model. In the first scenario, we assume that the transition probabilities of the Markov chain are known and formulate the problem as a partially observable Markov decision process (POMDP). We prove that the optimal policy has a threshold structure and derive the optimal decision parameters. In the practical scenario where the ratio between the reward and the penalty is neither too large nor too small, the POMDP framework and the threshold-based optimal policies are useful for finding non-trivial optimal sleeping times. In the second scenario, we assume that the Markov chain parameters are unknown, formulate the problem as a Bayesian adaptive POMDP, and propose a heuristic posterior sampling algorithm to reduce the computational complexity. The performance of our approaches is demonstrated via numerical examples.
- ambient radio frequency energy
- Bayesian inference
- energy harvesting
- partially observable Markov decision process
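
The belief dynamics behind the threshold structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes a two-state Gilbert-Elliott chain with known per-slot transition probabilities, here given the hypothetical names `p_gb` (good to bad) and `p_bg` (bad to good), and it assumes that the device last observed the bad state before going to sleep. The wake-up rule shown (wake once the belief of being in the good state reaches a threshold) is an illustrative assumption, not the optimal decision parameters derived in the paper.

```python
from typing import Optional


def predict(belief: float, p_gb: float, p_bg: float) -> float:
    """One-slot prediction of P(good) when the state is not observed."""
    # Either the chain was good and stayed good, or it was bad and turned good.
    return belief * (1.0 - p_gb) + (1.0 - belief) * p_bg


def belief_after_sleep(last_state_good: bool, k: int,
                       p_gb: float, p_bg: float) -> float:
    """P(good) k slots after the state was last observed while harvesting."""
    belief = 1.0 if last_state_good else 0.0
    for _ in range(k):
        belief = predict(belief, p_gb, p_bg)
    return belief


def sleep_slots_until(threshold: float, p_gb: float, p_bg: float,
                      max_slots: int = 1000) -> Optional[int]:
    """Smallest number of sleep slots after a bad observation such that
    P(good) >= threshold, or None if that never happens within max_slots.

    The threshold is a hypothetical input; in practice it would come from
    the reward/penalty trade-off of the decision problem.
    """
    belief = 0.0  # the bad state was just observed
    for k in range(1, max_slots + 1):
        belief = predict(belief, p_gb, p_bg)
        if belief >= threshold:
            return k
    return None


if __name__ == "__main__":
    # Illustrative parameters only (not taken from the paper).
    print(sleep_slots_until(0.4, p_gb=0.1, p_bg=0.2))
    print(sleep_slots_until(0.7, p_gb=0.1, p_bg=0.2))
```

With these illustrative parameters the belief rises from 0 toward the stationary probability `p_bg / (p_gb + p_bg)`, so a threshold above that value is never reached and the sketch returns `None`, which corresponds to the trivial case of never waking up.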