The current rigid, proprietary licensing of the radio spectrum has created a shortage of wireless bandwidth. Cognitive radios have the potential to enable more flexible spectrum sharing by means of opportunistic spectrum access (OSA). In OSA, secondary spectrum users are allowed to access licensed frequency bands left unused, either temporally or spatially, by the incumbent users. To identify such spectrum opportunities, the secondary users need to employ spectrum sensing together with policies that guide the sensing and access over a possibly wide range of frequencies. These policies need to adapt to changing radio environments and be capable of learning from past observations and actions. In this thesis, theories and methods for reinforcement learning based sensing and access policies are developed. The policies stem from the reinforcement learning and multi-armed bandit literature and employ collaborative sensing to mitigate the effects of fading and interference. The thesis consists of 9 original publications and an introductory part providing an extensive overview of the existing body of work in the area of cognitive radios.

A practical measure for spatial diversity in collaborative sensing is proposed. The diversity measure captures how the gains from collaborative sensing tend to behave in practice: the gains come with diminishing returns as the number of collaborating sensors increases, and they are reduced when the sensors experience correlated observations.

A deterministic frequency hopping code design for collaborative spectrum exploration is developed. The codes are designed to guarantee a desired diversity order, i.e., a desired number of sensors per frequency band, in order to gain from spatial diversity. The codes cover every possible combination of collaborating sensors with the desired diversity in minimum time. The codes can be used for managing spectrum sensing and access during the exploration phases of reinforcement learning based policies.

A novel recency-based sensing policy, derived from the restless multi-armed bandit formulation of OSA, is proposed. The policy is shown to attain asymptotically order-optimal regret in unknown radio environments with both time-independent and Markovian state evolution. Computer simulations illustrate that the proposed policy offers an excellent trade-off between computational complexity and performance.

Several collaborative sensing and access policies based on reinforcement learning are proposed. The policies allow trading off sensing performance against other utilities, such as achieved data rate and energy efficiency. Fast heuristic approximation algorithms are proposed for computing near-optimal sensor assignments during the policies' exploitation periods.
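The recency-based policy itself is specified in the thesis publications and is not reproduced here. As a minimal illustration of how a bandit-style sensing policy balances exploration and exploitation over frequency bands, the sketch below uses the standard UCB1 index rule (a stand-in for the thesis's recency-based rule, not the proposed policy) on simulated i.i.d. channel-availability observations; the channel idle probabilities, horizon, and function name are hypothetical.

```python
import math
import random

def ucb1_sensing_policy(channel_probs, horizon, seed=0):
    """Illustrative UCB1 bandit policy for opportunistic spectrum access.

    channel_probs: true idle probabilities of each channel (unknown to
    the policy; used here only to simulate sensing outcomes).
    Returns how many times each channel was sensed over the horizon.
    """
    rng = random.Random(seed)
    n = len(channel_probs)
    counts = [0] * n      # times each channel has been sensed
    means = [0.0] * n     # empirical idle-probability estimates

    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1   # sense every channel once to initialise
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(range(n), key=lambda i: means[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        idle = 1.0 if rng.random() < channel_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (idle - means[arm]) / counts[arm]
    return counts

counts = ucb1_sensing_policy([0.2, 0.5, 0.9], horizon=5000)
# The channel with the highest idle probability is sensed most often;
# the others are revisited only often enough to keep regret logarithmic.
```

The logarithmic exploration bonus shrinks for frequently sensed channels, which is what yields the sublinear (order-optimal, for UCB1 in the i.i.d. setting) regret growth discussed in the abstract.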
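The fast assignment heuristics of the thesis are likewise detailed in the publications. The following is a generic greedy sketch of sensor-to-channel assignment under assumed per-sensor miss probabilities; it captures the diminishing returns of adding sensors to the same band, since each extra sensor multiplies the channel's residual miss probability. All inputs (`miss_prob`, `capacity`) and the function name are hypothetical.

```python
def greedy_sensor_assignment(miss_prob, capacity):
    """Greedy heuristic: assign each sensor to the channel where it adds
    the largest marginal gain in collaborative detection probability,
    modelled as 1 - prod(per-sensor miss probabilities).

    miss_prob[s][c]: miss probability of sensor s on channel c
    (hypothetical input; a real system would estimate these).
    capacity: maximum sensors per channel (the desired diversity order).
    """
    n_sensors = len(miss_prob)
    n_channels = len(miss_prob[0])
    channel_miss = [1.0] * n_channels   # product of miss probs so far
    load = [0] * n_channels
    assignment = [None] * n_sensors

    for s in range(n_sensors):
        best_c, best_gain = None, -1.0
        for c in range(n_channels):
            if load[c] >= capacity:
                continue
            # marginal increase in detection probability on channel c
            gain = channel_miss[c] * (1.0 - miss_prob[s][c])
            if gain > best_gain:
                best_c, best_gain = c, gain
        assignment[s] = best_c
        channel_miss[best_c] *= miss_prob[s][best_c]
        load[best_c] += 1
    return assignment

assignment = greedy_sensor_assignment(
    [[0.3, 0.6], [0.5, 0.2], [0.4, 0.4], [0.9, 0.1]], capacity=2)
# -> [0, 1, 0, 1]
```

Because the per-channel objective 1 - prod(miss) is submodular in the sensor set, this kind of greedy pass is a common way to get near-optimal assignments cheaply, which is the role such heuristics play in the exploitation periods described above.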
Translated title of the contribution: Koneoppimismenetelmiä spektrin kartoittamiseen ja hyödyntämiseen
Publication status: Published - 2016
MoE publication type: G5 Doctoral dissertation (article)
- cognitive radio
- multi-armed bandit
- reinforcement learning