Multi-armed bandits with dependent arms for Cooperative Spectrum Sharing

Lopez Martinez, Mario; Alcaraz, Juan J.; Badia, Leonardo; Zorzi, Michele

doi:10.1109/ICC.2015.7249554

Cooperative Spectrum Sharing (CSS) is an appealing approach for primary users (PUs) to share spectrum with secondary users (SUs), because it increases the transmission range or rate of the PUs. Most previous works focus on developing complex algorithms that may not adapt quickly enough to real-time variations, such as changes in channel availability. Instead, we develop a learning mechanism with low computational overhead that enables a PU to perform CSS under strongly incomplete information. We model how the PU learns which SU to interact with, and which offer to make to it, as a combination of a Multi-Armed Bandit (MAB) and a Markov Decision Process (MDP). Monte-Carlo simulations show that, despite its low computational overhead, the proposed mechanism converges to the optimal solution and significantly outperforms the ε-greedy heuristic. The algorithm can be extended with more sophisticated features while retaining desirable properties such as its fast convergence.
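For readers unfamiliar with the ε-greedy baseline mentioned above, the following is a minimal sketch of it. It is not the authors' combined MAB/MDP mechanism. It assumes, purely for illustration, that each arm corresponds to one SU, that the PU's offer is fixed, and that each interaction yields a Bernoulli reward with a hypothetical success probability given in `true_means`; the function name `epsilon_greedy` and its parameters are hypothetical.

```python
import random


def epsilon_greedy(true_means, epsilon=0.1, rounds=5000, seed=0):
    """Simulate the epsilon-greedy heuristic on a Bernoulli bandit.

    Hypothetical setup: arm i stands for SU i, and true_means[i] is an
    assumed probability that interacting with SU i yields a reward of 1.
    Returns (estimated mean reward per arm, number of pulls per arm).
    """
    rng = random.Random(seed)  # seeded for reproducible simulations
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a Bernoulli reward from the (hypothetical) true mean.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental sample-mean update for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    est, pulls = epsilon_greedy([0.2, 0.5, 0.8])
    print("estimates:", [round(e, 3) for e in est])
    print("pulls:", pulls)
```

Because ε-greedy keeps exploring uniformly at a fixed rate, it continues to spend a share of interactions on clearly inferior arms; this waste is one reason a more structured learning mechanism can converge faster.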