Online learning adaptation strategy for DASH clients

Chiariotti, Federico (Formal Analysis)

Abstract

In this work, we propose an online adaptation logic for Dynamic Adaptive Streaming over HTTP (DASH) clients, in which each client selects the representation that maximizes the long-term expected reward. The reward is defined as a combination of the decoded quality, the quality fluctuations, and the rebuffering events experienced by the user during playback. To solve this problem, we cast the selection of the optimal representations as a Markov Decision Process (MDP) optimization. The system dynamics required by the MDP model are a priori unknown and are therefore learned through a Reinforcement Learning (RL) technique. The developed learning process exploits a parallel learning technique that improves the learning rate and limits sub-optimal choices, leading to a fast yet accurate learning process that quickly converges to high and stable rewards. The efficiency of our controller is therefore not sacrificed for fast convergence. Simulation results show that our algorithm achieves a higher Quality of Experience (QoE) than both existing RL algorithms in the literature and heuristic solutions, as it increases the average QoE and reduces quality fluctuations.
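
To make the approach described in the abstract concrete, the sketch below illustrates a QoE-style reward combining decoded quality, quality fluctuations, and rebuffering, together with a standard tabular Q-learning loop for representation selection. This is a minimal illustration under stated assumptions, not the paper's implementation: the state encoding, the weights, the epsilon-greedy exploration, and all function names are assumptions, and the paper's parallel learning technique is not reproduced here.

```python
import random

# Hypothetical illustration only; weights, state encoding, and
# hyperparameters below are assumptions, not the paper's values.

# Assumed weights trading off quality, fluctuations, and rebuffering.
ALPHA_Q, BETA_FLUCT, GAMMA_REBUF = 1.0, 0.5, 2.0

def reward(quality, prev_quality, rebuffer_time):
    """QoE-style reward: decoded quality, minus a penalty for quality
    fluctuations, minus a penalty for rebuffering time (assumed linear)."""
    return (ALPHA_Q * quality
            - BETA_FLUCT * abs(quality - prev_quality)
            - GAMMA_REBUF * rebuffer_time)

# Tabular Q-learning; a state could be, e.g., a discretized
# (buffer level, last quality, estimated bandwidth) tuple.
LEARNING_RATE, DISCOUNT, EPSILON = 0.1, 0.9, 0.1
Q = {}  # Q[(state, action)] -> estimated long-term expected reward

def choose_representation(state, actions):
    """Epsilon-greedy choice of the next representation index."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def update(state, action, r, next_state, actions):
    """One-step Q-learning update toward r + discounted best next value."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + LEARNING_RATE * (r + DISCOUNT * best_next - old)
```

In this framing, maximizing the discounted sum of such per-segment rewards is the MDP objective the abstract describes; the paper's contribution concerns how the value estimates are learned quickly and reliably, which this plain Q-learning sketch does not capture.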
2016
Proceedings of the 7th International Conference on Multimedia Systems, MMSys 2016
ISBN: 9781450342971
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3255748
Citations
  • PMC: ND
  • Scopus: 62
  • Web of Science (ISI): 44