Robust and efficient airplane cockpit video coding leveraging temporal redundancy

Cagnazzo, Marco
2024

Abstract

Airplane cockpit screens consist of virtual instruments where characters, numbers, and graphics are overlaid on a black or natural background. Recording the cockpit screen allows one to log vital plane data, as aircraft manufacturers do not offer direct access to raw data. However, traditional video codecs struggle to preserve character readability at the required low bit-rates. We showed in previous work that large rate-distortion gains can be achieved if the characters are encoded as text rather than as pixels. We now leverage temporal redundancy both to achieve robust character recognition and to improve encoding efficiency. A convolutional neural network is trained for character classification on synthetic samples augmented with occlusions, gaining robustness against overlapping graphics. Further robustness to background occlusions is provided by a probabilistic framework that error-corrects the output of the convolutional neural network. Next, we propose a predictive text coding technique, specifically tailored to text in cockpit videos, that achieves competitive performance compared with commodity lossless methods. Experiments with real cockpit video footage show large rate-distortion gains for the proposed method with respect to three different video compression standards. Notably, the H.264/AVC codec retrofitted with our method outperforms H.265/HEVC-SCC and is competitive with the much more complex H.266/VVC while preserving text and graphics. The entire pipeline described in this work has been implemented at Safran Electronics as an embedded avionics system drawing just 2 W of power thanks to a combination of software and FPGA implementation.
Files in this record:

s11042-024-18755-2.pdf
  Access: open access
  Type: Published (publisher's version)
  License: Creative Commons
  Size: 3.05 MB
  Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3512793
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: n/a