HEMP: High-order entropy minimization for neural network compression

We formulate the entropy of a quantized artificial neural network as a differentiable function that can be added as a regularization term to the cost function minimized by gradient descent. Our formulation scales efficiently beyond the first order and is agnostic to the quantization scheme. The network can then be trained to minimize the entropy of the quantized parameters, so that they can be optimally compressed via entropy coding. We apply our entropy formulation to quantize and compress well-known network architectures over multiple datasets. Our approach compares favorably with similar methods: it benefits from a higher-order entropy estimate, is flexible towards non-uniform quantization (we use Lloyd-Max quantization), scales to any entropy order to be minimized, and compresses efficiently. We show that HEMP works in synergy with other approaches that prune or quantize the model itself, delivering significant benefits in terms of storage size compressibility without harming the model's performance.
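To make the idea of a differentiable entropy term more concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a softmax relaxation of the nearest-level assignment, a fixed set of quantization levels, and first-order entropy only; the function names (`soft_assignments`, `soft_entropy`, `hard_entropy`) and the `temperature` parameter are hypothetical. A higher-order estimate would instead work on joint probabilities of consecutive quantized symbols.

```python
import math

def soft_assignments(w, levels, temperature):
    """Softmax over negative squared distances from weight w to each level.

    Assumption: this relaxation stands in for the hard nearest-level
    assignment so that the result varies smoothly with w.
    """
    logits = [-((w - c) ** 2) / temperature for c in levels]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def soft_entropy(weights, levels, temperature=1e-2):
    """First-order entropy (bits per symbol) of the soft-quantized weights."""
    probs = [0.0] * len(levels)
    for w in weights:
        for k, a in enumerate(soft_assignments(w, levels, temperature)):
            probs[k] += a / len(weights)
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def hard_entropy(weights, levels):
    """First-order entropy (bits per symbol) after nearest-level quantization."""
    counts = [0] * len(levels)
    for w in weights:
        k = min(range(len(levels)), key=lambda i: (w - levels[i]) ** 2)
        counts[k] += 1
    n = len(weights)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

# Hypothetical toy data: three quantization levels and two weight vectors.
levels = [-1.0, 0.0, 1.0]
spread = [-1.0, -0.9, 0.1, 0.0, 0.9, 1.0]    # symbols used about equally
peaked = [0.0, 0.05, -0.05, 0.1, 0.0, 1.0]   # most weights near one level

print(soft_entropy(spread, levels), hard_entropy(spread, levels))
print(soft_entropy(peaked, levels), hard_entropy(peaked, levels))
```

In a training loop, a term like `soft_entropy` would be scaled by a regularization coefficient and added to the task loss; at a low temperature it tracks the entropy of the hard-quantized weights closely, while remaining smooth in the weights.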

Enzo Tartaglione; Stéphane Lathuilière; Attilio Fiandrotti; Marco Cagnazzo; Marco Grangetto

2021

Year: 2021
Journal: Neurocomputing
DOI: https://dx.doi.org/10.1016/j.neucom.2021.07.022
WOS code: WOS:000697941300001
Scopus code: 2-s2.0-85111540438
Publication type: 01.01 - Journal article

Files in this record:

File	Access	Type	License	Size	Format
Neurocomputing_HEMP.pdf	Open access	Preprint (AM - Author's Manuscript - submitted)	Free access	662.6 kB	Adobe PDF
hemp.pdf	Restricted access	Published (Publisher's Version of Record)	Private access - not public	1.03 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3469257
