Speed Up Learning and Network Optimization with Extended Back Propagation

Sperduti, Alessandro
1993

Abstract

Methods to speed up learning in back propagation and to optimize the network architecture have recently been studied. This paper shows how adapting the steepness of the sigmoids during learning treats these two topics within a common framework. The steepness of the sigmoids is adapted by gradient descent. The resulting learning dynamics can be simulated by a standard network with fixed sigmoids and a learning rule whose main component is gradient descent with adaptive learning parameters. A law linking the variation in the weights to the variation in the steepness of the sigmoids is discovered. Optimization of units is obtained by introducing a tendency for the steepness values to decay to zero; this decay corresponds to a decay in the sensitivity of the units. Units with low final sensitivity can be removed after a suitable transformation of the biases of the network. A decreasing initial distribution of the steepness values is suggested as a good compromise between speed of learning and network optimization. Simulations of the proposed procedure show an improvement in the mean convergence rate with respect to standard back propagation and good optimization performance. Several 4-3-1 networks for the four-bit parity problem were discovered.
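As a rough illustration of the procedure the abstract describes, the sketch below trains a 4-3-1 network on the four-bit parity task with a per-unit steepness parameter that is updated by gradient descent alongside the weights, plus a small decay term pulling the steepness values toward zero. This is a minimal reconstruction from the abstract alone, not the paper's algorithm: the names (`lam1`, `lam2`), the learning rates (`eta_w`, `eta_lam`), and the decay constant `mu` are all assumptions.

```python
import numpy as np

# Hypothetical sketch of back propagation with adaptive sigmoid steepness,
# reconstructed from the abstract. Each unit computes sigmoid(lambda * net),
# and lambda is trained by gradient descent with a decay toward zero.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 4-3-1 network for the four-bit parity problem
X = np.array([[int(b) for b in f"{i:04b}"] for i in range(16)], dtype=float)
T = (X.sum(axis=1) % 2).reshape(-1, 1)           # parity targets

W1 = rng.normal(0, 1, (4, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 1, (3, 1)); b2 = np.zeros(1)
lam1 = rng.uniform(0.5, 1.0, 3)                  # per-unit steepness, hidden
lam2 = rng.uniform(0.5, 1.0, 1)                  # per-unit steepness, output

eta_w, eta_lam, mu = 0.5, 0.05, 1e-3             # assumed rates and decay

for epoch in range(20000):
    # forward pass: unit output is sigmoid(lambda * net)
    net1 = X @ W1 + b1
    h = sigmoid(lam1 * net1)
    net2 = h @ W2 + b2
    y = sigmoid(lam2 * net2)

    # backward pass for squared error; for y = sigmoid(lam * net),
    # dy/dnet = lam * y * (1 - y) and dy/dlam = net * y * (1 - y)
    e = y - T
    d2 = e * y * (1 - y)                          # common output factor
    g_W2 = h.T @ (d2 * lam2)
    g_b2 = (d2 * lam2).sum(axis=0)
    g_lam2 = (d2 * net2).sum(axis=0)

    back = (d2 * lam2) @ W2.T                     # error sent to hidden layer
    d1 = back * h * (1 - h)
    g_W1 = X.T @ (d1 * lam1)
    g_b1 = (d1 * lam1).sum(axis=0)
    g_lam1 = (d1 * net1).sum(axis=0)

    W2 -= eta_w * g_W2; b2 -= eta_w * g_b2
    W1 -= eta_w * g_W1; b1 -= eta_w * g_b1
    # steepness update: gradient step plus decay toward zero
    lam2 -= eta_lam * g_lam2 + mu * lam2
    lam1 -= eta_lam * g_lam1 + mu * lam1

print("final hidden steepness:", lam1)           # low values flag prunable units
```

A hidden unit whose steepness has decayed to near zero produces an almost constant output regardless of its input, which is why, as the abstract notes, such a unit can be absorbed into the biases of the next layer and removed.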

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/119257
Citations
  • Scopus: 86
  • Web of Science: 59