The selection of predictors in a regression-based method for gap filling in daily temperature datasets

TARDIVO, GIANMARCO; BERTI, ANTONIO
2013

Abstract

Gaps in meteorological time series are a very common problem for long-term studies, for example when automated computation is needed to carry out general climatological analysis. The problem can be addressed with a method for reconstructing missing data; the method must be adapted to the density of the suitable stations and to the climate zone they belong to. Regression-based methods are among the most important approaches used for such reconstructions. A suitable search strategy for identifying the best reconstructing stations is a basic requirement for the proper implementation of this class of methods. This article presents a detailed analysis of how the number of predictors, and the strategy used to search for them, affect a regression-based approach. The multiple correlation between stations, in relation to their distance from the target station, was studied by checking performance with a recently published regression model. The study was carried out on daily minimum, mean and maximum temperature data from a dense network (111 stations within an area of ∼76.5 km radius, on average). For a network of this density, comparing results across different values of the distance from the target station showed that performance was better when the maximum radius within which to start searching for predictors was equal to or greater than 40 km. It follows that the stations used to reconstruct gaps do not strictly need to be close to the target station. Setting the maximum number of predictors at four and the maximum radius at exactly 40 km significantly reduces the number of cases in which the reconstructed values violate the natural order: minimum < mean < maximum temperature.
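The abstract does not spell out the regression model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain multiple linear regression fitted by least squares, a fixed search radius (the paper describes a radius within which the search starts, which may behave differently), and ranking of candidate stations by simple pairwise correlation. All function names, the `series` and `coords` dictionaries, and the minimum-overlap threshold are hypothetical and chosen for illustration; the defaults of 40 km and four predictors are taken from the abstract.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * np.arcsin(np.sqrt(a))

def select_predictors(target, series, coords, max_radius_km=40.0, max_predictors=4):
    """Keep the best-correlated stations found inside the search radius.

    series: dict name -> 1-D float array of daily values (NaN marks a gap)
    coords: dict name -> (latitude, longitude) in degrees
    """
    t = series[target]
    lat0, lon0 = coords[target]
    scored = []
    for name, s in series.items():
        if name == target:
            continue
        lat, lon = coords[name]
        if haversine_km(lat0, lon0, lat, lon) > max_radius_km:
            continue  # station lies outside the search radius
        both = ~np.isnan(t) & ~np.isnan(s)
        if both.sum() < 3:  # assumed minimum overlap, not from the source
            continue
        r = np.corrcoef(t[both], s[both])[0, 1]
        if np.isfinite(r):
            scored.append((r, name))
    scored.sort(reverse=True)  # highest correlation first
    return [name for _, name in scored[:max_predictors]]

def fill_gaps(target, series, predictors):
    """Fill NaN days of the target with a least-squares multiple regression."""
    t = series[target].copy()
    X = np.column_stack([np.ones(len(t))] + [series[p] for p in predictors])
    complete = ~np.isnan(X).any(axis=1)   # days with all predictors present
    train = complete & ~np.isnan(t)       # days usable for fitting
    gaps = complete & np.isnan(t)         # days that can be reconstructed
    coef, *_ = np.linalg.lstsq(X[train], t[train], rcond=None)
    t[gaps] = X[gaps] @ coef
    return t

def count_order_reversals(tmin, tmean, tmax):
    """Count days on which minimum <= mean <= maximum is violated."""
    return int(np.sum((tmin > tmean) | (tmean > tmax)))
```

In this sketch, minimum, mean and maximum temperature would each be reconstructed separately with `fill_gaps`, and `count_order_reversals` would then be applied to the three filled series to see how often the reconstruction breaks the natural ordering.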

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2782881