Efficient classification techniques are in great demand in observational astronomy, which is probably the oldest area of application of statistical science to the study of nature. Rapid technological progress has brought about a paradigm shift in which rich and/or massive data sets are becoming the dominant source of information. A typical sky survey collects terabytes of data each night on millions of candidate objects (stars, galaxies, etc.), each of which may have many observed properties. Success stories include the Sloan Digital Sky Survey, whose abundance of very high quality photometric data allowed astronomers to pinpoint the positions of over one million galaxies, and NASA's Fermi space telescope, thanks to which we are still deepening our knowledge of high-energy phenomena. Discoveries in these fields are of the utmost relevance, as they carry a wealth of information about the history of our Galaxy and affect the understanding of our own Solar System. However, collecting these data sets is just the beginning of the process. A key step in astronomical breakthrough research is the meaningful analysis of the collected information. The complex structure of the data, combined with the tremendous number of observations available, poses a considerable statistical challenge from both the theoretical and the computational viewpoint. Statisticians must work at the forefront of this area. This thesis aims to enlarge the set of clustering and classification techniques available for astronomy, providing new flexible solutions to cope with the requirements imposed by the vast diversity of celestial objects. In the first part, we propose an innovative approach, based on Bayesian nonparametric methods, to the signal extraction of high-energy astronomical sources immersed in strong background contamination.
Our model simultaneously clusters the photons and estimates the number of sources using a Dirichlet Process (DP) mixture, while separating them from the irregularly shaped background, which is reconstructed using a novel Bayesian nonparametric technique based on B-spline functions. The result is a hierarchical model in the class of mixtures of DP mixtures. We provide a suitable Markov Chain Monte Carlo algorithm to conduct the inference, and a post-processing procedure to quantify the information coming from the discovered clusters. Finally, we test the ability of the model to locate and extract the signal of the sources using several artificial datasets, and propose a further application to the Fermi LAT map. In the second part, we propose a novel statistical approach to separate the different levels of brightness in the emission activity of high-energy sources. The method analyses the variation of flux over time and clusters the observations to infer the latent states of variability, which correspond to distinct physical mechanisms of the source. We model the transitions among the latent states with a continuous-time Markov chain, and the flux measurements in each state with an Ornstein-Uhlenbeck (OU) process. The resulting technique belongs to the class of continuous-time hidden Markov models (HMMs) and can be fitted via maximum likelihood estimation using an efficient EM algorithm. In addition, we assess the properties of the model with a suitable bootstrap algorithm. Finally, we illustrate the efficiency of the method on a light curve from a blazar discovered by the Fermi LAT.
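
As a rough illustration of the model class described in the second part, and not the thesis's own implementation, the following sketch simulates a light curve from a two-state continuous-time Markov chain whose states each carry an OU process. Everything named here is an assumption made for illustration: the restriction to two states, the function names, the parameter values, and the simplification that the OU parameters over each interval are those of the state at the interval's end (the latent chain may in fact switch between observation times). Fitting by EM and the bootstrap assessment are not shown.

```python
import math
import random


def two_state_transition(a, b, dt):
    """Closed-form exp(Q*dt) for the generator Q = [[-a, a], [b, -b]]."""
    e = math.exp(-(a + b) * dt)
    return [[(b + a * e) / (a + b), (a - a * e) / (a + b)],
            [(b - b * e) / (a + b), (a + b * e) / (a + b)]]


def ou_step(x, dt, mu, theta, sigma, rng):
    """Exact OU transition for dX = theta*(mu - X)dt + sigma*dW over a gap dt."""
    mean = mu + (x - mu) * math.exp(-theta * dt)
    var = sigma ** 2 / (2 * theta) * (1 - math.exp(-2 * theta * dt))
    return rng.gauss(mean, math.sqrt(var))


def simulate_light_curve(times, a, b, mu, theta, sigma, seed=0):
    """Simulate latent states and flux at (possibly irregular) observation times."""
    rng = random.Random(seed)
    states = [0]          # hypothetical choice: start in state 0
    flux = [mu[0]]        # ... at that state's long-run mean
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        P = two_state_transition(a, b, dt)
        # Draw the next latent state from the row of the current state.
        s = 1 if rng.random() < P[states[-1]][1] else 0
        # Simplification: use the new state's OU parameters for the whole gap.
        x = ou_step(flux[-1], dt, mu[s], theta[s], sigma[s], rng)
        states.append(s)
        flux.append(x)
    return states, flux


# Irregular observation times, as is typical for astronomical light curves.
time_rng = random.Random(1)
times = sorted(time_rng.uniform(0.0, 100.0) for _ in range(200))

# Illustrative parameters: a quiet state (0) and a brighter, noisier state (1).
states, flux = simulate_light_curve(
    times, a=0.05, b=0.10,
    mu=(1.0, 5.0), theta=(0.5, 0.5), sigma=(0.2, 1.0),
)
```

Using the exact OU transition density, rather than an Euler discretisation, keeps the simulation valid for unevenly spaced observation times, which is the reason a continuous-time formulation is attractive in this setting.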

Advances in Mixture Modelling for Model-Based Clustering: Two Case Studies in Astronomy / Sottosanti, Andrea. - (2019 Dec 02).

Advances in Mixture Modelling for Model-Based Clustering: Two Case Studies in Astronomy

Sottosanti, Andrea
2019

2 Dec 2019
Astrostatistics, Bayesian Nonparametrics, Mixture Modelling, Model-Based Clustering, Signal Extraction
Files in this item:
Sottosanti_Andrea_tesi.pdf
Access: open access
Type: Doctoral thesis
License: Not specified
Size: 19.5 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3423316