Likelihood-based inference for moderate to high dimensional models

Huang, Caizhu

Likelihood-based statistics and their standard asymptotic distributional results offer a general solution for hypothesis testing in parametric models. However, such approximations are not reliable when the dimension $p$ of the parameter increases with the sample size $n$. In such diverging-dimension regimes, higher-order likelihood approximations, such as the directional test \citep{sartori:2014} and modifications of the log-likelihood ratio test \cite{skovgaard:2001}, may still give substantial improvements over standard first-order solutions, even though they were developed for the fixed-$p$ scenario. Taking inspiration from a classification of asymptotic regimes recently introduced by \cite{battey2022some}, we focus on a moderate dimensional asymptotic setting, in which $p/n \to 0$, for instance with $p=O(n^\tau)$ and $\tau \in (0,1)$, and on a high dimensional asymptotic setting, in which $p/n \to \kappa \in (0,1)$. We do not consider ultra-high dimensional settings, in which $p/n$ converges to a constant greater than 1 or diverges. Within several prominent frameworks, we aim to provide reliable solutions via higher-order approximations. In particular, the first part of the thesis examines higher-order likelihood solutions for moderate and high dimensional multivariate normal models. In the high dimensional regime, we prove that the directional $p$-value is exactly uniformly distributed under the null hypothesis for seven prominent hypotheses concerning means and/or covariance matrices of multivariate normal distributions. We also consider a multivariate Behrens-Fisher problem, that is, testing the hypothesis of equality of mean vectors in $k$ independent multivariate normal distributions with different covariance matrices. In this case, the parameter being tested is not a canonical parameter of an exponential family, and therefore we cannot expect the accuracy of the methods to hold in high dimensional regimes.
For this reason, we restrict ourselves to moderate dimensional regimes. Simulation results show that the higher-order approximations outperform the standard first-order solutions. Finally, we study moderate dimensional logistic regression models. We consider three types of hypotheses: where the whole parameter is of interest (i.e. there are no nuisance parameters), where a scalar component of the parameter is of interest, and where a vector component of the parameter is of interest. We give a tentative proof that the directional test gives reliable results provided that $p=o(n^{3/4})$, under a particular Gaussian assumption on the design matrix. Extensive simulation results show that the higher-order approximations perform well when the dimension of the parameter of interest is small or the dimension of the nuisance parameter is large. In this model setting, Skovgaard's modified likelihood ratio statistic is also empirically found to provide very accurate results. A more thorough theoretical study of these statistics in this setting is an interesting direction for future work.
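
The opening claim of the abstract, that first-order approximations become unreliable as $p$ grows relative to $n$, can be illustrated with a small Monte Carlo sketch. The setup is an illustrative assumption and is not taken from the thesis: the hypothesis is $H_0: \Sigma = I_p$ in a multivariate normal model with unknown mean, the statistic is the standard log-likelihood ratio $W = n\{\mathrm{tr}(S) - \log\det S - p\}$ with $S$ the maximum likelihood covariance estimate, and $W$ is compared with a $\chi^2$ quantile on $p(p+1)/2$ degrees of freedom. The function names, the sample size $n = 40$, the dimensions and the Wilson-Hilferty quantile approximation (used only to keep the code dependency-free) are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def log_det_spd(a):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    p = len(a)
    chol = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(p))


def lrt_identity_cov(x):
    """W = n * (tr(S) - log det(S) - p) for H0: Sigma = I, mean unknown."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    centred = [[v - m for v, m in zip(row, means)] for row in x]
    # Maximum likelihood covariance estimate (divisor n, not n - 1).
    s = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1):
            s[j][k] = s[k][j] = sum(r[j] * r[k] for r in centred) / n
    trace = sum(s[j][j] for j in range(p))
    return n * (trace - log_det_spd(s) - p)


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile (assumption:
    used here only to avoid a dependency on a statistics library)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def rejection_rate(n, p, reps=200, level=0.05, seed=1):
    """Empirical rejection rate of the first-order test when H0 is true."""
    rng = random.Random(seed)
    crit = chi2_quantile_wh(1.0 - level, p * (p + 1) / 2)
    hits = 0
    for _ in range(reps):
        x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        hits += lrt_identity_cov(x) > crit
    return hits / reps


if __name__ == "__main__":
    for p in (2, 16):
        print(p, rejection_rate(n=40, p=p))
```

With $n = 40$, one would expect the empirical rejection rate to stay close to the nominal 5% level for $p = 2$ and to rise well above it for $p = 16$, since the data are generated under the null hypothesis in both cases. The sketch covers only the first-order test; it does not implement the directional test or Skovgaard's modification.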

Likelihood-based inference for moderate to high dimensional models / Huang, Caizhu. - (2023 Apr 27).