Modern likelihood inference for the maximum/minimum of a bivariate normal vector

ABSTRACT We consider the use of modern likelihood asymptotics in the construction of confidence intervals for the parameter which determines the skewness of the distribution of the maximum/minimum of an exchangeable bivariate normal random vector. Simulation studies were conducted to investigate the accuracy of the proposed methods and to compare them with available alternatives. Accuracy is evaluated in terms of both coverage probability and expected length of the interval. We furthermore illustrate the suitability of our proposals by means of two data sets, consisting of, respectively, measurements taken on the brains of 10 pairs of mono-zygotic twins and measurements of the mineral content of bones in the dominant and non-dominant arms of 25 elderly women.


Introduction
Small sample sizes are rather common for researchers in fields such as biology, genetics, the medical sciences and psychology. Inference based on the classical first-order normal and χ² approximations may then be unreliable. The last four decades have seen the development of so-called higher order likelihood approximations, which require little more effort than is needed for their first-order counterparts while providing highly accurate inferences in small samples. We refer the reader to [1] for a rich collection of realistic examples and case studies which show how to use the new theory. The aim of this paper is to investigate the feasibility of these modern likelihood-based solutions, whose third-order accuracy makes them well suited for small sample sizes, for the analysis of bivariate normally distributed continuous data when interest focuses on the maximum (or minimum) value of correlated observations. Our study furthermore deepens the theoretical understanding of the heretofore available solutions, which were developed for large sample sizes.
Indeed, a great variety of studies, especially in the medical area, are based on the analysis of extreme observations. For instance, extremes coming from the left and right sides of the body are commonplace in comparative studies; see [2] for an illustration concerning the visual acuity of fellow eyes, or [3,4] and references therein for other typical cases. A further instance is twin studies, which, since Sir Francis Galton's [5] seminal paper, have been used extensively for the quantitative ascertainment of genetic and environmental influences. Classical twin designs are a valuable tool in fields such as biomedicine, psychiatry and the behavioural sciences, where the number of available observations is far smaller than is typical in modern twin studies. There are several views on how the degree of concordance between twins should be assessed [6,7]. Here, we promote the use of Azzalini's [8] skew-normal distribution, which revisits [9], the first contribution in this regard we are aware of, from a modern perspective.
We set off from a theoretical result, due to Loperfido, [10] which states that the maximum, or minimum, of two random variables whose joint distribution is exchangeable bivariate normal with correlation coefficient ρ is skew-normally distributed with skewness parameter γ, or −γ, where γ = √{(1 − ρ)/(1 + ρ)}. Recently, Mameli et al., [11] borrowing from Loperfido [10] and Fisher's z transform for ρ, obtained an asymptotic confidence interval for the skewness parameter of the distribution of the maximum/minimum under this framework. Their simulations revealed that the proposed asymptotic confidence interval has similar actual and nominal coverages, though its expected length increases for decreasing sample sizes and for correlation coefficients close to −1. In this paper we explore the performance of confidence intervals for γ obtained from the small-sample solutions recently proposed in [12], in terms of both actual coverage and expected length.
The paper is organized as follows. Section 2 reviews modern likelihood-based inference. The skew-normal distribution and Loperfido's results are introduced in Section 3. Inference on γ is discussed in Section 4. Section 5 analyses the twin data collected by Tramo et al. [13] and the data set presented by Johnson and Wichern [14] using the large- and small-sample solutions of Section 4. The finite-sample properties of the confidence intervals are investigated in Section 6 through simulation. Some concluding remarks are given in Section 7. Table A1 in the appendix provides a list of the most common notations used in the paper together with their definitions.

First-order theory
Let y = (y_1, . . . , y_n) be a sample of size n with joint log-likelihood function l(θ) = l(θ; y), where θ = (ψ, λ) is a k-dimensional parameter, ψ is the scalar parameter of interest, and λ a vector of nuisance parameters of dimension k − 1. Under broad regularity conditions, the maximum likelihood estimate of θ, denoted by θ̂, may be obtained by solving the score equation l_θ(θ; y) = 0, with l_θ(θ; y) = ∂l(θ; y)/∂θ. Let j(θ) = −∂²l(θ; y)/∂θ∂θ⊤ represent the observed information function for θ and j(θ̂) the observed Fisher information. Here ⊤ denotes matrix transposition. The decomposition of the parameter θ into ψ and λ leads to an analogous decomposition of the score vector l_θ(θ; y) and of the observed information function j(θ).
The recommended likelihood pivot for making inference on ψ is the signed likelihood root

r(ψ) = sign(ψ̂ − ψ) [2{l_p(ψ̂) − l_p(ψ)}]^{1/2}. (1)

Here l_p(ψ) = l(θ̂_ψ), with θ̂_ψ = (ψ, λ̂_ψ), is the profile log-likelihood, while λ̂_ψ represents the constrained maximum likelihood estimate obtained by maximizing the log-likelihood l(ψ, λ) with respect to λ holding ψ fixed. The signed likelihood root (1) is asymptotically standard normal up to the order n^{−1/2}, which leads to the first-order (1 − α)100% confidence interval for ψ

{ψ : z_{α/2} ≤ r(ψ) ≤ z_{1−α/2}}, (2)

where z_β, with β ∈ (0, 1), is the βth quantile of the standard normal distribution. The standard normal approximation is satisfactory for large sample sizes, but can be highly unreliable for small values of n. The values of ψ which satisfy Equation (2) can be found numerically by calculating the function r(ψ) on a grid of points ψ, which are then interpolated using a suitable smoothing function. Numerical issues which may arise in the interpolation step can be avoided by excluding values of ψ close to the maximum likelihood estimate ψ̂. The details are given in [1, Section 9.3].
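The construction above can be sketched in a few lines. The following minimal example (in Python for self-containedness; the paper's own code is in R) computes the first-order interval (2) for the rate ψ of an exponential sample, a deliberately simple model with no nuisance parameter: r(ψ) is evaluated from the log-likelihood drop, and the interval endpoints are found by solving r(ψ) = ±z_{1−α/2}, here by bisection rather than the grid interpolation of [1, Section 9.3].

```python
import math

def loglik(psi, n, s):
    # Exponential(rate psi) log-likelihood: l(psi) = n*log(psi) - psi*s
    return n * math.log(psi) - psi * s

def r_pivot(psi, n, s):
    # signed likelihood root r(psi), Equation (1)
    psi_hat = n / s                      # maximum likelihood estimate
    lp_diff = loglik(psi_hat, n, s) - loglik(psi, n, s)
    return math.copysign(math.sqrt(2.0 * max(lp_diff, 0.0)), psi_hat - psi)

def first_order_ci(n, s, z=1.959964):
    # solve r(psi) = +z (lower bound) and r(psi) = -z (upper bound);
    # r is decreasing in psi, so simple bisection applies on each side
    psi_hat = n / s
    def solve(target, lo, hi):
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if r_pivot(mid, n, s) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    lower = solve(z, 1e-8, psi_hat)
    upper = solve(-z, psi_hat, 50 * psi_hat)
    return lower, upper

# n = 10 observations whose sum is s = 12.5 (illustrative numbers)
lo, hi = first_order_ci(10, 12.5)
print(lo, hi)
```

The endpoints returned are exactly the values where the pivot crosses the 2.5% and 97.5% standard normal quantiles.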

Higher order theory
A nowadays broadly known improvement to the signed likelihood root (1), originally introduced by Barndorff-Nielsen, [15] is the modified likelihood root

r*(ψ) = r(ψ) + {1/r(ψ)} log{q(ψ)/r(ψ)}, (3)

whose finite-sample distribution may be approximated by the standard normal up to the order n^{−3/2}. The higher order (1 − α)100% confidence interval for ψ is hence given by

{ψ : z_{α/2} ≤ r*(ψ) ≤ z_{1−α/2}}. (4)

Again, pivot profiling [1, Section 9.3] can be used to identify the upper and lower bounds of the confidence interval. Furthermore, the r*(ψ) pivot is, like its first-order counterpart r(ψ), invariant under interest-respecting re-parametrizations, that is, re-parametrizations of the form τ(θ) = τ(ψ, λ) = (ζ, η) with ζ = ζ(ψ) and η = η(ψ, λ). Several expressions for the correction term q have been proposed, both from the frequentist and the Bayesian perspective. Here, we focus on the developments by Fraser and Reid. [16] Their formula for q is based upon the notion of the 'tangent exponential model' which, at a fixed value of y, denoted y⁰, approximates the true model by a local exponential model with canonical parameter ϕ = ϕ(θ), defined as

ϕ(θ)⊤ = l_{;V}(θ; y⁰). (5)

Here, l_{;V} indicates differentiation of the log-likelihood function in the directions given by the columns of the n × k matrix V. The matrix V can be constructed using a vector of pivotal quantities z = {z_1(y_1, θ), . . . , z_n(y_n, θ)}, where each component z_i(y_i, θ) has a fixed distribution under the model. The matrix V is defined through z by

V = −(∂z/∂y⊤)^{−1} (∂z/∂θ⊤) |_{(y⁰, θ̂⁰)},

where θ̂⁰ is the maximum likelihood estimate at y⁰. The expression for the correction term q is then

q(ψ) = |{ϕ(θ̂) − ϕ(θ̂_ψ), ϕ_λ(θ̂_ψ)}| / |ϕ_θ(θ̂)| × {|j_θθ(θ̂)| / |j_λλ(θ̂_ψ)|}^{1/2}, (6)

where ϕ_θ(θ) = ∂ϕ(θ)/∂θ represents the matrix of partial derivatives of ϕ(θ) with respect to θ, while ϕ_λ(θ) = ∂ϕ(θ)/∂λ identifies the k − 1 columns of this matrix which correspond to the nuisance parameter λ. Analogously, the matrix j_λλ(θ) is the (k − 1) × (k − 1) sub-matrix of the observed information function j(θ) with respect to the nuisance parameter λ.
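For a scalar full exponential family the correction term reduces to the Wald-type quantity q(θ) = {ϕ(θ̂) − ϕ(θ)} j_ϕϕ(θ̂)^{1/2} computed in the canonical parametrization. The sketch below (Python; a toy exponential model of our choosing, not the bivariate normal setting of this paper) applies this to a sample of size n = 5, where the exact tail probability of the sufficient statistic is an Erlang probability and can be used as a benchmark; Φ(r*) reproduces it to roughly four decimal places.

```python
import math

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rstar_tail(n, s, theta):
    """Approximate P(S <= s) for S ~ Gamma(n, rate theta) via r*.

    The exponential model with rate theta is a full exponential family
    with canonical parameter phi = -theta and sufficient statistic s,
    so q = (phi_hat - phi) * j_phiphi(phi_hat)^(1/2).
    """
    theta_hat = n / s                     # MLE of the rate
    # log-likelihood drop l(theta_hat) - l(theta)
    ldiff = n * (math.log(theta_hat / theta) - 1.0 + theta / theta_hat)
    r = math.copysign(math.sqrt(2.0 * ldiff), theta - theta_hat)
    q = (theta - theta_hat) * math.sqrt(n) / theta_hat
    rstar = r + math.log(q / r) / r       # Equation (3)
    return Phi(rstar)

def erlang_cdf(n, s, theta):
    # exact P(S <= s) for integer shape n (Erlang distribution)
    lam = theta * s
    term, acc = 1.0, 1.0
    for k in range(1, n):
        term *= lam / k
        acc += term
    return 1.0 - math.exp(-lam) * acc

print(rstar_tail(5, 5.0, 0.5), erlang_cdf(5, 5.0, 0.5))
```

Even with n = 5 the third-order approximation and the exact probability agree to about 10^(-4), which is the kind of small-sample accuracy exploited throughout the paper.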
The expression of q(ψ) for the case in which no explicit nuisance parametrization is given can be found in [12].

Approximations for Bayesian inference
In the Bayesian setting with a prior density π(θ) for θ, the analogue of the first-order results of Section 2.1 is the asymptotic normality of the posterior density π(θ | y). The Bayesian counterpart of the correction term q in Equation (3), which we will denote by q_B, was obtained by DiCiccio and Martin [17] under the assumption that the nuisance parametrization is given explicitly; it involves the prior π, the profile score function l_p′(ψ) = dl_p(ψ)/dψ and the profile observed information function j_p(ψ) = −d²l_p(ψ)/dψ². Posterior quantiles for the parameter ψ can then be found by exploiting the fact that the posterior distribution function may be approximated to the order n^{−3/2} by the standard normal distribution function Φ(r*_B(ψ_0)), evaluated at

r*_B(ψ) = r(ψ) + {1/r(ψ)} log{q_B(ψ)/r(ψ)}. (8)

Again, pivot profiling provides the upper and lower bounds of the (1 − α)100% credible interval for ψ, given by

{ψ : z_{α/2} ≤ r*_B(ψ) ≤ z_{1−α/2}}. (9)

As for q(ψ), Fraser et al. [12] provide the expression of the correction term q_B(ψ) for the case in which the nuisance parametrization is not given explicitly.

Matching priors.
Given the prior π(θ) for θ, let θ^π_{1−α} denote the (1 − α)th approximate posterior quantile of θ of order n^{−s}, that is, the value of θ for which

Pr^π{θ ≤ θ^π_{1−α} | y} = 1 − α + O(n^{−s}), (10)

with s > 0 and 0 < α < 1. If we also have that

Pr_θ{θ ≤ θ^π_{1−α}} = 1 − α + O(n^{−s}), (11)

with θ^π_{1−α} the upper bound of a frequentist one-sided (1 − α)100% confidence interval, the prior π is called a probability matching prior to the sth order of approximation. For such priors, Bayesian and frequentist inference for the parameter θ are in perfect agreement up to that order.
If s = 1, π(θ) is called a first-order probability matching prior, while for s = 3/2 we have a second-order probability matching prior. Welch and Peers [18] showed that, when no nuisance parameter is present, the unique first-order probability matching prior is Jeffreys' prior.
The same result does not necessarily hold when θ includes a nuisance component λ. For an orthogonal parametrization, [19] proposed to use, in Equation (7), the prior

π(ψ, λ) ∝ i_ψψ(ψ, λ)^{1/2}, (12)

where i_ψψ(ψ, λ) represents the value of the expected Fisher information function corresponding to ψ. The authors call this prior the 'unique prior', as it leads to an approximation of the marginal posterior distribution of ψ accurate to the order n^{−3/2}. When the parametrization θ = (ψ, λ) is not orthogonal, their suggestion is to find an orthogonal parametrization (ψ, η) of the original model for which the prior can be expressed as Equation (12), and then to re-express the prior in the original parametrization (ψ, λ), leading to

π(ψ, λ) ∝ i_ψψ(ψ, η)^{1/2} |∂η/∂λ|. (13)

Here, the indices ψ and λ indicate which sub-blocks of the expected Fisher information function to take. Furthermore, ∂η/∂λ represents the Jacobian of the transformation from (ψ, η) to (ψ, λ).

The skew-normal model
The skew-normal distribution was introduced by Azzalini [8] to define a class of asymmetric parametric models which includes the standard normal as a special case. We say that a continuous random variable Z ∼ SN(γ) is distributed as a skew-normal indexed by the real parameter γ if it has density function

f(z; γ) = 2φ(z)Φ(γz), z ∈ R.

Here φ(·) and Φ(·) denote, respectively, the density and the distribution functions of the standard normal distribution. The class of skew-normal distributions can be widened by including a location parameter μ ∈ R and a scale parameter σ > 0. Thus, if X ∼ SN(γ), then Y = μ + σX is a skew-normal random variable with parameters μ, σ, γ, or Y ∼ SN(μ, σ, γ) for short. Making inference on the skewness parameter is quite challenging, as the expected Fisher information becomes singular as γ → 0; see the solutions proposed by Azzalini and Capitanio [20] and Sartori. [21] Functions for manipulating the skew-normal probability distribution and for fitting it to data are given in the R package sn. [22] We refer the reader to [23,24] for a general treatment of the skew-normal distribution and its extensions.
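Since the density involves only φ and Φ, it is easy to check numerically that it integrates to one for every γ and reduces to the standard normal density at γ = 0. A short sketch (Python, for illustration only; the sn package in R offers production-quality routines):

```python
import math

def sn_pdf(z, gamma):
    # skew-normal density: 2 * phi(z) * Phi(gamma * z)
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(gamma * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi

def integrate(f, a, b, m=4000):
    # composite Simpson rule on [a, b] with m (even) subintervals
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3.0

# the total mass is 1 for any gamma; gamma = 0 recovers N(0, 1)
print(integrate(lambda z: sn_pdf(z, 2.0), -10, 10))
```

At γ = 0 the factor Φ(0) = 1/2 cancels the 2 in front, leaving exactly the standard normal density.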
The focus of this paper is on the distribution of the maximum (or minimum) of an exchangeable bivariate normal random vector. The first contribution in this regard is, to our knowledge, [9], which discusses inference for the minimum value with special emphasis on twin studies. We derive our reference model from Corollary 1 of Loperfido, [10] which, adapted to our notation, reads as follows.
Let X_1 and X_2 be two random variables whose joint distribution is bivariate normal with common mean μ ∈ R, common variance σ² > 0 and correlation coefficient ρ ∈ (−1, 1). Then, for any two real constants h and k ≠ −h, the distribution of h min(X_1, X_2) + k max(X_1, X_2) is skew-normal. It follows that the distribution of max(X_1, X_2) is SN(μ, σ, γ) and that of min(X_1, X_2) is SN(μ, σ, −γ), with γ = √{(1 − ρ)/(1 + ρ)}. The special case of ρ = 0 translates into γ = 1 and γ = −1, respectively.
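A quick Monte Carlo check of this result can be sketched as follows (Python; the construction X_j = √ρ Z_0 + √(1 − ρ) Z_j is an assumption of ours, valid for nonnegative ρ). If max(X_1, X_2) is indeed SN(γ) with μ = 0 and σ = 1, its mean must equal the skew-normal mean √(2/π) γ/√(1 + γ²), which simplifies to √{(1 − ρ)/π}.

```python
import math, random

def gamma_from_rho(rho):
    # skewness parameter of max(X1, X2): sqrt((1 - rho) / (1 + rho))
    return math.sqrt((1.0 - rho) / (1.0 + rho))

def sn_mean(gamma):
    # mean of SN(gamma): sqrt(2/pi) * gamma / sqrt(1 + gamma^2)
    return math.sqrt(2.0 / math.pi) * gamma / math.sqrt(1.0 + gamma * gamma)

random.seed(1)
rho, n = 0.5, 200_000
g = gamma_from_rho(rho)
acc = 0.0
for _ in range(n):
    # exchangeable pair with correlation rho (construction for rho >= 0)
    z0, z1, z2 = (random.gauss(0, 1) for _ in range(3))
    x1 = math.sqrt(rho) * z0 + math.sqrt(1 - rho) * z1
    x2 = math.sqrt(rho) * z0 + math.sqrt(1 - rho) * z2
    acc += max(x1, x2)
print(acc / n, sn_mean(g))
```

With ρ = 0.5 both numbers are close to √(0.5/π) ≈ 0.399, as the corollary predicts.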
Note that a first version of Corollary 1 of Loperfido [10] was already derived by Nagaraja [25] in the special case of the bivariate normal model with zero means, unit variances and correlation coefficient ρ, and later re-discovered by Loperfido. [26]

Inference on the skewness parameter γ
This section outlines known exact and large-sample results for making inference on the skewness parameter γ = √{(1 − ρ)/(1 + ρ)} which characterizes the distribution of the maximum between two correlated normal variables, as well as our newly derived small-sample solutions. In particular, we consider the following three instances: (1) the no-nuisance-parameter case, (2) the complete bivariate normal model, and (3) the equi-correlated bivariate normal model. All methods extend straightforwardly to the case −γ = −√{(1 − ρ)/(1 + ρ)} with the suitable changes.

The bivariate normal model without nuisance parameter
Let Y = (X_1, X_2) be a bivariate normal random vector with common mean 0, common variance 1 and correlation coefficient ρ ∈ (−1, 1). Haddad and Provost [27] proposed a range-based exact confidence interval for ρ. Given a sample (X_{1i}, X_{2i}), i = 1, . . . , n, from Y, the construction of the confidence interval makes use of the two random variables S_+ = Σ_i (X_{1i} + X_{2i})² and S_− = Σ_i (X_{1i} − X_{2i})². Taking advantage of the independence of X_{1i} + X_{2i} and X_{1i} − X_{2i}, along with the fact that X_{1i} + X_{2i} ∼ N(0, 2(1 + ρ)) and X_{1i} − X_{2i} ∼ N(0, 2(1 − ρ)), the authors derive the following pivotal quantity:

{S_+/(1 + ρ)} / {S_−/(1 − ρ)} = γ² S_+/S_− ∼ F_{n,n}, (14)

where F_{n,n} is Fisher's F distribution with (n, n) degrees of freedom. This gives an exact (1 − α)100% confidence interval for the parameter γ of the form

[√{F_{n,n}(α/2) S_−/S_+}, √{F_{n,n}(1 − α/2) S_−/S_+}], (15)

where F_{n,n}(β), with β ∈ (0, 1), represents the βth quantile of the F_{n,n} distribution.
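For even n the F_{n,n} distribution function has a closed form, via the regularized incomplete beta function with integer arguments, which makes the exact interval easy to compute without external libraries. The sketch below is ours (Python; the helper names are hypothetical) and rests on the pivot γ² S_+/S_− ∼ F_{n,n}, which follows from the independence and the normal distributions of the sums and differences noted above.

```python
import math

def f_cdf_nn(x, n):
    """CDF of the F(n, n) distribution for even n, using the closed-form
    regularized incomplete beta I_p(n/2, n/2) with p = x / (1 + x)."""
    assert n % 2 == 0
    p = x / (1.0 + x)
    m = n - 1
    return sum(math.comb(m, j) * p**j * (1 - p)**(m - j)
               for j in range(n // 2, n))

def f_quantile_nn(beta, n, hi=1e6):
    # invert the CDF by bisection
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf_nn(mid, n) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_ci_gamma(x1, x2, alpha=0.05):
    # range-based exact interval: gamma^2 * S_plus / S_minus ~ F(n, n)
    n = len(x1)
    s_plus = sum((a + b) ** 2 for a, b in zip(x1, x2))
    s_minus = sum((a - b) ** 2 for a, b in zip(x1, x2))
    lo = math.sqrt(f_quantile_nn(alpha / 2, n) * s_minus / s_plus)
    hi = math.sqrt(f_quantile_nn(1 - alpha / 2, n) * s_minus / s_plus)
    return lo, hi
```

By symmetry of F_{n,n} around reciprocals, the median quantile is exactly 1, which provides a convenient sanity check of the implementation.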

Small-sample confidence intervals.
Reid [28] provides the expression of the canonical parameter ϕ when interest lies in θ = ρ, the correlation coefficient of the bivariate normal vector Y = (X_1, X_2). The reference model in this case is a (2, 1) curved exponential family. Confidence intervals for the parameter γ are derived from the r(ρ) and r*(ρ) pivots thanks to their invariance under interest-respecting re-parametrizations. Credible intervals for γ are obtained from r*_B(ρ) by exploiting the fact that γ = √{(1 − ρ)/(1 + ρ)} is a one-to-one transformation of ρ.
A key quantity for the determination of the canonical parameter (5) of the approximating tangent exponential model is the vector V. Later, Reid and Fraser [29] proposed an alternative formulation, ϕ̃(θ) = nθ/(1 − θ²), of the canonical parameter. As shown there, both formulations lead to almost the same numerical results as far as the approximation of tail areas is concerned. Turning to the Bayesian setting, we may adopt Jeffreys' prior for ρ, given by

π(ρ) ∝ (1 + ρ²)^{1/2}/(1 − ρ²).

As stated in Section 2.3.1, this prior is a first-order probability matching prior for a scalar parameter in the absence of nuisance parameters.

The complete bivariate normal model
Let Y = (X_1, X_2) again be a bivariate normal random vector, this time with parameters (μ_1, μ_2, σ_1, σ_2, ρ), where (μ_1, σ_1) ∈ R × R⁺ and (μ_2, σ_2) ∈ R × R⁺ are, respectively, the means and standard deviations of X_1 and X_2, and ρ ∈ (−1, 1) their correlation. Instructions on how to construct an approximate confidence interval for ρ, given an i.i.d. sample from Y, can once more be found in [27]. The first step is to standardize the two components X_{1i} and X_{2i} into X*_{1i} and X*_{2i}. An approximate confidence interval for the parameter ρ is then obtained by using the fact that X*_{1i} − X*_{2i} and X*_{1i} + X*_{2i} are nearly independent, with X*_{1i} + X*_{2i} ∼ N(0, 2(1 + ρ)) and X*_{1i} − X*_{2i} ∼ N(0, 2(1 − ρ)), where R denotes the sample correlation coefficient. A second approximate solution to the inferential problem we are interested in can be found in [11]. Because of the difficulty of obtaining the finite-sample distribution of R, inference for ρ is commonly based on the monotonic transformation z(R) = (1/2) log{(1 + R)/(1 − R)}, Fisher's z, which, suitably standardized, is for n > 50 approximately standard normal. This turns into a (1 − α)100% confidence interval for γ, which we will call ACI5. Note that the upper and lower bounds of both the confidence interval (20) proposed by Mameli et al. [11] and the solution (18) derived from the confidence interval for ρ of Haddad and Provost [27] include the multiplying factor √{(1 − R)/(1 + R)}.
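A Fisher-z interval for ρ maps directly onto one for γ, because γ is monotonically decreasing in ρ, so the bounds simply swap. The sketch below (Python) uses the classical large-sample variance 1/(n − 3) for z(R), an assumption on our part, so the numbers need not coincide exactly with the ACI5 interval of [11]:

```python
import math

def fisher_z_ci_gamma(r, n, z=1.959964):
    """Map a Fisher-z interval for rho into one for
    gamma = sqrt((1 - rho) / (1 + rho)).

    Illustrative sketch assuming Var{z(R)} = 1/(n - 3)."""
    zr = 0.5 * math.log((1 + r) / (1 - r))      # Fisher's z transform
    half = z / math.sqrt(n - 3)
    rho_lo, rho_hi = math.tanh(zr - half), math.tanh(zr + half)
    # gamma is decreasing in rho, so the bounds swap
    g = lambda rho: math.sqrt((1 - rho) / (1 + rho))
    return g(rho_hi), g(rho_lo)

print(fisher_z_ci_gamma(0.90, 25))
```

A pleasant by-product of the identity (1 − tanh z)/(1 + tanh z) = exp(−2z) is that each bound reduces to √{(1 − R)/(1 + R)} exp{∓z_{1−α/2}/√(n − 3)}, which exhibits the multiplying factor √{(1 − R)/(1 + R)} mentioned above.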

Small-sample confidence intervals.
The computation of small-sample confidence intervals for the parameter ρ from the frequentist point of view is addressed in [30].
As in the previous section, confidence intervals for the parameter γ are readily derived from the r(ρ) and r*(ρ) pivots thanks to their invariance under interest-respecting re-parametrizations, while credible intervals for γ are obtained from r*_B(ρ) by using the one-to-one transformation γ = √{(1 − ρ)/(1 + ρ)} of ρ.

The equi-correlated bivariate normal model
A first confidence interval for γ, accurate up to the second order, can be derived by using an argument equivalent to the one adopted by Mameli et al. [11] for the complete bivariate normal model. It exploits the fact that Fisher's transformation of ρ̂ is again a normalizing and variance-stabilizing transformation, as is the case for the well-known transformation of the sample correlation. That is, the corresponding pivot (22) is asymptotically standard normal up to the order n⁻¹; see [34]. Expression (22) leads straightforwardly to an asymptotic confidence interval (23) for γ, which we will indicate by ACI3. Further small-sample confidence intervals may be derived by setting, as in Section 4.2, ψ = ρ and λ = (μ, σ) and using the q(ρ) and r*(ρ) expressions from Equations (6) and (3), respectively. The log-likelihood function now characterizes an exponential family. In order to obtain credible intervals from Equation (8) with frequentist coverage, we assume as matching prior π(θ) the 'unique prior' defined in [19]. This requires an orthogonal re-parametrization of the model of the form (ρ, η_1, η_2), with η_1 = μ and η_2 = σ²(1 − ρ²)^{1/2}. The matching prior for θ = (ρ, λ), with λ = (μ, σ), then follows from Equation (13). As before, the quantity i(θ) indicates the expected Fisher information function, while ∂η/∂λ denotes the Jacobian of the transformation from (ρ, η) to (ρ, λ).

'Corpus callosum' surface area in mono-zygotic twins
This first example considers the twin study conducted by Tramo et al., [13] which focuses on similarities of the brains of mono-zygotic twin pairs, such as in forebrain volume, cortical surface area, and callosal area. The data, as available on StatLib, include the measurements on the brains of 10 pairs of mono-zygotic twins, five male and five female. Our variable of interest is the corpus callosum surface area. The purpose of the analysis is to model the distribution of the maximum between the area in the first twin and in the second twin. The order of the twins is determined by birth, though this does not affect the method since the two variables are supposed to be exchangeable. The paired measurements exhibit a strong positive linear relationship, with correlation coefficient amounting to R = 0.90. To ensure that all conditions of Corollary 1 of Loperfido [10] hold, we first standardize the pairs of observations {(x_{11}, x_{21}), . . . , (x_{1n}, x_{2n})} as on page 7. The bivariate Shapiro-Wilk normality test (W = 0.97, p-value = 0.86) supports the hypothesis of bivariate normality of the standardized data. Maximum likelihood yields γ̂ = 0.229.
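The reported point estimate is easy to reproduce from the correlation alone, since the plug-in estimate of the skewness parameter is √{(1 − R)/(1 + R)} (Python sketch; the paper reports maximum likelihood fits, which here are close to the plug-in values):

```python
import math

def gamma_hat(r):
    # plug-in estimate of the skewness parameter: sqrt((1 - r) / (1 + r))
    return math.sqrt((1.0 - r) / (1.0 + r))

print(round(gamma_hat(0.90), 3))   # corpus callosum twins, R = 0.90 -> 0.229
print(round(gamma_hat(0.724), 3))  # ulna example of Section 5.2 (MLE: 0.4004)
```

For R = 0.90 the plug-in value 0.229 matches the reported estimate to the digits shown; for the ulna example the plug-in value 0.400 is close to the reported MLE 0.4004.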
The five 95% confidence intervals for γ, computed using the methods outlined in Section 4.2, are given in Table 1. The interval based on the third-order Bayesian solution r*_B is wider than the confidence intervals obtained from the first-order pivot r, the higher order frequentist pivot r*, the large-sample (HP) confidence interval by Haddad and Provost [27] and the ACI5 confidence interval by Mameli et al. [11] Figure 1 shows how to compute the lower and upper bounds numerically. The intervals based on r (1st), r* (3rd), r*_B (Bayes) and ACI5 can be read off from the intersections of the corresponding pivots with the horizontal black lines, which represent the 2.5% and the 97.5% quantiles of the standard normal distribution. The lower and upper bounds of the HP confidence interval are computed similarly, but this time by referring to the horizontal grey lines, which represent the 2.5% and the 97.5% quantiles of the F distribution with (9, 9) degrees of freedom.

Table 2. Johnson and Wichern [14] mineral content measurements of the ulna data. Lower (LB) and upper (UB) bounds of 95% confidence intervals for the parameter γ. Pivots used, with corresponding confidence intervals: 1st, likelihood root r (2); 3rd, modified likelihood root r* (4); Bayes, Bayesian modified likelihood root r*_B (9); ACI3, approximate pivot (22).

Mineral content measurements in the ulna of elderly women
This second example considers the data available in [14, p. 24], which consist of mineral content measurements of three bones, among them the ulna, in the dominant and non-dominant arm of 25 elderly women. The purpose of the analysis now is to model the distribution of the minimum between the contra-lateral measurements of the ulna. To take account of the bilateral symmetries in the human body, we constrain the model parameters to μ_1 = μ_2 = μ and σ_1 = σ_2 = σ. The bivariate Shapiro-Wilk normality test confirms the hypothesis of bivariate normality (W = 0.9708, p-value = 0.6645). Furthermore, as expected, the two contra-lateral measurements present a strong positive linear relationship, with intraclass correlation ρ̂ = 0.724. The maximum likelihood estimate of −γ is −γ̂ = −0.4004. Table 2 shows the four 95% confidence intervals for the parameter −γ obtained by using the methods outlined in Section 4.3. The ACI3 confidence interval of Equation (23) is shorter than the confidence intervals obtained from the first-order pivot r and the higher order frequentist and Bayesian pivots r* and r*_B, respectively. Figure 2 plots the pivot functions r (1st), r* (3rd), r*_B (Bayes) and ACI3 for the ulna data. As before, intervals can be read off from the intersections of the corresponding pivots with the horizontal lines which represent the 2.5% and the 97.5% quantiles of the standard normal distribution.

Numerical assessment
We designed three simulation studies to assess and compare the finite-sample properties of the methods discussed in this paper. The summary statistics used are: empirical coverage (CP); upper error probability (UE), that is, the percentage of true parameter values falling above the upper bound; lower error probability (LE), that is, the percentage of true parameter values falling below the lower bound; and average length (AL) of the confidence intervals considered in Section 4. All simulations were run using the numerical computing environment R. [35] For every setting, 10,000 replicates were generated for the four sample sizes n = 5, 10, 15, 20. The simulation error amounts to ±0.004. We only report the summary statistics for the smallest sample size n = 5; the omitted results are available as supplementary material.

Simulation 1
We consider a bivariate normal model with zero means, unit variances and unknown correlation ρ, which takes values from −0.9 to +0.9 with step size 0.1. The purpose is to compare, for small sample sizes, the behaviour of the higher order pivot r* with its first-order counterpart r, the Bayesian small-sample solution r*_B and the exact method (14), all of which apply when no nuisance parameter is present.

Simulation 2

We consider the complete bivariate normal model of Section 4.2; the methods compared are the first-order pivot r, the higher order pivots r* and r*_B, and the large-sample HP and ACI5 confidence intervals.

Simulation 3
We explore the finite-sample performance of the different confidence intervals available for the equi-correlated bivariate normal model, again with emphasis on small sample sizes. The pivots used are the higher order solution r*, its first-order counterpart r, the Bayesian competitor r*_B, and the approximate solution (22). We set μ = 0.7, σ = 0.1, while again ρ ∈ {−0.9, −0.8, . . . , 0.8, 0.9}. Note that the simulation set-up borrows from the real data example of Section 5.2, for which the maximum likelihood estimate is θ̂ = (0.724, 0.699, 0.103).

Figure 3 shows the actual coverage of the nominal 95% confidence intervals for γ derived from Equation (15) (exact), and by using the pivots r (1st), r* (3rd) and r*_B (Bayes) for the no-nuisance-parameter case. The higher order likelihood pivots r* and, to a somewhat lesser extent, r*_B outperform their first-order counterpart r, even for the very limited sample sizes considered in the four scenarios of Simulation 1. The differences among the four pivots fade out as the sample size increases. Table 3 summarizes the performance of the nominal 95% confidence intervals for γ derived from the four pivots considered in Simulation 1. With no nuisance parameter present, the corresponding true coverage probabilities are very close to the nominal level, as expected. Inspection of the upper and lower error probabilities reveals that the r* and r*_B pivots also improve in terms of symmetry over r. The exact method produces confidence intervals for γ which are, on average, wider than the confidence intervals obtained from the first-order solution r and the higher order pivots r* and r*_B. The average length of all four confidence intervals is larger for negative values of ρ, and increases as the correlation tends to −1.
Figure 4 reports the actual coverage of the nominal 95% confidence intervals for γ obtained from expressions (17) (HP) and (19) (ACI5), and by using the pivots r (1st), r* (3rd) and r*_B (Bayes). In terms of real coverage, r* again outperforms its first-order counterpart r. It also outperforms the large-sample (HP) proposal by Haddad and Provost [27] and, surprisingly, the Bayesian solution r*_B. The most accurate method is the large-sample confidence interval developed by Mameli et al., [11] although the differences fade out for increasing sample size. Table 4 summarizes the performance of the nominal 95% confidence intervals for γ derived from the five methods considered in Simulation 2. The results reveal that r* is more accurate than r, especially when the sample size is small, because of both its, on average, larger width and its capability of correctly centering the confidence intervals. The r*_B pivot consistently overestimates the real coverage, while guaranteeing symmetry in the tails, because of the, on average, longer confidence intervals it produces. The ACI5 and HP methods lead to confidence intervals for γ which are remarkably asymmetric. Their better performance with respect to, respectively, r* and r may be explained by the, on average, larger widths of the corresponding confidence intervals. For all five methods considered, the expected length becomes larger for negative values of ρ, especially when ρ is close to −1. This is in agreement with [11], who noted the same behaviour for their ACI5 confidence interval.

Table 3. Summary statistics for Simulation 1: bivariate normal with means 0 and variances 1. Empirical coverage (CP), upper (UE) and lower (LE) error probability and average length (AL) of nominal two-sided 95% confidence intervals for γ, for varying values of ρ and sample size n = 5.
Pivots used: likelihood root r (1st); modified likelihood root r* (3rd); Bayesian modified likelihood root r*_B (Bayes); expression (14) by Haddad and Provost [27] (exact). Based on 10,000 replicates; simulation error: ±0.004.

Figure 5 shows the actual coverage of the nominal 95% confidence intervals for γ obtained by using the pivots r (1st), r* (3rd), r*_B (Bayes) and (22) (ACI3) in the equi-correlated bivariate normal model. In terms of real coverage, r* again outperforms its first-order counterpart r, the Bayesian solution r*_B and the approximate solution (23). Table 5 reveals that the small-sample pivots r* and r*_B exhibit more reliable coverage than the confidence intervals obtained from their large-sample counterpart r and the pivot (22), although the Bayesian solution r*_B somewhat overestimates the nominal level. Moreover, r and, to a somewhat lesser extent, (22) exhibit, for small sample sizes, an unsatisfactory behaviour as far as the symmetry of errors is concerned, although the ACI3 confidence intervals seem to improve as soon as n ≥ 10 (see the Supplementary Material). In addition, r*_B and Equation (22) produce confidence intervals with, respectively, larger and shorter expected widths as compared with the intervals obtained from r* and r. The finite-sample differences among the four pivots vanish as the sample size increases. For all four methods, the expected length becomes larger for negative values of ρ, especially when ρ approaches −1.

Concluding remarks
In this paper we investigate the behaviour of modern likelihood-based small-sample procedures for computing confidence intervals for the parameter of skewness which characterizes the distribution of the maximum/minimum of an exchangeable bivariate normal random vector. This distribution may, for example, be used for assessing the degree of concordance of a continuous mono-zygotic twin trait when interest focuses on the pairwise maximum or minimum, as in Section 5.1. Or, it may represent the reference model for making inference on contra-lateral measurements, as in Section 5.2.
Extensive numerical investigation revealed that the higher order frequentist pivot r* is highly accurate, especially for the rather small sample sizes which may be encountered in practice, and for the challenging situation where ρ is close to −1. This is in agreement with the findings of Sun and Wong, [30] though their contribution focuses on ρ and does not consider the custom-tailored statistics of Section 4.2. When no nuisance parameter is present, r* yields confidence intervals which, for practical purposes, may be considered exact. Among the four alternatives available in the presence of nuisance parameters, the only real competitor to r*, in terms of both real coverage and required computational effort, is the ACI5 confidence interval, though it leads to, on average, longer confidence intervals which counterbalance the lack of symmetry in the tails. The potential applicability of the ACI5 method to studies on twins was already put forward in [11]. The construction of confidence sets for γ in the three-parameter bivariate normal model shows that the small-sample frequentist confidence interval based on r* outperforms its three alternatives with respect to both average length and symmetry of errors. The ACI3 confidence interval requires less computational effort than the likelihood-based ones. However, despite the fact that its construction borrows from [11], it performs poorly in terms of both coverage probability and symmetry of error rates. This is especially true for small sample sizes. The straightforward extension of our proposal is to the skew-normal model. For instance, Loperfido's theorem provides the reference models for mono-zygotic twin studies for which information on the pair (X_1, X_2) is missing, and only their maximum (or minimum) value is recorded. This may happen for practical reasons; see [9] for a rather early treatment.
As pointed out there, because healthy mono-zygotic twins share an identical genetic make-up, the time of onset of a particular event in the first twin, such as getting a cold or developing leukaemia, is likely to be closely followed in the second twin, so that only the smaller or larger record may be kept. The same holds true for clinical comparative studies. Furthermore, working with the maximum (or minimum) of two correlated measurements can at times be more reliable than the study of the original values, especially if the measurements of the smaller (or larger) values are more accurate.
A further possible line of research is the investigation of small-sample asymptotic theory in the non-standard setting of finite skew-normal mixtures (see, for instance, [36,37]).

Supplementary Material
The Supplementary Material contains additional results for the simulation studies of Section 6, and the R code used in Examples 5.1 (twins.R) and 5.2 (longitudinal.R). As permission to use the data of Example 5.1 must be requested from the authors of [13], the code in twins.R reproduces the analysis using simulated data from a bivariate normal model with parameter vector θ̂ = (0.900, 7.061, 6.924, 0.905, 0.872).