Integrated Likelihood Inference in Small Sample Meta-analysis for Continuous Outcomes

. This paper proposes the use of the integrated likelihood for inference on the mean effect in small-sample meta-analysis for continuous outcomes. The method eliminates the nuisance parameters given by variance components through integration with respect to a suitable weight function, with no need to estimate them. The integrated likelihood approach takes into proper account the estimation uncertainty of within-study variances, thus providing conﬁdence intervals with empirical coverage closer to nominal levels than standard likelihood methods. The improve- ment is remarkable when either (i) the number of studies is small to moderate or (ii) the small sample size of the studies does not allow to consider the within-study variances as known, as common in applications. Moreover, the use of the integrated likelihood avoids numerical pitfalls related to the estimation of variance components which can affect alternative likelihood approaches. The proposed methodology is illustrated via simulation and applied to a meta-analysis study in nutritional science.


Introduction
Meta-analysis is a diffuse approach to combine evidence from different studies about the same issue of interest. The usage of meta-analysis pervades almost any area of research, such as, biological sciences, medicine, epidemiology and, more recently, economics and behavioural investigations (Roberts, 2005;Sutton, 2008).
Meta-analysis is typically performed by specifying an appropriate random-effects model, with the random component associated to the different studies providing summary information about the common issue of interest. Inference is then carried out by relying on the procedure by DerSimonian & Laird (1986), traditionally, or on more recent likelihood approaches developed either from a frequentist or a Bayesian perspective (van Houwelingen et al., 2002). The reliability of the inferential conclusions is strictly related to the amount of information available from the meta-analysis studies. This paper investigates the problem of small sample inference as a consequence of small sample size for the studies included in the meta-analysis or as a consequence of a small number of studies recruited in the meta-analysis.
A common strategy in meta-analysis assumes that the within-study variances provided by each study are known and equal to the variances associated to the estimates of the mean effect (van Houwelingen et al., 2002, Section 3). The justification is that the sample size of each study is large enough to guarantee a good estimate of the true within-study variance, with little or no impact on the results. Actually, such an assumption is justifiable in case of large studies, as, for example, many medical or epidemiological investigations. Conversely, standard inference performed on studies of small sample size can provide misleading results, if the uncertainty related to variance estimation is not properly taken into account. Several authors pointed out the relevance of the problem, for example, Hardy & Thompson (1996), Brockwell & Gordon (2001), Sidik & Jonkman (2007), and Sánchez-Meca & Marín-Martínez (2008), with the suspicion that consequences could affect the variance estimator of the mean effect and related inferential procedures. Simulations by Böhning et al. (2002) illustrate that the DerSimonian and Laird estimator of the between-study variance can be prone to considerable bias when estimates of the within-study variances are employed, and Jackson & Bowden (2009) show that changes in the distribution of the within-study variances can notably affect the performance of the quantile approximation method by Brockwell & Gordon (2007). Several solutions have been proposed in the literature to face the problem. Böhning et al. (2002) rely on population-averaged study specific variances, although this is not generally applicable. For meta-analysis of standardized differences, Malzahn et al. (2000) take into account within-study variance estimates when proposing a nonparametric estimation of the betweenstudy variance, while Di Gessa (2008) investigates the use of shrinkage approaches for variance estimation. Johnson & Huedo-Medina (2013) show the advantages of using the standardized mean difference in place of the unstandardized version as a tool to incorporate within-study variances directly in the effect measure. The problem of the estimation of the within-study variances has been recently faced by Sharma & Mathew (2011) with reference to the consensus mean in inter-laboratory studies. Although Sharma & Mathew (2011) never directly refer to meta-analysis and related terminology, the framework they focus on is analogous. For the purpose of investigation on the consensus mean in inter-laboratory studies, Sharma & Mathew (2011) propose to improve on likelihood results by applying higher-order asymptotics via second-order likelihood ratio statistic (Skovgaard, 1996). Nevertheless, the approach can suffer from some computational problems, as illustrated in this paper, requiring much care for its application.
Small-sample inference in meta-analysis can also arise as a consequence of the limited number of studies recruited, a concern which has been raised by several authors in the literature (e.g. Hardy & Thompson, 1996;Normand, 1999;van Houwelingen et al., 2002). Within a likelihood-based approach, for example, the small number of studies can give rise to inaccurate inferential conclusions when relying on first-order approximations, such as the 2 distribution for the likelihood ratio statistic. Guolo (2012) exploits the theory of higherorder asymptotics (Severini, 2000) to refine first-order likelihood solutions in meta-analysis, when the within-study variances are assumed to be known. The attention is paid to the Skovgaard's second-order statistic, which is implemented within the R (R Core Team, 2015) package metaLik (Guolo & Varin, 2012). In this paper, we consider the problem of small study inference in meta-analysis for continuous outcomes when information is available as summary data. We suggest to perform the meta-analysis by using the integrated likelihood (Severini, 2000, Section 8.4). The approach replaces the elimination of the nuisance parameters given by variance components through maximization with their elimination by integration. We show that this method provides a good accuracy of inferential results, and it is free of numerical pitfalls. The proposed approach is evaluated through a simulation study covering scenarios of practical interest, and it is applied to a meta-analysis study in nutritional science.

Likelihood inference
Consider a meta-analysis of n independent studies about a common effectˇ. Let Y i be the summary measure ofˇobtained from study i , i D 1; : : : ; n, such as, the mean difference. The classical model for meta-analysis is the random effects model (DerSimonian & Laird, 1986) whereˇi is the random effects component associated to each study Variance components are the within-study variances 2 i , i D 1; : : : ; n, and the betweenstudy variance 2 . Thus, marginally, Y i Normal.ˇ; 2 i C 2 /: The traditional approach to meta-analysis is based on the assumption that each within-study variance 2 i is known and equal to the variance estimate reported in the i-th study. This assumption is justifiable when the sample size of each study included in the meta-analysis is large. Otherwise, inference can provide misleading results, if the uncertainty related to the variance estimation is not properly taken into account. Let S 2 i denote the measure of the within-study variance 2 i obtained from study i having f i degrees of freedom, with S 2 i following a scaled chisquare distribution, S 2 i f i = 2 i 2 f i . For example, f i is equal to n i 1 where n i is the sample size of each study, in case of a single group or a paired t test, or f i is equal to n i1 C n i2 2 in case of a two-group comparison, with n i1 and n i2 denoting the sample size of each group in study i . When the outcome is derived from the analysis of covariance, then f i D n i p i 1, if the number of observations n i and the number of covariates p i for study i are available.
According to the specifications earlier, the log likelihood function for the .nC2/-dimensional parameter vector Â D .ˇ; ; 1 ; : : : ; n / T is Inferential interest is usually focused on the mean effectˇ, while variance components are considered as nuisance parameters. Accordingly, we can partition Â into Â D .ˇ; / T , where D . ; 1 ; : : : ; n / T . Let O Â D . Ǒ ; O / T denote the maximum likelihood estimate (MLE) of Â and let O ˇd enote the constrained MLE of for a given value ofˇ. Let`P.ˇ/ indicate the corresponding profile log likelihood forˇ,`P.ˇ/ D`.ˇ; O ˇ/ . Inference onˇcan be based on the signed profile log likelihood ratio statistic which is asymptotically distributed as a standard normal up to first-order error, under mild regularity conditions (Severini, 2000, Section 4.4).
Despite the feasibility, a serious drawback of first-order asymptotic results is that they can be inaccurate in case of a large dimension of the nuisance parameter compared with the available information. In meta-analysis, this corresponds to either small study sizes, which leads to imprecise estimation of the within-study variances, or to a small number of studies, which lead to imprecise estimation of the between-study variance. To face the problem, it is preferable to resort to the theory of likelihood asymptotics (Severini, 2000), which makes several solutions available for the task. Skovgaard (1996) proposes to base inference on a scalar component of interest on statistic which is asymptotically standard normally distributed up to second-order error. The component u.ˇ/ included in (2) is a function of the observed and the expected information matrices and of covariances of likelihood quantities, evaluated at the MLE and the constrained MLE. Skovgaard's statistic is well defined for a wide class of sufficiently regular parametric models, and it is invariant with respect to interest-respecting re-parameterizations. Guolo (2012) investigates the applicability of Skovgaard's statistic in meta-analysis and meta-regression problems, following the convention of assuming known within-study variances. The approach is satisfactory in improving on the accuracy of standard first-order likelihood analysis when the sample size n is small to moderate. Sharma & Mathew (2011) examine the performance of Skovgaard's statistic in inter-laboratory studies where interest relies on the consensus mean, assuming unknown different within-laboratory variances. The simulation studies performed highlight a better accuracy of results based on r P with respect to its first-order counterpart. The computational difficulties and numerical instabilities encountered by Skovgaard's approach in case of small sample sizes (Section 5) are mainly due to the fact that the nuisance parameters are eliminated via maximization. The same difficulty is also shared by alternative solutions like the modified profile likelihood (Severini, 2000, Chapter 9). For an illustration, see Vangel & Rukhin (1999), where an example of the profile likelihood for .ˇ; / T exhibiting some local maxima is provided. A different route is provided by the integrated likelihood (Severini, 2007), which eliminates the nuisance parameters by integration of the likelihood with respect to a weight function. For the model of interest here, the integrated log likelihood function forˇis where L.Â/ D exp¹`.Â/º, . jˇ/ denotes a weight function for for fixedˇand 2 . Once the integrated log likelihood is obtained, it can be used as a standard log likelihood function for inference. For example, let Ň be the estimate ofˇobtained from the maximization of`I nt .ˇ/.
Then, inference onˇcan be performed via the signed integrated log likelihood ratio statistic (Severini, 2010) Advantages of the integrated likelihood approach include better accuracy of the inferential results if compared with those from r P and reduced numerical instabilities in case of large dimension of (Severini, 2010). The main drawback is the specification of the weight function . jˇ/. Severini (2007) provides several suggestions about how to choose the weight function in order to make the integrated likelihood share the frequentist properties of a genuine likelihood function and be suitable for non-Bayesian inference. Possible choices are discussed in Section 3.

Integrated likelihood in meta-analysis
In our context, the parameter space for D . ; 1 ; : : : ; n / T is D OE0; 1/ .0; 1/ n , and the integrated log likelihood has the following form Int .ˇ/ D log The usage of (4) requires to overcome two main obstacles. The first one is the choice of the weight function for the nuisance parameter vector , for fixedˇ. The second obstacle pertains to the computation of`I nt .ˇ/.
For the choice of the weight function for , we can follow the recommendations by Severini (2007;2010). He advocates the use of an orthogonal parameterization of the nuisance parameters and the consequent choice of the weight function for free ofˇ. From a frequentist perspective, he shows that the best inferential results are achieved when the model parameterization is expressed so that the nuisance parameter is strongly unrelated toˇ. A nuisance parameter is strongly unrelated toˇif E¹` .ˇ; /Iˇ0; 0 ºˇ.ˇ0 ; 0 /D. Ǒ ; / D 0; where` is the score vector for and the expected value is computed before the evaluation at .ˇ0; 0 / T D . Ǒ ; / T . The function D .ˇ; I Ǒ / defines a data-dependent parameterization, and is called the zero-score-expectation parameter. When such a parameterization is employed, the resulting integrated likelihood is a high-order approximation to the modified profile likelihood (Severini, 2000, Section 9.3), which achieves optimal elimination of the nuisance parameters (Severini, 2007). With the zero-score-expectation parameterization, the choice of the weight function for the nuisance parameter becomes largely inconsequential. With reference to model (1), the nuisance parameter vector is orthogonal toˇ, that is, the corre-spondingˇ -block of the expected Fisher information is nil. Moreover, parametersˇand i are also strongly unrelated; indeed, O 2 i : D s 2 i . Let D . ; ı 1 ; : : : ; ı n / T , then 2 D 2 C . Ǒ ˇ/ 2 ; ı i D i ; iD 1; : : : ; n: Details about the derivation of the zero-score-expectation parameterization are reported in the Supporting Information. Once a strongly unrelated parameterization for the nuisance parameters is obtained, the weight function for and 1 ; : : : ; n can be chosen with some liberty. A simple choice is given by separate weights for all the components of , with . / / 1 and . i / / 1= k i , for fixed k. In the following, we set k D 1, after checking that different choices of k would lead to similar results. The choice of the weight function, coupled with the algebraic form of the score function forˇ,`ˇ.ˇ; /, implies that the signed integrated log likelihood ratio statistic r Int .ˇ/ in (3) is asymptotically standard normally distributed with high accuracy (Severini, 2010, Section 5). The latter property is shared also by the integrated likelihood computed using the original parameterization, provided that similar weights, free ofˇ, are adopted.
Computation of`I nt .ˇ/ is less demanding than it might seem at first sight. Indeed, under the assumption of independent meta-analysis information, L.ˇ; ; 1 ; : : : ; n / in (4) is the product of n similar terms, which can be readily recovered from formula (1). The aforementioned choice of the weight function for with separate components implies that`I nt .ˇ/ can be written as where g i .ˇ; / D R 1 0 L.ˇ; ; i / . i /d i and L.ˇ; ; i / is the likelihood term for study i. In other words, each of the n integrals g i .ˇ; / and the main integral in (6) amount to onedimensional integrals that can be approximated via standard numerical methods. In our study, the inner integrals for g i .ˇ; / in (6) is computed by adaptive Gauss-Kronrod quadrature, using the C function Rdqags, which is the port to the R library of C functions of the QUADPACK routine dqags (Piessens et al., 1983). The outer integral is computed by a standard Gaussian

R. Bellio and A. Guolo
Scand J Statist 43 quadrature. The resulting integrated log likelihood is quite a smooth function ofˇin all the experiments performed, and its maximization by means of a derivative-free optimizer is usually not an issue.

Cocoa intake and blood pressure reduction
Increasing consumption of sources of polyphenols is recommended by physicians as coadjutant therapy to face hypertension and prevent cardiovascular risks. Taubert et al. (2007) perform a meta-analysis of randomized controlled studies to evaluate blood pressure-lowering effects of cocoa and tea intake, which represent a high proportion of total polyphenol intake in Western countries. We focus on a portion of the data about the effectiveness on lowering diastolic blood pressure after 2 weeks of cocoa consumption. Data refer to five studies, with sample size ranging from 21 to 41. The estimate of the effect provided by each study is the mean difference in diastolic blood pressure before and after the cocoa consumption. Figure 1, left panel, reports the forest plot of the data, that is, a graphical display of the information provided by each study in the meta-analysis. Information includes the estimated mean difference from each study together with the associated 95% confidence interval. The summary estimate obtained from the likelihood analysis based on r P is added.
Likelihood approach provides an estimate of the treatment effect equal to 2:799 (s.e. 1.009), which is found to be significant, given the P-value equal to 0.030 associated to r P . The associated 95% confidence interval for the parameter is . 5:262; 0:397/. A comparable result is obtained by the standard likelihood approach assuming known within-study variances. The integrated likelihood approach based on the zero-score-expectation parameterization suggests a non-significant effect of cocoa consumption on lowering diastolic blood pressure, with the estimate of the treatment effect equal to 2:805 (s.e. 1.270) and the P-value for the effectiveness of the treatment equal to 0.071. The integrated likelihood accounts for the variability of the estimated within-study variances, and the associated 95% confidence interval for the parameter is wider, equal to . 6:027; 0:349/. The profile log likelihood function and the integrated log likelihood function are compared in Fig. 1, right panel.

Simulation studies
The performance of the integrated likelihood has been investigated via simulation.

Experiment based on the design of cocoa data
As a first experiment, we consider the same setting of the cocoa data and generate 10 000 data sets from the model of Section 2, with parameters equal to the maximum likelihood estimates, namely,ˇD 2:8; 2 D 4:27; . 2 1 ; : : : ; 2 5 / T D .0:34; 0:86; 1:37; 1:38; 0:29/ T . Inference oň is based on the signed profile log likelihood ratio statistic r P , Skovgaard's statistic r P and their counterparts assuming known within-study variances, r P,k and r P,k , respectively. The solutions are compared with the following specifications of the signed integrated log likelihood ratio statistic: r Int , based on (4) expressed in the original parameterization, with . / / 1 and . i / / 1= i ; Q r Int , based on the re-parameterized model using the zero-score-expectation parameter , with . / / 1 and . i / / 1= i ; N r Int , based on the re-parameterized model using the zero-score-expectation parameter , with . / / 1 and . i / / 1= i . Here, Ǒ in is replaced by the maximizer of (4) expressed in the original parameterization.
The latter choice has the virtue of not requiring the MLE ofˇ, thus bypassing all the numerical problems related to likelihood maximization.
The simulation studies evaluate the empirical one-sided rejection rates for the competing approaches according to different nominal levels. Results are reported in Table 1.
The standard first-order statistic r P provides empirical one-sided rejection rates which are far from the target levels, with coverages of confidence intervals substantially below the nominal level. An improvement over first-order results is provided by Skovgaard's statistic, although such an amelioration is not free of pitfalls. From a practical point of view, the evaluation of r P suffers from numerical instabilities when estimating the between-study variance in about 10% of the simulation trials. In most of these cases, either the MLE or the constrained MLE of is close to zero and the -part of the observed Fisher information matrices entering the definition of the adjustment u.ˇ/ in (2) fails to be definite positive. In such cases, r P is not Table 1. Empirical one-sided rejection rates for the signed profile log likelihood ratio statistic r P , Skovgaard's statistic r P , their counterparts r P,k and r P,k assuming known within-study variances and different specifications of the signed integrated log likelihood ratio statistic, r Int , Q r Int and r Int , based on 10 000 replicates Rejection rates r P r P r P,k r P,k r Int computable. The same problems are also experienced by Sharma & Mathew (2011) when applying Skovgaard's statistic in inter-laboratory studies with unknown within-laboratory variances. They suggest some practical measures to face the computational difficulties related to the evaluation of r P in case of limited data. For example, the observed information matrix, when not positive definite, can be substituted with a positive quantity, for example, the expected information matrix. After such a careful computation, the resulting statistic is always well defined, although the theoretical consequences of the various modifications are not clear. As a final note, the r P statistic can be unstable when the value under testing is close to Ǒ , thus requiring some further adjustments. See, for example, the discussion in Fraser et al. (2003). The results provided by r P,k are much more satisfactory than those from r P and r P,k , showing that for study size between 21 and 41, the effect of taking the within-study variances as fixed is minor.
The use of the integrated likelihood approach provides a substantial improvement of the results accuracy with respect to r P , and overall, it is the most accurate solution. Moreover, the application of the approach is not affected by any computational inconvenience, especially when N r Int is employed. Empirical rejection rates are close to the target levels, with no appreciable difference among the integrated likelihood specifications (Table 1).

Experiments based on a planned design
For a more systematic investigation, we design a study with three experimental factors, namely, (i) the number of studies n 2 ¹5; 20º; (ii) the study degrees of freedom f i 2 ¹9; 24º; and (iii) the between-study variance 2 2 ¹0:1; 0:5; 2º. For each combination of the experimental factors, 10 000 data sets are generated following the model specification given in Section 2 with param-etersˇD 1 and 2 i generated from a uniform variable on OE0:1; 2:0. Inference onˇis performed using statistics r P , r P,k and the three statistics based on the integrated likelihood introduced in §5.1. Skovgaard's statistic r P is not used for comparison because of the overwhelming percentage of data sets with numerical problems encountered in the simulation (up to over 70% for some of the experiments), which makes the results unreliable. Statistic r P,k is not considered as well, as in the previous simulation study, the adjusted version r P,k turned out to be uniformly preferable. The simulation study evaluates the empirical coverages of the confidence intervals at nominal level 0.95 for the competing approaches (Fig. 2). A notable result from the simulation study is the crucial role of the number of studies and the true value of the between-study variance 2 in determining the performance of the various methods. The liberal behaviour of r P is apparent across different settings. Skovgaard's statistic r P,k assuming known within-study variances provides a remarkable improvement, although it underestimates the nominal level for small number of studies and large values of 2 . The integrated likelihood is the most satisfactory solution overall. A conservative performance is experienced for small values of 2 and a small number of studies. Results from Q r Int and N r Int essentially overlap, so that only the latter are displayed. The two solutions are both preferable to r Int , with a coverage of confidence intervals which tends to be close to 0.95 or slightly higher.

Concluding remarks
This paper considers small-sample inference in meta-analysis of normally distributed measures of the effect of interest. Instead of assuming the within-study variances as known, we considered an integrated likelihood approach to account for the additional uncertainty related to the estimation of the within-study variances. The methodology is shown to provide accurate inferential results when the sample size of the studies or the number of studies included in the meta-analysis is limited. Meanwhile, the method avoids the computational difficulties related to the large number of nuisance parameters. Typically, the usage of the integrated likelihood proposed here would lead to more cautious inferences, a commendable result in several settings with limited sample information.
The focus of the paper is on meta-analysis of summary data. When individual patient data are available, the small sample size of meta-analysis studies is typically not a drawback, and appropriate hierarchical models are commonly adopted for inference. The small number of studies, instead, can still represent a source of inaccurate inference to deal with.
The estimation of the within-study variances for binary data differs from the case of normally distributed outcomes examined in this paper, because the available information is analogous to that from individual patient data. For this situation, several approaches in the literature have been proposed, which mainly focus on hierarchical models, for example, Smith et al. (1995), Turner et al. (2000), and Hamza et al. (2008). The integrated likelihood for binary data maintains an interesting analogy with that examined in this paper under the normal case, despite obtaining a weight function for the nuisance components that is less immediate. Details about the binary data case are reported in the Supporting Information.
From the methodological side, the problem studied in this paper is an instance of twoindex asymptotics (Sartori, 2003), meaning that the available sample information grows with both the observations within each study and the number of studies. Although we did not formally cast the study of the available methodology within the two-index setting, it seems worth mentioning that recent results presented in De Bin et al. (2015) substantiate the good properties of the integrated likelihood using the zero-score-expectation parameterization for general statistical models within two-index asymptotics.
Albeit the methodology discussed here is embedded in a frequentist approach, an extension to a full Bayesian formulation is possible. In fact, the integrated likelihood (4) can be used along with an a priori distribution forˇto obtain a marginal posterior distribution.
Although the paper considers the integrated likelihood approach within the meta-analysis context, the extension to the meta-regression case is straightforward. Deriving the explicit form of the zero-score-expectation parameterization, however, requires to assume equal within-study variances. Details are reported in the Supporting information.