Abstract

In mixed treatment comparison (MTC) meta-analysis, modeling the heterogeneity in between-trial variances across studies is a difficult problem because of the constraints on the variances inherited from the MTC structure. Starting from a consistent Bayesian hierarchical model for the mean treatment effects, we represent the variance configuration by a set of triangle inequalities on the standard deviations. We take the separation strategy ( Barnard and others, 2000) to specify prior distributions for standard deviations and correlations separately. The covariance matrix of the latent treatment arm effects can be employed as a vehicle to load the triangular constraints, which in addition allows incorporation of prior beliefs about the correlations between treatment effects. The spherical parameterization based on Cholesky decomposition (Pinheiro and Bates, 1996) is used to generate a positive-definite matrix for the prior correlations in Markov chain Monte Carlo (MCMC). Elicited prior information on correlations between treatment arms is introduced in the form of its equivalent data likelihood. The procedure is implemented in a MCMC framework and illustrated with example data sets from medical research practice.

1. INTRODUCTION

Meta-analysis of mixed treatment comparisons (MTC) (Lu and Ades, 2004), or network meta-analysis (Lumley, 2002), synthesizes comparative evidence on multiple treatments from randomized controlled trials (RCTs) available in a therapeutic area and has been increasingly used in medical decision making and health technology assessments. A brief review can be found in Sutton and Higgins (2008).

MTC models often assume, for conceptual and technical simplicity, that the between-trial variances are all equal (Higgins and Whitehead, 1996; Lumley, 2002), but in many situations, this is far from realistic. Frequently, particular pairs of treatments are pharmacologically similar to each other, but not to treatments in another class, and one might expect less between-trial variation where the treatments are more similar. For example, in trials of thrombolytic treatments following acute myocardial infarction, we might expect a low between-trial variation, as all these treatments act physiologically in a similar way. Variations in patient group or protocol would, therefore, tend to affect all the treatments together, generating a high correlation in treatment effects and low between-trial variances. By contrast, in comparisons between these treatments and percutaneous transluminal coronary angioplasty, a surgical procedure, it would be reasonable to expect greater between-trial variation.

Spiegelhalter and others (2004) have argued that, where random effect models are used to summarize treatment effects, the predictive distribution of the effect in a new trial, for example, δ X Y new N ( d X Y , σ X Y 2 ) , may be a more appropriate representation of our belief about future efficacy than the mean effect d X Y (see also Ades and others, 2005). Since the variance of the predictive distribution depends critically on the population variance in effect size, it becomes more important that our estimate of the latter is unbiased. The assumption of homogeneous variance in these circumstances would often be difficult to defend.

However, modeling heterogeneous variances is a difficult task because the consistency of the MTC structure imposes some implicit constraints on the variances and covariances of random effects. Models in which the variances are assumed to be independent or treated as exchangeable random samples from a common distribution ( Lu and others, 2007) are incompatible with the consistency relationships on which MTC models rely (see section 3.1).

In this paper, we propose an approach to modeling between-trial variance structures that is compatible with consistency assumptions and allows one to incorporate prior information on correlations between treatment arms, which may be an important consideration if data are sparse. Section 2 briefly describes exchangeability and consistency assumptions in Bayesian hierarchical modeling for MTC. Section 3 explains how the triangle inequalities are inherited from the requirement of evidence consistency and adopts a separation strategy to overcome the technical difficulty in embedding weakly informative prior distributions for the heterogeneity parameters. Section 4 discusses how to incorporate prior correlation information into a MTC heterogeneity variance model. We also apply our methods to 2 published data sets.

2. BAYESIAN HIERARCHICAL MODELS FOR MTC

2.1. MTC as incomplete data

Bayesian hierarchical models for MTC have been proposed by Lu and Ades (2004, 2006) and Lu and others (2007) under consistency conditions, extending ideas first proposed by Higgins and Whitehead (1996). Suppose that N RCTs have been conducted making mixed comparisons between K treatments. The comparisons are "mixed" in the sense that there may be only a subset of the treatments that are compared in each trial. Let

graphic

be the available data from the jth trial, where r j k and n j k are the numbers of events and cases, respectively, on treatment k, and 𝒯 j is the set of treatments that are compared in trial j. We shall denote the baseline effect in trial j by μ j and denote for h , k 𝒯 j the relative effect of treatment k related to treatment h by δ j h k , on the log odds ratio scale. For narrative convenience, we also denote the set of treatments involved in all trials by 𝒯 = 𝒯 j and the set of trials that compare treatment pair (X, Y) by S X Y . Usually, the number of treatments in trial j is 2 or 3, leading to a very sparse data matrix. We thus need to develop a way of "borrowing strength" across both trials and MTC structure (Higgins and Whitehead, 1996). This requires a fully specified model that can match the MTC structure.

2.2. Borrowing strength from evidence consistency

Analogous to the well-known idea of missing at random in dealing with missing data, we may assume that the evidence about treatment contrasts from various sources is consistent in the sense that the "true" relative effect of each treatment contrast remains the same over all trials under investigation whether or not it has been directly exposed to comparison in a particular trial. This leads to the basic assumption for a Bayesian hierarchical model as given below.

Assumption of exchangeability: For any treatment pair (X, Y), the random relative effects { δ j X Y , j = 1 , , N } are conditionally independent, given the true hyperparameters ( d X Y , σ X Y 2 ).

Note that although there are no direct comparisons between treatments (X, Y) in trials j S X Y , these trials do contain indirect information about d X Y . For any treatment triple ( X , Y , Z ) , the definition of δ on the log odds ratio (LOR) scale implies the following fundamental relationships in MTC.

graphic

(2.1)

By taking expectations on both sides, Lu and Ades (2006) characterize evidence consistency with a system of conditions on the first-order moments called

graphic

(2.2)

These equations are essential for MTC modeling because they reflect the logical relationships between the population treatment contrasts and allow the model to borrow strength across the comparison networks. The unstructured hyperparameter set { d b k , k b , b , k 𝒯 } for population relative effects is now endowed with a coherent structure that effectively reduces the dimension of the parameter space.

2.3. Bayesian hierarchical models

Let b ( j ) be the specified baseline treatment in trial j. When there is no risk of confusion, we simply denote b ( j ) by b. Let p j k be the probability of an event on treatment k. Under evidence consistency, a Bayesian hierarchical model in general settings can be written as follows:

graphic

(2.3)

The parameters are interpreted as follows. At the study level, μ j represents the specified baseline effect that is regarded as a nuisance parameter. As before, δ j b k is the treatment effect of treatment k related to the baseline b ( j ) on the logit scale. Here, using parameters ( μ j , δ j b k ) rather than arm-effect parameters retain randomization in Bayesian meta-analysis ( Smith and others, 1995; Thompson and others, 1997). The dummy variable X j k is the indicator for baseline taking value 1 when kb and value 0 when k = b. At the second level, while the hyperparameters d b k and σ b k 2 are interpreted as before, the hyperparameters γ h k ( b ) describe the correlation between 2 treatment contrasts within a trial. These are only needed in a MTC model when there are multiarm trials on treatment triples ( b , h , k ). For a full Bayesian analysis, all these hyperparameters require suitable prior distributions. While priors for d b k , informative or noninformative, can be specified in a standard way ( Smith and others, 1995; Thompson and others, 1997), the specification of variance priors is a difficult task which we address below.

3. PRIORS FOR HETEROGENEOUS VARIANCES

3.1. Variance configuration in MTC

Denote the set of distinct hyperparameters for variances by 𝒱 { σ b k 2 , b  < k 𝒯 } . By taking variances on both sides of the transitivity relations (2.1), for any treatment triple ( b , h , k ) 𝒯 , we have

graphic

which in turn implies the second-order consistency conditions named as

graphic

(3.1)

Thus, the values of σs form a configuration with any 3 standard deviations being able to configure a triangle (Figure 1). The mass of the joint distribution for σs is concentrated on a cone contained in + K ( K 1 ) / 2 . When all σs are equal, we obtain a homogeneity model with all γs being 1 / 2 (Higgins and Whitehead, 1996). In this case, the triangle inequalities hold trivially. A MTC model satisfying both the first- and the second-order moment conditions would be more strictly a consistency model than one built on the first-order moment condition only. Also, under assumption of normality for contrast effects, the first- and second-order conditions would be sufficient to represent evidence consistency. However, the second-order moment constraint can cause serious difficulties in the specification of prior distributions for heterogeneous variances.

Fig. 1.

Variance configuration for 4 treatment MTC: any 3 standard deviations configure a triangle with sds as sides.

Variance configuration for 4 treatment MTC: any 3 standard deviations configure a triangle with sds as sides.

Note that, once the values of σs are given, the correlation coefficients γ h k ( b ) , when needed, can be determined by the cosine theory for triangles

graphic

(3.2)

that is, each correlation coefficient is geometrically the cosine of the angle formed by sides bh and bk, which can be represented by lengths of the 3 sides in a triangle.

3.2. Ancillary variable method

We introduce some ancillary parameters as an instrument to guarantee triangle inequalities for σs. As the random effect δ j b k is the LOR related to treatment pair ( b , k ) on the logit scale, σ b k 2 can be represented as

graphic

(3.3)

where τ b 2 and τ k 2 are, respectively, variances of 2 random quantities θ j b and θ j k that can be interpreted as the random effects of corresponding treatment arms up to a common unknown constant, and ρ b k is their correlation coefficient. Under this representation, it is easy to check that triangle constraints will be retained for the σs. Let Ω be the covariance matrix formed by τs and ρs. Then, the prior distributions for the σs under triangle constraints can be specified via a suitable distribution for Ω and (3.3).

Special choices of Ω can lead to particularly simplified models. For example, when all τs and all ρs are equal, we obtain a homogeneity model with all σs equal and all γs in (3.2) equal 1/2. A slight extension allows τ k to vary freely but keeps ρs being a constant, which may reflect our prior belief in the equality of interactions between treatment effects. Thus, when there is some prior knowledge or belief about the Ω (see also example 2 in section 4), the ancillary parameter method may provide a useful way to incorporate such prior information into the model.

Note that, in general cases, the mapping from the ancillary parameter set { τ k 2 , ρ b k , b < k } to the variance parameter set { σ b k 2 , b  < k } is not a one-to-one transformation. The role of ancillary parameters is to guarantee triangle constraints for the variances σs.

3.3. Weakly informative prior distribution for Ω

The basic requirement for specifying a prior distribution for Ω is that it should be nonnegative definite. This problem, frequently encountered in Bayesian multivariate analyses for econometrics, psychology, education studies, and so forth, has been investigated by many statisticians, including Dempster (1972), Anderson and others (1987), Eaton and Olkin (1987), Leonard and Hsu (1992), Liu (1993), Yang and Berger (1994), Pinheiro and Bates (1996), Daniels and Kass (1999), Barnard and others (2000), and Sun and Sun (2005).

To achieve the flexibility we require in the prior for Ω, we take the separation strategy ( Barnard and others, 2000) to write

graphic

(3.4)

and specify independent priors for V 1 / 2 and R, where V 1 / 2 is diagonal with elements τ k , and R is the correlation matrix with nondiagonal elements ρ b k . Possible choices of weakly informative prior for τ k include inverse gamma (IG) ( Spiegelhalter and others, 2004), uniform ( Warn and others, 2002), or half-t (Gelman, 2006).

To specify prior distributions for the correlation matrix R, we use the Cholesky decomposition to write R= L T L, where L is an K×K upper-triangular matrix, and then obtain the weakly informative distribution for L through the spherical parameterization (Pinheiro and Bates, 1996) as follows.

Let L k denote the kth column of the matrix L, and k = ( k 1 , , k k ) T the spherical coordinates of the first k elements of L k , given, for k = 2 , , K , by

graphic

(3.5)

with 11 = 1 . To ensure uniqueness of the spherical parameterization, we must have φ k m ( 0 , π ) , for m = 2 , , k (Pinheiro and Bates, 1996). Then, the (h, k) entry of R can be represented as the inner product

graphic

(3.6)

It is easy to check that the diagonal entries satisfy the condition ρ k k = L k T L k = k T k = 1 . For illustration, the 3 × 3 upper-triangular matrix L for Cholesky decomposition is given by

graphic

(3.7)

with the 3 coordinate parameters ( φ 21 , φ 31 , φ 32 ) , and the corresponding correlation matrix R is

graphic

(3.8)

In the general case, all the angles in (3.5) may be given a uniform distribution over interval (0, π), resulting in a general correlation matrix R with ρ h k ( 1 , 1 ) .

3.4. Positive correlation model

Within trials, each treatment is tested on the same population. It is therefore very likely that the random effects of treatment arms tend to be positively correlated so that ρ h k ( 0 , 1 ) for all h k . In this case, all the angles in (3.5) should be given a uniform prior distribution on (0, π/2) instead of (0, π).

As φ U ( 0 , π / 2 ) , y = cos ( φ ) has the density p ( y ) = 2 π 1 ( 1 y 2 ) 1 / 2 , y ( 0 , 1 ) , which gives more mass to the value of y close to 1. However, we may specify prior distributions for y = cos ( φ ) directly by using the relation sin ( φ ) = ( 1 y 2 ) 1 / 2 for acute angle φ. Thus, the spherical coordinates in (3.5) can be rewritten in terms of ys, and each y i can be specified as beta ( α , β ) distributions, where α , β are some known constants or hyperparameters depending on our prior knowledge about the correlations between arm effects.

3.5. Example 1: Smoking cessation data

This data set (section 1 of the supplementary material available at Biostatistics online [http://biostatistics. oxfordjournals.org/]) comprises 24 trials on smoking cessation ( Fiore and others, 1996), 22 two-arm and 2 three-arm trials, with 18 822 participants, making comparisons between 4 treatments: (A) no contact, (B) self-help, (C) individual counseling, and (D) group counseling. Meta-analysis for this data set was studied by Hasselblad (1998), Caldwell and others (2005), and Lu and Ades (2006). Although the heterogeneity is not too serious in this data set, we use it to illustrate our method. Considering that there are only 2 medium size 3-arm trials, in the following analysis, we ignore the multiarm effects and treat each 3-arm trial as 2 independent 2-arm trials.

We shall fit and compare 4 variance models, each constituted by combining model (1) with a particular between-trial variance structure:

  • (i) HOM model: The usual homogeneity case in which all the between-trial variances are equal to σ 2 , which is given a weakly informative prior, for example, IG distribution IG(10−3, 10−3).

  • (ii) ID model: The unstructured heterogeneity model, assuming that { σ X Y 2 , X Y } , where "≺" means alphabetic priority, is independent and is given a weakly informative prior IG(10−3, 10−3). Because the triangle inequalities are not guaranteed, it is only a first-order consistency model.

  • (iii) GC model: The general heterogeneity case under the consistency condition on the second-order moments where there is no prior information available about the correlations between treatment effects. In the Cholesky parameterization (3.5), we take φ k m U ( 0 , π ) .

  • (iv) PC model: The specific case of the GC model with the further assumption that correlations between effects of treatment arms are all positive. In the Cholesky parameterization (3.5), we take φ k m U ( 0 , π / 2 ) and name the model as positive correlation (PC)-I. When taking y k m cos ( φ k m ) beta ( α , β ), we name the model as PC-II when α = β (note: α = β = 1 corresponds to y k m U ( 0 , 1 ) ) , and as PC-III when α ≠ β.

We employ the Markov chain Monte Carlo (MCMC) method with the software WinBUGS ( Spiegelhalter and others, 2001) to calculate posterior estimates of the parameters of interest. The code is given in section 2 of the supplementary material available at Biostatistics online (http://biostatistics.oxfordjournals.org/). Table 1 shows the posterior means for the basic treatment contrast parameters d A B , d A C , d A D , and the standard deviation parameters σ and σ X Y under the above models. Figure 2 plots the posterior medians with 95% Bayesian credible intervals (CIs) for σ X Y 's under these models. The GC model and the 3 versions of PC model lead to very similar results, reflecting the insensitivity of the parameter estimates to the choice of weakly informative prior distribution for the Cholesky parameters for the correlation matrix R. However, the ID model results in unstable estimates where data are sparse, without constraints of the triangle inequalities. The posterior mean of σ A D under the ID model is as unrealistically high at 5.80 with a large standard deviation 12.86 and a very wide 95% CI (0.14, 27.92) compared with the posterior mean 0.95 with standard deviation 0.55 and 95% CI (0.15, 2.11) under the PC-I model, which highlights the role of triangle inequalities in "borrowing strength" for informing between-trial variances across treatment contrasts. Under the triangle inequalities, the values of σ A D is bounded above by min { σ A X + σ X D , X = B , C } . Table 1 also lists the means and standard deviations of the predictive distributions for relative treatment effects δ X Y new for X , Y 𝒯 , where for the same reason as above the ID model leads to a very flat predictive distribution for δ A D new . The PC-I model seems particularly successful in limiting posterior uncertainty in prediction.

Fig. 2.

Posterior medians with 95% Bayesian CIs for standard deviations under HOM-, ID-, GC- and PC models for smoking cessation data. The upper 2.5% quantile for σAD under ID model is as unrealistically high as 27.92, about 12 times of that under other models. The figures below case indices are the numbers of trials that make corresponding comparisons. Note that there are two 3-arm trials.

Posterior medians with 95% Bayesian CIs for standard deviations under HOM-, ID-, GC- and PC models for smoking cessation data. The upper 2.5% quantile for σ A D under ID model is as unrealistically high as 27.92, about 12 times of that under other models. The figures below case indices are the numbers of trials that make corresponding comparisons. Note that there are two 3-arm trials.

Table 1.

(Hassulblad data) Comparison of posterior mean under 4 variance models. The multi-arm trial effects on 2 three-arm trials have been ignored. Treatment effects are reported on the log odds scale. Results are based on 30 000 MCMC samples after 20 000 iterations burn-in. Some unrealistically high values under the ID model are underlined

Parameter of interest Variance model (posterior mean/sd)
HOM ID GC: φ ∼ U(0, π) PC-I: φ ∼ U(0, π/2) PC-II: cos(φ) ∼ U(0, 1) PC-III: cos(φ) ∼ beta(5, 3)
dAB 0.523 (0.38) 0.38 (0.26) 0.540 (0.35) 0.53 (0.34) 0.54 (0.35) 0.53 (0.34)
dAC 0.814 (0.23) 0.70 (0.23) 0.790 (0.24) 0.79 (0.24) 0.79 (0.24) 0.80 (0.24)
dAD 1.167 (0.45) 1.05 (0.36) 1.182 (0.46) 1.19 (0.46) 1.18 (0.43) 1.18 (0.44)
σ 0.821 (0.18)
σAB 0.36 (0.51) 0.71 (0.36) 0.71 (0.36) 0.74 (0.34) 0.71 (0.34)
σAC 0.90 (0.24) 0.88 (0.22) 0.88 (0.21) 0.88 (0.22) 0.88 (0.21)
σAD 5.80 (12.86) 1.00 (0.54) 0.95 (0.55) 0.93 (0.45) 0.92 (0.48)
σBC 0.41 (1.04) 0.44 (0.34) 0.45 (0.35) 0.45 (0.33) 0.47 (0.34)
σBD 0.45 (1.42) 0.57 (0.57) 0.54 (0.64) 0.53 (0.53) 0.54 (0.56)
σCD 0.52 (0.52) 0.61 (0.59) 0.55 (0.62) 0.55 (0.51) 0.53 (0.51)
δAB new 0.518 (0.88) 0.37 (0.62) 0.54 (0.86) 0.53 (0.87) 0.54 (0.89) 0.53 (0.88)
δAC new 0.805 (0.83) 0.70 (0.97) 0.79 (0.94) 0.81 (0.95) 0.80 (0.95) 0.80 (0.94)
δAD new 1.163 (0.91) 1.02 (19.14) 1.19 (1.24) 1.18 (1.11) 1.21 (1.47) 1.17 (1.12)
δBC new 0.295 (0.88) 0.35 (1.08) 0.25 (0.66) 0.27 (0.66) 0.26 (0.66) 0.27 (0.67)
δBD new 0.658 (0.92) 0.69 (1.12) 0.64 (0.96) 0.64 (0.82) 0.66 (1.25) 0.64 (0.89)
δCD new 0.368 (0.91) 0.35 (1.21) 0.40 (0.97) 0.39 (0.85) 0.41 (1.21) 0.38 (0.86)
forumla 54.65 52.88 54.09 54.10 54.07 54.23
pD 43.64 42.44 43.26 43.10 43.13 43.12
DIC 98.29 95.32 97.35 97.20 97.20 97.35
Parameter of interest Variance model (posterior mean/sd)
HOM ID GC: φ ∼ U(0, π) PC-I: φ ∼ U(0, π/2) PC-II: cos(φ) ∼ U(0, 1) PC-III: cos(φ) ∼ beta(5, 3)
dAB 0.523 (0.38) 0.38 (0.26) 0.540 (0.35) 0.53 (0.34) 0.54 (0.35) 0.53 (0.34)
dAC 0.814 (0.23) 0.70 (0.23) 0.790 (0.24) 0.79 (0.24) 0.79 (0.24) 0.80 (0.24)
dAD 1.167 (0.45) 1.05 (0.36) 1.182 (0.46) 1.19 (0.46) 1.18 (0.43) 1.18 (0.44)
σ 0.821 (0.18)
σAB 0.36 (0.51) 0.71 (0.36) 0.71 (0.36) 0.74 (0.34) 0.71 (0.34)
σAC 0.90 (0.24) 0.88 (0.22) 0.88 (0.21) 0.88 (0.22) 0.88 (0.21)
σAD 5.80 (12.86) 1.00 (0.54) 0.95 (0.55) 0.93 (0.45) 0.92 (0.48)
σBC 0.41 (1.04) 0.44 (0.34) 0.45 (0.35) 0.45 (0.33) 0.47 (0.34)
σBD 0.45 (1.42) 0.57 (0.57) 0.54 (0.64) 0.53 (0.53) 0.54 (0.56)
σCD 0.52 (0.52) 0.61 (0.59) 0.55 (0.62) 0.55 (0.51) 0.53 (0.51)
δAB new 0.518 (0.88) 0.37 (0.62) 0.54 (0.86) 0.53 (0.87) 0.54 (0.89) 0.53 (0.88)
δAC new 0.805 (0.83) 0.70 (0.97) 0.79 (0.94) 0.81 (0.95) 0.80 (0.95) 0.80 (0.94)
δAD new 1.163 (0.91) 1.02 (19.14) 1.19 (1.24) 1.18 (1.11) 1.21 (1.47) 1.17 (1.12)
δBC new 0.295 (0.88) 0.35 (1.08) 0.25 (0.66) 0.27 (0.66) 0.26 (0.66) 0.27 (0.67)
δBD new 0.658 (0.92) 0.69 (1.12) 0.64 (0.96) 0.64 (0.82) 0.66 (1.25) 0.64 (0.89)
δCD new 0.368 (0.91) 0.35 (1.21) 0.40 (0.97) 0.39 (0.85) 0.41 (1.21) 0.38 (0.86)
forumla 54.65 52.88 54.09 54.10 54.07 54.23
pD 43.64 42.44 43.26 43.10 43.13 43.12
DIC 98.29 95.32 97.35 97.20 97.20 97.35

Table 1.

(Hassulblad data) Comparison of posterior mean under 4 variance models. The multi-arm trial effects on 2 three-arm trials have been ignored. Treatment effects are reported on the log odds scale. Results are based on 30 000 MCMC samples after 20 000 iterations burn-in. Some unrealistically high values under the ID model are underlined

Parameter of interest Variance model (posterior mean/sd)
HOM ID GC: φ ∼ U(0, π) PC-I: φ ∼ U(0, π/2) PC-II: cos(φ) ∼ U(0, 1) PC-III: cos(φ) ∼ beta(5, 3)
dAB 0.523 (0.38) 0.38 (0.26) 0.540 (0.35) 0.53 (0.34) 0.54 (0.35) 0.53 (0.34)
dAC 0.814 (0.23) 0.70 (0.23) 0.790 (0.24) 0.79 (0.24) 0.79 (0.24) 0.80 (0.24)
dAD 1.167 (0.45) 1.05 (0.36) 1.182 (0.46) 1.19 (0.46) 1.18 (0.43) 1.18 (0.44)
σ 0.821 (0.18)
σAB 0.36 (0.51) 0.71 (0.36) 0.71 (0.36) 0.74 (0.34) 0.71 (0.34)
σAC 0.90 (0.24) 0.88 (0.22) 0.88 (0.21) 0.88 (0.22) 0.88 (0.21)
σAD 5.80 (12.86) 1.00 (0.54) 0.95 (0.55) 0.93 (0.45) 0.92 (0.48)
σBC 0.41 (1.04) 0.44 (0.34) 0.45 (0.35) 0.45 (0.33) 0.47 (0.34)
σBD 0.45 (1.42) 0.57 (0.57) 0.54 (0.64) 0.53 (0.53) 0.54 (0.56)
σCD 0.52 (0.52) 0.61 (0.59) 0.55 (0.62) 0.55 (0.51) 0.53 (0.51)
δAB new 0.518 (0.88) 0.37 (0.62) 0.54 (0.86) 0.53 (0.87) 0.54 (0.89) 0.53 (0.88)
δAC new 0.805 (0.83) 0.70 (0.97) 0.79 (0.94) 0.81 (0.95) 0.80 (0.95) 0.80 (0.94)
δAD new 1.163 (0.91) 1.02 (19.14) 1.19 (1.24) 1.18 (1.11) 1.21 (1.47) 1.17 (1.12)
δBC new 0.295 (0.88) 0.35 (1.08) 0.25 (0.66) 0.27 (0.66) 0.26 (0.66) 0.27 (0.67)
δBD new 0.658 (0.92) 0.69 (1.12) 0.64 (0.96) 0.64 (0.82) 0.66 (1.25) 0.64 (0.89)
δCD new 0.368 (0.91) 0.35 (1.21) 0.40 (0.97) 0.39 (0.85) 0.41 (1.21) 0.38 (0.86)
forumla 54.65 52.88 54.09 54.10 54.07 54.23
pD 43.64 42.44 43.26 43.10 43.13 43.12
DIC 98.29 95.32 97.35 97.20 97.20 97.35
Parameter of interest Variance model (posterior mean/sd)
HOM ID GC: φ ∼ U(0, π) PC-I: φ ∼ U(0, π/2) PC-II: cos(φ) ∼ U(0, 1) PC-III: cos(φ) ∼ beta(5, 3)
dAB 0.523 (0.38) 0.38 (0.26) 0.540 (0.35) 0.53 (0.34) 0.54 (0.35) 0.53 (0.34)
dAC 0.814 (0.23) 0.70 (0.23) 0.790 (0.24) 0.79 (0.24) 0.79 (0.24) 0.80 (0.24)
dAD 1.167 (0.45) 1.05 (0.36) 1.182 (0.46) 1.19 (0.46) 1.18 (0.43) 1.18 (0.44)
σ 0.821 (0.18)
σAB 0.36 (0.51) 0.71 (0.36) 0.71 (0.36) 0.74 (0.34) 0.71 (0.34)
σAC 0.90 (0.24) 0.88 (0.22) 0.88 (0.21) 0.88 (0.22) 0.88 (0.21)
σAD 5.80 (12.86) 1.00 (0.54) 0.95 (0.55) 0.93 (0.45) 0.92 (0.48)
σBC 0.41 (1.04) 0.44 (0.34) 0.45 (0.35) 0.45 (0.33) 0.47 (0.34)
σBD 0.45 (1.42) 0.57 (0.57) 0.54 (0.64) 0.53 (0.53) 0.54 (0.56)
σCD 0.52 (0.52) 0.61 (0.59) 0.55 (0.62) 0.55 (0.51) 0.53 (0.51)
δAB new 0.518 (0.88) 0.37 (0.62) 0.54 (0.86) 0.53 (0.87) 0.54 (0.89) 0.53 (0.88)
δAC new 0.805 (0.83) 0.70 (0.97) 0.79 (0.94) 0.81 (0.95) 0.80 (0.95) 0.80 (0.94)
δAD new 1.163 (0.91) 1.02 (19.14) 1.19 (1.24) 1.18 (1.11) 1.21 (1.47) 1.17 (1.12)
δBC new 0.295 (0.88) 0.35 (1.08) 0.25 (0.66) 0.27 (0.66) 0.26 (0.66) 0.27 (0.67)
δBD new 0.658 (0.92) 0.69 (1.12) 0.64 (0.96) 0.64 (0.82) 0.66 (1.25) 0.64 (0.89)
δCD new 0.368 (0.91) 0.35 (1.21) 0.40 (0.97) 0.39 (0.85) 0.41 (1.21) 0.38 (0.86)
forumla 54.65 52.88 54.09 54.10 54.07 54.23
pD 43.64 42.44 43.26 43.10 43.13 43.12
DIC 98.29 95.32 97.35 97.20 97.20 97.35

A useful reference measure for assessing adequacy and guiding selection of Bayesian hierarchical models is the deviance information criterion (DIC) proposed by Spiegelhalter and others (2002): DIC = forumla + pD, where forumla is the posterior mean of the Bayesian deviance measuring model fitness and pD is the "effective dimension" of the model serving as a penalty for the increased model complexity. The smallest forumla value occurs in the ID model simply because it is the least restricted among the 4 models and has the lowest DIC. However, in a strict sense of evidence consistency, it does not qualify as a candidate model because it does not align with the theoretical assumption about evidence consistency. The PC-I and PC-II models have similar second smallest DIC values, but the former gives better predictive distributions with smaller standard deviations. Thus, we would suggest the use of the PC-I model. We also performed a simulation study to examine performance of the PC-I model by running WinBUGS in R with the package R2WinBUGS (Gelman, 2007). See section 3 of the supplementary material available at Biostatistics online (http://biostatistics.oxfordjournals.org).

Note that, in the smoking cessation data, the heterogeneity between treatment contrasts is not very serious; hence, the HOM model might be acceptable because of its simplicity and the similarity in results to the PC models. However, although the data does not counter-indicate the homogeneity model, it is not plausible a priori, as one might expect the response to individual counseling (C) and group counseling (D) more "similar" to each other than to self-help (B).

4. INCORPORATING PRIOR CORRELATION INTO A PC MODEL

4.1. Elicitation of prior information about correlation

Researchers may have some prior information about between-treatment interaction or hold a priori certain beliefs about relations between treatment effects. Any sort of similarity in treatment protocols, biochemical or procedural, may generate higher or lower correlations between treatment effects. However, incorporating prior correlation information is difficult because the correlation coefficients between different treatment pairs are implicitly correlated according to a very complex joint distribution ( Barnard and others, 2000).

The ancillary variable method in section 3.2 with the Cholesky parameterization (3.5) can provide a way of introducing prior correlation information into a MTC model. In principle, elicitation variables are physically meaningful quantities and can be measured by procedures in which experts quantify their uncertainty on the basis of their expertise (Bedford and Cooke, 2001; O'Hagan and others, 2006). In MTC models, the parameters on which experts might be expected to have beliefs are ρ b k , the correlation coefficients between treatment arms, rather than the angle parameters φ k j . Now suppose that, under the fundamental belief that all ρs should be positive, the prior information about ρs, including expert knowledge, can be quantified in the form of their marginal distributions, say means and standard deviations or medians and 95% CIs. An appealing choice of prior for each ρ b k is the beta distribution with unknown parameters α b k , β b k because "it can portray a good variety of opinions reasonably accurately" ( O'Hagan and others, 2006, p. 125). We then apply the equivalent prior sample (EPS) information procedure (Winkler, 1967) in an inverse form to elicit experts' opinions and use the Cholesky parameterization to generate the joint prior distribution for ( ρ b k , b < k), which has the sample space S + of positive-definite matrices.

This procedure can be briefly described as follows:

  • (i) Based on the experts' knowledge, choose the suitable data set:

    graphic

    (4.1)

    where α b k are hypothetical samples from Binomial-type distribution with "sample size" α b k + β b k and success probabilities ρ b k , with the density proportional to

    graphic

    Note that ( α b k , β b k ) are allowed to be any positive real numbers.

  • (ii) Use the functional relation given by (3.5) and (3.6), that is,

    graphic

    (4.2)

    where ρ = ( ρ b k , b  < k ) denotes the K ( K 1 ) / 2 dimensional vector of correlation coefficients and φ = ( φ k j , k = 2 , , K , 2 j k ) the corresponding vector of angles in the Cholesky decomposition.

  • (iii) Specify a noninformative prior π ( φ ) .

The above procedure allows us to incorporate prior information D EPS on ρ b k , even through ρ b k is a functional parameter. This leads to the "posterior" joint distribution for the random vector ρ on S +, given the hypothetical data D EPS , with density

graphic

(4.3)

where g ( φ ) π ( φ D EPS ) b  < k f b k ( φ ) α b k ( 1 f b k ( φ ) ) β b k π ( φ ) .

An example for illustration of EPS method is given in section 4 of the supplementary material available at Biostatistics online (http://biostatistics.oxfordjournals.org). O'Hagan and others (2006) noted that methods for eliciting parameters of multivariate distributions are complex and not wholly satisfying. As our scheme guarantees a positive-definite correlation matrix, it is possible to use simpler methods for eliciting priors for scalar quantities. Respondents could be asked to give a "best" estimate of the correlation (between 0 and 1) and credible limits. O'Hagan and others (2006, chapter 6) describe a number of methods. The results can then be readily approximated as a beta distribution.

4.2. Example 2: Gastroesophageal reflux disease data

This data set consists of 41 RCTs on gastroesophageal reflux disease (GERD) with a total of 4637 participants, making mixed comparisons between 6 common therapies: (1) placebo, (2) prokinetic agents, (3) H 2 receptor antagonists ( H 2 RA), (4) H 2 RA double dose, (5) proton pump inhibitors (PPI), and (6) PPI double dose. Outcomes of these trials were observed at one or more follow-up times, varying with trials, of 4, 6, 8, or 12 weeks after commencement of treatment. For a detailed description of the data and graphical representation of the comparison networks refer to Goeree and others (1999) and Lu and others (2007).

To borrow strength both across MTC networks and across time points, Lu and others (2007) applied a set of piecewise exponential models on the log hazards with random effects of both treatments and time periods and briefly allowed an exchangeable half-t model to deal with the between-trial heterogeneity. Their model 8 was

graphic

(4.4)

where p j k u u is the healing probability for a patent in trial j, under treatment k, during the successive time segments in [ u , u ]; θ j k u the corresponding log hazard in time period u; Trt j is the set of treatments involved in trial j; and Tim j the set of observed time periods for trial j. Comments on the weakly informative specification for the hyperparameters can be referred to Warn and others (2002), Spiegalhalter and others (2004), and Gelman (2006).

An apparent feature of the treatment regime in GERD data is that both therapies H2RA and PPI were used in single and double doses, which are regarded as different treatments. It seems likely that the effects of 2 treatments based on the same therapy but with different doses are more likely to have higher correlation than that based on different therapies. Furthermore, the placebo effects may have low PC with other treatments. For other correlations, we assume symmetry about 0.5. Based on these brief considerations, one possible choice for the prior correlations may be summarized in a matrix form, denoted by R 0:

graphic

The equivalent data used in the EPS method are just the parameters in these beta distributions. We then embed the "likelihood" ρ b k α b k ( 1 ρ b k ) β b k into the model and call on the Cholesky parameterization (3.5) and the ancillary representation (3.3). The prior distributions for τ k 2 may be specified as the IG distribution IG(α, β), where α, β are given a weakly informative prior distribution Exp(0.01).

For the GERD data, incorporating prior information R 0 into the PC model leads to a close model fit (with DIC = 257.4 compared with DIC = 256.3 for PC model without R 0) and similar posterior estimates for the population treatment contrasts but slightly narrower CIs. The impact of incorporating the prior correlation matrix R 0 is seen in the Figure 3 of posterior estimates of variances and their 95% CIs under 3 respective variance assumptions: (i) with the exchangeable haft-t prior distribution for σ b k (named EX model); (ii) with a PC structure, that is, the PC model with weakly informative priors U(0, π/2) for the angle parameters; (iii) the PC model incorporating the prior correlation information summarized in R 0. Without the constraints of triangle inequalities, the EX model leads to much wider CIs for most variances than the other 2 models and also an outlier at the (1, 3) comparison. The narrowest CIs for all σ b k occur in the third model, reflecting the effects of incorporating the prior information. Thus, incorporating variance structure stabilizes the posterior estimates of variance, particularly where there is little direct information in the likelihood, while adding expert knowledge about treatment similarity can further reduce prediction uncertainty in a coherent way.

Fig. 3.

Posterior medians of {σbk} with 95% CIs: (•) under half-t model, (□) under PC model prior φ∼ U(0, π/2), and (♦) incorporating prior information R0. The figures below case indices are the numbers of studies that make corresponding comparisons.

Posterior medians of { σ b k } with 95% CIs: (•) under half-t model, (□) under PC model prior φ U(0, π/2), and (♦) incorporating prior information R 0. The figures below case indices are the numbers of studies that make corresponding comparisons.

5. SUMMARY

The variance structure inherent in MTC rests on the fundamental requirement that evidence from a variety of sources should be consistent, which logically leads to the consistency equations (2.2) and the triangle inequalities (3.1). We call the requirement of evidence consistency "fundamental" because, without it, we cannot perform a coherent synthesis of information on all the treatment contrasts. The crucial technical difficulty arises when we attempt to embed the triangle inequalities into MTC modeling. We circumvented this by adopting the ancillary representation (3.3), taking the separation strategy (3.4), and employing the spherical coordinates for Cholesky's decomposition (3.5). We thus have the GC- and PC models that provide a way to specify weakly informative prior distributions for multiple heterogeneity parameters without violation of the triangle inequalities. When there is some prior information about the correlations between treatment arms, the EPS method allow the use of equivalent data for shipping the prior information into the analytic procedure. Finally, all the computations can be quickly done by MCMC simulations using WinBUGS or R.

FUNDING

Medical Research Council funding to the MRC Health Services Research Collaboration.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

This research was supported by Medical Research Council funding to the MRC Health Services Research Collaboration. The authors are thankful to John Copas, Simon Thomson, and other colleagues for their constructive comments when the early version of the paper was represented in RSS Methods in Meta-analysis Meeting held on 4 June 2007, in London, and to the editor Peter Diggle and anonymous referees for their insightful comments and suggestions. Conflict of Interest: None declared.

References

,  ,  .

The interpretation of random effects meta-analysis in decision models

,

Medical Decision Making

,

2005

, vol.

25

 (pg.

646

-

654

)

,  ,  .

Generation of random orthogonal matrices

,

SIAM Journal on Scientific and Statistical Computing

,

1987

, vol.

8

 (pg.

625

-

629

)

,  ,  .

Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage

,

Statistica Sinica

,

2000

, vol.

10

 (pg.

1281

-

1311

)

,  .,

Probability Risk Analysis: Foundations and Methods

,

2001

Cambridge

Cambridge University Press

,  ,  .

Simultaneous comparison of multiple treatments: combining direct and indirect evidence

,

British Medical Journal

,

2005

, vol.

331

 (pg.

897

-

900

)

,  .

Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models

,

Journal of the American Statistical Association

,

1999

, vol.

94

 (pg.

1254

-

1263

)

.

Covariance selection

,

Biometrics

,

1972

, vol.

28

 (pg.

157

-

175

)

,  .

Best equivariant estimators of a Cholesky decomposition

,

Annals of Statistics

,

1987

, vol.

15

 (pg.

1639

-

1665

)

,  ,  ,  ,  ,  ,  ,  ,  ,  .

and others

,

Smoking Cessation. Clinical Practice Guideline No. 18

,

1996

Rockville, MD

Agency for Health Care Policy and Research, U.S. Department of Health and Human Services

AHCPR Publication No. 96–0692

.

Prior distributions for variance parameters in hierarchical Models

,

Bayesian Analysis

,

2006

, vol.

1

 (pg.

515

-

533

)

.,

Running WinBugs and Open Bugs from R

,

2007

 

,  ,  ,  ,  ,  .

Economic evaluation of long term management strategies for erosive oesophagitis

,

Pharmacoeconomics

,

1999

, vol.

16

 (pg.

679

-

697

)

.

Meta-analysis of multi-treatment studies

,

Medical Decision Making

,

1998

, vol.

18

 (pg.

37

-

43

)

,  .

Borrowing strength from external trials in a meta-analysis

,

Statistics in Medicine

,

1996

, vol.

15

 (pg.

2733

-

2749

)

,  .

Bayesian inference for a covariance matrix

,

Annals of Statistics

,

1992

, vol.

20

 (pg.

1669

-

1696

)

.

Bartlett's decomposition of the posterior distribution of the covariance for normal monotone ignorable missing data

,

Journal of Multivariate Analysis

,

1993

, vol.

46

 (pg.

497

-

512

)

,  .

Combination of direct and indirect evidence in mixed treatment comparisons

,

Statistics in Medicine

,

2004

, vol.

23

 (pg.

3105

-

3124

)

,  .

Assessing evidence inconsistency in mixed treatment comparisons

,

Journal of the American Statistical Association

,

2006

, vol.

101

 (pg.

447

-

459

)

,  ,  ,  ,  ,  .

Meta-analysis of mixed treatment comparisons at multiple follow-up times

,

Statistics in Medicine

,

2007

, vol.

26

 (pg.

3681

-

3699

)

.

Network meta-analysis for indirect treatment comparisons

,

Statistics in Medicine

,

2002

, vol.

21

 (pg.

2313

-

2324

)

,  ,  ,  ,  ,  ,  ,  .,

Uncertain Judgments: Eliciting Experts' Probabilities

,

2006

Chichester, UK

Wiley

,  .

Unconstrained parameterisations for variance-covariance matrices

,

Statistics and Computing

,

1996

, vol.

6

 (pg.

289

-

296

)

,  ,  .

Bayesian approaches to random-effects meta-analysis: a comparative study

,

Statistics in Medicine

,

1995

, vol.

14

 (pg.

2685

-

2699

)

,  ,  .,

Bayesian Approaches to Clinical Trials and Health-Care Evaluation

,

2004

Chichester, Chichester, UK

Wiley

,  ,  ,  .

Bayesian measures of model complexity and fit

,

Journal of the Royal Statistical Society B

,

2002

, vol.

64

 (pg.

1

-

34

)

,  ,  ,  .

WinBUGS User Manual

,

Version 1.4

,

2001

Cambridge

MRC Biostatistics Unit

,  .

Estimation of the multivariate normal precision and covariance matrices in a star-shape model

,

Annals of the Institute of Statistical Mathematics

,

2005

, vol.

57

 (pg.

455

-

484

)

,  .

Recent developments in meta-analysis

,

Statistics in Medicine

,

2008

, vol.

27

 (pg.

625

-

650

)

,  ,  .

Investigating underlying risk as a source of heterogeneity in meta-analysis

,

Statistics in Medicine

,

1997

, vol.

16

 (pg.

2741

-

2758

)

,  ,  .

Bayesian random effects meta-analysis of trials with binary outcomes: method for absolute risk difference and relative risk scales

,

Statistics in Medicine

,

2002

, vol.

21

 (pg.

1601

-

1623

)

.

The assessment of prior distributions in Bayesian analysis

,

Journal of the American Statistical Association

,

1967

, vol.

67

 (pg.

776

-

800

)

,  .

Estimation of a covariance matrix using the reference prior

,

Annals of Statistics

,

1994

, vol.

22

 (pg.

1195

-

1211

)