NONMEM Users Guide Part VII - Conditional Estimation Methods - Chapter II
II. Methods
II.A. Estimation Methods
II.A.1. The Laplacian Method
II.A.2. The FOCE Method
II.A.3. The FO Method
II.A.4. The Hybrid Method
II.A.5. The Centering Methods
II.A.6. The Centering FOCE Method with the First-Order Model
II.B. Mixture Models

II. Methods

II.A. Estimation Methods

II.A.1. The Laplacian Method

Let Image grohtml-30778-5.png be Image grohtml-30778-6.png , and let Image grohtml-30778-7.png and Image grohtml-30778-8.png be the gradient (column) vector and hessian matrix, respectively, of Image grohtml-30778-9.png evaluated at Image grohtml-30778-10.png . An approximation to Image grohtml-30778-11.png is given by

Image grohtml-307785.png

where Image grohtml-30778-13.png is some estimate of Image grohtml-30778-14.png , and Image grohtml-30778-15.png , Image grohtml-30778-16.png , and Image grohtml-30778-17.png are Image grohtml-30778-18.png , Image grohtml-30778-19.png , and Image grohtml-30778-20.png all evaluated at Image grohtml-30778-21.png . This results from applying a general approximation approach to integrals, attributable to the French mathematician Laplace, and described by De Bruijn (1961). With Image grohtml-30778-22.png equal to the conditional estimate obtained by maximizing the posterior density of Image grohtml-30778-23.png in an unconstrained manner (call this the unconstrained conditional estimate), this particular approximation has been used by others (Lindley (1980); Mosteller and Wallace (1964)), although not with a function Image grohtml-30778-24.png that is as complicated as that which often arises in population pharmacokinetic and pharmacodynamic analyses. See also Tierney and Kadane (1986). In this particular case, the last term of the approximation is 0. In general, the approximation can produce reasonable results as long as the posterior distribution of Image grohtml-30778-25.png is dominated by a single mode. On occasion, a randomly dispersed parameter seems to have a multimodal distribution. See the discussion in section B concerning mixture models for a way to address this issue.
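
The idea behind the approximation can be illustrated with a minimal one-dimensional sketch. It is not NONMEM code. It assumes that the quantity being approximated is the integral of exp(phi(eta)), that phi is expanded to second order about a chosen point eta_hat, and that derivatives are taken by finite differences. The one-observation model and all names (phi, find_mode, laplace_integral, Y, THETA, SIGMA, OMEGA) are hypothetical.

```python
import math

def derivatives(phi, eta, h=1e-4):
    """Central finite-difference gradient and hessian of phi at eta."""
    g = (phi(eta + h) - phi(eta - h)) / (2 * h)
    H = (phi(eta + h) - 2 * phi(eta) + phi(eta - h)) / h**2
    return g, H

def find_mode(phi, start=0.0, steps=50):
    """Newton iteration for the maximizer of phi (assumes phi is concave nearby)."""
    eta = start
    for _ in range(steps):
        g, H = derivatives(phi, eta)
        eta -= g / H
    return eta

def laplace_integral(phi, eta_hat):
    """Integral of exp(phi) from a second-order expansion of phi about eta_hat."""
    g, H = derivatives(phi, eta_hat)
    if H >= 0:
        raise ValueError("hessian must be negative at eta_hat")
    gradient_term = -g * g / (2 * H)  # equals 0 when eta_hat is the mode
    return math.exp(phi(eta_hat) + gradient_term) * math.sqrt(2 * math.pi / -H)

def trapezoid_integral(phi, lo=-5.0, hi=5.0, n=20000):
    """Brute-force reference value for the same integral."""
    step = (hi - lo) / n
    total = 0.5 * (math.exp(phi(lo)) + math.exp(phi(hi)))
    total += sum(math.exp(phi(lo + k * step)) for k in range(1, n))
    return total * step

# Hypothetical one-observation model: y = theta * exp(eta) + eps.
Y, THETA, SIGMA, OMEGA = 1.5, 1.0, 0.5, 0.5

def phi(eta):
    """Log of (conditional likelihood of y) x (normal density of eta), up to a constant."""
    return -0.5 * ((Y - THETA * math.exp(eta)) / SIGMA) ** 2 - 0.5 * (eta / OMEGA) ** 2

mode = find_mode(phi)
print("mode:", mode)
print("expansion at mode:", laplace_integral(phi, mode))
print("expansion at 0:   ", laplace_integral(phi, 0.0))
print("numerical value:  ", trapezoid_integral(phi))
```

When eta_hat is the mode, the gradient is 0 and the gradient term drops out, which mirrors the remark that the last term vanishes for the unconstrained conditional estimate. For a quadratic phi the expansion is exact at any expansion point.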

Each of the estimation methods uses a different variant of this approximation. However, whatever variant is used, when the Image grohtml-30778-26.png are taken to be conditional estimates of the Image grohtml-30778-27.png at Image grohtml-30778-28.png and Image grohtml-30778-29.png , the general method described in chapter I becomes what we call a conditional estimation method. When the approximation is used just as it is stated above, and when the Image grohtml-30778-30.png are taken to be the unconstrained conditional estimates, the method is called the Laplacian estimation method to honor the individual whose approximation plays such an essential role. However, the method itself involves an idea which is peculiar to the NONMEM implementation. Namely, the approximation to L (the likelihood function of Image grohtml-30778-31.png and Image grohtml-30778-32.png ), resulting from using the Laplacian approximation, is maximized.

When mean-variance models are used, the assumption can be made that each intraindividual variance-covariance matrix Image grohtml-30778-33.png is actually given by Image grohtml-30778-34.png , the matrix for the mean individual. With this particular assumption, there is said to be no η-interaction (see chapter I). The Image grohtml-30778-36.png are computed differently, depending on whether an η-interaction is assumed, as are the posterior modes. With mean-variance models, by default, NONMEM implements the Laplacian method assuming that there is no η-interaction. With the currently distributed NONMEM code it is possible to apply the Laplacian method when there is an η-interaction, but this code and its usage are not supported by the NONMEM Project Group.

II.A.2. The FOCE Method

The matrix Image grohtml-30778-40.png can be approximated by another matrix. Suppose that, given Image grohtml-30778-41.png , Image grohtml-30778-42.png is composed of statistically independent subvectors Image grohtml-30778-43.png , Image grohtml-30778-44.png , etc., so that Image grohtml-30778-45.png can be written as a sum over terms Image grohtml-30778-46.png , Image grohtml-30778-47.png , etc. Then each of Image grohtml-30778-48.png and Image grohtml-30778-49.png can be written as a sum over terms Image grohtml-30778-50.png , Image grohtml-30778-51.png , etc. and Image grohtml-30778-52.png , Image grohtml-30778-53.png , etc., respectively. An approximation Image grohtml-30778-54.png to Image grohtml-30778-55.png is obtained by replacing each Image grohtml-30778-56.png in the sum for Image grohtml-30778-57.png by Image grohtml-30778-58.png . This is a type of first-order approximation; terms involving second derivatives have been dropped. It is called the first-order approximation.

With this approximation, and when all the Image grohtml-30778-59.png are taken to be equal to the unconstrained conditional estimates of the Image grohtml-30778-60.png , the method is called the first-order conditional estimation (FOCE) method.

Actually, NONMEM allows the implementation of several versions of this method.

•		When a mean-variance intraindividual model is used, by default, is replaced by , where E represents the expectation over under the intraindividual model. With the currently distributed NONMEM code it is possible to use the FOCE method without doing this, but this code and its usage are not supported by the NONMEM Project Group.
•		The first-order conditional estimation method without interaction is the FOCE method applied with intraindividual mean-variance models and assuming no η-interaction. When the intraindividual variance is assumed to be homoscedastic and, moreover, to be the same across individuals, then there is no η-interaction, and in this case it may be shown that the FOCE method (without interaction) often produces results similar to those obtained with a method described by Lindstrom and Bates (1990). The first-order conditional estimation method with interaction is the FOCE method applied with intraindividual mean-variance models, but without the no-interaction assumption. FOCE with and without interaction are both supported. With the currently distributed NONMEM code it is possible to apply the FOCE method with intraindividual models that are not mean-variance models, but this code and its usage are not supported by the NONMEM Project Group.

II.A.3. The FO Method

When the first-order approximation is used (with Image grohtml-30778-66.png replaced by Image grohtml-30778-67.png ), but when all Image grohtml-30778-68.png are taken to be 0 (the population mean value of Image grohtml-30778-69.png ), the method is called the first-order (FO) estimation method.

With the first-order method, the terms Image grohtml-30778-70.png and Image grohtml-30778-71.png in the Laplacian approximation are 0. Note that since conditional estimates are not used, the first-order method is not a conditional estimation method.

It can be shown that when intraindividual mean-variance models are used, the method is equivalent to the first-order method as described, for example, in NONMEM Users Guide - Part I (see also, e.g., Beal and Sheiner (1985)). Such an earlier description is also given below in section A.6. These earlier descriptions of the method apply only to mean-variance models. With the currently distributed NONMEM code it is possible to apply the FO method as defined above with intraindividual models that are not mean-variance models, but this usage is not recommended, and the code is not supported by the NONMEM Project Group.

II.A.4. The Hybrid Method

Suppose certain (but not all) elements of Image grohtml-30778-72.png are chosen to be in a set Image grohtml-30778-73.png , that the elements of Image grohtml-30778-74.png corresponding to the elements of Image grohtml-30778-75.png are taken to be 0, and that the remaining elements of Image grohtml-30778-76.png are taken to be those given by the Bayes posterior mode of Image grohtml-30778-77.png under the restriction that all of the elements chosen to be in that set are 0. The conditional estimate thus defined is an example of a constrained conditional estimate. Suppose also that the first-order approximation is made. Then the method is a hybrid between the first-order method and the FOCE method. Accordingly, this conditional estimation method is called the hybrid method. Note that with the definition of the Image grohtml-30778-80.png used with this method, in contrast with the definition used with the FOCE and Laplacian methods, the last term in the Laplacian approximation is not 0.

A hybrid method can be considered that uses a weaker version of the first-order approximation. Consider using the first-order approximation, but only for the submatrix of Image grohtml-30778-81.png consisting of just those partial second derivatives such that the two variables with respect to which the differentiation occurs are in Image grohtml-30778-82.png . This method is not supported with the currently distributed NONMEM code.

When the intraindividual models are statistical linear models (linear in the parameters Image grohtml-30778-83.png ), the first-order, first-order conditional, hybrid, and Laplacian methods are all the same method, the classical maximum likelihood method.
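
The reason the methods coincide for linear models can be checked numerically: when the model is linear in the random effect, the log of the integrand is exactly quadratic, so a second-order expansion gives the same value whether it is taken about 0 (as in the first-order method) or about the posterior mode (as in the conditional methods). The following is a minimal sketch under assumed conditions, not NONMEM code: a scalar random effect, the hypothetical model y_j = theta + eta + eps_j with normal eta and eps_j, and illustrative names and data values throughout.

```python
import math

def log_integrand(eta, y, theta, sigma2, omega2):
    """log of l(eta) * h(eta) for the linear model y_j = theta + eta + eps_j."""
    log_l = sum(-0.5 * math.log(2 * math.pi * sigma2)
                - 0.5 * (yj - theta - eta) ** 2 / sigma2 for yj in y)
    log_h = -0.5 * math.log(2 * math.pi * omega2) - 0.5 * eta ** 2 / omega2
    return log_l + log_h

def expansion_log_likelihood(eta_hat, y, theta, sigma2, omega2):
    """Log marginal likelihood from a second-order expansion about eta_hat."""
    gradient = sum(yj - theta - eta_hat for yj in y) / sigma2 - eta_hat / omega2
    hessian = -len(y) / sigma2 - 1.0 / omega2
    return (log_integrand(eta_hat, y, theta, sigma2, omega2)
            + 0.5 * math.log(2 * math.pi / -hessian)
            - gradient ** 2 / (2 * hessian))

def exact_log_likelihood(y, theta, sigma2, omega2):
    """Closed-form log density of y ~ N(theta*1, sigma2*I + omega2*11')."""
    n = len(y)
    r = [yj - theta for yj in y]
    quad = (sum(v * v for v in r)
            - omega2 * sum(r) ** 2 / (sigma2 + n * omega2)) / sigma2
    log_det = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * omega2)
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)

# Illustrative data for one individual (made-up values).
y, theta, sigma2, omega2 = [1.2, 0.7, 1.9], 1.0, 0.25, 0.5
mode = omega2 * sum(yj - theta for yj in y) / (sigma2 + len(y) * omega2)

at_zero = expansion_log_likelihood(0.0, y, theta, sigma2, omega2)
at_mode = expansion_log_likelihood(mode, y, theta, sigma2, omega2)
exact = exact_log_likelihood(y, theta, sigma2, omega2)
print(at_zero, at_mode, exact)  # the three values agree up to rounding
```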

II.A.5. The Centering Methods

The Image grohtml-30778-84.png are assumed to be distributed in the population with mean 0. When the population model fits the data well, this will be reflected by the average, Image grohtml-30778-85.png , of the conditional estimates of the Image grohtml-30778-86.png across the sampled individuals (at the values of the population parameters given by the model) being close to 0. (The converse does not necessarily hold.) When Image grohtml-30778-87.png is close to 0, the fit will be called centered. There is nothing about the methods defined above that ensures that the fit will be centered. There are infrequently arising situations where the average is "far" from 0, where the model does not fit well (as judged e.g. by the differences Image grohtml-30778-88.png with mean-variance intraindividual models) and where a method that is designed to better center the fit might be tried (see chapter III for some guidance). With a centering estimation method the Image grohtml-30778-89.png are taken to be the unconstrained conditional estimates, and the approximation to Image grohtml-30778-90.png is given by

Image grohtml-307786.png

With NONMEM, there are centering FOCE and Laplacian estimation methods (with no η-interaction). A centering hybrid method is not implemented in NONMEM.
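
Whether a fit is centered can be examined by averaging the conditional estimates across individuals. The sketch below is only an illustration for a single scalar random effect: the function name, the made-up estimates, and the use of the ratio of the average to its standard error as a yardstick for "close to 0" are assumptions, not a criterion stated in this guide.

```python
import math
import statistics

def eta_bar_summary(eta_hats):
    """Average of the conditional estimates, its standard error, and their ratio."""
    n = len(eta_hats)
    mean = statistics.fmean(eta_hats)
    se = statistics.stdev(eta_hats) / math.sqrt(n)
    return mean, se, mean / se

# Made-up conditional estimates for five individuals.
eta_hats = [0.10, -0.20, 0.05, 0.15, -0.10]
mean, se, ratio = eta_bar_summary(eta_hats)
print(f"average = {mean:.4f}, standard error = {se:.4f}, ratio = {ratio:.2f}")
```

An average that is many standard errors away from 0 would suggest that the fit is not centered.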

II.A.6. The Centering FOCE Method with the First-Order Model

The first-order model is the population model which results when for all i, the ith given intraindividual model is a mean-variance model with mean Image grohtml-30778-93.png and variance-covariance matrix Image grohtml-30778-94.png , and this model is replaced by another such model with mean

Image grohtml-307787.png

and variance-covariance matrix Image grohtml-30778-96.png .

The linearity of the Image grohtml-30778-97.png under this model implies that the population expectation of Image grohtml-30778-98.png is Image grohtml-30778-99.png , the prediction obtained by taking Image grohtml-30778-100.png to be 0, its population mean. With mean-variance models, the FO estimation method is sometimes described as the application of the maximum likelihood method to the first-order model that results from the given model, and when using this method, it is usual to judge goodness of fit by the differences Image grohtml-30778-101.png . When a conditional estimation method is used instead of the FO method, a centered fit may result, confirming that the population mean of the Image grohtml-30778-102.png is 0. However, the given intraindividual models are used, and they may be nonlinear in the Image grohtml-30778-103.png . Therefore, conceivably, Image grohtml-30778-104.png may be a poor approximation to the population expectation of Image grohtml-30778-105.png , and for this reason alone, an apparent bias in the fit may result. Experience suggests, though, that this should not be a major concern (perhaps because the nonlinear effect is small relative to the size of intraindividual variability in the residuals). If one is concerned, there are a couple of strategies one might use.
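
The gap between the prediction at 0 and the population expectation of a nonlinear prediction can be made concrete with a small sketch. It assumes a hypothetical scalar prediction theta * exp(eta) with eta normally distributed with mean 0 and variance omega2, takes the first-order model to be the linearization of the prediction about eta = 0, and computes expectations by simple numerical integration. None of these names or choices come from NONMEM.

```python
import math

def first_order_model(f, theta, h=1e-6):
    """Return the linearization of eta -> f(theta, eta) about eta = 0."""
    f0 = f(theta, 0.0)
    slope = (f(theta, h) - f(theta, -h)) / (2 * h)
    return lambda eta: f0 + slope * eta

def population_expectation(g, omega2, n=4000, width=8.0):
    """E[g(eta)] for eta ~ N(0, omega2), by the trapezoid rule over +/- width sd."""
    sd = math.sqrt(omega2)
    lo = -width * sd
    step = 2 * width * sd / n
    total = 0.0
    for k in range(n + 1):
        eta = lo + k * step
        weight = 0.5 if k in (0, n) else 1.0
        density = math.exp(-0.5 * eta * eta / omega2) / math.sqrt(2 * math.pi * omega2)
        total += weight * g(eta) * density
    return total * step

def prediction(theta, eta):
    """Hypothetical nonlinear prediction for one individual."""
    return theta * math.exp(eta)

theta, omega2 = 2.0, 0.09
linear = first_order_model(prediction, theta)

e_linear = population_expectation(linear, omega2)
e_nonlinear = population_expectation(lambda eta: prediction(theta, eta), omega2)
print("prediction at eta = 0:         ", prediction(theta, 0.0))
print("expectation, first-order model:", e_linear)
print("expectation, given model:      ", e_nonlinear)
```

Under the linearized model the expectation equals the prediction at 0. Under the nonlinear model it is theta * exp(omega2 / 2), which is larger; the difference shrinks as omega2 decreases.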

First, the NONMEM program allows the expectation of the Image grohtml-30778-106.png to be estimated by means of a couple of different types of actual integration (and not just when the intraindividual models are of the mean-variance kind); see NONMEM Users Guide - Part VIII. Second, when the intraindividual models are mean-variance models, NONMEM allows the first-order model to be obtained automatically from the given model and used with the centering FOCE method. (If the first-order model is used with the noncentering FOCE method, the result is the same as that obtained with the FO method.) When a conditional estimation method is needed (see chapter III), application of the centering FOCE method to the first-order model that results from the given model may yield adequate results, and of course, the expectation of Image grohtml-30778-107.png under the first-order model is simply given by Image grohtml-30778-108.png . Moreover, due to the linearity of the intraindividual models (of the first-order model) in the Image grohtml-30778-109.png , the computational requirement is substantially less than that incurred with application of the (centering or noncentering) FOCE method to the given model. The savings in CPU time are achieved at the expense of possibly using too simple a model (and, of course, are still not as great as the savings achieved with the FO method).

The first-order model may be used with the centering FOCE method, but not with the centering Laplacian method (because due to the linearity, the result would be the same as that obtained with the centering FOCE method). Be aware that when this model is used with the centering FOCE method, the conditional estimates produced by the method are based on the first-order intraindividual models (unlike whenever the noncentering FOCE method is used, where the conditional estimates are based on the given intraindividual models). It is possible nonetheless to obtain posthoc estimates based on the given intraindividual models, at the population estimates obtained from using the centering FOCE method with the first-order model. A centering hybrid method is not implemented in NONMEM.

II.B. Mixture Models

On occasion, a model may need to incorporate a randomly dispersed parameter that has a possibly multimodal distribution. In this case a mixture model may be useful. This is a model where for each i, there are several possible intraindividual models, Image grohtml-30778-110.png , Image grohtml-30778-111.png , ..., Image grohtml-30778-112.png for Image grohtml-30778-113.png , and it is assumed that the particular model that actually describes Image grohtml-30778-114.png is one of these, but it is not known which one. It is assumed that the probability that it is Image grohtml-30778-115.png is Image grohtml-30778-116.png , where Image grohtml-30778-117.png . Loosely put, the ith individual is chosen randomly from a population divided into Image grohtml-30778-118.png subpopulations, their relative sizes either being known or unknown. The subpopulation of which the individual is a given member is not observable, but for each subpopulation, a model for data from an individual from the subpopulation is available. The mixing probabilities Image grohtml-30778-119.png correspond to the sizes of the subpopulations and are usually treated as parameters whose values are unknown and are estimated. With NONMEM, these probabilities can be modeled, i.e. related to covariables, and therefore, can vary between individuals. The parameters of these relationships can be estimated; they are included in Image grohtml-30778-120.png . To indicate this generality, the Image grohtml-30778-121.png may be written Image grohtml-30778-122.png (the kth mixing probability for the ith individual).

Suppose, for example, that a clearance parameter of a pharmacokinetic model may be bimodally distributed in the population. Here is how this may be expressed with a population model. One may consider a mixture model with two intraindividual models for each individual: for the ith individual, one where the individual’s clearance is given by

Image grohtml-307788.png

and another where it is given by

Image grohtml-307789.png

(The parameters Image grohtml-30778-125.png and Image grohtml-30778-126.png are the first two elements of Image grohtml-30778-127.png .) For each i, the value Image grohtml-30778-128.png arises randomly (see chapter I). For each i, a choice between the two intraindividual models is also viewed as being made in a random fashion, according to probabilities Image grohtml-30778-129.png and Image grohtml-30778-130.png ( Image grohtml-30778-131.png ). As a result of this choice, a value Image grohtml-30778-132.png , which is either Image grohtml-30778-133.png or Image grohtml-30778-134.png , is also "chosen". (Consequently, if the first of these, say, is chosen, the value of Image grohtml-30778-136.png does not influence the data.) From the point of view of not knowing what choices between intraindividual models were actually made, the distribution of the Image grohtml-30778-137.png across individuals is a mixture of two normal distributions, and the distribution of the Image grohtml-30778-138.png is a mixture of two lognormal distributions.
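
A small simulation may help to picture this two-component mixture. It is a sketch under an assumed form that is consistent with the lognormal mixture described above, namely clearance equal to theta1 * exp(eta1) in one subpopulation and theta2 * exp(eta2) in the other. The function name, parameter values, and seed are hypothetical, and sd1 and sd2 are standard deviations rather than variances.

```python
import math
import random

def simulate_clearances(n, theta1, theta2, p1, sd1, sd2, seed=0):
    """Draw n clearances from a two-component lognormal mixture."""
    rng = random.Random(seed)
    clearances = []
    for _ in range(n):
        eta1 = rng.gauss(0.0, sd1)
        eta2 = rng.gauss(0.0, sd2)
        if rng.random() < p1:
            # First model chosen: eta2 was drawn but does not influence the value.
            clearances.append(theta1 * math.exp(eta1))
        else:
            # Second model chosen: eta1 does not influence the value.
            clearances.append(theta2 * math.exp(eta2))
    return clearances

cl = simulate_clearances(n=20000, theta1=1.0, theta2=10.0, p1=0.3, sd1=0.2, sd2=0.2)
log_cl = [math.log(v) for v in cl]
share_low = sum(v < math.sqrt(10.0) for v in cl) / len(cl)
print("mean of log clearance:", sum(log_cl) / len(log_cl))
print("share of individuals near theta1:", share_low)
```

With components this far apart relative to their spread, the share of individuals near theta1 is close to the mixing probability p1, and a histogram of log_cl would show two well-separated modes.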

The first two elements of the random variable Image grohtml-30778-139.png may have the same or different variances, i.e. Image grohtml-30778-140.png may or may not equal Image grohtml-30778-141.png . If these variances are sufficiently small, while the parameters Image grohtml-30778-142.png and Image grohtml-30778-143.png are sufficiently far apart, and if both probabilities Image grohtml-30778-144.png and Image grohtml-30778-145.png are sufficiently large (although in this regard the variances, the Image grohtml-30778-146.png ’s, and the probabilities must actually be considered all together), the distribution of Image grohtml-30778-147.png is bimodal. Often, the data may not allow all of the different variances between mixture components, such as Image grohtml-30778-148.png and Image grohtml-30778-149.png , to be well estimated, in which case the assumption might be made that these variances are the same (a homoscedastic assumption). With NONMEM, this can be done explicitly, or alternatively, the "same Image grohtml-30778-150.png " can be used with both mixture components, e.g. Image grohtml-30778-151.png can be used in (3) and also in (4), instead of Image grohtml-30778-152.png . NONMEM will understand that Image grohtml-30778-153.png is symbolizing two "different Image grohtml-30778-154.png ’s", each having the same variance.†
----------

† With NONMEM Version IV, the same Image grohtml-30778-155.png can also be used, and NONMEM will understand that it is symbolizing two different Image grohtml-30778-156.png ’s with the same variance, provided the first-order estimation method is used.
----------

Other examples of mixture models may be given. See NONMEM Users Guide - Part VI, section III.L.2 for an example where the mixture model describes a mixture of two joint lognormal distributions for clearance and volume, but which is not a bimodal distribution. The differences between the models Image grohtml-30778-157.png need not be differences concerning parameters; they could be differences in model form. They can be any set of differences whatsoever.

The likelihood for Image grohtml-30778-158.png under a mixture model is

Image grohtml-3077810.png

where Image grohtml-30778-160.png is the likelihood function for Image grohtml-30778-161.png under the kth possible intraindividual model for individual i. With a mixture model, any of the estimation methods described in section A uses the defining approximation for the method with each of the Image grohtml-30778-162.png , Image grohtml-30778-163.png , ..., Image grohtml-30778-164.png .

With a set of values for the population parameters Image grohtml-30778-165.png and Image grohtml-30778-166.png , NONMEM classifies each individual into one of the Image grohtml-30778-167.png subpopulations. The classification gives the most probable subpopulation of which the individual is a member. For each k, the empirical Bayes (marginal) posterior probability that Image grohtml-30778-168.png is described by Image grohtml-30778-169.png , given Image grohtml-30778-170.png , is computed by Image grohtml-30778-171.png . The individual is classified into the kth subpopulation if the kth probability is the largest among these r values.
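
The classification rule can be sketched in a few lines. The sketch assumes that the mixture likelihood is the sum over k of the kth mixing probability times the kth likelihood, and that the posterior probability for subpopulation k follows from Bayes' theorem as that term divided by the sum. The function names and numerical values are hypothetical, and the returned index is 0-based.

```python
def mixture_likelihood(p, L):
    """Likelihood of one individual's data: sum over k of p[k] * L[k]."""
    return sum(pk * Lk for pk, Lk in zip(p, L))

def posterior_probabilities(p, L):
    """Posterior probability of each subpopulation, given the individual's data."""
    total = mixture_likelihood(p, L)
    return [pk * Lk / total for pk, Lk in zip(p, L)]

def classify(p, L):
    """0-based index of the subpopulation with the largest posterior probability."""
    post = posterior_probabilities(p, L)
    return max(range(len(post)), key=post.__getitem__)

# Made-up mixing probabilities and per-model likelihoods for one individual.
p = [0.3, 0.7]
L = [0.02, 0.005]
post = posterior_probabilities(p, L)
print("posterior probabilities:", post)
print("assigned subpopulation (0-based):", classify(p, L))
```

In this example the individual is assigned to the first subpopulation even though its mixing probability is the smaller one, because the data are four times more likely under the first model.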