NONMEM Users Guide Part VII - Conditional Estimation Methods - Chapter I
I. Introduction

This document gives a brief description of the estimation methods for population type data that can be used with NONMEM Version V. These include, in particular, a few methods that are new with this version, the centered and hybrid methods. The more important changes from the earlier edition published in 1992, but not all changes, are highlighted with the use of vertical bars in the right margin. This document contains no information about how to communicate with the NONMEM program.

To read this document it may be helpful to have some familiarity with the notation used with the representation of statistical models for the NONMEM program. See discussions of models in NONMEM Users Guide - Part I, but if one’s interest is only in using NONMEM with PREDPP, see discussions of models in NONMEM Users Guides - Parts V and VI. Particular notation used in this Guide VII is given next.

The jth observation from the ith individual is denoted $y_{ij}$. Each individual may have a different number of observations. Each observation may be measured on a different scale: continuous, categorical, ordered categorical, discrete-ordinal.†
----------

† This document provides a description of estimation methods that can be used with observations of the same or different type. However, essentially, it neither contains any specific information about how to analyze observations of particular types, nor any information about how to communicate with NONMEM in order to do this.
----------

An individual can have multivariate observations, and each multivariate observation may have a different length. However, the multivariate nature of an observation is suppressed, as this is not relevant to the descriptions given in this document, and so the separate (scalar-valued) observations comprising the multivariate observations are all separately indexed by j. The vector of all the observations from the ith individual is denoted $y_i$.

It is assumed that there exists a separate statistical model for each $y_i$. This model is called the intraindividual model or the individual model for the ith individual. It is parameterized by $\tau$, a (vector-valued) parameter common to all the separate intraindividual models, and $\eta_i$, a (vector-valued) parameter specific to the intraindividual model for $y_i$. Under this model, the likelihood of $\eta_i$ for the data $y_i$ (conditional on $\tau$) is denoted by $l_i(\eta_i)$, the dependence on $\tau$ being suppressed in the notation. This likelihood is called here the conditional likelihood of $\eta_i$.

When all the elements of $y_i$ are measured on a continuous scale, an often-used intraindividual model is given by the multivariate normal model with mean $f_i(\tau,\eta_i)$ and variance-covariance matrix $C_i(\tau,\eta_i)$ (usually, $\tau$ consists of parameters $\theta$, which are the only ones affecting $f_i$, and other parameters which, along with $\theta$, affect $C_i$).††
----------

†† Here and elsewhere in this section an explicit assumption concerning the normal probability distribution is made. This is done primarily to keep the discussion simple. To various degrees in different situations the normality assumption does not play as important a role as our formally making the assumption might indicate.
----------

This type of model shall be referred to as the mean-variance model. It is usually expressed in terms of a multivariate normal vector $\epsilon$ with mean 0 and variance-covariance matrix $\Sigma$. In the notation used here, the parameter $\tau$ includes $\Sigma$ (ignoring the matrix structure of $\Sigma$). For example,

$$y_{ij} = f_{ij}(\theta,\eta_i) + f_{ij}(\theta,\eta_i)\,\epsilon_{ij}$$

where $\epsilon_{ij}$ is an instance of a univariate normal variable $\epsilon$ with variance $\sigma^2$. (When $\epsilon$ is multivariate, the observation $y_{ij}$ is modeled in terms of a single instance of this multivariate random vector. A few other observations as well may be modeled in terms of this same instance, and thus under the model, all such observations are correlated and comprise a multivariate observation.) In this example, the jth element of $f_i$ is $f_{ij}(\theta,\eta_i)$ (the mean of $y_{ij}$), and the corresponding diagonal element of $C_i$ is $f_{ij}(\theta,\eta_i)^2\sigma^2$ (the variance of $y_{ij}$). Since the ratio of the standard deviation of $y_{ij}$ to the mean of $y_{ij}$ is the constant $\sigma$, this particular model is called the constant coefficient of variation model.
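As a rough illustration of why this model has a constant coefficient of variation, the following Python sketch simulates observations of the form y = f + f·ε, with ε normal with mean 0 and standard deviation σ, for several values of the mean f, and computes the sample coefficient of variation. The function names, the chosen values of f and σ, and the use of a single fixed mean per simulation are assumptions made for the illustration; none of this is NONMEM code.

```python
import random
import statistics


def simulate_constant_cv(f_value, sigma, n, seed=0):
    """Draw n observations y = f + f * eps, with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [f_value + f_value * rng.gauss(0.0, sigma) for _ in range(n)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Illustrative values only: the CV should be close to sigma whatever f is.
sigma = 0.15
for f_value in (2.0, 20.0, 200.0):
    y = simulate_constant_cv(f_value, sigma, n=20000)
    print(f_value, round(coefficient_of_variation(y), 3))
```

The standard deviation of the simulated observations grows in proportion to f, so the ratio stays near σ for every choice of the mean.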

The dependence of $C_i$ on $\eta_i$ is often a consequence of the intraindividual variance depending on the mean function, which in turn depends on $\eta_i$, as with the above example. This dependence represents an interaction between $\eta_i$ and $\epsilon$. With the (homoscedastic) model expressed by

$$y_{ij} = f_{ij}(\theta,\eta_i) + \epsilon_{ij}$$

there is no such interaction; the variance of $y_{ij}$ is just $\sigma^2$. There are two variants of the first-order conditional estimation method described in chapter II, one that takes this interaction into account and another that ignores it.

When an intraindividual model involving $\epsilon$ is presented to NM-TRAN (the "front-end" of the NONMEM system), the model is automatically transformed. A linearization of the right side of the equation is used: a first-order approximation in $\epsilon$ about 0, the mean value of $\epsilon$. Since the approximate model is linear in $\epsilon$, it is a mean-variance model. Clearly, if the given model is itself a mean-variance model, the transformed model is identical to the given model. Consider, for example, an intraindividual model where the elements of $y_i$ are regarded as lognormally distributed (because the normally distributed $\epsilon_{ij}$ appear as logarithms):

$$y_{ij} = f_{ij}(\theta,\eta_i)\exp(\epsilon_{ij})$$

In this case the transformed model is the constant coefficient of variation model given above. (Therefore, no matter whether the given intraindividual model or the constant coefficient of variation model is presented to NM-TRAN, the results of the analysis will be the same.)
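For the lognormal example, the linearization can be written out explicitly. The following is only a sketch of the first-order Taylor expansion about 0 described above, written for a single observation, with $f_{ij}$ abbreviating the mean function value $f_{ij}(\theta,\eta_i)$ and $\epsilon_{ij}$ the normally distributed residual:

$$
\begin{aligned}
y_{ij} &= f_{ij}\exp(\epsilon_{ij}) \\
       &\approx f_{ij}\exp(0) + f_{ij}\exp(0)\,(\epsilon_{ij} - 0) \\
       &= f_{ij} + f_{ij}\,\epsilon_{ij}
\end{aligned}
$$

The last line has mean $f_{ij}$ and standard deviation $f_{ij}\sigma$, which is the constant coefficient of variation form.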

Alternatively, the user might be able to transform the data so that a mean-variance model applies to the transformed data, which can then be presented directly to NM-TRAN. With the above example, and using the log transformation on the data $y_{ij}$, an appropriate mean-variance model to present to NM-TRAN would be

$$\log y_{ij} = \log f_{ij}(\theta,\eta_i) + \epsilon_{ij}$$

(Actually, NM-TRAN allows one to explicitly accomplish the log transformation of both the data and the $f_{ij}$.) The results of the analysis differ depending on whether or not the log transformation is used. Without the log transformation, the values of the $f_{ij}$ are regarded as arithmetic means (under the approximate model obtained by linearizing), and with the log transformation, these values are regarded as geometric means. Use of the log transformation (when this can be done, that is, when there are no $y_{ij}$ or $f_{ij}$ with value 0) can often lead to a better analysis.
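The distinction between arithmetic and geometric means can be checked numerically. The Python sketch below simulates lognormal observations y = f·exp(ε), with ε normal with mean 0 and standard deviation σ, for a single fixed value f, and compares the two sample means. The function name and the values of f and σ are illustrative assumptions. Under the exact lognormal model the geometric mean of y is f, while the arithmetic mean is f·exp(σ²/2), so the two interpretations of f differ more as σ grows.

```python
import math
import random
import statistics


def simulate_lognormal(f_value, sigma, n, seed=0):
    """Draw n observations y = f * exp(eps), with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [f_value * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]


# Illustrative values only.
f_value, sigma = 10.0, 0.5
y = simulate_lognormal(f_value, sigma, n=50000)

arithmetic_mean = statistics.mean(y)
geometric_mean = math.exp(statistics.mean([math.log(v) for v in y]))

print("geometric mean :", round(geometric_mean, 2), "(theory: f)")
print("arithmetic mean:", round(arithmetic_mean, 2), "(theory: f * exp(sigma^2 / 2))")
```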

It is also assumed that as individuals are sampled randomly from the population, the $\eta_i$ are also being sampled randomly (and statistically independently), although these values are not observable. The value $\eta_i$ is called the random interindividual effect for the ith individual. It is assumed that the $\eta_i$ are instances of the random vector $\eta$, normally distributed with mean 0 and variance-covariance matrix $\Omega$. The density function of this distribution (at $\eta$) is denoted by $h(\eta;\Omega)$.

Often, some quantity P (viewed as a function of values of the covariates and the $\eta_i$) is common to different intraindividual models. For example, a clearance parameter may be common to different intraindividual models, but its value differs between different intraindividual models because the values of the covariates and the $\eta_i$ differ. The randomness of the $\eta_i$ in the population induces randomness in P. The quantity P is said to be a randomly dispersed parameter. When speaking of its distribution, we are imagining that the values of the covariates are fixed, so that indeed, there is a unique distribution in question.

From the above assumptions, the (marginal) likelihood of $\tau$ and $\Omega$ for the data $y_i$ is given by

$$L_i(\tau,\Omega) = \int l_i(\eta)\,h(\eta;\Omega)\,d\eta \qquad (1)$$

In general, this integral is difficult to compute exactly. The likelihood for all the data is given by

$$L(\tau,\Omega) = \prod_i L_i(\tau,\Omega) \qquad (2)$$
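To make the integral in (1) concrete, the sketch below evaluates it numerically for a deliberately simple, hypothetical model: y_ij = theta + eta_i + eps_ij, with a scalar eta_i of variance omega2 and independent residuals of variance sigma2. This model is linear in eta_i, so the integral also has a closed form, which is used only as a check. The model, the function names, and the trapezoidal rule are all assumptions chosen for the illustration; they do not describe how NONMEM approximates (1).

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def conditional_likelihood(eta, y, theta, sigma2):
    """l_i(eta): likelihood of eta for one individual's data y."""
    value = 1.0
    for obs in y:
        value *= normal_pdf(obs, theta + eta, sigma2)
    return value


def marginal_likelihood(y, theta, sigma2, omega2, n_grid=4001, width=8.0):
    """Trapezoidal approximation to the integral of l_i(eta) h(eta; omega2)."""
    half = width * math.sqrt(omega2)
    step = 2.0 * half / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        eta = -half + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += (weight * conditional_likelihood(eta, y, theta, sigma2)
                  * normal_pdf(eta, 0.0, omega2))
    return total * step


def marginal_likelihood_exact(y, theta, sigma2, omega2):
    """Closed form for this linear model: y is multivariate normal."""
    n = len(y)
    resid = [obs - theta for obs in y]
    quad = (sum(r * r for r in resid)
            - omega2 / (sigma2 + n * omega2) * sum(resid) ** 2) / sigma2
    log_det = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * omega2)
    return math.exp(-0.5 * (n * math.log(2.0 * math.pi) + log_det + quad))


# Illustrative data for one individual.
y_i = [10.3, 9.1, 11.2]
print(marginal_likelihood(y_i, theta=10.0, sigma2=0.5, omega2=1.0))
print(marginal_likelihood_exact(y_i, theta=10.0, sigma2=0.5, omega2=1.0))
```

For models that are nonlinear in eta_i no such closed form exists, and with a vector-valued eta_i a grid of this kind quickly becomes impractical, which is why approximations to (1) are needed.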

The first-order estimation method was the first population estimation method available with NONMEM. This method produces estimates of the population parameters $\tau$ and $\Omega$, but it does not produce estimates of the random interindividual effects. An estimate of $\eta_i$ is nonetheless obtainable, conditional on the first-order estimates for $\tau$ and $\Omega$ (or on any other values for these parameters), by maximizing the empirical Bayes posterior density of $\eta_i$, given $y_i$: $l_i(\eta)h(\eta;\Omega)/L_i(\tau,\Omega)$, with respect to $\eta$. In other words, the estimate is the mode of the posterior distribution. Since this estimate is obtained after values for $\tau$ and $\Omega$ are obtained, it is called the posthoc estimate. When a mean-variance model is used, and a request is put to NONMEM to compute a posthoc estimate, by default this estimate is computed using $C_i(\tau,0)$. In other words, the intraindividual variance-covariance is assumed to be the same as that for the mean individual (the hypothetical individual having the mean interindividual effect, 0, and sharing the same values of the covariates as has the ith individual). However, it is also possible to obtain the posterior mode without this assumption.

The posterior density can be maximized using any given values for $\tau$ and $\Omega$. Since the resulting estimate for $\eta_i$ is obtained conditionally on these values, it is sometimes called a conditional estimate at these values, to emphasize its conditional nature.
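A conditional estimate of this kind can be sketched for a simple hypothetical model, y_ij = theta + eta_i + eps_ij, with scalar eta_i of variance omega2 and residual variance sigma2, all treated as given values. The Python code below finds the posterior mode by a one-dimensional golden-section search and compares it with the closed form that exists for this linear model. The model, the function names, and the search method are assumptions for illustration only and are not NONMEM's numerical method. Because this model is homoscedastic, the question of evaluating the intraindividual variance at eta = 0 does not arise here.

```python
import math


def log_posterior_kernel(eta, y, theta, sigma2, omega2):
    """log l_i(eta) + log h(eta; omega2), up to an additive constant."""
    log_lik = -0.5 * sum((obs - theta - eta) ** 2 for obs in y) / sigma2
    log_prior = -0.5 * eta ** 2 / omega2
    return log_lik + log_prior


def golden_section_max(func, lo, hi, tol=1e-8):
    """Maximize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)


def conditional_estimate(y, theta, sigma2, omega2):
    """Posterior mode of eta for one individual, at the given parameter values."""
    bound = max(abs(obs - theta) for obs in y) + 1.0
    return golden_section_max(
        lambda eta: log_posterior_kernel(eta, y, theta, sigma2, omega2),
        -bound, bound)


def conditional_estimate_exact(y, theta, sigma2, omega2):
    """Closed-form posterior mode, available because the model is linear in eta."""
    return omega2 * sum(obs - theta for obs in y) / (sigma2 + len(y) * omega2)


y_i = [10.3, 9.1, 11.2]
print(conditional_estimate(y_i, theta=10.0, sigma2=0.5, omega2=1.0))
print(conditional_estimate_exact(y_i, theta=10.0, sigma2=0.5, omega2=1.0))
```

Changing theta, sigma2, or omega2 changes the estimate, which is the conditional nature referred to above.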

In contrast with the first-order method, the conditional estimation methods to be described produce estimates of the population parameters and, simultaneously, estimates of the random interindividual effects. With each different method, a different approximation to the likelihood function (1) is used, and (2) is maximized with respect to $\tau$ and $\Omega$. The approximation to (1) at the values $\tau$ and $\Omega$ depends on an estimate $\hat\eta_i$, and as this estimate itself depends on the values $\tau$ and $\Omega$, the approximation gives rise to a further dependence of $L_i$ on the values of $\tau$ and $\Omega$, one not expressed in (1). Consequently, as different values $\tau$ and $\Omega$ are tried, different estimates $\hat\eta_i$ are obtained as a part of the maximization of (2). The estimates $\hat\eta_i$ at the values $\tau$ and $\Omega$ that maximize (2) constitute the estimates of the random interindividual effects produced by the method (except for the hybrid method†). The estimate $\hat\eta_i$ also depends on $y_i$, and so the approximation gives rise to a further dependence of $L_i$ on $y_i$, one also not expressed in (1).
----------

† After obtaining the population parameter estimates with the hybrid method (see chapter II), NONMEM ignores the estimates of the $\eta_i$ that have been produced simultaneously with the population parameter estimates, and as with the first-order method, the posthoc estimates (described above) are the ones reported as the estimates of the random interindividual effects.
----------

In contrast with the first-order method, a conditional estimation method involves multiple maximizations within a maximization. The estimate $\hat\eta_i$ is the value of $\eta$ that maximizes the posterior density of $\eta_i$ given $y_i$ (except for the hybrid method††). For each different value of $\tau$ and $\Omega$ that is tried by the maximization algorithm used to maximize (2), first a value $\hat\eta_1$ is found that maximizes the posterior density given $y_1$, then a value $\hat\eta_2$ is found that maximizes the posterior density given $y_2$, etc. Therefore, maximizing (2) is a very difficult and CPU-intensive task. The numerical methods by which this is accomplished are not described in this document.
----------

†† With the hybrid method, a constrained maximum is computed.
----------
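The structure of "maximizations within a maximization" can be sketched in a few lines of Python. The example is a toy under strong assumptions and is not NONMEM's algorithm: the hypothetical model is y_ij = theta + eta_i + eps_ij with scalar eta_i, the variances sigma2 and omega2 are held fixed so that only theta is searched over, both searches use a simple golden-section method, and the approximation to (1) is a generic Laplace-type approximation about the conditional estimate (exact for this linear model). All function names are illustrative.

```python
import math


def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def neg_log_joint(eta, y, theta, sigma2, omega2):
    """-log[l_i(eta) h(eta; omega2)] for the toy model."""
    rss = sum((obs - theta - eta) ** 2 for obs in y)
    return (0.5 * len(y) * math.log(2.0 * math.pi * sigma2) + 0.5 * rss / sigma2
            + 0.5 * math.log(2.0 * math.pi * omega2) + 0.5 * eta ** 2 / omega2)


def approx_log_marginal(y, theta, sigma2, omega2):
    """Inner maximization over eta, then a Laplace-type approximation to log L_i."""
    bound = max(abs(obs - theta) for obs in y) + 1.0
    eta_hat = golden_section_min(
        lambda eta: neg_log_joint(eta, y, theta, sigma2, omega2), -bound, bound)
    # Second derivative of neg_log_joint in eta (analytic for this model).
    curvature = len(y) / sigma2 + 1.0 / omega2
    log_l = (-neg_log_joint(eta_hat, y, theta, sigma2, omega2)
             + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(curvature))
    return log_l, eta_hat


def objective(theta, data, sigma2, omega2):
    """-2 log of the approximate likelihood (2); one inner search per individual."""
    return -2.0 * sum(approx_log_marginal(y, theta, sigma2, omega2)[0] for y in data)


def fit_theta(data, sigma2, omega2):
    """Outer minimization over theta; returns theta_hat and the eta_hat values."""
    lo = min(min(y) for y in data)
    hi = max(max(y) for y in data)
    theta_hat = golden_section_min(
        lambda t: objective(t, data, sigma2, omega2), lo, hi)
    eta_hats = [approx_log_marginal(y, theta_hat, sigma2, omega2)[1] for y in data]
    return theta_hat, eta_hats


# Illustrative data: three individuals, three observations each.
data = [[10.3, 9.1, 11.2], [12.0, 12.4, 11.5], [8.2, 9.0, 8.8]]
theta_hat, eta_hats = fit_theta(data, sigma2=0.5, omega2=1.0)
print(theta_hat, eta_hats)
```

Every evaluation of the outer objective triggers one inner search per individual, which is why the cost grows quickly with the number of individuals and of parameters. The conditional estimates at the final parameter value come out as a by-product of the fit.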

Fortunately, it often suffices to use the first-order method; a conditional estimation method is not needed, or if it is, sometimes it is needed only minimally during the course of a data analysis. Some guidance is given in chapter III. Briefly, the need for a conditional estimation method increases with the degree to which the intraindividual models are nonlinear in the $\eta_i$. Population pharmacokinetic models are often actually rather linear in this respect, although the degree of nonlinearity increases with the degree of multiple dosing. Population pharmacodynamic models are more nonlinear. The potential for a conditional estimation method to produce results different from those obtained with the first-order estimation method decreases as the amount of data per individual decreases, since a conditional estimation method uses conditional estimates of the $\eta_i$, which are all shrunken toward 0, and the shrinkage is greater the smaller the amount of data per individual. Many population analyses involve small amounts of data per individual.
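The relation between shrinkage and the amount of data per individual can be seen in closed form for a simple hypothetical model, y_ij = theta + eta_i + eps_ij, with interindividual variance omega2 and residual variance sigma2. For that model the posterior mode of eta_i equals a weight times the individual's mean residual, and the weight depends on the number of observations. The Python sketch below tabulates this weight; the model, the function name, and the numerical values are assumptions for illustration only.

```python
def shrinkage_weight(n_obs, sigma2, omega2):
    """Factor multiplying (mean of y_i minus theta) in the posterior mode of eta_i.

    A weight near 0 means the estimate is shrunken almost entirely to 0;
    a weight near 1 means almost no shrinkage.
    """
    return omega2 / (omega2 + sigma2 / n_obs)


# Illustrative variances only.
for n_obs in (1, 2, 5, 20, 100):
    print(n_obs, round(shrinkage_weight(n_obs, sigma2=1.0, omega2=0.25), 3))
```

With these values the weight rises from 0.2 for a single observation to about 0.96 for 100 observations, so individuals with few observations contribute conditional estimates that stay close to 0.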

The conditional estimation methods that are available with NONMEM and which are described in chapter II are: the first-order conditional estimation method (with and without interaction when mean-variance models are used, and with or without centering), the Laplacian method (with and without centering), and the hybrid method (a hybrid between the first-order and first-order conditional estimation methods). For purposes of description here and in other NONMEM Users Guides, the term conditional estimation methods refers not only to these population estimation methods, but also to methods for obtaining conditional estimates themselves.

To summarize, each of the (population) conditional estimation methods involves maximizing (2), but each uses a different approximation to (1). Actually, $-2\log L(\tau,\Omega)$ is minimized with respect to $\tau$ and $\Omega$. This is called the objective function. Its minimum value serves as a useful statistic for comparing models. Standard errors for the estimates (indeed, an estimated asymptotic variance-covariance matrix for all the estimates) are obtained by computing derivatives of the objective function.