Bayesian Models with Measurement Error that Partially Account for Identifiable Subjects
Edward J. Stanek III, Parimal Mukhopadhyay, Viviana B. Lencina, and Luz Mery González
Department of Public Health, University of Massachusetts at Amherst, USA
Indian Statistical Institute, Kolkata, India
Facultad de Ciencias Economicas, Universidad Nacional de Tucumán, CONICET, Argentina
Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia
In a mixed model, latent values associated with subjects are typically random while the subject specific measurement error variances are not considered to be random. To understand this paradox, we consider prediction via Bayesian models similar to those proposed by Ericson (1969), but that account for identifiable subjects. Defining the data as response for a subset of identifiable subjects from a finite population, we note that the posterior distribution of the subjects’ latent values in the data under an exchangeable prior for population latent values is an exchangeable distribution of the subjects’ latent values in the data. We expand this development to settings where response is measured with error on subjects in the data set, and develop the expected value and variance of the posterior distribution of the subjects’ latent values in the data, revealing that the measurement error variance component in the posterior distribution includes the average measurement error variance instead of subject specific measurement error variances. Based on these results, we specify a new prior distribution that leads to a posterior distribution of the subjects’ latent values in the data where the subjects’ identities are retained for measurement error, but not for the corresponding latent values. This class of models allows flexible specification of fixed and random effects, and highlights the distinction between potentially observable points and artificial points in the prior distribution. The results clarify the relationship between physical populations and measurements, and stochastic models that may partially connect to this physical reality. There are important implications in applications when there is interest in estimating population parameters, domain means, and latent values for realized random effects.
ACKNOWLEDGEMENT
This work was developed at a joint meeting of the authors in July, 2010 in the Department of Public Health at the University of Massachusetts, Amherst, USA and a follow-up meeting in September, 2010 in the Departamento de Economia, Universidad Nacional de Tucumán, Tucumán, Argentina. Appreciation is given to the helpful comments of Julio Singer, Wenjun Li, Michael Lavine, Shrikant Bangdiwala, Silvina San Martino, and Mirta Santana on early drafts of this manuscript. Previous meetings over the past five years of the Finite Population Mixed Model Research group (including many of these investigators) were supported by the National Institutes of Health (NIH-PHS-R01-HD36848, R01-HL071828-02), USA, Conselho Nacional de desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil.
Keywords: Finite population, heteroskedasticity, superpopulation, unbiasedness, inference, mixed models, finite population mixed model.
C11ed01v4-shortv5.doc 10/22/2018 5:37 PM
1. INTRODUCTION
Estimating a subject’s latent value in a population based on response for a subset of subjects is a common problem. When subjects are measured with error, subjects’ labels have an intriguing role. If the subjects’ identities are known for the subset and used in defining a stochastic model for each subject’s responses, the response error model has fixed subject effects and a subject’s latent value can be estimated directly from the response error model. If the subject labels are ignored, a mixed model can be specified for response that sums the latent value and measurement error for subjects in a randomly selected set of subjects, and this model can be used to predict a realized subject’s latent value. Although the latent value of a realized subject in the set can be estimated (or predicted) with each model, the best linear unbiased predictor (BLUP) from the mixed model is more accurate.
The subjects’ labels are not completely suppressed in mixed models when measurement error variances differ between subjects. In such settings, although subject labels are ignored (so that subject effects are random), response error is identified with realized subjects. This partial use of labeling results in weighted least squares estimates of the mean in a mixed model, and shrinkage constants for BLUPs of realized subjects’ latent values that depend on the realized subjects’ response error variances. The better accuracy of these estimators relative to estimators that result from a finite population mixed model (FPMM) proposed by Stanek and Singer (2004) provides the motivation for this investigation. We seek to more clearly understand the manner in which labels are partially used in mixed models when there is heterogeneous endogenous variance.
One approach that enables a subject’s identity to be known is to expand the number of random variables in a finite population framework that represents the underlying problem, as in Stanek and Singer (2008). Models using this approach can include effects identified with subjects, but the approach does not lead to practical estimators. The approach is based on representing the data using an expanded number of random variables relative to finite population models, but fewer random variables than would be needed for Godambe’s (1955) general sampling framework, where a different probability may be associated with each permutation of the labeled data (Zhang, 2010). A second approach to partially account for a subject’s identity is to use superpopulation models (Godambe 1966). Although superpopulation models may be developed where subjects’ latent values are random and measurement error is fixed, as in Scott and Smith (1969), they do so by introducing points into the modeling space that cannot be observed (Stanek and Singer 2011). We explore a third approach to partially account for a subject’s identity that uses Basu’s (1969) sufficient statistics in Godambe’s framework in a Bayesian context.
Our investigation focuses on the role of labels in a simple Bayesian model similar to Ericson (1969, 1988), proceeding in the following manner. We first introduce notation and the basic ideas when there is no measurement error and where our interest is in the target parameter corresponding to the population mean. Subsequently, we apply this framework to a setting where a subject’s response includes measurement error with heterogeneous variances, and develop a predictor of a realized subject’s latent value from the posterior distribution. Random variables for this distribution can be defined identically to those in a FPMM, assuming the subjects in the data set are a finite population. In this posterior distribution, measurement error variances are not identifiable. Finally, we introduce a new prior distribution that results in a posterior distribution where the measurement error variance is identifiable, but random effects only match the first and second moments of a set of exchangeable random variables whose realization is a vector of latent values for subjects in the data set. Response for a subject in this posterior distribution can be represented by a mixed model where the latent values for the subjects in the data are exchangeable, but the measurement error variance is identified with a realized subject.
2. THE PRIOR AND POSTERIOR DISTRIBUTIONS OF THE POPULATION MEAN WITH IDENTIFIABLE SUBJECTS AND NO MEASUREMENT ERROR
Assume our interest is in the mean response in a population of subjects, but that the observed data corresponds to a subset of labeled subjects’ responses (equal to the subjects’ latent values). We define the population as , where represents a subject’s label (such as a name, assumed to be unique for each subject), and represents the subject’s response (equal to the subject’s latent value) which is a non-stochastic parameter. The population mean is the simple average latent value in the population, which we represent by , where is the set of subjects in the population. It is not necessary that the labels be completely known, nor that be exactly specified for such a population to be conceptually defined. However, we require the population to be defined in space and time, so that these quantities (while not known) are at least potentially knowable, and is defined for . For illustration, our interest may be in the average number of hypoglycemic episodes in the past year that occurred for Medicare enrollees with a diabetes diagnosis in Massachusetts on 7/1/2010. Conceptually, a list could be formed of all enrollees to define the finite population, where for each enrollee, the actual number of hypoglycemic episodes is recorded. It is not necessary to list the enrollees and record each response to have a clear interpretation of .
Although we don’t know the population mean, we assume that from previous studies and/or experience, we can guess it. The previous studies most likely were conducted in different locations and times, and have different strengths and limitations. Associated with each guess, we assign a prior probability that reflects our subjective measure of belief that the guess is the actual value of the parameter for the population. These values, , and their associated prior probabilities, , , define the prior distribution, where . We assume (i.e. believe) that the actual mean is one of the possible values specified in the prior. Let the prior parameter be defined for with subjects . In order for , we require not only that , and , but also that for each , must be identical in and . Different interpretations can be given to the population underlying the prior parameter and its associated prior probability, .
One concept of the population underlying the prior parameters is that the subjects are the same for all , such that , but that the latent value, , for a given subject is uncertain. The populations and are distinguished by having and differ for at least one . An example occurs when the populations are defined by associating the latent values in any of possible permutations of latent values with a given listing of subjects in . For this set of prior populations if all responses are distinct, although and for all , for only one will equal . It is for this population that we wish to know . A variation on this concept is to define the populations by associating the latent values in any of possible permutations of any of a set of latent values where , resulting in . The subjects may correspond conceptually to the subjects in a superpopulation. The prior parameter, , will be the same for all permutations of a set, but may differ for different sets. When is very large, the prior distribution of could be approximated by a continuous parametric distribution.
A different concept of the prior can be illustrated via an example. Suppose we want to know the average annual Medicare cost per enrollee in 2010 in the United States, with enrollees defined as the enrollees on July 1, 2010. From past research, we can specify (or guess) values of for each of the years 2006-2009, knowing that the population of Medicare enrollees differs somewhat between the years. Conceptually, assume that we are able to form a list of all enrollees over the time period 2006-2009, and associate with each enrollee their average annual cost for each year they are enrolled in Medicare. Let us augment this list with the new 2010 enrollees in Medicare to form a superpopulation of subjects, . For a subject enrolled for four years in Medicare, there will be four pairs that may differ as a result of different latent values for the four years.
By definition, the enrollees in 2010 are a subset of the subjects, but without knowing , we don’t know exactly what subset corresponds to the 2010 enrollees. Suppose we construct all possible subsets of subjects from the superpopulation, and for each subject in the subset, choose one pair, defining and . The 2010 enrollees correspond to one of these subsets of subjects, but we don’t know which one. We may limit the possible latent values for a subject to those observed in previous years, or expand the set of latent values that are possible for a subject similar to the previous example. Conceptually, the entire set of populations and their corresponding parameters, , can be summarized in a distribution. Although the actual enrollees in 2010, , are not known, for many , . In one of these populations, we assume . The prior distribution reflects our uncertainty over for which , is equal to .
Often the prior distribution may be defined by and their associated prior probabilities, for with only an implicit understanding of . Additional specificity of the prior population is not needed, and thus suppressed. The population could conceivably be identical to any of the populations in the prior. Suppose now that we observe data on a subset of subjects, , where is the latent value for the subject labeled . The basic idea underlying Bayesian inference is to use these data to reduce the uncertainty associated with the prior distribution. The uncertainty is reduced since once we know the subjects in the data, only prior populations that include these subjects, i.e., where , are possible. The challenge in updating the prior distribution is identifying where , such that for each , .
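The updating idea in this paragraph can be sketched for a small discrete prior. This is a minimal illustration, not the authors' procedure: the labels, latent values, prior probabilities, and the names `prior_populations`, `is_consistent`, and `posterior` are all hypothetical. It assumes that, without measurement error, a prior population remains possible only if it contains every observed label with the observed latent value.

```python
from fractions import Fraction

# Hypothetical prior: each candidate population maps subject labels to latent values.
prior_populations = {
    "h1": {"Ann": 2, "Bob": 0, "Cal": 4},
    "h2": {"Ann": 2, "Bob": 1, "Cal": 6},
    "h3": {"Ann": 1, "Bob": 0, "Dee": 5},
}
prior_prob = {"h1": Fraction(1, 2), "h2": Fraction(1, 4), "h3": Fraction(1, 4)}

# Hypothetical data: one labeled subject observed without measurement error.
data = {"Ann": 2}


def is_consistent(population, observed):
    """Assumed rule: every observed label is present with the observed value."""
    return all(
        label in population and population[label] == value
        for label, value in observed.items()
    )


def posterior(populations, probabilities, observed):
    """Drop prior populations that contradict the data, then renormalize."""
    kept = {
        h: p for h, p in probabilities.items()
        if is_consistent(populations[h], observed)
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no prior population is consistent with the data")
    return {h: p / total for h, p in kept.items()}


def population_mean(population):
    """Simple average of the latent values in one candidate population."""
    return Fraction(sum(population.values()), len(population))


post = posterior(prior_populations, prior_prob, data)
posterior_mean = sum(post[h] * population_mean(prior_populations[h]) for h in post)
```

In this toy case the third population is ruled out because it assigns a different latent value to the observed subject, and the remaining prior probabilities are rescaled to sum to one.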
2.1. Identifying Subjects in an Exchangeable Distribution for Population
In order to have a clear strategy for linking subjects in the data to subjects in the prior distribution, we expand the representation of the prior so as to be able to identify subjects. We do so by assuming that response for the subjects in each population is a realization of a vector of exchangeable random variables, and define notation that enables subjects to be identified for each point in such an exchangeable distribution. Suppressing the subscript to simplify notation, for each , we define , as a vector of exchangeable random variables similar to Ericson (1969) such that the joint probability density of response , associated with each permutation, , of the subjects in is identical for all . Unlike Ericson, we define points in the prior distribution for each so that both the subject’s parameter and the subject’s label can be identified. To do so, we introduce notation to keep track of permutations of subject labels used to construct different listings, and notation for permutations of subjects’ responses used to define possible points in . In each case, we maintain the connection between the subject’s label and response evident in . First, we define different possible listings.
A population listing links each subject’s label to a position, , in an ordered array . Let us define an initial listing by placing the subject labels into a vector , where the subject’s label in position is for . We define the corresponding parameter vector by , where is response for the subject with label . Different listings are defined by , where , , is an permutation matrix with elements in row of column that take a value of one when is in position in listing , and zero otherwise. Each row and column of total to one. We define , an identity matrix, so that . By knowing (and hence ), we know which subject is in position in , and hence know the subject associated with response in position in .
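The listing construction can be sketched numerically. The labels, latent values, and the helper names `permutation_matrix` and `apply_listing` below are hypothetical. The row and column convention is an assumption: row i of the matrix holds a one in the column of the initial-listing position that moves to position i.

```python
import numpy as np

# Hypothetical initial listing: labels and their latent values, in the same order.
labels = np.array(["Ann", "Bob", "Cal"])
y = np.array([2.0, 0.0, 4.0])


def permutation_matrix(order):
    """Return a 0/1 matrix whose row i has a one in column order[i].

    Assumed convention: position i of the new listing holds the subject
    found at position order[i] of the initial listing.
    """
    n = len(order)
    matrix = np.zeros((n, n), dtype=int)
    matrix[np.arange(n), order] = 1
    return matrix


def apply_listing(matrix, initial_labels, initial_values):
    """Reorder labels and values together so each label stays with its value."""
    source = matrix.argmax(axis=1)  # column holding the one in each row
    return initial_labels[source], matrix @ initial_values


P1 = permutation_matrix([0, 1, 2])  # identity: reproduces the initial listing
P2 = permutation_matrix([2, 0, 1])  # a second listing

labels2, y2 = apply_listing(P2, labels, y)
```

Because the same matrix reorders both vectors, knowing the matrix is enough to recover which subject sits in each position of a listing, which is the property the paragraph relies on.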
The vector of random variables is defined for listing with points for and . The matrix is an permutation matrix with elements in row of column that take a value of one when the subject in listing is in position in permutation , and zero otherwise. Rows and columns of total to one. By knowing (and hence ), we know which subject from listing corresponds to the response in position of . The labels for subjects in are given by . We associate an indicator random variable, , with such that where for all , and denotes expectation over . With these definitions, . Associated with each point is a vector of subject labels, , and a probability . The additional indices and determine , and along with the definition of , determine the order of the subjects in for population . We also define , where .
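A small enumeration may help make the points of this distribution concrete. The sketch below is only illustrative: it uses hypothetical labels and values, fixes one listing, and assumes each permutation of that listing is equally likely. Each point stores the response vector together with the labels that produced it, so subject identity is never lost.

```python
from itertools import permutations

import numpy as np

# Hypothetical initial listing of labels and latent values.
labels = ["Ann", "Bob", "Cal"]
y = np.array([2.0, 0.0, 4.0])


def perm_matrix(order):
    """0/1 matrix with a one in row i, column order[i] (assumed convention)."""
    n = len(order)
    matrix = np.zeros((n, n), dtype=int)
    matrix[np.arange(n), list(order)] = 1
    return matrix


# One fixed listing of the population: Bob, Cal, Ann.
listing = perm_matrix((1, 2, 0))

# Enumerate every permutation of that listing; each one defines a point.
points = {}
for order in permutations(range(len(labels))):
    combined = perm_matrix(order) @ listing  # product is again a permutation matrix
    source = combined.argmax(axis=1)         # initial-listing position behind each row
    points[order] = ([labels[i] for i in source], combined @ y)

# Assumption for this sketch: all permutations are equally likely.
prob = 1.0 / len(points)
```

For example, the permutation `(2, 0, 1)` undoes the listing above, so its point is the initial listing itself. In every point the pairing between a label and its latent value is unchanged; only the positions differ.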
2.2. The Prior Distribution with Exchangeable Subjects
The vector is defined for each , , which we now identify using the subscript, such that where . For each , , we assume the vector of random variables is exchangeable, implying that the joint probability density is identical for all , and represent a vector of random variables with this joint probability density by . Defining as an indicator random variable for population such that , including the distribution of the populations, we define . The product is a permutation matrix (which we index by ), and it is possible to identify different pairs, that will result in the same permutation matrix. When the product is identical for different pairs of and , i.e. , the same point, will be realized via the random variables. As long as the probability associated with the point, is the same in , for each , is exchangeable. Note that these probabilities may be different for different . As a special case, when for all and , then for population , the prior distribution of is exchangeable. In this setting, and where is a vector of ones, , is an identity matrix, , and . Using to denote expectation with respect to , where and where and . Since possible values of are defined by realizations of , and .
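The first two moments in the equal-probability special case can be checked by brute force. The sketch below assumes the omitted formulas take the usual finite-population form: every position has expected value equal to the population mean, and the covariance matrix is the population variance (divisor N - 1) times the identity minus the averaging matrix. The values in `y` are hypothetical.

```python
from itertools import permutations

import numpy as np

# Hypothetical latent values for one small population.
y = np.array([2.0, 0.0, 4.0, 6.0])
N = len(y)

# Every permutation of the values is one equally likely point.
points = np.array(list(permutations(y)))  # shape (N!, N)

# Moments computed directly over the enumerated points.
mean_vec = points.mean(axis=0)
centered = points - mean_vec
cov = centered.T @ centered / len(points)

# Assumed closed forms: mean mu * 1 and covariance sigma2 * (I - J/N),
# with sigma2 the population variance using divisor N - 1.
mu = y.mean()
sigma2 = y.var(ddof=1)
expected_cov = sigma2 * (np.eye(N) - np.ones((N, N)) / N)
```

With these values each position has variance 5 and each pair of positions has covariance -5/3, and the enumerated moments agree with the closed forms. The negative covariance reflects the finite population: a large value in one position leaves smaller values for the others.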
2.3. The Data
We define the data next. Suppose response is observed on a set of labeled subjects from the population given by where is the subject’s label, and is the subject’s response. We list the subject labels in the data, in a vector , where the subject’s label in position is for . We define the corresponding response vector by , where is response for the subject with label . The subjects may be placed in different orders defined by , , where is an permutation matrix with elements in row of column that take a value of one when is in position in , and zero otherwise. Each row and column of total to one. We define , an identity matrix, so that . The vector where for is response for the subject whose label is in row of . Using this notation, we define the data equivalently as , indexing the vector of subject labels and response by the different possible orders of subjects. For the data, the average response is , and .
