Chapter 9
The MI Procedure
Chapter Table of Contents
OVERVIEW ..................................................... 131
GETTING STARTED .............................................. 133
SYNTAX ....................................................... 137
PROC MI Statement ............................................ 138
BY Statement ................................................. 141
EM Statement ................................................. 141
FREQ Statement ............................................... 142
MCMC Statement ............................................... 143
MONOTONE Statement ........................................... 149
TRANSFORM Statement .......................................... 150
VAR Statement ................................................ 151
DETAILS ...................................................... 152
Descriptive Statistics ....................................... 152
EM Algorithm for Data with Missing Values .................... 153
Statistical Assumptions for Multiple Imputation .............. 154
Missing Data Patterns ........................................ 155
Imputation Mechanisms ........................................ 156
Regression Method for Monotone Missing Data .................. 157
Propensity Score Method for Monotone Missing Data ............ 158
MCMC Method for Arbitrary Missing Data ....................... 159
Producing Monotone Missingness with the MCMC Method .......... 164
MCMC Method Specifications ................................... 166
Convergence in MCMC .......................................... 167
Input Data Sets .............................................. 170
Output Data Sets ............................................. 171
Combining Inferences from Multiply Imputed Data Sets ......... 173
Multiple Imputation Efficiency ............................... 174
Imputer’s Model Versus Analyst’s Model ....................... 174
Parameter Simulation Versus Multiple Imputation .............. 175
ODS Table Names .............................................. 176
EXAMPLES ..................................................... 177
Example 9.1 EM Algorithm for MLE ............................. 177
Example 9.2 Propensity Score Method .......................... 181
Example 9.3 Regression Method ................................ 184
Example 9.4 MCMC Method ...................................... 185
Example 9.5 Producing Monotone Missingness with MCMC ......... 188
Example 9.6 Checking Convergence in MCMC ..................... 191
Example 9.7 Transformation to Normality ...................... 194
Example 9.8 Saving and Using Parameters for MCMC ............. 198
REFERENCES ................................................... 199
SAS OnlineDoc: Version 8
Overview
The experimental MI procedure performs multiple imputation of missing data.
Missing values are an issue in a substantial number of statistical analyses.
Most SAS statistical procedures exclude observations with any missing variable
values from the analysis. These observations are called incomplete cases. While
analyzing only complete cases is simple, the information contained in the
incomplete cases is lost. This approach also ignores possible systematic
differences between the complete cases and the incomplete cases, and the
resulting inference may not be applicable to the population of all cases,
especially when the number of complete cases is small.
Some SAS procedures use all the available cases in an analysis, that is, cases with
available information. For example, the CORR procedure estimates a variable mean
by using all cases with nonmissing values for this variable, ignoring the possible
missing values in other variables. PROC CORR also estimates a correlation by using
all cases with nonmissing values for this pair of variables. This makes better use of
the available data, but the resulting correlation matrix may not be positive definite.
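As a rough illustration of the difference between available-case and
complete-case analysis, the following sketch runs PROC CORR twice. It assumes
the FitMiss data set that is created in the "Getting Started" section; using
that data set here is a choice made for illustration only. By default PROC CORR
computes each correlation from all observations with nonmissing values for that
pair of variables, and the NOMISS option restricts the analysis to observations
with no missing values in any VAR variable.

```sas
/* Available-case analysis: each mean uses all nonmissing values of   */
/* that variable, and each correlation uses all observations that are */
/* nonmissing for that pair of variables.                             */
proc corr data=FitMiss;
   var Oxygen RunTime RunPulse;
run;

/* Complete-case analysis: NOMISS drops every observation that has a  */
/* missing value for any of the VAR variables.                        */
proc corr data=FitMiss nomiss;
   var Oxygen RunTime RunPulse;
run;
```

Comparing the number of observations reported for each pair in the two runs
shows how much information the complete-case analysis discards.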
Another strategy for handling missing data is simple imputation, which
substitutes a value for each missing value. Standard statistical procedures for
complete data analysis can then be used with the filled-in data set. For
example, each missing value can be imputed with the variable mean of the
complete cases, or it can be imputed with the mean conditional on observed
values of other variables. This approach treats missing values as if they were
known in the complete-data analysis. However, single imputation does not reflect
the uncertainty about the predictions of the unknown missing values, and the
resulting estimated variances of the parameter estimates will be biased toward
zero (Rubin 1987, p. 13).
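A minimal sketch of mean imputation follows, assuming the FitMiss data set from
the "Getting Started" section and a hypothetical output data set name, FitMean.
It uses PROC STDIZE with the REPONLY option, which replaces missing values
without standardizing the data. Note that METHOD=MEAN here uses the mean of all
nonmissing values of each variable, which is not necessarily the mean over the
complete cases only.

```sas
/* Single (mean) imputation: REPONLY replaces each missing value with */
/* the location measure chosen by METHOD= and leaves observed values  */
/* unchanged. FitMean is a hypothetical data set name.                */
proc stdize data=FitMiss out=FitMean reponly method=mean;
   var Oxygen RunTime RunPulse;
run;

/* The filled-in data set has no missing values, so standard errors   */
/* computed from it treat the imputed means as if they were observed. */
proc means data=FitMean n nmiss mean std;
   var Oxygen RunTime RunPulse;
run;
```

The standard deviations reported for FitMean are typically smaller than those
computed from the observed values alone, which illustrates the downward bias
described above.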
Instead of filling in a single value for each missing value, multiple imputation (Rubin
1976; 1987) replaces each missing value with a set of plausible values that represent
the uncertainty about the right value to impute. The multiply imputed data sets are
then analyzed by using standard procedures for complete data and combining the
results from these analyses. No matter which complete-data analysis is used, the
process of combining results from different data sets is essentially the same.
Multiple imputation does not attempt to estimate each missing value through
simulated values but rather to represent a random sample of the missing values.
This process results in valid statistical inferences that properly reflect the
uncertainty due to missing values; for example, confidence intervals with the
correct probability coverage.
Multiple imputation inference involves three distinct phases:
1. The missing data are filled in m times to generate m complete data sets.
2. The m complete data sets are analyzed using standard statistical analyses.
3. The results from the m complete data sets are combined to produce inferential
results.
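The combining step in phase 3 is usually carried out with the rules of Rubin
(1987); the chapter treats them in the "Combining Inferences from Multiply
Imputed Data Sets" section on page 173. As a brief sketch, with notation that is
assumed here rather than taken from this section, let $\hat{Q}_i$ and
$\hat{U}_i$ be the point estimate and its variance estimate from the $i$th
imputed data set:

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}\hat{U}_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
```

Here $\bar{Q}$ is the combined point estimate, $\bar{U}$ is the
within-imputation variance, $B$ is the between-imputation variance, and $T$ is
the total variance used for inference about $\bar{Q}$.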
The new MI procedure creates multiply imputed data sets for incomplete
multivariate data. It uses methods that incorporate appropriate variability
across the m imputations. The method of choice depends on the patterns of
missingness. A data set with variables $Y_1$, $Y_2$, ..., $Y_p$ (in that order)
is said to have a monotone missing pattern when the event that a variable $Y_j$
is missing for a particular individual implies that all subsequent variables
$Y_k$, $k > j$, are missing for that individual.
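One way to inspect the pattern of missingness before choosing a method is to
run PROC MI without creating any imputations. The sketch below assumes the
FitMiss data set from the "Getting Started" section and relies on the NIMPUTE=0
option to skip the imputation step; whether the experimental release accepts
NIMPUTE=0 in exactly this form is an assumption worth checking against the
"PROC MI Statement" section.

```sas
/* Display the missing data patterns only; no imputed data set is     */
/* created because NIMPUTE=0 requests zero imputations.               */
proc mi data=FitMiss nimpute=0;
   /* The order of the VAR variables matters when judging whether a   */
   /* pattern is monotone.                                            */
   var Oxygen RunTime RunPulse;
run;
```

If the displayed patterns can be arranged so that, once a variable is missing,
every later variable in the VAR list is also missing, the pattern is monotone
for that variable order.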
For data sets with monotone missing patterns, either a parametric regression
method (Rubin 1987) that assumes multivariate normality or a nonparametric
method that uses propensity scores (Rubin 1987; Lavori, Dawson, and Shera 1995)
is appropriate. For data sets with arbitrary missing patterns, a Markov Chain
Monte Carlo (MCMC) method (Schafer 1997) that assumes multivariate normality is
used to impute all missing values or just enough missing values to make the
imputed data sets have monotone missing patterns.
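The partial-imputation case can be sketched as follows, assuming the FitMiss
data set from the "Getting Started" section and a hypothetical output data set
name, outmono. The IMPUTE=MONOTONE option of the MCMC statement asks the
procedure to impute only enough values to produce a monotone pattern; the
"MCMC Statement" section on page 143 is the authority on the options available
in this release.

```sas
/* Impute just enough missing values with MCMC that each imputed data */
/* set has a monotone missing pattern. outmono is a hypothetical name.*/
proc mi data=FitMiss seed=37851 out=outmono;
   mcmc impute=monotone;
   var Oxygen RunTime RunPulse;
run;
```

The remaining missing values in outmono can then be handled by one of the
methods for monotone missing data.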
Once the m complete data sets are analyzed using standard SAS procedures, the
new MIANALYZE procedure can be used to generate valid statistical inferences
about the parameters of interest by combining results from the m analyses. Both
the MI and MIANALYZE procedures are available in experimental form in
Release 8.2 of the SAS System.
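The three phases can be sketched end to end as follows. The data set FitMiss
comes from the "Getting Started" section; the regression of Oxygen on RunTime
and RunPulse and the data set names outmi and outreg are assumptions chosen for
illustration. The MODELEFFECTS statement shown for PROC MIANALYZE is the syntax
of later SAS releases; the experimental release described here may expect a VAR
statement instead, so treat that line as an assumption.

```sas
/* Phase 1: create m = 5 imputed data sets, stacked in outmi and      */
/* indexed by the automatic variable _Imputation_.                    */
proc mi data=FitMiss seed=37851 nimpute=5 out=outmi;
   var Oxygen RunTime RunPulse;
run;

/* Phase 2: fit the same complete-data model to each imputed data     */
/* set. OUTEST= with COVOUT saves the estimates and their covariance  */
/* matrices for the combining step.                                   */
proc reg data=outmi outest=outreg covout noprint;
   model Oxygen = RunTime RunPulse;
   by _Imputation_;
run;

/* Phase 3: combine the m sets of estimates into one inference.       */
/* Assumption: MODELEFFECTS is the later-release statement; the       */
/* experimental release may use "var Intercept RunTime RunPulse;".    */
proc mianalyze data=outreg;
   modeleffects Intercept RunTime RunPulse;
run;
```

Only phase 2 changes when a different complete-data analysis is wanted; phases
1 and 3 keep the same shape.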
Often, as few as three to five imputations are adequate in multiple imputation
(Rubin 1996, p. 480). The relative efficiency of the small m imputation
estimator is high for cases with little missing information (Rubin 1987,
p. 114). Also see the “Multiple Imputation Efficiency” section on page 174.
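The relative efficiency referred to here is commonly written as follows (Rubin
1987, p. 114), where the symbol $\lambda$ for the fraction of missing
information is notation assumed for this sketch:

```latex
RE = \left(1 + \frac{\lambda}{m}\right)^{-1}
```

For example, with $\lambda = 0.3$ the efficiency relative to an infinite number
of imputations is $1/(1 + 0.3/5) \approx 0.94$ for $m = 5$ and
$1/(1 + 0.3/3) \approx 0.91$ for $m = 3$, which suggests why a small number of
imputations is often adequate when little information is missing.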
Multiple imputation inference assumes that the model (variables) you use to
analyze the multiply imputed data (the analyst’s model) is the same as the
model used to impute missing values in multiple imputation (the imputer’s
model). In practice, the two models may not be the same. The consequences for
different scenarios (Schafer 1997, pp. 139–143) are discussed in the “Imputer’s
Model Versus Analyst’s Model” section on page 174.
In addition to multiple imputation, the simulation-based method of parameter
simulation can also be used to analyze the data for many incomplete-data
problems. Although the MI procedure does not offer parameter simulation, the
choice between the two methods (Schafer 1997, pp. 89–90, 135–136) is examined
in the “Parameter Simulation Versus Multiple Imputation” section on page 175.
Getting Started
Consider the following Fitness data set that has been altered to contain an arbitrary
pattern of missingness:
*----------------- Data on Physical Fitness -----------------*
| These measurements were made on men involved in a physical |
| fitness course at N.C. State University. |
| Only selected variables of |
| Oxygen (oxygen intake, ml per kg body weight per minute), |
| Runtime (time to run 1.5 miles in minutes), and |
| RunPulse (heart rate while running) are used. |
| Certain values were changed to missing for the analysis. |
*------------------------------------------------------------*;
data FitMiss;
input Oxygen RunTime RunPulse @@;
datalines;
44.609 11.37 178 45.313 10.07 185
54.297 8.65 156 59.571 . .
49.874 9.22 . 44.811 11.63 176
. 11.95 176 . 10.85 .
39.442 13.08 174 60.055 8.63 170
50.541 . . 37.388 14.03 186
44.754 11.12 176 47.273 . .
51.855 10.33 166 49.156 8.95 180
40.836 10.95 168 46.672 10.00 .
46.774 10.25 . 50.388 10.08 168
39.407 12.63 174 46.080 11.17 156
45.441 9.63 164 . 8.92 .
45.118 11.08 . 39.203 12.88 168
45.790 10.47 186 50.545 9.93 148
48.673 9.40 186 47.920 11.50 170
47.467 10.50 170
;
Suppose that the data are multivariate normally distributed and the missing
data are missing at random (MAR). That is, the probability that an observation
is missing can depend on the observed variable values of the individual, but
not on the missing variable values of the individual. See the “Statistical
Assumptions for Multiple Imputation” section on page 154 for a detailed
description of the MAR assumption.
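The MAR assumption is often stated formally as below. The notation is assumed
for this sketch and is not taken from this section: $R$ is the matrix of
response indicators, $Y_{obs}$ and $Y_{mis}$ are the observed and missing parts
of the data, and $\phi$ denotes the parameters of the missingness mechanism.

```latex
\Pr(R \mid Y_{obs}, Y_{mis}, \phi) = \Pr(R \mid Y_{obs}, \phi)
```

In words, once the observed values are taken into account, the missing values
themselves carry no further information about which entries are missing.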
The following statements invoke the MI procedure and impute missing values for the
FitMiss data set.
proc mi data=FitMiss seed=37851 mu0=50 10 180 out=outmi;
var Oxygen RunTime RunPulse;
run;
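As a quick check on the result, the imputed data set can be summarized by
imputation. This sketch assumes that the preceding PROC MI step has run and
created outmi, which stacks the imputed copies of FitMiss and identifies each
copy with the variable _Imputation_. The number of copies is controlled by the
NIMPUTE= option, whose default value depends on the release.

```sas
/* Summarize each imputed copy separately; NMISS should be zero for   */
/* every variable because all missing values have been filled in.     */
proc means data=outmi n nmiss mean std;
   by _Imputation_;
   var Oxygen RunTime RunPulse;
run;

/* List the first imputed copy to see the filled-in values.           */
proc print data=outmi;
   where _Imputation_ = 1;
run;
```

Differences in the means and standard deviations across imputations reflect the
uncertainty about the missing values.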