Diagnostics and influence analysis in the multivariate Fay-Herriot model
SALVATORE, Renato
2013-01-01
Abstract
Survey researchers usually consider a few basic questions when applying model-based estimation: the selection of the model covariates, the smoothing of the sample variances of the survey direct estimators, and, after fitting the model, the estimation of the mean squared error (MSE) of the empirical best linear unbiased predictor (EBLUP). Other methodologically relevant aspects arise when we assess the proposed model in depth, namely the impact that individual data points have on the model estimation itself. For instance, potential outliers and data with high leverage values can dramatically influence the model parameter estimates and, consequently, affect the overall performance of the survey predictors [2]. Diagnostic tools for small area models should therefore evaluate the data structure with respect to the estimation of the fixed and random effects, the covariance parameter estimates, and the estimation of the EBLUP and its MSE [1]. The influence on the covariance parameter estimates is also a central question: for example, these estimates determine whether the weight (the shrinkage factor) on the regression-synthetic component of the traditional Fay-Herriot (FH) model is correctly specified. Multivariate FH (MFH) models address the estimation of a vector of small area population parameters. Because the components of the vector of survey direct estimators are correlated, the multivariate EBLUP performs better than its univariate counterpart [3]. The topic of our research is the proposal of specific measures of statistical influence on small area estimates from a multivariate FH model, together with the assessment of diagnostic tools for this model.
We adapt the influence analysis of linear mixed models to the needs of multivariate FH model diagnostics: leverages on the predictors, leverages on the covariance parameters and on the MSE components, the influence on the vector of MSEs exerted by the estimated covariance matrix of the survey direct estimates and by the data, and deletion diagnostics on the predictors and on the covariance parameters. Further, in this paper we provide a comprehensive treatment of the algebra of influence analysis in the multivariate FH model, together with a dedicated set of influence plots. We discuss an application to the most recent official data supplied by the Italian National Statistics Institute. Using the regional Farm Structure Surveys, we analyze the estimation of a vector of three variables at NUTS 3 level: the number of farmers patronizing a cooperative, the number of farmers with marketing contracts, and the number of farmers with a higher education degree. Demographic variables (average age and education) from population surveys and farm-sector variables (land use and agricultural production) from short-term (conjunctural) statistics are used as auxiliary variables in the model.
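To make the roles of the shrinkage factor and of deletion diagnostics concrete, the following is a minimal sketch for the univariate FH model only, not the multivariate measures developed in the paper. It assumes a single covariate, known sampling variances `psi`, and a variance component `sigma_u2` treated as known rather than estimated; the function names `fh_eblup` and `deletion_influence` are hypothetical.

```python
def fh_eblup(y, x, psi, sigma_u2):
    """Univariate Fay-Herriot predictor with an intercept and one covariate.

    Assumption: sigma_u2 (the area-effect variance) is taken as known.
    """
    # GLS weights 1 / (sigma_u^2 + psi_i)
    w = [1.0 / (sigma_u2 + p) for p in psi]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Shrinkage factor: weight given to the direct estimate in each area
    gamma = [sigma_u2 / (sigma_u2 + p) for p in psi]
    # Regression-synthetic component
    synth = [b0 + b1 * xi for xi in x]
    pred = [g * yi + (1.0 - g) * s for g, yi, s in zip(gamma, y, synth)]
    return (b0, b1), gamma, pred


def deletion_influence(y, x, psi, sigma_u2):
    """Change in the slope estimate when each area is deleted in turn."""
    (_, b1_full), _, _ = fh_eblup(y, x, psi, sigma_u2)
    changes = []
    for i in range(len(y)):
        keep = [j for j in range(len(y)) if j != i]
        (_, b1_del), _, _ = fh_eblup(
            [y[j] for j in keep],
            [x[j] for j in keep],
            [psi[j] for j in keep],
            sigma_u2,
        )
        changes.append(b1_full - b1_del)
    return changes
```

An area whose deletion changes the slope far more than the others is a candidate influential observation. In a full analysis `sigma_u2` would be re-estimated after each deletion, which this sketch deliberately omits.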