Invariance analyses in large-scale studies
Large-scale surveys such as the Programme for International Student Assessment (PISA), the Teaching and Learning International Survey (TALIS), and the Programme for the International Assessment of Adult Competencies (PIAAC) use advanced statistical models to estimate scores on latent traits from multiple observed responses. Comparing such estimated scores across groups of respondents is valid only to the extent that the same set of estimated parameters holds in each group surveyed. This invariance of parameter estimates is assessed with model fit indices, which gauge how plausible it is that one set of parameters can be used across all groups. The problem of scale invariance across groups of respondents can therefore typically be framed as the question of how well a single model fits the responses of all groups. However, the procedures used to evaluate the fit of these models pose a series of theoretical and practical problems. The procedures most commonly applied to establish invariance of cognitive and non-cognitive scales across countries in large-scale surveys were developed within the frameworks of confirmatory factor analysis and item response theory. The criteria commonly applied to evaluate the fit of such models, such as the decrement of the Comparative Fit Index in confirmatory factor analysis, normally work well when comparing a small number of countries or groups, but can perform poorly in large-scale surveys featuring a large number of countries. More specifically, the common criteria often result in the non-rejection of metric invariance (i.e. identical factor loadings across countries); however, the step from metric invariance to scalar invariance (i.e. identical intercepts, in addition to identical factor loadings) appears to set overly restrictive standards.
This report sets out to identify and apply novel procedures to evaluate model fit across a large number of groups, or novel scaling models that are more likely to pass common model fit criteria.
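
As an illustration of the Comparative Fit Index decrement criterion mentioned above, the following minimal sketch computes the CFI from chi-square statistics using its standard definition and checks each invariance step against a cutoff. The function names, the cutoff of 0.01 (a widely cited rule of thumb, not a value stated in this report), and all chi-square values are assumptions chosen purely for illustration; they are not results from any survey.

```python
def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index from the chi-square statistics of a fitted
    model and of the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)  # misfit of the fitted model
    d_base = max(chi2_base - df_base, 0.0)     # misfit of the baseline model
    denom = max(d_model, d_base)
    if denom == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / denom


def step_rejected(cfi_less_constrained, cfi_more_constrained, cutoff=0.01):
    """True if adding equality constraints lowers the CFI by more than
    the cutoff (assumed rule of thumb: 0.01)."""
    return (cfi_less_constrained - cfi_more_constrained) > cutoff


# Hypothetical fit statistics for three nested multi-group models.
chi2_base, df_base = 5000.0, 100
cfi_configural = cfi(250.0, 50, chi2_base, df_base)  # no equality constraints
cfi_metric = cfi(300.0, 70, chi2_base, df_base)      # equal factor loadings
cfi_scalar = cfi(600.0, 90, chi2_base, df_base)      # equal loadings and intercepts

metric_rejected = step_rejected(cfi_configural, cfi_metric)
scalar_rejected = step_rejected(cfi_metric, cfi_scalar)
print(round(cfi_configural, 3), round(cfi_metric, 3), round(cfi_scalar, 3))
print("metric rejected:", metric_rejected, "| scalar rejected:", scalar_rejected)
```

With these made-up numbers the step to metric invariance passes while the step to scalar invariance fails, which mirrors the pattern described above; real analyses would take the chi-square values from fitted multi-group models.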