BRMS – WAMBS

 

By Laurent Smeets and Rens van de Schoot

Last modified: July 9, 2018

This document shows how you can replicate the popularity data multilevel models in a Bayesian framework from the book Multilevel analysis: Techniques and applications, Chapter 2. In this exercise the software package brms for R was used. You will be following the steps of the When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics – checklist (the WAMBS-checklist) to analyze the cross level interaction model we did in the BRMS Tutorial. This is part 3 of a 3 part series on how to do multilevel models.

Packages and Data

The main package that is used for this analysis is brms (https://cran.r-project.org/web/packages/brms/brms.pdf) . In order to make this package function, it needs to call on STAN and a C++ compiler For more information and a tutorial on how to install these please have a look at: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started and https://cran.r-project.org/bin/windows/Rtools/.

“Because brms is based on Stan, a C++ compiler is required. The program Rtools (available on https://cran.r-project.org/bin/windows/Rtools/) comes with a C++ compiler for Windows. On Mac, you should use Xcode. For further instructions on how to get the compilers running, see the prerequisites section at the RStan-Getting-Started page.” ~ quoted from the BRMS package document:

After you installed the aforementioned software you need to load some other R packages. If you have not yet installed all below mentioned packages, you can install them by the command install.packages(“NAMEOFPACKAGE”)

library(brms) # for the analysis
library(haven) # to load the SPSS .sav file
library(tidyverse) # needed for data manipulation.
library(RColorBrewer) # needed for some extra colours in one of the graphs
library(ggmcmc)
library(mcmcplots) 

To download the popularity data go to https://multilevel-analysis.sites.uu.nl/datasets/ and follow the links to https://github.com/MultiLevelAnalysis/Datasets-third-edition-Multilevel-book/blob/master/chapter%202/popularity/SPSS/popular2.sav. We will use the .sav file which can be found in the SPSS folder. After downloading the data to your working directory you can open it with the read_sav() command.

Alternatively, you can directly download them from GitHub into your R work space using the following command:

popular2data<- read_sav(file ="https://github.com/MultiLevelAnalysis/Datasets-third-edition-Multilevel-book/blob/master/chapter%202/popularity/SPSS/popular2.sav?raw=true")

There are some variables in the dataset that we do not use, so we can select the variables we will use and have a look at the first few observations.

popular2data<-select(popular2data, pupil, class, extrav, sex, texp, popular) # we select just the variables we will use
head(popular2data) # we have a look at the first 6 observations
## # A tibble: 6 x 6
##   pupil class extrav sex        texp popular
##   <dbl> <dbl>  <dbl> <dbl+lbl> <dbl>   <dbl>
## 1    1.    1.     5. 1           24.    6.30
## 2    2.    1.     7. 0           24.    4.90
## 3    3.    1.     4. 1           24.    5.30
## 4    4.    1.     3. 1           24.    4.70
## 5    5.    1.     5. 1           24.    6.00
## 6    6.    1.     4. 0           24.    4.70

The model

For this tutorial we make use of the multilevel crosslevel model (Model M2 from Table 2.3 in the book) we developed in the BRMS Tutorial. We have a main effect of sex, a random effect of Extravesion and a cross-level interaction between Extraversion and Teacher experience. This means we have to add texp as a predictor for the coefficient of extrav The cross level interaction term between extraversion and teacher experience can be created by the ‘:’ sign or by multiplying the terms.

If we put all of this in formula form we get: \(Popularity_{ij}=\beta_{0j}+\beta_1*gender_{ij}+ \beta_{2j}*extraversion_{ij}+e_{ij}\).

In which \(\beta_{0j}=\gamma_{00}+\gamma_{01}*experience_j+u_{0j}\) and \(\beta_{2j}= \gamma_{20}+\gamma_{21}*experience_j+u_{2j}\)

Combined we get:

\[Popularity_{ij}= \gamma_{00}+\gamma_{10}*sex_{ij}+\gamma_{20}*extraversion_{ij}+\gamma_{01}*experience_j+\gamma_{21}*extraversion_{ij}*experience_j+u_{2j}*extraversion_{ij}+u_{0j}+e_{ij}\]

 1.Do you understand the priors?

With the get_prior() command we can see which priors we can specify for this model.

get_prior(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data)
##                  prior     class        coef group resp dpar nlpar bound
## 1                              b                                        
## 2                              b      extrav                            
## 3                              b extrav:texp                            
## 4                              b         sex                            
## 5                              b        texp                            
## 6               lkj(1)       cor                                        
## 7                            cor             class                      
## 8  student_t(3, 5, 10) Intercept                                        
## 9  student_t(3, 0, 10)        sd                                        
## 10                            sd             class                      
## 11                            sd      extrav class                      
## 12                            sd   Intercept class                      
## 13 student_t(3, 0, 10)     sigma

In this tutorial we will only specify priors for:

  1. The regression coefficient of extrav \(\gamma_{20}\)
  2. The regression coefficient of sex \(\gamma_{10}\)
  3. The regression coefficient of texp \(\gamma_{01}\)
  4. The regression coefficient of extrav:texp \(\gamma_{21}\)
  5. The intercept \(\gamma_{00}\)

Since we are using a interaction effect and we are working with uncentered independent variables, the intercept will no longer be just a mean, but the value when all other values are zero even if this are impossible values. We can therefore not be very sure about the intercept and therefore we will give it a cauchy distribution with a shape parameter of 10. Based on earlier literature we might be fairly sure that that girls have a higher popularity than boys and will thus give its regression coefficient a normal distribution with a mean of 2 and a sigma (standard deviation) of .2. For teacher experience and extraverion we might be a bit less sure, because we are using interaction effects for these which means that the individual effects of these independent variables are now dependent on each other. Because of that, we decide to pick a wide normal distribution with a mean of 0 and a sigma (standard deviation) of 5. We are fairly sure about a negative interaction effect so we pick a normal distribution with a mean of -1 and a sigma (standard deviation) of .3 for the regression coefficient of extrav:texp.

PRIORS <- c(set_prior("normal(0,5)", class = "b", coef= "extrav"),
            set_prior("normal(-1,.3)", class = "b", coef= "extrav:texp"),
            set_prior("normal(2,.2)", class = "b", coef= "sex"),
            set_prior("normal(0,5)", class = "b", coef= "texp"),
            set_prior("cauchy(0,10)", class = "Intercept"))
model<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data, warmup = 1000, iter =3000, chains = 3, inits= "random", control = list(adapt_delta = 0.96), prior=PRIORS, save_all_pars = TRUE,  sample_prior = TRUE, cores= 3) # the cores function tells STAN to make use of 3 CPU cores simultaneously instead of just 1.

2. Does the trace-plot exhibit convergence?

Before interpreting results, one should inspect the convergence of the chains that form the posterior distribution of the model parameters. A straightforward and common way to visualize convergence is the trace plot that illustrates the iterations of the chains from start to end.

modeltranformed<-ggs(model) # the ggs function transforms the BRMS output into a longformat tibble, that we can use to make different types of plots.
## Warning: NAs introduced by coercion
ggplot(filter(modeltranformed, Parameter==c("b_Intercept", "b_extrav", "b_sex", "b_extrav:texp", "b_texp"), Iteration>1000),aes(x=Iteration,y=value, col=as.factor(Chain)))+
  geom_line()+
  facet_grid(Parameter ~ .,scale='free_y',switch = 'y')+
  labs(title="Caterpillar Plots", col= "Chains")

The first impression obtained from the trace plots shows that the chains are probably converged. In addition, we can look at different convergence statistics. To obtain convergence diagnostics to check convergence for the parameters of interest we use the Gelman and Rubin diagnostic (Gelman and Rubin 1992) and the Geweke (1992) diagnostic. To obtain the G-R diagnostic use:

modelposterior<-as.mcmc(model) # with the as.mcmc() command we can use all the CODA package convergence statistics and plotting options
gelman.diag(modelposterior[, 1:5])
## Potential scale reduction factors:
## 
##               Point est. Upper C.I.
## b_Intercept            1       1.00
## b_sex                  1       1.00
## b_extrav               1       1.01
## b_texp                 1       1.00
## b_extrav:texp          1       1.01
## 
## Multivariate psrf
## 
## 1.01
gelman.plot(modelposterior[, 1:5])

Look at the Upper CI/Upper limit this should be close to 1. In our case this all looks good. To obtain the Geweke diagnostic use:

This syntax will provide a z-score, where a z-score higher than the absolute value of 1.96 is associated with a p-value of < .05 (two-tailed). These values should therefore be lower than the absolute value of 1.96.

If Geweke indicates that the first and last part of a sample from a Markov chain are not drawn from the same distribution, it may be useful to discard the first few iterations to see if the rest of the chain has “converged”. This plot shows what happens to Geweke’s z-score when successively larger numbers of iterations are discarded from the beginning of the chain.

geweke.diag(modelposterior[, 1:5])
## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   b_Intercept         b_sex      b_extrav        b_texp b_extrav:texp 
##        0.7529        0.2753        0.4584       -0.7247       -0.5830 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   b_Intercept         b_sex      b_extrav        b_texp b_extrav:texp 
##        0.4636        0.2200        0.9813       -0.1301       -0.5033 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   b_Intercept         b_sex      b_extrav        b_texp b_extrav:texp 
##       -1.2313       -2.6936        0.7392        0.7343       -0.1045
geweke.plot(modelposterior[, 1:5])

3. Does convergence remain after doubling the number of iterations?

As the WAMBS check list recommends, we can now check if doubling the amount of iterations does change the convergence.

modeldoubleniter<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data, warmup = 2000, iter =6000, chains = 3, inits= "random", control = list(adapt_delta = 0.96), prior=PRIORS, save_all_pars = TRUE,  sample_prior = TRUE,  cores= 3) 
modeldoubleniterposterior<-as.mcmc(modeldoubleniter)
modelposterior<-as.mcmc(model) # with the as.mcmc() command we can use all the CODA package convergence statistics and plotting options
gelman.diag(modeldoubleniterposterior[, 1:5])
## Potential scale reduction factors:
## 
##               Point est. Upper C.I.
## b_Intercept            1       1.01
## b_sex                  1       1.00
## b_extrav               1       1.01
## b_texp                 1       1.01
## b_extrav:texp          1       1.01
## 
## Multivariate psrf
## 
## 1
gelman.plot(modeldoubleniterposterior[, 1:5])

geweke.diag(modeldoubleniterposterior[, 1:5])
## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   b_Intercept         b_sex      b_extrav        b_texp b_extrav:texp 
##       -0.9255        0.3919        1.1069        1.0672       -1.2321 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   b_Intercept         b_sex      b_extrav        b_texp b_extrav:texp 
##       -0.1691        2.3825       -0.1511       -0.1189        0.2210 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   b_Intercept         b_sex      b_extrav        b_texp b_extrav:texp 
##     -0.030477     -0.560422     -0.272318      0.008822      0.324217
geweke.plot(modeldoubleniterposterior[, 1:5])

We now see that doubling the iterations does not really change the convergence.

We can also look at the relative bias to see if more iterations make a large difference (\(bias= 100*\frac{(initial converged model- model with double iteration)}{model with double iteration}\))

round(100*((summary(model)$fixed-summary(modeldoubleniter)$fixed)/summary(modeldoubleniter)$fixed), 5)
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample    Rhat
## Intercept   -1.08602  -1.16350 -1.73718 -0.55215  -16.86953 0.02286
## sex         -0.13129  -0.68600 -0.09451 -0.16548  -50.00000 0.11799
## extrav      -0.31027   1.13033 -0.05729 -0.12346  -55.16341 0.31509
## texp        -0.56637   0.35903 -0.00353 -0.24850  -24.26731 0.26031
## extrav:texp -0.90261   3.89365 -0.20209 -0.64665  -90.39636 0.80193

The relative bias is small enough not too worry about it.

4. Does the posterior distribution histogram have enough information?

BY having a look at the postrior distribution density (or if you like histogram) we can check if it has enough information. For regression coefficients it ideally it is clearly centerered with smootg sloping tails.

plot(hypothesis(model, "extrav = 0"), plot = F, theme = theme_get())[[1]]+ scale_x_continuous(limits=c(-.2, 1.5))

plot(hypothesis(model, "sex = 0"), plot = F, theme = theme_get())[[1]]+ scale_x_continuous(limits=c(-.2, 3))

plot(hypothesis(model, "texp = 0"), plot = F, theme = theme_get())[[1]]+ scale_x_continuous(limits=c(-.2, .5))

plot(hypothesis(model, "extrav:texp = 0"), plot = F, theme = theme_get())[[1]]+ scale_x_continuous(limits=c(-2, 0))

plot(hypothesis(model, "Intercept = 0"), plot = F, theme = theme_get())[[1]]+ scale_x_continuous(limits=c(-5, 5))

The different posterior distributions have enough information and more iterations are not necessary. They are all single peaked with smooth slopes. Posterior distributions doe not have to be symmetrical, but in this example they seem to be.

5. Do the chains exhibit a strong degree of autocorrelation?

In addition to lack of convergence, autocorrelation can also bias the results in Bayesian analysis. Severe autocorrelation results in chains that don’t completely describe the whole posterior distribution of the model parameters. Then, invalid results would follow from these incomplete posterior distributions. Therefore, we check autocorrelation by finding the correlation between different lags of the iterations of each model parameters with this piece of code:

autocorr.diag(modelposterior[,1:5], lags = c(0, 1,2,3,4, 5, 10, 50))
##         b_Intercept        b_sex   b_extrav       b_texp b_extrav:texp
## Lag 0   1.000000000  1.000000000 1.00000000  1.000000000    1.00000000
## Lag 1   0.209276950 -0.165889410 0.06895819  0.219876661    0.07560304
## Lag 2   0.153759111  0.144518496 0.13765318  0.169520886    0.16364743
## Lag 3   0.077998835 -0.003987969 0.05926797  0.088358614    0.07561187
## Lag 4   0.055208748  0.047733948 0.06666504  0.061416216    0.08500281
## Lag 5   0.039938017  0.040238641 0.05994243  0.055014211    0.08883707
## Lag 10  0.002222980  0.016366742 0.02122398 -0.004802806    0.03302167
## Lag 50 -0.004980992  0.015646871 0.00669497  0.003728686    0.03748117

These results show that autocorrelation disappears after a few lags. To make it completely disappear you could specify a trim of 2 or 3.

6. Do the posterior distributions make substantive sense?

Looking at the posterior distributions we do not see any abnormalities All posterior distributions are within reasonable bounds.

ggplot(filter(modeltranformed, Parameter==c("b_Intercept","b_extrav" ,"b_sex"), Iteration>1000), aes(x=value, fill=Parameter))+
  geom_density(alpha=.5)+geom_vline(xintercept = 0, col="red", size=1)+
  scale_x_continuous(name="Value", limits=c(-2.2, 1.5))+ 
    geom_vline(xintercept = summary(model)$fixed[1,3:4], col="darkgreen", linetype=2)+
    geom_vline(xintercept = summary(model)$fixed[2,3:4], col="blue", linetype=2)+
    geom_vline(xintercept = summary(model)$fixed[3,3:4], col="red", linetype=2)+
  theme_light()+
   scale_fill_manual(name='Parameters', values = c("red","darkgreen" , "lightblue"), labels=c(expression( " "  ~  gamma[Exraversion]), expression( " "  ~  gamma[Intercept]),  expression( " "  ~  gamma[Sex])))+
  labs(title="Posterior Density of Parameters With 95% CCI lines (1)")
## Warning: There were 487 divergent transitions after warmup. Increasing adapt_delta above 0.96 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: There were 487 divergent transitions after warmup. Increasing adapt_delta above 0.96 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: There were 487 divergent transitions after warmup. Increasing adapt_delta above 0.96 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

ggplot(filter(modeltranformed, Parameter==c("b_texp","b_extrav:texp"), Iteration>1000), aes(x=value, fill=Parameter))+
  geom_density(alpha=.5)+geom_vline(xintercept = 0, col="red", size=1)+
  coord_cartesian(ylim = c(0, 25))+
    geom_vline(xintercept = summary(model)$fixed[4,3:4], col="purple", linetype=2)+
    geom_vline(xintercept = summary(model)$fixed[5,3:4], col="red", linetype=2)+
  theme_light()+
     scale_fill_manual(name='Parameters', values = c("Darkred","purple"), labels=c(expression( " "  ~  gamma[extrav:texp]), expression( " "  ~  gamma[ texp])))+
  labs(title="Posterior Density of Parameters With 95% CCI lines (2)")
## Warning: There were 487 divergent transitions after warmup. Increasing adapt_delta above 0.96 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: There were 487 divergent transitions after warmup. Increasing adapt_delta above 0.96 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

7. Do different specification of the multivariate variance priors influence the results?

So far we have only set the priors for the regression coefficients and the intercept and have used the default BRMS priors for the standard deviations of group-level (‘random’) effects named (sd), the correlations of group-level (‘random’) effects (cor), and the residual standard deviation (sigma).

From the BRMS manual we learn that:

  1. The SD “parameters are restricted to be non-negative and, by default, have a half student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function. Minimally, the scale parameter is 10.” 2.The cor prior *“lkj_corr_cholesky(eta)” or in short “lkj(eta)” with eta > 0 is essentially the only prior for (Cholesky factors) of correlation matrices. If eta = 1 (the default) all correlations matrices are equally likely a priori. If eta > 1, extreme correlations become less likely, whereas 0 < eta < 1 results in higher probabilities for extreme correlations“*
  2. “By default, sigma has a half student-t prior that scales in the same way as the group-level standard deviations.”

We can re-specify these priors a bit to see if doing so strongly influences the results.

PRIORS2 <- c(set_prior("normal(0,5)", class = "b", coef= "extrav"),
            set_prior("normal(-1,.3)", class = "b", coef= "extrav:texp"),
            set_prior("normal(2,.2)", class = "b", coef= "sex"),
            set_prior("normal(0,5)", class = "b", coef= "texp"),
            set_prior("cauchy(0,10)", class = "Intercept"),
            set_prior("cauchy(0,2)", class = "sd"),          # a half cauchy distribution (truncuated at 0) for the sd
            set_prior("lkj(2)", class = "cor"),              # a Cholesky of 2 for the correlation  
             set_prior("inv_gamma(.5,.5)", class = "sigma")) # an uniformative inverse gamma for the sigma. 


modeldifferentMVpriors<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data, warmup = 1000, iter =3000, chains = 3, inits= "random", control = list(adapt_delta = 0.96), prior=PRIORS2, save_all_pars = TRUE,  sample_prior = TRUE,  cores= 3) 
summary(modeldifferentMVpriors)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 3 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 6000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.60      0.10     0.44     0.85        388 1.01
## sd(extrav)                0.04      0.03     0.00     0.10        256 1.02
## cor(Intercept,extrav)    -0.31      0.37    -0.82     0.59        783 1.01
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -1.22      0.26    -1.75    -0.71       1086 1.00
## sex             1.26      0.04     1.19     1.33       6000 1.00
## extrav          0.80      0.04     0.73     0.87       2944 1.00
## texp            0.23      0.02     0.20     0.26       1054 1.00
## extrav:texp    -0.02      0.00    -0.03    -0.02       3354 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.75      0.01     0.72     0.77       6000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

We see that these new priors had no influence on the estimation of the regression coefficients, but we do see that the sds got smaller. This is because the half cauchy prior we used is weakly informative towards 0 compared to the default priors. Also, because we set the correlation prior quite a bit higher we get a lower estimated correlation between the randon effects.

8. Is there a notable effect of the prior when compared with non-informative priors?

One might be interested in re-runinng the analysis, but with uninformartive priors simply to check if these priors had a large influence on the estimates. A large influence of informative prior is not per se problematic (even one of the strengths of a Bayesian analysis), but unlikely in a large dataset such as this one. We can specify priors as we did before, using the prior command in the BRM function.

PRIORSUNIFORMATIVE <- c(set_prior("normal(0,100)", class = "b", coef= "extrav"),
            set_prior("normal(0,100)", class = "b", coef= "extrav:texp"),
            set_prior("normal(0,100)", class = "b", coef= "sex"),
            set_prior("normal(0,100)", class = "b", coef= "texp"),
            set_prior("normal(0,100)", class = "Intercept"))
modeluninformativepriors<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data, warmup = 1000, iter =3000, chains = 3, inits= "random", control = list(adapt_delta = 0.96), prior=PRIORSUNIFORMATIVE, save_all_pars = TRUE,  sample_prior = TRUE,  cores= 3) 
summary(model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 3 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 6000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.62      0.10     0.45     0.84        664 1.00
## sd(extrav)                0.05      0.03     0.00     0.11        119 1.03
## cor(Intercept,extrav)    -0.39      0.40    -0.88     0.72        849 1.01
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -1.20      0.26    -1.70    -0.70       2891 1.00
## sex             1.26      0.03     1.19     1.33       6000 1.00
## extrav          0.80      0.04     0.73     0.88       2234 1.00
## texp            0.22      0.02     0.19     0.26       2624 1.00
## extrav:texp    -0.02      0.00    -0.03    -0.02        491 1.01
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.75      0.01     0.72     0.77       6000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
summary(modeluninformativepriors)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 3 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 6000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.62      0.10     0.45     0.85        607 1.00
## sd(extrav)                0.05      0.03     0.00     0.11        156 1.02
## cor(Intercept,extrav)    -0.39      0.42    -0.90     0.78       1154 1.00
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -1.20      0.26    -1.70    -0.69       2829 1.00
## sex             1.24      0.04     1.17     1.31       6000 1.00
## extrav          0.80      0.04     0.73     0.88       2697 1.00
## texp            0.23      0.02     0.20     0.26       2511 1.00
## extrav:texp    -0.02      0.00    -0.03    -0.02       2583 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.75      0.01     0.72     0.77       6000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

We see only a small difference between the informative and uninformative priors. We see that the estimates are ‘pulled’ toward the mean of the informative priors we specified for the regression coefficients of sex and extrav:texp.

We can also calculate a relative bias. Except for the sex regression coefficient, for all estimated the bias is less than 1%.

round(100*((summary(model)$fixed-summary(modeluninformativepriors)$fixed)/summary(modeluninformativepriors)$fixed), 5)
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample    Rhat
## Intercept   -0.21261  -0.79058  0.09123  1.68470    2.17495 0.13181
## sex          1.85356  -4.48480  2.24169  1.53316    0.00000 0.06788
## extrav      -0.32358   0.54170  0.12537 -0.29650  -17.16656 0.40382
## texp        -0.47541   0.28593 -0.43574 -0.33077    4.50113 0.34527
## extrav:texp -0.60780   2.30924 -0.16379 -0.16000  -80.97788 0.88059

When we compare the range of the 95% Credible Interval, we see that compared to the uninformative priors, the informative priors leads to a narrower interval

((summary(model)$fixed[,4]-summary(model)$fixed[,3])-(summary(modeluninformativepriors)$fixed[,4]-summary(modeluninformativepriors)$fixed[,3]))/(summary(modeluninformativepriors)$fixed[,4]-summary(modeluninformativepriors)$fixed[,3])
##     Intercept           sex        extrav          texp   extrav:texp 
## -9.996467e-03 -4.203577e-02 -2.317241e-02  2.701101e-06 -1.716502e-03

9. Are the results stable from a sensitivity analysis?

If you still have time left, you can adjust the hyperparameters of the priors upward and downward and re-estimating the model with these varied priors to check for robustness.

From the original paper:

“If informative or weakly-informative priors are used, then we suggest running a sensitivity analysis of these priors. When subjective priors are in place, then there might be a discrepancy between results using different subjective prior settings. A sensitivity analysis for priors would entail adjusting the entire prior distribution (i.e., using a completely different prior distribution than before) or adjusting hyperparameters upward and downward and re-estimating the model with these varied priors. Several different hyperparameter specifications can be made in a sensitivity analysis, and results obtained will point toward the impact of small fluctuations in hyperparameter values. [.] The purpose of this sensitivity analysis is to assess how much of an impact the location of the mean hyperparameter for the prior has on the posterior. [.] Upon receiving results from the sensitivity analysis, assess the impact that fluctuations in the hyperparameter values have on the substantive conclusions. Results may be stable across the sensitivity analysis, or they may be highly instable based on substantive conclusions. Whatever the finding, this information is important to report in the results and discussion sections of a paper. We should also reiterate here that original priors should not be modified, despite the results obtained.”

10. Is the Bayesian way of interpreting and reporting model results used?

For a summary on how to interpret and report models, please refer to https://www.rensvandeschoot.com/bayesian-analyses-where-to-start-and-what-to-report/

summary(model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 3 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 6000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.62      0.10     0.45     0.84        664 1.00
## sd(extrav)                0.05      0.03     0.00     0.11        119 1.03
## cor(Intercept,extrav)    -0.39      0.40    -0.88     0.72        849 1.01
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -1.20      0.26    -1.70    -0.70       2891 1.00
## sex             1.26      0.03     1.19     1.33       6000 1.00
## extrav          0.80      0.04     0.73     0.88       2234 1.00
## texp            0.22      0.02     0.19     0.26       2624 1.00
## extrav:texp    -0.02      0.00    -0.03    -0.02        491 1.01
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.75      0.01     0.72     0.77       6000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

In the current model we see that:

  • The estimate for the intercept is -1.21 [-1.72; -0.69]
  • The estimate for the fixed effect of sex is 1.26 [1.19 ; 1.33]
  • The estimate for the effect of teacher experience is 0.23 [0.19; 0.26]
  • The estimate for the (random) effect of extraversion is 0.80 [0.73; 0.88]
  • The estimate for the crosslevel interaction effect of extraversion and teacher experience is -0.02 [-0.03;-0.02]

We can see that none of 95% Posterior Credible Intervals for these effects include zero, which means we are can be quite certain that all of the random and fixed effects are different from 0.