BRMS-priors

 

Last modified: July 9, 2018

This document shows how you can replicate the popularity data multilevel models in a Bayesian framework from the book Multilevel analysis: Techniques and applications, Chapter 2. In this exercise the software package brms for R was used. You will be playing around with the options in brms to adjust the priors as opposed to using the default prior settings as we used in the BRMS Tutorial. This is part 2 of a 3 part series on how to do multilevel models.

As stated in the BRMS manual:

“Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs.”

We will set 4 types of extra priors here (in a addition to the uninformative prior we have used thus far) 1. With an estimate far off the value we found in the data with uninformative priors with a wide variance 2. With an estimate close to the value we found in the data with uninformative priors with a small variance 3. With an estimate far off the value we found in the data with uninformative priors with a small variance (1). 4. With an estimate far off the value we found in the data with uninformative priors with a small variance (2).

In this tutorial we will only focus on priors for the regression coefficients and not on the error and variance terms, since we are most likely to actually have information on the size and direction of a certain effect and less (but not completely) unlikely to have prior knowledge on the unexplained variances. You might have to play around a little bit with the controls of the brm function and specifically the adapt_delta and max_treedepth. Thankfully BRMS will tell you when to do so. We can also did not set highly informative priors for all regression coefficients at once, because this leads to the blowing up of estimates convergence issues (potentially solvebale by just running many more iterations)

STEP 1: setting up packages

The main package that is used for this analysis is brms (https://cran.r-project.org/web/packages/brms/brms.pdf) . In order to make this package function it need to call on STAN and a C++ compiler For more information and a tutorial on how to install these please have a look at: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started and https://cran.r-project.org/bin/windows/Rtools/.

“Because brms is based on Stan, a C++ compiler is required. The program Rtools (available on https://cran.r-project.org/bin/windows/Rtools/) comes with a C++ compiler for Windows. On Mac, you should use Xcode. For further instructions on how to get the compilers running, see the prerequisites section at the RStan-Getting-Started page.” ~ quoted from the BRMS package document

After you have install the aforementioned software you need to load some other R packages. If you have not yet installed all below mentioned packages, you can install them by the command install.packages(“NAMEOFPACKAGE”)

library(brms) # for the analysis
library(haven) # to load the SPSS .sav file
library(tidyverse) # needed for data manipulation.
library(RColorBrewer) # needed for some extra colours in one of the graphs
library(ggmcmc)
library(ggthemes)
library(lme4)

STEP 2: Downloading the data

To download the popularity data go to https://multilevel-analysis.sites.uu.nl/datasets/ and follow the links to https://github.com/MultiLevelAnalysis/Datasets-third-edition-Multilevel-book/blob/master/chapter%202/popularity/SPSS/popular2.sav. We will use the .sav file which can be found in the SPSS folder.

Alternatively, you can directly download them from GitHub into your R work space using the following command:

popular2data<- read_sav(file ="https://github.com/MultiLevelAnalysis/Datasets-third-edition-Multilevel-book/blob/master/chapter%202/popularity/SPSS/popular2.sav?raw=true")

There are some variables in the dataset that we do not use, so we can select the variables we will use and have a look at the first few observations.

popular2data<-select(popular2data, pupil, class, extrav, sex, texp, popular) # we select just the variables we will use
head(popular2data) # we have a look at the first 6 observations
## # A tibble: 6 x 6
##   pupil class extrav sex        texp popular
##   <dbl> <dbl>  <dbl> <dbl+lbl> <dbl>   <dbl>
## 1    1.    1.     5. 1           24.    6.30
## 2    2.    1.     7. 0           24.    4.90
## 3    3.    1.     4. 1           24.    5.30
## 4    4.    1.     3. 1           24.    4.70
## 5    5.    1.     5. 1           24.    6.00
## 6    6.    1.     4. 0           24.    4.70

Step 3: The Effect of Priors

With the get_prior() command we can see which priors we can specify for this model.

get_prior(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data)
##                  prior     class        coef group resp dpar nlpar bound
## 1                              b                                        
## 2                              b      extrav                            
## 3                              b extrav:texp                            
## 4                              b         sex                            
## 5                              b        texp                            
## 6               lkj(1)       cor                                        
## 7                            cor             class                      
## 8  student_t(3, 5, 10) Intercept                                        
## 9  student_t(3, 0, 10)        sd                                        
## 10                            sd             class                      
## 11                            sd      extrav class                      
## 12                            sd   Intercept class                      
## 13 student_t(3, 0, 10)     sigma

For the first model with priors we just set normal priors for all regression coefficients, in reality many, many more prior distributions are possible, see the BRMS manual for an overview: https://cran.r-project.org/web/packages/brms/brms.pdf

prior1 <- c(set_prior("normal(-10,100)", class = "b", coef= "extrav"),
            set_prior("normal(10,100)", class = "b", coef= "extrav:texp"),
            set_prior("normal(-5,100)", class = "b", coef= "sex"),
            set_prior("normal(-5,100)", class = "b", coef= "texp"),
            set_prior("normal(10,100)", class = "Intercept"))
model6<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data, warmup = 1000, iter =3000, chains = 2, inits= "random", control = list(adapt_delta = 0.96), prior=prior1,  cores= 2, sample_prior = TRUE) # the cores function tells STAN to make use of 3 CPU cores simultaneously instead of just 1.

To see which priors were inserted, use the prior_summary() command

prior_summary(model6)
##                   prior     class        coef group resp dpar nlpar bound
## 1                               b                                        
## 2       normal(-10,100)         b      extrav                            
## 3        normal(10,100)         b extrav:texp                            
## 4        normal(-5,100)         b         sex                            
## 5        normal(-5,100)         b        texp                            
## 6        normal(10,100) Intercept                                        
## 7  lkj_corr_cholesky(1)         L                                        
## 8                               L             class                      
## 9   student_t(3, 0, 10)        sd                                        
## 10                             sd             class                      
## 11                             sd      extrav class                      
## 12                             sd   Intercept class                      
## 13  student_t(3, 0, 10)     sigma
summary(model6)
## Warning: There were 129 divergent transitions after warmup. Increasing adapt_delta above 0.96 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 2 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.62      0.11     0.45     0.86        345 1.01
## sd(extrav)                0.05      0.03     0.00     0.11         60 1.02
## cor(Intercept,extrav)    -0.40      0.41    -0.90     0.73        565 1.01
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -1.21      0.26    -1.72    -0.70       1855 1.00
## sex             1.24      0.03     1.17     1.31       4000 1.00
## extrav          0.80      0.04     0.73     0.88       2321 1.00
## texp            0.23      0.02     0.19     0.26       1985 1.00
## extrav:texp    -0.02      0.00    -0.03    -0.02       2169 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.75      0.01     0.72     0.77       4000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

 

 

After this model with uninformative priors, it’s time to do the analysis with informative priors. Three models with different priors are tested and compared to investigate the influence of the construction of priors on the posterior distributions and therefore on the results in general.

prior2 <- c(set_prior("normal(.8,.1)", class = "b", coef= "extrav"),
            set_prior("normal(-.025,.1)", class = "b", coef= "extrav:texp"),
            set_prior("normal(1.25,.1)", class = "b", coef= "sex"),
            set_prior("normal(.23,.1)", class = "b", coef= "texp"),
            set_prior("normal(-1.21,.1)", class = "Intercept"))

model7<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data, warmup = 1000, iter =3000, chains = 2, inits= "random", control = list(adapt_delta = 0.96, max_treedepth=20), prior=prior2,  cores= 2) 
summary(model7)
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.96 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 2 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             6.28      0.58     5.23     7.56        158 1.01
## sd(extrav)                0.08      0.05     0.00     0.18        367 1.00
## cor(Intercept,extrav)    -0.04      0.56    -0.94     0.92       2737 1.00
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -7.32      1.02    -9.43    -5.37         87 1.02
## sex             1.23      0.03     1.16     1.30       4000 1.00
## extrav          0.81      0.06     0.68     0.93       2388 1.00
## texp            0.23      0.07     0.09     0.37         85 1.03
## extrav:texp    -0.02      0.00    -0.03    -0.02       4000 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.74      0.01     0.72     0.77       4000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
prior3 <- c(set_prior("normal(-1,.1)", class = "b", coef= "extrav"),
            set_prior("normal(3, 1)", class = "b", coef= "extrav:texp"),
            set_prior("normal(-3,1)", class = "b", coef= "sex"),
            set_prior("normal(-3,1)", class = "b", coef= "texp"),
            set_prior("normal(0,5)", class = "Intercept"))

model8<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data, warmup = 1000, iter =3000, chains = 2, inits= "random", control = list(adapt_delta = 0.96, max_treedepth=20), prior=prior3,  cores= 2, sample_prior = TRUE) 
summary(model8)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 2 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             2.25      0.41     1.50     3.06        278 1.00
## sd(extrav)                0.42      0.08     0.27     0.57        265 1.00
## cor(Intercept,extrav)    -0.97      0.02    -0.99    -0.93        316 1.00
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept       3.81      0.88     2.06     5.53        322 1.00
## sex             1.25      0.04     1.18     1.33       4000 1.00
## extrav         -0.14      0.16    -0.43     0.19        317 1.00
## texp           -0.06      0.05    -0.17     0.04        347 1.00
## extrav:texp     0.03      0.01     0.01     0.05        334 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.74      0.01     0.72     0.77       4000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
prior4 <- c(set_prior("normal(3,.1)", class = "b", coef= "extrav"),
            set_prior("normal(-3,1)", class = "b", coef= "extrav:texp"),
            set_prior("normal(3,1)", class = "b", coef= "sex"),
            set_prior("normal(3,1)", class = "b", coef= "texp"),
            set_prior("normal(0,5)", class = "Intercept"))

model9<-brm(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data,  warmup = 1000, iter =3000, chains = 2, inits= "random", control = list(adapt_delta = 0.95, max_treedepth=20), prior=prior4, sample_prior = TRUE,  cores= 2)  # with these informed priors, far off the actual data, the model takes longer to converge
summary(model9)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: popular ~ 1 + sex + extrav + texp + extrav:texp + (1 + extrav | class) 
##    Data: popular2data (Number of observations: 2000) 
## Samples: 2 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~class (Number of levels: 100) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             3.67      0.45     2.82     4.61        247 1.00
## sd(extrav)                0.69      0.07     0.55     0.84        399 1.00
## cor(Intercept,extrav)    -0.99      0.00    -0.99    -0.98        325 1.00
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept      -9.83      0.84   -11.48    -8.18        333 1.00
## sex             1.25      0.04     1.18     1.32       4000 1.00
## extrav          2.44      0.12     2.21     2.66        821 1.00
## texp            0.72      0.05     0.62     0.83        320 1.00
## extrav:texp    -0.12      0.01    -0.14    -0.10        584 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.74      0.01     0.72     0.77       4000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

 

Comparing the last three models we see that for the first two models the prior specification does not really have a large influence on the results. However, for the final model with the highly informative priors that are far from the observed data, the priors do influence the posterior results. Because of the fairly large dataset, the priors are unlikely to have a large influence unless they are highly informative. Because we asked to save the prior in the last model (“sample_prior = TRUE”), we can now plot the difference between the prior and the posterior distribution of different parameters. In all cases we see that the prior has a large influence on the posterior compared to the posterior estimates we arrived in earlier models.

plot(hypothesis(model8, "texp > 0")) # if you would just run this command without the plot wrapper, you would get the support for the hypothesis that the regression coefficient texp is larger than 0, this is in interesting way to test possible hypothesis you had.

plot(hypothesis(model8, "sex = 0"))

plot(hypothesis(model8, "extrav >0"))

plot(hypothesis(model8, "extrav:texp>0"))

posterior1<-posterior_samples(model6, pars = "b_extrav")[, c(1,3)]
posterior2<-posterior_samples(model8, pars = "b_extrav")[, c(1,3)]
posterior3<-posterior_samples(model9, pars = "b_extrav")[, c(1,3)]


posterior1.2.3<-bind_rows("prior 1"= gather(posterior1), "prior 2"= gather(posterior2), "prior 3"= gather(posterior3),.id= "id")
modelLME<-lmer(popular~1 + sex + extrav + texp+ extrav:texp+(1+extrav|class), data=popular2data)


ggplot(data=posterior1.2.3, mapping = aes(x=value, fill =  id, colour=key, linetype= key, alpha=key)) +
  geom_density( size=1.2)+
  geom_vline(xintercept =summary(modelLME)$coefficients["extrav", "Estimate"], size=1, linetype=2, col="black")+
  scale_x_continuous(limits=c(-1.5, 3))+
  coord_cartesian(ylim = c(0, 5))+
  scale_fill_manual(name="Densities", values = c("Yellow","darkred","blue" ), 
                    labels=c("uniformative ~ N(-10,100) prior","informative ~ N(-1,.1) prior","informative ~ N(3,.1) prior") )+
  scale_colour_manual(name='Posterior/Prior', values = c("black","red"), labels= c("posterior", "prior"))+
  scale_linetype_manual(name='Posterior/Prior', values = c("solid","dotted"), labels= c("posterior", "prior"))+
  scale_alpha_discrete(name='Posterior/Prior',range = c(.7,.3), labels= c("posterior", "prior"))+
  annotate("text", x=0.45, y=-.13, label= "LME estimate:  0.804", col="black", family= theme_get()$text[["family"]], 
           size= theme_get()$text[["size"]]/3.5, 
           fontface="italic")+
  labs(title=expression("Influence of (Informative) Priors on" ~ gamma[Extraversion]), subtitle= "3 different densities of priors and posteriors and the LME estimate")+
  theme_tufte()
## Warning: Removed 5953 rows containing non-finite values (stat_density).