Regression in Mplus (Bayesian)

This tutorial provides the reader with a basic tutorial how to perform a Bayesian regression in Mplus. Throughout this tutorial, the reader will be guided through importing data files, exploring summary statistics and regression analyses. Here, we will exclusively focus on Bayesian statistics.

In this tutorial, we start by using the default prior settings of the software. In a second step, we will apply user-specified priors, and if you really want to use Bayes for your own data, we recommend to follow the WAMBS-checklist, also available in other software.

Preparation

This tutorial expects:

  • Version 8 or higher of Mplus.This tutorial was made using Mplus demo version 8.
  • Basic knowledge of hypothesis testing
  • Basic knowledge of correlation and regression
  • Basic knowledge of Bayesian inference

Example Data

The data we will be using for this exercise is based on a study about predicting PhD-delays (Van de Schoot, Yerkes, Mouw and Sonneveld 2013). You can find the data here. Among many other questions, the researchers asked the Ph.D. recipients how long it took them to finish their Ph.D. thesis (n=333). It appeared that Ph.D. recipients took an average of 59.8 months (five years and four months) to complete their Ph.D. trajectory. The variable B3_difference_extra measures the difference between planned and actual project time in months (mean=9.96,sd=14.43). For more information on the sample, instruments, methodology and research context we refer the interested reader to the paper.

For the current exercise we are interested in the question whether age of the Ph.D. recipients is related to a delay in their project.

The relation between completion time and age is expected to be non-linear. This might be due to that at a certain point in your life (i.e., mid thirties), family life takes up more of your time than when you are in your twenties or when you are older.

So, in our model the gap (diff) is the dependent variable and age and \(age^2\) are the predictors. The data can be found in the file phd-delays_nonames.csv . (In Mplus the first row CANNOT have the variable names, these have already been deleted for you)

Question: Write down the null and alternative hypotheses that represent this question. Which hypothesis do you deem more likely?
Answer

\(H_0:\) \(Age^2\) is not related to a delay in the PhD projects.

\(H_1:\) \(Age\)^2$ is related to a delay in the PhD projects.

Preparation – Importing and Exploring Data

You can find the data in the file phd-delays_nonames.csv , which contains all variables that you need for this analysis. Although it is a .csv-file, you can directly load it into Mplus using the following syntax:

TITLE: Bayesian analysis with diffuse priors
DATA: FILE IS phd-delays_nonames.csv;
VARIABLE: NAMES ARE diff child sex Age Age2;
USEVARIABLES ARE diff Age Age2;
OUTPUT: sampstat;

Once you loaded in your data, it is advisable to check whether your data import worked well. Therefore, first have a look at the summary statistics of your data. you can do this by looking at the sampstat ouput.

Question: Have all your data been loaded in correctly? That is, do all data points substantively make sense? If you are unsure, go back to the .csv-file to inspect the raw data.
Answer

The descriptive statistics make sense:

diff: Mean (9.97)

\(age\) : Mean (31.68)

\(age^2\) : Mean (1050.22)

Regression – Default Priors

In this exercise you will investigate the impact of Ph.D. students’ \(age\) and \(age^2\) on the delay in their project time, which serves as the outcome variable using a regression analysis (note that we ignore assumption checking!).

As you know, Bayesian inference consists of combining a prior distribution with the likelihood obtained from the data. Specifying a prior distribution is one of the most crucial points in Bayesian inference and should be treated with your highest attention (for a quick refresher see e.g. Van de Schoot et al. 2017). In this tutorial, we will first rely on the default prior settings, thereby behaving a ‘naive’ Bayesians (which might notalways be a good idea).

To run a multiple regression with Mplus, you first specify the model (after the MODEL line using the ON command for regression coefficients, and the [] command for the intercept). As default Mplus does not run a Bayesian analysis, so you would have to change the ESTIMATOR to BAYES under ANALYSIS in the input file and then look at the output under MODEL RESULTS.

TITLE: Bayesian analysis with diffuse priors

DATA: FILE IS phd-delays_nonames.csv;

VARIABLE: NAMES ARE diff child sex Age Age2;

USEVARIABLES ARE diff Age Age2;

ANALYSIS:
ESTIMATOR IS bayes;
Bseed= 23082018; !specify a seed value for reproducing the results
CHAINS=3;

MODEL: 
[diff] (intercept);       ! specify that we want an intercept
diff ON Age (Beta_Age);   ! Regression coefficient 1
diff ON Age2(Beta_Age2);  ! Regression coefficient 2 
diff (e);                 ! Error variance


OUTPUT: tech8;
MODEL RESULTS

                                Posterior  One-Tailed         95% C.I.
                    Estimate       S.D.      P-Value   Lower 2.5%  Upper 2.5%  Significance

 DIFF       ON
    AGE                2.647       0.622      0.000       1.348       3.874      *
    AGE2              -0.026       0.007      0.000      -0.039      -0.013      *

 Intercepts
    DIFF             -46.578      12.898      0.000     -70.394     -21.095      *

 Residual Variances
    DIFF             198.117      16.978      0.000     171.316     232.769      *

 

Question: Interpret the estimated effect, its interval and the posterior distribution.
Answer

Age seems to be a relevant predictor of PhD delays, with a posterior mean regression coefficient of 2.64, 95% Credibility Interval [1.35, 3.87]. Also, \(age^2\) seems to be a relevant predictor of PhD delays, with a posterior mean of -0.026, and a 95% credibility Interval of [-0.04, -0.01]. The 95% Credibility Interval shows that there is a 95% probability that these regression coefficients in the population lie within the corresponding intervals, see also the posterior distributions in the figures below. Since 0 is not contained in the Credibility Interval we can be fairly sure there is an effect.

The results that stem from a Bayesian analysis are genuinely different from those that are provided by a frequentist model. The key difference between Bayesian statistical inference and frequentist statistical methods concerns the nature of the unknown parameters that you are trying to estimate. In the frequentist framework, a parameter of interest is assumed to be unknown, but fixed. That is, it is assumed that in the population there is only one true population parameter, for example, one true mean or one true regression coefficient. In the Bayesian view of subjective probability, all unknown parameters are treated as uncertain and therefore are be described by a probability distribution. Every parameter is unknown, and everything unknown receives a distribution._

This is why in frequentist inference, you are primarily provided with a point estimate of the unknown but fixed population parameter. This is the parameter value that, given the data, is most likely in the population. An accompanying confidence interval tries to give you further insight in the uncertainty that is attached to this estimate. It is important to realize that a confidence interval simply constitutes a simulation quantity. Over an infinite number of samples taken from the population, the procedure to construct a (95%) confidence interval will let it contain the true population value 95% of the time. This does not provide you with any information how probable it is that the population parameter lies within the confidence interval boundaries that you observe in your very specific and sole sample that you are analyzing.

In Bayesian analyses, the key to your inference is the parameter of interest’s posterior distribution. It fulfills every property of a probability distribution and quantifies how probable it is for the population parameter to lie in certain regions. On the one hand, you can characterize the posterior by its mode. This is the parameter value that, given the data and its prior probability, is most probable in the population. Alternatively, you can use the posterior’s mean or median. Using the same distribution, you can construct a 95% credibility interval, the counterpart to the confidence interval in frequentist statistics. Other than the confidence interval, the Bayesian counterpart directly quantifies the probability that the population value lies within certain limits. There is a 95% probability that the parameter value of interest lies within the boundaries of the 95% credibility interval. Unlike the confidence interval, this is not merely a simulation quantity, but a concise and intuitive probability statement. For more on how to interpret Bayesian analysis, check Van de Schoot et al. 2014.

Question: Every Bayesian model uses a prior distribution. Describe the shape of the prior distribution.
Answer

To see what default priors Mplus uses we can have a look at the Appendix A of this manual. “The default prior for intercepts, regression slopes, and loading parameters is \(\beta \sim \mathcal{N}(0, \infty)\).” This means in practice that we have a normal prior with a super wide variance, and allow probability mass across the entire parameter space. The default priors can also be found in the TECH8 output of Mplus

Regression – User-specified Priors

In Mplus, you can also manually specify your prior distributions. Be aware that usually, this has to be done BEFORE peeking at the data, otherwise you are double-dipping (!). In theory, you can specify your prior knowledge using any kind of distribution you like. However, if your prior distribution does not follow the same parametric form as your likelihood, calculating the model can be computationally intense. Conjugate priors avoid this issue, as they take on a functional form that is suitable for the model that you are constructing. For your normal linear regression model, conjugacy is reached if the priors for your regression parameters are specified using normal distributions (the residual variance receives an inverse gamma distribution, which is neglected here). In Mplus, you are quite flexible in the specification of informative priors.

Let’s re-specify the regression model of the exercise above, using conjugate priors. We leave the priors for the intercept and the residual variance untouched for the moment. Regarding your regression parameters, you need to specify the hyperparameters of their normal distribution, which are the mean and the variance. The mean indicates which parameter value you deem most likely. The variance expresses how certain you are about that. We try 4 different prior specifications (for now we only look at the priors for the regression coefficients and ignore the intercept and the error term), for both the \(\beta_{age}\) regression coefficient, and the \(\beta_{age^2}\) coefficient.

First, we use the following prior specifications:

\(Age\) ~ N(3, 0.4)

\(Age^2\) N(0, 0.1)

In Mplus, the priors are set in the under the MODEL PRIORS command using the ~ sign and an N for a normal distribution.

The priors are presented in code as follows:

MODEL: 
[diff] (intercept);       ! specify that we want an intercept
diff ON Age (Beta_Age);   ! Regression coefficient 1
diff ON Age2(Beta_Age2);  ! Regression coefficient 2 
diff (e);                 ! Error variance


MODEL PRIORS:
  Beta_Age ~ N(3, .4);
  Beta_Age2 ~ N(0, .1);

Question Fill in the results in the table below:

AgeDefault priorN(3, 0.4)N(3, 1000)N(20, 0.4)N(20, 1000)
Posterior mean2.64
Posterior sd0.62
Age squaredDefault priorN(0, 0.1)N(0, 1000)N(20, 0.1)N(20, 1000)
Posterior mean-0.026
Posterior sd0.007

To get the posterior variance instead of the sd, just take the square of the posterior standard deviation. So \(0.52^2= 0.27\)

Answer

The results can again be found in the MODEL RESULTS table

MODEL RESULTS

                                Posterior  One-Tailed         95% C.I.
                    Estimate       S.D.      P-Value   Lower 2.5%  Upper 2.5%  Significance

 DIFF       ON
    AGE                2.794       0.460      0.000       1.863       3.697      *
    AGE2              -0.027       0.005      0.000      -0.038      -0.017      *

 Intercepts
    DIFF             -50.109       9.537      0.000     -67.545     -30.948      *

 Residual Variances
    DIFF             198.178      16.947      0.000     171.160     232.840      *
AgeDefault priorN(3, 0.4)N(3, 1000)N(20, 0.4)N(20, 1000)
Posterior mean2.642.79
Posterior sd0.620.46
Age squaredDefault priorN(0, 0.1)N(0, 1000)N(20, 0.1)N(20, 1000)
Posterior mean-0.026-0.027
Posterior sd0.0070.005

Next, try to adapt the code, using the prior specifications of the other columns. Make the table complete after the different prior specifications were used.

Answer

You can change these lines of code in the Mplus input:

MODEL PRIORS:
  Beta_Age ~ N(3, 1000);
  Beta_Age2 ~ N(0, 1000);
MODEL PRIORS:
  Beta_Age ~ N(20, .4);
  Beta_Age2 ~ N(0, .1);
MODEL PRIORS:
  Beta_Age ~ N(20, 1000);
  Beta_Age2 ~ N(0, 1000);
AgeDefault priorN(3, 0.4)N(3, 1000)N(20, 0.4)N(20, 1000)
Posterior mean2.642.792.6413.782.654
Posterior sd0.620.460.620.730.62
Age squaredDefault priorN(0, 0.1)N(0, 1000)N(20, 0.1)N(20, 1000)
Posterior mean-0.026-0.027-0.026-0.14-0.026
Posterior sd0.0070.0050.0070.0080.007

Question: Compare the results over the different prior specifications. Are the results comparable with the default model?

Answer

The results change with different prior specifications, but are still comparable. Only using N(20, 0.4) for age, results in different coefficients, since this prior mean is far from the mean of the data, while its variance is quite certain. However, in general the results are comparable. Because we use a big dataset the influence of the prior is relatively small. If one would use a smaller dataset the influence of the priors are larger.

Question: Do we end up with similar conclusions, using different prior specifications?

To help answering this quesstion we can calculate the relative bias to express this difference: \(bias= 100*\frac{(model informative priors-model uninformed priors)}{model uninformative priors}\). In order to preserve clarity we now calculate the bias of the two regression coefficients and only compare the default (uninformative) model with the model that uses the N(20, 0.4) and N(20, 0.1) priors._

Answer

(\(100*\frac{(13.78 – 2.64 )}{2.64}=428\%\))

(\(100*\frac{(-0.014–0.026 )}{-0.026}=-46.15\%\))

We see that the influence of this highly informative prior is around 428% and 46.15% on the two regression coefficients respectively.

A logical next step is to define your own priors. To do so, one first needs to think about a plausible parameter space. You can follow the steps in this responsive tutorial to do so.

If you really want to use Bayes for your own data, we recommend to follow the WAMBS-checklist, which you are guided through by this exercise.


References

Benjamin, D. J., Berger, J., Johannesson, M., Nosek, B. A., Wagenmakers, E.,… Johnson, V. (2017, July 22). Redefine statistical significance. Retrieved from psyarxiv.com/mky9j

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N. Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology 31 (4). https://doi.org/10.1007/s10654-016-0149-3 _ _

van de Schoot R, Yerkes MA, Mouw JM, Sonneveld H (2013) What Took Them So Long? Explaining PhD Delays among Doctoral Candidates. PLoS ONE 8(7): e68839. https://doi.org/10.1371/journal.pone.0068839

Trafimow D, Amrhein V, Areshenkoff CN, Barrera-Causil C, Beh EJ, Bilgiç Y, Bono R, Bradley MT, Briggs WM, Cepeda-Freyre HA, Chaigneau SE, Ciocca DR, Carlos Correa J, Cousineau D, de Boer MR, Dhar SS, Dolgov I, Gómez-Benito J, Grendar M, Grice J, Guerrero-Gimenez ME, Gutiérrez A, Huedo-Medina TB, Jaffe K, Janyan A, Karimnezhad A, Korner-Nievergelt F, Kosugi K, Lachmair M, Ledesma R, Limongi R, Liuzza MT, Lombardo R, Marks M, Meinlschmidt G, Nalborczyk L, Nguyen HT, Ospina R, Perezgonzalez JD, Pfister R, Rahona JJ, Rodríguez-Medina DA, Romão X, Ruiz-Fernández S, Suarez I, Tegethoff M, Tejo M, ** van de Schoot R** , Vankov I, Velasco-Forero S, Wang T, Yamada Y, Zoppino FC, Marmolejo-Ramos F. (2017) Manipulating the alpha level cannot cure significance testing – comments on “Redefine statistical significance”_ _PeerJ reprints 5:e3411v1 https://doi.org/10.7287/peerj.preprints.3411v1