Mplus: how to get started

Developed by Naomi Schalken and Rens van de Schoot

Throughout this tutorial we will use a dataset from Van de Schoot, van der Velden, Boom & Brugman (2010). Using multiple regression, we will predict adolescents’ socially desirable answering patterns (sd) from overt (overt) and covert (covert) antisocial behaviour. For more information on the sample, instruments, methodology and research context we refer the interested reader to the paper (see references). Here, we will focus on data analysis only. We will start by running this regression model in SPSS and move on to Mplus later on in the exercise. All the solutions, final data sets and syntax files can be found in the subfolder titled ‘solutions’.


Preparation - Preparing Data for MPLUS

In order to use this dataset in Mplus, we need to make sure all missing values are recoded into one extreme value, for example -999, and we need to save the data in a different format, e.g. tab delimited.
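To make the target format concrete, here is a minimal Python sketch of what the prepared file should contain. It is not part of the SPSS workflow described in this tutorial. The function names and the two records are hypothetical and the values are made up; only the variable names and the -999 code come from the text.

```python
import csv
import io

MISSING_CODE = -999  # the extreme value chosen in this tutorial

def to_mplus_rows(records, names):
    """Return rows in a fixed column order, with None replaced by MISSING_CODE."""
    return [[MISSING_CODE if rec.get(n) is None else rec[n] for n in names]
            for rec in records]

def write_tab_delimited(rows, handle):
    """Write rows as tab-separated text without a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# Hypothetical records; these values are made up for illustration only.
names = ["respnr", "Dutch", "gender", "sd", "covert", "overt"]
records = [
    {"respnr": 1, "Dutch": 1, "gender": 0, "sd": 2.5, "covert": 1.25, "overt": None},
    {"respnr": 2, "Dutch": 0, "gender": 1, "sd": None, "covert": None, "overt": None},
]

buffer = io.StringIO()
write_tab_delimited(to_mplus_rows(records, names), buffer)
print(buffer.getvalue())
```

Each subject ends up on one line, every value is separated by a tab, decimals use a dot, no variable names are written, and no cell is left empty.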

To recode all user-missing and system-missing values into -999, choose Transform -> Recode into Same Variables, select all variables and move them to the Variables box, click Old and New Values, select System- or user-missing, enter -999 as the new value, and click Add, Continue, and OK.


Or use the following syntax:

RECODE respnr Dutch gender sd covert overt (MISSING=-999).
EXECUTE.


All missing values should now be coded -999. You can verify this by inspecting the dataset. Now, we will save the data file in a different format. Again, you may use the menus or opt for the syntax method.

When using the menus, choose File -> Save As, give the file a name (e.g., popular_regr_1.dat), choose Tab delimited (.dat) as the file type, uncheck the option Write variable names to spreadsheet, and click Save.


When using syntax, copy-paste the following commands, but make sure to change the directory to the folder of your choice:

SAVE TRANSLATE OUTFILE='<path directory to preferred folder>/popular_regr_1.dat'
/TYPE=TAB
/TEXTOPTIONS DECIMAL=DOT
/REPLACE.


Always inspect the saved .dat file. Note that you can open the .dat file with software such as Notepad to check whether data-preparation succeeded. Upon opening the .dat file in Notepad, the datafile should look like this:


Make a habit of scanning the .dat file for empty cells, and make sure that the decimal separator is a dot, NOT a comma. We ensured this by using the /textoptions subcommand in the SPSS syntax. In the menu system we cannot specify /textoptions, so there you must use the Replace command in Notepad to change all decimal commas into dots. Here, we see that everything went well: a tab separates each value for every subject. Another option is to use comma separation, in which case the values would be separated by commas instead of tabs, but we do not recommend this option.
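For larger files, scanning by eye becomes unreliable. The check described above could be automated along the following lines. This is a minimal Python sketch, not part of the original workflow: the function names are hypothetical, the six columns correspond to the six variables in this dataset, and the file encoding is an assumption.

```python
import re

# A digit, a comma, a digit: the pattern a decimal comma would leave behind.
DECIMAL_COMMA = re.compile(r"\d,\d")

def check_dat_lines(lines, n_columns=6):
    """Return a list of (line_number, problem) tuples for tab-delimited data lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != n_columns:
            problems.append((number, f"expected {n_columns} values, found {len(fields)}"))
        if any(field.strip() == "" for field in fields):
            problems.append((number, "empty cell"))
        if any(DECIMAL_COMMA.search(field) for field in fields):
            problems.append((number, "decimal comma"))
    return problems

def check_dat_file(path, n_columns=6):
    """Run the same checks on a file; the encoding is an assumption."""
    with open(path, encoding="utf-8-sig") as handle:
        return check_dat_lines(handle, n_columns)

# Made-up lines: the first is clean, the second has an empty cell and a decimal comma.
sample = ["1\t1\t0\t2.5\t1.25\t-999\n", "2\t0\t1\t\t1,5\t3\n"]
for number, problem in check_dat_lines(sample):
    print(f"line {number}: {problem}")
```

Calling check_dat_file("popular_regr_1.dat") from the folder that holds the data should return an empty list when the export succeeded.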


Preparation - Installing MPLUS


A free demonstration version of Mplus may be obtained from

Make sure to follow the instructions that pertain to your operating system.

An icon for the Mplus demo version should now appear in your start menu (Windows) or launchpad (Mac OS X). Open this demo version and go to File -> New to open a brand new syntax file.


Exercise - Multiple Regression in MPLUS

Exercise 1a. Let’s first take a look at the sample statistics to get familiar with the Mplus environment. You can copy-paste the following syntax into the new syntax file you just opened:

DATA: FILE IS popular_regr_1.dat;
VARIABLE: NAMES ARE respnr Dutch gender sd covert overt;
USEVARIABLES ARE covert sd overt;
MISSING ARE ALL (-999);
OUTPUT: sampstat;

Let’s take a closer look at the syntax written above. In the first line, we use a DATA command and we tell Mplus what the datafile is called. In the next syntax line, we use a VARIABLE command that consists of three lines. First, we tell Mplus what the variable names are by using the NAMES ARE statement. Note that the order of the variable names has to mirror the actual order in the dataset. You can simply copy-paste the variable names from SPSS. Note that each statement should end in a semicolon. You can use multiple lines in Mplus to structure your syntax. For example:

NAMES ARE respnr Dutch gender
sd covert overt;

Here, Mplus will stop reading the syntax line after overt. You can use the exclamation mark to make comments. For example:

respnr !this part will not be read by Mplus


Second, we tell Mplus which variables we are actually going to use by using USEVARIABLES ARE. This way Mplus knows which columns in the .dat file to use. Finally, we tell Mplus that for all variables missing values are coded by the number -999 using MISSING ARE ALL (-999). In the last command line we use an OUTPUT command and ask for some descriptive statistics by requesting sampstat, which is short for sample statistics.


Before you can run this syntax, you need to save it as an .inp (input) file. Always save your input files in the same (sub)folder as your tab-delimited dataset, because the FILE IS statement gives only the file name and not a path. Now you can run the syntax by pressing the blue ‘run’ button or pressing Alt+R. Mplus will generate an output file with the extension .out, which is saved automatically in the same folder as the input file.

In the output file, two warnings will appear:


WARNING in MODEL command
All variables are uncorrelated with all other variables in the model. Check that this is what is intended.
Data set contains cases with missing on all variables.
These cases were not included in the analysis.
Number of cases with missing on all variables: 145



The first warning occurs because we did not actually specify a model in this syntax and we can ignore this warning for now. The second warning refers to the fact that several participants had missing values on all variables specified in the USEVARIABLES command. These participants will not be used in determining the sample statistics.

Not all other output is relevant to us. For now, we only want to inspect the sample statistics. If you scroll down in the output file, you will find the Sample Statistics with the estimated means for the three variables we selected in the USEVARIABLES statement. We also see the covariance and correlation matrix of the three variables.

Question: Compare your results to the results you obtained in the tutorial SPSS: How to get started. Are the results similar? If not, can you explain differences between the Mplus output and the SPSS output?


Exercise 1b. In the previous exercise we only looked at sample statistics. Now, we are going to run the regression analysis in Mplus by adding a model statement to the syntax. Open a new syntax file in Mplus and enter the following syntax commands:

DATA: FILE IS popular_regr_1.dat;
VARIABLE: NAMES ARE respnr Dutch gender sd covert overt;
USEVARIABLES ARE sd covert overt;
MISSING ARE ALL (-999);
MODEL: sd ON covert overt;
OUTPUT: sampstat stand;


We specified a MODEL where an outcome variable (sd) is regressed ON two predictors (covert and overt), and we asked for standardized results by requesting stand in the output. Dependent or Y variables always appear on the left-hand side of the ON statement and independent or X variables always appear on the right-hand side of the ON statement.
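Written as an equation, the MODEL statement corresponds to the usual multiple regression model. The symbols below are our own notation and do not appear in the Mplus output, which reports the two slopes under sd ON, an intercept, and a residual variance; the normality of the residuals is the standard assumption for this model.

```latex
\mathrm{sd}_i = \beta_0 + \beta_1\,\mathrm{covert}_i + \beta_2\,\mathrm{overt}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```

Here \beta_1 and \beta_2 are the regression coefficients of covert and overt, \beta_0 is the intercept of sd, and \sigma^2 is the residual variance of sd.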

Again, save the input file in the same folder as the .dat file and run the syntax. We can expect three warnings, all concerning the missing values.


Looking at the output, we can ignore the model fit information since this is a saturated model. The model results and standardized model results are most relevant to this exercise (see screenshot).
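Why the model is saturated can be seen from a simple count. The sketch below assumes that the predictors are treated as fixed, so that their means, variances and covariance are not counted as model parameters, which is our understanding of how Mplus handles this kind of regression by default.

```latex
\begin{aligned}
\text{free parameters} &= \underbrace{2}_{\text{slopes}} + \underbrace{1}_{\text{intercept}} + \underbrace{1}_{\text{residual variance}} = 4,\\
\text{sample information} &= \underbrace{2}_{\text{covariances of sd with covert and overt}} + \underbrace{1}_{\text{mean of sd}} + \underbrace{1}_{\text{variance of sd}} = 4,\\
df &= 4 - 4 = 0.
\end{aligned}
```

With zero degrees of freedom the model reproduces the sample statistics exactly, so the fit indices carry no information about how well the model fits.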

Question: How would you interpret these results? How do they compare to the results you found in the tutorial SPSS: How to get started?




References

Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36.

Van de Schoot, R., van der Velden, F., Boom, J. & Brugman, D. (2010). Can at Risk Young Adolescents be Popular and Antisocial? Sociometric Status Groups, AntiSocial Behavior, Gender and Ethnic Background. Journal of Adolescence, 33, 583-592.

