Developed by Naomi Schalken and Rens van de Schoot
This tutorial provides the reader with a basic introduction to the software package lavaan (Rosseel, 2012). Lavaan is a free package that can be used within the R environment. The reader will be guided through the process of downloading lavaan, writing code to obtain sample statistics, and conducting a regression analysis. Most of the written text consists of instructions. Questions are highlighted in grey.
Throughout this tutorial we will use a dataset from Van de Schoot, van der Velden, Boom & Brugman (2010). Using multiple regression, we will predict adolescents’ socially desirable answering patterns (sd) from overt (overt) and covert (covert) antisocial behaviour. For more information on the sample, instruments, methodology and research context we refer the interested reader to the paper (see references). Here we will focus on data analysis only. All the solutions, final data sets and syntax files can be found in the subfolder titled ‘solutions’.
Exercise 1 - explore data in R
Some general remarks: we will frequently ask you to “run a command” in R. You can do so by pressing enter after you’ve typed or pasted a section of R code. You may assume the command was processed correctly when no errors are reported and a new prompt appears.
Exercise 1a. Importing .sav data in R
Usually, you start with an SPSS dataset stored as a .sav file. To obtain your .sav file, download popular_regr_1.xlsx, open it with SPSS and save it as a .sav file. Then, after opening R, we import the .sav file in three steps:
- Set the working directory so that R knows where to look for the .sav file. To do this, type the setwd("") command, entering your working directory between the quotation marks. Hint: right-clicking the SPSS file popular_regr_1.sav on your computer and asking for Properties will give you the working directory. Run the setwd() command by pressing enter. Attention: To enable R to find your file, all backslashes need to be changed into forward slashes ("/"). Your setwd("") command could for example look like this:
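A minimal sketch, assuming a hypothetical Windows folder (replace it with your own). It converts the backslashes with gsub() and only calls setwd() when the folder actually exists on your machine:

```r
# Hypothetical folder as copied from the Windows Properties dialog;
# inside an R string every backslash has to be doubled.
win_path <- "C:\\Users\\yourname\\Documents\\lavaan"

# Replace every backslash with a forward slash.
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
print(r_path)

# Change the working directory only if the folder exists on this machine.
if (dir.exists(r_path)) {
  setwd(r_path)
}
```

Typing the path with forward slashes directly inside setwd("") has the same effect.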
Now R knows where to find your .sav file.
- Activate the built-in foreign package by running library(foreign).
This opens up options that we need in step 3. The foreign package assists R in importing data files from SPSS, Stata, SAS, Minitab et cetera.
Question: Which command would you use to import the SPSS-file?
- Import the .sav file with the following command:
popular <- read.spss("popular_regr_1.sav", to.data.frame = TRUE)
You can ignore the following warning:
Warning message: In read.spss("popular_regr_1.sav", to.data.frame = TRUE) : popular_regr_1.sav: Unrecognized record type 7, subtype 18 encountered in system file
To check that the data were imported correctly, we can use the head() function by running head(popular).
This shows the data file for the first 6 subjects, one row per subject.
As you can see, some of the data are missing (NA; Not Available). If this is not the case (or in general if you need to code missing values) you can manually identify missing values by running the following R command, where -999 (or 99) is the value used in SPSS to denote missing data:
popular[popular==-999] <- NA
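To see what this recoding does, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical stand-ins for popular, not the real data):

```r
# Made-up stand-in for the imported data; -999 marks a missing value.
toy <- data.frame(
  sd     = c(3, -999, 5, 4),
  overt  = c(1.2, 1.5, -999, 1.1),
  covert = c(2.0, 2.4, 1.8, 2.2)
)

# Same recoding as above, applied to the toy data:
# every cell equal to -999 becomes NA.
toy[toy == -999] <- NA

# Count the missing values per variable.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

If your SPSS file uses a different missing-value code, such as 99, replace -999 accordingly, but make sure that code cannot occur as a genuine score.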
Finally, we need to attach the data so that the variables sd, overt and covert can be used directly by name, by running attach(popular).
Note that the attach command is merely a technical step; no output is expected.
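To illustrate what attaching does, here is a minimal sketch on a small made-up data frame (toy is a hypothetical stand-in for popular). After attach(), a column can be referred to by its bare name; detach() undoes this:

```r
# Made-up stand-in for the imported data.
toy <- data.frame(
  overt  = c(1.2, 1.5, 1.1),
  covert = c(2.0, 2.4, 1.8)
)

# Put the columns of toy on R's search path.
attach(toy)

# The bare name now refers to the column toy$overt.
mean_attached <- mean(overt)
mean_dollar   <- mean(toy$overt)
print(c(attached = mean_attached, dollar = mean_dollar))

# Remove toy from the search path again.
detach(toy)
```

Keep in mind that an object with the same name in your workspace takes precedence over an attached column.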
Exercise 1b. Looking at descriptive results using R
Let’s explore the R environment in closer detail. A general function that’s useful for obtaining descriptives is summary(), called here as summary(popular).
Run this command to obtain means and other useful info for every variable separately. Now let’s look at the data graphically, for example by means of a boxplot, which base R draws with the boxplot() function:
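A minimal sketch of both steps on a small made-up data frame (toy and its values are hypothetical stand-ins for popular, and plotting sd is an assumption about which variable to inspect):

```r
# Made-up stand-in for the imported data.
toy <- data.frame(
  sd    = c(3, 4, 5, 4, 6),
  overt = c(1.2, 1.5, 1.0, 1.1, 1.9)
)

# Minimum, quartiles, median and mean for every variable.
desc <- summary(toy)
print(desc)

# Mean of a single variable, for comparison with the summary table.
toy_mean <- mean(toy$sd)
print(toy_mean)

# Boxplot of one variable; the title and axis label are optional.
boxplot(toy$sd, main = "sd (toy data)", ylab = "score")
```

With the real data you would pass popular, or one of its columns, to the same functions.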
Finally, let’s consider bivariate relations by looking at the correlations like we did in SPSS using:
cor(popular[,(4:6)], use = "pairwise")
In this command we call the cor (=correlation) function and say that we want to use columns 4 through 6 of the data, and all rows. To deal with the missing data, for now, we ask for pairwise correlations.
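The effect of the use argument can be seen in a minimal sketch on made-up data (toy is a hypothetical stand-in for popular). With "pairwise", each correlation uses all cases that are complete for that pair of variables; with "complete.obs", only cases that are complete on every variable are used:

```r
# Made-up stand-in with one missing value in sd and one in overt.
toy <- data.frame(
  sd     = c(3, 4, NA, 5, 2, 4),
  overt  = c(1.2, 1.5, 1.0, NA, 1.9, 1.3),
  covert = c(2.0, 2.4, 1.8, 2.2, 2.9, 2.1)
)

# Pairwise deletion: each pair of variables uses its own complete cases.
r_pairwise <- cor(toy[, 1:3], use = "pairwise")

# Listwise deletion: only rows without any NA are used.
r_listwise <- cor(toy[, 1:3], use = "complete.obs")

print(round(r_pairwise, 3))
print(round(r_listwise, 3))
```

When comparing with SPSS, check which of the two missing-data options was selected there, since they can give different correlations.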
Question: Compare your results to the results you obtained in SPSS: How to get started. Are the results similar? If not, can you explain the differences between the R output and the SPSS output?
Note that the correlation between Covert and Overt is estimated to be -0.3335563 in R and -0.334 in SPSS. This can be explained by differences in rounding; SPSS obtains -0.334 by rounding to the third decimal, whereas R displays more decimals of the correlation, leaving the third decimal at ‘3’. If there are any other differences between the correlations you obtained with SPSS and R, it is likely that something went wrong.
Exercise 2 - Multiple regression in lavaan
Exercise 2a. Installing lavaan
To install and activate the lavaan package, run the following code in R:
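A minimal sketch of this step, assuming an internet connection. The CRAN mirror URL and the requireNamespace() check, which skips the installation when lavaan is already present, are additions for illustration:

```r
# Install lavaan from CRAN only if it is not installed yet.
if (!requireNamespace("lavaan", quietly = TRUE)) {
  install.packages("lavaan", repos = "https://cloud.r-project.org")
}

# Activate the package for the current R session.
library(lavaan)
```

Installing is needed only once per computer; library(lavaan) has to be run in every new R session.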
This step may take several minutes. When the R prompt reappears, R is done installing and you’re ready to use R again.
Exercise 2b. Multiple regression in lavaan
Functions pertaining to the lavaan package are now available to us in R. For example, we can repeat the multiple regression we did in SPSS. To define the model, run the code:
model <- 'sd ~ overt + covert'
Here, we create an object named “model” that we call later on. This “model” consists of a sequence of characters (called a string) between single quotation marks. The string is assigned to the name model by the arrow symbol (<-). The string itself says that sd, the dependent variable, is regressed on (~) the linear combination of overt and covert. The intercept is not included in the equation as lavaan automatically accounts for the existence of an intercept in linear regression. To run the model in lavaan and store the output in an R object that we can use later, run the following code:
fit <- sem(model, data = popular, estimator = "GLS")
Note that we request Generalized Least Squares through the estimator argument because this is the estimator SPSS uses for regression analyses. When no estimator is specified, sem() defaults to Maximum Likelihood. Feel free to experiment with other estimators. For example, try "ML" for Maximum Likelihood or "WLS" for Weighted Least Squares. To inspect the output use
summary(fit, fit.measures = TRUE)
Question: How would you interpret the results? How do they compare to the results you found in SPSS: How to get started?
Note: the regression coefficients and their standard errors are supposed to be exactly equal with GLS as estimator. If they are not, this is an indication that something went wrong.
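If SPSS is not at hand, the regression coefficients can also be checked against an ordinary regression in R. The sketch below is an illustration under stated assumptions: sim is simulated data standing in for popular, the population values are made up, and lm() stands in for the SPSS regression. Only the coefficients are compared, not the standard errors:

```r
library(lavaan)

# Simulated stand-in for the real data; all population values are made up.
set.seed(123)
n <- 300
overt  <- rnorm(n)
covert <- 0.4 * overt + rnorm(n)
sim <- data.frame(
  sd     = 2 - 0.3 * overt - 0.2 * covert + rnorm(n),
  overt  = overt,
  covert = covert
)

# Same model string as in the tutorial, estimated with GLS in lavaan.
model <- 'sd ~ overt + covert'
fit_gls <- sem(model, data = sim, estimator = "GLS")

# Ordinary least squares regression for comparison.
ols <- lm(sd ~ overt + covert, data = sim)

# Put the two sets of slopes side by side.
lav_coef <- coef(fit_gls)[c("sd~overt", "sd~covert")]
ols_coef <- coef(ols)[c("overt", "covert")]
print(round(cbind(lavaan = unname(lav_coef), lm = unname(ols_coef)), 4))
```

With the real data, the same comparison would use popular in place of sim.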
You have now completed the first lavaan assignment. You are ready to use lavaan and all of its features, which will be explored in further detail during the lavaan course. For more information and documentation about lavaan, see http://lavaan.ugent.be/.
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36.
Van de Schoot, R., van der Velden, F., Boom, J. & Brugman, D. (2010). Can at Risk Young Adolescents be Popular and Antisocial? Sociometric Status Groups, AntiSocial Behavior, Gender and Ethnic Background. Journal of Adolescence, 33, 583-592.