Lavaan: how to get started

Developed by Naomi Schalken and Rens van de Schoot


This tutorial expects: 

  • Basic knowledge of correlation and regression
  • An installed version of R on your electronic device
  • Basic knowledge of R

⇒ If you feel you need a refresher on your R skills, have a look at R: how to get started


This tutorial provides the reader with a basic introduction to the software package lavaan (Rosseel, 2012). Lavaan is a free package that can be used within the R environment. The reader will be guided through the process of downloading lavaan and writing code to obtain sample statistics and to conduct a regression analysis. Most of the written text consists of instructions. Questions are highlighted in grey.

Throughout this tutorial we will use a dataset from Van de Schoot, van der Velden, Boom & Brugman (2010). Using multiple regression, we will predict adolescents’ socially desirable answering patterns (sd) from overt (overt) and covert (covert) antisocial behaviour. For more information on the sample, instruments, methodology and research context we refer the interested reader to the paper (see references). Here we will focus on data analysis only. The data set and syntax files can be found in the subfolders titled 'Assignment Files' and 'Solutions'.


Exercise 1a. Installing lavaan

To install and activate the lavaan package, run the following code in R:
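A minimal sketch of this step, assuming lavaan is installed from CRAN in the usual way (the conditional check is an addition so that the package is not reinstalled on every run):

```r
# Install lavaan from CRAN if it is not installed yet (assumes internet access)
if (!requireNamespace("lavaan", quietly = TRUE)) {
  install.packages("lavaan")
}

# Load (activate) the package for the current R session
library(lavaan)
```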


This step may take several minutes. When the R prompt reappears, R is done installing and you are ready to use R again.


Exercise 1b. Multiple regression in lavaan

Functions pertaining to the lavaan package are now available to us in R. For example, we can repeat the multiple regression we did in SPSS: how to get started. To define the model, run the code:

model  <- 'sd ~ overt + covert'   

Here, we create an object named “model” that we use later on. It holds a sequence of characters (called a string) enclosed in single quotation marks, which the assignment operator (<-) assigns to the name model. The string itself says that sd, the dependent variable, is regressed on (~) the linear combination of overt and covert. The intercept is not written in the equation because lavaan automatically accounts for an intercept in linear regression. To run the model in lavaan and store the output in an R object that we can use afterwards, run the following code:

fit <- sem(model, data = popular, estimator = "GLS")

Note that we use Generalized Least Squares (GLS) as the estimator, set through the estimator argument, because for this model it matches the least-squares regression estimates that SPSS produces. Without this argument, sem() uses Maximum Likelihood. Feel free to experiment with other estimators. For example, try "ML" for Maximum Likelihood or "WLS" for Weighted Least Squares. To inspect the output, use:

summary(fit, fit.measures = TRUE)   
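To experiment with estimators, the following sketch fits the model with two estimators and prints the regression estimates side by side. Because the tutorial data are not reproduced here, it assumes simulated stand-in data (sim_data, generated with arbitrary made-up coefficients). The variable names match the tutorial, but the numbers do not correspond to the real sample. The helper get_reg is likewise hypothetical.

```r
library(lavaan)

# Simulated stand-in data (hypothetical; NOT the tutorial data set)
set.seed(123)
n <- 300
overt  <- rnorm(n)
covert <- 0.5 * overt + rnorm(n)
sim_data <- data.frame(
  overt  = overt,
  covert = covert,
  sd     = 0.3 * overt - 0.4 * covert + rnorm(n)
)

model <- 'sd ~ overt + covert'

# Fit the same model with two different estimators
fit_gls <- sem(model, data = sim_data, estimator = "GLS")
fit_ml  <- sem(model, data = sim_data, estimator = "ML")

# Hypothetical helper: keep only the regression rows (op == "~")
get_reg <- function(fit) {
  pe <- as.data.frame(parameterEstimates(fit))
  pe[pe$op == "~", c("lhs", "op", "rhs", "est", "se")]
}

# One row per predictor, with estimates and standard errors per estimator
comparison <- merge(get_reg(fit_gls), get_reg(fit_ml),
                    by = c("lhs", "op", "rhs"),
                    suffixes = c(".gls", ".ml"))
print(comparison)
```

With the real data, replace sim_data with the data object used in the tutorial.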

Question: How would you interpret the results? How do they compare to the results found in SPSS: how to get started?

Note: with GLS as the estimator, the regression coefficients and their standard errors should be exactly equal to those found in SPSS. If they are not, this is an indication that something went wrong.
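For a check that does not require SPSS, the lavaan coefficients can also be compared with those from R's built-in lm() function, which fits the same regression by ordinary least squares. The sketch below is an illustration under assumptions: it uses simulated stand-in data (sim_data, with made-up coefficients) rather than the tutorial data, and it compares only the regression coefficients, not the standard errors.

```r
library(lavaan)

# Simulated stand-in data (hypothetical; NOT the tutorial data set)
set.seed(2010)
n <- 250
overt  <- rnorm(n)
covert <- 0.4 * overt + rnorm(n)
sim_data <- data.frame(
  overt  = overt,
  covert = covert,
  sd     = -0.2 * overt - 0.3 * covert + rnorm(n)
)

# Same regression fitted in lavaan (GLS) and with ordinary least squares
fit_lavaan <- sem('sd ~ overt + covert', data = sim_data, estimator = "GLS")
fit_lm     <- lm(sd ~ overt + covert, data = sim_data)

# Extract the two slopes from each fit; lavaan names them "sd~overt", "sd~covert"
slopes_lavaan <- unname(coef(fit_lavaan)[c("sd~overt", "sd~covert")])
slopes_lm     <- unname(coef(fit_lm)[c("overt", "covert")])

print(rbind(lavaan = slopes_lavaan, lm = slopes_lm))

# TRUE if the slopes agree up to a small numerical tolerance
print(isTRUE(all.equal(slopes_lavaan, slopes_lm, tolerance = 1e-4)))
```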

You have now completed the first lavaan assignment. You are ready to use lavaan and all of its features, which will be explored in further detail during the lavaan course. For more information and documentation about lavaan, see



Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36.

Van de Schoot, R., van der Velden, F., Boom, J. & Brugman, D. (2010). Can at Risk Young Adolescents be Popular and Antisocial? Sociometric Status Groups, AntiSocial Behavior, Gender and Ethnic Background. Journal of Adolescence, 33, 583-592.