Stata: how to get started

Developed by Lion Behrens and Rens van de Schoot

This tutorial expects:

  • Basic knowledge of correlation and regression
  • An installed version of Stata on your electronic device

This tutorial provides the reader with a basic introduction to the software package Stata. The reader will be guided through the investigation of basic data relations using correlations and through the process of conducting a multiple regression analysis in Stata.

Throughout this tutorial we will use a dataset from Van de Schoot, van der Velden, Boom & Brugman (2010). Using multiple regression, we will predict adolescents’ socially desirable answering patterns (sd) from overt (overt) and covert (covert) antisocial behaviour. For more information on the sample, instruments, methodology and research context, we refer the interested reader to the paper (see references). Here we will focus on data analysis only. The dataset and syntax file can be found in the subfolders titled 'Assignment Files' and 'Solutions'.


Exercise 1 - Correlation and Multiple regression in Stata

In this exercise you will run a regression model with sd as the outcome variable and overt and covert as predictors. You can find the data in the file popular_regr_1.xlsx. You can either import the data file using the commands

cd "C:\your working directory"
import excel "C:\your working directory\popular_regr_1.xlsx", sheet("test") firstrow

or import it through the menu by clicking on File -> Import -> Excel Spreadsheet and ticking the option "Import first row as variable names".
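
After importing, it can help to confirm that the three variables arrived as expected. The lines below are a minimal check, assuming the first row of the sheet supplied the variable names sd, covert and overt that are used in the rest of this tutorial:

```stata
* List storage types and labels for the variables used in the exercises
describe sd covert overt

* Number of observations, mean, standard deviation, minimum and maximum
summarize sd covert overt
```

If Stata reports that a variable cannot be found, check the sheet name and the firstrow option in the import step.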

Note: In many other "How to get started" exercises you will be asked to compare the results from here with results you can obtain e.g. in R or lavaan. Make sure to save or write down the results you found in this exercise.
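
One way to save the results is to let Stata write everything it prints to a log file. This is a minimal sketch. The file name stata_results.log is only a placeholder. Run the first command before the analyses and the last one when you are done:

```stata
* Start recording all output to a plain-text log in the working directory
* (stata_results.log is a hypothetical name; replace overwrites an existing file)
log using "stata_results.log", text replace

* ... run the commands from the exercises here ...

* Stop recording and close the log file
log close
```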


Exercise 1a. Obtain pairwise correlations between the variables of interest with significance levels by running the following syntax:

pwcorr sd covert overt, sig

Simply copy and paste this line into Stata's Command window and press Enter.
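
If you also want to see how many observations each correlation is based on, pwcorr accepts further options. In this sketch the 0.05 threshold for the star is our own choice, not something prescribed by the exercise:

```stata
* sig prints the p-value under each correlation,
* obs prints the number of observations used for each pair,
* star(.05) marks correlations significant at the 5% level with an asterisk
pwcorr sd covert overt, sig obs star(.05)
```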

Question: What do the significance, signs and magnitudes of the correlations tell you about the relationships between the variables?


Exercise 1b. Run the multiple regression model described above by running the following syntax:

regress sd covert overt

Question: What do you conclude from the regression coefficients? Include the significance and relevance (R²) of effects in your answer.
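
To help locate these quantities, the sketch below shows one possible way to obtain standardized coefficients and to retrieve R-squared after estimation. The beta option and the stored results e(r2) and e(r2_a) are standard features of regress. Whether standardized coefficients are the most suitable measure of relevance here is a judgment call:

```stata
* Same model, reporting standardized (beta) coefficients instead of confidence intervals
regress sd covert overt, beta

* R-squared and adjusted R-squared are stored by regress after estimation
display "R-squared          = " e(r2)
display "Adjusted R-squared = " e(r2_a)
```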



Van de Schoot, R., van der Velden, F., Boom, J. & Brugman, D. (2010). Can at Risk Young Adolescents be Popular and Antisocial? Sociometric Status Groups, AntiSocial Behavior, Gender and Ethnic Background. Journal of Adolescence, 33, 583-592.
