Model syntax 1

At the heart of the lavaan package is the ‘model syntax’. The model syntax is a description of the model to be estimated. In this section, we briefly explain the elements of the lavaan model syntax. More details are given in the examples that follow.

In the R environment, a regression formula has the following form:

y ~ x1 + x2 + x3 + x4

In this formula, the tilde (“~”) is the regression operator. On the left-hand side of the operator, we have the dependent variable (y), and on the right-hand side, we have the independent variables, separated by the “+” operator. In lavaan, a typical model is simply a set (or system) of regression formulas, where some variables (starting with an ‘f’ below) may be latent. For example:

 y ~ f1 + f2 + x1 + x2 
f1 ~ f2 + f3 
f2 ~ f3 + x1 + x2

If we have latent variables in any of the regression formulas, we must ‘define’ them by listing their (manifest or latent) indicators. We do this by using the special operator “=~”, which can be read as is measured by. For example, to define the three latent variables f1, f2 and f3, we can use something like:

f1 =~ y1 + y2 + y3 
f2 =~ y4 + y5 + y6 
f3 =~ y7 + y8 + y9 + y10

Furthermore, variances and covariances are specified using a ‘double tilde’ operator, for example:

y1 ~~ y1  # variance
y1 ~~ y2  # covariance
f1 ~~ f2  # covariance

And finally, intercepts for observed and latent variables are simple regression formulas with only an intercept (explicitly denoted by the number ‘1’) as the only predictor:

y1 ~ 1
f1 ~ 1

Using these four formula types, a large variety of latent variable models can be described. The current set of formula types is summarized in the table below.

formula type operator mnemonic
latent variable definition =~ is measured by
regression ~ is regressed on
(residual) (co)variance ~~ is correlated with
intercept ~ 1 intercept

A complete lavaan model syntax is simply a combination of these formula types, enclosed between single quotes. For example:

myModel <- ' # regressions
             y1 + y2 ~ f1 + f2 + x1 + x2
                  f1 ~ f2 + f3
                  f2 ~ f3 + x1 + x2

             # latent variable definitions 
               f1 =~ y1 + y2 + y3 
               f2 =~ y4 + y5 + y6 
               f3 =~ y7 + y8 + y9 + y10

             # variances and covariances 
               y1 ~~ y1 
               y1 ~~ y2 
               f1 ~~ f2

             # intercepts 
               y1 ~ 1 
               f1 ~ 1
           '

There reason why you should use single quotes is that this is the only way (in R) to allow for double quotes inside a string. See ?Quotes in R for more information.

You can type this syntax interactively at the R prompt, but it is much more convenient to type the whole model syntax first in an external text editor. And when you are done, you can copy/paste it to the R console. If you are using RStudio, open a new ‘R script’, and type your model syntax (and all other R commands needed for this session) in the source editor of RStudio. And save your script, so you can reuse it later on.

The code piece above will produce a model syntax object, called myModel that can be used later when calling a function that actually estimates this model given a dataset. Note that formulas can be split over multiple lines, and you can use comments (starting with the # character) and blank lines within the single quotes to improve the readability of the model syntax.

You may split your model syntax is multiple parts. For example:

part1 <- '   # latent variable definitions 
               f1 =~ y1 + y2 + y3 
               f2 =~ y4 + y5 + y6 
               f3 =~ y7 + y8 + y9 + y10 
         '
part2 <- '   # fix covariance between f1 and f2 to zero
               f1 ~~ 0*f2
         '

When fitting the model, you may then simply concatenate the multiple parts together as follows:

fit <- cfa(model = c(part1, part2), data = myData)