If the data is clustered, one way to handle the clustering is to use a multilevel modeling approach. In the SEM framework, this leads to multilevel SEM. The multilevel capabilities of lavaan are still limited, but you can fit a two-level SEM with random intercepts (note: only when all data is continuous and complete; listwise deletion is currently used for cases with missing values).
To fit a two-level SEM, you must specify a model for both levels, as follows:
model <- ' level: 1 fw =~ y1 + y2 + y3 fw ~ x1 + x2 + x3 level: 2 fb =~ y1 + y2 + y3 fb ~ w1 + w2 '
This model syntax contains two blocks, one for level 1, and one for level 2.
Within each block, you can specify a model just like in the single-level case.
To fit this model, using a toy dataset
Demo.twolevel that is part of
the lavaan package, you need to add the
cluster= argument to the sem/lavaan
fit <- sem(model = model, data = Demo.twolevel, cluster = "cluster")
The output looks similar to a multigroup SEM output, but where the two groups are now the within and the between level respectively.
lavaan 0.6-8 ended normally after 36 iterations Estimator ML Optimization method NLMINB Number of model parameters 20 Number of observations 2500 Number of clusters [cluster] 200 Model Test User Model: Test statistic 8.092 Degrees of freedom 10 P-value (Chi-square) 0.620 Parameter Estimates: Standard errors Standard Information Observed Observed information based on Hessian Level 1 [within]: Latent Variables: Estimate Std.Err z-value P(>|z|) fw =~ y1 1.000 y2 0.774 0.034 22.671 0.000 y3 0.734 0.033 22.355 0.000 Regressions: Estimate Std.Err z-value P(>|z|) fw ~ x1 0.510 0.023 22.037 0.000 x2 0.407 0.022 18.273 0.000 x3 0.205 0.021 9.740 0.000 Intercepts: Estimate Std.Err z-value P(>|z|) .y1 0.000 .y2 0.000 .y3 0.000 .fw 0.000 Variances: Estimate Std.Err z-value P(>|z|) .y1 0.986 0.046 21.591 0.000 .y2 1.066 0.039 27.271 0.000 .y3 1.011 0.037 27.662 0.000 .fw 0.546 0.040 13.539 0.000 Level 2 [cluster]: Latent Variables: Estimate Std.Err z-value P(>|z|) fb =~ y1 1.000 y2 0.717 0.052 13.824 0.000 y3 0.587 0.048 12.329 0.000 Regressions: Estimate Std.Err z-value P(>|z|) fb ~ w1 0.165 0.079 2.093 0.036 w2 0.131 0.076 1.715 0.086 Intercepts: Estimate Std.Err z-value P(>|z|) .y1 0.024 0.075 0.327 0.743 .y2 -0.016 0.060 -0.269 0.788 .y3 -0.042 0.054 -0.777 0.437 .fb 0.000 Variances: Estimate Std.Err z-value P(>|z|) .y1 0.058 0.047 1.213 0.225 .y2 0.120 0.031 3.825 0.000 .y3 0.149 0.028 5.319 0.000 .fb 0.899 0.118 7.592 0.000
After fitting the model, you can inspect the intra-class correlations:
y1 y2 y3 x1 x2 x3 0.331 0.263 0.232 0.000 0.000 0.000
The see the unrestricted (h1) within and between means and covariances, you can use
$within $within$cov y1 y2 y3 x1 x2 x3 y1 2.000 y2 0.789 1.674 y3 0.749 0.564 1.557 x1 0.489 0.393 0.376 0.982 x2 0.416 0.322 0.299 0.001 1.011 x3 0.221 0.160 0.155 -0.006 0.008 1.045 $within$mean y1 y2 y3 x1 x2 x3 0.001 -0.002 -0.001 -0.007 -0.003 0.020 $cluster $cluster$cov y1 y2 y3 w1 w2 y1 0.992 y2 0.668 0.598 y3 0.548 0.391 0.469 w1 0.125 0.119 0.036 0.870 w2 0.086 0.057 0.130 -0.128 0.931 $cluster$mean y1 y2 y3 w1 w2 0.019 -0.017 -0.043 0.052 -0.091
note that in
level: 1 the colon follows the
level keyword; if you
level 1:, you will get an error
you must specify a model for each level; the following syntax is not allowed and will produce an error:
model <- ' level: 1 fw =~ y1 + y2 + y3 fw ~ x1 + x2 + x3 level: 2 '
if you do not have a model in mind for level 2, you can specify a saturated level by adding all variances and covariances of the endogenous variables (here: y1, y2 and y3):
model <- ' level: 1 fw =~ y1 + y2 + y3 fw ~ x1 + x2 + x3 level: 2 y1 ~~ y1 + y2 + y3 y2 ~~ y2 + y3 y3 ~~ y3 '
By default, the current version of lavaan (0.6) uses a quasi-Newton procedure to maximize the loglikelihood of the data given the model (just like in the single-level case). For most model and data combinations, this will work fine (and fast). However, every now and then, you may experience convergence issues.
Non-convergence is typically a sign that something is not quite right with either your model, or your data. Typical settings are: a small number of clusters, in combination with (almost) no variance of an endogenous variable at the between level.
However, if you believe nothing is wrong, you may want to try another optimization procedure. The current version of lavaan allows for using the Expectation Maximization (EM) algorithm as an alternative. To switch to the EM algorithm, you can use:
fit <- sem(model = model, data = Demo.twolevel, cluster = "cluster", verbose = TRUE, optim.method = "em")
As the EM algorithm is not accelerated yet, this may take a long time. It is not unusual that more than 10000 iterations are needed to reach a solution. To control when the EM algorithm stops, you can set the stopping criteria as follows:
fit <- sem(model = model, data = Demo.twolevel, cluster = "cluster", verbose = TRUE, optim.method = "em", em.iter.max = 20000, em.fx.tol = 1e-08, em.dx.tol = 1e-04)
em.fx.tol argument is used to monitor the change in loglikelihood
between the current step and the previous step. If this change is smaller
em.fx.tol, the algorithm stops. The
em.dx.tol argument is used
to monitor the (unscaled) gradient. When a solution is reached, all
elements of the gradient should be near zero. When the largest gradient
element is smaller than
em.dx.tol, the algorithm stops.
A word of caution: the EM algorithm can always be forced to ‘converge’ (perhaps after changing the stopping criteria), but that does not mean you have a model/dataset combination that deserves to converge.