Multiple groups

The lavaan package has full support for multiple groups. To request a multiple group analysis, you need to add the name of the group variable in your dataset to the argument group in the fitting function. By default, the same model is fitted in all groups. In the following example, we fit the H&S CFA model for the two schools (Pasteur and Grant-White).

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school")

summary(fit)

lavaan 0.7-2 ended normally after 57 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        60

  Number of observations per group:                   
    Pasteur                                        156
    Grant-White                                    145

Model Test User Model:
                                                      
  Test statistic                               115.851
  Degrees of freedom                                48
  P-value (Chi-square)                           0.000
  Test statistic for each group:
    Pasteur                                     64.309
    Grant-White                                 51.542

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured


Group 1 [Pasteur]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.394    0.122    3.220    0.001
    x3                0.570    0.140    4.076    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.183    0.102   11.613    0.000
    x6                0.875    0.077   11.421    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.125    0.277    4.057    0.000
    x9                0.922    0.225    4.104    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.479    0.106    4.531    0.000
    speed             0.185    0.077    2.397    0.017
  textual ~~                                          
    speed             0.182    0.069    2.628    0.009

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.941    0.095   52.249    0.000
   .x2                5.984    0.098   60.949    0.000
   .x3                2.487    0.093   26.778    0.000
   .x4                2.823    0.092   30.689    0.000
   .x5                3.995    0.105   38.183    0.000
   .x6                1.922    0.079   24.321    0.000
   .x7                4.432    0.087   51.181    0.000
   .x8                5.563    0.078   71.214    0.000
   .x9                5.418    0.079   68.440    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.298    0.232    1.286    0.198
   .x2                1.334    0.158    8.464    0.000
   .x3                0.989    0.136    7.271    0.000
   .x4                0.425    0.069    6.138    0.000
   .x5                0.456    0.086    5.292    0.000
   .x6                0.290    0.050    5.780    0.000
   .x7                0.820    0.125    6.580    0.000
   .x8                0.510    0.116    4.406    0.000
   .x9                0.680    0.104    6.516    0.000
    visual            1.097    0.276    3.967    0.000
    textual           0.894    0.150    5.963    0.000
    speed             0.350    0.126    2.778    0.005


Group 2 [Grant-White]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.736    0.155    4.760    0.000
    x3                0.925    0.166    5.583    0.000
  textual =~                                          
    x4                1.000                           
    x5                0.990    0.087   11.418    0.000
    x6                0.963    0.085   11.377    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.226    0.187    6.569    0.000
    x9                1.058    0.165    6.429    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.098    4.153    0.000
    speed             0.276    0.076    3.639    0.000
  textual ~~                                          
    speed             0.222    0.073    3.022    0.003

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.930    0.095   51.696    0.000
   .x2                6.200    0.092   67.416    0.000
   .x3                1.996    0.086   23.195    0.000
   .x4                3.317    0.093   35.625    0.000
   .x5                4.712    0.096   48.986    0.000
   .x6                2.469    0.094   26.277    0.000
   .x7                3.921    0.086   45.819    0.000
   .x8                5.488    0.087   63.174    0.000
   .x9                5.327    0.085   62.571    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.715    0.126    5.676    0.000
   .x2                0.899    0.123    7.339    0.000
   .x3                0.557    0.103    5.409    0.000
   .x4                0.315    0.065    4.870    0.000
   .x5                0.419    0.072    5.812    0.000
   .x6                0.406    0.069    5.880    0.000
   .x7                0.600    0.091    6.584    0.000
   .x8                0.401    0.094    4.249    0.000
   .x9                0.535    0.089    6.010    0.000
    visual            0.604    0.160    3.762    0.000
    textual           0.942    0.152    6.177    0.000
    speed             0.461    0.118    3.910    0.000

If you want to fix parameters or provide starting values, you can use the same pre-multiplication techniques, but the single argument is now replaced by a vector of arguments, one for each group. If you use a single element instead of a vector (which is not recommended), that element will be applied for all groups. If you specify a single label, this will generate a warning as this would imply equality constraints across groups. For example:

HS.model <- ' visual  =~ x1 + 0.5*x2 + c(0.6, 0.8)*x3
              textual =~ x4 + start(c(1.2, 0.6))*x5 + c(a1, a2)*x6
              speed   =~ x7 + x8 + x9 '

In the definition of the latent factor visual, we have fixed the factor loading of the indicator x3 to the value ‘0.6’ in the first group, and to the value ‘0.8’ in the second group, while the factor loading of the indicator x2 is fixed to the value ‘0.5’ in both groups. In the definition of the textual factor, two different starting values are provided for the x5 indicator; one for each group. In addition, we have labeled the factor loading of the x6 indicator as a1 in the first group, and a2 in the second group. It may be tempting to write a*x6. But using a single label in a multiple group setting has a double effect: it gives the label a to the factor loading of x6 in both groups, and as a result, those two parameters are now constrained to be equal. Because this may be unintended, lavaan will produce a warning message about this. If this is really intended, it is much better to use a vector of labels: c(a, a)*x6.

To verify the effects of our modifiers, we refit the model:

fit <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school")
 summary(fit)

lavaan 0.7-2 ended normally after 45 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        56

  Number of observations per group:                   
    Pasteur                                        156
    Grant-White                                    145

Model Test User Model:
                                                      
  Test statistic                               118.976
  Degrees of freedom                                52
  P-value (Chi-square)                           0.000
  Test statistic for each group:
    Pasteur                                     64.901
    Grant-White                                 54.075

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured


Group 1 [Pasteur]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.500                           
    x3                0.600                           
  textual =~                                          
    x4                1.000                           
    x5                1.185    0.102   11.598    0.000
    x6        (a1)    0.876    0.077   11.409    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.129    0.279    4.055    0.000
    x9                0.931    0.227    4.103    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.460    0.103    4.479    0.000
    speed             0.182    0.076    2.408    0.016
  textual ~~                                          
    speed             0.181    0.069    2.625    0.009

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.941    0.094   52.379    0.000
   .x2                5.984    0.100   59.945    0.000
   .x3                2.487    0.092   26.983    0.000
   .x4                2.823    0.092   30.689    0.000
   .x5                3.995    0.105   38.183    0.000
   .x6                1.922    0.079   24.320    0.000
   .x7                4.432    0.087   51.181    0.000
   .x8                5.563    0.078   71.214    0.000
   .x9                5.418    0.079   68.440    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.388    0.129    3.005    0.003
   .x2                1.304    0.155    8.432    0.000
   .x3                0.965    0.120    8.016    0.000
   .x4                0.427    0.069    6.153    0.000
   .x5                0.454    0.086    5.270    0.000
   .x6                0.289    0.050    5.763    0.000
   .x7                0.824    0.124    6.617    0.000
   .x8                0.510    0.116    4.417    0.000
   .x9                0.677    0.105    6.479    0.000
    visual            1.001    0.172    5.803    0.000
    textual           0.892    0.150    5.953    0.000
    speed             0.346    0.125    2.768    0.006


Group 2 [Grant-White]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.500                           
    x3                0.800                           
  textual =~                                          
    x4                1.000                           
    x5                0.990    0.087   11.425    0.000
    x6        (a2)    0.963    0.085   11.374    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.228    0.188    6.539    0.000
    x9                1.081    0.168    6.417    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.454    0.099    4.585    0.000
    speed             0.315    0.079    4.004    0.000
  textual ~~                                          
    speed             0.222    0.073    3.049    0.002

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.930    0.097   50.688    0.000
   .x2                6.200    0.089   69.616    0.000
   .x3                1.996    0.086   23.223    0.000
   .x4                3.317    0.093   35.625    0.000
   .x5                4.712    0.096   48.986    0.000
   .x6                2.469    0.094   26.277    0.000
   .x7                3.921    0.086   45.819    0.000
   .x8                5.488    0.087   63.174    0.000
   .x9                5.327    0.085   62.571    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.637    0.115    5.539    0.000
   .x2                0.966    0.120    8.076    0.000
   .x3                0.601    0.091    6.591    0.000
   .x4                0.316    0.065    4.877    0.000
   .x5                0.418    0.072    5.805    0.000
   .x6                0.407    0.069    5.887    0.000
   .x7                0.609    0.091    6.658    0.000
   .x8                0.411    0.094    4.385    0.000
   .x9                0.522    0.089    5.887    0.000
    visual            0.735    0.132    5.544    0.000
    textual           0.942    0.152    6.177    0.000
    speed             0.453    0.117    3.871    0.000

Fixing parameters in some groups, but not all

Sometimes, we wish to fix the value of a parameter in all groups, except for one particular group. In this group, we wish to freely estimate the value of that parameter. The modifier for this parameter is again a vector containing the fixed values for this parameter for each group, but we can use NA to force a parameter to be free in one (or more) group(s). Suppose for example we have four groups. We define a latent variable (say f) with three indicators. We wish to fix the factor loading of indicator item2 to 1.0 in all but the second group. We can write something like

f =~ item1 + c(1,NA,1,1)*item2 + item3

Constraining a single parameter to be equal across groups

If you want to constrain one or more parameters to be equal across groups, you need to give them the same label. For example, to constrain the factor loading of the indicator x3 to be equal across (two) groups, you can write:

HS.model <- ' visual  =~ x1 + x2 + c(v3,v3)*x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

Again, identical labels imply identical parameters, both within and across groups.

Constraining groups of parameters to be equal across groups

Although providing identical labels is a very flexible method to specify equality constraints for a few parameters, there is a more convenient way to impose equality constraints on a whole set of parameters (for example: all factor loadings, or all intercepts). We call these type of constraints group equality constraints and they can be specified by the argument group.equal in the fitting function. For example, to constrain (all) the factor loadings to be equal across groups, you can proceed as follows:

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school",
           group.equal = c("loadings"))
summary(fit)

lavaan 0.7-2 ended normally after 42 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        60
  Number of equality constraints                     6

  Number of observations per group:                   
    Pasteur                                        156
    Grant-White                                    145

Model Test User Model:
                                                      
  Test statistic                               124.044
  Degrees of freedom                                54
  P-value (Chi-square)                           0.000
  Test statistic for each group:
    Pasteur                                     68.825
    Grant-White                                 55.219

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured


Group 1 [Pasteur]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2      (.p2.)    0.599    0.100    5.979    0.000
    x3      (.p3.)    0.784    0.108    7.267    0.000
  textual =~                                          
    x4                1.000                           
    x5      (.p5.)    1.083    0.067   16.049    0.000
    x6      (.p6.)    0.912    0.058   15.785    0.000
  speed =~                                            
    x7                1.000                           
    x8      (.p8.)    1.201    0.155    7.738    0.000
    x9      (.p9.)    1.038    0.136    7.629    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.416    0.097    4.271    0.000
    speed             0.169    0.064    2.643    0.008
  textual ~~                                          
    speed             0.176    0.061    2.882    0.004

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.941    0.093   52.991    0.000
   .x2                5.984    0.100   60.096    0.000
   .x3                2.487    0.094   26.465    0.000
   .x4                2.823    0.093   30.371    0.000
   .x5                3.995    0.101   39.714    0.000
   .x6                1.922    0.081   23.711    0.000
   .x7                4.432    0.086   51.540    0.000
   .x8                5.563    0.078   71.087    0.000
   .x9                5.418    0.079   68.153    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.551    0.137    4.010    0.000
   .x2                1.258    0.155    8.117    0.000
   .x3                0.882    0.128    6.884    0.000
   .x4                0.434    0.070    6.238    0.000
   .x5                0.508    0.082    6.229    0.000
   .x6                0.266    0.050    5.294    0.000
   .x7                0.849    0.114    7.468    0.000
   .x8                0.515    0.095    5.409    0.000
   .x9                0.658    0.096    6.865    0.000
    visual            0.805    0.171    4.714    0.000
    textual           0.913    0.137    6.651    0.000
    speed             0.305    0.078    3.920    0.000


Group 2 [Grant-White]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2      (.p2.)    0.599    0.100    5.979    0.000
    x3      (.p3.)    0.784    0.108    7.267    0.000
  textual =~                                          
    x4                1.000                           
    x5      (.p5.)    1.083    0.067   16.049    0.000
    x6      (.p6.)    0.912    0.058   15.785    0.000
  speed =~                                            
    x7                1.000                           
    x8      (.p8.)    1.201    0.155    7.738    0.000
    x9      (.p9.)    1.038    0.136    7.629    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.437    0.099    4.423    0.000
    speed             0.314    0.079    3.958    0.000
  textual ~~                                          
    speed             0.226    0.072    3.144    0.002

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.930    0.097   50.763    0.000
   .x2                6.200    0.091   68.379    0.000
   .x3                1.996    0.085   23.455    0.000
   .x4                3.317    0.092   35.950    0.000
   .x5                4.712    0.100   47.173    0.000
   .x6                2.469    0.091   27.248    0.000
   .x7                3.921    0.086   45.555    0.000
   .x8                5.488    0.087   63.257    0.000
   .x9                5.327    0.085   62.786    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.645    0.127    5.084    0.000
   .x2                0.933    0.121    7.732    0.000
   .x3                0.605    0.096    6.282    0.000
   .x4                0.329    0.062    5.279    0.000
   .x5                0.384    0.073    5.270    0.000
   .x6                0.437    0.067    6.576    0.000
   .x7                0.599    0.090    6.651    0.000
   .x8                0.406    0.089    4.541    0.000
   .x9                0.532    0.086    6.202    0.000
    visual            0.722    0.161    4.490    0.000
    textual           0.906    0.136    6.646    0.000
    speed             0.475    0.109    4.347    0.000

The .p2., .p3., .p5, … labels which appear in the output have been auto-generated to impose the equality constraints. More ‘group equality constraints’ can be added. In addition to the factor loadings, the following keywords are supported in the group.equal argument:

intercepts: the intercepts of the observed variables
means: the intercepts/means of the latent variables
residuals: the residual variances of the observed variables
residual.covariances: the residual covariances of the observed variables
lv.variances: the (residual) variances of the latent variables
lv.covariances: the (residual) covariances of the latent variables
regressions: all regression coefficients in the model

If you omit the group.equal argument, all parameters are freely estimated in each group (but the model structure is the same).

But what if you want to constrain a whole group of parameters (say all factor loadings and intercepts) across groups, except for one or two parameters that need to stay free in all groups. For this scenario, you can use the argument group.partial, containing the names of those parameters that need to remain free. For example:

fit <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school",
           group.equal = c("loadings", "intercepts"),
           group.partial = c("visual=~x2", "x7~1"))

Measurement invariance testing

Before we compare, say, the values of latent means across multiple groups, we first need to establish measurement invariance. When data is continuous, testing for measurement invariance involves a fixed sequence of model comparison tests. A typical sequence involves three models:

Model 1: configural invariance. The same factor structure is imposed on all groups.
Model 2: weak invariance. The factor loadings are constrained to be equal across groups.
Model 3: strong invariance. The factor loadings and intercepts are constrained to be equal across groups.

In lavaan, we can proceed as follows:

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# configural invariance
fit1 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")

# weak invariance
fit2 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
            group.equal = "loadings")

# strong invariance
fit3 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
            group.equal = c("intercepts", "loadings"))

# model comparison tests
lavTestLRT(fit1, fit2, fit3)


Chi-Squared Difference Test

     Df    AIC    BIC  Chisq Chisq diff    RMSEA Df diff Pr(>Chisq)    
fit1 48 7484.4 7706.8 115.85                                           
fit2 54 7480.6 7680.8 124.04      8.192 0.049272       6     0.2244    
fit3 60 7508.6 7686.6 164.10     40.059 0.194211       6  4.435e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The lavTestLRT() function can be used for model comparison tests. Because we provided three model fits, it will produce two tests: the first test compares the first model versus the second model, while the second test compares the second model versus the third model. Because the first p-value is non-significant, we may conclude that weak invariance (equal factor loadings) is supported in this dataset. However, because the second p-value is significant, strong invariance is not. Therefore, it is unwise to directly compare the values of the latent means across the two groups.