Binary, ordinal and nominal variables are considered categorical (not continuous). It makes a big difference if these categorical variables are exogenous (independent) or endogenous (dependent) in the model.
If you have a binary exogenous covariate (say, gender), all you need to do is to recode it as a dummy (0/1) variable. Just like you would do in a classic regression model. If you have an exogenous ordinal variable, you can use a coding scheme reflecting the order (say, 1,2,3,…) and treat it as any other (numeric) covariate. If you have a nominal categorical variable with $K > 2$ levels, you need to replace it by a set of $K-1$ dummy variables, again, just like you would do in classical regression.
The lavaan 0.5 series can deal with binary and ordinal (but not nominal) endogenous variables. Only the three-stage WLS approach is currently supported, including some ‘robust’ variants. To use binary/ordinal data, you have two choices:
declare them as ‘ordered’ (using the ordered
function, which is part of
base R) in your data.frame before you run the analysis; for example, if you
need to declare four variables (say, item1
, item2
, item3
, item4
) as
ordinal in your data.frame (called Data
), you can use something like:
Data[,c("item1",
"item2",
"item3",
"item4")] <-
lapply(Data[,c("item1",
"item2",
"item3",
"item4")], ordered)
use the ordered
argument when using one of the fitting functions
(cfa/sem/growth/lavaan), for example, if you have four binary or ordinal
variables (say, item1
, item2
, item3
, item4
), you can use:
fit <- cfa(myModel, data = myData,
ordered=c("item1","item2",
"item3","item4"))
In both cases, lavaan will automatically switch to the WLSMV
estimator: it
will use diagonally weighted least squares (DWLS
) to estimate the model
parameters, but it will use the full weight matrix to compute robust standard
errors, and a mean- and variance-adjusted test stastistic.