For StepMixR, please refer to this repository.
A Python package following the scikit-learn API for generalized mixture modeling. The package supports categorical data (Latent Class Analysis) and continuous data (Gaussian Mixtures/Latent Profile Analysis). StepMix can be used for both clustering and supervised learning.
Additional features include:
- Support for missing values through Full Information Maximum Likelihood (FIML);
- Multiple stepwise Expectation-Maximization (EM) estimation methods based on pseudolikelihood theory;
- Covariates and distal outcomes;
- Parametric and non-parametric bootstrapping.
 
If you find StepMix useful, please leave a ⭐ and consider citing our Journal of Statistical Software paper:
@Article{,
  title = {{StepMix}: A {Python} Package for Pseudo-Likelihood
    Estimation of Generalized Mixture Models with External
    Variables},
  author = {Sacha Morin and Robin Legault and F{\'e}lix Lalibert{\'e}
    and Zsuzsa Bakk and Charles-{\'E}douard Gigu{\`e}re and Roxane
    {de la Sablonni{\`e}re} and {\'E}ric Lacourse},
  journal = {Journal of Statistical Software},
  year = {2025},
  volume = {113},
  number = {8},
  pages = {1--39},
  doi = {10.18637/jss.v113.i08},
}
You can install StepMix with pip, preferably in a virtual environment:
pip install stepmix
The following fits a StepMix mixture with categorical variables on a preloaded data matrix. StepMix accepts either numpy.array or pandas.DataFrame inputs. Categories should be integer-encoded and 0-indexed.
from stepmix.stepmix import StepMix
# Categorical StepMix Model with 3 latent classes
model = StepMix(n_components=3, measurement="categorical")
model.fit(data)
# Allow missing values
model_nan = StepMix(n_components=3, measurement="categorical_nan")
model_nan.fit(data_nan)
For binary data you can also use measurement="binary" or measurement="binary_nan". For continuous data, you can fit a Gaussian Mixture with diagonal covariances using measurement="continuous" or measurement="continuous_nan".
Set verbose=1 for a detailed output.
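The snippets above assume that `data` and `data_nan` are already loaded. As a minimal sketch of what such matrices can look like, the following builds a hypothetical toy dataset with NumPy: an integer-encoded, 0-indexed categorical matrix and a float copy in which some entries are missing. It assumes missing values are marked with `np.nan`, as the `_nan` suffix suggests. The sizes, the 10% missing rate, and every name other than `data` and `data_nan` are illustrative choices, not part of StepMix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 4 categorical features,
# each taking the integer values 0, 1 or 2 (0-indexed categories).
n_samples, n_features, n_categories = 200, 4, 3
data = rng.integers(low=0, high=n_categories, size=(n_samples, n_features))

# Copy as float so that entries can hold NaN, then mask roughly 10% of them.
data_nan = data.astype(float)
missing_mask = rng.random(data.shape) < 0.10
data_nan[missing_mask] = np.nan

print(data.shape, int(data.min()), int(data.max()))
print(f"missing entries: {int(np.isnan(data_nan).sum())}")
```

Arrays shaped like these could then be passed to `model.fit(data)` and `model_nan.fit(data_nan)` as in the example above.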
Please refer to the StepMix tutorials to learn how to combine continuous and categorical data in the same model.
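Because categories should be integer-encoded and 0-indexed, raw string labels need to be converted before fitting. The following is one possible sketch using pandas. The helper `encode_categories` and the example columns are hypothetical and not part of StepMix. Codes are returned as floats so that missing labels can be kept as `np.nan`, which is assumed here to be the missing-value marker.

```python
import numpy as np
import pandas as pd


def encode_categories(df):
    """Map each column's labels to codes 0, 1, 2, ... and keep missing values as NaN."""
    encoded = {}
    for name in df.columns:
        # pd.Categorical assigns codes in sorted label order; missing values get -1.
        codes = pd.Categorical(df[name]).codes.astype(float)
        codes[codes < 0] = np.nan
        encoded[name] = codes
    return pd.DataFrame(encoded, index=df.index)


# Hypothetical raw survey data with one missing answer.
raw = pd.DataFrame(
    {
        "smoker": ["no", "yes", "no", None],
        "diet": ["vegan", "omnivore", "vegetarian", "omnivore"],
    }
)
encoded = encode_categories(raw)
print(encoded)
```

A frame without missing values can be cast back to integers with `encoded.astype(int)` and used with `measurement="categorical"`; a frame containing NaN would instead go with `measurement="categorical_nan"`.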
Detailed tutorials are available in notebooks:
- Generalized Mixture Models with StepMix: an in-depth look at how mixture models can be defined with StepMix. The tutorial uses the Iris Dataset as an example and covers:
  - Gaussian Mixtures (Latent Profile Analysis);
  - Binary Mixtures (LCA);
  - Categorical Mixtures (LCA);
  - Mixed Categorical and Continuous Mixtures;
  - Missing Values through Full-Information Maximum Likelihood.
- Stepwise Estimation with StepMix: a tutorial demonstrating how to define measurement and structural models. The tutorial discusses:
  - LCA models with distal outcomes;
  - LCA models with covariates;
  - 1-step, 2-step and 3-step estimation;
  - Corrections (BCH or ML) and other options for 3-step estimation;
  - Putting it All Together: A Complete Model with Missing Values.
- Model Selection:
  - Selecting the number of components in a mixture model (n_components) with cross-validation;
  - Selecting the number of components with the Parametric Bootstrapped Likelihood Ratio Test (BLRT);
  - Fit indices: AIC, BIC and other metrics.
- Parameters, Bootstrapping and CI: a tutorial discussing how to:
  - Access StepMix parameters;
  - Bootstrap StepMix estimators;
  - Quickly plot confidence intervals.
- Supervised and Semi-Supervised Learning with StepMix:
  - Binary Classification;
  - Multiclass Classification;
  - Semi-Supervised Learning;
  - Cross-Validation.
- Deriving p-values in StepMix: a tutorial demonstrating how to transform StepMix parameters into conventional regression coefficients and how to derive p-values. The tutorial covers models with:
  - Continuous covariate;
  - Binary covariate;
  - Categorical covariate;
  - Multiple covariates (different distributions);
  - Binary distal outcome;