API Reference

Estimators

PretrainedLasso

ptlasso.PretrainedLasso

Bases: RegressorMixin, BasePretrainedLasso

Pretrained Lasso estimator.

Two-step training:

1. Fit an overall Lasso on all samples (lambda selected by internal CV, using the `overall_lambda` rule).
2. For each group, fit a group-specific Lasso with offset (1 - alpha) * eta_overall, where eta_overall is the overall linear predictor (before the link function). Features not selected by the overall model receive a penalty factor of 1 / alpha instead of 1, so they are penalized more heavily.
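The two steps can be sketched for `family="gaussian"` with scikit-learn's `LassoCV` standing in for adelie. This is a simplified illustration, not ptlasso's implementation. The function names are hypothetical. Stage 1 has no group-intercept columns and keeps LassoCV's CV-minimum lambda (closer to `overall_lambda="lambda.min"`). No per-model standardization is done. The stage-2 offset uses the in-sample overall predictor, whereas ptlasso uses out-of-fold (prevalidated) predictions. The sketch covers `0 < alpha <= 1` only. Note that sklearn's own `alpha` argument is the lasso penalty and is unrelated to the pretraining `alpha`.

```python
import numpy as np
from sklearn.linear_model import LassoCV


def fit_pretrained_lasso_sketch(X, y, groups, alpha=0.5, cv=5):
    """Two-step pretrained lasso for a Gaussian response (simplified sketch)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("this sketch handles 0 < alpha <= 1 only")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # Stage 1: overall lasso on all samples; LassoCV keeps the CV-minimum lambda.
    overall = LassoCV(cv=cv).fit(X, y)
    eta_overall = overall.predict(X)  # identity link, so eta is the fitted mean
    support = np.flatnonzero(overall.coef_)

    # Penalty factors: 1 on the overall support, 1/alpha elsewhere,
    # then rescaled to mean 1 (glmnet's normalization).
    pf = np.full(X.shape[1], 1.0 / alpha)
    pf[support] = 1.0
    pf /= pf.mean()

    group_models = {}
    for g in np.unique(groups):
        mask = groups == g
        offset = (1.0 - alpha) * eta_overall[mask]
        # Gaussian offset: fit the residual y - offset instead.
        # Penalty factor pf_j: divide column j by pf_j before an ordinary lasso.
        group_models[g] = LassoCV(cv=cv).fit(X[mask] / pf, y[mask] - offset)
    return overall, group_models, pf


def predict_pretrained_lasso_sketch(overall, group_models, pf, X, groups, alpha=0.5):
    """Add each group's model to the (1 - alpha)-scaled overall predictor."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    pred = (1.0 - alpha) * overall.predict(X)
    for g, model in group_models.items():
        mask = groups == g
        pred[mask] += model.predict(X[mask] / pf)
    return pred


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
groups = np.repeat([0, 1], 100)
# Feature 0 matters in both groups; feature 1 only in group 1.
y = X[:, 0] + np.where(groups == 1, X[:, 1], 0.0) + 0.1 * rng.normal(size=200)
overall, models, pf = fit_pretrained_lasso_sketch(X, y, groups, alpha=0.5)
pred = predict_pretrained_lasso_sketch(overall, models, pf, X, groups, alpha=0.5)
print(np.corrcoef(pred, y)[0, 1])
```

Two identities let an ordinary lasso stand in for glmnet-style options. First, for a Gaussian response an offset is equivalent to fitting the residual `y - offset`. Second, a penalty factor `pf_j` on coefficient `beta_j` is equivalent to an unweighted lasso on the column `X_j / pf_j`, whose coefficient is `pf_j * beta_j`. With `alpha=1` the offset vanishes and every penalty factor is 1, so each group model reduces to an individual lasso.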

Parameters:

Name	Type	Description	Default
`alpha`	`float in [0, 1]`	Pretraining strength. `0` = overall model with fine-tuning (maximum pretraining); `1` = individual per-group Lasso (no pretraining). Matches the R ptLasso convention.	`0.5`
`family`	`{"gaussian", "binomial", "multinomial"}`	Response distribution.	`"gaussian"`
`overall_lambda`	`{"lambda.1se", "lambda.min"}`	Lambda selection rule for the stage-1 overall model. `"lambda.1se"` (default, matching R) gives a sparser offset; `"lambda.min"` uses the CV minimum.	`"lambda.1se"`
`fit_intercept`	`bool`	Whether to fit an intercept in every sub-model.	`True`
`lmda_path_size`	`int`	Number of lambdas in the regularisation path.	`100`
`min_ratio`	`float`	Ratio of the smallest to largest lambda on the path.	`0.0001`
`verbose`	`bool`	Whether to display fitting progress and a summary after training. Adelie's internal output is always suppressed regardless of this setting.	`True`
`standardize`	`bool`	Whether to standardize features before fitting each sub-model. Each model standardizes using only its own training data subset, matching R's `glmnet(standardize=TRUE)` per-model behaviour.	`True`
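`n_folds`	`int`	Number of CV folds, used both for lambda selection and for the out-of-fold (OOF) predictions in the overall and per-group models. The effective value is capped at the smallest group size and, for classification families, at the smallest per-group class count, with a floor of 2.	`10`
`n_threads`	`int`	Number of threads passed to adelie's solver. Set to a higher value to parallelise the coordinate descent within each model fit. `-1` uses all available CPU cores (`os.cpu_count()`).	`-1`
`type_measure`	`str`	CV loss used for lambda selection. With `"auc"` and `family="binomial"`, folds are scored by negative ROC AUC (folds containing a single class fall back to the default loss); any other setting uses the default per-family loss.	`"deviance"`
`foldid`	`array-like of int or None`	Optional 0-based fold assignment for each sample. When given, it replaces the internally generated split for the overall model's CV, and the number of folds becomes `max(foldid) + 1`.	`None`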
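The fold cap described for `n_folds` and the two `overall_lambda` rules can be illustrated with plain NumPy. The helper names below are hypothetical and this is a simplified sketch of the rules as documented, not ptlasso's code. It assumes the lambda path is ordered from largest to smallest, so a smaller index means a sparser model. The standard error is the sample standard deviation of the fold losses divided by `sqrt(n_folds)`.

```python
import numpy as np


def effective_n_folds(n_folds, groups, y=None):
    """Cap n_folds by the smallest group (and class, if y is given); floor of 2."""
    groups = np.asarray(groups)
    caps = [n_folds]
    for g in np.unique(groups):
        mask = groups == g
        caps.append(int(mask.sum()))  # number of samples in this group
        if y is not None:  # classification: rarest class count in this group
            caps.append(int(np.bincount(np.asarray(y)[mask].astype(int)).min()))
    return max(2, min(caps))


def select_lambda_index(fold_losses, rule="lambda.1se"):
    """Pick a lambda index from an (n_folds, n_lambdas) matrix of CV losses."""
    losses = np.asarray(fold_losses, dtype=float)
    mean = losses.mean(axis=0)
    se = losses.std(axis=0, ddof=1) / np.sqrt(losses.shape[0])
    best = int(np.argmin(mean))  # lambda.min: lowest mean CV loss
    if rule == "lambda.min":
        return best
    if rule != "lambda.1se":
        raise ValueError(f"unknown rule {rule!r}")
    # lambda.1se: first (largest) lambda whose mean loss is within one SE of the minimum
    return int(np.flatnonzero(mean <= mean[best] + se[best])[0])


# Toy losses: 3 folds x 4 lambdas, path ordered from largest to smallest lambda.
losses = np.array([
    [3.0, 1.8, 0.8, 1.6],
    [3.0, 2.0, 1.6, 1.8],
    [3.0, 2.2, 2.4, 2.0],
])
print(select_lambda_index(losses, "lambda.min"))  # 2
print(select_lambda_index(losses, "lambda.1se"))  # 1
print(effective_n_folds(10, groups=[0] * 6 + [1] * 4))  # 4
```

On this toy matrix the minimum mean loss is at index 2, but index 1 lies within one standard error of it, so `"lambda.1se"` prefers the larger lambda and hence the sparser stage-1 model.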

Attributes:

Name	Type	Description
`overall_model_`	`adelie state`	Fitted overall Lasso (stage 1).
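`overall_coef_`	`ndarray of shape (n_features,) or (n_features, K)`	Coefficients from the overall model at the selected lambda, excluding the group-indicator columns. Shape is `(n_features, K)` for multinomial, where `K = n_classes_`.
`overall_intercept_`	`float`	Intercept from the overall model, i.e. the intercept of the reference group `groups_[0]` (the other groups have their own zero-penalty indicator columns). Not set for multinomial.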
`overall_lmda_idx_`	`int`	Index into `overall_model_.lmdas` for the selected lambda.
`pretrain_models_`	`dict {group -> adelie state}`	Per-group fitted Lasso models (stage 2, with pretraining offset).
`pretrain_lmda_idx_`	`dict {group -> int}`	CV-selected lambda index for each pretrain group model (`lambda.min`).
`individual_models_`	`dict {group -> adelie state}`	Per-group fitted Lasso models without any pretraining offset.
`individual_lmda_idx_`	`dict {group -> int}`	CV-selected lambda index for each individual group model (`lambda.min`).
`groups_`	`ndarray`	Unique group labels seen during fit.
`n_features_in_`	`int`	Number of features seen during fit.
`feature_names_in_`	`ndarray of str or None`	Feature names passed to `fit` or inferred from the columns of a DataFrame `X`; `None` otherwise.
`n_classes_`	`int or None`	Number of classes for multinomial; `None` otherwise.
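Because `overall_coef_` is one-dimensional for gaussian and binomial fits but two-dimensional for multinomial, code that reads it has to handle both shapes. The helper below is hypothetical (not part of ptlasso). It lists the stage-1 support the way the estimator's own summary does: a multinomial feature counts as selected when any of its K class coefficients is nonzero. Plain arrays stand in for fitted attributes here.

```python
import numpy as np


def overall_support(overall_coef, feature_names=None):
    """Return the features selected by the stage-1 model.

    overall_coef: array of shape (n_features,) or (n_features, K).
    feature_names: optional sequence of names (e.g. feature_names_in_).
    """
    coef = np.asarray(overall_coef)
    if coef.ndim == 2:
        # multinomial: a feature is selected if it is nonzero for any class
        idx = np.flatnonzero(np.any(coef != 0, axis=1))
    elif coef.ndim == 1:
        idx = np.flatnonzero(coef != 0)
    else:
        raise ValueError("expected a 1-D or 2-D coefficient array")
    if feature_names is not None:
        return np.asarray(feature_names)[idx]
    return idx


# Stand-ins for fitted attributes (illustrative values only).
print(overall_support(np.array([0.0, 1.5, 0.0, -2.0])))  # [1 3]
print(overall_support(np.array([[0.0, 0.0], [0.0, 0.3], [1.2, 0.0]]), ["a", "b", "c"]))  # ['b' 'c']
```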

References

Craig, E., Pilanci, M., Le Menestrel, T., Narasimhan, B., Rivas, M. A., Gullaksen, S. E., & Tibshirani, R. (2025). Pretraining and the lasso. Journal of the Royal Statistical Society Series B, qkaf050.

Source code in src/ptlasso/_estimator.py

class PretrainedLasso(RegressorMixin, BasePretrainedLasso):
    """Pretrained Lasso estimator.

    Two-step training:
    1. Fit an overall Lasso on all samples (lambda selected by internal CV,
       using the ``overall_lambda`` rule).
    2. For each group, fit a group-specific Lasso with offset
       ``(1 - alpha) * eta_overall``, where ``eta_overall`` is the overall
       linear predictor (before the link function).  Features not selected
       by the overall model receive a penalty factor of ``1 / alpha``
       instead of 1, so they are penalized more heavily.

    Parameters
    ----------
    alpha : float in [0, 1], default=0.5
        Pretraining strength.  ``0`` = overall model with fine-tuning
        (maximum pretraining); ``1`` = individual per-group Lasso
        (no pretraining).  Matches the R ptLasso convention.
    family : {"gaussian", "binomial", "multinomial"}, default="gaussian"
        Response distribution.
    overall_lambda : {"lambda.1se", "lambda.min"}, default="lambda.1se"
        Lambda selection rule for the stage-1 overall model.
        ``"lambda.1se"`` (default, matching R) gives a sparser offset;
        ``"lambda.min"`` uses the CV minimum.
    fit_intercept : bool, default=True
        Whether to fit an intercept in every sub-model.
    lmda_path_size : int, default=100
        Number of lambdas in the regularisation path.
    min_ratio : float, default=0.0001
        Ratio of the smallest to largest lambda on the path.
    verbose : bool, default=True
        Whether to display fitting progress and a summary after training.
        Adelie's internal output is always suppressed regardless of this setting.
    standardize : bool, default=True
        Whether to standardize features before fitting each sub-model.
        Each model standardizes using only its own training data subset,
        matching R's ``glmnet(standardize=TRUE)`` per-model behaviour.
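    n_folds : int, default=10
        Number of CV folds, used both for lambda selection and for the
        out-of-fold (OOF) predictions in the overall and per-group models.
        The effective value is capped at the smallest group size and, for
        classification families, at the smallest per-group class count,
        with a floor of 2.
    n_threads : int, default=-1
        Number of threads passed to adelie's solver.  Set to a higher value
        to parallelise the coordinate descent within each model fit.
        ``-1`` uses all available CPU cores (``os.cpu_count()``).
    type_measure : str, default="deviance"
        CV loss used for lambda selection.  With ``"auc"`` and
        ``family="binomial"``, folds are scored by negative ROC AUC (folds
        containing a single class fall back to the default loss); any other
        setting uses the default per-family loss.
    foldid : array-like of int or None, default=None
        Optional 0-based fold assignment for each sample.  When given, it
        replaces the internally generated split for the overall model's CV,
        and the number of folds becomes ``max(foldid) + 1``.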

    Attributes
    ----------
    overall_model_ : adelie state
        Fitted overall Lasso (stage 1).
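    overall_coef_ : ndarray of shape (n_features,) or (n_features, K)
        Coefficients from the overall model at the selected lambda, excluding
        the group-indicator columns.  Shape is ``(n_features, K)`` for
        multinomial, where ``K = n_classes_``.
    overall_intercept_ : float
        Intercept from the overall model, i.e. the intercept of the reference
        group ``groups_[0]`` (the other groups have their own zero-penalty
        indicator columns).  Not set for multinomial.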
    overall_lmda_idx_ : int
        Index into ``overall_model_.lmdas`` for the selected lambda.
    pretrain_models_ : dict {group -> adelie state}
        Per-group fitted Lasso models (stage 2, with pretraining offset).
    pretrain_lmda_idx_ : dict {group -> int}
        CV-selected lambda index for each pretrain group model (``lambda.min``).
    individual_models_ : dict {group -> adelie state}
        Per-group fitted Lasso models without any pretraining offset.
    individual_lmda_idx_ : dict {group -> int}
        CV-selected lambda index for each individual group model (``lambda.min``).
    groups_ : ndarray
        Unique group labels seen during fit.
    n_features_in_ : int
        Number of features seen during fit.
    feature_names_in_ : ndarray of str or None
        Feature names passed to ``fit`` or inferred from the columns of a
        DataFrame ``X``; ``None`` otherwise.
    n_classes_ : int or None
        Number of classes for multinomial; ``None`` otherwise.

    References
    ----------
    Craig, E., Pilanci, M., Le Menestrel, T., Narasimhan, B., Rivas, M. A.,
    Gullaksen, S. E., & Tibshirani, R. (2025). Pretraining and the lasso.
    *Journal of the Royal Statistical Society Series B*, qkaf050.
    """

    def __init__(
        self,
        alpha=0.5,
        family="gaussian",
        overall_lambda="lambda.1se",
        fit_intercept=True,
        lmda_path_size=100,
        min_ratio=0.0001,
        verbose=True,
        n_threads=-1,
        standardize=True,
        n_folds=10,
        type_measure="deviance",
        foldid=None,
    ):
        self.alpha = alpha
        self.family = family
        self.overall_lambda = overall_lambda
        self.fit_intercept = fit_intercept
        self.lmda_path_size = lmda_path_size
        self.min_ratio = min_ratio
        self.verbose = verbose
        self.n_threads = n_threads
        self.standardize = standardize
        self.n_folds = n_folds
        self.type_measure = type_measure
        self.foldid = foldid
        # fitted attributes (set by fit())
        self.groups_: np.ndarray
        self.n_classes_: Optional[int]
        self._show_overall_progress: bool

    # ------------------------------------------------------------------
    # Internal helpers
    # ------------------------------------------------------------------

    def _validate_params(self):
        if not isinstance(self.alpha, (int, float)):
            raise TypeError(f"alpha must be a number, got {type(self.alpha).__name__}")
        if not (0.0 <= self.alpha <= 1.0):
            raise ValueError(f"alpha must be in [0, 1], got {self.alpha}")
        if self.family not in FAMILIES:
            raise ValueError(f"family must be one of {FAMILIES}, got '{self.family}'")
        if self.overall_lambda not in LMDA_MODES:
            raise ValueError(
                f"overall_lambda must be one of {LMDA_MODES}, got '{self.overall_lambda}'"
            )
        if not isinstance(self.lmda_path_size, int) or self.lmda_path_size < 1:
            raise ValueError(
                f"lmda_path_size must be a positive integer, got {self.lmda_path_size}"
            )
        if not isinstance(self.min_ratio, (int, float)):
            raise TypeError(f"min_ratio must be a number, got {type(self.min_ratio).__name__}")
        if not (0.0 < self.min_ratio < 1.0):
            raise ValueError(f"min_ratio must be in (0, 1), got {self.min_ratio}")

    def _label(self, g):
        return self.group_labels_.get(g, g)

    def _names_or_indices(self, indices):
        if self.feature_names_in_ is not None:
            return self.feature_names_in_[indices]
        return indices

    def _make_onehot(self, groups):
        """Build (n, k-1) group indicator matrix.

        The first group in ``self.groups_`` is the reference category (all
        zeros), matching R's ``model.matrix(~groups - 1)[, 2:k]`` convention.
        These columns are prepended to X in the overall model with zero
        penalty so they act as free group-specific intercepts (R's
        ``group.intercepts = TRUE``).
        """
        k = len(self.groups_)
        n = len(groups)
        if k <= 1:
            return np.empty((n, 0), dtype=np.float64, order="F")
        onehot = np.zeros((n, k - 1), dtype=np.float64, order="F")
        for col, g in enumerate(self.groups_[1:]):  # groups_[0] is reference
            onehot[groups == g, col] = 1.0
        return onehot

    def _grpnet_kwargs(self):
        n_threads = os.cpu_count() if self.n_threads == -1 else self.n_threads
        return dict(
            alpha=1,  # pure lasso (no group penalty mixing)
            intercept=self.fit_intercept,
            lmda_path_size=self.lmda_path_size,
            min_ratio=self.min_ratio,
            progress_bar=False,  # replaced by ptlasso's own tqdm output
            n_threads=n_threads,
            adev_tol=1.0,  # match glmnet: don't terminate the lambda path early
        )

    def _cv_grpnet_kwargs(self):
        return {**self._grpnet_kwargs(), "n_folds": self.n_folds}

    def _wrap_matrix(self, X):
        # adelie incurs ~20x slowdown when matrix and solver use different
        # n_threads (OpenMP switching cost). Wrap X so both use the same value.
        n_threads = os.cpu_count() if self.n_threads == -1 else self.n_threads
        return ad.matrix.dense(X, method="naive", n_threads=n_threads)

    def _overall_eta(self, X, groups):
        """Overall linear predictor at the selected lambda.

        The overall model was trained on ``[onehot | X]``, so we reconstruct
        the augmented matrix before computing the linear predictor.  This
        matches R's predict behaviour where group-intercept columns are always
        included.
        """
        onehot = self._make_onehot(groups)
        X_aug = (
            np.asfortranarray(np.hstack([onehot, X]))
            if onehot.shape[1] > 0
            else np.asfortranarray(X)
        )
        return _eta_from_state(
            self.overall_model_, X_aug, self.overall_lmda_idx_, self.family, self.n_classes_
        )

    def _unified_cv_oof(self, X_raw_aug, y, pf, n_onehot, groups):
        """Unified CV matching R's cv.glmnet(keep=TRUE).

        Uses **one** StratifiedKFold split for both lambda selection and OOF
        predictions — the same fold assignment drives both, so the prevalidated
        offset passed to stage-2 is consistent with the selected lambda.

        Per-fold standardization is applied to the X columns
        (``X_raw_aug[:, n_onehot:]``), matching R's ``glmnet(standardize=TRUE)``
        where each fold model standardizes only its own training subset.
        The full-data refit uses global standardization via ``self.scaler_``.

        Parameters
        ----------
        X_raw_aug : (n, n_onehot+p) ndarray
            Augmented matrix ``[onehot | X_raw]`` with un-standardized X.
        y : (n,) ndarray
        pf : (n_onehot+p,) ndarray
            Penalty factors (0 for onehot columns, 1 for X columns).
        n_onehot : int
            Number of leading one-hot group-indicator columns.
        groups : (n,) ndarray
            Group labels used for StratifiedKFold stratification.

        Returns
        -------
        full_state : adelie grpnet state
            Full-data model trained on globally standardized X.
        selected_lmda_idx : int
            CV-selected lambda index into ``full_state.lmdas``.
        oof_eta : (n,) or (n, K) ndarray
            OOF linear predictors at the selected lambda.
        """
        n = X_raw_aug.shape[0]
        n_folds = self._n_folds_oof_

        # Full-data fit on globally standardized X — defines the lambda path.
        X_x_global = self.scaler_.transform(X_raw_aug[:, n_onehot:])
        if n_onehot > 0:
            X_aug_full = np.asfortranarray(np.hstack([X_raw_aug[:, :n_onehot], X_x_global]))
        else:
            X_aug_full = np.asfortranarray(X_x_global)

        glm_all = _make_glm(self.family, y)
        with _silence():
            full_state = ad.grpnet(
                self._wrap_matrix(X_aug_full), glm_all, penalty=pf, **self._grpnet_kwargs()
            )
        lmda_path = np.asarray(full_state.lmdas)
        L = len(lmda_path)

        # One unified fold split — same folds for both lambda selection and OOF.
        if self.foldid is not None:
            foldid_arr = np.asarray(self.foldid)
            n_folds = int(foldid_arr.max()) + 1
            folds = [
                (np.where(foldid_arr != f)[0], np.where(foldid_arr == f)[0]) for f in range(n_folds)
            ]
        else:
            splitter = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)
            folds = list(splitter.split(X_raw_aug, groups))

        if self.family == "multinomial":
            oof_eta_all = np.zeros((n, self.n_classes_, L))
        else:
            oof_eta_all = np.zeros((n, L))
        fold_losses = np.zeros((n_folds, L))

        for fold_i, (train_idx, test_idx) in enumerate(folds):
            # Per-fold standardization of X columns only (not onehot indicators),
            # matching R's glmnet(standardize=TRUE) per-fold behaviour.
            fold_scaler = StandardScaler(with_mean=True, with_std=self.standardize)
            X_x_tr = fold_scaler.fit_transform(X_raw_aug[train_idx, n_onehot:])
            X_x_te = fold_scaler.transform(X_raw_aug[test_idx, n_onehot:])
            if n_onehot > 0:
                X_tr = np.asfortranarray(np.hstack([X_raw_aug[train_idx, :n_onehot], X_x_tr]))
                X_te = np.asfortranarray(np.hstack([X_raw_aug[test_idx, :n_onehot], X_x_te]))
            else:
                X_tr = np.asfortranarray(X_x_tr)
                X_te = np.asfortranarray(X_x_te)

            glm_fold = _make_glm(self.family, y[train_idx])
            with _silence():
                fold_state = ad.grpnet(
                    self._wrap_matrix(X_tr),
                    glm_fold,
                    penalty=pf,
                    lmda_path=lmda_path,
                    **self._grpnet_kwargs(),
                )

            for l_idx in range(L):
                eta_te = _eta_from_state(fold_state, X_te, l_idx, self.family, self.n_classes_)
                pred_te = _apply_link(eta_te, self.family)
                if self.type_measure == "auc" and self.family == "binomial":
                    try:
                        fold_losses[fold_i, l_idx] = -roc_auc_score(y[test_idx], pred_te)
                    except ValueError:  # only one class in this fold
                        fold_losses[fold_i, l_idx] = _fold_loss(y[test_idx], pred_te, self.family)
                else:
                    fold_losses[fold_i, l_idx] = _fold_loss(y[test_idx], pred_te, self.family)
                if self.family == "multinomial":
                    oof_eta_all[test_idx, :, l_idx] = eta_te
                else:
                    oof_eta_all[test_idx, l_idx] = eta_te

        # Lambda selection from unified fold losses.
        avg_losses = fold_losses.mean(axis=0)
        se = fold_losses.std(axis=0, ddof=1) / np.sqrt(n_folds)
        best_local = int(np.argmin(avg_losses))
        threshold = avg_losses[best_local] + se[best_local]
        candidates = np.where(avg_losses <= threshold)[0]
        lse_local = int(candidates[0]) if len(candidates) else best_local
        selected_local = best_local if self.overall_lambda == "lambda.min" else lse_local

        # OOF eta at the selected lambda (same fold split as lambda selection).
        if self.family == "multinomial":
            oof_eta = oof_eta_all[:, :, selected_local]
        else:
            oof_eta = oof_eta_all[:, selected_local]

        # Map selected lambda back to the full-data model path (nearest value).
        selected_lmda = lmda_path[selected_local]
        full_lmda_idx = int(np.argmin(np.abs(np.asarray(full_state.lmdas) - selected_lmda)))

        return full_state, full_lmda_idx, oof_eta

    def _group_cv(self, X_g, X_raw_g, y_g, pf, offset=None, foldid_g=None):
        """CV lambda selection + OOF predictions in one fold pass (mirrors cv.glmnet keep=TRUE).

        Fold assignments are generated once and shared between lambda selection
        and OOF computation, eliminating the variance from two independent random
        draws that would arise if the two steps used separate fold splits.

        Parameters
        ----------
        X_g : (n_g, p) ndarray
            Group feature matrix, already standardised by group_scalers_[g].
            Used for the full-data model only.
        X_raw_g : (n_g, p) ndarray
            Raw (unstandardized) group feature matrix.  Each fold applies its
            own StandardScaler, matching R's per-fold standardization.
        y_g : (n_g,) ndarray
            Group targets.
        pf : (p,) ndarray or None
            Penalty factors.  None = uniform (individual model).
        offset : (n_g,) ndarray or None
            Training offset (pretrain model only).

        Returns
        -------
        full_state : adelie grpnet state
        best_idx : int
            Index of the selected lambda in full_state.lmdas.
        oof_eta : (n_g,) or (n_g, K) ndarray
            OOF linear predictor at best_idx, WITHOUT the offset applied.
            Caller adds ``(1-alpha)*overall_oof_eta`` to obtain the full predictor.
        """
        glm_all = _make_glm(self.family, y_g)
        kw_full = self._grpnet_kwargs()
        if pf is not None:
            kw_full["penalty"] = pf
        if offset is not None:
            kw_full["offsets"] = offset
        with _silence():
            full_state = ad.grpnet(self._wrap_matrix(X_g), glm_all, **kw_full)
        lmda_path = np.asarray(full_state.lmdas)
        L = len(lmda_path)

        n_g = X_raw_g.shape[0]
        oof_shape = (n_g, self.n_classes_) if self.family == "multinomial" else (n_g,)
        if L == 0:
            return full_state, 0, np.zeros(oof_shape)

        if foldid_g is not None:
            fold_ids = np.asarray(foldid_g)
            n_folds = int(fold_ids.max()) + 1
        else:
            n_folds = self._n_folds_oof_
            fold_ids = np.tile(np.arange(n_folds), n_g // n_folds + 1)[:n_g]
            np.random.shuffle(fold_ids)

        fold_losses = np.zeros((n_folds, L))
        # Store OOF across all lambdas so we can extract at the selected lambda
        # after selection — same fold pass used for both (keep=TRUE equivalent).
        if self.family == "multinomial":
            oof_eta_all = np.zeros((n_g, self.n_classes_, L))
        else:
            oof_eta_all = np.zeros((n_g, L))

        for fold_i in range(n_folds):
            train_idx = np.where(fold_ids != fold_i)[0]
            test_idx = np.where(fold_ids == fold_i)[0]

            fold_scaler = StandardScaler(with_mean=True, with_std=self.standardize)
            X_tr = np.asfortranarray(fold_scaler.fit_transform(X_raw_g[train_idx]))
            X_te = np.asfortranarray(fold_scaler.transform(X_raw_g[test_idx]))

            glm_fold = _make_glm(self.family, y_g[train_idx])
            kw_fold = dict(lmda_path=lmda_path, **self._grpnet_kwargs())
            if pf is not None:
                kw_fold["penalty"] = pf
            if offset is not None:
                kw_fold["offsets"] = np.asfortranarray(offset[train_idx])
            with _silence():
                fold_state = ad.grpnet(self._wrap_matrix(X_tr), glm_fold, **kw_fold)

            n_fitted = len(fold_state.lmdas)
            for l_idx in range(L):
                if l_idx >= n_fitted:
                    fold_losses[fold_i, l_idx] = np.inf
                    continue
                eta_te = _eta_from_state(fold_state, X_te, l_idx, self.family, self.n_classes_)
                # Store WITHOUT offset — caller combines with overall OOF offset.
                if self.family == "multinomial":
                    oof_eta_all[test_idx, :, l_idx] = eta_te
                else:
                    oof_eta_all[test_idx, l_idx] = eta_te
                # Loss computed WITH offset to match R's cv.glmnet fold evaluation.
                if offset is not None:
                    eta_te = eta_te + offset[test_idx]
                pred_te = _apply_link(eta_te, self.family)
                if self.type_measure == "auc" and self.family == "binomial":
                    try:
                        fold_losses[fold_i, l_idx] = -roc_auc_score(y_g[test_idx], pred_te)
                    except ValueError:
                        fold_losses[fold_i, l_idx] = _fold_loss(y_g[test_idx], pred_te, self.family)
                else:
                    fold_losses[fold_i, l_idx] = _fold_loss(y_g[test_idx], pred_te, self.family)

        best_idx = int(np.argmin(fold_losses.mean(axis=0)))

        if self.family == "multinomial":
            oof_eta = oof_eta_all[:, :, best_idx]
        else:
            oof_eta = oof_eta_all[:, best_idx]

        return full_state, best_idx, oof_eta

    # ------------------------------------------------------------------
    # Progress / display helpers
    # ------------------------------------------------------------------

    def _support_size(self, state, lmda_idx):
        """Number of nonzero features in a fitted state at a given lambda index."""
        c = _coef_at(state, lmda_idx)
        if self.n_classes_ is not None:
            c = c.reshape(self.n_features_in_, self.n_classes_, order="F")
            return int(np.sum(np.any(c != 0, axis=1)))
        return int(np.sum(c != 0))

    def _print_fit_summary(self, elapsed):
        SEP = "─" * 54
        if self.family == "multinomial":
            n_ov = int(np.sum(np.any(self.overall_coef_ != 0, axis=1)))
        else:
            n_ov = int(np.sum(self.overall_coef_ != 0))
        pre_parts = "   ".join(
            f"{self._label(g)}: |S|={self._support_size(self.pretrain_models_[g], self.pretrain_lmda_idx_[g])}"
            for g in self.groups_
        )
        ind_parts = "   ".join(
            f"{self._label(g)}: |S|={self._support_size(self.individual_models_[g], self.individual_lmda_idx_[g])}"
            for g in self.groups_
        )
        _logger.info(SEP)
        _logger.info(f"  {'overall':<13}|S| = {n_ov}")
        _logger.info(f"  {'pretrain':<13}{pre_parts}")
        _logger.info(f"  {'individual':<13}{ind_parts}")
        _logger.info(SEP)
        _logger.info(f"  Fitted in {elapsed:.1f}s")

    # ------------------------------------------------------------------
    # fit
    # ------------------------------------------------------------------

    def fit(self, X, y, groups, group_labels=None, feature_names=None):
        """Fit the pretrained Lasso.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.  For ``family="binomial"``, must contain only 0
            and 1.  For ``family="multinomial"``, must contain non-negative
            integer class labels 0..K-1.
        groups : array-like of shape (n_samples,)
            Group membership for each sample.  Must contain at least two
            distinct values.
        group_labels : dict or None, default=None
            Optional mapping from group values to display names used in
            ``__repr__`` and ``get_coef()``.
        feature_names : array-like of str or None, default=None
            Feature names.  Inferred from ``X.columns`` when ``X`` is a
            DataFrame and this argument is ``None``.

        Returns
        -------
        self : PretrainedLasso
            Fitted estimator.
        """
        t0 = time.time()
        self._validate_params()

        if feature_names is None and hasattr(X, "columns"):
            feature_names = list(X.columns)

        X, y = check_X_y(X, y, dtype=np.float64, order="F")
        self.n_features_in_ = X.shape[1]

        # Save raw X for per-group standardization (see group_scalers_ below).
        X_raw = X.copy()

        # Global standardization for the overall model only — matches R's
        # glmnet(standardize=TRUE) on the full dataset.
        self.scaler_ = StandardScaler(with_mean=True, with_std=self.standardize)
        X = self.scaler_.fit_transform(X)

        if feature_names is not None:
            feature_names = np.asarray(feature_names)
            if len(feature_names) != self.n_features_in_:
                raise ValueError(
                    f"feature_names has {len(feature_names)} entries "
                    f"but X has {self.n_features_in_} features"
                )
        self.feature_names_in_ = feature_names

        if self.family == "binomial":
            unique_y = np.unique(y)
            if not np.all(np.isin(unique_y, [0.0, 1.0])):
                raise ValueError(
                    f"For family='binomial', y must contain only 0 and 1, "
                    f"got unique values {unique_y}"
                )
        if self.family == "multinomial":
            y_int = np.asarray(y)
            if not (np.all(y_int == np.round(y_int)) and np.all(y_int >= 0)):
                raise ValueError(
                    "For family='multinomial', y must contain non-negative integer class labels"
                )

        groups = np.asarray(groups)
        if len(groups) != X.shape[0]:
            raise ValueError(f"groups has {len(groups)} elements but X has {X.shape[0]} samples")
        self.groups_ = np.unique(groups)
        if len(self.groups_) < 2:
            raise ValueError("groups must contain at least 2 unique values")

        # Single OOF fold count used across all models, matching R's uniform nfolds.
        _min_group = min(int(np.sum(groups == g)) for g in self.groups_)
        if self.family in ("binomial", "multinomial"):
            _min_class = min(
                int(np.bincount(np.asarray(y)[groups == g].astype(int)).min()) for g in self.groups_
            )
            self._n_folds_oof_ = max(2, min(self.n_folds, _min_group, _min_class))
        else:
            self._n_folds_oof_ = max(2, min(self.n_folds, _min_group))

        # Per-group scalers: each group's model standardizes using only that
        # group's data — matching R's glmnet(standardize=TRUE) per model call.
        self.group_scalers_ = {}
        for _g in self.groups_:
            _sg = StandardScaler(with_mean=True, with_std=self.standardize)
            _sg.fit(X_raw[groups == _g])
            self.group_scalers_[_g] = _sg

        if group_labels is not None:
            if not isinstance(group_labels, dict):
                raise TypeError(f"group_labels must be a dict, got {type(group_labels).__name__}")
            unknown = set(group_labels) - set(self.groups_)
            if unknown:
                raise ValueError(f"group_labels contains unknown group keys: {unknown}")
        self.group_labels_ = group_labels or {}

        self.n_classes_ = (
            int(np.asarray(y, dtype=int).max()) + 1 if self.family == "multinomial" else None
        )

        if self.verbose:
            _enable_verbose_logging()
            k = len(self.groups_)
            glabels = [str(self._label(g)) for g in self.groups_]
            gstr = ", ".join(glabels[:4]) + (f" +{k - 4} more" if k > 4 else "")
            _logger.info(
                f"\nPretrainedLasso  {self.family}  ·  {k} groups ({gstr})"
                f"  ·  {self.n_features_in_} features  ·  α={self.alpha}"
            )

        # ----------------------------------------------------------
        # Build augmented raw X for the overall model (group intercepts).
        # Matches R's group.intercepts = TRUE: one-hot group dummies (k-1
        # columns, first group = reference) are prepended to X with zero
        # penalty so they act as free intercepts per group.
        # X_raw_aug carries un-standardized X; _unified_cv_oof applies
        # per-fold standardization internally to match R's behaviour.
        # ----------------------------------------------------------
        onehot = self._make_onehot(groups)  # (n, k-1)
        n_onehot = onehot.shape[1]  # = k-1 >= 1
        self._n_onehot_ = n_onehot
        X_raw_aug = (
            np.asfortranarray(np.hstack([onehot, X_raw]))
            if n_onehot > 0
            else np.asfortranarray(X_raw)
        )
        # Zero penalty for group-dummy columns; unit penalty for X features.
        overall_pf = np.concatenate([np.zeros(n_onehot), np.ones(self.n_features_in_)])

        # ----------------------------------------------------------
        # Step 1: overall model — unified CV (one fold split for both
        # lambda selection and OOF predictions, matching R's
        # cv.glmnet(keep=TRUE) where foldid is shared between both).
        # ----------------------------------------------------------
        _show_progress = self.verbose or getattr(self, "_show_overall_progress", False)
        if _show_progress:
            _t_cv = time.time()

        self.overall_model_, self.overall_lmda_idx_, preval_offset = self._unified_cv_oof(
            X_raw_aug, y, overall_pf, n_onehot, groups
        )
        self._preval_offset = preval_offset  # cached for PretrainedLassoCV reuse

        if _show_progress:
            _logger.info(f"    Overall model (unified CV + OOF) done ({time.time() - _t_cv:.1f}s)")

        # Extract X-feature coefficients only (skip the k-1 onehot columns).
        # overall_coef_ has shape (p,) or (p, K) — identical to the no-group-
        # intercepts case from the caller's perspective.
        if self.family == "multinomial":
            flat = _coef_at(self.overall_model_, self.overall_lmda_idx_)
            p_aug = self.n_features_in_ + n_onehot
            coef_mat = flat.reshape(p_aug, self.n_classes_, order="F")
            self.overall_coef_ = coef_mat[n_onehot:, :]  # (p, K)
        else:
            coef_full = _coef_at(self.overall_model_, self.overall_lmda_idx_)
            self.overall_coef_ = coef_full[n_onehot:]  # (p,)
            self.overall_intercept_ = float(self.overall_model_.intercepts[self.overall_lmda_idx_])

        # ----------------------------------------------------------
        # Step 2: per-group models
        # ----------------------------------------------------------
        # Penalty factor: features NOT in the overall X-support get penalty
        # 1/alpha, steering the group model to prefer features already
        # selected overall.  Matches R's fac = rep(1/alpha, p); fac[supall] = 1.
        if self.family == "multinomial":
            overall_support = np.where(np.any(self.overall_coef_ != 0, axis=1))[0]
        else:
            overall_support = np.where(self.overall_coef_ != 0)[0]

        if self.alpha == 0:
            if len(overall_support) == 0:
                # R fallback: with an empty overall support, an all-Inf penalty
                # would leave nothing to fit. Mimic R's 1e-9 workaround; after
                # dividing by the mean every entry is 1.0 (a uniform penalty).
                pf = np.full(self.n_features_in_, 1.0 / 1e-9)
                pf = pf / pf.mean()
            else:
                # Non-support features effectively excluded (matches R's Inf penalty).
                # Do NOT normalize: dividing 1e15 by its mean inverts the structure,
                # making support features unpenalized (pf~1e-15) instead of normal (pf=1).
                pf = np.full(self.n_features_in_, 1e15)
                pf[overall_support] = 1.0
        else:
            pf = np.full(self.n_features_in_, 1.0 / self.alpha)
            pf[overall_support] = 1.0
            # Match glmnet's internal normalization: pf / sum(pf) * nvars = pf / mean(pf).
            # Without this adelie's lambda path is on a different scale, causing ~5x
            # divergence in CV-selected lambda relative to R.
            pf = pf / pf.mean()

        if self.verbose:
            if self.family == "multinomial":
                n_ov = int(np.sum(np.any(self.overall_coef_ != 0, axis=1)))
            else:
                n_ov = int(np.sum(self.overall_coef_ != 0))
            _logger.info(f"  Overall model done  |S|={n_ov}")

        self.pretrain_models_ = {}
        self.pretrain_lmda_idx_ = {}
        self.individual_models_ = {}
        self.individual_lmda_idx_ = {}
        self._pretrain_oof_ = {}
        self._individual_oof_ = {}

        if self.verbose:
            try:
                from tqdm import tqdm as _tqdm

                group_iter = _tqdm(
                    self.groups_, desc="  [2/2] Group models", unit="group", leave=True
                )
            except ImportError:
                _logger.info("  [2/2] Group models")
                group_iter = self.groups_
        else:
            group_iter = self.groups_

        for g in group_iter:
            mask = groups == g
            X_g = np.asfortranarray(self.group_scalers_[g].transform(X_raw[mask]))
            X_raw_g = np.asfortranarray(X_raw[mask])
            # OOF offset: each sample's overall prediction came from a model
            # trained without that sample — mirrors R's use of fit.preval.
            offset = np.asfortranarray((1 - self.alpha) * preval_offset[mask])
            foldid_g = np.asarray(self.foldid)[mask] if self.foldid is not None else None

            pre_state, pre_idx, pre_oof = self._group_cv(
                X_g, X_raw_g, y[mask], pf, offset, foldid_g=foldid_g
            )
            self.pretrain_models_[g] = pre_state
            self.pretrain_lmda_idx_[g] = pre_idx
            self._pretrain_oof_[g] = pre_oof

            ind_state, ind_idx, ind_oof = self._group_cv(
                X_g, X_raw_g, y[mask], None, foldid_g=foldid_g
            )
            self.individual_models_[g] = ind_state
            self.individual_lmda_idx_[g] = ind_idx
            self._individual_oof_[g] = ind_oof

        if self.verbose:
            self._print_fit_summary(time.time() - t0)

        return self

    def _fit_groups_only(self, X, y, groups, preval_offset):
        """Fit stage-2 group models, reusing the stage-1 state already on self.

        Called by PretrainedLassoCV to avoid re-running the expensive overall
        model + OOF computation for every alpha within the same CV fold.
        Requires that all stage-1 attributes (overall_model_, overall_coef_,
        overall_lmda_idx_, n_features_in_, groups_, _n_onehot_, etc.) are
        already set (typically copied from a template estimator).
        """
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        groups = np.asarray(groups)

        if self.family == "multinomial":
            overall_support = np.where(np.any(self.overall_coef_ != 0, axis=1))[0]
        else:
            overall_support = np.where(self.overall_coef_ != 0)[0]

        if self.alpha == 0:
            if len(overall_support) == 0:
                pf = np.full(self.n_features_in_, 1.0 / 1e-9)
                pf = pf / pf.mean()
            else:
                pf = np.full(self.n_features_in_, 1e15)
                pf[overall_support] = 1.0
        else:
            pf = np.full(self.n_features_in_, 1.0 / self.alpha)
            pf[overall_support] = 1.0
            pf = pf / pf.mean()

        self.pretrain_models_ = {}
        self.pretrain_lmda_idx_ = {}
        self._pretrain_oof_ = {}
        # individual_models_ and _individual_oof_ are alpha-independent;
        # caller (PretrainedLassoCV) copies them from _stage1 after this call.

        for g in self.groups_:
            mask = groups == g
            X_g = np.asfortranarray(self.group_scalers_[g].transform(X[mask]))
            X_raw_g = np.asfortranarray(X[mask])
            offset = np.asfortranarray((1 - self.alpha) * preval_offset[mask])

            pre_state, pre_idx, pre_oof = self._group_cv(X_g, X_raw_g, y[mask], pf, offset)
            self.pretrain_models_[g] = pre_state
            self.pretrain_lmda_idx_[g] = pre_idx
            self._pretrain_oof_[g] = pre_oof

    # ------------------------------------------------------------------
    # predict / score / evaluate
    # ------------------------------------------------------------------

    def predict(self, X, groups, model="pretrain", type="response", lmda_idx=None):
        """Predict target values.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
        groups : array-like of shape (n_samples,)
        model : {"pretrain", "individual", "overall"}, default="pretrain"
        type : {"response", "link", "class"}, default="response"
            Scale of the returned predictions.

            ``"response"`` — fitted values on the response scale: probabilities
            for binomial/multinomial, fitted values for gaussian.

            ``"link"`` — raw linear predictor before the link function
            (``η = Xβ + intercept``).  For gaussian this is identical to
            ``"response"``.

            ``"class"`` — predicted class label: ``0`` or ``1`` for binomial
            (threshold 0.5), integer argmax for multinomial.  Not valid for
            gaussian.
        lmda_idx : int or None
            Lambda index for group models.  ``None`` uses the CV-selected
            lambda for each group (``lambda.min``, matching R).

        Returns
        -------
        y_pred : ndarray
            Shape ``(n,)`` for gaussian/binomial and for multinomial
            ``"class"``.  Shape ``(n, K)`` for multinomial ``"response"``
            or ``"link"``.
        """
        check_is_fitted(self)
        X = check_array(X, dtype=np.float64, order="F")
        X_raw = X  # keep raw; standardization is applied per-model below
        groups = np.asarray(groups)

        if model not in PREDICT_MODELS:
            raise ValueError(f"model must be one of {PREDICT_MODELS}, got '{model}'")
        if type not in PREDICT_TYPES:
            raise ValueError(f"type must be one of {PREDICT_TYPES}, got '{type}'")
        if type == "class" and self.family == "gaussian":
            raise ValueError("type='class' is not valid for family='gaussian'")
        if len(groups) != X_raw.shape[0]:
            raise ValueError(
                f"groups has {len(groups)} elements but X has {X_raw.shape[0]} samples"
            )
        unknown = set(np.unique(groups)) - set(self.groups_)
        if unknown:
            raise ValueError(f"predict received groups not seen during fit: {unknown}")

        X_std_global = np.asfortranarray(self.scaler_.transform(X_raw))

        if model == "overall":
            return _eta_to_output(self._overall_eta(X_std_global, groups), self.family, type)

        n_out = (
            (X_raw.shape[0], self.n_classes_) if self.family == "multinomial" else (X_raw.shape[0],)
        )
        eta_out = np.empty(n_out)

        # Overall eta uses global standardization (overall model was trained on all data).
        eta_ov = self._overall_eta(X_std_global, groups) if model == "pretrain" else None

        for g in self.groups_:
            mask = groups == g
            if not mask.any():
                continue
            X_g = np.asfortranarray(self.group_scalers_[g].transform(X_raw[mask]))

            if model == "pretrain":
                g_idx = self.pretrain_lmda_idx_.get(g, -1) if lmda_idx is None else lmda_idx
                eta_group = _eta_from_state(
                    self.pretrain_models_[g], X_g, g_idx, self.family, self.n_classes_
                )
                assert eta_ov is not None
                eta_out[mask] = (1 - self.alpha) * eta_ov[mask] + eta_group
            else:
                g_idx = self.individual_lmda_idx_.get(g, -1) if lmda_idx is None else lmda_idx
                eta_out[mask] = _eta_from_state(
                    self.individual_models_[g], X_g, g_idx, self.family, self.n_classes_
                )

        return _eta_to_output(eta_out, self.family, type)

    def score(self, X, y, groups):
        """Return a scalar performance metric using the pretrained model.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
        y : array-like of shape (n_samples,)
        groups : array-like of shape (n_samples,)

        Returns
        -------
        score : float
            R² for gaussian; classification accuracy for binomial/multinomial.
        """
        return _model_score(y, self.predict(X, groups), self.family)

    def evaluate(self, X, y, groups):
        """Predict and score with all three sub-models.

        Convenience method matching R's ``predict(fit, xtest, ytest=ytest)``.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
        y : array-like of shape (n_samples,)
        groups : array-like of shape (n_samples,)

        Returns
        -------
        result : dict
            Keys are ``"pretrain"``, ``"individual"``, ``"overall"``.
            Each value is a dict with entries:

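            ``"predictions"`` : ndarray
                Response-scale predictions (``type="response"``): fitted values
                for gaussian, probabilities for binomial/multinomial.  Shape
                ``(n,)``, or ``(n, K)`` for multinomial.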
            ``"score"`` : float
                R² for gaussian; accuracy for binomial/multinomial.
        """
        check_is_fitted(self)
        result = {}
        for m in ("pretrain", "individual", "overall"):
            preds = self.predict(X, groups, model=m)
            result[m] = {"predictions": preds, "score": _model_score(y, preds, self.family)}
        return result

    # ------------------------------------------------------------------
    # repr / get_coef
    # ------------------------------------------------------------------

    def __repr__(self):
        header = (
            f"PretrainedLasso(alpha={self.alpha}, family='{self.family}', "
            f"overall_lambda='{self.overall_lambda}', "
            f"fit_intercept={self.fit_intercept}, lmda_path_size={self.lmda_path_size})"
        )
        if not hasattr(self, "overall_model_"):
            return header + "\n  [not fitted]"

        coef = self.overall_coef_
        nz_idx = (
            np.where(np.any(coef != 0, axis=1))[0]
            if self.family == "multinomial"
            else np.where(coef != 0)[0]
        )
        names = self._names_or_indices(nz_idx)
        preview = list(names[:5]) + (["..."] if len(nz_idx) > 5 else [])
        ov_str = (
            f"|Ŝ| = {len(nz_idx)} / {self.n_features_in_}  [{', '.join(str(v) for v in preview)}]"
        )

        group_sizes = {}
        for g in self.groups_:
            c = _coef_at(self.pretrain_models_[g], self.pretrain_lmda_idx_.get(g, -1))
            if self.family == "multinomial":
                c = c.reshape(self.n_features_in_, self.n_classes_, order="F")
                group_sizes[self._label(g)] = int(np.sum(np.any(c != 0, axis=1)))
            else:
                group_sizes[self._label(g)] = int(np.sum(c != 0))
        group_str = ", ".join(f"{lbl}: |Ŝ|={v}" for lbl, v in group_sizes.items())

        return (
            f"{header}\n"
            f"  family       : {self.family}\n"
            f"  n_features   : {self.n_features_in_}\n"
            f"  n_groups     : {len(self.groups_)}\n"
            f"  overall |Ŝ|  : {ov_str}\n"
            f"  pretrain |Ŝ| : {group_str}\n"
            f"  overall λ    : {self.overall_lambda}  (idx {self.overall_lmda_idx_})"
        )

    def get_coef(self, model="all", lmda_idx=None):
        """Return fitted coefficients as a nested dict.

        Parameters
        ----------
        model : {"all", "overall", "pretrain", "individual"}, default="all"
            Which sub-model(s) to return.
        lmda_idx : int or None
            Lambda index for the group models.  ``None`` uses each group's
            CV-selected lambda (falling back to the last lambda, ``-1``, if no
            index was recorded for a group).  The overall model always uses
            ``overall_lmda_idx_``.

        Returns
        -------
        coefs : dict
            When ``model="all"``, keys are ``"overall"``, ``"pretrain"``,
            ``"individual"``.  For ``"overall"``, value is
            ``{"coef": ndarray, "intercept": ndarray}``; ``"coef"`` covers the
            X features only (the k-1 group-dummy coefficients are excluded).
            For ``"pretrain"`` and ``"individual"``, value is a dict keyed by
            group label, each containing ``{"coef": ndarray, "intercept": ndarray}``.
            When a specific model is requested, only that sub-dict is returned.
        """
        if model not in COEF_MODELS:
            raise ValueError(f"model must be one of {COEF_MODELS}, got '{model}'")
        check_is_fitted(self)

        def _group_coefs(models, lmda_idxs):
            result = {}
            for g, state in models.items():
                use_idx = lmda_idxs.get(g, -1) if lmda_idx is None else lmda_idx
                result[self._label(g)] = {
                    "coef": _coef_at(state, use_idx),
                    "intercept": np.asarray(state.intercepts[use_idx]).ravel(),
                }
            return result

        result = {}
        if model in ("all", "overall"):
            result["overall"] = {
                "coef": self.overall_coef_,
                "intercept": np.asarray(
                    self.overall_model_.intercepts[self.overall_lmda_idx_]
                ).ravel(),
            }
        if model in ("all", "pretrain"):
            result["pretrain"] = _group_coefs(self.pretrain_models_, self.pretrain_lmda_idx_)
        if model in ("all", "individual"):
            result["individual"] = _group_coefs(self.individual_models_, self.individual_lmda_idx_)
        return result if model == "all" else result[model]

    # ------------------------------------------------------------------
    # Serialisation
    # ------------------------------------------------------------------

    def __getstate__(self):
        d = self.__dict__.copy()
        _proxify_models(d)
        return d
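
For `family="multinomial"`, the source above turns the flat coefficient vector returned by `_coef_at` into a `(p_aug, K)` matrix with `reshape(..., order="F")` and then drops the first `n_onehot` rows (the group dummies). The plain-Python sketch below reproduces only that indexing; `split_multinomial_coef` and `support_rows` are hypothetical helper names, and nothing here calls adelie or ptlasso.

```python
def split_multinomial_coef(flat, p_aug, n_classes, n_onehot):
    """Mimic flat.reshape(p_aug, n_classes, order="F")[n_onehot:, :].

    Column-major (Fortran) order stores each class as a contiguous block,
    so entry (i, j) of the matrix is flat[i + j * p_aug].
    """
    if len(flat) != p_aug * n_classes:
        raise ValueError("flat must have p_aug * n_classes entries")
    matrix = [[flat[i + j * p_aug] for j in range(n_classes)] for i in range(p_aug)]
    # Drop the leading group-dummy rows, keeping the X-feature rows only.
    return matrix[n_onehot:]


def support_rows(coef_rows):
    """Indices of features that are nonzero for at least one class,
    matching np.where(np.any(coef != 0, axis=1))[0]."""
    return [i for i, row in enumerate(coef_rows) if any(c != 0 for c in row)]


# One group dummy, two X features, two classes.
coef = split_multinomial_coef([10.0, 1.0, 0.0, 20.0, 0.0, 0.0], p_aug=3, n_classes=2, n_onehot=1)
print(coef)                # [[1.0, 0.0], [0.0, 0.0]]
print(support_rows(coef))  # [0]
```

The same row-wise "any nonzero" rule is what defines the overall support used for the stage-2 penalty factors and the `|Ŝ|` counts in `__repr__`.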

fit

fit(X, y, groups, group_labels=None, feature_names=None)

Fit the pretrained Lasso.
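
Before any model is fitted, `fit` caps the fold count by the smallest group and, for classification families, by the smallest per-group class count, never going below 2. A stdlib sketch of that rule follows; `oof_fold_count` is a hypothetical name. One difference from the source: `Counter` only counts labels that occur in a group, whereas `np.bincount` also reports a zero count for any absent label below the group's maximum label.

```python
from collections import Counter


def oof_fold_count(y, groups, n_folds=10, classification=False):
    """Fold-count rule from fit: max(2, min(n_folds, smallest group size,
    [smallest per-group class count])), computed with the standard library."""
    group_sizes = Counter(groups)
    cap = min(n_folds, min(group_sizes.values()))
    if classification:
        for g in group_sizes:
            class_counts = Counter(yi for yi, gi in zip(y, groups) if gi == g)
            cap = min(cap, min(class_counts.values()))
    return max(2, cap)


y = [0, 1, 0, 1, 1, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(oof_fold_count(y, groups))                       # 4: smallest group has 4 samples
print(oof_fold_count(y, groups, classification=True))  # 2: group "b" has a single 0, floored at 2
```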

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	Training data.	required
`y`	`array-like of shape (n_samples,)`	Target values. For `family="binomial"`, must contain only 0 and 1. For `family="multinomial"`, must contain non-negative integer class labels 0..K-1.	required
`groups`	`array-like of shape (n_samples,)`	Group membership for each sample. Must contain at least two distinct values.	required
`group_labels`	`dict or None`	Optional mapping from group values to display names used in `__repr__` and `get_coef()`.	`None`
`feature_names`	`array-like of str or None`	Feature names. Inferred from `X.columns` when `X` is a DataFrame and this argument is `None`.	`None`

Returns:

Name	Type	Description
`self`	`PretrainedLasso`	Fitted estimator.
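
The overall model in `fit` gets free per-group intercepts: `k-1` one-hot dummy columns (first group as reference) are prepended to X, and `overall_pf` gives them a zero penalty while every X feature gets a penalty of 1. `_make_onehot` itself is not shown in this section, so the sketch below assumes the reference is the first group in sorted order (the order `np.unique` gives `groups_`); `make_group_dummies` and `augment_with_group_dummies` are hypothetical names.

```python
def make_group_dummies(groups):
    """k-1 one-hot columns; the first group in sorted order is the reference
    and is encoded as all zeros (an assumption about _make_onehot)."""
    levels = sorted(set(groups))
    if len(levels) < 2:
        raise ValueError("groups must contain at least 2 unique values")
    return [[1.0 if g == level else 0.0 for level in levels[1:]] for g in groups]


def augment_with_group_dummies(X, groups):
    """Return (X_aug, penalty): dummies prepended to each row of X, with a
    zero penalty on the dummies and a unit penalty on the X features."""
    dummies = make_group_dummies(groups)
    n_onehot = len(dummies[0])
    n_features = len(X[0])
    X_aug = [d + [float(v) for v in row] for d, row in zip(dummies, X)]
    penalty = [0.0] * n_onehot + [1.0] * n_features
    return X_aug, penalty


X_aug, pf = augment_with_group_dummies([[1, 2], [3, 4], [5, 6]], ["b", "a", "c"])
print(X_aug)  # [[1.0, 0.0, 1.0, 2.0], [0.0, 0.0, 3.0, 4.0], [0.0, 1.0, 5.0, 6.0]]
print(pf)     # [0.0, 0.0, 1.0, 1.0]
```

Because the dummies are unpenalized, they are never shrunk to zero, which is what lets them act as group intercepts alongside the single model intercept.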

Source code in src/ptlasso/_estimator.py

def fit(self, X, y, groups, group_labels=None, feature_names=None):
    """Fit the pretrained Lasso.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.  For ``family="binomial"``, must contain only 0
        and 1.  For ``family="multinomial"``, must contain non-negative
        integer class labels 0..K-1.
    groups : array-like of shape (n_samples,)
        Group membership for each sample.  Must contain at least two
        distinct values.
    group_labels : dict or None, default=None
        Optional mapping from group values to display names used in
        ``__repr__`` and ``get_coef()``.
    feature_names : array-like of str or None, default=None
        Feature names.  Inferred from ``X.columns`` when ``X`` is a
        DataFrame and this argument is ``None``.

    Returns
    -------
    self : PretrainedLasso
        Fitted estimator.
    """
    t0 = time.time()
    self._validate_params()

    if feature_names is None and hasattr(X, "columns"):
        feature_names = list(X.columns)

    X, y = check_X_y(X, y, dtype=np.float64, order="F")
    self.n_features_in_ = X.shape[1]

    # Save raw X for per-group standardization (see group_scalers_ below).
    X_raw = X.copy()

    # Global standardization for the overall model only — matches R's
    # glmnet(standardize=TRUE) on the full dataset.
    self.scaler_ = StandardScaler(with_mean=True, with_std=self.standardize)
    X = self.scaler_.fit_transform(X)

    if feature_names is not None:
        feature_names = np.asarray(feature_names)
        if len(feature_names) != self.n_features_in_:
            raise ValueError(
                f"feature_names has {len(feature_names)} entries "
                f"but X has {self.n_features_in_} features"
            )
    self.feature_names_in_ = feature_names

    if self.family == "binomial":
        unique_y = np.unique(y)
        if not np.all(np.isin(unique_y, [0.0, 1.0])):
            raise ValueError(
                f"For family='binomial', y must contain only 0 and 1, "
                f"got unique values {unique_y}"
            )
    if self.family == "multinomial":
        y_int = np.asarray(y)
        if not (np.all(y_int == np.round(y_int)) and np.all(y_int >= 0)):
            raise ValueError(
                "For family='multinomial', y must contain non-negative integer class labels"
            )

    groups = np.asarray(groups)
    if len(groups) != X.shape[0]:
        raise ValueError(f"groups has {len(groups)} elements but X has {X.shape[0]} samples")
    self.groups_ = np.unique(groups)
    if len(self.groups_) < 2:
        raise ValueError("groups must contain at least 2 unique values")

    # Single OOF fold count used across all models, matching R's uniform nfolds.
    _min_group = min(int(np.sum(groups == g)) for g in self.groups_)
    if self.family in ("binomial", "multinomial"):
        _min_class = min(
            int(np.bincount(np.asarray(y)[groups == g].astype(int)).min()) for g in self.groups_
        )
        self._n_folds_oof_ = max(2, min(self.n_folds, _min_group, _min_class))
    else:
        self._n_folds_oof_ = max(2, min(self.n_folds, _min_group))

    # Per-group scalers: each group's model standardizes using only that
    # group's data — matching R's glmnet(standardize=TRUE) per model call.
    self.group_scalers_ = {}
    for _g in self.groups_:
        _sg = StandardScaler(with_mean=True, with_std=self.standardize)
        _sg.fit(X_raw[groups == _g])
        self.group_scalers_[_g] = _sg

    if group_labels is not None:
        if not isinstance(group_labels, dict):
            raise TypeError(f"group_labels must be a dict, got {type(group_labels).__name__}")
        unknown = set(group_labels) - set(self.groups_)
        if unknown:
            raise ValueError(f"group_labels contains unknown group keys: {unknown}")
    self.group_labels_ = group_labels or {}

    self.n_classes_ = (
        int(np.asarray(y, dtype=int).max()) + 1 if self.family == "multinomial" else None
    )

    if self.verbose:
        _enable_verbose_logging()
        k = len(self.groups_)
        glabels = [str(self._label(g)) for g in self.groups_]
        gstr = ", ".join(glabels[:4]) + (f" +{k - 4} more" if k > 4 else "")
        _logger.info(
            f"\nPretrainedLasso  {self.family}  ·  {k} groups ({gstr})"
            f"  ·  {self.n_features_in_} features  ·  α={self.alpha}"
        )

    # ----------------------------------------------------------
    # Build augmented raw X for the overall model (group intercepts).
    # Matches R's group.intercepts = TRUE: one-hot group dummies (k-1
    # columns, first group = reference) are prepended to X with zero
    # penalty so they act as free intercepts per group.
    # X_raw_aug carries un-standardized X; _unified_cv_oof applies
    # per-fold standardization internally to match R's behaviour.
    # ----------------------------------------------------------
    onehot = self._make_onehot(groups)  # (n, k-1)
    n_onehot = onehot.shape[1]  # = k-1 >= 1
    self._n_onehot_ = n_onehot
    X_raw_aug = (
        np.asfortranarray(np.hstack([onehot, X_raw]))
        if n_onehot > 0
        else np.asfortranarray(X_raw)
    )
    # Zero penalty for group-dummy columns; unit penalty for X features.
    overall_pf = np.concatenate([np.zeros(n_onehot), np.ones(self.n_features_in_)])

    # ----------------------------------------------------------
    # Step 1: overall model — unified CV (one fold split for both
    # lambda selection and OOF predictions, matching R's
    # cv.glmnet(keep=TRUE) where foldid is shared between both).
    # ----------------------------------------------------------
    _show_progress = self.verbose or getattr(self, "_show_overall_progress", False)
    if _show_progress:
        _t_cv = time.time()

    self.overall_model_, self.overall_lmda_idx_, preval_offset = self._unified_cv_oof(
        X_raw_aug, y, overall_pf, n_onehot, groups
    )
    self._preval_offset = preval_offset  # cached for PretrainedLassoCV reuse

    if _show_progress:
        _logger.info(f"    Overall model (unified CV + OOF) done ({time.time() - _t_cv:.1f}s)")

    # Extract X-feature coefficients only (skip the k-1 onehot columns).
    # overall_coef_ has shape (p,) or (p, K) — identical to the no-group-
    # intercepts case from the caller's perspective.
    if self.family == "multinomial":
        flat = _coef_at(self.overall_model_, self.overall_lmda_idx_)
        p_aug = self.n_features_in_ + n_onehot
        coef_mat = flat.reshape(p_aug, self.n_classes_, order="F")
        self.overall_coef_ = coef_mat[n_onehot:, :]  # (p, K)
    else:
        coef_full = _coef_at(self.overall_model_, self.overall_lmda_idx_)
        self.overall_coef_ = coef_full[n_onehot:]  # (p,)
        self.overall_intercept_ = float(self.overall_model_.intercepts[self.overall_lmda_idx_])

    # ----------------------------------------------------------
    # Step 2: per-group models
    # ----------------------------------------------------------
    # Penalty factor: features NOT in the overall X-support get penalty
    # 1/alpha, steering the group model to prefer features already
    # selected overall.  Matches R's fac = rep(1/alpha, p); fac[supall] = 1.
    if self.family == "multinomial":
        overall_support = np.where(np.any(self.overall_coef_ != 0, axis=1))[0]
    else:
        overall_support = np.where(self.overall_coef_ != 0)[0]

    if self.alpha == 0:
        if len(overall_support) == 0:
            # R fallback: with an empty overall support, an all-Inf penalty
            # would leave nothing to fit. Mimic R's 1e-9 workaround; after
            # dividing by the mean every entry is 1.0 (a uniform penalty).
            pf = np.full(self.n_features_in_, 1.0 / 1e-9)
            pf = pf / pf.mean()
        else:
            # Non-support features effectively excluded (matches R's Inf penalty).
            # Do NOT normalize: dividing 1e15 by its mean inverts the structure,
            # making support features unpenalized (pf~1e-15) instead of normal (pf=1).
            pf = np.full(self.n_features_in_, 1e15)
            pf[overall_support] = 1.0
    else:
        pf = np.full(self.n_features_in_, 1.0 / self.alpha)
        pf[overall_support] = 1.0
        # Match glmnet's internal normalization: pf / sum(pf) * nvars = pf / mean(pf).
        # Without this adelie's lambda path is on a different scale, causing ~5x
        # divergence in CV-selected lambda relative to R.
        pf = pf / pf.mean()

    if self.verbose:
        if self.family == "multinomial":
            n_ov = int(np.sum(np.any(self.overall_coef_ != 0, axis=1)))
        else:
            n_ov = int(np.sum(self.overall_coef_ != 0))
        _logger.info(f"  Overall model done  |S|={n_ov}")

    self.pretrain_models_ = {}
    self.pretrain_lmda_idx_ = {}
    self.individual_models_ = {}
    self.individual_lmda_idx_ = {}
    self._pretrain_oof_ = {}
    self._individual_oof_ = {}

    if self.verbose:
        try:
            from tqdm import tqdm as _tqdm

            group_iter = _tqdm(
                self.groups_, desc="  [2/2] Group models", unit="group", leave=True
            )
        except ImportError:
            _logger.info("  [2/2] Group models")
            group_iter = self.groups_
    else:
        group_iter = self.groups_

    for g in group_iter:
        mask = groups == g
        X_g = np.asfortranarray(self.group_scalers_[g].transform(X_raw[mask]))
        X_raw_g = np.asfortranarray(X_raw[mask])
        # OOF offset: each sample's overall prediction came from a model
        # trained without that sample — mirrors R's use of fit.preval.
        offset = np.asfortranarray((1 - self.alpha) * preval_offset[mask])
        foldid_g = np.asarray(self.foldid)[mask] if self.foldid is not None else None

        pre_state, pre_idx, pre_oof = self._group_cv(
            X_g, X_raw_g, y[mask], pf, offset, foldid_g=foldid_g
        )
        self.pretrain_models_[g] = pre_state
        self.pretrain_lmda_idx_[g] = pre_idx
        self._pretrain_oof_[g] = pre_oof

        ind_state, ind_idx, ind_oof = self._group_cv(
            X_g, X_raw_g, y[mask], None, foldid_g=foldid_g
        )
        self.individual_models_[g] = ind_state
        self.individual_lmda_idx_[g] = ind_idx
        self._individual_oof_[g] = ind_oof

    if self.verbose:
        self._print_fit_summary(time.time() - t0)

    return self
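
Step 2 of `fit` builds the penalty factor for the pretrained group models. The stdlib sketch below mirrors the three branches in the source: for `alpha > 0`, a unit penalty on the overall support and `1/alpha` elsewhere, divided by the mean (glmnet's `pf / sum(pf) * nvars`); for `alpha == 0`, a `1e15` penalty that effectively excludes non-support features, with no normalisation; and, when `alpha == 0` and the overall support is empty, a uniform penalty. `stage2_penalty_factor` is a hypothetical name.

```python
def stage2_penalty_factor(n_features, support, alpha):
    """Penalty factors for the pretrained group models (mirrors fit, step 2)."""
    support = set(support)
    if alpha == 0:
        if not support:
            # R-style 1e-9 workaround: a constant vector divided by its mean
            # is (up to rounding) all ones, i.e. a uniform penalty.
            pf = [1.0 / 1e-9] * n_features
            mean = sum(pf) / n_features
            return [v / mean for v in pf]
        # Huge finite penalty stands in for R's Inf; deliberately not normalised.
        return [1.0 if j in support else 1e15 for j in range(n_features)]
    pf = [1.0 if j in support else 1.0 / alpha for j in range(n_features)]
    mean = sum(pf) / n_features  # glmnet: pf / sum(pf) * nvars == pf / mean(pf)
    return [v / mean for v in pf]


pf = stage2_penalty_factor(4, support=[0, 1], alpha=0.5)
print(pf)  # roughly [0.667, 0.667, 1.333, 1.333]
```

Normalising rescales the average penalty to 1 but keeps the ratio between non-support and support penalties at `1/alpha` (here 2).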

predict

predict(X, groups, model='pretrain', type='response', lmda_idx=None)

Predict target values.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	Samples to predict.	required
`groups`	`array-like of shape (n_samples,)`	Group membership for each sample. Every value must have been seen during `fit`.	required
`model`	`{"pretrain", "individual", "overall"}`	Which sub-model to use: the pretrained group models (scaled overall prediction plus group model), the individual per-group models, or the overall model.	`"pretrain"`
`type`	`{"response", "link", "class"}`	Scale of the returned predictions. `"response"`: fitted values on the response scale (probabilities for binomial/multinomial, fitted values for gaussian). `"link"`: the linear predictor before the link function (`η = Xβ + intercept`); for gaussian this is identical to `"response"`. `"class"`: predicted class label, `0` or `1` for binomial (threshold 0.5) or the integer argmax for multinomial; not valid for gaussian.	`"response"`
`lmda_idx`	`int or None`	Lambda index for group models. `None` uses the CV-selected lambda for each group (`lambda.min`, matching R).	`None`

Returns:

Name	Type	Description
`y_pred`	`ndarray`	Shape `(n,)` for gaussian/binomial and for multinomial `"class"`. Shape `(n, K)` for multinomial `"response"` or `"link"`.

Source code in src/ptlasso/_estimator.py

def predict(self, X, groups, model="pretrain", type="response", lmda_idx=None):
    """Predict target values.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
    groups : array-like of shape (n_samples,)
    model : {"pretrain", "individual", "overall"}, default="pretrain"
    type : {"response", "link", "class"}, default="response"
        Scale of the returned predictions.

        ``"response"`` — fitted values on the response scale: probabilities
        for binomial/multinomial, fitted values for gaussian.

        ``"link"`` — raw linear predictor before the link function
        (``η = Xβ + intercept``).  For gaussian this is identical to
        ``"response"``.

        ``"class"`` — predicted class label: ``0`` or ``1`` for binomial
        (threshold 0.5), integer argmax for multinomial.  Not valid for
        gaussian.
    lmda_idx : int or None
        Lambda index for group models.  ``None`` uses the CV-selected
        lambda for each group (``lambda.min``, matching R).

    Returns
    -------
    y_pred : ndarray
        Shape ``(n,)`` for gaussian/binomial and for multinomial
        ``"class"``.  Shape ``(n, K)`` for multinomial ``"response"``
        or ``"link"``.
    """
    check_is_fitted(self)
    X = check_array(X, dtype=np.float64, order="F")
    X_raw = X  # keep raw; standardization is applied per-model below
    groups = np.asarray(groups)

    if model not in PREDICT_MODELS:
        raise ValueError(f"model must be one of {PREDICT_MODELS}, got '{model}'")
    if type not in PREDICT_TYPES:
        raise ValueError(f"type must be one of {PREDICT_TYPES}, got '{type}'")
    if type == "class" and self.family == "gaussian":
        raise ValueError("type='class' is not valid for family='gaussian'")
    if len(groups) != X_raw.shape[0]:
        raise ValueError(
            f"groups has {len(groups)} elements but X has {X_raw.shape[0]} samples"
        )
    unknown = set(np.unique(groups)) - set(self.groups_)
    if unknown:
        raise ValueError(f"predict received groups not seen during fit: {unknown}")

    X_std_global = np.asfortranarray(self.scaler_.transform(X_raw))

    if model == "overall":
        return _eta_to_output(self._overall_eta(X_std_global, groups), self.family, type)

    n_out = (
        (X_raw.shape[0], self.n_classes_) if self.family == "multinomial" else (X_raw.shape[0],)
    )
    eta_out = np.empty(n_out)

    # Overall eta uses global standardization (overall model was trained on all data).
    eta_ov = self._overall_eta(X_std_global, groups) if model == "pretrain" else None

    for g in self.groups_:
        mask = groups == g
        if not mask.any():
            continue
        X_g = np.asfortranarray(self.group_scalers_[g].transform(X_raw[mask]))

        if model == "pretrain":
            g_idx = self.pretrain_lmda_idx_.get(g, -1) if lmda_idx is None else lmda_idx
            eta_group = _eta_from_state(
                self.pretrain_models_[g], X_g, g_idx, self.family, self.n_classes_
            )
            assert eta_ov is not None
            eta_out[mask] = (1 - self.alpha) * eta_ov[mask] + eta_group
        else:
            g_idx = self.individual_lmda_idx_.get(g, -1) if lmda_idx is None else lmda_idx
            eta_out[mask] = _eta_from_state(
                self.individual_models_[g], X_g, g_idx, self.family, self.n_classes_
            )

    return _eta_to_output(eta_out, self.family, type)
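
For `model="pretrain"`, each sample's linear predictor is the overall model's prediction scaled by `1 - alpha`, plus the prediction of its own group's model. The sketch below shows only that combination step. The group models are represented by hypothetical `(intercept, coefs)` pairs applied to inputs assumed to be already standardized, whereas the real `predict` standardizes with `scaler_` for the overall model and with `group_scalers_[g]` for each group model.

```python
def pretrain_eta(X, groups, eta_overall, group_models, alpha):
    """eta_i = (1 - alpha) * eta_overall_i + eta_group(g_i)(x_i).

    group_models: dict group -> (intercept, coefs), a hypothetical stand-in
    for a fitted group model evaluated at its selected lambda.
    """
    unknown = set(groups) - set(group_models)
    if unknown:
        raise ValueError(f"groups not seen during fit: {unknown}")
    out = []
    for row, g, eta_ov in zip(X, groups, eta_overall):
        intercept, coefs = group_models[g]
        eta_group = intercept + sum(b * x for b, x in zip(coefs, row))
        out.append((1 - alpha) * eta_ov + eta_group)
    return out


models = {"a": (0.0, [1.0]), "b": (1.0, [0.5])}
print(pretrain_eta([[1.0], [2.0]], ["a", "b"], [4.0, 4.0], models, alpha=0.5))  # [3.0, 4.0]
```

At `alpha = 1` the overall term vanishes and only the group model contributes; smaller `alpha` carries more of the overall prediction into every group.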

score

score(X, y, groups)

Return a scalar performance metric using the pretrained model.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	Test samples.	required
`y`	`array-like of shape (n_samples,)`	True target values.	required
`groups`	`array-like of shape (n_samples,)`	Group membership for each sample.	required

Returns:

Name	Type	Description
`score`	`float`	R² for gaussian; classification accuracy for binomial/multinomial.

Source code in src/ptlasso/_estimator.py

def score(self, X, y, groups):
    """Return a scalar performance metric using the pretrained model.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
    y : array-like of shape (n_samples,)
    groups : array-like of shape (n_samples,)

    Returns
    -------
    score : float
        R² for gaussian; classification accuracy for binomial/multinomial.
    """
    return _model_score(y, self.predict(X, groups), self.family)
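`score` delegates to the private `_model_score`, whose code does not appear on this page. The sketch below reproduces the two documented metrics in pure Python. How response-scale predictions become class labels (a 0.5 cut-off for binomial, an argmax over classes for multinomial) is an assumption. `r2` and `accuracy_from_response` are hypothetical names.

```python
def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def accuracy_from_response(y_true, y_pred, family):
    """Fraction of correct labels after converting response-scale predictions to classes."""
    if family == "binomial":
        labels = [1 if p >= 0.5 else 0 for p in y_pred]  # assumed 0.5 cut-off on P(y = 1)
    elif family == "multinomial":
        labels = [max(range(len(row)), key=row.__getitem__) for row in y_pred]  # argmax
    else:
        raise ValueError(f"accuracy is not used for family={family!r}")
    return sum(t == lab for t, lab in zip(y_true, labels)) / len(y_true)
```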

evaluate

evaluate(X, y, groups)

Predict and score with all three sub-models.

Convenience method matching R's predict(fit, xtest, ytest=ytest).

Parameters:

Name	Type	Default
`X`	`array-like of shape (n_samples, n_features)`	required
`y`	`array-like of shape (n_samples,)`	required
`groups`	`array-like of shape (n_samples,)`	required

Returns:

Name	Type	Description
`result`	`dict`	Keys are `"pretrain"`, `"individual"`, `"overall"`. Each value is a dict with two entries: `"predictions"` (ndarray of predicted values, shape `(n,)`, or `(n, K)` for multinomial) and `"score"` (float: R² for gaussian, accuracy for binomial/multinomial).

Source code in src/ptlasso/_estimator.py

def evaluate(self, X, y, groups):
    """Predict and score with all three sub-models.

    Convenience method matching R's ``predict(fit, xtest, ytest=ytest)``.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
    y : array-like of shape (n_samples,)
    groups : array-like of shape (n_samples,)

    Returns
    -------
    result : dict
        Keys are ``"pretrain"``, ``"individual"``, ``"overall"``.
        Each value is a dict with entries:

        ``"predictions"`` : ndarray
            Predicted values, shape ``(n,)`` or ``(n, K)`` for multinomial.
        ``"score"`` : float
            R² for gaussian; accuracy for binomial/multinomial.
    """
    check_is_fitted(self)
    result = {}
    for m in ("pretrain", "individual", "overall"):
        preds = self.predict(X, groups, model=m)
        result[m] = {"predictions": preds, "score": _model_score(y, preds, self.family)}
    return result
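Every `"score"` returned by `evaluate` is higher-is-better (R² or accuracy), so comparing the three sub-models reduces to a max over the dict. The following small sketch relies only on the dict layout documented above. `best_submodel` and `score_report` are hypothetical helpers, `fitted_model` is a placeholder name, and the scores in `toy` are made-up placeholders, not results.

```python
def best_submodel(result):
    """Name of the sub-model with the highest score in an evaluate() result dict."""
    return max(result, key=lambda name: result[name]["score"])


def score_report(result):
    """One line per sub-model, in the order evaluate() builds them."""
    order = ("pretrain", "individual", "overall")
    return "\n".join(f"{name:<12}{result[name]['score']:.4f}" for name in order)


# In practice: result = fitted_model.evaluate(X_test, y_test, groups_test)
toy = {
    "pretrain": {"predictions": [], "score": 0.81},
    "individual": {"predictions": [], "score": 0.74},
    "overall": {"predictions": [], "score": 0.69},
}
print(best_submodel(toy))
print(score_report(toy))
```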

get_coef

get_coef(model='all', lmda_idx=None)

Return fitted coefficients as a nested dict.

Parameters:

Name	Type	Description	Default
`model`	`(all, overall, pretrain, individual)`	Which sub-model(s) to return.	`"all"`
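`lmda_idx`	`int or None`	Lambda index for the group models; the overall model always uses its own selected index. `None` uses each group's stored (CV-selected) index, falling back to the last lambda (`-1`) if none is stored.	`None`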

Returns:

Name	Type	Description
`coefs`	`dict`	When `model="all"`, keys are `"overall"`, `"pretrain"`, `"individual"`. For `"overall"`, value is `{"coef": ndarray, "intercept": ndarray}`. For `"pretrain"` and `"individual"`, value is a dict keyed by group label, each containing `{"coef": ndarray, "intercept": ndarray}`. When a specific model is requested, only that sub-dict is returned.

Source code in src/ptlasso/_estimator.py

def get_coef(self, model="all", lmda_idx=None):
    """Return fitted coefficients as a nested dict.

    Parameters
    ----------
    model : {"all", "overall", "pretrain", "individual"}, default="all"
        Which sub-model(s) to return.
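    lmda_idx : int or None
        Lambda index for the group models; the overall model always uses
        ``overall_lmda_idx_``.  ``None`` uses each group's stored
        (CV-selected) index, falling back to the last lambda (``-1``).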

    Returns
    -------
    coefs : dict
        When ``model="all"``, keys are ``"overall"``, ``"pretrain"``,
        ``"individual"``.  For ``"overall"``, value is
        ``{"coef": ndarray, "intercept": ndarray}``.  For ``"pretrain"``
        and ``"individual"``, value is a dict keyed by group label, each
        containing ``{"coef": ndarray, "intercept": ndarray}``.
        When a specific model is requested, only that sub-dict is returned.
    """
    if model not in COEF_MODELS:
        raise ValueError(f"model must be one of {COEF_MODELS}, got '{model}'")
    check_is_fitted(self)

    def _group_coefs(models, lmda_idxs):
        result = {}
        for g, state in models.items():
            use_idx = lmda_idxs.get(g, -1) if lmda_idx is None else lmda_idx
            result[self._label(g)] = {
                "coef": _coef_at(state, use_idx),
                "intercept": np.asarray(state.intercepts[use_idx]).ravel(),
            }
        return result

    result = {}
    if model in ("all", "overall"):
        result["overall"] = {
            "coef": self.overall_coef_,
            "intercept": np.asarray(
                self.overall_model_.intercepts[self.overall_lmda_idx_]
            ).ravel(),
        }
    if model in ("all", "pretrain"):
        result["pretrain"] = _group_coefs(self.pretrain_models_, self.pretrain_lmda_idx_)
    if model in ("all", "individual"):
        result["individual"] = _group_coefs(self.individual_models_, self.individual_lmda_idx_)
    return result if model == "all" else result[model]
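A common next step after `get_coef` is to list the features each group model selected. The sketch below operates on the nested dict documented above. For multinomial fits it assumes `"coef"` is a flat vector laid out column-major with shape `(n_features, n_classes)`, which matches how `PretrainedLassoCV._print_cv_summary` reshapes `_coef_at` output on this page. `support` and `support_by_group` are hypothetical helpers, and `fitted_model` is a placeholder name.

```python
def support(coef, n_features, n_classes=None):
    """Indices of features with at least one nonzero coefficient.

    Multinomial case: entry (feature j, class k) of the column-major
    (n_features, n_classes) layout sits at flat position j + k * n_features.
    """
    if n_classes is None:
        return [j for j, c in enumerate(coef) if c != 0]
    return [
        j
        for j in range(n_features)
        if any(coef[j + k * n_features] != 0 for k in range(n_classes))
    ]


def support_by_group(group_coefs, n_features, n_classes=None):
    """Map each group label in get_coef(model="pretrain") output to its selected features."""
    return {g: support(entry["coef"], n_features, n_classes) for g, entry in group_coefs.items()}


# e.g. support_by_group(fitted_model.get_coef(model="pretrain"), fitted_model.n_features_in_)
```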

PretrainedLassoCV

ptlasso.PretrainedLassoCV

Bases: RegressorMixin, BasePretrainedLasso

Pretrained Lasso with cross-validation over alpha.
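In the `fit` source listed in this section, alpha is chosen without withholding data from model fitting. Every model is fitted once on all samples, and each candidate alpha is scored from out-of-fold (OOF) predictions. For each alpha, the stored group-model OOF predictor (kept without its offset) is added to `(1 - alpha)` times the OOF overall predictor. The following is a gaussian-only sketch of that scan, using mean squared error as the loss. `oof_alpha_scan` is a hypothetical name, and its inputs are plain lists rather than ptlasso objects.

```python
def oof_alpha_scan(y, preval_offset, pretrain_oof, alphas):
    """Score each alpha from OOF linear predictors and return (best_alpha, losses).

    preval_offset: OOF overall linear predictor, one value per sample.
    pretrain_oof[alpha]: OOF group-model linear predictor for that alpha,
        WITHOUT the overall offset.
    """
    losses = {}
    for a in alphas:
        eta = [g + (1.0 - a) * o for g, o in zip(pretrain_oof[a], preval_offset)]
        losses[a] = sum((t - e) ** 2 for t, e in zip(y, eta)) / len(y)  # MSE
    best = min(alphas, key=losses.__getitem__)  # lowest loss wins; ties keep the first
    return best, losses


best, losses = oof_alpha_scan(
    y=[1.0, 2.0],
    preval_offset=[1.0, 2.0],
    pretrain_oof={0.0: [0.0, 0.0], 1.0: [0.5, 0.5]},
    alphas=[0.0, 1.0],
)
```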

Parameters:

Name	Type	Description	Default
`alphas`	`array-like or None`	Candidate pretraining strengths. Defaults to `[0, 0.25, 0.5, 0.75, 1.0]`. `0` = maximum pretraining; `1` = no pretraining (individual models).	`None`
`n_folds`	`int`	Number of folds used for (a) adelie's internal lambda-selection CV and (b) the OOF predictions that drive alpha selection. Capped by the minimum per-group class size for classification families.	`10`
`alphahat_choice`	`(overall, mean)`	`"overall"` minimises the global CV error; `"mean"` minimises the unweighted mean of per-group CV errors.	`"overall"`
`family`	`(gaussian, binomial, multinomial)`	Response distribution.	`"gaussian"`
`overall_lambda`	`('lambda.1se', 'lambda.min')`	Lambda selection rule for the stage-1 overall model.	`"lambda.1se"`
`fit_intercept`	`bool`	Whether to fit an intercept in every sub-model.	`True`
`lmda_path_size`	`int`	Number of lambdas in the regularisation path.	`100`
`min_ratio`	`float`	Ratio of the smallest to largest lambda on the path.	`0.0001`
`verbose`	`bool`	Whether to display fitting progress and a summary after training. Adelie's internal output is always suppressed regardless of this setting.	`True`
`standardize`	`bool`	Whether to standardize features before fitting each sub-model. Each model standardizes using only its own training data subset, matching R's `glmnet(standardize=TRUE)` per-model behaviour.	`True`
`foldid`	`array-like of int or None`	Fold assignments, one integer per sample. When provided, overrides the internal `StratifiedKFold` splitter.	`None`
`n_threads`	`int`	Number of threads passed to adelie's solver. Set to a higher value to parallelise the coordinate descent within each model fit. `-1` uses all available CPU cores (`os.cpu_count()`).	`-1`
`type_measure`	`(deviance, mse, mae, auc, class)`	CV criterion used for both internal lambda selection and alpha selection, matching R's single `type.measure` parameter. `"deviance"`: family-based loss (MSE for gaussian, log-loss for binomial/multinomial). `"mse"`: mean squared error. `"mae"`: mean absolute error. `"auc"`: area under the ROC curve (binomial only). `"class"`: classification error (binomial/multinomial). `cv_results_` always stores values as losses (lower = better), so AUC appears as `-AUC`.	`"deviance"`
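Under the `type_measure` convention, every criterion is stored as a loss, with AUC negated. The following pure-Python sketch shows four of the measures for a binomial response. `"deviance"` is omitted because it depends on the family. The 0.5 cut-off for `"class"` is an assumption. `cv_loss` and `auc` are hypothetical helpers, not ptlasso's internal scorers.

```python
def auc(y_true, p):
    """Area under the ROC curve: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, p) if t == 1]
    neg = [s for t, s in zip(y_true, p) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def cv_loss(y_true, p, type_measure):
    """Return a lower-is-better loss for binomial labels y_true and probabilities p."""
    n = len(y_true)
    if type_measure == "mse":
        return sum((t - s) ** 2 for t, s in zip(y_true, p)) / n
    if type_measure == "mae":
        return sum(abs(t - s) for t, s in zip(y_true, p)) / n
    if type_measure == "class":
        return sum(t != (1 if s >= 0.5 else 0) for t, s in zip(y_true, p)) / n  # assumed cut-off
    if type_measure == "auc":
        return -auc(y_true, p)  # negated so that lower is better, as in cv_results_
    raise ValueError(f"type_measure not covered by this sketch: {type_measure!r}")
```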

Attributes:

Name	Type	Description
`alpha_`	`float`	Best alpha selected by CV (based on `alphahat_choice`).
`varying_alphahat_`	`dict {group -> float}`	Per-group best alpha.
`cv_results_`	`dict {alpha -> float}`	Global mean CV loss per alpha.
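`cv_results_se_`	`dict {alpha -> float}`	Standard error of global CV loss. Currently always `0.0`: the OOF-based procedure scores one pooled set of predictions, so there are no per-fold losses to average.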
`cv_results_per_group_`	`dict {alpha -> {group -> float}}`	Mean CV loss per alpha per group.
`cv_results_mean_`	`dict {alpha -> float}`	Unweighted mean of per-group CV losses.
`cv_results_wtd_mean_`	`dict {alpha -> float}`	Group-size-weighted mean of per-group CV losses.
`cv_results_individual_`	`float`	Global CV loss for the individual (no-pretraining) model.
`cv_results_overall_`	`float`	Global CV loss for the overall model.
`best_estimator_`	`PretrainedLasso`	Full-data refit at the globally selected `alpha_`.
`all_estimators_`	`dict {alpha -> PretrainedLasso}`	Full-data fits for `alpha_` and for each unique alpha in `varying_alphahat_`.
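The aggregate attributes above are all derived from `cv_results_per_group_`. The pure-Python sketch below follows the aggregation and selection steps in the `fit` source on this page:

- NaN group losses are skipped.
- The weighted mean weights each group by its share of all samples.
- `alphahat_choice` selects which table `alpha_` minimises.
- Each group's entry in `varying_alphahat_` is the alpha with that group's lowest loss, where a missing entry counts as +inf.

The function names are hypothetical.

```python
import math


def group_means(per_group, group_sizes):
    """Return (unweighted, size-weighted) mean per-group loss for each alpha, skipping NaNs."""
    n = sum(group_sizes.values())
    mean, wtd_mean = {}, {}
    for a, losses in per_group.items():
        finite = {g: v for g, v in losses.items() if not math.isnan(v)}
        mean[a] = sum(finite.values()) / len(finite) if finite else math.nan
        wtd_mean[a] = sum(v * group_sizes[g] / n for g, v in finite.items())
    return mean, wtd_mean


def choose_alpha(cv_results, cv_results_mean, alphahat_choice="overall"):
    """Global alpha: minimise the pooled loss, or the unweighted group mean for "mean"."""
    criterion = cv_results_mean if alphahat_choice == "mean" else cv_results
    return min(criterion, key=criterion.__getitem__)


def per_group_alpha(per_group):
    """Each group's own best alpha; a group missing for some alpha counts as +inf there."""
    alphas = list(per_group)
    groups = {g for losses in per_group.values() for g in losses}
    return {g: min(alphas, key=lambda a, g=g: per_group[a].get(g, math.inf)) for g in groups}


per_group = {0.0: {"A": 1.0, "B": 3.0}, 1.0: {"A": 2.0, "B": 1.0}}
mean, wtd_mean = group_means(per_group, {"A": 3, "B": 1})
```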

References

Craig, E., Pilanci, M., Le Menestrel, T., Narasimhan, B., Rivas, M. A., Gullaksen, S. E., & Tibshirani, R. (2025). Pretraining and the lasso. Journal of the Royal Statistical Society Series B, qkaf050.

Source code in src/ptlasso/_cv.py

class PretrainedLassoCV(RegressorMixin, BasePretrainedLasso):
    """Pretrained Lasso with cross-validation over alpha.

    Parameters
    ----------
    alphas : array-like or None, default=None
        Candidate pretraining strengths.  Defaults to [0, 0.25, 0.5, 0.75, 1.0].
        ``0`` = maximum pretraining; ``1`` = no pretraining (individual models).
    n_folds : int, default=10
        Number of folds used for (a) adelie's internal lambda-selection CV and
        (b) the OOF predictions that drive alpha selection.  Capped by the
        minimum per-group class size for classification families.
    alphahat_choice : {"overall", "mean"}, default="overall"
        ``"overall"`` minimises the global CV error; ``"mean"`` minimises the
        unweighted mean of per-group CV errors.
    family : {"gaussian", "binomial", "multinomial"}, default="gaussian"
        Response distribution.
    overall_lambda : {"lambda.1se", "lambda.min"}, default="lambda.1se"
        Lambda selection rule for the stage-1 overall model.
    fit_intercept : bool, default=True
        Whether to fit an intercept in every sub-model.
    lmda_path_size : int, default=100
        Number of lambdas in the regularisation path.
    min_ratio : float, default=0.0001
        Ratio of the smallest to largest lambda on the path.
    verbose : bool, default=True
        Whether to display fitting progress and a summary after training.
        Adelie's internal output is always suppressed regardless of this setting.
    standardize : bool, default=True
        Whether to standardize features before fitting each sub-model.
        Each model standardizes using only its own training data subset,
        matching R's ``glmnet(standardize=TRUE)`` per-model behaviour.
    foldid : array-like of int or None, default=None
        Fold assignments, one integer per sample.  When provided, overrides
        the internal ``StratifiedKFold`` splitter.
    n_threads : int, default=-1
        Number of threads passed to adelie's solver.  Set to a higher value
        to parallelise the coordinate descent within each model fit.
        ``-1`` uses all available CPU cores (``os.cpu_count()``).
    type_measure : {"deviance", "mse", "mae", "auc", "class"}, default="deviance"
        CV criterion used for both internal lambda selection and alpha selection,
        matching R's single ``type.measure`` parameter.

        - ``"deviance"`` — family-based loss (MSE for gaussian, log-loss for
          binomial/multinomial).
        - ``"mse"`` — mean squared error.
        - ``"mae"`` — mean absolute error.
        - ``"auc"`` — area under the ROC curve (binomial only).
        - ``"class"`` — classification error (binomial/multinomial).

        ``cv_results_`` always stores values as *losses* (lower = better),
        so AUC will appear as ``-AUC``.

    Attributes
    ----------
    alpha_ : float
        Best alpha selected by CV (based on ``alphahat_choice``).
    varying_alphahat_ : dict {group -> float}
        Per-group best alpha.
    cv_results_ : dict {alpha -> float}
        Global mean CV loss per alpha.
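    cv_results_se_ : dict {alpha -> float}
        Standard error of global CV loss.  Currently always ``0.0``: the
        OOF-based procedure scores one pooled set of predictions, so there
        are no per-fold losses to average.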
    cv_results_per_group_ : dict {alpha -> {group -> float}}
        Mean CV loss per alpha per group.
    cv_results_mean_ : dict {alpha -> float}
        Unweighted mean of per-group CV losses.
    cv_results_wtd_mean_ : dict {alpha -> float}
        Group-size-weighted mean of per-group CV losses.
    cv_results_individual_ : float
        Global CV loss for the individual (no-pretraining) model.
    cv_results_overall_ : float
        Global CV loss for the overall model.
    best_estimator_ : PretrainedLasso
        Full-data refit at the globally selected ``alpha_``.
    all_estimators_ : dict {alpha -> PretrainedLasso}
        Full-data fits for ``alpha_`` and for each unique alpha in
        ``varying_alphahat_``.

    References
    ----------
    Craig, E., Pilanci, M., Le Menestrel, T., Narasimhan, B., Rivas, M. A.,
    Gullaksen, S. E., & Tibshirani, R. (2025). Pretraining and the lasso.
    *Journal of the Royal Statistical Society Series B*, qkaf050.
    """

    def __init__(
        self,
        alphas=DEFAULT_ALPHAS,
        n_folds=10,
        alphahat_choice="overall",
        family="gaussian",
        overall_lambda="lambda.1se",
        fit_intercept=True,
        lmda_path_size=100,
        min_ratio=0.0001,
        verbose=True,
        foldid=None,
        n_threads=-1,
        standardize=True,
        type_measure="deviance",
    ):
        self.alphas = alphas
        self.n_folds = n_folds
        self.alphahat_choice = alphahat_choice
        self.family = family
        self.overall_lambda = overall_lambda
        self.fit_intercept = fit_intercept
        self.lmda_path_size = lmda_path_size
        self.min_ratio = min_ratio
        self.verbose = verbose
        self.foldid = foldid
        self.n_threads = n_threads
        self.standardize = standardize
        self.type_measure = type_measure
        # fitted attributes (set by fit())
        self.groups_: np.ndarray
        self.n_classes_: Optional[int]

    # ------------------------------------------------------------------
    # Internal helpers
    # ------------------------------------------------------------------

    def _validate_params(self):
        alphas = self._get_alphas()  # resolves alphas=None to DEFAULT_ALPHAS
        bad = [a for a in alphas if not (0.0 <= a <= 1.0)]
        if bad:
            raise ValueError(f"All alphas must be in [0, 1], got {bad}")
        if len(set(alphas)) != len(alphas):
            raise ValueError(f"alphas must not contain duplicates, got {alphas}")
        if not isinstance(self.n_folds, int) or self.n_folds < 2:
            raise ValueError(f"n_folds must be an integer >= 2, got {self.n_folds}")
        if self.alphahat_choice not in ("overall", "mean"):
            raise ValueError(
                f"alphahat_choice must be 'overall' or 'mean', got '{self.alphahat_choice}'"
            )
        if self.family not in FAMILIES:
            raise ValueError(f"family must be one of {FAMILIES}, got '{self.family}'")
        if self.overall_lambda not in LMDA_MODES:
            raise ValueError(f"overall_lambda must be one of {LMDA_MODES}")
        if not isinstance(self.lmda_path_size, int) or self.lmda_path_size < 1:
            raise ValueError("lmda_path_size must be a positive integer")
        if not (0.0 < self.min_ratio < 1.0):
            raise ValueError("min_ratio must be in (0, 1)")
        _valid_type_measures = ("deviance", "mse", "mae", "auc", "class")
        if self.type_measure not in _valid_type_measures:
            raise ValueError(
                f"type_measure must be one of {_valid_type_measures}, got '{self.type_measure}'"
            )

    def _get_alphas(self):
        """Return the list of alpha candidates."""
        return list(self.alphas if self.alphas is not None else DEFAULT_ALPHAS)

    def _base_estimator(self, alpha):
        """Return a configured PretrainedLasso for a given alpha."""
        return PretrainedLasso(
            alpha=alpha,
            family=self.family,
            overall_lambda=self.overall_lambda,
            fit_intercept=self.fit_intercept,
            lmda_path_size=self.lmda_path_size,
            min_ratio=self.min_ratio,
            verbose=False,  # CV sub-fits are silent; PretrainedLassoCV owns the progress bar
            n_threads=self.n_threads,
            standardize=self.standardize,
            n_folds=self.n_folds,
            type_measure=self.type_measure,
        )

    # ------------------------------------------------------------------
    # Progress / display helpers
    # ------------------------------------------------------------------

    def _print_cv_summary(self, elapsed):
        SEP = "─" * 54
        rows = [SEP, f"    {'α':<8} {'CV loss':<12} {'±SE'}", f"  {'─' * 34}"]
        for a in self.alphalist_:
            marker = "►" if a == self.alpha_ else " "
            rows.append(
                f" {marker}  {a:<8.2f} {self.cv_results_[a]:<12.4f} {self.cv_results_se_[a]:.4f}"
            )
        rows += [
            f"  {'─' * 34}",
            f"  {'individual':<13}{self.cv_results_individual_:.4f}",
            f"  {'overall':<13}{self.cv_results_overall_:.4f}",
            SEP,
        ]
        _logger.info("\n".join(rows))
        best = self.best_estimator_
        stage2 = set()
        for g in self.groups_:
            c = _coef_at(best.pretrain_models_[g], best.pretrain_lmda_idx_[g])
            if best.n_classes_ is not None:
                c = c.reshape(best.n_features_in_, best.n_classes_, order="F")
                stage2 |= set(int(i) for i in np.where(np.any(c != 0, axis=1))[0])
            else:
                stage2 |= set(int(i) for i in np.where(c != 0)[0])
        _logger.info(
            f"  Best α = {self.alpha_:.2f}   |S| = {len(stage2)}"
            f"   Fitted in {elapsed:.1f}s"
            f"  ({len(self.alphalist_)} alphas · OOF-based)"
        )

    # ------------------------------------------------------------------
    # fit
    # ------------------------------------------------------------------

    def fit(self, X, y, groups, group_labels=None, feature_names=None):
        """Fit PretrainedLassoCV.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training data.
        y : array-like of shape (n_samples,)
            Target values.  For ``family="binomial"``, must contain only 0
            and 1.  For ``family="multinomial"``, must contain non-negative
            integer class labels 0..K-1.
        groups : array-like of shape (n_samples,)
            Group membership for each sample.  Must contain at least two
            distinct values.
        group_labels : dict or None, default=None
            Optional mapping from group values to display names used in
            ``__repr__`` and ``get_coef()``.
        feature_names : array-like of str or None, default=None
            Feature names.  Inferred from ``X.columns`` when ``X`` is a
            DataFrame and this argument is ``None``.

        Returns
        -------
        self : PretrainedLassoCV
            Fitted estimator.
        """
        t0 = time.time()
        self._validate_params()

        if feature_names is None and hasattr(X, "columns"):
            feature_names = list(X.columns)

        X, y = check_X_y(X, y, dtype=np.float64, order="F")
        groups = np.asarray(groups)

        self.n_features_in_ = X.shape[1]
        self.group_labels_ = group_labels or {}

        alphas = self._get_alphas()
        self.alphalist_ = np.asarray(alphas)

        unique_groups = np.unique(groups)
        group_sizes = {g: int(np.sum(groups == g)) for g in unique_groups}

        scorer_fn = _type_measure_to_scorer(self.type_measure)

        if self.verbose:
            _enable_verbose_logging()
            _logger.info(
                f"\nPretrainedLassoCV  {self.family}  ·  {len(unique_groups)} groups"
                f"  ·  {self.n_features_in_} features"
                f"  ·  {len(alphas)} alphas · OOF-based (all data)"
            )
            from tqdm import tqdm as _tqdm

        # --- R-style architecture: train all models on full data, evaluate
        #     via within-model OOF predictions (mirrors cv.glmnet keep=TRUE).
        #     No data is withheld from model fitting. ---

        if self.verbose:
            _logger.info("\n── Full-data fit (all samples) " + "─" * 26)
            _logger.info("  Overall model (λ-path):")

        # Step 1: fit stage-1 (overall model + OOF) once on ALL data.
        _stage1 = self._base_estimator(alphas[0])
        _stage1._show_overall_progress = self.verbose
        _stage1.fit(X, y, groups, group_labels=group_labels, feature_names=feature_names)
        _preval_offset = _stage1._preval_offset  # OOF overall linear predictor: (n,), or (n, K) for multinomial

        if self.verbose:
            _n_support = (
                int(np.sum(_stage1.overall_coef_ != 0))
                if self.family != "multinomial"
                else int(np.sum(np.any(_stage1.overall_coef_ != 0, axis=1)))
            )
            _logger.info(f"  Overall model done  |S|={_n_support}")

        # Convert to float64 once; per-group standardization happens inside
        # _group_cv and _fit_groups_only via group_scalers_.
        X = np.asarray(X, dtype=np.float64)

        # Per-alpha OOF accumulators (all n samples, no fold averaging).
        oof_losses = {}
        oof_losses_grp = {a: {} for a in alphas}
        all_estimators_full = {}

        # Pre-compute individual OOF once — doesn't depend on alpha.
        if self.family == "multinomial":
            _oof_ind_eta = np.zeros((len(y), _stage1.n_classes_))
        else:
            _oof_ind_eta = np.zeros(len(y))
        for _g in unique_groups:
            _mask = groups == _g
            _oof_ind_eta[_mask] = _stage1._individual_oof_[_g]
        self.cv_results_individual_ = _cv_loss(
            scorer_fn, y, _apply_link(_oof_ind_eta, self.family), self.family
        )

        # Overall OOF score comes directly from _preval_offset.
        self.cv_results_overall_ = _cv_loss(
            scorer_fn, y, _apply_link(_preval_offset, self.family), self.family
        )

        _pbar = None
        if self.verbose:
            _pbar = _tqdm(total=len(alphas), desc="  Group fits", unit="α", leave=True)
            _pbar.refresh()

        for i, a in enumerate(alphas):
            # Step 2: fit stage-2 group models for this alpha on ALL data.
            if i == 0:
                est = _stage1  # already fully fitted
            else:
                est = self._base_estimator(a)
                est.n_features_in_ = _stage1.n_features_in_
                est.groups_ = _stage1.groups_
                est.group_labels_ = _stage1.group_labels_
                est.n_classes_ = _stage1.n_classes_
                est.feature_names_in_ = _stage1.feature_names_in_
                est._n_onehot_ = _stage1._n_onehot_
                est.scaler_ = _stage1.scaler_
                est.group_scalers_ = _stage1.group_scalers_
                est.overall_model_ = _stage1.overall_model_
                est.overall_lmda_idx_ = _stage1.overall_lmda_idx_
                est.overall_coef_ = _stage1.overall_coef_
                est._n_folds_oof_ = _stage1._n_folds_oof_
                if self.family != "multinomial":
                    est.overall_intercept_ = _stage1.overall_intercept_
                est._fit_groups_only(X, y, groups, _preval_offset)
                # Individual models are alpha-independent — reuse from _stage1.
                est.individual_models_ = _stage1.individual_models_
                est.individual_lmda_idx_ = _stage1.individual_lmda_idx_
                est._individual_oof_ = _stage1._individual_oof_
            all_estimators_full[a] = est

            # Step 3: read stored OOF pretrain predictions (computed inside _group_cv
            # with the same fold split used for lambda selection — keep=TRUE equivalent).
            if self.family == "multinomial":
                oof_pre_eta = np.zeros((len(y), est.n_classes_))
            else:
                oof_pre_eta = np.zeros(len(y))

            for g in unique_groups:
                mask = groups == g
                # _pretrain_oof_[g] is WITHOUT offset; add overall OOF offset here.
                oof_pre_eta[mask] = est._pretrain_oof_[g] + (1.0 - a) * _preval_offset[mask]

                y_pred_g = _apply_link(oof_pre_eta[mask], self.family)
                oof_losses_grp[a][g] = _cv_loss(scorer_fn, y[mask], y_pred_g, self.family)

            y_pred_pre = _apply_link(oof_pre_eta, self.family)
            oof_losses[a] = _cv_loss(scorer_fn, y, y_pred_pre, self.family)

            if _pbar is not None:
                _pbar.update(1)
                _pbar.set_postfix(alpha=f"{a:.2f}", refresh=False)

        if _pbar is not None:
            _pbar.close()

        # Aggregate OOF results.
        self.cv_results_ = oof_losses
        self.cv_results_se_ = {a: 0.0 for a in alphas}  # no SE without fold averaging
        self.cv_results_per_group_ = oof_losses_grp
        self.cv_results_mean_ = {
            a: float(np.nanmean(list(oof_losses_grp[a].values()))) for a in alphas
        }
        self.cv_results_wtd_mean_ = {
            a: float(
                np.nansum(
                    [
                        oof_losses_grp[a][g] * group_sizes[g] / len(y)
                        for g in unique_groups
                        if not np.isnan(oof_losses_grp[a].get(g, float("nan")))
                    ]
                )
            )
            for a in alphas
        }

        # Select global best alpha.
        criterion = self.cv_results_mean_ if self.alphahat_choice == "mean" else self.cv_results_
        self.alpha_ = min(alphas, key=criterion.__getitem__)

        # Per-group best alpha.
        self.varying_alphahat_ = {
            g: min(alphas, key=lambda a, _g=g: self.cv_results_per_group_[a].get(_g, np.inf))
            for g in unique_groups
        }

        # All models were already fitted on full data above — no refit needed.
        unique_alphas = set(self.varying_alphahat_.values()) | {self.alpha_}
        fit_kwargs = dict(group_labels=group_labels, feature_names=feature_names)
        self.all_estimators_ = {}
        for a in unique_alphas:
            if a in all_estimators_full:
                self.all_estimators_[a] = all_estimators_full[a]
            else:
                # Shouldn't happen: all alphalist_ alphas were fitted above.
                self.all_estimators_[a] = self._base_estimator(a).fit(X, y, groups, **fit_kwargs)
        self.best_estimator_ = self.all_estimators_[self.alpha_]

        # Mirror fitted attributes from the best estimator for a uniform interface
        for attr in (
            "overall_model_",
            "overall_lmda_idx_",
            "overall_coef_",
            "pretrain_models_",
            "pretrain_lmda_idx_",
            "individual_models_",
            "individual_lmda_idx_",
            "groups_",
            "feature_names_in_",
            "n_classes_",
            "_n_onehot_",
        ):
            setattr(self, attr, getattr(self.best_estimator_, attr))
        if self.family != "multinomial":
            self.overall_intercept_ = self.best_estimator_.overall_intercept_

        if self.verbose:
            self._print_cv_summary(time.time() - t0)

        return self

    # ------------------------------------------------------------------
    # predict / score / evaluate
    # ------------------------------------------------------------------

    @property
    def alpha(self):
        """Selected alpha — mirrors PretrainedLasso.alpha for a uniform interface."""
        check_is_fitted(self, ["alpha_"])
        return self.alpha_

    def predict(
        self, X, groups, model="pretrain", type="response", alphatype="best", lmda_idx=None
    ):
        """Predict target values.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
        groups : array-like of shape (n_samples,)
        model : {"pretrain", "individual", "overall"}, default="pretrain"
            Which sub-model to predict with.
        type : {"response", "link", "class"}, default="response"
            Scale of the returned predictions.  See :meth:`PretrainedLasso.predict`
            for full documentation.
        alphatype : {"best", "varying"}, default="best"
            ``"best"`` uses the globally selected ``alpha_``; ``"varying"``
            uses each group's own optimal alpha from ``varying_alphahat_``.
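        lmda_idx : int or None
            Lambda index forwarded to :meth:`PretrainedLasso.predict` for the
            group models.  ``None`` uses each group model's stored
            (CV-selected) lambda index.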

        Returns
        -------
        y_pred : ndarray of shape (n_samples,) or (n_samples, K)
            Shape is ``(n_samples, K)`` for multinomial ``"response"`` or
            ``"link"``.
        """
        check_is_fitted(self)
        groups = np.asarray(groups)

        if model not in PREDICT_MODELS:
            raise ValueError(f"model must be one of {PREDICT_MODELS}, got '{model}'")
        if type not in PREDICT_TYPES:
            raise ValueError(f"type must be one of {PREDICT_TYPES}, got '{type}'")
        if alphatype not in ALPHATYPES:
            raise ValueError(f"alphatype must be one of {ALPHATYPES}, got '{alphatype}'")

        if alphatype == "varying":
            X = check_array(X, dtype=np.float64, order="F")
            n_out = (X.shape[0], self.n_classes_) if self.family == "multinomial" else (X.shape[0],)
            y_pred = np.empty(n_out)
            for g in np.unique(groups):
                mask = groups == g
                a = self.varying_alphahat_.get(g, self.alpha_)
                y_pred[mask] = self.all_estimators_[a].predict(
                    X[mask], groups[mask], model=model, type=type, lmda_idx=lmda_idx
                )
            return y_pred

        return self.best_estimator_.predict(
            X=X, groups=groups, model=model, type=type, lmda_idx=lmda_idx
        )

    def evaluate(self, X, y, groups, alphatype="best"):
        """Predict and score with all three sub-models.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
        y : array-like of shape (n_samples,)
        groups : array-like of shape (n_samples,)
        alphatype : {"best", "varying"}, default="best"
            ``"best"`` uses ``alpha_``; ``"varying"`` uses each group's own
            optimal alpha from ``varying_alphahat_``.

        Returns
        -------
        result : dict
            Keys are ``"pretrain"``, ``"individual"``, ``"overall"``.
            Each value is a dict with entries:

            ``"predictions"`` : ndarray
                Predicted values.
            ``"score"`` : float
                R² for gaussian; accuracy for binomial/multinomial.
        """
        check_is_fitted(self)
        result = {}
        for m in ("pretrain", "individual", "overall"):
            preds = self.predict(X, groups, model=m, alphatype=alphatype)
            result[m] = {"predictions": preds, "score": _model_score(y, preds, self.family)}
        return result

    def score(self, X, y, groups):
        """Return a scalar performance metric using the best estimator.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
        y : array-like of shape (n_samples,)
        groups : array-like of shape (n_samples,)

        Returns
        -------
        score : float
            R² for gaussian; classification accuracy for binomial/multinomial.
        """
        check_is_fitted(self)
        return self.best_estimator_.score(X, y, groups)

    # ------------------------------------------------------------------
    # repr / get_coef
    # ------------------------------------------------------------------

    def __repr__(self):
        alphas = self._get_alphas()
        header = (
            f"PretrainedLassoCV(alphas={alphas}, n_folds={self.n_folds}, "
            f"alphahat_choice='{self.alphahat_choice}', "
            f"family='{self.family}', overall_lambda='{self.overall_lambda}')"
        )
        if not hasattr(self, "best_estimator_"):
            return header + "\n  [not fitted]"

        rows = [
            f"  alpha={a:.2f}  overall={self.cv_results_[a]:.4f}"
            f"  mean={self.cv_results_mean_[a]:.4f}"
            f"  wtd_mean={self.cv_results_wtd_mean_[a]:.4f}"
            for a in self.alphalist_
        ]
        return (
            f"{header}\n"
            f"  family     : {self.family}\n"
            f"  n_features : {self.n_features_in_}\n"
            f"  n_groups   : {len(self.groups_)}\n"
            f"  alpha_     : {self.alpha_}  ({self.alphahat_choice})\n"
            f"  individual : {self.cv_results_individual_:.4f}\n"
            f"  overall    : {self.cv_results_overall_:.4f}\n"
            f"  CV results :\n" + "\n".join(rows)
        )

    # ------------------------------------------------------------------
    # Serialisation
    # ------------------------------------------------------------------

    def __getstate__(self):
        d = self.__dict__.copy()
        # Convert the adelie state refs mirrored directly onto this object.
        # all_estimators_ / best_estimator_ are PretrainedLasso instances and
        # go through PretrainedLasso.__getstate__ automatically.
        _proxify_models(d)
        return d

    # ------------------------------------------------------------------
    # get_coef
    # ------------------------------------------------------------------

    def get_coef(self, model="all", **kwargs):
        """Return fitted coefficients from the best estimator.

        Delegates to :meth:`PretrainedLasso.get_coef`.

        Parameters
        ----------
        model : {"all", "overall", "pretrain", "individual"}, default="all"
        **kwargs
            Forwarded to :meth:`PretrainedLasso.get_coef`.

        Returns
        -------
        coefs : dict
            See :meth:`PretrainedLasso.get_coef` for the structure.
        """
        check_is_fitted(self)
        return self.best_estimator_.get_coef(model=model, **kwargs)

alpha `property`

alpha

Selected alpha; mirrors `PretrainedLasso.alpha` for a uniform interface.

fit

fit(X, y, groups, group_labels=None, feature_names=None)

Fit PretrainedLassoCV.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	Training data.	required
`y`	`array-like of shape (n_samples,)`	Target values. For `family="binomial"`, must contain only 0 and 1. For `family="multinomial"`, must contain non-negative integer class labels 0..K-1.	required
`groups`	`array-like of shape (n_samples,)`	Group membership for each sample. Must contain at least two distinct values.	required
`group_labels`	`dict or None`	Optional mapping from group values to display names used in `__repr__` and `get_coef()`.	`None`
`feature_names`	`array-like of str or None`	Feature names. Inferred from `X.columns` when `X` is a DataFrame and this argument is `None`.	`None`

Returns:

Name	Type	Description
`self`	`PretrainedLassoCV`	Fitted estimator.
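The constraints on `y` in the table above can be checked before calling `fit`. The sketch below is a hypothetical stand-alone validator (`check_targets` is not part of ptlasso). It assumes multinomial labels run from 0 to the largest observed label, and it returns the number of classes implied by `y`.

```python
def check_targets(y, family):
    """Hypothetical pre-fit check of ``y`` against the documented constraints.

    Returns the implied number of classes (None for gaussian).
    """
    if family == "gaussian":
        return None  # any real-valued response is allowed
    if family == "binomial":
        bad = sorted({v for v in y if v not in (0, 1)})
        if bad:
            raise ValueError(f"binomial y must contain only 0 and 1, got {bad}")
        return 2
    if family == "multinomial":
        if any(v < 0 or int(v) != v for v in y):
            raise ValueError("multinomial y must hold non-negative integer labels")
        # Assumption: classes are labelled 0..K-1, so K = max label + 1.
        return int(max(y)) + 1
    raise ValueError(f"unknown family: {family!r}")
```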

Source code in src/ptlasso/_cv.py

def fit(self, X, y, groups, group_labels=None, feature_names=None):
    """Fit PretrainedLassoCV.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.
    y : array-like of shape (n_samples,)
        Target values.  For ``family="binomial"``, must contain only 0
        and 1.  For ``family="multinomial"``, must contain non-negative
        integer class labels 0..K-1.
    groups : array-like of shape (n_samples,)
        Group membership for each sample.  Must contain at least two
        distinct values.
    group_labels : dict or None, default=None
        Optional mapping from group values to display names used in
        ``__repr__`` and ``get_coef()``.
    feature_names : array-like of str or None, default=None
        Feature names.  Inferred from ``X.columns`` when ``X`` is a
        DataFrame and this argument is ``None``.

    Returns
    -------
    self : PretrainedLassoCV
        Fitted estimator.
    """
    t0 = time.time()
    self._validate_params()

    if feature_names is None and hasattr(X, "columns"):
        feature_names = list(X.columns)

    X, y = check_X_y(X, y, dtype=np.float64, order="F")
    groups = np.asarray(groups)

    self.n_features_in_ = X.shape[1]
    self.group_labels_ = group_labels or {}

    alphas = self._get_alphas()
    self.alphalist_ = np.asarray(alphas)

    unique_groups = np.unique(groups)
    group_sizes = {g: int(np.sum(groups == g)) for g in unique_groups}

    scorer_fn = _type_measure_to_scorer(self.type_measure)

    if self.verbose:
        _enable_verbose_logging()
        _logger.info(
            f"\nPretrainedLassoCV  {self.family}  ·  {len(unique_groups)} groups"
            f"  ·  {self.n_features_in_} features"
            f"  ·  {len(alphas)} alphas · OOF-based (all data)"
        )
        from tqdm import tqdm as _tqdm

    # --- R-style architecture: train all models on full data, evaluate
    #     via within-model OOF predictions (mirrors cv.glmnet keep=TRUE).
    #     No data is withheld from model fitting. ---

    if self.verbose:
        _logger.info("\n── Full-data fit (all samples) " + "─" * 26)
        _logger.info("  Overall model (λ-path):")

    # Step 1: fit stage-1 (overall model + OOF) once on ALL data.
    _stage1 = self._base_estimator(alphas[0])
    _stage1._show_overall_progress = self.verbose
    _stage1.fit(X, y, groups, group_labels=group_labels, feature_names=feature_names)
    _preval_offset = _stage1._preval_offset  # (n,) OOF overall linear predictor

    if self.verbose:
        _n_support = (
            int(np.sum(_stage1.overall_coef_ != 0))
            if self.family != "multinomial"
            else int(np.sum(np.any(_stage1.overall_coef_ != 0, axis=1)))
        )
        _logger.info(f"  Overall model done  |S|={_n_support}")

    # Convert to float64 once; per-group standardization happens inside
    # _group_cv and _fit_groups_only via group_scalers_.
    X = np.asarray(X, dtype=np.float64)

    # Per-alpha OOF accumulators (all n samples, no fold averaging).
    oof_losses = {}
    oof_losses_grp = {a: {} for a in alphas}
    all_estimators_full = {}

    # Pre-compute individual OOF once — doesn't depend on alpha.
    if self.family == "multinomial":
        _oof_ind_eta = np.zeros((len(y), _stage1.n_classes_))
    else:
        _oof_ind_eta = np.zeros(len(y))
    for _g in unique_groups:
        _mask = groups == _g
        _oof_ind_eta[_mask] = _stage1._individual_oof_[_g]
    self.cv_results_individual_ = _cv_loss(
        scorer_fn, y, _apply_link(_oof_ind_eta, self.family), self.family
    )

    # Overall OOF score comes directly from _preval_offset.
    self.cv_results_overall_ = _cv_loss(
        scorer_fn, y, _apply_link(_preval_offset, self.family), self.family
    )

    _pbar = None
    if self.verbose:
        _pbar = _tqdm(total=len(alphas), desc="  Group fits", unit="α", leave=True)
        _pbar.refresh()

    for i, a in enumerate(alphas):
        # Step 2: fit stage-2 group models for this alpha on ALL data.
        if i == 0:
            est = _stage1  # already fully fitted
        else:
            est = self._base_estimator(a)
            est.n_features_in_ = _stage1.n_features_in_
            est.groups_ = _stage1.groups_
            est.group_labels_ = _stage1.group_labels_
            est.n_classes_ = _stage1.n_classes_
            est.feature_names_in_ = _stage1.feature_names_in_
            est._n_onehot_ = _stage1._n_onehot_
            est.scaler_ = _stage1.scaler_
            est.group_scalers_ = _stage1.group_scalers_
            est.overall_model_ = _stage1.overall_model_
            est.overall_lmda_idx_ = _stage1.overall_lmda_idx_
            est.overall_coef_ = _stage1.overall_coef_
            est._n_folds_oof_ = _stage1._n_folds_oof_
            if self.family != "multinomial":
                est.overall_intercept_ = _stage1.overall_intercept_
            est._fit_groups_only(X, y, groups, _preval_offset)
            # Individual models are alpha-independent — reuse from _stage1.
            est.individual_models_ = _stage1.individual_models_
            est.individual_lmda_idx_ = _stage1.individual_lmda_idx_
            est._individual_oof_ = _stage1._individual_oof_
        all_estimators_full[a] = est

        # Step 3: read stored OOF pretrain predictions (computed inside _group_cv
        # with the same fold split used for lambda selection — keep=TRUE equivalent).
        if self.family == "multinomial":
            oof_pre_eta = np.zeros((len(y), est.n_classes_))
        else:
            oof_pre_eta = np.zeros(len(y))

        for g in unique_groups:
            mask = groups == g
            # _pretrain_oof_[g] is WITHOUT offset; add overall OOF offset here.
            oof_pre_eta[mask] = est._pretrain_oof_[g] + (1.0 - a) * _preval_offset[mask]

            y_pred_g = _apply_link(oof_pre_eta[mask], self.family)
            oof_losses_grp[a][g] = _cv_loss(scorer_fn, y[mask], y_pred_g, self.family)

        y_pred_pre = _apply_link(oof_pre_eta, self.family)
        oof_losses[a] = _cv_loss(scorer_fn, y, y_pred_pre, self.family)

        if _pbar is not None:
            _pbar.update(1)
            _pbar.set_postfix(alpha=f"{a:.2f}", refresh=False)

    if _pbar is not None:
        _pbar.close()

    # Aggregate OOF results.
    self.cv_results_ = oof_losses
    self.cv_results_se_ = {a: 0.0 for a in alphas}  # no SE without fold averaging
    self.cv_results_per_group_ = oof_losses_grp
    self.cv_results_mean_ = {
        a: float(np.nanmean(list(oof_losses_grp[a].values()))) for a in alphas
    }
    self.cv_results_wtd_mean_ = {
        a: float(
            np.nansum(
                [
                    oof_losses_grp[a][g] * group_sizes[g] / len(y)
                    for g in unique_groups
                    if not np.isnan(oof_losses_grp[a].get(g, float("nan")))
                ]
            )
        )
        for a in alphas
    }

    # Select global best alpha.
    criterion = self.cv_results_mean_ if self.alphahat_choice == "mean" else self.cv_results_
    self.alpha_ = min(alphas, key=criterion.__getitem__)

    # Per-group best alpha.
    self.varying_alphahat_ = {
        g: min(alphas, key=lambda a, _g=g: self.cv_results_per_group_[a].get(_g, np.inf))
        for g in unique_groups
    }

    # All models were already fitted on full data above — no refit needed.
    unique_alphas = set(self.varying_alphahat_.values()) | {self.alpha_}
    fit_kwargs = dict(group_labels=group_labels, feature_names=feature_names)
    self.all_estimators_ = {}
    for a in unique_alphas:
        if a in all_estimators_full:
            self.all_estimators_[a] = all_estimators_full[a]
        else:
            # Shouldn't happen: all alphalist_ alphas were fitted above.
            self.all_estimators_[a] = self._base_estimator(a).fit(X, y, groups, **fit_kwargs)
    self.best_estimator_ = self.all_estimators_[self.alpha_]

    # Mirror fitted attributes from the best estimator for a uniform interface
    for attr in (
        "overall_model_",
        "overall_lmda_idx_",
        "overall_coef_",
        "pretrain_models_",
        "pretrain_lmda_idx_",
        "individual_models_",
        "individual_lmda_idx_",
        "groups_",
        "feature_names_in_",
        "n_classes_",
        "_n_onehot_",
    ):
        setattr(self, attr, getattr(self.best_estimator_, attr))
    if self.family != "multinomial":
        self.overall_intercept_ = self.best_estimator_.overall_intercept_

    if self.verbose:
        self._print_cv_summary(time.time() - t0)

    return self
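The alpha-selection logic at the end of `fit` can be reproduced on toy numbers. This is a pure-Python illustration, not the package's code. It assumes a gaussian response with mean squared error as the CV loss (the real metric comes from `type_measure`), ignores NaN handling, and uses the hypothetical helpers `mse` and `select_alpha`. For each alpha it rebuilds the OOF linear predictor as `oof + (1 - alpha) * offset`. It then derives the pooled loss (`cv_results_`), the per-group mean (`cv_results_mean_`), the size-weighted mean (`cv_results_wtd_mean_`), the global `alpha_` and the per-group `varying_alphahat_`.

```python
from statistics import mean


def mse(y, pred):
    """Mean squared error; stands in for the estimator's CV loss (an assumption)."""
    return mean((yi - p) ** 2 for yi, p in zip(y, pred))


def select_alpha(alphas, y, groups, offset, pretrain_oof, use_mean=False):
    """Toy version of the alpha-selection step of fit (gaussian only).

    pretrain_oof[a][i] is the stage-2 OOF linear predictor of sample i at
    alpha ``a`` *without* the offset; offset[i] is the overall OOF predictor.
    """
    n = len(y)
    labels = sorted(set(groups))
    sizes = {g: groups.count(g) for g in labels}
    pooled, per_group = {}, {}
    for a in alphas:
        # Add the scaled overall offset back: eta = oof + (1 - a) * offset.
        eta = [p + (1.0 - a) * o for p, o in zip(pretrain_oof[a], offset)]
        pooled[a] = mse(y, eta)  # identity link for gaussian
        per_group[a] = {
            g: mse(
                [yi for yi, gi in zip(y, groups) if gi == g],
                [ei for ei, gi in zip(eta, groups) if gi == g],
            )
            for g in labels
        }
    mean_loss = {a: mean(list(per_group[a].values())) for a in alphas}
    wtd_mean = {a: sum(per_group[a][g] * sizes[g] / n for g in labels) for a in alphas}
    criterion = mean_loss if use_mean else pooled
    best = min(alphas, key=criterion.__getitem__)
    # Each group picks the alpha that minimises its own OOF loss.
    varying = {g: min(alphas, key=lambda a, _g=g: per_group[a][_g]) for g in labels}
    return best, varying, pooled, mean_loss, wtd_mean


# Toy data: group "a" prefers alpha=0, group "b" prefers alpha=1.
alphas = [0.0, 0.5, 1.0]
y = [1.0, 2.0, 3.0, 4.0]
groups = ["a", "a", "b", "b"]
offset = [1.0, 2.0, 3.0, 4.0]
pretrain_oof = {0.0: [0.0, 0.0, 1.0, 1.0], 0.5: [0.0] * 4, 1.0: [0.0, 0.0, 3.0, 4.0]}
best, varying, pooled, mean_loss, wtd_mean = select_alpha(alphas, y, groups, offset, pretrain_oof)
# best == 0.0; varying == {"a": 0.0, "b": 1.0}
```

For a per-sample-average loss such as MSE, the size-weighted mean of per-group losses equals the pooled loss (each group contributes n_g/n times its own average), which the toy run reproduces; the two criteria can diverge for measures that are not per-sample averages.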

predict

predict(X, groups, model='pretrain', type='response', alphatype='best', lmda_idx=None)

Predict target values.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	Samples to predict.	required
`groups`	`array-like of shape (n_samples,)`	Group membership for each sample.	required
`model`	`(pretrain, individual, overall)`	Which sub-model to predict with.	`"pretrain"`
`type`	`(response, link, class)`	Scale of the returned predictions. See :meth:`PretrainedLasso.predict` for full documentation.	`"response"`
`alphatype`	`(best, varying)`	`"best"` uses the globally selected `alpha_`; `"varying"` uses each group's own optimal alpha from `varying_alphahat_`.	`"best"`
`lmda_idx`	`int or None`	Index into each sub-model's lambda path. When `None`, each model uses its CV-selected lambda.	`None`

Returns:

Name	Type	Description
`y_pred`	`ndarray of shape (n_samples,) or (n_samples, K)`	Shape is `(n_samples, K)` for multinomial `"response"` or `"link"`.
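With `alphatype="varying"`, `predict` splits the rows by group and sends each block to the estimator fitted at that group's alpha, falling back to `alpha_` for groups missing from `varying_alphahat_`. A minimal pure-Python sketch of that routing follows; `predict_varying` is hypothetical, and plain callables stand in for `all_estimators_[a].predict`.

```python
def predict_varying(rows, groups, estimators, varying_alphahat, alpha_best):
    """Route each group's rows to the estimator fitted at that group's alpha.

    estimators maps alpha -> callable(list_of_rows) -> list_of_predictions.
    """
    out = [None] * len(rows)
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        # Groups without a per-group alpha fall back to the global alpha_.
        a = varying_alphahat.get(g, alpha_best)
        preds = estimators[a]([rows[i] for i in idx])
        for i, p in zip(idx, preds):
            out[i] = p  # write predictions back in the original row order
    return out


estimators = {
    0.0: lambda r: [x * 10 for x in r],
    0.5: lambda r: [-x for x in r],
    1.0: lambda r: [x + 100 for x in r],
}
out = predict_varying([1, 2, 3, 4], ["a", "b", "a", "c"], estimators, {"a": 0.0, "b": 1.0}, 0.5)
# → [10, 102, 30, -4]  (group "c" uses the fallback alpha 0.5)
```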

Source code in src/ptlasso/_cv.py

def predict(
    self, X, groups, model="pretrain", type="response", alphatype="best", lmda_idx=None
):
    """Predict target values.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
    groups : array-like of shape (n_samples,)
    model : {"pretrain", "individual", "overall"}, default="pretrain"
    type : {"response", "link", "class"}, default="response"
        Scale of the returned predictions.  See :meth:`PretrainedLasso.predict`
        for full documentation.
    alphatype : {"best", "varying"}, default="best"
        ``"best"`` uses the globally selected ``alpha_``; ``"varying"``
        uses each group's own optimal alpha from ``varying_alphahat_``.
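    lmda_idx : int or None, default=None
        Index into each sub-model's lambda path.  When ``None``, each model
        uses its CV-selected lambda.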

    Returns
    -------
    y_pred : ndarray of shape (n_samples,) or (n_samples, K)
        Shape is ``(n_samples, K)`` for multinomial ``"response"`` or
        ``"link"``.
    """
    check_is_fitted(self)
    groups = np.asarray(groups)

    if model not in PREDICT_MODELS:
        raise ValueError(f"model must be one of {PREDICT_MODELS}, got '{model}'")
    if type not in PREDICT_TYPES:
        raise ValueError(f"type must be one of {PREDICT_TYPES}, got '{type}'")
    if alphatype not in ALPHATYPES:
        raise ValueError(f"alphatype must be one of {ALPHATYPES}, got '{alphatype}'")

    if alphatype == "varying":
        X = check_array(X, dtype=np.float64, order="F")
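        # Class labels are 1-D even for multinomial; only "response"/"link" are (n, K).
        if self.family == "multinomial" and type != "class":
            n_out = (X.shape[0], self.n_classes_)
        else:
            n_out = (X.shape[0],)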
        y_pred = np.empty(n_out)
        for g in np.unique(groups):
            mask = groups == g
            a = self.varying_alphahat_.get(g, self.alpha_)
            y_pred[mask] = self.all_estimators_[a].predict(
                X[mask], groups[mask], model=model, type=type, lmda_idx=lmda_idx
            )
        return y_pred

    return self.best_estimator_.predict(
        X=X, groups=groups, model=model, type=type, lmda_idx=lmda_idx
    )

evaluate

evaluate(X, y, groups, alphatype='best')

Predict and score with all three sub-models.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`		required
`y`	`array-like of shape (n_samples,)`		required
`groups`	`array-like of shape (n_samples,)`		required
`alphatype`	`(best, varying)`	`"best"` uses `alpha_`; `"varying"` uses each group's own optimal alpha from `varying_alphahat_`.	`"best"`

Returns:

Name	Type	Description
`result`	`dict`	Keys are `"pretrain"`, `"individual"`, `"overall"`. Each value is a dict with entries `"predictions"` (ndarray of predicted values) and `"score"` (float; R² for gaussian, accuracy for binomial/multinomial).
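`evaluate` scores each sub-model's predictions with R² (gaussian) or accuracy (binomial/multinomial). The sketch below illustrates both metrics and the returned dict layout. `model_score` and `evaluate_predictions` are hypothetical, and the sketch assumes classification predictions have already been converted to class labels (how the package's internal `_model_score` treats probabilities is not shown here).

```python
from statistics import mean


def model_score(y, pred, family):
    """R² for gaussian, accuracy otherwise (pred assumed to hold class labels)."""
    if family == "gaussian":
        y_bar = mean(y)
        ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
        ss_tot = sum((yi - y_bar) ** 2 for yi in y)
        return 1.0 - ss_res / ss_tot
    return mean(1.0 if yi == p else 0.0 for yi, p in zip(y, pred))


def evaluate_predictions(y, preds_by_model, family):
    """Build the same {model: {"predictions", "score"}} layout as evaluate."""
    return {
        m: {"predictions": preds, "score": model_score(y, preds, family)}
        for m, preds in preds_by_model.items()
    }
```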

Source code in src/ptlasso/_cv.py

def evaluate(self, X, y, groups, alphatype="best"):
    """Predict and score with all three sub-models.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
    y : array-like of shape (n_samples,)
    groups : array-like of shape (n_samples,)
    alphatype : {"best", "varying"}, default="best"
        ``"best"`` uses ``alpha_``; ``"varying"`` uses each group's own
        optimal alpha from ``varying_alphahat_``.

    Returns
    -------
    result : dict
        Keys are ``"pretrain"``, ``"individual"``, ``"overall"``.
        Each value is a dict with entries:

        ``"predictions"`` : ndarray
            Predicted values.
        ``"score"`` : float
            R² for gaussian; accuracy for binomial/multinomial.
    """
    check_is_fitted(self)
    result = {}
    for m in ("pretrain", "individual", "overall"):
        preds = self.predict(X, groups, model=m, alphatype=alphatype)
        result[m] = {"predictions": preds, "score": _model_score(y, preds, self.family)}
    return result

score

score(X, y, groups)

Return a scalar performance metric using the best estimator.

Parameters:

Name	Type	Default
`X`	`array-like of shape (n_samples, n_features)`	required
`y`	`array-like of shape (n_samples,)`	required
`groups`	`array-like of shape (n_samples,)`	required

Returns:

Name	Type	Description
`score`	`float`	R² for gaussian; classification accuracy for binomial/multinomial.

Source code in src/ptlasso/_cv.py

def score(self, X, y, groups):
    """Return a scalar performance metric using the best estimator.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
    y : array-like of shape (n_samples,)
    groups : array-like of shape (n_samples,)

    Returns
    -------
    score : float
        R² for gaussian; classification accuracy for binomial/multinomial.
    """
    check_is_fitted(self)
    return self.best_estimator_.score(X, y, groups)

get_coef

get_coef(model='all', **kwargs)

Return fitted coefficients from the best estimator.

Delegates to :meth:`PretrainedLasso.get_coef`.

Parameters:

Name	Type	Description	Default
`model`	`(all, overall, pretrain, individual)`		`"all"`
`**kwargs`		Forwarded to :meth:`PretrainedLasso.get_coef`.	`{}`

Returns:

Name	Type	Description
`coefs`	`dict`	See :meth:`PretrainedLasso.get_coef` for the structure.

Source code in src/ptlasso/_cv.py

def get_coef(self, model="all", **kwargs):
    """Return fitted coefficients from the best estimator.

    Delegates to :meth:`PretrainedLasso.get_coef`.

    Parameters
    ----------
    model : {"all", "overall", "pretrain", "individual"}, default="all"
    **kwargs
        Forwarded to :meth:`PretrainedLasso.get_coef`.

    Returns
    -------
    coefs : dict
        See :meth:`PretrainedLasso.get_coef` for the structure.
    """
    check_is_fitted(self)
    return self.best_estimator_.get_coef(model=model, **kwargs)

Support utilities

ptlasso.get_overall_support

get_overall_support(fit, lmda_idx=None)

Nonzero features from the overall model.

Parameters:

Name	Type	Description	Default
`fit`	`PretrainedLasso or PretrainedLassoCV`		required
`lmda_idx`	`int or None`	Index into the overall model's lambda path. Defaults to `overall_lmda_idx_` (the CV-selected lambda).	`None`

Returns:

Name	Type	Description
`support`	`ndarray of int or str`
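The overall support is the set of features with a nonzero coefficient at the chosen lambda. For multinomial fits a feature counts if any class coefficient is nonzero, matching the `np.any(..., axis=1)` count used in `fit`. A pure-Python sketch with a hypothetical `nonzero_support` helper, assuming feature names are returned instead of indices when names are available:

```python
def nonzero_support(coef, feature_names=None):
    """Indices (or names) of features with a nonzero coefficient.

    coef holds one number per feature, or one sequence of K class
    coefficients per feature for multinomial fits.
    """
    idx = []
    for j, c in enumerate(coef):
        # Multinomial: selected if any class coefficient is nonzero.
        active = any(v != 0 for v in c) if isinstance(c, (list, tuple)) else c != 0
        if active:
            idx.append(j)
    if feature_names is not None:
        return [feature_names[j] for j in idx]
    return idx
```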

Source code in src/ptlasso/_support.py

def get_overall_support(fit, lmda_idx=None):
    """Nonzero features from the overall model.

    Parameters
    ----------
    fit : PretrainedLasso or PretrainedLassoCV
    lmda_idx : int or None
        Index into the overall model's lambda path.
        Defaults to ``overall_lmda_idx_`` (the CV-selected lambda).

    Returns
    -------
    support : ndarray of int or str
    """
    check_is_fitted(fit)
    idx = lmda_idx if lmda_idx is not None else fit.overall_lmda_idx_
    return _resolve(fit, _nonzero_overall(fit, idx))

ptlasso.get_pretrain_support

get_pretrain_support(fit, lmda_idx=None, groups=None, include_overall=True, common_only=False)

Nonzero features from the per-group pretrained models.

Parameters:

Name	Type	Description	Default
`fit`	`PretrainedLasso or PretrainedLassoCV`		required
`lmda_idx`	`int or None`	Lambda index for the pretrained group models. When `None` (default), each group uses its own CV-selected index (`pretrain_lmda_idx_`), matching the lambda used by `predict`.	`None`
`groups`	`array-like or None`	Subset of group labels to consider. Default is all groups.	`None`
`include_overall`	`bool`	Union the per-group support with the overall model support. Ignored when `alpha=1` (R convention).	`True`
`common_only`	`bool`	If True, return only features selected by more than half the groups.	`False`

Returns:

Name	Type	Description
`support`	`ndarray of int or str`
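Without `common_only`, the pretrain support is the union of the per-group supports, widened by the overall support when `include_overall=True` and `alpha < 1` (the condition used in the source below). A sketch over index lists with a hypothetical `union_support` helper:

```python
def union_support(per_group, overall, alpha, include_overall=True):
    """Union of per-group supports, plus the overall support when alpha < 1."""
    base = set(overall) if include_overall and alpha < 1 else set()
    selected = set().union(*per_group)  # every feature used by some group
    return sorted(selected | base)
```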

Source code in src/ptlasso/_support.py

def get_pretrain_support(fit, lmda_idx=None, groups=None, include_overall=True, common_only=False):
    """Nonzero features from the per-group pretrained models.

    Parameters
    ----------
    fit : PretrainedLasso or PretrainedLassoCV
    lmda_idx : int or None
        Lambda index for the pretrained group models.  When ``None`` (default),
        each group uses its own CV-selected index (``pretrain_lmda_idx_``),
        matching the lambda used by ``predict``.
    groups : array-like or None
        Subset of group labels to consider.  Default is all groups.
    include_overall : bool, default=True
        Union the per-group support with the overall model support.
        Ignored when ``alpha=1`` (R convention).
    common_only : bool, default=False
        If True, return only features selected by more than half the groups.

    Returns
    -------
    support : ndarray of int or str
    """
    check_is_fitted(fit)
    groups = fit.groups_ if groups is None else np.asarray(groups)

    base = (
        _nonzero_overall(fit, fit.overall_lmda_idx_)
        if include_overall and fit.alpha < 1
        else np.array([], dtype=int)
    )
    # Use each group's CV-selected lambda when lmda_idx is not specified.
    per_group_idxs = getattr(fit, "pretrain_lmda_idx_", {})
    per_group = [
        _nonzero(
            fit,
            fit.pretrain_models_[g],
            lmda_idx if lmda_idx is not None else per_group_idxs.get(g, -1),
        )
        for g in groups
    ]

    return _resolve(fit, _combine(per_group, base, common_only, len(groups)))

ptlasso.get_pretrain_support_split

get_pretrain_support_split(fit, lmda_idx=None, groups=None)

Split pretrain support into stage-1 ("common") and stage-2 ("individual") parts.

Mirrors the R package's `suppre.common` / `suppre.individual` convention:

- common: features selected by the overall model (stage 1).
- individual: features selected by the per-group models (stage 2) that are not in the common support.

Parameters:

Name	Type	Description	Default
`fit`	`PretrainedLasso or PretrainedLassoCV`		required
`lmda_idx`	`int or None`	Lambda index for the group models. When `None` (default), each group uses its own CV-selected index (`pretrain_lmda_idx_`).	`None`
`groups`	`array-like or None`	Subset of group labels. Default is all groups.	`None`

Returns:

Name	Type	Description
`common`	`ndarray of int or str`
`individual`	`ndarray of int or str`
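`get_pretrain_support_split` reduces to set arithmetic: the common part is the whole stage-1 support, and the individual part is whatever the stage-2 group models add on top of it. A sketch over index lists with a hypothetical `split_support` helper:

```python
def split_support(overall, per_group):
    """Return (common, individual) as sorted index lists."""
    common = set(overall)  # stage-1 support, taken as-is
    stage2 = set().union(*per_group)  # union over the group models
    return sorted(common), sorted(stage2 - common)
```

A feature stays in `common` even if every group model drops it, because `common` is taken from the overall model alone.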

Source code in src/ptlasso/_support.py

def get_pretrain_support_split(fit, lmda_idx=None, groups=None):
    """Split pretrain support into stage-1 ("common") and stage-2 ("individual") parts.

    Mirrors the R package's ``suppre.common`` / ``suppre.individual`` convention:

    - **common**     : features selected by the overall model (stage 1).
    - **individual** : features selected by the per-group models (stage 2)
                       that are *not* in the common support.

    Parameters
    ----------
    fit : PretrainedLasso or PretrainedLassoCV
    lmda_idx : int or None
        Lambda index for the group models.  When ``None`` (default), each
        group uses its own CV-selected index (``pretrain_lmda_idx_``).
    groups : array-like or None
        Subset of group labels.  Default is all groups.

    Returns
    -------
    common : ndarray of int or str
    individual : ndarray of int or str
    """
    check_is_fitted(fit)
    groups = fit.groups_ if groups is None else np.asarray(groups)

    per_group_idxs = getattr(fit, "pretrain_lmda_idx_", {})
    overall_idx = set(_nonzero_overall(fit, fit.overall_lmda_idx_).tolist())
    stage2_idx = set()
    for g in groups:
        idx = lmda_idx if lmda_idx is not None else per_group_idxs.get(g, -1)
        stage2_idx |= set(_nonzero(fit, fit.pretrain_models_[g], idx).tolist())

    common_idx = np.sort(np.array(list(overall_idx), dtype=int))
    individual_idx = np.sort(np.array(list(stage2_idx - overall_idx), dtype=int))

    return _resolve(fit, common_idx), _resolve(fit, individual_idx)

ptlasso.get_individual_support

get_individual_support(fit, lmda_idx=None, groups=None, common_only=False)

Nonzero features from the per-group individual (no-pretraining) models.

Parameters:

Name	Type	Description	Default
`fit`	`PretrainedLasso or PretrainedLassoCV`		required
`lmda_idx`	`int or None`	Lambda index for the individual group models. When `None` (default), each group uses its own CV-selected index (`individual_lmda_idx_`), matching the lambda used by `predict`.	`None`
`groups`	`array-like or None`	Subset of group labels to consider. Default is all groups.	`None`
`common_only`	`bool`	If True, return only features selected by more than half the groups.	`False`

Returns:

Name	Type	Description
`support`	`ndarray of int or str`
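With `common_only=True`, a feature is kept only if it is selected by strictly more than half of the groups considered, so an even 2-of-4 split is dropped. A sketch of that majority rule with a hypothetical `majority_support` helper (the package's internal `_combine` is not reproduced here):

```python
from collections import Counter


def majority_support(per_group):
    """Features selected by more than half of the per-group supports."""
    n_groups = len(per_group)
    # set(sup) so a feature is counted at most once per group
    counts = Counter(j for sup in per_group for j in set(sup))
    return sorted(j for j, c in counts.items() if c > n_groups / 2)
```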

Source code in src/ptlasso/_support.py

def get_individual_support(fit, lmda_idx=None, groups=None, common_only=False):
    """Nonzero features from the per-group individual (no-pretraining) models.

    Parameters
    ----------
    fit : PretrainedLasso or PretrainedLassoCV
    lmda_idx : int or None
        Lambda index for the individual group models.  When ``None`` (default),
        each group uses its own CV-selected index (``individual_lmda_idx_``),
        matching the lambda used by ``predict``.
    groups : array-like or None
        Subset of group labels to consider.  Default is all groups.
    common_only : bool, default=False
        If True, return only features selected by more than half the groups.

    Returns
    -------
    support : ndarray of int or str
    """
    check_is_fitted(fit)
    groups = fit.groups_ if groups is None else np.asarray(groups)

    per_group_idxs = getattr(fit, "individual_lmda_idx_", {})
    per_group = [
        _nonzero(
            fit,
            fit.individual_models_[g],
            lmda_idx if lmda_idx is not None else per_group_idxs.get(g, -1),
        )
        for g in groups
    ]

    return _resolve(fit, _combine(per_group, np.array([], dtype=int), common_only, len(groups)))

Plotting

ptlasso.plot_cv

plot_cv(fit, ax=None, plot_alphahat=True, column='single', save=None, colors=None, figure_widths=None)

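Plot the cross-validation curve for a :class:`PretrainedLassoCV`.

Draws the pretrain CV loss over alpha with a ±1 SE band, plus horizontal reference lines for the individual and overall baselines. The OOF-based `PretrainedLassoCV.fit` sets every entry of `cv_results_se_` to 0, so the band currently has zero width.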

Parameters:

Name	Type	Description	Default
`fit`	`PretrainedLassoCV`	A fitted CV estimator.	required
`ax`	`Axes or None`	Axes to draw on. A new figure is created when `None`.	`None`
`plot_alphahat`	`bool`	Whether to draw a vertical line at the selected `alpha_`.	`True`
`column`	`(single, double)`	Target figure width — `"single"` ≈ 3.5 in, `"double"` ≈ 7 in.	`"single"`
`save`	`str or None`	File path to save the figure (300 dpi). No file is written when `None`.	`None`
`colors`	`dict or None`	Override plot colours for `"overall"`, `"pretrain"`, and/or `"individual"`. Missing keys fall back to :data:`ptlasso.COLORS`.	`None`
`figure_widths`	`dict or None`	Override figure widths for `"single"` and/or `"double"`. Missing keys fall back to :data:`ptlasso.FIGURE_WIDTHS`.	`None`

Returns:

Name	Type	Description
`fig`	`Figure`
`ax`	`Axes`

Source code in src/ptlasso/_plot.py

def plot_cv(
    fit, ax=None, plot_alphahat=True, column="single", save=None, colors=None, figure_widths=None
):
    """Plot the cross-validation curve for a :class:`PretrainedLassoCV`.

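    Draws the pretrain CV loss over alpha with a ±1 SE band, plus horizontal
    reference lines for the individual and overall baselines.  The OOF-based
    ``PretrainedLassoCV.fit`` sets ``cv_results_se_`` to 0, so the band
    currently has zero width.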

    Parameters
    ----------
    fit : PretrainedLassoCV
        A fitted CV estimator.
    ax : matplotlib.axes.Axes or None, default=None
        Axes to draw on.  A new figure is created when ``None``.
    plot_alphahat : bool, default=True
        Whether to draw a vertical line at the selected ``alpha_``.
    column : {"single", "double"}, default="single"
        Target figure width — ``"single"`` ≈ 3.5 in, ``"double"`` ≈ 7 in.
    save : str or None, default=None
        File path to save the figure (300 dpi).  No file is written when ``None``.
    colors : dict or None, default=None
        Override plot colours for ``"overall"``, ``"pretrain"``, and/or
        ``"individual"``.  Missing keys fall back to :data:`ptlasso.COLORS`.
    figure_widths : dict or None, default=None
        Override figure widths for ``"single"`` and/or ``"double"``.
        Missing keys fall back to :data:`ptlasso.FIGURE_WIDTHS`.

    Returns
    -------
    fig : matplotlib.figure.Figure
    ax : matplotlib.axes.Axes
    """
    check_is_fitted(fit)

    c = _resolve_colors(colors)
    w = _resolve_widths(figure_widths).get(column, 3.5)

    if ax is None:
        fig, ax = plt.subplots(figsize=(w, w * 0.8))
    else:
        fig = ax.get_figure()

    alphas = list(fit.alphalist_)
    cv_mean = np.array([fit.cv_results_[a] for a in alphas])
    cv_se = np.array([fit.cv_results_se_[a] for a in alphas])

    # ±1 SE band + pretrain curve
    ax.fill_between(
        alphas, cv_mean - cv_se, cv_mean + cv_se, color=c["pretrain"], alpha=0.15, linewidth=0
    )
    ax.plot(
        alphas,
        cv_mean,
        color=c["pretrain"],
        marker="o",
        markersize=5,
        linewidth=2,
        label="Pretrain",
        zorder=3,
    )

    # Individual and overall baselines
    ax.axhline(fit.cv_results_individual_, color=c["individual"], linestyle="--", linewidth=1.8)
    ax.axhline(fit.cv_results_overall_, color=c["overall"], linestyle="--", linewidth=1.8)

    # Selected alpha
    if plot_alphahat:
        ax.axvline(
            fit.alpha_,
            color="#555555",
            linestyle=":",
            linewidth=1.5,
            label=f"$\\hat{{\\alpha}}={fit.alpha_}$",
        )

    # Support sizes at right edge
    n_pre = len(get_pretrain_support(fit))
    n_ind = len(get_individual_support(fit))
    n_ov = len(get_overall_support(fit))
    xmax = max(alphas)
    ax.text(
        xmax + 0.01,
        fit.cv_results_individual_,
        f"Individual  $|\\hat{{S}}|={n_ind}$",
        va="center",
        fontsize=8,
        color=c["individual"],
        clip_on=False,
    )
    ax.text(
        xmax + 0.01,
        fit.cv_results_overall_,
        f"Overall  $|\\hat{{S}}|={n_ov}$",
        va="center",
        fontsize=8,
        color=c["overall"],
        clip_on=False,
    )

    # Top axis: support size at selected alpha
    ax2 = ax.twiny()
    ax2.set_xlim(ax.get_xlim())
    ax2.set_xticks([fit.alpha_])
    ax2.set_xticklabels([f"$|\\hat{{S}}|={n_pre}$"], fontsize=8, color=c["pretrain"])
    ax2.tick_params(length=0)
    ax2.spines["top"].set_visible(False)
    ax2.spines["right"].set_visible(False)

    _despine(ax)
    ax.set_xlabel("$\\alpha$", fontsize=11)
    ax.set_ylabel(_cv_ylabel(fit.family), fontsize=11)
    ax.set_title("Cross-validation over $\\alpha$", fontsize=12, pad=14)
    ax.set_xticks(alphas)
    ax.tick_params(labelsize=9)
    ax.legend(frameon=False, fontsize=9, loc="upper right")

    fig.tight_layout()
    if save:
        fig.savefig(save, dpi=300, bbox_inches="tight")
    return fig, ax

ptlasso.plot_paths

plot_paths(fit, column='double', save=None, colors=None, figure_widths=None)

Plot regularisation paths for all sub-models in a :class:`PretrainedLasso`.

Produces a 3-row grid: overall model (full width), per-group pretrained models, and per-group individual models. Features in the final support are coloured; inactive features are shown in grey.

Parameters:

Name	Type	Description	Default
`fit`	`PretrainedLasso or PretrainedLassoCV`	A fitted estimator.	required
`column`	`(single, double)`	Target figure width: `"single"` ≈ 3.5 in, `"double"` ≈ 7 in.	`"double"`
`save`	`str or None`	File path to save the figure (300 dpi). No file is written when `None`.	`None`
`colors`	`dict or None`	Override plot colours for `"overall"`, `"pretrain"`, and/or `"individual"`. Missing keys fall back to `ptlasso.COLORS`.	`None`
`figure_widths`	`dict or None`	Override figure widths for `"single"` and/or `"double"`. Missing keys fall back to `ptlasso.FIGURE_WIDTHS`.	`None`
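The fallback behaviour of `figure_widths` and the resulting figure size can be sketched as a dictionary merge. The defaults below are placeholders taken from the approximate widths quoted in the table, so the real `ptlasso.FIGURE_WIDTHS` values may differ. `resolve_figsize` is a hypothetical helper. The 7.0 in fallback for unknown `column` values and the height factor `0.42 * 3` come from the source below.

```python
# Placeholder defaults (assumed); the actual ptlasso.FIGURE_WIDTHS values may differ.
DEFAULT_WIDTHS = {"single": 3.5, "double": 7.0}


def resolve_figsize(column="double", figure_widths=None):
    """Merge user overrides over the defaults and return (width, height) in inches."""
    widths = {**DEFAULT_WIDTHS, **(figure_widths or {})}  # missing keys fall back
    w = widths.get(column, 7.0)  # unknown column names fall back to 7 in
    return (w, w * 0.42 * 3)  # three rows, each 0.42 * w tall


print(resolve_figsize())
print(resolve_figsize("single", {"double": 6.0}))
```

The `colors` argument presumably follows the same pattern, with user-supplied keys taking precedence over the defaults.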

Returns:

Name	Type	Description
`fig`	`matplotlib.figure.Figure`	The figure containing the 3-row grid of path panels.

Source code in src/ptlasso/_plot.py

def plot_paths(fit, column="double", save=None, colors=None, figure_widths=None):
    """Plot regularisation paths for all sub-models in a :class:`PretrainedLasso`.

    Produces a 3-row grid: overall model (full width), per-group pretrained
    models, and per-group individual models.  Features in the final support are
    coloured; inactive features are shown in grey.

    Parameters
    ----------
    fit : PretrainedLasso or PretrainedLassoCV
        A fitted estimator.
    column : {"single", "double"}, default="double"
        Target figure width — ``"single"`` ≈ 3.5 in, ``"double"`` ≈ 7 in.
    save : str or None, default=None
        File path to save the figure (300 dpi).  No file is written when ``None``.
    colors : dict or None, default=None
        Override plot colours for ``"overall"``, ``"pretrain"``, and/or
        ``"individual"``.  Missing keys fall back to :data:`ptlasso.COLORS`.
    figure_widths : dict or None, default=None
        Override figure widths for ``"single"`` and/or ``"double"``.
        Missing keys fall back to :data:`ptlasso.FIGURE_WIDTHS`.

    Returns
    -------
    fig : matplotlib.figure.Figure
    """
    check_is_fitted(fit)

    c = _resolve_colors(colors)
    k = len(fit.groups_)
    w = _resolve_widths(figure_widths).get(column, 7.0)
    labels = _label_map(fit)
    feature_colors = _feature_color_map(fit)

    all_lmdas = np.log(np.asarray(fit.overall_model_.lmdas))
    xlim = (all_lmdas[0], all_lmdas[-1] + 0.18 * (all_lmdas[-1] - all_lmdas[0]))
    overall_sup = set(
        np.where(np.reshape(fit.overall_coef_, (fit.n_features_in_, -1)).any(axis=1))[0]
    )

    # The overall model was trained on [onehot | X]; strip the onehot columns
    # from betas so that _draw_paths sees only the p X-feature coefficients.
    n_onehot = getattr(fit, "_n_onehot_", 0)

    class _StrippedOverallState:
        """Thin view of the overall model state with onehot columns removed."""

        def __init__(self, state, n_skip):
            raw = _betas_dense(state)  # (L, k-1+p)
            self.betas = raw[:, n_skip:]  # (L, p)
            self.lmdas = state.lmdas
            self.intercepts = state.intercepts

    overall_state_for_plot = (
        _StrippedOverallState(fit.overall_model_, n_onehot) if n_onehot > 0 else fit.overall_model_
    )

    fig = plt.figure(figsize=(w, w * 0.42 * 3))
    gs = gridspec.GridSpec(3, k, figure=fig, hspace=0.7, wspace=0.45)

    # Row 0: overall model (spans all columns)
    _draw_paths(
        fig.add_subplot(gs[0, :]),
        overall_state_for_plot,
        "Overall",
        c["overall"],
        feature_colors,
        overall_sup,
        labels,
        xlim,
    )

    # Rows 1 & 2: per-group pretrain and individual
    pre_idxs = getattr(fit, "pretrain_lmda_idx_", {})
    ind_idxs = getattr(fit, "individual_lmda_idx_", {})
    for col, g in enumerate(fit.groups_):
        lbl = fit._label(g)
        pre_i = pre_idxs.get(g, -1)
        ind_i = ind_idxs.get(g, -1)
        pre_sup = set(np.where(_betas_dense(fit.pretrain_models_[g])[pre_i] != 0)[0])
        ind_sup = set(np.where(_betas_dense(fit.individual_models_[g])[ind_i] != 0)[0])

        _draw_paths(
            fig.add_subplot(gs[1, col]),
            fit.pretrain_models_[g],
            lbl,
            c["pretrain"],
            feature_colors,
            pre_sup,
            labels,
            xlim,
        )
        _draw_paths(
            fig.add_subplot(gs[2, col]),
            fit.individual_models_[g],
            lbl,
            c["individual"],
            feature_colors,
            ind_sup,
            labels,
            xlim,
        )

    # Row labels in left margin
    for row, (text, color) in enumerate(
        [
            ("Overall", c["overall"]),
            ("Pretrain", c["pretrain"]),
            ("Individual", c["individual"]),
        ]
    ):
        fig.text(
            0.01,
            1 - (row + 0.5) / 3,
            text,
            va="center",
            ha="center",
            fontsize=10,
            fontweight="bold",
            color=color,
            rotation=90,
            transform=fig.transFigure,
        )

    fig.suptitle("Regularisation paths", fontsize=13, y=1.01)
    if save:
        fig.savefig(save, dpi=300, bbox_inches="tight")
    return fig

Simulation

ptlasso.make_data

make_data(k, class_sizes, s_common, s_indiv, beta_common, beta_indiv, intercepts=None, sigma=1.0, family='gaussian', seed=None)

Generate synthetic grouped data for ptlasso.

Feature layout

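Columns 0 … s_common-1: shared features (active in all groups)
Columns s_common … s_common+s_indiv[0]-1: features specific to group 0
... and so on for each group
Remaining columns (if any): pure noise. As implemented, p = s_common + sum(s_indiv), so no separate noise-only block is generated. Each group's specific features have zero coefficients in every other group.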
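The column bookkeeping can be sketched in plain Python. `feature_layout` is a hypothetical helper, not part of ptlasso. It reproduces the index arithmetic of the source below: the shared block comes first, followed by one block per group in order.

```python
def feature_layout(k, s_common, s_indiv):
    """Return (shared_columns, per_group_columns, p) as laid out by make_data."""
    sizes = [s_indiv] * k if isinstance(s_indiv, int) else list(s_indiv)
    if len(sizes) != k:
        raise ValueError("s_indiv must be an int or have length k")
    shared = range(0, s_common)
    per_group = []
    start = s_common
    for size in sizes:
        per_group.append(range(start, start + size))
        start += size
    return shared, per_group, start  # start == s_common + sum(sizes) == p


shared, per_group, p = feature_layout(k=2, s_common=3, s_indiv=[2, 4])
print(list(shared))                  # [0, 1, 2]
print([list(r) for r in per_group])  # [[3, 4], [5, 6, 7, 8]]
print(p)                             # 9
```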

Parameters:

Name	Type	Description	Default
`k`	`int`	Number of groups.	required
`class_sizes`	`array-like of length k`	Number of observations per group.	required
`s_common`	`int`	Number of shared (common) features.	required
`s_indiv`	`int or array-like of length k`	Number of group-specific features per group.	required
`beta_common`	`float or array-like`	Coefficients for common features. Scalar → same value for all s_common features in every group. 1-D array of length s_common → per-feature coefficient (same across groups). List of k arrays → per-group, per-feature coefficients.	required
`beta_indiv`	`float or array-like`	Coefficients for group-specific features, same shapes as beta_common.	required
`intercepts`	`array-like of length k or None`	Per-group intercepts. `None` sets every intercept to 0.	`None`
`sigma`	`float`	Gaussian noise std (only used for `family="gaussian"`).	`1.0`
`family`	`(gaussian, binomial)`	Response distribution. `"gaussian"` adds N(0, sigma²) noise to the linear predictor; `"binomial"` draws 0/1 outcomes with probability 1 / (1 + exp(-mu)). Any other value raises `ValueError`.	`"gaussian"`
`seed`	`int or None`	Random seed passed to `numpy.random.default_rng`.	`None`
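The three accepted shapes of `beta_common` and `beta_indiv` can be illustrated with a hypothetical pure-Python helper. The source below delegates this step to a private `_expand_coef` function whose body is not shown on this page, so `expand_coef` is an assumption about its behaviour based on the parameter descriptions, not its actual code.

```python
from numbers import Real


def expand_coef(beta, group, size):
    """Return the `size` coefficients used for `group` (hypothetical helper).

    beta may be a scalar, a flat sequence of length `size`, or a sequence of
    k per-group sequences.
    """
    if isinstance(beta, Real):
        return [float(beta)] * size  # same value for every feature and group
    beta = list(beta)
    if beta and isinstance(beta[0], Real):
        coef = [float(b) for b in beta]  # per-feature, shared across groups
    else:
        coef = [float(b) for b in beta[group]]  # per-group, per-feature
    if len(coef) != size:
        raise ValueError(f"expected {size} coefficients, got {len(coef)}")
    return coef


print(expand_coef(0.5, group=1, size=3))                # [0.5, 0.5, 0.5]
print(expand_coef([1, 2, 3], group=1, size=3))          # [1.0, 2.0, 3.0]
print(expand_coef([[1, 1], [2, -2]], group=1, size=2))  # [2.0, -2.0]
```

The per-group form is the only one that lets `beta_indiv` vary in length when `s_indiv` differs between groups.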

Returns:

Type	Description
`dict`	Dictionary with keys `X` (ndarray of shape (n, p)), `y` (ndarray of shape (n,)), and `groups` (int ndarray of shape (n,)).
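Reading the source below, the generative model for an observation i in group g(i) can be summarised as follows. Here b_g is `intercepts[g]` and 𝒞_g is the block of columns specific to group g. The superscript (g) marks coefficients that may vary by group when lists of k arrays are passed. This is a summary of the code as written, not a separately documented specification.

```latex
\begin{aligned}
x_i &\sim \mathcal{N}(0, I_p), \qquad
\mu_i = b_{g(i)} + x_i^\top \beta^{(g(i))}, \\
\beta^{(g)}_j &=
\begin{cases}
\beta^{\text{common},(g)}_j, & 0 \le j < s_{\text{common}},\\
\beta^{\text{indiv},(g)}_j, & j \in \mathcal{C}_g,\\
0, & \text{otherwise},
\end{cases}\\
\text{gaussian:}\quad y_i &= \mu_i + \sigma \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1),\\
\text{binomial:}\quad y_i &\sim \operatorname{Bernoulli}\!\left(\frac{1}{1 + e^{-\mu_i}}\right).
\end{aligned}
```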

Source code in src/ptlasso/_simulate.py

def make_data(
    k,
    class_sizes,
    s_common,
    s_indiv,
    beta_common,
    beta_indiv,
    intercepts=None,
    sigma=1.0,
    family="gaussian",
    seed=None,
):
    """Generate synthetic grouped data for ptlasso.

    Feature layout
    --------------
    - Columns 0 … s_common-1             : shared features (active in all groups)
    - Columns s_common … s_common+s_indiv[0]-1 : features specific to group 0
    - ... and so on for each group
    - Remaining columns (if any)          : pure noise

    Parameters
    ----------
    k : int
        Number of groups.
    class_sizes : array-like of length k
        Number of observations per group.
    s_common : int
        Number of shared (common) features.
    s_indiv : int or array-like of length k
        Number of group-specific features per group.
    beta_common : float or array-like
        Coefficients for common features.  Scalar → same value for all s_common
        features in every group.  1-D array of length s_common → per-feature
        coefficient (same across groups).  List of k arrays → per-group,
        per-feature coefficients.
    beta_indiv : float or array-like
        Coefficients for group-specific features, same shapes as beta_common.
    intercepts : array-like of length k or None
        Per-group intercepts.  Default is 0.
    sigma : float, default=1.0
        Gaussian noise std (only used for ``family="gaussian"``).
    family : {"gaussian", "binomial"}, default="gaussian"
    seed : int or None
        Random seed passed to ``numpy.random.default_rng``.

    Returns
    -------
    dict with keys
        ``X``      : ndarray (n, p)
        ``y``      : ndarray (n,)
        ``groups`` : ndarray (n,) of int
    """
    rng = np.random.default_rng(seed)

    class_sizes = np.asarray(class_sizes, dtype=int)
    s_indiv = np.broadcast_to(s_indiv, (k,)).copy()
    n = int(class_sizes.sum())
    p = s_common + int(s_indiv.sum())

    X = rng.standard_normal((n, p))
    y = np.empty(n)
    groups = np.empty(n, dtype=int)

    row_start = 0
    feat_start = s_common

    if intercepts is None:
        intercepts = np.zeros(k)

    for kk in range(k):
        row_end = row_start + class_sizes[kk]
        feat_end = feat_start + s_indiv[kk]

        beta = np.zeros(p)
        beta[:s_common] = _expand_coef(beta_common, kk, s_common)
        beta[feat_start:feat_end] = _expand_coef(beta_indiv, kk, s_indiv[kk])

        mu = X[row_start:row_end] @ beta + intercepts[kk]

        if family == "gaussian":
            y[row_start:row_end] = mu + sigma * rng.standard_normal(class_sizes[kk])
        elif family == "binomial":
            prob = 1.0 / (1.0 + np.exp(-mu))
            y[row_start:row_end] = rng.binomial(1, prob).astype(float)
        else:
            raise ValueError(f"family must be 'gaussian' or 'binomial', got '{family}'")

        groups[row_start:row_end] = kk
        row_start = row_end
        feat_start = feat_end

    return {"X": X, "y": y, "groups": groups}

ptlasso.gaussian_example_data

gaussian_example_data(k=2, class_sizes=None, s_common=5, s_indiv=5, beta_common=1.0, beta_indiv=0.5, sigma=1.0, seed=None)

Convenience wrapper around `make_data` for Gaussian grouped data (`family="gaussian"`).

Parameters:

Name	Type	Description	Default
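`k`	`int`	Number of groups.	`2`
`class_sizes`	`array-like or None`	Number of observations per group. Defaults to 50 observations per group.	`None`
`s_common`	`int`	Number of shared (common) features.	`5`
`s_indiv`	`int or array-like`	Number of group-specific features per group.	`5`
`beta_common`	`float`	Coefficients for the shared features, forwarded to `make_data`.	`1.0`
`beta_indiv`	`float`	Coefficients for the group-specific features, forwarded to `make_data`.	`0.5`
`sigma`	`float`	Gaussian noise standard deviation.	`1.0`
`seed`	`int or None`	Random seed passed to `numpy.random.default_rng`.	`None`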

Returns:

Type	Description
`dict`	Dictionary with keys `X`, `y`, `groups`, as returned by `make_data`.
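`make_data` draws every feature independently from a standard normal, and this wrapper leaves the intercepts at 0. The variance of each group's linear predictor is therefore s_common · beta_common² + s_indiv · beta_indiv², because other groups' features have zero coefficients. With the defaults this gives 5 · 1.0² + 5 · 0.5² = 6.25, compared with a noise variance of sigma² = 1. The hypothetical helper below only performs this arithmetic and is not part of ptlasso.

```python
def signal_to_noise(s_common, s_indiv, beta_common, beta_indiv, sigma=1.0):
    """Population variance of the per-group linear predictor and its ratio to sigma**2.

    Assumes scalar coefficients and independent N(0, 1) features, as in make_data.
    """
    signal_var = s_common * beta_common**2 + s_indiv * beta_indiv**2
    return signal_var, signal_var / sigma**2


print(signal_to_noise(5, 5, 1.0, 0.5))             # (6.25, 6.25)
print(signal_to_noise(5, 5, 1.0, 0.5, sigma=2.0))  # (6.25, 1.5625)
```

Raising `sigma` or shrinking `beta_indiv` makes the group-specific signal harder to recover.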

Source code in src/ptlasso/_simulate.py

def gaussian_example_data(
    k=2,
    class_sizes=None,
    s_common=5,
    s_indiv=5,
    beta_common=1.0,
    beta_indiv=0.5,
    sigma=1.0,
    seed=None,
):
    """Convenience wrapper for Gaussian grouped data.

    Parameters
    ----------
    k : int, default=2
    class_sizes : array-like or None
        Defaults to 50 observations per group.
    s_common : int, default=5
    s_indiv : int or array-like, default=5
    beta_common : float, default=1.0
    beta_indiv : float, default=0.5
    sigma : float, default=1.0
    seed : int or None

    Returns
    -------
    dict with keys ``X``, ``y``, ``groups``
    """
    if class_sizes is None:
        class_sizes = [50] * k
    return make_data(
        k=k,
        class_sizes=class_sizes,
        s_common=s_common,
        s_indiv=s_indiv,
        beta_common=beta_common,
        beta_indiv=beta_indiv,
        sigma=sigma,
        family="gaussian",
        seed=seed,
    )

ptlasso.binomial_example_data

binomial_example_data(k=2, class_sizes=None, s_common=5, s_indiv=5, beta_common=0.5, beta_indiv=0.3, seed=None)

Convenience wrapper around `make_data` for binary grouped data (`family="binomial"`).

Parameters:

Name	Type	Description	Default
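`k`	`int`	Number of groups.	`2`
`class_sizes`	`array-like or None`	Number of observations per group. Defaults to 100 observations per group.	`None`
`s_common`	`int`	Number of shared (common) features.	`5`
`s_indiv`	`int or array-like`	Number of group-specific features per group.	`5`
`beta_common`	`float`	Coefficients for the shared features, forwarded to `make_data`.	`0.5`
`beta_indiv`	`float`	Coefficients for the group-specific features, forwarded to `make_data`.	`0.3`
`seed`	`int or None`	Random seed passed to `numpy.random.default_rng`.	`None`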

Returns:

Type	Description
`dict`	Dictionary with keys `X`, `y`, `groups`, as returned by `make_data`.
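With zero intercepts and a linear predictor that is symmetric around 0, this wrapper should produce roughly balanced classes. With the defaults the linear predictor variance is 5 · 0.5² + 5 · 0.3² = 1.7. The sketch below mirrors the binomial branch of `make_data` for a single group. Features outside the group's own block are omitted because their coefficients are 0. `simulate_binary_group` is hypothetical. It uses the stdlib `random` module instead of `numpy.random.default_rng`, so it will not reproduce `make_data` draws for the same seed.

```python
import math
import random


def simulate_binary_group(n, s_common=5, s_indiv=5, beta_common=0.5, beta_indiv=0.3, seed=0):
    """Draw n binary labels for one group, mirroring the binomial branch of make_data.

    Uses the stdlib `random` module, so draws differ from make_data for the same seed.
    """
    rng = random.Random(seed)
    beta = [beta_common] * s_common + [beta_indiv] * s_indiv
    labels = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]  # independent N(0, 1) features
        mu = sum(b * xj for b, xj in zip(beta, x))  # intercept 0, as in the wrapper
        prob = 1.0 / (1.0 + math.exp(-mu))  # logistic link
        labels.append(1 if rng.random() < prob else 0)
    return labels


labels = simulate_binary_group(20_000)
print(sum(labels) / len(labels))  # close to 0.5 because mu is symmetric around 0
```

Passing non-zero `intercepts` to `make_data` directly is the way to create class imbalance, since this wrapper does not expose that argument.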

Source code in src/ptlasso/_simulate.py

def binomial_example_data(
    k=2,
    class_sizes=None,
    s_common=5,
    s_indiv=5,
    beta_common=0.5,
    beta_indiv=0.3,
    seed=None,
):
    """Convenience wrapper for binary grouped data.

    Parameters
    ----------
    k : int, default=2
    class_sizes : array-like or None
        Defaults to 100 observations per group.
    s_common : int, default=5
    s_indiv : int or array-like, default=5
    beta_common : float, default=0.5
    beta_indiv : float, default=0.3
    seed : int or None

    Returns
    -------
    dict with keys ``X``, ``y``, ``groups``
    """
    if class_sizes is None:
        class_sizes = [100] * k
    return make_data(
        k=k,
        class_sizes=class_sizes,
        s_common=s_common,
        s_indiv=s_indiv,
        beta_common=beta_common,
        beta_indiv=beta_indiv,
        family="binomial",
        seed=seed,
    )
