Feature Importance

Aggregation, extraction, and stability analysis of feature importance scores across outer folds.

FeatureImportanceAggregator

class nestkit.importance.FeatureImportanceAggregator(results, method='auto', feature_names=None, shap_type='auto', normalize=True)

Bases: object

Aggregate feature importances across nested CV outer folds.

Extracts importance scores from each outer-fold estimator, optionally normalizes them, and computes summary statistics including mean, standard deviation, coefficient of variation, and rank-based diagnostics. Also supports SHAP-based model-agnostic importances.

Parameters:

results (_BaseNestedCVResults) – Fitted nested CV results object. Must have been produced with return_estimator=True so that per-fold estimators are available.
method ({"auto", "model", "shap"}, default="auto") –
Importance extraction strategy.
- "auto" / "model" – use feature_importances_ or coef_ from the fitted estimator.
- "shap" – compute SHAP values on the outer test fold.
feature_names (list[str] or None, optional) – Human-readable feature names. If None, inferred from results.feature_names_in_ when available, otherwise feature_0, feature_1, etc.
shap_type ({"tree", "kernel", "linear", "auto"}, default="auto") – SHAP explainer backend. Only used when method is "shap".
normalize (bool, default=True) – If True, absolute importances are rescaled to sum to 1 within each fold.
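The per-fold rescaling described for normalize can be illustrated with a minimal NumPy sketch. The helper name normalize_per_fold and the handling of all-zero folds are assumptions made for illustration, not the library's confirmed behavior.

```python
import numpy as np


def normalize_per_fold(raw):
    """Rescale absolute importances so that each row (fold) sums to 1."""
    abs_imp = np.abs(np.asarray(raw, dtype=float))
    totals = abs_imp.sum(axis=1, keepdims=True)
    # Assumption: a fold whose importances are all zero is left as zeros
    # instead of producing a division-by-zero warning.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return abs_imp / safe_totals


# Two folds, three features; signs are discarded before rescaling.
raw = np.array([[2.0, -1.0, 1.0],
                [0.5, 0.5, -1.0]])
normalized = normalize_per_fold(raw)
```

After rescaling, scores from folds with different raw scales (for example coefficients versus impurity-based importances) become comparable as shares of total importance.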

importances_matrix_

Shape (n_folds, n_features). Set after compute().

Type: numpy.ndarray

ranks_matrix_

Shape (n_folds, n_features). Rank of each feature per fold.

Type: numpy.ndarray

summary_

Per-feature aggregated statistics. Set after compute().

Type: pandas.DataFrame
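To make the relationship between the three attributes concrete, the sketch below derives per-fold ranks and per-feature statistics from an importance matrix of shape (n_folds, n_features). It is an illustration under stated assumptions, not the library's implementation: the function name summarize, the convention that rank 1 is the most important feature, the use of the sample standard deviation (ddof=1), and the returned keys are all hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def summarize(importances):
    """Return (per-feature statistics, per-fold ranks) for an importance matrix."""
    imp = np.asarray(importances, dtype=float)
    # Assumed convention: rank 1 is the most important feature in a fold;
    # tied features receive the average of their ranks.
    ranks = rankdata(-imp, axis=1)
    mean = imp.mean(axis=0)
    std = imp.std(axis=0, ddof=1)  # sample standard deviation (assumption)
    # The coefficient of variation is undefined for a zero mean, so report NaN.
    cv = np.divide(std, mean, out=np.full_like(mean, np.nan), where=mean != 0)
    stats = {"mean": mean, "std": std, "cv": cv, "mean_rank": ranks.mean(axis=0)}
    return stats, ranks


importances = np.array([[0.50, 0.30, 0.20],
                        [0.60, 0.10, 0.30],
                        [0.40, 0.35, 0.25]])
stats, ranks = summarize(importances)
```

A low coefficient of variation together with a stable mean rank suggests that a feature's importance does not depend strongly on which outer fold was used for training.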

Raises:

ValueError – If results does not contain fitted estimators.

Examples

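>>> from nestkit.importance import FeatureImportanceAggregator
>>> # `results` is a fitted nested CV results object produced with return_estimator=True
>>> agg = FeatureImportanceAggregator(results)
>>> agg.compute()
>>> agg.summary_.head()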

See also

nestkit.importance.stability.nogueira_stability_index

References

[1]

Nogueira, S., Sechidis, K., and Brown, G. (2018). “On the Stability of Feature Selection Algorithms.” JMLR, 18(174), 1–54.

consensus_features(criterion='top_k', top_k=10, min_frequency=0.8)

Identify features that are consistently important across folds.

Two selection strategies are available:

"top_k" – return the top_k features by mean importance (from summary_).
"frequency" – return features that appear in the per-fold top-k set in at least min_frequency fraction of all folds.

Parameters:

criterion ({"top_k", "frequency"}, default="top_k") – Selection strategy.
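top_k (int, default=10) – Number of top features. With criterion="top_k", the number of features returned by mean importance; with criterion="frequency", the size of the per-fold top-k set.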
min_frequency (float, default=0.8) – Minimum fraction of folds in which a feature must appear in the top-k set. Only used when criterion="frequency".

Returns:

Feature names that satisfy the criterion.

Return type:

list[str]

Raises:

ValueError – If criterion is not recognised.

Examples

>>> agg.compute()
>>> agg.consensus_features("frequency", top_k=5, min_frequency=0.9)
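The two selection strategies can be sketched directly on an importance matrix of shape (n_folds, n_features). This is a hypothetical re-implementation for illustration only: the function name consensus_sketch, the tie-breaking by column order, and the ordering of the returned names are assumptions.

```python
import numpy as np


def consensus_sketch(importances, feature_names, criterion="top_k",
                     top_k=10, min_frequency=0.8):
    """Select consistently important features from a (n_folds, n_features) matrix."""
    imp = np.asarray(importances, dtype=float)
    n_folds, n_features = imp.shape
    k = min(top_k, n_features)
    if criterion == "top_k":
        # Rank features by mean importance across folds, highest first.
        order = np.argsort(-imp.mean(axis=0), kind="stable")[:k]
        return [feature_names[i] for i in order]
    if criterion == "frequency":
        # Indices of the k most important features within each fold.
        top_idx = np.argsort(-imp, axis=1, kind="stable")[:, :k]
        # Fraction of folds in which each feature appears in the top-k set.
        counts = np.bincount(top_idx.ravel(), minlength=n_features)
        freq = counts / n_folds
        return [feature_names[i] for i in range(n_features)
                if freq[i] >= min_frequency]
    raise ValueError(f"Unknown criterion: {criterion!r}")


names = ["a", "b", "c", "d"]
imp = np.array([[0.50, 0.30, 0.15, 0.05],
                [0.45, 0.10, 0.35, 0.10],
                [0.40, 0.35, 0.20, 0.05]])
by_mean = consensus_sketch(imp, names, "top_k", top_k=2)
by_freq = consensus_sketch(imp, names, "frequency", top_k=2, min_frequency=0.9)
```

In this toy matrix, feature "b" is in the top two by mean importance but appears in the per-fold top two in only two of three folds, so a strict frequency threshold excludes it.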

pairwise_rank_correlation()

Compute Spearman rank correlation of feature importances between all fold pairs.

High correlations indicate that the relative ordering of features is stable across outer folds.

Returns:

One row per fold pair with columns fold_i, fold_j, spearman_r, and p_value.

Return type:

pandas.DataFrame

Examples

>>> agg.compute()
>>> agg.pairwise_rank_correlation()
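A comparable table can be produced outside the library with scipy.stats.spearmanr, as in the sketch below. The function name pairwise_spearman is hypothetical, and the rows are returned as a list of dicts rather than a DataFrame to keep the example free of pandas.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr


def pairwise_spearman(importances):
    """Spearman correlation of importances for every pair of folds."""
    imp = np.asarray(importances, dtype=float)
    rows = []
    for i, j in combinations(range(imp.shape[0]), 2):
        # spearmanr ranks both vectors internally and returns (statistic, p-value).
        r, p = spearmanr(imp[i], imp[j])
        rows.append({"fold_i": i, "fold_j": j,
                     "spearman_r": float(r), "p_value": float(p)})
    return rows


imp = np.array([[0.50, 0.30, 0.15, 0.05],   # fold 0
                [0.40, 0.35, 0.20, 0.05],   # fold 1: same ordering as fold 0
                [0.05, 0.15, 0.30, 0.50]])  # fold 2: reversed ordering
rows = pairwise_spearman(imp)
```

Passing rows to pandas.DataFrame yields one row per fold pair with the four columns listed above. With n folds there are n(n-1)/2 pairs, and p-values computed from only a handful of features should be read with caution.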