Diagnostics

Tools for assessing the stability and reliability of a nested cross-validation procedure.

HyperparameterStability

class nestkit.diagnostics.HyperparameterStability(best_params_per_fold)

Bases: object

Assess hyperparameter selection consistency across outer folds.

Analyses the best hyperparameter configurations chosen in each outer fold and provides summary statistics (mode, entropy, agreement rate, coefficient of variation), pairwise Jaccard similarity, and a stability flag per parameter.

Parameters:

best_params_per_fold (list of dict) – Best hyperparameters selected in each outer fold. Each dict maps parameter names to their selected values.

best_params_per_fold

The input parameter sets (stored by reference).

Type:

list of dict

n_folds

Number of outer folds (len(best_params_per_fold)).

Type:

int

Examples

>>> from nestkit.diagnostics.stability import HyperparameterStability
>>> params = [
...     {"C": 1.0, "kernel": "rbf"},
...     {"C": 1.0, "kernel": "rbf"},
...     {"C": 0.1, "kernel": "rbf"},
... ]
>>> hs = HyperparameterStability(params)
>>> hs.summary()
    param  mode  nunique   entropy  agreement_rate   cv
0       C   1.0        2  0.918...        0.666667  ...
1  kernel   rbf        1  0.000000        1.000000  NaN
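
As a hand check of the C row above (assuming entropy and agreement rate are computed from the empirical frequencies of the selected values, as summary() describes): two of the three folds chose 1.0 and one chose 0.1, so

\[H(\texttt{C}) = -\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) = \log_2 3 - \tfrac{2}{3} \approx 0.918, \qquad \text{agreement\_rate}(\texttt{C}) = \tfrac{2}{3} \approx 0.667\]

kernel takes the same value in every fold, so its entropy is 0 and its agreement rate is 1.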

summary()

Compute per-parameter stability summary statistics.

Returns:

One row per hyperparameter with columns:

  • param – Hyperparameter name.

  • mode – Most frequently selected value (as string).

  • nunique – Number of distinct values across folds.

  • entropy – Shannon entropy (base 2) of the value distribution. Zero means perfect agreement.

  • agreement_rate – Fraction of folds that selected the modal value. Ranges from 1/n_folds to 1.

  • cv – Coefficient of variation (std / mean) for numeric parameters. NaN for non-numeric parameters.

Return type:

pandas.DataFrame

Notes

Values are converted to strings for counting purposes, so 1 and 1.0 are treated as distinct.
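The statistics listed above can be sketched with the standard library alone. This is a minimal sketch, not nestkit's implementation: summarize_params is a hypothetical helper, and it assumes every fold dict has the same keys, that ties for the mode go to the first value seen, and that cv uses the population standard deviation (ddof=0). The library may differ on any of these points.

```python
import math
import statistics
from collections import Counter


def summarize_params(best_params_per_fold):
    """Hypothetical sketch of the per-parameter statistics listed above."""
    n_folds = len(best_params_per_fold)
    rows = []
    # Assumption: every fold dict has the same keys.
    for name in best_params_per_fold[0]:
        values = [params[name] for params in best_params_per_fold]
        # Count string forms, so 1 and 1.0 fall into different buckets.
        counts = Counter(str(v) for v in values)
        # Ties for the mode go to the first value seen (an assumption).
        mode, mode_count = counts.most_common(1)[0]
        probs = [c / n_folds for c in counts.values()]
        # Shannon entropy in bits; 0.0 when every fold agrees.
        entropy = sum(-p * math.log2(p) for p in probs)
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric and statistics.fmean(values) != 0:
            # Assumption: population std (ddof=0); nestkit may use ddof=1.
            cv = statistics.pstdev(values) / statistics.fmean(values)
        else:
            cv = math.nan  # non-numeric (or zero-mean) parameters
        rows.append({
            "param": name,
            "mode": mode,
            "nunique": len(counts),
            "entropy": entropy,
            "agreement_rate": mode_count / n_folds,
            "cv": cv,
        })
    return rows


example = [
    {"C": 1.0, "kernel": "rbf"},
    {"C": 1.0, "kernel": "rbf"},
    {"C": 0.1, "kernel": "rbf"},
]
for row in summarize_params(example):
    print(row)
```

On the three-fold example from the class docstring this gives an entropy of about 0.918 and an agreement rate of 2/3 for C, and NaN for the cv of the string-valued kernel.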

Examples

>>> hs = HyperparameterStability([{"lr": 0.01}, {"lr": 0.01}])
>>> hs.summary()["agreement_rate"].iloc[0]
1.0

is_stable(threshold=0.8)

Determine whether each hyperparameter is stable.

A parameter is considered stable if its agreement rate (fraction of folds selecting the modal value) meets or exceeds the given threshold.

Parameters:

threshold (float, default 0.8) – Minimum agreement rate to consider a parameter stable.

Returns:

Mapping from parameter name to stability flag.

Return type:

dict of {str: bool}

Examples

>>> hs = HyperparameterStability([
...     {"C": 1.0}, {"C": 1.0}, {"C": 0.1}
... ])
>>> hs.is_stable(threshold=0.5)
{'C': True}
>>> hs.is_stable(threshold=0.8)
{'C': False}
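The stability rule reduces to one inclusive comparison per parameter. A minimal sketch, using a hypothetical stability_flags helper (not part of nestkit) that recomputes the agreement rate with string-based counting, as summary() describes, and assumes all folds share the same keys:

```python
from collections import Counter


def stability_flags(best_params_per_fold, threshold=0.8):
    """Hypothetical sketch: stable when agreement_rate >= threshold."""
    n_folds = len(best_params_per_fold)
    flags = {}
    # Assumption: every fold dict has the same keys.
    for name in best_params_per_fold[0]:
        counts = Counter(str(params[name]) for params in best_params_per_fold)
        agreement_rate = counts.most_common(1)[0][1] / n_folds
        # Inclusive comparison: a rate equal to the threshold counts as stable.
        flags[name] = agreement_rate >= threshold
    return flags


five_folds = [{"C": 1.0}] * 4 + [{"C": 0.1}]
print(stability_flags(five_folds))                  # {'C': True}
print(stability_flags(five_folds, threshold=0.81))  # {'C': False}
```

With five folds, four matching selections give an agreement rate of exactly 0.8, so the default threshold marks the parameter stable; raising the threshold slightly flips the flag.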

pairwise_jaccard()

Compute pairwise Jaccard similarity of hyperparameter configurations.

Treats each fold’s selected configuration as a set of "param=value" strings and computes the Jaccard index for every pair of folds.

Returns:

DataFrame with columns fold_i, fold_j, jaccard. One row per unique pair of folds.

Return type:

pandas.DataFrame

Notes

The Jaccard similarity index is defined as:

\[J(A, B) = \frac{|A \cap B|}{|A \cup B|}\]

where A and B are the sets of "param=value" strings for two folds. A Jaccard index of 1.0 means the two folds selected identical configurations; 0.0 means they share no "param=value" pair at all.
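For example, take A = {C=1.0, kernel=rbf} and B = {C=0.1, kernel=rbf}, which differ in one of two hyperparameters. Both mismatched strings enter the union, so the index is 1/3 rather than 1/2:

\[J(A, B) = \frac{|\{\texttt{kernel=rbf}\}|}{|\{\texttt{C=1.0},\ \texttt{C=0.1},\ \texttt{kernel=rbf}\}|} = \frac{1}{3}\]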

Examples

>>> hs = HyperparameterStability([
...     {"C": 1.0, "kernel": "rbf"},
...     {"C": 1.0, "kernel": "rbf"},
...     {"C": 0.1, "kernel": "linear"},
... ])
>>> hs.pairwise_jaccard()
   fold_i  fold_j  jaccard
0       0       1      1.0
1       0       2      0.0
2       1       2      0.0
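A minimal sketch of the pairwise computation, using itertools.combinations to visit each unordered pair of folds exactly once. pairwise_jaccard_sketch and config_to_set are hypothetical names; the sketch returns plain (fold_i, fold_j, jaccard) tuples rather than a DataFrame, and treating two empty configurations as identical is an assumption:

```python
from itertools import combinations


def config_to_set(params):
    # Encode each selected value as a "param=value" string.
    return {f"{name}={value}" for name, value in params.items()}


def pairwise_jaccard_sketch(best_params_per_fold):
    """Hypothetical sketch returning (fold_i, fold_j, jaccard) with i < j."""
    sets = [config_to_set(p) for p in best_params_per_fold]
    rows = []
    for i, j in combinations(range(len(sets)), 2):
        union = sets[i] | sets[j]
        # Assumption: two empty configurations count as identical (J = 1.0).
        jaccard = len(sets[i] & sets[j]) / len(union) if union else 1.0
        rows.append((i, j, jaccard))
    return rows


folds = [
    {"C": 1.0, "kernel": "rbf", "gamma": "scale"},
    {"C": 0.1, "kernel": "rbf", "gamma": "scale"},
    {"C": 0.1, "kernel": "linear", "gamma": "auto"},
]
for row in pairwise_jaccard_sketch(folds):
    print(row)
```

Here folds 0 and 1 share two of four distinct strings (0.5), folds 0 and 2 share none (0.0), and folds 1 and 2 share only C=0.1 out of five distinct strings (0.2).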