A nested cross-validation toolkit for scikit-learn
nestkit provides a nested cross-validation framework for scikit-learn with integrated calibration, threshold optimization, statistical comparison, and comprehensive diagnostics, all within a single, leakage-free evaluation pipeline.
Classification and regression with full scikit-learn API compatibility and leakage-free evaluation.
Post-hoc probability calibration via Platt scaling, isotonic regression, beta calibration, and Venn-ABERS.
Decision-threshold optimization with Youden’s J, F-beta, cost-sensitive, and precision-at-recall criteria.
CV+ Mondrian conformal prediction sets (classification) and conditional prediction intervals (regression).
Nadeau-Bengio corrected t-test, Bayesian correlated t-test with ROPE, and Holm-Bonferroni correction.
Hyperparameter stability and feature importance aggregation with Nogueira stability index.
25+ visualizations: ROC curves, calibration diagrams, confusion matrices, critical difference diagrams, and more.
Interactive Jupyter notebooks covering basic usage and advanced workflows.
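As an illustration of the statistical comparison listed above, the Nadeau-Bengio corrected resampled t-test can be sketched in a few lines of NumPy and SciPy. This is a minimal standalone sketch, not nestkit's implementation: the function name `corrected_resampled_ttest`, its arguments, and the example scores are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats


def corrected_resampled_ttest(scores_a, scores_b, n_train, n_test):
    """Nadeau-Bengio corrected t-test on paired per-fold scores.

    Hypothetical helper for illustration; not part of nestkit's API.
    """
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    k = diffs.size                      # number of folds / resamples
    mean_diff = diffs.mean()
    var_diff = diffs.var(ddof=1)        # unbiased sample variance of the differences
    # The naive t-test scales the variance by 1/k. The correction adds
    # n_test / n_train to account for overlapping training sets across folds.
    corrected_var = (1.0 / k + n_test / n_train) * var_diff
    t_stat = mean_diff / np.sqrt(corrected_var)
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=k - 1)  # two-sided p-value
    return t_stat, p_value


# Made-up outer-fold accuracies for two models evaluated on the same 5 folds.
scores_a = [0.90, 0.92, 0.91, 0.93, 0.89]
scores_b = [0.88, 0.90, 0.90, 0.91, 0.88]
t_stat, p_value = corrected_resampled_ttest(scores_a, scores_b, n_train=400, n_test=100)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The extra `n_test / n_train` term widens the variance estimate, so the corrected test is more conservative than a standard paired t-test on the same fold scores.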
Getting started
pip install nestkit
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from nestkit import NestedCVClassifier
X, y = load_breast_cancer(return_X_y=True)
ncv = NestedCVClassifier(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, 10]},
    outer_cv=5,
    inner_cv=3,
    scoring="accuracy",
    random_state=42,
)
ncv.fit(X, y)
print(ncv.results_.summary_default_)
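For readers new to nested cross-validation, the same evaluation can be approximated with plain scikit-learn: an inner `GridSearchCV` tunes hyperparameters, and an outer `cross_val_score` measures the tuned model on folds the search never saw. This is a rough sketch of the general technique only. The choice of shuffled `StratifiedKFold` splitters is an assumption, and nestkit may split, refit, or score differently.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Assumed splitters: stratified, shuffled folds with fixed seeds.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Inner loop: hyperparameter search, run only on each outer training split.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, 10]},
    cv=inner_cv,
    scoring="accuracy",
)

# Outer loop: each fold refits the whole search on its training part and
# scores the selected model on the held-out part, so tuning never sees test data.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(f"accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Passing the search object itself to `cross_val_score` is what keeps the evaluation leakage-free: hyperparameters are reselected inside every outer fold rather than once on the full dataset.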