Thresholding
Utilities for optimising classification decision thresholds, including multiple built-in criteria and strategies for computing thresholds across folds.
Result container
- class nestkit.thresholding.ThresholdResult(strategy, optimal_threshold, criterion_name, criterion_value_at_optimum, fold_thresholds=None, fold_threshold_std=None, threshold_sensitivity=<factory>)
Bases: object
Threshold optimisation results for a single outer fold.
Stores the optimal threshold, the criterion used, per-inner-fold thresholds (for the fold-specific strategy), and a full threshold-sensitivity grid for downstream analysis and plotting.
- Parameters:
- strategy
Threshold selection strategy: "fold_specific" (optimise per inner fold, then average) or "pooled" (optimise on pooled inner out-of-fold predictions).
- Type:
str
- criterion_name
Human-readable name of the optimisation criterion (e.g., "youden_j", "f_1.0").
- Type:
str
- fold_thresholds
Per-inner-fold optimal thresholds. Only populated for the "fold_specific" strategy; None for "pooled".
- Type:
numpy.ndarray or None
- fold_threshold_std
Standard deviation of fold_thresholds, serving as a stability indicator for the fold-specific strategy. None for "pooled".
- Type:
float or None
- threshold_sensitivity
Full threshold-sensitivity grid with columns threshold, criterion_value, sensitivity, specificity, precision, recall and f1. Useful for plotting threshold-performance curves.
- Type:
pandas.DataFrame
See also
nestkit.thresholding.strategies.FoldSpecificThreshold
Produces ThresholdResult with strategy="fold_specific".
nestkit.thresholding.strategies.PooledThreshold
Produces ThresholdResult with strategy="pooled".
nestkit.thresholding.criteria
Built-in criterion functions.
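To make the difference between the two strategies concrete, here is a minimal sketch that computes both from the same inner-fold predictions. It is not nestkit's implementation: `youden`, `best_threshold`, `fold_specific` and `pooled` are hypothetical helpers, a sample is assumed to be predicted positive when its probability is at least the threshold, candidates come from a fixed grid with ties resolved to the first maximum, and the standard deviation uses NumPy's default `ddof=0`, which nestkit may not.

```python
import numpy as np


def youden(y_true, y_proba, threshold):
    """Youden's J; assumes a sample is positive when y_proba >= threshold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_proba) >= threshold
    sensitivity = np.mean(y_pred[y_true == 1])
    specificity = np.mean(~y_pred[y_true == 0])
    return float(sensitivity + specificity - 1)


def best_threshold(y_true, y_proba, criterion, grid):
    """Grid search; np.argmax keeps the first maximum on ties."""
    scores = [criterion(y_true, y_proba, t) for t in grid]
    return float(grid[int(np.argmax(scores))])


def fold_specific(folds, criterion, grid):
    """Optimise each inner fold separately, then average the thresholds."""
    per_fold = np.array([best_threshold(y, p, criterion, grid) for y, p in folds])
    # Spread of the per-fold optima acts as a stability indicator (ddof=0 assumed).
    return float(per_fold.mean()), per_fold, float(per_fold.std())


def pooled(folds, criterion, grid):
    """Concatenate all inner out-of-fold predictions and optimise once."""
    y_all = np.concatenate([y for y, _ in folds])
    p_all = np.concatenate([p for _, p in folds])
    return best_threshold(y_all, p_all, criterion, grid)


# Two toy inner folds of (labels, out-of-fold probabilities).
folds = [
    (np.array([0, 0, 1, 1]), np.array([0.1, 0.3, 0.5, 0.8])),
    (np.array([0, 0, 1, 1]), np.array([0.2, 0.6, 0.7, 0.9])),
]
grid = np.array([0.25, 0.4, 0.55, 0.65, 0.75])

mean_threshold, per_fold, fold_std = fold_specific(folds, youden, grid)
pooled_threshold = pooled(folds, youden, grid)
```

Here the two inner folds favour 0.4 and 0.65, so the fold-specific threshold is 0.525 with a spread of 0.125, whereas the pooled predictions tie at those two cut-points and the first, 0.4, is kept. A large `fold_std` relative to the grid spacing is the kind of instability that `fold_threshold_std` is meant to flag.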
Examples
>>> result.optimal_threshold
0.42
>>> result.threshold_sensitivity.head()
   threshold  criterion_value  sensitivity  specificity  ...
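The `threshold_sensitivity` grid can be pictured as one row of metrics per candidate threshold. The sketch below builds such rows under stated assumptions; it is not nestkit's code. `sensitivity_grid`, `_ratio` and the stand-in `youden` criterion are hypothetical, rows are plain dicts rather than a DataFrame, predictions count as positive when `y_proba >= threshold`, and undefined ratios are reported as 0.0.

```python
import numpy as np


def _ratio(num, den):
    # Assumed convention: report 0.0 when a ratio is undefined.
    return float(num / den) if den > 0 else 0.0


def sensitivity_grid(y_true, y_proba, criterion, thresholds):
    """Hypothetical helper: one row of metrics per candidate threshold."""
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba)
    rows = []
    for t in thresholds:
        y_pred = y_proba >= t  # assumption: ties count as positive
        tp = int(np.sum(y_pred & (y_true == 1)))
        fp = int(np.sum(y_pred & (y_true == 0)))
        tn = int(np.sum(~y_pred & (y_true == 0)))
        fn = int(np.sum(~y_pred & (y_true == 1)))
        recall = _ratio(tp, tp + fn)  # recall and sensitivity are the same quantity
        precision = _ratio(tp, tp + fp)
        rows.append({
            "threshold": float(t),
            "criterion_value": float(criterion(y_true, y_proba, t)),
            "sensitivity": recall,
            "specificity": _ratio(tn, tn + fp),
            "precision": precision,
            "recall": recall,
            "f1": _ratio(2 * precision * recall, precision + recall),
        })
    return rows


def youden(y_true, y_proba, t):
    # Stand-in criterion: sensitivity + specificity - 1.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_proba) >= t
    return np.mean(y_pred[y_true == 1]) + np.mean(~y_pred[y_true == 0]) - 1


rows = sensitivity_grid([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9], youden, [0.25, 0.5, 0.75])
best = max(rows, key=lambda row: row["criterion_value"])  # the row for threshold 0.5
```

If pandas is installed, `pandas.DataFrame(rows)` turns these rows into a table with the column names listed for `threshold_sensitivity`, ready for a `.head()` like the one shown.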
Criteria functions
Pre-built objective functions that can be passed to the threshold optimiser.
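Any callable with the shape `(y_true, y_proba, threshold) -> float` can act as a criterion. A minimal sketch of how an optimiser might consume one, assuming an exhaustive search over candidate thresholds: `optimise_threshold` and `accuracy_criterion` are hypothetical names, not part of nestkit, and positives are assumed to be `y_proba >= threshold`.

```python
import numpy as np


def accuracy_criterion(y_true, y_proba, threshold):
    """Illustrative custom criterion with the (y_true, y_proba, threshold) -> float shape."""
    y_pred = np.asarray(y_proba) >= threshold  # assumption: ties count as positive
    return float(np.mean(y_pred == (np.asarray(y_true) == 1)))


def optimise_threshold(y_true, y_proba, criterion, thresholds=None):
    """Hypothetical optimiser: exhaustive search returning (threshold, value)."""
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba)
    if thresholds is None:
        # Every distinct predicted probability is a candidate cut-point.
        thresholds = np.unique(y_proba)
    values = np.array([criterion(y_true, y_proba, t) for t in thresholds])
    best = int(np.argmax(values))  # first maximum wins on ties
    return float(thresholds[best]), float(values[best])


threshold, value = optimise_threshold([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9], accuracy_criterion)
# threshold == 0.6, value == 1.0
```

Using the distinct predicted probabilities as candidates covers every split reachable with `>=` except "predict everything negative"; a fixed grid such as `np.linspace(0, 1, 101)` is an alternative when a uniform resolution is preferred.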
- nestkit.thresholding.youden_j(y_true, y_proba, threshold)
Compute Youden’s J statistic at the given threshold.
Youden’s J is defined as sensitivity + specificity - 1 and ranges from -1 (every sample misclassified) to +1 (perfect classification). Maximising J selects the threshold with the largest combined sensitivity and specificity, weighting the two equally.
- Parameters:
y_true (numpy.ndarray of shape (n_samples,)) – True binary labels (0 or 1).
y_proba (numpy.ndarray of shape (n_samples,)) – Predicted positive-class probabilities.
threshold (float) – Decision threshold in [0, 1].
- Returns:
Youden’s J in [-1, 1].
- Return type:
float
Notes
\[J = \text{sensitivity} + \text{specificity} - 1 = \frac{TP}{TP + FN} + \frac{TN}{TN + FP} - 1\]
Because specificity equals 1 - FPR, J = TPR - FPR: the vertical distance between the ROC point at this threshold and the diagonal chance line.
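The ROC interpretation can be checked numerically. In this sketch, `youden_j_sketch` is a hypothetical re-implementation of the formula (not nestkit's code); it assumes both classes are present and that a probability equal to the threshold counts as positive, and it compares J with TPR - FPR read off at the same threshold.

```python
import numpy as np


def youden_j_sketch(y_true, y_proba, threshold):
    """Youden's J from confusion counts; assumes both classes are present."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_proba) >= threshold  # assumption: ties count as positive
    tp = np.sum(y_pred & (y_true == 1))
    fn = np.sum(~y_pred & (y_true == 1))
    tn = np.sum(~y_pred & (y_true == 0))
    fp = np.sum(y_pred & (y_true == 0))
    return float(tp / (tp + fn) + tn / (tn + fp) - 1)


y_true = np.array([0, 0, 1, 1, 1])
y_proba = np.array([0.2, 0.7, 0.4, 0.8, 0.9])
t = 0.5

# The ROC point at threshold t is (FPR, TPR); its height above the line TPR = FPR is TPR - FPR.
tpr = np.mean(y_proba[y_true == 1] >= t)  # 2/3
fpr = np.mean(y_proba[y_true == 0] >= t)  # 1/2
j = youden_j_sketch(y_true, y_proba, t)  # 1/6, equal to tpr - fpr
```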
Examples
>>> import numpy as np
>>> from nestkit.thresholding.criteria import youden_j
>>> youden_j(np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.6, 0.9]), 0.5)
1.0
- nestkit.thresholding.f_beta_criterion(beta=1.0)
Create a criterion function that maximises the F-beta score.
- Parameters:
beta (float, default 1.0) – The beta parameter of the F-beta score.
beta < 1 weights precision more heavily; beta > 1 weights recall more heavily; beta = 1 gives the standard F1 score.
- Returns:
A criterion function with signature (y_true, y_proba, threshold) -> float.
- Return type:
callable
Notes
The F-beta score is defined as:
\[F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}\]
Examples
>>> import numpy as np
>>> from nestkit.thresholding.criteria import f_beta_criterion
>>> f1_criterion = f_beta_criterion(beta=1.0)
>>> f1_criterion(
...     np.array([0, 0, 1, 1]),
...     np.array([0.1, 0.4, 0.6, 0.9]),
...     0.5,
... )
1.0
See also
youden_j
Alternative criterion based on sensitivity + specificity.
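A sketch of how such a factory can be written as a closure, and of how `beta` shifts the preferred threshold. `f_beta_sketch` is hypothetical and may differ from nestkit in edge cases: it assumes `y_proba >= threshold` means positive and returns 0.0 when precision and recall are both zero.

```python
import numpy as np


def f_beta_sketch(beta=1.0):
    """Hypothetical factory mirroring f_beta_criterion's documented behaviour."""
    b2 = beta ** 2

    def criterion(y_true, y_proba, threshold):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_proba) >= threshold  # assumption: ties count as positive
        tp = np.sum(y_pred & (y_true == 1))
        fp = np.sum(y_pred & (y_true == 0))
        fn = np.sum(~y_pred & (y_true == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = b2 * precision + recall
        # Assumed convention: score 0.0 when precision and recall are both 0.
        return float((1 + b2) * precision * recall / denom) if denom else 0.0

    return criterion


y_true = np.array([0, 0, 0, 1, 1, 1])
y_proba = np.array([0.1, 0.5, 0.6, 0.4, 0.7, 0.9])
# Low cut (0.3): precision 0.6, recall 1.0. High cut (0.65): precision 1.0, recall 2/3.
f_half, f_two = f_beta_sketch(0.5), f_beta_sketch(2.0)
# f_half: about 0.65 at 0.3 vs 0.91 at 0.65; f_two: about 0.88 at 0.3 vs 0.71 at 0.65
```

With beta = 0.5 the stricter cut wins because its perfect precision dominates; with beta = 2 the looser cut wins because it recovers every positive.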
- nestkit.thresholding.cost_sensitive(cost_matrix)
Create a criterion that minimises expected misclassification cost.
The returned function computes the negative total cost so that argmax corresponds to argmin of cost.
- Parameters:
cost_matrix (array-like of shape (2, 2)) – Cost matrix [[C_TN, C_FP], [C_FN, C_TP]], with rows indexed by the true class and columns by the predicted class, where:
- C_TN – cost of a true negative (usually 0).
- C_FP – cost of a false positive.
- C_FN – cost of a false negative.
- C_TP – cost of a true positive (usually 0).
- Returns:
A criterion function with signature (y_true, y_proba, threshold) -> float returning the negative total cost.
- Return type:
callable
Notes
The total cost is:
\[\text{Cost} = C_{TN} \cdot TN + C_{FP} \cdot FP + C_{FN} \cdot FN + C_{TP} \cdot TP\]
The function returns \(-\text{Cost}\) so that maximisation via argmax yields the cost-minimising threshold.
Examples
>>> import numpy as np
>>> from nestkit.thresholding.criteria import cost_sensitive
>>> # FP costs 1, FN costs 5
>>> criterion = cost_sensitive([[0, 1], [5, 0]])
>>> criterion(
...     np.array([0, 0, 1, 1]),
...     np.array([0.1, 0.4, 0.6, 0.9]),
...     0.5,
... )
0
See also
youden_j
Cost-agnostic criterion.
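The sketch below re-implements the documented behaviour to show how the cost matrix moves the optimum. `cost_sensitive_sketch` is a hypothetical stand-in, not nestkit's code; it assumes the `[[C_TN, C_FP], [C_FN, C_TP]]` layout described for `cost_matrix` and that `y_proba >= threshold` is a positive prediction.

```python
import numpy as np


def cost_sensitive_sketch(cost_matrix):
    """Hypothetical factory mirroring cost_sensitive's documented behaviour.

    Layout as documented: [[C_TN, C_FP], [C_FN, C_TP]], i.e. rows are the
    true class (0, 1) and columns the predicted class (0, 1).
    """
    costs = np.asarray(cost_matrix, dtype=float)

    def criterion(y_true, y_proba, threshold):
        y_true = np.asarray(y_true).astype(int)
        # Assumption: ties count as positive.
        y_pred = (np.asarray(y_proba) >= threshold).astype(int)
        confusion = np.zeros((2, 2))
        np.add.at(confusion, (y_true, y_pred), 1)  # [[TN, FP], [FN, TP]]
        return -float(np.sum(costs * confusion))  # negative cost, so argmax minimises cost

    return criterion


y_true = np.array([0, 0, 0, 1, 1, 1])
y_proba = np.array([0.1, 0.35, 0.45, 0.3, 0.6, 0.8])
grid = [0.2, 0.5, 0.7]

symmetric = cost_sensitive_sketch([[0, 1], [1, 0]])  # FP and FN cost the same
fn_heavy = cost_sensitive_sketch([[0, 1], [5, 0]])  # FN five times as costly as FP

best_symmetric = max(grid, key=lambda t: symmetric(y_true, y_proba, t))  # 0.5
best_fn_heavy = max(grid, key=lambda t: fn_heavy(y_true, y_proba, t))  # 0.2
```

Making false negatives five times as expensive pulls the chosen threshold down from 0.5 to 0.2, trading two extra false positives for one fewer missed positive.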
- nestkit.thresholding.balanced_accuracy_criterion(y_true, y_proba, threshold)
Compute balanced accuracy at the given threshold.
Balanced accuracy is the arithmetic mean of sensitivity and specificity, equivalent to (Youden's J + 1) / 2.
- Parameters:
y_true (numpy.ndarray of shape (n_samples,)) – True binary labels (0 or 1).
y_proba (numpy.ndarray of shape (n_samples,)) – Predicted positive-class probabilities.
threshold (float) – Decision threshold in [0, 1].
- Returns:
Balanced accuracy in [0, 1].
- Return type:
float
Notes
\[\text{BA} = \frac{\text{sensitivity} + \text{specificity}}{2}\]
Examples
>>> import numpy as np
>>> from nestkit.thresholding.criteria import balanced_accuracy_criterion
>>> balanced_accuracy_criterion(
...     np.array([0, 0, 1, 1]),
...     np.array([0.1, 0.4, 0.6, 0.9]),
...     0.5,
... )
1.0
See also
youden_j
Equivalent to 2 * balanced_accuracy - 1.
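Because balanced accuracy is an increasing affine function of J, the two criteria rank thresholds identically. A quick numerical check, using hypothetical helpers (`sens_spec`, `balanced_accuracy`, `youden`) rather than nestkit's functions, and assuming `y_proba >= threshold` is a positive prediction:

```python
import numpy as np


def sens_spec(y_true, y_proba, threshold):
    """Sensitivity and specificity; assumes y_proba >= threshold means positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_proba) >= threshold
    return float(np.mean(y_pred[y_true == 1])), float(np.mean(~y_pred[y_true == 0]))


def balanced_accuracy(y_true, y_proba, threshold):
    sensitivity, specificity = sens_spec(y_true, y_proba, threshold)
    return (sensitivity + specificity) / 2


def youden(y_true, y_proba, threshold):
    sensitivity, specificity = sens_spec(y_true, y_proba, threshold)
    return sensitivity + specificity - 1


y_true = np.array([0, 0, 0, 1, 1, 1])
y_proba = np.array([0.1, 0.35, 0.45, 0.3, 0.6, 0.8])
grid = np.linspace(0.05, 0.95, 19)

ba = np.array([balanced_accuracy(y_true, y_proba, t) for t in grid])
j = np.array([youden(y_true, y_proba, t) for t in grid])
# ba equals (j + 1) / 2 element-wise, so np.argmax(ba) == np.argmax(j).
```

Either criterion can therefore be used for selection; they differ only in the scale of the reported criterion value.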
- nestkit.thresholding.precision_at_recall(min_recall=0.9)
Create a criterion that maximises precision subject to a minimum recall.
Thresholds that produce a recall below min_recall receive a score of -1, effectively excluding them from selection.
- Parameters:
min_recall (float, default 0.90) – Minimum acceptable recall. Must be in (0, 1].
- Returns:
A criterion function with signature (y_true, y_proba, threshold) -> float. Returns precision when recall >= min_recall, else -1.
- Return type:
callable
Notes
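This implements a constrained optimisation: among all thresholds achieving at least min_recall, select the one with the highest precision. Because precision is never negative, any feasible threshold outscores the penalty of -1, so argmax never selects an infeasible threshold as long as at least one feasible threshold exists. If none does, every candidate scores -1 and argmax still returns one of them, so a best score of -1 signals that the recall constraint could not be met.
Examples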
>>> import numpy as np
>>> from nestkit.thresholding.criteria import precision_at_recall
>>> criterion = precision_at_recall(min_recall=0.80)
>>> criterion(
...     np.array([0, 0, 1, 1, 1]),
...     np.array([0.1, 0.3, 0.6, 0.7, 0.9]),
...     0.5,
... )
1.0
See also
f_beta_criterion
Unconstrained precision–recall trade-off.
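The constraint and its fallback behaviour can be sketched as follows. `precision_at_recall_sketch` and `best_threshold` are hypothetical stand-ins, not nestkit's code; predictions are assumed positive when `y_proba >= threshold`, and ties go to the first candidate.

```python
import numpy as np


def precision_at_recall_sketch(min_recall=0.9):
    """Hypothetical factory mirroring precision_at_recall's documented behaviour."""
    def criterion(y_true, y_proba, threshold):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_proba) >= threshold  # assumption: ties count as positive
        tp = np.sum(y_pred & (y_true == 1))
        fp = np.sum(y_pred & (y_true == 0))
        fn = np.sum(~y_pred & (y_true == 1))
        recall = tp / (tp + fn) if tp + fn else 0.0
        if recall < min_recall:
            return -1.0  # penalty for violating the recall constraint
        return float(tp / (tp + fp)) if tp + fp else 0.0

    return criterion


y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_proba = np.array([0.2, 0.5, 0.65, 0.3, 0.6, 0.8, 0.9])


def best_threshold(min_recall, candidates=(0.25, 0.55, 0.7)):
    """Pick the first candidate with the highest constrained score."""
    criterion = precision_at_recall_sketch(min_recall)
    scores = [criterion(y_true, y_proba, t) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]


# best_threshold(0.7) -> (0.55, 0.75); best_threshold(0.9, (0.55, 0.7)) -> (0.55, -1.0)
```

Tightening `min_recall` from 0.7 to 0.9 moves the choice from 0.55 (precision 0.75) to 0.25 (precision about 0.67). With only the candidates 0.55 and 0.7, nothing reaches recall 0.9, so every score is -1 and the returned threshold is infeasible; checking for a best score of -1 catches this case.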