Imbalanced binary classification. For compliance, recall on the reportable class matters most: a false negative is a reportable complaint that escaped detection.
Benchmark vs current performance. The control concern is the recall drop (0.957 to 0.759), not the high precision, which rose slightly (0.953 to 0.966).
Scorecard
Metric             Benchmark  Current  Δ
accuracy           0.949      0.832    -0.117
balanced_accuracy  0.948      0.857    -0.091
precision          0.953      0.966    +0.013
recall             0.957      0.759    -0.198
f1                 0.955      0.850    -0.105
f1_macro           0.948      0.830    -0.118
roc_auc            0.946      0.870    -0.075
pr_auc             0.932      0.925    -0.007
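The Δ column can be recomputed and screened for large drops with a few lines of Python. This is a minimal sketch, not the report's own tooling: the names SCORECARD, DROP_THRESHOLD, deltas and flagged are hypothetical, and the 0.10 threshold is an illustrative assumption rather than a documented control limit. Because it works from the rounded values shown, a recomputed delta can differ from the table in the last digit (roc_auc comes out as -0.076 here against the reported -0.075, presumably because the report used unrounded values).

```python
# Scorecard values copied from the report: metric -> (benchmark, current).
SCORECARD = {
    "accuracy": (0.949, 0.832),
    "balanced_accuracy": (0.948, 0.857),
    "precision": (0.953, 0.966),
    "recall": (0.957, 0.759),
    "f1": (0.955, 0.850),
    "f1_macro": (0.948, 0.830),
    "roc_auc": (0.946, 0.870),
    "pr_auc": (0.932, 0.925),
}

# Hypothetical alert threshold (absolute drop); an assumption, not from the report.
DROP_THRESHOLD = 0.10


def deltas(scorecard):
    """Return current minus benchmark for each metric, rounded to 3 decimals."""
    return {name: round(cur - bench, 3) for name, (bench, cur) in scorecard.items()}


def flagged(scorecard, threshold=DROP_THRESHOLD):
    """Return metrics that dropped by at least `threshold`, worst drop first."""
    d = deltas(scorecard)
    return sorted((n for n, v in d.items() if v <= -threshold), key=lambda n: d[n])


if __name__ == "__main__":
    for name in flagged(SCORECARD):
        print(f"{name}: {deltas(SCORECARD)[name]:+.3f}")
```

With these inputs recall is the first metric listed, which matches the control concern stated above.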
Confusion matrix & error breakdown
False negatives (FN=454) are the bottom-left cell: truly reportable complaints predicted as not reportable. This is the costly error here.
False positives (FP=50) create review workload but are safer.
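The error counts can be tied back to the scorecard with a short calculation. The sketch below assumes only the figures quoted in this report (FN=454, FP=50, current recall 0.759). The number of truly reportable complaints and the true-positive count are not stated in the source, so they are back-solved here and should be read as estimates, since recall is rounded to three decimals. All variable names are hypothetical.

```python
# Figures quoted in the report.
FN = 454          # reportable complaints predicted as not reportable
FP = 50           # non-reportable complaints predicted as reportable
RECALL = 0.759    # current recall on the reportable class (rounded)

# Miss rate (false negative rate) is the complement of recall.
miss_rate = 1.0 - RECALL

# Estimate, not a reported value: recall = TP / (TP + FN), so the number of
# truly reportable complaints is approximately FN / (1 - recall).
est_positives = FN / miss_rate

# Estimated true positives: reportable complaints that were caught.
est_tp = est_positives - FN

# Cross-check: precision implied by the estimated TP and the reported FP.
implied_precision = est_tp / (est_tp + FP)

if __name__ == "__main__":
    print(f"miss rate: {miss_rate:.3f}")
    print(f"estimated reportable complaints: {est_positives:.0f}")
    print(f"estimated true positives: {est_tp:.0f}")
    print(f"implied precision: {implied_precision:.3f}")
```

The implied precision rounds to the reported 0.966, so the FN and FP counts are consistent with the scorecard: roughly one in four reportable complaints is missed, while few flagged complaints are false alarms.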