Quality metrics

This is an imbalanced binary classification task. For compliance, recall on the reportable class matters most: a false negative is a reportable complaint that escaped detection.

Figure (metric bars): benchmark vs. current performance. The control concern is the drop in recall (0.957 to 0.759), not the high precision.

Scorecard

Metric	Benchmark	Current	Δ (Current - Benchmark)
accuracy	0.949	0.832	-0.117
balanced_accuracy	0.948	0.857	-0.091
precision	0.953	0.966	+0.013
recall	0.957	0.759	-0.198
f1	0.955	0.850	-0.105
f1_macro	0.948	0.830	-0.118
roc_auc	0.946	0.870	-0.075
pr_auc	0.932	0.925	-0.007
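
As a sanity check on the scorecard above, the deltas and the F1 value can be recomputed from the rounded figures. This is only a sketch: the names `SCORECARD`, `deltas`, and `f1_from` are illustrative and not part of any reporting tool. Because the inputs are already rounded to three decimals, a recomputed delta can differ from the reported one in the last digit (roc_auc comes out as -0.076 here versus the reported -0.075, presumably because the report subtracts unrounded values).

```python
# Scorecard values copied from the table above: (benchmark, current).
SCORECARD = {
    "accuracy": (0.949, 0.832),
    "balanced_accuracy": (0.948, 0.857),
    "precision": (0.953, 0.966),
    "recall": (0.957, 0.759),
    "f1": (0.955, 0.850),
    "f1_macro": (0.948, 0.830),
    "roc_auc": (0.946, 0.870),
    "pr_auc": (0.932, 0.925),
}


def deltas(scorecard):
    """Return current minus benchmark per metric, rounded to 3 decimals."""
    return {
        name: round(current - benchmark, 3)
        for name, (benchmark, current) in scorecard.items()
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


DELTAS = deltas(SCORECARD)
# Metric with the most negative delta, i.e. the largest drop.
WORST = min(DELTAS, key=DELTAS.get)

if __name__ == "__main__":
    for name, delta in DELTAS.items():
        print(f"{name:18s} {delta:+.3f}")
    print("largest drop:", WORST)
    # Cross-check: F1 implied by the current precision and recall.
    print("implied current f1:", round(f1_from(0.966, 0.759), 3))
```

Recall shows the largest drop of any metric, and the F1 implied by the current precision and recall agrees with the reported 0.850, so the scorecard rows are mutually consistent.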

Confusion matrix & error breakdown

False negatives (FN=454) are reportable complaints predicted as not-reportable, which is the costly error here. False positives (FP=50) add review workload but carry less compliance risk.
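
To put the two error counts in proportion, the short sketch below computes each error type's share of all misclassifications, plus a rough estimate of how many truly reportable complaints the evaluation set contained. The estimate relies on the identity FN = (1 - recall) x positives and on the rounded recall from the scorecard, so it is approximate only. The helper names `error_shares` and `implied_positives` are hypothetical.

```python
FN = 454  # reportable complaints predicted as not-reportable (from the text above)
FP = 50   # not-reportable complaints flagged as reportable (from the text above)
CURRENT_RECALL = 0.759  # rounded value from the scorecard


def error_shares(fn, fp):
    """Split total misclassifications into FN and FP fractions."""
    total = fn + fp
    return {"fn_share": fn / total, "fp_share": fp / total}


def implied_positives(fn, recall):
    """Rough count of truly reportable cases, from FN = (1 - recall) * positives.

    Approximate only: recall is rounded to three decimals in the scorecard.
    """
    return fn / (1.0 - recall)


SHARES = error_shares(FN, FP)
EST_POSITIVES = implied_positives(FN, CURRENT_RECALL)

if __name__ == "__main__":
    print(f"FN share of errors: {SHARES['fn_share']:.1%}")
    print(f"FP share of errors: {SHARES['fp_share']:.1%}")
    print(f"estimated reportable cases: ~{EST_POSITIVES:.0f}")
```

Roughly nine of every ten misclassifications are false negatives, which is why the recall drop dominates the control concern.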