Reportability model monitoring

Generated 2026-06-04 15:40 UTC · benchmark n=900 · current n=3000 · model: stand-in TF-IDF+LogReg

The classifier flags a customer complaint as reportable when it shows financial or emotional impact. The core control question is whether reportable complaints are being missed (false negatives).

Missed reportable complaints

454

false negatives in current window

Recall floor check

0.76

Δ -0.198 vs benchmark

Overall drift PSI

0.33

significant

Precision

0.97

Δ +0.013

F1

0.85

Δ -0.105

Accuracy

0.83

Δ -0.117
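
The headline figures above all follow from confusion counts over the predictions table. A minimal sketch of that arithmetic, assuming binary labels where 1 means reportable; the names QualityMetrics and quality_metrics and the toy labels are illustrative and are not taken from the monitoring package. As a consistency check, the harmonic mean of the reported precision (0.97) and recall (0.76) rounds to 0.85.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityMetrics:
    false_negatives: int
    recall: float
    precision: float
    f1: float
    accuracy: float


def quality_metrics(y_true, y_pred):
    """Compute headline metrics for binary labels (1 = reportable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed reportable complaints
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return QualityMetrics(fn, recall, precision, f1, accuracy)


# Toy labels, illustrative only (not the report's data).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy = quality_metrics(toy_true, toy_pred)

# F1 is the harmonic mean of precision and recall; applied to the
# reported precision and recall it should land near the reported 0.85.
reported_f1 = 2 * 0.97 * 0.76 / (0.97 + 0.76)
```

Because recall is the share of truly reportable complaints that were caught, the false-negative count and the recall figure move together: every missed complaint lowers recall without affecting precision.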

Evidence map

Page	Question it answers	Primary risk signal
Quality metrics	Is the model accurate, and where do errors fall?	False negatives and reportable recall
Distribution & bias	Does it under/over-flag? Are confidence scores safe to use?	Prediction bias, calibration, subgroup skew
Drift vs benchmark	Has the input or score distribution moved?	PSI, new categories, score shift
Assessment	What should an owner do next?	Tuning, retraining, data, and monitoring controls
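
The PSI signal listed under "Drift vs benchmark" compares binned proportions between the benchmark and current windows. The sketch below shows one common formulation, assuming scores in [0, 1], ten equal-width bins, and a small floor for empty bins; the binning scheme the report actually uses is not stated here, so these choices and the function name psi are assumptions. A widely quoted rule of thumb treats PSI above 0.25 as a significant shift, which would be consistent with the 0.33 reading, but the monitor's own cut-off may differ.

```python
import math


def psi(benchmark, current, bins=10, lo=0.0, hi=1.0, eps=1e-6):
    """Population Stability Index between two score samples in [lo, hi]."""
    if not benchmark or not current:
        raise ValueError("both samples must be non-empty")

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Map the value to a bin index; clamp so v == hi lands in the last bin.
            idx = int((v - lo) / (hi - lo) * bins)
            counts[min(max(idx, 0), bins - 1)] += 1
        total = len(values)
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / total, eps) for c in counts]

    expected = proportions(benchmark)  # benchmark window
    actual = proportions(current)      # current window
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Illustrative samples only (not the report's data).
uniform_scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
shifted_scores = [0.85] * 10

psi_same = psi(uniform_scores, uniform_scores)
psi_shifted = psi(uniform_scores, shifted_scores)
```

Identical samples give a PSI of zero, and the value grows as mass moves between bins. Each term is non-negative, so the index never falls below zero.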

Design principle: the monitor consumes a predictions table. The classifier can be a lightweight stand-in today, a production export tomorrow, or a local RoBERTa backend later.

Reproduce: python -m reportability_monitoring.cli run-all --out reports/
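
The predictions-table design principle can be sketched as a small contract: any backend that yields a label and a score per complaint can feed the same monitoring code. Everything in this example is hypothetical, including the column names (complaint_id, y_true, y_pred, score), the Backend protocol, and the keyword-based toy backend; none of it reflects the actual schema of reportability_monitoring.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class PredictionRow:
    complaint_id: str  # hypothetical column name
    y_true: int        # 1 = reportable, 0 = not reportable
    y_pred: int        # backend's predicted label
    score: float       # backend's confidence for the reportable class


class Backend(Protocol):
    """Anything that maps complaint texts to (label, score) pairs."""

    def predict(self, texts: Iterable[str]) -> List[Tuple[int, float]]: ...


class KeywordStandIn:
    """Toy backend for illustration; flags texts that mention a refund."""

    def predict(self, texts: Iterable[str]) -> List[Tuple[int, float]]:
        results = []
        for text in texts:
            hit = "refund" in text.lower()
            results.append((1 if hit else 0, 0.9 if hit else 0.1))
        return results


def build_predictions(backend: Backend, complaints) -> List[PredictionRow]:
    """Turn (complaint_id, text, y_true) triples into a predictions table."""
    complaints = list(complaints)
    preds = backend.predict([text for _, text, _ in complaints])
    return [
        PredictionRow(cid, y_true, label, score)
        for (cid, _, y_true), (label, score) in zip(complaints, preds)
    ]


def count_false_negatives(rows: List[PredictionRow]) -> int:
    """Monitoring code depends only on the table, not on the backend."""
    return sum(1 for r in rows if r.y_true == 1 and r.y_pred == 0)


# Illustrative complaints only.
sample = [
    ("c1", "I want a refund for the fee", 1),
    ("c2", "This caused me real distress", 1),
    ("c3", "Thanks for the quick reply", 0),
]
rows = build_predictions(KeywordStandIn(), sample)
missed = count_false_negatives(rows)
```

Swapping the stand-in for a production export or a RoBERTa backend would then only change how the rows are produced, while functions such as count_false_negatives stay untouched.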