PD Model Calibration: From Theory to Practice

Step-by-step guide to recalibrating probability of default models with statistical rigor and regulatory alignment.

Alexandre Ywata Feb 03, 2026 8 min read

1. Why Calibration Matters

PD models frequently drift as origination standards, macroeconomic cycles, and borrower behavior evolve. Calibration ensures that predicted probabilities align with realized default rates for each segment, avoiding systematic underestimation of expected loss and capital requirements.

Supervisors expect institutions to demonstrate that PDs used in IFRS 9, Basel III, or Resolution 4.966 calculations reflect current risk levels, including back-tests, overrides, and thresholds for re-triggering calibration exercises.

2. Diagnostic Toolkit Before Recalibrating

  • Discrimination metrics: AUC, Gini, KS statistic, and lift charts to confirm ranking power before changing the scale.
  • Calibration plots: compare predicted PD buckets against observed default rates using reliability diagrams or binomial tests.
  • Population stability index (PSI): highlight shifts in input distributions that may justify segmentation changes.
  • Vintage curves: identify cohorts where PDs systematically overshoot or undershoot realized performance.
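Of these diagnostics, the PSI is the easiest to standardize across portfolios. A minimal sketch in NumPy is below; the function name `psi` and the binning choices (reference-quantile edges, a small floor for empty buckets) are illustrative, not a prescribed implementation.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (development)
    sample and a recent sample of the same variable."""
    # Bin edges taken from the reference distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor proportions to avoid log(0) in sparse buckets
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

Common rule-of-thumb thresholds read PSI below 0.10 as stable, 0.10 to 0.25 as a moderate shift worth investigating, and above 0.25 as a material shift that may justify re-segmentation.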

The diagnostic phase should end with a clear hypothesis: global drift, specific portfolios, or structural breaks after policy changes.

3. Calibration Techniques You Should Combine

  1. Platt scaling or logistic recalibration refits an intercept and slope using recent out-of-time samples without altering rank ordering.
  2. Isotonic regression offers non-parametric adjustment when residuals show non-linear bias, ideal for retail portfolios with large datasets.
  3. Binning with Bayesian smoothing stabilizes low-default portfolios by pooling information across adjacent buckets and applying credibility adjustments.
  4. Hybrid approach starts with logistic scaling and applies isotonic corrections on residual bias for specific exposure bands.
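The first technique can be sketched in a few lines with scikit-learn: refit an intercept and slope on the log-odds of the existing PD, which shifts the scale without disturbing rank order. The helper name `logistic_recalibration` and the near-zero regularization are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic_recalibration(pd_old, defaults):
    """Fit intercept and slope on the log-odds of the current PD
    against observed defaults (out-of-time sample). Returns a
    function that maps raw PDs to recalibrated PDs."""
    logit = np.log(pd_old / (1 - pd_old)).reshape(-1, 1)
    model = LogisticRegression(C=1e6)  # large C: effectively unpenalized
    model.fit(logit, defaults)

    def recalibrate(pd_new):
        z = np.log(pd_new / (1 - pd_new)).reshape(-1, 1)
        return model.predict_proba(z)[:, 1]

    return recalibrate
```

Because the mapping is monotone in the original score, discrimination metrics such as AUC are unchanged; only the calibration level and slope move.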

Whatever the technique, store the transformation parameters in your model repository and version the calibration dataset.

4. Segmenting and Applying Business Overlays

Calibration rarely succeeds with a single global factor. Segment by product type, ticket size, collateralization, and behavioral score bands. Credit committees often approve management overlays to reflect new underwriting rules or extraordinary macroeconomic events; document the rationale, approval date, and reversion criteria to satisfy auditors.

Link overlays to measurable triggers (unemployment rate, inflation, vintage delinquency) so the decision is reproducible and defensible.
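A trigger-linked overlay can be reduced to a small, auditable function. The threshold, uplift factor, and variable names below are hypothetical placeholders; the point is that the decision rule is explicit and reproducible rather than discretionary.

```python
def apply_overlay(pd_base, unemployment_rate, threshold=0.10, uplift=1.25):
    """Apply a multiplicative management overlay when the unemployment
    trigger breaches its threshold. Returns the adjusted PD and a
    status string for the audit trail. All parameters are illustrative."""
    if unemployment_rate > threshold:
        return min(pd_base * uplift, 1.0), "overlay active: unemployment trigger breached"
    return pd_base, "no overlay"
```

Storing the trigger value, threshold, and resulting status alongside each PD run gives auditors the reversion criteria the committee approved.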

5. Validation and Governance Checklist

  • Independent validation replicates calibration steps and challenges data filters, sample representativeness, and override logic.
  • Performance monitoring dashboards track PD-to-default ratios monthly, comparing expected versus realized losses.
  • Thresholds for recalibration are defined ex ante (for example, a ±15% deviation between predicted and realized default rates in two consecutive quarters).
  • Audit trail includes datasets, scripts, version numbers, and approval minutes stored in the model risk inventory.
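The ex-ante threshold rule above is simple enough to encode directly in the monitoring dashboard. A sketch, assuming quarterly series of predicted and realized default rates (the function name and argument layout are illustrative):

```python
def recalibration_triggered(predicted, realized, tolerance=0.15, window=2):
    """Flag recalibration when the relative deviation |realized/predicted - 1|
    exceeds `tolerance` for `window` consecutive periods."""
    breaches = [abs(r / p - 1) > tolerance for p, r in zip(predicted, realized)]
    streak = 0
    for breached in breaches:
        streak = streak + 1 if breached else 0
        if streak >= window:
            return True
    return False
```

Defining the rule in code, rather than in a policy document alone, makes the trigger testable and removes ambiguity about how "two consecutive quarters" is counted.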

6. Implementation Timeline Example

  • Weeks 1-2: data extraction, cleaning, and diagnostic review.
  • Week 3: selection of calibration technique and segmentation design.
  • Week 4: champion/challenger testing and documentation.
  • Week 5: governance approvals, deployment to IFRS 9 and Basel engines, and communication to finance teams.

Keeping a predictable cadence (semi-annual for retail, annual for wholesale) prevents emergency recalibrations during stress periods.

References and Further Reading

  • Basel Committee on Banking Supervision - Soundness of IRB Estimation
  • EBA Guidelines on PD estimation, LGD estimation and treatment of defaulted exposures
  • Bacen Resolution 4.966/21 - Annex with credit risk measurement requirements