Calibration Bias
Introduction
- Imagine you are fitting a statistical distribution to n data points. The data is assumed to be independent, and the distribution is continuous and parametric.
- And imagine your goal is to assess the probability distribution of the next data point, i.e., to make an out-of-sample prediction of probabilities, quantiles, densities, or random deviates.
- You would hope that an x% probability from the predictive distribution you create would correspond to future events that occur x% of the time; otherwise the probabilities are somewhat meaningless. This property is known as reliability, or calibration.
- But fitting distributions and using them to make predictions that are well calibrated is actually not so easy to do.
- The most commonly used methods for fitting statistical distributions, which use point estimates of parameters (methods such as maximum likelihood, the method of moments, L-moments and probability-weighted moments), do not produce calibrated probabilities.
- The biggest problem for these methods is in the tail, for extremes, where they underestimate probabilities. They underestimate extremes because they ignore parameter uncertainty. The underestimation can be very large, and if you are designing a flood defence, pricing insurance, or deciding what kinds of heat-waves you need to be ready for, that underestimation could be very unfortunate.
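As a concrete illustration of this tail underestimation (a minimal simulation sketch for the exponential distribution, written independently of the fitdistcp code): fit the rate by maximum likelihood, form the predictive quantile with a nominal 1% exceedance probability, and count how often a fresh out-of-sample draw exceeds it. The exceedance frequency comes out well above 1%.

```python
import numpy as np

# Monte Carlo check of maximum-likelihood calibration bias for the
# exponential distribution (illustrative sketch, not the fitdistcp code).
rng = np.random.default_rng(1)
n = 20            # sample size
p = 0.01          # nominal exceedance probability
trials = 200_000

x = rng.exponential(scale=1.0, size=(trials, n))
rate_hat = 1.0 / x.mean(axis=1)          # ML estimate of the rate
q = -np.log(p) / rate_hat                # ML predictive quantile for p
new = rng.exponential(scale=1.0, size=trials)  # fresh out-of-sample draws
freq = np.mean(new > q)
print(freq)  # near 0.016, not the nominal 0.01: the tail is too thin
```

For this model the true exceedance probability can also be computed analytically as (1 - ln(p)/n)^(-n), which is about 0.0159 for n = 20 and p = 0.01, i.e., roughly 60% higher than the nominal value.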
Calibration Bias Charts
- The calibration bias of a method for fitting distributions can be quantified using simulations.
- This website provides the results of simulations which show the size of the calibration bias for a number of commonly-used statistical models.
- The idea is that if we understand the bias, maybe we can figure out what to do about it.
- The charts cover predictions generated by maximum likelihood prediction and calibrating prior prediction.
- Calibrating prior prediction is an Objective Bayesian method which was designed to be as easy to use as maximum likelihood, but to give lower calibration bias (see the references below for more information).
- In some cases (homogeneous statistical models, such as the normal, Gumbel and simple linear regression, among others), calibrating prior predictions are perfectly calibrated.
- In other cases (inhomogeneous statistical models, such as the GEVD and GPD, among others), calibrating prior predictions are not perfectly calibrated, but the bias is still materially lower than maximum likelihood, and new research promises to further reduce the bias.
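For the simplest homogeneous case, the exponential, the perfect calibration of the calibrating-prior prediction can be checked directly. Assuming the prior proportional to 1/lambda (the standard objective prior for this scale family; this sketch is mine, not the fitdistcp code), the posterior-predictive exceedance of q is (S/(S+q))^n with S the sample sum, so the predictive quantile for exceedance probability p is q = S(p^(-1/n) - 1):

```python
import numpy as np

# Monte Carlo check that the objective-Bayesian predictive for the
# exponential is exactly calibrated (sketch; prior ~ 1/lambda assumed).
rng = np.random.default_rng(2)
n, p, trials = 20, 0.01, 200_000

x = rng.exponential(scale=1.0, size=(trials, n))
S = x.sum(axis=1)
q = S * (p ** (-1.0 / n) - 1.0)   # predictive quantile for exceedance p
new = rng.exponential(scale=1.0, size=trials)  # fresh out-of-sample draws
freq = np.mean(new > q)
print(freq)  # close to the nominal 0.01
```

Unlike the maximum likelihood version, the empirical exceedance frequency here matches the nominal probability to within Monte Carlo noise.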
Reasons to Use the Charts
There are two main reasons to use these charts:
- There are hundreds of scientific publications that have assessed the probabilities of extremes using maximum likelihood. You can use these charts to see how biased the probability assessments in those publications are. In some cases, the bias will be small enough that it can be ignored. In other cases, it will be large enough that it cannot be ignored. When it is large, the charts can be used to estimate a correction, or to motivate recalculating the probabilities.
- If you are designing a study that will involve fitting statistical distributions as a way to estimate probabilities of future events, then you can use the charts to understand whether maximum likelihood prediction or calibrating prior prediction will be good enough for your purposes.
Reference Charts
The table below gives the charts, for all the models that are currently supported, in alphabetical order. The list of models supported so far was motivated by various climate and actuarial applications. Let me know if you have any suggestions for other models to include.
The columns in the table give the following information:
- Model: the model.
- R: the R command for making maximum likelihood and calibrating prior predictions in the R library fitdistcp.
- H/I: whether the model is homogeneous (H) or inhomogeneous (I). For homogeneous models, the calibration bias does not depend on the parameters, so we only need one chart. For inhomogeneous models, the calibration bias depends on one of the parameters, so we have produced multiple charts for a range of parameter values, and two types of charts. Chart type 1 is at fixed parameter value and varies the sample size; chart type 2 is at fixed sample size and varies the parameter value.
- A/D: whether the calculations were performed using exact closed-form solutions (A for analytic) or the DMGS asymptotic approximation (D).
| Model | R | H/I | A/D | Maximum likelihood calibration bias | Calibrating prior calibration bias |
|---|---|---|---|---|---|
| Exponential | exp_cp | H | A | Type 1 | Type 1 |
| Exponential with single predictor on the rate | exp_p1_cp | H | D | Type 1 | Type 1 |
| Frechet with known location | frechet_k1_cp | H | D | Type 1 | Type 1 |
| GEV | gev_cp | I | D | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 |
| GEV with 1 predictor, on the location | gev_p1_cp | I | D | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 |
| GEV with 2 predictors, one each on the location and scale | gev_p12_cp | I | D | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 |
| GPD with known location | gpd_k1_cp | I | D | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 | Type 1: xi=-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 Type 2: n=30 40 50 60 70 80 100 120 |
| Gumbel | gumbel_cp | H | D | Type 1 | Type 1 |
| Gumbel with single predictor on the location | gumbel_p1_cp | H | D | Type 1 | Type 1 |
| Logistic | logis_cp | H | D | Type 1 | Type 1 |
| Log-normal | lnorm_cp | H | A | Type 1 (same as normal) | Type 1 (same as normal) |
| Log-normal with a single predictor on the log-mean | lnorm_p1_cp | H | A | Type 1 (same as normal with 1 predictor) | Type 1 (same as normal with 1 predictor) |
| Normal | norm_cp | H | A | Type 1 | Type 1 |
| Normal with 1 predictor, on the mean i.e., simple linear regression | norm_p1_cp | H | A | Type 1 | Type 1 |
| Pareto with known scale | pareto_k1_cp | H | A | Type 1 (same as exponential) | Type 1 (same as exponential) |
| Pareto with known scale and a single predictor on the shape parameter | pareto_p1k3_cp | H | D | Type 1 (same as exponential with 1 predictor) | Type 1 (same as exponential with 1 predictor) |
| Weibull | weibull_cp | H | D | Type 1 | Type 1 |
Reading the Charts
Chart Type 1
- The horizontal axis is the sample size.
- The vertical axis is one over the nominal exceedance probability (NP). The nominal probability is the probability you are asking about: the one you specify, or the one the model returns.
- The contours show one over the true exceedance probability, also known as the predictive coverage probability (PCP). This is the probability after correction for bias.
- Each chart is split into three parts, covering different ranges of inverse NP. This allows us to use axes that are linear in inverse NP, which are easy to read, give appropriate resolution in each range, and cover a wide range of probabilities, from p = 1/2 to p = 1/500.
- If a study with sample size n returns a maximum likelihood assessment of the probability of an event (or quantile) as NP, then the appropriate chart gives the PCP (i.e., the true probability) for that event, via the contours, as a function of n (on the horizontal axis) and NP (on the vertical axis). The PCPs are typically higher than the NPs, because the tail of maximum likelihood predictions is typically too thin.
- The calibrating prior prediction charts are interpreted in a similar way.
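For the exponential model, the quantity the Type 1 maximum likelihood chart encodes can be derived by hand: at sample size n and nominal exceedance probability NP, the true exceedance probability is PCP = (1 - ln(NP)/n)^(-n), which converges to NP as n grows. A sketch of that formula (my own derivation for this one model, not the fitdistcp code):

```python
import numpy as np

# Predictive coverage probability (PCP) of a maximum-likelihood
# exceedance prediction for the exponential model, as a function of
# sample size n and nominal exceedance probability NP.
# Hand-derived for this sketch: PCP = (1 - ln(NP)/n)^(-n).
def pcp_exponential_ml(n, np_nominal):
    return (1.0 - np.log(np_nominal) / n) ** (-n)

for n in (30, 50, 100, 1000):
    print(n, pcp_exponential_ml(n, 0.01))
# The PCP exceeds the nominal 0.01 for every finite n,
# and shrinks towards 0.01 as n increases.
```

Reading the chart for the exponential at a given n and NP should reproduce these values; for other models the relationship must be obtained from the charts themselves.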
Chart Type 2
- These charts only apply to inhomogeneous models, where the results depend on a parameter.
- The horizontal axis is now the parameter value, and each chart corresponds to one sample size.
Notes
Acknowledgements
- Many thanks to Trevor Sweeting for help with the statistical theory, and Lynne Jewson for help with the group theory.
- Also, many thanks to the anonymous insurance companies that fund this research project.
- And in addition, many thanks to the various people I have discussed this whole topic with over the last couple of years, including Clare Barnes, Chris Paciorek, Paul Northrop, Stephen Dupon, and the anonymous reviewers of our journal papers.
More information
- This website is intentionally brief, and there is much more that can be said about this topic. There is much more information in the references below.
Research
- There are various extensions and improvements in the pipeline. I post news about my research on LinkedIn.
Contact
- I can be contacted at stephen.jewson-at-gmail.com
References
1) Our initial paper on this topic, which contains a detailed technical discussion:
- S. Jewson, T. Sweeting and L. Jewson (2025): Reducing Reliability Bias in Assessments of Extreme Weather Risk using Calibrating Priors, ASCMO (Advances in Statistical Climatology, Meteorology and Oceanography)
- Bibtex:
@article{jewsonet2025,
  author  = {Jewson, Stephen and Sweeting, Trevor and Jewson, Lynne},
  title   = {Reducing reliability bias in assessments of extreme weather risk using calibrating priors},
  journal = {ASCMO},
  volume  = {11},
  number  = {1},
  pages   = {1-22},
  doi     = {10.5194/ascmo-11-1-2025},
  url     = {https://doi.org/10.5194/ascmo-11-1-2025},
  year    = {2025}
}
2) The paper describing these charts:
- S. Jewson (2025): Maximum Likelihood and Calibration Prior Reliability Bias Reference Charts, Stats
- Bibtex:
@article{jewson2025,
  author  = {Jewson, Stephen},
  title   = {Maximum Likelihood and Calibration Prior Reliability Bias Reference Charts},
  journal = {Stats},
  volume  = {x},
  number  = {x},
  pages   = {x},
  doi     = {x},
  url     = {x},
  year    = {2025}
}