Laplace Approximation for Parameter Uncertainty¶
Overview¶
The `LaplaceEstimator` class implements a second-order approximation method for quantifying uncertainty in model parameters. Around a local optimum of the loss function (typically the negative log-likelihood), the loss is approximated as quadratic, which implies a multivariate Gaussian posterior distribution over the parameters.
The Laplace approximation enables the estimation of:

- Marginal parameter variances
- Confidence intervals
- Local curvature (eigenvalue spectrum)
It supports both exact Hessian inversion for small models and a scalable approximation (Hutchinson’s method) for larger models.
Mathematical Background¶
Let \(\mathcal{L}(\boldsymbol{\theta})\) be the negative log-likelihood. Near the maximum likelihood estimate (MLE) \(\hat{\boldsymbol{\theta}}\), we approximate:

\[
\mathcal{L}(\boldsymbol{\theta}) \approx \mathcal{L}(\hat{\boldsymbol{\theta}}) + \frac{1}{2}\,(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H\,(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}),
\]

where \(H\) is the Hessian of the loss at \(\hat{\boldsymbol{\theta}}\); the first-order term vanishes because \(\hat{\boldsymbol{\theta}}\) is a local optimum. The posterior is then:

\[
\boldsymbol{\theta} \mid \mathcal{D} \sim \mathcal{N}\!\big(\hat{\boldsymbol{\theta}},\ H^{-1}\big).
\]
Marginal Variance¶
The marginal variance of parameter \(\theta_i\) is:

\[
\operatorname{Var}(\theta_i) = \left[H^{-1}\right]_{ii}.
\]
Confidence intervals are estimated as:

\[
\hat{\theta}_i \pm z_{1-\alpha/2}\,\sqrt{\left[H^{-1}\right]_{ii}}
\]

for a given confidence level (e.g., \(z_{0.975} \approx 1.96\) for 95%).
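As a concrete illustration of the interval formula, with made-up numbers for the estimate and its marginal variance:

```python
import math

theta_hat = 1.37   # hypothetical parameter estimate
var_i = 0.04       # hypothetical marginal variance [H^{-1}]_ii

z = 1.96  # z_{0.975} for a 95% interval
half_width = z * math.sqrt(var_i)
ci = (theta_hat - half_width, theta_hat + half_width)  # ~(0.978, 1.762)
```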
Estimation Methods¶
Exact Dense Hessian¶
For small models, the full Hessian \(H\) is computed and inverted:
```python
import torch

H = compute_dense_hessian()   # full Hessian of the loss at the MLE
Hinv = torch.linalg.inv(H)    # dense inverse; tractable only for small models
```
The diagonal of \(H^{-1}\) gives the marginal variances.
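As an illustration of this path (not the library's internal code), the dense Hessian of a toy two-parameter loss can be computed with `torch.autograd.functional.hessian`:

```python
import torch
from torch.autograd.functional import hessian

# Toy data: y = 2x + 0.5 plus noise.
x = torch.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.5 + 0.1 * torch.randn(50)

def nll(theta):
    """Gaussian negative log-likelihood (up to a constant), unit noise."""
    pred = theta[0] * x + theta[1]
    return 0.5 * ((y - pred) ** 2).sum()

theta_hat = torch.tensor([2.0, 0.5])  # stand-in for the fitted MLE
H = hessian(nll, theta_hat)           # exact 2 x 2 Hessian
Hinv = torch.linalg.inv(H)
marginal_var = torch.diagonal(Hinv)   # Var(theta_i) = [H^{-1}]_ii
```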
Hutchinson’s Estimator (Large Models)¶
When \(H\) is too large to invert, the estimator uses Hutchinson’s method:
- Sample random vectors \(v_i \sim \text{Rademacher}\)
- Solve \(H x_i = v_i\) using conjugate gradient
- Estimate the diagonal as \(\operatorname{diag}(H^{-1}) \approx \frac{1}{N} \sum_{i=1}^{N} v_i \odot x_i\)
This avoids computing or storing the full Hessian.
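A minimal sketch of this scheme, assuming Hessian-vector products via double backward and a plain conjugate-gradient solver; all names here are illustrative, not the estimator's internals:

```python
import torch

def hvp(loss_fn, theta, v):
    """Hessian-vector product H v via double backward."""
    theta = theta.detach().requires_grad_(True)
    grad = torch.autograd.grad(loss_fn(theta), theta, create_graph=True)[0]
    return torch.autograd.grad(grad @ v, theta)[0]

def cg_solve(matvec, b, iters=100, tol=1e-8):
    """Solve A x = b with conjugate gradient, given only x -> A x."""
    x = torch.zeros_like(b)
    r = b.clone()   # residual b - A x, with x = 0
    p = r.clone()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def inv_hessian_diag(loss_fn, theta, n_samples=64):
    """diag(H^{-1}) ~= (1/N) sum_i v_i * x_i, where H x_i = v_i."""
    acc = torch.zeros_like(theta)
    for _ in range(n_samples):
        v = (torch.randint(0, 2, theta.shape) * 2 - 1).to(theta.dtype)  # Rademacher
        x = cg_solve(lambda u: hvp(loss_fn, theta, u), v)
        acc += v * x
    return acc / n_samples
```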
API Highlights¶
`estimate_inv_hessian_diag()`
: estimates the full diagonal of \(H^{-1}\)

`estim_variance_by_name(name)`
: returns the variance of a named parameter

`estim_all_confint_dict()`
: returns 95% confidence intervals for all parameters

`curvature_report()`
: eigenvalue analysis of \(H\) to detect sloppy or non-identifiable directions
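A hedged usage sketch: only the method names above come from this page, while the import path, the constructor arguments, and the `model`/`loss_fn` objects are assumptions that may differ from the actual API:

```python
from laplace_estimator import LaplaceEstimator  # assumed import path

# Assumed constructor: a fitted model and its negative log-likelihood.
est = LaplaceEstimator(model, loss_fn)

diag_inv = est.estimate_inv_hessian_diag()     # diagonal of H^{-1}
var_w = est.estim_variance_by_name("weight")   # variance of one named parameter
cis = est.estim_all_confint_dict()             # {name: (lower, upper)} at 95%
report = est.curvature_report()                # eigenvalue bounds, flatness ratio
```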
Curvature Analysis¶
To detect identifiability issues, eigenvalue bounds of \(H\) are estimated:
- Largest eigenvalue: via power iteration
- Smallest eigenvalue: via inverse iteration
The flatness ratio is:

\[
\frac{\lambda_{\min}(H)}{\lambda_{\max}(H)}.
\]

Interpretation:

- \(\gtrsim 10^{-2}\): well-conditioned
- \(10^{-6} \lesssim \cdot \lesssim 10^{-2}\): mild sloppiness
- \(\ll 10^{-6}\): severe non-identifiability
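A minimal power-iteration sketch for \(\lambda_{\max}\), reusing the illustrative `hvp` helper from the Hutchinson section above; estimating \(\lambda_{\min}\) would run the same loop on inverse-Hessian products (e.g., via conjugate gradient):

```python
import torch

def largest_eigenvalue(matvec, dim, iters=100):
    """Estimate lambda_max of a symmetric operator by power iteration."""
    v = torch.randn(dim)
    v = v / v.norm()
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        lam = (v @ w).item()   # Rayleigh quotient at the current iterate
        v = w / w.norm()
    return lam

# e.g.: lam_max = largest_eigenvalue(lambda u: hvp(loss_fn, theta, u), theta.numel())
```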
Limitations¶
- Assumes the loss is locally quadratic
- Fails under flat or ill-posed likelihoods
- Hutchinson estimates are stochastic; their accuracy depends on the number of probe samples
Source¶
Defined in `laplace_estimator.py`.