Diagnostics and inspection

Diagnostics answer two questions:

  1. Did parameter selection end in a usable state?

  2. Are the LOO and test-set scores consistent with the expected predictive accuracy?

Run diagnostics after select_params:

model.select_params(xi, zi)
model.run_diagnosis(xi, zi)
model.run_perf(xi, zi, xtzt=(xt, zt), zpmzpv=(zpm, zpv))

Model diagnosis

run_diagnosis(xi, zi) prints one block per output. For the seeded Hartmann4 example in Getting started, the output is:

~ Model [0]
[Model diagnosis]
  * Parameter selection
    cvg_reached: True
    optimal_val: True
        n_evals: 22
           time: 0.2095
    initial_val: 43.818328528617435
      final_val: 37.7947
  * Parameters
         Name:                     Path       Norm             Bounds     Value    Denorm
    z0_sigma2:       covparam->variance        log     [-9.56, 10.44]     0.191     1.210
     z0_rho_0:    covparam->lengthscale    log_inv    [-9.577, 10.42]     0.705     0.494
     z0_rho_1:    covparam->lengthscale    log_inv    [-9.545, 10.45]     0.308     0.735
     z0_rho_2:    covparam->lengthscale    log_inv     [-9.599, 10.4]    -0.568     1.764
     z0_rho_3:    covparam->lengthscale    log_inv    [-9.582, 10.42]     0.874     0.417
  * Data
    count: 40
    -----
              min      max    delta     mean      std delta_over_sigma
     zi:   -2.678    1.292    3.970   0.0221    1.047            3.280
   xi_0: 8.399e-3    0.984    0.976    0.493    0.283            1.974
   xi_1:   0.0450    0.991    0.946    0.510    0.294            1.288
   xi_2: 1.544e-3    1.000    0.998    0.515    0.300            0.566
   xi_3:   0.0164    0.997    0.981    0.526    0.307            2.350

The displayed block is produced by model.run_diagnosis(xi, zi) after model.select_params(xi, zi).

The first part reports the optimizer status:

cvg_reached

SciPy optimizer success flag.

optimal_val

Whether the final criterion value is finite and usable.

n_evals

Number of criterion evaluations.

initial_val and final_val

Criterion value before and after optimization. Compare them to detect optimizer-start or bounds problems.

The parameter table prints normalized coordinates, bounds, raw values, and denormalized values. For Matérn lengthscales, the raw covariance vector stores -log(rho) while the denormalized display reports rho.

Prediction performance

run_perf reports leave-one-out summaries and, when test values are supplied, test-set summaries:

~ Model [0]
[Prediction performances]
  LOO (n=40)
                     value
          std(z):    1.047
             tss:   43.829
           press:   11.259
       press/tss:    0.257
log10(press/tss):   -0.590
            rmse:    0.531
     rmse/std(z):    0.507
              Q2:    0.743
  Test (n=300)
                   value
        std(z):    0.989
           tss:  293.704
           rss:   41.061
       rss/tss:    0.140
log10(rss/tss):   -0.854
          rmse:    0.370
   rmse/std(z):    0.374
            R2:    0.860

The displayed block is produced by model.run_perf(xi, zi, xtzt=(xt, zt), zpmzpv=(zpm, zpv)).

For each output, the displayed quantities are defined as follows. For reference values \(z_i\),

\[\mathrm{TSS} = \sum_i (z_i - \bar z)^2.\]

For leave-one-out prediction, let \(e_i^{\mathrm{LOO}} = z_i - \widehat z_{-i}(x_i)\), where \(\widehat z_{-i}\) is computed without observation \(i\). Then

\[\mathrm{PRESS} = \sum_i \left(e_i^{\mathrm{LOO}}\right)^2, \qquad Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}.\]

For a test set, let \(e_i^{\mathrm{test}} = z_i - \widehat z(x_i)\). Then

\[\mathrm{RSS} = \sum_i \left(e_i^{\mathrm{test}}\right)^2, \qquad R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}.\]

In both blocks, \(\mathrm{RMSE} = \sqrt{\mathrm{SSE}/n}\) with \(\mathrm{SSE}=\mathrm{PRESS}\) for LOO and \(\mathrm{SSE}=\mathrm{RSS}\) for the test set. std(z) is the empirical standard deviation of the reference values in the block. rmse/std(z) is a scale-free error. press/tss and rss/tss are error-to-variance ratios, so smaller values are better.

A low Q2 or R2 does not identify the cause by itself. Inspect the observation design, covariance class, bounds, optimizer start, and prior anchors.

Model state checks

Use per-output entries to inspect the selected raw vectors, readable Param object, optimizer report, and prior object when one exists:

covparam = model[0]["model"].covparam
param = model[0].get_param()
info = model[0]["info"]
prior = model[0].get("prior", None)

Use Model state and parameter objects for the storage convention and Priors for REMAP selection for prior objects.

Selection-criterion inspection

The optimizer report stores selection_criterion_nograd. Use this callable for diagnosis, plotting, and posterior parameter sampling when gradients are not needed:

criterion = model[0]["info"]["selection_criterion_nograd"]
value = criterion(model[0]["model"].covparam)

Cross sections of the selection criterion help check whether the optimizer result is isolated or lies on a flat region. Cross sections are provided by gpmp.modeldiagnosis.

Posterior parameter sampling

sample_parameters forwards to posterior-sampling functions in gpmp.mcmc.param_posterior. The common methods are "mh", "nuts", and "smc":

res = model.sample_parameters(method="mh", n_steps_total=2000)

The sampler uses the stored info object, so select_params must be called first. The return dictionary contains method-specific objects and, for methods that record them, criterion or log-target traces.