6.4. Task: Sensitivity

Task Sensitivity runs an HSIC (Hilbert-Schmidt Independence Criterion) sensitivity analysis that quantifies the effect of the active parameters on the loss function.

To understand this task better, see the tutorial on Parameter sensitivity analysis.

6.4.1. Generating Samples

The first step of the sensitivity analysis is to obtain i.i.d. uniformly distributed random samples of the parameter space.

Samples can be automatically generated as part of the sensitivity calculation by using the RunSampling key:

RunSampling
Type:

Bool

Default value:

No

Description:

Produce a set of samples of the loss function and active parameters. Samples from the parameter space are drawn from a uniform random distribution. Such a set of samples serves as the input to the sensitivity calculation.

Note

If you choose to generate samples, then the shared collection keys must also be defined so that ParAMS can construct the loss function to sample.

If you are generating samples, you can set the number of samples and control how they are drawn:

NumberSamples
Type:

Integer

Default value:

1000

GUI name:

Generate n samples:

Description:

Number of samples to generate during the sampling procedure.

RandomSeed
Type:

Integer

Description:

Random seed to use during the sampling procedure (for reproducibility).
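
As a rough illustration of what such a sampling step amounts to (this is not ParAMS's internal code), the NumPy sketch below draws i.i.d. uniform samples inside per-parameter bounds using a fixed seed. The function name draw_uniform_samples and the bounds are hypothetical and chosen only for the example.

```python
import numpy as np

def draw_uniform_samples(lower, upper, n_samples=1000, seed=None):
    """Draw i.i.d. uniform samples inside the box [lower, upper).

    Hypothetical helper for illustration; not part of ParAMS.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    rng = np.random.default_rng(seed)  # the same seed reproduces the same samples
    # One row per sample, one column per active parameter.
    return rng.uniform(lower, upper, size=(n_samples, lower.size))

# Made-up bounds for three parameters.
samples = draw_uniform_samples([0.0, -1.0, 10.0], [1.0, 1.0, 20.0],
                               n_samples=1000, seed=42)
```

Calling the function again with the same seed returns an identical array, which is the role RandomSeed plays for reproducibility.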

SaveResiduals
Type:

Bool

Default value:

No

Description:

During the sampling, save the individual difference between reference and predicted values for every sample and training set item. Required for the Reweight calculation, and will be automatically activated if the reweight calculation is requested. Saving and analyzing the residuals can provide valuable insight into your training set, but can quickly occupy a large amount of disk space. Only save the residuals if you would like to run the reweight calculation or have a particular reason to do so.

Tip

You can reuse the residuals to calculate new loss values:

  • with different loss functions;

  • after changes to weights/sigmas; and

  • after removing training set items.

This allows you to tweak and tailor your training set and run a new sensitivity calculation without having to resample.
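
The idea behind this tip can be sketched as follows, assuming the residuals are available as an array with one row per sample and one column per training set item. The array layout, the function name loss_from_residuals, and the two loss definitions (a weighted sum of squared errors and a weighted sum of absolute errors) are assumptions made for illustration; consult the ParAMS loss function documentation for the exact definitions.

```python
import numpy as np

def loss_from_residuals(residuals, weights, sigmas, kind="sse", keep=None):
    """Recompute one loss value per sample from stored residuals.

    residuals: (n_samples, n_items) reference-minus-predicted values (assumed layout).
    keep: optional boolean mask selecting the training set items to retain.
    """
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    if keep is not None:
        keep = np.asarray(keep, dtype=bool)
        residuals, weights, sigmas = residuals[:, keep], weights[keep], sigmas[keep]
    scaled = residuals / sigmas
    if kind == "sse":   # assumed form: sum_i w_i * (r_i / sigma_i)^2
        return np.sum(weights * scaled**2, axis=1)
    if kind == "sae":   # assumed form: sum_i w_i * |r_i / sigma_i|
        return np.sum(weights * np.abs(scaled), axis=1)
    raise ValueError(f"unknown loss kind: {kind}")

# Two samples, three training set items (made-up numbers).
residuals = np.array([[0.2, -1.0, 0.5],
                      [0.1, 0.4, -0.3]])
weights = np.array([1.0, 2.0, 1.0])
sigmas = np.array([1.0, 2.0, 0.5])

sse = loss_from_residuals(residuals, weights, sigmas)
sae = loss_from_residuals(residuals, weights, sigmas, kind="sae")
sse_without_item_1 = loss_from_residuals(residuals, weights, sigmas,
                                         keep=[True, False, True])
```

No new sampling is needed for any of the three results; only the stored residuals are reprocessed.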

6.4.2. Loading Samples

Instead of generating samples, you can load them from a previous run:

SamplesDirectory
Type:

String

Default value:

Description:

Path to an ‘optimization’ directory containing the results of a previously run sampling. First looks for a ‘glompo_log.h5’ file. If not found, will look for ‘running_loss.txt’ and ‘running_active_parameters.txt’ in a sub-directory. The sub-directory used will depend on the DataSet Name. For the Reweight calculation only a ‘glompo_log.h5’ file (with residuals) may be used.

6.4.3. Kernels

The most important configurable options are the kernels applied to the parameter values and loss values.

We generally recommend:

  • applying the Gaussian kernel to the parameter values (in order to capture dis/similarity between parameter sets);

  • applying the conjunctive-Gaussian kernel to the loss values (in order to focus the weight of the distribution on good minima); and

  • using polynomial or linear kernels only if you have a specific type of relationship/dependency you would like to investigate.

ParametersKernel
   Alpha float
   Gamma float
   Polynomial
      Order integer
      Shift float
   End
   Sigma float
   Type [Gaussian | ConjunctiveGaussian | Threshold | Polynomial | Linear]
End
ParametersKernel
Type:

Block

Description:

Kernel applied to the parameters for which sensitivity is being measured.

Alpha
Type:

Float

Description:

Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.

Gamma
Type:

Float

Default value:

0.1

Description:

Bandwidth parameter for the conjunctive-Gaussian kernel.

Polynomial
Type:

Block

Description:

Settings for the Polynomial kernel.

Order
Type:

Integer

Default value:

1

Description:

Maximum order of the polynomial.

Shift
Type:

Float

Default value:

0.0

Description:

Free parameter (≥ 0) trading off higher-order versus lower-order effects.

Sigma
Type:

Float

Default value:

0.3

Description:

Bandwidth parameter for the Gaussian kernel.

Type
Type:

Multiple Choice

Default value:

Gaussian

Options:

[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]

Description:

Name of the kernel to apply to the parameters for which sensitivity is being measured.
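
For orientation, the standard textbook forms of three of these kernels are sketched below in NumPy. The exact expressions and scaling conventions used by ParAMS may differ (see the API documentation), and the conjunctive-Gaussian and Threshold kernels are omitted because their definitions are not given in this section. All function names are hypothetical.

```python
import numpy as np

# In every function, x has shape (n_samples, n_features); pass a single
# parameter as a column of shape (n_samples, 1).

def linear_kernel(x):
    """Gram matrix of the linear kernel k(a, b) = a . b."""
    return x @ x.T

def polynomial_kernel(x, order=1, shift=0.0):
    """Gram matrix of the polynomial kernel k(a, b) = (a . b + shift)^order."""
    return (x @ x.T + shift) ** order

def gaussian_kernel(x, sigma=0.3):
    """Gram matrix of the Gaussian kernel k(a, b) = exp(-|a - b|^2 / (2 sigma^2))."""
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))

x = np.array([[0.0], [0.3], [1.0]])  # three values of one made-up parameter
k_gauss = gaussian_kernel(x, sigma=0.3)
k_poly = polynomial_kernel(x, order=2, shift=1.0)
```

A smaller Sigma makes the Gaussian kernel treat only very close parameter values as similar, while a larger Shift in the polynomial kernel gives more weight to the lower-order terms.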

LossValuesKernel
Type:

Block

Description:

Kernel applied to the loss values.

Alpha
Type:

Float

Description:

Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.
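
The scaling described for Alpha can be pictured with the sketch below. It assumes a natural logarithm followed by min-max normalization, strictly positive finite loss values, and at least two distinct values; these details, and the function name scale_losses, are assumptions. How the Threshold kernel then uses the cut-off is not specified here, so the final mask only shows which samples fall below a given Alpha in the scaled space.

```python
import numpy as np

def scale_losses(losses):
    """Log-transform loss values and rescale them to the range [0, 1]."""
    log_losses = np.log(np.asarray(losses, dtype=float))
    lo, hi = log_losses.min(), log_losses.max()
    return (log_losses - lo) / (hi - lo)

losses = np.array([1.0, 10.0, 100.0, 1000.0])  # made-up loss values
scaled = scale_losses(losses)                   # evenly spaced between 0 and 1

alpha = 0.5                     # example cut-off in the scaled space
below_cutoff = scaled < alpha   # samples whose scaled loss lies under the cut-off
```

Because of the logarithm, loss values that differ by orders of magnitude end up evenly spread over the unit interval, so Alpha refers to a position in that interval rather than to a raw loss value.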

Gamma
Type:

Float

Default value:

0.3

Description:

Bandwidth parameter for the conjunctive-Gaussian kernel.

Polynomial
Type:

Block

Description:

Settings for the Polynomial kernel.

Order
Type:

Integer

Default value:

1

Description:

Maximum order of the polynomial.

Shift
Type:

Float

Default value:

0.0

Description:

Free parameter (≥ 0) trading off higher-order versus lower-order effects.

Sigma
Type:

Float

Description:

Bandwidth parameter for the Gaussian kernel. If not specified or -1, calculates a reasonable default based on the number of parameters being tested.

Type
Type:

Multiple Choice

Default value:

ConjunctiveGaussian

Options:

[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]

Description:

Name of the kernel to apply to the loss values.
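
To show how the two kernels enter an HSIC calculation, the sketch below uses the standard biased empirical estimator, trace(K H L H) / (n - 1)^2, where K and L are the kernel matrices of a parameter and of the loss values and H is the centering matrix. It applies a plain Gaussian kernel to both for simplicity; the estimator, normalization, and kernels actually used by ParAMS may differ, and the toy data and function names are hypothetical.

```python
import numpy as np

def gaussian_gram(values, sigma):
    """Gaussian kernel matrix for a 1-D or 2-D array of sample values."""
    values = np.asarray(values, dtype=float).reshape(len(values), -1)
    sq_dists = np.sum((values[:, None, :] - values[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def hsic(k, l):
    """Biased empirical HSIC estimate, trace(K H L H) / (n - 1)^2."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

# Toy data: the loss depends on the first parameter only.
rng = np.random.default_rng(0)
params = rng.uniform(0.0, 1.0, size=(400, 2))
loss = (params[:, 0] - 0.5) ** 2

l_gram = gaussian_gram(loss, sigma=0.1)
scores = [hsic(gaussian_gram(params[:, j], sigma=0.3), l_gram)
          for j in range(params.shape[1])]
```

In this toy case the score of the first parameter is clearly larger than that of the second, which has no influence on the loss. The dependence is non-linear (quadratic), which is the kind of relationship a Gaussian kernel can detect but a linear kernel would miss.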

6.4.4. Other settings

The sensitivity analysis can be run on either the training set or the validation set:

SetToAnalyze
Type:

Multiple Choice

Default value:

TrainingSet

Options:

[TrainingSet, ValidationSet]

GUI name:

Analyze:

Description:

Name of the data set to use for the sensitivity analysis.

Generally, not all of the generated samples are used simultaneously in the sensitivity calculation. This is because the more samples used, the slower the calculation. It is also difficult to know whether one has enough samples to capture an accurate approximation of the true sensitivity.

For example, suppose one has generated 10000 samples. It is often better to run 10 repeats (bootstraps) of the calculation with 1000 points each than to run a single calculation with 10000 points. The former runs faster and provides a spread of results from which the robustness of the estimate can be judged.

To specify the number of times you would like the calculation repeated:

NumberBootstraps
Type:

Integer

Default value:

1

GUI name:

Repeat calculation n times:

Description:

Number of repeats of the calculation with different sub-samples. A small spread from a large number of bootstraps provides confidence in the estimate of the sensitivity.

To specify how many points to use from our sample set in each calculation:

NumberCalculationSamples
Type:

Integer

GUI name:

Number of samples per repeat:

Description:

Number of samples from the full set available to use in the calculation. If not specified or -1, uses all available points. For the sensitivity calculation, this will be redrawn for every bootstrap.

To specify how the points are chosen from our sample set:

SampleWithReplacement
Type:

Bool

Default value:

Yes

Description:

Sample from the available data with or without replacement. This only has an effect if the number of samples for the calculation is less than the total number available; otherwise, sampling is with replacement by necessity.
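
The interplay of the three settings above (number of repeats, samples per repeat, and replacement) can be sketched as below. This is an illustration under stated assumptions, not ParAMS's implementation: the function name bootstrap_statistic is hypothetical, and an absolute Pearson correlation stands in for the HSIC-based sensitivity to keep the example short.

```python
import numpy as np

def bootstrap_statistic(statistic, params, losses, n_calc=None,
                        n_bootstraps=1, replace=True, seed=None):
    """Evaluate `statistic` on repeated sub-samples of the available data."""
    rng = np.random.default_rng(seed)
    n_available = len(losses)
    if n_calc is None or n_calc == -1:
        n_calc = n_available  # use all available points
    # Without a smaller sub-sample, repeats could only differ if drawn with replacement.
    replace = replace or n_calc >= n_available
    results = []
    for _ in range(n_bootstraps):
        idx = rng.choice(n_available, size=n_calc, replace=replace)  # redrawn each repeat
        results.append(statistic(params[idx], losses[idx]))
    return np.array(results)

# Toy data: 10000 samples, loss driven by the first parameter plus noise.
rng = np.random.default_rng(1)
params = rng.uniform(0.0, 1.0, size=(10000, 2))
losses = 3.0 * params[:, 0] + rng.normal(0.0, 0.5, size=10000)

def abs_correlation(p, l):
    """Stand-in sensitivity measure: |Pearson correlation| of parameter 0 and the loss."""
    return abs(np.corrcoef(p[:, 0], l)[0, 1])

estimates = bootstrap_statistic(abs_correlation, params, losses,
                                n_calc=1000, n_bootstraps=10,
                                replace=False, seed=2)
spread = estimates.std()
```

A small spread across the ten estimates suggests that 1000 points per repeat are enough for a stable result; a large spread suggests that more points per repeat, or more samples overall, are needed.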

To remove parameter sets which produced non-finite loss values:

FilterInfiniteValues
Type:

Bool

Default value:

Yes

Description:

If Yes, removes points from the calculation with non-finite loss values. Non-finite points can cause numerical issues in the sensitivity calculation.
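
A minimal sketch of this kind of filtering, assuming the samples are held as a parameter array with one row per sample and a matching vector of loss values (the function name drop_non_finite is hypothetical):

```python
import numpy as np

def drop_non_finite(params, losses):
    """Remove samples whose loss value is inf, -inf or nan."""
    params = np.asarray(params, dtype=float)
    losses = np.asarray(losses, dtype=float)
    finite = np.isfinite(losses)  # False for inf, -inf and nan
    return params[finite], losses[finite]

params = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
losses = np.array([1.5, np.inf, np.nan, 0.2])
clean_params, clean_losses = drop_non_finite(params, losses)
```

Only the first and last samples survive here. Leaving the other two in would propagate inf or nan values into any kernel matrix built from the loss values.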

Finally, we have included an extension to the sensitivity calculation:

RunReweightCalculation
Type:

Bool

Default value:

No

Description:

Run a more expensive sensitivity calculation that will also return suggested weights for the training set which will produce more balanced sensitivities between all the parameters. Note: The Gaussian kernel is recommended for the loss values kernel in this case.

Warning

The reweight calculation is experimental.

6.4.5. Technical details

For technical details of the sensitivity calculation, see the API documentation.