6.4. Task: Sensitivity¶
Task Sensitivity runs an HSIC (Hilbert-Schmidt Independence Criterion) sensitivity analysis that quantifies the effect of the active parameters on the loss function.
To understand this task better, see the tutorial on Parameter sensitivity analysis.
6.4.1. Generating Samples¶
The first step of the sensitivity analysis is to obtain i.i.d., uniformly distributed random samples of the parameter space.
Samples can be generated automatically as part of the sensitivity calculation by using the RunSampling key:
RunSampling
- Type
Bool
- Default value
No
- Description
Produce a set of samples of the loss function and active parameters. Samples from the parameter space are drawn from a uniform random distribution. Such a set of samples serves as the input to the sensitivity calculation.
Note
If you choose to generate samples, then the shared collection keys must also be defined so that ParAMS can construct the loss function to sample.
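The sampling step described above can be illustrated with a minimal Python sketch. It is not the ParAMS implementation: the function names, the parameter bounds, and the toy loss are hypothetical and only show what drawing i.i.d. uniform samples with a fixed seed looks like.

```python
import random


def sample_uniform(bounds, n_samples, seed=None):
    """Draw n_samples i.i.d. points; each coordinate is uniform within its (low, high) bounds."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]


def toy_loss(params):
    """Hypothetical stand-in for the real loss function: a simple quadratic bowl."""
    return sum((p - 0.5) ** 2 for p in params)


# Hypothetical ranges of two active parameters.
bounds = [(0.0, 1.0), (-2.0, 2.0)]
samples = sample_uniform(bounds, n_samples=1000, seed=42)
losses = [toy_loss(s) for s in samples]
```

The paired lists `samples` and `losses` play the role of the sample set that serves as input to the sensitivity calculation.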
If you are generating samples, you can configure how many samples are drawn and how the sampling is performed:
NumberSamples
- Type
Integer
- Default value
1000
- GUI name
Generate n samples:
- Description
Number of samples to generate during the sampling procedure.
RandomSeed
- Type
Integer
- Description
Random seed to use during the sampling procedure (for reproducibility).
SaveResiduals
- Type
Bool
- Default value
No
- Description
During the sampling, save the individual differences between reference and predicted values for every sample and training set item. Required for the Reweight calculation, and automatically activated if the Reweight calculation is requested. Saving and analyzing the residuals can provide valuable insight into your training set, but can quickly occupy a large amount of disk space. Only save the residuals if you would like to run the Reweight calculation or have a particular reason to do so.
Tip
You can reuse the residuals to calculate new loss values:
with different loss functions;
after changes to weights/sigmas; and
after removing training set items.
This allows you to tweak and tailor your training set and run a new sensitivity calculation without having to resample.
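The idea behind reusing residuals can be sketched in a few lines of Python. The sketch assumes a loss of the form "weighted sum of squared, sigma-scaled residuals"; that form, the function name, and all numbers are illustrative assumptions, not a description of how ParAMS stores or processes residuals.

```python
def loss_from_residuals(residuals, weights, sigmas, keep=None):
    """Weighted sum of squared, sigma-scaled residuals for one sample.

    keep: optional list of training set item indices to include (None keeps all).
    """
    indices = range(len(residuals)) if keep is None else keep
    return sum(weights[i] * (residuals[i] / sigmas[i]) ** 2 for i in indices)


# residuals[s][i] is the residual of training set item i for sample s (made-up values).
residuals = [[0.10, -0.40, 2.0], [0.05, 0.30, -1.0]]
weights = [1.0, 1.0, 1.0]
sigmas = [0.1, 0.2, 1.0]

# Loss values with the original settings.
original = [loss_from_residuals(r, weights, sigmas) for r in residuals]
# New loss values after changing the weights, without resampling.
reweighted = [loss_from_residuals(r, [1.0, 2.0, 0.5], sigmas) for r in residuals]
# New loss values after removing the third training set item.
pruned = [loss_from_residuals(r, weights, sigmas, keep=[0, 1]) for r in residuals]
```

Because the residuals are stored per sample and per item, each variant only needs a cheap re-summation rather than a new set of engine evaluations.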
6.4.2. Loading Samples¶
Instead of generating samples, you can load them from a previous run:
SamplesDirectory
- Type
String
- Default value
- Description
Path to an ‘optimization’ directory containing the results of a previously run sampling. First looks for a ‘glompo_log.h5’ file. If not found, will look for ‘running_loss.txt’ and ‘running_active_parameters.txt’ in a sub-directory. The sub-directory used will depend on the DataSet Name. For the Reweight calculation only a ‘glompo_log.h5’ file (with residuals) may be used.
6.4.3. Kernels¶
The most important configurable options are the kernels applied to the parameter values and loss values.
We generally recommend:
applying the Gaussian kernel to the parameter values (in order to capture dis/similarity between parameter sets);
applying the conjunctive-Gaussian kernel to the loss values (in order to focus the weight of the distribution on good minima); and
using polynomial or linear kernels only if you have a specific type of relationship/dependency you would like to investigate.
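To make the role of the kernels more concrete, the following Python sketch computes the standard biased HSIC estimate, trace(K H L H) / (n - 1)^2, for one scalar parameter and a loss value. It is an illustration under stated assumptions: a plain Gaussian kernel is used for both inputs (the conjunctive-Gaussian kernel is not reproduced here), the data are synthetic, and the normalization used inside ParAMS may differ.

```python
import math
import random


def gaussian_gram(values, sigma):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)) for scalar values."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values] for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix (equivalent to H K H with H = I - 1/n)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, sigma_x=0.3, sigma_y=0.3):
    """Biased HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(xs)
    kc = center(gaussian_gram(xs, sigma_x))
    lc = center(gaussian_gram(ys, sigma_y))
    # trace(Kc Lc) equals the element-wise product sum because both matrices are symmetric.
    total = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


rng = random.Random(0)
xs = [rng.random() for _ in range(150)]           # synthetic parameter values
dependent = [x ** 2 for x in xs]                  # "loss" that depends on the parameter
independent = [rng.random() for _ in range(150)]  # "loss" unrelated to the parameter

hsic_dependent = hsic(xs, dependent)
hsic_independent = hsic(xs, independent)
```

A parameter that influences the loss yields a clearly larger HSIC value than one that does not, which is what the sensitivity ranking relies on. The bandwidths control how quickly two values stop counting as similar.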
ParametersKernel
   Alpha float
   Gamma float
   Polynomial
      Order integer
      Shift float
   End
   Sigma float
   Type [Gaussian | ConjunctiveGaussian | Threshold | Polynomial | Linear]
End
ParametersKernel
- Type
Block
- Description
Kernel applied to the parameters for which sensitivity is being measured.
Alpha
- Type
Float
- Description
Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.
Gamma
- Type
Float
- Default value
0.1
- Description
Bandwidth parameter for the conjunctive-Gaussian kernel.
Polynomial
- Type
Block
- Description
Settings for the Polynomial kernel.
Order
- Type
Integer
- Default value
1
- Description
Maximum order of the polynomial.
Shift
- Type
Float
- Default value
0.0
- Description
Free parameter (≥ 0) trading off higher-order versus lower-order effects.
Sigma
- Type
Float
- Default value
0.3
- Description
Bandwidth parameter for the Gaussian kernel.
Type
- Type
Multiple Choice
- Default value
Gaussian
- Options
[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]
- Description
Name of the kernel to apply to the parameters for which sensitivity is being measured.
LossValuesKernel
- Type
Block
- Description
Kernel applied to the loss values.
Alpha
- Type
Float
- Description
Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.
Gamma
- Type
Float
- Default value
0.3
- Description
Bandwidth parameter for the conjunctive-Gaussian kernel.
Polynomial
- Type
Block
- Description
Settings for the Polynomial kernel.
Order
- Type
Integer
- Default value
1
- Description
Maximum order of the polynomial.
Shift
- Type
Float
- Default value
0.0
- Description
Free parameter (≥ 0) trading off higher-order versus lower-order effects.
Sigma
- Type
Float
- Description
Bandwidth parameter for the Gaussian kernel. If not specified or -1, calculates a reasonable default based on the number of parameters being tested.
Type
- Type
Multiple Choice
- Default value
ConjunctiveGaussian
- Options
[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]
- Description
Name of the kernel to apply to the loss values.
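Two of the option descriptions above can be illustrated with short Python helpers. The first mirrors the scaling mentioned under Alpha (logarithm, then adjustment to the range zero to one); the second is the textbook polynomial kernel, (x . y + shift)^order. Both are sketches with hypothetical function names. The exact formulas used by ParAMS are not given in this section and may differ.

```python
import math


def scale_log_minmax(losses):
    """Take the logarithm of each (strictly positive) loss, then rescale linearly to [0, 1]."""
    logs = [math.log(v) for v in losses]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to zero.
        return [0.0 for _ in logs]
    return [(v - lo) / (hi - lo) for v in logs]


def polynomial_kernel(x, y, order=1, shift=0.0):
    """Textbook polynomial kernel for two vectors: (x . y + shift) ** order."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + shift) ** order


scaled = scale_log_minmax([1.0, 10.0, 100.0])
linear_value = polynomial_kernel([1.0, 2.0], [3.0, 4.0])                     # order 1, shift 0
quadratic_value = polynomial_kernel([1.0, 2.0], [3.0, 4.0], order=2, shift=1.0)
```

After the min-max step the result does not depend on the base of the logarithm, so a cut-off such as Alpha can be read as a position within the log-scaled range of the sampled losses. With order 1 and shift 0, the polynomial kernel in this sketch reduces to the plain dot product, that is, a linear kernel.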
6.4.4. Other settings¶
The sensitivity analysis can be run on either the training set or the validation set:
SetToAnalyze
- Type
Multiple Choice
- Default value
TrainingSet
- Options
[TrainingSet, ValidationSet]
- GUI name
Analyze:
- Description
Name of the data set to use for the sensitivity analysis.
Generally, not all of the generated samples are used simultaneously in the sensitivity calculation. This is because the more samples are used, the slower the calculation becomes. It is also difficult to know whether one has enough samples to obtain an accurate approximation of the true sensitivity.
For example, suppose one has generated 10000 samples. It is often better to run 10 repeats (bootstraps) of the calculation with 1000 points each than to run a single calculation with 10000 points. The first configuration runs faster and provides a spread of results with which to evaluate the robustness of the estimate.
To specify the number of times you would like the calculation repeated:
NumberBootstraps
- Type
Integer
- Default value
1
- GUI name
Repeat calculation n times:
- Description
Number of repeats of the calculation with different sub-samples. A small spread from a large number of bootstraps provides confidence in the estimate of the sensitivity.
To specify how many points to use from our sample set in each calculation:
NumberCalculationSamples
- Type
Integer
- GUI name
Number of samples per repeat:
- Description
Number of samples, out of the full set available, to use in the calculation. If not specified or -1, uses all available points. For the sensitivity calculation, this subset is redrawn for every bootstrap.
To specify how the points are chosen from our sample set:
SampleWithReplacement
- Type
Bool
- Default value
Yes
- Description
Sample from the available data with or without replacement. This only has an effect if the number of samples for the calculation is less than the total number available; otherwise, sampling is done with replacement by necessity.
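The interplay of the three settings above (number of repeats, samples per repeat, and replacement) can be sketched in Python as follows. The function names are hypothetical, and a cheap covariance-based statistic stands in for the actual HSIC calculation so that the example stays short. The handling of the replacement flag follows the description above.

```python
import random
import statistics


def bootstrap(statistic, xs, ys, n_bootstraps, n_calc_samples=None, replace=True, seed=None):
    """Evaluate statistic on n_bootstraps freshly drawn sub-samples of the paired data."""
    rng = random.Random(seed)
    n = len(xs)
    # None or -1 means: use as many points as are available.
    k = n if n_calc_samples in (None, -1) else n_calc_samples
    results = []
    for _ in range(n_bootstraps):
        if replace or k >= n:
            # Replacement is used by necessity when the request is not smaller than the data.
            idx = rng.choices(range(n), k=k)
        else:
            idx = rng.sample(range(n), k)
        results.append(statistic([xs[i] for i in idx], [ys[i] for i in idx]))
    return results


def abs_covariance(xs, ys):
    """Stand-in statistic (not HSIC): absolute sample covariance of the two lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return abs(sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs))


rng = random.Random(1)
xs = [rng.random() for _ in range(10000)]          # synthetic parameter values
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]   # synthetic loss values

# 10 repeats with 1000 points each, instead of one calculation with 10000 points.
values = bootstrap(abs_covariance, xs, ys, n_bootstraps=10, n_calc_samples=1000, seed=7)
estimate = statistics.mean(values)
spread = statistics.stdev(values)
```

A small `spread` relative to `estimate` suggests that the chosen number of samples per repeat is sufficient. A large spread suggests using more points per repeat or generating more samples.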
To remove parameter sets which produced non-finite loss values:
FilterInfiniteValues
- Type
Bool
- Default value
Yes
- Description
If Yes, removes points from the calculation with non-finite loss values. Non-finite points can cause numerical issues in the sensitivity calculation.
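As a small illustration of this filtering step, the Python sketch below drops every sample whose loss is infinite or NaN while keeping the parameter sets and loss values aligned. The function name and data are hypothetical.

```python
import math


def filter_finite(samples, losses):
    """Keep only the (parameter set, loss) pairs whose loss value is finite."""
    kept = [(s, l) for s, l in zip(samples, losses) if math.isfinite(l)]
    return [s for s, _ in kept], [l for _, l in kept]


samples = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
losses = [1.5, float("inf"), float("nan"), 0.2]

finite_samples, finite_losses = filter_finite(samples, losses)
```

Here the second and third points are removed, so only two of the four samples would enter the sensitivity calculation.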
Finally, we have included an extension to the sensitivity calculation:
RunReweightCalculation
- Type
Bool
- Default value
No
- Description
Run a more expensive sensitivity calculation that will also return suggested weights for the training set which will produce more balanced sensitivities between all the parameters. Note: The Gaussian kernel is recommended for the loss values kernel in this case.
Warning
The reweight calculation is experimental.
6.4.5. Technical details¶
For technical details of the sensitivity calculation, see the API documentation.