6.4. Task: Sensitivity
Task Sensitivity
runs an HSIC sensitivity analysis quantifying the effect of active parameters on the loss function.
To understand this task better, see the tutorial on Parameter sensitivity analysis.
6.4.1. Generating Samples
The first step of the sensitivity analysis is to obtain i.i.d. uniformly distributed random samples of the parameter space.
Samples can be automatically generated as part of the sensitivity calculation by using the RunSampling
key:
RunSampling
- Type:
Bool
- Default value:
No
- Description:
Produce a set of samples of the loss function and active parameters. Samples from the parameter space are drawn from a uniform random distribution. Such a set of samples serves as the input to the sensitivity calculation.
Note
If you choose to generate samples, then the shared collection keys must also be defined so that ParAMS can construct the loss function to sample.
If you are generating samples, you may select how many samples to draw and how the sampling is performed:
NumberSamples
- Type:
Integer
- Default value:
1000
- GUI name:
Generate n samples:
- Description:
Number of samples to generate during the sampling procedure.
RandomSeed
- Type:
Integer
- Description:
Random seed to use during the sampling procedure (for reproducibility).
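The role of NumberSamples and RandomSeed can be illustrated with a minimal sketch. This is not the ParAMS implementation: the function `sample_uniform` and the parameter bounds are hypothetical, and only the Python standard library is used. It shows that fixing the seed makes a uniform random draw from a bounded parameter space reproducible.

```python
import random


def sample_uniform(bounds, n_samples, seed=None):
    """Draw n_samples i.i.d. points; each coordinate is uniform within its (low, high) range."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return [[rng.uniform(low, high) for low, high in bounds] for _ in range(n_samples)]


# Hypothetical ranges for two active parameters (assumed values, for illustration only).
bounds = [(0.0, 1.0), (-5.0, 5.0)]
samples = sample_uniform(bounds, n_samples=1000, seed=42)

# The same seed reproduces exactly the same set of samples.
assert samples == sample_uniform(bounds, n_samples=1000, seed=42)
```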
SaveResiduals
- Type:
Bool
- Default value:
No
- Description:
During the sampling, save the individual difference between reference and predicted values for every sample and training set item. Required for the Reweight calculation, and will be automatically activated if the reweight calculation is requested. Saving and analyzing the residuals can provide valuable insight into your training set, but can quickly occupy a large amount of disk space. Only save the residuals if you would like to run the reweight calculation or have a particular reason to do so.
Tip
You can reuse the residuals to calculate new loss values:
with different loss functions;
after changes to weights/sigmas; and
after removing training set items.
This allows you to tweak and tailor your training set and run a new sensitivity calculation without having to resample.
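As a rough illustration of why stored residuals make this possible, the sketch below recomputes loss values from a small, made-up residual table. It assumes a weighted sum-of-squared-errors loss; the actual loss functions and residual storage format in ParAMS may differ, and the names `sse_loss` and `stored_residuals` are hypothetical.

```python
def sse_loss(residuals, weights):
    """Weighted sum of squared residuals for a single sample."""
    return sum(w * r * r for r, w in zip(residuals, weights))


# Made-up residuals: one row per sample, one column per training set item.
stored_residuals = [
    [0.5, -1.0, 2.0],
    [0.1, 0.2, -0.3],
]

old_weights = [1.0, 1.0, 1.0]
# Up-weight the second item; a zero weight effectively removes the third item.
new_weights = [1.0, 4.0, 0.0]

# New loss values are obtained without re-evaluating any training set item.
old_losses = [sse_loss(row, old_weights) for row in stored_residuals]
new_losses = [sse_loss(row, new_weights) for row in stored_residuals]
```

The expensive step (evaluating every training set item for every sample) is done once; changing weights or dropping items afterwards only needs cheap arithmetic on the saved table.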
6.4.2. Loading Samples
Instead of generating samples, you can load them from a previous sampling run:
SamplesDirectory
- Type:
String
- Default value:
- Description:
Path to an ‘optimization’ directory containing the results of a previously run sampling. ParAMS first looks for a ‘glompo_log.h5’ file. If that file is not found, it looks for ‘running_loss.txt’ and ‘running_active_parameters.txt’ in a sub-directory. The sub-directory used depends on the DataSet Name. For the Reweight calculation, only a ‘glompo_log.h5’ file (with residuals) may be used.
6.4.3. Kernels
The most important configurable options are the kernels applied to the parameter values and loss values.
We generally recommend:
applying the Gaussian kernel to the parameter values (in order to capture dis/similarity between parameter sets);
applying the conjunctive-Gaussian kernel to the loss values (in order to focus the weight of the distribution on good minima); and
using polynomial or linear kernels only if you have a specific type of relationship/dependency you would like to investigate.
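To give a feel for what the kernels do, here is a minimal pure-Python sketch of a biased empirical HSIC estimate for one parameter against the loss. It is an illustration under stated assumptions, not the ParAMS implementation: it uses a Gaussian kernel for both variables (not the conjunctive-Gaussian kernel recommended above for loss values), the 1/n² normalization is one common convention, and the function names are hypothetical.

```python
import math
import random


def gaussian_gram(values, sigma):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for scalar inputs."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values] for a in values]


def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) * ones."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
            for i in range(n)]


def hsic(x, y, sigma_x=0.3, sigma_y=0.3):
    """Biased empirical HSIC estimate, trace(K H L H) / n^2."""
    n = len(x)
    kc = center(gaussian_gram(x, sigma_x))
    lc = center(gaussian_gram(y, sigma_y))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


rng = random.Random(0)
x = [rng.random() for _ in range(100)]        # a parameter sampled uniformly on [0, 1]
dependent = [v for v in x]                    # a "loss" fully determined by the parameter
independent = [rng.random() for _ in x]       # a "loss" unrelated to the parameter

hsic_dependent = hsic(x, dependent)
hsic_independent = hsic(x, independent)       # expected to be much closer to zero
```

A larger HSIC value indicates a stronger statistical dependence between the parameter and the loss. The kernel bandwidths control the length scale on which two points count as similar.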
ParametersKernel
   Alpha float
   Gamma float
   Polynomial
      Order integer
      Shift float
   End
   Sigma float
   Type [Gaussian | ConjunctiveGaussian | Threshold | Polynomial | Linear]
End
ParametersKernel
- Type:
Block
- Description:
Kernel applied to the parameters for which sensitivity is being measured.
Alpha
- Type:
Float
- Description:
Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.
Gamma
- Type:
Float
- Default value:
0.1
- Description:
Bandwidth parameter for the conjunctive-Gaussian kernel.
Polynomial
- Type:
Block
- Description:
Settings for the Polynomial kernel.
Order
- Type:
Integer
- Default value:
1
- Description:
Maximum order of the polynomial.
Shift
- Type:
Float
- Default value:
0.0
- Description:
Free parameter (≥ 0) trading off higher-order versus lower-order effects.
Sigma
- Type:
Float
- Default value:
0.3
- Description:
Bandwidth parameter for the Gaussian kernel.
Type
- Type:
Multiple Choice
- Default value:
Gaussian
- Options:
[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]
- Description:
Name of the kernel to apply to the parameters for which sensitivity is being measured.
LossValuesKernel
- Type:
Block
- Description:
Kernel applied to the loss values.
Alpha
- Type:
Float
- Description:
Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.
Gamma
- Type:
Float
- Default value:
0.3
- Description:
Bandwidth parameter for the conjunctive-Gaussian kernel.
Polynomial
- Type:
Block
- Description:
Settings for the Polynomial kernel.
Order
- Type:
Integer
- Default value:
1
- Description:
Maximum order of the polynomial.
Shift
- Type:
Float
- Default value:
0.0
- Description:
Free parameter (≥ 0) trading off higher-order versus lower-order effects.
Sigma
- Type:
Float
- Description:
Bandwidth parameter for the Gaussian kernel. If not specified or -1, calculates a reasonable default based on the number of parameters being tested.
Type
- Type:
Multiple Choice
- Default value:
ConjunctiveGaussian
- Options:
[Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]
- Description:
Name of the kernel to apply to the loss values.
6.4.4. Other settings
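The loss scaling mentioned in the Alpha descriptions (logarithm, then adjustment to the range between zero and one) can be sketched as follows. This is an assumption-laden illustration: the helper `scale_losses` is hypothetical, a simple min-max rescaling is assumed, and how the Threshold kernel actually uses the cut-off is not specified here. The base of the logarithm does not affect a min-max rescaled result.

```python
import math


def scale_losses(losses):
    """Take the logarithm of each (positive) loss, then rescale linearly to [0, 1]."""
    logs = [math.log(v) for v in losses]
    low, high = min(logs), max(logs)
    if high == low:
        return [0.0 for _ in logs]  # degenerate case: all losses are identical
    return [(v - low) / (high - low) for v in logs]


losses = [1.0, 10.0, 100.0, 1000.0]
scaled = scale_losses(losses)  # evenly spaced between 0 and 1 for these values

alpha = 0.5  # hypothetical cut-off, expressed in the scaled space
below_cutoff = [v for v, s in zip(losses, scaled) if s <= alpha]
```

Because the cut-off lives in the scaled space, a value such as 0.5 refers to the midpoint of the logarithmic loss range, not to half of the largest loss value.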
The sensitivity analysis can be run on either the training set or the validation set:
SetToAnalyze
- Type:
Multiple Choice
- Default value:
TrainingSet
- Options:
[TrainingSet, ValidationSet]
- GUI name:
Analyze:
- Description:
Name of the data set to use for the sensitivity analysis.
Generally, not all of the generated samples are used simultaneously in the sensitivity calculation, because the calculation becomes slower as more samples are used. It is also difficult to know whether one has enough samples to obtain an accurate approximation of the true sensitivity.
Suppose, for example, that one has generated 10000 samples. It is often better to run 10 repeats (bootstraps) of the calculation with 1000 points each than to run a single calculation with 10000 points. The repeated configuration runs faster, and the spread of results across the repeats indicates how robust the estimate is.
To specify the number of times you would like the calculation repeated:
NumberBootstraps
- Type:
Integer
- Default value:
1
- GUI name:
Repeat calculation n times:
- Description:
Number of repeats of the calculation with different sub-samples. A small spread from a large number of bootstraps provides confidence in the estimate of the sensitivity.
To specify how many points to use from our sample set in each calculation:
NumberCalculationSamples
- Type:
Integer
- GUI name:
Number of samples per repeat:
- Description:
Number of samples from the full set available to use in the calculation. If not specified or -1, uses all available points. For the sensitivity calculation, the sub-sample is redrawn for every bootstrap.
To specify how the points are chosen from our sample set:
SampleWithReplacement
- Type:
Bool
- Default value:
Yes
- Description:
Sample from the available data with or without replacement. This only has an effect if the number of samples for the calculation is less than the total number available; otherwise, sampling with replacement is used by necessity.
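The interplay of NumberBootstraps, NumberCalculationSamples and SampleWithReplacement can be sketched generically. The code below is not ParAMS code: `bootstrap` is a hypothetical helper, and the sample mean stands in for the (much more expensive) sensitivity calculation so that the example stays short.

```python
import random
import statistics


def bootstrap(data, statistic, n_bootstraps, n_per_repeat, with_replacement=True, seed=None):
    """Evaluate `statistic` on n_bootstraps sub-samples; return (mean, standard deviation).

    Needs n_bootstraps >= 2, otherwise the standard deviation is undefined.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bootstraps):
        if with_replacement:
            subset = rng.choices(data, k=n_per_repeat)  # the same point may be drawn twice
        else:
            subset = rng.sample(data, k=n_per_repeat)   # requires n_per_repeat <= len(data)
        estimates.append(statistic(subset))
    return statistics.mean(estimates), statistics.stdev(estimates)


# 10000 available points, analysed as 10 repeats of 1000 points each.
rng = random.Random(7)
available = [rng.random() for _ in range(10000)]
estimate, spread = bootstrap(available, statistics.mean, n_bootstraps=10,
                             n_per_repeat=1000, with_replacement=True, seed=7)
```

A small `spread` relative to `estimate` suggests the sub-sample size is large enough for a stable result; a large spread suggests using more points per repeat or generating more samples.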
To remove parameter sets which produced non-finite loss values:
FilterInfiniteValues
- Type:
Bool
- Default value:
Yes
- Description:
If Yes, removes points from the calculation with non-finite loss values. Non-finite points can cause numerical issues in the sensitivity calculation.
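The effect of this filter can be pictured with a short sketch. The helper `filter_finite` and the data are hypothetical; the point is only that a parameter set is dropped together with its non-finite loss, so the two lists stay aligned.

```python
import math


def filter_finite(parameters, losses):
    """Keep only the (parameter set, loss) pairs whose loss value is finite."""
    kept = [(p, loss) for p, loss in zip(parameters, losses) if math.isfinite(loss)]
    return [p for p, _ in kept], [loss for _, loss in kept]


# Made-up samples: the second and fourth parameter sets produced non-finite losses.
parameters = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
losses = [12.5, float("inf"), 3.0, float("nan")]

clean_parameters, clean_losses = filter_finite(parameters, losses)
```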
Finally, we have included an extension to the sensitivity calculation:
RunReweightCalculation
- Type:
Bool
- Default value:
No
- Description:
Run a more expensive sensitivity calculation that will also return suggested weights for the training set which will produce more balanced sensitivities between all the parameters. Note: The Gaussian kernel is recommended for the loss values kernel in this case.
Warning
The reweight calculation is experimental.
6.4.5. Technical details
For technical details of the sensitivity calculation, see the API documentation.