6.4. Task: Sensitivity

Task Sensitivity runs a Hilbert-Schmidt Independence Criterion (HSIC) sensitivity analysis, which quantifies the effect of each active parameter on the loss function.

To understand this task better, see the tutorial on Parameter sensitivity analysis.
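As a rough illustration of what an HSIC estimate measures, the sketch below implements the standard biased empirical estimator, trace(K H L H) / (n - 1)^2, for scalar samples using Gaussian kernels. It is a minimal, standard-library-only sketch and not the ParAMS implementation; the function names (`gaussian_gram`, `center`, `hsic`) and the example data are hypothetical.

```python
import math


def gaussian_gram(values, sigma):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for scalar samples."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(k_gram, l_gram):
    """Biased empirical HSIC between two samples, given their Gram matrices."""
    n = len(k_gram)
    kc = center(k_gram)
    lc = center(l_gram)
    # trace(Kc @ Lc) written out as a double sum.
    trace = sum(kc[i][j] * lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical data: one parameter value and one loss value per sample.
parameter = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
loss = [(p - 0.5) ** 2 for p in parameter]  # loss depends on the parameter

sensitivity = hsic(gaussian_gram(parameter, 0.3), gaussian_gram(loss, 0.3))
```

A larger value indicates a stronger statistical dependence between the parameter and the loss; a parameter that never changes yields a value of zero.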

6.4.1. Generating Samples

The first step of the sensitivity analysis is to obtain i.i.d., uniformly distributed random samples of the parameter space.

Samples can be automatically generated as part of the sensitivity calculation by using the RunSampling key:

RunSampling

Type: Bool
Default value: No
Description: Produce a set of samples of the loss function and active parameters. Samples from the parameter space are drawn from a uniform random distribution. Such a set of samples serves as the input to the sensitivity calculation.

Note

If you choose to generate samples, then the shared collection keys must also be defined so that ParAMS can construct the loss function to sample.

If you are generating samples, you can set how many samples are drawn and how the sampling is performed:

NumberSamples

Type: Integer
Default value: 1000
GUI name: Generate n samples:
Description: Number of samples to generate during the sampling procedure.

RandomSeed

Type: Integer
Description: Random seed to use during the sampling procedure (for reproducibility).
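Conceptually, the sampling step draws independent points uniformly within the parameter ranges, and a fixed seed makes the draw repeatable. The following is a minimal sketch of that idea only; `sample_uniform` and the `bounds` layout are hypothetical and do not reflect how ParAMS stores or generates its samples.

```python
import random


def sample_uniform(bounds, n_samples, seed=None):
    """Draw n_samples i.i.d. points, uniform within each (low, high) bound.

    bounds: list of (low, high) tuples, one per active parameter (assumed layout).
    seed:   optional integer; the same seed reproduces the same samples.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for (low, high) in bounds]
        for _ in range(n_samples)
    ]


# Two hypothetical active parameters with different ranges.
samples = sample_uniform([(0.0, 1.0), (-5.0, 5.0)], n_samples=1000, seed=42)
```

Calling the function twice with the same seed returns identical samples, which is the purpose of a key such as RandomSeed.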

SaveResiduals

Type: Bool
Default value: No
Description: During the sampling, save the individual difference between reference and predicted values for every sample and training set item. Required for the Reweight calculation, and will be automatically activated if the reweight calculation is requested. Saving and analyzing the residuals can provide valuable insight into your training set, but can quickly occupy a large amount of disk space. Only save the residuals if you would like to run the reweight calculation or have a particular reason to do so.

Tip

You can reuse the residuals to calculate new loss values:

with different loss functions;
after changes to weights/sigmas; and
after removing training set items.

This allows you to tweak and tailor your training set and run a new sensitivity calculation without having to resample.
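To see why stored residuals make this possible, note that a loss value is just a reduction over the per-item residuals. The sketch below assumes a weighted sum of squared (or absolute) errors of the form sum_i w_i (r_i / sigma_i)^2; this form, the function name `loss_from_residuals`, and the `skip` argument are illustrative assumptions rather than the exact ParAMS definitions.

```python
def loss_from_residuals(residuals, weights, sigmas, kind="sse", skip=()):
    """Recompute a loss value for one sample from its saved residuals.

    residuals: reference-minus-predicted difference per training set item.
    weights, sigmas: per-item weight and sigma (may differ from the original run).
    kind: "sse" for squared errors, "sae" for absolute errors (assumed names).
    skip: indices of training set items to leave out.
    """
    total = 0.0
    for index, (r, w, s) in enumerate(zip(residuals, weights, sigmas)):
        if index in skip:
            continue  # item removed from the training set
        scaled = r / s
        total += w * (scaled ** 2 if kind == "sse" else abs(scaled))
    return total


residuals = [1.0, -2.0, 0.5]  # hypothetical residuals for one sample

original = loss_from_residuals(residuals, [1, 1, 1], [1, 1, 1])
reweighted = loss_from_residuals(residuals, [1, 0.5, 2], [1, 2, 0.5])
without_item = loss_from_residuals(residuals, [1, 1, 1], [1, 1, 1], skip={1})
```

None of these recalculations requires evaluating the model again, which is why resampling can be avoided.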

6.4.2. Loading Samples

Instead of generating samples, you can load them from a previous run:

SamplesDirectory

Type: String
Default value
Description: Path to an ‘optimization’ directory containing the results of a previously run sampling. First looks for a ‘glompo_log.h5’ file. If not found, will look for ‘running_loss.txt’ and ‘running_active_parameters.txt’ in a sub-directory. The sub-directory used will depend on the DataSet Name. For the Reweight calculation only a ‘glompo_log.h5’ file (with residuals) may be used.

6.4.3. Kernels

The most important configurable options are the kernels applied to the parameter values and loss values.

We generally recommend:

applying the Gaussian kernel to the parameter values (in order to capture dis/similarity between parameter sets);
applying the conjunctive-Gaussian kernel to the loss values (in order to focus the weight of the distribution on good minima); and
using polynomial or linear kernels only if you have a specific type of relationship/dependency you would like to investigate.
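For orientation, the textbook forms of three of these kernels are sketched below. These are the common definitions and are assumed here for illustration; the exact expressions used by ParAMS may differ, and the conjunctive-Gaussian and Threshold kernels are omitted because their definitions are not given in this section. The argument names `sigma`, `order` and `shift` simply mirror the input keys.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, sigma):
    """exp(-||x - y||^2 / (2 sigma^2)): close to 1 for similar points, 0 for distant ones."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def linear_kernel(x, y):
    """Plain inner product; only sensitive to linear relationships."""
    return dot(x, y)


def polynomial_kernel(x, y, order=1, shift=0.0):
    """(x . y + shift)^order; a larger shift gives more weight to lower-order terms."""
    return (dot(x, y) + shift) ** order
```

With `order=1` and `shift=0.0` the polynomial kernel reduces to the linear kernel.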

ParametersKernel
   Alpha float
   Gamma float
   Polynomial
      Order integer
      Shift float
   End
   Sigma float
   Type [Gaussian | ConjunctiveGaussian | Threshold | Polynomial | Linear]
End

ParametersKernel

Type: Block
Description: Kernel applied to the parameters for which sensitivity is being measured.

Alpha

Type: Float
Description: Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.

Gamma

Type: Float
Default value: 0.1
Description: Bandwidth parameter for the conjunctive-Gaussian kernel.

Polynomial

Type: Block
Description: Settings for the Polynomial kernel.

Order

Type: Integer
Default value: 1
Description: Maximum order of the polynomial.

Shift

Type: Float
Default value: 0.0
Description: Free parameter (≥ 0) trading off higher-order versus lower-order effects.

Sigma

Type: Float
Default value: 0.3
Description: Bandwidth parameter for the Gaussian kernel.

Type

Type: Multiple Choice
Default value: Gaussian
Options: [Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]
Description: Name of the kernel to be applied to the parameters for which sensitivity is being measured.

LossValuesKernel

Type: Block
Description: Kernel applied to the loss values.

Alpha

Type: Float
Description: Cut-off parameter for the Threshold kernel between zero and one. All loss values are scaled by taking the logarithm and then adjusted to a range between zero and one. This parameter is a value within this scaled space.
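The scaling described for Alpha can be sketched as a logarithm followed by min-max normalization. This is a minimal sketch that assumes strictly positive loss values; `scale_losses` is a hypothetical name and the handling of identical losses is an arbitrary choice made for the example.

```python
import math


def scale_losses(losses):
    """Take the logarithm of each loss, then rescale linearly onto [0, 1]."""
    logs = [math.log(value) for value in losses]  # requires value > 0
    low, high = min(logs), max(logs)
    if high == low:
        return [0.0 for _ in logs]  # all losses identical; choice made for this sketch
    return [(value - low) / (high - low) for value in logs]


scaled = scale_losses([1.0, 10.0, 100.0])
```

Because of the min-max step, the base of the logarithm does not affect the result. In this scaled space, an Alpha of 0.5 would sit at the geometric midpoint between the smallest and largest loss.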

Gamma

Type: Float
Default value: 0.3
Description: Bandwidth parameter for the conjunctive-Gaussian kernel.

Polynomial

Type: Block
Description: Settings for the Polynomial kernel.

Order

Type: Integer
Default value: 1
Description: Maximum order of the polynomial.

Shift

Type: Float
Default value: 0.0
Description: Free parameter (≥ 0) trading off higher-order versus lower-order effects.

Sigma

Type: Float
Description: Bandwidth parameter for the Gaussian kernel. If not specified or -1, calculates a reasonable default based on the number of parameters being tested.

Type

Type: Multiple Choice
Default value: ConjunctiveGaussian
Options: [Gaussian, ConjunctiveGaussian, Threshold, Polynomial, Linear]
Description: Name of the kernel to be applied to the loss values.

6.4.4. Other settings

The sensitivity analysis can be run on either the training set or the validation set:

SetToAnalyze

Type: Multiple Choice
Default value: TrainingSet
Options: [TrainingSet, ValidationSet]
GUI name: Analyze:
Description: Name of the data set to use for the sensitivity analysis.

Generally, not all of the generated samples are used simultaneously in the sensitivity calculation. This is because the more samples used, the slower the calculation. It is also difficult to know whether one has enough samples to capture an accurate approximation of the true sensitivity.

For example, suppose one has generated 10000 samples. It is often better to run 10 repeats (bootstraps) of the calculation with 1000 points each than to run a single calculation with 10000 points. The first configuration will run faster and provides a spread of results with which to evaluate the robustness of the estimate.

To specify the number of times you would like the calculation repeated:

NumberBootstraps

Type: Integer
Default value: 1
GUI name: Repeat calculation n times:
Description: Number of repeats of the calculation with different sub-samples. A small spread from a large number of bootstraps provides confidence in the estimation of the sensitivity.

To specify how many points to use from our sample set in each calculation:

NumberCalculationSamples

Type: Integer
GUI name: Number of samples per repeat:
Description: Number of samples from the full set available to use in the calculation. If not specified or -1, uses all available points. For the sensitivity calculation, this will be redrawn for every bootstrap.

To specify how the points are chosen from our sample set:

SampleWithReplacement

Type: Bool
Default value: Yes
Description: Sample from the available data with or without replacement. This only has an effect if the number of samples for the calculation is less than the total number available; otherwise, replace is Yes by necessity.
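The interplay of the three settings above can be sketched as a generic bootstrap loop. This is an illustration under stated assumptions, not the ParAMS code: `bootstrap` and its arguments are hypothetical names, and `statistic` stands in for the sensitivity calculation on one sub-sample.

```python
import random
import statistics


def bootstrap(samples, statistic, n_bootstraps=1, n_calc_samples=None,
              with_replacement=True, seed=None):
    """Evaluate statistic on n_bootstraps freshly drawn sub-samples."""
    rng = random.Random(seed)
    # "Not specified or -1" means: use as many points as are available.
    size = len(samples) if n_calc_samples in (None, -1) else n_calc_samples
    results = []
    for _ in range(n_bootstraps):
        if not with_replacement and size < len(samples):
            subset = rng.sample(samples, k=size)   # distinct points only
        else:
            subset = rng.choices(samples, k=size)  # points may repeat
        results.append(statistic(subset))
    return results


# Hypothetical data: 10000 values, analysed as 10 repeats of 1000 points each.
data_rng = random.Random(0)
data = [data_rng.random() for _ in range(10000)]
estimates = bootstrap(data, statistics.mean, n_bootstraps=10,
                      n_calc_samples=1000, seed=1)
spread = statistics.stdev(estimates)
```

A small `spread` across the repeats suggests that 1000 points per calculation are enough for a stable estimate; a large one suggests that more samples are needed.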

To remove parameter sets which produced non-finite loss values:

FilterInfiniteValues

Type: Bool
Default value: Yes
Description: If Yes, removes points from the calculation with non-finite loss values. Non-finite points can cause numerical issues in the sensitivity calculation.
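The effect of this filter can be sketched as follows. The function name `filter_non_finite` and the data layout (one parameter list and one loss per sample) are assumptions made for the example.

```python
import math


def filter_non_finite(parameter_sets, losses):
    """Keep only the samples whose loss is a finite number (drops inf and NaN)."""
    kept = [
        (params, loss)
        for params, loss in zip(parameter_sets, losses)
        if math.isfinite(loss)
    ]
    kept_params = [params for params, _ in kept]
    kept_losses = [loss for _, loss in kept]
    return kept_params, kept_losses


params = [[0.1], [0.2], [0.3], [0.4]]
losses = [0.5, float("inf"), float("nan"), 2.0]
clean_params, clean_losses = filter_non_finite(params, losses)
```

Each parameter set is removed together with its loss so that the two lists stay aligned.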

Finally, we have included an extension to the sensitivity calculation:

RunReweightCalculation

Type: Bool
Default value: No
Description: Run a more expensive sensitivity calculation that will also return suggested weights for the training set which will produce more balanced sensitivities between all the parameters. Note: The Gaussian kernel is recommended for the loss values kernel in this case.

Warning

The reweight calculation is experimental.

6.4.5. Technical details

For technical details of the sensitivity calculation, see the API documentation.
