6.6. Task: MachineLearning
Set Task MachineLearning to fit machine learning (ML) potentials. In ParAMS, all supported types of ML potentials can be trained as committee models, which provide an estimate of the uncertainty of predicted energies and forces during production simulations.
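The committee idea can be sketched in a few lines of Python: the spread between the predictions of independently trained members serves as the uncertainty estimate (a conceptual sketch only; the engine's exact formula may differ):

```python
from statistics import fmean, pstdev

def committee_prediction(member_energies):
    """Combine energy predictions from independently trained committee members.

    The mean is the committee prediction; the population standard deviation
    across members is a simple uncertainty estimate. Sketch only -- not the
    production engine's exact formula.
    """
    return fmean(member_energies), pstdev(member_energies)

# Two members that agree closely -> small uncertainty.
mean, sigma = committee_prediction([-1.02, -0.98])
```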
Training ML potentials through ParAMS requires a job collection and training and validation sets. You can construct these using the results importers, just as for ReaxFF and DFTB parametrization.
Note
Unlike ReaxFF and DFTB parametrization, no Parameter Interface is needed. This is because ML potentials usually contain many thousands of parameters. It is typically not useful to manually control the values and ranges for all of those parameters.
You also need to specify which Backend to use, for example M3GNet.
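A minimal input might look as follows (a sketch using only key names documented on this page):

```
Task MachineLearning

MachineLearning
   Backend M3GNet
End
```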
6.6.1. Requirements for job collection and data sets
Machine learning potentials are trained quite differently from how ParAMS trains ReaxFF and DFTB.
6.6.1.1. Only singlepoint calculations in the job collection
For ML potentials, only singlepoint calculations may enter the job collection. The original reference job can still be of any type (geometry optimization, PES Scan, …).
Example: if you import a DFT-calculated bond scan (PES Scan), you must import it using the “Add PESScan Singlepoints” option, not “add singlejob with Task=’PESScan’”.
Any jobs in the job collection with Task different from “SinglePoint” will be ignored.
6.6.1.2. Only single extractors in the training and validation sets
Similarly, for the training and validation sets, each expression may contain only a single extractor acting on a single job. This means that you cannot train reaction energies; instead, you can (and should) train the total energy. It is therefore especially important that all reference data was calculated at a single level of theory.
When training forces, you must extract all force components from the job. However, depending on the backend, you may be able to set the force weights.
For task MachineLearning, only a small set of extractors (that act on singlepoint jobs) are supported:
energy
forces
Examples:
| Expression | Task Optimization | Task MachineLearning |
|---|---|---|
| energy('job1') | OK | OK |
| forces('job1') | OK | OK |
| energy('job1') - energy('job2') | OK | Not OK |
| 2 * energy('job1') | OK | Not OK |
| angle('job1', 0, 1, 2) | OK | Not OK |
Expressions that do not meet the above requirements are ignored during ML training, but they are still stored on disk. This means that if, after training your ML potential, you switch to the ParAMS SinglePoint task, you can use any expressions and job tasks to test, validate, or benchmark your trained potential.
6.6.1.3. The engine settings must be the same for all jobs
When you train, for example, DFTB, you can use different engine settings for different jobs: you might want the k-space sampling to differ depending on the system. However, when training machine learning potentials, you cannot set any job-dependent (structure-dependent) engine settings. Every job (structure) uses the same settings.
6.6.2. Machine Learning Input Structure
The input for the ParAMS Task MachineLearning is structured as follows:
- MachineLearning has multiple backends, selected through the Backend key.
- Each backend has a corresponding block (with the same name as the value of the Backend key) containing settings specific to that backend.
- In addition, several shared keywords, such as MaxEpochs, modify the behavior of all backends in the same way.
- Each backend may support multiple models and has a corresponding block (with the same name as the value of the Model key) containing settings specific to that model, for example the number of layers or how to initialize the parameters.
- Some models consist of only a single key rather than a block, for example when a backend supports loading a file that contains model settings and parameters.
- Any number of settings that apply to all models may exist at the top level of a backend block.
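Put together, a backend block with a nested model block might look like this (a sketch assembled from the keys documented below; the values shown are illustrative):

```
MachineLearning
   Backend M3GNet
   MaxEpochs 1000
   M3GNet
      LearningRate 0.001
      Model Custom
      Custom
         Cutoff 5.0
         NumNeurons 64
      End
   End
End
```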
The MachineLearning%LoadModel key loads a previously fitted model from a ParAMS results directory. The ParAMS results directory must contain the two subdirectories optimization and settings_and_initial_data. Enabling MachineLearning%LoadModel enforces the same Backend and CommitteeSize as in the previous job and ignores the model keys; instead, they are read from the previous ParAMS calculation. Any settings in the model blocks are ignored. If any settings in the backend blocks are incompatible with the loaded model, ParAMS will crash or behave in an undefined way.
The exact same backend and model settings are used for every committee member regardless of the CommitteeSize, although the resulting models can still differ due to stochastic effects (e.g. random initial parameters or a stochastic optimization algorithm). When using LoadModel, the committee from the previous calculation is used.
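Continuing from a previous ParAMS run could then be specified as follows (the results path is hypothetical; Backend and CommitteeSize must match the previous job):

```
Task MachineLearning

MachineLearning
   Backend M3GNet
   CommitteeSize 1
   LoadModel /path/to/previous_training.results
End
```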
Tip
Learn how to use the ParAMS input for Task MachineLearning from the tutorials.
MachineLearning
- Type
Block
- Description
Options for Task MachineLearning.
Backend
- Type
Multiple Choice
- Default value
M3GNet
- Options
[M3GNet, NequIP]
- Description
The backend to use. You must separately install the backend before running a training job.
MaxEpochs
- Type
Integer
- Default value
1000
- Description
Set the maximum number of epochs a backend should perform.
LossCoeffs
- Type
Block
- Description
Modify the coefficients for the machine learning loss function. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
AverageForcePerAtom
- Type
Bool
- Default value
No
- Description
For each force data entry, divide the loss contribution by the number of atoms in the structure. This is the same as the behavior for the ParAMS Optimization task, but it is turned off by default in Task MachineLearning. For machine learning, setting this to ‘No’ can be better, since larger molecules will then contribute more to the loss. For backends that support weights, this is applied on top of the supplied dataset weights and sigmas.
Energy
- Type
Float
- Default value
10.0
- GUI name
Energy coefficient
- Description
Coefficient for the contribution of loss due to the energy. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
Forces
- Type
Float
- Default value
1.0
- GUI name
Forces coefficient
- Description
Coefficient for the contribution of loss due to the forces. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
Target
- Type
Block
- Description
Target values for stopping training. If both the training and validation metrics are smaller than the specified values, the training will stop early. Only supported by the M3GNet backend.
Forces
- Type
Block
- Description
Forces (as reported by the backend)
Enabled
- Type
Bool
- Default value
Yes
- Description
Whether to use target values for forces.
MAE
- Type
Float
- Default value
0.05
- Unit
eV/angstrom
- Description
MAE for forces (as reported by the backend).
LoadModel
- Type
String
- Description
Load a previously fitted model from a ParAMS results directory. A ParAMS results directory should contain the two subdirectories optimization and settings_and_initial_data. This option ignores all settings inside model blocks.
CommitteeSize
- Type
Integer
- Default value
1
- Description
The number of independently trained ML potentials.
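How the LossCoeffs keys above enter the training objective can be sketched as follows, assuming a plain mean-squared-error loss (each backend defines its own exact form, so this is illustrative only):

```python
from statistics import fmean

def weighted_loss(energy_errors, force_errors, atom_counts,
                  energy_coeff=10.0, forces_coeff=1.0,
                  average_force_per_atom=False):
    """Combine energy and force residuals with the LossCoeffs weights.

    energy_errors: one energy residual per training entry
    force_errors:  per-entry lists of force-component residuals
    atom_counts:   per-entry atom counts, used by AverageForcePerAtom
    Sketch only -- backends define their own exact loss functions.
    """
    energy_term = fmean(e * e for e in energy_errors)
    force_terms = []
    for components, n_atoms in zip(force_errors, atom_counts):
        contribution = fmean(c * c for c in components)
        if average_force_per_atom:
            # Divide this entry's contribution by its atom count,
            # so that large structures do not dominate the loss.
            contribution /= n_atoms
        force_terms.append(contribution)
    return energy_coeff * energy_term + forces_coeff * fmean(force_terms)
```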
6.6.3. Backends: M3GNet, NequIP, …
6.6.3.1. Installation
The ML backends are not included by default with AMS or ParAMS, as they can be quite large. Before you can train an ML potential, you need to install the corresponding backend either through the AMS package manager or manually.
Tip
Before training a custom model with ParAMS, we recommend that you first test the ML backend in a production (for example, molecular dynamics or geometry optimization) simulation with some already created parameters. For example, follow the M3GNet GUI tutorial to make sure that the M3GNet backend has been installed correctly.
6.6.3.2. M3GNet
MachineLearning
- Type
Block
- Description
Options for Task MachineLearning.
M3GNet
- Type
Block
- Description
Options for M3GNet fitting.
Custom
- Type
Block
- Description
Specify a custom M3GNet model.
Cutoff
- Type
Float
- Default value
5.0
- Unit
angstrom
- Description
Cutoff radius of the graph
MaxL
- Type
Integer
- Default value
3
- Description
Include spherical components up to order MaxL. Higher gives a better angular resolution, but increases computational cost substantially.
MaxN
- Type
Integer
- Default value
3
- Description
Include radial components up to the MaxN’th root of the spherical Bessel function. Higher gives a better radial resolution, but increases computational cost substantially.
NumBlocks
- Type
Integer
- Default value
3
- GUI name
Number of convolution blocks:
- Description
Number of convolution blocks.
NumNeurons
- Type
Integer
- Default value
64
- GUI name
Number of neurons per layer
- Description
Number of neurons in each layer.
ThreebodyCutoff
- Type
Float
- Default value
4.0
- Unit
angstrom
- Description
Cutoff radius of the three-body interaction.
LearningRate
- Type
Float
- Default value
0.001
- Description
Learning rate for the M3GNet weight optimization.
Model
- Type
Multiple Choice
- Default value
UniversalPotential
- Options
[UniversalPotential, Custom, ModelDir]
- Description
How to specify the model for the M3GNet backend. Either a Custom model can be made from scratch or an existing model directory can be loaded to obtain the model settings.
ModelDir
- Type
String
- Description
Path to the directory defining the model. This folder should contain the files: ‘checkpoint’, ‘m3gnet.data-00000-of-00001’, ‘m3gnet.index’ and ‘m3gnet.json’.
UniversalPotential
- Type
Block
- Description
Settings for (transfer) learning with the M3GNet Universal Potential.
Featurizer
- Type
Bool
- Default value
No
- GUI name
Train featurizer
- Description
Train the Featurizer layer of the M3GNet universal potential.
Final
- Type
Bool
- Default value
Yes
- GUI name
Train final layer
- Description
Train the Final layer of the M3GNet universal potential.
GraphLayer1
- Type
Bool
- Default value
No
- GUI name
Train layer 1 - graph
- Description
Train the first Graph layer of the M3GNet universal potential.
GraphLayer2
- Type
Bool
- Default value
No
- GUI name
Train layer 2 - graph
- Description
Train the second Graph layer of the M3GNet universal potential.
GraphLayer3
- Type
Bool
- Default value
Yes
- GUI name
Train layer 3 - graph
- Description
Train the third Graph layer of the M3GNet universal potential.
ThreeDInteractions1
- Type
Bool
- Default value
No
- GUI name
Train layer 1 - 3D interactions
- Description
Train the first ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
ThreeDInteractions2
- Type
Bool
- Default value
No
- GUI name
Train layer 2 - 3D interactions
- Description
Train the second ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
ThreeDInteractions3
- Type
Bool
- Default value
Yes
- GUI name
Train layer 3 - 3D interactions
- Description
Train the third ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
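With the defaults above, only the last graph and three-body layers and the final layer are retrained. Restricting transfer learning to the final layer alone could be written as (an illustrative sketch; only keys whose defaults change are listed):

```
MachineLearning
   Backend M3GNet
   M3GNet
      Model UniversalPotential
      UniversalPotential
         GraphLayer3 No
         ThreeDInteractions3 No
         Final Yes
      End
   End
End
```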
M3GNet produces the parameter directory <calculation name>.results/optimization/m3gnet/results/model, which contains the parametrized model and can be used with the MLPotential engine. Set Backend M3GNet and ParameterDir to the path of the deployed model.
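For example, in a subsequent AMS production job (the results-directory name mytraining is hypothetical):

```
Engine MLPotential
   Backend M3GNet
   ParameterDir mytraining.results/optimization/m3gnet/results/model
EndEngine
```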
The M3GNet universal potential has the following architecture/structure:
| Layer (type) | Param # |
|---|---|
| radius_cutoff_graph_converter (RadiusCutoffGraphConverter) | 0 (unused) |
| graph_featurizer (GraphFeaturizer) | 6080 |
| graph_update_func (GraphUpdateFunc) | 192 |
| spherical_bessel_with_harmonics (SphericalBesselWithHarmonics) | 0 |
| three_d_interaction (ThreeDInteraction) | 1737 |
| three_d_interaction_1 (ThreeDInteraction) | 1737 |
| three_d_interaction_2 (ThreeDInteraction) | 1737 |
| graph_network_layer (GraphNetworkLayer) | 66432 |
| graph_network_layer_1 (GraphNetworkLayer) | 66432 |
| graph_network_layer_2 (GraphNetworkLayer) | 66432 |
| pipe_24 (Pipe) | 16770 |
| atom_ref_2 (AtomRef) | 0 |
Total params: 227,549
6.6.3.3. NequIP
Important
Training NequIP potentials with ParAMS is not a fully supported feature. To use NequIP with AMS, or to train NequIP with ParAMS, you need to manually install it into the AMS Python environment.
SCM does not provide any packages for NequIP and cannot provide support for its installation. However, we have compiled some tips in the Engine ASE documentation that may help you with the installation.
The options for NequIP are:
MachineLearning
- Type
Block
- Description
Options for Task MachineLearning.
NequIP
- Type
Block
- Description
Options for NequIP fitting.
Custom
- Type
Block
- Description
Specify a custom NequIP model.
LMax
- Type
Integer
- Default value
1
- Description
Maximum L value. 1 is probably high enough.
MetricsKey
- Type
Multiple Choice
- Default value
validation_loss
- Options
[training_loss, validation_loss]
- Description
Which metric to use to generate the ‘best’ model.
NumLayers
- Type
Integer
- Default value
4
- Description
Number of interaction layers in the NequIP neural network.
RMax
- Type
Float
- Default value
3.5
- Unit
angstrom
- GUI name
Distance cutoff
- Description
Distance cutoff for interactions.
LearningRate
- Type
Float
- Default value
0.005
- Description
Learning rate for the NequIP weight optimization
Model
- Type
Multiple Choice
- Default value
Custom
- Options
[Custom, ModelFile]
- Description
How to specify the model for the NequIP backend. Either a Custom model can be made from scratch or an existing ‘model.pth’ file can be loaded to obtain the model settings.
ModelFile
- Type
String
- Description
Path to the model.pth file defining the model.
UseRescalingFromLoadedModel
- Type
Bool
- Default value
Yes
- Description
When loading a model with LoadModel or NequIP%ModelFile, do not recalculate the dataset rescaling but use the value from the loaded model.
NequIP produces the file <calculation name>.results/optimization/nequip/results/model.pth which contains the deployed model and can be used with the MLPotential engine. Set Backend NequIP and ParameterFile to the path of the deployed model.
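Analogously to M3GNet, a production engine block might look as follows (the results-directory name mytraining is hypothetical):

```
Engine MLPotential
   Backend NequIP
   ParameterFile mytraining.results/optimization/nequip/results/model.pth
EndEngine
```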
6.6.4. ML Parallelization
Parallelization options can be set with ParallelLevels. Note that Task MachineLearning does not perform AMS jobs during the optimization, so the parallelization options are different.
Select the maximum number of parallel committee members with CommitteeMembers, or set it to zero to run all committee members in parallel (up to the maximum number of cores or the NSCM environment variable). Select the number of cores each committee member may use with Cores, or set it to zero (the default) to distribute the available cores evenly over the committee members running in parallel.
Some backends may spawn additional threads for database management, but these should not use substantial CPU time. GPU offloading is supported through TensorFlow or PyTorch, depending on the backend. Currently there are no settings in ParAMS for GPU offloading; the backends use GPU resources according to their own documentation.
ParallelLevels
- Type
Block
- GUI name
Parallelization distribution:
- Description
Distribution of threads/processes between the parallelization levels.
CommitteeMembers
- Type
Integer
- Default value
1
- GUI name
Number of parallel committee members
- Description
Maximum number of committee member optimizations to run in parallel. If set to zero, the minimum of MachineLearning%CommitteeSize and the number of available cores (NSCM) is used.
Cores
- Type
Integer
- Default value
0
- GUI name
Processes (per Job)
- Description
Number of cores to use per committee member optimization. By default (0), the available cores (NSCM) are divided equally among the committee members. When using GPU offloading, consider setting this to 1.
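The rules above for CommitteeMembers and Cores can be sketched as follows (illustrative only; the actual scheduling is handled by ParAMS):

```python
def resolve_parallel_levels(available_cores, committee_size,
                            members=1, cores=0):
    """Resolve the ParallelLevels defaults described above.

    members == 0: run min(CommitteeSize, available cores) members in parallel.
    cores == 0:   divide the available cores evenly over the parallel members.
    Sketch of the documented rules only.
    """
    if members == 0:
        members = min(committee_size, available_cores)
    if cores == 0:
        cores = max(1, available_cores // members)
    return members, cores
```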