6.6. Task: MachineLearning
Set Task MachineLearning to fit machine learning (ML) potentials. In ParAMS, all supported types of ML potentials can be trained as committee models, which provide an estimate of the uncertainty of the predicted energies and forces during production simulations.

Training ML potentials through ParAMS requires a job collection and training and validation sets. You can construct these using the results importers, just as for ReaxFF and DFTB parametrization.
Note
Unlike ReaxFF and DFTB parametrization, no Parameter Interface is needed. This is because ML potentials usually contain many thousands of parameters. It is typically not useful to manually control the values and ranges for all of those parameters.
You also need to specify which Backend to use, for example M3GNet.
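For example, a minimal input for this task could look as follows (a sketch; the job collection and training/validation sets are prepared separately with the results importers):

```
Task MachineLearning

MachineLearning
   Backend M3GNet
End
```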
6.6.1. Requirements for job collection and data sets
Machine learning potentials are trained quite differently from how ParAMS trains ReaxFF and DFTB.
6.6.1.1. Only singlepoint calculations in the job collection
For ML potentials, only singlepoint calculations may enter the job collection. The original reference job can still be of any type (geometry optimization, PES Scan, …).
Example: if you import a DFT-calculated bond scan (PES Scan), you must import it using the “Add PESScan Singlepoints” option, not “add singlejob with Task=’PESScan’”.
Any jobs in the job collection with Task different from “SinglePoint” will be ignored.
6.6.1.2. Only single extractors in the training and validation sets
Similarly, the expressions in the training and validation sets can only contain a single extractor acting on a single job. This means that you cannot train reaction energies; instead, you can (and should) train the total energy. It is therefore especially important that all reference data were calculated at a single level of theory.
When training forces, you must extract all force components from the job. However, depending on the backend, you may be able to set the force weights.
For task MachineLearning, only a small set of extractors (that act on singlepoint jobs) are supported:
energy
forces
Examples:
| Expression | Task Optimization | Task MachineLearning |
|---|---|---|
| energy('job1') (a single energy extractor on one job) | OK | OK |
| forces('job1') (a single forces extractor on one job) | OK | OK |
| energy('job1') - energy('job2') (a reaction energy combining several jobs) | OK | Not OK |
| an extractor other than energy or forces (e.g. an angle or a charge) | OK | Not OK |
| an expression combining several extractors | OK | Not OK |
Expressions that do not follow the above requirements will be ignored during the ML training, but they will still be stored on disk. This means that if you switch to the ParAMS SinglePoint task after training your ML potential, you can use any expressions and job tasks to test, validate, or benchmark the trained potential.
6.6.1.3. The engine settings must be the same for all jobs
When you train, for example, DFTB, you can have different engine settings for different jobs. For example, you might want the k-space sampling to differ depending on the system.
However, when training machine learning potentials, you cannot set any job-dependent (structure-dependent) engine settings. Every job (structure) will use the same settings.
6.6.2. Machine Learning Input Structure
The input for the ParAMS Task MachineLearning
is structured as follows:
- MachineLearning has multiple backends that can be selected through the Backend key.
- Each backend has a corresponding block (with the same name as the value of the Backend key) containing settings specific to that backend.
- Additionally, there are several shared keywords, such as MaxEpochs, that modify the behaviour of all backends in the same way.
- Each backend may support multiple models; each model has a corresponding block (with the same name as the value of the Model key) containing settings specific to that model, for example the number of layers or how to initialize the parameters.
- Some models consist of only a single key rather than a block, for example when a backend supports loading a file that contains the model settings and parameters.
- Settings that apply to all models of a backend can appear at the top level of the backend block.
The MachineLearning%LoadModel key loads a previously fitted model from a ParAMS results directory. The results directory must contain the two subdirectories optimization and settings_and_initial_data. Enabling MachineLearning%LoadModel enforces the same Backend and CommitteeSize as in the previous job; any settings in the model blocks are ignored and are instead read from the previous ParAMS calculation. If any settings in the backend blocks are incompatible with the loaded model, ParAMS will crash or behave in an undefined way.
The exact same backend and model settings are used for every committee member, regardless of the CommitteeSize, although the resulting models can still differ due to stochastic effects (e.g. random initial parameters or a stochastic optimization algorithm). When using LoadModel, the committee from the previous calculation is used.
Set RunAMSAtEnd
to run the job collection with the newly trained model once training is completed.
This will provide additional output such as scatter plots of prediction against reference values.
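As a sketch, restarting from a previous calculation and running AMS at the end could look like this (the results directory name is a hypothetical example):

```
Task MachineLearning

MachineLearning
   Backend M3GNet
   LoadModel previous_training.results
   MaxEpochs 200
   RunAMSAtEnd Yes
End
```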
Tip
Learn how to use the ParAMS input for Task MachineLearning from the tutorials.
MachineLearning
- Type:
Block
- Description:
Options for Task MachineLearning.
Backend
- Type:
Multiple Choice
- Default value:
M3GNet
- Options:
[M3GNet, NequIP]
- Description:
The backend to use. You must separately install the backend before running a training job.
MaxEpochs
- Type:
Integer
- Default value:
1000
- Description:
Set the maximum number of epochs a backend should perform.
LossCoeffs
- Type:
Block
- Description:
Modify the coefficients for the machine learning loss function. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
AverageForcePerAtom
- Type:
Bool
- Default value:
No
- Description:
For each force data entry, divide the loss contribution by the number of atoms in the corresponding structure. This is the same as the behavior for ParAMS Optimization, but it is turned off by default in Task MachineLearning. For machine learning, setting this to ‘No’ can be better, since larger molecules will then contribute more to the loss. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
Energy
- Type:
Float
- Default value:
10.0
- GUI name:
Energy coefficient
- Description:
Coefficient for the contribution of loss due to the energy. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
Forces
- Type:
Float
- Default value:
1.0
- GUI name:
Forces coefficient
- Description:
Coefficient for the contribution of loss due to the forces. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
Target
- Type:
Block
- Description:
Target values for stopping training. If both the training and validation metrics are smaller than the specified values, the training will stop early. Only supported by the M3GNet backend.
Forces
- Type:
Block
- Description:
Forces (as reported by the backend)
Enabled
- Type:
Bool
- Default value:
Yes
- Description:
Whether to use target values for forces.
MAE
- Type:
Float
- Default value:
0.05
- Unit:
eV/angstrom
- Description:
MAE for forces (as reported by the backend).
LoadModel
- Type:
String
- Description:
Load a previously fitted model from a ParAMS results directory. A ParAMS results directory should contain the two subdirectories optimization and settings_and_initial_data. This option ignores all settings inside model blocks.
CommitteeSize
- Type:
Integer
- Default value:
1
- Description:
The number of independently trained ML potentials.
RunAMSAtEnd
- Type:
Bool
- Default value:
Yes
- GUI name:
Run AMS at end
- Description:
Whether to run the (committee) ML potential through AMS at the end. This will create the energy/forces scatter plots for the final trained model.
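Putting the shared keywords together, a training job with a committee of four models and early stopping on the force MAE could be set up as in the following sketch (all keys are documented above; the values are illustrative):

```
Task MachineLearning

MachineLearning
   Backend M3GNet
   MaxEpochs 2000
   CommitteeSize 4
   LossCoeffs
      Energy 10.0
      Forces 1.0
   End
   Target
      Forces
         Enabled Yes
         MAE 0.05
      End
   End
   RunAMSAtEnd Yes
End
```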
6.6.3. Backends: M3GNet, NequIP, …
6.6.3.1. Installation
The ML backends are not included by default with AMS or ParAMS, as they can be quite large. Before you can train an ML potential, you need to install the corresponding backend either through the AMS package manager or manually.
Tip
Before training a custom model with ParAMS, we recommend that you first test the ML backend in a production (for example, molecular dynamics or geometry optimization) simulation with some already created parameters. For example, follow the M3GNet GUI tutorial to make sure that the M3GNet backend has been installed correctly.
6.6.3.2. M3GNet
MachineLearning
- Type:
Block
- Description:
Options for Task MachineLearning.
M3GNet
- Type:
Block
- Description:
Options for M3GNet fitting.
Custom
- Type:
Block
- Description:
Specify a custom M3GNet model.
Cutoff
- Type:
Float
- Default value:
5.0
- Unit:
angstrom
- Description:
Cutoff radius of the graph.
MaxL
- Type:
Integer
- Default value:
3
- Description:
Include spherical components up to order MaxL. Higher gives a better angular resolution, but increases computational cost substantially.
MaxN
- Type:
Integer
- Default value:
3
- Description:
Include radial components up to the MaxN’th root of the spherical Bessel function. Higher gives a better radial resolution, but increases computational cost substantially.
NumBlocks
- Type:
Integer
- Default value:
3
- GUI name:
Number of convolution blocks
- Description:
Number of convolution blocks.
NumNeurons
- Type:
Integer
- Default value:
64
- GUI name:
Number of neurons per layer
- Description:
Number of neurons in each layer.
ThreebodyCutoff
- Type:
Float
- Default value:
4.0
- Unit:
angstrom
- Description:
Cutoff radius of the three-body interaction.
LearningRate
- Type:
Float
- Default value:
0.001
- Description:
Learning rate for the M3GNet weight optimization.
Model
- Type:
Multiple Choice
- Default value:
UniversalPotential
- Options:
[UniversalPotential, Custom, ModelDir]
- Description:
How to specify the model for the M3GNet backend. Either a Custom model can be made from scratch or an existing model directory can be loaded to obtain the model settings.
ModelDir
- Type:
String
- Description:
Path to the directory defining the model. This folder should contain the files: ‘checkpoint’, ‘m3gnet.data-00000-of-00001’, ‘m3gnet.index’ and ‘m3gnet.json’.
UniversalPotential
- Type:
Block
- Description:
Settings for (transfer) learning with the M3GNet Universal Potential.
Featurizer
- Type:
Bool
- Default value:
No
- GUI name:
Train featurizer
- Description:
Train the Featurizer layer of the M3GNet universal potential.
Final
- Type:
Bool
- Default value:
Yes
- GUI name:
Train final layer
- Description:
Train the Final layer of the M3GNet universal potential.
GraphLayer1
- Type:
Bool
- Default value:
No
- GUI name:
Train layer 1 - graph
- Description:
Train the first Graph layer of the M3GNet universal potential.
GraphLayer2
- Type:
Bool
- Default value:
No
- GUI name:
Train layer 2 - graph
- Description:
Train the second Graph layer of the M3GNet universal potential.
GraphLayer3
- Type:
Bool
- Default value:
Yes
- GUI name:
Train layer 3 - graph
- Description:
Train the third Graph layer of the M3GNet universal potential.
ThreeDInteractions1
- Type:
Bool
- Default value:
No
- GUI name:
Train layer 1 - 3D interactions
- Description:
Train the first ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
ThreeDInteractions2
- Type:
Bool
- Default value:
No
- GUI name:
Train layer 2 - 3D interactions
- Description:
Train the second ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
ThreeDInteractions3
- Type:
Bool
- Default value:
Yes
- GUI name:
Train layer 3 - 3D interactions
- Description:
Train the third ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
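As an illustration, a custom M3GNet model (instead of the default universal potential) could be requested with the keys documented above; the values shown here are the defaults:

```
MachineLearning
   Backend M3GNet
   M3GNet
      Model Custom
      LearningRate 0.001
      Custom
         Cutoff 5.0
         MaxL 3
         MaxN 3
         NumBlocks 3
         NumNeurons 64
         ThreebodyCutoff 4.0
      End
   End
End
```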
M3GNet produces the parameter directory <calculation name>.results/optimization/m3gnet/results/model, which contains the parametrized model and can be used with the MLPotential engine. Set Backend M3GNet and ParameterDir to the path of this directory.
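For example, assuming a training job named my_training (a hypothetical name), the engine block for a production simulation might look like this:

```
Engine MLPotential
   Backend M3GNet
   ParameterDir my_training.results/optimization/m3gnet/results/model
EndEngine
```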
The M3GNet universal potential has the following architecture/structure:
| Layer (type) | Param # |
|---|---|
| radius_cutoff_graph_converter (RadiusCutoffGraphConverter) | 0 (unused) |
| graph_featurizer (GraphFeaturizer) | 6080 |
| graph_update_func (GraphUpdateFunc) | 192 |
| spherical_bessel_with_harmonics (SphericalBesselWithHarmonics) | 0 |
| three_d_interaction (ThreeDInteraction) | 1737 |
| three_d_interaction_1 (ThreeDInteraction) | 1737 |
| three_d_interaction_2 (ThreeDInteraction) | 1737 |
| graph_network_layer (GraphNetworkLayer) | 66432 |
| graph_network_layer_1 (GraphNetworkLayer) | 66432 |
| graph_network_layer_2 (GraphNetworkLayer) | 66432 |
| pipe_24 (Pipe) | 16770 |
| atom_ref_2 (AtomRef) | 0 |

Total params: 227,549
6.6.3.3. NequIP
Important
Training NequIP potentials with ParAMS is not a fully supported feature. To use NequIP with AMS, or to train NequIP with ParAMS, you need to manually install it into the AMS Python environment.
SCM does not provide any packages for NequIP and cannot provide support for the installation. However, we have compiled some tips in the Engine ASE documentation that may help you with the installation.
The options for NequIP are:
MachineLearning
- Type:
Block
- Description:
Options for Task MachineLearning.
NequIP
- Type:
Block
- Description:
Options for NequIP fitting.
Custom
- Type:
Block
- Description:
Specify a custom NequIP model.
LMax
- Type:
Integer
- Default value:
1
- Description:
Maximum L value. 1 is probably high enough.
MetricsKey
- Type:
Multiple Choice
- Default value:
validation_loss
- Options:
[training_loss, validation_loss]
- Description:
Which metric to use to generate the ‘best’ model.
NumLayers
- Type:
Integer
- Default value:
4
- Description:
Number of interaction layers in the NequIP neural network.
RMax
- Type:
Float
- Default value:
3.5
- Unit:
angstrom
- GUI name:
Distance cutoff
- Description:
Distance cutoff for interactions.
LearningRate
- Type:
Float
- Default value:
0.005
- Description:
Learning rate for the NequIP weight optimization.
Model
- Type:
Multiple Choice
- Default value:
Custom
- Options:
[Custom, ModelFile]
- Description:
How to specify the model for the NequIP backend. Either a Custom model can be made from scratch or an existing ‘model.pth’ file can be loaded to obtain the model settings.
ModelFile
- Type:
String
- Description:
Path to the model.pth file defining the model.
UseRescalingFromLoadedModel
- Type:
Bool
- Default value:
Yes
- Description:
When loading a model with LoadModel or NequIP%ModelFile, do not recalculate the dataset rescaling but use the value from the loaded model.
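A sketch of a custom NequIP training input using the keys documented above (the values shown are the defaults):

```
Task MachineLearning

MachineLearning
   Backend NequIP
   NequIP
      Model Custom
      LearningRate 0.005
      Custom
         LMax 1
         NumLayers 4
         RMax 3.5
      End
   End
End
```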
NequIP produces the file <calculation name>.results/optimization/nequip/results/model.pth, which contains the deployed model and can be used with the MLPotential engine. Set Backend NequIP and ParameterFile to the path of this file.
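Analogously to M3GNet, a production engine block might look like this (my_training is a hypothetical job name):

```
Engine MLPotential
   Backend NequIP
   ParameterFile my_training.results/optimization/nequip/results/model.pth
EndEngine
```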
6.6.4. ML Parallelization
Parallelization options can be set with ParallelLevels
.
Note that Task MachineLearning does not perform AMS jobs during optimization, so the parallelization options are different.
Select the maximum number of committee members to train in parallel with CommitteeMembers, or set it to zero to run all committee members in parallel (up to the number of available cores or the NSCM environment variable). Select the number of cores each committee member may use with Cores, or set it to zero (the default) to distribute the available cores evenly over the committee members running in parallel.
Some backends may spawn additional threads for database management, but these should not use substantial CPU time. GPU offloading is supported through TensorFlow or PyTorch, depending on the backend. Currently there are no settings in ParAMS for GPU offloading; the backends use GPU resources as described in their own documentation.
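For example, training a committee of four members two at a time, with the available cores split evenly between the two running members, could be requested as in this sketch (assuming ParallelLevels is given as its own block, as documented below):

```
MachineLearning
   Backend M3GNet
   CommitteeSize 4
End

ParallelLevels
   CommitteeMembers 2
   Cores 0
End
```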
ParallelLevels
- Type:
Block
- GUI name:
Parallelization distribution:
- Description:
Distribution of threads/processes between the parallelization levels.
CommitteeMembers
- Type:
Integer
- Default value:
1
- GUI name:
Number of parallel committee members
- Description:
Maximum number of committee member optimizations to run in parallel. If set to zero, the minimum of MachineLearning%CommitteeSize and the number of available cores (NSCM) is used.
Cores
- Type:
Integer
- Default value:
0
- GUI name:
Processes (per Job)
- Description:
Number of cores to use per committee member optimization. By default (0), the available cores (NSCM) are divided equally among the committee members running in parallel. When using GPU offloading, consider setting this to 1.