6.6. Task: MachineLearning

Set Task MachineLearning to fit machine learning potentials (ML potentials). In ParAMS, all supported types of ML potentials can be trained as committee models that provide an estimate of the uncertainty of predicted energies and forces during production simulations.

Training ML potentials through ParAMS requires:

  • the job collection, and

  • training and validation sets

You can construct these using the results importers, just as for ReaxFF and DFTB parametrization.

Note

Unlike ReaxFF and DFTB parametrization, no Parameter Interface is needed. This is because ML potentials usually contain many thousands of parameters. It is typically not useful to manually control the values and ranges for all of those parameters.

You also need to specify which Backend to use, for example M3GNet.
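
For example, a minimal ParAMS input for Task MachineLearning only selects the task and the backend (a sketch; the job collection and data set files accompany the input as for other ParAMS tasks):

  Task MachineLearning

  MachineLearning
    Backend M3GNet
  End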

6.6.1. Requirements for job collection and data sets

Machine learning potentials are trained quite differently from how ParAMS trains ReaxFF and DFTB.

6.6.1.1. Only singlepoint calculations in the job collection

For ML potentials, only singlepoint calculations may enter the job collection. The original reference job can still be of any type (geometry optimization, PES Scan, …).

Example: if you import a DFT-calculated bond scan (PES Scan), you must import it using the “Add PESScan Singlepoints” option, not “add singlejob with Task=’PESScan’”.

Any jobs in the job collection with Task different from “SinglePoint” will be ignored.

6.6.1.2. Only single extractors in the training and validation sets

Similarly, for the training and validation sets, the expressions can only contain one extractor acting on a single job. This means that you cannot train reaction energies. Instead, you can (and should) train the total energy. As a result, it is especially important that all reference data were calculated at the same level of theory.

When training forces, you must extract all force components from the job. However, depending on the backend, you may be able to set the force weights.

For task MachineLearning, only a small set of extractors (that act on singlepoint jobs) are supported:

  • energy

  • forces

Examples:

Expression                       Task Optimization   Task MachineLearning
energy("job")                    OK                  OK
forces("job")                    OK                  OK
energy("job1")-energy("job2")    OK                  Not OK
forces("job", 3, 2)              OK                  Not OK
cell_volume("job")               OK                  Not OK

Expressions that do not follow the above requirements will be ignored during the ML training, but they will still be stored on disk. This means that if, after training your ML potential, you switch to the ParAMS SinglePoint task, you can use any expressions and job tasks to test, validate, or benchmark your trained potential.

6.6.1.3. The engine settings must be the same for all jobs

When you train, for example, DFTB, you can have different engine settings for different jobs: you might want the k-space sampling to be different depending on the system.

However, when training machine learning potentials, you cannot set any job-dependent (structure-dependent) engine settings. Every job (structure) will use the same settings.

6.6.2. Machine Learning Input Structure

The input for the ParAMS Task MachineLearning is structured as follows:

MachineLearning has multiple backends that can be selected through the Backend key. Each backend has a corresponding block (with the same name as the value of the Backend key) with settings specific to that backend. Additionally, there are several shared keywords, such as MaxEpochs, that modify the behaviour of all backends in the same way.

A backend may support multiple models. Each model has a corresponding block (with the same name as the value of the Model key) containing settings specific to that model, for example the number of layers or how to initialize the parameters.

Some models consist of only a single key rather than a block, for example when a backend supports loading a file that contains the model settings and parameters.

Settings that apply to all models of a backend live at the top level of the backend block.

The MachineLearning%LoadModel key loads a previously fitted model from a ParAMS results directory. The ParAMS results directory must contain the two subdirectories optimization and settings_and_initial_data. Enabling MachineLearning%LoadModel enforces the same Backend and CommitteeSize as in the previous job and ignores the model keys; they are instead read from the previous ParAMS calculation. Any settings in the model blocks are ignored. If any settings in the backend blocks are incompatible with the loaded model, ParAMS will crash or behave in an undefined way.

The exact same backend and model settings are used for every committee member, regardless of the CommitteeSize, although the resulting models can still differ due to stochastic effects (e.g. random parameter initialization or a stochastic optimization algorithm). When using LoadModel, the committee from the previous calculation is used.

Set RunAMSAtEnd to run the job collection with the newly trained model once training is completed. This provides additional output, such as scatter plots of predicted against reference values.
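
Putting the shared keywords together, a sketch of a possible MachineLearning block (all values illustrative):

  Task MachineLearning

  MachineLearning
    Backend M3GNet
    MaxEpochs 500
    CommitteeSize 4
    RunAMSAtEnd Yes
  End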

Tip

Learn how to use the ParAMS input for Task MachineLearning from the tutorials.

MachineLearning
Type:

Block

Description:

Options for Task MachineLearning.

Backend
Type:

Multiple Choice

Default value:

M3GNet

Options:

[M3GNet, NequIP]

Description:

The backend to use. You must separately install the backend before running a training job.

MaxEpochs
Type:

Integer

Default value:

1000

Description:

Set the maximum number of epochs a backend should perform.

LossCoeffs
Type:

Block

Description:

Modify the coefficients for the machine learning loss function. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

AverageForcePerAtom
Type:

Bool

Default value:

No

Description:

For each force data entry, divide the loss contribution by the number of constituent atoms. This is the same as the behavior for ParAMS Optimization, but it is turned off by default in Task MachineLearning. For machine learning, setting this to ‘No’ can be better since larger molecules will then contribute more to the loss. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

Energy
Type:

Float

Default value:

10.0

GUI name:

Energy coefficient

Description:

Coefficient for the contribution of loss due to the energy. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

Forces
Type:

Float

Default value:

1.0

GUI name:

Forces coefficient

Description:

Coefficient for the contribution of loss due to the forces. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

Target
Type:

Block

Description:

Target values for stopping training. If both the training and validation metrics are smaller than the specified values, the training will stop early. Only supported by the M3GNet backend.

Forces
Type:

Block

Description:

Forces (as reported by the backend)

Enabled
Type:

Bool

Default value:

Yes

Description:

Whether to use target values for forces.

MAE
Type:

Float

Default value:

0.05

Unit:

eV/angstrom

Description:

MAE for forces (as reported by the backend).

LoadModel
Type:

String

Description:

Load a previously fitted model from a ParAMS results directory. A ParAMS results directory should contain two subdirectories optimization and settings_and_initial_data. This option ignores all settings inside model blocks.

CommitteeSize
Type:

Integer

Default value:

1

Description:

The number of independently trained ML potentials.

RunAMSAtEnd
Type:

Bool

Default value:

Yes

GUI name:

Run AMS at end

Description:

Whether to run the (committee) ML potential through AMS at the end. This will create the energy/forces scatter plots for the final trained model.
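
To illustrate the keys above, a sketch of a MachineLearning block that raises the energy coefficient and sets an early-stopping target for the forces (values illustrative; Target is only supported by the M3GNet backend):

  MachineLearning
    Backend M3GNet
    LossCoeffs
      Energy 20.0
      Forces 1.0
    End
    Target
      Forces
        Enabled Yes
        MAE 0.03
      End
    End
  End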

6.6.3. Backends: M3GNet, NequIP, …

6.6.3.1. Installation

The ML backends are not included by default with AMS or ParAMS, as they can be quite large. Before you can train an ML potential, you need to install the corresponding backend either through the AMS package manager or manually.

Tip

Before training a custom model with ParAMS, we recommend that you first test the ML backend in a production (for example, molecular dynamics or geometry optimization) simulation with some already created parameters. For example, follow the M3GNet GUI tutorial to make sure that the M3GNet backend has been installed correctly.

6.6.3.2. M3GNet

MachineLearning
Type:

Block

Description:

Options for Task MachineLearning.

M3GNet
Type:

Block

Description:

Options for M3GNet fitting.

Custom
Type:

Block

Description:

Specify a custom M3GNet model.

Cutoff
Type:

Float

Default value:

5.0

Unit:

angstrom

Description:

Cutoff radius of the graph

MaxL
Type:

Integer

Default value:

3

Description:

Include spherical components up to order MaxL. Higher gives a better angular resolution, but increases computational cost substantially.

MaxN
Type:

Integer

Default value:

3

Description:

Include radial components up to the MaxN’th root of the spherical Bessel function. Higher gives a better radial resolution, but increases computational cost substantially.

NumBlocks
Type:

Integer

Default value:

3

GUI name:

Number of convolution blocks:

Description:

Number of convolution blocks.

NumNeurons
Type:

Integer

Default value:

64

GUI name:

Number of neurons per layer

Description:

Number of neurons in each layer.

ThreebodyCutoff
Type:

Float

Default value:

4.0

Unit:

angstrom

Description:

Cutoff radius of the three-body interaction.

LearningRate
Type:

Float

Default value:

0.001

Description:

Learning rate for the M3GNet weight optimization.

Model
Type:

Multiple Choice

Default value:

UniversalPotential

Options:

[UniversalPotential, Custom, ModelDir]

Description:

How to specify the model for the M3GNet backend. Either a Custom model can be made from scratch or an existing model directory can be loaded to obtain the model settings.

ModelDir
Type:

String

Description:

Path to the directory defining the model. This folder should contain the files ‘checkpoint’, ‘m3gnet.data-00000-of-00001’, ‘m3gnet.index’ and ‘m3gnet.json’.

UniversalPotential
Type:

Block

Description:

Settings for (transfer) learning with the M3GNet Universal Potential.

Featurizer
Type:

Bool

Default value:

No

GUI name:

Train featurizer

Description:

Train the Featurizer layer of the M3GNet universal potential.

Final
Type:

Bool

Default value:

Yes

GUI name:

Train final layer

Description:

Train the Final layer of the M3GNet universal potential.

GraphLayer1
Type:

Bool

Default value:

No

GUI name:

Train layer 1 - graph

Description:

Train the first Graph layer of the M3GNet universal potential.

GraphLayer2
Type:

Bool

Default value:

No

GUI name:

Train layer 2 - graph

Description:

Train the second Graph layer of the M3GNet universal potential.

GraphLayer3
Type:

Bool

Default value:

Yes

GUI name:

Train layer 3 - graph

Description:

Train the third Graph layer of the M3GNet universal potential.

ThreeDInteractions1
Type:

Bool

Default value:

No

GUI name:

Train layer 1 - 3D interactions

Description:

Train the first ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.

ThreeDInteractions2
Type:

Bool

Default value:

No

GUI name:

Train layer 2 - 3D interactions

Description:

Train the second ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.

ThreeDInteractions3
Type:

Bool

Default value:

Yes

GUI name:

Train layer 3 - 3D interactions

Description:

Train the third ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
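
For example, a sketch of an M3GNet block for transfer learning from the universal potential, explicitly spelling out the layers that are trained by default:

  MachineLearning
    Backend M3GNet
    M3GNet
      Model UniversalPotential
      UniversalPotential
        Final Yes
        GraphLayer3 Yes
        ThreeDInteractions3 Yes
      End
    End
  End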

M3GNet produces the parameter directory <calculation name>.results/optimization/m3gnet/results/model, which contains the parametrized model and can be used with the MLPotential engine: set Backend M3GNet and ParameterDir to the path of the model directory.
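
For example, in the AMS input of a subsequent production job (a sketch; substitute the path to your own results directory):

  Engine MLPotential
    Backend M3GNet
    ParameterDir /path/to/<calculation name>.results/optimization/m3gnet/results/model
  EndEngine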

The M3GNet universal potential has the following architecture/structure:

Layer (type)                                                    Param #
radius_cutoff_graph_converter (RadiusCutoffGraphConverter)      0 (unused)
graph_featurizer (GraphFeaturizer)                              6080
graph_update_func (GraphUpdateFunc)                             192
spherical_bessel_with_harmonics (SphericalBesselWithHarmonics)  0
three_d_interaction (ThreeDInteraction)                         1737
three_d_interaction_1 (ThreeDInteraction)                       1737
three_d_interaction_2 (ThreeDInteraction)                       1737
graph_network_layer (GraphNetworkLayer)                         66432
graph_network_layer_1 (GraphNetworkLayer)                       66432
graph_network_layer_2 (GraphNetworkLayer)                       66432
pipe_24 (Pipe)                                                  16770
atom_ref_2 (AtomRef)                                            0

Total params: 227,549

6.6.3.3. NequIP

Important

Training NequIP potentials with ParAMS is not a fully supported feature. To use NequIP with AMS, or to train NequIP with ParAMS, you need to manually install it into the AMS Python environment.

SCM does not provide any packages for NequIP and cannot provide support for the installation, but we have compiled some helpful tips in the Engine ASE documentation that may help you with it.

The options for NequIP are:

MachineLearning
Type:

Block

Description:

Options for Task MachineLearning.

NequIP
Type:

Block

Description:

Options for NequIP fitting.

Custom
Type:

Block

Description:

Specify a custom NequIP model.

LMax
Type:

Integer

Default value:

1

Description:

Maximum L value. 1 is probably high enough.

MetricsKey
Type:

Multiple Choice

Default value:

validation_loss

Options:

[training_loss, validation_loss]

Description:

Which metric to use to generate the ‘best’ model.

NumLayers
Type:

Integer

Default value:

4

Description:

Number of interaction layers in the NequIP neural network.

RMax
Type:

Float

Default value:

3.5

Unit:

angstrom

GUI name:

Distance cutoff

Description:

Distance cutoff for interactions.

LearningRate
Type:

Float

Default value:

0.005

Description:

Learning rate for the NequIP weight optimization.

Model
Type:

Multiple Choice

Default value:

Custom

Options:

[Custom, ModelFile]

Description:

How to specify the model for the NequIP backend. Either a Custom model can be made from scratch or an existing ‘model.pth’ file can be loaded to obtain the model settings.

ModelFile
Type:

String

Description:

Path to the model.pth file defining the model.

UseRescalingFromLoadedModel
Type:

Bool

Default value:

Yes

Description:

When loading a model with LoadModel or NequIP%ModelFile, do not recalculate the dataset rescaling but use the value from the loaded model.
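
A sketch of a custom NequIP model using the keys above (the values shown are the documented defaults):

  MachineLearning
    Backend NequIP
    NequIP
      Model Custom
      Custom
        LMax 1
        NumLayers 4
        RMax 3.5
      End
      LearningRate 0.005
    End
  End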

NequIP produces the file <calculation name>.results/optimization/nequip/results/model.pth, which contains the deployed model and can be used with the MLPotential engine: set Backend NequIP and ParameterFile to the path of the deployed model.
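
For example (a sketch; substitute the path to your own results directory):

  Engine MLPotential
    Backend NequIP
    ParameterFile /path/to/<calculation name>.results/optimization/nequip/results/model.pth
  EndEngine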

6.6.4. ML Parallelization

Parallelization options can be set with ParallelLevels. Note that Task MachineLearning does not perform AMS jobs during optimization, so the parallelization options are different.

Select the maximum number of parallel committee members with CommitteeMembers, or set it to zero to run all committee members in parallel (up to the maximum number of cores or the NSCM environment variable). Select the number of cores each committee member is allowed to use with Cores, or set it to zero (default) to evenly distribute the available cores over the committee members running in parallel.

Some backends may spawn additional threads for database management, but these should not use substantial CPU time. GPU offloading is supported through TensorFlow or PyTorch, depending on the backend. Currently there are no settings in ParAMS for GPU offloading; the backends use GPU resources according to their own documentation.
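
For example, to run all committee members in parallel with one core each (a sketch; this can be a reasonable starting point when offloading to a GPU):

  ParallelLevels
    CommitteeMembers 0
    Cores 1
  End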

ParallelLevels
Type:

Block

GUI name:

Parallelization distribution:

Description:

Distribution of threads/processes between the parallelization levels.

CommitteeMembers
Type:

Integer

Default value:

1

GUI name:

Number of parallel committee members

Description:

Maximum number of committee member optimizations to run in parallel. If set to zero, it will take the minimum of MachineLearning%CommitteeSize and the number of available cores (NSCM).

Cores
Type:

Integer

Default value:

0

GUI name:

Processes (per Job)

Description:

Number of cores to use per committee member optimization. By default (0), the available cores (NSCM) are divided equally among the committee members running in parallel. When using GPU offloading, consider setting this to 1.