6.6. Task: MachineLearning

Set Task MachineLearning to fit machine learning potentials (ML potentials). In ParAMS, all supported types of ML potentials can be trained as committee models that provide an estimate of the uncertainty of predicted energies and forces during production simulations.

Training ML potentials through ParAMS requires

  • the job collection, and

  • training and validation sets

You can construct these using the results importers, just as for ReaxFF and DFTB parametrization.

Note

Unlike ReaxFF and DFTB parametrization, no Parameter Interface is needed. This is because ML potentials usually contain many thousands of parameters. It is typically not useful to manually control the values and ranges for all of those parameters.

You also need to specify which Backend to use, for example M3GNet.
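A minimal input could then look like the following sketch (it assumes the job collection and the training and validation sets have already been created with the results importers and are picked up from their default locations):

   Task MachineLearning

   MachineLearning
      Backend M3GNet
   End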

6.6.1. Requirements for job collection and data sets

Machine learning potentials are trained quite differently from how ParAMS trains ReaxFF and DFTB. This places some extra requirements on the job collection and the data sets, described below.

6.6.1.1. Only singlepoint calculations in the job collection

For ML potentials, only singlepoint calculations may enter the job collection. The original reference job can still be of any type (geometry optimization, PES Scan, …).

Example: if you import a DFT-calculated bond scan (PES Scan), you must import it using the "Add PESScan Singlepoints" option, not "Add SingleJob with Task=PESScan".

Any jobs in the job collection with Task different from “SinglePoint” will be ignored.

6.6.1.2. Only single extractors in the training and validation sets

Similarly, the expressions in the training and validation sets may only contain a single extractor acting on a single job. This means that you cannot train reaction energies; instead, you can (and should) train the total energy. It is therefore especially important that all reference data is calculated at the same level of theory.

When training forces, you must extract all force components from the job. However, depending on the backend, you may be able to set the force weights.

For Task MachineLearning, only a small set of extractors (that act on singlepoint jobs) is supported:

  • energy

  • forces

Examples:

Expression                      Task Optimization   Task MachineLearning
energy("job")                   OK                  OK
forces("job")                   OK                  OK
energy("job1")-energy("job2")   OK                  Not OK
forces("job", 3, 2)             OK                  Not OK
cell_volume("job")              OK                  Not OK

Expressions that do not follow the above requirements are ignored during ML training, but they are still stored on disk. This means that if you switch to the ParAMS SinglePoint task after training your ML potential, you can use any expressions and job tasks to test, validate, or benchmark the trained potential.

6.6.1.3. The engine settings must be the same for all jobs

When you train, for example, DFTB, you can have different engine settings for different jobs: you might want the k-space sampling to differ depending on the system.

However, when training machine learning potentials, you cannot set any job-dependent (structure-dependent) engine settings. Every job (structure) will use the same settings.

6.6.2. Machine Learning Input Structure

The input for the ParAMS Task MachineLearning is structured as follows:

MachineLearning has multiple backends that can be selected through the Backend key. Each backend has a corresponding block (with the same name as the value of the Backend key) containing settings specific to that backend. Additionally, there are several shared keywords, such as MaxEpochs, that modify the behaviour of all backends in the same way.

A backend may support multiple models. Each model has a corresponding block (with the same name as the value of the Model key) containing settings specific to that model, for example the number of layers or how to initialize the parameters.

Some models consist of only a single key rather than a block, for example when a backend supports loading a file that contains model settings and parameters.

The top level of a backend block may also contain settings that apply to all models.
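Schematically, this nesting looks as follows for the M3GNet backend (a sketch built only from keys documented later in this section; the values shown are the defaults):

   MachineLearning
      Backend M3GNet
      MaxEpochs 1000
      M3GNet
         LearningRate 0.001
         Model Custom
         Custom
            Cutoff 5.0
            NumNeurons 64
         End
      End
   End

Here Backend selects the backend, MaxEpochs is a shared keyword, LearningRate is a backend-level setting that applies to all models, and the Custom block holds the model-specific settings.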

The MachineLearning%LoadModel key loads a previously fitted model from a ParAMS results directory, which must contain the two subdirectories optimization and settings_and_initial_data. Enabling MachineLearning%LoadModel enforces the same Backend and CommitteeSize as in the previous job and ignores the model keys; instead, they are read from the previous ParAMS calculation. Any settings in the model blocks are ignored. If any settings in the backend blocks are incompatible with the loaded model, ParAMS will crash or behave in an undefined way.

The exact same backend and model settings are used for every committee member, regardless of the CommitteeSize, although the trained models can still differ due to stochastic effects (e.g. random initial parameters or a stochastic optimization algorithm). When using LoadModel, the committee from the previous calculation is used.
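Continuing the training of a previously fitted committee could then look like this sketch (the path is illustrative; it must point to a ParAMS results directory containing the optimization and settings_and_initial_data subdirectories):

   Task MachineLearning

   MachineLearning
      LoadModel ../old_training_job/results
      MaxEpochs 500
   End

Note that Backend, CommitteeSize and the model settings are taken from the loaded calculation, so they are not specified here.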

Tip

Learn how to use the ParAMS input for Task MachineLearning from the tutorials.

MachineLearning
   Type: Block
   Description: Options for Task MachineLearning.

   Backend
      Type: Multiple Choice
      Default value: M3GNet
      Options: [M3GNet, NequIP]
      Description: The backend to use. You must separately install the backend before running a training job.

   MaxEpochs
      Type: Integer
      Default value: 1000
      Description: Set the maximum number of epochs a backend should perform.

   LossCoeffs
      Type: Block
      Description: Modify the coefficients for the machine learning loss function. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

      AverageForcePerAtom
         Type: Bool
         Default value: No
         Description: For each force data entry, divide the loss contribution by the number of atoms. This is the same as the behavior for ParAMS Optimization, but it is turned off by default in Task MachineLearning. For machine learning, setting this to No can be better, since larger molecules will then contribute more to the loss. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

      Energy
         Type: Float
         Default value: 10.0
         GUI name: Energy coefficient
         Description: Coefficient for the contribution to the loss due to the energy. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

      Forces
         Type: Float
         Default value: 1.0
         GUI name: Forces coefficient
         Description: Coefficient for the contribution to the loss due to the forces. For backends that support weights, this is on top of the supplied dataset weights and sigmas.

   Target
      Type: Block
      Description: Target values for stopping training. If both the training and validation metrics are smaller than the specified values, the training will stop early. Only supported by the M3GNet backend.

      Forces
         Type: Block
         Description: Forces (as reported by the backend).

         Enabled
            Type: Bool
            Default value: Yes
            Description: Whether to use target values for forces.

         MAE
            Type: Float
            Default value: 0.05
            Unit: eV/angstrom
            Description: MAE for forces (as reported by the backend).

   LoadModel
      Type: String
      Description: Load a previously fitted model from a ParAMS results directory. A ParAMS results directory should contain the two subdirectories optimization and settings_and_initial_data. This option ignores all settings inside model blocks.

   CommitteeSize
      Type: Integer
      Default value: 1
      Description: The number of independently trained ML potentials.
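Combining the shared keywords above, a training run with explicit loss coefficients, an early-stopping target, and a committee of four models could be set up as in the following sketch (all keys are documented above; the values are illustrative):

   MachineLearning
      Backend M3GNet
      MaxEpochs 2000
      CommitteeSize 4
      LossCoeffs
         Energy 10.0
         Forces 1.0
      End
      Target
         Forces
            Enabled Yes
            MAE 0.05
         End
      End
   End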

6.6.3. Backends: M3GNet, NequIP, …

6.6.3.1. Installation

The ML backends are not included by default with AMS or ParAMS, as they can be quite large. Before you can train an ML potential, you need to install the corresponding backend either through the AMS package manager or manually.

Tip

Before training a custom model with ParAMS, we recommend that you first test the ML backend in a production simulation (for example, molecular dynamics or geometry optimization) with already existing parameters. For example, follow the M3GNet GUI tutorial to make sure that the M3GNet backend has been installed correctly.

6.6.3.2. M3GNet

MachineLearning
   Type: Block
   Description: Options for Task MachineLearning.

   M3GNet
      Type: Block
      Description: Options for M3GNet fitting.

      Custom
         Type: Block
         Description: Specify a custom M3GNet model.

         Cutoff
            Type: Float
            Default value: 5.0
            Unit: angstrom
            Description: Cutoff radius of the graph.

         MaxL
            Type: Integer
            Default value: 3
            Description: Include spherical components up to order MaxL. Higher gives a better angular resolution, but increases computational cost substantially.

         MaxN
            Type: Integer
            Default value: 3
            Description: Include radial components up to the MaxN'th root of the spherical Bessel function. Higher gives a better radial resolution, but increases computational cost substantially.

         NumBlocks
            Type: Integer
            Default value: 3
            GUI name: Number of convolution blocks
            Description: Number of convolution blocks.

         NumNeurons
            Type: Integer
            Default value: 64
            GUI name: Number of neurons per layer
            Description: Number of neurons in each layer.

         ThreebodyCutoff
            Type: Float
            Default value: 4.0
            Unit: angstrom
            Description: Cutoff radius of the three-body interaction.

      LearningRate
         Type: Float
         Default value: 0.001
         Description: Learning rate for the M3GNet weight optimization.

      Model
         Type: Multiple Choice
         Default value: UniversalPotential
         Options: [UniversalPotential, Custom, ModelDir]
         Description: How to specify the model for the M3GNet backend. Either a Custom model can be made from scratch, or an existing model directory can be loaded to obtain the model settings.

      ModelDir
         Type: String
         Description: Path to the directory defining the model. This folder should contain the files 'checkpoint', 'm3gnet.data-00000-of-00001', 'm3gnet.index' and 'm3gnet.json'.

      UniversalPotential
         Type: Block
         Description: Settings for (transfer) learning with the M3GNet Universal Potential.

         Featurizer
            Type: Bool
            Default value: No
            GUI name: Train featurizer
            Description: Train the Featurizer layer of the M3GNet universal potential.

         Final
            Type: Bool
            Default value: Yes
            GUI name: Train final layer
            Description: Train the Final layer of the M3GNet universal potential.

         GraphLayer1
            Type: Bool
            Default value: No
            GUI name: Train layer 1 - graph
            Description: Train the first Graph layer of the M3GNet universal potential.

         GraphLayer2
            Type: Bool
            Default value: No
            GUI name: Train layer 2 - graph
            Description: Train the second Graph layer of the M3GNet universal potential.

         GraphLayer3
            Type: Bool
            Default value: Yes
            GUI name: Train layer 3 - graph
            Description: Train the third Graph layer of the M3GNet universal potential.

         ThreeDInteractions1
            Type: Bool
            Default value: No
            GUI name: Train layer 1 - 3D interactions
            Description: Train the first ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.

         ThreeDInteractions2
            Type: Bool
            Default value: No
            GUI name: Train layer 2 - 3D interactions
            Description: Train the second ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.

         ThreeDInteractions3
            Type: Bool
            Default value: Yes
            GUI name: Train layer 3 - 3D interactions
            Description: Train the third ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
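For example, transfer learning from the universal potential, explicitly spelling out the default choice of trainable layers (only the Final, third Graph and third ThreeDInteractions layers), could look like this sketch:

   MachineLearning
      Backend M3GNet
      M3GNet
         Model UniversalPotential
         UniversalPotential
            Final Yes
            GraphLayer3 Yes
            ThreeDInteractions3 Yes
         End
      End
   End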

M3GNet produces the parameter directory <calculation name>.results/optimization/m3gnet/results/model, which contains the parametrized model and can be used with the MLPotential engine. Set Backend M3GNet and ParameterDir to the path of the model directory.
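A production engine block could then look like the following sketch (the results path is illustrative):

   Engine MLPotential
      Backend M3GNet
      ParameterDir training_job.results/optimization/m3gnet/results/model
   EndEngine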

The M3GNet universal potential has the following architecture/structure:

Layer (type)                                                     Param #
radius_cutoff_graph_converter (RadiusCutoffGraphConverter)       0 (unused)
graph_featurizer (GraphFeaturizer)                               6080
graph_update_func (GraphUpdateFunc)                              192
spherical_bessel_with_harmonics (SphericalBesselWithHarmonics)   0
three_d_interaction (ThreeDInteraction)                          1737
three_d_interaction_1 (ThreeDInteraction)                        1737
three_d_interaction_2 (ThreeDInteraction)                        1737
graph_network_layer (GraphNetworkLayer)                          66432
graph_network_layer_1 (GraphNetworkLayer)                        66432
graph_network_layer_2 (GraphNetworkLayer)                        66432
pipe_24 (Pipe)                                                   16770
atom_ref_2 (AtomRef)                                             0

Total params: 227,549

6.6.3.3. NequIP

Important

Training NequIP potentials with ParAMS is not a fully supported feature. To use NequIP with AMS, or to train NequIP with ParAMS, you need to manually install it into the AMS Python environment.

SCM does not provide any packages for NequIP and cannot provide support for the installation, but we have compiled some helpful tips in the Engine ASE documentation that may help you.

The options for NequIP are:

MachineLearning
   Type: Block
   Description: Options for Task MachineLearning.

   NequIP
      Type: Block
      Description: Options for NequIP fitting.

      Custom
         Type: Block
         Description: Specify a custom NequIP model.

         LMax
            Type: Integer
            Default value: 1
            Description: Maximum L value. 1 is probably high enough.

         MetricsKey
            Type: Multiple Choice
            Default value: validation_loss
            Options: [training_loss, validation_loss]
            Description: Which metric to use to generate the 'best' model.

         NumLayers
            Type: Integer
            Default value: 4
            Description: Number of interaction layers in the NequIP neural network.

         RMax
            Type: Float
            Default value: 3.5
            Unit: angstrom
            GUI name: Distance cutoff
            Description: Distance cutoff for interactions.

      LearningRate
         Type: Float
         Default value: 0.005
         Description: Learning rate for the NequIP weight optimization.

      Model
         Type: Multiple Choice
         Default value: Custom
         Options: [Custom, ModelFile]
         Description: How to specify the model for the NequIP backend. Either a Custom model can be made from scratch, or an existing 'model.pth' file can be loaded to obtain the model settings.

      ModelFile
         Type: String
         Description: Path to the model.pth file defining the model.

      UseRescalingFromLoadedModel
         Type: Bool
         Default value: Yes
         Description: When loading a model with LoadModel or NequIP%ModelFile, do not recalculate the dataset rescaling but use the value from the loaded model.
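Fitting a custom NequIP model from scratch could look like the following sketch (all keys are documented above; the values shown are the defaults):

   MachineLearning
      Backend NequIP
      NequIP
         Model Custom
         LearningRate 0.005
         Custom
            LMax 1
            NumLayers 4
            RMax 3.5
         End
      End
   End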

NequIP produces the file <calculation name>.results/optimization/nequip/results/model.pth, which contains the deployed model and can be used with the MLPotential engine. Set Backend NequIP and ParameterFile to the path of the deployed model.
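Analogously to M3GNet, a production engine block could look like this sketch (the path is illustrative):

   Engine MLPotential
      Backend NequIP
      ParameterFile training_job.results/optimization/nequip/results/model.pth
   EndEngine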

6.6.4. ML Parallelization

Parallelization options can be set with ParallelLevels. Note that Task MachineLearning does not run AMS jobs during the optimization, so the parallelization options differ from those for Task Optimization.

Select the maximum number of committee members trained in parallel with CommitteeMembers, or set it to zero to run all committee members in parallel (up to the number of available cores or the NSCM environment variable). Select the number of cores each committee member is allowed to use with Cores, or set it to zero (the default) to distribute the available cores evenly over the committee members running in parallel.

Some backends may spawn additional threads for database management, but these should not use substantial CPU time. GPU offloading is supported through TensorFlow or PyTorch, depending on the backend. Currently there are no settings in ParAMS for GPU offloading; the backends use GPU resources according to their own documentation.

ParallelLevels
   Type: Block
   GUI name: Parallelization distribution
   Description: Distribution of threads/processes between the parallelization levels.

   CommitteeMembers
      Type: Integer
      Default value: 1
      GUI name: Number of parallel committee members
      Description: Maximum number of committee member optimizations to run in parallel. If set to zero, the minimum of MachineLearning%CommitteeSize and the number of available cores (NSCM) is used.

   Cores
      Type: Integer
      Default value: 0
      GUI name: Processes (per Job)
      Description: Number of cores to use per committee member optimization. By default (0), the available cores (NSCM) are divided equally among the committee members running in parallel. When using GPU offloading, consider setting this to 1.
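For example, training a committee of four models on a 16-core machine, running two members at a time with eight cores each, could be set up as in the following sketch (ParallelLevels is given as its own block, following the layout above):

   MachineLearning
      Backend M3GNet
      CommitteeSize 4
   End

   ParallelLevels
      CommitteeMembers 2
      Cores 8
   End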