6.6. Task: MachineLearning
Set Task MachineLearning to fit machine learning (ML) potentials. In ParAMS, all supported types of ML potentials can be trained as committee models, which provide an estimate of the uncertainty of predicted energies and forces during production simulations.
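The committee idea can be sketched in a few lines of Python: the spread between the predictions of independently trained members serves as the uncertainty estimate (a conceptual sketch only; the engine's exact formula may differ):

```python
from statistics import fmean, pstdev

def committee_prediction(member_energies):
    """Combine energy predictions from independently trained committee members.

    The mean is the committee prediction; the population standard deviation
    across members is a simple uncertainty estimate. Sketch only -- not the
    production engine's exact formula.
    """
    return fmean(member_energies), pstdev(member_energies)

# Two members that agree closely -> small uncertainty.
mean, sigma = committee_prediction([-1.02, -0.98])
```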
Training ML potentials through ParAMS requires a job collection and training and validation sets. You can construct these using the results importers, just as for ReaxFF and DFTB parametrization.
Note
Unlike ReaxFF and DFTB parametrization, no Parameter Interface is needed. This is because ML potentials usually contain many thousands of parameters. It is typically not useful to manually control the values and ranges for all of those parameters.
You also need to specify which Backend to use, for example M3GNet.
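A minimal input might look as follows (a sketch using only key names documented on this page):

```
Task MachineLearning

MachineLearning
   Backend M3GNet
End
```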
6.6.1. Requirements for job collection and data sets
Machine learning potentials are trained quite differently from how ParAMS trains ReaxFF and DFTB.
6.6.1.1. Only singlepoint calculations in the job collection
For ML potentials, only singlepoint calculations may enter the job collection. The original reference job can still be of any type (geometry optimization, PES Scan, …).
Example: if you import a DFT-calculated bond scan (PES Scan), you must import it using the “Add PESScan Singlepoints” option, not “add singlejob with Task=’PESScan’”.
Any jobs in the job collection with Task different from “SinglePoint” will be ignored.
6.6.1.2. Only single extractors in the training and validation sets
Similarly, for the training and validation sets, each expression may contain only a single extractor acting on a single job. This means that you cannot train reaction energies; instead, you can (and should) train the total energy. It is therefore especially important that all reference data was calculated at a single level of theory.
When training forces, you must extract all force components from the job. However, depending on the backend, you may be able to set the force weights.
For task MachineLearning, only a small set of extractors (that act on singlepoint jobs) are supported:
energy
forces
Examples:
| Expression | Task Optimization | Task MachineLearning |
|---|---|---|
| energy('job1') | OK | OK |
| forces('job1') | OK | OK |
| energy('job1') - energy('job2') | OK | Not OK |
| 2 * energy('job1') | OK | Not OK |
| angle('job1', 0, 1, 2) | OK | Not OK |
Expressions that do not meet the above requirements are ignored during ML training, but they are still stored on disk. This means that if, after training your ML potential, you switch to the ParAMS SinglePoint task, you can use any expressions and job tasks to test, validate, or benchmark your trained potential.
6.6.1.3. The engine settings must be the same for all jobs
When you train, for example, DFTB, you can use different engine settings for different jobs: you might want the k-space sampling to differ depending on the system. However, when training machine learning potentials, you cannot set any job-dependent (structure-dependent) engine settings. Every job (structure) uses the same settings.
6.6.2. Machine Learning Input Structure
The input for the ParAMS Task MachineLearning is structured as follows:
- MachineLearning has multiple backends, selected through the Backend key.
- Each backend has a corresponding block (with the same name as the value of the Backend key) containing settings specific to that backend.
- In addition, several shared keywords, such as MaxEpochs, modify the behavior of all backends in the same way.
- Each backend may support multiple models and has a corresponding block (with the same name as the value of the Model key) containing settings specific to that model, for example the number of layers or how to initialize the parameters.
- Some models consist of only a single key rather than a block, for example when a backend supports loading a file that contains model settings and parameters.
- Any number of settings that apply to all models may exist at the top level of a backend block.
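Put together, a backend block with a nested model block might look like this (a sketch assembled from the keys documented below; the values shown are illustrative):

```
MachineLearning
   Backend M3GNet
   MaxEpochs 1000
   M3GNet
      LearningRate 0.001
      Model Custom
      Custom
         Cutoff 5.0
         NumNeurons 64
      End
   End
End
```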
The MachineLearning%LoadModel key loads a previously fitted model from a ParAMS results directory. The ParAMS results directory must contain the two subdirectories optimization and settings_and_initial_data. Enabling MachineLearning%LoadModel enforces the same Backend and CommitteeSize as in the previous job and ignores the model keys; instead, they are read from the previous ParAMS calculation. Any settings in the model blocks are ignored. If any settings in the backend blocks are incompatible with the loaded model, ParAMS will crash or behave in an undefined way.
The exact same backend and model settings are used for every committee member regardless of the CommitteeSize, although the resulting models can still differ due to stochastic effects (e.g. random initial parameters or a stochastic optimization algorithm). When using LoadModel, the committee from the previous calculation is used.
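Continuing from a previous ParAMS run could then be specified as follows (the results path is hypothetical; Backend and CommitteeSize must match the previous job):

```
Task MachineLearning

MachineLearning
   Backend M3GNet
   CommitteeSize 1
   LoadModel /path/to/previous_training.results
End
```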
Tip
Learn how to use the ParAMS input for Task MachineLearning from the tutorials.
MachineLearning
- Type
Block
- Description
Options for Task MachineLearning.
Backend
- Type
Multiple Choice
- Default value
M3GNet
- Options
[M3GNet, NequIP]
- Description
The backend to use. You must separately install the backend before running a training job.
MaxEpochs
- Type
Integer
- Default value
1000
- Description
Set the maximum number of epochs a backend should perform.
LossCoeffs
- Type
Block
- Description
Modify the coefficients for the machine learning loss function. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
AverageForcePerAtom
- Type
Bool
- Default value
No
- Description
For each force data entry, divide the loss contribution by the number of atoms in the structure. This is the same as the behavior for the ParAMS Optimization task, but it is turned off by default in Task MachineLearning. For machine learning, setting this to ‘No’ can be better, since larger molecules will then contribute more to the loss. For backends that support weights, this is applied on top of the supplied dataset weights and sigmas.
Energy
- Type
Float
- Default value
10.0
- GUI name
Energy coefficient
- Description
Coefficient for the contribution of loss due to the energy. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
Forces
- Type
Float
- Default value
1.0
- GUI name
Forces coefficient
- Description
Coefficient for the contribution of loss due to the forces. For backends that support weights, this is on top of the supplied dataset weights and sigmas.
Target
- Type
Block
- Description
Target values for stopping training. If both the training and validation metrics are smaller than the specified values, the training will stop early. Only supported by the M3GNet backend.
Forces
- Type
Block
- Description
Forces (as reported by the backend)
Enabled
- Type
Bool
- Default value
Yes
- Description
Whether to use target values for forces.
MAE
- Type
Float
- Default value
0.05
- Unit
eV/angstrom
- Description
MAE for forces (as reported by the backend).
LoadModel
- Type
String
- Description
Load a previously fitted model from a ParAMS results directory. A ParAMS results directory should contain the two subdirectories optimization and settings_and_initial_data. This option ignores all settings inside model blocks.
CommitteeSize
- Type
Integer
- Default value
1
- Description
The number of independently trained ML potentials.
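How the LossCoeffs keys above enter the training objective can be sketched as follows, assuming a plain mean-squared-error loss (each backend defines its own exact form, so this is illustrative only):

```python
from statistics import fmean

def weighted_loss(energy_errors, force_errors, atom_counts,
                  energy_coeff=10.0, forces_coeff=1.0,
                  average_force_per_atom=False):
    """Combine energy and force residuals with the LossCoeffs weights.

    energy_errors: one energy residual per training entry
    force_errors:  per-entry lists of force-component residuals
    atom_counts:   per-entry atom counts, used by AverageForcePerAtom
    Sketch only -- backends define their own exact loss functions.
    """
    energy_term = fmean(e * e for e in energy_errors)
    force_terms = []
    for components, n_atoms in zip(force_errors, atom_counts):
        contribution = fmean(c * c for c in components)
        if average_force_per_atom:
            # Divide this entry's contribution by its atom count,
            # so that large structures do not dominate the loss.
            contribution /= n_atoms
        force_terms.append(contribution)
    return energy_coeff * energy_term + forces_coeff * fmean(force_terms)
```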
6.6.3. Backends: M3GNet, NequIP, …
6.6.3.1. Installation
The ML backends are not included by default with AMS or ParAMS, as they can be quite large. Before you can train an ML potential, you need to install the corresponding backend either through the AMS package manager or manually.
Tip
Before training a custom model with ParAMS, we recommend that you first test the ML backend in a production (for example, molecular dynamics or geometry optimization) simulation with some already created parameters. For example, follow the M3GNet GUI tutorial to make sure that the M3GNet backend has been installed correctly.
6.6.3.2. M3GNet
MachineLearning
- Type
Block
- Description
Options for Task MachineLearning.
M3GNet
- Type
Block
- Description
Options for M3GNet fitting.
Custom
- Type
Block
- Description
Specify a custom M3GNet model.
Cutoff
- Type
Float
- Default value
5.0
- Unit
angstrom
- Description
Cutoff radius of the graph
MaxL
- Type
Integer
- Default value
3
- Description
Include spherical components up to order MaxL. Higher gives a better angular resolution, but increases computational cost substantially.
MaxN
- Type
Integer
- Default value
3
- Description
Include radial components up to the MaxN’th root of the spherical Bessel function. Higher gives a better radial resolution, but increases computational cost substantially.
NumBlocks
- Type
Integer
- Default value
3
- GUI name
Number of convolution blocks:
- Description
Number of convolution blocks.
NumNeurons
- Type
Integer
- Default value
64
- GUI name
Number of neurons per layer
- Description
Number of neurons in each layer.
ThreebodyCutoff
- Type
Float
- Default value
4.0
- Unit
angstrom
- Description
Cutoff radius of the three-body interaction.
LearningRate
- Type
Float
- Default value
0.001
- Description
Learning rate for the M3GNet weight optimization.
Model
- Type
Multiple Choice
- Default value
UniversalPotential
- Options
[UniversalPotential, Custom, ModelDir]
- Description
How to specify the model for the M3GNet backend. Either a Custom model can be made from scratch or an existing model directory can be loaded to obtain the model settings.
ModelDir
- Type
String
- Description
Path to the directory defining the model. This folder should contain the files: ‘checkpoint’, ‘m3gnet.data-00000-of-00001’, ‘m3gnet.index’ and ‘m3gnet.json’.
UniversalPotential
- Type
Block
- Description
Settings for (transfer) learning with the M3GNet Universal Potential.
Featurizer
- Type
Bool
- Default value
No
- GUI name
Train featurizer
- Description
Train the Featurizer layer of the M3GNet universal potential.
Final
- Type
Bool
- Default value
Yes
- GUI name
Train final layer
- Description
Train the Final layer of the M3GNet universal potential.
GraphLayer1
- Type
Bool
- Default value
No
- GUI name
Train layer 1 - graph
- Description
Train the first Graph layer of the M3GNet universal potential.
GraphLayer2
- Type
Bool
- Default value
No
- GUI name
Train layer 2 - graph
- Description
Train the second Graph layer of the M3GNet universal potential.
GraphLayer3
- Type
Bool
- Default value
Yes
- GUI name
Train layer 3 - graph
- Description
Train the third Graph layer of the M3GNet universal potential.
ThreeDInteractions1
- Type
Bool
- Default value
No
- GUI name
Train layer 1 - 3D interactions
- Description
Train the first ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
ThreeDInteractions2
- Type
Bool
- Default value
No
- GUI name
Train layer 2 - 3D interactions
- Description
Train the second ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
ThreeDInteractions3
- Type
Bool
- Default value
Yes
- GUI name
Train layer 3 - 3D interactions
- Description
Train the third ThreeDInteractions (three-body terms) layer of the M3GNet universal potential.
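With the defaults above, only the last graph and three-body layers and the final layer are retrained. Restricting transfer learning to the final layer alone could be written as (an illustrative sketch; only keys whose defaults change are listed):

```
MachineLearning
   Backend M3GNet
   M3GNet
      Model UniversalPotential
      UniversalPotential
         GraphLayer3 No
         ThreeDInteractions3 No
         Final Yes
      End
   End
End
```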
M3GNet produces the parameter directory <calculation name>.results/optimization/m3gnet/results/model, which contains the parametrized model and can be used with the MLPotential engine. Set Backend M3GNet and ParameterDir to the path of the deployed model.
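For example, in a subsequent AMS production job (the results-directory name mytraining is hypothetical):

```
Engine MLPotential
   Backend M3GNet
   ParameterDir mytraining.results/optimization/m3gnet/results/model
EndEngine
```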
The M3GNet universal potential has the following architecture/structure:
| Layer (type) | Param # |
|---|---|
| radius_cutoff_graph_converter (RadiusCutoffGraphConverter) | 0 (unused) |
| graph_featurizer (GraphFeaturizer) | 6080 |
| graph_update_func (GraphUpdateFunc) | 192 |
| spherical_bessel_with_harmonics (SphericalBesselWithHarmonics) | 0 |
| three_d_interaction (ThreeDInteraction) | 1737 |
| three_d_interaction_1 (ThreeDInteraction) | 1737 |
| three_d_interaction_2 (ThreeDInteraction) | 1737 |
| graph_network_layer (GraphNetworkLayer) | 66432 |
| graph_network_layer_1 (GraphNetworkLayer) | 66432 |
| graph_network_layer_2 (GraphNetworkLayer) | 66432 |
| pipe_24 (Pipe) | 16770 |
| atom_ref_2 (AtomRef) | 0 |
Total params: 227,549
6.6.3.3. NequIP
Important
Training NequIP potentials with ParAMS is not a fully supported feature. To use NequIP with AMS, or to train NequIP with ParAMS, you need to manually install it into the AMS Python environment.
SCM does not provide any packages for NequIP and cannot provide support for its installation. However, we have compiled some tips in the Engine ASE documentation that may help you with the installation.
The options for NequIP are:
MachineLearning
- Type
Block
- Description
Options for Task MachineLearning.
NequIP
- Type
Block
- Description
Options for NequIP fitting.
Custom
- Type
Block
- Description
Specify a custom NequIP model.
LMax
- Type
Integer
- Default value
1
- Description
Maximum L value. 1 is probably high enough.
MetricsKey
- Type
Multiple Choice
- Default value
validation_loss
- Options
[training_loss, validation_loss]
- Description
Which metric to use to generate the ‘best’ model.
NumLayers
- Type
Integer
- Default value
4
- Description
Number of interaction layers in the NequIP neural network.
RMax
- Type
Float
- Default value
3.5
- Unit
angstrom
- GUI name
Distance cutoff
- Description
Distance cutoff for interactions.
LearningRate
- Type
Float
- Default value
0.005
- Description
Learning rate for the NequIP weight optimization
Model
- Type
Multiple Choice
- Default value
Custom
- Options
[Custom, ModelFile]
- Description
How to specify the model for the NequIP backend. Either a Custom model can be made from scratch or an existing ‘model.pth’ file can be loaded to obtain the model settings.
ModelFile
- Type
String
- Description
Path to the model.pth file defining the model.
UseRescalingFromLoadedModel
- Type
Bool
- Default value
Yes
- Description
When loading a model with LoadModel or NequIP%ModelFile, do not recalculate the dataset rescaling but use the value from the loaded model.
NequIP produces the file <calculation name>.results/optimization/nequip/results/model.pth which contains the deployed model and can be used with the MLPotential engine. Set Backend NequIP and ParameterFile to the path of the deployed model.
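Analogously to M3GNet, a production engine block might look as follows (the results-directory name mytraining is hypothetical):

```
Engine MLPotential
   Backend NequIP
   ParameterFile mytraining.results/optimization/nequip/results/model.pth
EndEngine
```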
6.6.4. ML Parallelization
Parallelization options can be set with ParallelLevels. Note that Task MachineLearning does not perform AMS jobs during the optimization, so the parallelization options are different.
Select the maximum number of parallel committee members with CommitteeMembers, or set it to zero to run all committee members in parallel (up to the maximum number of cores or the NSCM environment variable). Select the number of cores each committee member may use with Cores, or set it to zero (the default) to distribute the available cores evenly over the committee members running in parallel.
Some backends may spawn additional threads for database management, but these should not use substantial CPU time. GPU offloading is supported through TensorFlow or PyTorch, depending on the backend. Currently there are no settings in ParAMS for GPU offloading; the backends use GPU resources according to their own documentation.
ParallelLevels
- Type
Block
- GUI name
Parallelization distribution:
- Description
Distribution of threads/processes between the parallelization levels.
CommitteeMembers
- Type
Integer
- Default value
1
- GUI name
Number of parallel committee members
- Description
Maximum number of committee member optimizations to run in parallel. If set to zero, the minimum of MachineLearning%CommitteeSize and the number of available cores (NSCM) is used.
Cores
- Type
Integer
- Default value
0
- GUI name
Processes (per Job)
- Description
Number of cores to use per committee member optimization. By default (0), the available cores (NSCM) are divided equally among the committee members. When using GPU offloading, consider setting this to 1.
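The rules above for CommitteeMembers and Cores can be sketched as follows (illustrative only; the actual scheduling is handled by ParAMS):

```python
def resolve_parallel_levels(available_cores, committee_size,
                            members=1, cores=0):
    """Resolve the ParallelLevels defaults described above.

    members == 0: run min(CommitteeSize, available cores) members in parallel.
    cores == 0:   divide the available cores evenly over the parallel members.
    Sketch of the documented rules only.
    """
    if members == 0:
        members = min(committee_size, available_cores)
    if cores == 0:
        cores = max(1, available_cores // members)
    return members, cores
```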