Theory and usage¶
The MLPotential engine in the Amsterdam Modeling Suite can calculate the potential energy surface using several different types of machine learning (ML) potentials.
What’s new in AMS2023.1?¶
New model: M3GNet-UP-2022 based on M3GNet. This is a universal potential (UP) that can be used for the entire periodic table of elements up to, but excluding, Curium (Cm, 96).
New backend: M3GNet
PiNN is no longer a backend in MLPotential, but you can use it through Engine ASE.
Quickstart guide¶
To set up a simple MLPotential job using the graphical user interface, see the corresponding GUI tutorial.
Theory of ML potentials¶
With machine learning potentials, it is possible to quickly evaluate the energies and forces in a system with close to first-principles accuracy. Machine learning potentials are fitted (trained, parameterized) to reproduce reference data, typically calculated using an ab initio or DFT method. Machine learning potentials are sometimes referred to as machine learning force fields, or as interatomic potentials based on machine learning.
Several types of machine learning potentials exist, for example neural-network-based methods and kernel-based methods.
Several types of neural network potentials exist. It is common for such potentials to calculate the total energy as a sum of atomic contributions. In a high-dimensional neural network potential (HDNNP), as proposed by Behler and Parrinello [1], each atomic contribution is calculated by a feed-forward neural network that takes a representation of the chemical environment around the atom as input. This representation, also called an atomic environment descriptor or fingerprint, consists of a vector of rotationally, translationally, and permutationally invariant functions known as atom-centered symmetry functions (ACSFs).
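In formula form, the HDNNP total energy is a sum over per-atom networks $E_i$, each evaluated on the ACSF descriptor vector $\mathbf{G}_i$ of atom $i$:

$$E_\text{total} = \sum_{i=1}^{N_\text{atoms}} E_i(\mathbf{G}_i)$$

Forces are obtained as the negative gradient of $E_\text{total}$ with respect to the atomic positions.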
Graph convolutional neural network potentials (GCNNPs), also known as message-passing neural network potentials, similarly construct the total energy by summing up atomic contributions, but the appropriate representations of the local atomic chemical environments are learned from the reference data.
Kernel-based methods make predictions based on how similar a system is to the systems in the training set.
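Schematically, a kernel-based prediction is a weighted sum over the $M$ training structures, where the kernel function $k$ measures the similarity between the query system $\mathbf{x}$ and training system $\mathbf{x}_j$, and the weights $\alpha_j$ are determined during fitting:

$$E(\mathbf{x}) = \sum_{j=1}^{M} \alpha_j \, k(\mathbf{x}, \mathbf{x}_j)$$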
There are also other types of machine learning potentials. For more detailed information, see for example references 2 and 3.
Installation and uninstallation¶
The Amsterdam Modeling Suite requires the installation of additional Python packages to run the machine learning potential backends.
If you set up an MLPotential job via the graphical user interface, you will be asked when saving your input to install any packages that are not already installed. You can also use the package manager, or the command-line installation tool; for instance, to install the torchani backend:
"$AMSBIN"/amspackages install torchani
You can use the command-line installer to install these packages on a remote system, so that you can seamlessly run MLPotential jobs on remote machines as well. The packages are installed into the AMS Python environment and do not affect any other Python installation on the system. An internet connection is required for the installation, unless you have configured the AMS package manager for offline use.
To uninstall a package, e.g. torchani, run:
"$AMSBIN"/amspackages remove torchani
Installing GPU enabled backends using AMSpackages¶
New in version AMS2023.101.
Various versions of the ML potential packages are available through AMSpackages, with different system dependencies such as GPU drivers. The desired option can be selected under the “ML options” menu in the graphical package manager (SCM → Packages). You can choose from the following options:

- CPU: installs CPU-only backends, including PyTorch and TensorFlow-CPU.
- GPU (CUDA 11.6): installs GPU-enabled backends, including TensorFlow and a CUDA 11.6 specific version of PyTorch.
- GPU (CUDA 11.7): installs GPU-enabled backends, including TensorFlow, but with CUDA 11.7 enabled PyTorch instead.

The default is CPU. Note that this is the only option available under macOS.
When using the package manager on the command line or in shell scripts, you can use the --alt flag together with one of these options. On the command line the options are denoted as mlcpu, mlcu116 and mlcu117, respectively. To install GPU-enabled versions of the ML potential backends on the command line, for instance using the CUDA 11.7 enabled version of PyTorch:
$ "$AMSBIN"/amspackages --alt mlcu117 install mlpotentials
Going to install packages:
nvidia-cuda-runtime-cu11 v[11.7.99] - build:0
tensorflow v[2.9.1] - build:0
All ML Potential backends v[2.0.0] - build:0
torch v[1.13.1+cu117] - build:0
nvidia-cudnn-cu11 v[8.5.0.96] - build:0
M3GNet ML Backend v[0.2.4] - build:0
sGDML Calculator patch v[0.4.4] - build:0
TorchANI Calculator patch v[2.2] - build:0
SchNetPack ML Backend v[1.0.0] - build:0
nvidia-cuda-nvrtc-cu11 v[11.7.99] - build:0
nvidia-cublas-cu11 v[11.10.3.66] - build:0
ANI Models for TorchANI backend v[2.2] - build:0
TorchANI NN module patch v[2.2] - build:0
TorchANI ML backend v[2.2] - build:0
sGDML ML backend v[0.4.4] - build:0
Alternatively, to install a single backend, for instance torchani:
"$AMSBIN"/amspackages --alt mlcu117 install torchani
To change the default, you can set the environment variable SCM_AMSPKGS_ALTERNATIVES.
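For example, to make the CUDA 11.7 variants the default for subsequent installations (assuming the variable takes the same values as the --alt flag):

export SCM_AMSPKGS_ALTERNATIVES=mlcu117
"$AMSBIN"/amspackages install mlpotentials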
For advanced configuration options of the package installation, see also the package manager instructions.
Installing packages using pip¶
The package manager installs trusted and tested versions of packages from our website, but if you require a different version you can use pip to install packages from https://pypi.org:
"$AMSBIN"/amspython -m pip install -U torch
Note
Packages installed through pip alone will not show up as installed in the package manager, but they will be detected and used if possible. That is, if you install a package into your amspython environment using amspython -m pip install, the package manager will not display it in its overview, but it will still allow you to use it for running calculations with the MLPotential engine. To verify that the version you installed will be detected, run:
$ "$AMSBIN"/amspackages check --pip torch
05-11 10:47:57 torch is not installed!
05-11 10:47:57 User installed version located through pip: torch==1.8.1
Not all versions of the packages on PyPI work with our ML potential backends.
Included (pre-parameterized) models¶
A model is the combination of a functional form with a set of parameters. Four pre-parameterized models can be selected: M3GNet-UP-2022 (Universal Potential), ANI-2x, ANI-1ccx, and ANI-1x. The predictions from the ANI-* models are calculated from ensembles, meaning that the final prediction is an average over several independently trained neural networks.
| | M3GNet-UP-2022 | ANI-2x | ANI-1ccx | ANI-1x |
|---|---|---|---|---|
| Functional form | NNP | HDNNP | HDNNP | HDNNP |
| Ensemble size | 1 | 8 | 8 | 8 |
| Atomic environment descriptor | m3gnet | ACSF | ACSF | ACSF |
| Supported elements | H, He, Li, …, Am | H, C, N, O, F, S, Cl | H, C, N, O | H, C, N, O |
| Training set structures | Materials Project | organic molecules | organic molecules | organic molecules |
| Reference method | PBE | ωB97-x/6-31G(d) | DLPNO-CCSD(T)/CBS | ωB97-x/6-31G(d) |
| Backend | M3GNet | TorchANI | TorchANI | TorchANI |
| Reference | [4] | [5] | [6] | [7] |
For the ANI-* models, the standard deviation of the energy predictions is calculated for the “main” output molecule (e.g., the final point of a geometry optimization). The summary statistics can be found in the mlpotential.txt file in the worker.0 subdirectory of the results directory.
Model
- Type
Multiple Choice
- Default value
ANI-2x
- Options
[Custom, ANI-1ccx, ANI-1x, ANI-2x, M3GNet-UP-2022]
- Description
Select a particular parameterization. ANI-1x and ANI-2x: based on DFT (wB97X). ANI-1ccx: based on DLPNO-CCSD(T)/CBS. M3GNet-UP-2022: based on DFT (PBE) data. ANI-1x and ANI-1ccx have been parameterized to give good geometries, vibrational frequencies, and reaction energies for gas-phase organic molecules containing H, C, O, and N. ANI-2x can also handle the atoms F, S, and Cl. M3GNet-UP-2022 is a universal potential (UP) for the entire periodic table and has been primarily trained on crystal data (energies, forces, stresses) from the Materials Project. Set to Custom to specify the backend and parameter files yourself.
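For example, a minimal engine block that selects one of the pre-parameterized models:

Engine MLPotential
    Model ANI-2x
EndEngine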
Custom models (custom parameters)¶
Set Model to Custom and specify which backend to use with the Backend option. In a typical case, you would have used that backend to train your own machine learning potential. The backend reads the parameters, and any other necessary information (for example the neural network architecture), from either a file or a directory. Specify the ParameterFile or ParameterDir option accordingly, with a path to the file or directory. Read the backend’s documentation to find out which option is appropriate.
Some backends may require that an energy unit (MLEnergyUnit) and/or distance unit (MLDistanceUnit) be specified. These units correspond to the units used during the training of the machine learning potential.
Example:

Engine MLPotential
    Backend SchNetPack
    Model Custom
    ParameterFile ethanol.schnet-model
    MLEnergyUnit kcal/mol
    MLDistanceUnit angstrom
EndEngine
Backend
- Type
Multiple Choice
- Options
[M3GNet, NequIP, SchNetPack, sGDML, TorchANI]
- Description
The machine learning potential backend.
MLDistanceUnit
- Type
Multiple Choice
- Default value
Auto
- Options
[Auto, angstrom, bohr]
- GUI name
Internal distance unit
- Description
Unit of distances expected by the ML backend (not the ASE calculator). The ASE calculator may require this information.
MLEnergyUnit
- Type
Multiple Choice
- Default value
Auto
- Options
[Auto, Hartree, eV, kcal/mol, kJ/mol]
- GUI name
Internal energy unit
- Description
Unit of energy output by the ML backend (not the unit output by the ASE calculator). The ASE calculator may require this information.
ParameterDir
- Type
String
- Default value
- GUI name
Parameter directory
- Description
Path to a set of parameters for the backend, if it expects to read from a directory.
ParameterFile
- Type
String
- Default value
- Description
Path to a set of parameters for the backend, if it expects to read from a file.
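For instance, the M3GNet backend reads parameters from a directory (see the Backends table below). A sketch of the corresponding input, assuming a hypothetical my_m3gnet_model directory produced by training with that backend:

Engine MLPotential
    Backend M3GNet
    Model Custom
    ParameterDir my_m3gnet_model
    MLEnergyUnit eV
    MLDistanceUnit angstrom
EndEngine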
Backends¶
| | M3GNet | SchNetPack | sGDML | TorchANI |
|---|---|---|---|---|
| Reference | [4] | [10] | [11] | [12] |
| Methods | m3gnet | HDNNPs, GCNNPs, … | GDML, sGDML | [ensembles of] HDNNPs |
| Pre-built models | M3GNet-UP-2022 | none | none | ANI-1x, ANI-2x, ANI-1ccx |
| Parameters from | ParameterDir | ParameterFile | ParameterFile | ParameterFile |
| Kernel-based | No | No | Yes | No |
| ML framework | TensorFlow 2.9.1 | PyTorch | none, PyTorch | PyTorch |
Note
Starting with AMS2023, PiNN [9] is only supported as a custom Calculator through Engine ASE [8].
Note
For sGDML, the order of the atoms in the input file must match the atom order that was used during the fitting of the model.
Note
If you use a custom parameter file with TorchANI, the model specified via ParameterFile filename.pt is loaded with torch.load('filename.pt')['model'], such that a forward call should be accessible via torch.load('filename.pt')['model']((species, coordinates)). The energy shifter is not read from custom parameter files, so the absolute predicted energies will be shifted with respect to the reference data, but this does not affect relative energies (e.g., reaction energies).
CPU and GPU (CUDA), parallelization¶
By default, a calculation will run on the CPU and use all available CPU power. To limit the number of threads, the NumThreads keyword can be used if the backend uses PyTorch as its machine learning framework. Alternatively, you can set the environment variable OMP_NUM_THREADS. To use a CUDA-enabled GPU, ensure that a CUDA-enabled version of TensorFlow or PyTorch has been installed (see Installation and uninstallation). Then set Device to the device on which you would like to run, for example cuda:0. Calculations are typically much faster on the GPU than on the CPU.
Device
- Type
Multiple Choice
- Default value
- Options
[, cpu, cuda:0, cuda:1]
- Description
Device on which to run the calculation (e.g. cpu, cuda:0). If empty, the device can be controlled using environment variables for TensorFlow or PyTorch.
NumThreads
- Type
String
- Default value
- GUI name
Number of threads
- Description
Number of threads. If not empty, OMP_NUM_THREADS will be set to this number; for PyTorch-engines, torch.set_num_threads() will be called.
Note
Because the calculation runs in a separate process, the number of threads is controlled by the input keyword NumThreads and not by the environment variable NSCM. We recommend setting NSCM=1 when using the MLPotential engine.
Only single-node calculations are currently supported.
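For example, a minimal engine block that runs the ANI-2x model on the first CUDA device:

Engine MLPotential
    Model ANI-2x
    Device cuda:0
EndEngine

To instead run on the CPU with a fixed number of threads, set Device cpu and, for example, NumThreads 4.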
Troubleshooting¶
If you run a PyTorch-based backend and receive an error message starting with:
sh: line 1: 1351 Illegal instruction: 4 sh
you may be attempting to run PyTorch on a rather old CPU. You can try upgrading PyTorch to a newer version:
"$AMSBIN"/amspython -m pip install torch -U -f https://download.pytorch.org/whl/torch_stable.html
If this does not help, please contact SCM support.
Support¶
SCM does not provide support for parameterization using the MLPotential backends. SCM only provides technical (non-scientific) support for running simulations via the AMS driver.
Technical information¶
Each of the supported backends can be used as an ASE (Atomic Simulation Environment) calculator. The MLPotential engine is an interface to those ASE calculators. The communication between the AMS driver and the backends is implemented with a named-pipe interface. The MLPotential engine launches a Python script, ase_calculators.py, which initializes the ASE calculator. The exact command that is executed is written as WorkerCommand in the output.
References¶
[1] J. Behler, M. Parrinello. Phys. Rev. Lett. 98 (2007) 146401. https://doi.org/10.1103/PhysRevLett.98.146401
[2] J. Behler. J. Chem. Phys. 145 (2016) 170901. https://doi.org/10.1063/1.4966192
[3] T. Mueller, A. Hernandez, C. Wang. J. Chem. Phys. 152 (2020) 050902. https://doi.org/10.1063/1.5126336
[4] C. Chen, S. P. Ong. Nature Computational Science 2 (2022) 718-728. arXiv:2202.02450
[5] C. Devereux et al. J. Chem. Theory Comput. 16 (2020) 4192-4202. https://doi.org/10.1021/acs.jctc.0c00121
[6] J. S. Smith et al. Nat. Commun. 10 (2019) 2903. https://doi.org/10.1038/s41467-019-10827-4
[7] J. S. Smith et al. J. Chem. Phys. 148 (2018) 241733. https://doi.org/10.1063/1.5023802
[8] Engine ASE documentation.
[9] Y. Shao et al. J. Chem. Inf. Model. 60 (2020) 1184-1193. https://doi.org/10.1021/acs.jcim.9b00994
[10] K. T. Schütt et al. J. Chem. Theory Comput. 15 (2019) 448-455. https://doi.org/10.1021/acs.jctc.8b00908
[11] S. Chmiela et al. Comp. Phys. Commun. 240 (2019) 38-45. https://doi.org/10.1016/j.cpc.2019.02.007
[12] X. Gao et al. J. Chem. Inf. Model. (2020). https://doi.org/10.1021/acs.jcim.0c00451