Theory and usage¶
The MLPotential engine in the Amsterdam Modeling Suite can calculate the potential energy surface using several different types of machine learning (ML) potentials.
What’s new in AMS2023.1?¶
New model: M3GNet-UP-2022 based on M3GNet. This is a universal potential (UP) that can be used for the entire periodic table of elements up to, but excluding, Curium (Cm, 96).
New backend: M3GNet
PiNN is no longer a backend in MLPotential, but you can use it through Engine ASE.
Quickstart guide¶
To set up a simple MLPotential job using the graphical user interface, see the corresponding GUI tutorial.
Theory of ML potentials¶
With machine learning potentials, it is possible to quickly evaluate the energies and forces in a system with close to first-principles accuracy. Machine learning potentials are fitted (trained, parameterized) to reproduce reference data, typically calculated using an ab initio or DFT method. Machine learning potentials are sometimes referred to as machine learning force fields, or as interatomic potentials based on machine learning.
Several types of machine learning potentials exist, for example neural-network-based methods and kernel-based methods.
Several types of neural network potentials exist. It is common for such potentials to calculate the total energy as a sum of atomic contributions. In a high-dimensional neural network potential (HDNNP), as proposed by Behler and Parrinello [1], each atomic contribution is calculated by a feed-forward neural network that takes a representation of the chemical environment around the atom as input. This representation, also called an atomic environment descriptor or fingerprint, consists of a vector of rotationally, translationally, and permutationally invariant functions known as atom-centered symmetry functions (ACSFs).
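In formula form, the HDNNP total energy is a sum over per-atom networks $E_i$, each evaluated on the ACSF descriptor vector $\mathbf{G}_i$ of atom $i$:

$$E_\text{total} = \sum_{i=1}^{N_\text{atoms}} E_i(\mathbf{G}_i)$$

Forces are obtained as the negative gradient of $E_\text{total}$ with respect to the atomic positions.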
Graph convolutional neural network potentials (GCNNPs), also known as message-passing neural network potentials, similarly construct the total energy by summing up atomic contributions, but the appropriate representations of the local atomic chemical environments are learned from the reference data.
Kernel-based methods make predictions based on how similar a system is to the systems in the training set.
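Schematically, a kernel-based prediction is a weighted sum over the $M$ training structures, where the kernel function $k$ measures the similarity between the query system $\mathbf{x}$ and training system $\mathbf{x}_j$, and the weights $\alpha_j$ are determined during fitting:

$$E(\mathbf{x}) = \sum_{j=1}^{M} \alpha_j \, k(\mathbf{x}, \mathbf{x}_j)$$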
There are also other types of machine learning potentials. For more detailed information, see for example references 2 and 3.
Installation and uninstallation¶
The Amsterdam Modeling Suite requires the installation of additional Python packages to run the machine learning potential backends.
If you set up an MLPotential job via the graphical user interface, you will be asked when saving your input to install any packages that are not already installed. You can also use the package manager, or the command-line installation tool; for instance, to install the torchani backend:
"$AMSBIN"/amspackages install torchani
You can use the command-line installer to install these packages on a remote system, so that you can seamlessly run MLPotential jobs on remote machines as well. The packages are installed into the AMS Python environment and do not affect any other Python installation on the system. An internet connection is required for the installation, unless you have configured the AMS package manager for offline use.
To uninstall a package, e.g. torchani, run:
"$AMSBIN"/amspackages remove torchani
Installing GPU enabled backends using AMSpackages¶
New in version AMS2023.101.
Various versions of the ML potential packages are available through AMSpackages, with different system dependencies such as GPU drivers. The desired option can be selected under the “ML options” menu in the graphical package manager (SCM → Packages). You can choose from the following options:

- CPU: installs CPU-only backends, including PyTorch and TensorFlow-CPU.
- GPU (CUDA 11.6): installs GPU-enabled backends, including TensorFlow and a CUDA 11.6 specific version of PyTorch.
- GPU (CUDA 11.7): installs GPU-enabled backends, including TensorFlow, but with CUDA 11.7 enabled PyTorch instead.

The default is CPU. Note that this is the only option available under macOS.
When using the package manager on the command line or in shell scripts, you can use the --alt flag together with one of these options. On the command line the options are denoted as mlcpu, mlcu116 and mlcu117, respectively. To install GPU-enabled versions of the ML potential backends on the command line, for instance using the CUDA 11.7 enabled version of PyTorch:
$ "$AMSBIN"/amspackages --alt mlcu117 install mlpotentials
Going to install packages:
nvidia-cuda-runtime-cu11 v[11.7.99] - build:0
tensorflow v[2.9.1] - build:0
All ML Potential backends v[2.0.0] - build:0
torch v[1.13.1+cu117] - build:0
nvidia-cudnn-cu11 v[8.5.0.96] - build:0
M3GNet ML Backend v[0.2.4] - build:0
sGDML Calculator patch v[0.4.4] - build:0
TorchANI Calculator patch v[2.2] - build:0
SchNetPack ML Backend v[1.0.0] - build:0
nvidia-cuda-nvrtc-cu11 v[11.7.99] - build:0
nvidia-cublas-cu11 v[11.10.3.66] - build:0
ANI Models for TorchANI backend v[2.2] - build:0
TorchANI NN module patch v[2.2] - build:0
TorchANI ML backend v[2.2] - build:0
sGDML ML backend v[0.4.4] - build:0
Alternatively, to install a single backend, for instance torchani:
"$AMSBIN"/amspackages --alt mlcu117 install torchani
To change the default, you can set the environment variable SCM_AMSPKGS_ALTERNATIVES.
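For example, to make the CUDA 11.7 variants the default for subsequent installations (assuming the variable takes the same values as the --alt flag):

export SCM_AMSPKGS_ALTERNATIVES=mlcu117
"$AMSBIN"/amspackages install mlpotentials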
For advanced configuration options of the package installation, see also the package manager instructions.
Installing packages using pip¶
The package manager installs trusted and tested versions of packages from our website, but if you require a different version you can use pip to install packages from https://pypi.org:
"$AMSBIN"/amspython -m pip install -U torch
Note
Packages installed through pip alone will not show up as installed in the package manager, but they will be detected and used if possible. That is, if you install a package into your amspython environment using amspython -m pip install, the package manager will not display it in its overview, but it will still allow you to use it for running calculations with the MLPotential engine. To verify that the version you installed will be detected, run:
$ "$AMSBIN"/amspackages check --pip torch
05-11 10:47:57 torch is not installed!
05-11 10:47:57 User installed version located through pip: torch==1.8.1
Not all versions of the packages on PyPI work with our ML potential backends.
Included (pre-parameterized) models¶
A model is the combination of a functional form with a set of parameters. Four pre-parameterized models can be selected: M3GNet-UP-2022 (Universal Potential), ANI-2x, ANI-1ccx, and ANI-1x. The predictions from the ANI-* models are calculated from ensembles, meaning that the final prediction is an average over several independently trained neural networks.
| | M3GNet-UP-2022 | ANI-2x | ANI-1ccx | ANI-1x |
|---|---|---|---|---|
| Functional form | NNP | HDNNP | HDNNP | HDNNP |
| Ensemble size | 1 | 8 | 8 | 8 |
| Atomic environment descriptor | m3gnet | ACSF | ACSF | ACSF |
| Supported elements | H, He, Li, …, Am | H, C, N, O, F, S, Cl | H, C, N, O | H, C, N, O |
| Training set structures | Materials Project | organic molecules | organic molecules | organic molecules |
| Reference method | PBE | ωB97-x/6-31G(d) | DLPNO-CCSD(T)/CBS | ωB97-x/6-31G(d) |
| Backend | M3GNet | TorchANI | TorchANI | TorchANI |
| Reference | [4] | [5] | [6] | [7] |
For the ANI-* models, the standard deviation of the energy predictions is calculated for the “main” output molecule (e.g., the final point of a geometry optimization). The summary statistics can be found in the mlpotential.txt file in the worker.0 subdirectory of the results directory.
Model
- Type
Multiple Choice
- Default value
ANI-2x
- Options
[Custom, ANI-1ccx, ANI-1x, ANI-2x, M3GNet-UP-2022]
- Description
Select a particular parameterization. ANI-1x and ANI-2x: based on DFT (wB97X). ANI-1ccx: based on DLPNO-CCSD(T)/CBS. M3GNet-UP-2022: based on DFT (PBE) data. ANI-1x and ANI-1ccx have been parameterized to give good geometries, vibrational frequencies, and reaction energies for gas-phase organic molecules containing H, C, O, and N. ANI-2x can also handle the atoms F, S, and Cl. M3GNet-UP-2022 is a universal potential (UP) for the entire periodic table and has been primarily trained on crystal data (energies, forces, stresses) from the Materials Project. Set to Custom to specify the backend and parameter files yourself.
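For example, a minimal engine block that selects one of the pre-parameterized models:

Engine MLPotential
    Model ANI-2x
EndEngine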
Custom models (custom parameters)¶
Set Model to Custom and specify which backend to use with the Backend option. In a typical case, you would have used that backend to train your own machine learning potential. The backend reads the parameters, and any other necessary information (for example the neural network architecture), from either a file or a directory. Specify the ParameterFile or ParameterDir option accordingly, with a path to the file or directory. Read the backend’s documentation to find out which option is appropriate.
Some backends may require that an energy unit (MLEnergyUnit) and/or distance unit (MLDistanceUnit) be specified. These units correspond to the units used during the training of the machine learning potential.
Example:

Engine MLPotential
    Backend SchNetPack
    Model Custom
    ParameterFile ethanol.schnet-model
    MLEnergyUnit kcal/mol
    MLDistanceUnit angstrom
EndEngine
Backend
- Type
Multiple Choice
- Options
[M3GNet, NequIP, SchNetPack, sGDML, TorchANI]
- Description
The machine learning potential backend.
MLDistanceUnit
- Type
Multiple Choice
- Default value
Auto
- Options
[Auto, angstrom, bohr]
- GUI name
Internal distance unit
- Description
Unit of distances expected by the ML backend (not the ASE calculator). The ASE calculator may require this information.
MLEnergyUnit
- Type
Multiple Choice
- Default value
Auto
- Options
[Auto, Hartree, eV, kcal/mol, kJ/mol]
- GUI name
Internal energy unit
- Description
Unit of energy output by the ML backend (not the unit output by the ASE calculator). The ASE calculator may require this information.
ParameterDir
- Type
String
- Default value
- GUI name
Parameter directory
- Description
Path to a set of parameters for the backend, if it expects to read from a directory.
ParameterFile
- Type
String
- Default value
- Description
Path to a set of parameters for the backend, if it expects to read from a file.
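For instance, the M3GNet backend reads parameters from a directory (see the Backends table below). A sketch of the corresponding input, assuming a hypothetical my_m3gnet_model directory produced by training with that backend:

Engine MLPotential
    Backend M3GNet
    Model Custom
    ParameterDir my_m3gnet_model
    MLEnergyUnit eV
    MLDistanceUnit angstrom
EndEngine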
Backends¶
| | M3GNet | SchNetPack | sGDML | TorchANI |
|---|---|---|---|---|
| Reference | [4] | [10] | [11] | [12] |
| Methods | m3gnet | HDNNPs, GCNNPs, … | GDML, sGDML | [ensembles of] HDNNPs |
| Pre-built models | M3GNet-UP-2022 | none | none | ANI-1x, ANI-2x, ANI-1ccx |
| Parameters from | ParameterDir | ParameterFile | ParameterFile | ParameterFile |
| Kernel-based | No | No | Yes | No |
| ML framework | TensorFlow 2.9.1 | PyTorch | none, PyTorch | PyTorch |
Note
Starting with AMS2023, PiNN [9] is only supported as a custom Calculator through Engine ASE [8].
Note
For sGDML, the order of the atoms in the input file must match the atom order that was used during the fitting of the model.
Note
If you use a custom parameter file with TorchANI, the model specified via ParameterFile filename.pt is loaded with torch.load('filename.pt')['model'], such that a forward call should be accessible via torch.load('filename.pt')['model']((species, coordinates)). The energy shifter is not read from custom parameter files, so the absolute predicted energies will be shifted with respect to the reference data, but this does not affect relative energies (e.g., reaction energies).
CPU and GPU (CUDA), parallelization¶
By default, a calculation will run on the CPU and use all available CPU power. To limit the number of threads, the NumThreads keyword can be used if the backend uses PyTorch as its machine learning framework. Alternatively, you can set the environment variable OMP_NUM_THREADS. To use a CUDA-enabled GPU, ensure that a CUDA-enabled version of TensorFlow or PyTorch has been installed (see Installation and uninstallation). Then set Device to the device on which you would like to run, for example cuda:0. Calculations are typically much faster on the GPU than on the CPU.
Device
- Type
Multiple Choice
- Default value
- Options
[, cpu, cuda:0, cuda:1]
- Description
Device on which to run the calculation (e.g. cpu, cuda:0). If empty, the device can be controlled using environment variables for TensorFlow or PyTorch.
NumThreads
- Type
String
- Default value
- GUI name
Number of threads
- Description
Number of threads. If not empty, OMP_NUM_THREADS will be set to this number; for PyTorch-engines, torch.set_num_threads() will be called.
Note
Because the calculation runs in a separate process, the number of threads is controlled by the input keyword NumThreads and not by the environment variable NSCM. We recommend setting NSCM=1 when using the MLPotential engine.
Only single-node calculations are currently supported.
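For example, a minimal engine block that runs the ANI-2x model on the first CUDA device:

Engine MLPotential
    Model ANI-2x
    Device cuda:0
EndEngine

To instead run on the CPU with a fixed number of threads, set Device cpu and, for example, NumThreads 4.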
Troubleshooting¶
If you run a PyTorch-based backend and receive an error message starting with:
sh: line 1: 1351 Illegal instruction: 4 sh
you may be attempting to run PyTorch on a rather old CPU. You can try upgrading PyTorch to a newer version:
"$AMSBIN"/amspython -m pip install torch -U -f https://download.pytorch.org/whl/torch_stable.html
If this does not help, please contact SCM support.
Support¶
SCM does not provide support for parameterization using the MLPotential backends. SCM only provides technical (non-scientific) support for running simulations via the AMS driver.
Technical information¶
Each of the supported backends can be used as an ASE (Atomic Simulation Environment) calculator. The MLPotential engine is an interface to those ASE calculators. The communication between the AMS driver and the backends is implemented with a named-pipe interface. The MLPotential engine launches a Python script, ase_calculators.py, which initializes the ASE calculator. The exact command that is executed is written as WorkerCommand in the output.
References¶
[1] J. Behler, M. Parrinello. Phys. Rev. Lett. 98 (2007) 146401. https://doi.org/10.1103/PhysRevLett.98.146401
[2] J. Behler. J. Chem. Phys. 145 (2016) 170901. https://doi.org/10.1063/1.4966192
[3] T. Mueller, A. Hernandez, C. Wang. J. Chem. Phys. 152 (2020) 050902. https://doi.org/10.1063/1.5126336
[4] C. Chen, S. P. Ong. Nature Computational Science 2 (2022) 718-728. arXiv:2202.02450
[5] C. Devereux et al. J. Chem. Theory Comput. 16 (2020) 4192-4202. https://doi.org/10.1021/acs.jctc.0c00121
[6] J. S. Smith et al. Nat. Commun. 10 (2019) 2903. https://doi.org/10.1038/s41467-019-10827-4
[7] J. S. Smith et al. J. Chem. Phys. 148 (2018) 241733. https://doi.org/10.1063/1.5023802
[8] Engine ASE documentation.
[9] Y. Shao et al. J. Chem. Inf. Model. 60 (2020) 1184-1193. https://doi.org/10.1021/acs.jcim.9b00994
[10] K. T. Schütt et al. J. Chem. Theory Comput. 15 (2019) 448-455. https://doi.org/10.1021/acs.jctc.8b00908
[11] S. Chmiela et al. Comp. Phys. Commun. 240 (2019) 38-45. https://doi.org/10.1016/j.cpc.2019.02.007
[12] X. Gao et al. J. Chem. Inf. Model. (2020). https://doi.org/10.1021/acs.jcim.0c00451