4.1. Train M3GNet with the ParAMS GUI
Important
This tutorial is only compatible with ParAMS 2024.1 or later.
This example shows how to train your own M3GNet machine learning potential, either from scratch (Model Custom) or by fine-tuning/retraining the M3GNet-UP-2022 universal potential (Model UniversalPotential).
See also
Prerequisite: Follow the Getting Started: Lennard-Jones and Import training data (GUI) tutorials to get familiar with the ParAMS GUI.
In this tutorial, the training data has already been prepared.
In general, when training ML potentials like M3GNet, you can only train to single-point energy and forces. For details, see Requirements for job collection and data sets.
4.1.1. Open the example input file
Open the example input file params.in. You can also find this file in $AMSHOME/scripting/scm/params/examples/M3GNet/params.in.
This loads a few liquid argon structures and sets up some ML training settings.
You should now see input options for Machine Learning in the bottom left panel.
4.1.2. Job collection, training and validation sets
For training machine learning potentials, you can only train to single-point energy and forces.
Here, you see that all jobs are of type “Single Point + gradients”. This is the only type of job that can be used during the training. The job collection can also contain other types of jobs, but they are not used during the training; they are simply run after the training has finished.
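The rule above can be illustrated with a small Python sketch that sorts jobs into those used for training and those that are only run afterwards. The job records, the task label string, and every job name other than uff_md_frame001 are hypothetical; this is not the ParAMS job-collection API.

```python
from typing import Dict, List, Tuple

# Hypothetical label standing in for the "Single Point + gradients" job type.
TRAINABLE_TASK = "SinglePoint+gradients"


def split_jobs(jobs: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (jobs used for training, jobs only run after training)."""
    trainable: List[str] = []
    run_after: List[str] = []
    for job in jobs:
        if job["task"] == TRAINABLE_TASK:
            trainable.append(job["name"])
        else:
            # Any other job type is kept but not trained on.
            run_after.append(job["name"])
    return trainable, run_after


# Hypothetical job records (dict keys "name" and "task" are illustrative only).
jobs = [
    {"name": "uff_md_frame001", "task": TRAINABLE_TASK},
    {"name": "uff_md_frame002", "task": TRAINABLE_TASK},
    {"name": "argon_geo_opt", "task": "GeometryOptimization"},
]
print(split_jobs(jobs))
```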
Tip
When importing training data into ParAMS:
Use the “add trajectory singlepoints” importer to import data from trajectories
Use the “add pesscan singlepoints” importer to import data from PES scans
If you use the “add single job” importer, make sure that “Task (for new job)” is set to singlepoint!
Here, you can see energies and forces for the training set.
Here, you can see energies and forces for the validation set.
For task Machine Learning, you should always have at least one entry in the validation set.
Note
The energy and forces for a given job must belong to the same data set.
Example: Both the energy and forces for uff_md_frame001 are in the validation set. It is not allowed to split these so that, for example, the energy is in the training set and the forces are in the validation set.
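If you prepare data with your own scripts, one way to catch this mistake is to group entries by job and flag any job that appears in more than one data set. The sketch below assumes a hypothetical (job, quantity, data set) tuple representation, and uff_md_frame002 is a made-up job name; ParAMS stores its data sets in its own format.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical entry format: (job name, quantity, data set name).
Entry = Tuple[str, str, str]


def find_split_jobs(entries: List[Entry]) -> Dict[str, Set[str]]:
    """Return the jobs whose entries are spread over more than one data set."""
    sets_per_job: Dict[str, Set[str]] = defaultdict(set)
    for job, _quantity, data_set in entries:
        sets_per_job[job].add(data_set)
    return {job: sets for job, sets in sets_per_job.items() if len(sets) > 1}


entries = [
    ("uff_md_frame001", "energy", "validation"),
    ("uff_md_frame001", "forces", "validation"),  # OK: same set as the energy
    ("uff_md_frame002", "energy", "training"),
    ("uff_md_frame002", "forces", "validation"),  # not allowed: split across sets
]
print(find_split_jobs(entries))
```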
4.1.3. Machine learning basic settings
The bottom left panel shows the machine learning settings.
Max epochs is the maximum number of epochs for the training.
Committee size specifies how many independent ML models to train. The average of all models is taken as the final predicted energy/forces. Setting this to a number larger than 1 significantly increases the computational cost and memory requirements, but gives you a measure of the uncertainty of the predicted values.
Load model lets you load a previous model from a ParAMS results directory. In this way, you can continue training from previous results, either for more epochs or with extra training data.
Backend: What type of machine learning model to use. Here we use M3GNet. Note that you must first install M3GNet separately from the AMS package manager (SCM → Packages).
Learning rate: The learning rate, that is, the step size the optimizer uses when updating the model parameters.
For the M3GNet backend, you can select two different Models:
Custom: this trains a new neural network completely from scratch. You can decide the architecture for this neural network, for example the number of neurons per layer and the number of convolution blocks. Note that training a good model with Custom typically requires a large amount of training data.
UniversalPotential: this starts the training from the M3GNet-UP-2022 universal potential. This typically makes training much faster. You can choose which layers to retrain, but not change any other parameters.
In this tutorial, Max epochs is set to 150.
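The committee averaging described under Committee size can be sketched in a few lines of Python. This is a minimal sketch, not ParAMS code: the function name is hypothetical, the energies are made-up numbers, and using the population standard deviation of the committee members as the uncertainty is one common choice that we assume here.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def committee_prediction(predictions: List[float]) -> Tuple[float, float]:
    """Average the committee members' predictions; their spread (population
    standard deviation) serves as a rough uncertainty estimate."""
    return mean(predictions), pstdev(predictions)


# Hypothetical energies (eV) predicted by a committee of 3 models for one structure
energies = [-1.02, -0.98, -1.00]
avg, spread = committee_prediction(energies)
print(f"E = {avg:.3f} eV +/- {spread:.3f} eV")
```

A large spread signals that the committee members disagree, which typically happens for structures unlike those in the training set.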
4.1.4. Other machine learning settings
For task Machine Learning, the weights of the different training set entries can be set both in the “W” column in the top right table (on the training set and validation set panels) and on the Options → Loss function settings panel.
The W column lets you set different weights for different individual entries. On the Options → Loss function panel, you can instead set the relative weights for energy and forces.
If you find that the energy is not trained accurately enough, you may consider increasing the Energy coefficient on the Options → Loss function panel.
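To see why raising the Energy coefficient helps, consider a loss that sums weighted squared energy and force errors. The sketch below is an assumed, illustrative form and not ParAMS's exact loss function: weights plays the role of the “W” column, while energy_coeff and force_coeff are hypothetical names for the relative energy and force weights on the Options → Loss function panel.

```python
from typing import Sequence


def weighted_loss(
    energy_errors: Sequence[float],            # predicted - reference energy, per entry
    force_errors: Sequence[Sequence[float]],   # force-component errors, per entry
    weights: Sequence[float],                  # per-entry weights (like the "W" column)
    energy_coeff: float = 1.0,                 # hypothetical relative energy weight
    force_coeff: float = 1.0,                  # hypothetical relative force weight
) -> float:
    """Weighted mean of (energy_coeff * dE^2 + force_coeff * force MSE) over entries.
    Illustrative only; not the loss ParAMS actually implements."""
    total = 0.0
    for w, de, df in zip(weights, energy_errors, force_errors):
        force_mse = sum(f * f for f in df) / len(df)
        total += w * (energy_coeff * de * de + force_coeff * force_mse)
    return total / sum(weights)


e_err = [0.1, 0.2]
f_err = [[0.1, -0.1, 0.0], [0.0, 0.2, 0.0]]
print(weighted_loss(e_err, f_err, [1.0, 1.0]))
print(weighted_loss(e_err, f_err, [1.0, 1.0], energy_coeff=10.0))
```

Increasing energy_coeff makes energy errors a larger share of the loss, so the optimizer spends more of its effort reducing them.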
This panel lets you set a target value for the MAE on the forces: if the force MAE is smaller than this threshold for both the training and validation sets, the training stops. This can help to
prevent overfitting, and
stop the training once the model has acceptable accuracy, instead of continuing to train a model that is already good enough.
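The stopping rule can be sketched as a training loop that checks both force MAEs after every epoch. This is a minimal sketch under stated assumptions: run_epoch is a hypothetical callback standing in for one epoch of training followed by evaluation, and the MAE values in the demo are made up.

```python
from typing import Callable, Tuple


def train_with_target(
    run_epoch: Callable[[int], Tuple[float, float]],  # hypothetical: returns (train MAE, val MAE)
    max_epochs: int,
    target_force_mae: float,
) -> int:
    """Stop as soon as BOTH the training and validation force MAE are below the
    target; return the number of epochs actually run."""
    for epoch in range(1, max_epochs + 1):
        train_mae, val_mae = run_epoch(epoch)
        if train_mae < target_force_mae and val_mae < target_force_mae:
            return epoch
    return max_epochs


# Made-up per-epoch force MAEs (eV/Å) for training and validation
history = [(0.050, 0.060), (0.020, 0.025), (0.009, 0.012), (0.008, 0.009), (0.007, 0.008)]
epochs_run = train_with_target(lambda epoch: history[epoch - 1], max_epochs=150, target_force_mae=0.01)
print(epochs_run)  # 4
```

Epoch 3 does not stop the training because only the training MAE is below the target; both conditions are met at epoch 4.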
4.1.5. Run the M3GNet training
Save the project as m3gnet_tutorial.params and run the training.
Wait for the job to finish. It may take a few minutes.
4.1.6. View the M3GNet training results
Here, you can see graphs of the loss function and other statistics vs. epoch.
You may find that the training ended in fewer than the 150 maximum epochs. This is because both the training and validation MAE on the forces became smaller than 0.01 eV/Å (the Target value that was specified).
At the end of the training, you also get a scatter plot with predicted vs. reference forces and energies. You can switch between Best training and Best validation to see the training and validation performance.
Note
For Task MachineLearning, the scatter plot only appears when the training has finished.
4.1.7. Use the retrained model for production calculations
This shows the part of the AMS text input that you need to use the retrained model in a production calculation. In particular, it shows the ParameterDir containing the trained parameters.
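If you set up many production calculations from scripts, a small helper can generate the engine block for a given ParameterDir. This is a hypothetical sketch: the function name and the example path are made up, the key names follow the MLPotential engine with Model Custom as described in this tutorial (plus a Backend line that we assume matches the training backend), and it omits the unit settings that ParAMS also shows. Always compare the result with the input shown on the ParAMS results panel.

```python
from pathlib import Path


def mlpotential_engine_block(parameter_dir: str) -> str:
    """Build an AMS text-input engine block for a custom-trained M3GNet model.
    Keys are assumed; check them against the input ParAMS displays."""
    lines = [
        "Engine MLPotential",
        "  Backend M3GNet",
        "  Model Custom",
        f"  ParameterDir {Path(parameter_dir)}",
        "EndEngine",
    ]
    return "\n".join(lines)


print(mlpotential_engine_block("/home/user/ml_models/argon_m3gnet"))  # hypothetical path
```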
Tip
Copy the ParameterDir directory to a location that is easier to access if you want to reuse it many times.
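Copying the directory can also be scripted with the Python standard library. This is a minimal sketch: the function name is hypothetical, and the paths in the commented call are placeholders you replace with the ParameterDir shown in the ParAMS results and a folder of your choice.

```python
import shutil
from pathlib import Path


def archive_parameter_dir(parameter_dir: str, destination: str) -> Path:
    """Copy a trained-model ParameterDir to a stable location and return the new path."""
    src = Path(parameter_dir).expanduser()
    dst = Path(destination).expanduser()
    if not src.is_dir():
        raise FileNotFoundError(f"ParameterDir not found: {src}")
    # dirs_exist_ok=True (Python 3.8+) lets you refresh an existing copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Placeholder paths (hypothetical):
# archive_parameter_dir("/path/to/ParameterDir", "~/ml_models/argon_m3gnet")
```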
There are two ways to import these engine settings into AMSinput:
4.1.7.1. Method 1: Open optimized engine in AMSinput
This opens a new AMSinput window with the MLpotential engine selected.
This gives you a warning to double-check the input. It should automatically switch to the MLpotential engine with Model Custom. If you click the button next to Custom, you will additionally see the path to the ParameterDir and the units used.
4.1.7.2. Method 2: Copy-paste into AMSinput
Copy the Engine MLPotential … EndEngine block and paste it into AMSinput. This gives you a warning to double-check the input. It should automatically switch to the MLpotential engine with Model Custom. If you click the button next to Custom, you will additionally see the path to the ParameterDir and the units used.