5.3. Refitting HF Charges with the ACKS2 Model

This example demonstrates the use of the ParAMS Main Script and will assume that the reader is already familiar with this section. Following the paper Atom-condensed Kohn-Sham DFT approximated to second order (ACKS2) by Verstraelen et al., we will show how to refit the ReaxFF charges model, specifically for the atoms \(\mathrm{H}\) and \(\mathrm{F}\).

Looking inside the $AMSHOME/scripting/scm/params/examples/HF_ACKS2 directory, you should find the following structure:

HF_ACKS2
├── inputs
│   ├── CHONSMgPNaFBLi-e.ff
│   └── reference.dat
├── params.conf.py
└── run.py

Inside the inputs folder is the force field that will be parameterized. The file reference.dat is holding the reference data that we will be fitting to: Partial charge at the Hydrogen as a function of the H-F distance. The reference data has been extracted from the publication’s Fig. 3a.

5.3.1. The Config File

As explained in the main script section, all further settings regarding a parameterization are saved in the params.conf.py file. Taking a look at the first lines, we will see the following:

### REQUIRED: Construct an instance of an interface to one of the different
### parameterized methods.
interface = ReaxParams('inputs/CHONSMgPNaFBLi-e.ff')

# We want to parameterize only the EEM model for H and F
active_params = [
    'F:gamma_i;;24;;gammaEEM, EEM shielding',
    'F:chi_i;;24,25;;EEM electronegativity',
    'F:eta_i;;24,25;;EEM hardness',
    'F:C_i;;25;;Atomic softness cutoff parameter',
    'H:gamma_i;;24;;gammaEEM, EEM shielding',
    'H:chi_i;;24,25;;EEM electronegativity',
    'H:eta_i;;24,25;;EEM hardness',
    'H:C_i;;25;;Atomic softness cutoff parameter',
    'X_soft;;25;;ACKS2 softness parameter',
]
interface.is_active = len(interface)*[False] # Switch all parameters off
for name in active_params:                   # Switch the names above back on
  interface[name].is_active = True

While interface variable is simply pointing to the force field file, note the following lines. From the name of the force field it is clear that there are many more atomic parameters present than just the ones for Hydrogen and Fluorine. In this case, however, we explicitly want to fit only the ones that are related to the two atoms, and more specifically to the ACKS2 model. We therefore only select the parameters that are included in the active_params variable (for naming conventions, see the ReaxFF section). The last three lines are making use of the parameter interfaces API to conveniently switch relevant parameters on, and the remaining ones off (for more information about the functionality of the parameter interfaces, see the respective Parameter Interfaces section).

5.3.2. Preparing the YAML Files from Input

As noted in the main script section, the only additional files needed for the optimization are the two YAML files jobcollection.yaml and trainingset.yaml.

Execute the following in a terminal:

>>> ./run.py prep
Preparing files from inputs/*
Storing 'outputs/jobcollection.yaml'
Storing 'outputs/trainingset.yaml'
Preparations done!

You can now run 'params optimize'.

We encourage you to open and inspect the created YAML files in a text editor.

5.3.3. Running the Optimization

We have now generated a Job Collection and Data Set from the reference.dat in our inputs. Typing params optimize will start the optimization:

>>> params optimize
Reading ParAMS config ...
eRaxFF, will enable related parameters.
Number of active parameters to optimize: 9
Loading job collection from outputs/jobcollection.yaml ... done.
Loading data set from outputs/trainingset.yaml ... done.
No reference engines used.
Checking project for consistency ...
No issues found.
Starting parameter optimization.
Initial f(x)=2.952e+00
Plotting functionality not guaranteed: Matplotlib appears to be out of date. Consider upgrading.
Initial f(x) = 1.754e-01
At CMA Iteration: 0001. Best f(x)=1.507e-01.
At CMA Iteration: 0002. Best f(x)=1.056e-01.
At CMA Iteration: 0003. Best f(x)=1.056e-01.
...
Callback: MaxIter
Fitness function value is f(x) = 8.177e-02 after 0:02:01
Parameter optimization successful!

The optimization should finish after about two minutes with the above messages, the folder structure should now be as follows:

HF_ACKS2
├── inputs
│   ├── CHONSMgPNaFBLi-e.ff
│   └── reference.dat
├── optimization  << Created by `params optimize`
│   ├── trainingset_best_params
│   ├── trainingset_history.dat
│   ├── cmadata
│   ├── plams_workdir
│   └── data
├── outputs  << Created by `./run.py prep`
│   ├── trainingset.yaml
│   └── jobcollection.yaml
├── params.conf.py
└── run.py

An optimized force field has been saved in optimization/trainingset_best_params. Before comparing the old and new force fields, we will take a closer look into the optimization directory.

We can check the course of the parameterization by plotting the trainingset_history.dat Inside the optimization/ directory call params plot . --yscale log to plot the loss function value as a function of the evaluation number:

/scm-uploads/doc.2020/params/_images/example_HF_history.png

The plot shows the fitness function value for every parameter set that has been considered by the optimizer as a function of the evaluation number.


*.dat files in the optimization/data/trainingset_residuals directory contain the residuals vector \((w/\sigma)(y-\hat{y})\) for every improved parameter set. In this example, \(y\) and \(\hat{y}\) are our reference and predicted charges at the Hydrogen atom. A histogram of every residuals file can also be created with params plot. Generally, the closer a distribution of residuals is to zero, the better the parameters. Residuals for the initial force field (before the start of the optimization) are saved under data/initial_trainingset_residuals.


Finally, the optimization/data/trainingset_contributions directory contains the contributions of every data set entry to the the overall loss function value. This makes it easy to detect possible outliers. A plotted version of any contributions file looks as follows:

/scm-uploads/doc.2020/params/_images/example_HF_contr.png

The per entry contributions of the initial force field parameters (before the start of the optimization) are saved in initial_trainingset_contributions.

5.3.4. Comparing the new parameters

A final comparison between old and new parameters will likely be different for every application. In this example, we provide a simple evaluation with (make sure the matplotlib module up to date):

>>> ./run.py plot
Running jobs ...
Plotting ...
Done.

This part of the script will run all structures contained in inputs/reference.dat with both, the original and optimized force fields and plot the Hydrogen partial charges as a function of the interatomic distance. The plot can be found in outputs/results.png and should look similar to the following figure.

/scm-uploads/doc.2020/params/_images/example_HF_results.png

Running ./run.py clean will revert the directory to it’s original state, deleting all created files.

5.3.5. Playing with the example

Below are suggestions on how the existing example can be easily modified to produce different results:

Take a look at the callbacks section in the params.conf.py:

### OPTIONAL: List of callbacks to be called during the optimization.
### See the documentation for more details about callbacks.
callbacks = [
    MaxIter(300),                  # Stop the optimization after N evaluations
    # Timeout(1*60*60),            # Stop the optimization after N seconds, returning the best parameters so far
    # EarlyStopping(patience=150), # Stop the optimization if no improvement in the validation set aftert N evaluations
]

The example is limited to 300 evaluations. Set it to a higher number (or set a Timeout callback instead) to see if the resulting parameters can be improved further.


Take a look at the optimization_kwargs in the params.conf.py:

### See the documentation for more details about Optimization() keyword arguments.
optimization_kwargs = {
    # 'loss'                : 'sse',   # The loss function to be evaluated.
    # 'validation'          : None,     # Percentage of the training set to be used as validation, or another DataSet() instance
    # 'batch_size'          : None,     # At every iteration, only compute a maximum of `batch_size` properties
    # 'maxjobs'             : None,     # At every iteration, only run `maxjobs` random jobs and compute their properties
    # 'maxjobs_shuffle'     : False,    # If `maxjobs`, randomly pick a new subset at every iteration if `True`.
    # 'use_pipe'            : True,     # Use the AMSPipe interface where possible
    # 'n_cores'             : None,     # Use N CPU cores for the execution of jobs during an optimization. Defaults to the number of physical cores
}

To allow a validation set, uncomment the respective line (also see the Optimization sections for more information about the keyword arguments).