5.3. Parallel optimizers

This tutorial will show

  • how to run multiple optimizers in parallel,

  • how to control optimizers for better resource management, and

  • how to run different algorithms (Nelder-Mead, CMA-ES) in the same optimization.

In a realistic, high-dimensional reparameterization scenario (like ReaxFF) you will commonly need to run multiple optimizations to find a production-quality force field. There are many reasons for this:

  1. You may want to start from several different parameter values.

  2. Even when starting from the same parameters, optimizers will rarely find the same minimum.

  3. You may want to run multiple optimizers to test the robustness of the minima you find, and compare how their losses and parameter values differ.

  4. The loss function is numerically difficult to optimize, so optimizers often get stuck at high loss values or are unable to find a good minimum.

In this example we will continue with the Getting Started: Lennard-Jones tutorial. It is simple, but we can demonstrate some optimizer misbehaviour and convergence to different minima if we set up the problem in a slightly more challenging way for the optimizers.

Before starting this tutorial, make sure you:

  • go through the Getting Started: Lennard-Jones tutorial, and

  • make a copy of the example directory $AMSHOME/scripting/scm/params/examples/LJ_Ar_multiopt.

5.3.1. Using multiple optimizers

Open the files:

Open the ParAMS GUI: SCM → ParAMS
File → Open the job_collection.yaml file in the example directory LJ_Ar_multiopt

This will automatically load all the input files.

The settings are as follows:

Click on OptimizationPanel to open the input panel
Set Starting points generator to Random.
Set Max loss function calls to 2500.
Set Max optimizers converged to 5.
Set Number of parallel optimizers to 5.

Starting points generator Random makes the optimizers start from random parameter values, some very far from the minimum. This makes the optimization much more challenging.

The optimization will exit if

  • 5 optimizers have converged, or

  • the optimizers have used a total of 2500 function evaluations.

5 optimizers will be run in parallel.
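
The two exit conditions are combined with a logical or. Below is a minimal Python sketch of this check; the function and argument names are purely illustrative and are not part of the ParAMS API:

def should_exit(total_function_calls, optimizers_converged,
                max_calls=2500, max_converged=5):
    # Stop the whole optimization as soon as either condition is met.
    return total_function_calls >= max_calls or optimizers_converged >= max_converged

print(should_exit(1200, 5))  # True: 5 optimizers have converged
print(should_exit(2500, 1))  # True: the evaluation budget is exhausted
print(should_exit(800, 2))   # False: keep running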

Click on the MoreBtn button next to Optimizers.
Set the Type to Scipy.
Set Algorithm to Nelder-Mead.

Run the optimization.

File → Run
When prompted to save, click Yes
Save the job as multiopt1.params

This will open AMSjobs. Switch back to the ParAMS GUI and click on the Graphs tab. After a short while you should start seeing your results come in.

Note

The number of parallel optimizers is the maximum number of optimizers that can be running at any time. It should be determined by your system resources. It is not necessarily the total number of optimizers that will be run. For that, see Add a spawning limit.

Towards the end of the output/logfile, you can see why the optimization finished:

[15.07|09:40:49] Exit conditions met:
                     MaxTotalFunctionCalls(fmax=2500) = False |
                     MaxOptimizersConverged(nconv=5) = True

Below is an example run we performed.

../../_images/LJ_Ar_multiopt_1.png

Fig. 5.2 Example development of multiple optimizers with random starts.

Your optimizers will behave a little differently, but hopefully you will see similar behavior.

  • Optimizer 2 started in a bad location and was barely able to improve on its starting value.

  • Optimizer 3 improved significantly on its starting value, but actually converged to the wrong minimum.

  • Optimizers 1, 4 and 5 all eventually found the correct minimum.

  • Optimizer 6, which started after optimizer 3 converged, started at quite a good loss value but was hardly able to improve on it, despite being far from the minimum.

  • Optimizers 7 and 8 were started very late and were not given time to develop before the exit condition of 5 converged optimizers was met and the optimization exited.

This simple example illustrates the importance of running multiple optimizations. There is no guarantee that a single optimizer (even if it looks like it has improved on its starting value) has actually found a good minimum. When more parameters are optimized this problem becomes even more pronounced.

5.3.1.1. Loss graphs with multiple optimizers

5.3.1.1.1. Global evaluation numbers

In Fig. 5.2, the loss value evaluated by each optimizer is plotted against a unique, ordered, global evaluation ID number. In other words, it is not the number of evaluations a single optimizer has used, but the total used by all optimizers up to that point.

This approach allows us to visualize how multiple optimizers are related in time. For example, we can see that optimizer 6 was started later in time than the first five.

The evaluation numbers are unique and appear consistently throughout the logs. For example, consider the log for optimizer 2 (found at results/optimization/optimizer_002/training_set_results/running_loss.txt):

#evaluation training_set_loss log10(training_set_loss) time_seconds
000003  3200228859.254910 10.260 1.10
000009  3131651984.274525 10.258 2.33
000018  2942423641.003031 10.203 3.68
000024  2454079400.461060 10.129 4.45

This does not mean that optimizer 2 has evaluated the loss function 24 times. It has been logged 4 times, corresponding to the 3rd, 9th, 18th and 24th overall calls. The loss function is logged whenever it becomes smaller, or by default every 10 local optimizer evaluations (this can be configured on the DataSet → LoggingInterval panel).
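
If you want to inspect such a log file outside the GUI, a short Python snippet along the following lines can read and plot it. This is only a sketch based on the column layout shown above (global evaluation number, loss, log10 of the loss, time in seconds); adjust the optimizer number in the path to your own run:

import numpy as np
import matplotlib.pyplot as plt

# Path as in the example above; lines starting with '#' are skipped by loadtxt.
fname = "results/optimization/optimizer_002/training_set_results/running_loss.txt"
data = np.loadtxt(fname)

evals, loss = data[:, 0], data[:, 1]  # global evaluation number, training set loss
plt.plot(evals, np.log10(loss), marker="o")
plt.xlabel("Global evaluation number")
plt.ylabel("log10(training set loss)")
plt.show()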

Note

The default logging system prints global evaluation numbers. If you need local evaluation numbers, use the HDF5 logging architecture.

5.3.1.1.2. Non-finite values

The gaps seen in the optimizer 2 trajectory represent non-finite results. These are encountered by optimizers that attempt parameter values which cause jobs to crash or produce non-physical results. They are also returned if constraints are violated.

5.3.1.1.3. Identifying optimizers

We have added optimizer numbers to the image above for ease of reference in the tutorial. To see the optimizer number yourself, mouse over the curve you are interested in. This will give you a pop-up of the form:

<(optimizer#)> loss: (eval#), (lossvalue)

If you would like to see just one optimizer, click on its curve and the others will disappear. Clicking on it again will show all the optimizers again.

To show a subset of optimizers:

On the loss graph click Data From
On the dropdown click Show Only Selected Optimizers…
In the popup enter a space separated list of optimizers you would like to see, e.g.: 2 4 7
Click OK.

This will hide all the optimizer trajectories except the ones you have chosen.

To show all the trajectories again:

On the loss graph click Data From
On the dropdown click Show All

5.3.2. Using stoppers

In Fig. 5.2 we can identify two inefficiencies:

  • Optimizers 2 and 3 are clearly stuck in bad minima. They appear converged, but at loss values which are much worse than those being simultaneously explored by optimizers 1, 4 and 5.

  • Optimizers 1, 4 and 5 end up converging to the same minimum. Multiple optimizers finding the exact same minimum is a waste of resources.

These problems can be solved with Optimizer Stoppers. Stoppers

  • are simple conditions which can be combined to form more complex stop criteria,

  • stop optimizers early if they are behaving poorly,

  • let you use computational resources more efficiently,

  • help you identify better minima within the same amount of time.

This managed parallel optimization approach was developed by Freitas Gustavo and Verstraelen (2021).

For now we will set up Stoppers which directly address the two inefficiencies we identified above.

Click OptimizationPanel to open the input panel if it is not already open
Options → Optimizer Stoppers
Click the AddButton icon which will add a new Stopper
Change Type to Current Function Value Unmoving
Set the Number of function calls to 10
Set the Tolerance to 0.05

The Current Function Value Unmoving Stopper will stop optimizers which are no longer significantly improving their function value and are exploring values which are worse than the best optimizer.

Click the AddButton icon to add a second Stopper
Change Type to Max Interoptimizer Distance
Set Max Relative Distance to 0.01

The Max Interoptimizer Distance Stopper will stop optimizers which are close together (approaching the same minimum for example).

In the Combine stoppers field enter 1 | 2

This means that a stop will be triggered if the conditions of Stopper #1 or Stopper #2 are met.

You can also combine Stoppers with and conditions using the & symbol, and they can be nested with parentheses.

For example, (1 | 2) & 3 means: Stop if Stopper #3 is true and either Stopper #1 or Stopper #2 is true.

Note

The or combination of all Stoppers is the default combination. We only entered it explicitly here to draw your attention to Stopper combinations. You could leave this unassigned in this example and achieve the same result.
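
To make the behaviour of the two Stoppers and the 1 | 2 combination concrete, here is a rough Python sketch. The function names, signatures, and the exact stopping criteria are illustrative guesses for this tutorial, not the ParAMS implementation:

import numpy as np

def value_unmoving(recent_losses, best_other_loss, n_calls=10, tol=0.05):
    # Stopper 1: the loss has barely changed over the last n_calls evaluations
    # while this optimizer is still doing worse than the best optimizer.
    if len(recent_losses) < n_calls:
        return False
    window = recent_losses[-n_calls:]
    rel_change = abs(window[-1] - window[0]) / max(abs(window[0]), 1e-12)
    return rel_change < tol and window[-1] > best_other_loss

def too_close(x_this, x_best, parameter_range, max_rel_dist=0.01):
    # Stopper 2: this optimizer is within a small relative distance of the
    # best optimizer in scaled parameter space.
    rel_dist = np.linalg.norm((np.asarray(x_this) - np.asarray(x_best)) / parameter_range)
    return rel_dist < max_rel_dist

# "1 | 2": the optimizer is stopped as soon as either condition is met.
def should_stop(stopper_1_met, stopper_2_met):
    return stopper_1_met or stopper_2_met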

Run the optimization.

File → Save As
Save the job as multiopt2.params
File → Run

Below is an example run we performed.

../../_images/LJ_Ar_multiopt_2.png

Fig. 5.3 Example development of multiple optimizers with random starts and stoppers.

In this figure you can see that the following optimizers were stopped:

  • flat optimizers which struggled to improve on their loss value (Current Function Value Unmoving)

  • optimizers approaching the same minimum as the best optimizer (Max Interoptimizer Distance)

Compare Fig. 5.3 and Fig. 5.2. The results of using Stoppers were:

  • many more optimizers were started within approximately the same number of function calls,

  • the global minimum was still found, and

  • the search was much more exploratory, so our time was used more efficiently.

On a harder optimization problem like ReaxFF, this managed approach may allow you to find more (and hopefully better) minima than a simple parallel approach.

5.3.2.1. Loss graphs with stoppers

To show why optimizers have been stopped, icons are appended to the end of their trajectories:

Table 5.1 Stop icons on loss graphs

  Icon      Description
  (image)   Optimizer converged naturally
  (image)   Optimizer stopped by a Stopper
  (image)   Job exit

To get more details about the stop, you can mouse over the icon at the end of its trajectory.

5.3.3. Add a spawning limit

In Fig. 5.2 and Fig. 5.3 you can see optimizers which were started near the end of the optimization and given very little time to iterate.

You may want to limit the number of optimizers you start to

  • prevent optimizers from starting (spawning) near the end of the job, or

  • get a specific number of results.

This can be done with spawning controls.

Click OptimizationPanel to open the input panel if it is not already open
Options → Optimizer Spawning
Under Stop spawning new optimizers after, set n loss function evaluations to 1500

This will stop new optimizers from starting after 1500 total function evaluations, but it will not affect optimizers which are already working. It is different from the Max Total Function Calls Exit Condition as follows:

                                            Spawning control               Exit condition

GUI Panel                                   Options → Optimizer Spawning   Main Optimization panel
Input file block                            ControlOptimizerSpawning       ExitCondition
Triggered at #evaluations (this tutorial)   1500                           2500
Affects currently running optimizers        No                             Yes (complete exit)
Causes the job to exit                      Only if all optimizers stop    Yes, always

We previously specified to run at most 5 optimizers in parallel. By applying spawning control, fewer than 5 optimizers may run in parallel after iteration 1500.
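
As a rough Python sketch of this difference (the names are illustrative, not the ParAMS API): the spawning limit only affects whether a new optimizer may be started, while optimizers that are already running keep going until they converge, are stopped, or an exit condition terminates the job.

def may_spawn_new_optimizer(total_evals, n_running, spawn_limit=1500, max_parallel=5):
    # A new optimizer may only be started while both limits are respected;
    # optimizers that are already running are left untouched.
    return total_evals < spawn_limit and n_running < max_parallel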

Save the job as multiopt3.params and run.

Our example run is below, where you can see that all optimizers were given enough time to develop before being stopped or converging.

../../_images/LJ_Ar_multiopt_3.png

Fig. 5.4 Example development of multiple optimizers with random starts, stoppers, and spawn control.

5.3.4. Experiment with different optimizers

So far we only used the Nelder-Mead algorithm. However, you may want to

  • compare how different optimizers perform on a problem, and

  • see how different hyperparameter settings change performance.

For this comparison we will remove the interoptimizer distance Stopper, since we would like to see which algorithm is better at finding the minimum.

Click OptimizationPanel to open the input panel if it is not already open
Options → Optimizer Stoppers
Click DeleteButton button for the Max Interoptimizer Distance Stopper
Right-click on the Combine stoppers field
Select Reset value to default

Next we will add a second optimizer to the pool of available optimizers that can be used during the optimization:

Click OptimizationPanel to open the input panel if it is not already open
Options → Optimizers
Click the AddButton button to add a new Optimizer
For Optimization algorithm #2, set Type to CMAES
Set σ₀ to 0.15
Set Pop size to 8

We now want to control when each optimizer type gets started. By default ParAMS will simply cycle through the set sequentially, which is usually a good choice.

For this tutorial, we will start several Nelder-Mead optimizers followed by several CMA-ES optimizers, and then compare which performs better.

Click OptimizationPanel to open the input panel if it is not already open
Options → Optimizer Spawning
Set Select optimizer by to Chain
Set Thresholds to 500

With this setup,

  • Nelder-Mead optimizers start before 500 global function evaluations

  • CMA-ES optimizers start after 500 global function evaluations
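
The Chain selection rule can be sketched in Python as follows. This is only an illustration of the idea (one threshold, two optimizer types); the names are not the ParAMS API:

def choose_optimizer_type(total_evals, thresholds=(500,), types=("Nelder-Mead", "CMA-ES")):
    # Walk along the chain: use the first type until the first threshold is
    # passed, the second type after that, and so on.
    for threshold, opt_type in zip(thresholds, types):
        if total_evals < threshold:
            return opt_type
    return types[-1]

print(choose_optimizer_type(120))  # Nelder-Mead
print(choose_optimizer_type(900))  # CMA-ES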

Save the job as multiopt4.params and run.

../../_images/LJ_Ar_multiopt_4.png

Fig. 5.5 Example development of multiple optimizers with random starts, stoppers, and different types of optimizers. All optimizers started before evaluation 500 are Nelder-Mead. All optimizers started thereafter are CMA-ES.

In our run we see that the CMA-ES optimizers are much more exploratory than Nelder-Mead and oscillate quite wildly, while Nelder-Mead attempts to go straight to a minimum. Of the 7 Nelder-Mead optimizers started here, only one found the minimum. Of the 8 CMA-ES optimizers started, 3 found the minimum. However, CMA-ES's exploratory nature can make it quite slow.

Tip

You can see what type an optimizer is in the results file by looking at: results/optimization/optimizer_xxx/opt_type.txt