4.9. Optimization¶
The Optimization class is where the other components – Job Collection, Data Set, Parameter Interfaces and Optimizers – come together. It is responsible for the selection, generation, execution, and evaluation of new jobs for every new parameter set.
See also
Architecture Quick Reference for an overview
An Optimization instance will usually be initialized once every other component is defined:
>>> from scm.params import *  # assuming the top-level scm.params package exports these classes
>>> interface = ReaxFFParameters('path/to/ffield.ff')
>>> jc = JobCollection('path/to/jobcol.yml')
>>> training_set = DataSet('path/to/data_set.yml')
>>> optimizer = CMAOptimizer(popsize=15)
>>> optimization = Optimization(jc, training_set, interface, optimizer)
Once initialized, the following will run a complete optimization:
>>> optimization.optimize()
After instantiation, a summary of all relevant settings can be printed with summary():
>>> optimization.summary()
Optimization() Instance Settings:
=================================
Workdir: opt
JobCollection size: 20
Interface: ReaxFFParameters
Active parameters: 207
Optimizer: CMAOptimizer
Callbacks: Timeout
Logger
Evaluators:
-----------
Name: training_set (_LossEvaluator)
Loss: SSE
Evaluation frequency: 1
Data Set entries: 20
Data Set jobs: 20
Batch size: None
CPU cores: 6
Use PIPE: True
---
===
4.9.1. Optimization Setup¶
The optimization can be further controlled by providing a number of optional keyword arguments to the Optimization instance. While the full list of arguments is documented in the API section below, the most relevant ones are presented here, followed by a short sketch after the list.
- parallel
- An instance of the ParallelLevels class describing how the optimization is to be parallelized.
- constraints
- Constraints additionally restrict the parameter search space by checking that every candidate solution is consistent with their definitions.
- callbacks
- A list of callback instances. Callbacks provide a versatile way to interact with the optimization process at every iteration.
- validation
- Percentage of the training_set entries to be used for validation. Can be used with the Early Stopping callback.
- loss
- The loss function to be used for this optimization instance.
- batch_size
- Instead of evaluating all properties in the training_set, evaluate at most batch_size randomly picked entries per iteration.
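As a minimal sketch, the keyword arguments above could be combined as follows. The ParallelLevels and Timeout arguments shown here are assumptions about those classes' signatures, so consult their documentation before use:
>>> parallel = ParallelLevels(parametervectors=4)  # assumed argument: evaluate 4 parameter vectors in parallel
>>> callbacks = [Timeout(24*60*60)]                # assumed signature: stop after 24 hours
>>> optimization = Optimization(jc, training_set, interface, optimizer,
...                             parallel=parallel,
...                             callbacks=callbacks,
...                             validation=0.1,    # hold out 10% of the training set entries
...                             loss='sse',        # sum of squared errors (the default)
...                             batch_size=10)     # evaluate at most 10 random entries per iteration
>>> optimization.optimize()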
4.9.2. Optimization API¶
class Optimization¶
The top-level class managing an entire optimization.

__init__(job_collection: scm.params.core.jobcollection.JobCollection, data_sets: Union[scm.params.core.dataset.DataSet, Sequence[scm.params.core.dataset.DataSet]], parameter_interface: Type[scm.params.parameterinterfaces.base.BaseParameters], optimizer: Type[scm.params.optimizers.base.BaseOptimizer], workdir: str = 'opt', plams_workdir_path: str = None, validation: float = None, callbacks: Sequence[scm.params.core.callbacks.Callback] = None, constraints: Sequence[scm.params.parameterinterfaces.base.Constraint] = None, parallel: scm.params.common.parallellevels.ParallelLevels = None, verbose: bool = True, skip_x0: bool = False, logger_every: Union[dict, int] = None, loss: Union[scm.params.core.lossfunctions.Loss, Sequence[scm.params.core.lossfunctions.Loss]] = 'sse', batch_size: Union[int, Sequence[int]] = None, use_pipe: Union[bool, Sequence[bool]] = True, data_set_names: Sequence[str] = None, eval_every: Union[int, Sequence[int]] = 1, maxjobs: Union[None, Sequence[int]] = None, maxjobs_shuffle: Union[bool, Sequence[bool]] = False)¶
Parameters:
- job_collection : JobCollection
- Job collection holding all jobs necessary to evaluate the data_sets.
- data_sets : DataSet, list(DataSet)
- Data Set(s) to be evaluated. In the simplest case, a single data set will be evaluated as the training set. Multiple data sets can be passed to be evaluated sequentially at every optimizer step; in that case, the first data set will be interpreted as the training set and the second as a validation set.
- parameter_interface : any parameter interface
- The interface to the parameters that are to be optimized.
- optimizer : optimizer class
- An instance of an optimizer class.
- workdir : optional, str
- The working directory for this optimization.
Once optimize() is called, the process will switch to this directory.
- plams_workdir_path : optional, str
- The folder in which the PLAMS working directory is created. By default, the PLAMS working directory is created inside $SCM_TMPDIR, or /tmp if the former is not defined. When running on a compute cluster, this variable can be set to a directory local to the machine where the jobs run, avoiding a potentially slow PLAMS working directory that is mounted over the network.
- validation : optional, float, int
- If the passed value is 0 < value < 1, a validation set will be created from that fraction of the first data set in data_sets. If the passed value is 1 < value < len(data_sets[0]), a validation set with that many entries will be created from the first data set in data_sets. If you would like to pass a DataSet instance instead, you can do so through the data_sets parameter.
- callbacks : optional, list of callback instances
- List of callbacks interacting with the optimization instance. A Logger callback will always be added if not already present in the list. See also the logger_every argument.
- constraints : optional, list of parameter constraints
- Additional constraints for candidate solutions \(\boldsymbol{x}^*\). If any of these return False, the solution will not be considered.
- parallel : optional, ParallelLevels
- Configuration for the parallelization at all levels of a parameter optimization.
- verbose : bool
- Print the current best loss function value each time it improves.
- skip_x0 : bool
- Before an optimization process starts, the DataSet will be evaluated with the initial parameters \(\boldsymbol{x}_0\). If this initial evaluation returns an infinite loss function value, an error will be raised by default. This behavior assumes that the initial parameters are generally valid, and that a non-finite loss is most likely caused by bad plams.Settings of an entry in the JobCollection. However, if it is not known whether the initial parameters can be trusted, or if raising an error is not desired for other reasons, this parameter can be set to True to skip the initial evaluation.
- logger_every : dict or int
- See every_n_iter in Logger. This option is ignored if a Logger is provided in the callbacks.
Per Data Set Parameters:
Note
The following parameters will be applied to all entries in data_sets, meaning each data set will be evaluated with the same settings. To override this, any of the parameters below can also take a list with the same number of elements as len(data_sets), mapping individual settings to every data_sets entry. A sketch of such a per data set setup is given after this parameter list.
- loss : optional, Loss, str
- A Loss Function instance to compute the loss of every new parameter set. Residual Sum of Squares by default.
- batch_size : optional, int
- The number of entries to be evaluated per epoch. If None, all entries will be evaluated.
Note: One job calculation can have multiple property entries in a training set (e.g., Energy and Forces); thus, this parameter is not the same as maxjobs.
Note: If both maxjobs and batch_size are set, the former will be applied first. If the resulting set is still larger than batch_size, it will be further reduced to batch_size entries.
- use_pipe : optional, bool
- Whether to use the AMSWorker interface for suitable jobs.
- data_set_names : optional, List[str]
- When evaluating multiple data_sets, can be set to give each entry a name.
Possible logger callbacks will create and write data into a subdirectory of that name.
Defaults to ['training_set', 'validation_set', 'data_set03', ..., 'data_setXX'].
- eval_every : optional, int
- Evaluate the data set at every eval_every call.
Warning
The first entry in data_sets represents the training set and must be evaluated at every call. Its frequency will always be 1.
- maxjobs : optional, int
- Limit each data set evaluation to a subset of at most maxjobs jobs. Ignored if None.
- maxjobs_shuffle : optional, bool
- If maxjobs is set, a new random subset of the data set with maxjobs jobs will be generated at every evaluation.
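To illustrate the per data set parameters, the following sketch evaluates a training and a validation set with individual settings per set. The file names are placeholders, and the 'rmse' loss string is an assumption about the available loss function names:
>>> ts = DataSet('path/to/training_set.yml')    # placeholder paths
>>> vs = DataSet('path/to/validation_set.yml')
>>> optimization = Optimization(jc, [ts, vs], interface, optimizer,
...                             loss=['sse', 'rmse'],  # per data set losses ('rmse' assumed available)
...                             eval_every=[1, 10],    # training set every call, validation set every 10th
...                             use_pipe=[True, False])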
optimize() → scm.params.optimizers.base.MinimizeResult¶
Start the optimization given the initial parameters.
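The returned MinimizeResult can then be inspected for the optimized parameters and final loss. The attribute names success, x, and fx below are assumptions based on common minimizer result conventions, not confirmed by this page:
>>> result = optimization.optimize()
>>> result.success  # assumed attribute: whether the optimization finished successfully
>>> result.x        # assumed attribute: best parameter vector found
>>> result.fx       # assumed attribute: loss value at result.x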
initial_eval()¶
Evaluate x0 before the optimization. Returns (fx, abort), where abort is a bool signifying whether to abort (i.e., whether a callback returned True).
summary(file=None)¶
Prints a summary of the current instance.
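If file follows the usual Python print(file=...) convention of accepting a writable stream (an assumption, as this page does not specify the type), the summary could be written to disk like so:
>>> with open('optimization_settings.txt', 'w') as f:
...     optimization.summary(file=f)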
__str__()¶
Return str(self).
delete()¶
Remove the working directory from disk.