7.10.2. Optimization¶
The Optimization class is where the other components – Job Collection, Data Set, Parameter Interfaces and optimizer – come together.
It is responsible for the selection, generation, execution and evaluation of new jobs for every new parameter set.
See also
Architecture Quick Reference for an overview
An Optimization instance is usually initialized once every other component has been defined:
>>> interface = ReaxFFParameters('path/to/ffield.ff')
>>> jc = JobCollection('path/to/jobcol.yml')
>>> training_set = DataSet('path/to/data_set.yml')
>>> optimizer = CMAOptimizer(popsize=15)
>>> optimization = Optimization(jc, training_set, interface, optimizer)
Once initialized, the following will run a complete optimization:
>>> optimization.optimize()
After instantiation, a summary of all relevant settings can be printed with summary():
>>> optimization.summary()
Optimization() Instance Settings:
=================================
Workdir: opt
JobCollection size: 20
Interface: ReaxFFParameters
Active parameters: 207
Optimizer: CMAOptimizer
Evaluators:
-----------
Name: training_set (_LossEvaluator)
Loss: SSE
Evaluation interval: 1
Data Set entries: 20
Data Set jobs: 20
Batch size: None
CPU cores: 6
Use PIPE: True
---
===
7.10.2.1. Optimization Setup¶
The optimization can be further controlled by providing a number of optional keyword arguments to the Optimization instance.
While the full list of arguments is documented in the API section below, the most relevant ones are presented here.
- parallel
An instance of the ParallelLevels class describing how the optimization is to be parallelized.
- constraints
Constraints further restrict the parameter search space; every candidate solution is checked for consistency with them.
- validation
Percentage of the training_set entries to be used for validation.
- loss
The loss function to be used for this optimization instance.
- batch_size
Instead of evaluating all properties in the training_set, evaluate at most batch_size randomly selected entries per iteration.
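The effect of batch_size can be illustrated with a small plain-Python sketch (this is not the ParAMS implementation; pick_batch, the seed handling, and the entry names are purely illustrative):

```python
import random

def pick_batch(entries, batch_size=None, seed=0):
    """Illustrative only: return at most batch_size randomly chosen
    entries; with batch_size=None every entry is evaluated."""
    if batch_size is None or batch_size >= len(entries):
        return list(entries)
    rng = random.Random(seed)
    return rng.sample(entries, batch_size)

entries = [f"entry_{i}" for i in range(20)]
subset = pick_batch(entries, batch_size=5)
print(len(subset))  # 5 of the 20 entries are evaluated this iteration
```

Drawing a fresh random subset each iteration keeps individual evaluations cheap while still, on average, covering the whole training set.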
7.10.2.2. Optimization API¶
- class Optimization(job_collection: scm.params.core.jobcollection.JobCollection, data_sets: Union[scm.params.core.dataset.DataSet, Sequence[scm.params.core.dataset.DataSet]], parameter_interface: scm.params.parameterinterfaces.base.BaseParameters, optimizer: Optional[Union[scm.glompo.optimizers.baseoptimizer.BaseOptimizer, scm.glompo.opt_selectors.baseselector.BaseSelector]] = None, workdir: str = 'optimization', plams_workdir_path: Optional[str] = None, validation: Optional[float] = None, constraints: Optional[Sequence[scm.params.parameterinterfaces.base.Constraint]] = None, parallel: Optional[scm.params.common.parallellevels.ParallelLevels] = None, verbose: bool = True, skip_x0: bool = False, logger_every: Optional[Union[dict, int]] = None, loss: Union[scm.params.core.lossfunctions.Loss, Sequence[scm.params.core.lossfunctions.Loss]] = 'sse', batch_size: Optional[Union[int, Sequence[int]]] = None, use_pipe: Union[bool, Sequence[bool]] = True, data_set_names: Optional[Sequence[str]] = None, eval_every: Union[int, Sequence[int]] = 1, maxjobs: Union[None, Sequence[int]] = None, maxjobs_shuffle: Union[bool, Sequence[bool]] = False, resume_checkpoint: Optional[Union[str, pathlib.Path]] = None, **glompo_kwargs)¶
Brings ParAMS components together and allows for configuration of the optimization manager.
For compatibility, the signature remains the same.
The meaning of most parameters is also unchanged; only those that have changed are documented below.
- Parameters
- optimizer
Accepts a single optimizer as before, but now also accepts a GloMPO BaseSelector collection of optimizers if you would like to use more than one.
Note
The worker argument of the optimizers will be overwritten by the product of all values in parallel except for optimizations.
Important
Since multiple optimizers can be started in parallel the instance given to this argument will only be used as a template from which other instances will be made. This means the instance given here will not be used for optimization. Keep this in mind if you intend to retain a reference to the optimizer instance for later post-processing.
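The template behaviour described above can be mimicked in plain Python (a sketch under stated assumptions, not ParAMS code; Optimizer and Manager here are hypothetical stand-ins):

```python
import copy

class Optimizer:
    """Hypothetical optimizer that holds a result after running."""
    def __init__(self, popsize=15):
        self.popsize = popsize
        self.result = None

class Manager:
    """Spawns fresh deep copies of a template optimizer, so the
    instance supplied by the user is never used for optimization."""
    def __init__(self, template):
        self.template = template

    def spawn(self):
        return copy.deepcopy(self.template)

template = Optimizer(popsize=15)
worker = Manager(template).spawn()
worker.result = 42.0
print(template.result)  # None: the user's original instance never ran
```

Any reference you keep to the original optimizer instance therefore never sees the results of the run.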
- title
The working directory for this optimization. Once optimize() is called, will NOT switch to it (see glompo_kwargs).
- verbose
Activates GloMPO's logging progress feedback.
- glompo_kwargs
GloMPO-related arguments sent to GloMPOManager.setup().
The following extra keywords are allowed:
'scaler'
Extra keyword which specifies the type of scaling used by the function. Defaults to a linear scaling of all parameters between 0 and 1 if none of the used optimizers requests a particular scaling. An error will be raised if there is a conflict between any combination of this keyword and those mandated by the optimizers.
The following keywords will be ignored if provided:
'opt_selector'
Constructed from optimizer.
'bounds'
Automatically extracted from parameter_interface.
'task'
It is constructed within this class from job_collection, data_set, parameter_interface.
'working_dir'
title will be used as this parameter.
'overwrite_existing'
No overwriting allowed according to ParAMS behavior. title will be incremented until a non-existent directory is found.
'max_jobs'
Will be calculated from parallel.
'backend'
Only 'threads' is allowed within ParAMS.
'is_log_detailed'
This must be True for the sake of ParAMS internals.
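The default linear scaling mentioned under 'scaler' maps each parameter from its bounds onto [0, 1]. A minimal sketch of such a transform, assuming simple per-parameter bounds (not the GloMPO implementation):

```python
def to_unit(x, lo, hi):
    """Linearly scale x from [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

def from_unit(u, lo, hi):
    """Inverse transform: map u in [0, 1] back onto [lo, hi]."""
    return lo + u * (hi - lo)

print(to_unit(2.5, 0.0, 10.0))    # 0.25
print(from_unit(0.25, 0.0, 10.0)) # 2.5
```

Working in a unit hypercube lets an optimizer treat all parameters on a comparable scale regardless of their physical bounds.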
- __init__(job_collection: scm.params.core.jobcollection.JobCollection, data_sets: Union[scm.params.core.dataset.DataSet, Sequence[scm.params.core.dataset.DataSet]], parameter_interface: scm.params.parameterinterfaces.base.BaseParameters, optimizer: Optional[Union[scm.glompo.optimizers.baseoptimizer.BaseOptimizer, scm.glompo.opt_selectors.baseselector.BaseSelector]] = None, workdir: str = 'optimization', plams_workdir_path: Optional[str] = None, validation: Optional[float] = None, constraints: Optional[Sequence[scm.params.parameterinterfaces.base.Constraint]] = None, parallel: Optional[scm.params.common.parallellevels.ParallelLevels] = None, verbose: bool = True, skip_x0: bool = False, logger_every: Optional[Union[dict, int]] = None, loss: Union[scm.params.core.lossfunctions.Loss, Sequence[scm.params.core.lossfunctions.Loss]] = 'sse', batch_size: Optional[Union[int, Sequence[int]]] = None, use_pipe: Union[bool, Sequence[bool]] = True, data_set_names: Optional[Sequence[str]] = None, eval_every: Union[int, Sequence[int]] = 1, maxjobs: Union[None, Sequence[int]] = None, maxjobs_shuffle: Union[bool, Sequence[bool]] = False, resume_checkpoint: Optional[Union[str, pathlib.Path]] = None, **glompo_kwargs)¶
Initialize self. See help(type(self)) for accurate signature.
- classmethod read(input_text: Union[str, scm_libbase_internal.InputFile, pathlib.Path], **kwargs) → scm.params.core.parameteroptimization.Optimization¶
Create an optimization instance by reading an AMS-style input file.
- optimize() → scm.glompo.optimizers.baseoptimizer.MinimizeResult¶
Start the optimization given the initial parameters.
- initial_eval() → float¶
Evaluate x0 before the optimization.
- Returns
- float
Error value using parameters as loaded from the parameter interface.
- Raises
- ValueError
If fx is a non-finite value.
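The non-finite guard can be sketched in a few lines (illustrative only; check_fx is a hypothetical helper, not part of the ParAMS API):

```python
import math

def check_fx(fx: float) -> float:
    """Raise ValueError when the initial loss is NaN or infinite,
    mirroring the documented behaviour of initial_eval()."""
    if not math.isfinite(fx):
        raise ValueError(f"Non-finite loss for x0: {fx}")
    return fx

print(check_fx(1.5))  # 1.5
```

A non-finite x0 loss typically indicates failed jobs or starting parameters outside their valid range.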
- summary(file=None)¶
Prints a summary of the current instance.
- __str__()¶
Return str(self).
- delete()¶
Remove the working directory from disk.
- _relog_bests(task: scm.params.core.opt_components._Step)¶
Evaluate the saved best points for the points being restarted and the overall best, and use them to prime new Loggers. This ensures the correct ‘best’ value is returned even if that evaluation does not appear in the ‘running_*.txt’ files.