4.12. Callbacks
Callbacks allow for further interaction with a running Optimization instance. A useful callback could, for example, signal the optimization to stop after a certain time, or when it starts to overfit.
Callback instances can be passed when an Optimization instance is created:
from scm.params import Optimization, Timeout, Logger

callbacks = [Timeout(60*60), Logger()]
my_optimization = Optimization(*args, callbacks=callbacks)
4.12.1. Default callbacks
The Logger and Stopfile callbacks are always present. To use custom settings for the Logger or Stopfile, add your own instances of those classes to the callbacks list.
4.12.2. Logger
class Logger(path=None, every_n_iter=None, group_by=None, iteration_digits=6, sort_stats_by='contribution')
Note
This callback is always included in an optimization.
Logs and saves the following data produced during an Optimization to disk, for every Data Set provided:
- best/*: Same as latest/* but for the iteration with the lowest loss function.
- history/*: Same as latest/* but stored every ‘history’ iterations.
- initial/*: Same as latest/* but for the initial parameter settings.
- latest/active_parameters.txt: iteration and list of current parameter values.
- latest/data_set_predictions.yaml: file in DataSet yaml format that can be loaded with a DataSetEvaluator.
- latest/engine.txt: Engine settings block for AMS.
- latest/evaluation.txt: Current evaluation number.
- latest/loss.txt: The loss function value.
- latest/parameter_interface.yaml: Parameters in yaml format.
- latest/pes_predictions/*.txt: Predictions for all entries with the pes extractor.
- latest/scatter_plots/*.txt: Current predictions per extractor or grouped as defined by the group_by attribute.
- latest/stats.txt: Current MAE and RMSE per extractor or grouped as defined by the group_by attribute.
- running_active_parameters.txt: The values of the (active) parameters, stored every ‘parameters’ iterations.
- running_loss.txt: iteration and loss function value. Plottable with params plot.
- running_stats.txt: MAE and RMSE per extractor or grouped as defined by the group_by attribute.
Depending on the parameter interface, also:
- latest/lj_parameters.txt: For Lennard-Jones optimizations, contains the LJ parameters
- latest/ffield.ff: For ReaxFF optimizations, contains the parameters in ffield format
- latest/xtb_files: For GFN1-xTB optimizations, contains the parameters in a format that can be read by AMS
The files are stored in a directory called dataset_name_results, e.g. training_set_results.
Note
The output to screen and to the files may not always be strictly increasing with respect to the evaluation number. This can happen if you run the parametrization in parallel and write the output frequently.
Parameters: - path : str
- Base name of the path where the data should be written. Defaults to the Optimization workdir.
- every_n_iter : int OR dict of dict
If an integer, corresponds to setting the ‘general’ option (see below) to that value for all data sets.
If a dictionary, the keys are data set names like ‘training_set’, ‘validation_set’, ‘data_set03’, etc.
Each key maps to another dictionary, e.g.
every_n_iter = {
    'training_set': {
        'general': 10,
        'parameters': 500,
        'history': 500,
        'flush': 10,
    }
}
where the numbers give how often to write the output:
- ‘general’: writing to stdout, running_loss.txt, running_stats.txt, and all latest/ and best/ files
- ‘parameters’: how often to write running_active_parameters.txt
- ‘history’: save a copy of “latest/” every N steps
- ‘flush’: how often to flush the output streams
- group_by : tuple of strings
A tuple in the same format as accepted by DataSetEvaluator.group_by().
Examples:
group_by=('Extractor', 'Expression')  # default
group_by=('Group', 'SubGroup')        # requires that the dataset entries have Group and SubGroup metadata
- iteration_digits : int
Number of digits used when printing the iteration number in file names and in latest/evaluation.txt, running_loss.txt, and running_stats.txt.
Example: iteration_digits == 6 will print the first iteration as 000001
Example: iteration_digits == None will print the first iteration as 1
- sort_stats_by : str
How to sort the entries of latest/stats.txt and related files:
‘contribution’: sort by contribution to loss function value
‘rmse’: sort by rmse
‘mae’: sort by mae
None: do not sort (use the same order as in the original DataSet)
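For example, a Logger with custom settings can be passed in the callbacks list. A minimal sketch (the every_n_iter values are illustrative, and the Group/SubGroup grouping assumes the corresponding metadata exists on the data set entries):

from scm.params import Logger

# Write most output every 10 evaluations, but store history snapshots
# and the running parameter log only every 500 evaluations.
logger = Logger(
    every_n_iter={
        'training_set': {
            'general': 10,
            'parameters': 500,
            'history': 500,
            'flush': 10,
        }
    },
    group_by=('Group', 'SubGroup'),
    sort_stats_by='rmse',
)
my_optimization = Optimization(*args, callbacks=[logger])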
4.12.3. Timeout
class Timeout(timeout_seconds, verbose=True)
Stop the optimization after timeout_seconds seconds. If verbose, prints a message when invoked.
4.12.4. Target Value
class TargetValue(min_fx, verbose=True)
Stop the optimization when the training set loss is less than or equal to min_fx. If verbose, prints a message when invoked.
4.12.5. Maximum Iterations
class MaxIter(max_iter, verbose=True)
Stop the optimization after max_iter evaluations. If verbose, prints a message when invoked.
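These stopping criteria can be combined freely; the optimization stops as soon as any callback signals it. A sketch with illustrative values:

from scm.params import Timeout, TargetValue, MaxIter

callbacks = [
    Timeout(12*60*60),   # stop after 12 hours ...
    TargetValue(1e-4),   # ... or once the training set loss drops to 1e-4
    MaxIter(50000),      # ... or after 50000 evaluations
]
my_optimization = Optimization(*args, callbacks=callbacks)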
4.12.6. Early Stopping
class EarlyStopping(patience=0, watch='training_set', verbose=True)
Stop the optimization if the loss of the data set defined in watch has not improved after patience iterations. If verbose, prints a message when invoked.
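When the data is split into a training and a validation set, watching the validation set loss is a common way to guard against overfitting. A sketch:

from scm.params import EarlyStopping

# Stop once the validation set loss has not improved for 100 evaluations
early = EarlyStopping(patience=100, watch='validation_set')
my_optimization = Optimization(*args, callbacks=[early])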
4.12.7. Stopfile
class Stopfile(frequency=1, fname='STOP', verbose=True)
Every frequency evaluations, check whether a file named fname exists and stop the optimization if it does. Note that relative paths are interpreted relative to the optimization directory.
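This allows a running optimization to be stopped cleanly from outside, e.g. by executing touch STOP in the optimization directory. A sketch with an illustrative custom file name:

from scm.params import Stopfile

# Check for a file named 'please_stop' every 10 evaluations
stopfile = Stopfile(frequency=10, fname='please_stop')
my_optimization = Optimization(*args, callbacks=[stopfile])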
4.12.8. Time per Evaluation
class TimePerEval(printfrequency=100, watch=None, moving_average=100)
Print the average evaluation time of a new parameter set x every printfrequency iterations.
Parameters: - printfrequency : int > 0
- Print the average evaluation time every n iterations
- watch : List[str]
- List of data set names to watch. Defaults to all data sets
- moving_average : int >= 0
- Only consider the last n evaluations for the average time
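For example, to report timings for the training set only (illustrative values; this assumes TimePerEval is importable from scm.params like the other callbacks):

from scm.params import TimePerEval

# Every 100 iterations, print the mean wall time of the last 50 training set evaluations
timer = TimePerEval(printfrequency=100, watch=['training_set'], moving_average=50)
my_optimization = Optimization(*args, callbacks=[timer])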
4.12.9. Load Average
class LoadAvg(fname='loads.dat', frequency=20)
Wrapper around psutil.getloadavg(), printing the output to fname. Requires psutil version >= 5.6.2.
Note that when using relative file paths, the location will be relative to the optimization directory.
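This can help correlate slow evaluations with overall machine load. A sketch (assuming LoadAvg is importable from scm.params like the other callbacks):

from scm.params import LoadAvg

# Append the system load averages to loads.dat every 50 evaluations
loadavg = LoadAvg(fname='loads.dat', frequency=50)
my_optimization = Optimization(*args, callbacks=[loadavg])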
4.12.10. User-Defined Callbacks
The abstract Callback class allows the user to define custom optimization hooks. We will demonstrate the implementation of EarlyStopping as an example below.
import numpy as np
from scm.params import Callback

class EarlyStopping(Callback):

    def __init__(self, patience=0, watch='training_set'):
        self.watch = watch
        self.patience = patience
        self.count = 0
        self.fxmin = float('inf')

    def __call__(self, evaluator_return):
        """
        Callbacks operate on **ALL** Data Sets that are evaluated at every optimization step,
        meaning there could be more than one Data Set involved: this is for example the case
        when splitting into a training and a validation set.
        A named tuple `evaluator_return` will always be passed to the `__call__` method.
        See below or the `Callback` class API to see how it unpacks.
        You can filter which Data Sets the callback operates on by checking the passed `name`
        argument, which is always unique per Optimization instance.
        """
        fx, x, name, ncalled, interface, data_set, residuals, contrib, time = evaluator_return
        # The tuple's contents can also be accessed through the respective attribute,
        # e.g. `evaluator_return.fx`

        # Only apply to the set to watch
        if name != self.watch:
            return
        if np.isnan(fx):  # nan is a special placeholder and means no evaluation of this DataSet for this call
            return

        if fx < self.fxmin:
            self.count = 0   # Reset the counter if we improved
            self.fxmin = fx  # Adjust the best fx value
        else:
            self.count += 1  # Patience counter
        return self.count > self.patience  # True if we need to stop

    def reset(self):
        # Implementing this method makes the instance re-usable for multiple Optimizations
        self.count = 0
        self.fxmin = float('inf')

    def on_end(self):
        # This method will be called once the Optimization is complete. Implement as needed.
        pass
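A user-defined callback is passed to an Optimization in the same way as the built-in ones, and can be freely combined with them:

early = EarlyStopping(patience=50, watch='validation_set')
my_optimization = Optimization(*args, callbacks=[early, Timeout(60*60)])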
4.12.11. Callback API
class Callback
Abstract base class for callbacks.

__call__(evalret: scm.params.core.opt_components.EvaluatorReturn) → Any
This method will be called by the optimizer at the end of every step.
Parameters: - evalret : EvaluatorReturn (named tuple)
A named tuple returned by scm.params.core.opt_components.EvaluatorReturn. The tuple unpacks to (fx, x, name, ncalled, interface, data_set, residuals, contrib, time). The names above also double as instance variables (e.g., fx can be accessed with evalret.fx).
- fx : float
- Loss function value of x
- x : Sequence[float]
- The current set of parameters suggested by the optimizer. (real, not scaled)
- name : str
- Name of the Data Set as set by the Optimization class. Can be ‘training_set’, ‘validation_set’ and ‘data_setXX’ (where XX is an int) by default
- ncalled : int
- The number of times this Data Set has been evaluated
- interface : BaseParameters subclass
- The interface that was used for this evaluation of the data set
- data_set : DataSet
- A tuple of DataSet and the last evaluation’s (non-flattened) residuals vector
- residuals : List[1d-array]
- A list of 1d numpy arrays holding the residuals to each data set entry such that \(r=y-\hat{y}\). See Data Set for more information.
- contrib : List
- List of per-entry contributions to the loss function value (see also scm.params.core.data_set.DataSet.evaluate()).
- time : float
- Wall time (in seconds) this evaluation took
Returns: Any value other than None will be interpreted by the optimizer as a signal to stop the optimization process.
reset()
This method will be called when a new Optimization instance is created containing this callback. It should re-initialize the callback to its initial state, making the same instance available for multiple Optimization instances (e.g., in the case of Timeout, a reset of the same instance is necessary to restart the timer).
on_end()
This method will be called once the optimization is complete.