3.6.1. CMA-ES

Implementation of the Covariance Matrix Adaptation Evolution Strategy [1]. The optimizer can be used out of the box:

>>> optimizer = CMAOptimizer(sigma=0.15, popsize=15)
>>> runner = Optimization('name', JobCollection, DataSet, interface, optimizer)
>>> runner.optimize()
>>> print(runner.results.x)
array([x0, ..., xN])
>>> print(runner.results.fx)
1234.5
class CMAOptimizer(sigma=0.5, sampler='full', verbose=True, **cmasettings)

Implementation of the Covariance Matrix Adaptation Evolution Strategy

Parameters:

sigma : optional, float
Standard deviation for all parameters (all parameters must be scaled accordingly).
Defines the search space as the std dev from an initial x0. Larger values will sample a wider Gaussian.
sampler : optional, str, ‘vkd’ or ‘vd’
Allows the use of GaussVDSampler and GaussVkDSampler settings.
verbose : bool
Produce additional output on the status of the optimization.
cmasettings : optional kwargs

cma module-specific settings as k,v pairs. See cma.s.pprint(cma.CMAOptions()) (below) for a list of available options. Most useful keys are: timeout, tolstagnation, popsize. Additionally, the following options are supported through this class:

  • minsigma: Termination if sigma < minsigma
  • maxsigma: Limit sigma to min(sigma,maxsigma). Defaults to 10*sigma
__init__(sigma=0.5, sampler='full', verbose=True, **cmasettings)

Initialize with the above parameters.

3.6.1.1. List of valid cmasettings

>>> import cma
>>> cma.s.pprint(cma.CMAOptions())
AdaptSigma                True                                  or False or any CMAAdaptSigmaBase class e.g. CMAAdaptSigmaTPA, CMAAdaptSigmaCSA
CMA_active                True                                  negative update, conducted after the original update
CMA_cmean                 1                                     learning rate for the mean value
CMA_const_trace           False                                 normalize trace, 1, True, "arithm", "geom", "aeig", "geig" are valid
CMA_diagonal              0*100*N/popsize**0.5                  nb of iterations with diagonal covariance matrix, True for always
CMA_eigenmethod           np.linalg.eigh                        or cma.utils.eig or pygsl.eigen.eigenvectors
CMA_elitist               False                                 or "initial" or True, elitism likely impairs global search performance
CMA_mirrors               popsize < 6                           values <0.5 are interpreted as fraction, values >1 as numbers (rounded), otherwise about 0.16 is used
CMA_mirrormethod          2                                     0=unconditional, 1=selective, 2=selective with delay
CMA_mu                    None                                  parents selection parameter, default is popsize // 2
CMA_on                    1                                     multiplier for all covariance matrix updates
CMA_sampler               None                                  a class or instance that implements the interface of `cma.interfaces.StatisticalModelSamplerWithZeroMeanBaseClass`
CMA_sampler_options       {}                                    options passed to `CMA_sampler` class init as keyword arguments
CMA_rankmu                1.0                                   multiplier for rank-mu update learning rate of covariance matrix
CMA_rankone               1.0                                   multiplier for rank-one update learning rate of covariance matrix
CMA_recombination_weights None                                  a list, see class RecombinationWeights, overwrites CMA_mu and popsize options
CMA_dampsvec_fac          np.Inf                                tentative and subject to changes, 0.5 would be a "default" damping for sigma vector update
CMA_dampsvec_fade         0.1                                   tentative fading out parameter for sigma vector update
CMA_teststds              None                                  factors for non-isotropic initial distr. of C, mainly for test purpose, see CMA_stds for production
CMA_stds                  None                                  multipliers for sigma0 in each coordinate, not represented in C, makes scaling_of_variables obsolete
CSA_dampfac               1                                     positive multiplier for step-size damping, 0.3 is close to optimal on the sphere
CSA_damp_mueff_exponent   0.5                                   zero would mean no dependency of damping on mueff, useful with CSA_disregard_length option
CSA_disregard_length      False                                 True is untested, also changes respective parameters
CSA_clip_length_value     None                                  poorly tested, [0, 0] means const length N**0.5, [-1, 1] allows a variation of +- N/(N+2), etc.
CSA_squared               False                                 use squared length for sigma-adaptation
BoundaryHandler           BoundTransform                        or BoundPenalty, unused when ``bounds in (None, [None, None])``
bounds                    [None, None]                          lower (=bounds[0]) and upper domain boundaries, each a scalar or a list/vector
conditioncov_alleviate    [1e8, 1e12]                           when to alleviate the condition in the coordinates and in main axes
fixed_variables           None                                  dictionary with index-value pairs like {0:1.1, 2:0.1} that are not optimized
ftarget                   -inf                                  target function value, minimization
integer_variables         []                                    index list, invokes basic integer handling: prevent std dev to become too small in the given variables
is_feasible               is_feasible                           a function that computes feasibility, by default lambda x, f: f not in (None, np.NaN)
maxfevals                 inf                                   maximum number of function evaluations
maxiter                   100 + 150 * (N+3)**2 // popsize**0.5  maximum number of iterations
mean_shift_line_samples   False                                 sample two new solutions colinear to previous mean shift
mindx                     0                                     minimal std in any arbitrary direction, cave interference with tol*
minstd                    0                                     minimal std (scalar or vector) in any coordinate direction, cave interference with tol*
maxstd                    inf                                   maximal std in any coordinate direction
pc_line_samples           False                                 one line sample along the evolution path pc
popsize                   4+int(3*np.log(N))                    population size, AKA lambda, number of new solution per iteration
randn                     np.random.randn                       randn(lam, N) must return an np.array of shape (lam, N), see also cma.utilities.math.randhss
scaling_of_variables      None                                  depreciated, rather use fitness_transformations.ScaleCoordinates instead (or possibly CMA_stds).
                                                                Scale for each variable in that effective_sigma0 = sigma0*scaling. Internally the variables are divided by scaling_of_variables and sigma is unchanged, default is `np.ones(N)`
seed                      time                                  random number seed for `numpy.random`; `None` and `0` equate to `time`, `np.nan` means "do nothing", see also option "randn"
signals_filename          None                                  cma_signals.in
termination_callback      None                                  a function returning True for termination, called in `stop` with `self` as argument, could be abused for side effects
timeout                   inf                                   stop if timeout seconds are exceeded, the string "2.5 * 60**2" evaluates to 2 hours and 30 minutes
tolconditioncov           1e14                                  stop if the condition of the covariance matrix is above `tolconditioncov`
tolfacupx                 1e3                                   termination when step-size increases by tolfacupx (diverges). That is, the initial step-size was chosen far too small and better solutions were found far away from the initial solution x0
tolupsigma                1e20                                  sigma/sigma0 > tolupsigma * max(eivenvals(C)**0.5) indicates "creeping behavior" with usually minor improvements
tolfun                    1e-11                                 termination criterion: tolerance in function value, quite useful
tolfunhist                1e-12                                 termination criterion: tolerance in function value history
tolstagnation             int(100 + 100 * N**1.5 / popsize)     termination if no improvement over tolstagnation iterations
tolx                      1e-11                                 termination criterion: tolerance in x-changes
transformation            None                                  depreciated, use cma.fitness_transformations.FitnessTransformation instead.
                                                                [t0, t1] are two mappings, t0 transforms solutions from CMA-representation to f-representation (tf_pheno),
                                                                t1 is the (optional) back transformation, see class GenoPheno
typical_x                 None                                  used with scaling_of_variables
updatecovwait             None                                  number of iterations without distribution update, name is subject to future changes
verbose                   3                                     verbosity e.g. of initial/final message, -1 is very quiet, -9 maximally quiet, may not be fully implemented
verb_append               0                                     initial evaluation counter, if append, do not overwrite output files
verb_disp                 100                                   verbosity: display console output every verb_disp iteration
verb_filenameprefix       outcmaes                              output filenames prefix
verb_log                  1                                     verbosity: write data to files every verb_log iteration, writing can be time critical on fast to evaluate functions
verb_plot                 0                                     in fmin(): plot() is called every verb_plot iteration
verb_time                 True                                  output timings on console
vv                        {}                                    ? versatile set or dictionary for hacking purposes, value found in self.opts["vv"]

3.6.1.2. References

[1]Hansen, N. and A. Ostermeier, Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation 2001, 9 (2), 159-195