4.15.1. Working with Parameter Interfaces
The following features mainly work with Parameter Interfaces.
4.15.1.1. Active Parameter Search
This class reduces the dimensionality of the parameter search space by performing a sensitivity analysis on the active parameters, either individually or in small combinations.
Synopsis
>>> ff = ReaxFFParameters('path/to/ffield.ff')
>>> ds = DataSet('path/to/data_set.yml')
>>> jc = JobCollection('path/to/jobcol.yml')
>>> aps = ActiveParameterSearch(ff, ds, jc)
>>> ids, fx = aps.scan(steps=[1.1], dim=1, verbose=True)
>>> results = aps.get_results()
>>> ff.is_active = aps.get_is_active(n=20)
scan() returns the scanned ids of the active subset and the respective loss function values.
>>> # Set only the first three parameters to active:
>>> ff.is_active = len(ff)*[False]
>>> for i in range(3):
...     ff[i].is_active = True
>>> len(ff.active)
3
>>> aps = ActiveParameterSearch(ff, ds, jc)
>>> aps.scan()
(array([[0],
[1],
[2]]), array([[[-0.16769481]],
[[ 0.33069672]],
[[-0.09795433]]]))
The first return value contains the scanned ids, the second an array of the corresponding loss function values.
The parameter search can also scan combinations of active parameters, rather than scanning every parameter individually:
>>> aps.scan(dim=2)
(array([[0, 1],
[0, 2],
[1, 2]]), array([[-0.28081611],
[ 0.02706811],
[-0.2683532 ]]))
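The ids shown for dim=2 are the index pairs in the order that itertools.combinations would produce, and the number of rows grows as a binomial coefficient. The following is a minimal sketch, assuming (the text does not confirm this) that the scan enumerates combinations in this way. combination_ids is a hypothetical helper and not part of the library:

```python
from itertools import combinations
from math import comb

import numpy as np


def combination_ids(n_active: int, dim: int) -> np.ndarray:
    """Return every combination of `dim` indices out of `n_active` parameters.

    Hypothetical helper: it only mirrors the layout of the ids array shown above.
    """
    return np.array(list(combinations(range(n_active), dim)))


ids = combination_ids(3, 2)
print(ids)                     # one row per scanned combination
print(len(ids) == comb(3, 2))  # the row count is "n_active choose dim"
```

Because the number of combinations grows quickly (20 active parameters with dim=3 already give 1140 combinations), small values of dim are usually the practical choice.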
The number of steps and their sizes are set with the steps argument. Each entry is a multiplier applied to the initial parameters, generating a new set from \(\boldsymbol{x}_\mathrm{scaled} = \mathrm{scale} \cdot \boldsymbol{x}_0\).
>>> aps.scan(steps=[0.9,1.2])
(array([[0],
[1],
[2]]), array([[-0.55754578, -0.26966971],
[-0.21234735, -0.19213127],
[-0.16746101, -0.19213127]]))
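To make the role of steps more concrete, the sketch below generates the modified parameter vectors for a scan. It assumes that only the parameters in the scanned combination are changed while all others keep their initial values, and it applies the rule from the scan() note that zero-valued parameters are shifted by (step-1). scaled_parameter_sets is a hypothetical helper, not a library function:

```python
import numpy as np


def scaled_parameter_sets(x0, ids, steps):
    """Yield (combination, step, x) for every scanned combination and step.

    Hypothetical helper illustrating the scaling rule; not the library's code.
    """
    x0 = np.asarray(x0, dtype=float)
    for combo in ids:
        for step in steps:
            x = x0.copy()
            for i in combo:
                # A zero cannot be scaled, so it is shifted by (step - 1) instead.
                x[i] = x0[i] * step if x0[i] != 0 else step - 1
            yield tuple(combo), step, x


x0 = [2.0, 0.0, -4.0]  # made-up initial parameter values
for combo, step, x in scaled_parameter_sets(x0, ids=[[0], [1], [2]], steps=[0.9, 1.2]):
    print(combo, step, x)
```

With three combinations and two steps this yields six parameter vectors, which matches the (3, 2) layout of the loss values in the output above.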
The results are also stored in the attributes fx0, ids and fx after scan() has been called:
>>> aps.ids
array([[0],
[1],
[2]])
>>> aps.fx
array([[[-0.55754578, -0.26966971]],
[[-0.21234735, -0.19213127]],
[[-0.16746101, -0.19213127]]])
For relative sensitivities, use the fx0 attribute:
>>> rel_fx = aps.fx[:,:,ds_id] / aps.fx0[ds_id]
get_results() returns a dictionary with ranked (sorted) results.
>>> results = aps.get_results(data_set_id=0, mode='highest_absolute', tol=1e-8)
>>> print(results)
{
'failed_parameter_combinations_in_active': array([[0]]),
'loss_per_step': array([[9364.07969285], [9364.26884909]]),
'n_failed_parameter_combinations': 1,
'parameter_combinations_in_active': array([[1], [2]]),
'relative_loss_per_step': array([[0.65549032], [0.65550357]]),
'score': array([4921.53116177, 4921.34200554]),
'sorted_parameter_names': ['H:eta_i;;24,25;;EEM hardness', 'O:eta_i;;24,25;;EEM hardness'],
'n_parameter_combinations': 2,
'sorted_parameter_names_with_effect': ['H:eta_i;;24,25;;EEM hardness', 'O:eta_i;;24,25;;EEM hardness'],
'sorted_parameters_in_active': array([1, 2]),
}
In the above example, three parameters were scanned with dim=1. One of the calculations failed and two succeeded. The keys of this dictionary are:
- failed_parameter_combinations_in_active: a 2D array with shape (nfailed, dim). At least one calculation failed when these parameters were scaled (perhaps to physically unreasonable values). The indices refer to the parameters that can be accessed by parameter_interface.active[index].
- loss_per_step: a 2D array with shape (nsuccess, dim). The evaluated loss function for every scaled parameter combination.
- n_failed_parameter_combinations: an integer, nfailed.
- parameter_combinations_in_active: a 2D array with shape (nsuccess, dim). The successful parameter combinations.
- relative_loss_per_step: a 2D array with shape (nsuccess, dim). The values of loss_per_step divided by the original loss function for the unscaled parameters (fx0).
- score: a 1D array with shape (nsuccess,). The score used for ranking parameter combinations.
- sorted_parameter_names: a 1D list. If dim==1, each parameter name matches the score at the same position. If dim>1 and, for example, the combination [4,5] received the highest score, then the first element of sorted_parameter_names corresponds to parameter 4 and the second element to parameter 5.
- sorted_parameter_names_with_effect: only applicable if dim==1. A sorted list of parameter names for which the score is greater than tol. Failed parameters are not included.
- sorted_parameters_in_active: same as sorted_parameter_names, but with the parameter ids in the active subset instead.
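The ranking behind score can be sketched with NumPy, using the two mode formulas given in the get_is_active() API description: abs(fx-fx0).mean(-1) for 'highest_absolute' and (fx/fx0).mean(-1) for 'lowest_relative'. This is an illustration under assumptions, not the library's implementation: rank_combinations is a hypothetical helper, the input values are synthetic, and failed calculations are not handled.

```python
import numpy as np


def rank_combinations(fx, fx0, mode="highest_absolute"):
    """Return (order, score) for fx of shape (n_combinations, n_steps).

    Hypothetical helper; the score formulas follow the documented mode descriptions.
    """
    fx = np.asarray(fx, dtype=float)
    if mode == "highest_absolute":
        score = np.abs(fx - fx0).mean(-1)
        order = np.argsort(-score)  # largest change in the loss first
    elif mode == "lowest_relative":
        score = (fx / fx0).mean(-1)
        order = np.argsort(score)   # smallest loss ratio first
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return order, score


# Synthetic loss values for three combinations and one step (not from a real scan).
fx0 = 10.0
fx = np.array([[10.0], [4.0], [13.0]])

order, score = rank_combinations(fx, fx0)
has_effect = score > 1e-8  # same role as the tol argument of get_results()
print(order, score, has_effect)
```

In this synthetic case the first combination leaves the loss unchanged, so its score does not exceed the tolerance and it would be left out of a "with effect" list.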
Once a scan is complete, get_is_active() returns a list of bools that can be assigned to the parameter interface's is_active attribute:
>>> ff.is_active = aps.get_is_active(n=20)
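How a ranked list of combinations could be turned into such a list of bools is sketched below. This is only an illustration: is_active_mask is a hypothetical helper, the ranking is made up, and the mask here is indexed over the active subset only, whereas the text does not say how the library maps the result back onto the full parameter interface.

```python
def is_active_mask(ranked_ids, n, n_active):
    """Mark every parameter that occurs in the first n ranked combinations.

    Hypothetical helper; indices refer to positions in the active subset.
    """
    selected = {int(i) for combo in ranked_ids[:n] for i in combo}
    return [i in selected for i in range(n_active)]


# Made-up ranking of dim=2 combinations, best first.
ranked_ids = [[1, 2], [0, 2], [0, 1], [2, 3]]
mask = is_active_mask(ranked_ids, n=2, n_active=5)
print(mask)
```

With dim=2 and n=2, the two selected combinations share parameter 2, so three parameters end up active rather than four.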
Multiple Data Sets can be evaluated with one Parameter Search instance, provided they can all be calculated with the same Job Collection.
To do so, pass a list of data sets when instantiating.
This results in the attribute shapes fx0.shape == (len(data_sets),) and fx.shape == (len(ids), len(steps), len(data_sets)).
>>> aps = ActiveParameterSearch(ff, [ds1, ds2], jc)
>>> ids, fx = aps.scan()
>>> fx_ds1 = fx[:,:,0] # select scanned results of the first data set
>>> fx_ds2 = fx[:,:,1] # select scanned results of the second data set
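The indexing above can be tried without running a scan by using arrays of the documented shape (len(ids), len(steps), len(data_sets)). The sketch below uses random stand-ins for aps.fx and aps.fx0, so the numbers carry no meaning; it only shows the slicing and how NumPy broadcasting gives relative values for all data sets at once.

```python
import numpy as np

n_ids, n_steps, n_data_sets = 3, 2, 2

# Random stand-ins for aps.fx and aps.fx0 (illustration only).
rng = np.random.default_rng(0)
fx = rng.uniform(1.0, 2.0, size=(n_ids, n_steps, n_data_sets))
fx0 = np.array([1.5, 1.2])  # one initial loss value per data set

fx_ds1 = fx[:, :, 0]               # results of the first data set, shape (3, 2)
rel_fx = fx / fx0                  # fx0 broadcasts over the last (data set) axis
rel_fx_ds2 = fx[:, :, 1] / fx0[1]  # the same for the second data set only

print(fx_ds1.shape, rel_fx.shape)
```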
In such cases, the data_set_id argument of get_is_active() specifies which data set's results are used for the evaluation:
>>> aps = ActiveParameterSearch(ff, [ds1, ds2], jc)
>>> ids, fx = aps.scan()
>>> active_based_on_ds1 = aps.get_is_active(10, data_set_id=0)
>>> active_based_on_ds2 = aps.get_is_active(10, data_set_id=1)
API
class ActiveParameterSearch(parameter_interface, data_sets, job_collection, file=None)
Scans for the most sensitive parameters of a ParameterInterface instance, given a Data Set.
Note
Will only scan the active subset of parameters.
The following are available after scan() has been called:
Attributes:
- fx0 : float
  The fx value of the initial parameters
- ids : ndarray
  The last return value of scan()[0]
- fx : ndarray
  The last return value of scan()[1]
__init__(parameter_interface, data_sets, job_collection, file=None)
Initialize a Parameter Search instance with the given interface, data_sets and job_collection.
Previous results can be loaded by providing the optional file argument.
The data_sets argument can either be a single DataSet instance or a list of them. The latter assumes that all Data Sets in the list can be calculated from the job_collection. If multiple Data Sets are provided, the data_set_id argument of get_is_active() can be used to specify which of the sets is used for the best parameter evaluation.
- file : str
  Path to a file saved with self.save(). Loads the saved ids, fx, and fx0 from that file.
scan(steps: Sequence = None, dim=1, loss='sse', parallel=None, verbose=True)
Start the scan.
Note
Parameters that have a value of zero will be shifted by (step-1) instead.
After calling this method, the get_is_active() and save() methods can be called.
Parameters:
- steps : Sequence[float]
  Number of steps and the respective scaling for each step. Default: [1.05]
- dim : 1 <= int <= len(parameter_interface.active)
  If dim > 1, will scan dim parameters at once on a combinatorial grid of len(parameters) choose dim points. Possibly costly, as \(N_\mathrm{evals} = \binom{N_\mathrm{params}}{\mathrm{dim}}\).
- loss : str, Loss
  The Loss function to be used for the Data Set evaluation.
- parallel : ParallelLevels
  Calculate parallel.parametervectors parameter sets at once, each set running parallel.jobs jobs in parallel. Defaults to ParallelLevels(parametervectors=NCPU).
Returns:
- self.ids : ndarray
  2D array of indices into the parameter_interface.active subset of parameters; each element i maps to the scanned parameter parameter_interface.active[i].
- self.fx : ndarray
  Array of shape (len(ids), len(steps), len(data_sets)). In the same order as ids, the fitness function values for the modified parameter sets. Contains multiple fx values per entry if len(steps) > 1.
get_results(data_set_id: int = 0, mode: str = 'highest_absolute', tol=1e-08)
- data_set_id : int
  Index of the data set whose results are used.
- mode : str
  'highest_absolute' or 'lowest_relative'
- tol : float
  Tolerance for deciding if a parameter has NO effect. Scores > tol count as having an effect.
Returns: a dict
get_is_active(n: Union[int, slice], data_set_id: int = 0, mode: str = 'highest_absolute') → List
Can only be called after scan().
Given the initial parameter interface, return the ParameterInterface.is_active attribute with the n most sensitive parameters marked as active. The returned List can be used to set the parameter interface:
>>> params.is_active = aps.get_is_active(10)
Valid mode argument values are
- 'lowest_relative': Will determine the best parameters by selecting the lowest values as determined by (fx/fx0).mean(-1)
- 'highest_absolute': Will determine the best parameters by selecting the highest values as determined by abs(fx-fx0).mean(-1)
If dim>1 was requested during the scan, the number of active parameters will be equal to the number of unique parameters among the n selected combinations, that is, at most dim*n.
When multiple data_sets have been provided at init, the data_set_id can be used to specify which of the sets should be used for the best parameters evaluation.
save(fname)
Saves ids, fx0 and fx to fname.
static load(fname)
Loads and returns a triplet of ids, fx and fx0 from fname.
__str__()
Return str(self).