4.2. Job and Engine Collections
Collections represent containers for AMS jobs that need calculation before a Data Set can be evaluated. They store all settings necessary for the execution in a human-readable YAML format. We divide these settings into two Collections: Job and Engine Collection.
The Job Collection holds information relevant to the
AMS driver,
such as the chemical system, the driver task, and the properties to calculate.
The Engine Collection stores different AMS engine blocks.
Combining an entry from the Job Collection with any entry from the Engine Collection
ensures that results are comparable:
Nothing in the job should change apart from the engine that is used to execute it.
Fundamentally, collections behave like dictionaries with key-value pairs.
We refer to the key as ID (or jobID in case of the Job Collection).
Important
Each ID within a Collection needs to be unique due to the dict-like nature of the classes.
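The dict-like behavior and the uniqueness rule can be illustrated with a small stand-in class. This is only a sketch: `MiniCollection` is a hypothetical name and not part of ParAMS. It merely mimics the documented behavior of `add_entry()`, which rejects an ID that is already present unless `replace=True` is passed.

```python
class MiniCollection(dict):
    """Hypothetical stand-in for a ParAMS collection (illustration only)."""

    def add_entry(self, eid, entry, replace=False):
        # Mirror the documented rule: an existing ID is rejected unless replace=True.
        if eid in self and not replace:
            raise KeyError(f"ID {eid!r} is already present in the collection")
        super().__setitem__(eid, entry)


mc = MiniCollection()
mc.add_entry("H2O_001", "first entry")

try:
    mc.add_entry("H2O_001", "second entry")  # same ID again
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True

mc.add_entry("H2O_001", "replacement", replace=True)  # explicit overwrite
```

The strings stored here are placeholders; a real collection stores JCEntry or Engine instances.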
4.2.1. Job Collection
The Job Collection stores input that can be read by the AMS driver, along with optional metadata. When stored to disk, the data looks similar to the following:
---
ID: H2O_001
ReferenceEngineID: myDFTengine_1
AMSInput: |
   Task SinglePoint
   system
     atoms
       H -0.7440600000 1.1554900000 -0.0585900000
       O 0.6438200000 1.2393700000 0.0060400000
       H -3.3407000000 0.2702700000 0.7409900000
     end
   end
Source: ThisAwesomePaper
PlotIt: True
---
ID: CH4_001
ReferenceEngineID: DFTB_1
AMSInput: |
   Task GeometryOptimization
   properties
     Gradients True
   end
   system
     atoms
       C -0.7809900000 1.1572800000 -0.0369200000
       H 0.6076000000 1.2309400000 0.0140600000
       H 1.3758800000 0.0685800000 0.0285600000
       H 0.7425100000 -1.1714200000 -0.0084500000
       H -0.6465400000 -1.2538900000 -0.0595900000
     end
   end
...
The collection above contains two entries.
We can recognize the ID, by which each entry is labeled.
Everything else is a textual representation of the value stored under this ID.
At runtime, each entry is stored as a JCEntry instance.
Basic usage is discussed below.
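As a quick illustration of the on-disk layout described above, the following sketch pulls the IDs out of such a text using only the standard library. It is not a YAML parser and not how ParAMS loads collections; it only relies on the observation that each entry contains one unindented line starting with `ID:`. The shortened `text` sample is made up for this example.

```python
text = """---
ID: H2O_001
ReferenceEngineID: myDFTengine_1
AMSInput: |
   Task SinglePoint
Source: ThisAwesomePaper
---
ID: CH4_001
ReferenceEngineID: DFTB_1
AMSInput: |
   Task GeometryOptimization
...
"""

# Keep only the unindented "ID:" lines; "ReferenceEngineID:" does not match
# because the line has to *start* with "ID:".
ids = [
    line.split(":", 1)[1].strip()
    for line in text.splitlines()
    if line.startswith("ID:")
]
```

For real files, loading through the JobCollection class remains the appropriate route.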
The Job Collection inherits basic dictionary functionality. Consequently, all commonly known methods are available – with a few additions:
>>> jc.keys() # All IDs:
dict_keys(['H2O_001'])
>>> jc() # A shortcut to list(jc.keys())
['H2O_001']
>>> jc.values() # All JCEntries:
dict_values([<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>])
>>> for k, v in jc.items(): # Or both of the above
...     ...
4.2.1.1. Adding Jobs
Important
The Job Collection only stores instances of JCEntry.
An instance of the Job Collection can be initialized without further parameters.
The instance can be populated with JCEntry objects only.
Every JCEntry instance needs to have at least the attributes settings and molecule defined.
Adding a reference_engine string and metadata is optional:
>>> jc = JobCollection()
>>> jce = JCEntry()
>>> jce.settings.input.AMS.Task = 'SinglePoint' # Must be a PLAMS Settings() instance
>>> jce.molecule = h2o_mol # Must be a PLAMS Molecule() instance
>>> jce.reference_engine = 'myDFTengine' # Optional: A string that matches the ID of an EngineCollection entry
>>> jce.metadata['Source'] = 'SomePaper' # Optional: Metadata can be added to the `metadata` dictionary
>>> jc.add_entry('H2O_001', jce) # Adding `jce` with ID 'H2O_001'.
>>> # jc['H2O_001'] = jce # This is the same as the line above.
>>> # Adding more entries with the same ID to `jc` is not possible anymore.
All attributes can also be assigned when instantiating the object:
>>> jce = JCEntry(settings, molecule, refengine, **metadata)
See below for a textual representation of jce.
4.2.1.2. Lookup
>>> 'H2O_001' in jc
True
>>> jc['H2O_001'] # Get the respective JCEntry
>>> jc('H2O_001') # Same
<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>
4.2.1.3. Removing Entries
>>> del jc['H2O_001']
>>> len(jc)
0
>>> # Alternatively:
>>> jc.remove_entry('H2O_001')
4.2.1.4. Renaming Entries
>>> oldkey = 'H2O_001'
>>> newkey = 'H2O_002'
>>> jc.rename_entry(oldkey, newkey)
Collections can be added. Duplicate keys will use the value of the first argument (jc rather than jc2):
>>> added_jc = jc + jc2
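The precedence rule for duplicate keys can be pictured with plain dictionaries. This is an analogy under the assumption that the collections behave like dicts here, not the ParAMS implementation; `jc_like` and `jc2_like` are made-up stand-ins for two collections.

```python
jc_like = {"H2O_001": "entry from jc", "CH4_001": "entry from jc"}
jc2_like = {"H2O_001": "entry from jc2", "NH3_001": "entry from jc2"}

# Start from the second operand, then let the first operand win on duplicates.
added = {**jc2_like, **jc_like}

# The operands themselves are left untouched, as with `jc + jc2`.
unchanged = jc2_like["H2O_001"]
```

Here `added` holds three IDs, and the shared ID `H2O_001` keeps the value from the first operand.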
4.2.1.5. Comparison
>>> jc == jc
True
Metadata can be stored per-entry, as shown above.
For the storage of global metadata or comments, the header attribute can be used to store a string:
>>> comments = """this is a multiline
... header comment"""
>>> jc.header = comments
>>> jc.store('jc_with_header.yaml') # The header string will be stored when writing to YAML
The header is also available to the Data Set and Engine Collection classes.
4.2.1.6. Saving and loading
Storing and loading collections can be done with:
>>> jc.store('jobs.yml') # Store the collection in a YAML format.
>>> jc.pickle_dump('jobs.pkl') # Or in a pickled binary format.
The YAML file jobs.yml then contains:
---
ID: H2O_001
ReferenceEngineID: myDFTengine
AMSInput: |
   system
     atoms
       H 0.0000000000 0.0000000000 0.3753600000
       H 0.0000000000 0.0000000000 -0.3753600000
     end
   end
   task SinglePoint
Source: SomePaper
...
The textual representation of a single JCEntry can also be obtained by calling str() on it.
Calling print(jc['H2O_001']) would produce the same output as above (since our Job Collection only has one entry).
The file can then be loaded:
>>> jc2 = JobCollection('jobs.yml') # From YAML
>>> jc2 = JobCollection('jobs.pkl') # From a pickled file
>>> jc == jc2
True
Note
When working with large Job Collections, storing and loading binary files is significantly faster than the YAML counterpart.
4.2.1.7. Generating AMSJobs
The JobCollection.to_amsjobs() method can be used to quickly generate plams.AMSJob instances from all the entries in a Job Collection.
You can limit the output to a specific subset of entries by providing the jobids argument.
An additional engine_settings argument can be passed; it is added to every AMSJob.settings, making the returned AMSJobs executable:
>>> engine_settings = Settings()
>>> engine_settings.input.BAND # Assuming we would like to run all jobs in `jc` with BAND
>>> jobids = ['job1', 'job2']
>>> jobs = jc.to_amsjobs(jobids, engine_settings)
>>> all(isinstance(job, AMSJob) for job in jobs)
True
>>> [job.run() for job in jobs] # The jobs can now be executed by PLAMS
4.2.1.8. Running Collection Jobs
All entries in a Job Collection can be calculated at once with the JobCollection.run() method, which returns a dictionary of {jobID : plams.AMSResults} pairs.
This can be useful when manual interaction with the job results is needed for a specific engine (for example when calculating the reference data):
>>> len(jc)
20
>>> engine = Settings() # The JCEntries do not include engine settings
>>> engine.input.BAND # We would like to run all stored jobs with BAND
>>> results = jc.run(engine) # Will run all jobs in jc and return their results object
>>> all(r.ok() for r in results.values()) # The returned value is a dict of {jobID : AMSResults}
True
>>> energies = [r.get_energy() for r in results.values()] # We can now process the results
Alternatively, a subset of jobs can be calculated by providing the jobids argument:
>>> ids_to_run = ['myjob1', 'myotherjob']
>>> results = jc.run(engine, jobids=ids_to_run)
>>> len(results)
2
Note
This method uses the AMSWorker interface where possible. Use the use_pipe keyword to disable it.
4.2.2. Engine Collection
The Engine Collection is very similar to the Job Collection and can be used in exactly the same manner.
The main difference is that the Engine Collection stores Engine instances instead of JCEntry instances.
A textual representation looks similar to this:
---
ID: DFTB_1
AMSInput: |
   engine DFTB
     Model DFTB3
     ResourcesDir DFTB.org/3ob-3-1
   endengine
Comment: My favourite engine.
...
Important
The Engine Collection only stores instances of Engine.
Within each entry, only the settings attribute must be defined. The remaining metadata is optional.
>>> ec = EngineCollection()
>>> e = Engine()
>>> e.settings.input.DFTB.model = 'DFTB3' # e.settings is a PLAMS Settings() instance.
>>> e.settings.input.DFTB.ResourcesDir = 'DFTB.org/3ob-3-1'
>>> e.metadata['Comment'] = 'My favourite engine.' # This is optional.
>>> ec.add_entry('DFTB_1',e)
>>> # print(ec['DFTB_1']) reproduces the textual representation above
See also
For further examples on how to work with the collection, please refer to the Job Collection section.
4.2.3. Collections API
4.2.3.1. JCEntry
-
class
JCEntry
(settings=None, molecule=None, reference_engine: str = None, extra_engine: str = None, **metadata)
A class representing a single job collection entry, i.e., an AMS job with optionally an associated reference engine and metadata.
Attributes:
- settings : plams.Settings
plams.Settings() instance, holding the input for the job.
If no settings are provided, a new object representing a single point calculation will be created. Additionally, the following strings can be used as shortcuts to automatically initialize a Settings instance with an appropriate AMS Task:
- ‘sp’: SinglePoint
- ‘go’: GeometryOptimization
- ‘md’: MolecularDynamics
- ‘pes’: PESScan
- ‘ts’: TransitionStateSearch
Important
Cannot be empty when adding a class instance to the JobCollection.
- molecule : plams.Molecule
plams.Molecule() for the system of interest.
Important
Cannot be empty when adding a class instance to the JobCollection.
- reference_engine : optional, str
- ID of the reference engine, used for lookup in the EngineCollection.
- extra_engine : optional, str
ID of the extra engine, used for lookup in the EngineCollection. Specifying extra_engine allows you to have different per-job engine settings during the parametrization. When parametrizing DFTB or xTB, this can for example be used to have different k-space samplings for different jobs.
Important
The respective settings of the Engine instance should include the complete Engine block, which should match the parametrization interface. For example, if parametrizing XTBParameters, appropriate Engine settings would be Settings.input.dftb.kspace.quality = 'Basic'.
- metadata : optional
- Additional keyword arguments will be interpreted as metadata and stored in this variable.
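The task shortcuts listed for the settings attribute amount to a simple lookup table. The sketch below shows that mapping in isolation. `TASK_SHORTCUTS` and `task_from_shortcut` are hypothetical names for this illustration, and the sketch returns plain task names rather than building a plams.Settings instance as JCEntry does.

```python
# Shortcut strings and AMS tasks exactly as listed for the `settings` attribute.
TASK_SHORTCUTS = {
    "sp": "SinglePoint",
    "go": "GeometryOptimization",
    "md": "MolecularDynamics",
    "pes": "PESScan",
    "ts": "TransitionStateSearch",
}


def task_from_shortcut(shortcut=None):
    """Return the AMS task name for a shortcut; None falls back to SinglePoint."""
    if shortcut is None:
        # Documented default: no settings means a single point calculation.
        return "SinglePoint"
    # Unknown shortcuts raise KeyError here (a choice made for this sketch).
    return TASK_SHORTCUTS[shortcut]


tasks = [task_from_shortcut(s) for s in (None, "go", "pes")]
```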
-
__init__
(settings=None, molecule=None, reference_engine: str = None, extra_engine: str = None, **metadata)¶ Creates a new job collection entry.
-
classmethod
from_amsjob
(amsjob, reference_engine=None, extra_engine=None, task=None, molecule='final', remove_bonds=False, **metadata) → Tuple[str, scm.params.core.jobcollection.JCEntry]
Returns a 2-tuple (suggested_name, JCEntry).
The JCEntry contains the AMS settings and system from amsjob; suggested_name == amsjob.name.
amsjob can be an AMSJob, an AMSResults, or a string pointing to the job directory or ams.rkf file.
The task is by default the same as in the amsjob, but can be changed with the task argument.
- molecule : str
- ‘initial’: the initial system of a finished AMSJob
- ‘final’: the final system of a finished AMSJob
- ‘jobmolecule’: the AMSJob.molecule
- ‘first_history_indices’: the frame given by the first entry in PESScan%HistoryIndices in History
This method adds the following additional metadata to the resulting JCEntry instance:
- Origin - path to the calculation from which this instance originated
- OriginalEnergyHartree - calculated energy of the system (if present in the original job)
-
__str__
()¶ Returns a string representation of a job collection entry.
-
copy
()¶ Create a copy of this entry.
-
is_pipeable
() → bool¶ Based on
settings
, return whether the job can be calculated using the AMSWorker interface.
-
__eq__
(other)
Check if two entries are the same.
4.2.3.2. JobCollection
See also
This class inherits from BaseCollection. Most methods can be found there.
-
class
JobCollection
(*a, **kw)
A class representing a job collection, i.e. a collection of JCEntry instances.
Attributes:
- header : dict
- A dictionary with global metadata that will be printed at the beginning of the file when store() is called. Will always contain the ParAMS version number and class name.
- engines : EngineCollection
- An EngineCollection instance attached to this collection. Used in run() and run_reference().
-
__init__
(*a, **kw)¶ Creates a new collection, optionally populating it with entries from yamlfile.
-
load
(fpath='job_collection.yaml')¶ Collective method for the load_yaml and pickle_load methods below, called at init.
-
load_yaml
(yamlfile)¶ Loads all job collection entries from a (compressed) YAML file and adds them to the job collection.
-
duplicate_entry
(key, newkey)
Duplicates this collection’s entry associated with key and stores it under newkey.
-
store
(yamlfile='job_collection.yaml')¶ Stores the JobCollection to a (compressed) YAML file.
The file will be automatically compressed when the file ending is .gz or .gzip. If at least one Engine is defined in the self.engines attribute, the Engine Collection will also be stored under the same name as yamlfile, with ‘_engines’ appended.
-
from_jobids
(jobids: Set[str]) → scm.params.core.jobcollection.JobCollection¶ Generates a subset of self, reduced to entries in jobids.
-
writexyz
(filename: str, jobids: Sequence = None)¶ Writes geometries in this instance to one xyz trajectory file.
Parameters: - filename : str
- Path to the xyz file that will be written
- jobids : optional, sequence of strings
- Write only the jobIDs present in this Sequence.
-
to_amsjobs
(jobids: Sequence = None, engine_settings: scm.plams.core.settings.Settings = None) → List[scm.plams.interfaces.adfsuite.ams.AMSJob]
Batch-generate a list of plams.AMSJob from entries in the Job Collection.
If engine_settings is provided, will __add__() the instance to each entry’s settings when generating the jobs.
This method is equivalent to:
engine_settings = Settings()
engine_settings.input.BAND
jobs = [AMSJob(name=ename, molecule=e.molecule, settings=e.settings+engine_settings) for ename,e in JobCollection().items() if ename in jobids]
Parameters:
- jobids : optional, sequence of strings
- A sequence of keys that will be used to generate the AMSJobs. Defaults to all jobs in the collection.
- engine_settings : optional, plams.Settings
- A plams.Settings instance that will be added to every AMSJob.settings.
Returns: List[plams.AMSJob]
-
run
(engine_settings: Union[scm.plams.core.settings.Settings, Type[scm.params.parameterinterfaces.base.BaseParameters]], jobids: Sequence = None, parallel: scm.params.common.parallellevels.ParallelLevels = None, use_pipe=True, _skip_normjobs=False) → Dict[str, Union[scm.plams.interfaces.adfsuite.ams.AMSResults, scm.plams.interfaces.adfsuite.amsworker.AMSWorkerResults]]
Run all jobs in the job collection using engine settings from engine_settings. If a job collection entry has an extra_engine defined, you must also specify an engine_collection which contains the definition of the extra_engine that is used to augment the engine settings on a per-job basis.
Returns the respective AMSResults dict.
When running jobs that are incompatible with the AMSWorker interface or when use_pipe=False, this method will use the regular PLAMS backend. Note that when plams.init() is not called prior to this method, all executed job results will be stored in the system’s temporary directory only for as long as the return value is referenced at runtime. You can make the results storage persistent or change the PLAMS working directory by manually calling plams.init before calling this method.
Parameters:
- engine_settings : plams.Settings or Parameter Interface type
- A plams.Settings instance representing the AMS engine block, or a parameter interface. Every entry will be executed with this engine. The engine can be augmented if there is an engine_collection and the job collection entry has an extra_engine (ExtraEngineID) defined.
- jobids : optional, Sequence[str]
- A Sequence of jobids that will be calculated. Defaults to all jobs in the collection.
- parallel : optional, ParallelLevels
- Parallelization for running the jobs from the collection.
- use_pipe : bool
- Whether to use the AMSWorker interface or not.
- _skip_normjobs : bool
- When both plams.AMSWorker and plams.AMSJobs need to be computed, skip the computation of the latter if any of the previous plams.AMSWorkerResults are not results.ok(). By default, this is set to True during an optimization to save time, as a single failed job already results in the cost function being inf.
Returns:
- results : dict
- Dictionary mapping the jobID to a plams.AMSResults or plams.AMSWorkerResults.
-
run_reference
(jobids: Sequence = None, parallel: scm.params.common.parallellevels.ParallelLevels = None, use_pipe=True, _skip_normjobs=False) → Dict[str, Union[scm.plams.interfaces.adfsuite.ams.AMSResults, scm.plams.interfaces.adfsuite.amsworker.AMSWorkerResults]]
Only useful if not all reference engines per entry are the same (otherwise same functionality as run()).
Runs multiple jobs with different engines, as defined by each entry’s reference_engine attribute. The corresponding settings will be obtained from self.engines. Assumes that all entries have a reference_engine defined, and all values are also present in self.engines. Will raise a ValueError otherwise.
See run() for a description of the remaining parameters.
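The precondition of run_reference() can be checked up front. The sketch below works on plain Python containers and is not the ParAMS implementation. `check_reference_engines` is a hypothetical helper that raises a ValueError when an entry has no reference engine or names one that is not among the known engine IDs.

```python
def check_reference_engines(reference_ids, engine_ids):
    """Raise ValueError unless every job names a known reference engine.

    reference_ids: hypothetical mapping of jobID -> reference engine ID (or None)
    engine_ids:    iterable of IDs available in the engine collection
    """
    known = set(engine_ids)
    missing = [job for job, ref in reference_ids.items() if ref is None]
    unknown = [
        job for job, ref in reference_ids.items()
        if ref is not None and ref not in known
    ]
    if missing or unknown:
        raise ValueError(
            f"no reference engine: {missing}; unknown engine ID: {unknown}"
        )


# Every job points at an existing engine ID, so this passes silently.
check_reference_engines(
    {"H2O_001": "myDFTengine_1", "CH4_001": "DFTB_1"},
    ["myDFTengine_1", "DFTB_1"],
)

try:
    check_reference_engines({"H2O_001": None}, ["DFTB_1"])
    raised = False
except ValueError:
    raised = True
```

With real collections, the mapping could presumably be built from each entry's reference_engine attribute and the keys of the attached Engine Collection.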
4.2.3.3. Engine
-
class
Engine
(settings=None, metadata=None)¶ A class representing an AMS engine, i.e. its input (the engine block) and optional metadata.
Attributes:
- settings : plams.Settings
A plams.Settings instance, holding the AMS input information for the Engine.
Important
Cannot be empty when adding the class instance to the EngineCollection.
- metadata : dict
- Additional metadata entries can be stored in this variable.
- type : str
- String representation of the engine used. Will be generated automatically.
-
__init__
(settings=None, metadata=None)¶ Create a new Engine entry.
-
__str__
()¶ Returns a string representation of an AMS engine.
-
__eq__
(other)
Check if two engines are the same.
-
copy
()¶ Return a copy of this entry
4.2.3.4. EngineCollection
See also
This class inherits from BaseCollection. Most methods can be found there.
-
class
EngineCollection
(yamlfile=None, _gui=False)¶ A class representing a collection of engines.
Attributes:
- header : dict
- A dictionary with global metadata that will be printed at the beginning of the file when store() is called. Will always contain the ParAMS version number and class name.
-
load
(yamlfile='engine_collection.yaml')¶ Loads all engines from a yaml file and adds them to the collection.
-
store
(yamlfile='engine_collection.yaml')¶ Stores the EngineCollection to a (compressed) YAML file.
The file will be automatically compressed when the file ending is .gz or .gzip.
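The suffix-based compression rule can be sketched with the standard library. This is an assumption about how such a rule might be implemented, not the ParAMS code; `open_text` is a hypothetical helper and the file names are only examples.

```python
import gzip
import os
import tempfile


def open_text(path, mode="r"):
    """Open `path` as text, using gzip when the suffix is .gz or .gzip."""
    if path.endswith((".gz", ".gzip")):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


content = "---\nID: DFTB_1\n...\n"
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "engine_collection.yaml")
    packed_path = os.path.join(tmp, "engine_collection.yaml.gz")
    for path in (plain_path, packed_path):
        with open_text(path, "w") as handle:
            handle.write(content)
    # Reading goes through the same helper, so callers never touch gzip directly.
    with open_text(packed_path) as handle:
        roundtrip = handle.read()
    with open(packed_path, "rb") as handle:
        magic = handle.read(2)  # gzip files start with the bytes 1f 8b
```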
4.2.3.5. Collection Base Class
All collections inherit from this base class.
-
class
BaseCollection
(yamlfile=None, _gui=False)¶ Base class for JobCollection and EngineCollection
Attributes:
- header : dict
- A dictionary with global metadata that will be printed at the beginning of the file when store() is called. Will always contain the ParAMS version number and class name.
-
__init__
(yamlfile=None, _gui=False)¶ Creates a new collection, optionally populating it with entries from yamlfile.
-
load
(yamlfile) → str¶ Abstract method, call in the child’s load method. This method returns the raw string from file and extracts the header. Call it (with super()) before or after the actual loading.
-
store
(yamlfile)¶ Stores the entire collection in a (compressed) yamlfile.
-
add_entry
(eid: str, entry: Any, replace=False)¶ Adds an entry to the collection.
Parameters:
- eid : str
- Unique ID for the entry. Will warn and convert to a string when a non-string ID is provided.
- entry : subject to _check_entry()
- An instance of the entry class that holds the actual contents. Its structure differs depending on the collection.
- replace : bool
- By default, adding entries with an ID that is already present will raise a KeyError. Set this to True if you would like to overwrite the entry stored at that ID instead.
-
add_entry_nonstrict
(eid, entry, reuse_existing=False)¶ Adds an entry to the collection. If the eid already exists, creates a new unique name by appending an integer.
- reuse_existing : bool
- If True, compare the contents of the current entry to each existing entry. If the new entry duplicates an old one, do not add anything and return the existing eid. The type added to the collection (e.g. JCEntry or Engine) must implement the __eq__ method to compare values.
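The behavior described for add_entry_nonstrict() can be sketched on a plain dictionary. Two details are assumptions made for this illustration and are not taken from ParAMS: the naming scheme (an underscore followed by an integer) and the fact that the function always returns the ID that was used.

```python
def add_entry_nonstrict(collection, eid, entry, reuse_existing=False):
    """Sketch of the documented behavior on a plain dict; returns the ID used."""
    if reuse_existing:
        # Reuse the ID of an existing entry with equal contents (relies on __eq__).
        for existing_id, existing in collection.items():
            if existing == entry:
                return existing_id
    new_id, counter = eid, 1
    while new_id in collection:
        # Assumed naming scheme: append an underscore and an integer.
        new_id = f"{eid}_{counter}"
        counter += 1
    collection[new_id] = entry
    return new_id


entries = {}
first = add_entry_nonstrict(entries, "H2O", {"task": "SinglePoint"})
second = add_entry_nonstrict(entries, "H2O", {"task": "GeometryOptimization"})
reused = add_entry_nonstrict(
    entries, "H2O", {"task": "SinglePoint"}, reuse_existing=True
)
```

The third call adds nothing, because an entry with equal contents already exists under the ID returned in `first`.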
-
remove_entry
(eid)¶ Removes an entry matching eid from the collection, or throws an exception if the entry is not found.
-
rename_entry
(oldkey, newkey)¶ Rename an entry in the collection to be associated with newkey
-
duplicate_entry
(oldid: str, newid: str)¶ Maps an entry with an existing oldid to a new entry with newid (without removing the old one)
-
_check_entry
(eid, entry, replace=False)
Abstract method. Add additional checks here, then call super()._check_entry(eid,entry).
-
__str__
()¶ Return str(self).
-
items
()¶ Return all key:value pairs in collection.
-
values
()¶ Return all entries in collection.
-
keys
()¶ Return all IDs in collection.
-
__getitem__
(key)¶ Get the entry with matching key (ID).
-
__setitem__
(key, value)
Same as add_entry() with replace=True.
-
__delitem__
(key)
Same as remove_entry().
-
__len__
()¶ Return number of entries in collection.
-
__iter__
()
Iterate over key:value pairs.
-
__add__
(other)
Add two collections to return a new collection. Entries from other will only be added if not already present in self.
-
update
(other)¶ Update the instance with entries from other (possibly overwriting existing entries).
-
__contains__
(key)¶ Check if ID == key is in collection.
-
__eq__
(other)¶ Check if two collections are the same.
-
__ne__
(other)¶ Return self!=value.
-
__repr__
()¶ Return repr(self).