3.2. Collections¶
The Collections represent all AMS jobs that need to be run before a Data Set can be evaluated. They store all settings necessary for execution in a human-readable YAML format. These settings are divided into two Collections: the Job Collection and the Engine Collection.
The Job Collection holds information relevant to the AMS driver, such as the chemical system, the driver task, and the properties to calculate. The Engine Collection is the part where different AMS engine blocks are stored.
Combining an entry from the Job Collection with any entry from the Engine Collection ensures that results are comparable: Nothing in the job should change apart from the engine that is used to execute it.
A Collection always behaves like a Python dictionary, with one key-value pair per entry. Within a collection, we call the key jobID (or just ID).
Important
Each ID within a Collection needs to be unique due to the dict-like nature of the classes.
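The uniqueness constraint can be sketched in plain Python. Note that MiniCollection below is a hypothetical stand-in for the real collection classes, for illustration only:

```python
# Minimal sketch of the unique-ID constraint; MiniCollection is a
# hypothetical stand-in, not the actual ParAMS implementation.
class MiniCollection(dict):
    def add_entry(self, eid, entry):
        if eid in self:  # IDs must be unique within a collection
            raise KeyError(f"ID '{eid}' already exists in the collection")
        self[eid] = entry

mc = MiniCollection()
mc.add_entry('H2O_001', 'some entry')  # OK
# mc.add_entry('H2O_001', 'other')     # would raise KeyError
```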
3.2.1. Job Collection¶
The Job Collection is mainly meant to store input that can be read by the AMS driver, alongside with optional metadata. When stored to disk, the data looks similar to the following:
---
ID: H2O_001
ReferenceEngineID: myDFTengine_1
AMSInput: |
   Task SinglePoint
   system
     atoms
       H -0.7440600000 1.1554900000 -0.0585900000
       O 0.6438200000 1.2393700000 0.0060400000
       H -3.3407000000 0.2702700000 0.7409900000
     end
   end
Source: ThisAwesomePaper
PlotIt: True
---
ID: CH4_001
ReferenceEngineID: DFTB_1
AMSInput: |
   Task GeometryOptimization
   properties
     Gradients True
   end
   system
     atoms
       C -0.7809900000 1.1572800000 -0.0369200000
       H 0.6076000000 1.2309400000 0.0140600000
       H 1.3758800000 0.0685800000 0.0285600000
       H 0.7425100000 -1.1714200000 -0.0084500000
       H -0.6465400000 -1.2538900000 -0.0595900000
     end
   end
...
The collection above contains two entries. Each entry is labeled by its ID; everything else is a textual representation of the value stored under that ID. At runtime, each entry is stored as a JCEntry instance. Basic usage is discussed below.
3.2.1.1. Adding Jobs¶
Important
The Job Collection only stores instances of JCEntry.
An instance of the Job Collection can be initialized without further parameters. The instance can be populated with JCEntry objects only. Every JCEntry instance needs to have at least the settings and molecule attributes defined. Adding a reference_engine string and metadata is optional:
>>> jc = JobCollection()
>>> jce = JCEntry()
>>> jce.settings.input.AMS.Task = 'SinglePoint' # Must be a PLAMS Settings() instance
>>> jce.molecule = h2o_mol # Must be a PLAMS Molecule() instance
>>> jce.reference_engine = 'myDFTengine' # Optional: A string that matches the ID of an EngineCollection entry
>>> jce.metadata['Source'] = 'SomePaper' # Optional: Metadata can be added to the `metadata` dictionary
>>> jc.add_entry('H2O_001',jce) # Adding `jce` with ID 'H2O_001'.
>>> # jc['H2O_001'] = jce # This is the same as the line above.
>>> # Adding another entry with the same ID to `jc` is no longer possible.
All attributes can also be assigned when instantiating the object:
>>> jce = JCEntry(settings, molecule, refengine, **metadata)
See below for a textual representation of jce.
3.2.1.2. Working with the Collection¶
The Job Collection is set up to behave like a dictionary. Consequently, all commonly known methods are available – with a couple of additions:
>>> jc.keys() # All IDs:
dict_keys(['H2O_001'])
>>> jc() # A shortcut to list(jc.keys())
['H2O_001']
>>> jc.values() # All JCEntries:
dict_values([<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>])
>>> for k,v in jc.items(): # Or both of the above
...     pass
Lookup:
>>> 'H2O_001' in jc
True
>>> jc['H2O_001'] # Get the respective JCEntry
>>> jc('H2O_001') # Same as call on the ID
<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>
Removing Entries:
>>> del jc['H2O_001']
>>> len(jc)
0
>>> # Alternatively:
>>> jc.remove_entry('H2O_001')
Renaming Entries:
>>> oldkey = 'H2O_001'
>>> newkey = 'H2O_002'
>>> jc.rename_entry(oldkey, newkey)
Collections can be added. Duplicate keys will use the value of the first argument (jc rather than jc2):
>>> added_jc = jc + jc2
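The duplicate-key behaviour can be illustrated with a plain-Python sketch, with ordinary dicts standing in for the collections (not the actual ParAMS implementation):

```python
# Sketch of the `+` merge semantics: entries from the second operand
# are only added when their ID is not already present in the first.
def merge(first: dict, second: dict) -> dict:
    merged = dict(first)
    for eid, entry in second.items():
        merged.setdefault(eid, entry)  # keep `first`'s value on duplicate IDs
    return merged

jc  = {'H2O_001': 'entry from jc'}
jc2 = {'H2O_001': 'entry from jc2', 'CH4_001': 'entry from jc2'}
added_jc = merge(jc, jc2)  # duplicate 'H2O_001' keeps the value from jc
```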
Comparison:
>>> jc == jc
True
Metadata can be stored per-entry, as shown above. For the storage of global metadata or comments, the header attribute can be used to store a string:
>>> comments = """this is a multiline
... header comment"""
>>> jc.header = comments
>>> jc.store('jc_with_header.yaml') # The header string will be stored when writing to YAML
The header attribute is also available to the Data Set and Engine Collection classes.
3.2.1.3. I/O¶
Storing and loading collections can be done with:
>>> jc.store('jobs.yml') # Store the collection in a YAML format.
>>> jc.pickle_dump('jobs.pkl') # Or in a pickled binary format.
This produces:
---
ID: H2O_001
ReferenceEngineID: myDFTengine
AMSInput: |
   system
     atoms
       H 0.0000000000 0.0000000000 0.3753600000
       H 0.0000000000 0.0000000000 -0.3753600000
     end
   end
   task SinglePoint
Source: SomePaper
...
The textual representation of a single JCEntry can also be obtained by passing it to str(). Calling print(jc['H2O_001']) would produce the same output as above (since our Job Collection only has one entry).
The file can then be loaded:
>>> jc2 = JobCollection('jobs.yml') # From YAML
>>> jc2 = JobCollection('jobs.pkl') # From a pickled file
>>> jc == jc2
True
Note
When working with large Job Collections, storing and loading binary files is significantly faster than the YAML counterpart.
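The speed difference comes from the binary round-trip: the whole collection is serialized in one pass instead of being parsed line by line. A generic stdlib sketch of this round-trip (plain dicts standing in for the collection; not the ParAMS implementation):

```python
import io
import pickle

# Entries standing in for a collection's contents.
entries = {'H2O_001': {'Task': 'SinglePoint'},
           'CH4_001': {'Task': 'GeometryOptimization'}}

buf = io.BytesIO()
pickle.dump(entries, buf)    # analogous to jc.pickle_dump('jobs.pkl')
buf.seek(0)
restored = pickle.load(buf)  # analogous to JobCollection('jobs.pkl')
assert restored == entries   # the round-trip preserves all entries
```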
3.2.1.4. Generating AMSJobs¶
The JobCollection.to_amsjobs() method can be used to quickly generate plams.AMSJob instances from all the entries in a Job Collection. You can limit the output to a specific subset of entries by providing the jobids argument. An additional engine_settings argument can be passed, which is added to every AMSJob.settings, making the returned AMSJobs executable:
>>> engine_settings = Settings()
>>> engine_settings.input.BAND # Assuming we would like to run all jobs in `jc` with BAND
>>> jobids = ['job1', 'job2']
>>> jobs = jc.to_amsjobs(jobids, engine_settings)
>>> all(isinstance(job, AMSJob) for job in jobs)
True
>>> [job.run() for job in jobs] # The jobs can now be executed by PLAMS
3.2.1.5. Running Collection Jobs¶
All entries in a Job Collection can be calculated at once with the JobCollection.run() method, which returns a dictionary of {jobID : plams.AMSResults} pairs. This can be useful when manual interaction with the job results is needed, given a specific engine (for example when calculating the reference data):
>>> len(jc)
20
>>> engine = Settings() # The JCEntries do not include engine settings
>>> engine.input.BAND # We would like to run all stored jobs with BAND
>>> results = jc.run(engine) # Will run all jobs in jc and return their results object
>>> all(r.ok() for r in results.values()) # The returned value is a dict of {jobID : AMSResults}
True
>>> energies = [r.get_energy() for r in results.values()] # We can now process the results
Alternatively, a subset of jobs can be calculated by providing the jobids argument:
>>> ids_to_run = ['myjob1', 'myotherjob']
>>> results = jc.run(engine, jobids=ids_to_run)
>>> len(results)
2
Note
This method uses the AMSWorker interface where possible. Use the use_pipe keyword to disable it.
3.2.2. Engine Collection¶
Engine Collections are very similar to the Job Collection: the user can work with them in exactly the same manner. The main difference between the two is that the Engine Collection stores Engine instances instead of JCEntry instances.
A textual representation looks similar to this:
---
ID: DFTB_1
AMSInput: |
   engine DFTB
     Model DFTB3
     ResourcesDir DFTB.org/3ob-3-1
   endengine
Comment: My favourite engine.
...
Important
The Engine Collection only stores instances of Engine.
Within each entry, only the settings attribute must be defined. The remaining metadata is optional.
>>> ec = EngineCollection()
>>> e = Engine()
>>> e.settings.input.DFTB.model = 'DFTB3' # e.settings is a PLAMS Settings() instance.
>>> e.settings.input.DFTB.ResourcesDir = 'DFTB.org/3ob-3-1'
>>> e.metadata['Comment'] = 'My favourite engine.' # This is optional.
>>> ec.add_entry('DFTB_1',e)
>>> # print(ec['DFTB_1']) reproduces the textual representation above
See also
For further examples on how to work with the collection, please refer to the Job Collection section.
3.2.3. Collections API¶
3.2.3.1. JCEntry¶
class JCEntry(settings=None, molecule=None, refengine=None, **metadata)¶
A class representing a single job collection entry, i.e., an AMS job with optionally an associated reference engine and metadata.
Attributes:
- settings : plams.Settings
  plams.Settings() instance, holding the input for the job.
  Important: Cannot be empty when adding the class instance to the JobCollection.
- molecule : plams.Molecule
  plams.Molecule() for the system of interest.
  Important: Cannot be empty when adding the class instance to the JobCollection.
- reference_engine : optional, str
  ID of the reference engine, used for lookup in the EngineCollection.
- metadata : optional
  Additional keyword arguments will be interpreted as metadata and stored in this variable.

__init__(settings=None, molecule=None, refengine=None, **metadata)¶
Creates a new job collection entry.

__str__()¶
Returns a string representation of a job collection entry.

copy()¶
Creates a copy of this entry.
3.2.3.2. JobCollection¶
See also
This class inherits from BaseCollection. Most methods can be found there.
class JobCollection(yamlfile=None)¶
A class representing a job collection, i.e. a collection of JCEntry instances.

load(fpath='jobcollection.yaml')¶
Collective method for the load_yaml and pickle_load methods below, called at init.

load_yaml(yamlfile)¶
Loads all job collection entries from a yaml file and adds them to the job collection. To load from pickled files, use pickle_load() instead.

duplicate_entry(key, newkey)¶
Duplicates this collection's entry associated with key and stores it under newkey.

pickle_dump(fpath)¶
Stores the job collection under a pickled fpath.

pickle_load(fpath)¶
Loads from a pickled fpath.

store(yamlfile='jobcollection.yaml')¶
Stores the entire collection in yamlfile.

from_jobids(jobids: Set[str]) → scm.params.core.jobcollection.JobCollection¶
Generates a subset of self, reduced to entries in jobids.
to_amsjobs(jobids: Sequence = None, engine_settings: scm.plams.core.settings.Settings = None) → List[scm.plams.interfaces.adfsuite.ams.AMSJob]¶
Batch-generates a list of plams.AMSJob from entries in the Job Collection. If engine_settings is provided, it will be __add__()-ed to each entry's settings when generating the jobs. This method is equivalent to:
engine_settings = Settings()
engine_settings.input.BAND
jobs = [AMSJob(name=ename, molecule=e.molecule, settings=e.settings+engine_settings)
        for ename, e in JobCollection().items() if ename in jobids]
Parameters:
- jobids : optional, sequence of strings
  A sequence of keys that will be used to generate the AMSJobs. Defaults to all jobs in the collection.
- engine_settings : optional, plams.Settings
  A plams.Settings instance that will be added to every AMSJob.settings.
Returns: List[plams.AMSJob]
run(engine_settings: scm.plams.core.settings.Settings, jobids: Sequence = None, parallel: scm.params.common.parallellevels.ParallelLevels = None, use_pipe=True, _skip_normjobs=False) → Dict[str, Union[scm.plams.interfaces.adfsuite.ams.AMSResults, scm.plams.interfaces.adfsuite.amsworker.AMSWorkerResults]]¶
Runs all jobs in the collection with engine_settings and returns the respective AMSResults dict.
When running jobs that are incompatible with the AMSWorker interface, or when use_pipe=False, this method will use the regular PLAMS backend. Note that when plams.init() is not called prior to this method, all executed job results will be stored in the system's temporary directory only for as long as the return value is referenced at runtime. You can make the results storage persistent or change the PLAMS working directory by manually calling plams.init before calling this method.
Parameters:
- engine_settings : plams.Settings
  A plams.Settings instance representing the AMS engine block. Every entry will be executed with this engine.
- jobids : Sequence[str]
  A sequence of jobids that will be calculated. Defaults to all jobs in the collection.
- parallel : optional, ParallelLevels
  Parallelization for running the jobs from the collection.
- use_pipe : bool
  Whether to use the AMSWorker interface or not.
- _skip_normjobs : bool
  When both plams.AMSWorker and plams.AMSJob jobs need to be computed, skip the computation of the latter if any of the previous plams.AMSWorkerResults are not ok(). By default, this is set to True during an optimization to save time, as one failed job results in the cost function being inf.
Returns:
- results : dict
  Dictionary mapping each jobID to a plams.AMSResults or plams.AMSWorkerResults.
3.2.3.3. Engine¶
class Engine(settings=None, metadata={})¶
A class representing an AMS engine, i.e. its input (the engine block) and optional metadata.
Attributes:
- settings : plams.Settings
  A plams.Settings instance, holding the AMS input information for the Engine.
  Important: Cannot be empty when adding the class instance to the EngineCollection.
- metadata : dict
  Additional metadata entries can be stored in this variable.
- type : str
  String representation of the engine used. Will be generated automatically.

__init__(settings=None, metadata={})¶
Creates a new Engine entry.

__str__()¶
Returns a string representation of an AMS engine.

copy()¶
Returns a copy of this entry.
3.2.3.4. EngineCollection¶
See also
This class inherits from BaseCollection. Most methods can be found there.
3.2.3.5. Collection Base Class¶
All collections inherit from this base class.
class BaseCollection(yamlfile=None)¶
Base class for collections: JobCollection, EngineCollection, …

__init__(yamlfile=None)¶
Creates a new collection, optionally populating it with entries from yamlfile.

load(yamlfile)¶
Abstract method. Define in child class. The routine extracts the header from file. Call it (with super()) before or after the actual loading.

store(yamlfile)¶
Stores the entire collection in yamlfile.

add_entry(eid: str, entry: Any)¶
Adds an entry to the collection.
Parameters:
- eid : str
  Unique ID for the entry
- entry : subject to _check_entry()
  The entry class stores the actual contents. Its structure will be different, depending on the collection.
remove_entry(eid)¶
Removes an entry matching eid from the collection, or throws an exception if the entry is not found.

rename_entry(oldkey, newkey)¶
Renames an entry in the collection to be associated with newkey.

_check_entry(eid, entry)¶
Abstract method. Add additional checks here, then call super()._check_entry(eid, entry).

__str__()¶
Return str(self).

items()¶
Return all key:value pairs in the collection.

values()¶
Return all entries in the collection.

keys()¶
Return all IDs in the collection.

__getitem__(key)¶
Get the entry with matching key (ID).

__setitem__(key, value)¶
Same as add_entry().

__delitem__(key)¶
Same as remove_entry().

__len__()¶
Return the number of entries in the collection.

__iter__()¶
Iterate over key:value pairs.

__add__(other)¶
Add two collections to return a new collection. Entries from other will only be added if not already present in self.

__contains__(key)¶
Check if ID == key is in the collection.

__call__(*id)¶
If called without arguments: same as keys(). If called with id: same as __getitem__().

__eq__(other)¶
Check if two collections are the same.

__ne__(other)¶
Return self != value.