3.2. Collections

The Collections represent all AMS jobs that need to be run before a Data Set can be evaluated. They store all settings necessary for the execution in a human-readable YAML format. We divide these settings into two Collections: Job and Engine Collection.

See also

YAML – Homepage and on Wikipedia.

The Job Collection holds information relevant to the AMS driver, such as the chemical system, the driver task, and the properties to calculate. The Engine Collection is the part where different AMS engine blocks are stored.

Combining an entry from the Job Collection with any entry from the Engine Collection ensures that results are comparable: Nothing in the job should change apart from the engine that is used to execute it.

A Collection always behaves like a Python dictionary, storing one key-value pair per entry. Within a Collection, we call the key jobID (or just ID).

Important

Each ID within a Collection needs to be unique due to the dict-like nature of the classes.
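
The uniqueness requirement follows directly from the dictionary behaviour. The following is a minimal sketch using a hypothetical stand-in class: MiniCollection is not part of scm.params, and raising KeyError on a duplicate ID is an assumption, as the real classes may signal the error differently.

```python
class MiniCollection:
    """Hypothetical stand-in for a dict-like collection (not the scm.params code)."""

    def __init__(self):
        self._entries = {}

    def add_entry(self, eid, entry):
        # Refuse to silently overwrite an entry that already uses this ID.
        if eid in self._entries:
            raise KeyError(f"ID '{eid}' is already present in the collection")
        self._entries[eid] = entry

    def __getitem__(self, eid):
        return self._entries[eid]

    def __len__(self):
        return len(self._entries)


mc = MiniCollection()
mc.add_entry("H2O_001", "first entry")
try:
    mc.add_entry("H2O_001", "second entry")  # same ID a second time
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
```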

3.2.1. Job Collection

The Job Collection is mainly meant to store input that can be read by the AMS driver, along with optional metadata. When stored to disk, the data looks similar to the following:

---
ID: H2O_001
ReferenceEngineID: myDFTengine_1
AMSInput: |
   Task SinglePoint
   system
     atoms
        H     -0.7440600000      1.1554900000     -0.0585900000
        O      0.6438200000      1.2393700000      0.0060400000
        H     -3.3407000000      0.2702700000      0.7409900000
     end
   end
Source: ThisAwesomePaper
PlotIt: True
---
ID: CH4_001
ReferenceEngineID: DFTB_1
AMSInput: |
   Task GeometryOptimization
   properties
     Gradients True
   end
   system
     atoms
        C     -0.7809900000      1.1572800000     -0.0369200000
        H      0.6076000000      1.2309400000      0.0140600000
        H      1.3758800000      0.0685800000      0.0285600000
        H      0.7425100000     -1.1714200000     -0.0084500000
        H     -0.6465400000     -1.2538900000     -0.0595900000
     end
   end
...

The collection above contains two entries. We can recognize the ID, by which each entry is labeled. Everything else is a textual representation of the value stored under this ID. At runtime, each entry is stored as a JCEntry instance. Basic usage is discussed below.
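
To make the file layout concrete, the sketch below lists the IDs in such a text using only the standard library. It is a deliberately naive reader, not a YAML parser: it assumes that every entry has exactly one unindented `ID:` line and that the stream ends with an unindented `...` line. The helper name list_ids is hypothetical. For real work, load the file with JobCollection or a YAML library.

```python
def list_ids(text):
    """Return the value of every unindented 'ID:' line, in file order."""
    ids = []
    for line in text.splitlines():
        if line.rstrip() == "...":   # unindented end-of-stream marker
            break
        if line.startswith("ID:"):   # unindented key, so not part of a block scalar
            ids.append(line[len("ID:"):].strip())
    return ids


example = """\
---
ID: H2O_001
ReferenceEngineID: myDFTengine_1
AMSInput: |
   Task SinglePoint
Source: ThisAwesomePaper
---
ID: CH4_001
ReferenceEngineID: DFTB_1
AMSInput: |
   Task GeometryOptimization
...
"""

ids = list_ids(example)
print(ids)
```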


3.2.1.1. Adding Jobs

Important

The Job Collection only stores instances of JCEntry.

An instance of the Job Collection can be initialized without further parameters. The instance can be populated with JCEntry objects only. Every JCEntry instance needs to have at least the attributes settings and molecule defined. Adding a reference_engine string and metadata is optional:

>>> jc = JobCollection()
>>> jce = JCEntry()
>>> jce.settings.input.AMS.Task = 'SinglePoint' # Must be a PLAMS Settings() instance
>>> jce.molecule = h2o_mol                      # Must be a PLAMS Molecule() instance
>>> jce.reference_engine = 'myDFTengine'        # Optional: A string that matches the ID of an EngineCollection entry
>>> jce.metadata['Source'] = 'SomePaper'        # Optional: Metadata can be added to the `metadata` dictionary
>>> jc.add_entry('H2O_001',jce)                 # Adding `jce` with ID 'H2O_001'.
>>> # jc['H2O_001'] = jce                       # This is the same as the line above.
>>> # Adding more entries with the same ID to `jc` is not possible anymore.

All attributes can also be assigned when instantiating the object:

>>> jce = JCEntry(settings, molecule, refengine, **metadata)

See below for a textual representation of jce.


3.2.1.2. Working with the Collection

The Job Collection is set up to behave like a dictionary. Consequently, all commonly known methods are available – with a couple of additions:

>>> jc.keys() # All IDs:
dict_keys(['H2O_001'])
>>> jc() # A shortcut to list(jc.keys())
['H2O_001']
>>> jc.values() # All JCEntries:
dict_values([<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>])
>>> for k, v in jc.items(): # Or both of the above
...     ...

Lookup:

>>> 'H2O_001' in jc
True
>>> jc['H2O_001']      # Get the respective JCEntry
>>> jc('H2O_001')      # Calling the collection with the ID is equivalent
<scm.params.core.jobcollection.JCEntry object at 0x7f5a0510db38>

Removing Entries:

>>> del jc['H2O_001']
>>> len(jc)
0
>>> # Alternatively:
>>> jc.remove_entry('H2O_001')

Renaming Entries:

>>> oldkey = 'H2O_001'
>>> newkey = 'H2O_002'
>>> jc.rename_entry(oldkey, newkey)

Collections can be added. Duplicate keys will use the value of the first argument (jc rather than jc2):

>>> added_jc = jc + jc2
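
The precedence rule for duplicate keys can be pictured with plain dictionaries. This is only an illustration of the described behaviour, not the library code. The dictionaries and their string values are placeholders standing in for real collections and entries.

```python
# Plain dicts stand in for the two collections; the values are placeholders.
jc = {"H2O_001": "entry from jc", "CH4_001": "entry from jc"}
jc2 = {"H2O_001": "entry from jc2", "NH3_001": "entry from jc2"}

# Start with a copy of the first operand, then add only the IDs it lacks.
added = dict(jc)
for key, value in jc2.items():
    added.setdefault(key, value)

print(added["H2O_001"])
```

Here the duplicate ID H2O_001 keeps the value from the first operand, while NH3_001 is taken over from the second.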

Comparison:

>>> jc == jc
True

Metadata can be stored per-entry, as shown above. For the storage of global metadata or comments, the header attribute can be used to store a string:

>>> comments = """this is a multiline
... header comment"""
>>> jc.header = comments
>>> jc.store('jc_with_header.yaml') # The header string will be stored when writing to YAML

The header is also available to the Data Set and Engine Collection classes.


3.2.1.3. I/O

Storing and loading collections can be done with:

>>> jc.store('jobs.yml')       # Store the collection in a YAML format.
>>> jc.pickle_dump('jobs.pkl') # Or in a pickled binary format.

The stored YAML file then contains:

---
ID: H2O_001
ReferenceEngineID: myDFTengine
AMSInput: |
   system
     atoms
              H      0.0000000000      0.0000000000      0.3753600000
              H      0.0000000000      0.0000000000     -0.3753600000
     end
   end
   task SinglePoint
Source: SomePaper
...

The textual representation of a single JCEntry can also be obtained by calling str() on it. Calling print(jc['H2O_001']) would produce the same output as above (since our Job Collection only has one entry).

The file can then be loaded:

>>> jc2 = JobCollection('jobs.yml') # From YAML
>>> jc2 = JobCollection('jobs.pkl') # From a pickled file
>>> jc == jc2
True

Note

When working with large Job Collections, storing and loading binary files is significantly faster than the YAML counterpart.


3.2.1.4. Generating AMSJobs

The JobCollection.to_amsjobs() method can be used to quickly generate plams.AMSJob instances from all the entries in a Job Collection. You can limit the output to a specific subset of entries by providing the jobids argument. An additional engine_settings argument can be passed to be added to all AMSJob.settings, making the returned AMSJobs executable:

>>> engine_settings = Settings()
>>> engine_settings.input.BAND # Assuming we would like to run all jobs in `jc` with BAND
>>> jobids = ['job1', 'job2']
>>> jobs = jc.to_amsjobs(jobids, engine_settings)
>>> all(isinstance(job, AMSJob) for job in jobs)
True
>>> [job.run() for job in jobs] # The jobs can now be executed by PLAMS

3.2.1.5. Running Collection Jobs

All entries in a Job Collection can be calculated at once with the JobCollection.run() method, which returns a dictionary of {jobID : plams.AMSResults} pairs. This can be useful when manual interaction with the job results is needed for a specific engine (for example when calculating the reference data):

>>> len(jc)
20
>>> engine = Settings() # The JCEntries do not include engine settings
>>> engine.input.BAND   # We would like to run all stored jobs with BAND
>>> results = jc.run(engine) # Will run all jobs in jc and return their results object
>>> all(r.ok() for r in results.values()) # The returned value is a dict of {jobID : AMSResults}
True
>>> energies = [r.get_energy() for r in results.values()] # We can now process the results

Alternatively, a subset of jobs can be calculated by providing the jobids argument:

>>> ids_to_run = ['myjob1', 'myotherjob']
>>> results = jc.run(engine, jobids=ids_to_run)
>>> len(results)
2
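
When some jobs may fail, it can help to separate the usable results before processing them. The sketch below uses a hypothetical FakeResult class that only mimics the ok() and get_energy() calls used above. With a real collection, the results dictionary would come from jc.run(engine). The energies are made-up placeholder values.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in that mimics only ok() and get_energy()."""
    energy: float
    success: bool = True

    def ok(self):
        return self.success

    def get_energy(self):
        return self.energy


# Placeholder data, for illustration only.
results = {
    "myjob1": FakeResult(-1.0),
    "myotherjob": FakeResult(0.0, success=False),
}

# Keep the energies of successful jobs and remember which IDs failed.
energies = {jid: r.get_energy() for jid, r in results.items() if r.ok()}
failed = [jid for jid, r in results.items() if not r.ok()]
```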

Note

This method uses the AMSWorker interface where possible. Use the use_pipe keyword to disable it.

3.2.2. Engine Collection

Engine Collections are very similar to Job Collections and can be used in exactly the same manner. The main difference is that the Engine Collection stores Engine instances instead of JCEntry instances. A textual representation looks similar to this:

---
ID: DFTB_1
AMSInput: |
   engine DFTB
     Model DFTB3
     ResourcesDir DFTB.org/3ob-3-1
   endengine
Comment: My favourite engine.
...

Important

The Engine Collection only stores instances of Engine.

Within each entry, only the settings attribute must be defined. The remaining metadata is optional.

>>> ec = EngineCollection()
>>> e  = Engine()
>>> e.settings.input.DFTB.model = 'DFTB3' # e.settings is a PLAMS Settings() instance.
>>> e.settings.input.DFTB.ResourcesDir = 'DFTB.org/3ob-3-1'
>>> e.metadata['Comment'] = 'My favourite engine.' # This is optional.
>>> ec.add_entry('DFTB_1',e)
>>> # print(ec['DFTB_1']) reproduces the textual representation above

See also

For further examples on how to work with the collection, please refer to the Job Collection section.

3.2.3. Collections API

3.2.3.1. JCEntry

class JCEntry(settings=None, molecule=None, refengine=None, **metadata)

A class representing a single job collection entry, i.e., an AMS job with optionally an associated reference engine and metadata.

Attributes:

settings : plams.Settings

plams.Settings() instance, holding the input for the job.

Important

Cannot be empty when adding the class instance to the JobCollection.

molecule : plams.Molecule

plams.Molecule() for the system of interest.

Important

Cannot be empty when adding the class instance to the JobCollection.

reference_engine : optional, str

ID of the reference engine, used for lookup in the EngineCollection.

metadata : optional

Additional keyword arguments will be interpreted as metadata and stored in this variable.

__init__(settings=None, molecule=None, refengine=None, **metadata)

Creates a new job collection entry.

__str__()

Returns a string representation of a job collection entry.

copy()

Create a copy of this entry.

is_pipeable() → bool

Based on settings, return whether the job can be calculated using the AMSWorker interface.

3.2.3.2. JobCollection

See also

This class inherits from BaseCollection. Most methods can be found there.

class JobCollection(yamlfile=None)

A class representing a job collection, i.e. a collection of JCEntry instances.

load(fpath='jobcollection.yaml')

Collective method for the load_yaml and pickle_load methods below, called at init.

load_yaml(yamlfile)

Loads all job collection entries from a yaml file and adds them to the job collection.
To load from pickled files, use pickle_load() instead.

duplicate_entry(key, newkey)

Duplicates this collection’s entry associated with key and stores it under newkey.

pickle_dump(fpath)

Store the job collection in pickled (binary) form at fpath.

pickle_load(fpath)

Load the job collection from the pickled file at fpath.

store(yamlfile='jobcollection.yaml')

Stores the entire collection in yamlfile.

from_jobids(jobids: Set[str]) → scm.params.core.jobcollection.JobCollection

Generates a subset of self, reduced to entries in jobids.

to_amsjobs(jobids: Sequence = None, engine_settings: scm.plams.core.settings.Settings = None) → List[scm.plams.interfaces.adfsuite.ams.AMSJob]

Batch-generate a list of plams.AMSJob from entries in the Job Collection.
If engine_settings is provided, will __add__() the instance to each entry’s settings when generating the jobs.

This method is equivalent to:

engine_settings = Settings()
engine_settings.input.BAND
# `jc` is the JobCollection instance the method is called on
jobs = [AMSJob(name=ename, molecule=e.molecule, settings=e.settings+engine_settings) for ename, e in jc.items() if ename in jobids]

Parameters:

jobids : optional, sequence of strings
A sequence of keys that will be used to generate the AMSJobs. Defaults to all jobs in the collection.
engine_settings : optional, plams.Settings
A plams.Settings instance that will be added to every AMSJob.settings.

Returns: List[plams.AMSJob]

run(engine_settings: scm.plams.core.settings.Settings, jobids: Sequence = None, parallel: scm.params.common.parallellevels.ParallelLevels = None, use_pipe=True, _skip_normjobs=False) → Dict[str, Union[scm.plams.interfaces.adfsuite.ams.AMSResults, scm.plams.interfaces.adfsuite.amsworker.AMSWorkerResults]]

Run all jobs in the job collection with engine_settings and return the respective AMSResults dict.

When running jobs that are incompatible with the AMSWorker interface or when use_pipe=False, this method will use the regular PLAMS backend. Note that when plams.init() is not called prior to this method, all executed job results will be stored in the system’s temporary directory only for as long as the return value is referenced at runtime. You can make the results storage persistent or change the PLAMS working directory by manually calling plams.init before calling this method.

Parameters:
engine_settings : plams.Settings
A plams.Settings instance representing the AMS engine block.
Every entry will be executed with this engine.
jobids : Sequence[str]
A Sequence of jobids that will be calculated.
Defaults to all jobs in the collection.
parallel : optional, ParallelLevels
Parallelization for running the jobs from the collection.
use_pipe : bool
Whether to use the AMSWorker interface or not.
_skip_normjobs : bool
When both plams.AMSWorker jobs and plams.AMSJobs need to be computed, skip the computation of the latter if any of the preceding plams.AMSWorkerResults is not results.ok(). By default, this is set to True during an optimization to save time, since a single failed job already results in the cost function being inf.
Returns:
results : dict
Dictionary mapping the jobID to a plams.AMSResults or plams.AMSWorkerResults.
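
The effect of _skip_normjobs can be pictured as a two-stage loop. The following is only an illustration of the behaviour described above, not the actual implementation. The function run_in_two_stages, its arguments and the fake_run helper are all hypothetical.

```python
def run_in_two_stages(pipeable, regular, run_one, skip_normjobs=False):
    """Run 'pipeable' jobs first; optionally skip 'regular' jobs after a failure.

    run_one(job_id) must return True on success and False on failure.
    """
    status = {job_id: run_one(job_id) for job_id in pipeable}
    if skip_normjobs and not all(status.values()):
        return status  # one failure already decides the outcome, so stop here
    for job_id in regular:
        status[job_id] = run_one(job_id)
    return status


calls = []


def fake_run(job_id):
    """Pretend to run a job; only 'bad_job' fails."""
    calls.append(job_id)
    return job_id != "bad_job"


status = run_in_two_stages(["ok_job", "bad_job"], ["slow_job"], fake_run, skip_normjobs=True)
```

In this toy run, slow_job is never started because bad_job has already failed.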

3.2.3.3. Engine

class Engine(settings=None, metadata={})

A class representing an AMS engine, i.e. its input (the engine block) and optional metadata.

Attributes:

settings : plams.Settings

A plams.Settings instance, holding the AMS input information for the Engine.

Important

Cannot be empty when adding the class instance to the EngineCollection.

metadata : dict

Additional metadata entries can be stored in this variable.

type : str

String representation of the engine used. Will be generated automatically.

__init__(settings=None, metadata={})

Create a new Engine entry.

__str__()

Returns a string representation of an AMS engine.

copy()

Return a copy of this entry.

3.2.3.4. EngineCollection

See also

This class inherits from BaseCollection. Most methods can be found there.

class EngineCollection(yamlfile=None)

A class representing a collection of engines.

load(yamlfile='reference_engines.yaml')

Loads all engines from a yaml file and adds them to the collection.

store(yamlfile='reference_engines.yaml')

Stores the entire collection in yamlfile.

3.2.3.5. Collection Base Class

All collections inherit from this base class.

class BaseCollection(yamlfile=None)

Base class for collections: JobCollection, EngineCollection, …

__init__(yamlfile=None)

Creates a new collection, optionally populating it with entries from yamlfile.

load(yamlfile)

Abstract method. Define in child class. The routine extracts the header from file. Call it (with super()) before or after the actual loading.

store(yamlfile)

Stores the entire collection in yamlfile.

add_entry(eid: str, entry: Any)

Adds an entry to the collection.

Parameters:

eid : str
Unique ID for the entry
entry : subject to _check_entry().
This subclass is meant to store the actual contents. The structure of the subclass will be different, depending on the collection.

remove_entry(eid)

Removes an entry matching eid from the collection, or throws an exception if the entry is not found.

rename_entry(oldkey, newkey)

Rename an entry in the collection to be associated with newkey.

_check_entry(eid, entry)

Abstract method. Add additional checks here, then call super()._check_entry(eid,entry).
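
The hook pattern described here can be sketched as follows. Both classes are hypothetical stand-ins written for this illustration, and the specific checks and exception types are assumptions rather than the behaviour of BaseCollection.

```python
class Base:
    """Hypothetical base class; only the hook pattern is of interest here."""

    def __init__(self):
        self._entries = {}

    def _check_entry(self, eid, entry):
        # Generic checks shared by all collections.
        if not isinstance(eid, str):
            raise TypeError("ID must be a string")
        if eid in self._entries:
            raise KeyError(f"ID '{eid}' already exists")

    def add_entry(self, eid, entry):
        self._check_entry(eid, entry)
        self._entries[eid] = entry


class NumberCollection(Base):
    """Hypothetical child class that only accepts numbers as entries."""

    def _check_entry(self, eid, entry):
        # Collection-specific check first, then the generic ones.
        if not isinstance(entry, (int, float)):
            raise TypeError("entry must be a number")
        super()._check_entry(eid, entry)


nc = NumberCollection()
nc.add_entry("a", 1.5)
try:
    nc.add_entry("b", "text")  # wrong entry type
    rejected = False
except TypeError:
    rejected = True
```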

__str__()

Return str(self).

items()

Return all key:value pairs in collection.

values()

Return all entries in collection.

keys()

Return all IDs in collection.

__getitem__(key)

Get the entry with matching key (ID).

__setitem__(key, value)

Same as add_entry().

__delitem__(key)

Same as remove_entry().

__len__()

Return number of entries in collection.

__iter__()

Iterate over key:value pairs.

__add__(other)

Add two collections to return a new collection. Entries from other will only be added if not already present in self.

__contains__(key)

Check if ID == key is in collection.

__call__(*id)

If called without arguments: same as keys().
If called with id: same as __getitem__().

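
A minimal sketch of this call behaviour, using a hypothetical dict subclass rather than the real BaseCollection. Returning a list of IDs follows the jc() usage example earlier in this section. Considering only the first argument is an assumption made for this sketch.

```python
class CallableCollection(dict):
    """Hypothetical dict subclass illustrating the described __call__ behaviour."""

    def __call__(self, *ids):
        if not ids:
            return list(self.keys())  # no arguments: all IDs
        return self[ids[0]]           # with an ID: the matching entry


cc = CallableCollection(H2O_001="water entry", CH4_001="methane entry")
all_ids = cc()
one_entry = cc("H2O_001")
```
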
__eq__(other)

Check if two collections are the same.

__ne__(other)

Return self!=value.